Adding White Spaces to Data in Spark DataFrame
You may have a use case where the values in a column, whether strings or numbers, need to have the same length. We can use the lpad and rpad functions to format strings and numbers this way.
For example, you might need numbers to have the same number of digits: a month should have 2 digits, so a 0 is added when the month has only one digit.
lpad()
The lpad function adds padding to the left side of a string or number. This is useful in the example mentioned above, where we would like to add a 0 to the left of the month if it has only one digit.
from pyspark.sql.functions import col, lpad, rpad, trim

# Left-pad the trimmed "count" value with "0" up to a length of 4 characters
df_csv.select(
    "DEST_COUNTRY_NAME",
    "count",
    lpad(trim(col("count")), 4, "0").alias("formatted_data"),
).show(2)
In the above example, 0s are added to the left side of the count so that every value is 4 characters long.
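To see what the padding rule does to individual values, here is a minimal plain-Python sketch of it. The helper name lpad_sketch is made up for illustration and is not part of PySpark. It assumes the behaviour Spark documents for lpad: values shorter than the target length are padded on the left, and values longer than the target length are shortened to it.

```python
def lpad_sketch(value: str, length: int, pad: str) -> str:
    """Plain-Python model of left padding (hypothetical helper, not a PySpark API)."""
    if length <= 0:
        return ""
    missing = length - len(value)
    if missing <= 0 or not pad:
        # Already long enough: keep only the first `length` characters
        return value[:length]
    # Repeat the pad string and cut it to exactly the number of missing characters
    return (pad * missing)[:missing] + value


print(lpad_sketch("7", 2, "0"))      # 07
print(lpad_sketch("12", 2, "0"))     # 12
print(lpad_sketch("12345", 4, "0"))  # 1234
```

The last call is the one to watch: a count with more than 4 digits loses its trailing digits, so pick a length that fits the largest value in the column.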
rpad()
In the same way, we can use rpad to add padding characters to the right side of a string or number.
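from pyspark.sql.functions import col, rpad, trim

# Right-pad the trimmed "count" value with "0" up to a length of 4 characters
df_csv.select(
    "DEST_COUNTRY_NAME",
    "count",
    rpad(trim(col("count")), 4, "0").alias("formatted_data"),
).show(2)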
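The effect of right padding on single values can be sketched in plain Python as well. As before, rpad_sketch is a hypothetical helper written for illustration, not a PySpark function, and it assumes the documented rpad behaviour of padding on the right and shortening values that are longer than the target length.

```python
def rpad_sketch(value: str, length: int, pad: str) -> str:
    """Plain-Python model of right padding (hypothetical helper, not a PySpark API)."""
    if length <= 0:
        return ""
    missing = length - len(value)
    if missing <= 0 or not pad:
        # Already long enough: keep only the first `length` characters
        return value[:length]
    # Append the repeated pad string, cut to exactly the number of missing characters
    return value + (pad * missing)[:missing]


print(rpad_sketch("5", 4, "0"))                     # 5000
print(repr(rpad_sketch("US", 5, " ")))              # 'US   '
print(repr(rpad_sketch("United States", 5, " ")))   # 'Unite'
```

Note that padding a number on the right with 0 changes how it reads (5 becomes 5000), so a space is often the safer pad character with rpad, for example when building fixed-width text columns.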
I hope you found this useful. If you have any questions do let me know. See you later.