Padding Data in Spark Dataframe

Updated On September 13, 2020 | By Mahesh Mogal

You may have a use case where the values in a column, whether strings or numbers, need to have the same length. We can use the "lpad" and "rpad" functions to format strings and numbers accordingly.

For example, you might need numbers to have the same number of digits: a month should have 2 digits, with a leading 0 added when the month has only one digit.

lpad()

The lpad function adds padding to the left side of a string or number. This is useful in the example mentioned above, where we would like to add a 0 to the left of the month if it has only one digit.

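from pyspark.sql.functions import col, lpad, rpad, trim

# df_csv is an existing DataFrame with DEST_COUNTRY_NAME and count columns.
# Left-pad the trimmed "count" value with zeros to a width of 4 characters.
df_csv.select(
    "DEST_COUNTRY_NAME",
    "count",
    lpad(trim(col("count")), 4, "0").alias("formatted_data"),
).show(2)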

In the above example, we added 0s to the left of each number so that every value is 4 characters long. The result is a string column.
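
To make the padding rule concrete without a running Spark session, here is a minimal plain-Python sketch of how lpad is documented to behave. The helper name `lpad_model` is hypothetical and is not part of PySpark; it only models the rule that short values are padded on the left, the pad string is repeated as needed, and values longer than the target length are shortened to that length.

```python
def lpad_model(value: str, length: int, pad: str) -> str:
    """Hypothetical plain-Python model of lpad's documented behaviour."""
    if length <= 0:
        return ""
    if len(value) >= length or not pad:
        # Values that are already long enough are cut to `length` characters.
        return value[:length]
    missing = length - len(value)
    # Repeat the pad string, then cut it to exactly the missing width.
    padding = (pad * missing)[:missing]
    return padding + value


months = ["1", "7", "12"]
padded_months = [lpad_model(m, 2, "0") for m in months]
print(padded_months)                 # ['01', '07', '12']
print(lpad_model("15", 4, "0"))      # 0015
print(lpad_model("15000", 4, "0"))   # 1500
```

The last line is the case to watch for: a count with more than 4 digits does not stay intact, it is cut down to its first 4 characters. Choose a length that covers the largest value in the column.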

rpad()

In the same way, we can use rpad to add padding characters to the right side of a string or number.

# Right-pad the trimmed "count" value with zeros to a width of 4 characters.
df_csv.select(
    "DEST_COUNTRY_NAME",
    "count",
    rpad(trim(col("count")), 4, "0").alias("formatted_data"),
).show(2)
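
As a rough illustration of the same rule on the right side, here is a plain-Python sketch. The helper `rpad_model` is a hypothetical name used only for this example, not a PySpark function, and the sample values are made up for illustration.

```python
def rpad_model(value: str, length: int, pad: str) -> str:
    """Hypothetical plain-Python model of rpad's documented behaviour."""
    if length <= 0:
        return ""
    if len(value) >= length or not pad:
        # Values that are already long enough are cut to `length` characters.
        return value[:length]
    missing = length - len(value)
    # Append the repeated pad string, cut to exactly the missing width.
    return value + (pad * missing)[:missing]


codes = ["US", "IND", "CANADA"]
padded_codes = [rpad_model(c, 4, "_") for c in codes]
print(padded_codes)                # ['US__', 'IND_', 'CANA']
print(rpad_model("15", 4, "0"))    # 1500
```

Right-padding a number with "0" changes how it reads ("15" becomes "1500"), so rpad is usually a better fit for fixed-width text fields than for numeric values.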

I hope you found this useful. If you have any questions, do let me know. See you later.


