Formatting Dates in Spark

Updated On September 13, 2020 | By Mahesh Mogal

In this blog, we will go over the Spark functions used to format dates, change their format, and convert strings to dates. So let us get started.

to_date - Convert String to Date

First, we will see how we can convert a string to a date. We can use the to_date function for this.

from pyspark.sql.functions import lit, to_date
df = df.withColumn("date_to_string", to_date(lit("2020-08-31")))
df.show()
+---+------------+--------------------+--------------+
| id|current_date|   current_timestamp|date_to_string|
+---+------------+--------------------+--------------+
|  0|  2020-08-19|2020-08-19 10:07:...|    2020-08-31|
|  1|  2020-08-19|2020-08-19 10:07:...|    2020-08-31|
+---+------------+--------------------+--------------+

Simple, right? But what will happen if our date is not in the same format as "yyyy-MM-dd"? Let us see the output of such code.

df.withColumn("date_to_string", to_date(lit("2020-31-08"))).show()
+---+------------+--------------------+--------------+
| id|current_date|   current_timestamp|date_to_string|
+---+------------+--------------------+--------------+
|  0|  2020-08-19|2020-08-19 10:08:...|          null|
|  1|  2020-08-19|2020-08-19 10:08:...|          null|
+---+------------+--------------------+--------------+

Oh, we are getting null. This is because Spark is not able to understand our date string. To overcome this, we can specify the format of our date: to_date accepts an optional second parameter with the format of the date string.

format="yyyy-dd-MM"
df.withColumn("date_to_string", to_date(lit("2020-31-08"), format)).show()
[Image: Format with the to_date function]

Spark Facts: Spark supports the simple date format patterns used in the Java language.

So we were able to let Spark know the format of our date, and Spark picked up our date correctly this time. Hurray!

Changing Format of Date in Spark

Now our date is correct, but we do not want this "yyyy-MM-dd" format. Suppose we want it in "dd/MM/yyyy" format for some reason. We can do that as well; we can convert our date format easily.

from pyspark.sql.functions import date_format, col
df.select("current_date",
          date_format(col("current_date"), "dd/MM/yyyy")
          ).show()
+------------+-------------------------------------+
|current_date|date_format(current_date, dd/MM/yyyy)|
+------------+-------------------------------------+
|  2020-08-19|                           19/08/2020|
|  2020-08-19|                           19/08/2020|
+------------+-------------------------------------+

As we can see, using the date_format function we can change the format of a date as per our requirement.

I hope you found this useful. See you in the next blog.


Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.
