Date Difference functions in Spark

Updated On September 13, 2020 | By Mahesh Mogal

We may have a use case where we need to find a difference between dates or find a date after or before "x" days from a given date. For such use cases, the spark has provided a rich list of functions to easily solve these problems. we are going over such functions in this blog.

date_add - Date After x Days

Say we ant to get date after 7 days from today, we can get using spark function date_add

from pyspark.sql.functions import date_add, date_sub,col
df.select("current_date", \
...     date_add(col("current_date"), 7) \
...         .alias("date_after_7_days") \
... ).show()
+------------+-----------------+
|current_date|date_after_7_days|
+------------+-----------------+
|  2020-08-19|       2020-08-26|
|  2020-08-19|       2020-08-26|

date_sub - Date Before x Days

similarly, we can use date_sub function to find a date before x days from the given date.

df.select("current_date", \
... date_sub(col("current_date"), 7) \
...         .alias("date_before_7_days") \
...     ).show()
+------------+------------------+
|current_date|date_before_7_days|
+------------+------------------+
|  2020-08-19|        2020-08-12|
|  2020-08-19|        2020-08-12|
+------------+------------------+

add_months - Adding Months to Date

We can add months to date using add_months function.

from pyspark.sql.functions import add_months
>>> df.select("current_date", \
...     add_months(col("current_date"), 3) \
...     ).show()
+------------+---------------------------+
|current_date|add_months(current_date, 3)|
+------------+---------------------------+
|  2020-08-19|                 2020-11-19|
|  2020-08-19|                 2020-11-19|
+------------+---------------------------+

We can see that, this function has added 3 months to our date and showing us final result.

date_diff - Finding Difference Between Dates in Days

We may need to find a difference between two days. For such a use case, we can use date_diff function, which accepts 2 arguments and return as difference between first date and second date.

df_new = df.select("current_date", \
    date_add(col("current_date"), 7) \
        .alias("date_after_7_days"), \
    date_sub(col("current_date"), 7) \
        .alias("date_before_7_days") \
    )

from pyspark.sql.functions import add_months
df.select("current_date", \
    add_months(col("current_date"), 3) \
    ).show()
date_diff_function
Difference between two dates using date_diff

months_between - Months between Dates

Though in most cases we need to find the difference between dates in days, Spark has also provided us function to find the difference of months in two dates.

df_new.select( \
...     months_between(col("date_after_7_days"), to_date(lit("2020-09-30"))) \
...     ).show()
+--------------------------------------------------------+
|months_between(date_after_7_days, to_date('2020-09-30'))|
+--------------------------------------------------------+
|                                             -1.12903226|
|                                             -1.12903226|
+--------------------------------------------------------+

Conclusion

We have gone through many useful functions to find the difference between dates and finding date after certain days. I hope you found this useful.

.

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.

Stay Updated with Latest Blogs

Get latest blogs delivered to your mail directly.

Recent Posts

Sorting in Spark Dataframe

In this blog, we will learn how to sort rows in spark dataframe based on some column values.

Read More
Removing White Spaces From Data in Spark

White spaces can be a headache if not removed before processing data. We will learn how to remove spaces from data in spark using inbuilt functions.

Read More
Padding Data in Spark Dataframe

In this blog, we will learn how to use rpad and lpad functions to add padding to data in spark dataframe.

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram
Share via
Copy link
Powered by Social Snap