Date & Timestamp Functions in Spark

Updated On September 13, 2020 | By Mahesh Mogal

Spark has multiple date and timestamp functions to make our data processing easier. Handling date-type data can become difficult if we do not know the right functions to use. Below is a list of useful date and timestamp functions in Spark, with examples. So let us get started.

current_date

Using this function, we can get the current date.

from pyspark.sql.functions import current_date,current_timestamp

df = spark.range(2) \
    .withColumn("current_date", current_date()) \
    .withColumn("current_timestamp", current_timestamp())

current_timestamp

Similarly, we can use current_timestamp to get the current date and time. You can check the code above for it. We can see its output below.

>>> df.show()
+---+------------+--------------------+
| id|current_date|   current_timestamp|
+---+------------+--------------------+
|  0|  2020-08-19|2020-08-19 04:03:...|
|  1|  2020-08-19|2020-08-19 04:03:...|
+---+------------+--------------------+

dayofmonth

This function returns the day of the month for a given date.

>>> from pyspark.sql.functions import dayofmonth,col
>>> df.select("current_date", \
...     dayofmonth(col("current_date")) \
...     ).show()
+------------+------------------------+
|current_date|dayofmonth(current_date)|
+------------+------------------------+
|  2020-08-19|                      19|
|  2020-08-19|                      19|
+------------+------------------------+

Day of Week

We can get the day of the week using the aptly named function "dayofweek". Note that Spark numbers days from Sunday = 1 through Saturday = 7, which is why the Wednesday below shows up as 4.

>>> from pyspark.sql.functions import dayofweek
>>> df.select("current_date", \
...     dayofweek(col("current_date")) \
...     ).show()
+------------+-----------------------+
|current_date|dayofweek(current_date)|
+------------+-----------------------+
|  2020-08-19|                      4|
|  2020-08-19|                      4|
+------------+-----------------------+
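Spark's numbering convention here is easy to mix up with Python's, where isoweekday() runs Monday = 1 through Sunday = 7. A quick plain-Python sketch of the mapping (not Spark code) makes the difference explicit:

```python
from datetime import date

def spark_dayofweek(d):
    # Spark's dayofweek numbers days Sunday=1 .. Saturday=7,
    # while Python's isoweekday() uses Monday=1 .. Sunday=7.
    return d.isoweekday() % 7 + 1

print(spark_dayofweek(date(2020, 8, 19)))  # Wednesday -> 4
print(spark_dayofweek(date(2020, 8, 16)))  # Sunday -> 1
```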

Day of Year

There is also a function to get the day of the year for any date. In the example below, we are getting the day of the year for today's date, the 19th of August.

>>> from pyspark.sql.functions import dayofyear
>>> df.select("current_date", \
...     dayofyear(col("current_date")) \
...     ).show()
+------------+-----------------------+
|current_date|dayofyear(current_date)|
+------------+-----------------------+
|  2020-08-19|                    232|
|  2020-08-19|                    232|
+------------+-----------------------+
[Image: Day of Week and Year Example]
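For intuition, the day-of-year value can be sanity-checked with Python's standard library (plain Python, not Spark):

```python
from datetime import date

# 2020 is a leap year, so 19 Aug is day 31+29+31+30+31+30+31+19 = 232
print(date(2020, 8, 19).timetuple().tm_yday)  # 232
```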

Week of Year

We can get the week of the year from a date using the code below.

>>> from pyspark.sql.functions import weekofyear
>>> df.select("current_date", \
...     weekofyear(col("current_date")) \
...     ).show()
+------------+------------------------+
|current_date|weekofyear(current_date)|
+------------+------------------------+
|  2020-08-19|                      34|
|  2020-08-19|                      34|
+------------+------------------------+
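Note that weekofyear follows ISO 8601 week numbering. Python's standard library exposes the same numbering through isocalendar(), which is handy for sanity-checking results (plain Python, not Spark):

```python
from datetime import date

# ISO week number for 19 Aug 2020, matching Spark's weekofyear output above
print(date(2020, 8, 19).isocalendar()[1])  # 34
```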

Year, Month & Quarter from Date

We can also extract the year, month, and quarter from a date using built-in functions in Spark.

>>> from pyspark.sql.functions import year, month, quarter
>>> df.select("current_date", \
...     year(col("current_date")).alias("year"), \
...     month(col("current_date")).alias("month"), \
...     quarter(col("current_date")).alias("quarter"), \
...     ).show(1)
+------------+----+-----+-------+
|current_date|year|month|quarter|
+------------+----+-----+-------+
|  2020-08-19|2020|    8|      3|
+------------+----+-----+-------+
only showing top 1 row
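The quarter is derived directly from the month. A plain-Python sketch of the arithmetic (not Spark code):

```python
from datetime import date

d = date(2020, 8, 19)
# months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on
quarter = (d.month - 1) // 3 + 1
print(d.year, d.month, quarter)  # 2020 8 3
```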

Last Day of Month

There is a function in Spark, "last_day", which returns the last day of the month for the date provided.

>>> from pyspark.sql.functions import last_day
>>> df.select("current_date", \
...     last_day(col("current_date")) \
...     ).show()
+------------+----------------------+
|current_date|last_day(current_date)|
+------------+----------------------+
|  2020-08-19|            2020-08-31|
|  2020-08-19|            2020-08-31|
+------------+----------------------+
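The same idea can be sketched in plain Python with the standard calendar module (not Spark code); note how a leap-year February is handled correctly:

```python
import calendar
from datetime import date

def last_day(d):
    # last day of the month, mirroring the behaviour of Spark's last_day
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

print(last_day(date(2020, 8, 19)))  # 2020-08-31
print(last_day(date(2020, 2, 1)))   # 2020-02-29 (leap year)
```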

Format Date

We can use the code below to convert a string into a date, and to change the format of a date.

>>> from pyspark.sql.functions import lit, to_date
>>> df = df.withColumn("date_to_string", to_date(lit("2020-08-31")))
>>> df.show()
>>> from pyspark.sql.functions import date_format
>>> df.select("current_date", \
...     date_format(col("current_date"), "dd-MM-yyyy") \
...     ).show()
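Keep in mind that Spark uses JVM-style date patterns such as "dd-MM-yyyy", not Python's strftime codes. For comparison, the equivalent formatting in plain Python (not Spark code) would be:

```python
from datetime import date

d = date(2020, 8, 19)
# Spark's "dd-MM-yyyy" pattern corresponds to "%d-%m-%Y" in strftime
print(d.strftime("%d-%m-%Y"))  # 19-08-2020
```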

If you want to know more about formatting dates, you can read this blog.

Date Difference

We can find a date "x days" after or before a given date using the functions below. We can also find the difference between two dates in days or months.

from pyspark.sql.functions import date_add, date_sub,col

df.select("current_date", \
    date_add(col("current_date"), 7) \
        .alias("date_after_7_days"), \
    date_sub(col("current_date"), 7) \
        .alias("date_before_7_days") \
    ).show()
  • add_months - add a number of months to a date
  • datediff - difference between two dates in days
  • months_between - difference between two dates in months
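To make the semantics of the first two concrete, here is a plain-Python sketch (not Spark code); in particular, add_months clamps the day to the length of the target month, just as Spark does:

```python
import calendar
from datetime import date

def add_months(d, n):
    # shift the month, clamping the day to the target month's length
    # (e.g. Jan 31 + 1 month -> Feb 29 in a leap year)
    m = d.month - 1 + n
    year, month = d.year + m // 12, m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def datediff(end, start):
    # whole days between two dates, like Spark's datediff(end, start)
    return (end - start).days

print(add_months(date(2020, 8, 19), 1))                # 2020-09-19
print(datediff(date(2020, 8, 31), date(2020, 8, 19)))  # 12
```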

You can get more details about these functions in this blog.

Current Timestamp to Unix timestamp and Vice Versa

Below is an example that shows how to convert the current timestamp to a Unix timestamp.

from pyspark.sql.functions import unix_timestamp
df.select("current_timestamp", \
    unix_timestamp(col("current_timestamp")) \
    ).show()
+--------------------+------------------------------------------------------+
|   current_timestamp|unix_timestamp(current_timestamp, yyyy-MM-dd HH:mm:ss)|
+--------------------+------------------------------------------------------+
|2020-08-19 05:06:...|                                            1597827997|
|2020-08-19 05:06:...|                                            1597827997|
+--------------------+------------------------------------------------------+

And we can convert a Unix timestamp back to a timestamp using the function from_unixtime.

df_unix = df.withColumn("unix_timestamp", unix_timestamp(col("current_timestamp")))

from pyspark.sql.functions import from_unixtime
df_unix.select("unix_timestamp", \
    from_unixtime(col("unix_timestamp")) \
    ).show()
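The round trip between a timestamp and its Unix epoch seconds can also be illustrated with Python's standard library (plain Python, not Spark code):

```python
from datetime import datetime, timezone

# a fixed UTC timestamp, converted to Unix epoch seconds and back
ts = datetime(2020, 8, 19, 5, 6, 37, tzinfo=timezone.utc)
epoch = int(ts.timestamp())
print(epoch)
print(datetime.fromtimestamp(epoch, tz=timezone.utc))
```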

If you want to know more about working with timestamps in Spark, please check out this blog.

Conclusion

We have covered many date and timestamp functions in Spark. I hope you find this list useful. If you know more useful functions, please let me know and I will add them to this list. See you soon.


Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.
