Date & Timestamp Functions in Spark
Spark has multiple date and timestamp functions that make data processing easier. Handling date-type data can become difficult if we do not know the convenient functions available to us. Below is a list of useful functions from Spark, with examples. So let us get started.
current_date
Using this function, we can get the current date.
from pyspark.sql.functions import current_date, current_timestamp
df = spark.range(2) \
    .withColumn("current_date", current_date()) \
    .withColumn("current_timestamp", current_timestamp())
current_timestamp
Similarly, we can use current_timestamp to get the exact time along with the date. The code above already adds this column; its output looks like the below.
>>> df.show()
+---+------------+--------------------+
| id|current_date|   current_timestamp|
+---+------------+--------------------+
|  0|  2020-08-19|2020-08-19 04:03:...|
|  1|  2020-08-19|2020-08-19 04:03:...|
+---+------------+--------------------+
dayofmonth
This function returns the day of the month for a given date.
>>> from pyspark.sql.functions import dayofmonth, col
>>> df.select("current_date", \
...     dayofmonth(col("current_date")) \
...     ).show()
+------------+------------------------+
|current_date|dayofmonth(current_date)|
+------------+------------------------+
|  2020-08-19|                      19|
|  2020-08-19|                      19|
+------------+------------------------+
Day of Week
We can get the day of the week using the aptly named function "dayofweek". Note that it returns 1 for Sunday through 7 for Saturday, so 2020-08-19, a Wednesday, comes back as 4.
>>> from pyspark.sql.functions import dayofweek
>>> df.select("current_date", \
...     dayofweek(col("current_date")) \
...     ).show()
+------------+-----------------------+
|current_date|dayofweek(current_date)|
+------------+-----------------------+
|  2020-08-19|                      4|
|  2020-08-19|                      4|
+------------+-----------------------+
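If you want the weekday name rather than a number, date_format (covered later in this post) can print it directly. Below is a minimal sketch reusing the same df; the "EEEE" pattern and the alias day_name are just illustrative choices.

>>> from pyspark.sql.functions import date_format
>>> # "EEEE" prints the full day name; "E" gives the short form (e.g. Wed)
>>> df.select("current_date", \
...     date_format(col("current_date"), "EEEE").alias("day_name") \
...     ).show()
+------------+---------+
|current_date| day_name|
+------------+---------+
|  2020-08-19|Wednesday|
|  2020-08-19|Wednesday|
+------------+---------+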
Day of Year
There is also a function, dayofyear, to get the day of the year for any date. In the example below, we get the day of the year for today's date, the 19th of August.
>>> from pyspark.sql.functions import dayofyear
>>> df.select("current_date", \
...     dayofyear(col("current_date")) \
...     ).show()
+------------+-----------------------+
|current_date|dayofyear(current_date)|
+------------+-----------------------+
|  2020-08-19|                    232|
|  2020-08-19|                    232|
+------------+-----------------------+
Week of Year
We can get the week of the year from a date using the code below.
>>> from pyspark.sql.functions import weekofyear
>>> df.select("current_date", \
...     weekofyear(col("current_date")) \
...     ).show()
+------------+------------------------+
|current_date|weekofyear(current_date)|
+------------+------------------------+
|  2020-08-19|                      34|
|  2020-08-19|                      34|
+------------+------------------------+
Year, Month & Quarter from Date
We can also extract the year, month, and quarter from a date using built-in functions in Spark.
>>> from pyspark.sql.functions import year, month, quarter
>>> df.select("current_date", \
...     year(col("current_date")).alias("year"), \
...     month(col("current_date")).alias("month"), \
...     quarter(col("current_date")).alias("quarter") \
...     ).show(1)
+------------+----+-----+-------+
|current_date|year|month|quarter|
+------------+----+-----+-------+
|  2020-08-19|2020|    8|      3|
+------------+----+-----+-------+
only showing top 1 row
Last Day of Month
There is a function in Spark, "last_day", which returns the last day of the month for the date provided.
>>> from pyspark.sql.functions import last_day
>>> df.select("current_date", \
...     last_day(col("current_date")) \
...     ).show()
+------------+----------------------+
|current_date|last_day(current_date)|
+------------+----------------------+
|  2020-08-19|            2020-08-31|
|  2020-08-19|            2020-08-31|
+------------+----------------------+
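Going the other way, if you need the first day of the month, Spark's trunc function can truncate a date to the start of its month (or year). A minimal sketch, assuming the same df as above:

>>> from pyspark.sql.functions import trunc
>>> # trunc(date, "month") resets the day to 01; trunc(date, "year") also resets the month
>>> df.select("current_date", \
...     trunc(col("current_date"), "month").alias("first_day_of_month") \
...     ).show()
+------------+------------------+
|current_date|first_day_of_month|
+------------+------------------+
|  2020-08-19|        2020-08-01|
|  2020-08-19|        2020-08-01|
+------------+------------------+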
Format Date
We can use the code below to convert a string into a date with to_date, and to change the display format of a date with date_format.
>>> from pyspark.sql.functions import lit, to_date
>>> # convert a literal string to a DateType column
>>> df = df.withColumn("date_to_string", to_date(lit("2020-08-31")))
>>> df.show()
from pyspark.sql.functions import date_format
df.select("current_date", \
    date_format(col("current_date"), "dd-MM-yyyy") \
    ).show()
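to_date also accepts an explicit pattern for strings that are not in the default yyyy-MM-dd format. A minimal sketch; the input string and pattern here are just illustrative:

>>> from pyspark.sql.functions import to_date, lit
>>> # parse a dd/MM/yyyy string into a proper date column
>>> df.select(to_date(lit("31/08/2020"), "dd/MM/yyyy").alias("parsed_date")).show(1)
+-----------+
|parsed_date|
+-----------+
| 2020-08-31|
+-----------+
only showing top 1 row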
If you want to know more about formatting dates, you can read this blog.
Date Difference
We can find a date a given number of days after or before another date using date_add and date_sub, as shown below. We can also find the difference between two dates in days or months using the functions listed after the code; a short sketch follows that list.
from pyspark.sql.functions import date_add, date_sub, col
df.select("current_date", \
    date_add(col("current_date"), 7) \
        .alias("date_after_7_days"), \
    date_sub(col("current_date"), 7) \
        .alias("date_before_7_days") \
    ).show()
- add_months – adds a given number of months to a date
- datediff – difference between two dates in days
- months_between – difference between two dates in months
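As a quick illustration of these three functions, here is a minimal sketch; the target date 2020-12-25 and the column aliases are just example values:

>>> from pyspark.sql.functions import add_months, datediff, months_between, lit
>>> df.select("current_date", \
...     add_months(col("current_date"), 2).alias("plus_2_months"), \
...     datediff(lit("2020-12-25"), col("current_date")).alias("days_to_xmas"), \
...     months_between(lit("2020-12-25"), col("current_date")).alias("months_to_xmas") \
...     ).show(1)
+------------+-------------+------------+--------------+
|current_date|plus_2_months|days_to_xmas|months_to_xmas|
+------------+-------------+------------+--------------+
|  2020-08-19|   2020-10-19|         128|    4.19354839|
+------------+-------------+------------+--------------+
only showing top 1 row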
You can get more details about these functions in this blog.
Current Timestamp to Unix Timestamp and Vice Versa
Below is an example that shows how to convert the current timestamp to a Unix timestamp.
from pyspark.sql.functions import unix_timestamp
df.select("current_timestamp", \
    unix_timestamp(col("current_timestamp")) \
    ).show()
+--------------------+------------------------------------------------------+
|   current_timestamp|unix_timestamp(current_timestamp, yyyy-MM-dd HH:mm:ss)|
+--------------------+------------------------------------------------------+
|2020-08-19 05:06:...|                                            1597827997|
|2020-08-19 05:06:...|                                            1597827997|
+--------------------+------------------------------------------------------+
And we can get a timestamp back from a Unix timestamp using the function from_unixtime.
from pyspark.sql.functions import from_unixtime
df_unix = df.withColumn("unix_timestamp", unix_timestamp(col("current_timestamp")))
df_unix.select("unix_timestamp", \
    from_unixtime(col("unix_timestamp")) \
    ).show()
If you want to know more about working with timestamps in Spark, please check out this blog.
Conclusion
We have listed many date and timestamp functions from Spark. I hope you will find this useful. If you know more useful functions, please let me know and I will add them to this list. See you soon.