String Functions in Spark

Updated On March 20, 2021 | By Mahesh Mogal

While processing data, working with strings is one of the most common tasks. That is why Spark provides multiple functions that make string data easy to process. In this blog, we will list some of the most useful string functions in Spark.

Capitalize Word

We can use the "initcap" function to capitalize the first letter of each word in a string.

Uppercase

We can use the "upper" function to convert all characters in a string to uppercase.

Lowercase

Similarly, we can use the "lower" function to convert all characters in a string to lowercase.

Trim - Removing White Spaces

We can use the trim function to remove leading and trailing white spaces from data in Spark.

There are two other functions as well: ltrim and rtrim. They remove leading and trailing white spaces respectively. If you need more detail about these functions, you can read this blog.

Padding Data in Spark

We can use the lpad and rpad functions to add padding to data in Spark. lpad pads a string on the left and rpad pads it on the right until it reaches a given length. These functions can be used to format data if needed.

For more details about padding, you can read this blog.

regexp_replace

regexp_replace is a powerful function that can be used for many purposes, from removing white spaces to replacing one string with another. For example, it can replace "United States" with "us".

translate

Though regexp_replace is powerful, it can be difficult to use in many cases. That is why Spark also provides simpler functions for common string operations.

Using the translate function, we can replace one or more characters with other characters. Each character in the matching string is replaced by the character at the same position in the replacement string.

instr

Sometimes we need to check whether a string contains another string. For this, we can use the instr function, which returns the 1-based position of the first occurrence of the substring, or 0 if it is not found.

Conclusion

We have seen multiple useful string functions in Spark. This list is by no means exhaustive. I will keep adding more functions as I encounter them. If you know of other functions for handling strings, do let me know. I hope you found this useful. See you in the next blog.

