Where and Filter in Spark Dataframes

Updated On February 11, 2021 | By Mahesh Mogal

Filtering rows from a DataFrame is one of the basic tasks performed when analyzing data with Spark. Spark provides two functions for this: where and filter. Both work in exactly the same way (where is simply an alias for filter), but most of the time we use "where" because it is familiar from SQL.

Using Where / Filter in Spark Dataframe

We can easily filter rows with conditions, just as we do in SQL, using the where function. Say we need to find all rows where the number of flights between two countries is more than 50, as shown in the sketch below.
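A minimal sketch of this, assuming a hypothetical flight-summary dataset with ORIGIN_COUNTRY_NAME, DEST_COUNTRY_NAME, and count columns (the file path and column names are placeholders, not from the original post):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("where-filter-demo").getOrCreate()

# Hypothetical flight summary data with columns:
# ORIGIN_COUNTRY_NAME, DEST_COUNTRY_NAME, count
flight_df = spark.read.csv("/data/flight_summary.csv", header=True, inferSchema=True)

# SQL-style string condition: keep rows with more than 50 flights
flight_df.where("count > 50").show(5)
```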

We can also use column expressions instead of SQL-style strings. This time we will use the filter function to get the desired rows from the DataFrame.
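A sketch using the same hypothetical flight_df, this time passing a column expression to filter:

```python
from pyspark.sql.functions import col

# Same condition expressed as a column expression, using filter
flight_df.filter(col("count") > 50).show(5)
```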

Chaining Multiple Conditions

Though it is possible to write multiple where conditions in one statement, it is not necessary. Even when we chain multiple conditions one after another, Spark optimizes them into a single step when it builds the physical plan for execution.

That is why it is usually better to write multiple where conditions separately; the code is easier to understand when you read it later.
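A sketch of chained where clauses on the same hypothetical flight_df; calling explain() lets you check that both conditions end up in a single Filter step in the physical plan:

```python
from pyspark.sql.functions import col

# Two separate where calls; easier to read than one long combined condition
chained_df = (
    flight_df
    .where(col("count") > 50)
    .where(col("ORIGIN_COUNTRY_NAME") != col("DEST_COUNTRY_NAME"))
)

chained_df.show(5)

# The physical plan shows both conditions collapsed into a single Filter
chained_df.explain()
```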


I hope you found this useful :). See you in the next blog.


Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.


