Add, Rename, Drop Columns in Spark Dataframe

In this blog, we will go through some of the most commonly used column operations on a Spark dataframe. We will start with how to select columns from a dataframe. After that, we will go through how to add, rename, and drop columns. Let us get started.

Selecting Columns from Spark Dataframe

There are multiple ways to select columns from a dataframe. One of the easiest is to pass the column names as strings to the select function of the dataframe.

Spark also provides a few built-in functions for working with columns. Before we can use them, we need to import them.

Another popular function is “expr”, which can be used both to select columns and to perform operations on them.

Listing Columns

Spark provides a simple way to list all the columns of a dataframe.

Adding Columns to Dataframe

Spark dataframes are immutable. That means you cannot change them once they are created. If you want to change a dataframe in any way, you need to create a new one.

In the operations that follow (adding, renaming, and dropping columns), I have not assigned the result to a new dataframe; I have only printed it. If you want to keep a change, assign the result to a new dataframe.

We can easily add a column using the withColumn function. We can use the “expr” function to compute the value of the new column.

Figure: Adding a new column to a Spark dataframe

Renaming Columns

We can use withColumn to rename a column of a dataframe.

Note that this actually adds a new column with the new name to the dataframe; the old column is still present. We can use select to remove the old column, but that is one extra step. Spark has another function, withColumnRenamed, which renames an existing column directly.

Figure: Renaming a column in a Spark dataframe

Dropping Column

Spark provides a simple function, drop, to remove columns from a dataframe.

Figure: Dropping a column from a Spark dataframe

Conclusion

We have gone through some basic operations for handling columns in a Spark dataframe: selecting, listing, adding, renaming, and dropping. These come up constantly when analyzing data. Hope this helps. See you in the next blog.
