Inserting Data In Hive Partitioned tables

Updated On February 12, 2021 | By Mahesh Mogal

We have created partitioned tables in Hive. We have also learned about static vs dynamic partitions in the last two blogs. One this missing is how do we put data to these partitioned tables. Let us learn how we can insert data in partitioned tables in different ways.

Inserting Data In Partitioned Table

We have seen bits and pieces of information about how to insert data in the Partitioned table. Say we have a file in our local system and we want to put that in the employee table. Then we can run the following query.

Here we are inserting data to HR partition. We need to make sure that this data belongs to HR partition only.

Instead of a file, we may have an un-partitioned employee table. And we want to improve performance so we decide to put this data in a partitioned table. We can do this with a multi-table insert statement like below. This will take data from the base table in insert into partitions.

Inserting Data In Dynamic Partitions

This all good. But we are using static partitioning here. If we have 100's of partitions then it is not optimal way to write 100 clauses in query. We can use dynamic partitioning for this. Before Using dynamic partitioning we need to tell hive that we want to use dynamic partitioning.

There are two properties to let hive know that we want o use dynamic partitioning. First, we need to enable dynamic partitions. This enables partitioning in strict mode. That means we need to at least have one static partition in the table before creating other partitions dynamically. If we want to avoid creating static partitions altogether, then we can set this mode to non-strict. (use below hive properties to do so).

Once we set these properties we can run below query to insert data dynamically to each partition.

This will insert data to year and month partitions for the order table. Hive takes partition values from the last two columns "ye" and "mon". I have given different names than partitioned column names to emphasize that there is no column name relationship between data nad partitioned columns. Hive always takes last column/s as partitioned column information. That is why when inserting data in the partitioned table, we have to make sure partitioned columns are last in our select query.

When inserting data to partitioned table using select query, we need to make sure that partitioned columns are at last of select query. Hive will pick those values as partitioned columns directly

Hive Facts

Mixing Static and Dynamic Partitions in Insert Queries

While inserting data in partitioned tables, we can mix static and dynamic partition in one single query. Let us take a look at query below.

In this query, we have set year value to 2019 only. This will create all month partitions under 2019. We also need to make sure we select data for the 2019 year only. This is why we are using where clause. This is how we can combine using static and dynamic partitions in a single insert query.

Conclusion

We have learned different ways to insert data in dynamic partitioned tables. In static partitioning, we have to give partitioned values. With dynamic partitioning, hive picks partition values directly from the query. We can also mix static and dynamic partition while inserting data into the table. I hope you found this article helpful. See you in the next one.

inserting data to hive partition

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.

Stay Updated with Latest Blogs

Get latest blogs delivered to your mail directly.

Recent Posts

Where and Filter in Spark Dataframes

In this blog, we will learn how to filter rows from spark dataframe using Where and Filter functions.

Where and Filter in Spark Dataframes
Read More
Distinct Rows and Distinct Count from Spark Dataframe

Getting distinct values from columns or rows is one of most used operations. We will learn how to get distinct values as well as count of distinct values.

Distinct Rows and Distinct Count from Spark Dataframe
Read More
Sorting in Spark Dataframe

In this blog, we will learn how to sort rows in spark dataframe based on some column values.

Sorting in Spark Dataframe
Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

One comment on “Inserting Data In Hive Partitioned tables”

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram
Share via
Copy link
Powered by Social Snap