Inserting Data In Hive Partitioned tables

Updated On August 11, 2020 | By Mahesh Mogal

We have created partitioned tables in Hive. We have also learned about static vs dynamic partitions in the last two blogs. One this missing is how do we put data to these partitioned tables. Let us learn how we can insert data in partitioned tables in different ways.

Inserting Data In Partitioned Table

We have seen bits and pieces of information about how to insert data in the Partitioned table. Say we have a file in our local system and we want to put that in the employee table. Then we can run the following query.

LOAD DATA LOCAL INPATH 'hr.txt'
INTO TABLE employee_dept
PARTITION (dept_name='HR');

Here we are inserting data to HR partition. We need to make sure that this data belongs to HR partition only.

Instead of a file, we may have an un-partitioned employee table. And we want to improve performance so we decide to put this data in a partitioned table. We can do this with a multi-table insert statement like below. This will take data from the base table in insert into partitions.

FROM staged_employees se
INSERT OVERWRITE TABLE employees
 PARTITION (country = 'US', state = 'OR')
 SELECT * WHERE se.cnty = 'US' AND se.st = 'OR'
INSERT OVERWRITE TABLE employees
 PARTITION (country = 'US', state = 'CA')
 SELECT * WHERE se.cnty = 'US' AND se.st = 'CA'
INSERT OVERWRITE TABLE employees
 PARTITION (country = 'US', state = 'IL')
 SELECT * WHERE se.cnty = 'US' AND se.st = 'IL';

Inserting Data In Dynamic Partitions

This all good. But we are using static partitioning here. If we have 100's of partitions then it is not optimal way to write 100 clauses in query. We can use dynamic partitioning for this. Before Using dynamic partitioning we need to tell hive that we want to use dynamic partitioning.

There are two properties to let hive know that we want o use dynamic partitioning. First, we need to enable dynamic partitions. This enables partitioning in strict mode. That means we need to at least have one static partition in the table before creating other partitions dynamically. If we want to avoid creating static partitions altogether, then we can set this mode to non-strict. (use below hive properties to do so).

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Once we set these properties we can run below query to insert data dynamically to each partition.

insert overwrite table order_partition partition (year,month)
 select order_id, order_date, order_status, substr(order_date,1,4) ye,
 substr(order_date,5,2) mon from orders;

This will insert data to year and month partitions for the order table. Hive takes partition values from the last two columns "ye" and "mon". I have given different names than partitioned column names to emphasize that there is no column name relationship between data nad partitioned columns. Hive always takes last column/s as partitioned column information. That is why when inserting data in the partitioned table, we have to make sure partitioned columns are last in our select query.

When inserting data to partitioned table using select query, we need to make sure that partitioned columns are at last of select query. Hive will pick those values as partitioned columns directly

Hive Facts

Mixing Static and Dynamic Partitions in Insert Queries

While inserting data in partitioned tables, we can mix static and dynamic partition in one single query. Let us take a look at query below.

 INSERT OVERWRITE TABLE order_partition 
partition (year='2019',month)
 SELECT order_id, order_date, order_status, 
substr(order_date,1,4) ye, substr(order_date,5,2) mon 
FROM orders
WHERE substr(order_date,1,4) ="2019";

In this query, we have set year value to 2019 only. This will create all month partitions under 2019. We also need to make sure we select data for the 2019 year only. This is why we are using where clause. This is how we can combine using static and dynamic partitions in a single insert query.

Conclusion

We have learned different ways to insert data in dynamic partitioned tables. In static partitioning, we have to give partitioned values. With dynamic partitioning, hive picks partition values directly from the query. We can also mix static and dynamic partition while inserting data into the table. I hope you found this article helpful. See you in the next one.

inserting data to hive partition
.

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.

Stay Updated with Latest Blogs

Get latest blogs delivered to your mail directly.

Recent Posts

Select Expr in Spark Dataframe

In this blog, we will learn how to use select and expr in the Spark data frame. We will learn multiple use cases along with selectExpr.

Read More
Add, Rename, Drop Columns in Spark Dataframe

We will go through common column operations like add, rename, list, select, and dropping a column from spark dataframe.

Read More
MSCK Repair - Fixing Partitions in Hive Table

We will learn how to add multiple partitions to hive table using msck repair table command in hive.

msck repair hive
Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram
Share via
Copy link
Powered by Social Snap