Alter Table Partitions in Hive
We have created partitioned tables, inserted data into them. Now, what if we want to drop some partition or add a new partition to the table? Can we have one partition at different locations? Let us try to answer these questions in this blog post.
Adding Partition To Table
We can run below query to add partition to table.
1 2 3 4 |
ALTER TABLE order_partition_extrenal ADD PARTITION(year='2014', month='02') LOCATION '/apps/hive/warehouse/maheshmogal.db/order_partition/year=2014/month=02'; |
Hive is metastore for tables. Here we are adding new information about partition to table metadata. We are telling hive this partition for this table is has its data at this location.
Does this mean we can have our partitions at diffrent locations?
Of course we can. Not just in different locations but also in different file systems.
Consider use case, you have a huge amount of data but you do not use old data that frequently (something like log data). In that case, you can set up a job that will move old data to S3 ( It is Amazons cheap store service. You can learn more about it here). And then point those old partitions to S3 location. So your latest data will be in HDFS and old partitions in S3 and you can query that hive table seamlessly.
1 2 |
ALTER TABLE log_messages ADD PARTITION (year = 2019, month = 12) LOCATION 's3n://bucket_name/logs/2019/12'; |
Updating & Renaming Partitions in Hive Tables
With Alter table command, we can also update partition table location.
1 2 |
ALTER TABLE log_messages PARTITION (year = 2019, month = 12) SET LOCATION '/maheshmogal.db/order_new/year=2019/month=12'; |
We can also rename existing partitions using below query.
1 2 |
ALTER TABLE order_partition_extrenal PARTITION (year=2013,month=7) RENAME TO PARTITION(year=2019,month=07); |
Dropping Partition from Hive Tables
We can also drop partition from hive tables.
1 2 |
ALTER TABLE order_partition_extrenal DROP PARTITION (year=2019, month=7); |
This will delete the partition from the table. But what about data when you have an external hive table? Hive doe not drop that data. It just removes these details from table metadata. If you also want to drop data along with partition fro external tables then you have to do it manually.
1 |
hadoop fs -rmr /maheshmogal.db/order_new/year=2019/month=7 |
This was a short article, but quite useful. I hope you will find it useful. Partitioning is one of the important topics in the Hive. Partitioning is also one of the core strategies to improve query performance in a hive. In the last few articles, we have covered most of the details of Partitioning in Hive. Next, we will start learning about bucketing an equally important aspect in Hive with its unique features and use cases. Hope to see you there.