Create, Alter, Delete Tables in Hive
Creating a Hive table is similar like creating a table in SQL like databases. However, the Hive offers a lot of flexibility while creating tables from where to store data to which format to use store data. In this blog, we will discuss many of these options and different operations that we can perform on Hive tables.
Creating Table
Like SQL conventions, we can create a Hive table in the following way. We can use the database name prefixed with a table in create a table in that database.
1 2 3 4 5 |
CREATE TABLE hive_test (id INT, name STRING); OK Time taken: 0.632 seconds hive (maheshmogal)> CREATE TABLE my_db.hive_test (id INT, name STRING); -- creates this table in database named my_db hive (maheshmogal)> |
Using COMMENT With Tables
Like Databases, we can also use comments to give meaningful information about table columns as well as tables while creating it. Check it below query example.
1 2 3 4 5 6 7 8 |
hive (maheshmogal)> CREATE TABLE employee ( > id INT COMMENT 'employee ID', > name STRING COMMENT 'employee name' > ) > COMMENT 'Employee table for some department'; OK Time taken: 0.341 seconds hive (maheshmogal)> |
Using Table properties in Create Statement
We can also set up hive table properties as we did for databases. However, Table Properties are far more powerful. Table properties can be used to tell hive details about underlying data and can also be used to integrate hive with other databases like HBase or DyanmoDB. In the next chapters, we will learn more about table properties. For now you can use the below query to attach simple properties to the table.
1 2 3 4 5 6 |
CREATE TABLE employee ( id INT COMMENT 'employee ID', name STRING COMMENT 'employee name' ) COMMENT 'Employee table for some department' TBLPROPERTIES ('creator'='Mahesh Mogal', 'created_at'='2020-05-31') |
Issue With IF NOT EXISTS in Create table
If we run create table command and that table is already present then Hive will throw an error. To handle this gracefully we can add IF NOT EXISTS clause as we did in the creation of databases.
1 2 3 4 5 |
CREATE TABLE IF NOT EXISTS employee ( id INT COMMENT 'employee ID', name STRING COMMENT 'employee name' ) ; |
But there is a catch when using IF NOT EXISTS with a hive table. If the new table has a different schema from the existing table but the same name as the existing table, Hive will not warn you. If you want to give a new schema to this table then you will manually have to delete the old table.
1 2 3 4 5 6 |
hive (maheshmogal)> CREATE TABLE IF NOT EXISTS employee ( > id INT COMMENT 'employee ID', > name STRING COMMENT 'employee name', > address STRING COMMENT 'employee name' -- different schema from existing table > ) > ; |
Listing Tables
We can use the below queries to list tables. We can specify the database name in a query to list all tables from that database (query 2). We can also use like/rlike with a regular expression to list a subset of tables.
1 2 3 |
hive> SHOW TABLES; hive> SHOW TABLES IN my_db; hive> SHOW TABLES like 'pord*'; |
Note that using the database name and like/rlike statement (query 2 and query3) do not work together. You can use either of one in a single query.
Describing Table
In the hive, we can use describe command to see table structure, its location as well as its table properties. We can also use DESCRIBE TABLE_NAME, DESCRIBE EXTENDED TABLE_NAME, SHOW CREATE TABLE TABLE_NAME along with DESCRIBE_FORMATTED TABLE_NAME which gives table information in the well-formatted structure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
hive (maheshmogal)> describe formatted employee; OK # col_name data_type comment id int employee ID name string employee name # Detailed Table Information Database: maheshmogal Owner: maheshmogal CreateTime: Mon May 25 07:57:03 EDT 2020 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://localhost:8020/apps/hive/warehouse/maheshmogal.db/employee Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} comment Employee table for some department numFiles 0 numRows 0 rawDataSize 0 totalSize 0 transient_lastDdlTime 1590407823 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: serialization.format 1 Time taken: 0.564 seconds, Fetched: 33 row(s) |
Alter Table
When it comes to the table, Alter Table is a versatile command which we can use to do multiple useful things like changing table name, changing column data type, etc. Let us see it in action below.
Renaming Hive Table
We can use Alter table command to rename table. Below is simple example.
1 2 3 4 |
ALTER TABLE employee RENAME TO emp; OK Time taken: 0.584 seconds hive (maheshmogal)> |
Adding Columns to Table
Using Alter Table command we can also change the structure of the table. The below query is adding columns to “emp” table.
1 2 3 4 5 6 7 8 9 |
hive (maheshmogal)> ALTER TABLE emp ADD COLUMNS (address STRING); OK Time taken: 0.349 seconds hive (maheshmogal)> show create table emp; OK CREATE TABLE `emp`( `id` int COMMENT 'employee ID', `name` string COMMENT 'employee name', `address` string) -- address column added to end of table |
Changing Order of Columns in Hive Table
We can also change the order of columns in a Table using Alter command. We can see that after running the following alter query “id” column has changed its position with “address” column.
1 2 3 4 5 6 7 8 9 10 |
-- query syntax : ALTER TABLE table_name CHANGE old_col new_col_name datatype AFTER col_name_to_replace with; hive (maheshmogal)> ALTER TABLE emp CHANGE id id int after address; OK Time taken: 0.458 seconds hive (maheshmogal)> show create table emp; OK CREATE TABLE `emp`( `name` string COMMENT 'employee name', `address` string, `id` int COMMENT 'employee ID') |
Changing Column Datatype in Hive
We can use similar query to change column data type.
1 2 3 4 5 6 7 8 9 |
hive (maheshmogal)> ALTER TABLE emp CHANGE id id bigint ; OK Time taken: 0.393 seconds hive (maheshmogal)> show create table emp; OK CREATE TABLE `emp`( `name` string COMMENT 'employee name', `address` string, `id` bigint COMMENT 'employee ID') |
Replacing All Columns in Hive Table
There might be some cases where you want to replace all column names and their data types with new columns. We Can achieve this with the ALTER query as well.
1 2 3 4 5 6 7 8 9 |
hive (maheshmogal)> ALTER TABLE emp REPLACE COLUMNS (id bigint, name string) ; OK Time taken: 0.395 seconds hive (maheshmogal)> show create table emp; OK CREATE TABLE `emp`( `id` bigint, `name` string) -- here we can see that all columns have been replaced with a new set of columns |
Changing Table Properties
With ALTER query, We can add new table properties or change the existing ones. We will dive deep into table properties in future chapters. For now we can learn how to add new table property in the hive.
1 2 3 4 |
hive (maheshmogal)> ALTER TABLE emp SET TBLPROPERTIES('auto.purge'='true'); OK Time taken: 0.313 seconds hive (maheshmogal)> |
Dropping Table In Hive
We can drop the table in hive with simple SQL like a command. We can also add IF EXISTS clause to make sure we do not get an error if the table is not present in Hive.
1 2 3 4 5 6 7 |
hive (maheshmogal)> DROP TABLE emp; OK Time taken: 0.563 seconds hive (maheshmogal)> DROP TABLE IF EXISTS emp; OK Time taken: 0.034 seconds hive (maheshmogal)> |
Conclusion
In this chapter, we have learned basic commands for managing tables in the hive. Now we will start diving deep into Hive concepts. See you there 🙂