Create, Alter, Delete Tables in Hive

Updated On August 12, 2020 | By Mahesh Mogal

Creating a Hive table is similar like creating a table in SQL like databases. However, the Hive offers a lot of flexibility while creating tables from where to store data to which format to use store data. In this blog, we will discuss many of these options and different operations that we can perform on Hive tables.

Creating Table

Like SQL conventions, we can create a Hive table in the following way. We can use the database name prefixed with a table in create a table in that database.

hive (maheshmogal)> CREATE TABLE hive_test (id INT, name STRING);
OK
Time taken: 0.632 seconds
hive (maheshmogal)> CREATE TABLE my_db.hive_test (id INT, name STRING); -- creates this table in database named my_db
hive (maheshmogal)>

Using COMMENT With Tables

Like Databases, we can also use comments to give meaningful information about table columns as well as tables while creating it. Check it below query example.

hive (maheshmogal)> CREATE TABLE employee (
                  >     id INT COMMENT 'employee ID',
                  >     name STRING COMMENT  'employee name'
                  >     )
                  >     COMMENT 'Employee table for some department';
OK
Time taken: 0.341 seconds
hive (maheshmogal)>

Using Table properties in Create Statement

We can also set up hive table properties as we did for databases. However, Table Properties are far more powerful. Table properties can be used to tell hive details about underlying data and can also be used to integrate hive with other databases like HBase or DyanmoDB. In the next chapters, we will learn more about table properties. For now you can use the below query to attach simple properties to the table.

CREATE TABLE employee (
    id INT COMMENT 'employee ID',
    name STRING COMMENT  'employee name'
    )
    COMMENT 'Employee table for some department'
    TBLPROPERTIES ('creator'='Mahesh Mogal', 'created_at'='2020-05-31')
;

Issue With IF NOT EXISTS in Create table

If we run create table command and that table is already present then Hive will throw an error. To handle this gracefully we can add IF NOT EXISTS clause as we did in the creation of databases.

CREATE TABLE IF NOT EXISTS employee (
    id INT COMMENT 'employee ID',
    name STRING COMMENT  'employee name'
    )
;

But there is a catch when using IF NOT EXISTS with a hive table. If the new table has a different schema from the existing table but the same name as the existing table, Hive will not warn you. If you want to give a new schema to this table then you will manually have to delete the old table.

hive (maheshmogal)> CREATE TABLE IF NOT EXISTS employee (
                  >     id INT COMMENT 'employee ID',
                  >     name STRING COMMENT  'employee name',
                  >     address STRING COMMENT 'employee name' -- different schema from existing table
                  >     )
                  > ;
OK
Time taken: 0.02 seconds
hive (maheshmogal)>

Listing Tables

We can use the below queries to list tables. We can specify the database name in a query to list all tables from that database (query 2). We can also use like/rlike with a regular expression to list a subset of tables.

hive> SHOW TABLES;
hive> SHOW TABLES IN my_db;
hive> SHOW TABLES like 'pord*';

Note that using the database name and like/rlike statement (query 2 and query3) do not work together. You can use either of one in a single query.

Describing Table

In the hive, we can use describe command to see table structure, its location as well as its table properties. We can also use DESCRIBE TABLE_NAME, DESCRIBE EXTENDED TABLE_NAME, SHOW CREATE TABLE TABLE_NAME along with DESCRIBE_FORMATTED TABLE_NAME which gives table information in the well-formatted structure.

hive (maheshmogal)> describe formatted employee;
OK
# col_name              data_type               comment

id                      int                     employee ID
name                    string                  employee name

# Detailed Table Information
Database:               maheshmogal
Owner:                  maheshmogal
CreateTime:             Mon May 25 07:57:03 EDT 2020
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://localhost:8020/apps/hive/warehouse/maheshmogal.db/employee
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        comment                 Employee table for some department
        numFiles                0
        numRows                 0
        rawDataSize             0
        totalSize               0
        transient_lastDdlTime   1590407823

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.564 seconds, Fetched: 33 row(s)

Alter Table

When it comes to the table, Alter Table is a versatile command which we can use to do multiple useful things like changing table name, changing column data type, etc. Let us see it in action below.

Renaming Hive Table

We can use Alter table command to rename table. Below is simple example.

hive (maheshmogal)> ALTER TABLE employee RENAME TO emp;
OK
Time taken: 0.584 seconds
hive (maheshmogal)>

Adding Columns to Table

Using Alter Table command we can also change the structure of the table. The below query is adding columns to "emp" table.

hive (maheshmogal)> ALTER TABLE emp ADD COLUMNS (address STRING);
OK
Time taken: 0.349 seconds
hive (maheshmogal)> show create table emp;
OK
CREATE TABLE `emp`(
  `id` int COMMENT 'employee ID',
  `name` string COMMENT 'employee name',
  `address` string) -- address column added to end of table

Changing Order of Columns in Hive Table

We can also change the order of columns in a Table using Alter command. We can see that after running the following alter query "id" column has changed its position with "address" column.

-- query syntax : ALTER TABLE table_name CHANGE old_col new_col_name datatype AFTER col_name_to_replace with;
hive (maheshmogal)> ALTER TABLE emp CHANGE id id int after address;
OK
Time taken: 0.458 seconds
hive (maheshmogal)> show create table emp;
OK
CREATE TABLE `emp`(
  `name` string COMMENT 'employee name',
  `address` string,
  `id` int COMMENT 'employee ID')

Changing Column Datatype in Hive

We can use similar query to change column data type.

hive (maheshmogal)> ALTER TABLE emp CHANGE id id bigint ;
OK
Time taken: 0.393 seconds
hive (maheshmogal)> show create table emp;
OK
CREATE TABLE `emp`(
  `name` string COMMENT 'employee name',
  `address` string,
  `id` bigint COMMENT 'employee ID')

Replacing All Columns in Hive Table

There might be some cases where you want to replace all column names and their data types with new columns. We Can achieve this with the ALTER query as well.

hive (maheshmogal)> ALTER TABLE emp REPLACE COLUMNS (id bigint, name string) ;
OK
Time taken: 0.395 seconds
hive (maheshmogal)> show create table emp;
OK
CREATE TABLE `emp`(
  `id` bigint,
  `name` string)
-- here we can see that all columns have been replaced with a new set of columns

Changing Table Properties

With ALTER query, We can add new table properties or change the existing ones. We will dive deep into table properties in future chapters. For now we can learn how to add new table property in the hive.

hive (maheshmogal)> ALTER TABLE emp SET TBLPROPERTIES('auto.purge'='true');
OK
Time taken: 0.313 seconds
hive (maheshmogal)>

Dropping Table In Hive

We can drop the table in hive with simple SQL like a command. We can also add IF EXISTS clause to make sure we do not get an error if the table is not present in Hive.

hive (maheshmogal)> DROP TABLE emp;
OK
Time taken: 0.563 seconds
hive (maheshmogal)> DROP TABLE IF EXISTS emp;
OK
Time taken: 0.034 seconds
hive (maheshmogal)>

Conclusion

In this chapter, we have learned basic commands for managing tables in the hive. Now we will start diving deep into Hive concepts. See you there 🙂

manage tables in hive -2
.

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.

Stay Updated with Latest Blogs

Get latest blogs delivered to your mail directly.

Recent Posts

Select Expr in Spark Dataframe

In this blog, we will learn how to use select and expr in the Spark data frame. We will learn multiple use cases along with selectExpr.

Read More
Add, Rename, Drop Columns in Spark Dataframe

We will go through common column operations like add, rename, list, select, and dropping a column from spark dataframe.

Read More
MSCK Repair - Fixing Partitions in Hive Table

We will learn how to add multiple partitions to hive table using msck repair table command in hive.

msck repair hive
Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram
Share via
Copy link
Powered by Social Snap