Creating Database in Hive

Updated On August 11, 2020 | By Mahesh Mogal

Hive databases are nothing more than namespace for tables. However, they are very useful on large clusters or with multiple teams/individuals using Hive for different reasons. With Databases, You can have logical separation of tables, table names, and even where data is going get store by default.

Creating Database

You can use simple create database database_name command to create database in hive.

create database hive_test_analyticshut;

If this database is already present in Hive, it will throw an error. I gran the same command twice and you can see I got the following error on hive cli.

Erro creating database in hive
Error creating database if it is already present

Handling Error If a Database Is Already Present

To avoid this issue we can use IF NOT EXISTS clause. This is an optional part that we can pass to the Hive query. If this is present hive will ignore if the database is already present and will not throw any error. This is useful when you are running some script and you want your next queries to run without any issue.

hive (default)> CREATE DATABASE IF NOT EXISTS hive_test_analyticshut;

Listing All Databases

You can use following query to list all databases in Hive.

hive> SHOW DATABASES;

Listing Databases with like/rlike

If you have lot of databases and list down only few you can use like / rlike pattern to use regular expressions to list databases.

hive> SHOW DATABASES LIKE 'prod*';

This command will list down only those databases whose name starts with 'prod'.

Using a Database

To use particular database you can use following simple query.

hive> USE hive_test_analyticshut;

Showing current working DB on CLI

Unfortunately there is no way to know which database is in use currently selected in hive shell. One option is to always use "USE DATABASE" command to be sure every time you need to check the working database. There is a workaround for this. You can use hive CLI property to show the existing database on the hive CLI.

hive>set hive.cli.print.current.db=true;
hive (defualt)> use hive_test_analyticshut;
hive (hive_test_analyticshut)>

Database Properties and Comments

You can set up properties and comments while creating a database. Please note that you cannot unset database properties once you set them. You can only change them using alter command.

hive (default)> CREATE DATABASE hr_Dep
              > COMMENT "db for hr ops"
              > WITH DBPROPERTIES ("creator"="Mahesh Mogal", "date"="2020-05-31");

We can use comments to give meaningful information about database.

Describing Database

We can use describe the database to know details like comments, properties, and location.

hive (default)> DESCRIBE DATABASE hr_Dep;
OK
hr_dep  db for hr ops   hdfs://localhost:8020/apps/hive/warehouse/hr_dep.db    maheshmogal     USER
Time taken: 0.116 seconds, Fetched: 1 row(s)

hive (default)> DESCRIBE DATABASE EXTENDED hr_Dep;
OK
hr_dep  db for hr ops   hdfs://localhost:8020/apps/hive/warehouse/hr_dep.db    maheshmogal     USER    {date=2020-05-31, creator=Mahesh Mogal}
Time taken: 0.115 seconds, Fetched: 1 row(s)

Describe database commands only shows location and comment for the database. We can use Describe Database Extended to list database properties as well.

Setting Up HDFS Location for Database

In describe database command you can see that each database has HDFS directory assigned to it. All tables created in that database will be stored in this directory. Each table will have its sub-directory created under this location.

One exception to this is the default database in Hive which does not have a directory. Instead it uses a hive metastore directory to store any tables created in the default database.

We can specify particular location while creating database in hive using LOCATION clause.

hive (default)> CREATE DATABASE admin_ops LOCATION '/some/where/in/hdfs';

Once database is created you cannot change its location

Hive facts

Dropping Database

You can use DROP database query to drop empty database from hive.

hive (default)> DROP DATABASE IF EXISTS hr_Dep;
OK
Time taken: 0.532 seconds

What happens if the database has tables in it?? It will throw an error. By default Hive won't allow you to drop database which contains tables.

hive (hive_test_analyticshut)> drop database hive_test_analyticshut;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database hive_test_analyticshut is not empty. One or more tables exist.)

To drop database with all tables within it, we can use CASCADE clause.

hive (hive_test_analyticshut)> DROP DATABASE IF EXISTS hive_test_analyticshut CASCADE;
OK
Time taken: 0.92 seconds
hive (hive_test_analyticshut)> 

Conclusion

These are basic operations related to Hive databases. Next we will see some basic operation on hive tables before diving deep in advance topics in Hive.

Create Database in Hive
.

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.

Stay Updated with Latest Blogs

Get latest blogs delivered to your mail directly.

Recent Posts

Select Expr in Spark Dataframe

In this blog, we will learn how to use select and expr in the Spark data frame. We will learn multiple use cases along with selectExpr.

Read More
Add, Rename, Drop Columns in Spark Dataframe

We will go through common column operations like add, rename, list, select, and dropping a column from spark dataframe.

Read More
MSCK Repair - Fixing Partitions in Hive Table

We will learn how to add multiple partitions to hive table using msck repair table command in hive.

msck repair hive
Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram
Share via
Copy link
Powered by Social Snap