Creating Database in Hive

Updated On February 12, 2021 | By Mahesh Mogal

Hive databases are nothing more than namespace for tables. However, they are very useful on large clusters or with multiple teams/individuals using Hive for different reasons. With Databases, You can have logical separation of tables, table names, and even where data is going get store by default.

Creating Database

You can use simple create database database_name command to create database in hive.

If this database is already present in Hive, it will throw an error. I gran the same command twice and you can see I got the following error on hive cli.

Erro creating database in hive
Error creating database if it is already present

Handling Error If a Database Is Already Present

To avoid this issue we can use IF NOT EXISTS clause. This is an optional part that we can pass to the Hive query. If this is present hive will ignore if the database is already present and will not throw any error. This is useful when you are running some script and you want your next queries to run without any issue.

Listing All Databases

You can use following query to list all databases in Hive.

Listing Databases with like/rlike

If you have lot of databases and list down only few you can use like / rlike pattern to use regular expressions to list databases.

This command will list down only those databases whose name starts with 'prod'.

Using a Database

To use particular database you can use following simple query.

Showing current working DB on CLI

Unfortunately there is no way to know which database is in use currently selected in hive shell. One option is to always use "USE DATABASE" command to be sure every time you need to check the working database. There is a workaround for this. You can use hive CLI property to show the existing database on the hive CLI.

Database Properties and Comments

You can set up properties and comments while creating a database. Please note that you cannot unset database properties once you set them. You can only change them using alter command.

We can use comments to give meaningful information about database.

Describing Database

We can use describe the database to know details like comments, properties, and location.

Describe database commands only shows location and comment for the database. We can use Describe Database Extended to list database properties as well.

Setting Up HDFS Location for Database

In describe database command you can see that each database has HDFS directory assigned to it. All tables created in that database will be stored in this directory. Each table will have its sub-directory created under this location.

One exception to this is the default database in Hive which does not have a directory. Instead it uses a hive metastore directory to store any tables created in the default database.

We can specify particular location while creating database in hive using LOCATION clause.

Once database is created you cannot change its location

Hive facts

Dropping Database

You can use DROP database query to drop empty database from hive.

What happens if the database has tables in it?? It will throw an error. By default Hive won't allow you to drop database which contains tables.

To drop database with all tables within it, we can use CASCADE clause.

Conclusion

These are basic operations related to Hive databases. Next we will see some basic operation on hive tables before diving deep in advance topics in Hive.

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.

Stay Updated with Latest Blogs

Get latest blogs delivered to your mail directly.

Recent Posts

Spark Join Types With Examples

In this blog, we are going to learn different spark join types. We will also write code and validate data output for each join type to better understand them.

Read More
Integrate Spark with Jupyter Notebook and Visual Studio Code

In this blog, we are going to integrate spark with jupyter notebook and visual studio code to create easy-to-use development environment.

Read More
Reading Data From SQL Tables in Spark

In this blog, we are going to learn about reading data from SQL tables in Spark. We will create Spark data frames from tables and query results as well.

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram