Creating Database in Hive

Hive databases are nothing more than namespace for tables. However, they are very useful on large clusters or with multiple teams/individuals using Hive for different reasons. With Databases, You can have logical separation of tables, table names, and even where data is going get store by default.

Creating Database

You can use simple create database database_name command to create database in hive.

If this database is already present in Hive, it will throw an error. I gran the same command twice and you can see I got the following error on hive cli.

Erro creating database in hive
Error creating database if it is already present

Handling Error If a Database Is Already Present

To avoid this issue we can use IF NOT EXISTS clause. This is an optional part that we can pass to the Hive query. If this is present hive will ignore if the database is already present and will not throw any error. This is useful when you are running some script and you want your next queries to run without any issue.

Listing All Databases

You can use following query to list all databases in Hive.

Listing Databases with like/rlike

If you have lot of databases and list down only few you can use like / rlike pattern to use regular expressions to list databases.

This command will list down only those databases whose name starts with ‘prod’.

Using a Database

To use particular database you can use following simple query.

Showing current working DB on CLI

Unfortunately there is no way to know which database is in use currently selected in hive shell. One option is to always use “USE DATABASE” command to be sure every time you need to check the working database. There is a workaround for this. You can use hive CLI property to show the existing database on the hive CLI.

Database Properties and Comments

You can set up properties and comments while creating a database. Please note that you cannot unset database properties once you set them. You can only change them using alter command.

We can use comments to give meaningful information about database.

Describing Database

We can use describe the database to know details like comments, properties, and location.

Describe database commands only shows location and comment for the database. We can use Describe Database Extended to list database properties as well.

Setting Up HDFS Location for Database

In describe database command you can see that each database has HDFS directory assigned to it. All tables created in that database will be stored in this directory. Each table will have its sub-directory created under this location.

One exception to this is the default database in Hive which does not have a directory. Instead it uses a hive metastore directory to store any tables created in the default database.

We can specify particular location while creating database in hive using LOCATION clause.

Once database is created you cannot change its location

Hive facts

Dropping Database

You can use DROP database query to drop empty database from hive.

What happens if the database has tables in it?? It will throw an error. By default Hive won’t allow you to drop database which contains tables.

To drop database with all tables within it, we can use CASCADE clause.

Conclusion

These are basic operations related to Hive databases. Next we will see some basic operation on hive tables before diving deep in advance topics in Hive.

Similar Posts

Leave a Reply

Your email address will not be published.