Creating Database in Hive
Hive databases are nothing more than namespace for tables. However, they are very useful on large clusters or with multiple teams/individuals using Hive for different reasons. With Databases, You can have logical separation of tables, table names, and even where data is going get store by default.
Creating Database
You can use simple create database database_name command to create database in hive.
1 |
create database hive_test_analyticshut; |
If this database is already present in Hive, it will throw an error. I gran the same command twice and you can see I got the following error on hive cli.
Handling Error If a Database Is Already Present
To avoid this issue we can use IF NOT EXISTS clause. This is an optional part that we can pass to the Hive query. If this is present hive will ignore if the database is already present and will not throw any error. This is useful when you are running some script and you want your next queries to run without any issue.
1 |
hive (default)> CREATE DATABASE IF NOT EXISTS hive_test_analyticshut; |
Listing All Databases
You can use following query to list all databases in Hive.
1 |
hive> SHOW DATABASES; |
Listing Databases with like/rlike
If you have lot of databases and list down only few you can use like / rlike pattern to use regular expressions to list databases.
1 |
hive> SHOW DATABASES LIKE 'prod*'; |
This command will list down only those databases whose name starts with ‘prod’.
Using a Database
To use particular database you can use following simple query.
1 |
hive> USE hive_test_analyticshut; |
Showing current working DB on CLI
Unfortunately there is no way to know which database is in use currently selected in hive shell. One option is to always use “USE DATABASE” command to be sure every time you need to check the working database. There is a workaround for this. You can use hive CLI property to show the existing database on the hive CLI.
1 2 3 |
hive>set hive.cli.print.current.db=true; hive (defualt)> use hive_test_analyticshut; hive (hive_test_analyticshut)> |
Database Properties and Comments
You can set up properties and comments while creating a database. Please note that you cannot unset database properties once you set them. You can only change them using alter command.
1 2 3 |
hive (default)> CREATE DATABASE hr_Dep > COMMENT "db for hr ops" > WITH DBPROPERTIES ("creator"="Mahesh Mogal", "date"="2020-05-31"); |
We can use comments to give meaningful information about database.
Describing Database
We can use describe the database to know details like comments, properties, and location.
1 2 3 4 5 6 7 8 |
hive (default)> DESCRIBE DATABASE hr_Dep; OK hr_dep db for hr ops hdfs://localhost:8020/apps/hive/warehouse/hr_dep.db maheshmogal USER Time taken: 0.116 seconds, Fetched: 1 row(s) hive (default)> DESCRIBE DATABASE EXTENDED hr_Dep; OK hr_dep db for hr ops hdfs://localhost:8020/apps/hive/warehouse/hr_dep.db maheshmogal USER {date=2020-05-31, creator=Mahesh Mogal} Time taken: 0.115 seconds, Fetched: 1 row(s) |
Describe database commands only shows location and comment for the database. We can use Describe Database Extended to list database properties as well.
Setting Up HDFS Location for Database
In describe database command you can see that each database has HDFS directory assigned to it. All tables created in that database will be stored in this directory. Each table will have its sub-directory created under this location.
One exception to this is the default database in Hive which does not have a directory. Instead it uses a hive metastore directory to store any tables created in the default database.
We can specify particular location while creating database in hive using LOCATION clause.
1 |
hive (default)> CREATE DATABASE admin_ops LOCATION '/some/where/in/hdfs'; |
Once database is created you cannot change its location
Hive facts
Dropping Database
You can use DROP database query to drop empty database from hive.
1 2 3 |
hive (default)> DROP DATABASE IF EXISTS hr_Dep; OK Time taken: 0.532 seconds |
What happens if the database has tables in it?? It will throw an error. By default Hive won’t allow you to drop database which contains tables.
1 2 |
hive (hive_test_analyticshut)> drop database hive_test_analyticshut; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database hive_test_analyticshut is not empty. One or more tables exist.) |
To drop database with all tables within it, we can use CASCADE clause.
1 2 3 |
hive (hive_test_analyticshut)> DROP DATABASE IF EXISTS hive_test_analyticshut CASCADE; OK Time taken: 0.92 seconds |
Conclusion
These are basic operations related to Hive databases. Next we will see some basic operation on hive tables before diving deep in advance topics in Hive.