Data Types in Hive

Updated On August 11, 2020 | By Mahesh Mogal

Like relational databases, Hive supports multiple primitive data types. Hive also supports collection data types, which keep related data together so that it can be read from disk quickly. Let us learn more about the supported data types in Hive before diving deeper.

Primitive Data types

Refer to the table below to see which primitive data types Hive supports. As in relational databases, they are largely self-explanatory.

| Data Type | Size for Data Type | Example |
|---|---|---|
| SMALLINT | 2 Bytes | 548 |
| INT | 4 Bytes | 78495 |
| BIGINT | 8 Bytes | 7895462 |
| FLOAT | Single precision | 3.14 |
| DOUBLE | Double precision | 3.14 |
| STRING | | "This is test string" |
| TIMESTAMP (Hive version 0.8.0+) | | 13254869 (Unix timestamp) or 2020-05-30 12:34:24 (JDBC-compliant time) |
| BINARY (Hive version 0.8.0+) | Array of bytes | |
| DECIMAL (Hive version 0.11.0+) | Precision of 38 digits | 1.25846 |
| DATE (Hive version 0.12.0+) | Supports only String values in Hive | "2020-05-30" |
| INTERVAL (Hive version 1.2.0+) | Supports only String with Hive | INTERVAL '1' DAY |
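
To make the primitive types a little more concrete, here is a small sketch that builds one value of several of them with CAST. It assumes Hive 1.2.0 or later (for INTERVAL arithmetic) and a version that allows SELECT without a FROM clause. The column aliases are made up for this example.

```sql
-- Sketch only: the aliases are illustrative and not part of any real table.
SELECT
  CAST(548 AS SMALLINT)                    AS small_value,
  CAST(7895462 AS BIGINT)                  AS big_value,
  CAST(1.25846 AS DECIMAL(10,5))           AS decimal_value,
  CAST('2020-05-30' AS DATE)               AS date_value,
  CAST('2020-05-30 12:34:24' AS TIMESTAMP) AS timestamp_value,
  -- Interval arithmetic on a date; needs Hive 1.2.0+
  CAST('2020-05-30' AS DATE) + INTERVAL '1' DAY AS next_day;
```

Writing dates and timestamps as strings and casting them keeps the example independent of literal syntax that varies between Hive versions.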

It is useful to remember that these data types are implemented in Java, so they follow the basic rules of Java type conversion. For example, when you add an INT and a FLOAT, the INT is first converted to FLOAT and then the two numbers are added.
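
The conversion described above can be tried out with a few one-line queries. This is only a sketch, and the aliases are invented for illustration. The exact result type of mixed arithmetic can differ between Hive versions, so no output is shown.

```sql
-- Mixing INT and FLOAT: Hive widens the INT operand implicitly.
SELECT 2 + CAST(3.14 AS FLOAT) AS implicit_sum;

-- Going the other way (a floating-point value to INT) is not done for you
-- in an expression like the one above; ask for it explicitly with CAST.
SELECT CAST(3.99 AS INT) AS explicit_int;

-- A string that does not parse as a number yields NULL rather than an error.
SELECT CAST('abc' AS INT) AS not_a_number;
```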

Collection Data Types

Along with the primitive data types above, Hive also supports the collection data types ARRAY, STRUCT, and MAP. Collection data types are useful when we want to retrieve related data for processing. If we save data in a normalized fashion, we might need another disk scan to get the related data. To reduce disk read time, Hive lets us store related information in one place so that we can access it faster (for example, storing an address in a STRUCT, or a student's enrolled classes in an ARRAY).

Below are the three collection data types supported by Hive:

  • STRUCT
    • A STRUCT groups named fields, each of which can have its own type.
    • for example, STRUCT<home_no: STRING, street_name: STRING, city: STRING>
  • MAP
    • Like in many programming languages, a MAP holds key-value pairs.
    • for example, MAP<STRING, STRING>
  • ARRAY
    • An ARRAY holds a list of items of the same type.
    • for example, ['BANANA', 'APPLE', 'I AM NOT FRUIT BUT I AM STRING']
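
As a quick sketch of what values of these three types look like, Hive's built-in constructor functions array(), map(), and named_struct() can build them directly in a query. The values and aliases below are made up for illustration, and the query assumes a Hive version that allows SELECT without a FROM clause.

```sql
-- Sketch only: sample values are illustrative.
SELECT
  array('BANANA', 'APPLE')             AS fruits,  -- ARRAY<STRING>
  map('hw1', 'done', 'hw2', 'pending') AS status,  -- MAP<STRING, STRING>
  named_struct('home_no', '12',
               'street_name', 'Main St',
               'city', 'Pune')         AS address; -- STRUCT with three STRING fields
```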

There is also a UNION type (UNIONTYPE) in Hive, but support for it is incomplete. Some queries (such as those using WHERE and GROUP BY) will fail with the UNION type.


Creating a Sample Table

Now that we have learned about the data types in Hive, below is a sample table created using a few of them.

hive (maheshmogal)> CREATE TABLE student (
                  > id INT,
                  > name STRING,
                  > birth_date DATE,
                  > enrolled_classes ARRAY<STRING>, -- class names in string format
                  > assignments MAP<STRING, DECIMAL>, --assignment name as keys with grades in decimal
                  > address STRUCT<street: STRING, city: STRING, zip: STRING> -- address fields
                  > );
Time taken: 2.38 seconds
hive (maheshmogal)>
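
Once a table like this holds data, the collection columns can be read with index, key, and dot notation. The query below is a minimal sketch against the student table defined above. The key 'hw1' and the column aliases are hypothetical.

```sql
SELECT
  name,
  enrolled_classes[0]    AS first_class,  -- array elements are indexed from 0
  size(enrolled_classes) AS class_count,  -- number of elements in the array
  assignments['hw1']     AS hw1_grade,    -- 'hw1' is a hypothetical map key
  address.city           AS city          -- struct fields use dot notation
FROM student;
```

An array index that is out of range, or a map key that is not present, gives NULL instead of failing the query.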

File Encoding for Collection Data Types

Now that we know the different data types supported in Hive, the next logical question is how Hive knows the format in which that data is stored in a file, especially for collection data types. How are array elements separated, and what separates a map key from its value? This is where we can use the row format properties of a Hive table.

We briefly touched on table properties in the CREATE TABLE section. We will now see how to let Hive know about the structure of our collection data types. Consider the query below.

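hive (maheshmogal)> CREATE TABLE student (
                  > id INT,
                  > name STRING,
                  > birth_date DATE,
                  > enrolled_classes ARRAY<STRING>,
                  > assignments MAP<STRING, DECIMAL>,
                  > address STRUCT<street: STRING, city: STRING, zip: STRING>
                  > )
                  > ROW FORMAT DELIMITED
                  > FIELDS TERMINATED BY '|' -- field delimiter is an assumption here, use the one in your file
                  > COLLECTION ITEMS TERMINATED BY ','
                  > MAP KEYS TERMINATED BY ':'
                  > LINES TERMINATED BY '\n';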

Here we can see how to tell Hive how our data types are represented in a file. For example, a map key and its value are separated by ':' and each pair in the map is separated by ','.

We can change these values depending on our data. For example, we can have pipe (|) delimited fields while collection items are separated by a comma (,).

The last line in the query, LINES TERMINATED BY '\n', lets Hive know that each new line in the file contains a new row. Currently Hive only supports the newline character for identifying a new row, and '\n' is the default value for this field, so you can omit this line from the query.

ROW FORMAT DELIMITED should come right after the column definitions (and after any PARTITIONED BY or CLUSTERED BY clause), followed by the separator definitions. In Hive's CREATE TABLE grammar, STORED AS and LOCATION come after the row format, not before it.
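
To see how these delimiters line up with real data, here is a sketch of one row and a load statement. It assumes the student table was created with '|' as the field separator, ',' for collection items, and ':' for map keys. The file name, path, and values are hypothetical.

```sql
-- One row of a hypothetical data file, /tmp/students.txt:
--   1|Asha|2000-01-15|Math,Physics|hw1:90,hw2:85|12 Main St,Pune,411001
--
-- Fields are split on '|', array items and struct fields on ',',
-- and each map key is split from its value on ':'.

-- The path is an assumption; point it at your own file.
LOAD DATA LOCAL INPATH '/tmp/students.txt' INTO TABLE student;

-- Check how Hive parsed the collection columns.
SELECT name, enrolled_classes, assignments, address FROM student;
```

Note that the fields of the STRUCT column use the same separator as collection items, so the address is written as comma-separated values in the file.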



We have learned the basics of Hive data types. Using Hive table properties, we can let Hive know how that data is stored in our files. We will use these properties in the next blog. See you there.
