Supported Data Types In Hive

Like relational database, hive support multiple primitive data types. With them, Hive also supports collection data types as well to make data reading fast from disk. Let us learn more about supported data types in Hive before diving deep deep.

Primitive Data types

You can refer table below to know which primitive data types are supported by hive. Like relational databases it is pretty self explanatory.

Data Type	Size for Data Type	Example
TINYINT	1 Byte	18
SMALLINT	2 Bytes	548
INT	4 Bytes	78495
BIGINT	8 Bytes	7895462
BOOLEAN	TRUE \| FALSE	TRUE
FLOAT	Single Precision	3.14
DOUBLE	Double Precision	3.14
STRING		"This is test string"
TIMESTAMP (hive version 0.8.0+)		13254869(Unix Timestamp) or 2020-05-30 12:34:24 (JDBC complaint time)
BINARY (hive version 0.8.0+)		Array of Bytes
DECIMAL(give version 0.11.0+)	Precision of 38 digits	1.25846
DATE(Hive version 0.12.0 +)	Supports only String values in Hive	"2020-05-30"
INTERVAL (hive version 1.2.0+)	Supports only String with Hive	INTERVAL '1' DAY

It is useful to remember that these data types have been implemented in Java. So they follow basic rules of Java typecasting. That means when you try to add INT and FLOAT, INT will be converted to FLOAT and then those two numbers will be added.

Collection Data Types

Along with above mentioned primitive data types, the Hive also supports collection data types like Array, Structs, and Maps. Collection data types are useful when we want to retrieve data for processing. If we save data in normalized fashion then we might need another disk scan to get related data. To optimize disk read time, Hive stores related information at once place so that we can access it faster (for example, storing the address in Struct, or students enrolled classes in an array, etc)

Below are three collection data types supported by Hive

STRUCT
- for example, STRUCT {home_no: STRING, street_name: STRING, city: STRING}
MAP
- Like in many programming languages, Map holds key-value pairs.
- for example, MAP(STRING, STRING)
ARRAY
- ARRAY holds list of the same types of items.
- for example, [‘BANANA’, ‘APPLE’, ‘I AM NOT FRUIT BUT I AM STRING’]

There is also UNION type in Hive but support for it is incomplete. Some queries ( like where and group by ) will fail with UNION type.
Hive Facts

Creating a Sample Table

Now that we have learnt about data types in Hive, below you can find sample table created using few of above Data types.

hive (maheshmogal)> CREATE TABLE student (
                  > id INT,
                  > name STRING,
                  > birth_date DATE,
                  > enrolled_classes ARRAY&lt;STRING>, -- class names in string format
                  > assignments MAP&lt;STRING, DECIMAL>, --assignment name as keys with grades in decimal
                  > address STRUCT&lt;street: STRING, city: STRING, zip: STRING> -- address fields
                  > );
OK
Time taken: 2.38 seconds
hive (maheshmogal)>

hive (maheshmogal)> CREATE TABLE student (

> id INT,

> name STRING,

> birth_date DATE,

> enrolled_classes ARRAY<STRING>, -- class names in string format

> assignments MAP<STRING, DECIMAL>, --assignment name as keys with grades in decimal

> address STRUCT<street: STRING, city: STRING, zip: STRING> -- address fields

> );

Time taken: 2.38 seconds

hive (maheshmogal)>

File Encoding for Collection Data Types

Now that we know different data types supported in Hive, the next logical question is how Hive will know in which format that data is stored in file especially for collection data types. How array elements are separated or what separates the hive key and value pair? This is where we can use table properties in Hive.

We Have briefly touched on table properties in create table section. We will see now how to let hive know about the structure of our collection data types. Consider the query below.

hive (maheshmogal)> CREATE TABLE student (
                  > id INT,
                  > name STRING,
                  > birth_date DATE,
                  > enrolled_classes ARRAY&lt;STRING>,
                  > assignments MAP&lt;STRING, DECIMAL>,
                  > address STRUCT&lt;street: STRING, city: STRING, zip: STRING>
                  > )
		 > ROW FORMAT DELIMITED
		> FIELDS TERMINATED BY ','
		> COLLECTION ITEMS TERMINATED BY ','
		> MAP KEYS TERMINATED BY ':'
		> LINES TERMINATED BY '\n';

hive (maheshmogal)> CREATE TABLE student (

> id INT,

> name STRING,

> birth_date DATE,

> enrolled_classes ARRAY<STRING>,

> assignments MAP<STRING, DECIMAL>,

> address STRUCT<street: STRING, city: STRING, zip: STRING>

> )

> ROW FORMAT DELIMITED

> FIELDS TERMINATED BY ','

> COLLECTION ITEMS TERMINATED BY ','

> MAP KEYS TERMINATED BY ':'

> LINES TERMINATED BY '\n';

Here we can see that how we can let hive know about our data type and how they are represented in a file. For example, Map key and value are separated by ‘:’ and each pair in the map is separated by ‘,’.

We can change these values depending on our data. So we can have a pipe (|) delimited data but collection items can be separated by comma(,).

Last line in query, LINES TERMINATED BY \n, Let hive know that each new line in file contains new row. Currently hive only supports new line for identifying a new row. So you can avoid mentioning it in query. ‘\n’ is default value for this filed.

ROW FORMAT DELIMITED should be first-line after table definition followed by filed separate definitions. Only LOCATION property can be defined before ROW FORMAT DELIMITED.
Hive Facts

Conclusion

We have learned the basics of Hive Data types. Using hive table properties we can let hive know how that data is stored in our files. We will use these properties going forward in the next blog. See you there.

Data Types in Hive

Primitive Data types

Collection Data Types

Creating a Sample Table

File Encoding for Collection Data Types

Conclusion

Hive – What is the difference between Collect Set and Collect List

Partitioning in Hive

Static vs Dynamic Partitioning in Hive

MSCK Repair – Fixing Partitions in Hive Table

Hive Data Manipulation – Loading Data to Hive Tables

Hive – Advanced Aggregations with Grouping sets, Rollup and cube

Primitive Data types

Collection Data Types

Creating a Sample Table

File Encoding for Collection Data Types

Conclusion

Similar Posts