Data Types in Hive

Like relational database, hive support multiple primitive data types. With them, Hive also supports collection data types as well to make data reading fast from disk. Let us learn more about supported data types in Hive before diving deep deep.

Primitive Data types

You can refer table below to know which primitive data types are supported by hive. Like relational databases it is pretty self explanatory.

Data TypeSize for Data TypeExample
TINYINT1 Byte18
SMALLINT2 Bytes548
INT4 Bytes78495
BIGINT8 Bytes7895462
BOOLEANTRUE | FALSETRUE
FLOATSingle Precision3.14
DOUBLEDouble Precision3.14
STRING"This is test string"
TIMESTAMP (hive version 0.8.0+)13254869(Unix Timestamp) or 2020-05-30 12:34:24 (JDBC complaint time)
BINARY (hive version 0.8.0+)Array of Bytes
DECIMAL(give version 0.11.0+)Precision of 38 digits1.25846
DATE(Hive version 0.12.0 +)Supports only String values in Hive"2020-05-30"
INTERVAL (hive version 1.2.0+)Supports only String with HiveINTERVAL '1' DAY

It is useful to remember that these data types have been implemented in Java. So they follow basic rules of Java typecasting. That means when you try to add INT and FLOAT, INT will be converted to FLOAT and then those two numbers will be added.

Collection Data Types

Along with above mentioned primitive data types, the Hive also supports collection data types like Array, Structs, and Maps. Collection data types are useful when we want to retrieve data for processing. If we save data in normalized fashion then we might need another disk scan to get related data. To optimize disk read time, Hive stores related information at once place so that we can access it faster (for example, storing the address in Struct, or students enrolled classes in an array, etc)

Below are three collection data types supported by Hive

  • STRUCT
    • for example, STRUCT {home_no: STRING, street_name: STRING, city: STRING}
  • MAP
    • Like in many programming languages, Map holds key-value pairs.
    • for example, MAP(STRING, STRING)
  • ARRAY
    • ARRAY holds list of the same types of items.
    • for example, [‘BANANA’, ‘APPLE’, ‘I AM NOT FRUIT BUT I AM STRING’]

There is also UNION type in Hive but support for it is incomplete. Some queries ( like where and group by ) will fail with UNION type.

Hive Facts

Creating a Sample Table

Now that we have learnt about data types in Hive, below you can find sample table created using few of above Data types.

File Encoding for Collection Data Types

Now that we know different data types supported in Hive, the next logical question is how Hive will know in which format that data is stored in file especially for collection data types. How array elements are separated or what separates the hive key and value pair? This is where we can use table properties in Hive.

We Have briefly touched on table properties in create table section. We will see now how to let hive know about the structure of our collection data types. Consider the query below.

Here we can see that how we can let hive know about our data type and how they are represented in a file. For example, Map key and value are separated by ‘:’ and each pair in the map is separated by ‘,’.

We can change these values depending on our data. So we can have a pipe (|) delimited data but collection items can be separated by comma(,).

Last line in query, LINES TERMINATED BY \n, Let hive know that each new line in file contains new row. Currently hive only supports new line for identifying a new row. So you can avoid mentioning it in query. ‘\n’ is default value for this filed.

ROW FORMAT DELIMITED should be first-line after table definition followed by filed separate definitions. Only LOCATION property can be defined before ROW FORMAT DELIMITED.

Hive Facts

Conclusion

We have learned the basics of Hive Data types. Using hive table properties we can let hive know how that data is stored in our files. We will use these properties going forward in the next blog. See you there.

Similar Posts

Leave a Reply

Your email address will not be published.