Bucketing in Hive
With bucketing in Hive, we can group similar kinds of data together and write it to a single file. This improves performance when reading data and when joining two tables.
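As a minimal sketch, assuming a Hive-enabled SparkSession, a bucketed table can be declared as below; the `users_bucketed` table and its columns are made up for illustration:

```python
from pyspark.sql import SparkSession

# Hive-enabled session; the table and column names are illustrative.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Rows with the same user_id hash into the same bucket file, so scans
# and joins on user_id touch fewer files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS users_bucketed (
        user_id INT,
        name    STRING
    )
    CLUSTERED BY (user_id) INTO 8 BUCKETS
    STORED AS ORC
""")
```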
We have already created partitioned tables and inserted data into them. Now we will learn how to drop a partition from, or add a new partition to, a table in Hive, as in the sketch below.
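Here is a rough example, assuming a Hive-enabled SparkSession and a hypothetical `sales` table partitioned by `sale_date`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Add a partition for a new day, then drop an old one.
spark.sql("ALTER TABLE sales ADD IF NOT EXISTS PARTITION (sale_date = '2024-01-01')")
spark.sql("ALTER TABLE sales DROP IF EXISTS PARTITION (sale_date = '2023-01-01')")

# Verify the change.
spark.sql("SHOW PARTITIONS sales").show(truncate=False)
```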
Hive supports static and dynamic partitions. Let us understand the difference between them and their use cases; the sketch below shows both styles of insert.
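A minimal sketch of the two insert styles, reusing the hypothetical `sales` table partitioned by `sale_date` and a made-up `staging_sales` source table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Static partition: the partition value is spelled out in the statement.
spark.sql("""
    INSERT INTO sales PARTITION (sale_date = '2024-01-01')
    SELECT order_id, amount FROM staging_sales WHERE sale_date = '2024-01-01'
""")

# Dynamic partition: Hive derives the partition value from the last
# column of the SELECT; nonstrict mode allows a fully dynamic insert.
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("""
    INSERT INTO sales PARTITION (sale_date)
    SELECT order_id, amount, sale_date FROM staging_sales
""")
```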
Using partitioning, we can increase Hive query performance. But if we do not choose the partitioning column carefully, it can create a small-file problem.
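The performance gain comes from partition pruning, as in this sketch against the same hypothetical `sales` table; a high-cardinality partition column would instead create one tiny directory per value:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# The filter on the partition column lets Hive read only the matching
# directory (partition pruning) instead of scanning the whole table.
spark.sql("""
    SELECT order_id, amount
    FROM sales
    WHERE sale_date = '2024-01-01'
""").show()
```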
We will learn how to specify a custom schema, with column names and data types, for Spark DataFrames.
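For instance, a schema can be built with `StructType`; the file path and column names here are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField,
                               IntegerType, StringType, DoubleType)

spark = SparkSession.builder.getOrCreate()

# Declare column names and types up front instead of relying on inference.
schema = StructType([
    StructField("order_id", IntegerType(), nullable=False),
    StructField("customer", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
])

df = spark.read.schema(schema).option("header", True).csv("/data/orders.csv")
df.printSchema()
```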
We will learn how to load data from JSON, CSV, TSV, pipe-delimited, or any other kind of delimited file into a Spark DataFrame.
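A short sketch of the readers involved, with made-up file paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The same CSV reader handles any single-character delimiter via `sep`.
csv_df = spark.read.option("header", True).csv("/data/orders.csv")
tsv_df = spark.read.option("header", True).option("sep", "\t").csv("/data/orders.tsv")
pipe_df = spark.read.option("header", True).option("sep", "|").csv("/data/orders.psv")

# JSON has its own reader (one JSON object per line by default).
json_df = spark.read.json("/data/orders.json")
```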
We will learn how to load and populate data into Hive tables. We will also learn how to copy data into Hive tables from the local file system.
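A minimal sketch, assuming hypothetical `orders` and `staging_orders` tables and a local sample file:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# LOCAL copies the file from the local file system into the table's
# warehouse directory; without LOCAL, the path is resolved in HDFS.
spark.sql("LOAD DATA LOCAL INPATH '/tmp/orders.csv' INTO TABLE orders")

# Populate from another table with INSERT ... SELECT.
spark.sql("INSERT INTO orders SELECT * FROM staging_orders")
```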
We will learn how to create Hive tables, alter table columns, add comments and table properties, and delete Hive tables.
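The statements below are a sketch of that lifecycle; the table, comments, and property names are invented for the example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Create a table with column comments and table properties.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id INT COMMENT 'unique order key',
        amount   DOUBLE
    )
    COMMENT 'orders fact table'
    TBLPROPERTIES ('created.by' = 'blog-example')
""")

# Alter: append a new column, then drop the table.
spark.sql("ALTER TABLE orders ADD COLUMNS (customer STRING COMMENT 'buyer name')")
spark.sql("DROP TABLE IF EXISTS orders")
```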
We will learn how to create databases in Hive, along with simple operations like listing databases, setting the database location in HDFS, and deleting databases.
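As a quick sketch, with a hypothetical `retail` database and warehouse path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Create a database at an explicit HDFS location, then list and use it.
spark.sql("CREATE DATABASE IF NOT EXISTS retail LOCATION '/user/hive/warehouse/retail.db'")
spark.sql("SHOW DATABASES").show()
spark.sql("USE retail")

# CASCADE also drops any tables inside the database.
spark.sql("USE default")
spark.sql("DROP DATABASE IF EXISTS retail CASCADE")
```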
Like SQL, Hive supports multiple primitive data types. On top of that, Hive offers several complex data types, which make it easy to process nested data.
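A minimal sketch of the complex types, using an invented `customers` table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# ARRAY, MAP and STRUCT hold nested values inside a single column.
spark.sql("""
    CREATE TABLE IF NOT EXISTS customers (
        id      INT,
        phones  ARRAY<STRING>,
        prefs   MAP<STRING, STRING>,
        address STRUCT<street: STRING, city: STRING>
    )
""")

# Index arrays, look up map keys, and use dot notation for structs.
spark.sql("SELECT id, phones[0], prefs['lang'], address.city FROM customers").show()
```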
Hive has two types of tables: external and managed. In this blog, we will learn about them and decide which use cases suit each table type.
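The sketch below contrasts the two; the table names and the external data path are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Managed table: Hive owns the files; DROP TABLE deletes the data too.
spark.sql("CREATE TABLE IF NOT EXISTS managed_orders (order_id INT, amount DOUBLE)")

# External table: Hive tracks only metadata; DROP TABLE leaves the
# files at LOCATION untouched.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS external_orders (
        order_id INT,
        amount   DOUBLE
    )
    LOCATION '/data/external/orders'
""")
```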
HDFS is the distributed file system used by Hadoop, modeled on the Google File System (GFS). It provides a reliable, highly available store for data processing. Let us take a look at HDFS and its architecture.