Analyticshut

Big Data

Bucketing in Hive
Hive

By Mahesh Mogal | August 24, 2020 | February 12, 2021

With bucketing in Hive, rows that share the same value in the bucketing column are written to the same file. This improves performance when reading data and when joining two tables.

Alter Table Partitions in Hive
Hive

By Mahesh Mogal | August 19, 2020 | February 12, 2021

We have created partitioned tables and inserted data into them. Now we will learn how to drop a partition from, or add a new partition to, a table in Hive.

Static vs Dynamic Partitioning in Hive
Hive

By Mahesh Mogal | August 17, 2020 | February 12, 2021

Hive supports static and dynamic partitioning. Let us understand the difference between them and their use cases.

Partitioning in Hive
Hive

By Mahesh Mogal | August 15, 2020 | February 12, 2021

Using partitioning, we can improve Hive query performance. But if we do not choose the partitioning column carefully, it can create a small-files problem.

Adding Custom Schema to Spark Dataframe
Spark

By Mahesh Mogal | August 12, 2020 | February 12, 2021

We will learn how to specify a custom schema, with column names and data types, for Spark DataFrames.

Reading data from a file in Spark
Spark

By Mahesh Mogal | August 12, 2020 | March 25, 2021

We will learn how to load data from JSON, CSV, TSV, pipe-delimited, or any other type of delimited file into a Spark DataFrame.

Hive Data Manipulation – Loading Data to Hive Tables
Hive

By Mahesh Mogal | August 12, 2020 | February 12, 2021

We will learn how to load data into a Hive table. We will also learn how to copy data into Hive tables from the local file system.

Create, Alter, Delete Tables in Hive
Hive

By Mahesh Mogal | August 11, 2020 | February 12, 2021

We will learn how to create Hive tables, alter table columns, add comments and table properties, and delete Hive tables.

Creating Database in Hive
Hive

By Mahesh Mogal | August 11, 2020 | February 12, 2021

We will learn how to create databases in Hive, along with simple operations like listing databases, setting a database location in HDFS, and deleting a database.

Data Types in Hive
Hive

By Mahesh Mogal | May 27, 2020 | February 12, 2021

Like SQL, Hive supports multiple data types. On top of that, Hive has several complex data types that make it easier to process data.

External Vs Internal(Managed) Tables in Hive
Hive

By Mahesh Mogal | May 26, 2020 | February 12, 2021

Hive has two types of tables, external and managed. In this blog, we will learn about them and decide which use case suits each type.

What is HDFS – Overview of Hadoop’s distributed file system
HDFS

By Mahesh Mogal | December 8, 2019 | February 12, 2021

HDFS is the distributed file system used by Hadoop; its design is based on the Google File System (GFS). It provides a reliable, highly available store for data processing. Let us take a look at HDFS and its architecture.

© 2022 Analyticshut
