Category: Big Data

External vs. Internal (Managed) Tables in Hive

Hive has two types of tables: external and managed. In this post, we will learn how they differ and which use cases suit each type.

What is HDFS - Overview of Hadoop's distributed file system

HDFS is the distributed file system used by Hadoop, modeled on the design of the Google File System (GFS). It provides reliable, highly available storage for data processing. Let us take a look at HDFS and its architecture.

How to Set Up the Cloudera QuickStart VM on Windows

We need a Hadoop environment for practice, and setting one up on Linux is no fun. A better alternative is the Cloudera virtual machine. Let me show you how to set up and use the Cloudera QuickStart VM to get hands-on practice with Hadoop.

Pivot rows to columns in Hive

There are many use cases where we need to transpose (pivot) a table, and Hive does not provide an easy function to do so. Let me show you a workaround for pivoting a table in Hive.

Hive - What is the difference between Collect Set and Collect List

There are many advanced aggregate functions in Hive. Let's take a look at collect_set and collect_list and how we can use them effectively.

Hive - Advanced Aggregations with Grouping Sets, Rollup and Cube

In this post, we will take a look at another set of advanced aggregation features in Hive: grouping sets, rollup and cube.

Apache Sqoop: Import Data to HDFS - Part 2

In the second part of our look at Sqoop import, we will learn additional parameters for using import more effectively.

Apache Sqoop - Import data to HDFS

In this post, we will learn how to use the sqoop import command and its various parameters to move data into HDFS.

Apache Sqoop Introduction

One reason the Hadoop ecosystem became popular is its ability to process different forms of data. But not all data lives in HDFS, i.e. the Hadoop Distributed File System. We have been using relational databases to store and process structured data for a long time, which is why a lot of data still resides in RDBMS […]
