Big Data | Analyticshut

Spark

Reading data from a file in Spark

By Mahesh Mogal | August 12, 2020 | November 25, 2024

We will learn how to load data from JSON, CSV, TSV, pipe-delimited, or any other type of delimited file into a Spark DataFrame.

HDFS

What is HDFS – Overview of Hadoop’s distributed file system

By Mahesh Mogal | December 8, 2019 | November 25, 2024

HDFS is the distributed file system used by Hadoop, with a design modeled on the Google File System (GFS). It provides a reliable, highly available store for data processing. Let us take a look at HDFS and its architecture.

Big Data

How to set up Cloudera QuickStart VM on Windows

By Mahesh Mogal | December 8, 2019 | November 25, 2024

We need a Hadoop environment for practice, and setting one up on Linux is no fun. A better alternative is to use the Cloudera virtual machine. Let me show you how to set up and use the Cloudera QuickStart VM to get hands-on practice with Hadoop.

Hive

Pivot rows to columns in Hive

By Mahesh Mogal | December 7, 2019 | November 25, 2024

There are multiple use cases where we need to transpose or pivot a table, and Hive does not provide an easy function to do so. Let me show you a workaround for pivoting a table in Hive.

Hive

Hive – What is the difference between Collect Set and Collect List

By Mahesh Mogal | December 7, 2019 | November 25, 2024

There are many advanced aggregate functions in Hive. Let's take a look at collect_set and collect_list and how we can use them effectively.

Hive

Hive – Advanced Aggregations with Grouping Sets, Rollup and Cube

By Mahesh Mogal | December 7, 2019 | November 25, 2024

In this blog, we will take a look at another set of advanced aggregation functions in Hive.

Sqoop

Apache Sqoop: Import Data to HDFS – Part 2

By Mahesh Mogal | December 7, 2019 | November 25, 2024

In the second part of Sqoop import, we will learn additional parameters for using import more effectively.

Sqoop

Apache Sqoop – Import data to HDFS

By Mahesh Mogal | December 7, 2019 | November 25, 2024

In this blog, we will learn how to use the Sqoop import command and its different parameters to move data to HDFS.

Sqoop

Apache Sqoop Introduction

By Mahesh Mogal | December 7, 2019 | November 25, 2024

One reason the Hadoop ecosystem became popular is its ability to process different forms of data. But not all data is present in HDFS, i.e. the Hadoop Distributed File System. We have been using relational databases to store and process structured data for a long time. That is why a lot of data still resides in RDBMS…