Pivot rows to columns in Hive
Hello everyone. In this article, we will learn how can we pivot rows to columns in the Hive.
Pivoting/transposing means we need to convert a row into columns. We need to do this to show a different view of data, to show aggregation performed on different granularity than which is present in the existing table.
Consider you have the following data from some company. It shows how much money they spent on each resource in each quarter for some projects in that company.
From this table, you want to show data like this.
We will see how we can use CASE statements and COLLECT_SET to get these results from an existing table.
Step I – Using Case statments
First, we can use case statements to transpose required rows to columns.
1 2 3 4 5 6 |
SELECT resource_id, CASE WHEN quarter='Q1' THEN amount END AS quarter_1, CASE WHEN quarter='Q2' THEN amount END AS quarter_2, CASE WHEN quarter='Q3' THEN amount END AS quarter_3, CASE WHEN quarter='Q4' THEN amount END AS quarter_4 FROM billing_info; |
This query will you give you result in the following way.
We can see that there are null values and each resource is having multiple records. This is expected as we have not grouped this on resource id yet and for each resource, there may not be an entry for each quarter.
Step II – Aggregating Data with COLLECT_SET
In this step, we will use COLLECT_SET and group records on basic of resource id to get the desired result.
1 2 3 4 5 6 7 8 9 10 |
SELECT resource_id,COLLECT_SET(quarter_1)[0] AS quarter_1_spends, COLLECT_SET(quarter_2)[0] AS quarter_2 spends,COLLECT_SET(quarter_3)[0] AS quarter_3_spends, COLLECT_SET(quarter_4)[0] AS quarter_4_spends FROM ( SELECT resource_id, CASE WHEN quarter='Q1' THEN amount END AS quarter_1, CASE WHEN quarter='Q2' THEN amount END AS quarter_2, CASE WHEN quarter='Q3' THEN amount END AS quarter_3, CASE WHEN quarter='Q4' THEN amount END AS quarter_4 FROM billing_info)tbl1 GROUP BY resource_id; |
This is almost like our desired result.
Step III – Transposing Result with Another Case Statement
We can prettify our results in step two by modifying our query with CASE statements. This step is optional and it depends on in what form do you want to achieve your final results.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SELECT resource_id, CASE WHEN COLLECT_SET(quarter_1)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_1)[0] END AS quarter_1_spends, CASE WHEN COLLECT_SET(quarter_2)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_2)[0] END AS quarter_2_spends, CASE WHEN COLLECT_SET(quarter_3)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_3)[0] END AS quarter_3_spends, CASE WHEN COLLECT_SET(quarter_4)[0] IS NULL THEN 0 ELSE COLLECT_SET(quarter_4)[0] END AS quarter_4_spends FROM ( SELECT resource_id, CASE WHEN quarter='Q1' THEN amount END AS quarter_1, CASE WHEN quarter='Q2' THEN amount END AS quarter_2, CASE WHEN quarter='Q3' THEN amount END AS quarter_3, CASE WHEN quarter='Q4' THEN amount END AS quarter_4 FROM billing_info)tbl1 GROUP BY resource_id; |
This will be our output.
AS we can see, by using CASE and COLLECT_SET we can easily convert / pivot rows into columns in Hive.
You can learn more about COLLECT_SET in Hive at COLLECT_SET AND COLLECT_LIST IN HIVE
Nice Explaination
This is brilliant and performs exactly what I needed!