Hello everyone. In this article, we will learn how can we pivot rows to columns in Hive.

Pivoting/transposing means we need to convert a row into columns. We need to do this to show different view of data, to show aggregation performed on different granularity than which is present in the existing table.

Consider you have following data from some company. It shows how much money they spent on each resource in each quarter for some project in that company.


Input data


From this table, you want to show data like this.


Expected result after we pivot row into columns


We will see how we can use CASE statements and COLLECT_SET to get these result from existing table.

Step I:

First, we can use case statements to transpose required rows to columns.

This query will you give you result in following way.


Result of step I


We can see that there are null values and each resource is having multiple records. This is expected as we have not grouped this on resource id yet and for each resource, there may not be an entry for each quarter.


Step II:

In this step, we will use COLLECT_SET and group records on basic of resource id to get the desired result.


Result of step 2


This is almost like our desired result.

Step III:

We can prettify our results in step two by modifying our query with CASE statements. This step is optional and it depends on in what form do you want to achieve your final results.

This will be our output.


result of step III


AS we can see, by using CASE and COLLECT_SET we can easily convert / pivot rows into columns in Hive.

You can learn more about COLLECT_SET in Hive at COLLECT_SET AND COLLECT_LIST IN HIVE

  • 1

This Post Has 10 Comments

  1. Nice article. I needed this

  2. Thanks you so much for this nice article.

  3. Awesome article .. Helped us a lot.

    1. Thank you very much

  4. If instead of Quarters I’m looking to create columns for year-month combinations. Is there a way without needing to create a case when for each record? I’m going to end up with 48 columns. Just trying to avoid needing to write that many case whens.

    1. Hi Pawan, It is not possible to write these case statements dynamically using the above approach. We might need to write UDF for this.

  5. Hi,
    what are the benefit of using COLLECT_SET instead of using a query with MAX like the query below?

    “SELECT resource_id,
    MAX(CASE WHEN quarter=’Q1′ THEN amount ELSE 0) END AS quarter_1,
    MAX(CASE WHEN quarter=’Q2′ THEN amount ELSE 0) END AS quarter_2,
    MAX(CASE WHEN quarter=’Q3′ THEN amount ELSE 0) END AS quarter_3,
    MAX(CASE WHEN quarter=’Q4′ THEN amount ELSE 0) END AS quarter_4
    FROM billing_info
    GROUP BY resource_id”


    1. Hi Gaetano, You can use MAX as well.

    1. “amount” is column name. It is not function in hive

Leave a Reply

Close Menu