How to Use Grouping Sets, Cube, and Rollup in PostgreSQL

January 19, 2023

In this blog post, we are going to see how to use grouping sets, cubes, and rollups in PostgreSQL to generate result sets. These are group-by options added in PostgreSQL 9.5 that allow us to aggregate data by different GROUP BY clauses at once. 

 

GROUPING SET

Let’s start with an example of how to use grouping sets:

1. Connect to psql terminal: 

/usr/pgsql-11/bin/psql -U postgres postgres

 

2. Create a table: 

CREATE TABLE my_test (

    id int,

    salary int,

    revised_salary int,

    old_salary  int,

    leave  int,

    joining_date date

);

 

3. Insert rows:

insert into my_test values(10,5000,4000,1000,5,'2019-11-15');

insert into my_test values(10,6000,9000,2000,1,'2019-11-16');

insert into my_test values(10,7000,3000,4400,2,'2019-11-17');

insert into my_test values(20,8000,4000,6600,6,'2019-11-15');

insert into my_test values(20,9000,9400,8800,10,'2019-11-16');

insert into my_test values(20,2000,7800,9400,23,'2019-11-17');

insert into my_test values(30,4400,6600,4200,44,'2019-11-15');

insert into my_test values(30,1500,3600,4300,66,'2019-11-16');

insert into my_test values(30,2000,2600,4500,77,'2019-11-17');

 

4. Use the SELECT query to see the rows: 

postgres=# select * from my_test ;

 id | salary | revised_salary | old_salary | leave | joining_date 

----+--------+----------------+------------+-------+--------------

 10 |   5000 |           4000 |       1000 |     5 | 2019-11-15

 10 |   6000 |           9000 |       2000 |     1 | 2019-11-16

 10 |   7000 |           3000 |       4400 |     2 | 2019-11-17

 20 |   8000 |           4000 |       6600 |     6 | 2019-11-15

 20 |   9000 |           9400 |       8800 |    10 | 2019-11-16

 20 |   2000 |           7800 |       9400 |    23 | 2019-11-17

 30 |   4400 |           6600 |       4200 |    44 | 2019-11-15

 30 |   1500 |           3600 |       4300 |    66 | 2019-11-16

 30 |   2000 |           2600 |       4500 |    77 | 2019-11-17

(9 rows)



postgres=# 

 

Using GROUP BY on the dataset, the user can arrange identical data into groups.

Query A

In the example below, we are grouping the columns “salary” and “leave.” We are performing a sum operation on the “salary,” “revised_salary,” “old_salary,” and “total_leave” columns using aggregate function sum() and using the GROUP BY clause on the “id” column.

postgres=# select id, sum(salary) as total_salary,

sum(revised_salary) as total_revised_salary ,

sum(old_salary) as total_old_salary,

sum(leave) as total_leave  

from my_test 

group by id 

order by id;

 id | total_salary | total_revised_salary | total_old_salary | total_leave 

----+--------------+----------------------+------------------+------------------------

 10 |        18000 |          16000              |             7400      |           8

 20 |        19000 |          21200              |            24800     |          39

 30 |         7900  |          12800              |            13000     |         187

(3 rows)

 

Query B

Similarly, we can use GROUP BY on the “joining_date” column to generate a report: 

postgres=# select joining_date, sum(salary) as total_salary,

                   sum(revised_salary) as total_revised_salary,

                   sum(old_salary) as total_old_salary ,

                   sum(leave) as total_leave  

                   from my_test 

                   group by joining_date 

                   order by joining_date;

 joining_date | total_salary | total_revised_salary | total_old_salary | total_leave 

--------------+--------------+----------------------+------------------+----------------------------

 2019-11-15   |        17400 |                14600         |            11800     |          55

 2019-11-16   |        16500 |                22000         |            15100     |          77

 2019-11-17   |        11000 |                13400         |            18300     |         102

(3 rows)

 

Combining Query A and Query B

If we want to combine query A and query B, we would have had to write a complicated SQL query, which in turn would have meant that the table would have to scan twice. GROUPING SETS allow us to avoid such scenarios and can be used to specify multiple groupings in a single query.

This is how we can combine queries A and B using GROUPING SETS:

postgres=# select id,joining_date,sum(salary) as total_salary,

sum(revised_salary) as total_revised_salary ,

sum(old_salary) as total_old_salary ,

sum(leave) as total_leave  

from my_test 

group by grouping sets ( id, joining_date) 

order by id;

 id | joining_date | total_salary | total_revised_salary | total_old_salary | total_leave 

----+--------------+--------------+----------------------+------------------+----------------------------

 10 |                     |      18000  |          16000               |             7400      |           8

 20 |                     |      19000  |          21200               |            24800     |          39

 30 |                     |      7900    |         12800                |            13000     |         187

      | 2019-11-16  |     16500   |         22000                |            15100     |          77

      | 2019-11-15  |      7400   |         14600                 |            11800     |          55

      |  2019-11-17 |     11000  |         13400                 |            18300     |         102

(6 rows)

 

We can also use empty parentheses (), which gives us the total value:

postgres=# select id,joining_date,sum(salary) as total_salary 

from my_test 

group by grouping sets ( id, joining_date,()) 

order by id;

 id | joining_date | total_salary 

----+--------------+--------------

 10 |                      |        18000

 20 |                      |        19000

 30 |                      |         7900

      |                      |        44900  

      | 2019-11-16   |        16500

      | 2019-11-15   |        17400

      | 2019-11-17   |        11000

(7 rows)

 

 Here we get another row which contains the total of sum(id) — i.e., 180000 + 190000 + 7900 = 44900. 

 

CUBE 

CUBE is used mostly for reporting purposes. With the help of CUBE, the user can extend the functionality of GROUP BY clauses by calculating total values (the sum at each level). 

CUBE in the GROUP BY clause is used to give all the possible combinations. CUBE essentially contains every possible rollup scenario for each node (as we will see in the next section). 

CUBE(X,Y,Z) will create grouping sets like this: 

(X,Y,Z)

(  ,Y,Z)

(X,Y, )

(X, ,Y)

(X)

(Y)

(Z)

()

 

Example

We want a query which is having GROUP BY all possible combination of “id” and “joining_date” columns of table “my_test”:

postgres=# select id,joining_date,sum(salary) as total_salary

      ,sum(revised_salary) as total_revised_salary ,

      sum(old_salary) as total_old_salary ,

      sum(leave) as total_leave 

        from my_test 

      group by cube ( id, joining_date) 

      order by id;

 id | joining_date | total_salary | total_revised_salary | total_old_salary | total_leave 

----+--------------+--------------+----------------------+------------------+-------------

 10 |                      |        18000 |             16000          |             7400      |           8

 10 | 2019-11-17   |         7000 |                 3000         |             4400      |           2

 10 | 2019-11-16   |         6000 |                 9000         |             2000      |           1

 10 | 2019-11-15   |         5000 |                 4000         |             1000      |           5

 20 | 2019-11-16   |         9000 |                 9400         |             8800      |          10

 20 |                      |        19000 |                21200       |            24800     |          39

 20 | 2019-11-15   |         8000 |                 4000         |             6600      |           6

 20 | 2019-11-17   |         2000 |                 7800         |             9400      |          23

 30 | 2019-11-15   |         4400 |                 6600         |             4200      |          44

 30 |                      |         7900 |                12800        |            13000     |         187

 30 | 2019-11-16   |         1500 |                 3600         |             4300      |          66

 30 | 2019-11-17   |         2000 |                 2600         |             4500      |          77

     | 2019-11-17    |        11000 |                13400       |            18300     |         102

     | 2019-11-16    |        16500 |                22000       |            15100     |          77

     | 2019-11-15    |        17400 |                14600       |            11800     |          55

     |                       |        44900 |                50000       |            45200     |         234

(16 rows)

 

Rollup

Just like CUBE, ROLLUP is also used primarily for reporting purposes. ROLLUP is an analytical function used for bidirectional folding/calculation of data in a table, which creates a hierarchical rollup.

Example

In this example, using ROLLUP(id, joining_date), the sum(sal) is shown initially grouped by “joining_date,” and then the same sum(sal) is shown grouped by “id.”

ROLLUP(id,joining_date) will generate GROUPING SETS like this: 

(id,joining_date)

(id)

()



postgres=# select id,joining_date,sum(salary) as total_salary,

sum(revised_salary) as total_revised_salary ,

sum(old_salary) as total_old_salary ,

sum(leave) as total_leave 

 from my_test 

group by rollup ( id, joining_date)  order by 1, 2;

 id | joining_date | total_salary | total_revised_salary | total_old_salary | total_leave 

----+--------------+--------------+----------------------+------------------+-------------

 10 | 2019-11-15   |         5000 |                 4000 |             1000 |           5

 10 | 2019-11-16   |         6000 |                 9000 |             2000 |           1

 10 | 2019-11-17   |         7000 |                 3000 |             4400 |           2

 10 |                      |       18000 |               16000 |             7400 |           8

 20 | 2019-11-15   |         8000 |                 4000 |             6600 |           6

 20 | 2019-11-16   |         9000 |                 9400 |             8800 |          10

 20 | 2019-11-17   |         2000 |                 7800 |             9400 |          23

 20 |                      |       19000 |               21200 |            24800 |          39

 30 | 2019-11-15   |         4400 |                 6600 |             4200 |          44

 30 | 2019-11-16   |         1500 |                 3600 |             4300 |          66

 30 | 2019-11-17   |         2000 |                 2600 |             4500 |          77

 30 |                      |         7900 |               12800 |           13000 |         187

      |                      |       44900 |               50000 |           45200 |         234

(13 rows)

 

Hope it helps!

 

Share this

More Blogs