In this blog post, we are going to see how to use grouping sets, cubes, and rollups in PostgreSQL to generate result sets. These are group-by options added in PostgreSQL 9.5 that allow us to aggregate data by different GROUP BY clauses at once.
GROUPING SET
Let’s start with an example of how to use grouping sets:
1. Connect to psql terminal:
/usr/pgsql-11/bin/psql -U postgres postgres
2. Create a table:
CREATE TABLE my_test (
id int,
salary int,
revised_salary int,
old_salary int,
leave int,
joining_date date
);
3. Insert rows:
insert into my_test values(10,5000,4000,1000,5,'2019-11-15');
insert into my_test values(10,6000,9000,2000,1,'2019-11-16');
insert into my_test values(10,7000,3000,4400,2,'2019-11-17');
insert into my_test values(20,8000,4000,6600,6,'2019-11-15');
insert into my_test values(20,9000,9400,8800,10,'2019-11-16');
insert into my_test values(20,2000,7800,9400,23,'2019-11-17');
insert into my_test values(30,4400,6600,4200,44,'2019-11-15');
insert into my_test values(30,1500,3600,4300,66,'2019-11-16');
insert into my_test values(30,2000,2600,4500,77,'2019-11-17');
4. Use the SELECT query to see the rows:
postgres=# select * from my_test ;
id | salary | revised_salary | old_salary | leave | joining_date
----+--------+----------------+------------+-------+--------------
10 | 5000 | 4000 | 1000 | 5 | 2019-11-15
10 | 6000 | 9000 | 2000 | 1 | 2019-11-16
10 | 7000 | 3000 | 4400 | 2 | 2019-11-17
20 | 8000 | 4000 | 6600 | 6 | 2019-11-15
20 | 9000 | 9400 | 8800 | 10 | 2019-11-16
20 | 2000 | 7800 | 9400 | 23 | 2019-11-17
30 | 4400 | 6600 | 4200 | 44 | 2019-11-15
30 | 1500 | 3600 | 4300 | 66 | 2019-11-16
30 | 2000 | 2600 | 4500 | 77 | 2019-11-17
(9 rows)
postgres=#
Using GROUP BY on the dataset, the user can arrange identical data into groups.
Query A
In the example below, we are grouping the columns “salary” and “leave.” We are performing a sum operation on the “salary,” “revised_salary,” “old_salary,” and “total_leave” columns using aggregate function sum() and using the GROUP BY clause on the “id” column.
postgres=# select id, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary ,
sum(old_salary) as total_old_salary,
sum(leave) as total_leave
from my_test
group by id
order by id;
id | total_salary | total_revised_salary | total_old_salary | total_leave
----+--------------+----------------------+------------------+------------------------
10 | 18000 | 16000 | 7400 | 8
20 | 19000 | 21200 | 24800 | 39
30 | 7900 | 12800 | 13000 | 187
(3 rows)
Query B
Similarly, we can use GROUP BY on the “joining_date” column to generate a report:
postgres=# select joining_date, sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary,
sum(old_salary) as total_old_salary ,
sum(leave) as total_leave
from my_test
group by joining_date
order by joining_date;
joining_date | total_salary | total_revised_salary | total_old_salary | total_leave
--------------+--------------+----------------------+------------------+----------------------------
2019-11-15 | 17400 | 14600 | 11800 | 55
2019-11-16 | 16500 | 22000 | 15100 | 77
2019-11-17 | 11000 | 13400 | 18300 | 102
(3 rows)
Combining Query A and Query B
If we want to combine query A and query B, we would have had to write a complicated SQL query, which in turn would have meant that the table would have to scan twice. GROUPING SETS allow us to avoid such scenarios and can be used to specify multiple groupings in a single query.
This is how we can combine queries A and B using GROUPING SETS:
postgres=# select id,joining_date,sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary ,
sum(old_salary) as total_old_salary ,
sum(leave) as total_leave
from my_test
group by grouping sets ( id, joining_date)
order by id;
id | joining_date | total_salary | total_revised_salary | total_old_salary | total_leave
----+--------------+--------------+----------------------+------------------+----------------------------
10 | | 18000 | 16000 | 7400 | 8
20 | | 19000 | 21200 | 24800 | 39
30 | | 7900 | 12800 | 13000 | 187
| 2019-11-16 | 16500 | 22000 | 15100 | 77
| 2019-11-15 | 7400 | 14600 | 11800 | 55
| 2019-11-17 | 11000 | 13400 | 18300 | 102
(6 rows)
We can also use empty parentheses (), which gives us the total value:
postgres=# select id,joining_date,sum(salary) as total_salary
from my_test
group by grouping sets ( id, joining_date,())
order by id;
id | joining_date | total_salary
----+--------------+--------------
10 | | 18000
20 | | 19000
30 | | 7900
| | 44900
| 2019-11-16 | 16500
| 2019-11-15 | 17400
| 2019-11-17 | 11000
(7 rows)
Here we get another row which contains the total of sum(id) — i.e., 180000 + 190000 + 7900 = 44900.
CUBE
CUBE is used mostly for reporting purposes. With the help of CUBE, the user can extend the functionality of GROUP BY clauses by calculating total values (the sum at each level).
CUBE in the GROUP BY clause is used to give all the possible combinations. CUBE essentially contains every possible rollup scenario for each node (as we will see in the next section).
CUBE(X,Y,Z) will create grouping sets like this:
(X,Y,Z)
( ,Y,Z)
(X,Y, )
(X, ,Y)
(X)
(Y)
(Z)
()
Example
We want a query which is having GROUP BY all possible combination of “id” and “joining_date” columns of table “my_test”:
postgres=# select id,joining_date,sum(salary) as total_salary
,sum(revised_salary) as total_revised_salary ,
sum(old_salary) as total_old_salary ,
sum(leave) as total_leave
from my_test
group by cube ( id, joining_date)
order by id;
id | joining_date | total_salary | total_revised_salary | total_old_salary | total_leave
----+--------------+--------------+----------------------+------------------+-------------
10 | | 18000 | 16000 | 7400 | 8
10 | 2019-11-17 | 7000 | 3000 | 4400 | 2
10 | 2019-11-16 | 6000 | 9000 | 2000 | 1
10 | 2019-11-15 | 5000 | 4000 | 1000 | 5
20 | 2019-11-16 | 9000 | 9400 | 8800 | 10
20 | | 19000 | 21200 | 24800 | 39
20 | 2019-11-15 | 8000 | 4000 | 6600 | 6
20 | 2019-11-17 | 2000 | 7800 | 9400 | 23
30 | 2019-11-15 | 4400 | 6600 | 4200 | 44
30 | | 7900 | 12800 | 13000 | 187
30 | 2019-11-16 | 1500 | 3600 | 4300 | 66
30 | 2019-11-17 | 2000 | 2600 | 4500 | 77
| 2019-11-17 | 11000 | 13400 | 18300 | 102
| 2019-11-16 | 16500 | 22000 | 15100 | 77
| 2019-11-15 | 17400 | 14600 | 11800 | 55
| | 44900 | 50000 | 45200 | 234
(16 rows)
Rollup
Just like CUBE, ROLLUP is also used primarily for reporting purposes. ROLLUP is an analytical function used for bidirectional folding/calculation of data in a table, which creates a hierarchical rollup.
Example
In this example, using ROLLUP(id, joining_date), the sum(sal) is shown initially grouped by “joining_date,” and then the same sum(sal) is shown grouped by “id.”
ROLLUP(id,joining_date) will generate GROUPING SETS like this:
(id,joining_date)
(id)
()
postgres=# select id,joining_date,sum(salary) as total_salary,
sum(revised_salary) as total_revised_salary ,
sum(old_salary) as total_old_salary ,
sum(leave) as total_leave
from my_test
group by rollup ( id, joining_date) order by 1, 2;
id | joining_date | total_salary | total_revised_salary | total_old_salary | total_leave
----+--------------+--------------+----------------------+------------------+-------------
10 | 2019-11-15 | 5000 | 4000 | 1000 | 5
10 | 2019-11-16 | 6000 | 9000 | 2000 | 1
10 | 2019-11-17 | 7000 | 3000 | 4400 | 2
10 | | 18000 | 16000 | 7400 | 8
20 | 2019-11-15 | 8000 | 4000 | 6600 | 6
20 | 2019-11-16 | 9000 | 9400 | 8800 | 10
20 | 2019-11-17 | 2000 | 7800 | 9400 | 23
20 | | 19000 | 21200 | 24800 | 39
30 | 2019-11-15 | 4400 | 6600 | 4200 | 44
30 | 2019-11-16 | 1500 | 3600 | 4300 | 66
30 | 2019-11-17 | 2000 | 2600 | 4500 | 77
30 | | 7900 | 12800 | 13000 | 187
| | 44900 | 50000 | 45200 | 234
(13 rows)
Hope it helps!