How to Use Postgres Unions

January 20, 2023

Whether we remember it or not, almost all of us learned set theory in elementary school. We drew Venn diagrams with overlapping circles and learned to articulate what the overlap means, making use of a strange set of operators to formulate equations around them.

 

 

With SQL, there are occasions where we might want to use these concepts from set theory, whether it’s to concatenate two data sets or to extract information about two sets’ relationships.  For this, PostgreSQL provides syntax for set operations: UNION, INTERSECT, and EXCEPT.  In this article, we will focus on the UNION operator.

 

How UNION works

A UNION will join two sets of data to create a larger set. The end result of A UNION B is the sum total of rows in both A and B, after having removed duplicate rows. Contrast this with a JOIN, which takes the columns in A and joins them with the columns in B to create a wider table, with the number of rows in the new table less than or equal to the shorter of the two tables A and B.

An example:

Suppose we have the tables “teacher” and “student”:


postgres=# SELECT * FROM teacher;

 id |    name    |     birthdate      

----+------------+---------------------

  1 | John Smith | 1980-05-03 00:00:00

  2 | Jane Brown | 1970-04-24 00:00:00

(2 rows)


postgres=# SELECT * FROM student;

 id |       name       |     birthdate      

----+------------------+---------------------

  1 | Jimmy Peters     | 1990-01-14 00:00:00

  1 | Janice Patterson | 1993-08-27 00:00:00

(2 rows)

 

We can get the full roster of teachers and students with a UNION:


postgres=# SELECT name FROM teacher UNION SELECT id, name, birthdate FROM student;

 id |       name       |     birthdate      

----+------------------+---------------------

  2 | Jane Brown       | 1970-04-24 00:00:00

  1 | John Smith       | 1980-05-03 00:00:00

  1 | Jimmy Peters     | 1990-01-14 00:00:00

  2 | Janice Patterson | 1993-08-27 00:00:00

(4 rows)

 

If we wanted to get the average age of teachers and average age of students, we would do the following:


postgres=# SELECT avg(age(birthdate)) FROM teacher;

               avg                   

----------------------------------

 44 years 6 mons 12 days 30:53:15

(1 row)


postgres=# SELECT avg(age(birthdate)) FROM student;

               avg                    

-----------------------------------

 27 years 11 mons 36 days 30:53:23

(1 row)

 

But what if we wanted to get the average age of the entire school?  We could do this with UNION:


postgres=# SELECT avg(age(birthdate)) FROM (SELECT birthdate FROM student UNION SELECT birthdate FROM teacher);

               avg                   

----------------------------------

 36 years 2 mons 39 days 30:54:50

(1 row)

 

Caveats

Note that for UNION to work, the number of columns selected from teachers needs to match the number of columns selected from students.  Otherwise, an error occurs:


postgres=# SELECT id, name, birthdate FROM teacher UNION SELECT name FROM student;

ERROR:  each UNION query must have the same number of columns

LINE 1: ...ECT id, name, birthdate FROM teacher UNION SELECT name FROM ...

 

Users also need to be aware that the data types of the columns need to match up:


postgres=# SELECT name, birthdate FROM teacher UNION SELECT birthdate, name FROM student;

ERROR:  UNION types text and timestamp without time zone cannot be matched

LINE 1: SELECT name, birthdate FROM teacher UNION SELECT birthdate, ...

                                                         ^

postgres=# SELECT id, name FROM teacher UNION SELECT name, id FROM student;

ERROR:  UNION types integer and text cannot be matched

LINE 1: SELECT id, name FROM teacher UNION SELECT name, id FROM stud...

 

Finally, users need to be aware that the column order matters:


postgres=# SELECT id::text, name FROM teacher UNION SELECT name, id::text FROM student;

        id        |    name    

------------------+------------

 Janice Patterson | 2

 Jimmy Peters     | 1

 2                | Jane Brown

 1                | John Smith

(4 rows)

 

UNION ALL

Suppose the school hires a new teacher named Joe Washington, and also a new student named Joe Washington moves into town:


postgres=# select * from teacher;

 id |      name      |      birthdate      

----+----------------+---------------------

  1 | John Smith     | 1980-05-03 00:00:00

  2 | Jane Brown     | 1970-04-24 00:00:00

  3 | Joe Washington | 1950-11-12 00:00:00

(3 rows)


postgres=# select * from student;

 id |       name       |      birthdate      

----+------------------+---------------------

  1 | Jimmy Peters     | 1990-01-14 00:00:00

  2 | Janice Patterson | 1993-08-27 00:00:00

  3 | Joe Washington   | 1992-11-12 00:00:00

(3 rows)

 

If the administrators wish to pull a full roster, the results may be incorrect because UNION removes duplicates:


postgres=# SELECT name FROM teacher UNION SELECT name FROM student;

       name       

------------------

 Janice Patterson

 Joe Washington

 Jane Brown

 John Smith

 Jimmy Peters

(5 rows)

 

To overcome this, UNION ALL can be used to preserve duplicate rows:


postgres=# SELECT name FROM teacher UNION ALL SELECT name FROM student;

       name       

------------------

 John Smith

 Jane Brown

 Joe Washington

 Jimmy Peters

 Janice Patterson

 Joe Washington

(6 rows)

 

As you can see, UNION and UNION ALL can be used to aggregate data from multiple queries and give developers and DBAs the ability to merge sets of heterogeneous data.

 

Share this

More Blogs