This article offers tips and shortcuts for performing useful tasks in Postgres.
Whole-row references
Have you ever tried running the following statement?
SELECT my_table FROM my_table;
This may look strange, but what it does is return all of the table's columns as a row type, in a single column. Now why would you want to do that? Well, it's very likely you've referenced a table in such a manner already:
SELECT table_a.column, table_b.column
FROM table_a
INNER JOIN table_b ON table_a.id = table_b.aid;
What this does is reference the whole row, but then ask for just a single column from that set. But that's just the common case. How about this?
SELECT data, backup_data
FROM data
FULL JOIN backup_data ON data = backup_data
WHERE data IS NULL OR backup_data IS NULL;
Here we have a table and its backup. What if we want to see the difference between them, either because we want to see what has changed since the backup or to see if we have lost any historical rows that we might need to restore from our backup?
As a quick demonstration, we'll create a table and insert three rows:
postgres=# CREATE TABLE data (id serial, person text, country text);
CREATE TABLE
postgres=# INSERT INTO data (person, country)
VALUES ('Tim','France'),('Dieter','Germany'),('Marcus','Finland');
INSERT 0 3
Now, let’s create an identical copy of the table and copy the same data into it:
postgres=# CREATE TABLE backup_data (id serial, person text, country text);
CREATE TABLE
postgres=# INSERT INTO backup_data SELECT * FROM data;
INSERT 0 3
We want the tables to be different, so we'll delete one row from the original table and add a new row to it:
postgres=# DELETE FROM data WHERE id = 2;
DELETE 1
postgres=# INSERT INTO data (person, country)
VALUES ('Roberto','Italy');
INSERT 0 1
Finally, let's look at what happens when we run the query to compare the tables:
postgres=# SELECT data, backup_data
postgres-# FROM data
postgres-# FULL JOIN backup_data ON data = backup_data
postgres-# WHERE data IS NULL OR backup_data IS NULL;
data | backup_data
-------------------+--------------------
| (2,Dieter,Germany)
(4,Roberto,Italy) |
(2 rows)
We can see here that the backup_data table contains a row that is missing from the data table, and vice versa.
Whole-row references can also be passed to functions, such as to_jsonb:
postgres=# SELECT to_jsonb(data) FROM data;
to_jsonb
-----------------------------------------------------
{"id": 1, "person": "Tim", "country": "France"}
{"id": 3, "person": "Marcus", "country": "Finland"}
{"id": 4, "person": "Roberto", "country": "Italy"}
(3 rows)
We've just turned all our data into JSON!
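The trip works in reverse, too. Using the same data table, jsonb_populate_record can turn a JSON object back into a row of the table's type:

```sql
-- Convert a JSON object back into a row of the "data" table's row type.
-- The NULL::data argument tells Postgres which row type to produce.
SELECT *
FROM jsonb_populate_record(
    NULL::data,
    '{"id": 1, "person": "Tim", "country": "France"}'
);
```

Any keys missing from the JSON simply come back as NULL in the resulting row.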
Comparing a selection of columns
This is a very handy trick that can be used to make queries shorter and easier to read.
Say we have the following query:
SELECT country, company, department
FROM suppliers
WHERE country = 'Australia'
AND company = 'Skynet'
AND department = 'Robotics';
We can get rid of those ANDs:
SELECT country, company, department
FROM suppliers
WHERE (country, company, department) = ('Australia','Skynet','Robotics');
We can also use IN to cater for OR conditions. If we adjust the original query:
SELECT country, company, department
FROM suppliers
WHERE department = 'Robotics'
AND (
(country = 'Australia'
AND company = 'Skynet')
OR
(country = 'Norway'
AND company = 'Nortech')
);
We can shorten this to:
SELECT country, company, department
FROM suppliers
WHERE department = 'Robotics'
AND (country, company) IN (('Australia','Skynet'),('Norway','Nortech'));
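Row constructors also work with inequality operators, comparing element by element from left to right. One place this shines is keyset pagination; here's a sketch assuming a hypothetical events table with created_at and id columns:

```sql
-- Rows are compared left to right: first by created_at, and only if
-- those are equal, by id. This fetches the next page of results after
-- the last row we previously saw.
SELECT id, created_at
FROM events
WHERE (created_at, id) > ('2020-06-01 00:00:00', 1000)
ORDER BY created_at, id
LIMIT 50;
```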
Hardcoded tables
Say you only have read access to the database and its tables, but you have a small set of data of your own that you want to join with the existing data. For example, take this table:
SELECT station, time_recorded, temperature
FROM weather_stations;
station | time_recorded | temperature
----------------+---------------------+-------------
Biggin_Hill_14 | 2020-02-02 13:02:44 | 22.4
Reigate_03 | 2020-02-02 16:05:12 | 20.9
Aberdeen_06 | 2020-02-02 15:52:49 | 8.5
Madrid_05 | 2020-02-02 14:05:27 | 30.1
(4 rows)
We want to get an idea of how warm or cold it is at each station, so we could construct this query:
SELECT station,
CASE
WHEN temperature <= 0 THEN 'freezing'
WHEN temperature < 10 THEN 'cold'
WHEN temperature < 18 THEN 'mild'
WHEN temperature < 30 THEN 'warm'
WHEN temperature < 36 THEN 'hot'
WHEN temperature >= 36 THEN 'scorching'
END AS temp_feels
FROM weather_stations;
If we instead set up a pretend table to contain all this information, we can include extra data that wouldn't be practical to emulate using the previous method. To make it easier to work with, we'll put it in a common table expression:
WITH temp_ranges (temp_range, feeling, colour) AS (
VALUES
('(,0]'::numrange, 'freezing', 'blue'),
('( 0,10)'::numrange, 'cold', 'white'),
('[10,18)'::numrange, 'mild', 'yellow'),
('[18,30)'::numrange, 'warm', 'orange'),
('[30,36)'::numrange, 'hot', 'red'),
('[36,)'::numrange, 'scorching', 'black')
)
SELECT ws.station, tr.feeling, tr.colour
FROM weather_stations ws
INNER JOIN temp_ranges tr ON ws.temperature <@ tr.temp_range;
Note: Those unfamiliar with range types may be confused by the "numrange" values and data type. numrange is one of several built-in range types, and this one holds a range of numeric values. Round brackets mean the bound is exclusive; square brackets mean it is inclusive. So '(0,10]' means "above 0, and up to and including 10." An omitted bound means the range is unbounded on that side, so '[36,)' means "36 or anything higher."
Custom config parameters
Postgres comes with a wide range of parameters that allow you to configure all aspects of the database system, but you can also add your own and call them whatever you want, as long as you give them a prefix (their own config "class").
For example, you could add this to postgresql.conf:
config.cluster_type = 'staging'
And then access it with the SHOW command:
postgres=# SHOW config.cluster_type;
config.cluster_type
---------------------
staging
(1 row)
Note that these settings won't appear in the pg_settings catalog, or be output by SHOW ALL.
So why are we able to do this? And why can't we do it without providing a prefix? Before PostgreSQL 9.2, there was a setting called custom_variable_classes, which took a list of classes that extensions could use for their own settings; you had to add an extension's class to that list before you could configure it in postgresql.conf. That requirement has since been removed, so custom classes no longer need to be declared explicitly. Built-in parameters are the only ones without prefixes, so any custom parameter must have a prefix or it will not be accepted.
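You can also set these parameters at session level with SET, and read them from within a query using current_setting():

```sql
-- Override the value for the current session only
SET config.cluster_type = 'testing';

-- Read it back in SQL; passing true as the second argument makes
-- current_setting() return NULL instead of raising an error if the
-- parameter has never been set
SELECT current_setting('config.cluster_type', true);
```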
As you can see by my example above, this could be handy for when you want to provide some kind of metadata about your cluster.
Booleans can stand alone
You have probably written queries like the following:
SELECT user, location, active
FROM subscriptions
WHERE active = true;
Did you know that you don't need that "= true"? You can write it like this:
WHERE active
This works because the WHERE clause only needs an expression that evaluates to a boolean, and a boolean column already is one; there's nothing to compare it against. If you want to indicate the opposite, you can write:
WHERE NOT active
It reads better too.
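One caveat worth knowing: if the column is nullable, rows where it is NULL are excluded by both forms, exactly as they would be with = true or = false. A quick sketch using an inline VALUES list:

```sql
-- NULL is neither true nor false, so it satisfies neither condition
SELECT count(*)
FROM (VALUES (true), (false), (NULL)) AS t(active)
WHERE active;       -- counts only the true row

SELECT count(*)
FROM (VALUES (true), (false), (NULL)) AS t(active)
WHERE NOT active;   -- counts only the false row
```

If you need to treat NULL as false explicitly, active IS TRUE and active IS NOT TRUE are the null-safe variants.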
Convert column data types for free
Often, when changing the type of a table column with existing data in it, the whole table must be rewritten to store the data in its new data type. But there are many cases where this doesn't happen.
And here's how to find all of them:
SELECT
castsource::regtype::text,
array_agg(casttarget::regtype ORDER BY casttarget::regtype::text) casttargets
FROM pg_cast
WHERE castmethod = 'b'
GROUP BY 1
ORDER BY 1;
This statement will return a relatively small list of types and the set of types they can be converted to because they are "binary compatible." In this list you'll see that text, xml, char, and varchar are all interchangeable—as far as binary format is concerned. So if you have a table containing XML data in a text column, feel free to convert it without being penalized (note that if you have invalid XML in your data, Postgres will disallow it and tell you).
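For example, converting a varchar column to text (a sketch, assuming a hypothetical articles table whose title column is currently varchar):

```sql
-- text and varchar are binary compatible, so this completes almost
-- instantly without rewriting the table, however many rows it holds
ALTER TABLE articles
    ALTER COLUMN title TYPE text;
```

Note that adding a length limit, such as varchar(50), is different: Postgres then has to scan the column to check every existing value fits.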
Find which rows belong to which underlying partition
You may have your data split into various partitions, but when you're selecting rows, what if you want to know which partition table each row comes from? It’s simple: just add tableoid::regclass to your SELECT clause. For example:
postgres=# SELECT tableoid::regclass, * FROM customers;
tableoid | id | name | country | subscribed
--------------+-----+----------------+----------------+------------
customers_de | 23 | Hilda Schumer | Germany | t
customers_uk | 432 | Geoff Branshaw | United Kingdom | t
customers_us | 815 | Brad Moony | USA | t
(3 rows)
This works because tableoid is a hidden system column that you just need to explicitly select in order to see. It returns the OID (Object Identifier) of the table that the row belongs to. If you cast this to the regclass type, it will return the table name.
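If you want to try this yourself, a minimal partitioned setup resembling the example above might look like this (a sketch; a real table would presumably have more columns and partitions):

```sql
-- A list-partitioned table, split by country
CREATE TABLE customers (
    id         int,
    name       text,
    country    text,
    subscribed boolean
) PARTITION BY LIST (country);

CREATE TABLE customers_de PARTITION OF customers FOR VALUES IN ('Germany');
CREATE TABLE customers_uk PARTITION OF customers FOR VALUES IN ('United Kingdom');
CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('USA');

-- Rows are routed to the matching partition automatically
INSERT INTO customers VALUES
    (23,  'Hilda Schumer',  'Germany',        true),
    (432, 'Geoff Branshaw', 'United Kingdom', true),
    (815, 'Brad Moony',     'USA',            true);
```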
Tables are types
Yes, you read that correctly. Whenever you create a table, you are also effectively creating a new type. See for yourself:
CREATE TABLE books (isbn text, title text, rrp numeric(10,2));
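The examples that follow also use movies and songs tables. Their definitions weren't shown, but ones consistent with the JSON output further down would look something like:

```sql
-- Assumed definitions, inferred from the JSON output shown later
CREATE TABLE movies (title text, studio text, release_date date);
CREATE TABLE songs (title text, artist text, album text);
```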
We can use this table type as a column in another table, or as a function parameter or return type:
CREATE TABLE personal_favourites (book books, movie movies, song songs);
You would then enter information into it:
INSERT INTO personal_favourites (book)
VALUES (('0756404746','The Name of the Wind',9.99));
Or:
INSERT INTO personal_favourites (book.isbn, book.title, book.rrp)
VALUES ('0756404746','The Name of the Wind',9.99);
To get individual values out of the table-typed column, wrap the column name in parentheses (which tells Postgres that book is a column, not a table) and select fields from it:
SELECT (book).isbn, (book).title, (book).rrp
FROM personal_favourites;
Now I know what you're thinking: is that table containing a table type that contains types also a type? Well, yes, but let's not go into a demo of that, or we'll end up in a confusing Inception-style situation.
And, as mentioned in the "Whole-row references" section, you can convert the whole row to JSON, which returns everything the way you'd want it to:
postgres=# SELECT jsonb_pretty(to_jsonb(personal_favourites))
FROM personal_favourites;
jsonb_pretty
----------------------------------------------
{ +
"book": { +
"rrp": 9.99, +
"isbn": "0756404746", +
"title": "The Name of the Wind" +
}, +
"song": { +
"album": "Grace", +
"title": "This is our Last Goodbye",+
"artist": "Jeff Buckley" +
}, +
"movie": { +
"title": "Magnolia", +
"studio": "New Line Cinema", +
"release_date": "2000-03-24" +
} +
}
One could use such functionality to create schemas for JSON data to achieve NoSQL-like functionality, but with data that has a defined structure.
But wait, what if I want to store and query all my favorite books, songs, and movies rather than just one?
Well, that works too. Any type, including tables, can be turned into an array by adding [] after the data type name. Rather than re-create the table, let's just convert the columns to array types, and then add another book:
ALTER TABLE personal_favourites
ALTER COLUMN book TYPE books[] USING ARRAY[book];
ALTER TABLE personal_favourites
ALTER COLUMN movie TYPE movies[] USING ARRAY[movie];
ALTER TABLE personal_favourites
ALTER COLUMN song TYPE songs[] USING ARRAY[song];
And we'll add another book to the book array:
UPDATE personal_favourites
SET book = book || ('1408891468','Jonathan Strange and Mr Norrell',7.99)::books;
Now our result looks like this:
postgres=# SELECT jsonb_pretty(to_jsonb(personal_favourites))
FROM personal_favourites;
jsonb_pretty
--------------------------------------------------------
{ +
"book": [ +
{ +
"rrp": 9.99, +
"isbn": "0756404746", +
"title": "The Name of the Wind" +
}, +
{ +
"rrp": 7.99, +
"isbn": "1408891468", +
"title": "Jonathan Strange and Mr Norrell"+
} +
], +
"song": [ +
{ +
"album": "Grace", +
"title": "This is our Last Goodbye", +
"artist": "Jeff Buckley" +
} +
], +
"movie": [ +
{ +
"title": "Magnolia", +
"studio": "New Line Cinema", +
"release_date": "2000-03-24" +
} +
] +
}
The book value now contains an array of book objects, and without any changes to our query.
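And to get the individual elements of those arrays back out in relational form, unnest works on arrays of row types just as it does on arrays of scalars:

```sql
-- Expand the book array back into one row per book
SELECT b.title, b.rrp
FROM personal_favourites,
     unnest(book) AS b;
```

Because unnest returns rows of the books type, the alias b exposes its fields (isbn, title, rrp) as ordinary columns.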