PostgreSQL 16 Update: Grouping Digits in SQL

October 17, 2023

PostgreSQL 16 was recently released. This is the story of a feature.

One of the minor new features in PostgreSQL 16 that I am excited about is the ability to group digits in numeric literals by separating them with underscores, like

SELECT 1_000_000; SELECT * FROM tbl WHERE id > 10_000_000;

It also works with non-integer literals:

SELECT * FROM tbl WHERE id < 3.1415_9265;

This syntax is pretty standard in contemporary programming languages. (As an exception, C and C++ use apostrophes, like 1'000'000.)

I noticed the need for something like this a few years ago. I (and others) sometimes do simplistic benchmarks that involve generating or iterating over like a few million rows and then seeing how long that runs. So it might start like:

INSERT INTO tbl SELECT generate_series(1, 1000000);

As computers get ever faster, the number of rows you will need to get any meaningful measurements gets ever bigger. So before long you might be trying

INSERT INTO tbl SELECT generate_series(1, 100000000);

and then it gets hard to read and edit, and you might make mistakes, and the whole thing ends up wasting time and effort.

I happened to come across a proposal in the Go language, which discussed adding the same feature to that language. That discussion thankfully already worked out various details and subtleties, so I didn’t have to start the research from scratch.

So I began to think that I would like to bring this syntax to PostgreSQL.

One approach would be to write a patch and propose it to the PostgreSQL hackers list. But that would have been risky, because the hackers community is generally wary of extending core SQL syntax in ways to might conflict with the future evolution of the SQL standard. (I am myself often supportive of that wariness.)

So I planned the other approach: Get this into the SQL standard first.

At the time, I was a fairly new participant in the SQL standard working group, and such a change might have been an ambitious first project. Fortunately, another participant was just starting to work on the same issue for the GQL standard (which is managed by the same working group), so I jumped right in and we collaborated and after a few months of back and forth got this change accepted into both the SQL and GQL drafts in August 2020.

A few months later, I started to implement this in PostgreSQL, but immediately ran into a problem. To my surprise, I found that at that time, PostgreSQL allowed having numeric constants and identifiers adjacent without separating whitespace. For example, this was accepted:

SELECT 123abc;

This would parse the same as

SELECT 123 abc;

which in turn has the same meaning as

SELECT 123 AS abc;

I don’t know the history of that. But I know that the ability to omit the AS was added later. So maybe before that, this syntax was accepted but later failed with a parse error for other reasons, and so it was not a big deal.

What does this have to do with my project? Well, an underscore is a valid start of an identifier, and so

SELECT 1_000;

was already valid and would parse as

SELECT 1 _000;

meaning

SELECT 1 AS "_000";


Read the rest of the blog here.

Share this