PG Phriday: Postgres 12 Revs up Vacuum

September 06, 2019

Any long-time user of Postgres is likely familiar with VACUUM, the process that ensures old data tuples are identified and reused to prevent unchecked database bloat. This critical element of maintenance has slowly, but surely, undergone incremental enhancements with each subsequent release.

Postgres 12 is no different in this regard. In fact, there are two notable changes in the upcoming release that provide long-awaited quality of life updates.

Setting the Stage

Before we begin, it’s always good to have test data available for demonstration purposes. For the sake of this particular battery of tests, let’s just use a regular initialization of pgbench, which will supply us with four tables of varying size and a rudimentary transactional throughput engine.


#> createdb pgbench

#> pgbench --initialize --scale=100 pgbench

Each unit of scale in pgbench is worth 100k rows in the pgbench_accounts table, so our largest table should be about 10M rows. It’s not a lot, but it’s more than enough to illustrate the new VACUUM functionality.
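
To sanity-check those numbers, we can ask the system catalog for its row estimates. A minimal query; keep in mind that reltuples is only an estimate maintained by VACUUM and ANALYZE:


-- reltuples is an estimate refreshed by VACUUM / ANALYZE
SELECT relname AS tablename, reltuples::bigint AS row_estimate
  FROM pg_class
 WHERE relkind = 'r'
   AND relname LIKE 'pgbench%';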

A Quick Overview of VACUUM Basics

For those who are unfamiliar with the lower-level mechanisms and activity of a VACUUM process as it trudges through ceaseless data pages on countless tables, let’s have a short discussion on a couple of relevant key points.

Postgres maintains a map of free space for each table, which is updated every time VACUUM runs. Space recorded in that map can be reused by new inserts and updates. It’s not just the table itself that requires this kind of regular scanning, but each index as well. Every index requires its own scan, as updates to the indexed columns will be distributed irregularly over time.

This kind of repeated scanning can—in the right circumstances—drastically increase VACUUM duration. But what if we could skip that step? Well, in Postgres 12 we can.
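
Incidentally, the free space map mentioned above can be inspected directly via the pg_freespacemap contrib extension. Here’s a quick sketch, assuming the extension is available and keeping in mind the reported values are approximations:


CREATE EXTENSION pg_freespacemap;

-- Summarize pages with any recorded free space (values are approximate)
SELECT count(*) AS pages_with_space,
       pg_size_pretty(sum(avail)::bigint) AS total_free
  FROM pg_freespace('pgbench_accounts')
 WHERE avail > 0;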

Skipping Pesky Indexes

To see what benefit we might gain from skipping the step of vacuuming an index, let’s use pgbench to execute a one-minute load test. This will invoke one solid minute of various read and write traffic to all of the tables in the pgbench database.


#> pgbench --no-vacuum --time=60 --client=2 --jobs=2 pgbench

The --no-vacuum flag ensures the tables will not be vacuumed before the load test. We don’t want pgbench tainting our results by artificially injecting vacuum activity, do we?

Once the test is complete, we can invoke a manual VACUUM as we normally might, but with a VERBOSE flag so we can see all the work being done.


pgbench=# VACUUM (VERBOSE) pgbench_accounts ;

INFO:  vacuuming "public.pgbench_accounts"
INFO:  scanned index "pgbench_accounts_pkey" to remove 34367 row versions
DETAIL:  CPU: user: 1.28 s, system: 1.40 s, elapsed: 3.47 s
INFO:  "pgbench_accounts": removed 34367 row versions in 34365 pages
DETAIL:  CPU: user: 0.57 s, system: 1.82 s, elapsed: 3.49 s
INFO:  index "pgbench_accounts_pkey" now contains 10000000 row versions in 27422 pages
DETAIL:  34367 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pgbench_accounts": found 40098 removable, 10000000 nonremovable row versions in 164619 out of 164619 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 135349
There were 8118 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 4.97 s, system: 9.62 s, elapsed: 21.58 s.
VACUUM

Time: 21599.436 ms (00:21.599)

As we can see, Postgres details how much time elapsed for each portion of the VACUUM. The pgbench_accounts table has only a primary key, so we see the index scan followed by a table scan.

Next, we “reset” the state of the table by running the same one-minute pgbench test to create a bunch of dirty pages. Then we repeat the VACUUM, but with the new INDEX_CLEANUP parameter set to False, which causes Postgres to skip the index cleanup phase. Here is what we see:


pgbench=# VACUUM (INDEX_CLEANUP False, VERBOSE) pgbench_accounts ;

INFO:  vacuuming "public.pgbench_accounts"
INFO:  "pgbench_accounts": found 41591 removable, 10000000 nonremovable row versions in 164619 out of 164619 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 85811
There were 4249 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 2.72 s, system: 5.20 s, elapsed: 12.42 s.
VACUUM

Time: 12439.956 ms (00:12.440)

Note that we no longer observe the debugging information about cleaning up the index. As a bonus, the execution time was also much shorter.
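
Postgres 12 also exposes this behavior as a table storage parameter, so a plain VACUUM without an explicit INDEX_CLEANUP option will honor the table-level default:


-- Table-level default; an explicit VACUUM (INDEX_CLEANUP ...) still overrides it
ALTER TABLE pgbench_accounts SET (vacuum_index_cleanup = false);

Given the caveats we’ll discuss shortly, that’s a setting best applied sparingly.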

Skipping Even More Indexes

Is this index-scanning phase of VACUUM really cumulative, with each index adding to the total duration? Thankfully we can test that directly by adding another index to the large pgbench_accounts table. It doesn’t even matter which column we choose:


CREATE INDEX idx_accounts_bid ON pgbench_accounts (bid);

After repeating the load test, our standard VACUUM looks like this:


pgbench=# VACUUM (INDEX_CLEANUP True, VERBOSE) pgbench_accounts ;

INFO:  vacuuming "public.pgbench_accounts"
INFO:  scanned index "pgbench_accounts_pkey" to remove 34348 row versions
DETAIL:  CPU: user: 1.43 s, system: 1.44 s, elapsed: 3.67 s
INFO:  scanned index "idx_accounts_bid" to remove 34348 row versions
DETAIL:  CPU: user: 1.52 s, system: 2.10 s, elapsed: 4.71 s
INFO:  "pgbench_accounts": removed 34348 row versions in 34348 pages
DETAIL:  CPU: user: 0.64 s, system: 1.74 s, elapsed: 3.50 s
INFO:  index "pgbench_accounts_pkey" now contains 10000000 row versions in 27422 pages
DETAIL:  34348 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_accounts_bid" now contains 10000000 row versions in 27848 pages
DETAIL:  34348 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pgbench_accounts": found 40067 removable, 10000000 nonremovable row versions in 164619 out of 164619 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 228641
There were 7150 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 6.87 s, system: 12.67 s, elapsed: 28.53 s.
VACUUM

Time: 28552.941 ms (00:28.553)

Near the beginning of the debugging output, we can see that Postgres scans each index before considering the table data. As expected, the elapsed time has also increased rather notably.

Again, we execute the pgbench load test, then repeat the VACUUM by disabling the index cleanup step:


pgbench=# VACUUM (INDEX_CLEANUP False, VERBOSE) pgbench_accounts ;

INFO:  vacuuming "public.pgbench_accounts"
INFO:  "pgbench_accounts": found 31275 removable, 9929654 nonremovable row versions in 164015 out of 165177 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1028866
There were 29090 unused item identifiers.
Skipped 0 pages due to buffer pins, 570 frozen pages.
20 pages are entirely empty.
CPU: user: 3.02 s, system: 6.50 s, elapsed: 14.26 s.
VACUUM

Time: 14275.176 ms (00:14.275)

The elapsed time is slightly higher than our previous index-skipping run, but we can expect a certain amount of variation. Despite the small increase, we still complete the vacuum in roughly half the time by skipping indexes. Imagine a more real-world scenario where there are more than two indexes on a table with a mere 10 million rows!

Index Skipping Woes

There is, alas, one rather glaring (and somewhat obvious) caveat we must append to this handy feature. By setting the INDEX_CLEANUP flag to False, the indexes are not vacuumed.

This means two things:

  1. Indexes could slowly bloat over time more than expected.
  2. Any time INDEX_CLEANUP isn’t disabled, a regular vacuum occurs anyway.

The first point suggests some clear disadvantages, given how invasive de-bloating indexes and tables can be. Regular maintenance creates a kind of inflection point where the table and its indexes reach volumetric equilibrium. Ideally, the table and its indexes grow no larger than necessary to absorb the rows modified between VACUUM iterations. The background autovacuum system ensures this is generally an automatic affair.
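
One easy way to watch for that equilibrium drifting is to compare table and index sizes over time. A minimal check using the built-in size functions:


-- Compare heap size against the combined size of all indexes
SELECT pg_size_pretty(pg_table_size('pgbench_accounts')) AS table_size,
       pg_size_pretty(pg_indexes_size('pgbench_accounts')) AS index_size;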

The second point is somewhat related. Since the INDEX_CLEANUP flag skips an integral part of the VACUUM process, autovacuum will not use it when it determines a table requires automatic maintenance. Likewise, if any user or process unfamiliar with the flag omits it, Postgres triggers the usual full vacuum.

Consider what happens if we execute a standard VACUUM VERBOSE on the pgbench_accounts table from the previous section. Keep in mind it has already been vacuumed:


pgbench=# VACUUM (INDEX_CLEANUP True, VERBOSE) pgbench_accounts ;

INFO:  vacuuming "public.pgbench_accounts"
INFO:  scanned index "pgbench_accounts_pkey" to remove 19321 row versions
DETAIL:  CPU: user: 1.08 s, system: 1.29 s, elapsed: 3.36 s
INFO:  "pgbench_accounts": removed 19321 row versions in 19321 pages
DETAIL:  CPU: user: 0.47 s, system: 1.90 s, elapsed: 4.24 s
INFO:  index "pgbench_accounts_pkey" now contains 10000000 row versions in 27422 pages
DETAIL:  19321 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pgbench_accounts": found 0 removable, 9041883 nonremovable row versions in 149354 out of 165177 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1028866
There were 25628 unused item identifiers.
Skipped 0 pages due to buffer pins, 6563 frozen pages.
20 pages are entirely empty.
CPU: user: 2.40 s, system: 4.98 s, elapsed: 10.24 s.
VACUUM

Time: 10260.330 ms (00:10.260)

Remember all of that time we saved from previously skipping the indexes? Well, we incurred a kind of debt there, and it eventually required payment. If we run one more manual VACUUM, we can see what happens when we perform maintenance on a table that actually doesn’t need it:


pgbench=# VACUUM (INDEX_CLEANUP True, VERBOSE) pgbench_accounts ;

INFO:  vacuuming "public.pgbench_accounts"
INFO:  "pgbench_accounts": found 0 removable, 37 nonremovable row versions in 21 out of 165177 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1061124
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 56050 frozen pages.
20 pages are entirely empty.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
VACUUM

Time: 31.203 ms

About Transactions and VACUUM

We won’t go into too much detail here, but here’s a short summary of how Postgres manages transactions in a way that affects VACUUM.

Every write transaction in the database increments a counter, and Postgres can only distinguish roughly two billion transaction IDs at any given time. If left this way, Postgres could never handle more than two billion such transactions. Thus one of the (many) jobs of VACUUM is to identify the oldest possible transaction that can still “see” certain rows. That transaction becomes the new “zero”, thus allowing roughly two billion more transactions from that point.

But this reset isn’t free. Rows below that new zero threshold have their transaction identifier set to a special value that means “everyone can see this forever.” Rows treated this way are said to be “frozen”.

The number of transactions a table undergoes after this point is its effective “age”. The older a table gets, the more likely the autovacuum process will run in the background to turn back the clock with another freeze.
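
The age() function we’ll apply to tables shortly works a level up as well. A quick way to watch the oldest unfrozen transaction per database:


-- Oldest unfrozen transaction ID per database, oldest first
SELECT datname, age(datfrozenxid) AS xact_age
  FROM pg_database
 ORDER BY xact_age DESC;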

However, if transaction rates and high-volume times of day line up just right, Postgres could invoke a required background VACUUM FREEZE operation just as peak application volume is at its highest. Remember how each row is marked with a new transaction identifier to set the age to zero? This means every page containing such a row must be written. This can incur an unexpected (and likely unwanted) increase in storage write traffic.

What is a concerned DBA to do?

Command-line VACUUM Enhancements

Beyond optionally skipping formerly required elements of the VACUUM process, some enhancements have also been made to the vacuumdb command-line tool. One of these is the ability to specify an age threshold before bothering to VACUUM a table.

Let’s look at the ages of our benchmark tables with this special query:


SELECT relname AS tablename, age(relfrozenxid) AS xact_age
  FROM pg_class
 WHERE relkind = 'r'
   AND relname LIKE 'pgbench%';

Next, let’s execute another pgbench test, but this time, increase the duration to five minutes. This should widen our transaction margin to make it more obvious what is happening.
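
Any invocation similar to our earlier tests will do; something along these lines, where the exact client counts don’t really matter:


#> pgbench --no-vacuum --time=300 --client=2 --jobs=2 pgbench

Afterward, this is the current age of our tables: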


    tablename     | xact_age 
------------------+----------
 pgbench_accounts |    98872
 pgbench_branches |    98872
 pgbench_history  |    98872
 pgbench_tellers  |    98872

Now we are going to execute vacuumdb manually, but specify the new --min-xid-age parameter:


#> time vacuumdb --min-xid-age=100000 pgbench

vacuumdb: vacuuming database "pgbench"

real    0m0.116s
user    0m0.075s
sys 0m0.028s

Based on the fact our execution time was less than one second, we can generally assume nothing happened. And we’d be right. Now watch what happens when we specify a threshold lower than the current age of our tables:


#> time vacuumdb --min-xid-age=50000 pgbench

vacuumdb: vacuuming database "pgbench"

real    0m37.570s
user    0m0.084s
sys 0m0.021s

This time the vacuumdb command completed after 37 seconds, having vacuumed all of the pgbench tables. This is much longer than the durations quoted earlier in the article, but that’s because we ran a five-minute test. More dirty pages require extra processing time.

Frozen VACUUM Shenanigans

Unfortunately we didn’t really accomplish anything this way. One reason autovacuum exists is to keep tables and indexes at the equilibrium point we mentioned earlier. Runaway table bloat can affect query execution times and consume precious storage space. However, a regular vacuum will not change a table’s age; we need a FREEZE to do that.

Rows that are not frozen will continue to age, and if that age exceeds the autovacuum_freeze_max_age setting (which defaults to 200M), Postgres will launch a vacuum anyway. We might have vacuumed tables that could need another regular vacuum during peak times, but that’s not the kind of behavior we’re trying to prevent.
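
To see how close each table is to that threshold, one possible query compares table age directly against the setting; the 0.8 safety factor here is an arbitrary example:


-- Tables within 20% of the freeze threshold (the 0.8 factor is arbitrary)
SELECT relname AS tablename, age(relfrozenxid) AS xact_age
  FROM pg_class
 WHERE relkind = 'r'
   AND age(relfrozenxid) >
       current_setting('autovacuum_freeze_max_age')::bigint * 0.8;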

So let’s freeze our largest table manually to illustrate a point:


VACUUM FREEZE pgbench_accounts;

Afterward, our table ages look like this; note how the other table ages haven’t changed following the previous vacuumdb run:


    tablename     | xact_age 
------------------+----------
 pgbench_accounts |        0
 pgbench_branches |    98872
 pgbench_history  |    98872
 pgbench_tellers  |    98872

Next, we check the table ages again after executing another five-minute pgbench test. This ensures there’s a wide gap between the age of pgbench_accounts and the rest of the tables:


    tablename     | xact_age 
------------------+----------
 pgbench_accounts |    98719
 pgbench_branches |   197591
 pgbench_history  |   197591
 pgbench_tellers  |   197591

Here comes the cool part. We’ll use the --min-xid-age parameter again, but this time we’ll also supply --freeze so vacuumdb freezes rows in affected tables. We’ll use a threshold that’s higher than pgbench_accounts, but lower than the rest.


#> vacuumdb --freeze --min-xid-age=100000 pgbench

Now let’s examine our table ages again:


    tablename     | xact_age 
------------------+----------
 pgbench_accounts |    98719
 pgbench_branches |        0
 pgbench_history  |        0
 pgbench_tellers  |        0

Now imagine we want to prevent Postgres from freezing any tables that are near the autovacuum_freeze_max_age limit during busy times of the day. At this point we can simply invoke vacuumdb with a slightly lower value, such as 180 million, during a safer window, and affected tables are frozen preemptively.

If we know our daily transaction throughput, we can even subtract that amount from the maximum, and always avoid unexpected FREEZE activity.
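
As a sketch: assuming the default 200-million limit and a system that processes roughly 20 million transactions per day, a nightly job might run something like this, where the threshold is purely illustrative:


#> vacuumdb --freeze --min-xid-age=180000000 pgbench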

Conclusion

By circumventing the “natural order” of the usual VACUUM process, we introduce potentially unwanted side-effects, such as our indexes bloating more than the table over time. The INDEX_CLEANUP flag isn’t necessarily intended to accelerate VACUUM.

What the new index-skipping feature can enable is one-time maintenance tasks. Was there a transaction that was blocking vacuum progress on some tables? Do some larger tables need extra care following an extreme job, but before some other high-volume event or portion of the day? There are innumerable cases where skipping indexes can safeguard overall table health. We just don’t recommend doing so on a regular basis.
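
Regarding that first scenario: a quick way to spot sessions holding back the xmin horizon is to check pg_stat_activity. A minimal example:


-- Sessions whose snapshots hold back the xmin horizon, oldest first
SELECT pid, state, age(backend_xmin) AS xmin_age
  FROM pg_stat_activity
 WHERE backend_xmin IS NOT NULL
 ORDER BY xmin_age DESC;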

On the other hand, some Postgres users might falsely believe the best way to prevent freeze activity is to increase the autovacuum_freeze_max_age limit. This can be true in certain extremely high volume environments. Beyond these however, it’s often safer merely to avoid triggering the default threshold itself.

Otherwise, what happens if we set the limit to one or two billion, and the next FREEZE requires too much time to finish every table? Now Postgres must remain offline for an extended duration while a forced FREEZE completes, in order to avoid database corruption.

Now it’s possible to FREEZE on a much more controlled basis. Technically this has always been possible. Anyone with scripting experience could have used the query we provided to identify early freeze candidate tables, and then frozen each iteratively. But by making it an inherent part of vacuumdb, we encourage that use case.
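
For illustration, such a script could be as simple as generating the FREEZE statements in psql and feeding them to \gexec; the 100-million threshold here is arbitrary:


-- Generate a FREEZE statement per "old" table, then execute each via \gexec
SELECT format('VACUUM FREEZE %I.%I', n.nspname, c.relname)
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname = 'public'
   AND age(c.relfrozenxid) > 100000000 \gexec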

Basically we’re saying, “Please use this. Feel free to freeze tables early, because we’d rather you did that than increase the freeze limit.” Unlike the index cleanup flag, we hope everyone uses this new vacuumdb functionality frequently.

There’s a common saying that goes “If it hurts, do it more often.” This phrase is highly relevant to regular vacuums and freezing. Every row reclaimed or frozen now is one that won’t need it later.

This is really just another latency versus throughput discussion. It might be slightly less efficient to increase frequency of these maintenance tasks, but delayed often enough, vacuum can become increasingly disruptive. Now it’s easier to avoid letting things reach that point.
