One of the interesting features in PostgreSQL since version 9.4 is the ability to control removal of WAL files using replication slots. The dark side is that replication slots can cause disks to fill up with old WAL, killing the main production server. In this article I explain PostgreSQL replication slots, and how a new feature in PostgreSQL 13 helps prevent this problem.
WAL Production
As you know, WAL is produced for database changes in a primary server: inserts, updates, et cetera. A more active database will produce more WAL — in a very active server, there can be many gigabytes of WAL produced every minute. WAL is written to files with names in an increasing numerical sequence, and the files are always the same size (16 MB is the default and typical size). Once the data in a file is no longer needed, that file can be recycled: renamed to a higher-numbered position in the sequence so that it can be filled with new data later.
(There are special situations such as a surge in activity that leads to the creation of additional files; when later the surge dies down, those extra files are removed instead of recycled.)
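As a quick illustration (just a sketch; function names as in recent releases), you can ask the server which position and WAL file it is currently writing:
SELECT pg_current_wal_lsn() AS current_lsn,
       pg_walfile_name(pg_current_wal_lsn()) AS current_wal_file;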
Because all database write activity produces WAL, it is critical that disk space be available for it. When the disk storing WAL is full, the server will be unable to process new transactions and can become stuck, or worse: it may fall over completely. This is a situation to be avoided by all means possible.
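If you want a rough idea of how much disk WAL is using right now, something along these lines works (note that pg_ls_waldir() is restricted to superusers and members of pg_monitor by default):
-- total size and count of files currently in pg_wal
SELECT pg_size_pretty(sum(size)) AS wal_on_disk, count(*) AS wal_files
FROM pg_ls_waldir();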
Replication Slots
Replication in PostgreSQL works by processing WAL files. In order for this to work, all WAL files have to be temporarily available until they are processed. Therefore a mechanism is needed to tell the WAL management machinery not to recycle or remove files that are still needed.
Enter replication slots. Slots are a mechanism that indicates that this backup we’re taking will require that WAL file, and can you please not delete it yet; or this replica still hasn’t processed that WAL file, so can it please be left alone for a little while.
By themselves, replication slots occupy very little disk space. They just store a tiny bit of metadata, including a pointer to a position in WAL. But the WAL data they protect is another matter: in a highly active server, it can be measured in gigabytes or worse.
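For illustration only (the slot names and the test_decoding output plugin here are arbitrary examples), creating and inspecting slots looks like this:
-- a physical slot, as used by streaming replication or pg_basebackup
SELECT pg_create_physical_replication_slot('standby1_slot');
-- a logical slot; test_decoding is a demo output plugin shipped in contrib
-- (logical slots require wal_level = logical)
SELECT pg_create_logical_replication_slot('audit_slot', 'test_decoding');
-- the metadata itself is tiny
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_catalog.pg_replication_slots;
-- drop a slot that is no longer needed
SELECT pg_drop_replication_slot('audit_slot');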
WAL Consumption
Feeding data to a physical replica means copying the WAL data from its primary server. Similarly, a logical replica needs to read WAL data (and transmit an interpreted version to the replica). The WAL position being read is what the slot keeps track of. Once the replica has secured the WAL data somehow, the slot can be advanced; this tells WAL management in the primary that the WAL file is then available for removal. This happens continuously while the replica is active, so WAL in the primary server will occupy roughly the same amount of disk space, or maybe just a little more. Even twice as much, or ten times as much, might be acceptable, depending on conditions.
The problem is that if a replica dies completely and doesn’t recover for a long period of time; or the replica is destroyed and the DBA forgets to remove the replication slot; or the slot is a forgotten leftover of some experiment; or even the replica is being fed over a slow network link, then the reserved WAL will grow without bounds. And that becomes a ticking bomb.
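To see how much WAL each slot is currently holding back, one way (a sketch, comparing the current write position with each slot's restart_lsn) is:
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_catalog.pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;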
Limiting Slot Size
In order to fight this problem, Kyotaro Horiguchi had been working since February 2017 on a PostgreSQL patch to limit the size of WAL reserved by a slot. After a very long review and rework process, I integrated it into PostgreSQL 13, improving the management of high-availability PostgreSQL farms.
The main principle is that it is better to kill a replica (by somehow making its slot invalid; more on that below) than to kill the primary server that feeds it and take all of production down with it.
The way it works is pretty straightforward: set max_slot_wal_keep_size (documentation) in postgresql.conf to the maximum amount of WAL disk space that replication slots are allowed to reserve. If a slot reaches that point and a checkpoint occurs, that slot will be marked invalid, and some WAL files may be deleted. If the slot was in active use by a walsender process, that process will be signalled so that it terminates. If the walsender starts again, it will find that the WAL files it needs are no longer there. The replica using that slot will have to be recloned.
If max_slot_wal_keep_size is -1, which is the default value, then there is no limit. I don't recommend this, because it leads to failures when slots fill the disk.
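For example, to cap slot-reserved WAL at 10 GB (a value picked purely for illustration; choose one that fits your disk and your replicas), you can edit postgresql.conf or use ALTER SYSTEM; a reload is enough, no restart needed:
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
SHOW max_slot_wal_keep_size;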
Monitoring Slot Health
Also included are some monitoring features. Two columns in pg_replication_slots are relevant. The most critical one is wal_status. If that column is reserved, then the slot is pointing to data within max_wal_size; if it is extended, then it exceeded max_wal_size but is still protected by either wal_keep_size or max_slot_wal_keep_size (including when max_slot_wal_keep_size is -1, that is, unlimited). Either state is good and normal. However, when a slot gets over the limit, it first becomes unreserved, which means it is in imminent danger, but can still recover if it is quick enough. Finally, the status becomes lost when WAL files have been removed and no recovery is possible.
The other column is safe_wal_size: it shows the number of bytes of WAL that can be written before this slot gets in danger of having WAL files removed. We suggest keeping a close eye on this column in your monitoring system, and firing alerts when it gets low. Zero or negative means your replica will be dead as soon as a checkpoint occurs:
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_catalog.pg_replication_slots;
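A slightly more targeted variant for alerting, assuming a headroom threshold of 1 GB that you would tune to your own environment, could be:
SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_catalog.pg_replication_slots
WHERE wal_status IN ('unreserved', 'lost')
   OR safe_wal_size < 1024 * 1024 * 1024;  -- less than ~1 GB of headroom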
We believe that this new feature makes maintenance of replicas easier and more robust; hopefully we won’t see any more disasters with production down because of these problems.
(A note: safe_wal_size was introduced in 13beta3, so be sure to consult up-to-date documentation, or you'll see min_safe_lsn instead. Ignore that.)
Thanks
Special thanks to Kyotaro Horiguchi for working on solving this problem. Several reviewers got deep on this, among whom I’d like to thank especially Masahiko Sawada, Fujii Masao, Jehan-Guillaume de Rorthais, and Amit Kapila (in no particular order).