PostgreSQL 13 introduces two useful features to enhance automated validation of physical backups: backup manifests and a new tool called pg_verifybackup
. In this short article I attempt to provide an overview of them.
As you probably know if you are familiar with PostgreSQL and the PostgreSQL Community, I have been promoting business continuity and disaster recovery awareness in organisations for many years, including the importance of taking backups as well as systematically testing and measuring the recovery process.
About 10 years ago, I also came up with the idea to write a backup tool for PostgreSQL databases called Barman, with my teammates at 2ndQuadrant. It is probably one of the most used backup tools in the PostgreSQL panorama.
However, one of the features that Barman still lacks is the possibility to verify the validity of a backup without having to restore it first – which, as I said, should be done systematically and automatically (given that Barman allows you to do it through recovery hook scripts).
In any case, PostgreSQL itself is taking an important step towards this direction. The next major release at the time of writing, PostgreSQL 13, expected for the end of 2020, will include two important new features:
- backup manifests, listing the content taken by a physical backup using
pg_basebackup
pg_verifybackup
, a tool to verify that the content of a backup matches the given manifest
Backup manifests
When taking a full backup with pg_basebackup
a backup manifest is automatically created, by default. If you are using the plain format (“-F p
”), a file called backup_manifest
will be included in the backup of the PGDATA. This applies also if you have tablespaces, which need to be properly remapped to be backed up. If you are using the tar format (“-F t
”), the backup_manifest
file will be placed in the destination directory, alongside each tar file produced by pg_basebackup
(one for the PGDATA, one for the WALs, if requested, and one for each tablespace if applicable).
What is the content of the backup_manifest
file? It is nothing but a JSON object, with the following keys:
PostgreSQL-Backup-Manifest-Version
: the version of the manifest, currently 1Files
: the list of files included in the backup and, for each of them, the relative path from PGDATA and important metadata such as size, time of last modification and checksumWAL-Ranges
: information such as timeline, LSN of backup start and LSN of backup endManifest-Checksum
: the checksum of the manifest file
pg_basebackup
comes with several options to control the manifest, including disabling it. You can decide to encode filenames in the manifest, or to change the checksum algorithm (by default CRC32).
You can also decide to take a standalone backup (which includes all WAL files from the start to the end of the backup) or just the base backup as WALs are archived separately. The latter is important in a future integration with backup tools like Barman that have an independent management policy of WAL files in which they can be compressed, encrypted or archived in remote locations such as object stores in private/public clouds.
I tried taking just a base backup, without WALs with the following command:
pg_basebackup -X none -D ~/backups/$(date '+%s')
Note: For the sake of simplicity, I am using my home directory to store these examples of backups. Then rely on $(date '+%s')
to create different directories per backup.
The content of the backed up directory is:
total 184
drwx------ 26 gabriele staff 832 May 2 01:12 .
drwx------ 3 gabriele staff 96 May 2 01:12 ..
-rw------- 1 gabriele staff 3 May 2 01:12 PG_VERSION
-rw------- 1 gabriele staff 227 May 2 01:12 backup_label
-rw------- 1 gabriele staff 135125 May 2 01:12 backup_manifest
drwx------ 5 gabriele staff 160 May 2 01:12 base
drwx------ 59 gabriele staff 1888 May 2 01:12 global
drwx------ 2 gabriele staff 64 May 2 01:12 pg_commit_ts
drwx------ 2 gabriele staff 64 May 2 01:12 pg_dynshmem
-rw------- 1 gabriele staff 4513 May 2 01:12 pg_hba.conf
-rw------- 1 gabriele staff 1636 May 2 01:12 pg_ident.conf
drwx------ 5 gabriele staff 160 May 2 01:12 pg_logical
drwx------ 4 gabriele staff 128 May 2 01:12 pg_multixact
drwx------ 2 gabriele staff 64 May 2 01:12 pg_notify
drwx------ 2 gabriele staff 64 May 2 01:12 pg_replslot
drwx------ 2 gabriele staff 64 May 2 01:12 pg_serial
drwx------ 2 gabriele staff 64 May 2 01:12 pg_snapshots
drwx------ 2 gabriele staff 64 May 2 01:12 pg_stat
drwx------ 2 gabriele staff 64 May 2 01:12 pg_stat_tmp
drwx------ 2 gabriele staff 64 May 2 01:12 pg_subtrans
drwx------ 2 gabriele staff 64 May 2 01:12 pg_tblspc
drwx------ 2 gabriele staff 64 May 2 01:12 pg_twophase
drwx------ 3 gabriele staff 96 May 2 01:12 pg_wal
drwx------ 3 gabriele staff 96 May 2 01:12 pg_xact
-rw------- 1 gabriele staff 88 May 2 01:12 postgresql.auto.conf
-rw------- 1 gabriele staff 28106 May 2 01:12 postgresql.conf
As you can see, there is a new file called backup_manifest
containing the summary of the backup in JSON format:
{ "PostgreSQL-Backup-Manifest-Version": 1,
"Files": [
{ "Path": "backup_label", "Size": 227, "Last-Modified": "2020-05-01 23:12:15 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "14aa4bcb" },
{ "Path": "pg_multixact/members/0000", "Size": 8192, "Last-Modified": "2020-05-01 10:02:38 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "23464490" },
{ "Path": "pg_multixact/offsets/0000", "Size": 8192, "Last-Modified": "2020-05-01 10:05:40 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "23464490" },
{ "Path": "PG_VERSION", "Size": 3, "Last-Modified": "2020-05-01 10:02:38 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "b825884b" },
…
],
"WAL-Ranges": [
{ "Timeline": 1, "Start-LSN": "0/21000028", "End-LSN": "0/21000100" }
],
"Manifest-Checksum": "fae6b7aa9eaab0a29474c7281a533ec2154f0f9fb8fd1e14b879c31f22bd62eb"}
Verify the backup with pg_verifybackup
I now try to run pg_verifybackup
with default options:
freddie:backups gabriele$ pg_verifybackup ~/backups/1588374735/
pg_waldump: fatal: could not find any WAL file
pg_verifybackup: error: WAL parsing failed for timeline 1
As you can see, pg_verifybackup
expects to find also the WAL files from the start to the end of the backup, which I deliberately did not include in the backup. I can skip verification of the WAL files with the "-n
” option:
freddie:backups gabriele$ pg_verifybackup -n ~/backups/1588374735/
backup successfully verified
I now take a standalone backup that includes all WAL files from start to end of the backup in the pg_wal
directory inside PGDATA
:
pg_basebackup -D ~/backups/$(date '+%s')
I want to simulate a corruption of a WAL file, so I open one of them and change a byte. Then run pg_verifybackup
:
pg_waldump: fatal: error in WAL record at 0/2A0011D0: incorrect resource manager data checksum in record at 0/2A001218
pg_verifybackup: error: WAL parsing failed for timeline 1
As you can see, pg_verifybackup
relies on pg_waldump
to verify that the stream of WAL files on the same timeline is present in the backup.
What’s next …
I am extremely grateful to the fellow PostgreSQL developers that wrote support for backup manifests and pg_verifybackup
. It is now time to support this in Barman.
Those that use Barman with pg_basebackup
will automatically have a backup manifest in their backups. In the next versions of Barman, we need to make sure that rsync
-generated backups for PostgreSQL 13 (and later) will produce 100% compatible manifests.
Then we need to integrate pg_verifybackup
in Barman, with "-n
" mode as WAL files are archived in a compressed way.
One idea could be to add a "fetch-wal
" option to pg_verifybackup
that invokes an external command, similar to restore_command
and requests WAL files directly to "barman-wal-restore
" or to the "barman get-wal
" command. Something to bring up for PostgreSQL 14.