Barman 2.2 introduces support for parallel copy, improving the performance of both backup and recovery operations in your PostgreSQL disaster recovery solution.
Barman is a piece of software that has been incrementally improved since its conception in 2011, brick after brick, with just one goal in mind: to foster a disaster recovery culture in PostgreSQL by making the whole backup/recovery process easier and more standard.
Barman is full of interesting features that go beyond disaster recovery (consider the WAL hub facility implemented via barman_wal_restore). Just to name a few: support for rsync/SSH and streaming backup, support for both WAL archiving and streaming (including synchronous for zero data loss clusters), a monitoring facility, incremental backup and recovery, hook scripts, and so on.
However, when managing large databases, Barman suffered from being bound to one CPU for backup and recovery operations. Lately, this was perceived as Barman’s main weak spot by users, and we decided to fill the gap.
Version 2.2 introduces support for parallel backup and recovery when using the rsync copy method, allowing you to specify how many jobs you want to run concurrently.
We have added one global option, called parallel_jobs, that can be overridden at server level. For backward compatibility, this option is set to 1 by default. It controls parallelism for both the backup and the recover commands.
[vaughan]
description = "Backup of SRV database"
ssh_command = ssh postgres@vaughan
conninfo = host=vaughan user=barman dbname=postgres
backup_method = rsync
parallel_jobs = 4
; … more options here
In some cases, though, users might want to override the configured value and decide how many jobs to request for a single backup or recovery operation. For this reason we have implemented the --jobs option (or -j) for both the backup and recover commands.
If you want to spread your backup over 8 rsync processes, you can simply execute:
$ barman backup -j 8 vaughan
Likewise, for recovery:
$ barman recover -j 8 [...] vaughan [...]
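The precedence described above (command-line --jobs first, then the server-level parallel_jobs, then the global parallel_jobs, then the default of 1) can be sketched in a few lines of Python. This is only an illustration of the rule, not Barman's actual code: the resolve_jobs function, the sample values and the second server named "other" are hypothetical.

```python
import configparser

# Hypothetical configuration: a global value in [barman],
# overridden for "vaughan" and left untouched for "other".
SAMPLE_CONFIG = """
[barman]
parallel_jobs = 2

[vaughan]
backup_method = rsync
parallel_jobs = 4

[other]
backup_method = rsync
"""


def resolve_jobs(config, server, cli_jobs=None):
    """Return the number of jobs, applying the precedence described in the text."""
    # 1. An explicit -j/--jobs value wins over any configured value.
    if cli_jobs is not None:
        return cli_jobs
    # 2. Otherwise use the server-level setting, if present.
    if config.has_option(server, "parallel_jobs"):
        return config.getint(server, "parallel_jobs")
    # 3. Otherwise fall back to the global setting, if present.
    if config.has_option("barman", "parallel_jobs"):
        return config.getint("barman", "parallel_jobs")
    # 4. Default that preserves the pre-2.2 behaviour: a single job.
    return 1


config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

print(resolve_jobs(config, "vaughan"))              # server-level value: 4
print(resolve_jobs(config, "other"))                # global value: 2
print(resolve_jobs(config, "vaughan", cli_jobs=8))  # command line wins: 8
```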
Another interesting change is in the show-backup command. This is an excerpt taken from one of the Subito.it databases (thanks for the kind concession and for co-funding the development of this feature). You can appreciate the improvement:
$ barman show-backup pg95 last
... [snip] ...
Base backup information:
Disk usage : 1.8 TiB (1.8 TiB with WALs)
Incremental size : 672.6 GiB (-62.76%)
... [snip] ...
WAL number : 392
WAL compression ratio: 60.68%
Begin time : 2017-06-15 01:00:02.929344+02:00
End time : 2017-06-15 02:55:06.626676+02:00
Copy time : 1 hour, 29 minutes, 31 seconds + 6 seconds startup
Estimated throughput : 128.2 MiB/s (4 jobs)
... [snip] ...
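As a quick sanity check on the excerpt, the estimated throughput is consistent with dividing the incremental size by the copy time. That Barman computes the figure exactly this way is an assumption, based only on the numbers lining up:

```python
# Figures copied from the show-backup excerpt.
MIB_PER_GIB = 1024
incremental_size_mib = 672.6 * MIB_PER_GIB    # "Incremental size : 672.6 GiB"
copy_time_seconds = 1 * 3600 + 29 * 60 + 31   # "1 hour, 29 minutes, 31 seconds"

# Assumed formula: data actually copied divided by time spent copying.
throughput = incremental_size_mib / copy_time_seconds
print(f"{throughput:.1f} MiB/s")  # prints "128.2 MiB/s"
```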
Roughly, with their 1.8 terabyte database, Subito.it has reduced their backup time by 60% (from 3 hours and 40 minutes, to less than 1 hour and 30 minutes). Similarly, they have reduced recovery time by 40% (from 5 hours and 20 minutes, to 3 hours and 10 minutes) by increasing the number of jobs from 1 to 4.
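The rounded percentages follow directly from the durations quoted above. A minimal check, taking "less than 1 hour and 30 minutes" as 90 minutes (an assumption, since the exact figure is not given):

```python
def reduction_pct(before_minutes, after_minutes):
    """Percentage of time saved going from before_minutes to after_minutes."""
    return (before_minutes - after_minutes) / before_minutes * 100


backup = reduction_pct(3 * 60 + 40, 90)              # 3h40m -> about 1h30m
recovery = reduction_pct(5 * 60 + 20, 3 * 60 + 10)   # 5h20m -> 3h10m

print(f"backup:   {backup:.1f}% shorter")    # roughly 59%, i.e. about 60%
print(f"recovery: {recovery:.1f}% shorter")  # roughly 40.6%, i.e. about 40%
```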
Indeed, Subito.it automatically tests their backups through post-backup hook scripts that re-create a reporting database every day from scratch (watch my presentation from 2015 at PgConf.US for details). Thanks to this feature, Subito.it is able to provision a database to their BI department almost 5 hours earlier!
It goes without saying that there is not a precise formula for this, as many variables come into play, including I/O and network throughput. But it is definitely another option you now have with Barman.
Barman 2.2 fixes a few outstanding bugs and improves robustness of your PostgreSQL disaster recovery installation by adding the max_incoming_wals_queue option, which makes sure that your WAL files are regularly archived by Barman.
As with any previous release, just update your current Barman installation and you will be able to experience parallel backup and recovery.
We believe this is a killer feature. Let us know if you like it, and share your feedback with us!