Barman 2.2 and the magic of parallel copy

July 18, 2017

Barman 2.2 introduces support for parallel copy, improving the performance of both backup and recovery operations in your PostgreSQL disaster recovery solution.

Barman is a piece of software that has been incrementally improved since its conception in 2011. Brick after brick, with just one goal in mind: fostering a disaster recovery culture in PostgreSQL by making the whole backup/recovery process easier and more standard.

Barman is full of interesting features that go beyond disaster recovery (consider the WAL hub facility implemented via barman_wal_restore). Just to name a few: support for rsync/SSH and streaming backup, support for both WAL archiving and streaming (including synchronous for zero data loss clusters), a monitoring facility, incremental backup and recovery, hook scripts, and so on.

However, when managing large databases, Barman suffered from being bound to a single CPU for backup and recovery operations. This had come to be perceived by users as Barman's main weak spot, so we decided to fill the gap.

Version 2.2 introduces support for parallel backup and recovery when using the rsync copy method, allowing you to specify how many jobs you want to run concurrently.

We have added one global option, called parallel_jobs, that can be overridden at server level. For backward compatibility, this option is set to 1 by default. It controls parallelism for both the backup and the recover commands.

[vaughan]
description = "Backup of SRV database"
ssh_command = ssh postgres@vaughan
conninfo = host=vaughan user=barman dbname=postgres
backup_method = rsync
parallel_jobs = 4
; … more options here

In some cases though, users might want to change the default behaviour and decide how many jobs to request for a specific backup or recovery operation. For this reason we have implemented the --jobs option (or -j) for both the backup and recover commands.

If you want to spread your backup over 8 rsync processes, you can simply execute:

$ barman backup -j 8 vaughan

Likewise, for recovery:

$ barman recover -j 8 [...] vaughan [...]

Another interesting change is in the show-backup command. This is an excerpt taken from one of the Subito.it databases (thanks to them for kindly granting permission, and for co-funding the development of this feature). You can appreciate the improvement:

$ barman show-backup pg95 last

       ... [snip] ...
  Base backup information:
    Disk usage           : 1.8 TiB (1.8 TiB with WALs)
    Incremental size     : 672.6 GiB (-62.76%)
       ... [snip] ...
    WAL number           : 392
    WAL compression ratio: 60.68%
    Begin time           : 2017-06-15 01:00:02.929344+02:00
    End time             : 2017-06-15 02:55:06.626676+02:00
    Copy time            : 1 hour, 29 minutes, 31 seconds + 6 seconds startup
    Estimated throughput : 128.2 MiB/s (4 jobs)
       ... [snip] ...

With their 1.8 terabyte database, Subito.it has reduced backup time by roughly 60% (from 3 hours and 40 minutes to less than 1 hour and 30 minutes) and recovery time by roughly 40% (from 5 hours and 20 minutes to 3 hours and 10 minutes), simply by increasing the number of jobs from 1 to 4.

Indeed, Subito.it automatically tests its backups through post-backup hook scripts that re-create a reporting database from scratch every day (watch my presentation from 2015 at PgConf.US for details). Thanks to this feature, Subito.it is able to provision a database to their BI department almost 5 hours earlier!
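For context, a post-backup hook is wired up with a single option in the server's configuration section. A hedged sketch, where the script path and name are purely illustrative:

```ini
[vaughan]
; ... existing options here ...
; Hypothetical example: run a script after every backup,
; e.g. to rebuild a reporting database from the fresh backup
post_backup_script = /usr/local/bin/rebuild_reporting_db.sh
```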

It goes without saying that there is no precise formula for this, as many variables come into play, including I/O and network throughput. But it is definitely another option you now have with Barman.

Barman 2.2 also fixes a few outstanding bugs and improves the robustness of your PostgreSQL disaster recovery installation by adding the max_incoming_wals_queue option, which makes sure that your WAL files are regularly archived by Barman.
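As a hedged sketch of how this might look in a server's configuration (the threshold value is illustrative):

```ini
[vaughan]
; ... existing options here ...
; Illustrative threshold: barman check reports a problem if more than
; 100 WAL files pile up in the incoming queue without being archived
max_incoming_wals_queue = 100
```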

As with previous releases, simply update your current Barman installation and you will be able to experience parallel backup and recovery.

We believe this is a killer feature. Let us know what you think of it and share your feedback with us!
