Preamble
How many current Barman users have thought about saving their backups to a remote destination in the cloud? And how many have thought about taking that backup directly from the PostgreSQL server itself?
Well, since Barman 2.10 this is now possible!
How?
Let’s discover that together in the following articles.
Requirements
The following two articles are meant to be a practical introduction to the new barman-cloud-wal-archive and barman-cloud-backup tools added in the barman-cli package. The first part will cover the barman-cloud-wal-archive command, while the second one will cover the barman-cloud-backup command.
Readers need a basic knowledge of PostgreSQL WAL archiving and backup methods, and of Barman. It is also recommended to be familiar with cloud storage solutions such as Amazon S3.
WAL Archive
Barman has acted as a remote WAL archive for many years, and the barman-cli package has been designed to extend archiving reliability and robustness on the PostgreSQL side. In fact, barman-cli provides scripts like barman-wal-restore, which allows a standby node to smartly and safely restore WAL files from a Barman archive through the restore_command parameter in the postgresql.auto.conf file (or in the recovery.conf file until PostgreSQL 12), and barman-wal-archive, which archives WAL files from a master node to Barman through the archive_command parameter configured in the postgresql.conf file.
Cloud WAL Archive
Thanks to users’ feedback, the Barman developers have introduced two new tools in version 2.10:
- barman-cloud-wal-archive
- barman-cloud-backup
Version 2.11 will include two additional tools for recovery, called barman-cloud-wal-restore and barman-cloud-restore.
This post is entirely dedicated to barman-cloud-wal-archive, which can store WAL files in the cloud, enabling multi-tier archiving with Barman and expanding the backup retention policy. Indeed, barman-cloud-wal-archive can be used as a hook script, by configuring the pre_archive_retry_script parameter in Barman, to copy WAL files to the configured cloud storage. This increases the redundancy of the archive and makes it possible to choose a longer retention policy in the cloud than the one used by Barman, as sketched below.
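As a minimal sketch of the hook-script setup (assuming a Barman server named pg12 and the same barman-s3-test bucket and barman-cloud AWS profile used later in this article), the Barman server configuration could look like this:

[pg12]
; ... existing server configuration ...
; hypothetical example: bucket and profile must be adapted
pre_archive_retry_script = barman-cloud-wal-archive -P barman-cloud s3://barman-s3-test/ ${BARMAN_SERVER}

When invoked as a hook script, barman-cloud-wal-archive reads the WAL file to copy from the environment that Barman provides to hooks, so no WAL_PATH argument should be needed here.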
That’s not all! barman-cloud-wal-archive can also replace the barman-wal-archive command in the archive_command parameter, to archive WAL files directly in the cloud instead of copying them to the Barman server. In this way, even a PostgreSQL cluster that does not have a separate dedicated backup server can rely on a remote storage service to archive WAL files.
How does it work?
The following instructions show how to install barman-cloud-wal-archive and configure it as the archive_command in PostgreSQL.
First, decide where to archive WAL files. In this article we will use Amazon S3, which, at the time of writing, is the only supported technology. Other technologies exposing an S3-compatible API (Google Cloud, DigitalOcean, Microsoft Azure, etc.) may work through the boto3 library, but they have not been tested yet.
Requirements
- barman-cli 2.10 (or higher)
- Amazon AWS account
- awscli
- S3 bucket
- A PostgreSQL instance
In this article we will test Barman CLI in a virtual machine running Debian Buster, with a PostgreSQL 12 instance already up and running.
Installation
- Install the 2ndQuadrant Public repository (see the note after this list)
- Install the barman-cli package:
root@vm:~# apt update
root@vm:~# apt install barman-cli
- Install awscli:
root@vm:~# apt install awscli
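As for the repository step, at the time of writing the 2ndQuadrant public repository could be enabled with a one-liner similar to the following; this is a sketch based on the 2ndQuadrant installation instructions, so please verify the current procedure on their website before running it:

root@vm:~# # hypothetical: check the official instructions before piping to bash
root@vm:~# curl https://dl.2ndquadrant.com/default/release/get/deb | bash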
Configuration and setup
Let’s read the manual:
postgres@vm:~$ man barman-cloud-wal-archive
[...]
SYNOPSIS
       barman-cloud-wal-archive [OPTIONS] DESTINATION_URL SERVER_NAME WAL_PATH
[...]
POSITIONAL ARGUMENTS
       DESTINATION_URL
              URL of the cloud destination, such as a bucket in AWS S3.
              For example: s3://BUCKET_NAME/path/to/folder (where
              BUCKET_NAME is the bucket you have created in AWS).
       SERVER_NAME
              the name of the server as configured in Barman.
       WAL_PATH
              the value of the `%p' keyword (according to `archive_command').
[...]
So, to use it properly, we just need to configure the AWS credentials with the awscli tool as the postgres user, copying the Access Key and Secret Key previously created in the IAM section of the AWS console:
postgres@vm:~$ aws configure --profile barman-cloud
AWS Access Key ID [None]: AKI*****************
AWS Secret Access Key [None]: ****************************************
Default region name [None]: eu-west-1
Default output format [None]: json
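For reference, aws configure simply persists the profile on disk, so the resulting files should look roughly like this (keys redacted):

postgres@vm:~$ cat ~/.aws/credentials
[barman-cloud]
aws_access_key_id = AKI*****************
aws_secret_access_key = ****************************************
postgres@vm:~$ cat ~/.aws/config
[profile barman-cloud]
region = eu-west-1
output = json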
Ensure you have an available S3 bucket on AWS. I chose to call it barman-s3-test to make its purpose clear.
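If the bucket does not exist yet, it can be created with awscli as well; for example, using the eu-west-1 region configured above (keep in mind that bucket names are globally unique, so you will need a name of your own):

postgres@vm:~$ # illustrative: adapt bucket name and region
postgres@vm:~$ aws s3 mb s3://barman-s3-test --region eu-west-1 --profile barman-cloud
make_bucket: barman-s3-test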
We should now be able to test the barman-cloud-wal-archive command (the -t option tests the connectivity with the cloud destination without actually uploading anything):
postgres@vm:~$ barman-cloud-wal-archive -t -P barman-cloud s3://barman-s3-test/ pg12 /var/lib/postgresql/12/main/pg_wal/000000010000000000000001
postgres@vm:~$ echo $?
0
The exit status confirms that the command succeeded. We can now add the following line at the bottom of the PostgreSQL configuration file and restart the instance (a restart is required because archive_mode cannot be changed with a simple reload):
archive_mode = on
root@vm:~# systemctl restart postgresql@12-main.service
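As a quick sanity check, we can verify that the setting has been applied:

postgres@vm:~$ psql -c "SHOW archive_mode"
 archive_mode
--------------
 on
(1 row)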
Since our data will be copied to remote storage outside our control, it is important to store it compressed and encrypted. The barman-cloud-wal-archive command supports two different compression methods, as well as server-side encryption:
postgres@vm:~$ barman-cloud-wal-archive --help
[...]
  -z, --gzip            gzip-compress the WAL while uploading to the cloud
  -j, --bzip2           bzip2-compress the WAL while uploading to the cloud
  -e ENCRYPTION, --encryption ENCRYPTION
                        Enable server-side encryption for the transfer.
                        Allowed values: 'AES256', 'aws:kms'
[...]
The encryption option simply tells the S3 bucket which method to use to store the data encrypted. Encrypted data cannot be read by any AWS user other than the owner of the bucket. Barman Cloud does not encrypt any object before sending it to S3; it just asks the bucket to store it encrypted, provided that S3 has been properly configured. In any case, connections to S3 are securely established via https.
Let's add the following line at the bottom of the postgresql.conf file:
archive_command = 'barman-cloud-wal-archive -P barman-cloud -e AES256 -j s3://barman-s3-test/ pg12 %p'
This time, just a reload of the configuration is enough to apply the new changes:
postgres@vm:~$ psql -c "SELECT pg_reload_conf()"
In order to test whether the new archive_command is working, PostgreSQL has to produce WAL files to be archived, so let's generate some traffic with the help of the pgbench tool:
postgres@vm:~$ createdb pg_bench_db
postgres@vm:~$ pgbench -i -s10 pg_bench_db
[some irrelevant output here]
postgres@vm:~$ pgbench -c 10 -j 2 -T 30 pg_bench_db
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 2
duration: 30 s
number of transactions actually processed: 84501
latency average = 3.552 ms
tps = 2815.224687 (including connections establishing)
tps = 2815.427535 (excluding connections establishing)
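Before looking at the bucket, we can also ask PostgreSQL itself whether archiving is succeeding, through the pg_stat_archiver view (the counters below are illustrative):

postgres@vm:~$ psql -c "SELECT archived_count, last_archived_wal, failed_count FROM pg_stat_archiver"
 archived_count |    last_archived_wal     | failed_count
----------------+--------------------------+--------------
             10 | 00000001000000000000000A |            0
(1 row)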
At this point we should see WAL files archived in the S3 bucket. Let’s check it, building the target path with the server name and the WAL destination directory:
postgres@vm:~$ aws s3 --profile barman-cloud ls s3://barman-s3-test/pg12/wals/
                           PRE 0000000100000000/
Let’s have a look inside the 0000000100000000 directory:
postgres@vm:~$ aws s3 --profile barman-cloud ls s3://barman-s3-test/pg12/wals/0000000100000000/
2020-01-08 08:20:54    1624168 000000010000000000000001.bz2
2020-01-08 08:21:00     293422 000000010000000000000002.bz2
2020-01-08 08:21:06     301934 000000010000000000000003.bz2
2020-01-08 08:21:11     295648 000000010000000000000004.bz2
2020-01-08 08:21:16     293675 000000010000000000000005.bz2
2020-01-08 08:21:21     299348 000000010000000000000006.bz2
2020-01-08 08:21:27     551249 000000010000000000000007.bz2
2020-01-08 08:21:33     976523 000000010000000000000008.bz2
2020-01-08 08:21:37    4542104 000000010000000000000009.bz2
2020-01-08 08:21:46    5052693 00000001000000000000000A.bz2
Great!
WAL files are compressed before being uploaded to the S3 bucket and are stored encrypted, saving us space (and money) and increasing the security level of our data. The sketch below shows one way to verify the encryption.
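As a sketch, the s3api head-object command reports the encryption method used for a stored object; the key below is just the first WAL segment from the listing above, and the JSON output is abridged:

postgres@vm:~$ aws s3api head-object --profile barman-cloud --bucket barman-s3-test --key pg12/wals/0000000100000000/000000010000000000000001.bz2
{
    ...
    "ServerSideEncryption": "AES256",
    ...
}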
Conclusions
The barman-cloud-wal-archive command is what users have been waiting for, for a long time.
If you are one of those who used pre_archive_retry_script to implement a custom script for uploading WAL files to an S3 bucket, then this command is a better replacement: it is developed and maintained by the Barman developers, and it is tested and delivered by the 2ndQuadrant Continuous Delivery system.
In case you haven't thought about it yet, this also opens up new retention policies: by properly setting a longer retention policy in the S3 bucket's configuration, objects can live longer in the cloud than in the Barman local archive, while saving space on local storage. A sketch of such a policy follows.
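As a hedged example (the rule name, prefix, and retention period are all hypothetical and must be adapted to your needs), such a retention policy can be expressed as an S3 lifecycle configuration and applied with awscli:

postgres@vm:~$ cat lifecycle.json
{
  "Rules": [
    {
      "ID": "expire-old-wals",
      "Filter": { "Prefix": "pg12/wals/" },
      "Status": "Enabled",
      "Expiration": { "Days": 365 }
    }
  ]
}
postgres@vm:~$ aws s3api put-bucket-lifecycle-configuration --profile barman-cloud --bucket barman-s3-test --lifecycle-configuration file://lifecycle.json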
Alternatively, it can be used as we did in this article, to archive WAL files directly from the PostgreSQL server. Although this removes an intermediate step, the RPO increases compared with the streaming method, because PostgreSQL archives a WAL file only after having closed it; therefore, in case of problems on the PostgreSQL node, we could lose some changes. When possible, we recommend implementing this method alongside streaming to a Barman server, in order to achieve RPO=0 (with synchronous streaming); a minimal sketch follows.
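This is only a sketch, assuming a standard Barman streaming setup in which the application name used by Barman's receive-wal process (streaming_archiver_name, here barman_receive_wal) is added to PostgreSQL's synchronous_standby_names; all names are illustrative and the full procedure is described in the Barman documentation:

# barman.conf, server section on the Barman node (illustrative names)
[pg12]
streaming_archiver = on
streaming_archiver_name = barman_receive_wal
slot_name = barman

# postgresql.conf on the PostgreSQL node: must match streaming_archiver_name
synchronous_standby_names = 'barman_receive_wal'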
Now that we have a continuous archiving system in place, we can take our first cloud backup using the barman-cloud-backup tool.
See you in the second part of the article.