Barman Cloud - Part 1: WAL Archive

January 27, 2020

Preamble

How many current Barman users have thought about saving backups in a remote destination in the cloud? How many have thought about taking that backup directly from the PostgreSQL server itself?

Well, since Barman 2.10 this is now possible!
How?
Let’s discover that together in the following articles.

Requirements

The following two articles are meant to be a practical introduction to the new barman-cloud-wal-archive and barman-cloud-backup tools added in the barman-cli package.
The first part will cover the barman-cloud-wal-archive command while the second one will cover the barman-cloud-backup command.
Readers need a basic knowledge of Barman and of PostgreSQL WAL archiving and backup methods. Familiarity with cloud storage services such as Amazon S3 is also recommended.

WAL Archive

Barman has acted as a remote WAL archive for many years, and the Barman CLI package has been designed to extend archiving reliability and robustness on the PostgreSQL side. In fact, barman-cli provides scripts such as barman-wal-restore, which lets a standby node smartly and safely restore WAL files from a Barman archive through the restore_command parameter in the postgresql.auto.conf file (or in the recovery.conf file before PostgreSQL 12), and barman-wal-archive, which archives WAL files from a master node to Barman through the archive_command parameter configured in the postgresql.conf file.

Cloud WAL Archive

Thanks to users’ feedback, the Barman developers have introduced two new tools in version 2.10:

  • barman-cloud-wal-archive
  • barman-cloud-backup

Version 2.11 will include two additional tools for recovery, called barman-cloud-wal-restore and barman-cloud-restore.
This post is entirely dedicated to barman-cloud-wal-archive, which can store WAL files in the cloud, enabling multi-tier archiving with Barman and extending the backup retention policy.
Indeed, barman-cloud-wal-archive can be used as a hook script, by configuring the pre_archive_retry_script parameter in Barman, so that WAL files are also copied to the configured cloud storage. This increases the redundancy of the archive and makes it possible to choose a longer retention policy than the one set in Barman.

That’s not all!

barman-cloud-wal-archive can also replace the barman-wal-archive command in the archive_command parameter, to archive WAL files directly in the cloud instead of copying them to the Barman server. In this way, even a PostgreSQL cluster that does not have a separate dedicated backup server can rely on a remote storage service to archive WAL files.

How does it work?

The following instructions cover only how to install and configure barman-cloud-wal-archive as the archive_command in PostgreSQL.
First, decide where to archive WAL files. In this article we will use Amazon S3, which at the time of writing is the only supported technology. Other technologies that expose an S3-like API (Google Cloud, DigitalOcean, Microsoft Azure, etc.) may work with the boto3 library, but they have not been tested yet.

Requirements

  1. barman-cli 2.10 (or higher)
  2. Amazon AWS account
  3. awscli
  4. S3 bucket
  5. A PostgreSQL instance

In this article we will test Barman CLI in a virtual machine running Debian Buster, with PostgreSQL 12 already up and running.

Installation

    1. Install the 2ndQuadrant Public repository
    2. Install the barman-cli package
      
      root@vm:~# apt update
      root@vm:~# apt install barman-cli
    3. Install awscli
      
      root@vm:~# apt install awscli

Configuration and setup

    Let’s read the manual:

    
    postgres@vm:~$ man barman-cloud-wal-archive
    [...]
    SYNOPSIS
        barman-cloud-wal-archive [OPTIONS] DESTINATION_URL SERVER_NAME WAL_PATH
    [...]
    POSITIONAL ARGUMENTS
    
        DESTINATION_URL
        URL  of the cloud destination, such as a bucket in AWS S3.
        For example: s3://BUCKET_NAME/path/to/folder (where BUCKET_NAME is the bucket you have created in AWS).
    
        SERVER_NAME
        the name of the server as configured in Barman.
    
        WAL_PATH
        the value of the `%p' keyword (according to `archive_command').
    [...]
    

    So, to use it properly, we just need to configure AWS credentials with the awscli tool as the postgres user, copying the Access Key and Secret Key previously created in the IAM section of the AWS console:

    
    postgres@vm:~$ aws configure --profile barman-cloud
    AWS Access Key ID [None]: AKI*****************
    AWS Secret Access Key [None]: ****************************************
    Default region name [None]: eu-west-1
    Default output format [None]: json
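
Before moving on, it can be worth confirming that the new profile actually authenticates. The sketch below is one possible check and is not part of Barman: the helper name profile_works is made up for illustration, and it assumes the IAM user behind the barman-cloud profile is allowed to call aws sts get-caller-identity.

```shell
# Hypothetical helper (not part of barman-cli or awscli).
# Succeeds when the given awscli profile can authenticate against AWS.
profile_works() {
    local profile="$1" account
    # get-caller-identity describes the credentials in use;
    # --query/--output reduce the answer to the bare account ID.
    account=$(aws sts get-caller-identity --profile "$profile" \
        --query Account --output text) || return 1
    echo "profile ${profile} authenticates as account ${account}"
}
```

It could be called as profile_works barman-cloud; a non-zero exit status would suggest revisiting the keys entered above.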
    

    Make sure an S3 bucket is available on AWS. I chose to call mine barman-s3-test to make its purpose clear.
    We should now be able to test the barman-cloud-wal-archive command. The -t option tests connectivity to the cloud destination and exits:

    
    postgres@vm:~$ barman-cloud-wal-archive -t -P barman-cloud s3://barman-s3-test/ pg12 /var/lib/postgresql/12/main/pg_wal/000000010000000000000001
    postgres@vm:~$ echo $?
    0
    

    The exit status confirms that the command succeeded. We can now add the following line at the bottom of the PostgreSQL configuration file and restart the instance:
    archive_mode = on

    
    root@vm:~# systemctl restart postgresql@12-main.service
    

    Since our data will be copied to remote storage, outside our control, it’s important that we store it compressed and encrypted. The barman-cloud-wal-archive command supports two compression methods, as well as server-side encryption:

    
    postgres@vm:~$ barman-cloud-wal-archive --help
    [...]
        -z, --gzip            gzip-compress the WAL while uploading to the cloud
        -j, --bzip2           bzip2-compress the WAL while uploading to the cloud
        -e ENCRYPTION, --encryption ENCRYPTION
                              Enable server-side encryption for the transfer.
                              Allowed values: 'AES256', 'aws:kms'
    [...]
    

    The encryption option just tells the S3 bucket which method to use to store the data encrypted. Encrypted data cannot be read by any AWS user other than the owner of the bucket. Barman cloud does not encrypt any object before sending it to S3; it just asks the bucket to store objects encrypted, provided S3 has been properly configured. However, all connections to S3 are securely established via HTTPS.

    Let’s add the following line at the bottom of the postgresql.conf file:

    archive_command = 'barman-cloud-wal-archive -P barman-cloud -e AES256 -j s3://barman-s3-test/ pg12 %p'

    This time, just a reload of the configuration is enough to apply the new changes:

    
    postgres@vm:~$ psql -c "SELECT pg_reload_conf()"
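
To confirm that both settings are live after the restart and the reload, something along these lines could be run as the postgres user. It is only a sketch: archiving_configured is a hypothetical helper name, and it assumes psql can connect to the local instance without further options.

```shell
# Hypothetical helper: succeeds when archive_mode is enabled and
# archive_command invokes barman-cloud-wal-archive.
archiving_configured() {
    local mode cmd
    # -A (unaligned) and -t (tuples only) make psql print the bare value
    mode=$(psql -At -c "SHOW archive_mode") || return 2
    cmd=$(psql -At -c "SHOW archive_command") || return 2
    case "$mode" in
        on|always) ;;
        *) echo "archive_mode is '$mode'" >&2; return 1 ;;
    esac
    case "$cmd" in
        *barman-cloud-wal-archive*) echo "archiving via: $cmd" ;;
        *) echo "unexpected archive_command: $cmd" >&2; return 1 ;;
    esac
}
```

A failure on archive_mode would point to a missing restart, while a failure on archive_command would point to a missing reload or a typo in postgresql.conf.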
    

    To test whether the new archive_command is working, PostgreSQL needs to produce WAL files to be archived, so we have to generate some traffic with the help of the pgbench tool:

    
    postgres@vm:~$ createdb pg_bench_db
    postgres@vm:~$ pgbench -i -s10 pg_bench_db
    
    [some irrelevant output here]
    
    postgres@vm:~$ pgbench -c 10 -j 2 -T 30 pg_bench_db
    starting vacuum...end.
    transaction type: <builtin: TPC-B (sort of)>
    scaling factor: 10
    query mode: simple
    number of clients: 10
    number of threads: 2
    duration: 30 s
    number of transactions actually processed: 84501
    latency average = 3.552 ms
    tps = 2815.224687 (including connections establishing)
    tps = 2815.427535 (excluding connections establishing)
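
Before looking at the bucket, the pg_stat_archiver view in PostgreSQL offers a quick check from the database side. The following is a minimal sketch under the same assumption that psql connects locally as the postgres user; archiver_summary is a hypothetical helper name, not a Barman or PostgreSQL tool.

```shell
# Hypothetical helper: summarises pg_stat_archiver and fails if nothing
# has been archived yet or if any archive attempt has failed.
archiver_summary() {
    local row archived failed last
    # Unaligned, tuples-only output separates columns with '|'
    row=$(psql -At -c "SELECT archived_count, failed_count, last_archived_wal FROM pg_stat_archiver") || return 2
    archived=${row%%|*}
    row=${row#*|}
    failed=${row%%|*}
    last=${row#*|}
    echo "archived=${archived} failed=${failed} last=${last:-none}"
    # failed_count is cumulative since the last statistics reset, so a
    # non-zero value may reflect an older, already resolved problem.
    [ "$archived" -gt 0 ] && [ "$failed" -eq 0 ]
}
```

A growing archived count with no failures indicates that PostgreSQL considers the archive_command to be succeeding.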
    

    At this point we should see WAL files archived in the S3 bucket. Let’s check it, building the target path with the server name and the WAL destination directory:

    
    postgres@vm:~$ aws s3 --profile barman-cloud ls s3://barman-s3-test/pg12/wals/
                            PRE 0000000100000000/
    

    Let’s have a look inside the 0000000100000000 directory:

    
    postgres@vm:~$ aws s3 --profile barman-cloud ls s3://barman-s3-test/pg12/wals/0000000100000000/
    2020-01-08 08:20:54    1624168 000000010000000000000001.bz2
    2020-01-08 08:21:00     293422 000000010000000000000002.bz2
    2020-01-08 08:21:06     301934 000000010000000000000003.bz2
    2020-01-08 08:21:11     295648 000000010000000000000004.bz2
    2020-01-08 08:21:16     293675 000000010000000000000005.bz2
    2020-01-08 08:21:21     299348 000000010000000000000006.bz2
    2020-01-08 08:21:27     551249 000000010000000000000007.bz2
    2020-01-08 08:21:33     976523 000000010000000000000008.bz2
    2020-01-08 08:21:37    4542104 000000010000000000000009.bz2
    2020-01-08 08:21:46    5052693 00000001000000000000000A.bz2
    

    Great!

    WAL files are being compressed before being uploaded to the S3 bucket and are stored encrypted, saving us space (and money) and increasing the security level of our data.
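
The listing suggests a simple layout: each object lives under SERVER_NAME/wals/, in a directory named after the first 16 characters of the WAL file name (timeline ID plus log number), with the compression suffix appended. Assuming that layout holds, which is inferred from the output above rather than taken from the Barman documentation, a sketch like the following could locate a given WAL file and ask S3 which server-side encryption it was stored with. Both helper names are hypothetical.

```shell
# Hypothetical helper: builds the object key for a WAL segment, following
# the layout observed in the listing above (an inference, not a spec).
wal_object_key() {
    local server="$1" segment="$2" suffix="${3:-}"
    local dir
    # First 16 hex digits: 8 for the timeline ID, 8 for the log number
    dir=$(printf '%.16s' "$segment")
    printf '%s/wals/%s/%s%s\n' "$server" "$dir" "$segment" "${suffix:+.$suffix}"
}

# Hypothetical helper: prints the server-side encryption S3 reports
# for that object (for example AES256).
wal_object_sse() {
    local profile="$1" bucket="$2" key
    key=$(wal_object_key "$3" "$4" "${5:-}")
    aws s3api head-object --profile "$profile" --bucket "$bucket" \
        --key "$key" --query ServerSideEncryption --output text
}
```

For the first file in the listing, the call would be wal_object_sse barman-cloud barman-s3-test pg12 000000010000000000000001 bz2.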

Conclusions

    The barman-cloud-wal-archive command is something users have long been waiting for.

    If you’re one of those who have used pre_archive_retry_script to implement a custom script for uploading WAL files to an S3 bucket, barman-cloud-wal-archive is a better replacement: it is developed and maintained by the Barman developers, and it is tested and delivered by the 2ndQuadrant Continuous Delivery system.

    If you haven’t thought about it yet, this also opens up new retention policies: by setting a longer retention policy in the S3 bucket’s configuration, objects can be kept in the cloud for longer than in the local Barman archive, while saving space on local storage.

    Otherwise, it can be used as we did in this article, to archive WAL files directly from the PostgreSQL server. Although this removes an intermediate step, the RPO increases compared with the streaming method, because PostgreSQL archives a WAL file only after having closed it. Therefore, in case of problems on the PostgreSQL node, we could lose some changes. When possible, we recommend implementing this method along with streaming to a Barman server in order to achieve RPO=0 (with synchronous streaming).

    Now that we have a continuous archiving system in place, we can take our first cloud backup using the barman-cloud-backup tool.

    See you in the second part of the article.
