Retention of backups with Barman

April 15, 2013

Defining a disaster recovery plan involves defining backup policies. A key aspect of backup policies is to define how long backup data is retained for disaster recovery purposes. This applies to all digital content, including PostgreSQL databases.

Barman 1.2.0 introduces automated management of backup retention policies of PostgreSQL servers.

Retention policies were one of the medium-term goals we set ourselves when the whole Barman idea began to take shape. I am glad that, thanks to the vision of a French company (which explicitly asked to remain anonymous), we were able to develop them as open source.

What are retention policies?

A retention policy is a user-defined set of guidelines and principles that determines how long backups and their related archive logs need to be retained for recovery procedures. In a PostgreSQL database scenario:

  • by “backup” we mean a full physical backup, taken periodically while the database is online (hence the term “hot” backup, often used in this context);
  • by “archive logs” we mean Write Ahead Log (WAL) files, which implement differential backup by continuously archiving every change made to the data files of the database server.

Base backups and archive logs form the so-called “backup catalogue” and allow database administrators to perform Point-In-Time Recovery (PITR).

Further information on PostgreSQL physical backup and continuous archiving can be found in the PostgreSQL documentation.

Why are retention policies so important?

In some countries and environments, ICT departments are required by law to enforce them for data security and protection reasons. In Italy, for instance, the “Codice dell’Amministrazione Digitale” (CAD, the Digital Administration Code) requires public organisations to report retention periods in official documents such as their Disaster Recovery Plan.

From a more practical point of view, retention policies make a backup solution more automated and reduce its management and configuration complexity, since obsolete backups and WAL files no longer have to be identified and removed by hand.

How are they implemented in Barman?

At any time, Barman retains:

  • the periodic backups required to satisfy the current retention policy for a given PostgreSQL server;
  • the archived WAL files required for the complete recovery of those backups.
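The WAL side of this rule can be sketched in Python, the language Barman itself is written in. This is a minimal illustration, not Barman's code: the `Backup` class and the `required_wal_files` function are hypothetical, and the sketch assumes a single timeline, so that the fixed-width hexadecimal WAL segment names sort in the same order as the WAL stream. Every archived segment from the start of the oldest retained backup onwards is kept, because recovering that backup means replaying WAL forward from that point.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    """Hypothetical, minimal record of a base backup in the catalogue."""
    backup_id: str
    begin_wal: str  # first WAL segment needed to restore this backup


def required_wal_files(retained_backups, archived_wals):
    """Return the archived WAL segments needed to recover the retained backups.

    Assumes a single timeline: segment names are 24 fixed-width uppercase
    hex digits, so plain string comparison follows WAL order.
    """
    if not retained_backups:
        return []
    # Recovering the oldest retained backup replays WAL from its begin_wal
    # onwards, so every segment from that one forward must be kept.
    oldest_start = min(b.begin_wal for b in retained_backups)
    return sorted(w for w in archived_wals if w >= oldest_start)


if __name__ == "__main__":
    backups = [
        Backup("20130401T000000", "000000010000000000000010"),
        Backup("20130408T000000", "000000010000000000000020"),
    ]
    wals = [
        "00000001000000000000000F",  # older than any retained backup
        "000000010000000000000010",
        "000000010000000000000021",
    ]
    print(required_wal_files(backups, wals))
```

Segments older than the oldest retained backup can then be treated as obsolete; a real implementation would also have to deal with timeline switches, which this sketch ignores.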

On the same topic, you may wish to read one of my previous blog posts about the backup catalogue and the WAL archive in Barman.

Barman users can define a retention policy in terms of:

  • backup redundancy (how many periodic backups), or
  • a recovery window (how long).

With a retention policy based on redundancy, the administrator decides how many periodic backups to keep (for example, the two most recent ones).
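A redundancy policy boils down to "keep the N most recent backups". The following is a minimal Python sketch of that rule, not Barman's implementation; the `Backup` class and the `apply_redundancy` function are hypothetical, and backups are assumed to be ordered by the time they completed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    """Hypothetical, minimal record of a completed base backup."""
    backup_id: str
    end_time: datetime


def apply_redundancy(backups, redundancy):
    """Split backups into (retained, obsolete) under a redundancy policy.

    The `redundancy` most recent backups (by end time) are retained;
    every older backup is reported as obsolete.
    """
    if redundancy < 1:
        raise ValueError("redundancy must be a positive integer")
    newest_first = sorted(backups, key=lambda b: b.end_time, reverse=True)
    return newest_first[:redundancy], newest_first[redundancy:]


if __name__ == "__main__":
    weekly = [
        Backup("20130324T000000", datetime(2013, 3, 24)),
        Backup("20130331T000000", datetime(2013, 3, 31)),
        Backup("20130407T000000", datetime(2013, 4, 7)),
        Backup("20130414T000000", datetime(2013, 4, 14)),
    ]
    retained, obsolete = apply_redundancy(weekly, 2)
    print([b.backup_id for b in retained])  # the two newest backups
    print([b.backup_id for b in obsolete])  # candidates for deletion
```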
 
On the other hand, a retention policy based on a recovery window allows the DBA to specify a period of time (the recovery window). Barman then retains the backups and archived WAL files required for point-in-time recovery to any moment within that window.
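A recovery window needs slightly more care than redundancy: to reach the very start of the window, you must keep not only the backups that ended inside it, but also the newest backup that ended at or before its start, since WAL replay from that backup is what reaches the beginning of the window. The Python sketch below illustrates this reasoning; it is not Barman's code, and the `Backup` class and the `apply_recovery_window` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Backup:
    """Hypothetical, minimal record of a completed base backup."""
    backup_id: str
    end_time: datetime


def apply_recovery_window(backups, window, now):
    """Split backups into (retained, obsolete) under a recovery window policy."""
    # Earliest instant we must still be able to recover to.
    window_start = now - window
    newest_first = sorted(backups, key=lambda b: b.end_time, reverse=True)
    retained, obsolete = [], []
    have_base = False
    for b in newest_first:
        if b.end_time > window_start:
            # Backups that ended inside the window are always retained.
            retained.append(b)
        elif not have_base:
            # Newest backup that ended at or before the window start:
            # replaying WAL from here reaches the beginning of the window.
            retained.append(b)
            have_base = True
        else:
            # Older than the base backup: not needed for this window.
            obsolete.append(b)
    return retained, obsolete


if __name__ == "__main__":
    weekly = [
        Backup("20130324T000000", datetime(2013, 3, 24)),
        Backup("20130331T000000", datetime(2013, 3, 31)),
        Backup("20130407T000000", datetime(2013, 4, 7)),
        Backup("20130414T000000", datetime(2013, 4, 14)),
    ]
    retained, obsolete = apply_recovery_window(
        weekly, timedelta(days=7), now=datetime(2013, 4, 15)
    )
    print([b.backup_id for b in retained])
    print([b.backup_id for b in obsolete])
```

With a seven-day window ending on 15 April, the window starts on 8 April: the 14 April backup lies inside it, and the 7 April backup is kept as the base for recovering to the start of the window, so only the two March backups become obsolete.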
 
Retention policies are managed in Barman through the ‘retention_policy’ configuration option and the ‘barman cron’ command, which performs maintenance operations. In the next article we will go through the configuration of retention policies with some simple examples.
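To make the two policy types concrete, here is a sketch of how a ‘retention_policy’ value might be interpreted. It assumes the syntax documented in the Barman manual, for example `retention_policy = REDUNDANCY 2` or `retention_policy = RECOVERY WINDOW OF 4 WEEKS` in a server section of the configuration, with an empty value meaning that no policy is enforced; check the documentation of your Barman version. The `parse_retention_policy` function is hypothetical and not part of Barman, and approximating a month as 30 days is a simplification of this sketch only.

```python
import re
from datetime import timedelta

# Patterns for the two forms of the retention_policy option, following the
# syntax in the Barman manual (e.g. "REDUNDANCY 2", "RECOVERY WINDOW OF 4 WEEKS").
REDUNDANCY_RE = re.compile(r"^\s*REDUNDANCY\s+(\d+)\s*$", re.IGNORECASE)
WINDOW_RE = re.compile(
    r"^\s*RECOVERY\s+WINDOW\s+OF\s+(\d+)\s+(DAYS|WEEKS|MONTHS)\s*$", re.IGNORECASE
)
# Assumption of this sketch: a month is approximated as 30 days.
UNIT_DAYS = {"DAYS": 1, "WEEKS": 7, "MONTHS": 30}


def parse_retention_policy(value):
    """Parse a retention_policy value (hypothetical helper, not Barman's API).

    Returns None for an empty value (no policy enforced), otherwise
    ("redundancy", n) or ("recovery_window", timedelta).
    """
    if not value.strip():
        return None
    match = REDUNDANCY_RE.match(value)
    if match:
        kind, amount = "redundancy", int(match.group(1))
    else:
        match = WINDOW_RE.match(value)
        if not match:
            raise ValueError(f"unrecognised retention_policy: {value!r}")
        kind, amount = "recovery_window", int(match.group(1))
    if amount < 1:
        raise ValueError("the policy value must be a positive integer")
    if kind == "redundancy":
        return kind, amount
    return kind, timedelta(days=amount * UNIT_DAYS[match.group(2).upper()])


if __name__ == "__main__":
    print(parse_retention_policy("REDUNDANCY 2"))
    print(parse_retention_policy("RECOVERY WINDOW OF 4 WEEKS"))
```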
 
Where can I find Barman?
 
Barman is an open-source application for disaster recovery developed and maintained by 2ndQuadrant. It is written in Python and can be installed from source or via PyPI. RPM and Debian packages are also available for RHEL/CentOS 5/6, Debian and Ubuntu 12.04 LTS.
More information on Barman can be found on the website (www.pgbarman.org) and the available documentation.
 