Data Backup Made Easy

November 16, 2016

Contributed by Ahsan Hadi

Backup strategies are critical to enterprise data management, but the time, storage, and management requirements pose a host of challenges. Because companies manage huge volumes of data that continue to grow daily, the time and storage required for backup processes can hinder database performance and, ultimately, the customer experience.

Available solutions and methods for doing backups have also limited the database administrator’s flexibility in managing backup processes efficiently. Scheduling major backups during off-hours, for example, still comes at great cost to the organization in time and system disruption.

EnterpriseDB® (EDB) is working to address the DBA’s challenges and provide more flexibility in managing backups. The EDB Postgres Backup and Recovery Tool currently provides support for taking Full Hot Backups; it takes a complete file-level backup of the database server while the server is still running.

The next major release of EDB Postgres Backup and Recovery Tool will feature support for block-level incremental backup so the user can back up data that has changed since the last full or incremental backup. (EDB Postgres Backup and Recovery Tool v2.1 is planned for beta testing in December 2016.) This new capability will make backups faster and take up less storage space as well as make recovery operations faster to execute.

Incremental Backup: How it Works

For users of EDB Postgres Backup and Recovery Tool, the following is a snapshot of how to execute an incremental backup: 

To take an incremental backup, the user passes the --parent switch to the EDB Postgres Backup and Recovery Tool BACKUP sub-command. The help menu of the BACKUP sub-command is shown below. Please note that the incremental backup functionality will work only with EDB Postgres Advanced Server 9.5/PostgreSQL 9.5 and subsequent releases. It will not work with earlier versions.

$ bart BACKUP --help
./bart: backup and recovery tool

Usage:
 ./bart BACKUP [OPTION]...

Options:
  -h, --help           Show this help message and exit
  -s, --server         Name of the server or 'all' to specify all servers
  -F, --format=p|t     Backup output format (tar (default) or plain)
  -z, --gzip           Enables gzip compression of tar files
  -c, --compress-level Specifies the compression level (1 through 9, 9
                       being best compression)

  --backup-name        Specify a friendly name for the current backup
  --parent             Specify parent backup for incremental backup

Below is the command for creating an incremental backup. The user would pass the backup-id of an earlier full or incremental backup to the EDB Postgres Backup and Recovery Tool BACKUP sub-command using the --parent switch. This will invoke the incremental backup operation and will result in a block-level incremental backup:

$ bart -c ~/bart.cfg backup -s ppas-9.5 -F p --parent 1473923937340

The user can run the EDB Postgres Backup and Recovery Tool show-backups command at any time to see details about the full and incremental backups taken so far.

In the following output, show-backups lists one full backup and one incremental backup:

$ bart show-backups -t

SERVER NAME    : ppas-9.5
BACKUP ID      : 1473923937340
PARENT ID      : none
BACKUP NAME    : none
BACKUP STATUS  : active
BACKUP TIME    : 2016-09-15 07:20:01 UTC
BACKUP SIZE    : 7.36 GB
WAL(S) SIZE    : 16.00 MB
NO. OF WALS    : 1
FIRST WAL FILE : 000000010000000600000011
CREATION TIME  : 2016-09-15 07:20:03 UTC
LAST WAL FILE  : 000000010000000600000011
CREATION TIME  : 2016-09-15 07:20:03 UTC

SERVER NAME    : ppas-9.5
BACKUP ID      : 1473536937898
PARENT ID      : 1473535329505
BACKUP NAME    : none
BACKUP STATUS  : active
BACKUP TIME    : 2016-09-10 19:48:58 UTC
BACKUP SIZE    : 35.07 MB
WAL(S) SIZE    : 6.02 GB
NO. OF WALS    : 385
FIRST WAL FILE : 000000010000000400000090
CREATION TIME  : 2016-09-15 04:31:16 UTC
LAST WAL FILE  : 000000010000000600000010
CREATION TIME  : 2016-09-15 07:18:57 UTC
FIRST RESTORE  : 2016-09-10 20:02:20 XLOG
LATEST RESTORE : 2016-09-10 20:19:53 XLOG
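As a rough illustration of how this listing can be read programmatically, the Python sketch below parses a few of the fields shown above and cross-checks the WAL size against the WAL count. It assumes that a PARENT ID of "none" marks a full backup (inferred from the output above) and that WAL segments use PostgreSQL's default size of 16 MB. The helper names are hypothetical and are not part of the tool.

```python
WAL_SEGMENT_MB = 16  # PostgreSQL's default WAL segment size (assumed here)

# Excerpt of the show-backups output above, reduced to a few fields.
SAMPLE = """\
SERVER NAME    : ppas-9.5
BACKUP ID      : 1473923937340
PARENT ID      : none
BACKUP SIZE    : 7.36 GB
NO. OF WALS    : 1

SERVER NAME    : ppas-9.5
BACKUP ID      : 1473536937898
PARENT ID      : 1473535329505
BACKUP SIZE    : 35.07 MB
NO. OF WALS    : 385
"""


def parse_backups(text):
    """Split 'KEY : value' lines into one dict per backup record."""
    records = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines carry no field
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "SERVER NAME" or not records:
            records.append({})  # each record starts with SERVER NAME
        records[-1][key] = value
    return records


def kind(record):
    """Assumption: a parent of 'none' means the backup is a full backup."""
    return "full" if record["PARENT ID"] == "none" else "incremental"


def wal_size_gb(record):
    """Estimate the WAL volume from the number of 16 MB segments."""
    return int(record["NO. OF WALS"]) * WAL_SEGMENT_MB / 1024


backups = parse_backups(SAMPLE)
for backup in backups:
    print(backup["BACKUP ID"], kind(backup), f"{wal_size_gb(backup):.2f} GB of WAL")
```

For the second record, 385 segments of 16 MB come to about 6.02 GB, which matches the WAL(S) SIZE value reported in the listing.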

Restoring an incremental backup is just as simple. Provide the ID of the incremental backup, and EDB Postgres Backup and Recovery Tool will restore the corresponding full backup together with every incremental backup in the chain.

Here is the EDB Postgres Backup and Recovery Tool command for restoring an incremental backup:

$ bart -c ~/bart.cfg restore -s ppas-9.5 -i 1473924715211 -p /home/ec2-user/restore/ppas-9.5
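The idea of a backup chain can be sketched in a few lines of Python. This is only an illustration of following parent links back to a full backup, not the tool's implementation. The catalog layout and the backup names are hypothetical.

```python
def restore_chain(backup_id, parents):
    """Return backup IDs in restore order: the full backup first, backup_id last."""
    chain = []
    current = backup_id
    while current is not None:
        if current in chain:
            raise ValueError("cycle in parent links")
        if current not in parents:
            raise KeyError(f"unknown backup {current}")
        chain.append(current)
        current = parents[current]  # None marks a full backup
    chain.reverse()
    return chain


# Hypothetical catalog: backup ID -> parent ID (None for a full backup).
parents = {
    "full-sun": None,
    "incr-mon": "full-sun",
    "incr-tue": "incr-mon",
}

print(restore_chain("incr-tue", parents))
```

Asking for the Tuesday incremental yields the full backup, then Monday's incremental, then Tuesday's, which is the order in which they would have to be applied.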

In order to do a point-in-time recovery (PITR) with incremental backup, simply issue the following command and EDB Postgres Backup and Recovery Tool will perform the PITR:

$ bart -c ~/bart.cfg restore -s ppas-9.5 -i 1473478624481 -p /home/ec2-user/restore/ppas-9.5 -g '09-SEP-16 16:30:00'

PostgreSQL and EDB Postgres Advanced Server maintain the write-ahead log (WAL) in the pg_xlog subdirectory of the data directory. The WAL records every change made to the data files. WAL files are primarily needed for crash recovery: the database can be brought back to a consistent, current state by restoring the last base backup and replaying the WAL files. This matters here because the EDB Postgres Backup and Recovery Tool incremental backup feature is implemented by scanning the WAL files to identify the changes made since the last backup.

To provide the incremental backup feature, EDB Postgres Backup and Recovery Tool requires archiving to be enabled on the backup database server and archive_command to be set so that WAL files are sent to the BARTHOST. In the current version, customers can already use the BART INIT command to set archive_command automatically, so that the backup database server sends WAL files from the pg_xlog directory to the BARTHOST.

Block-level incremental backup is implemented using the WAL scanning approach. At a high level, this is achieved by scanning and analyzing all WAL files, either collectively or on a continuous basis, and copying the changed blocks from the database server to form an incremental backup. Incremental backup solutions for PostgreSQL from other vendors are file-based, which in simple terms means that they copy the entire data file even if only one small change was made to it. Block-level incremental backup copies only the data blocks modified since the last full or incremental backup.
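To make the difference concrete, here is a minimal Python sketch that compares how much data each approach would copy. It is not how the tool reads WAL. The WAL entries are simplified to hypothetical (relation file, block number) pairs, the file names and sizes are invented, and the block size is PostgreSQL's default of 8 kB.

```python
from collections import defaultdict

BLOCK_SIZE = 8192  # PostgreSQL's default block size in bytes


def modified_blocks(wal_records):
    """Group (relation_file, block_number) entries into file -> set of blocks."""
    changed = defaultdict(set)
    for relation_file, block_number in wal_records:
        changed[relation_file].add(block_number)
    return dict(changed)


def bytes_to_copy(changed, file_sizes, block_level=True):
    """Bytes an incremental backup would copy under each strategy."""
    if block_level:
        # Only the touched blocks are copied.
        return sum(len(blocks) * BLOCK_SIZE for blocks in changed.values())
    # File-based: every touched file is copied in full.
    return sum(file_sizes[name] for name in changed)


# Hypothetical WAL entries; block 7 is touched twice but copied once.
wal_records = [
    ("base/16384/16397", 0),
    ("base/16384/16397", 7),
    ("base/16384/16397", 7),
    ("base/16384/16402", 3),
]
# Hypothetical file sizes: 1024 and 512 blocks.
file_sizes = {
    "base/16384/16397": 1024 * BLOCK_SIZE,
    "base/16384/16402": 512 * BLOCK_SIZE,
}

changed = modified_blocks(wal_records)
block_bytes = bytes_to_copy(changed, file_sizes, block_level=True)
file_bytes = bytes_to_copy(changed, file_sizes, block_level=False)
print(block_bytes, file_bytes)
```

With these made-up numbers, the block-level approach copies three blocks (24,576 bytes), while the file-based approach copies both files in full (12,582,912 bytes).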

The EDB Postgres Backup and Recovery Tool incremental backup/restore solution consists of the following core modules:

1. WAL Scanner: The primary goal of the WAL scanner is to scan the WAL files and generate MBM (Modified Block Map) files. These files are later processed by the EDB Postgres Backup and Recovery Tool Harvester. The WAL scanner can run in the background or the foreground. When run in the background, the scanner spawns a process for every database server for which incremental backups are to be taken. Each scanner process scans the WAL files of its server and generates MBM files.

$ bart-scanner

INFO:  process created for server 'ppas-9.5', pid = 25356

INFO:  parsing WAL backlog

2. Harvester: The purpose of the EDB Postgres Backup and Recovery Tool Harvester module is to process the MBM files for an incremental backup and produce a CBM (Cumulative Block Map) that contains all of the MBM files for a given incremental backup. The Harvester then fetches the changed blocks from the database server using the libpq API. The Harvester module is invoked when the user executes the command to take an incremental backup.

$ bart -c ~/bart.cfg backup -s ppas-9.5 -F p --parent 1473923937340

3. Restorer: As the name suggests, the job of the Restorer module is to restore the incremental backup. As mentioned, every incremental backup has to have a parent full backup. It is possible to take multiple incremental backups after a full backup, but the chain must always start from a full backup. The Restorer will restore the full backup and all the incremental backups in the chain.

The EDB Postgres Backup and Recovery Tool commands below will invoke the Restorer; the first command is a complete restore and second is for point-in-time recovery with incremental backup:

$ bart -c ~/bart.cfg restore -s ppas-9.5 -i 1473924715211 -p /home/ec2-user/restore/ppas-9.5

$ bart -c ~/bart.cfg restore -s ppas-9.5 -i 1473478624481 -p /home/ec2-user/restore/ppas-9.5 -g '09-SEP-16 16:30:00'
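The way the three modules fit together can be sketched in Python. This is a simplified model under stated assumptions, not the tool's code: the article does not describe the MBM or CBM file formats, so block maps are modeled here as dictionaries of file name to block numbers, and backups as dictionaries of file name to block contents. All names and values are hypothetical.

```python
def merge_mbms(mbms):
    """Harvester step: union several modified-block maps into one cumulative map."""
    cbm = {}
    for mbm in mbms:
        for relation_file, blocks in mbm.items():
            cbm.setdefault(relation_file, set()).update(blocks)
    return cbm


def restore(full_backup, incrementals):
    """Restorer step: overlay each incremental, oldest first, on a copy of the full backup."""
    restored = {name: dict(blocks) for name, blocks in full_backup.items()}
    for incremental in incrementals:
        for name, blocks in incremental.items():
            restored.setdefault(name, {}).update(blocks)
    return restored


# Two hypothetical MBMs produced by the scanner between backups.
mbm_1 = {"t1": {0, 2}}
mbm_2 = {"t1": {2, 5}, "t2": {1}}
cbm = merge_mbms([mbm_1, mbm_2])
print(cbm)

# A hypothetical full backup and two incrementals holding only changed blocks.
full = {"t1": {0: "a0", 1: "a1", 2: "a2"}}
incr_1 = {"t1": {2: "b2"}}
incr_2 = {"t1": {2: "c2", 3: "c3"}}
restored = restore(full, [incr_1, incr_2])
print(restored)
```

Block 2 of t1 appears in both incrementals, and the restored copy ends up with the newer version because the incrementals are applied in chain order.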

When to Use Incremental Backup

A typical use case for incremental backup on a large database is the following. A DBA takes a full backup on the weekend, which consumes a lot of space and takes a lot of time. Each weekday, the DBA takes an incremental backup, which takes significantly less space and time because it only backs up data modified since the last full or incremental backup.

The DBA may also choose not to perform a full backup every weekend; it can be done every two weeks or at some other frequency. The benefit of taking a full backup every weekend is that a restore on Friday only has to go back to the full backup from the prior weekend. If full backups are several weeks apart, the restore has to go all the way back to the last full backup, however old it is.
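The trade-off can be put into numbers with a small Python sketch. It assumes one incremental backup per weekday (Monday through Friday), each taken against the previous backup, and a restore that includes that day's incremental. The function name is hypothetical.

```python
def chain_length(weeks_since_full, weekday_index):
    """Number of backups a restore has to apply.

    weeks_since_full: complete weeks elapsed since the weekend of the full backup
    weekday_index: 1 = Monday ... 5 = Friday (that day's incremental is included)
    """
    incrementals = 5 * weeks_since_full + weekday_index
    return 1 + incrementals  # one full backup plus every incremental since it


# Weekly full backups: a Friday restore applies 1 full + 5 incrementals.
print(chain_length(0, 5))

# Full backups every two weeks: the second Friday applies 1 full + 10 incrementals.
print(chain_length(1, 5))
```

Under these assumptions the restore chain grows from 6 backups to 11 when the full backup interval is doubled.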

The incremental backup and restore have the following major benefits:

  • Significant reduction in time required for a backup. Taking a full backup of a large database can take a great deal of time; taking an incremental backup since the last full or incremental backup reduces that time significantly.
  • Reduced storage requirements. Taking a full backup for a large database requires enormous storage space; an incremental backup will typically be much smaller as it’s only backing up the modified database blocks.
  • Faster recovery with incremental backup and PITR. PITR with incremental backup will be faster because fewer WAL files need to be replayed after restoring an incremental backup than after restoring only the full backup.

EnterpriseDB will make more information available as EDB Postgres Backup and Recovery Tool v2.1 becomes available. To learn more today, contact sales@enterprisedb.com.

Ahsan Hadi is Senior Director, Product Development, at EnterpriseDB. 
