Back to the Future Pt. 2: How to use pg_rewind with PostgreSQL 9.5

September 28, 2016

backtothefuture_02

In the previous blog article we have seen how pg_rewind works with a simple HA cluster, composed of a master node replicating to a standby. In this context, an eventual switchover involves just two nodes that have to be aligned. But what happens with HA clusters when there are several (also cascading) standbys?

Now, consider a more complicated HA cluster, composed of a master with two standbys, based on PostgreSQL 9.5; similar to what has been made in the first blog article dedicated to pg_rewind, we now create a master node replicating to two standby instances. Let’s start with the master:

# Set PATH variable
export PATH=/usr/pgsql-9.5/bin:${PATH}

# This is the directory where we will be working on
# Feel free to change it and the rest of the script
# will adapt itself
WORKDIR=/var/lib/pgsql/9.5

# Environment variables for PGDATA and archive directories
MASTER_PGDATA=${WORKDIR}/master
STANDBY1_PGDATA=${WORKDIR}/standby1
STANDBY2_PGDATA=${WORKDIR}/standby2
ARCHIVE_DIR=${WORKDIR}/archive

# Initialise the cluster
initdb --data-checksums -D ${MASTER_PGDATA}

# Basic configuration of PostgreSQL
cat >> ${MASTER_PGDATA}/postgresql.conf <<EOF
archive_command = 'cp %p ${ARCHIVE_DIR}/%f'
archive_mode = on
wal_level = hot_standby
max_wal_senders = 10
min_wal_size = '32MB'
max_wal_size = '32MB'
hot_standby = on
wal_log_hints = on
EOF

cat >> ${MASTER_PGDATA}/pg_hba.conf <<EOF
# Trust local access for replication
# BE CAREFUL WHEN DOING THIS IN PRODUCTION
local replication replication trust
EOF

# Create the archive directory
mkdir -p ${ARCHIVE_DIR}

# Start the master
pg_ctl -D ${MASTER_PGDATA} -l ${WORKDIR}/master.log start

# Create the replication user
psql -c "CREATE USER replication WITH replication"

And now let’s proceed with the first standby server:

# Create the first standby
pg_basebackup -D ${STANDBY1_PGDATA} -R -c fast -U replication -x

cat >> ${STANDBY1_PGDATA}/postgresql.conf <<EOF
port = 5433
EOF

# Start the first standby
pg_ctl -D ${STANDBY1_PGDATA} -l ${WORKDIR}/standby1.log start

Likewise, we can now create the second standby:

# Create the second standby
pg_basebackup -D ${STANDBY2_PGDATA} -R -c fast -U replication -x

cat >> ${STANDBY2_PGDATA}/postgresql.conf <<EOF
port = 5434
EOF

# Start the second standby
pg_ctl -D ${STANDBY2_PGDATA} -l ${WORKDIR}/standby2.log start

Consider also in this case, a low amount of WALs are kept on the master side (note the setting of the max_wal_size parameter), but properly archived.

Now, let’s insert some data on the master: it will be visible on both (hot) standbys.

Next, promote one of the two standbys (for instance, the one based on ${STANDBY1_PGDATA}), leaving the master and the other standby nodes as they are:

pg_ctl -D ${STANDBY1_PGDATA} promote

If you update the master, no changes will be visible from the promoted standby, while they are visible from the other one; in the archive/ directory it is possible to see the file 00000002.history, that shows there has been a change in the timeline during the promotion, as seen in the previous case.

Now, let’s recreate the original HA cluster, where the first promoted standby will be the new master, and the other two nodes will be the new standbys. Using pg_rewind, the procedure is:

  1. stop the old master, and resync it with the promoted standby through pg_rewind
  2. stop the second standby, and resync it with the promoted standby through pg_rewind

So, start with step 1:

~$ pg_ctl -D ${MASTER_PGDATA} stop
waiting for server to shut down.... done
server stopped
~$ pg_rewind --target-pgdata=${MASTER_PGDATA} \
    --source-server="port=5433 user=postgres dbname=postgres"
servers diverged at WAL position 0/501E680 on timeline 1
could not open file "/var/lib/pgsql/9.5/master/pg_xlog/000000010000000000000005": No such file or directory

could not find previous WAL record at 0/501E680
Failure, exiting

Well, we missed the WALs! Yes, I’m redundant, but as I said before be sure that WALs are archived! Here, the procedure is to manually copy the missing WALs and put them into the pg_xlog/ of the master, then run pg_rewind again (but consider using Barman for this):

~$ cp ${ARCHIVE_DIR}/00000001000000000000000[56] ${MASTER_PGDATA}/pg_xlog/
~$ pg_rewind --target-pgdata=${MASTER_PGDATA} \
    --source-server="port=5433 user=postgres dbname=postgres"
servers diverged at WAL position 0/501E680 on timeline 1
rewinding from last common checkpoint at 0/501E5D8 on timeline 1
Done!

Fix the primary_conninfo parameter in the recovery.conf and the port parameter in the postgresql.conf, and the old master is now ready to follow up the promoted standby. Let’s also do the same with the second standby:

~$ pg_ctl -D ${STANDBY2_PGDATA} stop
waiting for server to shut down.... done
server stopped
~$ pg_rewind --target-pgdata=${STANDBY2_PGDATA} \
    --source-server="port=5433 user=postgres dbname=postgres"
could not find common ancestor of the source and target cluster's timelines
Failure, exiting

So, you see, the second standby has to be resynchronised from the promoted standby through a new base backup…

Conclusions

pg_rewind is really useful to resynchronise the nodes of a PostgreSQL HA cluster. For more complex infrastructures with several standby servers, it is not possible to resynchronise each node just with this tool.

However, even in this case, eventual downtime of standbys is reduced: a single standby node can be newly aligned to a promoted standby, and further standbys can be added gradually with a full base backup.

PostgreSQL 9.6 will introduce a really amazing new feature for pg_rewind: its horizon of visibility can be extended and a common ancestor of timelines of all the nodes can be found within an HA cluster… don’t miss Part 3 – dedicated to the latest news about pg_rewind in PostgreSQL 9.6!

Share this