High-Availability Solutions: Patroni Failover and Switchover

November 06, 2023

Introduction

Patroni is an open-source tool for managing replication and failover of a PostgreSQL cluster. It is popular to use while implementing high-availability due to its handling of replication and fast failover and switchover times. To understand these times better, we benchmarked the failover and switchover performance within a Patroni cluster.

To learn more about Patroni: https://www.enterprisedb.com/docs/supported-open-source/patroni/.

Benchmark Specifications

The purpose of this benchmark is to measure the downtime of a Patroni cluster during two different scenarios:

  1. Switchover - one standby is gracefully promoted to leader
  2. Failover - one standby is promoted to leader following a Postgres crash on the leader.

In each case, downtime is measured with traffic generated by pgbench.

Environment

The Patroni cluster is composed of:

  • 3 PostgreSQL nodes, all located in the same region, each with ETCD DCS on the system
  • 1 PgBouncer pooler node configured with HAProxy

System Characteristics

The characteristics of the cloud hosts involved in the test are as follows:

AWS Instance Type r5b.2xlarge
Operating System Rocky8
vCPU 8
Memory 64GB

Postgres Configuration

The following are configuration parameters found in the postgresql.conf file:

shared_buffers 16GB
checkpoint_completion_target 0.93
checkpoint_timeout 30min
max_wal_size 200GB
effective_cache_size 80GB
random_page_cost 1
seq_page_cost 1
effective_io_concurrency 200

Patroni Configuration

The following are parameters found in each servers' respective Patroni configuration file:

ttl 30
loop_wait 10
retry_timeout 10
maximum_lag_on_failover 1048576

Downtime Measurement

In this benchmark, downtime is defined as the time during which the database system is not able to handle incoming SQL requests. Incoming requests are made through one HAProxy connection.

Downtime is measured with the help of a script connected to the pooler instance that is inserting records as fast as possible into a table.

Each record contains:

  • the current timestamp returned by the now() Postgres function;
  • the name of the current Patroni node actually making record insertion.

In case of connection loss, the script retries connecting until a new connection can be made.

With this, we can find:

  • The timestamp of the last record inserted before the switchover or failover happened;
  • The timestamp of the first record inserted after the switchover or failover happened.

Measurement Table

The following is the DDL used to create the ping table in charge of gathering timestamps:

CREATE TABLE ping(node TEXT, timestamp TIMESTAMPTZ DEFAULT NOW(), cur_lsn pg_lsn);

PgBouncer Settings

PgBouncer was run in transaction session mode.

HAProxy Settings

The following is the HAProxy configuration file:

defaults
   retries 2
   timeout connect 1s
   timeout check 1s
   timeout server 30m
   timeout client 30m

listen postgres
   stick-table type ip size 1
   stick on dst
   option httpchk OPTIONS /primary
   http-check expect status 200
   default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions

Results

Switchover

Patroni switchover operation is triggered by the execution of the following command on the current leader:

$ patronictl switchover --candidate postgres2 --force

The following are the statistical results of ten switchover tests:

Min 15.049ms
Max 15.354ms
Average 15.199ms
Std. Dev.  0.0989ms

Failover

Patroni failover operation is triggered by the execution of the following command on the current leader, where user postgres owns both PostgreSQL and Patroni services:

$ sudo pkill -u postgres

The following are the statistical results of ten failover tests:

Min 15.037ms
Max 15.347ms
Average 15.231ms
Std. Dev.   0.1215ms

Patroni Timeline

With each execution of either a failover or a switchover, Patroni moves its timeline forward by one.

The following shows the Patroni cluster information with a switchover occurring:

# Before switchover happens
# postgres1 is leader with replicas postgres2 and postgres3
Current cluster topology
+ Cluster: main -------------------------------+----+-----------+
| Member    | Host       | Role    | State     | TL | Lag in MB |
+-----------+------------+---------+-----------+----+-----------+
| postgres1 | xx.x.x.xxx | Leader  | running   | 22 |           |
| postgres2 | xx.x.x.xxx | Replica | streaming | 22 |         0 |
| postgres3 | xx.x.x.xxx | Replica | streaming | 22 |         0 |
+-----------+------------+---------+-----------+----+-----------+

# Switchover occurs, postgres1 is stopped and postgres2 is new leader
+ Cluster: main -----------------------------+----+-----------+
| Member    | Host       | Role    | State   | TL | Lag in MB |
+-----------+------------+---------+---------+----+-----------+
| postgres1 | xx.x.x.xxx | Replica | stopped |    |   unknown |
| postgres2 | xx.x.x.xxx | Leader  | running | 22 |           |
| postgres3 | xx.x.x.xxx | Replica | running | 22 |         0 |
+-----------+------------+---------+---------+----+-----------+

# postgres1 is now replica caught up with cluster
# timeline moved forward from 22 to 23
+ Cluster: main -------------------------------+----+-----------+
| Member    | Host       | Role    | State     | TL | Lag in MB |
+-----------+------------+---------+-----------+----+-----------+
| postgres1 | xx.x.x.xxx | Replica | streaming | 23 |         0 |
| postgres2 | xx.x.x.xxx | Leader  | running   | 23 |           |
| postgres3 | xx.x.x.xxx | Replica | streaming | 23 |         0 |
+-----------+------------+---------+-----------+----+-----------+

Conclusion

Both the failover and switchover times in a Patroni cluster are consistent and quick. With pooler and proxy setup, downtime can be fewer than 1 second. Using a proxy is recommended so that your cluster has a single endpoint to connect to. To improve durability, setting up a standalone ETCD server as a part of the ETCD cluster to ensure the DCS is maintained during system crashes is recommended.

Share this