Introduction
Patroni is an open-source tool for managing replication and failover of a PostgreSQL cluster. It is a popular choice for implementing high availability thanks to its replication handling and its fast failover and switchover times. To understand these times better, we benchmarked failover and switchover performance within a Patroni cluster.
To learn more about Patroni, see https://www.enterprisedb.com/docs/supported-open-source/patroni/.
Benchmark Specifications
The purpose of this benchmark is to measure the downtime of a Patroni cluster during two different scenarios:
- Switchover - one standby is gracefully promoted to leader.
- Failover - one standby is promoted to leader after a PostgreSQL crash on the leader.
In each case, downtime is measured with traffic generated by pgbench.
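For reference, comparable traffic can be generated with a pgbench invocation along the following lines, where the host, port, client count, and duration are illustrative rather than the benchmark's exact values:
$ pgbench -h <pooler-host> -p 6432 -c 16 -j 4 -T 600 bench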
Environment
The Patroni cluster is composed of:
- 3 PostgreSQL nodes, all located in the same region, each also running an etcd member for the DCS (Distributed Configuration Store)
- 1 pooler node running PgBouncer and HAProxy
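Schematically, with the pooler node sitting between the clients and whichever node currently holds the leader role:

 pgbench / measurement script
             |
             v
   pooler node (HAProxy + PgBouncer)
             |
             v   routed to the current leader
 +-----------+-----------+
 |           |           |
 postgres1   postgres2   postgres3
  + etcd      + etcd      + etcd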
System Characteristics
The characteristics of the cloud hosts involved in the test are as follows:
AWS Instance Type | r5b.2xlarge |
Operating System | Rocky Linux 8 |
vCPU | 8 |
Memory | 64GB |
Postgres Configuration
The following are configuration parameters found in the postgresql.conf file:
shared_buffers | 16GB |
checkpoint_completion_target | 0.93 |
checkpoint_timeout | 30min |
max_wal_size | 200GB |
effective_cache_size | 80GB |
random_page_cost | 1 |
seq_page_cost | 1 |
effective_io_concurrency | 200 |
Patroni Configuration
The following are parameters found in each server's Patroni configuration file:
ttl | 30 |
loop_wait | 10 |
retry_timeout | 10 |
maximum_lag_on_failover | 1048576 |
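Here ttl is the lifetime, in seconds, of the leader key in the DCS; loop_wait is the number of seconds Patroni sleeps between iterations of its HA loop; retry_timeout bounds, in seconds, the retries of DCS and PostgreSQL operations; and maximum_lag_on_failover is the maximum replication lag, in bytes, a replica may have while still being eligible for promotion. In a typical installation these settings live in the dcs section of the Patroni configuration, roughly as follows (the surrounding layout is illustrative):

# patroni.yml (illustrative excerpt)
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576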
Downtime Measurement
In this benchmark, downtime is defined as the time during which the database system is unable to handle incoming SQL requests. All requests are made through a single HAProxy endpoint.
Downtime is measured with the help of a script that connects to the pooler instance and inserts records into a table as fast as possible.
Each record contains:
- the current timestamp, as returned by the now() Postgres function;
- the name of the Patroni node actually performing the insert;
- the current WAL LSN, stored in the cur_lsn column.
If the connection is lost, the script retries until a new connection can be established (a sketch of such a script follows the table definition below).
From the collected records we can find:
- the timestamp of the last record inserted before the switchover or failover;
- the timestamp of the first record inserted after it.
The difference between the two is the measured downtime.
Measurement Table
The following is the DDL used to create the ping table that gathers these records:
CREATE TABLE ping(node TEXT, timestamp TIMESTAMPTZ DEFAULT NOW(), cur_lsn pg_lsn);
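A minimal sketch of such a measurement script, assuming Python with the psycopg2 driver; the connection string and the use of inet_server_addr() to identify the serving node are illustrative assumptions, not the benchmark's actual code:

# Sketch only: the DSN and the node-identification method are assumptions.
import time
import psycopg2

DSN = "host=pooler.example port=6432 dbname=bench user=bench"  # illustrative endpoint

def connect():
    # Keep retrying until the pooler accepts connections again.
    while True:
        try:
            conn = psycopg2.connect(DSN)
            conn.autocommit = True
            return conn
        except psycopg2.OperationalError:
            time.sleep(0.01)

conn = connect()
while True:
    try:
        with conn.cursor() as cur:
            # timestamp defaults to now(); inet_server_addr() stands in for
            # "the node performing the insert".
            cur.execute(
                "INSERT INTO ping(node, cur_lsn) "
                "SELECT inet_server_addr()::text, pg_current_wal_lsn()"
            )
    except psycopg2.Error:
        try:
            conn.close()
        except psycopg2.Error:
            pass
        conn = connect()

The downtime is then simply the gap between the newest timestamp written by the old leader and the oldest one written by the new leader, read directly from the ping table (the node column shows where each insert landed).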
PgBouncer Settings
PgBouncer was run in transaction pooling mode.
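PgBouncer's pooling mode is controlled by the pool_mode setting (session, transaction, or statement) in pgbouncer.ini:

; pgbouncer.ini (illustrative excerpt)
[pgbouncer]
pool_mode = transaction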
HAProxy Settings
The following is the HAProxy configuration file:
defaults
    retries 2
    timeout connect 1s
    timeout check 1s
    timeout server 30m
    timeout client 30m

listen postgres
    stick-table type ip size 1
    stick on dst
    option httpchk OPTIONS /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
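The health check in this configuration leans on Patroni's REST API: an OPTIONS request to the /primary endpoint returns HTTP 200 only on the current leader and an error status (typically 503) on replicas, so HAProxy keeps only the leader marked up. The check can be reproduced by hand; the host below is illustrative and 8008 is the REST API's default port:

# Probe Patroni's REST API the way the HAProxy health check does.
from urllib import request, error

req = request.Request("http://10.0.0.1:8008/primary", method="OPTIONS")
try:
    with request.urlopen(req, timeout=1) as resp:
        print(resp.status)   # 200: this member is the leader
except error.HTTPError as exc:
    print(exc.code)          # e.g. 503: this member is a replica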
Results
Switchover
A Patroni switchover is triggered by executing the following command on the current leader:
$ patronictl switchover --candidate postgres2 --force
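The --candidate option names the member to promote, and --force skips the interactive confirmation that patronictl would otherwise ask for, which makes the command convenient to script across repeated runs.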
The following are the statistical results of ten switchover tests:
Min | 15.049ms |
Max | 15.354ms |
Average | 15.199ms |
Std. Dev. | 0.0989ms |
Failover
A Patroni failover is triggered by killing all processes owned by the postgres user on the current leader; in this setup the postgres user owns both the PostgreSQL and Patroni services:
$ sudo pkill -u postgres
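Since this kills Patroni together with PostgreSQL, the old leader cannot report its own demise: the failover starts once the leader key expires in the DCS (bounded by the ttl setting above) and one of the remaining replicas takes over the lock.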
The following are the statistical results of ten failover tests:
Min | 15.037ms |
Max | 15.347ms |
Average | 15.231ms |
Std. Dev. | 0.1215ms |
Patroni Timeline
With each execution of either a failover or a switchover, a replica is promoted and the PostgreSQL timeline (the TL column below) moves forward by one.
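The current timeline can also be read from SQL, for example from the timeline_id field returned by the pg_control_checkpoint() function.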
The following patronictl output shows the cluster before, during, and after a switchover:
# Before switchover happens
# postgres1 is leader with replicas postgres2 and postgres3
Current cluster topology
+ Cluster: main -------------------------------+----+-----------+
| Member    | Host       | Role    | State     | TL | Lag in MB |
+-----------+------------+---------+-----------+----+-----------+
| postgres1 | xx.x.x.xxx | Leader  | running   | 22 |           |
| postgres2 | xx.x.x.xxx | Replica | streaming | 22 |         0 |
| postgres3 | xx.x.x.xxx | Replica | streaming | 22 |         0 |
+-----------+------------+---------+-----------+----+-----------+
# Switchover occurs, postgres1 is stopped and postgres2 is new leader
+ Cluster: main -----------------------------+----+-----------+
| Member    | Host       | Role    | State   | TL | Lag in MB |
+-----------+------------+---------+---------+----+-----------+
| postgres1 | xx.x.x.xxx | Replica | stopped |    |   unknown |
| postgres2 | xx.x.x.xxx | Leader  | running | 22 |           |
| postgres3 | xx.x.x.xxx | Replica | running | 22 |         0 |
+-----------+------------+---------+---------+----+-----------+
# postgres1 is now replica caught up with cluster
# timeline moved forward from 22 to 23
+ Cluster: main -------------------------------+----+-----------+
| Member    | Host       | Role    | State     | TL | Lag in MB |
+-----------+------------+---------+-----------+----+-----------+
| postgres1 | xx.x.x.xxx | Replica | streaming | 23 |         0 |
| postgres2 | xx.x.x.xxx | Leader  | running   | 23 |           |
| postgres3 | xx.x.x.xxx | Replica | streaming | 23 |         0 |
+-----------+------------+---------+-----------+----+-----------+
Conclusion
Both the failover and switchover times in a Patroni cluster are quick and consistent. With a pooler and proxy set up, downtime can be less than one second. Using a proxy is recommended so that your cluster has a single endpoint to connect to. To improve durability, it is also recommended to add a standalone etcd server to the etcd cluster, so that the DCS is maintained during system crashes.