This is part four of my technical response to a series of questions about the use of PgBouncer and what you need to look out for. To read the other posts in the series, use the links below.
Part 1: Installing, configuring and testing persistent PostgreSQL connection pooling
Part 2: PgBouncer Connection Pooling: What to do when persistent connectivity is lost
Part 3: Can PgBouncer handle failover to a new machine?
In this post we're going to try something a little more extreme than in the previous blogs. We’ll bring the Master database down completely (in a controlled fashion - we don’t want to break anything, just simulate a complete database outage) and see if the PgBouncer session can survive everything we throw at it!
For the purpose of demonstration, I’ll use EDB Failover Manager™ (EFM) to control failover and management of the Virtual IP (VIP) during this process. EFM detects and controls any deliberate or accidental failover between databases in streaming replication, and moves the VIP to whichever database is currently the Master, giving the application a single IP address with which to reach the database that holds the single point of truth in the system.
You don’t need to use EFM; there are other options available with similar capabilities and with varying levels of support. For instance, pgpool-ii can provide connection pooling, replication and load balancing in addition to controlling the VIP. You can find information on the community edition of pgpool-ii (including watchdog VIP management) on this website. And, if you are looking for enterprise support, you can learn about the EDB version of pgpool-ii on the EnterpriseDB website.
Performing Controlled Switchover
I’ll start by performing a controlled switchover (the “promote” command) using EFM; later, I’ll bring down the Master database completely to simulate a database failure. The VIP in my setup is always set to the reserved address 192.168.15.200, and PgBouncer always connects to this IP address. The database host holding the VIP is marked with an asterisk * to the right of the VIP column.
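For reference, PgBouncer itself only ever sees the VIP; it has no idea which physical host currently answers on that address. A minimal sketch of the relevant pgbouncer.ini entries, assuming the database name, listen port and pooling mode used throughout this series (your values may well differ):
[databases]
;; Point the pooled database at the VIP, never at an individual host
pgb = host=192.168.15.200 port=5444 dbname=pgb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6543
pool_mode = transaction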
# ./efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.
And checking the EFM cluster status:
# ./efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Master 192.168.15.6 UP UP 192.168.15.200*
Standby 192.168.15.8 UP UP 192.168.15.200
Witness 192.168.15.9 UP N/A 192.168.15.200
Allowed node host list:
192.168.15.9 192.168.15.6 192.168.15.8
Membership coordinator: 192.168.15.9
Standby priority host list:
192.168.15.8
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------------------
Master 192.168.15.6 0/9000060
Standby 192.168.15.8 0/9000060 0/9000060
Standby database(s) in sync with master. It is safe to promote.
Bringing Down the Master Database
Let’s bring down the Master, wait for EFM to fail over to the Standby, and see if PgBouncer can tell the difference. (A quick test just before doing so confirmed that our PgBouncer connection was still active.)
On the Master:
# systemctl stop edb-as-11
And, once automatic failover has completed, check the cluster status:
# ./efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Idle 192.168.15.6 UP UNKNOWN 192.168.15.200
Master 192.168.15.8 UP UP 192.168.15.200*
Witness 192.168.15.9 UP N/A 192.168.15.200
Allowed node host list:
192.168.15.9 192.168.15.6 192.168.15.8
Membership coordinator: 192.168.15.9
Standby priority host list:
(List is empty.)
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------------------
Master 192.168.15.8 0/A0001E0
No standby databases were found.
Idle Node Status (idle nodes ignored in WAL LSN comparisons):
Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------
192.168.15.6 UNKNOWN UNKNOWN Connection to 192.168.15.6:5444 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
Failover to the New Master Database
The new Master has taken over and taken control of the VIP, and the Idle Node Status warns that the old Master is no longer accepting connections.
So what happens on PgBouncer? Still in the same session as before:
pgb=# select * from test;
x
---
1
2
3
4
(4 rows)
Which is great news. The PgBouncer session is persistent and has failed over onto the new Master without incident.
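One quick way to confirm that the session is now being served by a promoted primary (a suggestion, not part of the original test) is to ask the server whether it is still in recovery; on the new Master this should return false:
pgb=# select pg_is_in_recovery();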
Once we reintroduce the failed Master back into the EFM cluster as a Standby, we get:
# ./efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Standby 192.168.15.6 UP UP 192.168.15.200
Master 192.168.15.8 UP UP 192.168.15.200*
Witness 192.168.15.9 UP N/A 192.168.15.200
Allowed node host list:
192.168.15.9 192.168.15.6 192.168.15.8
Membership coordinator: 192.168.15.9
Standby priority host list:
192.168.15.6
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------------------
Master 192.168.15.8 0/A0001E0
Standby 192.168.15.6 null 0/A000098
One or more standby databases are not in sync with the master database. The following node(s) returned a WAL LSN that does not match the master: 192.168.15.6
Resyncing the Master Database and EFM
We now have a DBA task to bring the old Master back into sync as a Standby and return the EFM cluster to a healthy state, but that’s to be expected.
So let’s do that and take everything back to the way it was before.
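The exact steps depend on how far the old Master has diverged. One common approach, sketched here with assumed paths and connection details for an EDB Advanced Server 11 install, is to rewind the old Master’s data directory against the new Master (pg_rewind needs wal_log_hints or data checksums to have been enabled), reconfigure it as a Standby (recovery.conf in version 11), restart it, and finally switch back so the cluster looks the way it did originally:
# pg_rewind --target-pgdata=/var/lib/edb/as11/data --source-server="host=192.168.15.8 port=5444 user=enterprisedb dbname=edb"
# systemctl start edb-as-11
# ./efm promote efm -switchover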
# ./efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Master 192.168.15.6 UP UP 192.168.15.200*
Standby 192.168.15.8 UP UP 192.168.15.200
Witness 192.168.15.9 UP N/A 192.168.15.200
Allowed node host list:
192.168.15.9 192.168.15.6 192.168.15.8
Membership coordinator: 192.168.15.9
Standby priority host list:
192.168.15.8
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------------------
Master 192.168.15.6 0/11000060
Standby 192.168.15.8 0/11000060 0/11000060
Standby database(s) in sync with master. It is safe to promote.
Finally, what happens if there is a big transaction in flight, and the Master database goes down?
Back to pgbench. I’m dropping the “test” table and initialising the pgbench run, all from PgBouncer.
pgb=# drop table test;
DROP TABLE
pgb=# exit
$ /usr/pgsql-11/bin/pgbench --initialize --scale=10 -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
dropping old tables...
creating tables...
generating data...
100000 of 1000000 tuples (10%) done (elapsed 0.02 s, remaining 0.14 s)
200000 of 1000000 tuples (20%) done (elapsed 0.04 s, remaining 0.18 s)
...
900000 of 1000000 tuples (90%) done (elapsed 1.69 s, remaining 0.19 s)
1000000 of 1000000 tuples (100%) done (elapsed 2.30 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
So, a baseline test with ten users, each with a fresh connection per transaction, on this dataset:
$ /usr/pgsql-11/bin/pgbench --client=10 --connect -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 100/100
latency average = 43.163 ms
tps = 231.679552 (including connections establishing)
tps = 234.195544 (excluding connections establishing)
Increasing Client Connections
But what if we increase the number of clients beyond the PgBouncer maximum connections? Well, we get a lot of these:
...
ERROR: no more connections allowed (max_client_conn)
client 997 aborted while establishing connection
connection to database "pgb" failed:
ERROR: no more connections allowed (max_client_conn)
client 998 aborted while establishing connection
connection to database "pgb" failed:
ERROR: no more connections allowed (max_client_conn)
client 999 aborted while establishing connection
transaction type: <builtin: TPC-B (sort of)>
...
number of transactions actually processed: 1000/10000
latency average = 2801.775 ms
tps = 356.916542 (including connections establishing)
tps = 356.993342 (excluding connections establishing)
What if we increase the PgBouncer maximum connections to 1000, while leaving the target database’s connection limit at 100?
;; Total number of clients that can connect
max_client_conn = 1000
The answer is that we hit PgBouncer’s query_wait_timeout, and we get a lot of:
...
client 990 aborted in command 4 (SQL) of script 0; ERROR: query_wait_timeout
client 991 aborted in command 4 (SQL) of script 0; ERROR: query_wait_timeout
client 992 aborted in command 4 (SQL) of script 0; ERROR: query_wait_timeout
client 993 aborted in command 4 (SQL) of script 0; ERROR: query_wait_timeout
client 994 aborted in command 4 (SQL) of script 0; ERROR: query_wait_timeout
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1000
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 250/10000
latency average = 484472.060 ms
tps = 2.064103 (including connections establishing)
tps = 2.066158 (excluding connections establishing)
So, while raising the maximum number of client connections allows all of the clients to connect, PgBouncer still times out most of their queries, as there are simply too many clients waiting for too few server connections. This behaviour can of course be tuned using the PgBouncer timeout configuration parameters, but that may cause even more problems than it solves if the additional load is anything other than a brief spike in connections.
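For reference, the knobs in play here all live in pgbouncer.ini: the client cap, the server pool size, and how long a query may queue before erroring. A sketch with illustrative values (only max_client_conn reflects the test above; the other figures are assumptions for demonstration):
;; Total number of clients that can connect
max_client_conn = 1000
;; Server connections allowed per database/user pair
default_pool_size = 100
;; Seconds a query may wait for a server connection before failing
query_wait_timeout = 120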
PgBouncer can, however, cope if we dial the client count down to something it can queue internally while still accepting more connections than the database itself will allow, say 200:
$ /usr/pgsql-11/bin/pgbench --client=200 --connect -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 200
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 2000/2000
latency average = 452.066 ms
tps = 442.412851 (including connections establishing)
tps = 442.971877 (excluding connections establishing)
And, not only are we handling more clients (double!) than the database can accept at any given time, the throughput has almost doubled too.
Failover in the Middle of Job Processing
Let’s run a big batch job now, and fail over to a standby in the middle of it. A hundred million rows should do.
$ /usr/pgsql-11/bin/pgbench --initialize --scale=1000 -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
dropping old tables...
creating tables...
generating data...
100000 of 100000000 tuples (0%) done (elapsed 0.01 s, remaining 14.52 s)
200000 of 100000000 tuples (0%) done (elapsed 0.05 s, remaining 26.93 s)
...
99900000 of 100000000 tuples (99%) done (elapsed 400.55 s, remaining 0.40 s)
100000000 of 100000000 tuples (100%) done (elapsed 400.97 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
I’m going to make this more realistic by using only one client connection, as if it were a “real” batch process.
Without a failover, running 10,000 transactions takes a little under two and a half minutes:
$ /usr/pgsql-11/bin/pgbench --client=1 --transactions=10000 -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 10000/10000
latency average = 14.169 ms
tps = 70.576774 (including connections establishing)
tps = 70.577073 (excluding connections establishing)
And, taking the Master database down in the middle of processing:
$ /usr/pgsql-11/bin/pgbench --client=1 --transactions=10000 -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
FATAL: terminating connection due to administrator command
ERROR: server conn crashed?
client 0 aborted in command 4 (SQL) of script 0; perhaps the backend died while processing
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 356/10000
latency average = 14.251 ms
tps = 70.168951 (including connections establishing)
tps = 70.173780 (excluding connections establishing)
Not so good. So if the server dies during a batch job, the run aborts and the in-flight work is lost. Which is, after all, the point of transaction management. That's the A for Atomicity in ACID…or, maybe not. Remember, we have our pooling mode set to transaction. What happens if we change it to statement pooling mode?
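Switching mode is a one-line change in pgbouncer.ini followed by a reload. A quick sketch (the RELOAD command goes through PgBouncer’s admin console, so this assumes the connecting user is listed in admin_users):
;; Pooling mode: session, transaction or statement
pool_mode = statement
$ psql -h 127.0.0.1 -p 6543 -U pgb pgbouncer -c "RELOAD;"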
$ /usr/pgsql-11/bin/pgbench --client=1 -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
client 0 aborted in command 4 (SQL) of script 0; ERROR: transaction blocks not allowed in statement pooling mode
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 0/10
We can’t use transactions in statement mode. However, if we run the built-in SELECT only script…
$ /usr/pgsql-11/bin/pgbench --client=1 --select-only -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1000
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 4.289 ms
tps = 233.178013 (including connections establishing)
tps = 235.642136 (excluding connections establishing)
No problem. And a decent throughput.
Removing the transactions, then, by using a custom script that runs the same statements without the BEGIN/END wrapper:
$ cat no_txn.sql
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
This now works:
$ /usr/pgsql-11/bin/pgbench --client=1 -f no_txn.sql -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
transaction type: no_txn.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 34.313 ms
tps = 29.143190 (including connections establishing)
tps = 29.173952 (excluding connections establishing)
Putting all that together, scaling it up, and pulling the database out and failing over in the middle of processing:
$ /usr/pgsql-11/bin/pgbench --client=1 --transactions=10000 -f no_txn.sql -U pgb -h 127.0.0.1 -p 6543 pgb
Password:
starting vacuum...end.
client 0 aborted in command 8 (SQL) of script 0; FATAL: terminating connection due to administrator command
transaction type: no_txn.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 122/10000
latency average = 32.393 ms
tps = 30.871287 (including connections establishing)
tps = 30.873907 (excluding connections establishing)
Not only did the run fail overall, but some (though not all) of the individual statements succeeded and were committed, leaving the data partially updated! We obviously want to avoid this at all costs, so don’t do it!
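You can see the effect in the data itself. The client aborted in command 8 of the script, the INSERT into pgbench_history, so the UPDATEs from that final iteration may have been committed with no matching history row. One illustrative check (not part of the original test) is to compare the history row count with the number of transactions pgbench reported as processed:
pgb=# select count(*) from pgbench_history;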
What Did We Learn?
In summary:
- PgBouncer connections can survive failover between databases when pointing to the VIP
- PgBouncer provides connection pooling and more efficient processing during periods of heavy, rapidly reconnecting session activity
- Connection levels that exceed PgBouncer’s limits will cause it to reject or time out clients, producing processing errors
- PgBouncer connections work best in transaction pool mode during failover to preserve atomicity
- PgBouncer in transaction mode does not work effectively for long-running batch jobs during failover
- PgBouncer in statement mode can survive failover during batch jobs that do not contain transactions, but it will not process all statements and can leave partial results, so it should not be used for this
In the next part of this series, we’ll evaluate whether PgBouncer can survive Transient Network Events.