(This is part two of my technical response to a series of questions about the use of pgbouncer and what you need to look out for. Part one can be found here)
So, in Part One of this blog we completed the installation of PgBouncer, configuration and a brief network timeout using PgBouncer with no problem at all. But what happens with a minute’s downtime, and what if we’re in the middle of a transaction while it happens? Does PgBouncer maintain connectivity, without error, even when the connection to the database is catastrophically lost?
On our master database, we now start a session and issue a command.
pgb=# select * from test;
x
---
1
2
3
4
(4 rows)
On the target database box we take the NIC (network interface) down for a minute, and issue another command while connectivity is lost:
# ifdown eth0 && sleep 60 && ifup eth0
Meanwhile, back in PgBouncer:
pgb=# select * from test;
...Which now hangs, however, when the database NIC is back up, psql returns...
x
---
1
2
3
4
(4 rows)
Looks good! The SELECT is held in flight during the minute long network outage, and simply reconnects and completes when it can.
Now for something a little more advanced, let’s restart the database on the remote server, and see what happens.
pgb=# select * from test;
x
---
1
2
3
4
(4 rows)
pgb=# select * from test;
FATAL: terminating connection due to administrator command
ERROR: server conn crashed?
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
pgb=# select * from test;
x
---
1
2
3
4
(4 rows)
A partial success. Our psql session connecting to PgBouncer was persistent, but we get an error message telling us that the server connection has been reset. However, immediately re-issuing the statement we get the result we want without having to reconnect to PgBouncer.
So, how do we avoid getting the error message about the server connection reset, and having to re-issue our statement?
Well, the simple answer is we change the 'pool_mode' = 'transaction'.
This is taken from the official pgbouncer documentation (slightly amended)
server_reset_query_always
Whether server_reset_query should be run in all pooling modes. When this setting is off (default), the server_reset_query will be run only in pools that are in sessions-pooling mode. Connections in transaction-pooling mode should not have any need for reset query.
It is a workaround for broken setups that run apps that use session features over transaction-pooled PgBouncer. It changes non-deterministic breakage to deterministic breakage - client always lose their state after each transaction.
Default: 0
So, this parameter would be a workaround for our problem while in session mode. But, do we want to hack a fix or do the right thing? Of course, we want to do the right thing, so we make our pool_mode (at a minimum) transaction level, which in turn should make our sessions persistent.
Therefore, rather than ‘fix’ our broken PgBouncer, we will do the sensible thing and move to transaction-level pooling:
$ cat /etc/pgbouncer/pgbouncer.ini | grep pool_mode
pool_mode = transaction
...And restart PgBouncer.
And now, on the PgBouncer side, we reconnect and…
$ psql -U pgb -h 127.0.0.1 -p 6543 pgb
Password for user pgb:
Type "help" for help.
pgb=# select * from test;
x
---
1
2
3
4
(4 rows)
...And, on the database server:
# systemctl restart edb-as-11.service
...And then, back on the PgBouncer session again:
pgb=# select * from test;
x
---
1
2
3
4
(4 rows)
Success, once more. Next, we will go on to look at what to do if you have a complete failover.