“Sorry, too many clients already” - How to Use PgPool to Solve Database Connection Load Problems

April 16, 2020

With the ongoing crisis of COVID-19, I keep myself updated with the latest situation around the globe. The WHO’s website is one I’ve used frequently—and even there I began to intermittently see the error message “sorry, too many clients already“ in my browser. In seeing this, I realized how common it is to underestimate the connection count while we are busy configuring the application.

The above is a scenario we may all face today in our daily lives where a connection load problem may arise, and there are many things you can infer from the message. The goal of today’s post is to help you understand how this can be handled in a planned manner with the help of PgPool.

What Does the Error Mean?

Your developer tries to access the application and suddenly the error message pops up—but not for everyone!  The rest of their colleagues, already logged in, continue to have their connections succeed and their requests processed. Is the database not accepting connections any more, or is the problem somewhere else, perhaps with the application?  Why can the rest of your colleagues access it? This will certainly raise concern about what's happening.

You take a deeper look.  Application logs reveal that the application accepted the frontend request but it didn’t receive the result from the database. Coming to the backend, the database logs have specific error messages:

2020-04-14 01:15:10 PDT FATAL:  sorry, too many clients already
2020-04-14 01:15:11 PDT FATAL:  sorry, too many clients already

These messages confirm that the problem is at the database end. Let’s try to understand the problem first; specifically, why is it happening?

Simply put, this error appears when the number of clients connected to the database exceeds the concurrent limit max_connections defined in the postgresql.conf file. 

 

Understanding the Cause

This error message does indicate the database is the problem and its connection limit has exhausted, which is not wrong.  However, the problem is only a symptom. We could bandage this symptom by increasing the max_connections parameter and restarting the database, but this also means we would need to increase our hardware resources in proportion to the number of connections we increase. So, rather than immediately increasing max_connections, one should try to understand why so many connections are required. 

Well, pg_stat_activity output will help us understand the state of the connections, and whether the exhaustion of connections truly translates to a system under load or a pool of connections with varying levels of activity.  Depending on the nature of the application, we may observe connection status as either genuinely active, or idle, or idle in transaction.

Good application design doesn’t leave dle connections open. Why?  It’s similar to a restaurant.  The idea is to have active patrons.  Let's assume the restaurant is fully occupied, and all tables are seated. However, half of these tables are used by parties who are done with their meals; until a table is freed, no one else can be seated, and they have to refuse serving new patrons.  Connection pooling should serve active patrons, and is helped by an application that relinquishes their seats once they finish.  Failure to do so means we’ll  exhaust the database’s connection limit eventually and risk spuriously hitting the “sorry, too many clients already“ message.

Similarly, you have to ensure independent users, like those using psql, pgadmin or other clients, also act responsibly—their connections use the same overall pool, and will affect the risk of connection exhaustion.

Solving the Problem

As DBAs we rarely have the luxury of quickly changing application logic, so we cannot rely on code modification in order to address the timely close of application-based connections. Increasing max_connections is not an easy choice, and may not always be the right choice either, as it raises the cost by using an increased share of system resources.

So how do you fix it? If only there was a way for multiple client sessions to share a single connection to database… like connection pooling.

Where Does PgPool Fit In?

There are several proven connection poolers, with PgBouncer and PgPool being the most popular ones. For this article, while PgPool offers other features like load balancing and quorum management, we will look at it purely as a pooling mechanism. PgPool is a session-based connection pooler, which maintains already-established connections to the database by reusing them. 

When configured, PgPool will keep a small number of database sessions spawned and ready to serve application session requests. Applications connect to the PgPool port, which then assigns one of its pooled sessions to the client When the client closes its application session, PgPool puts its session back into the pool, ready to be reused.

The connection from the application to the database can be divided in two parts as shown below:

1>  Connection from the application to PgPool middleware
2>  Connection from PgPool to the database.

 

As we know, active connections are our good patrons—they are not the problem children we’re aiming to fix; the main focus is how to deal with the heavy tax that idle connections place on the database.  There should be some way in which idle connections can be either terminated or reused, right?

In the above block diagram, PID1 is the process id for the client application session to PgPool and PID2 is the process id for the session from PgPool to the database. 

 

Terminal Footprint

Let’s take a look at what we see in the terminal view.

Terminal View: Capturing the pid associated with the PgPool service

[root@localhost system]# ps -ef|grep Pgpool
enterpr+ 101495      1  1 12:31 ?    00:00:00 /usr/edb/Pgpool4.1/bin/Pgpool -D -f /etc/sysconfig/edb/Pgpool4.1/Pgpool.conf -n -F /etc/sysconfig/edb/Pgpool4.1/pcp.conf -a /etc/sysconfig/edb/Pgpool4.1/pool_hba.conf
enterpr+ 101498 101495  0 12:31 ?        00:00:00 Pgpool: wait for connection request
enterpr+ 101499 101495  0 12:31 ?        00:00:00 Pgpool: wait for connection request
enterpr+ 101500 101495  0 12:31 ?        00:00:00 Pgpool: PCP: wait for connection request
enterpr+ 101501 101495  0 12:31 ?        00:00:00 Pgpool: worker process
enterpr+ 101502 101495  0 12:31 ?        00:00:00 Pgpool: health check process(0)

Here, 101495 is the main process ID of PgPool, and rest are spawned by the PgPool main process. PIDs 101498 and 101499 are the child processes and any one of them can act as PID1 (referenced in the diagram), depending on which one of these is handling incoming client requests. 

Let’s make a connection to PgPool in order to connect to the database.

-bash-4.2$ ./psql -p 9999 -d edb -U enterprisedb
Password for user enterprisedb: 
psql.bin (9.6.2.7)
Type "help" for help.

edb=# select pg_backend_pid();
 pg_backend_pid 
----------------
         102142
(1 row)
edb=# select pg_sleep('60');

This output shows that the pid for the session from Pgpool to the database is 10214, and this can be verified using the following output captured at the OS terminal:

enterpr+ 101498 101495  0 12:31 ?        00:00:00 Pgpool: enterprisedb edb  SELECT

enterpr+ 102142  98651  0 12:32 ?        00:00:00 postgres: enterprisedb edb 192.168.0.103[35116] SELECT

101498 is PID1 and 102142 is PID2.

The processing of the SELECT pg_sleep('60') statement will execute for 1 minute and then the status of the session will turn idle as shown below:

enterpr+ 101498 101495  0 12:31 ?        00:00:00 Pgpool: enterprisedb edb   idle

enterpr+ 102142  98651  0 12:32 ?        00:00:00 postgres: enterprisedb edb 192.168.0.103[35116] idle

If we don’t disconnect the psql connection, session 101498 will remain idle, and intervention will be required for others to use this connection.  The magic is in proper use of the PgPool parameter client_idle_limit.  Together with the connection_life_time parameter, we can control this problem while leaving the database configuration untouched:

  • Client_idle_limit is a configurable integer value which allows you to define the number of seconds a session can be idle before it is forcibly returned to the available pool
  • Connection_life_time is a configurable integer value which can be set to define, in seconds, how long a session’s connection to the database is cached.  The default value is 0, which actually means that cached connections are never terminated.

These two parameters help you manage available connections to the database and promotes responsible use of the sessions which connect to PgPool’s middleware layer.  This results in balancing the overall consumption of database connections with the idle thresholds of sessions, which is key to a healthy environment.

These parameters will address the existing connections, but we still have to ensure that access to the pool is managed better than an outright rejection, and this is where num_init_children comes into play.

  • Num_init_children is an integer value which allows you to define the number of preforked PgPool processes available to serve incoming connections.  The default value is 32.  

Num_init_children will ensure that each attempt, up to your maximum number of preforked server processes, will be placed in a queue without outright rejecting it.  Num_init_children should be configured based on the formula below:

max_pool*num_init_children <= (max_connections - superuser_reserved_connections)

A more detailed view of how to modify these parameters can be found in the Pgpool-II documentation here.

By making use of Pgpool’s capabilities, you now have the ability to manage not just inbound sessions to the database and their existing connections, but also have the added capability to leverage new sessions for established connections.

There are many other ways you can make use of Pgpool, and even ways you can make use of Pgpool and pgbouncer together to optimize your needs for both balancing and pooling—but these are improvements for another day.  Today, you’ve managed to fix not just this one developer’s woes, but the behavior of the database to ensure that the user connection load is handled more efficiently. 

Join Postgres Pulse Live!

As always, this example with pool and connection load was based on a real problem we’ve helped our customers solve time and time again—and one you yourself might have faced. Have problems or questions of your own?

Join us on Monday, April 20th, for our next Pulse Live Session!  We’ll continue to talk through some of the nuances mentioned above—and we encourage you to bring any questions you have to the conversation. Feel free to send them to postgrespulse@enterprisedb.com ahead of time. 

You can also check out our past Postgres Pulse Live sessions on YouTube

 

Share this