Postgres scalability – what is the impact of connection pooling?


We had a question over on Server Fault that raised an interesting point:
In light of the scalability improvements in Postgres 9.2, is it better to use connection pooling to avoid the overhead of making extra connections to the database, or is the connection overhead a price worth paying for the improved read performance?

Relating it specifically to my environment: we have a database-backed, read-centric web application, currently running on Postgres 8.4.

Our re-implementation launches next year, along with an upgrade to 9.2, and gives each Apache worker process its own connection to the database (and thus its own Postgres backend, preserved for the life of the Apache worker).
Based on what we've seen, this seems like a good balance between the overhead of connecting to the database and having more workers to handle the read load, though we haven't yet done any substantial benchmarking of our own to confirm it.

Does that implementation seem like a reasonable one, and are there other options / connection pooling mechanisms which we should be considering in light of the recent scalability improvements?

Best Answer

I think you're looking at a false dichotomy.

It can be useful to have connection pooling in place even if you expect a 1:1 mapping of clients to backends. If your connections are long-lived, you won't gain much from reducing backend setup/teardown overhead, since that cost is small and amortized over the connection's lifetime, but a pool like PgBouncer can still be useful for other reasons:

  • It can make clients block until a connection is available, rather than fail immediately with a max_connections exceeded error;
  • You can repoint the pool at a new server if you fail over to a standby, without having to reconfigure your applications;
  • You can cap application database connections below max_connections, so reporting and maintenance connections as a non-superuser are still possible.
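As an illustration, a PgBouncer setup covering those points might look something like this (the hostnames, paths, and limits are invented for the example; tune them to your own environment):

```ini
; pgbouncer.ini -- illustrative values only
[databases]
; Applications connect to "appdb" on the pooler. After failing over to a
; promoted standby, only this host= entry changes, not the applications.
appdb = host=db-primary.example.com port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Keep server-side connections well below max_connections so reporting
; and maintenance sessions can still connect to Postgres directly.
default_pool_size = 20
; Extra clients queue (block) instead of getting a connection error.
max_client_conn = 200
```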

Additionally, if it suits your application, you can use transaction-level pooling to greatly increase the number of clients your server can handle.
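In PgBouncer, for instance, this is a one-line change, subject to its documented restrictions on session state (prepared statements, advisory locks, SET, and similar features don't survive across transactions):

```ini
[pgbouncer]
; Server connections are returned to the pool after each transaction,
; so many more clients can share the same small set of backends.
pool_mode = transaction
```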

I would not try to keep a strict 1:1 mapping from Apache workers to PostgreSQL workers, personally. If you've got (say) 16 cores and good I/O on your PostgreSQL box you might want something like 16-20 active PostgreSQL workers for optimal performance. You're almost certain to want more Apache workers than that, since they'll be kept busy by things like:

  • persistent HTTP connections from idle clients;
  • unresponsive or very slow clients;
  • intentional denial-of-service connections;
  • network interruptions between client and server; etc.

If possible, consider a transaction-pooling design with short-lived transactions instead.
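A minimal sketch of that pattern, using the standard-library sqlite3 module as a stand-in for a real PostgreSQL driver such as psycopg2 (the table and function names are invented for illustration; the shape of the code is what matters):

```python
import sqlite3

# Short-lived-transaction pattern: do the minimum work inside one
# transaction, commit, and free the connection immediately so a
# transaction-level pooler can hand the backend to the next client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (page TEXT, n INTEGER)")

def record_hit(conn, page):
    # "with conn" wraps the statements in a single short transaction:
    # it commits on success and rolls back on error.
    with conn:
        conn.execute("INSERT INTO hits (page, n) VALUES (?, 1)", (page,))

record_hit(conn, "/index")
print(conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0])  # prints 1
```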