The Cost Implications of PostgreSQL Synchronous Replication

August 12, 2024

Synchronous replication in Postgres is expensive. Learn about its costs, recovery point objective, client database performance, and database restore challenges.

Synchronous replication in Postgres is expensive. Very expensive. For any transaction to be completed in a synchronous context, it must be acknowledged by at least one allocated replica system. Depending on the latency between these systems, the associated overhead costs are almost tangible. And each subsequent required response only increases the burden.

For a company with an extremely low recovery point objective (RPO) – the point at which data must not be lost – this introduces a somewhat perplexing conundrum. Do we incur the merciless wrath of physics, or widen our window of acceptable data loss restrictions? Depending on the industry and various regulatory requirements, the latter may not even be legal, so what to do?

What if we’ve been framing this entire argument all wrong all these years?

CPU Workload Quirks in Database Replication

Usually, one of the first complaints against synchronous replication is due to the overhead associated with remote transaction acknowledgment. Benchmarks will bear this out, with a lower write transaction throughput from the same number of clients. Discussion over, right?

What are our server CPUs doing while waiting for some other semi-related task to complete? In a perfect world, they’ve been task-switched to another equally important inquiry and will resume when the blocking system call has been completed. CPUs on servers dedicated to Postgres are likely to handle another Postgres session in this scenario, which can be either good or bad, depending on the nature of the blocking call.

Imagine ten sessions, each simultaneously invoking a significant storage IO-intensive query. If we have ten CPUs, we’ll end up with ten idle CPUs waiting for the storage subsystem to return with the requested data. What if we only had four CPUs instead? Aside from some relatively inexpensive context switching, those four CPUs still await the same IO calls to return. Depending on the size of the resulting data sets, four CPUs may be entirely sufficient for that workload.

This also applies to PostgreSQL synchronous replication, as it is just another external blocking call. Postgres commits a write to the WAL, that change enters the replication stream. Then it waits for some kind of acknowledgment from one or more replicas before returning control to the session. The CPU could be doing something else for the entire duration of that exchange.

So why not let it?

Preparing the Environment for PostgreSQL Synchronous Replication

We here at EDB openly wondered about this; unlike storage IO waits, transaction acks are non-destructive. After all, an overloaded storage subsystem only tends to become slower as the amount of requests increases. Provided there isn’t much replication lag, transaction acknowledgments impose far lower overhead on a Postgres replica. But we wanted to test that to ensure that it could change everything if we were right. Or at least the perception of synchronous replication in Postgres.

To that end, we spun up a Postgres cluster in Amazon EC2 within a single availability zone. Each server consisted of an r5.2xlarge, which is equipped with 8 CPUs and 64GB of RAM. We assigned two 10,000 IOPS io2 EBS devices to each – one for data and one for WAL. Then, we bootstrapped the cluster with pgbench at a scale of 10,000 for a total database size of about 150GB.

Regarding the synchronous configuration itself, we set synchronous_standby_names to the names of both standby systems like this:

synchronous_standby_names = '2 ("pg-node-2", "pg-node-3")'

This provided a “worst case” scenario where a widely distributed cluster required full synchronous commit for maximum RPO guarantees. This is a situation where a company would instead stop handling transactions entirely than lose one.

When everything was said and done, the cluster looked like this:

Wait, didn’t we say it was a single availability zone? Latency between Amazon availability zones and regions isn’t exactly reliable, and we wanted consistent metrics. So, we took advantage of a Linux utility called tc, a command-line tool that lets us manipulate the network traffic control system within the kernel.

Some Linux command incantations are more esoteric than others, and tc is particularly unintuitive. Still, we eventually applied this logic to each server in the cluster, using the first server as an example:

tc qdisc del dev ens5 root
tc qdisc add dev ens5 root handle 1: prio
tc qdisc add dev ens5 parent 1:3 handle 30: netem latency 5ms
tc filter add dev ens5 protocol ip parent 1:0 prio 3 u32 \
match ip dst ip.of.server.2 flowid 1:3
tc filter add dev ens5 protocol ip parent 1:0 prio 3 u32 \
match ip dst ip.of.server.3 flowid 1:3

That’s a lot of work to say, “If you’re talking to another Postgres server, add 5ms of latency.” The rule is also reciprocal, so the total latency is 10ms between any two Postgres servers, but the pgbench system is unaffected.

Armed with 10ms of latency between each of our synchronous Postgres servers, it was time to move on to the benchmarks.

Synchronous Write Scaling

Postgres has five different possible settings for the synchronous_commit configuration parameter. We will ignore ‘off’ because it should only be used in minimal circumstances. The remaining possibilities include:

  • local – no remote guarantee at all, just local write and flush
  • remote_write – transaction has been written to the remote node’s disk
  • on – transaction has been written and flushed to the remote node’s disk
  • remote_apply – transaction has been written, flushed, and applied on the remote node

We decided to test each of these to see how much overhead write guarantees impose as they become stricter. The only variable we changed for each benchmark was the number of clients.

Let’s take a look at how everything turned out:

Unsurprisingly, local commits were the fastest by quite a wide margin. However, consider what happens as the client count increases. At 40 clients, “remote_write” is only 60% as fast as “local”, but at 80 clients, it’s 84%, and the gap only closes as the client count increases. Additionally, note that the “remote_write” and “on” synchronous modes eclipse the 40-client “local” commits starting at 120 clients. Query latency follows a very similar trajectory.

At 40 clients, “remote_write” adds 67% more write latency over “local” transactions (6.5ms), but at 80 clients, the additional latency is only 19%, at around 3.3ms. Remember that our chosen hardware is only equipped with eight total CPU threads, so the accumulated latency is steadily increasing due to context switching. Another 5ms of additional latency added to 33ms is proportionally less overhead than adding 6.5 to 9.5.

Latency Scaling with Client Database

The story gets more interesting when we reduce the network latency to 3ms. We selected 10ms to represent a relatively large geographical distribution, such as three data centers separated by several hundred miles. Proximity is one thing that can vary immensely, and availability zones within most cloud services are probably much closer than 10ms. The lingering question of round-trip-times causing effectively exponential latency amplification.

The resulting benchmark results appear to substantiate these speculations:

As expected, synchronous transactions are slower than asynchronous, but the gap has closed considerably. Not only is “remote_write” operating at 92% of “local” at 40 clients, but it reaches parity at 80 clients rather than 120. Even “on” and “remote_apply” are within 7% of “local” performance at 80 clients and reach near parity within 1% at 120 and even measure slightly higher at 160.

Given how much pgbench results can vary between runs, we attribute the faster performance of synchronous replication at 160 clients to statistical noise. It’s interesting to see that 3ms of network latency is essentially reduced to statistical noise on a generally overloaded server, but 10ms maintains a steady gap after a certain point.

The query latency graph is equally revealing:

The results are very tightly grouped throughout all tests, where all results are within 1ms of each other. The server is so busy handling client requests that any latency introduced by synchronous replication has become background noise. If this is true of 3ms network latency, lower amounts would be equally difficult to distinguish from asynchronous performance.

Discussing Overhead

Consider that we constructed a truly awful proof-of-concept here. We constrained ourselves to enforce fully synchronous writes across three servers with 10ms of latency between all network hops. Usually, such tight restrictions wouldn’t be imposed over such a wide geographical area, but as we’ve seen here, it could be.

Even if geographical distribution isn’t a concern, it’s prevalent to recommend synchronous replicas remain as close as physically possible to the Primary node to avoid incurring excessive commit latency. This can mean within the same rack, the same virtual host, the same availability zone, and so on. Perhaps those restrictions may be loosened somewhat, given our findings here.

However, this does all come with a certain caveat. Low client counts that don’t overwhelm the hardware will still perform optimally under asynchronous operation. This does imply it’s possible to under-provision hardware that’s meant to be used in a synchronous Postgres cluster and still achieve high transaction throughput.

There’s also a break-even point where saturating the hardware obfuscates performance behavior. Consider our 10ms latency test at 80 clients, where we reached peak transaction throughput on this hardware. We observed 17ms of latency for local commits at that client count and 20ms for remote writes. How significant is this 3ms?

There’s still discussions to be had if we also consider the “remote_apply” and “on” synchronous settings. They impose additional latency, and some may argue that “remote_write” is insufficient because it does not require a disk flush on the replicas. Higher connection counts narrow this gap at high network latencies but do not erase it.

Additionally, we’re very clearly overloading these servers. Higher transaction latencies help absorb network latency effects, but only because all queries generally operate with more efficiency. At 40 clients at 10ms network latency, a local commit happens at 9.5ms and a remote one at 16ms. Yes, this gap shrinks at 80 clients, but at this point, the local server is so backlogged that all transactions incur a 17ms penalty. We can only do this kind of client overloading if the application in question is not particularly latency sensitive.

In the end, perhaps the question becomes: what’s 5-10ms between friends? An idle CPU is a wasted CPU, and technologies like client database pools make it possible to multiplex client connections to specific counts at the database level. If tests show a particular piece of hardware operates best at 80 client connections, keeping CPUs saturated but not so overwhelmed that latency rises to compensate, we can make that happen.

As we’ve seen here, it’s possible to factor in network latency as part of synchronous replication, and if we do it right, nobody will ever know.

Share this
What is synchronous replication? chevron_right

Synchronous replication is the process of copying data over a network to create multiple current copies of the data. Synchronous replication is mainly used for high-end transactional applications that need instant failover if the primary node fails.

What is the primary objective of synchronous replication? chevron_right

Synchronous replication provides protection from array or site failures with no data loss. It provides a zero-recovery point objective, meaning that there is no data loss in the event of an array or site failure.

What is PostgreSQL used for? chevron_right

PostgreSQL is an object-relational database management system (ORDMBS), which means that it has relational capabilities and an object-oriented design. Using the object-oriented features of PostgreSQL, programmers can communicate with the database servers using objects in their code and define complex custom data types.

What is PostgreSQL best for? chevron_right

Postgres has powerful built-in support for advanced analytics and data warehousing, making it well-suited for use cases such as data mining, business intelligence, and reporting.

What are the different modes of PostgreSQL Replication? chevron_right

PostgreSQL replication primarily operates in two modes: Synchronous and Asynchronous. Synchronous replication involves writing data simultaneously to both the primary and secondary servers, while Asynchronous replication first writes data to the host, before copying it to the secondary server. When configuring replication settings, users should weigh the trade-offs between safety and performance associated with each mode.

Synchronous Mode of Replication: Transactions on the master database are only considered complete when changes have been replicated to all replicas. This method minimizes the risk of data loss but increases response time for write transactions due to the need to wait for confirmations. Read-only transactions are unaffected. Synchronous replication is typically employed in high-demand, failover-sensitive environments.

Asynchronous Mode of Replication: In this mode, transactions on the master server are marked as complete before being replicated to replicas, resulting in a replication lag. While there is a risk of potential data loss, the delay in committing changes is generally small if the standby server can handle the load. Asynchronous replication is the default mode and offers lower overhead, making it suitable for most scenarios, although failover from primary to secondary may take longer compared to synchronous replication.

What is the difference between synchronous and asynchronous replication in Postgres? chevron_right

There are mainly two modes of PostgreSQL replication: synchronous and asynchronous. Synchronous replication allows data to be written to both the primary and secondary server at the same time, whereas asynchronous replication ensures that the data is first written to the host and then copied to the secondary server.

In synchronous mode replication, transactions on the primary database are considered complete only when those changes have been replicated to all the replicas. The replica servers must all be available all the time for the transactions to be completed on the primary. The synchronous mode of replication is used in high-end transactional environments with immediate failover requirements.

In asynchronous mode, transactions on the primary server can be declared complete when the changes have been done on just the primary server. These changes are then replicated in the replicas later in time. The replica servers can remain out-of-sync for a certain duration, called a replication lag. In the case of a crash, data loss may occur, but the overhead provided by asynchronous replication is small, so it’s acceptable in most cases (it doesn’t overburden the host). Failover from the primary database to the secondary database takes longer than synchronous replication.

What are the advantages of synchronous replication? chevron_right

The advantage of synchronous replication is that it ensures both the local and remote FC disk arrays are updated before the local host system continues the next SCSI write. The disadvantage of synchronous replication is that the host must wait for the remote write to complete. If the remote array is distant, or connected through a high latency link, this will adversely impact performance.

Concisely, in synchronous replication mode:

  • Data must be written to both arrays before an I/O operation is done.
  • Data in both arrays is always fully synchronized and updated.
  • Distance between sites is a direct factor to consider in terms of application performance as latency increases with distance.
What is the difference between data replication and data synchronization? chevron_right

Data synchronization and data replication are related but distinct concepts. Data synchronization ensures that data stored in multiple locations is consistent and up to date. This typically involves exchanging real-time updates between devices or systems so that all devices work with the most current data version. Data replication, however, involves creating copies of data and storing them in multiple locations. Data synchronization can be achieved using data replication, but data replication does not necessarily involve data synchronization.

What are the best practices for replication in PostgreSQL? chevron_right

Best practices include using synchronous replication for critical data, using connection pooling, implementing failover strategies, and doing regular backups.

What are the benefits of using PostgreSQL replication? chevron_right

Here are a few key benefits of leveraging PostgreSQL replication:

  • Data migration: You can leverage PostgreSQL replication for data migration either through a change of database server hardware or through system deployment.
  • Fault tolerance: If the primary server fails, the standby server can act as a server because the contained data for both primary and standby servers is the same.
  • Online transactional processing (OLTP) performance: You can improve the transaction processing time and query time of an OLTP system by removing the reporting query load. Transaction processing time is the duration it takes for a given query to be executed before a transaction is finished.
  • System testing in parallel: While upgrading a new system, you need to make sure that the system fares well with existing data, hence the need to test with a production database copy before deployment.