Achieving Production-Grade Postgres Replication for High Availability

October 07, 2024

Understanding replication: The key to scalable and reliable Postgres clusters

You’re looking for production-grade Postgres. You need to keep multiple copies of your data around for high availability. And this data must stay consistent across the copies. You need to handle rolling upgrades of the cluster. You need to handle taking a node out of the cluster for maintenance. You need to handle DDL changes across nodes in the cluster. And we might be getting ahead of ourselves, but you may also need low-latency read replicas or even need to geo-distribute your data while adhering to data governance policies.

It’s true you can’t do these tasks in vanilla Postgres. Nor even with some common high availability (HA) tools. To understand why though, we need to talk about the two different ways you can do replication in Postgres: physical and logical.

Let's tackle physical replication first. Let’s say we have two Postgres nodes. An "upstream" node that directly receives queries from a user, and a "downstream" node that subscribes to the upstream node for changes. Physical replication takes bytes from disk on the upstream node and copies them over the network to the downstream node. The downstream node then writes the bytes back to disk verbatim.

Physical replication happens at such a low level that neither node has any idea what it is actually copying in terms of data it originally received from a user. So if the upstream node and the downstream node are running different versions of Postgres, there’s no guarantee the downstream node can make any sense of the physical data the upstream node sent.

Many commonly used HA tools are based on physical replication. So any requirements like rolling upgrades (e.g. temporarily replicating from a Postgres 12 node to a Postgres 13 node while upgrading the cluster) automatically rule these tools out.

What about logical replication? Unlike physical replication, logical replication retains awareness of user data. Instead of replicating opaque bytes from an upstream node to a downstream node, it replicates logical changes (inserts, updates, and deletes) from upstream to downstream node. Because it operates at this higher level, logical replication actually allows for a cluster running mixed versions of Postgres.

But logical replication in Postgres has some big limitations. The biggest one to me is that you can’t replicate DDL. This limitation is especially challenging since DDL in Postgres is transactional! We can create a table and then roll back that change. But not easily with logical replication in vanilla Postgres.

My coworker Peter has a great post on the history of replication if you want to learn more.

EDB Postgres Distributed

So, we’ve established that these production-grade Postgres requirements are hard. But I want to show you how we can do all of them pretty easily with EDB Postgres Distributed (PGD). And in particular, one of the coolest things PGD builds on top of vanilla Postgres logical replication is the ability to replicate transactional DDL in addition to DML. So on a single machine we’ll build a three-node Postgres cluster with PGD, running three different versions of Postgres. And we’ll see DDL and DML replicating between these nodes.

While we discourage you from running different versions of Postgres permanently, PGD was explicitly designed to support rolling upgrades. Running a mixed version cluster at all is an interesting feat, so we'll go ahead anyway in this post.

I’m going to show you the manual way to set everything up, so nothing feels hidden. But I think you’ll see that this is pretty easily scriptable.

While we support other Debian-based and RHEL-based Linux distros, the instructions in this post will assume you're on Debian 12. So grab an amd64 machine running Debian 12 and let's go!

Grab Postgres 14, 15, and 16

First we follow the postgresql.org instructions for configuring the package repository.

# Import the repository signing key:
sudo apt install curl ca-certificates
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
  
# Create the repository configuration file:
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
  
# Update the package lists:
sudo apt update

Now we can install the three versions of Postgres.

sudo apt install -y postgresql-16 postgresql-client-16
sudo apt install -y postgresql-15 postgresql-client-15
sudo apt install -y postgresql-14 postgresql-client-14

Grab PGD

Get your EDB repository token from the EDB Repos 2.0 page. If you don't already have an EDB account, you can sign up for free to get a trial subscription to EDB's software repositories.

With that token, set an environment variable:

export EDB_SUBSCRIPTION_TOKEN=<your-repo-token>

And use our helper script to set up package repositories:

curl -1sLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/postgres_distributed/setup.deb.sh" | sudo -E bash

Now install PGD for each Postgres version:

sudo apt install -y edb-bdr5-pg14
sudo apt install -y edb-bdr5-pg15
sudo apt install -y edb-bdr5-pg16

Starting Up Three Postgres Instances

Now that we’ve got three Postgres versions installed locally, we can start creating databases and starting them. Before we start each database though, we need to edit the database’s postgresql.conf so that it loads the PGD extension, which is called bdr.

Creating the Postgres 14 Database

Although the Debian packages set up some directories and ran initdb, let's give ourselves a clean environment.

admin@localhost $ sudo su postgres
postgres@localhost $ rm -rf /var/lib/postgresql/14
postgres@localhost $ /usr/lib/postgresql/14/bin/initdb /var/lib/postgresql/pg14
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /var/lib/postgresql/pg14 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/usr/lib/postgresql/14/bin/pg_ctl -D /var/lib/postgresql/pg14 -l logfile start

Configuring postgresql.conf for PGD

Now run the following to set up the database to run PGD.

postgres@localhost $ echo "
shared_preload_libraries = 'bdr'
wal_level = 'logical'
track_commit_timestamp = on

# For bigger clusters you may want to decrease shared buffers or the
# OS may complain you are using too much shared memory.
shared_buffers = 1MB

# These depend on the size of the cluster.
max_worker_processes = 12
max_wal_senders = 6
max_replication_slots = 6" | tee -a /var/lib/postgresql/pg14/postgresql.conf

Start Up Postgres 14 Database

And start it as suggested, but running on port 8014:

postgres@localhost $ /usr/lib/postgresql/14/bin/pg_ctl -o "-p 8014" -D /var/lib/postgresql/pg14 -l /tmp/logfile14 start
waiting for server to start.... done
server started

Creating the Postgres 15 Database

Now let’s create and start the Postgres 15 database.

postgres@localhost $ rm -rf /var/lib/postgresql/15
postgres@localhost $ /usr/lib/postgresql/15/bin/initdb /var/lib/postgresql/pg15
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /var/lib/postgresql/pg15 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/usr/lib/postgresql/15/bin/pg_ctl -D /var/lib/postgresql/pg15 -l logfile start

Configuring postgresql.conf for PGD

Now set up the database to run PGD.

postgres@localhost $ echo "
shared_preload_libraries = 'bdr'
wal_level = 'logical'
track_commit_timestamp = on

# For bigger clusters you may want to decrease shared buffers or the
# OS may complain you are using too much shared memory.
shared_buffers = 1MB

# These depend on the size of the cluster.
max_worker_processes = 12
max_wal_senders = 6
max_replication_slots = 6" | tee -a /var/lib/postgresql/pg15/postgresql.conf

Start Up Postgres 15 Database

Now run the Postgres 15 database.

postgres@localhost $ /usr/lib/postgresql/15/bin/pg_ctl -o "-p 8015" -D /var/lib/postgresql/pg15 -l /tmp/logfile15 start
waiting for server to start.... done
server started

Creating the Postgres 16 Database

Finally, create the Postgres 16 database.

postgres@localhost $ rm -rf /var/lib/postgresql/16
postgres@localhost $ /usr/lib/postgresql/16/bin/initdb /var/lib/postgresql/pg16
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /var/lib/postgresql/pg15 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/usr/lib/postgresql/16/bin/pg_ctl -D /var/lib/postgresql/pg16 -l logfile start

Configuring postgresql.conf for PGD

Once again run the following to set up the database to run PGD.

postgres@localhost $ echo "
shared_preload_libraries = 'bdr'
wal_level = 'logical'
track_commit_timestamp = on

# For bigger clusters you may want to decrease shared buffers or the
# OS may complain you are using too much shared memory.
shared_buffers = 1MB

# These depend on the size of the cluster.
max_worker_processes = 12
max_wal_senders = 6
max_replication_slots = 6" | tee -a /var/lib/postgresql/pg16/postgresql.conf

Start Up Postgres 16 Database

Finally, start the Postgres 16 database.

postgres@localhost $ /usr/lib/postgresql/16/bin/pg_ctl -o "-p 8016" -D /var/lib/postgresql/pg16 -l /tmp/logfile16 start
waiting for server to start.... done
server started

Turning Three Databases into a Cluster

We’ve set up three nodes and they are running the PGD extension. But they aren’t connected to each other. Let’s connect them!

First just to remind ourselves where we are, let’s query each running database for its version.

postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 -c 'select version()'
                                                        version
-----------------------------------------------------------------------------------------------------------------------
  PostgreSQL 14.12 (Debian 14.12-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

postgres@localhost $ /usr/lib/postgresql/15/bin/psql -p 8015 -c 'select version()'
                                                        version
---------------------------------------------------------------------------------------------------------------------
  PostgreSQL 15.7 (Debian 15.7-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 -c 'select version()'
                                                        version
---------------------------------------------------------------------------------------------------------------------
  PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

Looks good.

Building the PGD Cluster

For each node in the cluster we must call CREATE EXTENSION bdr CASCADE;, then we need to call bdr.create_node on each node in the cluster to register it as a PGD node. Finally, we pick a single node on which to create a node group. And then on the other two nodes we give it details so it can join the node group we created.

So on the Postgres 14 server we’ll create a database in which we want to replicate tables, create the extension, create the PGD node, and create the PGD node group.

postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 postgres -c "CREATE DATABASE pgdtest;"
CREATE DATABASE
postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "CREATE EXTENSION bdr CASCADE;"
CREATE EXTENSION
postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "SELECT bdr.create_node(node_name := 'pg14', local_dsn := 'port=8014 dbname=pgdtest host=localhost user=postgres');"
  create_node
-------------
  1151355342
(1 row)

postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "SELECT bdr.create_node_group(node_group_name := 'pgdtest-group');"
  create_node_group
-------------------
          117160161
(1 row)

Great! We’ve got a PGD node and a PGD node group.

Now we can do the same thing for the Postgres 15 node. But instead of creating a new node group we join the existing node group.

postgres@localhost $ /usr/lib/postgresql/15/bin/psql -p 8015 postgres -c "CREATE DATABASE pgdtest;"
CREATE DATABASE
postgres@localhost $ /usr/lib/postgresql/15/bin/psql -p 8015 pgdtest -c "CREATE EXTENSION bdr CASCADE;"
CREATE EXTENSION
postgres@localhost $ /usr/lib/postgresql/15/bin/psql -p 8015 pgdtest -c "SELECT bdr.create_node(node_name := 'pg15', local_dsn := 'port=8015 dbname=pgdtest host=localhost user=postgres');"
  create_node
-------------
  2481762504
(1 row)

postgres@localhost $ /usr/lib/postgresql/15/bin/psql -p 8015 pgdtest -c "SELECT bdr.join_node_group(join_target_dsn := 'port=8014 dbname=pgdtest host=localhost user=postgres');"
  join_node_group
-----------------

(1 row)

And now the same for the Postgres 16 node!

postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 postgres -c "CREATE DATABASE pgdtest;"
CREATE DATABASE
postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 pgdtest -c "CREATE EXTENSION bdr CASCADE;"
CREATE EXTENSION
postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 pgdtest -c "SELECT bdr.create_node(node_name := 'pg16', local_dsn := 'port=8016 dbname=pgdtest host=localhost user=postgres');"
create_node
-------------
  1945575858
(1 row)

postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 pgdtest -c "SELECT bdr.join_node_group(join_target_dsn := 'port=8014 dbname=pgdtest host=localhost user=postgres');"
join_node_group
-----------------

(1 row)

Giving It A Go

Let me remind you that we are running a single cluster with a mix of Postgres versions. Now not only can we do that but we can replicate DDL and DML within this cluster. Switching between nodes for reads and writes. (In the real world we suggest using our support for write-leaders and not mixing writes on different nodes.)

Let’s create a table on the Postgres 14 node and add some data.

postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "CREATE TABLE x (a int primary key);"
CREATE TABLE
postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "INSERT INTO x VALUES (1), (32), (19), (0);"
INSERT 0 4

And query the same node:

postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "SELECT * FROM x;"
a
----
  1
32
19
  0
(4 rows)

It’s normal Postgres! Nothing interesting so far.

Let’s query the Postgres 15 node.

postgres@localhost $ /usr/lib/postgresql/15/bin/psql -p 8015 pgdtest -c "SELECT * FROM x;"
a
----
  1
32
19
  0
(4 rows)

Now we’re talking. We replicated data from a Postgres 14 cluster seamlessly to a Postgres 15 cluster.

What about Postgres 16?

postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 pgdtest -c "SELECT * FROM x;"
a
----
  1
32
19
  0
(4 rows)

And if we add data on the Postgres 16 node?

postgres@localhost $ /usr/lib/postgresql/16/bin/psql -p 8016 pgdtest -c "INSERT INTO x VALUES (10000);"
INSERT 0 1

And read from the Postgres 14 node?

postgres@localhost $ /usr/lib/postgresql/14/bin/psql -p 8014 pgdtest -c "SELECT * FROM x;"
  a
-------
    1
    32
    19
    0
10000
(5 rows)

That’s pretty nice.

Where From Here?

We used PGD’s production-grade replication to replicate data to and from a diverse cluster of Postgres nodes. We can replicate DDL.

We certainly covered a lot of tedious steps. But these steps are trivially scriptable. Once you’ve got these steps scripted, you’ve got what I think is a pretty delightful, production-grade setup for Postgres.

Check out the PGD docs and look forward to future posts covering other PGD features like scaling read-only queries with subscriber-only nodes.

Share this
What is the difference between physical and logical replication in Postgres? chevron_right

Physical replication copies data at the byte level from an upstream node to a downstream node, whereas logical replication transfers logical changes such as inserts, updates, and deletes. Physical replication is less flexible but more consistent, while logical replication allows for mixed-version clusters and more granular control over replicated data.

What are the benefits of using EDB Postgres Distributed (PGD)? chevron_right

PGD enhances Postgres by supporting transactional DDL replication alongside DML, enabling seamless data replication across mixed-version clusters. It also facilitates rolling upgrades and geo-distributed data management while maintaining compliance with data governance policies.

How do I set up a Postgres cluster with PGD? chevron_right

Setting up a Postgres cluster with PGD involves installing different Postgres versions, configuring each database to load the PGD extension, and interconnecting nodes using commands like CREATE EXTENSION bdr CASCADE; and bdr.create_node.

Can Postgres logical replication handle DDL changes? chevron_right

Standard Postgres logical replication cannot replicate DDL changes, which is a limitation. However, PGD overcomes this by allowing transactional DDL replication, thus supporting a broader range of database modifications.

What are common challenges in Postgres replication? chevron_right

Challenges include managing rolling upgrades, ensuring data consistency across nodes, handling DDL changes, and configuring nodes properly in a cluster to avoid conflicts and ensure smooth operation.

Why might I need low-latency read replicas in a Postgres setup? chevron_right

Low-latency read replicas are crucial for reducing response times in read-heavy applications and can improve the overall performance of your database system by distributing query loads.

Is it possible to run different versions of Postgres in the same cluster? chevron_right

Yes, PGD allows running different versions of Postgres within the same cluster, which supports rolling upgrades and provides flexibility in managing database versions.

How does PGD manage data governance policies? chevron_right

PGD supports geo-distributed data management, ensuring that data replication complies with regional data governance policies, thus making it suitable for businesses operating across multiple jurisdictions.

What is the role of a write leader in a Postgres cluster? chevron_right

The write leader is a designated node responsible for handling write operations, preventing conflicts that can arise from multiple nodes attempting to perform writes simultaneously.

How can I automate the setup of a Postgres cluster with PGD? chevron_right

Automating the setup can be done using scripting to handle repetitive tasks like configuring databases, installing extensions, and connecting nodes, making the deployment process more efficient.

What is the significance of transactional DDL in PGD? chevron_right

Transactional DDL allows schema changes to be replicated across nodes without downtime or manual intervention, enhancing the robustness and flexibility of the database system.

How does PGD facilitate rolling upgrades in Postgres? chevron_right

PGD enables rolling upgrades by allowing nodes in a cluster to run different Postgres versions, thus upgrading one node at a time without affecting the entire cluster's operation.

Can PGD be used with any Linux distribution? chevron_right

While PGD supports various Debian-based and RHEL-based distributions, the setup process described typically assumes using Debian 12, though it can be adapted for other distributions.

What are some best practices for maintaining a Postgres cluster? chevron_right

Best practices include regular monitoring of node connectivity, ensuring data consistency, automating routine tasks, and following a structured upgrade path for Postgres versions.

How does PGD enhance the scalability of Postgres clusters? chevron_right

PGD enhances scalability by allowing the addition of subscriber-only nodes for read scaling, thus efficiently handling increased query loads in a growing application environment.

Enhance Your Postgres Experience with EDB

Experience seamless high availability and replication