Logical replication in Postgres: Basics

January 17, 2025

In this post we'll explore the basics of logical replication between two Postgres databases as both a user and a developer.

Background

Postgres first implemented physical replication where it shipped bytes on disk from one database A to another database B. Database B would write those bytes right back to disk. But this physical replication limits you to replicating between Postgres instances running the same Postgres version and on the same CPU architecture (and likely other operating system settings like page size).

Logical replication replicates specific messages (e.g. insert, update, delete, truncate) over the network in a mostly stable format. This allows us to replicate data between databases running different Postgres versions. You can also filter rows or columns when replicating. Swanky!

The drawback is that, since logical replication is newer than physical replication, it is still maturing. You cannot yet, for example, get DDL changes over logical replication.

In any case, I have never set up basic logical replication in Postgres before so that's what I'm going to do in this post. :)

Setup

First build Postgres 17 from source. (This is important because later we'll apply some changes to help see how logical replication works.)

$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --with-libxml --without-icu --prefix=$(pwd)/build --libdir=$(pwd)/build/lib --enable-cassert --enable-debug
$ make -j8
$ make install

Next create two Postgres instances.

$ ./build/bin/initdb testdb1
$ ./build/bin/initdb testdb2

Give them both a unique port to run on, and set the wal_level to logical so we can use logical replication.

$ printf '\nport = 6001\nwal_level = logical' >> testdb1/postgresql.conf
$ printf '\nport = 6002\nwal_level = logical' >> testdb2/postgresql.conf

Then start both databases.

$ ./build/bin/pg_ctl -D testdb1 -l logfile1 start
$ ./build/bin/pg_ctl -D testdb2 -l logfile2 start

Setting up logical replication

We'll just do something very simple. Create a single table and insert a couple rows into it in testdb1. We want those rows to end up in the same table in testdb2.

We cannot replicate DDL with logical replication so we'll have to run the CREATE TABLE statement first on both databases.

$ psql -h localhost -p 6001 postgres -c "CREATE TABLE users (name TEXT PRIMARY KEY, age INT);"
CREATE TABLE
$ psql -h localhost -p 6002 postgres -c "CREATE TABLE users (name TEXT PRIMARY KEY, age INT);"
CREATE TABLE

Now we need to tell testdb1 (at port 6001) to publish changes to this table. We call the publication pub.

$ psql -h localhost -p 6001 postgres -c "CREATE PUBLICATION pub FOR TABLE users;"
CREATE PUBLICATION

And we need to tell testdb2 (at port 6002) to subscribe to changes from testdb1. We need to tell it we're referring to the pub publication created above on testdb1.

$ psql -h localhost -p 6002 postgres -c "CREATE SUBSCRIPTION sub CONNECTION 'port=6001 dbname=postgres' PUBLICATION pub;"
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION

The table on both databases is empty.

$ psql -h localhost -p 6001 postgres -c "SELECT * FROM users;"
 name  | age
-------+-----
(0 rows)
$ psql -h localhost -p 6002 postgres -c "SELECT * FROM users;"
 name  | age
-------+-----
(0 rows)

Now let's insert two rows in testdb1.

$ psql -h localhost -p 6001 postgres -c "INSERT INTO users VALUES ('deb', 12);"
INSERT 0 1
$ psql -h localhost -p 6001 postgres -c "INSERT INTO users VALUES ('kevin', 13);"
INSERT 0 1

And now let's query testdb2.

$ psql -h localhost -p 6002 postgres -c "SELECT * FROM users;"
 name  | age
-------+-----
 deb   |  12
 kevin |  13
(2 rows)

Very nice!

Now let's take a tiny peek under the hood.

Architecture

Postgres's process model makes it pretty easy to see what is going on. Let's look at the process tree.

$ pstree -s postgres
-+= 00001 root /sbin/launchd
 |-+= 87091 phil /Users/phil/edb/postgres17/build/bin/postgres -D testdb2
 | |--= 87092 phil postgres: checkpointer
 | |--= 87093 phil postgres: background writer
 | |--= 87095 phil postgres: walwriter
 | |--= 87096 phil postgres: autovacuum launcher
 | |--= 87097 phil postgres: logical replication launcher
 | \--= 89689 phil postgres: logical replication apply worker for subscription 16407
 \-+= 87135 phil /Users/phil/edb/postgres17/build/bin/postgres -D testdb1
   |--= 87136 phil postgres: checkpointer
   |--= 87137 phil postgres: background writer
   |--= 87139 phil postgres: walwriter
   |--= 87140 phil postgres: autovacuum launcher
   |--= 87141 phil postgres: logical replication launcher
   \--= 89696 phil postgres: walsender phil postgres [local] START_REPLICATION

On the publishing side Postgres creates a new process called walsender. The process being called walsender sounds like a remnant of physical replication being implemented first historically, where Postgres literally ships one (usually 16MB) WAL segment at a time from database to database. Indeed the walsender.c code handles both physical and logical replication.

The walsender process starts up when a publication is configured. For example with the CREATE PUBLICATION SQL command. Each publication gets its own walsender process.

On the subscribing side we get a new process called apply worker. The apply worker source is in worker.c. This code is specific to logical replication. This process connects to the publishing Postgres database and ends up speaking with the walsender process.

The apply worker process starts up when a subscription is configured. For example with the CREATE SUBSCRIPTION SQL command. Each subscription gets its own apply worker process.

Communication between the apply worker and the walsender happens through what is called the logical replication protocol.

To make this a little more tangible, let's edit the Postgres source code. Let's add logs in the walsender process when we send an INSERT change and add logs in the apply worker process when we receive an INSERT change.

Hooking on send

walsender.c itself doesn't handle any WAL records. It is what's called the output plugin (that runs within the walsender process) that does. The default format for the WAL records over the wire between the walsender and the apply worker is pgoutput and is implemented in pgoutput.c. This format is set as the default in libpqwalreceiver.c.

For typical users, you don't need to think about the format. CREATE SUBSCRIPTION doesn't even let you pick the format. You can only override the default format by calling pg_create_logical_replication_slot, passing that function a different output plugin, and telling CREATE SUBSCRIPTION not to create a new slot and to instead use the one you created manually with pg_create_logical_replication_slot. To reiterate: if you don't need to override the output plugin, you probably don't need to call pg_create_logical_replication_slot directly.

Each output plugin sets up callbacks for the WAL decoder in their own _PG_output_plugin_init implementation. One of these callbacks is change_cb that handles INSERTs, UPDATEs, and DELETEs. So we will add a log line (with elog) to pgoutput_change to learn when we have decoded an INSERT, UPDATE, or DELETE and are going to send that logical change to the subscriber.

$ diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 99518c6b6d..7719d5a622 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1556,14 +1556,17 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
        switch (action)
        {
                case REORDER_BUFFER_CHANGE_INSERT:
+                       elog(LOG, "SENDING INSERT!");
                        logicalrep_write_insert(ctx->out, xid, targetrel, new_slot,
                                                                        data->binary, relentry->columns);
                        break;
                case REORDER_BUFFER_CHANGE_UPDATE:
+                       elog(LOG, "SENDING UPDATE!");
                        logicalrep_write_update(ctx->out, xid, targetrel, old_slot,
                                                                        new_slot, data->binary, relentry->columns);
                        break;
                case REORDER_BUFFER_CHANGE_DELETE:
+                       elog(LOG, "SENDING DELETE!");
                        logicalrep_write_delete(ctx->out, xid, targetrel, old_slot,
                                                                        data->binary, relentry->columns);
                        break;

Hooking on receive

The subscription side in the apply worker process is simpler. There is a loop that interprets each message the publisher sends to the subscribing worker process. An INSERT is one type of message. The worker calls apply_handle_insert to handle an INSERT message. So we can add a log in that method, as well as logs in equivalent apply_handle_delete and apply_handle_update methods to learn about each time we receive an INSERT, UPDATE, or DELETE on the subscriber side.

$ diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d091a1dd27..44d921ba5a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2427,6 +2427,7 @@ apply_handle_insert(StringInfo s)
  slot_fill_defaults(rel, estate, remoteslot);
  MemoryContextSwitchTo(oldctx);
 
+ elog(LOG, "GOT AN INSERT!");
  /* For a partitioned table, insert the tuple into a partition. */
  if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
  apply_handle_tuple_routing(edata,
@@ -2607,6 +2608,7 @@ apply_handle_update(StringInfo s)
  has_oldtup ? &oldtup : &newtup);
  MemoryContextSwitchTo(oldctx);
 
+ elog(LOG, "GOT AN UPDATE!");
  /* For a partitioned table, apply update to correct partition. */
  if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
  apply_handle_tuple_routing(edata,
@@ -2762,6 +2764,7 @@ apply_handle_delete(StringInfo s)
  slot_store_data(remoteslot, rel, &oldtup);
  MemoryContextSwitchTo(oldctx);
 
+ elog(LOG, "GOT A DELETE!");
  /* For a partitioned table, apply delete to correct partition. */
  if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
  apply_handle_tuple_routing(edata,

Then rebuild Postgres with make && make install so we can observe these logs.

Trying it out

Let's shut down, delete, recreate, and start both databases.

# Shut down both databases
$ ./build/bin/pg_ctl -D testdb1 -l logfile1 stop
waiting for server to shut down.... done
server stopped
$ ./build/bin/pg_ctl -D testdb2 -l logfile2 stop
waiting for server to shut down.... done
server stopped
# Delete both databases
$ rm -rf testdb1 testdb2 logfile1 logfile2
# Recreate both databases
$ ./build/bin/initdb testdb1
$ ./build/bin/initdb testdb2
$ printf 'port = 6002\nwal_level = logical' >> testdb2/postgresql.conf
$ printf 'port = 6001\nwal_level = logical' >> testdb1/postgresql.conf
# Start both databases
$./build/bin/pg_ctl -D testdb2 -l logfile2 start
waiting for server to start.... done
server started
$ ./build/bin/pg_ctl -D testdb1 -l logfile1 start
waiting for server to start.... done
server started

Now tail both log files so we can see the new logs we added.

$ tail -f logfile1 logfile2
==> logfile1 <==
2025-01-16 20:43:37.240 EST [93365] LOG:  starting PostgreSQL 17.2 on aarch64-apple-darwin23.6.0, compiled by Apple clang version 16.0.0 (clang-1600.0.26.6), 64-bit
2025-01-16 20:43:37.245 EST [93365] LOG:  listening on IPv6 address "::1", port 6001
2025-01-16 20:43:37.248 EST [93365] LOG:  listening on IPv4 address "127.0.0.1", port 6001
2025-01-16 20:43:37.253 EST [93365] LOG:  listening on Unix socket "/tmp/.s.PGSQL.6001"
2025-01-16 20:43:37.256 EST [93368] LOG:  database system was shut down at 2025-01-16 20:42:08 EST
2025-01-16 20:43:37.263 EST [93365] LOG:  database system is ready to accept connections
==> logfile2 <==
2025-01-16 20:43:33.662 EST [93334] LOG:  starting PostgreSQL 17.2 on aarch64-apple-darwin23.6.0, compiled by Apple clang version 16.0.0 (clang-1600.0.26.6), 64-bit
2025-01-16 20:43:33.663 EST [93334] LOG:  listening on IPv6 address "::1", port 6002
2025-01-16 20:43:33.663 EST [93334] LOG:  listening on IPv4 address "127.0.0.1", port 6002
2025-01-16 20:43:33.664 EST [93334] LOG:  listening on Unix socket "/tmp/.s.PGSQL.6002"
2025-01-16 20:43:33.666 EST [93337] LOG:  database system was shut down at 2025-01-16 20:42:11 EST
2025-01-16 20:43:33.670 EST [93334] LOG:  database system is ready to accept connections

Create the table on both databases.

$ psql -h localhost -p 6001 postgres -c "CREATE TABLE users (name TEXT PRIMARY KEY, age INT);"
CREATE TABLE
$ psql -h localhost -p 6002 postgres -c "CREATE TABLE users (name TEXT PRIMARY KEY, age INT);"
CREATE TABLE

Set up the publication on testdb1 and the subscription on testdb2.

$ psql -h localhost -p 6001 postgres -c "CREATE PUBLICATION pub FOR TABLE users;"
CREATE PUBLICATION
$ psql -h localhost -p 6002 postgres -c "CREATE SUBSCRIPTION sub CONNECTION 'port=6001 dbname=postgres' PUBLICATION pub;"
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION

And do an insert on testdb1 and watch the log files!

$ psql -h localhost -p 6001 postgres -c "INSERT INTO users VALUES ('deb', 12);"
INSERT 0 1

The tail process for me shows:

==> logfile1 <==
2025-01-16 20:46:02.764 EST [93544] LOG:  SENDING INSERT!
2025-01-16 20:46:02.764 EST [93544] CONTEXT:  slot "sub", output plugin "pgoutput", in the change callback, associated LSN 0/15037E0
2025-01-16 20:46:02.764 EST [93544] STATEMENT:  START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub"')
==> logfile2 <==
2025-01-16 20:46:02.764 EST [93536] LOG:  GOT AN INSERT!
2025-01-16 20:46:02.764 EST [93536] CONTEXT:  processing remote data for replication origin "pg_16395" during message type "INSERT" for replication target relation "public.users" in transaction 751, finished at 0/15038C0

And there we have it! logfile1 corresponded to the publisher, testdb1. It logged SENDING INSERT!. logfile2 corresponded to the subscriber, testdb2. It logged GOT AN INSERT!.

What's next

Replicating logical changes is only one use case for decoding the WAL. In a future post I would like to talk about how we can decode the WAL for our own purposes, walking through some working code. The pgrx source code has a good sketch in Rust which might be exactly what I'm thinking of but I haven't looked closely yet.

Postgres is a distributed database. Some parts of logical replication aren't yet fully fleshed out in community Postgres (the lack of DDL replication looms large). Some proprietary products, like the one I work on, work around these issues in the meantime. But you can also always build this infrastructure yourself.

Share this