Logical Replication v1
PostgreSQL extends its replication capabilities beyond physical replication, which operates at the level of exact block addresses and byte-by-byte copying, by offering logical replication. Logical replication replicates data objects and their changes based on a defined replication identity, typically the primary key.
Logical replication uses a publish-and-subscribe model, where subscribers connect to publications on a publisher node. Subscribers pull data changes from these publications and can re-publish them, enabling cascading replication and complex topologies.
This flexible model is particularly useful for:
- Online data migrations
- Live PostgreSQL version upgrades
- Data distribution across systems
- Real-time analytics
- Integration with external applications
Info
For more details, examples, and limitations, please refer to the official PostgreSQL documentation on Logical Replication.
EDB Postgres for Kubernetes enhances this capability by providing declarative support for key PostgreSQL logical replication objects:
- Publications via the
Publication
resource - Subscriptions via the
Subscription
resource
Publications
In PostgreSQL's publish-and-subscribe replication model, a publication is the source of data changes. It acts as a logical container for the change sets (also known as replication sets) generated from one or more tables within a database. Publications can be defined on any PostgreSQL 10+ instance acting as the publisher, including instances managed by popular DBaaS solutions in the public cloud. Each publication is tied to a single database and provides fine-grained control over which tables and changes are replicated.
For publishers outside Kubernetes, you can create publications using SQL
or leverage the cnp publication create
plugin command.
When managing Cluster
objects with EDB Postgres for Kubernetes, PostgreSQL publications
can be defined declaratively through the Publication
resource.
Info
Please refer to the API reference
for the full list of attributes you can define for each Publication
object.
Suppose you have a cluster named freddie
and want to replicate all tables in
the app
database. Here's a Publication
manifest:
In the above example:
- The publication object is named
freddie-publisher
(metadata.name
). - The publication is created via the primary of the
freddie
cluster (spec.cluster.name
) with namepublisher
(spec.name
). - It includes all tables (
spec.target.allTables: true
) from theapp
database (spec.dbname
).
Important
While allTables
simplifies configuration, PostgreSQL offers fine-grained
control for replicating specific tables or targeted data changes. For advanced
configurations, consult the PostgreSQL documentation.
Additionally, refer to the EDB Postgres for Kubernetes API reference
for details on declaratively customizing replication targets.
Required Fields in the Publication
Manifest
The following fields are required for a Publication
object:
metadata.name
: Unique name for the KubernetesPublication
object.spec.cluster.name
: Name of the PostgreSQL cluster.spec.dbname
: Database name where the publication is created.spec.name
: Publication name in PostgreSQL.spec.target
: Specifies the tables or changes to include in the publication.
The Publication
object must reference a specific Cluster
, determining where
the publication will be created. It is managed by the cluster's primary instance,
ensuring the publication is created or updated as needed.
Reconciliation and Status
After creating a Publication
, EDB Postgres for Kubernetes manages it on the primary
instance of the specified cluster. Following a successful reconciliation cycle,
the Publication
status will reflect the following:
applied: true
, indicates the configuration has been successfully applied.observedGeneration
matchesmetadata.generation
, confirming the applied configuration corresponds to the most recent changes.
If an error occurs during reconciliation, status.applied
will be false
, and
an error message will be included in the status.message
field.
Removing a publication
The publicationReclaimPolicy
field controls the behavior when deleting a
Publication
object:
retain
(default): Leaves the publication in PostgreSQL for manual management.delete
: Automatically removes the publication from PostgreSQL.
Consider the following example:
In this case, deleting the Publication
object also removes the publisher
publication from the app
database of the freddie
cluster.
Subscriptions
In PostgreSQL's publish-and-subscribe replication model, a subscription represents the downstream component that consumes data changes. A subscription establishes the connection to a publisher's database and specifies the set of publications (one or more) it subscribes to. Subscriptions can be created on any supported PostgreSQL instance acting as the subscriber.
Important
Since schema definitions are not replicated, the subscriber must have the corresponding tables already defined before data replication begins.
EDB Postgres for Kubernetes simplifies subscription management by enabling you to define them
declaratively using the Subscription
resource.
Info
Please refer to the API reference
for the full list of attributes you can define for each Subscription
object.
Suppose you want to replicate changes from the publisher
publication on the
app
database of the freddie
cluster (publisher) to the app
database of
the king
cluster (subscriber). Here's an example of a Subscription
manifest:
In the above example:
- The subscription object is named
freddie-to-king-subscriber
(metadata.name
). - The subscription is created in the
app
database (spec.dbname
) of theking
cluster (spec.cluster.name
), with namesubscriber
(spec.name
). - It connects to the
publisher
publication in the externalfreddie
cluster, referenced byspec.externalClusterName
.
To facilitate this setup, the freddie
external cluster must be defined in the
king
cluster's configuration. Below is an example excerpt showing how to
define the external cluster in the king
manifest:
Info
For more details on configuring the externalClusters
section, see the
"Bootstrap" section of the
documentation.
As you can see, a subscription can connect to any PostgreSQL database accessible over the network. This flexibility allows you to seamlessly migrate your data into Kubernetes with nearly zero downtime. It’s an excellent option for transitioning from various environments, including popular cloud-based Database-as-a-Service (DBaaS) platforms.
Required Fields in the Subscription
Manifest
The following fields are mandatory for defining a Subscription
object:
metadata.name
: A unique name for the KubernetesSubscription
object within its namespace.spec.cluster.name
: The name of the PostgreSQL cluster where the subscription will be created.spec.dbname
: The name of the database in which the subscription will be created.spec.name
: The name of the subscription as it will appear in PostgreSQL.spec.externalClusterName
: The name of the external cluster, as defined in thespec.cluster.name
cluster's configuration. This references the publisher database.spec.publicationName
: The name of the publication in the publisher database to which the subscription will connect.
The Subscription
object must reference a specific Cluster
, determining
where the subscription will be managed. EDB Postgres for Kubernetes ensures that the
subscription is created or updated on the primary instance of the specified
cluster.
Reconciliation and Status
After creating a Subscription
, EDB Postgres for Kubernetes manages it on the primary
instance of the specified cluster. Following a successful reconciliation cycle,
the Subscription
status will reflect the following:
applied: true
, indicates the configuration has been successfully applied.observedGeneration
matchesmetadata.generation
, confirming the applied configuration corresponds to the most recent changes.
If an error occurs during reconciliation, status.applied
will be false
, and
an error message will be included in the status.message
field.
Removing a subscription
The subscriptionReclaimPolicy
field controls the behavior when deleting a
Subscription
object:
retain
(default): Leaves the subscription in PostgreSQL for manual management.delete
: Automatically removes the subscription from PostgreSQL.
Consider the following example:
In this case, deleting the Subscription
object also removes the subscriber
subscription from the app
database of the king
cluster.
Limitations
Logical replication in PostgreSQL has some inherent limitations, as outlined in the official documentation. Notably, the following objects are not replicated:
- Database schema and DDL commands
- Sequence data
- Large objects
Addressing Schema Replication
The first limitation, related to schema replication, can be easily addressed
using EDB Postgres for Kubernetes' capabilities. For instance, you can leverage the import
bootstrap feature to copy the schema of the tables you need to replicate.
Alternatively, you can manually create the schema as you would for any
PostgreSQL database.
Handling Sequences
While sequences are not automatically kept in sync through logical replication,
EDB Postgres for Kubernetes provides a solution to be used in live migrations.
You can use the cnp
plugin
to synchronize sequence values, ensuring consistency between the publisher and
subscriber databases.
Example of live migration and major Postgres upgrade with logical replication
To highlight the powerful capabilities of logical replication, this example
demonstrates how to replicate data from a publisher database (freddie
)
running PostgreSQL 16 to a subscriber database (king
) running the latest
PostgreSQL version. This setup can be deployed in your Kubernetes cluster for
evaluation and hands-on learning.
This example illustrates how logical replication facilitates live migrations and upgrades between PostgreSQL versions while ensuring data consistency. By combining logical replication with EDB Postgres for Kubernetes, you can easily set up, manage, and evaluate such scenarios in a Kubernetes environment.
Step 1: Setting Up the Publisher (freddie
)
The first step involves creating a freddie
PostgreSQL cluster with version 16.
The cluster contains a single instance and includes an app
database
initialized with a table, n
, storing 10,000 numbers. A logical replication
publication named publisher
is also configured to include all tables in the
database.
Here’s the manifest for setting up the freddie
cluster and its publication
resource:
Step 2: Setting Up the Subscriber (king
)
Next, create the king
PostgreSQL cluster, running the latest version of
PostgreSQL. This cluster initializes by importing the schema from the app
database on the freddie
cluster using the external cluster configuration. A
Subscription
resource, freddie-to-king-subscription
, is then configured to
consume changes published by the publisher
on freddie
.
Below is the manifest for setting up the king
cluster and its subscription:
Once the king
cluster is running, you can verify that the replication is
working by connecting to the app
database and counting the records in the n
table. The following example uses the psql
command provided by the cnp
plugin for simplicity:
This command should return 10000
, confirming that the data from the freddie
cluster has been successfully replicated to the king
cluster.
Using the cnp
plugin, you can also synchronize existing sequences to ensure
consistency between the publisher and subscriber. The example below
demonstrates how to synchronize a sequence for the king
cluster:
This command updates the sequence n_i_seq
in the king
cluster to match the
current value, ensuring it is in sync with the source database.