I recently had a few customers ask me about using a query for monitoring SMR XDB lag, so they can keep tabs on replication progress and set up notifications.
Unfortunately, because of the way WAL works, it's a bit tedious to try to quantify lag in terms of rows (You'd have to take each WAL file, scan, and filter for unrelated WAL entries, based on sync_ids in the XDB control database). Additionally, tracking WAL-based SMR replication lag cannot be done with an earlier solution. Fortunately, it is possible to calculate the time lag with a query:
WITH
src_db (current_xid_commit_timestamp) AS
(SELECT timestamp FROM pg_last_committed_xact()),
-- last replicated timestamp for target databases
target_db (target_db_id, last_repl_xid_timestamp) AS
(SELECT sub_db_id AS target_db_id, MAX(last_repl_xid_timestamp) AS last_repl_xid_timestamp
FROM _edb_replicator_pub.rrep_txset rt JOIN _edb_replicator_sub.xdb_subscriptions xs ON xs.sub_id=rt.sub_id WHERE status = 'C'
GROUP BY sub_db_id),
-- identity of target databases
target_db_identity (target_db_id, target_db_identity) AS
(SELECT sub_db_id, db_host || ':' || db_port || ':' || db_name
FROM _edb_replicator_sub.xdb_sub_database)
-- replication lag for each of the target databases
SELECT c.target_db_id, target_db_identity, AGE(current_xid_commit_timestamp, last_repl_xid_timestamp) repl_time_lag
FROM src_db a, target_db b, target_db_identity c WHERE b.target_db_id = c.target_db_id
ORDER BY target_db_id;
By examining the replication timestamps, we are able to get a good handle on the time delay between the associated servers. Additionally, with some more tweaking, we can use a similar query to calculate time lag for Multi-Master Replication (MMR) clusters:
WITH
src_db (current_xid_commit_timestamp) AS
(SELECT timestamp FROM pg_last_committed_xact()),
-- last replicated timestamp for target databases
target_db (target_db_id, last_repl_xid_timestamp) AS
(SELECT db_id AS target_db_id, MAX(last_repl_xid_timestamp) AS last_repl_xid_timestamp
FROM _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C'
GROUP BY db_id),
-- identity of target databases
target_db_identity (target_db_id, target_db_identity) AS
(SELECT pub_db_id, db_host || ':' || db_port || ':' || db_name
FROM _edb_replicator_pub.xdb_pub_database)
-- replication lag for each of the target databases
SELECT c.target_db_id, target_db_identity, AGE(current_xid_commit_timestamp, last_repl_xid_timestamp) repl_time_lag
FROM src_db a, target_db b, target_db_identity c WHERE b.target_db_id = c.target_db_id
ORDER BY target_db_id;
With these queries, DBAs and sysadmins can easily set up notification and monitoring systems for XDB performance.