Achieving Extreme High Availability with EDB Postgres Distributed: Rescue Your Downtime from (Un)Planned Processes

May 11, 2023
EDB

In a previous blog post, What Does “Database High Availability” Really Mean?, I discussed the importance of high availability in maintaining business continuity and minimizing downtime. High availability is crucial for businesses because downtime translates directly into lost business. In this blog post, we will focus on achieving five 9s (99.999% uptime) of extreme high availability for your database with EDB Postgres Distributed (PGD), and on how PGD limits your downtime during planned and unplanned processes.
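As a quick sanity check on what five 9s actually buys you, the yearly downtime budget for a given availability target can be computed with plain arithmetic in any psql session (the numbers below are straightforward to verify by hand):

```sql
-- Yearly downtime budget for a given availability target.
-- 365 days * 24 hours * 60 minutes = 525,600 minutes per year.
SELECT target,
       round((525600 * (1 - target))::numeric, 2) AS max_downtime_min_per_year
FROM (VALUES (0.999), (0.9999), (0.99999)) AS t(target);
-- 0.999   (three 9s) -> 525.60 minutes (~8.8 hours) per year
-- 0.9999  (four 9s)  ->  52.56 minutes per year
-- 0.99999 (five 9s)  ->   5.26 minutes per year
```

At five 9s there is barely enough budget for a single restart per year, which is why routine maintenance itself becomes the availability problem.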


The challenges of physical streaming replication-based high availability

To achieve high availability, Postgres users commonly rely on physical streaming replication to ensure redundancy. This approach is often combined with tools such as EDB Failover Manager, Patroni or Repmgr to enable automatic failover and switchover, ensuring the availability of Postgres in production. Despite these measures, certain activities can still compromise high availability and exceed the allowable downtime defined by your service level agreement.

Activities that can impact the high availability requirements include:

  • Database maintenance operations
  • Patching the system/Postgres
  • Major version upgrades
  • Increasing resources (CPU/Memory/Disk Space)
  • Unplanned outages
  • Configuration and parameters changes that require a restart

With physical streaming replication, the activities listed above can impact the performance and availability of the system. For example, if your tables are heavily bloated, a normal VACUUM can bring some relief. But a VACUUM FULL operation, which reclaims unused space by rewriting the table, requires an exclusive lock on the table, causing an outage for its duration. Similarly, rebuilding a bloated primary-key or foreign-key index with a plain REINDEX blocks writes to the table and can also result in downtime.

Likewise, patching, major version upgrades, and changes to restart-requiring parameters such as shared_buffers or max_connections (driven by workload changes or by new services connecting to the same database) can all cause downtime. And we cannot forget unplanned outages, the very reason we maintain replication and redundancy in the first place.
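To make the locking behavior concrete, here is a sketch of the maintenance operations above and the locks they take on stock Postgres (the table and index names are hypothetical; the lock levels are as documented in the PostgreSQL manual):

```sql
-- VACUUM FULL rewrites the table and holds an ACCESS EXCLUSIVE lock,
-- blocking all reads and writes on "orders" for the duration.
VACUUM FULL orders;

-- Plain REINDEX blocks writes to the table while the index is rebuilt.
REINDEX INDEX orders_pkey;

-- REINDEX CONCURRENTLY (Postgres 12+) avoids blocking writes, but it is
-- slower and cannot run inside a transaction block.
REINDEX INDEX CONCURRENTLY orders_pkey;
```

On a single primary behind physical streaming replication, there is no node you can offload these locks to, which is the gap PGD addresses.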


The solution: EDB Postgres Distributed (PGD)

The above challenges of physical streaming replication-based high availability solutions led us to rethink our approach and develop a technology that offers extreme high availability to our customers and lets them maintain uptime during the important activities their Postgres database requires. PGD uses the logical replication capability of Postgres to deliver extreme high availability (up to five 9s, i.e., 99.999%) while still allowing users to perform the activities critical for database optimization.

PGD's architecture has two main components:

  • PGD Proxy
  • Bi-directional replication (BDR)

BDR keeps the data on all nodes synchronized and is also responsible for replicating DDL and other objects.

PGD Proxy, on the other hand, routes connections to the lead primary (the node handling all reads and writes). If something happens to the current lead primary, PGD Proxy re-routes connections to the newly elected lead primary.

In addition to providing extreme high availability, PGD can also be used for the following:

  • Distribute Workloads Geographically: PGD can also distribute database workloads by geography. For example, with a three-node PGD architecture spread across the globe (say, in the US, the UK and India), each country's workload can be served by its local node. The same pattern applies to regional distribution.
  • Data Localization Security: Advanced logical replication in PGD also allows you to choose access rights and maintain data sovereignty—protecting your organization and limiting threats.


How PGD guarantees extreme high availability for your Postgres database

PGD's architecture promotes extreme high availability for your database through various techniques. Let's discuss some of them in detail:

  1. Automatic Failover/Switchover:

PGD takes a different approach to cluster management than tools like EFM, Repmgr and Patroni, which build on physical streaming replication. With PGD there is no need to promote a node in the cluster: a consensus layer identifies the new lead primary and connections are redirected to it automatically. This eliminates the decision-making around selecting and promoting a standby node to a primary one, resulting in reduced downtime, higher availability, and easier disaster recovery in the event of an outage.

  2. Online database maintenance:

In PGD, all nodes receive logical changes from each other, providing users with the ability to carry out difficult maintenance tasks without affecting availability. This allows administrators to execute operations such as REINDEX/VACUUM FULL or changing the tablespace of a table without impacting users who are accessing the primary node. Maintenance operations can be performed on shadow primaries (non-lead primary nodes) and then a switchover can be performed to carry out the same activity on the lead primary. This PGD mechanism is similar to a rolling maintenance operation on a cluster, thereby preventing impacts to your business or customers.
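A rolling-maintenance pass might look like the following sketch. The table name is hypothetical, and the switchover mechanism varies by PGD version, so that step is shown only as a comment; consult the PGD documentation for your release (recent versions ship a `pgd` CLI, which is an assumption here, not something this post documents):

```sql
-- On a shadow (non-lead) primary: heavyweight maintenance is safe here,
-- because PGD Proxy routes application traffic to the lead primary.
VACUUM FULL orders;
REINDEX TABLE orders;

-- Then transfer leadership so the old lead primary becomes a shadow node
-- and can receive the same maintenance. The exact switchover command is
-- version-specific; check your PGD docs for the supported mechanism.
```

Repeating this node by node gives the rolling-maintenance behavior described above without ever taking the write target offline.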

  3. Patching the system/Postgres:

PGD enables updating or patching the system while minimizing the impact on the availability of PGD clusters, much like how database maintenance is performed. This means that you can start by patching the non-lead primary nodes in PGD, and then proceed with the switchover to perform the same operation on the lead primary.

  4. In-place major version upgrade:

Unlike physical streaming replication, PGD allows nodes of different versions to coexist in a cluster. This gives users and administrators the flexibility to upgrade nodes individually to a new major version and then, after a switchover, perform the same operation on the lead primary. Additionally, administrators and users can plan and test their applications against the nodes that have already been upgraded, validating compatibility with the new major version of Postgres before the upgrade completes.

  5. Increasing resources on the system:

As workloads change in the production environment, administrators must adjust resources such as RAM, CPU and disk space accordingly. When additional resources are added to the system, the Postgres parameters must be tuned to utilize them. Unlike physical streaming replication, PGD allows different values for parameters such as shared_buffers or max_connections on different nodes. So, as with a rolling upgrade, an administrator can resize the non-lead nodes first and then perform a switchover to apply the same change to the lead primary.
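Because each PGD node is an independent Postgres instance receiving logical changes, the resized node can be retuned with stock Postgres commands while it is not the lead primary (the values below are purely illustrative):

```sql
-- On the node being resized: raise restart-requiring parameters to match
-- the new hardware. Neither parameter takes effect until a restart.
ALTER SYSTEM SET shared_buffers = '8GB';
ALTER SYSTEM SET max_connections = 500;

-- Then restart only this node, e.g. via pg_ctl or your service manager,
-- while PGD Proxy keeps traffic on the lead primary.
```

Once the node is back and caught up, a switchover lets the same change be applied to the former lead primary with no client-visible outage.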


Experience extreme high availability with EDB Postgres Distributed

For organizations, maintaining high availability of their database is crucial to minimizing downtime. PGD delivers extreme high availability while supporting the activities essential for database optimization. By using PGD's techniques for online maintenance, patching and upgrades, businesses can keep their database operational and their downtime minimal.


Learn more about the full capabilities of EDB Postgres Distributed in our eBook “EDB Postgres Distributed: The Next Generation of PostgreSQL High Availability.”
