Data warehouse modernization with EDB Postgres AI and Greenplum compatibility

April 10, 2025

This blog was co-authored by Dunith Danushka and Oak Barrett.

A technical guide to Postgres-based data warehouse modernization using EDB Postgres AI - Support for Greenplum Workloads and WarehousePG

EDB recently announced EDB Postgres AI - Support for Greenplum Workloads and a new Apache-licensed fork of Greenplum Database, WarehousePG. For teams seeking a trusted Greenplum alternative in light of Greenplum’s closed-source shift, EDB Postgres AI - Support for Greenplum Workloads with WarehousePG is a drop-in solution that reduces risk and preserves existing investments.

This post provides a technical deep dive into the history of Greenplum and its new Apache-licensed fork, WarehousePG, which forms the foundation of EDB’s data warehouse modernization solution. We will explore the technical architecture of Greenplum and WarehousePG, including concepts like massively parallel processing (MPP) and Postgres compatibility. Furthermore, we'll illustrate why EDB Postgres AI - Support for Greenplum Workloads is the open source path forward that organizations with at-risk Greenplum investments have been looking for: enabling them to mitigate vendor lock-in, embrace open source innovation, and leverage existing Greenplum skills and infrastructure while maintaining a low total cost of ownership (TCO).

What is Greenplum Database and how has it evolved? 

Greenplum's journey in the enterprise data warehousing landscape represents a significant evolution in how organizations handle large-scale analytical workloads. As a pioneering MPP Postgres database system, it has transformed how businesses process and analyze petabyte-scale data volumes. The system's architecture, built on Postgres, established new standards for performance and scalability in distributed database systems.

Greenplum has changed hands several times, and each transition has shaped its development and availability. Initially developed by Greenplum Corporation, the database system was acquired by EMC Corporation in 2010. In 2012, EMC and VMware launched Pivotal as a joint venture, bringing Greenplum under its umbrella. During the Pivotal era, Greenplum thrived as an open source project, fostering innovation and community collaboration.

In 2020, VMware acquired Pivotal, including Greenplum, continuing the open source tradition and maintaining strong community engagement. However, the landscape changed dramatically after Broadcom acquired VMware in 2023 and announced plans to transition Greenplum to a closed-source model.

Figure 1: Greenplum transitions timeline

This shift to closed-source has created several significant challenges for existing Greenplum users:

  • End-of-life technology risk: Using the legacy open source version exposes businesses to mounting security and compatibility issues.
  • Vendor lock-in risk: Choosing the new closed-source option may subject businesses to unfamiliar support and risk of price increases.
  • Business continuity risk: No matter the choice, business-critical systems and existing Greenplum investments are at stake.
  • Limited innovation: The closed-source model restricts customization and community contributions, limiting both feature development and the ability to optimize for specific needs.
  • Migration pressure: Users feel pressured to either accept the new terms or undertake costly migration projects to alternative solutions.

These challenges have prompted many organizations to reevaluate their data warehouse strategy and seek alternatives that offer more predictable costs and greater control over their data infrastructure.

EDB Postgres AI - Support for Greenplum Workloads is the secure, Greenplum-compatible, open source alternative these customers have been looking for.

How EDB Postgres AI - Support for Greenplum Workloads provides the best Greenplum alternative

EDB Postgres AI - Support for Greenplum Workloads provides a seamless transition for Greenplum customers to WarehousePG. Because WarehousePG is binary compatible with Greenplum versions 6.x and 7.x, this path to data warehouse modernization requires no reskilling or refactoring. Moreover, the solution provides 24x7 break/fix support, patching of common vulnerabilities and exposures (CVEs), and signed packages from EDB, ensuring a secure open source supply chain. WarehousePG also accelerates innovation for advanced analytics and AI use cases with capabilities and Postgres extension compatibility unavailable on legacy Greenplum.

Let’s dive deeper into what makes this possible by examining the foundational aspects of WarehousePG and Greenplum architecture — including data processing architecture, SQL interfaces, and more.

Technical deep dive: WarehousePG and Greenplum architecture and key features

Massively parallel processing (MPP) architecture and distributed query execution

MPP is a computing architecture that uses multiple processors working simultaneously to execute a single program. In database systems, MPP enables the processing of large datasets by distributing work across many independent nodes, each handling a portion of the data in parallel. This approach significantly improves performance and scalability compared to traditional single-node database systems.

Greenplum is based on the MPP architecture and, as a fork of Greenplum, so is WarehousePG. The architecture is designed for high-performance analytical processing of large-scale data and consists of a coordinator host and multiple segment hosts. The coordinator serves as the entry point for client connections, handles query planning, and coordinates distributed query execution. It also maintains system catalogs and provides the interface for query submission and result aggregation.

Segment hosts form the core processing units of the system, with each host running multiple segments (typically 2 or more, depending on available CPU, RAM, storage, and network resources). Each segment functions as an independent Postgres database that stores and processes its share of the data in parallel with the others. When a query is submitted, the coordinator creates an optimized execution plan that distributes the workload across all segments. Each segment simultaneously processes its portion of the data, and the results are aggregated back through the coordinator. This parallel processing architecture enables WarehousePG to efficiently handle complex analytical queries on petabyte-scale datasets. For optimal performance and workload balance, all segment hosts should be identically configured.

Figure 02: WarehousePG/Greenplum’s MPP architecture consists of a primary Coordinator host and a set of Segment hosts.
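
The scatter-and-gather flow described above can be illustrated with a small, self-contained Python toy model. It is only a sketch under stated assumptions and not how WarehousePG is implemented: the segment count, the CRC32 hash used to route rows, and the function names (`distribute`, `segment_partial_sum`, `coordinator_merge`) are all hypothetical, and the "segments" are plain lists processed one after another rather than parallel Postgres instances.

```python
from collections import defaultdict
import zlib

NUM_SEGMENTS = 4  # assumed cluster size for this toy model


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Route each (key, amount) row to exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def segment_partial_sum(segment_rows):
    """Segment side: aggregate only the rows stored on this segment."""
    partial = defaultdict(int)
    for key, amount in segment_rows:
        partial[key] += amount
    return dict(partial)


def coordinator_merge(partials):
    """Coordinator side: combine per-segment partial results into the final answer."""
    total = defaultdict(int)
    for partial in partials:
        for key, amount in partial.items():
            total[key] += amount
    return dict(total)


# Hypothetical sales rows: (region, amount)
rows = [("emea", 10), ("apac", 5), ("emea", 7), ("amer", 3), ("apac", 1)]
segments = distribute(rows)
# In a real MPP system these partial aggregates would be computed in parallel.
partials = [segment_partial_sum(s) for s in segments]
result = coordinator_merge(partials)
print(result)
```

Because every row lands on exactly one segment, the merged totals match what a single-node `SUM ... GROUP BY` would return; the benefit of the MPP layout is that each segment only scans its own slice of the data.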


What happens if the coordinator host crashes? To ensure high availability, WarehousePG implements a failover mechanism through a standby coordinator host. This backup coordinator maintains a real-time copy of the primary coordinator's data and metadata through Write-Ahead Logging (WAL) streaming replication. If the primary coordinator experiences a failure, the standby coordinator automatically promotes itself to primary status, ensuring minimal disruption to database operations. This warm standby approach provides robust fault tolerance without the performance overhead of synchronous replication.

Figure 03: If the primary coordinator fails, the standby coordinator automatically promotes itself to primary status.
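
To make the standby mechanism more concrete, here is a deliberately simplified Python model of log shipping followed by promotion. It is a sketch, not WarehousePG's replication code: the `Coordinator` and `Cluster` classes, the host names `cdw` and `scdw`, and the catalog entries are hypothetical, and real WAL streaming ships binary log records over the network rather than Python tuples.

```python
class Coordinator:
    """Toy coordinator that keeps an ordered log and the catalog state it implies."""

    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role      # "primary" or "standby"
        self.wal = []         # ordered log of (key, value) changes
        self.catalog = {}     # state produced by replaying the log

    def apply(self, record):
        """Append a log record and apply it to the local catalog."""
        key, value = record
        self.wal.append(record)
        self.catalog[key] = value


class Cluster:
    """Toy cluster with one primary and one standby coordinator."""

    def __init__(self):
        self.primary = Coordinator("cdw", "primary")    # hypothetical host name
        self.standby = Coordinator("scdw", "standby")   # hypothetical host name

    def write(self, key, value):
        """Apply a change on the primary and ship the same record to the standby."""
        record = (key, value)
        self.primary.apply(record)
        if self.standby is not None:
            self.standby.apply(record)  # stands in for WAL streaming

    def fail_over(self):
        """Promote the standby after the primary is lost."""
        if self.standby is None:
            raise RuntimeError("no standby coordinator available")
        self.standby.role = "primary"
        self.primary, self.standby = self.standby, None


cluster = Cluster()
cluster.write("table:sales", "distributed by (region)")
cluster.write("table:customers", "distributed by (customer_id)")
catalog_before = dict(cluster.primary.catalog)

cluster.fail_over()  # simulate losing the original primary
print(cluster.primary.name, cluster.primary.role)
print(cluster.primary.catalog == catalog_before)
```

The point of the model is that the promoted coordinator already holds the same log and catalog as the failed one, so clients can reconnect without any metadata being rebuilt.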


What if a segment fails? Segment mirroring provides critical failover protection at the segment level. Each primary segment can have a mirror segment that maintains an exact copy of its data. These mirrors are placed on different hosts from their primary segments for redundancy. If a primary segment becomes unavailable, the system automatically fails over to its mirror, ensuring uninterrupted database operations. Greenplum offers two mirroring configurations: group mirroring (the mirrors for all of a host's primary segments are placed together on one other host) and spread mirroring (each host's mirrors are distributed across multiple other hosts, so the load of a failed host is shared more evenly).

Figure 04: If a primary segment becomes unavailable, the system automatically fails over to its mirror, ensuring uninterrupted database operations.
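
The difference between the two mirroring configurations is easiest to see by looking at where the work goes when a host fails. The Python sketch below is a toy placement model under two assumptions that are not taken from this post: group mirroring puts all of a host's mirrors on the next host in the ring, and spread mirroring puts them on successive other hosts, one mirror per host. The function names and the 4-host, 3-segment cluster are hypothetical.

```python
def group_mirroring(num_hosts: int, segments_per_host: int):
    """Place every mirror for a host's primaries together on the next host."""
    if num_hosts < 2:
        raise ValueError("mirroring needs at least two hosts")
    layout = {}
    for host in range(num_hosts):
        for seg in range(segments_per_host):
            layout[(host, seg)] = (host + 1) % num_hosts
    return layout


def spread_mirroring(num_hosts: int, segments_per_host: int):
    """Place a host's mirrors on successive other hosts, one mirror per host."""
    if num_hosts <= segments_per_host:
        raise ValueError("this sketch needs more hosts than segments per host")
    layout = {}
    for host in range(num_hosts):
        for seg in range(segments_per_host):
            # Offsets 1..segments_per_host never wrap back to the primary's host.
            layout[(host, seg)] = (host + 1 + seg) % num_hosts
    return layout


def load_after_failure(layout, failed_host: int):
    """Count how many failed-over segments each surviving host picks up."""
    load = {}
    for (primary_host, _seg), mirror_host in layout.items():
        if primary_host == failed_host:
            load[mirror_host] = load.get(mirror_host, 0) + 1
    return load


group = group_mirroring(num_hosts=4, segments_per_host=3)
spread = spread_mirroring(num_hosts=4, segments_per_host=3)

group_load = load_after_failure(group, failed_host=0)
spread_load = load_after_failure(spread, failed_host=0)
print(group_load)   # one host absorbs all three failed-over segments
print(spread_load)  # three hosts absorb one failed-over segment each
```

In this model, group mirroring concentrates the extra load of a failed host on a single neighbor, while spread mirroring shares it across the remaining hosts at the cost of needing more hosts than segments per host.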


High-throughput, low-latency communication between the coordinator and segment hosts is essential for efficient query processing. The Greenplum Interconnect serves as the critical networking layer in WarehousePG's architecture, functioning as the nervous system of the MPP architecture. This interconnect manages inter-process communication between segments while providing the underlying network infrastructure needed for parallel query execution, data distribution, and result aggregation across the cluster.

Postgres compatibility

WarehousePG maintains strong binary compatibility with Postgres. This compatibility extends beyond just SQL syntax — it includes support for essential Postgres features like Common Table Expressions (CTEs), Window Functions, advanced indexing, and many more.

This means you can connect to WarehousePG with any Postgres client, like psql or pgcli:


psql -h hostname -p 5432 -d database -U username

This compatibility provides seamless access to the rich ecosystem of Postgres extensions, enabling a diverse range of analytics workloads, while also supporting custom extensions for specialized workloads.

Here's a list of a few notable utilities and extensions.

  • PL/R – With the Greenplum Database PL/R extension, you can write database functions in the R programming language and use R packages that contain R functions and data sets.
  • PL/Java – Write Java methods and install the JAR files that contain those methods into WarehousePG.
  • MADlib – Apache MADlib is an open source library for scalable in-database analytics. The MADlib extension provides the ability to run machine learning and deep learning workloads in WarehousePG.
  • PXF – The Platform Extension Framework (PXF) provides access to external data via built-in connectors that map an external data source to a WarehousePG table definition.

The familiar Postgres interface also reduces the learning curve for developers and database administrators, enabling teams to leverage existing Postgres expertise while working with WarehousePG’s MPP architecture.

Additional packages and commercial support

WarehousePG offers replacements for the popular commercial modules currently available. These will come in the form of net-new development, partnerships, and integrations with new or existing solutions across the EDB Postgres AI platform, backed by EDB’s award-winning 24x7 global support.

Swapping Greenplum with WarehousePG

WarehousePG allows enterprises to continue to use their existing infrastructure while avoiding the risks and costs associated with migrating their data off platform. The path forward is as simple as stopping the Greenplum process, removing the old packages, installing the WarehousePG packages, and restarting your database. No changes are required to ports, AI/BI/DBA toolsets, or regularly scheduled system administration tasks.

For organizations on older versions of Greenplum such as 6.x, EDB professional services can also provide version upgrade assistance: swapping binaries first and then upgrading to WarehousePG 7.x for improved Postgres compatibility and advanced functionality without disrupting business continuity.

Conclusion

Throughout this technical deep dive, we've explored Greenplum’s history as an enterprise data warehousing solution and how EDB Postgres AI - Support for Greenplum Workloads provides the best Greenplum alternative by enabling a seamless transition to Apache-licensed WarehousePG.

WarehousePG’s MPP architecture enables efficient processing of large-scale datasets through parallel computing. Strong Postgres compatibility ensures seamless integration with existing tools and workflows and wider access to Postgres extensions. When you complement those with EDB’s enterprise-grade support and security, EDB Postgres AI - Support for Greenplum Workloads provides a comprehensive answer for modern analytics.

To learn more about the seamless transition from Greenplum to WarehousePG, visit the EDB Postgres AI - Support for Greenplum Workloads web page or explore the GitHub repository. To get in touch with our technical experts and get a free workload assessment, contact us.

Disclaimer: Greenplum® is a registered trademark of Broadcom Inc. EDB and EDB Postgres AI are not affiliated with, endorsed by, or sponsored by Broadcom Inc. Any references to Greenplum are for comparative, educational, and interoperability purposes only.
