Aggregate Push-down in mysql_fdw

November 23, 2021

We have exciting news for mysql_fdw users. In my previous blog, we talked about an exciting feature - join push-down for mysql_fdw. Continuing the spree, the mysql_fdw release 2.7.0 now announces another exciting feature - aggregate push-down. Previously we experienced some customer reports where they complained of slow-performing queries which are using aggregate functions. This happened because simple count(*) on a foreign table mysql_fdw had to fetch all the rows from the remote MySQL server just to count them.

Now, with the aggregate push-down feature, aggregate functions are getting pushed down to the remote MySQL server instead of fetching all of the rows and aggregating them locally. So, for the above example, count(*) will be executing on the remote side. This gives a very good performance boost for the cases where aggregates can be pushed down.
For better understanding, we will see how the explain plan looks without and with aggregate push-down. Before this enhancement, if you had an aggregate query, it would produce a plan like:

EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM ORDERS GROUP BY id;
                       QUERY PLAN                        
---------------------------------------------------------
 HashAggregate
   Output: count(*), id
   Group Key: orders.id
   ->  Foreign Scan on public.orders
         Output: id, customer_id, item, price
         Remote query: SELECT `id` FROM `suraj`.`orders`
(6 rows)

Remote query is the SQL query sent to the remote server to fetch all the rows from the remote table. We can see that it will fetch all the rows from the orders table from the remote server and then evaluate count(*) locally using HashAggregate. But with aggregate push-down support in mysql_fdw, now the query plan looks like:

EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM orders GROUP BY id;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Foreign Scan
   Output: (count(*)), id
   Relations: Aggregate on (suraj.orders)
   Remote query: SELECT count(*), `id` FROM `suraj`.`orders` GROUP BY 2
(4 rows)

Remote SQL in the above plan shows us that count(*) gets computed on the foreign server. As you can see here, instead of getting all rows from the orders table, we are now getting only the aggregated result, thereby saving huge network bandwidth. We will see the performance impact this can create in a typical scenario in the next section.

Performance impact

The foreign table used in the below query orders has 1M records, but the actual table resides on the remote MySQL server. When the query is executed with aggregates, it takes approximately 6100 milliseconds without aggregate push-down functionality.

EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM orders GROUP BY price;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
HashAggregate  (actual time=6114.381..6114.388 rows=9 loops=1)
   Group Key: orders.price
   Batches: 1  Memory Usage: 40kB
   ->  Foreign Scan on public.orders  (cost=25.00..1025.00 rows=1000 width=4) (actual time=529.477..5960.129 rows=1000000 loops=1)
Planning Time: 338.816 ms
Execution Time: 6181.273 ms
(6 rows)

The same query with the aggregate push-down functionality takes about 380 milliseconds as opposed to 6180 milliseconds without it, which is around 15x performance gain. This 15x performance gain may differ from query to query and dataset.

EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM orders GROUP BY price;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
Foreign Scan  (actual time=370.913..370.919 rows=9 loops=1)
   Relations: Aggregate on (suraj.orders)
Planning Time: 204.795 ms
Execution Time: 379.908 ms

Join push-down and aggregate push-down together

Aggregate push-down works with join push-down functionality added previously. When two relations on which join is performing, join-push down feature pushes that entire join on the remote server. In the below query, if we have an aggregate on the joining result, this feature also pushes the aggregate on that server. This means both join and aggregate happen on the remote server itself and we get the final result post aggregation step. However, if join can't be pushed, then aggregate too cannot be pushed.

EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM orders r1 LEFT JOIN customers r2 ON (r1.id = r2.id) GROUP BY r1.id, r2.id;
                                                                       QUERY PLAN                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Output: (count(*)), r1.id, r2.id
   Relations: Aggregate on ((suraj.orders r1) LEFT JOIN (suraj.customers r2))
   Remote query: SELECT count(*), r1.`id`, r2.`id` FROM (`suraj`.`orders` r1 LEFT JOIN `suraj`.`customers` r2 ON (((r1.`id` = r2.`id`)))) GROUP BY 2, 3
(4 rows)

It would compute the whole query on the foreign server and give us just the result.

Pushing down aggregates to the partition table

This feature allows grouping or aggregation on a partitioned table performed separately for each partition. If the GROUP BY clause does include the partition keys, full aggregation for each partition can be performed on a per-partition basis.

CREATE TABLE fprt1 (c1 int, c2 int, c3 varchar, c4 varchar) PARTITION BY RANGE(c1);
CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (1) TO (2)
  SERVER mysql_svr OPTIONS (dbname 'mysql_fdw_regress', table_name 'test1');
CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (3) TO (4)
  SERVER mysql_svr OPTIONS (dbname 'mysql_fdw_regress', TABLE_NAME 'test2');

SET enable_partitionwise_aggregate TO on;


EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, sum(c1) FROM fprt1 GROUP BY c1 ORDER BY 2;
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Sort
   Output: fprt1.c1, (sum(fprt1.c1))
   Sort Key: (sum(fprt1.c1))
   ->  Append
         ->  Foreign Scan
               Output: fprt1.c1, (sum(fprt1.c1))
               Relations: Aggregate on (mysql_fdw_regress.ftprt1_p1 fprt1)
               Remote query: SELECT `c1`, sum(`c1`) FROM `mysql_fdw_regress`.`test1` GROUP BY 1
         ->  Foreign Scan
               Output: fprt1_1.c1, (sum(fprt1_1.c1))
               Relations: Aggregate on (mysql_fdw_regress.ftprt1_p2 fprt1)
               Remote query: SELECT `c1`, sum(`c1`) FROM `mysql_fdw_regress`.`test2` GROUP BY 1
(12 rows)

Limitations

  • This feature is currently limited to aggregate functions min, max, sum, avg, and count, to avoid pushing down the functions that are not present on the MySQL server
  • Aggregate filters and aggregate orders are not pushed down as MySQL does not support them
  • Due to infrastructure limitations, partial aggregation can’t be performed on the remote server

Thank you to my colleagues Jeevan Ladhe, Vaibhav Dalvi, and Jeevan Chalke for helping me in reviewing the code. Also a big thank you to Rajkumar Raghuwanshi for testing this feature.

Read more:  Cool New Contributions to PostgreSQL 14

Share this

More Blogs

PGVector as Embedding Store in PrivateGPT

EDB has a long history of open source contributions, and while we’re best known for our contributions to Postgres, that’s not the only project we contribute to. e.g Barman, CloudNativePG...
June 05, 2024

pgAdmin User Management in Server Mode

pgAdmin can be deployed as a web application by configuring the app to run in server mode. One can check out server deployment on how to run pgAdmin in server...
August 24, 2023

pgAdmin CI/CD

Almost exactly three years ago I wrote a blog on my personal page entitled Testing pgAdmin which went into great detail discussing how we test pgAdmin prior to releases. Back...
August 24, 2023