We have exciting news for mysql_fdw users. In my previous blog, we talked about an exciting feature - join push-down for mysql_fdw. Continuing the spree, the mysql_fdw release 2.7.0 now announces another exciting feature - aggregate push-down. Previously we experienced some customer reports where they complained of slow-performing queries which are using aggregate functions. This happened because simple count(*) on a foreign table mysql_fdw had to fetch all the rows from the remote MySQL server just to count them.
Now, with the aggregate push-down feature, aggregate functions are getting pushed down to the remote MySQL server instead of fetching all of the rows and aggregating them locally. So, for the above example, count(*) will be executing on the remote side. This gives a very good performance boost for the cases where aggregates can be pushed down.
For better understanding, we will see how the explain plan looks without and with aggregate push-down. Before this enhancement, if you had an aggregate query, it would produce a plan like:
EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM ORDERS GROUP BY id;
QUERY PLAN
---------------------------------------------------------
HashAggregate
Output: count(*), id
Group Key: orders.id
-> Foreign Scan on public.orders
Output: id, customer_id, item, price
Remote query: SELECT `id` FROM `suraj`.`orders`
(6 rows)
Remote query is the SQL query sent to the remote server to fetch all the rows from the remote table. We can see that it will fetch all the rows from the orders table from the remote server and then evaluate count(*) locally using HashAggregate. But with aggregate push-down support in mysql_fdw, now the query plan looks like:
EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM orders GROUP BY id;
QUERY PLAN
------------------------------------------------------------------------
Foreign Scan
Output: (count(*)), id
Relations: Aggregate on (suraj.orders)
Remote query: SELECT count(*), `id` FROM `suraj`.`orders` GROUP BY 2
(4 rows)
Remote SQL in the above plan shows us that count(*) gets computed on the foreign server. As you can see here, instead of getting all rows from the orders table, we are now getting only the aggregated result, thereby saving huge network bandwidth. We will see the performance impact this can create in a typical scenario in the next section.
Performance impact
The foreign table used in the below query orders has 1M records, but the actual table resides on the remote MySQL server. When the query is executed with aggregates, it takes approximately 6100 milliseconds without aggregate push-down functionality.
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM orders GROUP BY price;
QUERY PLAN
-----------------------------------------------------------------------------------
HashAggregate (actual time=6114.381..6114.388 rows=9 loops=1)
Group Key: orders.price
Batches: 1 Memory Usage: 40kB
-> Foreign Scan on public.orders (cost=25.00..1025.00 rows=1000 width=4) (actual time=529.477..5960.129 rows=1000000 loops=1)
Planning Time: 338.816 ms
Execution Time: 6181.273 ms
(6 rows)
The same query with the aggregate push-down functionality takes about 380 milliseconds as opposed to 6180 milliseconds without it, which is around 15x performance gain. This 15x performance gain may differ from query to query and dataset.
EXPLAIN (ANALYZE, COSTS OFF) SELECT count(*) FROM orders GROUP BY price;
QUERY PLAN
-----------------------------------------------------------------------------------
Foreign Scan (actual time=370.913..370.919 rows=9 loops=1)
Relations: Aggregate on (suraj.orders)
Planning Time: 204.795 ms
Execution Time: 379.908 ms
Join push-down and aggregate push-down together
Aggregate push-down works with join push-down functionality added previously. When two relations on which join is performing, join-push down feature pushes that entire join on the remote server. In the below query, if we have an aggregate on the joining result, this feature also pushes the aggregate on that server. This means both join and aggregate happen on the remote server itself and we get the final result post aggregation step. However, if join can't be pushed, then aggregate too cannot be pushed.
EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM orders r1 LEFT JOIN customers r2 ON (r1.id = r2.id) GROUP BY r1.id, r2.id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Output: (count(*)), r1.id, r2.id
Relations: Aggregate on ((suraj.orders r1) LEFT JOIN (suraj.customers r2))
Remote query: SELECT count(*), r1.`id`, r2.`id` FROM (`suraj`.`orders` r1 LEFT JOIN `suraj`.`customers` r2 ON (((r1.`id` = r2.`id`)))) GROUP BY 2, 3
(4 rows)
It would compute the whole query on the foreign server and give us just the result.
Pushing down aggregates to the partition table
This feature allows grouping or aggregation on a partitioned table performed separately for each partition. If the GROUP BY clause does include the partition keys, full aggregation for each partition can be performed on a per-partition basis.
CREATE TABLE fprt1 (c1 int, c2 int, c3 varchar, c4 varchar) PARTITION BY RANGE(c1);
CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (1) TO (2)
SERVER mysql_svr OPTIONS (dbname 'mysql_fdw_regress', table_name 'test1');
CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (3) TO (4)
SERVER mysql_svr OPTIONS (dbname 'mysql_fdw_regress', TABLE_NAME 'test2');
SET enable_partitionwise_aggregate TO on;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, sum(c1) FROM fprt1 GROUP BY c1 ORDER BY 2;
QUERY PLAN
------------------------------------------------------------------------------------------------
Sort
Output: fprt1.c1, (sum(fprt1.c1))
Sort Key: (sum(fprt1.c1))
-> Append
-> Foreign Scan
Output: fprt1.c1, (sum(fprt1.c1))
Relations: Aggregate on (mysql_fdw_regress.ftprt1_p1 fprt1)
Remote query: SELECT `c1`, sum(`c1`) FROM `mysql_fdw_regress`.`test1` GROUP BY 1
-> Foreign Scan
Output: fprt1_1.c1, (sum(fprt1_1.c1))
Relations: Aggregate on (mysql_fdw_regress.ftprt1_p2 fprt1)
Remote query: SELECT `c1`, sum(`c1`) FROM `mysql_fdw_regress`.`test2` GROUP BY 1
(12 rows)
Limitations
- This feature is currently limited to aggregate functions min, max, sum, avg, and count, to avoid pushing down the functions that are not present on the MySQL server
- Aggregate filters and aggregate orders are not pushed down as MySQL does not support them
- Due to infrastructure limitations, partial aggregation can’t be performed on the remote server
Thank you to my colleagues Jeevan Ladhe, Vaibhav Dalvi, and Jeevan Chalke for helping me in reviewing the code. Also a big thank you to Rajkumar Raghuwanshi for testing this feature.
Read more: Cool New Contributions to PostgreSQL 14