We have exciting news for those who are using mysql_fdw. Join queries are widely used queries in our customers’ environments so the performance of these queries is extremely important to us. In mysql_fdw, we were executing the joins on the local side by fetching all required data from remote relations, but that was not really efficient.
Now, with this enhancement, mysql_fdw supports join push-down to a remote server. This means join operations could be performed on a remote MySQL server instead of fetching all the rows from involved relations in a join and performing a join locally. Join push-down will happen only if involved foreign tables are from the same foreign server. This feature gives significant performance benefits in many practical cases especially in the case of large foreign tables which don't match much.
For better understanding, we will see how the explain plan looks without and with join push-down. Before this enhancement, if you had a join query (inner join), it would produce a plan like:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
QUERY PLAN
-----------------------------------------------------------------------------------------------
Hash Join
Output: t1.id, t1.name, t1.email, t2.id, t2.customer_id, t2.item, t2.price
Hash Cond: (t1.id = t2.customer_id)
-> Foreign Scan on public.customers t1
Output: t1.id, t1.name, t1.email
Local server startup cost: 10
Remote query: SELECT `id`, `name`, `email` FROM `suraj`.`customers`
-> Hash
Output: t2.id, t2.customer_id, t2.item, t2.price
-> Foreign Scan on public.orders t2
Output: t2.id, t2.customer_id, t2.item, t2.price
Local server startup cost: 10
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders`
(13 rows)
Remote query is the SQL query sent to the remote server to fetch all required rows from the remote table. We can see that it will fetch all the rows from the customers and orders tables from the remote server and then join them locally using hash join. But with join push-down support in mysql_fdw, now the query plan looks like this:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Output: t1.id, t1.name, t1.email, t2.id, t2.customer_id, t2.item, t2.price
Relations: (suraj.customers t1) INNER JOIN (suraj.orders t2)
Remote query: SELECT r1.`id`, r1.`name`, r1.`email`, r2.`id`, r2.`customer_id`, r2.`item`, r2.`price` FROM (`suraj`.`customers` r1 INNER JOIN `suraj`.`orders` r2 ON (((r1.`id` = r2.`
customer_id`))))
(4 rows)
In the above explain plan, remote query is the join query that will execute on the remote server. With this, it’s clear to see the performance improvement this enhancement is capable of producing.
Performance impact
The tables used in the above query customers and orders, both have 100k records. When the query is executed, 50k records match based on the join condition. As can be seen below, it takes approximately 1300 milliseconds without join push-down functionality.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
QUERY PLAN
--------------------------------------------------------------------------------------------
Hash Join (actual time=1023.431..1326.532 rows=50000 loops=1)
Hash Cond: (t1.id = t2.customer_id)
-> Foreign Scan on customers t1 (actual time=90.242..647.187 rows=100000 loops=1)
-> Hash (actual time=635.035..635.036 rows=100000 loops=1)
Buckets: 131072 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3568kB
-> Foreign Scan on orders t2 (actual time=82.140..614.967 rows=100000 loops=1)
Planning Time: 42.090 ms
Execution Time: 1336.697 ms
(8 rows)
The same query with the join push-down functionality takes about 480 milliseconds as opposed to 1300 milliseconds without it. This is really a nice performance gain with this new feature.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
QUERY PLAN
----------------------------------------------------------------
Foreign Scan (actual time=202.932..475.895 rows=50001 loops=1)
Relations: (suraj.customers t1) INNER JOIN (suraj.orders t2)
Planning Time: 46.325 ms
Execution Time: 480.668 ms
(4 rows)
Partition-wise join push-down
The technique of breaking down a join between partitioned tables into joins between their matching partitions is called partition-wise join. Here, if partitions are foreign tables then those can be pushed down to the remote server.
Example for partition-wise join:
CREATE TABLE fprt1 (c1 int, c2 int, c3 varchar, c4 varchar) PARTITION BY RANGE(c1);
CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (1) TO (4)
SERVER mysql_svr OPTIONS (dbname 'suraj', table_name 'test1');
CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (5) TO (8)
SERVER mysql_svr OPTIONS (dbname ‘suraj’, TABLE_NAME 'test2');
CREATE TABLE fprt2 (c1 int, c2 int, c3 varchar) PARTITION BY RANGE(c2);
CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (1) TO (4)
SERVER mysql_svr OPTIONS (dbname ‘suraj’, table_name 'test3');
CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (5) TO (8)
SERVER mysql_svr OPTIONS (dbname ‘suraj’, TABLE_NAME 'test4');
SET enable_partitionwise_join TO on;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1,t2.c2,t3.c3
FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.c1 = t2.c2) INNER JOIN fprt1 t3 ON (t2.c2 = t3.c1)
WHERE t1.c1 % 2 =0 ORDER BY 1,2,3;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Output: t1.c1, t2.c2, t3.c3
Sort Key: t1.c1, t3.c3
-> Append
-> Foreign Scan
Output: t1_1.c1, t2_1.c2, t3_1.c3
Relations: ((suraj.ftprt1_p1 t1) INNER JOIN (suraj.ftprt2_p1 t2)) INNER JOIN (suraj.ftprt1_p1 t3)
Remote query: SELECT r6.`c1`, r8.`c2`, r10.`c3` FROM ((`suraj`.`test1` r6 INNER JOIN `suraj`.`test3` r8 ON (((r6.`c1` = r8.`c2`)))) INNER JOIN `suraj`.`test1` r10 ON (((r6.`c1` = r10.`c1`)))) WHERE (((r6.`c1` % 2) = 0))
-> Foreign Scan
Output: t1_2.c1, t2_2.c2, t3_2.c3
Relations: ((suraj.ftprt1_p2 t1) INNER JOIN (suraj.ftprt2_p2 t2)) INNER JOIN (suraj.ftprt1_p2 t3)
Remote query: SELECT r7.`c1`, r9.`c2`, r11.`c3` FROM ((`suraj`.`test2` r7 INNER JOIN `suraj`.`test4` r9 ON (((r7.`c1` = r9.`c2`)))) INNER JOIN `suraj`.`test2` r11 ON (((r7.`c1` = r11.`c1`)))) WHERE (((r7.`c1` % 2) = 0))
(12 rows)
Limitations
-
Currently, joins involving only relational and arithmetic operators in join-clauses are pushed down to avoid any potential join failure
-
Full outer, semi, and anti joins are not pushed down to the remote server as these are not supported on MySQL
Thank you to my colleagues Jeevan Ladhe Vaibhav Dalvi, and Jeevan Chalke for helping me in reviewing the code. And another thank you to Rajkumar Raghuwanshi and Kashif Zeeshan for testing this feature.
Read more: Cool New Contributions to PostgreSQL 14