How to Enhance the Efficiency of Your mysql_fdw Operations with Join Push-down

October 07, 2021

We have exciting news for those who are using mysql_fdw. Join queries are widely used queries in our customers’ environments so the performance of these queries is extremely important to us. In mysql_fdw, we were executing the joins on the local side by fetching all required data from remote relations, but that was not really efficient. 

Now, with this enhancement, mysql_fdw supports join push-down to a remote server. This means join operations could be performed on a remote MySQL server instead of fetching all the rows from involved relations in a join and performing a join locally. Join push-down will happen only if involved foreign tables are from the same foreign server. This feature gives significant performance benefits in many practical cases especially in the case of large foreign tables which don't match much.

For better understanding, we will see how the explain plan looks without and with join push-down. Before this enhancement, if you had a join query (inner join), it would produce a plan like:

EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Hash Join
   Output: t1.id, t1.name, t1.email, t2.id, t2.customer_id, t2.item, t2.price
   Hash Cond: (t1.id = t2.customer_id)
   ->  Foreign Scan on public.customers t1
         Output: t1.id, t1.name, t1.email
         Local server startup cost: 10
         Remote query: SELECT `id`, `name`, `email` FROM `suraj`.`customers`
   ->  Hash
         Output: t2.id, t2.customer_id, t2.item, t2.price
         ->  Foreign Scan on public.orders t2
               Output: t2.id, t2.customer_id, t2.item, t2.price
               Local server startup cost: 10
               Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders`
(13 rows)

Remote query is the SQL query sent to the remote server to fetch all required rows from the remote table. We can see that it will fetch all the rows from the customers and orders tables from the remote server and then join them locally using hash join. But with join push-down support in mysql_fdw, now the query plan looks like this:

EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
                                                                                                QUERY PLAN                                                                               
                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Output: t1.id, t1.name, t1.email, t2.id, t2.customer_id, t2.item, t2.price
   Relations: (suraj.customers t1) INNER JOIN (suraj.orders t2)
   Remote query: SELECT r1.`id`, r1.`name`, r1.`email`, r2.`id`, r2.`customer_id`, r2.`item`, r2.`price` FROM (`suraj`.`customers` r1 INNER JOIN `suraj`.`orders` r2 ON (((r1.`id` = r2.`
customer_id`))))
(4 rows)

In the above explain plan, remote query is the join query that will execute on the remote server. With this, it’s clear to see the performance improvement this enhancement is capable of producing.

 

Performance impact

The tables used in the above query customers and orders, both have 100k records. When the query is executed, 50k records match based on the join condition. As can be seen below, it takes approximately 1300 milliseconds without join push-down functionality.

EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Hash Join (actual time=1023.431..1326.532 rows=50000 loops=1)
   Hash Cond: (t1.id = t2.customer_id)
   ->  Foreign Scan on customers t1 (actual time=90.242..647.187 rows=100000 loops=1)
   ->  Hash (actual time=635.035..635.036 rows=100000 loops=1)
         Buckets: 131072 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3568kB
         ->  Foreign Scan on orders t2 (actual time=82.140..614.967 rows=100000 loops=1)
 Planning Time: 42.090 ms
 Execution Time: 1336.697 ms
(8 rows)

The same query with the join push-down functionality takes about 480 milliseconds as opposed to 1300 milliseconds without it. This is really a nice performance gain with this new feature. 

EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM customers t1 INNER JOIN orders t2 ON (t1.id = t2.customer_id);
                           QUERY PLAN                           
----------------------------------------------------------------
 Foreign Scan (actual time=202.932..475.895 rows=50001 loops=1)
   Relations: (suraj.customers t1) INNER JOIN (suraj.orders t2)
 Planning Time: 46.325 ms
 Execution Time: 480.668 ms
(4 rows)

 

Partition-wise join push-down

The technique of breaking down a join between partitioned tables into joins between their matching partitions is called partition-wise join. Here, if partitions are foreign tables then those can be pushed down to the remote server.
Example for partition-wise join:

CREATE TABLE fprt1 (c1 int, c2 int, c3 varchar, c4 varchar) PARTITION BY RANGE(c1);
CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (1) TO (4)
  SERVER mysql_svr OPTIONS (dbname 'suraj', table_name 'test1');
CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (5) TO (8)
  SERVER mysql_svr OPTIONS (dbname ‘suraj’, TABLE_NAME 'test2');
  
CREATE TABLE fprt2 (c1 int, c2 int, c3 varchar) PARTITION BY RANGE(c2);
CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (1) TO (4)
  SERVER mysql_svr OPTIONS (dbname ‘suraj’, table_name 'test3');
CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (5) TO (8)
  SERVER mysql_svr OPTIONS (dbname ‘suraj’, TABLE_NAME 'test4');
  
SET enable_partitionwise_join TO on;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1,t2.c2,t3.c3
  FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.c1 = t2.c2) INNER JOIN fprt1 t3 ON (t2.c2 = t3.c1)
  WHERE t1.c1 % 2 =0 ORDER BY 1,2,3;
                                                                                                                                            QUERY PLAN                                                                                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort
   Output: t1.c1, t2.c2, t3.c3
   Sort Key: t1.c1, t3.c3
   ->  Append
         ->  Foreign Scan
               Output: t1_1.c1, t2_1.c2, t3_1.c3
               Relations: ((suraj.ftprt1_p1 t1) INNER JOIN (suraj.ftprt2_p1 t2)) INNER JOIN (suraj.ftprt1_p1 t3)
               Remote query: SELECT r6.`c1`, r8.`c2`, r10.`c3` FROM ((`suraj`.`test1` r6 INNER JOIN `suraj`.`test3` r8 ON (((r6.`c1` = r8.`c2`)))) INNER JOIN `suraj`.`test1` r10 ON (((r6.`c1` = r10.`c1`)))) WHERE (((r6.`c1` % 2) = 0))
         ->  Foreign Scan
               Output: t1_2.c1, t2_2.c2, t3_2.c3
               Relations: ((suraj.ftprt1_p2 t1) INNER JOIN (suraj.ftprt2_p2 t2)) INNER JOIN (suraj.ftprt1_p2 t3)
               Remote query: SELECT r7.`c1`, r9.`c2`, r11.`c3` FROM ((`suraj`.`test2` r7 INNER JOIN `suraj`.`test4` r9 ON (((r7.`c1` = r9.`c2`)))) INNER JOIN `suraj`.`test2` r11 ON (((r7.`c1` = r11.`c1`)))) WHERE (((r7.`c1` % 2) = 0))
(12 rows)

 

Limitations

  • Currently, joins involving only relational and arithmetic operators in join-clauses are pushed down to avoid any potential join failure

  • Full outer, semi, and anti joins are not pushed down to the remote server as these are not supported on MySQL

Thank you to my colleagues Jeevan Ladhe Vaibhav Dalvi, and Jeevan Chalke for helping me in reviewing the code. And another thank you to Rajkumar Raghuwanshi and Kashif Zeeshan for testing this feature.

Read more: Cool New Contributions to PostgreSQL 14

Share this

More Blogs

PGVector as Embedding Store in PrivateGPT

EDB has a long history of open source contributions, and while we’re best known for our contributions to Postgres, that’s not the only project we contribute to. e.g Barman, CloudNativePG...
June 05, 2024

pgAdmin User Management in Server Mode

pgAdmin can be deployed as a web application by configuring the app to run in server mode. One can check out server deployment on how to run pgAdmin in server...
August 24, 2023

pgAdmin CI/CD

Almost exactly three years ago I wrote a blog on my personal page entitled Testing pgAdmin which went into great detail discussing how we test pgAdmin prior to releases. Back...
August 24, 2023