More exciting news for mysql_fdw users! Previously we have talked about Join pushdown and Aggregate pushdown features in mysql_fdw. A recent mysql_fdw release, 2.8.0, now includes more pushdown features: ORDER BY and LIMIT OFFSET pushdown.
ORDER BY pushdown
Now, with the ORDER BY pushdown feature, the ORDER BY clause is getting included in the query sent to the remote MySQL server, which means we get the ordered result set from the foreign server itself.
For better understanding, we will see how the explain plan looks with and without the ORDER BY pushdown. Before this enhancement, if you had a query with ORDER BY, it would produce a plan like:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ORDERS ORDER BY 1;
QUERY PLAN
---------------------------------------------------------------------
Sort
Output: id, customer_id, item, price
Sort Key: orders.id
-> Foreign Scan on public.orders
Output: id, customer_id, item, price
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders`
(6 rows)
Remote query is the SQL query sent to the remote server to fetch all the rows from the remote table. We can see that it will fetch all the rows from the remote server’s orders table and then perform the sort locally. But, with this feature in mysql_fdw, now the query plan looks like this:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ORDERS ORDER BY 1;
QUERY PLAN
---------------------------------------------------------------------
Foreign Scan on public.orders
Output: id, customer_id, item, price
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders` ORDER BY `id` IS NULL, `id` ASC
(3 rows)
Remote query in the above plan shows us that the ORDER BY clause gets computed on the foreign server.
LIMIT OFFSET pushdown
With the LIMIT OFFSET pushdown feature, the LIMIT OFFSET clause is getting pushed down to the remote MySQL server instead of fetching all of the rows and limiting them locally. This reduces network traffic between local PostgreSQL and remote MySQL servers and gives a very good performance boost.
For better understanding, we will see how the explain plan looks with and without limit pushdown. Before this enhancement, if you had a query with a LIMIT OFFSET clause, it would produce a plan like:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ORDERS LIMIT 10 OFFSET 10;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit
Output: id, customer_id, item, price
-> Foreign Scan on public.orders
Output: id, customer_id, item, price
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders`
(5 rows)
Remote query is the SQL query sent to the remote server to fetch all the rows from the remote table. We can see that it will fetch all the rows from the remote server’s orders table and then evaluate sorting and limit locally. But, with this feature in mysql_fdw, now the query plan looks like this:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ORDERS LIMIT 10 OFFSET 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Foreign Scan on public.orders
Output: id, customer_id, item, price
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders` LIMIT 10 OFFSET 10
(3 rows)
As you can see in the above plan, instead of getting all rows from the orders table, we are now getting the limited result, thereby saving huge network bandwidth. Here’s how this affects performance in a typical scenario.
Let’s see the query output with LIMIT OFFSET clause:
SELECT * FROM ORDERS BY 1 LIMIT 5;
id | customer_id | item | price
----+-------------+-------+-------
1 | 1 | chair | 100
2 | 2 | chair | 100
3 | 3 | chair | 100
4 | 4 | chair | 100
5 | 5 | chair | 100
(5 rows)
SELECT * FROM ORDERS BY 1 LIMIT 3 OFFSET 2;
id | customer_id | item | price
----+-------------+-------+-------
3 | 3 | chair | 100
4 | 4 | chair | 100
5 | 5 | chair | 100
(3 rows)
In the above example, the first query has a LIMIT 5 clause which returns the first 5 rows from the orders table. The second query has a LIMIT 3 OFFSET 2 clause that returns 3 records by skipping the first 2 records.
Limitations for LIMIT OFFSET pushdown
- ALL/NULL options with Limit/offset clause are not supported on the MySQL server and, thus, they are not pushed down.
- OFFSET without LIMIT is not supported on the MySQL server, hence queries having that construct are not pushed.
ORDER BY pushdown with LIMIT OFFSET pushdown
ORDER BY clause can be pushed down with LIMIT OFFSET clause if used in a query. Please refer to the below example for the same:
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ORDERS ORDER BY 1 LIMIT 100 OFFSET 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan on public.orders
Output: id, customer_id, item, price
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders` ORDER BY `id` IS NULL, `id` ASC LIMIT 100 OFFSET 100
(3 rows)
As you can see in the above plan, the remote query has ORDER BY as well as LIMIT OFFSET clause.
However, LIMIT OFFSET cannot pushdown if ORDER BY is not safe to pushdown. Refer to the below example:
EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM ORDERS group by item ORDER BY item collate "de_DE" limit 5;
QUERY PLAN
--------------------------------------------------------------------------------------
Limit
Output: (count(*)), ((item)::text), item
-> Sort
Output: (count(*)), ((item)::text), item
Sort Key: ((orders.item)::text) COLLATE "de_DE"
-> Foreign Scan
Output: (count(*)), item, item
Relations: Aggregate on (suraj.orders)
Remote query: SELECT count(*), `item` FROM `suraj`.`orders` GROUP BY 2
(9 rows)
Note that, here ORDER BY can be pushed down, if possible, even if the LIMIT OFFSET clause is not safe to pushdown.
ORDER BY and LIMIT pushdown with JOIN and AGGREGATE pushdown
These pushdowns work with the join and aggregate pushdown added previously. Please refer to the below example:
EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*) FROM orders r1 LEFT JOIN customers r2 ON (r1.id = r2.id) GROUP BY r1.id, r2.id ORDER BY r1.id LIMIT 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan
Output: (count(*)), r1.id, r2.id
Relations: Aggregate on ((suraj.orders r1) LEFT JOIN (suraj.customers r2))
Remote query: SELECT count(*), r1.`id`, r2.`id` FROM (`suraj`.`orders` r1 LEFT JOIN `suraj`.`customers` r2 ON (((r1.`id` = r2.`id`)))) GROUP BY 2, 3 ORDER BY r1.`id` IS NULL, r1.`id` ASC LIMIT 1
00
(4 rows)
Performance impact
The foreign table used in the below query orders has 2 million records, but the actual table resides on the remote MySQL server. When the query is executed with ORDER BY and LIMIT clause, it takes approximately 48000 milliseconds without this pushdown functionality.
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ORDERS ORDER BY 1 LIMIT 1000 OFFSET 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1062.33..1062.33 rows=1 width=44) (actual time=48343.198..48358.764 rows=1000 loops=1)
Output: id, customer_id, item, price
-> Sort (cost=1059.83..1062.33 rows=1000 width=44) (actual time=48333.063..48343.573 rows=2000 loops=1)
Output: id, customer_id, item, price
Sort Key: orders.id
Sort Method: top-N heapsort Memory: 268kB
-> Foreign Scan on public.orders (cost=10.00..1010.00 rows=1000 width=44) (actual time=19224.691..36328.095 rows=2074719 loops=1)
Output: id, customer_id, item, price
Local server startup cost: 10
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders`
Planning Time: 0.822 ms
Execution Time: 48369.059 ms
(12 rows)
The same query with the ORDER BY and LIMIT pushdown functionality takes about 31 milliseconds as opposed to 48369 milliseconds without it. This performance gain may differ from query to query and dataset.
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ORDERS ORDER BY 1 LIMIT 1000 OFFSET 1000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan on public.orders (cost=1.00..2.00 rows=1 width=44) (actual time=16.867..25.406 rows=1000 loops=1)
Output: id, customer_id, item, price
Local server startup cost: 10
Remote query: SELECT `id`, `customer_id`, `item`, `price` FROM `suraj`.`orders` ORDER BY `id` IS NULL, `id` ASC LIMIT 1000 OFFSET 1000
Planning Time: 5.295 ms
Execution Time: 31.712 ms
(6 rows)
As you can see, there is a big performance boost after LIMIT OFFSET pushdown when we are limiting the larger data set. I hope this feature can be beneficial for many customers who are using mysql_fdw at their end.
Thank you to Vaibhav Dalvi, Sravan Velagandula, and Jeevan Chalke for helping review the code. Also a big thank you to Kashif Zeeshan for testing these features.