Boosting Query Performance with Memoize in PostgreSQL

August 16, 2024

PostgreSQL 14 introduces a powerful feature called memoize, which enhances query performance by caching results from parameterized scans inside nested-loop joins. This optimization allows PostgreSQL to skip redundant scans, significantly speeding up nested lookups. In this blog, we'll explore how memoize works and demonstrate its benefits with a practical example.

 

Understanding Memoize

The memoize functionality enables or disables the query planner's use of memoize plans for caching results from parameterized scans inside nested-loop joins. When enabled, this plan type allows scans to the underlying plans to be skipped if the results for the current parameters are already in the cache. Less frequently accessed results may be evicted from the cache when more space is required for new entries. By default, memoize is turned on.

 

Practical Example: Memoize in Action

Let's walk through a simple demo to see memoize in action. We’ll create a table and insert some data to illustrate the benefits of memoize.


CREATE TABLE strtest (n name, t text);
INSERT INTO strtest VALUES('one','one'),('two','two'),('three',repeat(md5('three'),100));
INSERT INTO strtest SELECT * FROM strtest;
CREATE INDEX strtest_n_idx ON strtest (n);
CREATE INDEX strtest_t_idx ON strtest (t);
ANALYZE strtest;

-- Tweak settings to get Memoize plan:


SET enable_hashjoin TO off;
SET enable_bitmapscan TO off;
SET enable_seqscan TO off;
SET enable_material TO off;
SET enable_mergejoin TO off;
EXPLAIN (BUFFERS, ANALYZE, VERBOSE) SELECT * FROM strtest s1 INNER JOIN strtest s2 ON s1.n = s2.n;

 

Analyzing the Query Plan with Memoize

Here's the query plan with memoize enabled:

                                       QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.28..20.03 rows=12 width=188) (actual time=0.012..0.026 rows=12 loops=1)
   Output: s1.n, s1.t, s2.n, s2.t
   Buffers: shared hit=8
   ->  Index Scan using strtest_n_idx on public.strtest s1  (cost=0.13..12.22 rows=6 width=94) (actual time=0.004..0.005 rows=6 loops=1)
         Output: s1.n, s1.t
         Buffers: shared hit=2
   ->  Memoize  (cost=0.14..2.18 rows=2 width=94) (actual time=0.002..0.002 rows=2 loops=6)
         Output: s2.n, s2.t
         Cache Key: s1.n
         Cache Mode: logical
         Hits: 3  Misses: 3  Evictions: 0  Overflows: 0  Memory Usage: 2kB
         Buffers: shared hit=6
         ->  Index Scan using strtest_n_idx on public.strtest s2  (cost=0.13..2.17 rows=2 width=94) (actual time=0.001..0.002 rows=2 loops=3)
               Output: s2.n, s2.t
               Index Cond: (s2.n = s1.n)
               Buffers: shared hit=6

 

Breaking Down the Memoize Plan

We have 6 rows in the strtest table (with 3 unique values, each duplicated twice). As shown, it's a nested loop with the inner loop performing an index scan each time, while the outer loop is a sequential scan (one pass). As s1 is read (6 rows total), each value triggers a lookup to the index for s2.n to match s2.n = s1.n. The memoize plan caches these lookups, resulting in fewer index scans:

  1. "one" -> no entry/miss -> index lookup
  2. "two" -> no entry/miss -> index lookup
  3. "three" -> no entry/miss -> index lookup
  4. "one" -> hit -> no index lookup necessary
  5. "two" -> hit -> no index lookup necessary
  6. "three" -> hit -> no index lookup necessary

This results in 3 misses and 3 hits, showcasing a 50% hit rate for just 2kB of memory usage. The index scan is performed 3 times (loops=3), which would be loops=6 without memoize.

 

Comparing with a Plan Without Memoize

For comparison, here's the query plan without memoize:

EXPLAIN (BUFFERS, ANALYZE, VERBOSE) SELECT * FROM strtest s1 INNER JOIN strtest s2 ON s1.n = s2.n;

                                   QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.27..25.35 rows=12 width=188) (actual time=0.018..0.029 rows=12 loops=1)
   Output: s1.n, s1.t, s2.n, s2.t
   Buffers: shared hit=14
   ->  Index Scan using strtest_n_idx on public.strtest s1  (cost=0.13..12.22 rows=6 width=94) (actual time=0.010..0.011 rows=6 loops=1)
         Output: s1.n, s1.t
         Buffers: shared hit=2
   ->  Index Scan using strtest_n_idx on public.strtest s2  (cost=0.13..2.17 rows=2 width=94) (actual time=0.001..0.002 rows=2 loops=6)
         Output: s2.n, s2.t
         Index Cond: (s2.n = s1.n)
         Buffers: shared hit=12
 Planning Time: 0.102 ms
 Execution Time: 0.046 ms

Without memoize, the inner index scan runs 6 times, doubling the logical I/O compared to the memoize plan.

 

Conclusion

The memoize feature in PostgreSQL 14 offers significant performance improvements for nested-loop joins by caching results and reducing redundant scans. This can lead to substantial savings in CPU and I/O, especially in queries with repeated lookups. By understanding and leveraging memoize, you can optimize your PostgreSQL queries for better efficiency.

Share this

More Blogs

RAG app with Postgres and pgvector

Build a RAG app using Postgres and pgvector to enhance AI applications with improved data management, privacy, and efficient local LLM integration.
October 08, 2024

Mastering PostgreSQL in Kubernetes with CloudNativePG

Previewing EDB’s training session for PGConf.EU 2024, presented by our Kubernetes experts EDB is committed to advancing PostgreSQL by sharing our expertise and insights, especially as the landscape of database...
September 30, 2024