One of my colleagues who was recently working with a customer has presented a customer case. According to him, the customer has a partitioned table and EDB Postgres™ was not applying the partition pruning in his query. So, I thought to blog about partition pruning, so that EDB Postgres developers and DBAs can benefit.
EDB Postgres supports two types of partition pruning:
1. Constraint exclusion pruning:
It is a feature introduced in PostgreSQL 8.1. This type of pruning works with the PostgreSQL-style of partition. With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.
However, it has some limitations. Following is the limitation of constraint_exclusion:
a. Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.
b. Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don't need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators.
For verification, the below shows the behavior of constraint_exclusion pruning:
1. Let's create a PostgreSQL-style partition table using table inheritance feature.
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
CREATE TABLE measurement_y2004m02 (
CHECK ( date_part('month'::text, logdate) = 2)
) INHERITS (measurement);
CREATE TABLE measurement_y2004m03 (
CHECK ( date_part('month'::text, logdate) = 3 )
) INHERITS (measurement);
Execute simple query to verify the constraint_exclusion behavior based on above definition:
edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE date_part('month'::text, logdate) = 3;
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate
-> Append
-> Seq Scan on measurement
Filter: (date_part('month'::text, logdate) = '3'::double precision)
-> Seq Scan on measurement_y2004m02
Filter: (date_part('month'::text, logdate) = '3'::double precision)
-> Seq Scan on measurement_y2004m03
Filter: (date_part('month'::text, logdate) = '3'::double precision)
(8 rows)
The above output of the query shows EDB Postgres considered all partitions of table measurements, even though we had included partition column and constant value in WHERE clause. This is due to check constraint which has date_part function. date_part function is not immutable in Postgres, therefore at planning time, EDB Postgres doesn't know what value it will return. And, if the user doesn't include proper WHERE clause as mentioned in check constraint, pruning will not work.
In Postgres you can make a function immutable by using ALTER FUNCTION command.
In the below example, we will make date_part function immutable to check if constraint_exclusion works with date_part immutable function or not:
Convert date_part function to immutable :
edb=# ALTER FUNCTION date_part (text, timestamp without time zone ) immutable;
ALTER FUNCTION
Perform EXPLAIN command to check the behavior of constraint_exclusion using immutable function:
edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE date_part('month'::text, logdate) = 3;
QUERY PLAN
-----------------------------------------------------------------------------------
Aggregate
-> Append
-> Seq Scan on measurement
Filter: (date_part('month'::text, logdate) = '3'::double precision)
-> Seq Scan on measurement_y2004m03
Filter: (date_part('month'::text, logdate) = '3'::double precision)
(6 rows)
As you can see with immutable function EDB Postgres was able to perform constraint_exclusion pruning.
What if we change the WHERE clause a little bit and include < and = operator in our SQL queries (below are examples):
edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate < DATE '2004-03-01'; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate -> Append
-> Seq Scan on measurement
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) -> Seq Scan on measurement_y2004m02
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) -> Seq Scan on measurement_y2004m03
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) (8 rows) edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate = DATE '2004-02-01'; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate -> Append
-> Seq Scan on measurement
Filter: (logdate = '01-FEB-04 00:00:00'::timestamp without time zone)
-> Seq Scan on measurement_y2004m02
Filter: (logdate = '01-FEB-04 00:00:00'::timestamp without time zone)
-> Seq Scan on measurement_y2004m03
Filter: (logdate = '01-FEB-04 00:00:00'::timestamp without time zone)
(8 rows)
As you can see with a change in WHERE clause and exclusion of the way constraint defined on partition, Postgres will scan all partitions.
Based on above we can conclude that if a user is planning to use Postgres way of partition then they have to be careful about the constraint definition in order to utilize constraint_exclusion pruning.
Lets modify the definition of the measurement table and verify the ,=, <= and = operator in WHERE clause.
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
CREATE TABLE measurement_y2004m02 (
CHECK ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' ) ) INHERITS (measurement); CREATE TABLE measurement_y2004m03 ( CHECK ( logdate >= DATE '2004-03-01' AND logdate < DATE '2004-04-01' )
) INHERITS (measurement);
Below is explain plan based on above definition:
edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate < DATE '2004-03-01'; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate -> Append
-> Seq Scan on measurement
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) -> Seq Scan on measurement_y2004m02
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) (6 rows) edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate = DATE '2004-03-01'; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate -> Append
-> Seq Scan on measurement
Filter: (logdate = '01-MAR-04 00:00:00'::timestamp without time zone)
-> Seq Scan on measurement_y2004m03
Filter: (logdate = '01-MAR-04 00:00:00'::timestamp without time zone)
(6 rows)
The above clearly shows that with correct constraint definition, constriant_exclusion pruning can work for >,<,>=, <= and = operator in WHERE clause.
2. Fast pruning:
EDB Postgres has CREATE TABLE PARTITION SYNTAX since version 9.1. PARTITION SYNTAX in EDB Postgres uses one more pruning called fast pruning. Fast pruning uses the partition metadata and query predicates to efficiently reduce the set of partitions to scan. Fast pruning in EDB Postgres happens before query plan. Let's verify the behavior of fast pruning.
As mentioned fast pruning works with partition which user created using EDB Postgres CREATE TABLE PARTITION Syntax. Let's modify the above definition of measurement table to use CREATE TABLE PARTITION SYNTAX as given below:
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
)
PARTITION BY RANGE(logdate)
(PARTITION y2004m01 VALUES LESS THAN ('2004-02-01'),
PARTITION y2004m02 VALUES LESS THAN ('2004-03-01'),
PARTITION y2004m03 VALUES LESS THAN ('2004-04-01')
);
edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate < DATE '2004-03-01'; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate -> Append
-> Seq Scan on measurement
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) -> Seq Scan on measurement_y2004m01
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) -> Seq Scan on measurement_y2004m02
Filter: (logdate < '01-MAR-04 00:00:00'::timestamp without time zone) (8 rows) edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate = DATE '2004-03-01'; QUERY PLAN ------------------------------------------------------------------------------------- Aggregate -> Append
-> Seq Scan on measurement
Filter: (logdate = '01-MAR-04 00:00:00'::timestamp without time zone)
-> Seq Scan on measurement_y2004m03
Filter: (logdate = '01-MAR-04 00:00:00'::timestamp without time zone)
(6 rows)
edb=# EXPLAIN (costs off) SELECT count(*) FROM measurement WHERE logdate > DATE '2004-03-01';
QUERY PLAN
-------------------------------------------------------------------------------------
Aggregate
-> Append
-> Seq Scan on measurement
Filter: (logdate > '01-MAR-04 00:00:00'::timestamp without time zone)
-> Seq Scan on measurement_y2004m03
Filter: (logdate > '01-MAR-04 00:00:00'::timestamp without time zone)
(6 rows)
For more information on EDB Postgres pruning please refer to the following link.
Vibhor Kumar is Director, Solution Architecture, at EnterpriseDB.
This post originally appeared on Vibhor's personal blog.