Partitioning a large table without a long-running lock

Andrew Dunstan

July 21, 2020

Let’s say you have an application that has a huge table and that needs to be available all the time. It’s got so big that managing it without partitioning it is getting increasingly difficult. But you can’t take the table offline to create a new partitioned version of the table, which would take a great deal of time since this is a huge table.

Here is a recipe for dealing with the problem. It won’t necessarily work for every situation, particularly tables with very heavy write loads, but it could work for many.

First let’s set up our sample table and populate it with some data, 10 million rows in this case:

create table orig_table
( id serial not null,
  data float default random()
);

create index orig_data_index on orig_table(data);
create index orig_id_index on orig_table(id);

insert into orig_table (id)
select nextval('orig_table_id_seq')
from generate_series(1,100000);

Now we’re going to set up the partitioning structure. In this case we’re going to use four ranges on the data field:

create table part_table
(like orig_table including defaults including indexes including constraints)
partition by range(data);

create table part_table_p1
partition of part_table
for values from (minvalue) to (0.25);

create table part_table_p2
partition of part_table
for values from (0.25) to (0.5);

create table part_table_p3
partition of part_table
for values from (0.5) to (0.75);

create table part_table_p4
partition of part_table
for values from (0.75) to (maxvalue);

We’re going to rename the original table and then create a view with that name which is a union of the rows in the new partitioned table and the old non-partitioned table. But before that, we’ll need a trigger function to handle all the insert, update and delete operations for the view.

create or replace function part_v_trigger()
returns trigger
language plpgsql
as
$TRIG$

begin
    IF TG_OP = 'INSERT'
    THEN
        INSERT INTO part_table VALUES(NEW.id, NEW.data);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE'
    THEN
        DELETE FROM part_table WHERE id = OLD.id;
        DELETE FROM old_orig_table WHERE id = OLD.id;
        RETURN OLD;
    ELSE -- UPDATE
        DELETE FROM old_orig_table WHERE id = OLD.id;
        IF FOUND
        THEN
            INSERT INTO part_table VALUES(NEW.id, NEW.data);
        ELSE
            UPDATE part_table SET id = NEW.id, data = NEW.data
                WHERE id = OLD.id;
        END IF;
        RETURN NEW;
    END IF;
end

$TRIG$;

Then we can move to the transitional setup in one quick transaction. Since we won’t be adding new tuples to the old non-partitioned table any more, we disable autovacuum on it.

BEGIN;

    ALTER TABLE orig_table RENAME TO old_orig_table;

    ALTER TABLE old_orig_table SET(
      autovacuum_enabled = false, toast.autovacuum_enabled = false
   );

    CREATE VIEW orig_table AS
    SELECT id, data FROM old_orig_table
    UNION ALL
    SELECT id, data FROM part_table
    ;

    CREATE TRIGGER orig_table_part_trigger
    INSTEAD OF INSERT OR UPDATE OR DELETE on orig_table
    FOR EACH ROW
    EXECUTE FUNCTION part_v_trigger();

COMMIT;

Note that all inserts and updates are steered to the partitioned table even if the row being updated is from the old table. We’re going to use that fact to move all the old rows in batches. What we need is a looping program that selects a small number of old table rows to move and updates them so that they are moved. Here is the sample program I used – it’s written in Perl but should be pretty easy for most readers to follow even if not Perl-savvy.

#! /bin/perl

use strict;
use DBI;

my $move_rows = qq{
  WITH oldkeys AS
  (
    SELECT id
    FROM old_orig_table
    LIMIT 10000
  )
  UPDATE orig_table
  SET id = id
  WHERE ID IN (SELECT id FROM oldkeys)
};

my $dbh = DBI->connect("dbi:Pg:dbname=tpart;host=/tmp;port=5711",
    '','',{AutoCommit => 0, RaiseError => 1, PrintError => 0}
);

my $rows_done;
do
{
    $rows_done = $dbh->do($move_rows);
    $dbh->commit;
    if ($rows_done != 0) # it will be 0e0 which is 0 but true
    {
        sleep 2;
    }
} until $rows_done == 0 || ! $rows_done;
print "done\n";
$dbh->disconnect;

This program can be safely interrupted if necessary. There are other ways of writing it. If speed is an issue a slightly more complex piece of SQL can be used which avoids calling the trigger.

Once there are no more rows left in the original table, we can replace the view with the fully partitioned table. In a separate transaction (because it can take some time and it’s not critical) we finally drop the old non-partitioned table.

BEGIN;
    DROP VIEW orig_table CASCADE;
    DROP FUNCTION part_v_trigger();
    ALTER SEQUENCE orig_table_id_seq OWNED BY part_table.id;
    ALTER TABLE part_table RENAME TO orig_table;
COMMIT;

BEGIN;
    DROP TABLE old_orig_table;
COMMIT;

Our application should have remained fully functional and blissfully unaware of the changes we have been making under the hood while we were making them.

Resource Feature Callout 1

Partitioning a large table without a long-running lock

Andrew Dunstan

Why the Future of Postgres Extensions in Kubernetes is Immutable and Flexible

Minimal downtime Postgres major version upgrades with EDB Postgres Distributed

Integrating EDB's AI Accelerator with Nvidia NIM

Resource Feature Callout 1

Partitioning a large table without a long-running lock

Andrew Dunstan

More Blogs

More Blogs

Why the Future of Postgres Extensions in Kubernetes is Immutable and Flexible

Minimal downtime Postgres major version upgrades with EDB Postgres Distributed

Integrating EDB's AI Accelerator with Nvidia NIM