Oracle to PostgreSQL -- Cursors and Common Table Expressions

August 18, 2020
 

It hurts when you do that, so don’t do that.

In Oracle, cursors are taught as a part of programming 101. In many (if not most) cases, cursors are the first thing that the Oracle developer learns. The first class usually starts with: “There are 13 logical structures, the first of which is the loop, which goes like this…”

PostgreSQL, on the other hand, does not heavily rely on cursors. Yes, they exist. There are several syntax flavors for how to use them. I will cover all of the major designs at some point in this article series. But the first lesson in PostgreSQL cursors is that there are quite a few (and much better) algorithmic alternatives to using cursors in PostgreSQL. In fact, in a 23-year career with PostgreSQL, I have only actually found a need to use cursors twice. And I regret one of those.

Cursors are an expensive habit.

Iterating is better than looping. “What’s the difference?”, you might ask. Well, the difference is about O(N) vs. O(N^2). Ok, I’ll say that again in English. The complexity of using cursors is that they loop through data sets using the same pattern as a nested for loop. Each additional data set raises the complexity of the total by exponentiation. That is because each additional data set effectively creates another innermost loop. Two data sets are O(N^2), three data sets are O(N^3) and so on. Getting in the habit of using cursors when there are better algorithms to choose from can be costly.

They do this without any of the optimizations that would be available to lower level functions of the database itself. That is, they can’t use indexes in any significant way, can’t transform into sub-selects, pull up into joins or use parallel reads. They also won’t benefit from any future optimizations that the database has available. I hope you are a grandmaster coder that always gets the correct algorithm and codes it perfectly the first time, because you just defeated one of the most important benefits of a relational database. Performance by relying on best practices, or at least somebody else’s code.

Everybody is better than you. Maybe not individually, but collectively almost certainly. Aside from the declarative vs. imperative argument, coding in a language that is once-removed from the underlying function library allows everybody else to attempt to make your code run faster, better and more efficiently without consulting you. And that’s very, very good for you.

Let’s make some data to play with.

We’ll start by setting up some data to play with over the next few articles.

Contents of cursors.bash:

set -o nounset                              # Treat unset variables as an error
# This script assumes that you have PostgreSQL running locally,
#  that you have a database with the same name as the local user,
#  and that you can create all this structure.
#  If not, then:
#   sudo -iu postgres createuser -s $USER
#   createdb

# Clean up from the last run
[[ -f itisPostgreSql.zip ]] && rm itisPostgreSql.zip
subdirs=$(ls -1 itisPostgreSql* | grep : | sed -e 's/://')
for sub in ${subdirs[@]}
do
    rm -rf $sub
done

# Get the newest file
wget https://www.itis.gov/downloads/itisPostgreSql.zip
# Unpack it
unzip itisPostgreSql.zip
# This makes a directory with the stupidest f-ing name possible
#  itisPostgreSqlDDMMYY
subdir=$(\ls -1 itisPostgreSql* | grep : | sed -e 's/://')
# The script wants to create an "ITIS" database.  Let's just make that a schema.
sed -i $subdir/ITIS.sql -e '/"ITIS"/d'  # Cut the lines about making the db
sed -i $subdir/ITIS.sql -e '/-- PostgreSQL database dump/s/.*/CREATE SCHEMA IF NOT EXISTS itis;/'
sed -i $subdir/ITIS.sql -e '/SET search_path = public, pg_catalog;/s/.*/SET search_path TO itis;/'
# ok, we have a schema to put the data in, let's do the import.
#  timeout if we can't connect, fail on error.
PG_TIMEOUT=5 psql -v "ON_ERROR_STOP=1" -f $subdir/ITIS.sql

This gives us a bit over 600K records to play with in the itis.hierarchy table, which contains a taxonomy of the natural world. We’ll use this data to illustrate various methods of dealing with complex data interactions.

The first alternative.

My go-to design pattern for walking along recordsets while doing expensive operations is the Common Table Expression (CTE).

Here is an example of the basic form:

WITH RECURSIVE fauna AS (
    SELECT tsn, parent_tsn, tsn::text taxonomy
    FROM itis.hierarchy
    WHERE parent_tsn = 0
    UNION ALL
    SELECT h1.tsn, h1.parent_tsn, f.taxonomy || '.' || h1.tsn
    FROM itis.hierarchy h1
    JOIN fauna f
    ON h1.parent_tsn = f.tsn
    )
SELECT *
FROM fauna
ORDER BY taxonomy;

Which produces the following results:

┌─────────┬────────┬──────────────────────────────────────────────────────────┐
│   tsn   │ parent │             taxonomy                                     │
│         │ tsn    │                                                          │
├─────────┼────────┼──────────────────────────────────────────────────────────┤
│  202422 │      0 │202422                                                    │
│  846491 │ 202422 │202422.846491                                             │
│  660046 │ 846491 │202422.846491.660046                                      │
│  846497 │ 660046 │202422.846491.660046.846497                               │
│  846508 │ 846497 │202422.846491.660046.846497.846508                        │
│  846553 │ 846508 │202422.846491.660046.846497.846508.846553                 │
│  954935 │ 846553 │202422.846491.660046.846497.846508.846553.954935          │
│    5549 │ 954935 │202422.846491.660046.846497.846508.846553.954935.5549     │
│    5550 │   5549 │202422.846491.660046.846497.846508.846553.954935.5549.5550│
│  954936 │ 846553 │202422.846491.660046.846497.846508.846553.954936          │
│  954904 │ 660046 │202422.846491.660046.954904                               │
│  846509 │ 954904 │202422.846491.660046.954904.846509                        │
│   11473 │ 846509 │202422.846491.660046.954904.846509.11473                  │
│   11474 │  11473 │202422.846491.660046.954904.846509.11473.11474            │
│   11475 │  11474 │202422.846491.660046.954904.846509.11473.11474.11475      │
│   ...   │        │...snip...                                                │
└─────────┴────────┴──────────────────────────────────────────────────────────┘
(601187 rows)

This query is easily modifiable to perform any calculations. That includes data enrichment, complex functions or anything else your heart desires.

“But look!”, you exclaim. “It says RECURSIVE right there in the name! Isn’t it doing exactly what you said not to do?” Well, actually no. Under the hood, it doesn’t use recursion in the nested sense or looping to perform the “recursion”. It’s just a linear read of the table until the subordinate query fails to return any new results. And it works with indexes also.

Let’s look at the execution plan:

┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              QUERY PLAN                                              │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Sort  (cost=211750.51..211840.16 rows=35858 width=40)                                                │
│   Output: fauna.tsn, fauna.parent_tsn, fauna.taxonomy                                                │
│   Sort Key: fauna.taxonomy                                                                           │
│   CTE fauna                                                                                          │
│     ->  Recursive Union  (cost=1000.00..208320.69 rows=35858 width=40)                               │
│           ->  Gather  (cost=1000.00..15045.02 rows=18 width=40)                                      │
│                 Output: hierarchy.tsn, hierarchy.parent_tsn, ((hierarchy.tsn)::text)                 │
│                 Workers Planned: 2                                                                   │
│                 ->  Parallel Seq Scan on itis.hierarchy  (cost=0.00..14043.22 rows=8 width=40)       │
│                       Output: hierarchy.tsn, hierarchy.parent_tsn, (hierarchy.tsn)::text             │
│                       Filter: (hierarchy.parent_tsn = 0)                                             │
│           ->  Hash Join  (cost=5.85..19255.85 rows=3584 width=40)                                    │
│                 Output: h1.tsn, h1.parent_tsn, ((f.taxonomy || '.'::text) || (h1.tsn)::text)         │
│                 Hash Cond: (h1.parent_tsn = f.tsn)                                                   │
│                 ->  Seq Scan on itis.hierarchy h1  (cost=0.00..16923.87 rows=601187 width=8)         │
│                       Output: h1.hierarchy_string, h1.tsn, h1.parent_tsn, h1.level, h1.childrencount │
│                 ->  Hash  (cost=3.60..3.60 rows=180 width=36)                                        │
│                       Output: f.taxonomy, f.tsn                                                      │
│                       ->  WorkTable Scan on fauna f  (cost=0.00..3.60 rows=180 width=36)             │
│                             Output: f.taxonomy, f.tsn                                                │
│   ->  CTE Scan on fauna  (cost=0.00..717.16 rows=35858 width=40)                                     │
│         Output: fauna.tsn, fauna.parent_tsn, fauna.taxonomy                                          │
│ JIT:                                                                                                 │
│   Functions: 13                                                                                      │
│   Options: Inlining false, Optimization false, Expressions true, Deforming true                      │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘

Let’s go ahead and create an index, and see how that works.

CREATE UNIQUE INDEX taxonomy_parents ON itis.hierarchy (parent_tsn, tsn);

┌─────────────────────────────────────────────────────────────────────────────┐
│                             QUERY PLAN                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│Sort  (cost=135148.13..135237.77 rows=35858 width=40)                        │
│  Output: fauna.tsn, fauna.parent_tsn, fauna.taxonomy                        │
│  Sort Key: fauna.taxonomy                                                   │
│  CTE fauna                                                                  │
│    ->  Recursive Union  (cost=4.56..131718.31 rows=35858 width=40)          │
│          ->  Bitmap Heap Scan on itis.hierarchy  (cost=4.56..74.69 rows=18) │
│              Output: hierarchy.tsn, hierarchy.parent_tsn, (hierarchy.tsn)   │
│                Recheck Cond: (hierarchy.parent_tsn = 0)                     │
│                ->  Bitmap Index Scan on taxonomy_parents                    │
│                                                   (cost=0.00..4.56 rows=18) │
│                      Index Cond: (hierarchy.parent_tsn = 0)                 │
│          ->  Nested Loop  (cost=0.42..13092.65 rows=3584 width=40)          │
│                Output: h1.tsn, h1.parent_tsn,((f.taxonomy || '.')||(h1.tsn))│
│                ->  WorkTable Scan on fauna f  (cost=0.00..3.60 rows=180)    │
│                      Output: f.tsn, f.parent_tsn, f.taxonomy                │
│                ->  Index Only Scan using taxonomy_parents on itis.hierarchy │
│                                   h1  (cost=0.42..72.32 rows=20 width=8)    │
│                      Output: h1.parent_tsn, h1.tsn                          │
│                      Index Cond: (h1.parent_tsn = f.tsn)                    │
│  ->  CTE Scan on fauna  (cost=0.00..717.16 rows=35858 width=40)             │
│        Output: fauna.tsn, fauna.parent_tsn, fauna.taxonomy                  │
│JIT:                                                                         │
│  Functions: 6                                                               │
└─────────────────────────────────────────────────────────────────────────────┘

Well, that was satisfying, wasn’t it? And it would have been prohibitively hard to create an index in combination with a cursor to do the same work.   This structure gets us far enough to be able to walk a fairly complex tree structure and use it for simple lookups.

In the next installment, we’ll talk about another method of getting the same result even faster. For our next article, we’ll talk about the extension ltree, and how to look at hierarchical data amazingly quickly. Stay tuned.

Share this