SUMMARY: This article describes indexes in PostgreSQL and how they can help retrieve data faster. It covers the types of indexes and offers examples for each.
An Index is the structure or object by which we can retrieve specific rows or data faster. Indexes can be created using one or multiple columns or by using the partial data depending on your query requirement conditions.
Index will create a pointer to the actual rows in the specified table.
You can create an index by using the CREATE INDEX syntax. A more detailed explanation about creating the index syntax and the available options can be found in the PostgreSQL documentation: https://www.postgresql.org/docs/12/sql-createindex.html.
The simple command to create the index is as follows:
index_demo=# \d+ public."Artist"
Table "public.Artist"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+------------------------+-----------+----------+---------+----------+--------------+-------------
ArtistId | integer | | not null | | plain | |
Name | character varying(120) | | | | extended | |
Indexes:
"PK_Artist" PRIMARY KEY, btree ("ArtistId")
index_demo=# CREATE index "Artist_Name_idx" on public."Artist"("Name");
CREATE INDEX
index_demo=# \d+ public."Artist"
Table "public.Artist"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------+------------------------+-----------+----------+---------+----------+--------------+-------------
ArtistId | integer | | not null | | plain | |
Name | character varying(120) | | | | extended | |
Indexes:
"PK_Artist" PRIMARY KEY, btree ("ArtistId")
"Artist_Name_idx" btree ("Name")
In this example CREATE INDEX can be used to create an index on the “Name” column for the table “Artist”. By default a B-tree index will get created.
Types Of Indexes
PostgreSQL server provides following types of indexes, which each uses a different algorithm:
B-tree
Hash
GiST
SP-GiST
GIN
BRIN
Not all types of indexes are the best fit for every environment, so you should choose the one you use carefully. How you decide will depend upon your requirements. Let’s review the differences between each type:
1. B-tree index
The most common and widely used index type is the B-tree index. This is the default index type for the CREATE INDEX command, unless you explicitly mention the type during index creation.
If the indexed column is used to perform the comparison by using comparison operators such as <, <=, =, >=, and >, then the Postgres optimizer uses the index created by the B-tree option for the specified column.
Example
Create a table “public.Track” and add some dummy data:
index_demo_=# \d+ public."Track"
Table "public.Track"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+------------------------+-----------+----------+---------+----------+--------------+-------------
TrackId | integer | | not null | | plain | |
Name | character varying(200) | | not null | | extended | |
AlbumId | integer | | | | plain | |
Bytes | integer | | | | plain | |
UnitPrice | numeric(10,2) | | not null | | main | |
Indexes:
"PK_Track" PRIMARY KEY, btree ("TrackId")
Select data with the WHERE clause:
index_demo_=# explain (analyze,verbose) select * from public."Track" where "Name"='Princess of the Dawn';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public."Track" (cost=4.29..8.30 rows=1 width=71) (actual time=0.923..0.924 rows=1 loops=1)
Output: "TrackId", "Name", "AlbumId", "Bytes", "UnitPrice"
Recheck Cond: (("Track"."Name")::text = 'Princess of the Dawn'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on "Track_Name_idx" (cost=0.00..4.29 rows=1 width=0) (actual time=0.578..0.578 rows=1 loops=1)
Index Cond: (("Track"."Name")::text = 'Princess of the Dawn'::text)
Planning time: 0.128 ms
Execution time: 0.953 ms
(8 rows)
Here is a query which can use the created B-tree index and then again try to select the data with the same WHERE clause:
index_demo_=# create index "Track_Name_idx" on public."Track" ("Name");
CREATE INDEX
index_demo_=#
index_demo_=# explain (analyze,verbose) select * from public."Track" where "Name"='Princess of the Dawn';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using "Track_Name_idx" on public."Track" (cost=0.28..8.30 rows=1 width=71) (actual time=0.544..0.545 rows=1 loops=1)
Output: "TrackId", "Name", "AlbumId", "MediaTypeId", "GenreId", "Composer", "Milliseconds", "Bytes", "UnitPrice"
Index Cond: (("Track"."Name")::text = 'Princess of the Dawn'::text)
Planning time: 0.151 ms
Execution time: 0.571 ms
(5 rows)
2. Hash index
The Hash index can be used only if the equality condition = is being used in the query.
Example
Create the Hash index on the table “public.Track” using HASH for column “Name”:
index_demo_=# create index "Track_Name_idx" on public."Track" using HASH ("Name");
CREATE INDEX
index_demo_=# \d+ public."Track"
Table "public.Track"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+------------------------+-----------+----------+---------+----------+--------------+-------------
TrackId | integer | | not null | | plain | |
Name | character varying(200) | | not null | | extended | |
AlbumId | integer | | | | plain | |
MediaTypeId | integer | | not null | | plain | |
GenreId | integer | | | | plain | |
Composer | character varying(220) | | | | extended | |
Milliseconds | integer | | not null | | plain | |
Bytes | integer | | | | plain | |
UnitPrice | numeric(10,2) | | not null | | main | |
Indexes:
"PK_Track" PRIMARY KEY, btree ("TrackId")
"Track_Name_idx" hash ("Name")
Select data by using the equality condition, the optimizer will use the Hash index here:
index_demo_=# explain (analyze,verbose) select * from public."Track" where "Name"='Princess of the Dawn';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public."Track" (cost=4.01..8.02 rows=1 width=71) (actual time=0.023..0.023 rows=1 loops=1)
Output: "TrackId", "Name", "AlbumId", "MediaTypeId", "GenreId", "Composer", "Milliseconds", "Bytes", "UnitPrice"
Recheck Cond: (("Track"."Name")::text = 'Princess of the Dawn'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on "Track_Name_idx" (cost=0.00..4.01 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (("Track"."Name")::text = 'Princess of the Dawn'::text)
Planning time: 0.272 ms
Execution time: 0.115 ms
(8 rows)
index_demo_=#
3. GiST index (Generalized Search Tree index)
GiST stands for Generalized Search Tree Index. The main point of GiST is to be able to index queries that simply are not indexable using B-tree. So GiST is useful when you have queries that are not btree-indexable. The column can be of tsvector or tsquery type.
There is a well-known, and pretty fast, geographic extension to PostgreSQL — PostGIS, which uses GiST for its purposes. GiST indexes are what is used if you want to speed up full text searches.
As per the Postgres Documentation, “the GiST indexes are lossy to an extent, it means that you have to deal with false positive and negative and need to check the table rows to eliminate false matches.(PostgreSQL does this automatically when needed.)
GiST indexes are lossy because each document is represented in the index by a fixed-length signature. The signature is generated by hashing each word into a single bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all the words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct.”
Example
Create a table with some geographical data points such as circle or point and index it using GiST:
index_demo_=# create table geo_point(circle_details circle);
CREATE TABLE
index_demo_=# create index ON geo_point using gist(circle_details);
CREATE INDEX
index_demo_=# \d+ geo_point
Table "public.geo_point"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------------+--------+-----------+----------+---------+---------+--------------+-------------
circle_details | circle | | | | plain | |
Indexes:
"geo_point_circle_details_idx" gist (circle_details)
index_demo_=#
Insert a few dummy records:
index_demo_=# select * from geo_point ;
circle_details
----------------
<(1,1),5>
<(2,2),6>
<(3,3),7>
<(4,4),8>
<(5,5),9>
<(6,6),10>
<(7,7),9>
<(11,10),10>
<(1,10),10>
<(9,9),9>
<(12,12),12>
(11 rows)
Select data:
index_demo_=# explain analyze verbose select * from geo_point where circle_details <@ circle '((7,7),9)';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.geo_point (cost=4.14..8.15 rows=1 width=24) (actual time=0.060..0.060 rows=1 loops=1)
Output: circle_details
Recheck Cond: (geo_point.circle_details <@ '<(7,7),9>'::circle)
Rows Removed by Index Recheck: 10
Heap Blocks: exact=1
-> Bitmap Index Scan on geo_point_circle_details_idx (cost=0.00..4.14 rows=1 width=0) (actual time=0.041..0.041 rows=11 loops=1)
Index Cond: (geo_point.circle_details <@ '<(7,7),9>'::circle)
Planning time: 0.061 ms
Execution time: 0.095 ms
(9 rows)
index_demo_=#
4. SP-GiST (Space-Partitioned GiST)
SP-GiST stands for “space-partitioned GiST.” This index type supports non-balanced data structures such as quadtrees, k-d trees, and radix trees.
See the PostgreSQL documentation for the built-in operator classes supported by core PostgreSQL distribution: https://www.postgresql.org/docs/12/spgist-builtin-opclasses.html.
You can also check below for implementation of the SP-GiST operator classes: https://www.postgresql.org/docs/12/spgist-implementation.html.
5. GIN (Generalized Inverted Index)
GIN stands for “Generalized Inverted Index.” These are the “Inverted Indexes” that get used for multiple component values such as arrays.
The standard distribution of PostgreSQL includes a GIN operator class for arrays, which supports indexed queries using these operators: <@, @>, =, and &&.
The built-in operator classes provided by GIN can be searched on the PostgreSQL documentation site: https://www.postgresql.org/docs/12/gin-builtin-opclasses.html.
As per the PostgreSQL documentation, the GIN index will be slower for INSERT and UPDATE operations, so for bulk insertions into a table it is recommended to drop the GIN index and re-create the INDEX after finishing bulk operations.
6. BRIN (Block Range index)
BRIN stands for “Block Range index.” This featured index type was introduced with Postgres v 9.5.
According to the PostgreSQL documentation:
“Because a BRIN index is very small, scanning the index adds little overhead compared to a sequential scan, but may avoid scanning large parts of the table that are known not to contain matching tuples.”
“The size of the block range is determined at index creation time by the pages_per_range storage parameter. The number of index entries will be equal to the size of the relation in pages divided by the selected value for pages_per_range. Therefore, the smaller the number, the larger the index becomes (because of the need to store more index entries), but at the same time the summary data stored can be more precise and more data blocks can be skipped during an index scan.”
In a BRIN index, PostgreSQL reads your selected column's maximum and minimum values for each 8k-size page of stored data, and then stores that information (page number and minimum and maximum values of column) into the BRIN index.
B-tree indexes have entries for each row in your table, duplicating the data in the indexed columns. This permits super-fast index-only scans, but can be wasteful of disk space. A B-tree index also stores the data in sorted order, which permits very fast lookups of single rows.
It also stores the full location of each row in the table, which can require a lot of work when running queries that return many rows.
However, a BRIN index contains only the minimum and maximum values contained in a group of database pages. Therefore, the reason for using the BRIN index is that this is a very straightforward way to optimize for speed, and the BRIN index only takes up very little space compared to a B-tree index. BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table.
A good example of where you could use a BRIN index is for data you would consider immutable. For the example that follows, the BRIN index is 72kb in size, while the B-tree index is 676mb in size. Yet the performance is pretty much identical for selects.
Example
index_demo_=# create index "Track_TrackId_idx" on public."Track" using brin("TrackId");
CREATE INDEX
index_demo_=# \d+ public."Track"
Table "public.Track"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+------------------------+-----------+----------+---------+----------+--------------+-------------
TrackId | integer | | not null | | plain | |
Name | character varying(200) | | not null | | extended | |
AlbumId | integer | | | | plain | |
MediaTypeId | integer | | not null | | plain | |
GenreId | integer | | | | plain | |
Composer | character varying(220) | | | | extended | |
Milliseconds | integer | | not null | | plain | |
Bytes | integer | | | | plain | |
UnitPrice | numeric(10,2) | | not null | | main | |
Indexes:
"PK_Track" PRIMARY KEY, btree ("TrackId")
"Track_TrackId_idx" brin ("TrackId")
index_demo_=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------------+-------+----------+-------+-------+-------------
public | PK_Track | index | postgres | Track | 96 kB |
public | Track_TrackId_idx | index | postgres | Track | 48 kB |
(2 rows)
index_demo_=#
Tips
1. Indexes add overhead to the database system as a whole, so they should be used sensibly.
For Example: The INSERT and UPDATE statements take more time on tables having indexes, whereas the SELECT statements become fast on those tables. The reason is that while doing INSERT or UPDATE, a database needs to insert or update the index values as well.
2. You may need to run the ANALYZE command regularly to update statistics to allow the query planner to update the decisions planner.
3. You need to maintain the index properly so that it will not get bloated. You can REINDEX or you can also re-create the index concurrently.
4. A GIN index is expected to run slower than a B-tree because of the flexibility it provides.
5. In PostgreSQL versions prior to v10, Hash indexes were not WAL-logged, so they might need to be rebuilt after a database crash, if there were any unwritten changes.
Starting with v10, WAL-logging support was added to Hash indexes, which makes them crash-safe and replicable.
Reference Links
https://www.postgresql.org/docs/12/indexes-types.html