Introduction
Monitoring and alerting on various aspects of your PostgreSQL database is crucial for ensuring optimal performance and locating any bottlenecks. In this blog, we explore the views pg_stat_all_tables and pg_stat_all_indexes that are used to monitor index usage and table statistics. We can pinpoint areas for optimization and raise the general performance of our PostgreSQL database by effectively monitoring these indicators and taking appropriate actions.
Understanding the views pg_stat_all_tables and pg_stat_all_indexes
The pg_stat_all_tables and pg_stat_all_indexes views are system views provided by PostgreSQL that offer valuable insights into the performance and utilization of tables and indexes.
- pg_stat_all_tables: This view provides statistics about all tables in the current database, including the number of sequential and index scans, updates, deletes, inserts, and more. It also offers information on the number of live and dead tuples along with vacuum and analyze stats.
- pg_stat_all_indexes: This view provides detailed statistics about all indexes in the current database. It includes information about the number of index scans, last index scan timestamp and count of rows fetched by particular index.
EDB’s PostgreSQL 16 Contributions
I have contributed to record statistics on the last sequential and index scans on tables.
1: Addition of Last index scan timestamp (last_idx_scan) column inside pg_stat_all_indexes view
2: Addition of Last Sequential scan timestamp (last_seq_scan) column inside pg_stat_all_tables view
https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-MONITORING
The main idea behind this enhancement is to ease the identification of seldom used indexes and tables. Though idx_scan and seq_scan already give the number of times a scan has been run, these values don't contain the data about when it was run. The when could be an indicator of criticality, even if the use is infrequent.
By checking timestamps in conjunction with the number of times a scan has been run, we can identify the tables or indexes which are not being used in production. As a result, we can delete them to save up storage, improve query performance and reduce backup times.
Monitoring index usage
Indexes play a crucial role in enhancing query performance by enabling faster data retrieval. Monitoring index usage can help identify underutilized as well as important indexes that are utilized frequently. The following columns are returned from pg_stat_all_indexes view:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Important columns to check are as follows
- Index scan count (idx_scan): The idx_scan column in the pg_stat_all_indexes view represents the number of times this index has been used in an index scan on the table. By monitoring this metric, we can identify which indexes are frequently used and which are rarely or never used.
- Last index scan timestamp (last_idx_scan): The last_idx_scan column in the pg_stat_all_indexes is a new enhancement in PostgreSQL 16. It displays the timestamp of the most recent scan for a specific index. Keep in mind that the timestamp represents the transaction end time rather than the precise moment the index was utilized (which may differ significantly if it’s a long running transaction)
- idx_tup_read is the number of index entries returned by scans on this index. Its value increases whenever an index entry is read.
- Idx_tup_fetch is the number of live table/heap rows fetched by simple index scans using this index.
Let's demonstrate the following example:
CREATE TABLE sales (
id integer NOT NULL, email VARCHAR,
location VARCHAR, item VARCHAR,
price VARCHAR
);
INSERT INTO sales (
id, email, location, item,
price
)
VALUES
(
1, 'a@hotmail.com', 'London',
'item1', '$50'
),
(
2, 'b@hotmail.com', 'Australia',
'item2', '$5'
),
(
3, 'c@hotmail.com', 'Canada',
'item3', '$12'
),
(
4, 'd@hotmail.com',
'London', 'item4', '$2'
),
(
5, 'e@hotmail.com', 'Switzerland',
'item5', '$150'
),
(
6, 'f@hotmail.com',
'Canada', 'item6',
'$49'
);
CREATE INDEX idx_sale_id ON sales(id);
Let's check the row inside pg_stat_all_indexes view:
postgres=# select * from pg_stat_all_indexes where indexrelname='idx_sale_id';
-[ RECORD 1 ]-+------------------------------
relid | 24576
indexrelid | 24581
schemaname | public
relname | sales
indexrelname | idx_sale_id
idx_scan | 0
last_idx_scan |
idx_tup_read | 0
idx_tup_fetch | 0
Now we are going to run a select query on the sales table but before that, we are disabling seqscan GUC to force the planner to use index scan.
postgres=# set enable_seqscan=off;
SET
postgres=# explain analyze select * from sales where id = 5;
QUERY PLAN
-----------------------------------------------------------------------------------------
Index Scan using idx_sale_id_2 on sales (cost=0.13..8.15 rows=1 width=132) (actual time=0.057..0.059 rows=1 loops=1)
Index Cond: (id = 5)
Planning Time: 0.076 ms
Execution Time: 0.077 ms
(4 rows)
Here, pg_stat_all_indexes view gets updated where we can see last index scan timestamp:
postgres=# select * from pg_stat_all_indexes where indexrelname='idx_sale_id';
-[ RECORD 1 ]-+------------------------------
relid | 24576
indexrelid | 24581
schemaname | public
relname | sales
indexrelname | idx_sale_id
idx_scan | 1
last_idx_scan | 2023-07-13 07:15:49.886831-04
idx_tup_read | 1
idx_tup_fetch | 1
To find unused/underutilized indexes, check the query below
SELECT
idxstat.schemaname as schema_name,
idxstat.relname AS table_name,
indexrelname AS index_name,
idxstat.idx_scan AS index_scans_count,
idxstat.last_idx_scan AS last_idx_scan_timestamp,
pg_size_pretty(pg_relation_size(idxstat.indexrelid)) AS index_size
FROM
pg_stat_all_indexes AS idxstat
JOIN
pg_index i ON idxstat.indexrelid = i.indexrelid
WHERE
idxstat.schemaname not in ('pg_catalog','information_schema','pg_toast')
AND NOT i.indisunique -- is not a UNIQUE index
ORDER BY
Idxstat.idx_scan ASC,
Idxstat.last_idx_scan ASC;
This query displays a list of all indexes sorted by scan frequency (column 4) and last scan timestamp (column 5). If some indexes have a scan frequency (index_scans_count) of zero, less than 10, or the last_idx_scan_timestamp is very old , then we should consider deleting them.
The reasons for getting rid of unused indexes:
- Database indexes frequently consume the same or even more storage than the actual data. Additionally, the kind of dependable, quick storage we need for a database can be expensive. It also increases the size and length of physical backups.
- Indexes slow downs DML operations for data modification. Whenever we run an INSERT or DELETE command on a table, all indexes must be changed along with the table (the "heap") . It is quite possible that updating an indexed table will cost more than updating an unindexed table.
- HOT updates are avoided by indexes. Every UPDATE operation results in the writing of a new row version (also known as a "tuple"), which creates a new item in each index on the table, instead of allowing the data to be updated in-place.
We must perform the following checks on candidates for deletion:
- It's possible that the planner is avoiding the index because of duplication or incorrect setting of PostgreSQL parameters such as effective_cache_size and random_page_cost, or it thinks it will be cheaper to perform a sequential scan of the table for other reasons. This requires a judgment call, as it may (or may not) be another reason to delete it.
- What size does the index occupy? Perhaps it is not currently deserving of our attention?
- How many times has the corresponding table been sequentially scanned? To stop consecutive scans, we might need to add more fields to the index.
- It might not be necessary to delete a table or index that appears to be inactive. For instance, it might be an audit log table that is rarely used but necessary for compliance, or it might be an index that is only used by a crucial year-end processing function.
Monitoring sequential scans
Sequential scans occur when PostgreSQL reads data from a table sequentially, without using an index. When dealing with larger tables, a sequential scan can take a long time reading the entire table, making the operation inefficient. When PostgreSQL manages numerous concurrent sequential scans, performance suffers dramatically.
Sequential scan monitoring can assist in locating performance problems and highlighting areas for index modification. The pg_stat_all_tables view returns the columns listed below.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Important columns to check are as follows:
- Sequential scan count (seq_scan): The seq_scan column in the pg_stat_all_tables view represents the total count of sequential scans performed on a particular table. A high number of sequential scans may indicate that appropriate indexes are missing or underutilized.
- Last sequential scan timestamp (last_seq_scan): The last_seq_scan column in the pg_stat_all tables is a new addition in PostgreSQL 16. It represents the last sequential scan performed on the table. This column is updated by setting the transaction stop time value (GetCurrentTransactionStopTimestamp()).
- Last index scan timestamp (last_idx_scan): The last_idx_scan column in the pg_stat_all_tables is the same as mentioned in pg_stat_all_indexes but it doesn’t provide information of which index was used during the latest scan.
- Fragmented rows (n_dead_tup): It shows an approximate number of dead rows that have had their data removed and are slated to be reused by our datasource when we use a write command like DELETE or UPDATE.
- Hot updates count (n_tup_hot_upd): We can calculate the HOT updates ratio for the most updated tables using this column data.The columns n_tup_upd and n_tup_hot_upd in this view represent the total number of regular and HOT updates for each individual table. The "good" tables are those with a high HOT rate; we should pay attention to tables with a high write count but a low or zero HOT rate. Changes to fillfactor settings are the general norm for these and allow free space to be reserved for when additional rows are added and tables are expanded. The presence of reserved space ensures that pages' rows will be updated, and it greatly increases the likelihood that a HOT update will take place. Go through the following link for more details on HOT updates: https://www.postgresql.org/docs/16/storage-hot.html
There are other useful columns such as last vacuum, autovacuum timings (last_vacuum and last_autovacuum), last_analyze and autoanalyze timings (last_analyze and last_autoanalyze) with count.
To find old tables inside PostgreSQL, we can run the following query:
SELECT
tabstat.schemaname AS schema_name,
tabstat.relname AS table_name,
tabstat.seq_scan AS tab_seq_scan_count,
tabstat.idx_Scan AS tab_index_scan_count,
tabstat.last_seq_scan AS tab_last_seq_scan_timestamp,
tabstat.last_idx_scan AS tab_last_idx_scan_timestamp,
pg_size_pretty(pg_total_relation_size(tabstat.relid)) AS table_size
FROM
pg_stat_all_tables AS tabstat
WHERE
tabstat.schemaname not in ('pg_catalog','information_schema','pg_toast')
ORDER BY
tabstat.last_seq_scan ASC,
tabstat.last_idx_scan ASC;
This query displays a list of all tables sorted by last sequential scan timestamp (column 5) and last index scan timestamp (column 6). The order by clause can also be replaced with seq_scan and idx_scan columns to get tables list which are rarely used.
Efficiently monitoring index usage and sequential scans
To efficiently monitor index usage and sequential scans, consider the following best practices:
a. Regularly analyze statistics
Analyze the statistics provided by the pg_stat_all_tables and pg_stat_all_indexes views regularly. Identify trends, anomalies, and areas for improvement.
b. Set up automated monitoring
Use scripts or monitoring tools to automate the gathering and analysis of statistics. This enables us to continuously scan the database and receive notifications when certain thresholds are crossed or trends are found.
c. Compare with historical data
Keep track of previous sequential scans and index usage statistics in order to compare and contrast patterns over time. As a result, trends may be found and the effects of database changes can be understood.
d. Optimize index usage
Utilize the idx_scan and last_idx_scan columns in the pg_stat_all_indexes view to locate underutilized indexes. To decrease maintenance costs and enhance query performance for inserts and updates, think about deleting or restructuring these indexes.
e. Optimize sequential scans
Identify tables with a high number of sequential scans using the seq_scan and last_seq_scan column in the pg_stat_all_tables view. Evaluate the need for appropriate indexes. We can also remove or archive those tables on which are not in use by monitoring last_seq_scan and last_idx_scan columns. This can help in reducing our disk space and as a result, backups can be completed in a quick manner.
f. Analyze Query Performance
Analyze slow-running queries using PostgreSQL EXPLAIN and EXPLAIN ANALYZE commands. Pay attention to query plans, index usage, and sequential scans mentioned in the output. By identifying queries that exhibit poor index utilization or excessive sequential scans, we can pinpoint areas that require optimization.
Conclusion
Monitoring index utilization and sequential scans is crucial for preserving your PostgreSQL database's peak performance. You may learn a lot about table and index activity by using the pg_stat_all_tables and pg_stat_all_indexes views. You may optimize query performance by identifying underutilized indexes, tracking index scans against sequential scans, and analyzing this data. You can make sure that your PostgreSQL database runs effectively and provides top performance for your applications by using these methods. To keep your database operating efficiently, bear in mind to periodically check and tweak your indexes based on the monitoring findings. Happy observing!