PostgreSQL VACUUM Guide and Best Practices

Learn more about how this SQL command is essential for PostgreSQL database maintenance and more.

Examining how this SQL command supports database maintenance

The VACUUM command is an essential maintenance tool for stable and optimal database performance. It improves query performance by recovering space occupied by "dead tuples" (dead rows) in a table, which are left behind by records that have been deleted or updated (internally, an UPDATE is a delete followed by an insert).

PostgreSQL doesn’t physically remove the old row from the table but puts a “marker” on it so that queries don’t return that row. This causes table bloat that degrades query performance because PostgreSQL has to sift through rows of dead and live tuples.

When a vacuum process runs, the space occupied by these dead tuples is marked as reusable for new tuples. Vacuum databases regularly to remove dead rows.
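As a sketch of this in practice, you can inspect a table's dead-tuple count via the pg_stat_user_tables view and then vacuum it manually (the table name orders is a placeholder):

```sql
-- Check live vs. dead tuples for a table ("orders" is a placeholder name)
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Mark the space held by dead tuples as reusable
VACUUM orders;
```

Re-running the first query after the vacuum should show n_dead_tup drop back toward zero.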

The VACUUM command can be used in two ways:

  • VACUUM (standard): Marks the space held by dead tuples as reusable for future inserts and updates. It does not return the space to the operating system, but it runs without an exclusive lock on the table.
  • VACUUM FULL: Rewrites the entire table into a new file, returning the reclaimed space to the operating system and eliminating bloat, but it requires an exclusive lock on the table while it runs.

In the following sections, we’ll explore how VACUUM works with other maintenance commands, useful parameters, and database maintenance best practices.


Comparing VACUUM and ANALYZE and how they work together

An ANALYZE operation does what its name says: it analyzes the contents of a database’s tables and collects statistics about the distribution of values in each column of every table. PostgreSQL's query engine then uses these statistics to find the best query plan.

Column statistics also change as rows are inserted, deleted, and updated in a database. ANALYZE – either run manually by the DBA or automatically by PostgreSQL after an autovacuum – ensures the statistics are up-to-date.
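As an illustrative sketch, you can run ANALYZE manually on one table and then inspect the statistics it collected (the table name orders is a placeholder):

```sql
-- Refresh planner statistics for one table ("orders" is a placeholder)
ANALYZE orders;

-- Inspect the collected per-column statistics the planner will use
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'orders';
```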

Together, vacuuming and analyzing are the two most important PostgreSQL maintenance operations for a healthy and performing database:

  VACUUM vs. ANALYZE:

  • Functionality — VACUUM cleans up dead tuples and recovers space; ANALYZE gathers statistics for the query planner to optimize queries.
  • Performance impact — VACUUM improves query speed and decreases disk I/O by reducing table bloat; ANALYZE improves query performance by providing up-to-date statistics for the query planner.
  • Recommended run frequency — run VACUUM frequently and consistently to minimize table bloat; run ANALYZE after major data changes (large inserts, updates, or deletes) to ensure statistics are current.
  • Locking — standard VACUUM does not lock the table, but VACUUM FULL requires an exclusive lock; ANALYZE does not require an exclusive lock and can run alongside other operations.

VACUUM ANALYZE: Combining both operations

PostgreSQL provides a combined VACUUM ANALYZE command that performs both operations in a single pass. Running the combined command instead of the individual commands is convenient, but it has both upsides and downsides:

  VACUUM ANALYZE benefits:

  • More comprehensive: vacuuming and analyzing are done together, leading to more consistent and predictable results.
  • More convenient: running them together frees up disk space while providing updated statistics.
  • More resource efficient: combining both operations means there's no need for two separate passes over the data.
  • More transactionally consistent: the table remains in a more consistent state after the operation, ensuring a stable database.

  VACUUM ANALYZE disadvantages:

  • Longer duration: both operations are performed in one pass, with greater delays expected under high database load or on larger tables.
  • Higher resource consumption: can be problematic for databases with low processing power, memory, or I/O capacity.
  • Potential locking: both operations require some level of lock (though not exclusive locks), which can impact database operations.

Most database administrators will use VACUUM ANALYZE only after bulk operations to reclaim space, or during initial database setup to ensure everything is optimized. For regular database maintenance, it's recommended to just run VACUUM.
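For illustration, the combined command can target a single table; a brief sketch (orders is a placeholder name):

```sql
-- Vacuum and analyze a single table in one pass ("orders" is a placeholder)
VACUUM ANALYZE orders;

-- Or, using the parenthesized option syntax, add a progress report
VACUUM (ANALYZE, VERBOSE) orders;
```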

Function descriptions for common VACUUM parameters on PostgreSQL

Below is a list of parameters (options) that can be used when vacuuming a database table.

  • FULL — Activates a "full" vacuum, which reclaims more space but takes longer and exclusively locks the table. Resource intensive, as it rewrites the table's contents into a new file while holding the lock.
  • FREEZE — Freezes tuples in the table. Since freezing is always performed when a table is rewritten, this option is redundant when FULL is specified.
  • VERBOSE — Prints a detailed vacuum activity report for each table.
  • ANALYZE — Updates statistics on the table to help the planner choose efficient query plans.
  • DISABLE_PAGE_SKIPPING — Disables page skipping, including pages marked all-frozen or all-visible in the visibility map.
  • SKIP_LOCKED — Specifies that VACUUM should not wait for conflicting locks to be released before starting.
  • INDEX_CLEANUP — Usually, VACUUM skips index vacuuming when there are very few dead tuples in a table. This option forces VACUUM to process indexes whenever there are dead tuples. Note that the resources required may far exceed the benefit gained by removing dead index tuples.
  • TRUNCATE — Specifies that VACUUM should attempt to truncate empty pages at the end of the table and return the disk space to the operating system. This is the default behavior unless the vacuum_truncate storage parameter has been set to false or FULL is specified.
  • PARALLEL — Runs the index vacuuming and index cleanup phases of VACUUM in parallel, using up to integer background workers.
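These options can be combined using the parenthesized syntax. A brief sketch (orders is a placeholder table name; SKIP_LOCKED requires PostgreSQL 12 or later, and PARALLEL requires PostgreSQL 13 or later):

```sql
-- Combine several options in one command
VACUUM (VERBOSE, ANALYZE, SKIP_LOCKED) orders;

-- Use up to 4 background workers for the index vacuuming phases
VACUUM (PARALLEL 4) orders;
```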

Looking for more details? We recommend referring to the official Postgres parameter list for a comprehensive and detailed function list of VACUUM parameters for PostgreSQL.

Looking to optimize maintenance for your PostgreSQL database? Contact our team of PostgreSQL professionals to help assess and guide you through the necessary steps today.

Tried and tested tips and best practices for PostgreSQL vacuuming

Vacuuming should be part of routine database maintenance, and there are optimal ways to go about it. Below are some performance tips for PostgreSQL vacuuming.

  1. Don’t run manual VACUUM or ANALYZE without reason.
    Database administrators should refrain from running manual vacuums too often on the entire database, as the autovacuum process might already have optimally vacuumed the target database. As a result, a manual vacuum may not remove any dead tuples but cause unnecessary I/O loads or CPU spikes.

    If needed, manual vacuums should be run on a table-by-table basis, such as when the ratio of live rows to dead rows is low or when there are large gaps between autovacuum operations. They should also be run when user activity is minimal.

    Autovacuum also keeps a table’s data distribution statistics up-to-date (it doesn’t rebuild them). When manually run, the ANALYZE command rebuilds these statistics instead of updating them. Again, rebuilding statistics when they’re already optimally updated by a regular autovacuum might cause unnecessary pressure on system resources.

    The time when you must run ANALYZE manually is immediately after bulk loading data into the target table. A large number (even a few hundred) of new rows in an existing table will significantly skew its column data distribution. The new rows will cause any existing column statistics to be out-of-date. When the query optimizer uses such statistics, query performance can be really slow.

    In these cases, running the ANALYZE command immediately after a data load to rebuild the statistics completely is better than waiting for the autovacuum to kick in.
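    As a sketch of this workflow (the table name orders and the file path are placeholders):

    ```sql
    -- Bulk load data, then refresh statistics immediately rather than
    -- waiting for autovacuum to notice the change
    COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv);
    ANALYZE orders;
    ```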

  2. Select VACUUM FULL only when performance degrades badly
    The autovacuum functionality doesn’t recover disk space taken up by dead tuples. Running a VACUUM FULL command will do so, but has performance implications. The target table is exclusively locked during the operation, preventing even reads on the table. The process also makes a full copy of the table, which requires extra disk space when it runs. We recommend only running VACUUM FULL if there is a very high percentage of bloat and queries are suffering badly. We also recommend using periods of lowest database activity for it.

  3. Fine-tune Autovacuum Threshold
    It’s essential to check or tune the autovacuum and analyze configuration parameters in the postgresql.conf file or in individual table properties to strike a balance between autovacuum and performance gain.

    PostgreSQL uses two configuration parameters to decide when to kick off an autovacuum:

    • autovacuum_vacuum_threshold: this has a default value of 50
    • autovacuum_vacuum_scale_factor: this has a default value of 0.2

    Together, these parameters tell PostgreSQL to start an autovacuum when the number of dead rows in a table exceeds the number of rows in that table multiplied by the scale factor plus the vacuum threshold. In other words, PostgreSQL will start autovacuum on a table when:

    pg_stat_user_tables.n_dead_tup > (pg_class.reltuples x autovacuum_vacuum_scale_factor)  + autovacuum_vacuum_threshold

    This may be sufficient for small to medium-sized tables. For example, in a table with 10,000 rows, the number of dead rows has to be over 2,050 ((10,000 x 0.2) + 50) before an autovacuum kicks off.
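    This trigger condition can be checked directly against the system catalogs. The following query is a sketch assuming the default threshold (50) and scale factor (0.2); it lists tables whose dead-row counts have already crossed the trigger point:

    ```sql
    -- Tables whose dead rows exceed the autovacuum trigger point
    SELECT s.relname,
           s.n_dead_tup,
           (c.reltuples * 0.2) + 50 AS autovacuum_trigger_point
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid
    WHERE s.n_dead_tup > (c.reltuples * 0.2) + 50;
    ```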

    Not every table in a database experiences the same rate of data modification. Usually, a few large tables will experience frequent data modifications, resulting in a higher number of dead rows. The default values may not work for such tables. For example, with the default values, a table with 1 million rows must have more than 200,050 dead rows before an autovacuum starts ((1,000,000 x 0.2) + 50). This can mean longer gaps between autovacuums, increasingly long autovacuum times, and, worse, autovacuum not running at all if active transactions on the table are locking it.

    Therefore, the goal should be to set these thresholds to optimal values so that autovacuum happens at regular intervals, doesn't take a long time (affecting user sessions), and keeps the number of dead rows relatively low.

    One approach is to use one or the other parameter. So, if we set autovacuum_vacuum_scale_factor to 0 and instead set autovacuum_vacuum_threshold to, say, 5,000, a table will be autovacuumed when its number of dead rows is more than 5,000.
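    Such a fixed threshold can be set per table via storage parameters; a sketch (orders is a placeholder name):

    ```sql
    -- Trigger autovacuum on this table after a fixed 5,000 dead rows,
    -- regardless of table size
    ALTER TABLE orders SET (
        autovacuum_vacuum_scale_factor = 0,
        autovacuum_vacuum_threshold = 5000
    );
    ```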

  4. Fine-tune Autoanalyze Threshold
    Similar to autovacuum, autoanalyze uses two parameters to decide when the autovacuum daemon should also trigger an autoanalyze:

    • autovacuum_analyze_threshold: this has a default value of 50
    • autovacuum_analyze_scale_factor: this has a default value of 0.1

    Like autovacuum, the autovacuum_analyze_threshold parameter can be set to a value that dictates the number of inserted, deleted, or updated tuples in a table before an autoanalyze starts. We recommend setting this parameter separately on large and high-transaction tables. The table configuration will override the postgresql.conf values.

    The code snippet below shows the SQL syntax for modifying the autovacuum_analyze_threshold setting for a table.

    ALTER TABLE <table_name>
    SET (autovacuum_analyze_threshold = <threshold rows>);

  5. Fine-tune Autovacuum workers
    Another often-overlooked parameter is autovacuum_max_workers, with a default value of 3. Autovacuum is not a single process but a number of individual vacuum threads running in parallel. Multiple workers ensure that vacuuming large tables doesn't hold up the vacuuming of smaller tables or block user sessions. The autovacuum_max_workers parameter tells PostgreSQL how many autovacuum worker threads to spin up for cleanup.

    A common practice among PostgreSQL DBAs is to increase the number of maximum worker threads to speed up autovacuum. This often backfires, because all the threads share the same autovacuum_vacuum_cost_limit, which has a default value of 200 (with the defaults, each of the 3 workers gets a budget of roughly 200 / 3 ≈ 66). Each autovacuum thread is assigned a cost limit using the formula shown below:

    individual thread’s cost_limit = autovacuum_vacuum_cost_limit / autovacuum_max_workers

    The cost of work done by an autovacuum thread is calculated using three parameters: vacuum_cost_page_hit, vacuum_cost_page_miss, and vacuum_cost_page_dirty. What these parameters mean is this:

    • When a vacuum thread finds the data page that it’s supposed to clean in the shared buffer, the cost is 1.
    • If the data page is not in the shared buffer but the OS cache, the cost will be 10.
    • If the page has to be marked dirty because the vacuum thread had to delete dead rows, the cost will be 20.

    An increased number of worker threads will lower the cost limit for each thread. As each thread is assigned a lower cost limit, it will go to sleep more often as the cost threshold is easily reached, ultimately causing the whole vacuum process to run slow. We recommend increasing the autovacuum_vacuum_cost_limit to a higher value, like 2000, and then adjusting the maximum number of worker threads.
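    The cluster-wide limit can be raised with ALTER SYSTEM; a sketch (requires superuser privileges, and the change takes effect after a configuration reload):

    ```sql
    -- Raise the shared autovacuum cost budget, then reload the config
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    SELECT pg_reload_conf();
    ```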

    A better way is to tune these parameters for individual tables only when necessary. For example, if the autovacuum of a large transactional table is taking too long, the table may be temporarily configured to use its own vacuum cost limit and cost delays. The cost limit and delay will override the system-wide values set in postgresql.conf.

    The code snippet below shows how to configure individual tables.

    ALTER TABLE <table_name> SET (autovacuum_vacuum_cost_limit = <large_value>);
    ALTER TABLE <table_name> SET (autovacuum_vacuum_cost_delay = <lower_cost_delay>);

    Using the first parameter will ensure the autovacuum thread assigned to the table performs more work before going to sleep. Lowering the autovacuum_vacuum_cost_delay will also mean the thread sleeps for less time.

Get more best practice tips from our professional team of PostgreSQL experts:

Explore EDB resources on VACUUM and other PostgreSQL maintenance processes.

whitepaper

Proper VACUUM configuration is essential to ensure the scale of maintenance sufficiently meets database usage requirements and loads. In this article, PostgreSQL expert Shaun Thomas provides a highly detailed guide on VACUUM configuration, which examines the parameters needed for specific tasks, technical considerations, and more.

blog

What is parallel vacuuming in PostgreSQL 13, and how does it help database operations? In this blog, our PostgreSQL expert Dilip Kumar dives into the technical details and implementation of this performance feature.

blog

We look at how VACUUM enables and supports Multi-Version Concurrency Control (MVCC) optimization. MVCC allows concurrency control within PostgreSQL databases, which is crucial to avoid deadlocks and instability degrading the user experience. Read the blog to learn how VACUUM ties in with MVCC and best practices for using both.

What does VACUUM in PostgreSQL do?

The VACUUM command cleans up dead tuples (obsolete rows) in PostgreSQL tables. This helps reclaim storage space, prevent transaction ID wraparound issues, and improve database performance.

What is the impact of skipping VACUUM on a PostgreSQL database?

Skipping VACUUM can lead to bloat (unused space within the database), reduced performance, and potential transaction ID wraparound issues, eventually leading to database downtime or corruption.

How does the VACUUM command work?

The VACUUM command scans tables to remove dead tuples resulting from UPDATE and DELETE operations. It also updates statistics in the system catalogs to reflect the current state of the database.

Can VACUUM be run on a specific index?

No, VACUUM operates on entire tables, including all associated indexes. To specifically target an index for cleanup, you would use the REINDEX command instead.

Can I run VACUUM while the database is in use?

Yes, regular VACUUM can run concurrently with other database operations. However, VACUUM FULL requires exclusive access to tables and considerable system resources.

Is it necessary to run VACUUM manually?

It is usually unnecessary to run VACUUM manually, because PostgreSQL's autovacuum daemon performs vacuum operations based on table activity. However, manual VACUUM can be helpful for specific maintenance or performance-tuning purposes.

When should you use VACUUM FULL?

VACUUM FULL recovers disk space occupied by dead tuples, but is very time and performance-intensive. We recommend running VACUUM FULL only if there is a very high percentage of bloat, and queries are suffering badly. We also recommend using periods of lowest database activity for it.

How to check table bloat in PostgreSQL?

We recommend using the pgstattuple extension, which ships with PostgreSQL but must first be enabled with CREATE EXTENSION. Alternatively, you can approximate bloat by querying pg_stat_user_tables and comparing n_dead_tup to n_live_tup: the higher the ratio of dead to live tuples, the more bloat you have. However, this is an approximation and may not be entirely accurate.
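For example, bloat can be measured with pgstattuple; a sketch (orders is a placeholder table name):

```sql
-- pgstattuple ships with PostgreSQL but is not enabled by default
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Report dead-tuple percentage and free space for a table
SELECT dead_tuple_percent, free_percent
FROM pgstattuple('orders');
```

Note that pgstattuple performs a full scan of the table, so run it sparingly on very large tables.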

What is VACUUM FREEZE and when should it be used?

VACUUM FREEZE is a variant of the VACUUM command that marks all tuples in the table as frozen, preventing them from being subject to transaction ID wraparound. It is particularly useful for tables that rarely change. You should use it before archiving tables or after bulk data loads to avoid frequent vacuums.
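For example (archived_orders is a placeholder table name):

```sql
-- Aggressively freeze all tuples, e.g., after a bulk load into a
-- table that will rarely change afterward
VACUUM (FREEZE) archived_orders;
```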

What is the difference between VACUUM and VACUUM ANALYZE?

VACUUM reclaims storage space by cleaning up dead tuples, while ANALYZE updates statistics used by the query planner. VACUUM ANALYZE performs both tasks, cleaning up dead tuples and updating statistics in a single operation. However, running both operations together requires more system resources and may result in temporary table locks.

Can I cancel a running VACUUM or ANALYZE operation?

You can cancel them, but it's generally safer and better to let both operations complete unless they are significantly impacting database performance. You can cancel these operations using the pg_cancel_backend() function or by sending a SIGINT signal to the PostgreSQL backend process. Note that VACUUM FULL cancellation is more problematic, as it rewrites the entire table. Canceling will roll back all changes, potentially wasting significant time and resources.
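A sketch of finding and canceling a running VACUUM (the pid value shown is a placeholder; use the one returned by the first query):

```sql
-- Find the backend currently running a VACUUM
SELECT pid, query FROM pg_stat_activity WHERE query ILIKE 'VACUUM%';

-- Cancel that query (equivalent to sending SIGINT)
SELECT pg_cancel_backend(12345);  -- 12345 is a placeholder pid
```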

How do VACUUM and ANALYZE impact replication?

VACUUM and ANALYZE operations in PostgreSQL can impact replication by generating Write-Ahead Logging (WAL) entries, which need to be replayed on standby servers, potentially causing replication lag. While VACUUM, especially VACUUM FULL, can produce significant WAL traffic and lock tables, ANALYZE generates minimal WAL but still contributes to overall traffic.

When does autovacuum run for Postgres?

PostgreSQL runs an autovacuum daemon process that periodically wakes up to check if any tables require vacuuming or analyzing. By default, autovacuum starts when dead rows exceed 20% of the table's rows plus 50 rows. For example, a table with 1 million rows must have more than 200,050 dead rows before an autovacuum starts ((1,000,000 x 0.2) + 50). This can and should be configured based on database loads to prevent performance degradation.

How can I monitor the progress of a VACUUM operation?

You can monitor the progress of a VACUUM operation using the pg_stat_progress_vacuum system view, which provides details such as the table being vacuumed, the number of dead tuples, and the phase of the operation.
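For example (column names as of PostgreSQL 13 through 16; later versions rename some of the dead-tuple columns):

```sql
-- Watch a running VACUUM: phase, blocks scanned, and dead tuples found
SELECT p.pid, c.relname, p.phase,
       p.heap_blks_scanned, p.heap_blks_total, p.num_dead_tuples
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```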

How often should you manually VACUUM Postgres?

It’s best practice not to run manual vacuums too often on the entire database; the target database may already be optimally vacuumed by the autovacuum process, in which case a manual vacuum removes no dead tuples but still causes unnecessary I/O load or CPU spikes. If needed, manual vacuums should be run on a table-by-table basis when there’s a clear reason, such as a low ratio of live rows to dead rows or large gaps between autovacuum runs. Also, manual vacuums should be run when user activity is minimal.

Explore more PostgreSQL solutions from the industry’s leading Postgres experts

Learn how EDB can support PostgreSQL database maintenance and improvements.

Let our professional team provide advice and hands-on guidance for PostgreSQL database maintenance operations like VACUUM, ANALYZE, and more. Tell us your PostgreSQL needs and let’s talk.