This article introduces a new functionality for the EnterpriseDB tool pg_catcheck that can help detect missing or inaccessible relation files that cause “could not open file” errors.
- pg_catcheck
- New pg_catcheck relation functionality
When your database throws an error like "could not open file base/XXXXX/XXXX,” it is important to understand what happened and why are you getting such errors so that you can avoid them in the future. Most of the time, such an error occurs while trying to access the relation or while performing any operation on the relations.
We have been hearing feedback from some customers that they have received "could not open file" errors, and does it mean they have a relation data file missing in the database? It may be that the relation file has been missing or inaccessible for a while, but they only come across this error when they try to access the particular relation. Currently, there isn’t a way in PostgreSQL through which a user can get a notification about a missing relation file until they try to access a particular relation.
There are a few examples of such error messages in the pgsql-bugs archives, as well as those received from EnterpriseDB customers, which leads to the question, "Is there a way we can check for such errors before it actually hits the production environment?" To meet this need, EnterpriseDB has added functionality into pg_catcheck to detect such errors.
pg_catcheck
pg_catcheck is a simple tool for diagnosing system catalog corruption. If you suspect that your system catalogs are corrupted, this tool may help you figure out exactly what problems you have and how serious they are. If you are worried about this, you can run it routinely to search for system catalog corruption that might otherwise go undetected.
pg_catcheck takes the same arguments as most other PostgreSQL utilities, such as -h for the host or -p for the port. You can also pass it a connection string or URL, just like psql. For a full list of options, run `pg_catcheck --help`.
When you run pg_catcheck, it will normally print out a line that looks like this:
progress: done (0 inconsistencies, 0 warnings, 0 errors)
If you see this, it means pg_catcheck didn’t find any problems for the given database.
New pg_catcheck relation functionality
Recently EnterpriseDB enhanced pg_catcheck to provide a new check, where it can try to detect a potential "could not open file" error or state the accessibility for a relation.
Below is a test where I have intentionally deleted and changed the permissions for the relation file node in the database. By running the pg_catcheck, with a newly added option "--select-from-relations" parameter, we can detect such problems:
rushabh@rushabh:pg_catcheck$ ./pg_catcheck edb --select-from-relations
notice: unable to query relation "public"."emp": ERROR: could not open file "base/16198/16394": Permission denied
notice: unable to query relation "public"."jobhist": ERROR: could not open file "base/16198/16405": No such file or directory
progress: done (2 inconsistencies, 0 warnings, 0 errors)
The newly added "--select-from-relations" option runs a "SELECT * FROM <table> LIMIT 0" query on each table, TOAST table, and materialized view in the database, which gives us some chance of finding out which parts of the database are completely naccessible. Note: This option still doesn't help with indexes or relation segments other than the first one.
This option won't tell you the reason why you are getting such errors. It also won't provide advice on how to fix them—for this, you should consult an expert. However, this option will help in detecting cases earlier rather than hitting errors in the production environment or runtime.
More information about pg_catcheck can be found on Github. To learn more about how to fix database corruption issues, read the blog post Troubleshooting Database Corruption from Robert Haas.