All-processes breakpoints / watchpoints for PostgreSQL

November 27, 2014

Sometimes SELECT pg_backend_pid() and gdb’s attach aren’t enough. You might have a variable in shared memory that’s being changed by some unknown backend at some unknown time. Or a function that’s called from somewhere, but you don’t know where or when.

I’ve recently been doing quite a bit of work on code where bgworkers launch other bgworkers, which launch more bgworkers. All of them communicate via shared memory, and sometimes it’s a little bit exciting to debug.

When I need to step through a function, I have to add an elog(...) then a sleep(), watch for it in the logs, attach to the pid, and wait for the sleep to finish. It’s tedious, and I got sick of it.

So… here’s a gdb extension script, written in Python, that puts gdb in multi-process debugging mode and auto-continues execution after processes exit. Using this, I can just:


gdb -q --args make check
(gdb) source pggdb.py
(gdb) break MyFunction
(gdb) run

… and gdb will monitor all processes spawned by the make, including the postmaster and child postgres instances.

It’ll stop execution of everything when it hits the target breakpoint, as if all the processes were threads of some large single program being debugged.
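As a rough idea of what "multi-process debugging mode" involves, here is a minimal sketch that applies the relevant gdb settings from Python. The settings themselves are standard gdb commands, but choosing this exact set is an assumption and not taken from the script. The names `MULTIPROCESS_SETTINGS` and `configure_multiprocess` are hypothetical.

```python
try:
    import gdb  # only importable inside gdb's embedded Python
except ImportError:
    gdb = None

# Standard gdb settings; this particular selection is an assumption.
MULTIPROCESS_SETTINGS = [
    "set pagination off",         # don't block on "--Type <RET> for more--"
    "set breakpoint pending on",  # MyFunction isn't in `make`; resolve it later
    "set detach-on-fork off",     # keep both parent and child under gdb
    "set schedule-multiple on",   # resume every process together, not just one
]


def configure_multiprocess(execute):
    """Send each setting through `execute`, e.g. gdb.execute."""
    for command in MULTIPROCESS_SETTINGS:
        execute(command)


if gdb is not None:
    configure_multiprocess(gdb.execute)
```

Passing the executor in as an argument keeps the list of settings easy to inspect or dry-run outside gdb, for example with `configure_multiprocess(print)`.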

The script is pretty rough. Its handling of gdb’s pausing whenever any process executes is a hack, as is the technique used to avoid stopping on the checkpointer receiving SIGINT. It also only correctly stops execution on SIGABRT and SIGSEGV fatal signals, though the list is easy to extend.
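For comparison, the blunt way to express a signal policy like this is gdb's built-in `handle` command. The sketch below is not the technique the script uses (the post describes that one as a hack); it only shows the stock alternative. `FATAL_SIGNALS`, `QUIET_SIGNALS` and `handle_commands` are hypothetical names, and the rule applies to every process, not just the checkpointer.

```python
try:
    import gdb  # only importable inside gdb's embedded Python
except ImportError:
    gdb = None

# The two fatal signals the post says are handled; extend as needed.
FATAL_SIGNALS = ("SIGABRT", "SIGSEGV")
# Signals to deliver without stopping. SIGINT is the case from the post.
QUIET_SIGNALS = ("SIGINT",)


def handle_commands(fatal=FATAL_SIGNALS, quiet=QUIET_SIGNALS):
    """Build gdb `handle` commands for the given signal policy."""
    commands = ["handle %s stop print pass" % sig for sig in fatal]
    commands += ["handle %s nostop noprint pass" % sig for sig in quiet]
    return commands


if gdb is not None:
    for command in handle_commands():
        # gdb may ask for confirmation before changing SIGINT,
        # because it uses that signal itself.
        gdb.execute(command)
```

Extending the fatal list is then a one-line change, such as `handle_commands(fatal=FATAL_SIGNALS + ("SIGBUS",))`.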

Do not use this on a PostgreSQL instance that isn’t a throwaway debug datadir.

You can find the latest revision of the script in my scrapcode repo.

A number of limitations in gdb’s still-relatively-new Python scripting host make this harder than it could be:

      There’s no explicit event to let Python handle signals
      Information about inferiors is very limited – no process path, for example
      It’s not easy to tell whether a process exited as a result of a fatal signal or a simple exit call

… but it’s already proving pretty handy. The ability to suppress symbol-loading messages, inferior-switching messages, and similar noise would make it more useful.
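To make the exit-related limitation concrete, here is a minimal sketch of an auto-continue hook built on gdb's `exited` event. It is an illustration under assumptions, not the script's actual code: the helper names are hypothetical, and picking "the first inferior with a non-zero pid" as the one to resume from is a guess at a workable policy. The event carries an inferior and, at best, an exit code, so there is nothing to say whether a signal was involved.

```python
try:
    import gdb  # only importable inside gdb's embedded Python
except ImportError:
    gdb = None


def describe_exit(event):
    """Summarise an exited event using only what gdb exposes."""
    # exit_code is optional: the attribute is absent when gdb doesn't know it.
    code = getattr(event, "exit_code", None)
    num = event.inferior.num
    if code is None:
        return "inferior %d exited, exit code not available" % num
    return "inferior %d exited with code %d" % (num, code)


def pick_live_inferior(inferiors):
    """Return the first inferior that still has a process, or None."""
    for inferior in inferiors:
        if inferior.pid != 0:
            return inferior
    return None


def resume_remaining():
    # The selected inferior may be the one that just exited, so switch first.
    live = pick_live_inferior(gdb.inferiors())
    if live is not None:
        gdb.execute("inferior %d" % live.num)
        gdb.execute("continue")


def on_exited(event):
    print(describe_exit(event))
    # Defer the resume until gdb is back in its main event loop.
    gdb.post_event(resume_remaining)


if gdb is not None:
    gdb.events.exited.connect(on_exited)
```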
