This is a short story about how AddressSanitizer, a programming tool designed to help find memory corruption, found such a bug. It is followed by a brief guide to using it locally, because it's not always easy for people new to AddressSanitizer to get it running with the PostgreSQL regression tests.
The Story
I had submitted a series of small patches to refactor a handful of system catalog functions that take optional parameters, so that they follow a newer coding style.
The regression tests passed when I ran them locally, and all of the Cirrus CI tests passed except the one using AddressSanitizer.
There is more than one way to start reviewing the output, but briefly: the red Run test_world section indicated a failure while running the regression tests, and expanding the Run Cores section revealed that something had dumped a core file. The AddressSanitizer frames at the top of the core's backtrace reported an 8-byte read stack-buffer-overflow.
Again, for brevity, I'll skip past the AddressSanitizer frames to the first frame after them, which is the code that triggered the AddressSanitizer core dump:
[23:58:34.391] #9 0x000056200585ccf2 in pg_get_expr (fcinfo=0x7ffecac436a0) at ruleutils.c:2565
[23:58:34.391] expr = <optimized out>
[23:58:34.391] relid = <optimized out>
[23:58:34.391] pretty = <optimized out>
[23:58:34.391] result = <optimized out>
[23:58:34.391] prettyFlags = <optimized out>
That came as no surprise, because pg_get_expr() is one of the functions I was refactoring. Its optional third argument, pretty, tells the function whether to make the returned expression more human-readable.
2561 pg_get_expr(PG_FUNCTION_ARGS)
2562 {
2563 text *expr = PG_GETARG_TEXT_PP(0);
2564 Oid relid = PG_GETARG_OID(1);
2565 bool pretty = PG_GETARG_BOOL(2);
The question to ask, then, is what about line 2565, which retrieves the third argument, could cause a problem. The next frame in the backtrace helps answer that.
[23:58:34.391] #10 0x000056200594baff in DirectFunctionCall2Coll (func=0x56200585cc76 <pg_get_expr>, collation=collation@entry=0, arg1=139953427822388, arg2=<optimized out>) at fmgr.c:825
[23:58:34.391] fcinfodata = <optimized out>
[23:58:34.391] fcinfo = 0x7ffecac436a0
[23:58:34.391] result = <optimized out>
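[23:58:34.391] __func__ = "DirectFunctionCall2Coll"
The cause is that DirectFunctionCall2Coll() passes only two arguments to pg_get_expr(), which expects three. DirectFunctionCall2Coll() sets up its function call info on the stack with room for just two arguments, so PG_GETARG_BOOL(2) reads past the end of that stack buffer. That is the 8-byte stack-buffer-overflow AddressSanitizer reported. The fix is easy: use DirectFunctionCall3Coll() to pass all three arguments, as expected.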
And that concludes the brief story of how AddressSanitizer helped identify a stack buffer overflow that the tests did not catch without runtime instrumentation to detect memory corruption.
How to Use AddressSanitizer
To wrap things up, here is a quick guide to using AddressSanitizer locally. This information is easier to extract from the .cirrus.tasks.yml file in the PostgreSQL source code than from the Cirrus CI web output.
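Configure PostgreSQL to build with AddressSanitizer. The essential flag is -fsanitize=address. -Og limits optimization to a level that doesn't get in the way of debugging (it does not disable optimization entirely), and the remaining options follow the CI configuration: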
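./configure --enable-cassert --enable-injection-points --enable-debug --enable-tap-tests --with-segsize-blocks=6 CLANG=clang CFLAGS="-Og -ggdb -fno-sanitize-recover=all -fsanitize=address"
Set runtime options for the sanitizers using environment variables. UBSAN_OPTIONS only takes effect if UndefinedBehaviorSanitizer is also enabled (for example with -fsanitize=undefined), as it is in CI. disable_coredump=0 and abort_on_error=1 make a failure abort the process so that it can leave a core file, subject to your shell's core size limit (for example, ulimit -c unlimited):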
export UBSAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2"
export ASAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0:detect_stack_use_after_return=0"
Run the PostgreSQL regression tests:
make check
To also run the TAP tests enabled above, use make check-world instead.
I hope this short story is helpful to others.