PG Phriday: What You Should Know About Basic Postgres Authentication

May 11, 2022

In the past few months, there’s been a single recurring question that appears with alarming regularity in the Postgres Discord server. The phrasing itself ranges anywhere from “Hi, I’ve recently installed Postgres, but I don’t know how to connect to it?” to “I keep getting errors when I try to connect to Postgres!” How does this scenario erupt so frequently? Is it something about Postgres? New users? People not familiar with databases in general?

How this Happens

New Postgres users are usually pretty easy to spot. They’re the ones most likely to express confusion or incredulity that there are apparently no instructions for connecting to Postgres once it has been installed. Or perhaps they’ve cleared that particular hurdle, but don’t know how to get further so external resources can successfully navigate those treacherous waters.

Most seasoned veterans know almost instinctively to address the latter with the pg_hba.conf file, so regularly has this been drilled into our heads. The former requires at least some knowledge of how connection strings work at minimum, including what port Postgres usually runs on, and the ins and outs of psql for local tinkering.

The issue is that the documentation isn’t very forthcoming about all this. The resources are there obviously, but a complete neophyte doesn’t know where to start, and there’s nary a Quickstart or Tutorial to be found in the official materials. Even the Getting Started tutorial skips right from installation to Postgres architecture fundamentals, creating databases, and then finally connecting. More worrying, the last two of those chapters are unlikely to match what a user will actually encounter following a fresh install of the software.

So let’s start at the beginning.

Learning your Role

Let’s assume the fresh install is on some kind of Linux distribution, using a PGDG package so it’s well supported by the community and has a known makeup. Believe it or not, users have just as much trouble with these as source installations, or those from some kind of binary executable.

So imagine we’ve just installed Postgres 14 using a command like this:

sudo apt install postgresql-14

Now what? Maybe a user heard about psql, so they try that first:

$> psql
psql: error: connection to server on socket
 "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  role "bones" does not 
exist

Wait, my user exists because I’m connected to the operating system as that user, but Postgres says it doesn’t exist. Does that mean Postgres has its own list of users? Ok, so how do I add myself to that list? The tutorial in the documentation eventually leads to Database Roles and voilà, there’s a createuser command! Let’s try that:

createuser bones
createuser: error: connection to server on socket
"/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  role "bones" does not 
exist

Dang it, that’s the same error! Quite a few users give up at this point and come screaming into the chat asking what to do next. Honestly, it’s difficult to blame them since there’s no context on where to go next. There are two fundamentally crucial, yet missing pieces of information here:

  1. A fresh install really only has one user: postgres.
  2. Authentication for local connections is deferred to the local operating system by default.

Assuming I know how Linux works and have local superuser access, I make the logical connection and try again:

$> sudo -u postgres createuser bones
$> psql
psql: error: connection to server on socket
"/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  database "bones" does 
not exist

Argh!

Database Connection Fundamentals

I created a user, so why is it asking about a database named “bones”? I never said anything about that. Sure, my username is bones, but what does that have to do with the database? To new users who are reading this, there are two rules here you probably didn’t know:

  1. If you don’t specify a username when invoking psql, it will automatically supply the operating system user running the command.
  2. In the absence of a specified database, psql will assume it should connect to a database with the same name as the user attempting to connect.

We can actually distill this down to two more universal fundamentals:

  1. Users and databases are different things.
  2. Both must be specified to connect to Postgres.

As trivial as this appears at first glance to someone who’s been at this for a while, not every database engine works this way. The user may not even have experience with another database software before now, so this information may be entirely novel.

Ok, so now we know we must supply a username and a database name, because we don’t want to connect to the “bones” database, since that doesn’t exist anyway. But what can we connect to?

Start by getting a list of available databases. You can do that with psql as well:

$> psql --list
Timing is on.
                              	List of databases
   Name	|  Owner   | Encoding |   Collate   |	Ctype	|   Access 
privileges   
-----------+----------+----------+-------------+-------------+-------------
----------
 postgres  | postgres | UTF8 	| en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8 	| en_US.UTF-8 | en_US.UTF-8 | =c/postgres      	
+
       	|      	|      	|         	|         	| 
postgres=CTc/postgres
 template1 | postgres | UTF8 	| en_US.UTF-8 | en_US.UTF-8 | =c/postgres      	
+
       	|      	|      	|         	|         	| 
postgres=CTc/postgres
(3 rows)

So we see that there are three databases, but what are template0 and template1? Why is there a database named postgres when the software is named Postgres? How can my user connect? Let’s try the postgres database then, since it at least seems familiar. The docs say I can use the first parameter as the database name, so let’s start there:

$> psql postgres
Timing is on.
psql (14.2 (Ubuntu 14.2-1.pgdg20.04+1+b1))
Type "help" for help.

postgres=>

Progress! Just for posterity, let’s create a database for our app rather than using this postgres thing; that’s probably bad to keep using as our main database:

$> sudo -u postgres createdb --owner bones appdb
$> psql appdb
appdb=>

Done!

Relearning How to Connect

Now that I’ve spent a few hours with my brand spanking new connection creating tables and filling them with some rudimentary INSERT and UPDATE traffic, I want my application on some other server to connect. What do I do now?

I’m not some newb anymore; I know about the CREATE USER syntax and how to use it! I even know that my “bones” user isn’t a superuser, so I have to do this as the postgres OS user.

$> sudo su - postgres
$> psql
postgres=> CREATE USER coolapp WITH PASSWORD 'super-secret';
CREATE ROLE

And now I just need to test my Python app with an extremely simple task: connect to Postgres and run a query. The tricky bit is that my app is on a separate dev server, but that shouldn’t be a problem, right? Here’s my doubtlessly genius code:

import psycopg2
import sys

try:
	conn = psycopg2.connect('host=10.0.0.238 dbname=appdb user=coolapp password=super-secret')
except psycopg2.OperationalError as e:
	print(e)
	sys.exit()

cur = conn.cursor()

cur.execute("SELECT * FROM app_users")
for (id, user, join_date) in cur:
	print("User {} joined on {}".format(user, join_date))

Now let’s run it:

$> python3 coolapp.py

connection to server at "10.0.0.238", port 5432 failed: Connection refused
    Is the server running on that host and accepting TCP/IP connections?

Wait a minute, I know for a fact that the server is running! What gives? Well, it turns out that the default installation in most cases only listens locally, and will not respond to remote connections at all. In order to fix this, we need to modify the listen_addresses parameter in postgresql.conf to bind to a specific IP address, a list of them, or all known interfaces with an asterix.

A more immediate problem is that we need to find our postgresql.conf file in the first place. A default installation on a RedHat-compatible system is typically in the home directory of the postgres user, in a subdirectory named after the postgres version. For us, that would be /var/lib/pgsql/14/data/. Debian-style systems are a bit more complicated. They store configuration files for Postgres in /etc/postgresql, again named after the Postgres version and the cluster name. The default cluster name for these systems is “main”, so configuration files would be located in /etc/postgresql/14/main for a default Postgres 14 installation such as ours.

With that in mind, we just need to open the “correct” file and change this line:

listen_addresses = '0.0.0.0'

Why 0.0.0.0 rather than our actual database server IP address or even “*”? If we used an asterix, Postgres would bind to all interfaces, including IPV6. That would just complicate our instructions, so we’ll avoid that for now. The 0.0.0.0 shortcut is a trick to only enable IPV4 connections.

Next we need to restart Postgres. Again, this is different depending on which OS we’re using.

# RHEL-based systems
$> sudo systemctl restart postgresql-14
# Debian-based systems
$> sudo systemctl restart postgresql@14-main

That was a lot of work just to enable external connections. Let’s try our app again:

$> python3 coolapp.py

connection to server at "10.0.0.238", port 5432 failed: FATAL:  no 
pg_hba.conf entry for host "10.0.2.80", user "coolapp", database "appdb", 
SSL encryption

Oh come on!

Finally Getting Through

As an end user, I’ve now encountered several roadblocks and may be ready to scrap this whole affair, if I haven’t already. Luckily this time around, we got a hint on where to look. The pg_hba.conf file is located in the same place as the postgresql.conf file, and nearly the first 100 lines are instructions on how it works.

The file on our test Ubuntu server includes all of these active lines:

local   all         	postgres                	peer
local   all         	all                     	peer
host    all         	all     	127.0.0.1/32	scram-sha-256
host	  all         	all     	::1/128     	scram-sha-256
local   replication 	all                     	peer
host    replication 	all     	127.0.0.1/32	scram-sha-256
host    replication 	all     	::1/128     	scram-sha-256

All of those lines that end with “peer” just mean that you have to be logged into the same operating system as the user that’s connecting. The “scram-sha-256” lines are just a fancy way of saying “use a password stored in Postgres.” We can ignore the “replication” lines, because they only apply to incoming replication connections and don’t apply to our illustrious Python application.

Given that’s the case, it looks like our app can connect if we just copy one of those 127.0.0.1 lines and substitute the 10.0.2.80 IP address reported by the error. So let’s do that:

local   all         	postgres                	peer
local   all         	all                     	peer
host    all         	all     	10.0.2.80/32	scram-sha-256
host    all         	all     	127.0.0.1/32	scram-sha-256
host	  all         	all     	::1/128     	scram-sha-256
local   replication 	all                     	peer
host    replication 	all     	127.0.0.1/32	scram-sha-256
host    replication 	all     	::1/128     	scram-sha-256

There are a couple of important things to consider when modifying pg_hba.conf.

  1. Much like firewall rules, Postgres applies the first valid rule for the type of incoming connection. The “local” types don’t apply to network-based traffic, so the first possible rule is the one we added. All further evaluation of the file is ignored.
  2. Changes to pg_hba.conf aren’t applied until Postgres reloads the file.

Now we need to tell Postgres to reload the configuration files so it re-processes the pg_hba.conf file and incorporates the changes. It’s possible to do that by sending a reload command to systemctl, but we can also use the pg_reload_conf() SQL function. Here’s how that would look:

$> sudo su - postgres
$> psql
postgres=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t

And here’s the (hopefully) ultimate test of our Python app:

$> python3 coolapp.py
User janedoe joined on 2022-05-02 15:59:19.896324-05:00
User bobsmith joined on 2022-05-02 16:02:45.358852-05:00

Whew! Looks like our work is done here.

A Winding Path

Postgres experts flaunt a certain familiarity with administering it, such that many of these concepts come as second nature. We know exactly where to look and what to change, almost on an instinctual level. It’s been so long since we’ve had to confront something as basic as connecting to Postgres, that it’s easy to forget just how complicated it actually is.

For enterprise-level or managed instances (cloud or otherwise) that add databases and accounts following support tickets, all of these steps are entirely obfuscated and largely unnecessary to the end-user. Users and developers simply receive a connection string and go on with their lives. But if they want to tinker with a local instance for faster prototype iterations or immediate feedback to code modifications, the situation gets a lot more personal.

The Postgres Discord, Slack, mailing list, and other resources encounter this seemingly innocuous question, “How do I connect?” repeatedly. As demonstrated here, that’s a much deeper rabbit hole than we might think. It’s tempting to say “read the docs”, but where? How much of the docs should someone read before they can even perform one of the most basic operations? It’s easy to take this for granted when you’ve been doing it for years, or even decades.

I’ll admit I balked at first myself when I first noticed the trend. Why do people keep asking this question? That’s just how Postgres works; don’t they get it? Of course not. That’s the point. It’s an artificial barrier that rejects normally receptive—and even excited—users because they get frustrated and quit. Should they? Probably not. Should Postgres have simpler defaults, or cut-and-paste friendly quickstart? Possibly. Regardless, it’s clearly a popular topic.

So to newbies and grizzled veterans alike, hopefully this fills that gap. Connecting should be the least of your worries.

Share this