A tale of password authentication methods in PostgreSQL

April 27, 2020
A tale of password authentication methods in PostgreSQL

Let’s say you want to implement a password authentication method in a client/server protocol. How would you do that and what would be the possible problems? The following is the story of how this was done in PostgreSQL.

password

In the beginning, PostgreSQL only had the method that is now known as "password" in pg_hba.conf. It is the simplest thing you can imagine:

  1. The client says to the server, “Hello, I’m Peter, I would like to connect.”
  2. The server replies, “What’s your password?”
  3. The client prompts for the password or gets it from somewhere else and responds, “It’s ‘123456’.”
  4. Now the server looks for the actual password. It’s stored in the system catalog pg_authid column rolpassword. The server basically does strcmp(pg_authid.rolpassword, "123456"), and if that is equal, it says “OK” to the client and the session startup proceeds.

This method has some obvious problems:

  • The password is transmitted in cleartext over the wire. There are external ways around that, such as using SSL or other encryption wrappers.
  • The password is stored in cleartext in the system catalog and thus ultimately on disk. This is bad because it allows database and system administrators see other users’ passwords. Of course, one shouldn’t reuse passwords, but people do. And it could allow administrators to bypass auditing by logging in as other users using their passwords. And in general having a cleartext password lying around is bad because it might end up being copied around or accidentally seen. Better not do that.
  • More subtly, the password exists in cleartext in the memory of the server process. Why is that bad? Again, administrators could have access to it there. Also, if the server core dumps or swaps, the cleartext password could end up on disk somewhere. So this is really almost as bad as having the password in cleartext in the on-disk storage.

crypt

So another attempt was made. This is the now no longer supported "crypt" method in pg_hba.conf:

  1. The client starts again with, “Hello, I’m Peter, I would like to connect.”
  2. The server that has been configured to use the crypt method responds, “What’s crypt() of your password, using salt ‘ab’?” The salt is chosen randomly for each connection attempt.
  3. The client gets the user input and computes crypt("123456", "ab") and replies, “It’s ‘ab01FAX.bQRSU’.”
  4. The server checks whether crypt(pg_authid.rolpassword, "ab") is equal to “ab01FAX.bQRSU” and if so replies “OK”.

crypt() is a common and readily available Unix function to do, well, encryption, so it was an obvious candidate to put into service here. It fixes the issue of having the password on the wire in cleartext, but there are still a few existing and new problems:

  • The password is still in cleartext in the system catalog and storage.
  • The encryption method used by the original crypt() is now obsolete. Similarly, the salt length (2 bytes) is obsolete.
  • As a consequence of this, different vendors of Unix-like operating systems have expanded their crypt() call to use different encryption algorithms, but this has been done in incompatible ways. This is fine as long as crypt() is only used to encrypt passwords for local use, as was the original use, but it breaks when you want to communicate over a network between different systems.
  • crypt() might not be available on non-Unix systems. A replacement can be supplied, but if that is required, then it calls into question the original premise of using a facility readily available in the operating system.

md5

By this time, PostgreSQL supported SSL, so the cleartext on the wire issue was not so critical anymore. What was really bumming people out was the cleartext passwords in the system catalogs. So a new system was devised, which in pg_hba.conf is called "md5". It works like this:

  1. Client: “Hello, I’m Peter, I would like to connect.”
  2. Server: “What’s the MD5-thingy of your password, using salt ‘abcd’?” Again, the salt is chosen randomly for each connection attempt.
  3. The client gets the user input and computes: md5(md5("123456" + "peter") + "abcd"). (I’m using + here for string concatenation.) Here, “123456” is the password entered by the user, “peter” is the user name, and “abcd” is the salt. Then the client responds, “It’s ‘301eddd34d997f72bd43ba678e36a5ba’.”
  4. The server checks whether md5(pg_authid.rolpassword + "abcd") is equal to “301eddd34d997f72bd43ba678e36a5ba” and if so replies “OK”.

So what’s wrong with this?

  • Reading this now, using MD5 is obviously a red flag. Hashing methods become obsolete.
  • The salt length (4 bytes) is also obsolete.
  • The user name is used as the salt of the stored hashed password. (This was meant so that two users who happen to have the same password don’t have the same stored hash.) So renaming a user invalidates the stored hashed password and will require a new password to be assigned. This is perhaps not common but still annoying when it happens.
  • If someone happens to get a copy of the stored hashed passwords from the system catalogs (or perhaps a backup dump, say), they can use those to log in. You don’t need the actual passwords. For example, in step 3 above, a client could just send md5("the hash I found" + "abcd") without knowing “123456”. (You can’t do this with stock libpq, but making a custom version that can do this isn’t hard for a dedicated attacker.)

The lesson here is: Don’t design your own crypto.

scram

So all of this had to be rethought, and the current solution was introduced in PostgreSQL 10, using public standards: SASL (RFC 4422) and SCRAM (RFC 5802 and RFC 7677).

SASL is a protocol framework that allows clients and servers to negotiate an authentication mechanism. This is widely used for example in email: An SMTP or IMAP server might offer authentication mechanisms with names such as PLAIN, LOGIN, CRAM-MD5, or DIGEST-MD5, as well as perhaps SCRAM, although that appears to be rarer. The reason PostgreSQL uses SASL is mainly because SCRAM is defined over SASL, so it makes sense to follow it. The SASL functionality is otherwise not exposed to users.

SCRAM is an authentication mechanism. It’s in fact a family of authentication mechanisms, with different possible hash algorithms. When the implementation of SCRAM in PostgreSQL was initially considered, most previous uses of SCRAM employed SHA-1, but that was already on its way out and is as of this writing also deprecated, like MD5. So the algorithm that PostgreSQL currently uses is SHA-256, and the full name of the authentication method is SCRAM-SHA-256.

The whole thing looks approximately like this on the wire:

  1. Client: “Hello, I’m Peter, I would like to connect.”
  2. Server: “We’re going to do SASL authentication. Pick one of these methods: SCRAM-SHA-256”. (There is currently only one method, unless channel binding is offered, but I’m ignoring that in this blog post to keep things simple.)
  3. Client: “I pick SCRAM-SHA-256. Here is the first SASL data: n,,n=peter,r=rOprNGfwEbeRWgbNEkqO“. This data is assembled in accordance with the SCRAM specification and wrapped in a SASL protocol message. Here, the “n=” field contains the user name and the “r=” field contains a base64-encoded random string. The stuff at the beginning has to do with channel binding, which we will ignore here.
  4. Server: “Here is some SASL data: r=rOprNGfwEbeRWgbNEkqO%hvYDpWUa2RaTCAfuxFIlj)hNlF$k0,s=W22ZaJ0SNY7soEsUEjb6gQ==, i=4096“. The “r=” field contains the client’s random data with additional random data appended by the server. In addition, the server sends a salt (s=) and an interation count (i=), which it gets from the stored password of the user in question.
  5. Client: “Here is some SASL data: c=biws,r=rOprNGfwEbeRWgbNEkqO%hvYDpWUa2RaTCAfuxFIlj)hNlF$k0, p=dHzbZapWIk4jUhN+Ute9ytag9zjfMHgsqmmiz7AndVQ=“. The “r=” field is the same as before. The client and the server just send this back and forth to check that they are still talking to the right opposing side. The “p=” field is the password that the client user supplied, which is then hashed in a specific way, using the salt and iteration count provided. The “c=” field is for channel binding.
  6. The server now checks this data against what it has stored locally. The details of that are omitted here. If it is satisfied, then it sends back: “Here is the final SASL data: v=6rriTRBi23WpRR/wtup+mMhUZUn/dB5nLTJRsjl95G4=“. This is called a verifier that allows the client to check that the server actually checked the password and wasn’t just waving everyone through.
  7. The client then checks the verifier, and if it’s satisfied, then the session can proceed.

There is a lot going on here. This solves all of the problems we have discussed so far, and a few we hadn’t even thought of yet:

  • The password is not in cleartext on the wire.
  • The password is not in cleartext in the system catalog or underlying storage.
  • The password never exists in cleartext in the server process. In fact, the cleartext password never leaves the client. The client hashes it in a certain way and the server compares it to the hashed information it has, but the server never sees the actual password.
  • None of the information that the client sends can be used by anyone else to log in, even if the whole exchange is captured. That is because both the client and the server use different random data in each connection attempt.
  • Each stored password is hashed with a different salt, so there is virtually no risk of the stored passwords accidentally being the same for different users. Also, the salt is independent of the user name or other properties of the user.
  • Salt lengths can easily be changed. A user can just create a new password with a different salt.
  • Algorithms can be added to this design in a systematic way. Note, this would still require software changes and wouldn’t be entirely painless, but at least there is a defined way to do it.
  • As mentioned in the last point above, the client can verify whether the server actually checked the password. In client/server authentication, we usually think primarily about a server trying to prevent unauthorized clients from connecting. The idea there is that the server has valuable data that unauthorized clients might want to get at. But the reverse could also happen: A client connecting to a fake server and sending it valuable data that the server is not supposed to get. Such a server would just happily let in any client that happens to connect, without actually checking the password. SCRAM can prevent that. Obviously, SSL/TLS is a more elaborate and complete solution to checking whether a node in a network is trustworthy, and SCRAM is not meant to obviate the need for that.

So that’s where PostgreSQL is right now.

ldap et al.

There is another set of password-related authentication methods in PostgreSQL:

  • ldap
  • radius
  • pam
  • bsd

As far as the client and the protocol is concerned, these are equivalent to the plain-text authentication method "password". The only difference is that the server does not compare the password to what is stored in pg_authid but the respective external service. So for example, LDAP authentication works like this:

  1. Client: “Hello, I’m Peter, I would like to connect.”
  2. Server: “What’s your password?”
  3. Client: “It’s ‘123456’.”
  4. Now the server checks with the LDAP server about password “123456”. This itself potentially involves many details. If the check is successful, the server says “OK”.

So this avoids having the password stored in cleartext in the database, but it still has all the other problems associated with this method. Using SSL for the PostgreSQL connection and configuring the LDAP server and connection securely alleviates many of these concerns, but it won’t be as bullet-proof as SCRAM. (Another piece of advice is to consider using Kerberos instead of LDAP if the goal is to have an organization-wide centralized password storage, but that’s a whole nother topic.)

Conclusion

Security and cryptography is hard. With SCRAM, PostgreSQL uses recognized public standards and is in a good spot now and can adapt in the future.

The PostgreSQL wiki contains a list of drivers and shows which ones support SCRAM: https://wiki.postgresql.org/wiki/List_of_drivers. Support appears to be pervasive now, more than two years after the initial release of PostgreSQL 10. If you are still using MD5-hashed passwords, it’s now definitely time to upgrade.

Share this

Relevant Blogs

Random Data

This post continues from my report on Random Numbers. I have begun working on a random data generator so I want to run some tests to see whether different random...
December 03, 2020

More Blogs

Full-text search since PostgreSQL 8.3

Welcome to the third – and last – part of this blog series, exploring how the PostgreSQL performance evolved over the years. The first part looked at OLTP workloads, represented...
November 05, 2020

Números aleatorios

He estado trabajando gradualmente en el desarrollo desde cero de herramientas para probar el rendimiento de los sistemas de bases de datos de código abierto. Uno de los componentes de...
November 04, 2020