Benchmarking PostgreSQL with NOPM: The Daily 500 Users

February 18, 2021

Usually, benchmarks are measured in transactions per second, but the TPC-C and TPROC-C benchmarks are measured in new orders per minute (NOPM). What is a new order? It is simply a predefined operation on the database, one that is even designed to fail 1% of the time. This is a better metric than transactions because the database can process transactions that are not part of the benchmark; autoanalyze, for example, runs in a transaction.

I do TPROC-C benchmarking, the unofficial variant of TPC-C that forgoes things like think time. HammerDB can do both types.

Benchmarking with NOPM also allows me to use monitoring tools that cause transactions but obviously don't create new orders. If we were measuring transactions, we could just send a solitary semicolon to the database in a tight loop and get a whole bunch for free. By measuring new orders, we test how fast the database can do actual work. A side effect of this is that the NOPM figure is much lower than the TPM figure. That doesn't matter; what matters is that the NOPM stays relatively constant from run to run.
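To make the idea concrete, here is a minimal sketch of the calculation (the function and the sample numbers are mine, not HammerDB's): sample the database's order count at the start and end of a run and divide the difference by the elapsed minutes. Empty or monitoring transactions never move this number.

```python
def nopm(orders_before: int, orders_after: int, minutes: float) -> float:
    """New orders per minute over a benchmark window.

    Empty transactions (a bare ';', autoanalyze, monitoring queries)
    raise the transaction count but leave the order counters alone,
    so this metric only moves when the database does real work.
    """
    return (orders_after - orders_before) / minutes

# Hypothetical two-hour run that completed 58,800,000 new orders:
print(nopm(1_000_000, 59_800_000, 120))  # 490000.0
```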

 

PostgreSQL Development with Daily Users

Every day, I do a two-hour HammerDB benchmark run with 500 users against the first PostgreSQL commit of the day (UTC). I do this to detect whether a new patch has inadvertently caused a drop in performance.

Here is the graph for January 2021:

This is the raw data:

date        NOPM     catversion  git hash
2021-01-01  490,109  202012293   4d3f03f42227bb351c2021a9ccea2fff9c023cfc
2021-01-02  516,539  202012293   ca3b37487be333a1d241dab1bbdd17a211a88f43
2021-01-03  -        202012293   -
2021-01-04  476,715  202012293   a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
2021-01-05  512,718  202012293   fe05b6b620066aec313c43b6b4d6c169d0a346f7
2021-01-06  490,625  202012293   14d49f483d4c8a5a356e25d5e5ff5726ca43abff
2021-01-07  497,247  202012293   55fe26a4b580b17d721c5accb842cc6a08295273
2021-01-08  521,308  202012293   9ffe2278372d7549547176c23564a5b3404d072e
2021-01-09  423,147  202012293   e33d004900f76c35759293fdedd4861b198fbf5b
2021-01-10  -        202012293   -
2021-01-11  472,975  202012293   13a021f3e8c99915b3cc0cb2021a948d9c71ff32
2021-01-12  423,274  202012293   d5ab79d815783fe60062cefc423b54e82fbb92ff
2021-01-13  527,537  202012293   fce7d0e6efbef304e81846c75eddf73099628d10
2021-01-14  484,433  202101131   aef8948f38d9f3aa58bf8c2d4c6f62a7a456a9d1
2021-01-15  454,665  202101131   5e5f4fcd89c082bba0239e8db1552834b4905c34
2021-01-16  450,128  202101131   c95765f47673b16ed36acbfe98e1242e3c3822a3
2021-01-17  509,422  202101171   960869da0803427d14335bba24393f414b476e2c
2021-01-18  497,267  202101171   a3dc926009be833ea505eebd77ce4b72fe708b18
2021-01-19  469,663  202101181   ed43677e20369040ca4e50c698010c39d5ac0f47
2021-01-20  505,070  202101181   21378e1fefedcaed3d855ae7aa772555295d05d6
2021-01-21  467,455  202101181   733d670073efd2c3a9df07c225006668009ab793
2021-01-22  429,751  202101181   af0e79c8f4f4c3c2306855045c0d02a6be6485f0
2021-01-23  486,797  202101181   3fc81ce459e1696f7e5e5b3b8229409413bf64b4
2021-01-24  476,970  202101181   39b66a91bdebb00af71a2c6218412ecfc89a0e13
2021-01-25  422,180  202101181   40ab64c1ec1cb9bd73695f519cf66ddbb97d8144
2021-01-26  491,500  202101181   ee895a655ce4341546facd6f23e3e8f2931b96bf
2021-01-27  488,290  202101181   4c9c359d38ff1e2de388eedd860785be6a49201c
2021-01-28  451,614  202101181   f854c69a5b36ba7aa85bee9e9590c3e517970156
2021-01-29  494,281  202101181   514b411a2b5226167add9ab139d3a96dbe98035d
2021-01-30  471,347  202101181   f77717b2985aa529a185e6988de26b885ca10ddb
2021-01-31  445,362  202101181   0c4f355c6a5fd437f71349f2f3d5d491382572b7

(There were no commits at all on the 3rd and the 10th.)

Because each data point is a single run rather than an average of five runs or so, there is some day-to-day variation, but no major impact. This is good news.

 

Making Sense of the Data

If you would like to study all of the data that I collect before, during, and after a run, I have set up a public repository. My goal for the near future is to write some scripts that extract the data and make pretty graphs, facilitating the analysis of these runs.
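As a sketch of what one such script might do (the data format, the 20% threshold, and the function name are my assumptions, not something taken from the repository), here is one way to flag a run whose NOPM drops well below the average of the runs before it:

```python
from statistics import mean

def flag_regressions(runs, threshold=0.8):
    """Return the dates whose NOPM fell below `threshold` times the
    mean of all earlier runs. `runs` is a chronologically ordered
    list of (date, nopm) pairs; days without a run are omitted."""
    flagged, earlier = [], []
    for date, nopm in runs:
        if earlier and nopm < threshold * mean(earlier):
            flagged.append(date)
        earlier.append(nopm)
    return flagged

# The first few runs from the January table above:
january = [("2021-01-01", 490_109), ("2021-01-02", 516_539),
           ("2021-01-04", 476_715), ("2021-01-05", 512_718),
           ("2021-01-09", 423_147)]
print(flag_regressions(january))  # [] -- no run fell 20% below its baseline
```

A rolling baseline like this tolerates the normal single-run noise while still catching a sustained drop after a bad patch.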

Tuning PostgreSQL is very important, but it is equally important to resist tuning for the benchmark. This is why I am not concerned with the exact number of NOPM I get, only that it remains relatively stable.

In a future blog post, I will share the scripts that I use to run these benchmarks so that you can run them yourself, so stay tuned!
 
