Usually, benchmarks are measured in transactions per second, but the TPC-C and TPROC-C benchmarks are measured in new orders per minute (NOPM). What is a new order? It’s simply a predefined operation on the database, one that even fails 1% of the time. This is a better metric than transactions because you can have transactions on the database that are not part of the benchmark. Autoanalyze uses a transaction, for example.
I do TPROC-C benchmarking, which is the unofficial version of TPC-C that forgoes things like think-time. HammerDB can do both types.
Benchmarking with NOPM also allows me to use monitoring tools that generate transactions but (obviously) don’t create new orders. If we were measuring transactions, we could just send a solitary semicolon to the database in a tight loop and get a whole bunch for free. By measuring new orders, we are testing how fast the database can do actual work. A side effect of this is that the NOPM figure is much lower than the transactions per minute (TPM). That doesn’t matter. What matters is that the NOPM stays relatively constant from one run to the next.
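To make the difference between the two rates concrete, here is a minimal sketch of the arithmetic. It assumes two monotonically increasing counters sampled at the start and end of a run, one counting every transaction and one counting only new orders. The function name `per_minute` and all of the counter values are hypothetical and are not taken from a real run.

```python
def per_minute(count_start: int, count_end: int, elapsed_seconds: float) -> float:
    """Per-minute rate of a counter that was sampled twice."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) * 60.0 / elapsed_seconds


# Hypothetical snapshots for a two-hour run (made-up numbers).
elapsed = 2 * 60 * 60

# Every transaction counts here, including ones from monitoring or autoanalyze.
tpm = per_minute(1_000_000, 145_000_000, elapsed)

# Only completed new orders count here.
nopm = per_minute(0, 60_000_000, elapsed)

print(f"TPM:  {tpm:,.0f}")
print(f"NOPM: {nopm:,.0f}")
```

Extra transactions from a monitoring tool would raise the first number but leave the second untouched, which is the property the NOPM metric relies on.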
PostgreSQL Development with Daily Users
Every day, I do a HammerDB benchmark run for two hours using 500 users on the first commit of the day UTC. I do this to detect if a new patch has inadvertently caused a drop in performance.
Here is the graph for January 2021:
This is the raw data:
date | nopm | catversion | git hash |
---|---|---|---|
2021-01-01 | 490,109 | 202012293 | 4d3f03f42227bb351c2021a9ccea2fff9c023cfc |
2021-01-02 | 516,539 | 202012293 | ca3b37487be333a1d241dab1bbdd17a211a88f43 |
2021-01-03 | | 202012293 | |
2021-01-04 | 476,715 | 202012293 | a271a1b50e9bec07e2ef3a05e38e7285113e4ce6 |
2021-01-05 | 512,718 | 202012293 | fe05b6b620066aec313c43b6b4d6c169d0a346f7 |
2021-01-06 | 490,625 | 202012293 | 14d49f483d4c8a5a356e25d5e5ff5726ca43abff |
2021-01-07 | 497,247 | 202012293 | 55fe26a4b580b17d721c5accb842cc6a08295273 |
2021-01-08 | 521,308 | 202012293 | 9ffe2278372d7549547176c23564a5b3404d072e |
2021-01-09 | 423,147 | 202012293 | e33d004900f76c35759293fdedd4861b198fbf5b |
2021-01-10 | | 202012293 | |
2021-01-11 | 472,975 | 202012293 | 13a021f3e8c99915b3cc0cb2021a948d9c71ff32 |
2021-01-12 | 423,274 | 202012293 | d5ab79d815783fe60062cefc423b54e82fbb92ff |
2021-01-13 | 527,537 | 202012293 | fce7d0e6efbef304e81846c75eddf73099628d10 |
2021-01-14 | 484,433 | 202101131 | aef8948f38d9f3aa58bf8c2d4c6f62a7a456a9d1 |
2021-01-15 | 454,665 | 202101131 | 5e5f4fcd89c082bba0239e8db1552834b4905c34 |
2021-01-16 | 450,128 | 202101131 | c95765f47673b16ed36acbfe98e1242e3c3822a3 |
2021-01-17 | 509,422 | 202101171 | 960869da0803427d14335bba24393f414b476e2c |
2021-01-18 | 497,267 | 202101171 | a3dc926009be833ea505eebd77ce4b72fe708b18 |
2021-01-19 | 469,663 | 202101181 | ed43677e20369040ca4e50c698010c39d5ac0f47 |
2021-01-20 | 505,070 | 202101181 | 21378e1fefedcaed3d855ae7aa772555295d05d6 |
2021-01-21 | 467,455 | 202101181 | 733d670073efd2c3a9df07c225006668009ab793 |
2021-01-22 | 429,751 | 202101181 | af0e79c8f4f4c3c2306855045c0d02a6be6485f0 |
2021-01-23 | 486,797 | 202101181 | 3fc81ce459e1696f7e5e5b3b8229409413bf64b4 |
2021-01-24 | 476,970 | 202101181 | 39b66a91bdebb00af71a2c6218412ecfc89a0e13 |
2021-01-25 | 422,180 | 202101181 | 40ab64c1ec1cb9bd73695f519cf66ddbb97d8144 |
2021-01-26 | 491,500 | 202101181 | ee895a655ce4341546facd6f23e3e8f2931b96bf |
2021-01-27 | 488,290 | 202101181 | 4c9c359d38ff1e2de388eedd860785be6a49201c |
2021-01-28 | 451,614 | 202101181 | f854c69a5b36ba7aa85bee9e9590c3e517970156 |
2021-01-29 | 494,281 | 202101181 | 514b411a2b5226167add9ab139d3a96dbe98035d |
2021-01-30 | 471,347 | 202101181 | f77717b2985aa529a185e6988de26b885ca10ddb |
2021-01-31 | 445,362 | 202101181 | 0c4f355c6a5fd437f71349f2f3d5d491382572b7 |
(There were no commits at all on the 3rd and the 10th.)
Because it’s only a single run as opposed to an average of five runs or so, there is some variation every day, but no major impact. This is good news.
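One way to put a number on that day-to-day variation is sketched below. It uses the NOPM values from the table above and computes the mean and the sample standard deviation. It then lists any day that falls more than `k` standard deviations below the mean. The helper names and the choice of `k` are assumptions for illustration. This is not part of my benchmarking setup.

```python
from statistics import mean, stdev

# NOPM per day, copied from the table above (days without commits omitted).
NOPM = {
    "2021-01-01": 490109, "2021-01-02": 516539, "2021-01-04": 476715,
    "2021-01-05": 512718, "2021-01-06": 490625, "2021-01-07": 497247,
    "2021-01-08": 521308, "2021-01-09": 423147, "2021-01-11": 472975,
    "2021-01-12": 423274, "2021-01-13": 527537, "2021-01-14": 484433,
    "2021-01-15": 454665, "2021-01-16": 450128, "2021-01-17": 509422,
    "2021-01-18": 497267, "2021-01-19": 469663, "2021-01-20": 505070,
    "2021-01-21": 467455, "2021-01-22": 429751, "2021-01-23": 486797,
    "2021-01-24": 476970, "2021-01-25": 422180, "2021-01-26": 491500,
    "2021-01-27": 488290, "2021-01-28": 451614, "2021-01-29": 494281,
    "2021-01-30": 471347, "2021-01-31": 445362,
}


def summarize(runs):
    """Basic descriptive statistics for a {date: nopm} mapping."""
    values = list(runs.values())
    return {
        "runs": len(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def flag_low_days(runs, k=2.0):
    """Dates whose NOPM is more than k standard deviations below the mean."""
    values = list(runs.values())
    threshold = mean(values) - k * stdev(values)
    return sorted(day for day, nopm in runs.items() if nopm < threshold)


summary = summarize(NOPM)
print(f"{summary['runs']} runs, mean {summary['mean']:,.0f}, "
      f"stdev {summary['stdev']:,.0f}, "
      f"range {summary['min']:,} to {summary['max']:,}")
print("Days below threshold:", flag_low_days(NOPM, k=2.0))
```

With `k=2.0`, no day in January is flagged. The threshold itself is arbitrary, and with only one run per day a single low value is weak evidence of a regression.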
Making Sense of the Data
If you would like to study all of the data that I collect before, during, and after a run, I have set up a public repository. My goal for the near future is to write some scripts to extract data and make pretty graphs, facilitating the analysis of these runs.
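As a small taste of what such an extraction script might look like, here is a sketch that parses rows shaped like the table in this post into typed records. It assumes the four pipe-separated columns shown above (date, nopm, catversion, git hash). It does not assume anything about the files in the repository, whose layout is not described here. The `Run` type and `parse_row` function are hypothetical names.

```python
from datetime import date
from typing import NamedTuple, Optional


class Run(NamedTuple):
    day: date
    nopm: Optional[int]      # None when there was no run that day
    catversion: str
    git_hash: Optional[str]  # None when there was no commit that day


def parse_row(line: str) -> Run:
    """Parse one 'date | nopm | catversion | git hash |' row."""
    line = line.strip()
    if line.endswith("|"):
        line = line[:-1]     # drop the single trailing column separator
    cells = [cell.strip() for cell in line.split("|")]
    if len(cells) != 4:
        raise ValueError(f"expected 4 columns, got {len(cells)}: {line!r}")
    day, nopm, catversion, git_hash = cells
    return Run(
        day=date.fromisoformat(day),
        nopm=int(nopm.replace(",", "")) if nopm else None,
        catversion=catversion,
        git_hash=git_hash or None,
    )


rows = [
    "2021-01-02 | 516,539 | 202012293 | ca3b37487be333a1d241dab1bbdd17a211a88f43 |",
    "2021-01-03 | | 202012293 | |",
]
runs = [parse_row(row) for row in rows]
for run in runs:
    print(run)
```

Days without a commit come back with `nopm` and `git_hash` set to `None`, so a plotting script can skip them or show them as gaps.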
Tuning PostgreSQL is very important, but it is equally important to resist tuning for the benchmark. This is why I am not concerned with the exact number of NOPM I get, only that it remains relatively stable.
In a future blog post, I will share the scripts that I use to run these benchmarks so that you can run them yourselves, so stay tuned!