Benchmarking Prometheus like a pro with k6

In our previous posts about choosing a Prometheus remote storage we have seen how to set up a benchmarking infrastructure and how to benchmark promql performance. We have been able to obtain results but the whole benchmark is flawned in many ways:

  • it’s expensive as you need to spawn more than necessary to assess a particular point of your remote storage.
  • it’s hard to reproduce 100% the same setup, even with the same configuration and software version you will have a similar result but not exactly the same.
  • you’re not always benchmarking what you think you are. We have spent couple of time troubleshoot performance issue which where in prometheus or haproxy configuration.
  • it focus mainly on the write path without stress from the read path which is not realistic.
Benchmarking Prometheus like a pro with k6

This blog post discuss how we should have benchmark our remote storage.

How to do a good benchmark? K6 to the rescue

A good benchmark need to be accurate and reproducible. More over for our usecase we want to have a tool who takes into account both Prometheus’s read and write path. Finally, we need to be able to remove all unnecessary pieces. This way we are able to focus on the remote storage only.

Such software could be a project on its own but fortunately for us there is one opensource solution for that: K6

K6 is a general purpose  modern load testing which can be extended with module to support Prometheus remote storage. Sounds interesting don’t you think?

In our previous blog post we have explained how we have built our benchmarking infrastructure which was rather complex  to be accurate.

generic-infrastructure.png

With k6 as benchmarking tool the infrastructure can be greatly simplified:

With k6 as benchmarking tool the infrastructure can be greatly simplified

K6 is quite flexible and configurable. Its input is a load testing script, you can either write your own script or reuse an opensourced one. As the whole logic is in the load testing script it become easily reproducible which is exactly what we need.

To launch a benchmark you need two piece of infrastructure:

For our use case we have chosen to reuse the load testing from Grafana which does exactly what we are looking for. All useful information to tune and assess your remote storage are outputed by k6.

     ✓ write worked

     █ instant query high cardinality

       ✓ expected request status to equal 200
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     █ range query

       ✓ expected request status to equal 200
       ✓ has valid json body
       ✓ expected status field is 'success' to equal 'success'
       ✓ expected resultType is 'matrix' to equal 'matrix'

     █ instant query low cardinality

       ✓ expected request status to equal 200
       ✓ has valid json body
       ✓ expected status field to equal 'success'
       ✓ expected data.resultType field to equal 'vector'

     checks............................................................................: 100.00% ✓ 1454     ✗ 0
     ✓ { type:read }...................................................................: 0.00%   ✓ 0        ✗ 0
     ✓ { type:write }..................................................................: 100.00% ✓ 6        ✗ 0
     data_received.....................................................................: 1.0 MB  8.4 kB/s
     data_sent.........................................................................: 277 kB  2.3 kB/s
     group_duration....................................................................: avg=64.61ms min=39.94ms med=60.43ms max=230.05ms p(90)=80.39ms p(95)=107.93ms
     http_req_blocked..................................................................: avg=4.65ms  min=2µs     med=6µs     max=96.84ms  p(90)=11µs    p(95)=58.42ms
     http_req_connecting...............................................................: avg=1.31ms  min=0s      med=0s      max=21.87ms  p(90)=0s      p(95)=16.99ms
     http_req_duration.................................................................: avg=53.7ms  min=34.23ms med=52.71ms max=164.1ms  p(90)=67.02ms p(95)=71.82ms
       { expected_response:true }......................................................: avg=53.7ms  min=34.23ms med=52.71ms max=164.1ms  p(90)=67.02ms p(95)=71.82ms
     ✓ { type:read }...................................................................: avg=53.8ms  min=34.23ms med=52.76ms max=164.1ms  p(90)=66.85ms p(95)=71.62ms
     ✓ { url:https://admin:security-matters@remote-storage.poc.ovh.net/api/v1/push }...: avg=0s      min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s
     http_req_failed...................................................................: 0.00%   ✓ 0        ✗ 368
     http_req_receiving................................................................: avg=92.34µs min=32µs    med=89µs    max=301µs    p(90)=125.3µs p(95)=150µs
     http_req_sending..................................................................: avg=49.05µs min=12µs    med=40µs    max=566µs    p(90)=68µs    p(95)=94.59µs
     http_req_tls_handshaking..........................................................: avg=3.11ms  min=0s      med=0s      max=54.28ms  p(90)=0s      p(95)=39.39ms
     http_req_waiting..................................................................: avg=53.56ms min=33.94ms med=52.56ms max=163.93ms p(90)=66.88ms p(95)=71.66ms
     http_reqs.........................................................................: 368     3.064697/s
     iteration_duration................................................................: avg=64.88ms min=40.34ms med=60.78ms max=230.27ms p(90)=80.87ms p(95)=108.47ms
     iterations........................................................................: 368     3.064697/s
     vus...............................................................................: 26      min=26     max=26
     vus_max...........................................................................: 26      min=26     max=26

What a time saver? With k6 we have been able to efficiently assess all remote storage solutions. This is a significative improvement if we compare it to our previous benchmarking plan.

The next and final post will be about which remote storage we have chosen to be our internal solution.

Stay tuned.

+ posts

After 10 years as a Sysadmin in High Performance Computing, Wilfried Roset is now part of OVHcloud as Engineering Manager for their Databases product Unit. He focuses on industrialization, reliability and performances for both internal and public clusters offers.