Erlenmeyer and PromQL compatibility

The monitoring world is seeing the rise of Prometheus. It’s a great tool to deploy in your infrastructure, as it lets you scrape all of your servers and applications to retrieve, store and analyze their metrics. All you have to do is extract and run it; it does the rest by itself. Of course, Prometheus comes with some trade-offs (a pull model, no easy way to handle late ingestion) and some limits, as local storage keeps your data only for a limited time (15 days by default).

Context

How can we handle long-term storage for Prometheus? A vast number of time series databases are now compatible with Prometheus. It’s easy to check that Prometheus ingestion is working well, but how can we validate the PromQL (Prometheus query) part? A few months ago, PromLabs released a new tool called the “PromQL compliance tester”. They recently created this page, where they reference the results of the PromQL compliance tests for several products. In this blog post, we will see how this tool helped us improve our PromQL implementation.

Compliance tester

The PromQL compliance tester is open source and contains a full set of tests. It generates around 500 PromQL queries covering the vast majority of the language, including tests on simple scalars, selectors, time range functions, operators, and so on. The tool executes each query on both a Prometheus instance and the tested backend, then expects the backend to return the same result as Prometheus. It expects an exact match for all the metadata of a series (tags and names). It’s more flexible for the ticks, as you can set a parameter to round the compared timestamps to the millisecond. Finally, the compliance tool checks the equality of the values returned by both queries; as many things can affect floating-point results, it computes an approximate equality.
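The three checks described above can be sketched in a few lines of Python. This is only an illustration and not the compliance tester's actual code: the `series_match` function, the dictionary layout, and the use of a relative tolerance are all assumptions made for the example.

```python
import math

def series_match(expected, actual, rel_tol=1e-5):
    """Return True when two series agree under the three checks (hypothetical helper)."""
    # 1. Metadata (name and tags) must match exactly.
    if expected["labels"] != actual["labels"]:
        return False
    if len(expected["points"]) != len(actual["points"]):
        return False
    for (ts_e, val_e), (ts_a, val_a) in zip(expected["points"], actual["points"]):
        # 2. Timestamps (in seconds) are compared after rounding to the millisecond.
        if round(ts_e * 1000) != round(ts_a * 1000):
            return False
        # 3. Values only need to be approximately equal (assumed relative tolerance).
        if not math.isclose(val_e, val_a, rel_tol=rel_tol):
            return False
    return True

prometheus = {"labels": {"__name__": "up", "job": "demo"},
              "points": [(1606323726.058, 1.0), (1606323736.058, 0.3)]}
backend = {"labels": {"__name__": "up", "job": "demo"},
           "points": [(1606323726.0581, 1.0), (1606323736.058, 0.1 + 0.2)]}

print(series_match(prometheus, backend))
```

Here the backend result differs by a fraction of a millisecond on one timestamp and by a floating-point rounding error on one value, so the two series are still considered equal.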

Erlenmeyer

At Metrics, we use a Warp10 TSDB with its own analytical query language, WarpScript. We decided to build an open source tool called Erlenmeyer to transpile PromQL queries into WarpScript. The compliance tester was a great help to validate parts of our implementation and to detect which queries did not yet behave identically to native PromQL.

Set up

To start testing our PromQL experience, we set up a local Prometheus with a default configuration. This configuration makes Prometheus run and collect some “Demo” metrics, which we then forwarded to one of our Metrics regions using Prometheus remote write. We added a local instance of Erlenmeyer to query the data stored in a distributed Warp10 backend. Then, we iterated over each set of tests of the PromLabs compliance tool to identify all issues and improve our existing PromQL implementation.

To be compliant, we had to reduce the precision that the compliance tool requires for values: we set it to 0.001 instead of 0.00001. We also had to remove the Warp10 .app label from the results, as on our Warp10 instance we identify users based on this .app label.
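As a rough illustration of the label clean-up, the snippet below strips a backend-specific label from each series before the results are compared. The `drop_labels` helper and the sample data are hypothetical; only the `.app` label name comes from our setup.

```python
def drop_labels(result, to_drop=(".app",)):
    """Return a copy of the result without the given labels (hypothetical helper)."""
    return [
        {
            "labels": {k: v for k, v in series["labels"].items() if k not in to_drop},
            "points": series["points"],
        }
        for series in result
    ]

# Made-up backend response carrying the extra Warp10 label.
backend_result = [
    {"labels": {"__name__": "up", "job": "demo", ".app": "some-user"},
     "points": [(1606323726.058, 1.0)]},
]

cleaned = drop_labels(backend_result)
print(cleaned[0]["labels"])
```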

A test query

When running the test, you will get a full report of your failing queries. Let’s take an example:

RESULT: FAILED: Query returned different results:
  model.Matrix{
  	&{
  		Metric: Inverse(DropResultLabels, s`{instance="demo.promlabs.com:10002", job="demo"}`),
  		Values: []model.SamplePair{
  			... // 52 identical elements
  			{Timestamp: s"1606323726.058", Value: Inverse(TranslateFloat64, float64(2.6928936527e+10))},
  			{Timestamp: s"1606323736.058", Value: Inverse(TranslateFloat64, float64(2.691644054725e+10))},
  			{
  				Timestamp: s"1606323746.058",
- 				Value:     Inverse(TranslateFloat64, float64(2.6922272529119648e+10)),
+ 				Value:     Inverse(TranslateFloat64, float64(2.689432207325e+10)),
  			},
  			{Timestamp: s"1606323756.058", Value: Inverse(TranslateFloat64, float64(2.6915188293125e+10))},
  			{Timestamp: s"1606323766.058", Value: Inverse(TranslateFloat64, float64(2.69215848005e+10))},
  			... // 4 identical elements
  		},
  	},
  }

The test report includes all the errors that occurred during the test. In this example, we can see that a single series has 60 matching values (56 of them elided from the output) and one invalid value, which appears on two lines. The line starting with “-” is the expected value, and the line starting with “+” is the value returned by the tested instance. In this case, the value isn’t precise enough (about 2.689e+10 instead of 2.692e+10).
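To get a feel for the size of this mismatch, we can compute the relative difference between the two values from the report. The snippet assumes the tolerance is applied as a relative difference, which may not be exactly how the compliance tester computes it.

```python
import math

expected = 2.6922272529119648e+10  # "-" line: value returned by Prometheus
actual = 2.689432207325e+10        # "+" line: value returned by the tested backend

# Relative difference with respect to the expected value.
relative_diff = abs(expected - actual) / abs(expected)
print(f"relative difference: {relative_diff:.6f}")

# Under the assumed relative comparison, the values do not match at 0.001.
print(math.isclose(expected, actual, rel_tol=0.001))
```

The difference is roughly 0.1%, far above a 0.00001 tolerance, which is why the query is reported as failed.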

Results

Now that we have a full test setup running, we can see what we improved thanks to its results. If you want to see the fixes in detail, you can check the code update made here. This tool helped us fix some implementation errors, sanitize known issues, find out which PromQL features we were missing, and detect a few new bugs! Let’s review the changes.

Quick implementation fixes

Running those tests was a great help in understanding some of the implementation errors we had made when trying to match PromQL behavior. For example, our time range functions were sampling the data before computing the operation. Reversing those steps gave us a direct match with a native query. It also helped us fix some minor bugs in how we handle the comparison operators and several functions such as label_replace, holt_winters, predict_linear or the full set of time functions (hour, minute, month…).
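The effect of the ordering can be shown with a toy example. The sketch below is not Erlenmeyer code: the helper names are made up, and the window is taken as left-open for simplicity. It applies a sum over a 60-second window either on the raw points or on points that were already reduced to one per evaluation step.

```python
def sum_over_time(points, t, window):
    """Sum the values of the points whose timestamp falls in (t - window, t]."""
    return sum(v for ts, v in points if t - window < ts <= t)

def sample(points, step_times):
    """Keep, for each evaluation time, only the latest point at or before it."""
    sampled = []
    for t in step_times:
        candidates = [(ts, v) for ts, v in points if ts <= t]
        if candidates:
            sampled.append((t, candidates[-1][1]))
    return sampled

raw = [(ts, 1.0) for ts in range(0, 121, 10)]  # one point every 10 s, value 1
steps = [60, 120]                              # evaluation timestamps
window = 60

# Compute on the raw points at each evaluation time (the order that matched Prometheus).
compute_then_sample = [(t, sum_over_time(raw, t, window)) for t in steps]

# Sample first, then compute: most of the raw points are already gone.
sampled = sample(raw, steps)
sample_then_compute = [(t, sum_over_time(sampled, t, window)) for t in steps]

print(compute_then_sample)   # [(60, 6.0), (120, 6.0)]
print(sample_then_compute)   # [(60, 1.0), (120, 1.0)]
```

Sampling first leaves a single point per window, so any range function ends up working on far less data than the native query does.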

We also improved our handling of the PromQL aggregation modifiers: by and without.
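As a reminder of what these two modifiers do, here is a minimal sketch of a sum aggregation with either by (keep only the listed labels) or without (drop the listed labels, plus the metric name). The `aggregate_sum` function and the sample series are illustrative and are not taken from Erlenmeyer.

```python
from collections import defaultdict

def aggregate_sum(series, by=None, without=None):
    """Group series on the labels selected by `by` or `without`, then sum each group."""
    groups = defaultdict(float)
    for labels, value in series:
        if by is not None:
            # by: keep only the listed labels.
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            # without: drop the listed labels and the metric name.
            dropped = set(without or ()) | {"__name__"}
            kept = {k: v for k, v in labels.items() if k not in dropped}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)

series = [
    ({"__name__": "http_requests_total", "job": "api", "instance": "a"}, 3.0),
    ({"__name__": "http_requests_total", "job": "api", "instance": "b"}, 4.0),
    ({"__name__": "http_requests_total", "job": "db", "instance": "c"}, 5.0),
]

by_job = aggregate_sum(series, by=["job"])                      # like: sum by (job) (...)
without_instance = aggregate_sum(series, without=["instance"])  # like: sum without (instance) (...)
print(by_job)
print(without_instance)
```

With this data set both forms produce the same two groups, one per job.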

Sanitize known issues

We recently discovered that we were not matching PromQL behavior on the series name: we were keeping the name for all compute operations. Prometheus, however, has a different approach, as the name is only kept when it’s relevant. The compliance tester helped us validate this specific update for all queries.
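A simplified sketch of that behavior: a plain selection keeps the series name, while an arithmetic operation produces a new quantity and therefore drops it. The functions below are hypothetical and cover only these two cases, not the full set of PromQL rules.

```python
def drop_name(labels):
    """Return the labels without the metric name (stored under __name__)."""
    return {k: v for k, v in labels.items() if k != "__name__"}

def select(series):
    """A plain selector returns the series untouched, name included."""
    return list(series)

def multiply_by_scalar(series, factor):
    """Arithmetic changes the meaning of the values, so the name is dropped."""
    return [(drop_name(labels), value * factor) for labels, value in series]

series = [({"__name__": "node_memory_bytes", "instance": "a"}, 1024.0)]

print(select(series))                 # like: node_memory_bytes
print(multiply_by_scalar(series, 8))  # like: node_memory_bytes * 8
```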

With this tool, we test the validity of a query by comparing it to a native PromQL query, which helps us sanitize our query output. We knew that, in the case of missing values or empty series, our output did not match Prometheus. We have corrected the part of Erlenmeyer that handles the output to match all the PromQL cases included in the tests.

Unimplemented features

Running the tests led us to discover that we were missing some native PromQL features. As a result, Erlenmeyer now supports the PromQL unary operators and the “bool” keyword. Unary support allows the use of “-my_series”, for example. In PromQL, the bool keyword converts the result of a comparison to booleans: it returns 1 or 0 as series values depending on the condition, where 1 stands for true and 0 for false.
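The difference between a plain comparison and one using bool, as well as the unary minus, can be sketched as follows. This is an illustration with made-up helper names and data; label handling is deliberately simplified.

```python
def negate(series):
    """Unary minus, like: -my_series"""
    return [(labels, -value) for labels, value in series]

def greater_than(series, threshold, bool_modifier=False):
    """Comparison against a scalar, with or without the bool keyword."""
    if bool_modifier:
        # like: my_series > bool 10 (every series is kept, with a value of 1 or 0)
        return [(labels, 1.0 if value > threshold else 0.0) for labels, value in series]
    # like: my_series > 10 (acts as a filter, values are unchanged)
    return [(labels, value) for labels, value in series if value > threshold]

my_series = [({"instance": "a"}, 5.0), ({"instance": "b"}, 15.0)]

print(negate(my_series))
print(greater_than(my_series, 10))
print(greater_than(my_series, 10, bool_modifier=True))
```

Without bool, the comparison filters out the series that do not satisfy the condition; with bool, all series are returned and only their values change.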

Open issues

Running all the compliance tests and improving our code base led us to a success rate of around 91%. For the rest, we opened new issues on Erlenmeyer. We detected that:

  • the handling of the over_time functions is not correct when the range is below the data point frequency,
  • for rate, delta, increase and predict_linear, our result isn’t precise enough to match PromQL output when the range is below 5 minutes,
  • there are some minor bugs on the series selector (!=) and on label_replace (some checks are missing in the parameter validators),
  • PromQL subqueries, as well as some operators and functions, are not implemented: ^ and % between two sets of series, and the deriv function.

Those are the 4 missing points to cover the full PromQL feature set with Erlenmeyer. Our documentation already listed all of these missing implementations.

Actions

This tool was a great help in improving our PromQL compliance, and we are happy with our compliance result. Indeed, we reach 91% according to the test report:

General query tweaks:
*  Metrics test
================================================================================
Total: 496 / 541 (91.68%) passed

Our next action is to release those fixes and improvements on all our Metrics regions. We look forward to hearing what you think about our PromQL implementation!

We now see a lot of projects implementing Prometheus writes and reads. These projects bring Prometheus a lot of missing features like long-term storage, deletes, late ingestion, historical data analysis, HA… Validating a PromQL implementation is a big challenge, and being able to do so is a great help in choosing the right backend for your needs.

Software Developer on the OVHcloud Metrics Data Platform and data lover!