Interpretability Engine: An open-source tool to interpret your models in ML Serving


Artificial Intelligence is now part of our daily lives, and the services it provides continue to flourish in multiple industries. At OVHCloud, we have recently launched ML Serving, a service to deploy Machine Learning models, ready to use in production.

It's likely you've read some articles about Machine Learning (ML) techniques or even used ML models, and this may have made you doubt their use in production systems. Indeed, even if a model provides interesting results, those results can seem obscure to practitioners. It's not always obvious why the model made a decision, especially when there are plenty of features with various values and labels. To inexperienced users, a prediction can seem like magic. In response to this, a field of research known as interpretability aims to demystify black-box models. It produces explanations of predictions, which in turn give users confidence in the results. By the same token, it supports a model's use in production.

To overcome this challenge, we initiated a project with the University of Lille and Centrale Lille that consisted of reviewing the state of the art in model-agnostic interpretability methods.

A classification of the methods is depicted in the figure above. Firstly, we separate the local methods from the global methods. Local methods explain a single prediction made by the model, while global methods explore the behaviour of all the features over a set of samples. Secondly, we distinguish the way the methods represent the results (e.g., a ranking of feature importance, a graph of the impact of feature values on the prediction, a tree of rules, and so on).

In Interpretability Engine, we started by focusing on a global method called PDP (Partial Dependence Plot), which is computationally efficient and easy to understand.
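To make the idea behind PDP concrete, here is a minimal sketch in plain Python. It is not the code used by Interpretability Engine: the function partial_dependence, the toy_model stand-in, and the sample values are all assumptions made for illustration. For each value on a grid, the chosen feature is forced to that value in every sample, and the model's predictions are averaged.

```python
from statistics import mean


def partial_dependence(predict, samples, feature, grid):
    """Return the average prediction for each grid value of one feature.

    predict: callable taking one sample (list of floats) and returning a float.
    samples: list of samples, each a list of floats.
    feature: index of the feature under study.
    grid:    values at which the feature is evaluated.
    """
    curve = []
    for value in grid:
        # Copy every sample, overwriting only the feature under study.
        modified = [row[:feature] + [value] + row[feature + 1:] for row in samples]
        # Averaging over all samples marginalises out the other features.
        curve.append(mean(predict(row) for row in modified))
    return curve


# Hypothetical stand-in for a deployed model: the score grows with feature 0.
def toy_model(row):
    return 0.5 * row[0] + 0.1 * row[1]


samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
grid = [1.0, 2.0, 3.0]
pdp = partial_dependence(toy_model, samples, feature=0, grid=grid)
print(pdp)
```

With a real deployment, predict would call the model's prediction endpoint instead of a local function, and a classifier would yield one such curve per label.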

Interpretability Engine

Using Interpretability Engine is pretty simple. To install the tool, run pip install interpretability-engine, and then use it through your CLI.

At this step, we assume that you already have a deployed model, along with its token, its deployment URL, and some samples. You can then try to explain your model with the following command:

interpretability-engine --token XXX --deployment-url https://localhost:8080 --samples-path my_dataset.csv --features 0 1 2 --method 'pdp'

Remark: 0 1 2 are the indexes of the features

You can also retrieve samples stored in the object storage through Swift; see interpretability-engine --help.
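If you want to script several runs, the command line can be assembled programmatically. The sketch below is only an illustration: build_command is a hypothetical helper, and it uses only the flags shown in the command above (--token, --deployment-url, --samples-path, --features, --method) with the same placeholder values.

```python
import shlex


def build_command(token, deployment_url, samples_path, features, method="pdp"):
    """Assemble the argument list for the interpretability-engine CLI."""
    return [
        "interpretability-engine",
        "--token", token,
        "--deployment-url", deployment_url,
        "--samples-path", samples_path,
        # Feature indexes are passed as separate arguments after --features.
        "--features", *[str(index) for index in features],
        "--method", method,
    ]


cmd = build_command("XXX", "https://localhost:8080", "my_dataset.csv", [0, 1, 2])
# Print a shell-ready string; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the arguments in a list rather than a single string avoids quoting problems when a value contains spaces.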

Example on the iris dataset

Now let's try a concrete example on the iris dataset. We trained a model with scikit-learn, exported it to the ONNX format, and deployed it on ML Serving (see the documentation).

Then, we used the whole iris dataset as the sample set, which allowed us to retrieve as much information as possible about the model.

The samples look like this (csv format):



  • Each line is an instance and each column is a feature.
  • The first column corresponds to the sepal.length, the second one to sepal.width, the third to petal.length and the last one to petal.width.
  • The order of the features should respect the order of the features in the exported format, see limitations detailed below.
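Before sending samples to the tool, it can help to check that the file matches the layout described above. The following sketch is an assumption-laden illustration, not part of Interpretability Engine: check_samples is a hypothetical helper, and it assumes a file without a header row, where every line holds exactly four numeric values.

```python
import csv

EXPECTED_COLUMNS = 4  # sepal.length, sepal.width, petal.length, petal.width


def check_samples(lines, expected_columns=EXPECTED_COLUMNS):
    """Parse CSV lines into rows of floats, raising ValueError on a bad row."""
    rows = []
    for line_number, fields in enumerate(csv.reader(lines), start=1):
        if len(fields) != expected_columns:
            raise ValueError(
                f"line {line_number}: expected {expected_columns} columns, "
                f"got {len(fields)}"
            )
        # float() raises ValueError itself if a field is not numeric.
        rows.append([float(field) for field in fields])
    return rows


# Two illustrative lines in the column order described above.
example_lines = ["5.1,3.5,1.4,0.2", "4.9,3.0,1.4,0.2"]
rows = check_samples(example_lines)
print(rows)
```

To check a real file, pass an open file object instead of the list, for example check_samples(open("iris.csv", newline="")).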

Then we run the following command:

interpretability-engine --token xxx --deployment-url --samples-path iris.csv --features 0 1 2 3 --feature-names "sepal.length" "sepal.width" "petal.length" "petal.width" --label-names "setosa" "versicolor" "virginica"

And we obtain the following figure:

The PDP axis corresponds to the model's average prediction as the value of a feature varies. In this example, it is the predicted probability of each label. For example, the higher the sepal length, the higher the probability that the species is ‘virginica’.

Note: If you want to save your result, use the option --output-file your-result-filename.pdf

Limitations and future work

Interpretability Engine is a first step towards explaining a model, but the current implementation has limitations.

Among them is the small number of methods. For now, only one global method is available (PDP). We plan to add other methods that are more computationally efficient, such as ALE. We also plan to add local methods, which aim to explain a single prediction. The most efficient one in terms of computation is LIME. Other methods, such as Anchors or SHAP, sound promising but remain expensive for now.

Another limitation relates to the feature names, which depend on how you exported your model. A model trained with scikit-learn and exported to the ONNX format will only have one input, with a shape corresponding to the number of features. Hence, features do not have names and are identified by their indexes. To overcome this issue, we allow the user to map feature indexes to names through the option --feature-names. In this way, when the explanation is exported, it's easier to read. A similar issue exists with the labels; in this case you can use the option --label-names.
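The index-to-name mapping can be pictured with a short sketch. This is only an illustration of the idea, not the tool's internal code: name_features is a hypothetical helper, and the fallback name feature_<index> is an assumption for indexes that have no name.

```python
def name_features(indexes, feature_names=None):
    """Map each feature index to a readable name.

    Names are matched to indexes by position, in the order given.
    Indexes without a name fall back to a generic "feature_<index>" label.
    """
    names = list(feature_names or [])
    mapping = {}
    for position, index in enumerate(indexes):
        if position < len(names):
            mapping[index] = names[position]
        else:
            mapping[index] = f"feature_{index}"
    return mapping


mapping = name_features(
    [0, 1, 2, 3],
    ["sepal.length", "sepal.width", "petal.length", "petal.width"],
)
print(mapping)
```

Because the exported model only knows positions, the names must be listed in the same order as the columns of the samples file.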

We did not try the PDP method on more complex data (pictures, videos, songs). We only considered tabular data, and we need a mechanism to handle other cases.

For now, the exported results are readable when there are only a few features, but a question remains: how should hundreds of features be represented with PDP methods?

Open Source

Open source is part of the culture at OVHCloud, so we made the tool available on GitHub. If you have ideas for improvements, please do not hesitate to let us know through a PR.

