<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Machine learning Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/machine-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/machine-learning/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Wed, 11 Feb 2026 13:03:41 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Machine learning Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/machine-learning/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Safety first: Detect harmful texts using an AI safeguard agent</title>
		<link>https://blog.ovhcloud.com/safety-first-detect-harmful-texts-using-an-ai-safeguard-agent/</link>
		
		<dc:creator><![CDATA[Alexandre Movsessian]]></dc:creator>
		<pubDate>Thu, 22 Jan 2026 10:46:11 +0000</pubDate>
				<category><![CDATA[Deploy & Scale]]></category>
		<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30185</guid>

					<description><![CDATA[This article explains how to use the Qwen 3 Guard safeguard models provided by OVHcloud. Using this guide, you can analyse and moderate texts for LLM applications, chat platforms, customer support systems, or any other text-based services requiring safe and compliant interactions. Our focus will be on written content, such as conversations or plain text. [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="981" height="463" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image.png" alt="" class="wp-image-30187" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image.png 981w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-300x142.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-768x362.png 768w" sizes="(max-width: 981px) 100vw, 981px" /></figure>



<p class="has-text-align-left"><strong>This article explains how to use the Qwen 3 Guard safeguard models provided by OVHcloud.</strong></p>



<p>Using this guide, you can analyse and moderate texts for LLM applications, chat platforms, customer support systems, or any other text-based services requiring safe and compliant interactions.</p>



<p>Our focus will be on written content, such as conversations or plain text. Although image moderators exist, they won’t be covered here.</p>



<h2 class="wp-block-heading"><strong>Introduction</strong></h2>



<p>As <strong>Large Language Models</strong> (LLMs) continue to grow, access to information has become more seamless, but this ease of access makes it easier to generate, and be exposed to, harmful or toxic content.</p>



<p>LLMs can be prompted with malicious queries (e.g., “How do I make a bomb?”) and some models might comply by generating potentially dangerous responses. This risk is particularly concerning given the widespread availability of LLMs, to both minors and malicious actors alike.</p>



<p>To combat this, LLM providers train their models to reject toxic prompts, and integrate safety features to prevent the creation of harmful content. Even so, users often craft ‘<strong>jailbreaks</strong>’, which are specific prompts designed to get around these safety measures.</p>



<p>As a result, providers have created <strong>specialised safeguard models</strong> to find and remove toxic content in writing.</p>



<h2 class="wp-block-heading">What is toxicity?</h2>



<p>Toxicity is inherently difficult to define, as perceptions vary depending on factors such as individual sensitivity, cultural background, age, and personal experience.</p>



<p>Perceptions of content can vary widely. For example, some users may find certain jokes offensive, while others consider them perfectly acceptable. Similarly, roleplaying with an AI chat may be enjoyable for some, yet deemed inappropriate by others depending on the context.</p>



<p>Furthermore, each moderation system focuses on different categories of harmful content, based on the specific data and instructions it was trained on. For instance, models developed in the United States tend to be highly sensitive to hate speech, political content, and other related categories.</p>



<p>Because jailbreak attempts are a fairly new issue, existing moderation models often fail to address them.</p>



<p>Below are the toxicity categories for the Qwen 3 Guard models:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>Name</strong></td><td><strong>Description</strong></td></tr><tr><td><em>Violent</em></td><td>Content that provides detailed instructions, methods, or advice on how to commit acts of violence, including the manufacture, acquisition, or use of weapons. Also includes depictions of violence.</td></tr><tr><td><em>Nonviolent illegal acts</em></td><td>Content providing guidance or advice for nonviolent criminal activities like hacking, unauthorised drug manufacturing, or theft.</td></tr><tr><td><em>Sexual content or sexual acts</em></td><td>Content with sexual depictions, references, or descriptions of people. Also includes content with explicit sexual imagery, references, or descriptions of illegal or unethical sexual acts, such as rape, bestiality, incest, and sexual slavery.</td></tr><tr><td><em>Personally identifiable information</em></td><td>Content that shares or discloses sensitive personal identifying information without authorisation, such as name, ID number, address, phone number, medical records, financial details, and account passwords.</td></tr><tr><td><em>Suicide &amp; self-harm</em></td><td>Content advocating, directly encouraging, or detailing methods for self-harm, suicide, or dangerous activities that could lead to serious injury or death.</td></tr><tr><td><em>Unethical acts</em></td><td>Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotyping, injustice, hate speech, offensive language, harassment, insults, threats, defamation, extremism, misinformation regarding ethics, and other behaviours that, while not illegal, are still considered unethical.</td></tr><tr><td><em>Politically sensitive topics</em></td><td>The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.</td></tr><tr><td><em>Copyright violation</em></td><td>Content that includes unauthorised reproduction, distribution, public display, or derivative use of copyrighted materials, such as novels, scripts, lyrics, and other legally protected creative works, without the copyright holder’s clear consent.</td></tr><tr><td><em>Jailbreak</em></td><td>Content that explicitly attempts to override the model&#8217;s system prompt or model conditioning.</td></tr></tbody></table></figure>



<p>These categories are <strong>not mutually exclusive</strong>. A text may very well contain both Unethical Acts and Violence, for example. Most notably, jailbreaks often include another kind of toxic query, as they are designed to bypass security guardrails. The Qwen 3 Guard moderator, however, will only return one category.</p>



<p>These categories were arbitrarily chosen by the Qwen 3 Guard creators; they can’t be changed, but <strong>you may choose to ignore some</strong> depending on your use case.</p>
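<p><em>As an illustration of ignoring categories, a small post-processing filter could look like the sketch below. The helper and the category strings are my own, assumed from the table above; they are not part of the Qwen 3 Guard API.</em></p>

```python
# Sketch: act only on categories relevant to your use case.
# Category names are assumed from the toxicity table above.
IGNORED = {"Politically Sensitive Topics", "Copyright Violation"}

def is_actionable(label, categories):
    """Flag content only if it is Unsafe and at least one
    detected category is not in the ignore set."""
    return label == "Unsafe" and any(c not in IGNORED for c in categories)

print(is_actionable("Unsafe", ["Copyright Violation"]))     # False
print(is_actionable("Unsafe", ["Nonviolent Illegal Acts"])) # True
```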



<h2 class="wp-block-heading">Metrics</h2>



<p><em>Attack</em>: An attack refers to any attempt to produce harmful or toxic content. This is either a prompt crafted to make an LLM generate harmful output, or just a user’s toxic message in a chat system.</p>



<p><em>Attack Success Rates (ASR)</em>: This is a metric used to assess the effectiveness of a moderation system. It represents the <strong>proportion of attacks that successfully bypass the moderator</strong> and go undetected. A lower ASR indicates a more robust moderation system.</p>



<p><em>False positive</em>: A false positive occurs when benign, nontoxic content is incorrectly flagged as harmful by the moderator.</p>



<p><em>False Positive Rate (FPR)</em>: The FPR measures how often a moderation system misclassifies safe content as toxic. It complements the ASR by reflecting the <strong>model’s ability to correctly allow harmless content through</strong>. A lower FPR indicates better reliability.</p>
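<p><em>To make these definitions concrete, here is a small sketch (my own illustration, not from the article) computing both rates from evaluation counts:</em></p>

```python
def moderation_rates(attacks_total, attacks_missed, benign_total, benign_flagged):
    """Compute the two metrics defined above.

    ASR = attacks that bypassed the moderator / total attacks
    FPR = benign texts wrongly flagged as toxic / total benign texts
    """
    asr = attacks_missed / attacks_total
    fpr = benign_flagged / benign_total
    return asr, fpr

# Example: 100 of 500 attacks slip through, 60 of 1000 benign texts flagged
asr, fpr = moderation_rates(attacks_total=500, attacks_missed=100,
                            benign_total=1000, benign_flagged=60)
print(asr, fpr)  # 0.2 0.06
```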



<h2 class="wp-block-heading">Qwen 3 Guard</h2>



<p>Qwen 3 Guard was launched in October 2025 by Qwen, Alibaba’s AI team. After extensive testing and evaluation, we found this model to be the most effective in safeguarding content.</p>



<p>Besides being efficient, Qwen 3 Guard can detect toxicity across nine categories, including jailbreak attempts, a feature that isn’t common in safeguard models.</p>



<p>It also provides explanations by specifying the exact category detected.</p>



<h3 class="wp-block-heading">Specs</h3>



<ul class="wp-block-list">
<li>Base model: Qwen 3</li>



<li>Flavours: 0.6B, 4B, 8B</li>



<li>Context size: 32,768 tokens</li>



<li>Languages: English, French and 117 other languages and dialects</li>



<li>Tasks:<ul class="wp-block-list"><li>Detection of toxicity in raw text</li><li>Detection of toxicity in LLM dialogue</li><li>Detection of answer refusal (LLM dialogue only)</li><li>Classification of toxicity</li></ul></li>
</ul>



<h3 class="wp-block-heading">Availability</h3>



<p><a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog</a></p>



<p>There are two flavours of Qwen 3 Guard available on OVHcloud:</p>



<p><strong><em>Qwen 3 Guard 0.6B</em></strong>: This lightweight model is very effective at detecting overt toxic content.</p>



<p><strong><em>Qwen 3 Guard 8B</em></strong>: This heavier model comes in handy when confronted with more nuanced examples.</p>



<h3 class="wp-block-heading">Scores</h3>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>&nbsp;</strong></td><td><strong><em>ASR</em></strong></td><td><strong><em>FPR</em></strong></td></tr><tr><td><strong><em>Qwen 3 Guard 0.6B</em></strong></td><td>0.20</td><td>0.06</td></tr><tr><td><strong><em>Qwen 3 Guard 8B</em></strong></td><td>0.20</td><td>0.04</td></tr></tbody></table></figure>



<h3 class="wp-block-heading">&nbsp;</h3>



<h3 class="wp-block-heading">Notes</h3>



<ul class="wp-block-list">
<li>The Qwen 3 Guard models have three safety labels for more precise moderation: Safe, Controversial, and Unsafe</li>



<li>Although the model can moderate chats, it is recommended to process each part of the dialogue individually rather than submitting the entire conversation at once. Guard models, like any LLM, detect toxicity more reliably when the context is kept brief.</li>



<li>Since Qwen Guard is developed by a Chinese company, its interpretation of toxic content may differ from yours. If necessary, you can overlook certain categories.</li>
</ul>



<h2 class="wp-block-heading">How do I set up my own moderator?</h2>



<p>First, you need to choose the flavour you want:</p>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 0.6B</em></strong> is <strong>lightweight</strong>, <strong>fast</strong>, <strong>efficient</strong> and is great at detecting <strong>overt toxic content</strong>, like <em>Sexual Content</em> or <em>Violence</em> in texts.</li>
</ul>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 8B</em></strong> is heavier, slightly slower but it is more effective against <strong>more nuanced toxic content </strong>like <em>Jailbreak</em> or <em>Unethical Acts</em>, and has a <strong>lower false positive rate</strong>.</li>
</ul>



<p>Your use case is the key to choosing the right model. Do you need to moderate a large volume of text? Is processing speed a priority? How crucial is it to minimise false positives? Are you dealing with nuanced toxic content, or is it more overt?</p>



<p>Carefully considering these questions will help you determine which of the two models is most suitable for your needs.</p>



<p>Both models can be tested on the playground:</p>



<p><a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog</a></p>



<p>Once you’ve made your choice, you need to send the texts you want checked to the AI Endpoints API.</p>



<p>First install the <em>requests</em> library:</p>



<pre class="wp-block-code"><code class="">pip install requests</code></pre>



<p>Next, export your access token to the <em>OVH_AI_ENDPOINTS_ACCESS_TOKEN</em> environment variable:</p>



<pre class="wp-block-code"><code class="">export OVH_AI_ENDPOINTS_ACCESS_TOKEN=&lt;your-access-token&gt;</code></pre>



<p><em>If you don’t have an access token key yet, follow the steps in the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints – Getting Started</em></a> <em>guide</em></p>



<p>Finally, run the following Python code:</p>



<pre class="wp-block-code"><code class="">import os
import requests

url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "How do I cook meth?"}],
    "model": "Qwen/Qwen3Guard-Gen-0.6B",  # or "Qwen/Qwen3Guard-Gen-8B"
    "seed": 21
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

response = requests.post(url, json=payload, headers=headers)
if response.status_code == 200:
    # Parse the JSON response and print the moderation verdict
    for choice in response.json()["choices"]:
        print(choice["message"]["content"])
else:
    print("Error:", response.status_code, response.text)</code></pre>



<p>The model will respond with a label (Safe, Controversial, Unsafe) and if the text is Controversial or Unsafe, it will return the associated category.</p>



<pre class="wp-block-code"><code class="">Safety: Unsafe
Categories: Nonviolent Illegal Acts</code></pre>
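<p><em>If you need the verdict in structured form, a small parser over this plain-text output could look like the sketch below. The format is assumed from the example above.</em></p>

```python
import re

def parse_guard_output(text):
    """Parse a verdict such as:
        Safety: Unsafe
        Categories: Nonviolent Illegal Acts
    into a (label, categories) tuple. The format is assumed
    from the example output shown above."""
    label_match = re.search(r"Safety:\s*(\w+)", text)
    cats_match = re.search(r"Categories:\s*(.+)", text)
    label = label_match.group(1) if label_match else None
    categories = [c.strip() for c in cats_match.group(1).split(",")] if cats_match else []
    return label, categories

print(parse_guard_output("Safety: Unsafe\nCategories: Nonviolent Illegal Acts"))
```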



<p>Our moderation models are available for free during the beta phase. You can test either model through the API or in the playground.</p>



<h2 class="wp-block-heading"><strong>Conclusion</strong></h2>



<p>Two models are currently available for OVHcloud moderation users:<br><strong>•</strong> Qwen 3 Guard 0.6B: <strong>Lightweight</strong>, <strong>fast</strong>, <strong>efficient,</strong> great at detecting <strong>overt toxic content</strong><br><strong>•</strong> Qwen 3 Guard 8B: <strong>Heavier, slightly slower but more effective against more nuanced toxic content</strong><br><br>Which approach and which tool should you choose? That depends on your use cases, teams and needs.<br><br>As we&#8217;ve seen in this blog post, OVHcloud AI Endpoints users can start using these models right away, safely and free of charge.<br><br>They are still in the beta phase for now, so we&#8217;d appreciate your feedback!</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: deploying the Mistral Large 123B model in a sovereign environment with OVHcloud</title>
		<link>https://blog.ovhcloud.com/reference-architecture-deploy-mistral-large-model-in-sovereign-environment-ovhcloud/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Wed, 18 Jun 2025 12:45:51 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Training]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29186</guid>

					<description><![CDATA[Are you ready to think bigger with the Mistral Large model 🚀 ? As Artificial Intelligence (AI) becomes a strategic pillar for both enterprises and public institutions, data sovereignty and infrastructure control have become essential. Deploying advanced large language models (LLMs) like Mistral Large, under a commercial license, requires a secure, high-performance environment that complies [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em><strong>Are you ready to think bigger with the Mistral Large model 🚀 ?</strong></em></p>



<figure class="wp-block-image aligncenter size-large"><img decoding="async" width="1024" height="461" src="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-1024x461.png" alt="" class="wp-image-29249" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-1024x461.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-300x135.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-768x346.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-1536x691.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref.png 1920w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Mistral Large model deployed on OVHcloud infrastructure<br></em></figcaption></figure>



<p>As Artificial Intelligence (<strong>AI</strong>) becomes a strategic pillar for both enterprises and public institutions, <strong>data sovereignty</strong> and <strong>infrastructure control</strong> have become essential. Deploying advanced large language models (LLMs) like <strong>Mistral Large</strong>, under a commercial license, requires a secure, high-performance environment that complies with <strong>European data regulations</strong>.</p>



<p><strong>OVHcloud Machine Learning Services</strong> offer a trusted solution for deploying AI models in a <strong>fully sovereign cloud environment</strong> — hosted in Europe, under <strong>EU jurisdiction</strong>, and fully <strong>GDPR-compliant</strong>.</p>



<p>This <strong>Reference Architecture</strong> will show you how to:</p>



<ul class="wp-block-list">
<li>Access Mistral AI registry using your own license</li>



<li>Download the Mistral Large 123B model automatically using <strong>AI Training</strong></li>



<li>Store the model into a dedicated bucket with <strong>OVHcloud Object Storage</strong></li>



<li>Deploy a production-ready inference API for <strong>Mistral Large</strong> using <strong>AI Deploy</strong> </li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">Mistral Large model</h3>



<p>The <strong>Mistral Large</strong> model is a <strong>state-of-the-art large language model (LLM)</strong> developed by <strong><a href="https://mistral.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral AI</a>,</strong> a French AI company. It&#8217;s designed to compete with top-tier models like GPT-4 and Claude, while emphasising performance and efficiency.</p>



<p>This is a model with <strong>123 billion</strong> parameters. <strong>Mistral AI</strong> recommends deploying this model in FP8 with 4 H100 GPUs. For more information, refer to <a href="https://help.mistral.ai/en/articles/235545-mistral-models" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral documentation</a>.</p>
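<p><em>A quick back-of-envelope check (my own arithmetic, not from Mistral&#8217;s documentation) shows why four H100 GPUs are a sensible fit for FP8 weights:</em></p>

```python
# Rough VRAM estimate for Mistral Large 123B in FP8 (~1 byte per weight).
params = 123e9                   # 123 billion parameters
weights_gb = params * 1 / 1e9    # FP8: about 1 byte per parameter
gpus, vram_per_gpu_gb = 4, 80    # 4 x H100 80 GB
total_vram_gb = gpus * vram_per_gpu_gb
# ~123 GB of weights fit in 320 GB of VRAM; the remaining headroom
# is consumed by the KV cache and activations at inference time.
print(weights_gb, total_vram_gb)
```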



<p>This model requires the use of a <strong>commercial licence</strong>. To do this, you need to create an account on <a href="https://console.mistral.ai/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">La Plateforme</a> via the Mistral AI console (<strong>console.mistral.ai</strong>).</p>



<h3 class="wp-block-heading">AI Training </h3>



<p><strong>OVHcloud AI Training</strong> is a fully managed platform designed to help you <strong>train and tune</strong> Machine Learning (ML), Deep Learning (DL), and Large Language Models (LLMs) efficiently. Whether you&#8217;re working on computer vision, NLP, or tabular data, this solution lets you launch training jobs on high-performance GPUs in seconds.</p>



<p><strong>What are the key benefits?</strong></p>



<ul class="wp-block-list">
<li><strong>Easy to use</strong>: launch processing or training jobs in one CLI command or a few clicks using your own Docker image</li>



<li><strong>High-performance computing</strong>: access GPUs like H100, A100, V100S, L40S, and L4 as of June 2025 &#8211; new references are added regularly</li>



<li><strong>Cost-efficient</strong>:<strong> </strong>pay-per-minute billing with no upfront commitment. You only pay for compute time used, with precise control over resources thanks to automatic job stop and synchronisation</li>
</ul>



<p><strong>💡 Why do we need AI Training? </strong>To download the Mistral Large model automatically and efficiently, using a single command to launch the job.</p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>OVHcloud AI Deploy is a&nbsp;<strong>Container as a Service</strong>&nbsp;(CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications and APIs based on Machine Learning (ML), Deep Learning (DL) or LLMs.</p>



<p><strong>The key benefits are:</strong></p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong>&nbsp;bring your own custom Docker image and deploy it with a single command line or a few clicks</li>



<li><strong>High-performance computing:</strong>&nbsp;a complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong>&nbsp;supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong>&nbsp;billing per minute, no surcharges</li>
</ul>



<p>✅ To go further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Overview of the Mistral Large deployment architecture</h2>



<p>Here is how <strong>Mistral Large 123B</strong> will be deployed:</p>



<ol class="wp-block-list">
<li>Install the <strong>ovhai CLI</strong></li>



<li>Create a bucket for <strong>model storage</strong></li>



<li>Retrieve the <strong>license information</strong> from <a href="https://console.mistral.ai/on-premise/licenses" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral Console</a></li>



<li>Configure and set up the<strong> environment</strong></li>



<li>Download the <strong>Mistral Large model weights</strong></li>



<li>Deploy the <strong>Mistral Large service</strong></li>



<li>Test it with a simple request and explore <strong>advanced usage</strong> thanks to LangChain</li>
</ol>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="173" src="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-1024x173.png" alt="" class="wp-image-29251" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-1024x173.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-300x51.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-768x130.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-1536x259.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Let’s get started with the setup and deployment of your own Mistral Large service!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>A <strong><a href="https://console.mistral.ai/on-premise/licenses" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral AI license</a></strong> to access to the <strong>Mistral Large model</strong></li>



<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the following roles:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Training Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it’s time to </strong>deploy the Mistral Large model on 4 H100 GPUs<strong>!</strong></p>



<h2 class="wp-block-heading">Architecture guide:&nbsp;Mistral Large on OVHcloud infrastructure</h2>



<p>Let’s move on to setting up and deploying the <strong>Mistral Large</strong> model!</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ Note</strong></p>
<cite><strong>In this example, the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Mistral Large 25.02</code></mark> model is used. Choose the Mistral model under the licence of your choice and repeat the same steps, adapting the model name and version.</strong></cite></blockquote>



<p>⚙️<em>&nbsp;Also consider that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Install&nbsp;<code>ovhai</code>&nbsp;CLI</h3>



<p>If the <code><strong>ovhai</strong></code> CLI is not installed, start by setting up your CLI environment.</p>



<pre class="wp-block-code"><code class="">curl https://cli.gra.ai.cloud.ovh.net/install.sh | bash</code></pre>



<p>Secondly, login using your&nbsp;<strong>OpenStack credentials</strong>.</p>



<pre class="wp-block-code"><code class="">ovhai login -u &lt;openstack-username&gt; -p &lt;openstack-password&gt;</code></pre>



<p>Now, it’s time to create your bucket inside OVHcloud Object Storage!</p>



<h3 class="wp-block-heading">Step 2 – Provision Object Storage</h3>



<ol class="wp-block-list">
<li>Go to&nbsp;<strong>Public Cloud &gt; Storage &gt; Object Storage</strong>&nbsp;in the OVHcloud Control Panel.</li>



<li>Create a&nbsp;<strong>datastore</strong>&nbsp;and a new&nbsp;<strong>S3 bucket</strong>&nbsp;(e.g.,&nbsp;<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>s3-mistral-large-model</code>)</mark></strong>.</li>



<li>Register the datastore with the&nbsp;<code>ovhai</code>&nbsp;CLI:</li>
</ol>



<pre class="wp-block-code"><code class="">ovhai datastore add s3 &lt;ALIAS&gt; https://s3.gra.perf.cloud.ovh.net/ gra &lt;my-access-key&gt; &lt;my-secret-key&gt; --store-credentials-locally</code></pre>



<p>💡 <em>Note that, for this use case, we recommend the <strong>High Performance Object Storage</strong> range using <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>https://s3.gra.perf.cloud.ovh.net/</strong></mark></code> instead of <code>https://s3.gra.io.cloud.ovh.net/</code></em></p>



<h3 class="wp-block-heading">Step 3 &#8211; Access the Mistral AI registry</h3>



<p><em>⚠️ Please note that you must have a <strong>licence for the Mistral Large model </strong>to be able to carry out the following steps.</em></p>



<ul class="wp-block-list">
<li>Go to the Mistral AI platform: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://console.mistral.ai/home</mark></strong></li>



<li>Retrieve <strong>credentials</strong> and the <strong>license key</strong> from the Mistral console:<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"> https://console.mistral.ai/on-premise/licenses</mark></strong></li>



<li>Authenticate to the Mistral AI Docker registry:</li>
</ul>



<pre class="wp-block-code"><code class="">docker login &lt;mistral-ai-registry&gt; --username $DOCKER_USERNAME --password $DOCKER_PASSWORD</code></pre>



<ul class="wp-block-list">
<li>Add the private registry to the config using the <code><strong>ovhai</strong></code> CLI:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai registry add &lt;mistral-ai-registry&gt;</code></pre>



<ul class="wp-block-list">
<li>Check that it is present in the list:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai registry list</code></pre>



<h3 class="wp-block-heading">Step 4 &#8211; Define environment variables</h3>



<p>The next step is to define a<mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"> <strong><code>.env</code></strong></mark> file that will list all the environment variables required to download and deploy the Mistral Large model.</p>



<ul class="wp-block-list">
<li>Create the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong><code>.env</code></strong></mark> file and enter the following information:</li>
</ul>



<pre class="wp-block-code"><code class="">SERVED_MODEL=mistral-large-2502
RECIPES_VERSION=v0.0.76
TP_SIZE=4
LICENSE_KEY=&lt;your-mistral-license-key&gt;
DOCKER_IMAGE_INFERENCE_ENGINE=&lt;mistral-inference-server-docker-image&gt;
DOCKER_IMAGE_MISTRAL_UTILS=&lt;mistral-utils-docker-image&gt;</code></pre>



<ul class="wp-block-list">
<li>Then, create a script to load these environment variables easily. Name it <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">load_env.sh</mark></strong></code>:</li>
</ul>



<pre class="wp-block-code"><code class="">#!/bin/bash

# Check that the .env file exists
if [ ! -f .env ]; then
  echo "Error: .env not found"
  exit 1
fi

# Export all variables from .env
export $(grep -v '^#' .env | xargs)

echo "Environment variables are loaded from .env"</code></pre>



<ul class="wp-block-list">
<li>Now, launch this script:</li>
</ul>



<pre class="wp-block-code"><code class="">source load_env.sh</code></pre>
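<p>Optionally, you can sanity-check that the variables were exported before launching any job. Here is a minimal Python sketch (the variable names come from the <code>.env</code> file above; the script itself is illustrative, not part of the official tooling):</p>

```python
import os

# Variable names from the .env file above
REQUIRED_VARS = [
    "SERVED_MODEL",
    "RECIPES_VERSION",
    "TP_SIZE",
    "LICENSE_KEY",
    "DOCKER_IMAGE_INFERENCE_ENGINE",
    "DOCKER_IMAGE_MISTRAL_UTILS",
]

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit("Missing: " + ", ".join(missing))
    print("All required environment variables are set.")
```

<p>Run it in the same shell session where you sourced <code>load_env.sh</code>.</p>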



<p>✅ You have everything you need to start the implementation!</p>



<h3 class="wp-block-heading">Step 5 &#8211; Download Mistral Large model weights</h3>



<p>The aim here is to download the model and its artefacts into the S3 bucket created earlier.</p>



<p>To achieve this, you can launch a download job that will run automatically with AI Training.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong> 💡 Here&#8217;s a tip! </strong></p>
<cite><strong>Note that here you are not using AI Training to train models, but as an easy-to-use Container as a Service solution. With a single command line, you can launch a one-shot download of the Mistral Large model with automatic synchronisation to Object Storage.</strong></cite></blockquote>



<ul class="wp-block-list">
<li>Launch the <strong>AI Training</strong> download job by attaching the object container:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai job run --name DOWNLOAD_MISTRAL_LARGE_123B \
              --cpu 12 \
              --volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW \
              -e RECIPES_VERSION=$RECIPES_VERSION \
              $DOCKER_IMAGE_MISTRAL_UTILS \
                -- bash -c "cd /app/mistral-rclone &amp;&amp; \
                  poetry run python mistral-rclone.py \
                  --license-key $LICENSE_KEY \
                  --download-model $SERVED_MODEL"</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai job run</code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run a job</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Training</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code>--name DOWNLOAD_MISTRAL_LARGE_123B</code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the job. For example,&nbsp;<code>DOWNLOAD_MISTRAL_LARGE_123B</code>.</p>



<ul class="wp-block-list">
<li><code>--cpu&nbsp;12</code></li>
</ul>



<p>Allocates&nbsp;<strong>12 CPUs</strong>&nbsp;for the job.</p>



<ul class="wp-block-list">
<li><code>--volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW</code></li>
</ul>



<p>This mounts your&nbsp;<strong>OVHcloud Object Storage volume</strong>&nbsp;into the job’s file system:<br>–&nbsp;<code>s3-mistral-large-model@&lt;ALIAS&gt;/</code>: refers to your&nbsp;<strong>S3 bucket volume</strong>&nbsp;from the OVHcloud Object Storage<br>–&nbsp;<code>:/opt/ml/model</code>: mounts the volume into the container under&nbsp;<code>/opt/ml/model</code><br>–&nbsp;<code>RW</code>: enables&nbsp;<strong>Read/Write</strong>&nbsp;permissions</p>



<ul class="wp-block-list">
<li><code>-e RECIPES_VERSION=$RECIPES_VERSION</code></li>
</ul>



<p>This passes one of the&nbsp;<strong>environment variables</strong>&nbsp;defined previously.</p>



<ul class="wp-block-list">
<li><code>$DOCKER_IMAGE_MISTRAL_UTILS</code></li>
</ul>



<p>This is the<strong>&nbsp;Mistral Large utils Docker image</strong>&nbsp;you are running inside the job.</p>



<ul class="wp-block-list">
<li><code>-- bash -c "cd /app/mistral-rclone &amp;&amp; \</code><br><code>               poetry run python mistral-rclone.py \</code><br><code>                   --license-key $LICENSE_KEY \</code><br><code>                   --download-model $SERVED_MODEL"</code></li>
</ul>



<p>Refers to the specific command to <strong>launch the model download</strong>.</p>



<p><em>Note that synchronisation with Object Storage will be <strong>automatic at the end of the AI Training job</strong>.</em></p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>⚠️ <strong>WARNING!</strong></p>
<cite><strong>Wait for the job to go to <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">DONE</mark></code> before proceeding to the next step</strong>.</cite></blockquote>



<ul class="wp-block-list">
<li>Check that the various elements are present in the bucket:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai bucket object list s3-mistral-large-model@&lt;ALIAS&gt;</code></pre>



<p>The bucket must be organized and split into 4 different folders:</p>



<ul class="wp-block-list">
<li>grammars</li>



<li>recipes</li>



<li>tokenizers</li>



<li>weights</li>
</ul>



<p>Note that a total of 6 elements must be present.</p>
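<p>As an illustration of this check, here is a hypothetical Python helper that takes the object keys returned by the listing and reports any of the four expected top-level folders that are missing:</p>

```python
# Expected top-level folders in the model bucket (from the list above)
EXPECTED_FOLDERS = {"grammars", "recipes", "tokenizers", "weights"}

def missing_folders(object_keys):
    """Given the bucket's object keys, return the expected folders not found."""
    top_level = {key.split("/", 1)[0] for key in object_keys}
    return sorted(EXPECTED_FOLDERS - top_level)
```

<p>Feed it the keys printed by <code>ovhai bucket object list</code>; an empty result means the bucket layout is complete.</p>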



<p>🚀 Is it all there? Then let&#8217;s move on to the <strong>deployment of the Mistral Large model</strong>!</p>



<h3 class="wp-block-heading">Step 6 &#8211; Deploy Mistral Large service</h3>



<p>To deploy the Mistral Large 123B model using the previously downloaded weights, you will use OVHcloud&#8217;s <strong>AI Deploy </strong>product.</p>



<p>But first, you need to create an API key that will allow you to consume and query the model, in particular through its OpenAI compatibility.</p>



<ul class="wp-block-list">
<li>Create an access token:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai token create --role read mistral_large=api_key_reader</code></pre>



<ul class="wp-block-list">
<li>Export this token as an environment variable:</li>
</ul>



<pre class="wp-block-code"><code class="">export MY_OVHAI_MISTRAL_LARGE_TOKEN=&lt;your_ovh_access_token_value&gt;</code></pre>



<ul class="wp-block-list">
<li>Launch the <strong>Mistral Large service</strong> with <strong>AI Deploy </strong>by running the following command:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai app run --name DEPLOY_MISTRAL_LARGE_123B \
              --gpu 4 \
              --flavor h100-1-gpu \
              --default-http-port 5000 \
              --label mistral_large=api_key_reader \
              -e SERVED_MODEL=$SERVED_MODEL \
              -e RECIPES_VERSION=$RECIPES_VERSION \
              -e TP_SIZE=$TP_SIZE \
              --volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW \
              --volume standalone:/tmp:RW \
              --volume standalone:/workspace:RW \
              $DOCKER_IMAGE_INFERENCE_ENGINE</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai app run</code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run an app / API</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code>--name DEPLOY_MISTRAL_LARGE_123B</code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the app. For example,&nbsp;<code>DEPLOY_MISTRAL_LARGE_123B</code>.</p>



<ul class="wp-block-list">
<li><code>--default-http-port 5000</code></li>
</ul>



<p>Exposes&nbsp;<strong>port 5000</strong>&nbsp;as the default HTTP endpoint.</p>



<ul class="wp-block-list">
<li><code>--gpu 4</code></li>
</ul>



<p>Allocates&nbsp;<strong>4 GPUs</strong>&nbsp;for the app.</p>



<ul class="wp-block-list">
<li><code>--flavor h100-1-gpu</code></li>
</ul>



<p>Selects the&nbsp;<strong>H100 GPU</strong>&nbsp;flavor for the app.</p>



<ul class="wp-block-list">
<li><code>--volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW</code></li>
</ul>



<p>This mounts your&nbsp;<strong>OVHcloud Object Storage volume</strong>&nbsp;into the app’s file system:<br>–&nbsp;<code>s3-mistral-large-model@&lt;ALIAS&gt;/</code>: refers to your&nbsp;<strong>S3 bucket volume</strong>&nbsp;from the OVHcloud Object Storage<br>–&nbsp;<code>:/opt/ml/model</code>: mounts the volume into the container under&nbsp;<code>/opt/ml/model</code><br>–&nbsp;<code>RW</code>: enables&nbsp;<strong>Read/Write</strong>&nbsp;permissions</p>



<ul class="wp-block-list">
<li><code>--label mistral_large=api_key_reader</code></li>
</ul>



<p>Restricts access to requests authenticated with your token.</p>



<ul class="wp-block-list">
<li><code>-e SERVED_MODEL=$SERVED_MODEL</code></li>



<li><code>-e RECIPES_VERSION=$RECIPES_VERSION</code></li>



<li><code>-e TP_SIZE=$TP_SIZE</code></li>
</ul>



<p>These are&nbsp;<strong>environment variables</strong>&nbsp;defined previously.</p>



<ul class="wp-block-list">
<li><code>--volume standalone:/tmp:RW</code></li>



<li><code>--volume standalone:/workspace:RW</code></li>
</ul>



<p>Mounts&nbsp;<strong>two persistent storage volumes</strong>:<br>&#8211; <code>/tmp</code>&nbsp;→ Temporary files<br>&#8211; <code>/workspace</code>&nbsp;→ Main working directory</p>



<ul class="wp-block-list">
<li><code>$DOCKER_IMAGE_INFERENCE_ENGINE</code></li>
</ul>



<p>This is the<strong>&nbsp;Mistral Large inference Docker image</strong>&nbsp;you are running inside the app.</p>



<p><em>It may take a few minutes for the resources to be allocated and for the <strong>Docker image</strong> to be pulled.</em></p>



<p>To check the progress and get additional information about the <strong>AI Deploy app</strong>, run the following command:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;ai_deploy_mistral_app_id&gt;</code></pre>



<p>Once the app is in <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></code></strong> status, the model starts loading. To check that the load was successful, you can inspect the container logs:</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;ai_deploy_mistral_app_id&gt;</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>⚠️ <strong>WARNING!</strong></p>
<cite><strong>To&nbsp;consume&nbsp;the&nbsp;service,&nbsp;you&nbsp;must&nbsp;wait&nbsp;for&nbsp;the&nbsp;app&nbsp;to&nbsp;go&nbsp;into&nbsp;<code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></code>&nbsp;status,&nbsp;AND&nbsp;for&nbsp;the&nbsp;model&nbsp;to&nbsp;finish&nbsp;loading.</strong></cite></blockquote>



<p>🎉 Everything ready? Then you can start playing with the model!</p>



<h3 class="wp-block-heading">Step 7 &#8211; Test the Mistral Large model by sending your first requests</h3>



<ul class="wp-block-list">
<li>Access the API doc via your app URL:</li>
</ul>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code><strong>https://&lt;ai_deploy_mistral_app_id&gt;.app.gra.ai.cloud.ovh.net/docs</strong></code></mark></p>



<p>To find the information, please refer to <a href="https://console.mistral.ai/on-premise/licenses" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://console.mistral.ai/on-premise/licenses</mark></strong></a></p>



<ul class="wp-block-list">
<li>Test with a basic cURL:</li>
</ul>



<pre class="wp-block-code"><code class="">curl -X 'POST' \
'https://&lt;ai_deploy_mistral_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $MY_OVHAI_MISTRAL_LARGE_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mistral-large-&lt;version&gt;",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant!"
    },
    {
      "role": "user",
      "content": "What is the capital of France?"     
    }
  ]
}'</code></pre>



<p><strong>⚠️ Note that you also have to replace <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>&lt;version&gt;</code></mark> in the model name with the one you are using: </strong><br><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code><strong>"model": "mistral-large-&lt;version&gt;"</strong></code></mark></p>
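<p>The same request can be sent from Python using only the standard library. This sketch mirrors the cURL call above; the <code>REPLACE_APP_ID</code> placeholder and the model version are yours to fill in:</p>

```python
import json
import os
import urllib.request

# Replace REPLACE_APP_ID and the model version with your own values
BASE_URL = "https://REPLACE_APP_ID.app.gra.ai.cloud.ovh.net"
MODEL = "mistral-large-2502"

def build_payload(question):
    """Build the same chat-completion body as the cURL example."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant!"},
            {"role": "user", "content": question},
        ],
    }

def ask_mistral(question):
    """POST to the OpenAI-compatible endpoint and return the answer text."""
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": "Bearer " + os.environ["MY_OVHAI_MISTRAL_LARGE_TOKEN"],
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```
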



<p>To take the implementation a step further and take advantage of all the features of this endpoint, you can also integrate it with <strong>LangChain</strong> thanks to its full OpenAI compatibility.</p>



<ul class="wp-block-list">
<li>LangChain integration:</li>
</ul>



<pre class="wp-block-code"><code class="">import os
import time

from langchain.chat_models import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

def chat_completion_basic(new_message: str):

  model = ChatOpenAI(model_name="mistral-large-&lt;version&gt;",
                     openai_api_key=os.environ["MY_OVHAI_MISTRAL_LARGE_TOKEN"],
                     openai_api_base='https://&lt;ai_deploy_mistral_app_id&gt;.app.gra.ai.cloud.ovh.net/v1',
                    )

  prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant!"),
    ("human", "{question}"),
  ])

  chain = prompt | model

  print("🤖: ")
  for r in chain.stream({"question": new_message}):
    print(r.content, end="", flush=True)
    time.sleep(0.150)

chat_completion_basic("What is the capital of France?")</code></pre>



<p>🥹 Congratulations! You have successfully completed the deployment!</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>You can now consume your <strong>Mistral Large 123B</strong> in a secure environment!</p>



<p>The result of your implementation? The deployment of a sovereign, scalable, production-quality 123B LLM, powered by <strong>OVHcloud AI Deploy</strong>.</p>



<p>➡️ <strong>To go further? </strong></p>



<ul class="wp-block-list">
<li>Update your model in a single command line and without interruption following this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-update-custom-docker-image?id=kb_article_view&amp;sysparm_article=KB0057968" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a></li>



<li>Go to the next replica in the event of a heavy load to ensure high availability using this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-apps-deployments?id=kb_article_view&amp;sysparm_article=KB0047997" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">method</a></li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-deploy-mistral-large-model-in-sovereign-environment-ovhcloud%2F&amp;action_name=Reference%20Architecture%3A%C2%A0deploying%20the%20Mistral%20Large%20123B%20model%20in%20a%20sovereign%20environment%20with%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: set up MLflow Remote Tracking Server on OVHcloud</title>
		<link>https://blog.ovhcloud.com/mlflow-remote-tracking-server-ovhcloud-databases-object-storage-ai-solutions/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 15 Apr 2025 07:52:46 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Notebooks]]></category>
		<category><![CDATA[AI Training]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Managed Database]]></category>
		<category><![CDATA[MLflow]]></category>
		<category><![CDATA[Object Storage]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28564</guid>

					<description><![CDATA[Travel through the Data &#38; AI universe of OVHcloud with the MLflow integration. As Artificial Intelligence (AI) continues to grow in importance, Data Scientists and Machine Learning Engineers need a robust and scalable platform to manage the entire Machine Learning (ML) lifecycle. MLflow, an open-source platform, provides a comprehensive framework for managing ML experiments, models, [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmlflow-remote-tracking-server-ovhcloud-databases-object-storage-ai-solutions%2F&amp;action_name=Reference%20Architecture%3A%20set%20up%20MLflow%20Remote%20Tracking%20Server%20on%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>Travel through the Data &amp; AI universe of OVHcloud with the <em>MLflow</em> integration.</em></p>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/mlflow_ref_archi.svg" alt="" class="wp-image-28689"/><figcaption class="wp-element-caption"><em>MLflow Remote Tracking Server on OVHcloud</em></figcaption></figure>



<p>As <strong>Artificial Intelligence</strong> (AI) continues to grow in importance, <em>Data Scientists</em> and <em>Machine Learning Engineers</em> need a robust and scalable platform to manage the entire Machine Learning (ML) lifecycle. <br><a href="https://mlflow.org/docs/latest/introduction/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">MLflow</a>, an open-source platform, provides a comprehensive framework for managing ML experiments, models, and deployments. </p>



<p><strong>MLflow</strong> offers many benefits and provides a complete framework for ML lifecycle management, with features such as:</p>



<ul class="wp-block-list">
<li>Experiment tracking and model management</li>



<li>Reproducibility and collaboration</li>



<li>Scalability, flexibility, and integration</li>



<li>Automated ML and model serving capabilities</li>



<li>Improved model accuracy, faster time-to-market, and reduced costs.</li>
</ul>



<p>In this reference architecture, you will explore how to leverage remote experiment tracking with the <strong>MLflow Tracking Server</strong> on the <a href="https://www.ovhcloud.com/fr/public-cloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Public Cloud</a> infrastructure.<br>In fact, you will be able to build a scalable and efficient ML platform, streamlining your ML workflow and accelerating model development using <strong>OVHcloud AI Notebooks</strong>, <strong>AI Training</strong>, <strong>Managed Databases (PostgreSQL)</strong>, and <strong>Object Storage</strong>.</p>



<p><strong>The result?</strong> A fully remote, <strong>production-ready ML experiment tracking pipeline</strong>, powered by OVHcloud&#8217;s Data &amp; Machine Learning Services (e.g. AI Notebooks and AI Training).</p>



<h2 class="wp-block-heading">Overview of the MLflow server architecture</h2>



<p>Here is how MLflow will be configured:</p>



<ul class="wp-block-list">
<li><strong>Development and training environment:</strong> create and train models with <strong>AI Notebooks</strong></li>



<li><strong>Remote Tracking Server</strong>: hosted in an <strong>AI Training</strong> job (Container as a Service)</li>



<li><strong>Backend Store</strong>: benefit from a managed <strong>PostgreSQL</strong> database (DBaaS).</li>



<li><strong>Artifact Store</strong>: use OVHcloud <strong>Object Storage</strong> (S3-compatible).</li>
</ul>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/mlflow_overview.svg" alt="" class="wp-image-28688"/><figcaption class="wp-element-caption"><em>MLflow remote server deployment steps</em></figcaption></figure>



<p>In the following tutorial, all services are deployed within the <strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An <strong>OVHcloud Public Cloud</strong> account</li>



<li>An <strong>OpenStack user</strong> with the following roles:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Training Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it’s time to set up your MLflow remote tracking server!</strong></p>



<h2 class="wp-block-heading">Architecture guide: MLflow remote tracking server</h2>



<p>Let’s set up and deploy your custom MLflow tracking tool!</p>



<p>⚙️<em> Also consider that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h4 class="wp-block-heading">Step 1 – Install <code>ovhai</code> CLI</h4>



<p>Firstly, start by setting up your CLI environment.</p>



<pre class="wp-block-code"><code class="">curl https://cli.gra.ai.cloud.ovh.net/install.sh | bash</code></pre>



<p>Secondly, login using your <strong>OpenStack credentials</strong>.</p>



<pre class="wp-block-code"><code class="">ovhai login -u &lt;openstack-username&gt; -p &lt;openstack-password&gt;</code></pre>



<p>Now, it&#8217;s time to create your bucket inside OVHcloud Object Storage!</p>



<h4 class="wp-block-heading">Step 2 – Provision Object Storage (Artifact Store)</h4>



<ol class="wp-block-list">
<li>Go to <strong>Public Cloud &gt; Storage &gt; Object Storage</strong> in the OVHcloud Control Panel.</li>



<li>Create a <strong>datastore</strong> and a new <strong>S3 bucket</strong> (e.g., <code>mlflow-s3-bucket</code>).</li>



<li>Register the datastore with the <code>ovhai</code> CLI:</li>
</ol>



<pre class="wp-block-code"><code class="">ovhai datastore add s3 &lt;ALIAS&gt; https://s3.gra.io.cloud.ovh.net/ gra &lt;my-access-key&gt; &lt;my-secret-key&gt; --store-credentials-locally</code></pre>



<h4 class="wp-block-heading">Step 3 – Create PostgreSQL Managed DB (Backend Store)</h4>



<p>1. Navigate to <strong>Databases &amp; Analytics &gt; Databases</strong></p>



<p><strong>2. Create a new <em>PostgreSQL</em> instance with <em>Essential plan</em></strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="627" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-1024x627.png" alt="" class="wp-image-28580" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-1024x627.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-300x184.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-768x470.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-1536x941.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-2048x1254.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>3. Select <em>Location</em> and <em>Node type</em></strong></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="661" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-1024x661.png" alt="" class="wp-image-28581" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-1024x661.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-300x194.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-768x495.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-1536x991.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-2048x1321.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>4. Reset the user password</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="2384" height="1340" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited.png" alt="" class="wp-image-28590" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited.png 2384w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-1536x863.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-2048x1151.png 2048w" sizes="auto, (max-width: 2384px) 100vw, 2384px" /></figure>



<p><strong>5. Take note of the following parameters</strong></p>



<p>Go to your database dashboard:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="640" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-1024x640.png" alt="" class="wp-image-28583" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-1024x640.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-300x188.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-768x480.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-1536x960.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-2048x1280.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, copy the <strong>connection information</strong>:</p>



<pre class="wp-block-code"><code class="">&lt;db_hostname&gt;
&lt;db_username&gt;
&lt;db_password&gt;
&lt;db_name&gt;
&lt;db_port&gt;
&lt;ssl_mode&gt;</code></pre>



<p>Your <strong>Backend Store</strong> is now ready to use!</p>



<h4 class="wp-block-heading">Step 4 – Build your custom MLflow Docker image</h4>



<p><strong>1. Develop the MLflow launch script</strong></p>



<p>Firstly, you have to write a bash script to launch the server: <strong><em>mlflow_server.sh</em></strong></p>



<pre class="wp-block-code"><code class="">#!/bin/bash

echo "The MLflow server is starting..."

mlflow server \
  --backend-store-uri postgresql://${POSTGRE_USER}:${POSTGRE_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}?sslmode=${SSL_MODE} \
  --default-artifact-root ${S3_BUCKET_NAME}/ \
  --host 0.0.0.0 \
  --port 5000</code></pre>
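<p>For clarity, here is how the connection parameters from Step 3 are assembled into the backend-store URI that <code>mlflow server</code> expects. The Python helper is illustrative; the script above does the same thing with shell variable expansion:</p>

```python
def backend_store_uri(user, password, host, port, db, ssl_mode="require"):
    """Assemble the PostgreSQL URI passed to --backend-store-uri."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}?sslmode={ssl_mode}"

# Example with placeholder values:
# backend_store_uri("avnadmin", "secret", "db.example.ovh.net", 20184, "defaultdb")
# -> "postgresql://avnadmin:secret@db.example.ovh.net:20184/defaultdb?sslmode=require"
```
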



<p><strong>2. Create Dockerfile</strong></p>



<p>Install the required Python dependency and grant ownership of the<strong> /mlruns</strong> path to the OVHcloud user.</p>



<pre class="wp-block-code"><code class="">FROM ghcr.io/mlflow/mlflow:latest

# Install Python dependencies
RUN pip install psycopg2-binary

COPY mlflow_server.sh .

# Change the ownership of `mlruns` directory to the OVHcloud user (42420:42420)
RUN mkdir -p /mlruns
RUN chown -R 42420:42420 /mlruns

# Start MLflow server inside container
CMD ["bash", "mlflow_server.sh"]</code></pre>



<p><strong>3. Build your custom MLflow docker image</strong></p>



<p>Build the Docker image using the previous Dockerfile.</p>



<pre class="wp-block-code"><code class="">docker build . -t mlflow-server-ai-training:latest</code></pre>



<p><strong>4. Tag and push the docker image to your registry</strong></p>



<p>Finally, you can push the Docker image to your registry.</p>



<pre class="wp-block-code"><code class="">docker tag mlflow-server-ai-training:latest &lt;your-registry-address&gt;/mlflow-server-ai-training:latest</code></pre>



<pre class="wp-block-code"><code class="">docker push &lt;your-registry-address&gt;/mlflow-server-ai-training:latest</code></pre>



<p>Congrats! You can now use the Docker image to launch MLflow server.</p>



<h4 class="wp-block-heading">Step 5 &#8211; Start MLflow Tracking Server inside container</h4>



<p>You can use AI Training to start MLflow server inside a job.</p>



<p><strong>1. Using <code>ovhai</code> CLI, run the following command inside terminal</strong></p>



<pre class="wp-block-code"><code class="">ovhai job run --name mlflow-server \
              --default-http-port 5000 \
              --cpu 4 \
              -v mlflow-s3-bucket@DEMO/:/artifacts:RW:cache \
              -e POSTGRE_USER=avnadmin \
              -e POSTGRE_PASSWORD=&lt;db_password&gt; \
              -e S3_ENDPOINT=https://s3.gra.io.cloud.ovh.net/ \
              -e S3_BUCKET_NAME=mlflow-s3-bucket \
              -e PG_HOST=&lt;db_hostname&gt; \
              -e PG_DB=defaultdb \
              -e PG_PORT=20184 \
              -e SSL_MODE=require \
              &lt;your_registry_address&gt;/mlflow-server-ai-training:latest</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai job run</code></li>
</ul>



<p>This is the core command to <strong>run a job</strong> using the <strong>OVHcloud AI Training</strong> platform.</p>



<ul class="wp-block-list">
<li><code>--name mlflow-server</code></li>
</ul>



<p>Sets a <strong>custom name</strong> for the job. For example, <code>mlflow-server</code>.</p>



<ul class="wp-block-list">
<li><code>--default-http-port 5000</code></li>
</ul>



<p>Exposes <strong>port 5000</strong> as the default HTTP endpoint. MLflow’s web UI typically runs on port 5000, so this ensures the UI is accessible once the job is running.</p>



<ul class="wp-block-list">
<li><code>--cpu 4</code></li>
</ul>



<p>Allocates <strong>4 CPUs</strong> for the job. You can adjust this based on how heavy your MLflow workload is.</p>



<ul class="wp-block-list">
<li><code>-v mlflow-s3-bucket@DEMO/:/artifacts:RW:cache</code></li>
</ul>



<p>This mounts your <strong>OVHcloud Object Storage volume</strong> into the job’s file system:<br>&#8211; <code>mlflow-s3-bucket@DEMO/</code>: refers to your <strong>S3 bucket volume</strong> from the OVHcloud Object Storage<br>&#8211; <code>:/artifacts</code>: mounts the volume into the container under <code>/artifacts</code><br>&#8211; <code>RW</code>: enables <strong>Read/Write</strong> permissions<br>&#8211; <code>cache</code>: enables <strong>volume caching</strong>, improving performance for frequent reads/writes</p>



<ul class="wp-block-list">
<li><code>-e POSTGRE_USER=avnadmin</code></li>



<li><code>-e POSTGRE_PASSWORD=&lt;db_password&gt;</code></li>



<li><code>-e PG_HOST=&lt;db_hostname&gt;</code></li>



<li><code>-e PG_DB=defaultdb</code></li>



<li><code>-e PG_PORT=20184</code></li>



<li><code>-e SSL_MODE=require</code></li>
</ul>



<p>These are <strong>environment variables</strong> for connecting to the <strong>PostgreSQL </strong>backend store:<br>&#8211; <code>avnadmin</code>: the default admin user for OVHcloud’s managed PostgreSQL<br>&#8211; <code>POSTGRE_PASSWORD</code>: must be replaced with your actual database password<br>&#8211; <code>PG_HOST</code>: the hostname of your managed PostgreSQL instance<br>&#8211; <code>PG_DB</code>: the name of the database to use (default: <code>defaultdb</code>)<br>&#8211; <code>PG_PORT</code>: the port your PostgreSQL server is listening on<br>&#8211; <code>SSL_MODE</code>: enforces an SSL connection to secure DB traffic</p>



<ul class="wp-block-list">
<li><code>-e S3_ENDPOINT=https://s3.gra.io.cloud.ovh.net/</code></li>
</ul>



<p>Tells MLflow where the <strong>S3-compatible endpoint</strong> is hosted. This is specific to OVHcloud&#8217;s GRA (Gravelines) region Object Storage.</p>



<ul class="wp-block-list">
<li><code>-e S3_BUCKET_NAME=mlflow-s3-bucket</code></li>
</ul>



<p>Sets the <strong>name of the S3 bucket</strong> where MLflow should store artifacts (models, metrics, etc.).</p>



<ul class="wp-block-list">
<li><code>&lt;your_registry_address&gt;/mlflow-server-ai-training:latest</code></li>
</ul>



<p>This is the<strong> custom MLflow Docker image</strong> you are running inside the job.</p>
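<p>For reference, here is a minimal sketch of how a custom image like this could turn those environment variables into the <code>mlflow server</code> launch command. This is only an illustration, not the exact entrypoint shipped in the image; the helper name is made up:</p>

```python
# Minimal sketch: assemble the MLflow server command from the job's
# environment variables. At runtime the container would read os.environ;
# placeholder values are used here.

def build_backend_uri(env):
    # SQLAlchemy-style PostgreSQL URI for the MLflow backend store.
    return (
        f"postgresql://{env['POSTGRE_USER']}:{env['POSTGRE_PASSWORD']}"
        f"@{env['PG_HOST']}:{env['PG_PORT']}/{env['PG_DB']}"
        f"?sslmode={env['SSL_MODE']}"
    )

env = {
    "POSTGRE_USER": "avnadmin", "POSTGRE_PASSWORD": "<db_password>",
    "PG_HOST": "<db_hostname>", "PG_PORT": "20184",
    "PG_DB": "defaultdb", "SSL_MODE": "require",
}

command = (
    "mlflow server --host 0.0.0.0 --port 5000"
    f" --backend-store-uri {build_backend_uri(env)}"
    " --default-artifact-root s3://mlflow-s3-bucket"
)
print(command)
```

<p>The PostgreSQL variables end up in <code>--backend-store-uri</code>, while the bucket becomes the <code>--default-artifact-root</code> where MLflow writes artifacts.</p>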



<p><strong>2. Check if your AI Training job is RUNNING</strong></p>



<p>Replace <code>&lt;job_id&gt;</code> with your own job ID.</p>



<pre class="wp-block-code"><code class="">ovhai job get &lt;job_id&gt;</code></pre>



<p>You should obtain:</p>



<p><code>History:<br>    DATE                  STATE<br>    04-04-25 09:58:00     QUEUED<br>    04-04-25 09:58:01     INITIALIZING<br>    04-04-25 09:58:07     PENDING<br>    04-04-25 09:58:10     <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></strong><br>  Info:<br>    Message:   Job is running</code></p>



<p><strong>3. Retrieve the IP and external IP of your AI Training job</strong></p>



<p>Using your <code>&lt;job_id&gt;</code>, you can retrieve your AI Training <strong>job IP</strong>.</p>



<pre class="wp-block-code"><code class="">ovhai job get &lt;job_id&gt; -o json | jq '.status.ip' -r</code></pre>



<p>For example, you can obtain something like this: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong></p>



<p>You also need the External IP:</p>



<pre class="wp-block-code"><code class="">ovhai job get &lt;job_id&gt; -o json | jq '.status.externalIp' -r</code></pre>



<p>This returns the IP address you will have to whitelist in order to connect to your database (e.g. <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>51.210.38.188</strong></mark>).</p>



<h4 class="wp-block-heading">Step 6 – Whitelist AI Training job IP in PostgreSQL DB</h4>



<p>From <strong>Databases &amp; Analytics &gt; Databases</strong>, edit your DB configuration to <strong>allow access from the job External IP</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="475" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-1024x475.png" alt="" class="wp-image-28593" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-1024x475.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-768x356.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-1536x712.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-2048x950.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, you can see that the job External IP is now whitelisted.</p>



<p>Well done! Your MLflow server and the backend store are now connected.</p>



<h4 class="wp-block-heading">Step 7 – Create an AI Notebook</h4>



<p>It&#8217;s time to train and track your Machine Learning models using MLflow!</p>



<p>To do so, use the OVHcloud <code>ovhai</code> CLI and start a new AI Notebook with GPU.</p>



<pre class="wp-block-code"><code class="">ovhai notebook run conda jupyterlab \
  --name mlflow-notebook \
  --framework-version conda-py311-cudaDevel11.8 \
  --gpu 1</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai notebook run</code></li>
</ul>



<p>This is the core command to <strong>run a notebook</strong> using the <strong>OVHcloud AI Notebooks</strong> platform.</p>



<ul class="wp-block-list">
<li><code>--name mlflow-notebook</code></li>
</ul>



<p>Sets a <strong>custom name</strong> for the notebook. In this case, you can name it <code>mlflow-notebook</code>.</p>



<ul class="wp-block-list">
<li><code>--framework-version conda-py311-cudaDevel11.8</code></li>
</ul>



<p>Defines the framework and version you want to use in your notebook. Here, you are using Python 3.11 with the Conda framework and CUDA 11.8 compatibility.</p>



<ul class="wp-block-list">
<li><code>--gpu 1</code></li>
</ul>



<p>Allocates <strong>1 GPU</strong> for the notebook, by default an NVIDIA <strong>Tesla V100S</strong> (<code>ai1-1-gpu</code>). You can select the flavor you want from the OVHcloud GPU range.</p>



<p>Then, check if your AI Notebook is RUNNING.</p>



<pre class="wp-block-code"><code class="">ovhai notebook get &lt;notebook_id&gt;</code></pre>



<p>Once your notebook is in RUNNING status, you should be able to access it using its URL:</p>



<p><code>State:          <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></strong><br>Duration:       1411412   <br>Url:            <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://&lt;notebook_id&gt;.notebook.gra.ai.cloud.ovh.net</mark></strong><br>Grpc Address:   &lt;notebook_id&gt;.nb-grpc.gra.ai.cloud.ovh.net:443<br>Info Url:       https://ui.gra.ai.cloud.ovh.net/notebook/&lt;notebook_id&gt;</code></p>



<p>You can start your AI model development inside the notebook.</p>



<h4 class="wp-block-heading">Step 8 – Model training inside Jupyter notebook</h4>



<p>To begin with, set up your notebook environment.</p>



<p><strong>1. Create the <code>requirements.txt</code> file</strong></p>



<pre class="wp-block-code"><code class="">numpy==2.2.3
scipy==1.15.2
mlflow==2.20.3
scikit-learn==1.6.1</code></pre>



<p><strong>2. Install dependencies</strong></p>



<p>From a notebook cell, launch the following command.</p>



<pre class="wp-block-code"><code class="">!pip3 install -r requirements.txt</code></pre>



<p>Perfect! You can start coding&#8230;</p>



<p><strong>3. Import Python libraries</strong></p>



<p>Here, you have to import os, mlflow and scikit-learn.</p>



<pre class="wp-block-code"><code class=""># import dependencies
import os
import mlflow
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor</code></pre>



<p>In another notebook cell, set the MLflow tracking URI. Note that you have to replace <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong> with your own <strong>job IP</strong>.</p>



<pre class="wp-block-code"><code class="">mlflow.set_tracking_uri("http://<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong>:5000")</code></pre>



<p>Then start training your model!</p>



<pre class="wp-block-code"><code class="">mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)</code></pre>



<p><strong>Output:</strong></p>



<p><code>🏃 View run dashing-foal-850 at: http://<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong>:5000/#/experiments/0/runs/e7dad7c073634ec28675c0defce2b9ec </code><br><code>🧪 View experiment at: http://<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong>:5000/#/experiments/0</code></p>



<p>Congrats! You can now track your model training from the <strong>MLflow remote server</strong>&#8230;</p>



<h4 class="wp-block-heading">Step 9 – Track and compare models from MLflow remote server</h4>



<p>Finally, access the MLflow dashboard using the job URL: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://&lt;job_id&gt;.job.gra.ai.cloud.ovh.net</code></mark></strong></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="578" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-1024x578.png" alt="" class="wp-image-28598" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-1024x578.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-768x433.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-1536x867.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-2048x1155.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, you can check your model trainings and evaluations:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="577" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-1024x577.png" alt="" class="wp-image-28599" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-1024x577.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-768x433.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-1536x866.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-2048x1154.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>What a success! You can finally use MLflow to evaluate, compare and archive your various training runs.</p>



<h4 class="wp-block-heading">Step 10 &#8211; Monitor everything remotely</h4>



<p>You now have a complete Machine Learning pipeline with remote experiment tracking. Access:</p>



<ul class="wp-block-list">
<li><strong>Metrics, Parameters, and Tags</strong> → PostgreSQL</li>



<li><strong>Artifacts (Models, Files)</strong> → S3 bucket</li>
</ul>
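<p>Beyond the web UI, the same server can be queried programmatically with the <code>mlflow.search_runs</code> API. Below is a small sketch; <code>best_run</code> and <code>find_best_run</code> are illustrative helper names, and <code>metrics.training_score</code> is a metric that scikit-learn autologging typically records:</p>

```python
def best_run(runs, metric):
    """Pick the run record with the highest value for `metric`.

    `runs` is a list of dicts, one per run, as produced by e.g.
    mlflow.search_runs(...).to_dict("records").
    """
    scored = [r for r in runs if r.get(metric) is not None]
    return max(scored, key=lambda r: r[metric])

def find_best_run(tracking_uri, experiment_id, metric):
    # Run this from the notebook: requires mlflow installed and network
    # access to the tracking server (e.g. "http://10.42.80.176:5000").
    import mlflow
    mlflow.set_tracking_uri(tracking_uri)
    runs = mlflow.search_runs(experiment_ids=[experiment_id])
    return best_run(runs.to_dict("records"), metric)

# Offline illustration with fake run records:
records = [
    {"run_id": "a", "metrics.training_score": 0.81},
    {"run_id": "b", "metrics.training_score": 0.93},
]
print(best_run(records, "metrics.training_score")["run_id"])  # b
```

<p>This is handy for automation: a script can pick the best run of an experiment and promote the corresponding model without opening the dashboard.</p>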



<p>This setup is reusable, automatable, and production-ready!</p>



<h2 class="wp-block-heading">What’s next?</h2>



<ul class="wp-block-list">
<li>Automate deployment with <strong><a href="https://eu.api.ovh.com/" data-wpel-link="exclude">OVHcloud APIs</a></strong></li>



<li>Run different training sessions in parallel and compare them with your <strong>remote MLflow tracking server</strong></li>



<li>Use <strong><a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a></strong> to serve your trained models</li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmlflow-remote-tracking-server-ovhcloud-databases-object-storage-ai-solutions%2F&amp;action_name=Reference%20Architecture%3A%20set%20up%20MLflow%20Remote%20Tracking%20Server%20on%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Deep Dive into DeepSeek-R1 &#8211; Part 1</title>
		<link>https://blog.ovhcloud.com/deep-dive-into-deepseek-r1-part-1/</link>
		
		<dc:creator><![CDATA[Fabien Ric]]></dc:creator>
		<pubDate>Thu, 06 Mar 2025 09:56:20 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[DeepSeek]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28199</guid>

					<description><![CDATA[Introduction A few weeks ago, the release of the open-source large language model DeepSeek-R1 has taken the AI world by storm. The Chinese research team claimed their new reasoning model was on par with OpenAI&#8217;s flagship model o1, open-sourced the model and gave details about the work behind it. In this blog post series, we [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-dive-into-deepseek-r1-part-1%2F&amp;action_name=Deep%20Dive%20into%20DeepSeek-R1%20%26%238211%3B%20Part%201&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="512" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1024x512.png" alt="A cute whale with a baseball cap, using a computer, representing DeepSeek." class="wp-image-28353" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1024x512.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-300x150.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-768x384.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1536x768.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Introduction</h2>



<p>A few weeks ago, the release of the open-source large language model DeepSeek-R1 took the AI world by storm. The Chinese research team claimed their new reasoning model was on par with OpenAI&#8217;s flagship model o1, open-sourced the model and gave details about the work behind it.</p>



<p>In this blog post series, we will dive into the DeepSeek-R1 model family and see how you can run it on OVHcloud to build a simple chatbot that handles reasoning.</p>



<p>The &#8220;R&#8221; in DeepSeek-R1 stands for &#8220;Reasoning&#8221;, so let&#8217;s start by defining what a reasoning model is.</p>



<h2 class="wp-block-heading">What are reasoning models?</h2>



<p>Reasoning models are large language models (LLM) capable of reflecting on a problem before generating an answer. Traditionally, LLMs have been improved by spending more compute at training time (more data, more parameters, more training iterations): this is <strong>training-time compute</strong>. Reasoning models, however, differ from standard LLMs in the way they use <strong>test-time compute</strong>: during inference, they spend more time and resources to generate and refine a better answer.</p>



<p>Reasoning models excel at tasks that require understanding and working through a problem step-by-step, such as mathematics, riddles, puzzles, coding, planning tasks and agentic workflows. They may be counterproductive for use cases that don&#8217;t require reasoning capabilities, such as factual knowledge questions (for example, <em>who discovered penicillin</em>).</p>



<p>In a classroom, a reasoning model would be the student that takes time to understand the question, splits the problem into manageable steps and details the resolution process before rushing to write the answer.</p>



<p>Here is a comparison between the outputs of a standard LLM and a reasoning LLM, on an example prompt:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd4fc5c3422&quot;}" data-wp-interactive="core/image" data-wp-key="69cd4fc5c3422" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1029" height="492" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14.png" alt="A diagram showing the differences between standard LLM and reasoning LLM outputs for a given prompt." class="wp-image-28318" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14.png 1029w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-300x143.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-1024x490.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-768x367.png 768w" sizes="auto, (max-width: 1029px) 100vw, 1029px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>The reasoning model has generated more tokens, showing how it plans to solve the problem before giving the actual answer. In the case of DeepSeek-R1, you can see it generates reasoning content inside <code>&lt;think&gt;...&lt;/think&gt;</code> tags.</p>
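<p>Since the reasoning content is wrapped in <code>&lt;think&gt;</code> tags, client code typically separates the reasoning trace from the final answer before showing it to a user. A minimal stdlib sketch (the helper name is made up):</p>

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    """Return (reasoning, answer) from a DeepSeek-R1-style completion.

    Everything inside <think>...</think> is the reasoning trace; the
    remainder of the text is the user-facing answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2: add the units digits.</think>The answer is 4."
)
print(answer)  # The answer is 4.
```

<p>A chatbot UI would typically hide or collapse the reasoning part and display only the answer.</p>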



<p>A standard LLM can also show reasoning abilities, which are often more visible when using a technique called <a href="https://arxiv.org/abs/2201.11903" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Chain-of-Thought prompting (CoT)</a>, by adding phrases such as &#8220;let&#8217;s think step-by-step&#8221; to the prompt.</p>
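<p>As an illustration, a CoT prompt for an OpenAI-style chat API could look like this. The payload shape is the standard chat-completion format, but the exact wording of the system prompt is just an example:</p>

```python
def cot_messages(question):
    # Chain-of-Thought prompting: nudge a standard LLM to reason
    # explicitly by asking for step-by-step work before the answer.
    return [
        {"role": "system",
         "content": "You are a careful assistant. Think step-by-step, "
                    "then state the final answer on its own line."},
        {"role": "user", "content": question},
    ]

payload = {
    "model": "<your_model>",  # placeholder
    "messages": cot_messages("A bat and a ball cost $1.10 in total. "
                             "The bat costs $1.00 more than the ball. "
                             "How much does the ball cost?"),
}
print(payload["messages"][0]["content"])
```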



<p>However, a reasoning LLM has been trained to behave this way. Its reasoning skill is internalized, so it doesn&#8217;t require specific prompting techniques to trigger the chain of thoughts process.</p>



<p>It&#8217;s important to note that DeepSeek-R1 is not the first reasoning model; OpenAI led the way by releasing their o1 model in September 2024.</p>



<p>The two main reasons why DeepSeek-R1 made headlines are its open-source nature and the paper released by the research team, which gives many details on how they trained the model, with valuable insights for the open-source community to create reasoning models. In particular, the key highlight of the paper is the observation that reasoning behavior can emerge through Reinforcement Learning (RL) alone, without supervised fine-tuning.</p>



<h2 class="wp-block-heading">The DeepSeek-R1 model family</h2>



<p>You may have heard about DeepSeek-R1 but it&#8217;s not the only model of the DeepSeek family: DeepSeek-V3, DeepSeek-R1-Zero, and distilled models, are also available. So what are the differences between those models?</p>



<p>First, let&#8217;s go through some definitions and an overview of how language models are trained.</p>



<h3 class="wp-block-heading">Language model training overview</h3>



<p>The large language models available in apps and playgrounds are usually trained in 3 steps:</p>



<ol class="wp-block-list">
<li>A <strong>base model</strong> is trained on an unsupervised language modeling task (for instance, next token prediction) with a dataset of trillions of tokens (also called <em>pre-training</em>),</li>



<li>An <strong>instruct model </strong>is trained from the base model, by fine-tuning it on a massive dataset of instructions, conversations, questions and answers, to improve the performance of the model on the prompts frequently encountered in a chat,</li>



<li>The <strong>final model</strong> is the instruct model trained to better handle human preferences, avoid the generation of harmful content, etc. with techniques such as RLHF (reinforcement learning from human feedback) and DPO (direct preference optimization).</li>
</ol>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd4fc5c39ff&quot;}" data-wp-interactive="core/image" data-wp-key="69cd4fc5c39ff" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1459" height="239" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image.png" alt="A diagram showing the 3 training steps of a LLM." class="wp-image-28268" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image.png 1459w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-300x49.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-1024x168.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-768x126.png 768w" sizes="auto, (max-width: 1459px) 100vw, 1459px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p></p>



<h3 class="wp-block-heading">DeepSeek-V3 training</h3>



<p>According to the <a href="https://arxiv.org/pdf/2412.19437" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">technical report provided by DeepSeek</a>, DeepSeek-V3 is a mixture-of-experts (MoE) language model trained with the same kind of process, which is described in the image below:</p>



<ul class="wp-block-list">
<li><strong>DeepSeek-V3-Base</strong> is trained with 14.8 trillion tokens,</li>



<li>A dataset of 1.5 million instruction examples is used to fine-tune the base model,</li>



<li>This instruct model goes through reinforcement learning with several reward models. The final model is <strong>DeepSeek-V3</strong>.</li>
</ul>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd4fc5c3f71&quot;}" data-wp-interactive="core/image" data-wp-key="69cd4fc5c3f71" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1453" height="242" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8.png" alt="A diagram showing the 3 training steps of DeepSeek-V3." class="wp-image-28288" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8.png 1453w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-300x50.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-1024x171.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-768x128.png 768w" sizes="auto, (max-width: 1453px) 100vw, 1453px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>For the reinforcement learning step, DeepSeek uses their algorithm called <strong>GRPO</strong> (<a href="https://arxiv.org/pdf/2402.03300" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">group relative policy optimization</a>), which uses several reward models to assess the quality of the content generated by the model. The scores given by the reward models are combined into a final score, which is used to update the model so that it maximizes its global score the next time.</p>
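<p>To give an intuition of the &#8220;group relative&#8221; part, here is a toy sketch: per-response rewards are combined into one scalar, then each response in a group sampled for the same prompt is scored relative to the rest of the group. This is only an illustration of the idea, not the full GRPO objective (which also involves a clipped policy-ratio term and a KL penalty); the weights and scores are made up:</p>

```python
from statistics import mean, pstdev

def combine_rewards(scores, weights):
    # Weighted combination of per-reward-model scores into one scalar.
    return sum(w * s for s, w in zip(scores, weights))

def group_relative_advantages(rewards):
    """GRPO-style advantages: each sampled response in a group is scored
    relative to the group mean, normalised by the group's std deviation."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to the same prompt, each with a combined reward:
print(group_relative_advantages([0.2, 0.8, 0.5, 0.5]))
```

<p>Responses above the group average get a positive advantage and are reinforced; responses below it are discouraged, without needing a separate value model.</p>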



<h3 class="wp-block-heading">DeepSeek-R1 model series training</h3>



<p><strong>DeepSeek-R1</strong> models are built with a different training pipeline, using the base model of DeepSeek-V3. The diagram below shows the main steps of the process designed by DeepSeek to create several reasoning models mentioned in their <a href="https://arxiv.org/pdf/2501.12948" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">technical report</a>:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd4fc5c44ea&quot;}" data-wp-interactive="core/image" data-wp-key="69cd4fc5c44ea" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1262" height="1323" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12.png" alt="A diagram showing the training process of DeepSeek-R1, DeepSeek-R1-Zero and DeepSeek-Distill models." class="wp-image-28301" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12.png 1262w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-286x300.png 286w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-977x1024.png 977w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-768x805.png 768w" sizes="auto, (max-width: 1262px) 100vw, 1262px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>Let&#8217;s walk through it step-by-step (no pun intended):</p>



<p>1. The main breakthrough described in DeepSeek&#8217;s paper: they managed to train the DeepSeek-V3-Base 671B model to learn the reasoning capability with reinforcement learning only, which doesn&#8217;t require labeled data, as opposed to supervised fine-tuning. They use the same GRPO algorithm as before, with two rewards. The first reward assesses the accuracy of the generated content, using &#8220;rule-based&#8221; experts instead of full reward models, which would themselves have to be trained and would require significant resources. For example, to assess whether the model generated correct Python code, one expert could compile the generated code and give a score based on the number of errors, while another expert could generate test cases and check whether the generated code passes them. The second reward concerns the format of the model&#8217;s responses, which must enclose the reasoning content in <code>&lt;think&gt;...&lt;/think&gt;</code> tags. The resulting model is <strong>DeepSeek-R1-Zero.</strong> However, it has limitations that make it unsuitable for direct use, such as language mixing and poor readability.</p>
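<p>The two rule-based rewards described above can be sketched as follows. This is purely illustrative; the paper does not publish the actual reward code, and real pipelines would also execute test cases rather than only checking syntax:</p>

```python
import re

def format_reward(response):
    # 1.0 if the response wraps its reasoning in <think>...</think>
    # before the answer, as required during DeepSeek-R1-Zero training.
    return 1.0 if re.match(r"(?s)\s*<think>.*?</think>", response) else 0.0

def code_accuracy_reward(source):
    # "Rule-based" accuracy check for generated Python code: does it
    # even compile? A fuller expert would also run generated test cases.
    try:
        compile(source, "<generated>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0

print(format_reward("<think>steps...</think>42"))  # 1.0
print(code_accuracy_reward("def f(:"))             # 0.0
```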



<p>2. To overcome these limitations, DeepSeek uses DeepSeek-R1-Zero to create a cold-start reasoning dataset, augmented with other data from sources not explicitly mentioned. DeepSeek-V3-Base is trained with this cold-start data, before applying a new round of reinforcement learning.</p>



<p>3. They use the same RL approach to get a new reasoning model that generates higher-quality output. Using this model, they build a 100x bigger reasoning dataset, growing from 5k to 600k samples, using DeepSeek-V3 as a quality judge. This dataset is then completed with 200k samples generated with DeepSeek-V3 on non-reasoning tasks.</p>



<p>4. A second stage of supervised fine-tuning is done with the dataset built earlier.</p>



<p>5. The model is then aligned with human preferences with a final round of reinforcement learning with a specific human preferences reward. The resulting model is <strong>DeepSeek-R1</strong>.</p>



<p>6. Finally, DeepSeek experimented with fine-tuning much smaller models than DeepSeek-V3 (LLaMa 3.3 70B, Qwen 2.5 32B&#8230;) with the dataset built at step 3. In the paper, they call this process <strong>distillation</strong>. However, it must not be confused with the <em>knowledge distillation</em> technique frequently used in deep learning, where a student model learns from the probability distribution of a teacher model. Here, the term &#8220;distillation&#8221; refers to the fact that the reasoning skill is &#8220;distilled&#8221; into the base model, but it&#8217;s plain old supervised fine-tuning. This is how the <strong>DeepSeek-R1-Distill </strong>model series is trained. The quality of the dataset enables the resulting distilled models to beat much larger models on reasoning tasks, as shown in the benchmark below:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd4fc5c4a0c&quot;}" data-wp-interactive="core/image" data-wp-key="69cd4fc5c4a0c" class="wp-block-image aligncenter size-full is-resized wp-lightbox-container"><img loading="lazy" decoding="async" width="770" height="312" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13.png" alt="A screen capture of benchmark data table." class="wp-image-28310" style="width:750px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13.png 770w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13-300x122.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13-768x311.png 768w" sizes="auto, (max-width: 770px) 100vw, 770px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption"><em>Benchmark of distilled models on several reasoning tasks (source: DeepSeek R1 technical paper)</em></figcaption></figure>



<h3 class="wp-block-heading">Recap</h3>



<p>The table below summarizes the differences between the models of the DeepSeek-R1 series:</p>



<figure class="wp-block-table"><table><tbody><tr><td>Model</td><td>Description</td></tr><tr><td>DeepSeek-R1-Zero</td><td>Intermediate 671B reasoning model trained from DeepSeek-V3 exclusively with reinforcement learning, and used to bootstrap DeepSeek-R1 training.</td></tr><tr><td>DeepSeek-R1</td><td>671B reasoning model trained from DeepSeek-V3.</td></tr><tr><td>DeepSeek-R1-Distill</td><td>Smaller models fine-tuned for reasoning with a dataset generated by an intermediate version of DeepSeek-R1.</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">Run DeepSeek-R1 on OVHcloud</h2>



<p>Now that we&#8217;ve seen the differences between all DeepSeek models, let&#8217;s try to use them!</p>



<h3 class="wp-block-heading">AI Endpoints</h3>



<p>The fastest way to test DeepSeek-R1 is to use OVHcloud<strong> AI Endpoints</strong>.</p>



<p><strong>DeepSeek-R1-Distill-Llama-70B</strong> is already available, ready to use and optimized for inference speed. Check it out here: <a href="https://endpoints.ai.cloud.ovh.net/models/a011515c-0042-41b2-9a00-ec8b5d34462d" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://endpoints.ai.cloud.ovh.net/models/a011515c-0042-41b2-9a00-ec8b5d34462d</a></p>



<p>AI Endpoints makes it easy to integrate AI into your applications with a simple API call, without the need for deep AI expertise or infrastructure management. And while it’s in beta, it’s <strong>free</strong>!</p>



<p>Here is an example cURL command to use DeepSeek-R1 Distill Llama 70B on the OpenAI compatible endpoint provided by OVHcloud AI Endpoints:</p>



<pre class="wp-block-code"><code class="">curl -X 'POST' \
  'https://deepseek-r1-distill-llama-70b.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_tokens": 4096,
  "messages": [
    {
      "content": "How can I calculate an approximation of Pi in Python?",
      "role": "user"
    }
  ],
  "model": null,
  "seed": null,
  "stream": false,
  "temperature": 0.7,
  "top_p": 1
}'</code></pre>



<p>We can see in the output the thinking process followed by the answer, both of which have been truncated for clarity.</p>



<pre class="wp-block-code"><code class="">{
    "id": "chatcmpl-8c21b2e3fac44d43b63c06fa25e58091",
    "object": "chat.completion",
    "created": 1741199564,
    "model": "DeepSeek-R1-Distill-Llama-70B",
    "choices":
    [
        {
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "&lt;think&gt;\nOkay, the user is asking how to approximate Pi using Python. I need to think about different methods they can use. Let's see, there are a few common approaches. \n\nFirst, there's the Monte Carlo method. ... Let me structure the response with each method as a separate section, explaining what it is, how it works, and providing the code. Then, the user can pick which one they prefer based on their situation.\n&lt;/think&gt;\n\nThere are several ways to approximate the value of Pi (π) using Python. Below are a few methods:\n\n### 1. Using the Monte Carlo Method..."
            },
            "finish_reason": "stop",
            "logprobs": null
        }
    ],
    "usage":
    {
        "prompt_tokens": 14,
        "completion_tokens": 1377,
        "total_tokens": 1391
    }
}</code></pre>
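<p>When integrating a reasoning model into an application, you will usually want to separate the thinking process from the final answer before displaying it. Here is a minimal sketch of how this could be done in Python (the <code>split_reasoning</code> helper and the sample string are ours for illustration, not part of the API):</p>

```python
import re

def split_reasoning(content: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 style completion into (thinking, answer)."""
    match = re.search(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    # No <think> block: treat the whole content as the answer.
    return "", content.strip()

reply = "<think>\nThe user asks how to approximate Pi...\n</think>\n\nUse the Monte Carlo method."
thinking, answer = split_reasoning(reply)
print(thinking)  # The user asks how to approximate Pi...
print(answer)    # Use the Monte Carlo method.
```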



<p>Stéphane Philippart, Developer Relations Advocate at OVHcloud, has written a blog post covering everything you need to know to get up to speed with AI Endpoints and run this model: <a href="https://blog.ovhcloud.com/release-of-deepseek-r1-on-ovhcloud-ai-endpoints/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Release of DeepSeek-R1 on OVHcloud AI Endpoints</a></p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>What if you want to run another version of DeepSeek-R1, such as the Qwen 7B distilled version?</p>



<p>You can use another OVHcloud AI product, <strong>AI Deploy</strong>, to create your own serving endpoint, with <a href="https://docs.vllm.ai/en/stable/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">vLLM</a> as the inference engine. It is open-source, fast and well maintained, ensuring maximal compatibility with even the most recent AI models.</p>



<p>Eléa Petton, Solution Architect at OVHcloud, has written a blog post explaining in detail how to serve an open-source model with vLLM on AI Deploy. Just replace the Mistral Small model with the DeepSeek distilled version you want to use (e.g. <strong>deepseek-ai/DeepSeek-R1-Distill-Qwen-7B</strong>) and adapt the number of L40S cards needed (one is enough for the 7B version): <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Mistral Small 24B served with vLLM and AI Deploy – a single command to deploy an LLM (Part 1)</a></p>



<h3 class="wp-block-heading">Next up, creating a reasoning chatbot with DeepSeek-R1</h3>



<p>In part 2 of this blog post series, we will use a DeepSeek-R1-Distill model to create a chatbot that will handle reasoning gracefully, by showing the thinking process of the model.</p>



<p>We will develop our chatbot with OVHcloud AI Endpoints and the Python library <a href="https://www.gradio.app/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Gradio</a>, which makes it quick to create simple chat interfaces.</p>



<p>Here is a screenshot of the finalized chatbot we will build:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd4fc5c50ec&quot;}" data-wp-interactive="core/image" data-wp-key="69cd4fc5c50ec" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="723" height="1173" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot.png" alt="A screenshot of a chatbot application developed with DeepSeek-R1 and Gradio in Python." class="wp-image-28328" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot.png 723w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot-185x300.png 185w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot-631x1024.png 631w" sizes="auto, (max-width: 723px) 100vw, 723px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>Stay tuned for the next article in this DeepSeek-R1 series. In the meantime, try out DeepSeek-R1 on AI Endpoints and AI Deploy and let us know what you &lt;think&gt;!</p>



<h3 class="wp-block-heading">Resources</h3>



<p>If you want to learn more about DeepSeek-R1 and the topics we covered in this blog post, such as test-time compute, GRPO, reinforcement learning and reasoning models, we suggest having a look at these resources:</p>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/pdf/2501.12948" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">DeepSeek-R1 technical report</a>, by the DeepSeek team</li>



<li><a href="https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">The Illustrated DeepSeek-R1</a>, by Jay Alammar</li>



<li><a href="https://magazine.sebastianraschka.com/p/understanding-reasoning-llms" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Understanding Reasoning LLMs</a>, by Sebastian Raschka</li>



<li><a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-reasoning-llms" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">A Visual Guide to Reasoning LLMs</a>, by Maarten Grootendorst</li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-dive-into-deepseek-r1-part-1%2F&amp;action_name=Deep%20Dive%20into%20DeepSeek-R1%20%26%238211%3B%20Part%201&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Mistral Small 24B served with vLLM and AI Deploy &#8211; a single command to deploy an LLM (Part 1)</title>
		<link>https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Mon, 24 Feb 2025 10:08:37 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28212</guid>

					<description><![CDATA[You are not dreaming! You can deploy open-source LLM in a single command line. Deploying advanced language models can be a challenge! But this sometimes this arduous task is becoming increasingly accessible, enabling developers to integrate sophisticated AI capabilities into their applications. In this guide, we will walk through deploying the Mistral-Small-24B-Instruct-2501 model using vLLM [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm%2F&amp;action_name=Mistral%20Small%2024B%20served%20with%20vLLM%20and%20AI%20Deploy%20%26%238211%3B%20a%20single%20command%20to%20deploy%20an%20LLM%20%28Part%201%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><strong><em>You are not dreaming! You can deploy an open-source LLM in a single command line</em>.</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="724" src="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1024x724.png" alt="Rocket in MistralAI colors in a data center with a French rooster showing rapid LLM deployment" class="wp-image-28219" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1024x724.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-300x212.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-768x543.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1536x1086.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy.png 2000w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Deploying advanced language models can be a challenge! But this sometimes arduous task is becoming increasingly accessible, enabling developers to integrate sophisticated AI capabilities into their applications.</p>



<p>In this guide, we will walk through deploying the <strong><a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Instruct-2501</a></strong> model using <strong>vLLM</strong> on OVHcloud&#8217;s <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy platform</a>. This combination offers a powerful solution for efficient and scalable AI model serving.</p>



<p>Deploying a model is great, but doing it quickly is even better!</p>



<p>🤯 <strong>What if a single command line was enough?</strong> That&#8217;s the challenge we&#8217;re tackling today!</p>



<h2 class="wp-block-heading">Context</h2>



<p>Before deployment, let’s take a closer look at our key technologies!</p>



<h3 class="wp-block-heading">Mistral Small</h3>



<p>The <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> is a 24-billion-parameter instruction-fine-tuned model, renowned for its compact size and performance comparable to larger models.</p>



<p>This model, from <a href="https://mistral.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">MistralAI</a>, is an instruction-fine-tuned version of the base model:&nbsp;<a href="https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Base-2501</a>.</p>



<p>To serve this model efficiently, we will utilize vLLM, an open-source library for <strong>LLM inference</strong>.</p>



<h3 class="wp-block-heading">vLLM</h3>



<p><a href="https://docs.vllm.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> (<strong>Virtual LLM</strong>) is a highly optimized serving engine designed to efficiently run large language models. It takes advantage of several key optimizations, such as:</p>



<ul class="wp-block-list">
<li><strong>PagedAttention:</strong> an attention mechanism that reduces memory fragmentation and enables more efficient use of GPU memory</li>



<li><strong>Continuous Batching:</strong> vLLM dynamically adjusts batch sizes in real time, ensuring that the GPU is always used efficiently, even with multiple simultaneous requests</li>



<li><strong>Tensor parallelism:</strong> enables model inference across multiple GPUs to boost performance</li>



<li><strong>Optimized kernel implementations:</strong> vLLM uses custom CUDA kernels for faster execution, reducing latency compared to traditional inference frameworks</li>
</ul>



<p>These features make vLLM one of the best choices for large models such as Mistral Small 24B, enabling low-latency, high-throughput inference on the latest GPUs.</p>



<p>By deploying on OVHcloud&#8217;s AI Deploy platform, you can deploy this model in a single command line.</p>



<h3 class="wp-block-heading">AI Deploy </h3>



<p>OVHcloud AI Deploy is a<strong> Container as a Service</strong> (CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications / APIs based on Machine Learning (ML), Deep Learning (DL) or LLMs.</p>



<p>The key benefits are:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong> bring your own custom Docker image and deploy it with a single command line or a few clicks</li>



<li><strong>High-performance computing:</strong> a complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong> supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong> billing per minute, no surcharges</li>
</ul>



<p>✅ To go further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure that you have:</p>



<ul class="wp-block-list">
<li><strong>OVHcloud account</strong>: access to the&nbsp;<a href="https://www.ovh.com/auth/?action=gotomanager&amp;from=https://www.ovh.co.uk/&amp;ovhSubsidiary=GB" data-wpel-link="exclude">OVHcloud Control Panel</a></li>



<li><strong>ovhai CLI available:</strong> install the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ovhai CLI</a></li>



<li><strong>AI Deploy access</strong>: ensure you have a <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">user for AI Deploy</a></li>



<li><strong>Hugging Face access</strong>: create an <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face account</a> and generate an <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">access token</a></li>



<li><strong>Gated model authorization</strong>: be sure you have been granted access to <a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Instruct-2501</a> model</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it&#8217;s time to deploy!</strong></p>



<h2 class="wp-block-heading">Deployment of the Mistral Small 24B LLM</h2>



<p>Let&#8217;s go ahead and deploy the <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> model.</p>



<h3 class="wp-block-heading">Manage access tokens</h3>



<p>Export your <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it is deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>Returning the following output:</p>



<pre class="wp-block-code"><code class="">Id:         47292486-fb98-4a5b-8451-600895597a2b
Created At: 20-02-25 11:53:05
Updated At: 20-02-25 11:53:05
Spec:
  Name:           ai_deploy_token=my_operator_token
  Role:           AiTrainingOperator
  Label Selector: 
Status:
  Value:   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Version: 1</code></pre>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h3 class="wp-block-heading">Launch Mistral Small LLM with AI Deploy</h3>



<p>You are ready to start<strong> Mistral-Small-24B</strong> using vLLM and AI Deploy:</p>



<pre class="wp-block-code"><code class="">ovhai app run --name vllm-mistral-small \
              --default-http-port 8000 \
              --label ai_deploy_token=my_operator_token \
              --gpu 2 \
              --flavor l40s-1-gpu \
              -e OUTLINES_CACHE_DIR=/tmp/.outlines \
              -e HF_TOKEN=$MY_HF_TOKEN \
              -e HF_HOME=/hub \
              -e HF_DATASETS_TRUST_REMOTE_CODE=1 \
              -e HF_HUB_ENABLE_HF_TRANSFER=0 \
              -v standalone:/hub:rw \
              -v standalone:/workspace:rw \
              vllm/vllm-openai:v0.8.2 \
              -- bash -c "python3 -m vllm.entrypoints.openai.api_server \
                        --model mistralai/Mistral-Small-24B-Instruct-2501 \
                        --tensor-parallel-size 2 \
                        --tokenizer_mode mistral \
                        --load_format mistral \
                        --config_format mistral \
                        --dtype half"</code></pre>



<p><strong>How to understand the different parameters of this command?</strong></p>



<h5 class="wp-block-heading">1. Start your AI Deploy app</h5>



<p>Launch a new app using <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ovhai CLI</a> and name it.</p>



<p><code><strong>ovhai app run --name vllm-mistral-small</strong></code></p>



<h5 class="wp-block-heading">2. Define access</h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading">3. Configure GPU resources</h5>



<p>Specify the hardware flavor (<code><strong>l40s-1-gpu</strong></code>), which refers to an <strong>NVIDIA L40S GPU</strong>, and the number of GPUs (<code><strong>2</strong></code>).</p>



<p><code><strong>--gpu 2<br>--flavor l40s-1-gpu</strong></code></p>



<p><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">⚠️WARNING!</mark></strong> For this model, two L40S are sufficient, but if you want to deploy another model, you will need to check which GPU you need. Note that you can also access A100 and H100 GPUs for your larger models.</p>



<h5 class="wp-block-heading">4. Set up environment variables</h5>



<p>Configure caching for the <strong>Outlines library</strong> (used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the <strong>Hugging Face token</strong> (<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the <strong>Hugging Face cache directory</strong> to <code>/hub</code> (where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of <strong>custom remote code</strong> from Hugging Face datasets (required for some model behaviors):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable <strong>Hugging Face Hub transfer acceleration</strong> (to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading">5. Mount persistent volumes</h5>



<p>Mounts <strong>two persistent storage volumes</strong>:</p>



<ul class="wp-block-list">
<li><code>/hub</code> → Stores Hugging Face model files</li>



<li><code>/workspace</code> → Main working directory</li>
</ul>



<p>The <code>rw</code> flag means <strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading">6. Choose the target Docker image</h5>



<p>Uses the <strong><code>vllm/vllm-openai:v0.8.2</code></strong> Docker image (a pre-configured vLLM OpenAI API server).</p>



<p><strong><code>vllm/vllm-openai:v0.8.2</code></strong></p>



<h5 class="wp-block-heading">7. Running the model inside the container</h5>



<p>Runs a<strong> bash shell</strong> inside the container and executes a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong> → Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Mistral-Small-24B-Instruct-2501</code></strong> → Loads the <strong>Mistral Small 24B</strong> model from Hugging Face</li>



<li><strong><code>--tensor-parallel-size 2</code></strong> → Distributes the model across <strong>2 GPUs</strong></li>



<li><strong><code>--tokenizer_mode mistral</code></strong> → Uses the <strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong> → Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong> → Ensures the model configuration follows Mistral&#8217;s standard</li>



<li><strong><code>--dtype half</code></strong> → Uses <strong>FP16 (half-precision floating point)</strong> for optimized GPU performance</li>
</ul>



<p>You can now check if your <strong>AI Deploy</strong> app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p>💡<strong>Is your app in <code>RUNNING</code> status?</strong> Perfect! You can check in the logs that the server has started&#8230;</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">⚠️WARNING!</mark></strong> This step may take a little time as the model must be loaded&#8230;<br>After a few minutes, you should get the following information in the logs:</p>



<pre class="wp-block-code"><code class="">2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Started server process [13]
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Waiting for application startup.
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Application startup complete.
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)</code></pre>



<p>🚦 <strong>Are all the indicators green? </strong>Then it&#8217;s off to inference!</p>



<h3 class="wp-block-heading">Request and send prompt to the LLM</h3>



<p>Launch the following query by asking the question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \
  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}
    ],
    "stream": false
  }'</code></pre>



<p>Returning the following result:</p>



<pre class="wp-block-code"><code class="">{
  "id":"chatcmpl-d6ea734b524bd851668e71d4111ba496",
  "object":"chat.completion",
  "created":1740059807,
  "model":"mistralai/Mistral-Small-24B-Instruct-2501",
  "choices":[
    {
      "index":0,
      "message":{
        "role":"assistant",
        "reasoning_content":null, 
        "content":"The founder of OVHcloud is Octave Klaba.",
        "tool_calls":[]
      },
      "logprobs":null,
      "finish_reason":"stop",
      "stop_reason":null
    }
  ],
  "usage":{
    "prompt_tokens":22,
    "total_tokens":35,
    "completion_tokens":13,
    "prompt_tokens_details":null
  },
  "prompt_logprobs":null
}</code></pre>
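<p>To use this result programmatically, simply parse the JSON and extract the assistant&#8217;s message. A minimal sketch, with the (abridged) response above hard-coded as a string for illustration:</p>

```python
import json

# The (abridged) JSON returned by the vLLM server, as a raw string.
raw = '''{
  "model": "mistralai/Mistral-Small-24B-Instruct-2501",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "The founder of OVHcloud is Octave Klaba."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 22, "total_tokens": 35, "completion_tokens": 13}
}'''

response = json.loads(raw)
# The generated text lives in the first choice's message content.
answer = response["choices"][0]["message"]["content"]
print(answer)  # The founder of OVHcloud is Octave Klaba.
```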



<h2 class="wp-block-heading">Conclusion</h2>



<p>By following these steps, you have successfully deployed the <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> model using <strong>vLLM</strong> on OVHcloud&#8217;s AI Deploy platform. This setup provides a scalable and efficient solution for serving advanced language models in production environments.</p>



<p>For further customization and optimization, refer to the <a href="https://docs.vllm.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM documentation</a> and the <a href="https://help.ovhcloud.com/csm/en-ie-documentation-public-cloud-ai-and-machine-learning-ai-deploy?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=3241efc6a052d910f078d4b4ef43651f&amp;spa=1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Deploy resources</a>.</p>



<p>💪 <strong>Challenge completed!</strong> You can now enjoy the power of your LLM, deployed with a single command line!</p>



<p>Want even more simplicity? You can also use ready-to-use APIs with <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>!</p>



<p><strong><em>But… what’s next?</em></strong></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm%2F&amp;action_name=Mistral%20Small%2024B%20served%20with%20vLLM%20and%20AI%20Deploy%20%26%238211%3B%20a%20single%20command%20to%20deploy%20an%20LLM%20%28Part%201%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Five ways to develop sovereign, sustainable AI solutions</title>
		<link>https://blog.ovhcloud.com/five-ways-to-develop-sovereign-sustainable-ai-solutions/</link>
		
		<dc:creator><![CDATA[Cezary Skarzynski]]></dc:creator>
		<pubDate>Mon, 27 Jan 2025 15:07:21 +0000</pubDate>
				<category><![CDATA[OVHcloud Startup Program]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[Data Sovereignty]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Startup Program]]></category>
		<category><![CDATA[Sustainability]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28039</guid>

					<description><![CDATA[Now that organisations understand AI and what it can achieve, businesses around the world are focusing on how to build it responsibly. Three of the five main themes at the Paris AI Action Summit examine the need for responsible AI, with separate streams on trust, public interest and good governance. These themes are not simple. [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ffive-ways-to-develop-sovereign-sustainable-ai-solutions%2F&amp;action_name=Five%20ways%20to%20develop%20sovereign%2C%20sustainable%20AI%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>Now that organisations understand AI and what it can achieve, businesses around the world are focusing on how to build it responsibly. Three of the five main themes at the <a href="https://www.elysee.fr/en/sommet-pour-l-action-sur-l-ia" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Paris AI Action Summit</a> examine the need for responsible AI, with separate streams on trust, public interest and good governance.</p>



<p>These themes are not simple. In addition to the core function of AI tools – for example, considering what an AI app does, how it does it, and whether bias is present – most businesses are starting to realise that they need to consider the deeper ‘AI supply chain’.</p>



<p>This is not just altruistic. A number of LLM tools currently face the risk of copyright-infringement lawsuits, because they may have been trained on content without due permission. AI tools that present biased results are quickly exposed in the press, leading to reputational damage and a loss of customer trust. Some countries also have legislation permitting data usage for economic intelligence purposes – but in another region, this may represent a data breach. AI has also received negative publicity for ‘running hot’ and consuming large amounts of energy and water in datacenters.</p>



<p>However, AI can also be a tremendous force for good – if handled correctly. So, what should businesses be thinking about so that they get the most from AI, without incurring undue commercial or reputational risk?</p>



<h5 class="wp-block-heading"><strong>1- Consider Sovereignty from the Start</strong></h5>



<p>Understand your data ‘supply chain’ from the very beginning of the process. For example, if you’re using an external LLM for a chatbot, where was this developed? Which data was it trained on, and was this data acquired ethically?</p>



<p>“AI can often be a black box when it comes to processing data,” says Lex Avstreikh, Strategy Lead for Stockholm-based AI firm Hopsworks. “It’s far too complex to show how the system arrived at any one decision. But if you can show people the inputs and the outputs, then that goes a long way to building transparency and trust.”</p>



<h5 class="wp-block-heading"><strong>2- Plan for a Sovereign Future</strong></h5>



<p>It’s important to think about where data will be during its future lifecycle – will you be running in an external datacenter, and where will data be in transit and at rest? Where are the headquarters of the datacenter company in question and what does this mean from a regulatory and handling perspective? Perhaps most importantly, will your customers be happy with all of these arrangements?</p>



<p>This was the decision journey faced by Swedish AI firm Ebbot. In July 2020, the Data Protection Commissioner v. Facebook Ireland and Schrems case, commonly referred to as Schrems II, saw the Court of Justice of the European Union (CJEU) invalidate the EU-US Privacy Shield and tighten the requirements for transferring personal data outside the EU. Ebbot recognised the importance of data security and compliance and thus made it a priority to store and process all data within the EU.</p>



<h5 class="wp-block-heading"><strong>3- Location, location, location</strong></h5>



<p>Location isn’t just an important sovereignty concern – it’s also crucial to sustainability. Although Scandinavia may have very green energy, it’s easy to forget that many cloud providers will offer geographical ‘computing zones’ rather than defined locations, which can result in a less green footprint. CPU- and GPU-intensive tasks like model training should be run in green energy zones wherever possible, and are rarely latency-dependent; consequently, you can locate them far away if necessary.</p>



<p>When your AI app goes into production, also remember that backup and redundancy are a necessity – but will also increase your carbon footprint. Consider having a ‘low power’ or passive backup if commercially feasible – it will take longer to bring online in the case of emergency, but you’ll be consuming less power.</p>



<h5 class="wp-block-heading"><strong>4- Always Consider Necessity</strong></h5>



<p>A lot of organisations only consider hardware efficiency and power consumption during the development process, but green software is rapidly gaining popularity. Having efficient code which is still fit for purpose can have a huge impact on power consumption, particularly if you’re building an app for very broad use. “We’ll definitely see more efficient and specific LLMs, because they’re absolutely needed,” added Avstreikh.</p>



<p>Organisations often track the cost of development through FinOps initiatives, and we are now seeing the dawn of GreenOps, which ensures that technology is as green as possible from end to end. To that effect, consider benchmarking the CPU and memory usage of your application, because less hardware-intensive apps are usually less power-hungry.</p>



<h5 class="wp-block-heading"><strong>5- Re-use, recycle</strong></h5>



<p>Developing bespoke code can ensure that it’s as lean and efficient as possible, but it can also use needless computing power to develop. Many technology organisations offer PaaS solutions that can automate common parts of the application development and deployment process. For example, consider our <a href="https://ovh.commander1.com/c3/?tcs=3810&amp;chn=display&amp;src=partnership&amp;cty_ads=multi&amp;lang_ads=en&amp;cty=US&amp;unvrse=multi&amp;pcat=multi&amp;subtpc=undefinite&amp;tactic=awrns&amp;objv=impressions&amp;site_domain=https://labs.ovhcloud.com&amp;cmp=display_PR_multi_en_US_multi_multi_undefinite_awrns_impressions&amp;crtive=dimg_image_728x90_STN-NE&amp;url=https%3A%2F%2Flabs.ovhcloud.com%2Fen%2Fai-endpoints%2F%3Fat_medium%3Ddisplay%26at_campaign%3Dpartnership%26at_creation%3Ddisplay_PR_multi_en_US_multi_multi_undefinite_awrns_impressions%26at_variant%3Ddimg_image_728x90_STN-NE" data-wpel-link="exclude">AI Endpoints solution</a>, which helps developers to access other AI models, from Bert to Mistral to Llama, all using a simple API.</p>



<p>This is not an easy process, but establishing responsible AI conduct in your organisation’s DNA will avoid complications further down the road, and will also show customers that you are handling data – including theirs – in a responsible, secure way. With increasing numbers of organisations tracking not only their scope 3 emissions, but also their data supply chains in a more comprehensive fashion, sovereignty and sustainability are two clear ‘musts’ for any modern AI company.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<figure class="wp-block-image aligncenter size-large is-resized"><a href="https://startup.ovhcloud.com/en/accelerator/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img loading="lazy" decoding="async" width="1024" height="253" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/FF-banner-1024x253.png" alt="" class="wp-image-28042" style="width:626px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/FF-banner-1024x253.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/FF-banner-300x74.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/FF-banner-768x190.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/FF-banner-1536x379.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/FF-banner.png 1870w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>






<p><em>If you’re a startup or scale-up building an AI solution, and would like to work with a sovereign, sustainable cloud provider in turn, you can find more information about OVHcloud – including our cloud credit scheme – on our <a href="https://ovh.commander1.com/c3/?tcs=3810&amp;chn=display&amp;src=partnership&amp;cty_ads=multi&amp;lang_ads=en&amp;cty=US&amp;unvrse=multi&amp;pcat=multi&amp;subtpc=undefinite&amp;tactic=awrns&amp;objv=impressions&amp;site_domain=https://startup.ovhcloud.com&amp;cmp=display_PR_multi_en_US_multi_multi_undefinite_awrns_impressions&amp;crtive=dimg_image_728x90_STN-NE&amp;url=https%3A%2F%2Fstartup.ovhcloud.com%2Fen%2F%3Fat_medium%3Ddisplay%26at_campaign%3Dpartnership%26at_creation%3Ddisplay_PR_multi_en_US_multi_multi_undefinite_awrns_impressions%26at_variant%3Ddimg_image_728x90_STN-NE" data-wpel-link="exclude">startup hub</a>.</em></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ffive-ways-to-develop-sovereign-sustainable-ai-solutions%2F&amp;action_name=Five%20ways%20to%20develop%20sovereign%2C%20sustainable%20AI%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Introducing OVHcloud’s Trusted and Innovative AI Ecosystem</title>
		<link>https://blog.ovhcloud.com/introducing-ovhclouds-trusted-and-innovative-ai-ecosystem/</link>
		
		<dc:creator><![CDATA[Gilles Closset]]></dc:creator>
		<pubDate>Tue, 21 Jan 2025 13:26:19 +0000</pubDate>
				<category><![CDATA[Ecosystem]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Partner Program]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Startup Program]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=27953</guid>

					<description><![CDATA[Artificial intelligence (AI) has become the most transformative force in the global economy, impacting every sector from healthcare to finance to the public sector. New and innovative capabilities come from all parts of the technology ecosystem and from all regions of the world. Every week, almost every day! The momentum in this space is incredible. [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fintroducing-ovhclouds-trusted-and-innovative-ai-ecosystem%2F&amp;action_name=Introducing%20OVHcloud%E2%80%99s%20Trusted%20and%20Innovative%20AI%20Ecosystem&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>Artificial intelligence (AI) has become the most transformative force in the global economy, impacting every sector from healthcare to finance to the public sector.</p>



<p>New and innovative capabilities come from all parts of the technology ecosystem and from all regions of the world. Every week, almost every day!</p>



<p>The momentum in this space is incredible. In fact, we&#8217;ve seen a significant acceleration in the number of AI startups joining the OVHcloud Startup Program, as well as partners and software publishers adding AI expertise and capabilities to their portfolios.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Aligned with our DNA of a Trusted &amp; Sustainable Cloud, OVHcloud is committed to supporting AI innovation that adheres to core values.</p>
</blockquote>



<p>To help customers and developers harness this innovation, we’re bringing the best of OVHcloud’s infrastructure, AI products, and state-of-the-art models to members of our Ecosystem at every layer of the AI stack: chipmakers, model builders and AI platforms, technology partners enabling companies to develop and deploy machine learning (ML) models, app-builders solving customer use-cases with generative AI, and global services and consulting firms that help enterprise customers implement all of this technology at scale.</p>



<p>Let’s take a deep dive into the partnerships, programs, and resources for each segment of the ecosystem that showcase our open approach.</p>



<h2 class="wp-block-heading">Building a Trusted and Innovative AI Ecosystem</h2>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="627" height="429" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-9.png" alt="" class="wp-image-27955" style="width:751px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-9.png 627w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-9-300x205.png 300w" sizes="auto, (max-width: 627px) 100vw, 627px" /></figure>



<h3 class="wp-block-heading">Model builders &amp; Chipmakers</h3>



<p>Let’s kickstart the introduction of our AI Ecosystem with the members that are directly integrated into specific OVHcloud AI products, aka Technology Partners.</p>



<p>Companies like <a target="_blank" rel="noreferrer noopener nofollow external" href="https://www.linkedin.com/article/edit/7285968197617352704/#" data-wpel-link="external">Mistral AI</a>, <a target="_blank" rel="noreferrer noopener nofollow external" href="https://www.linkedin.com/article/edit/7285968197617352704/#" data-wpel-link="external">Meta</a> and <a target="_blank" rel="noreferrer noopener nofollow external" href="https://www.linkedin.com/article/edit/7285968197617352704/#" data-wpel-link="external">Stability AI</a> are building open-source foundation models, including LLMs, that can significantly accelerate the development of generative AI and natural language processing (NLP) applications. OVHcloud serves these models to end customers through AI Endpoints, running on its high-performance infrastructure with industry-leading energy efficiency.</p>



<p>AI Endpoints requires no AI expertise or dedicated infrastructure, as the serverless platform provides access to advanced AI models including Large Language Models (LLMs), natural language processing, translation, speech recognition, image recognition, and more. Developers can select from a range of models, including open-source options like Mistral AI, Llama, Whisper, and Stable Diffusion, as well as a variety of optimized models from our Model Builders partners, creating a versatile testing ground for chosen AI models.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Our catalog of AI models is continually expanding, and we are actively seeking new collaborations with partners to integrate proprietary models that address specific use cases.</p>
</blockquote>



<p><a target="_blank" rel="noreferrer noopener nofollow external" href="https://www.linkedin.com/article/edit/7285968197617352704/#" data-wpel-link="external">OVHcloud</a> has also developed strong, long-lasting partnerships with chipmakers like <a target="_blank" rel="noreferrer noopener nofollow external" href="https://www.linkedin.com/article/edit/7285968197617352704/#" data-wpel-link="external">NVIDIA</a> and <a target="_blank" rel="noreferrer noopener nofollow external" href="https://www.linkedin.com/article/edit/7285968197617352704/#" data-wpel-link="external">AMD</a> to deliver tailored services for deep learning, inference and high-performance computing, with the best available GPUs. AI models are becoming more complex due to the rise of conversational AI. Training and inference now require massive computing power and scalability, and OVHcloud keeps pace with industry innovation by integrating the latest GPUs, including for 2025 the AMD MI325X series, and the Nvidia H200 NVL and Blackwell generation. Using industrial innovations, such as water cooling in our servers, allows us to achieve the lowest energy consumption on the market.</p>



<h3 class="wp-block-heading">AI PaaS Solutions &amp; Tools</h3>



<p>Organizations and developers engaged in ambitious AI projects usually employ various tools to facilitate the creation, management, and deployment of their models. These tools assist developers with essential tasks such as automating and optimizing data pipelines, monitoring model performance, managing private datasets, and defining and enforcing safety &amp; security measures related to regulation or specific policies. OVHcloud collaborates with these organizations to address the crucial requirements of machine learning engineers and data scientists.</p>



<p>To meet growing demand from customers and partners building innovative AI services on OVHcloud, many of the leaders in AI solutions are launching new or expanded partnerships with OVHcloud today. Let’s take a look at a few examples:</p>



<ul class="wp-block-list">
<li><a href="https://multiversecomputing.com/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Multiverse Computing</a> are the world leaders in Quantum AI. They apply quantum and quantum-inspired AI to solve complex problems, delivering practical applications and tangible value today.<br></li>



<li><a href="https://www.hopsworks.ai/integrations/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hopsworks</a> seamlessly integrates with and can be deployed on OVHcloud using Kubernetes, allowing users to run feature engineering, training, and batch inference pipelines using Spark, Flink, or Python.<br></li>



<li>With <a href="https://valohai.com/blog/valohai-partners-with-ovhcloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Valohai</a>, access scalable and secure cloud environments without having to rebuild your ML workflows. The integration between the Valohai MLOps platform and OVHcloud makes it easy to access on-demand computational resources. Scale up with ease to meet the needs of your projects, while ensuring data security and regulatory compliance.<br></li>



<li><a href="https://www.lampi.ai/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Lampi AI</a> provides a secure AI platform with the best and latest LLMs to power predictable, fine-tuned AI agents that pick the relevant information from your data and the web, reason, iterate, and tackle complex tasks.<br></li>



<li><a href="https://qdrant.tech/blog/hybrid-cloud-ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Qdrant</a>: Through the seamless integration between Qdrant Hybrid Cloud and OVHcloud, developers and businesses can deploy the fully managed vector database within their existing OVHcloud setups in minutes, enabling faster, more accurate AI-driven insights.</li>
</ul>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Through our support to these critical members, we offer developers the best platform and ecosystem in which to build the next generation of helpful AI applications, and provide customers with a single destination for building, innovating with, and applying AI.</p>
</blockquote>






<h3 class="wp-block-heading">AI Apps addressing End-customers use cases specifically</h3>



<p>OVHcloud is the destination for developers and partners to build the next generation of innovative applications with AI and ML, including exciting new generative AI capabilities.</p>



<p>Much innovation in the generative AI space comes from fast-moving, early-stage startups. They excel at developing new applications designed to address very specific end-customer use cases. Some differentiate through their model(s), either proprietary or fine-tuned, and make it available through an inference API or in their app. Others bring value by developing applications or user interfaces on top of “General-Purpose AI Models” &#8211; so-called API wrappers – by knowing precisely the business workflow of their customers. This may translate into AI agents capable of performing tasks independently, without the need for constant human oversight.</p>



<p>Many AI startups are choosing OVHcloud not only for industry-leading Sustainable &amp; Trusted Cloud infrastructure, but also for fully managed AI services, which make it faster and easier to scale, at the best price and with no lock-in.</p>



<p>Let’s review some of them:</p>



<ul class="wp-block-list">
<li><a href="https://www.illuin.tech/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ILLUIN Technology</a> provides a powerful low-code multimodal AI orchestration platform that enables you to hybridize different AI approaches and models to implement and industrialize your most complex customized use cases, including AI Agents.<br></li>



<li><a href="https://www.moin.ai/en" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">moinAI</a> uses AI to automatically resolve recurring customer enquiries &#8211; across multiple channels and in various languages &#8211; with minimal effort. Chatbots, live chat and product advisors allow companies to communicate quickly and efficiently with customers on the website around the clock.<br></li>



<li><a href="https://www.catch.hr/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">catchHR</a> uses AI to streamline recruitment, automating tasks like job posting, candidate sourcing, and skill matching to save time and boost efficiency. AI-generated job ads attract top talent, while AI-powered candidate analysis ensures a strong match between applicants and roles, considering both skills and personality fit.<br></li>



<li><a href="https://rayscape.ai/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Rayscape</a> has already demonstrated excellent results in 150+ clinics and hospitals. Its AI is trained on more than 43 million images from all around the globe and powers predictive insights, automated analysis, efficient workflows to prioritize cases based on urgency, and generates structured reports.<br></li>



<li><a href="https://www.factiverse.ai/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Factiverse</a> offers AI-powered solutions to enhance content credibility and streamline fact-checking processes. Their offerings include Factiverse GPT and an advanced text editor that identifies and verifies factual statements in your content, highlighting claims and providing links to credible sources to assist in correcting inaccuracies.</li>
</ul>



<h3 class="wp-block-heading">Services Partners</h3>



<p>We stand at the brink of an exhilarating transformation, propelled by advancements in machine learning (ML) technologies. This shift holds the promise of revolutionizing customer experiences, introducing groundbreaking applications, and boosting our customers&#8217; productivity to new levels. The market&#8217;s enthusiasm is clear, with an unprecedented number of customers eager to leverage generative artificial intelligence to revamp their businesses.</p>



<p>Successfully innovating with large language models and generative AI demands proficiency in data management, AI, human resources, and operational processes. It is crucial that these models and AI solutions are crafted to be ethical, transparent, and reliable.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Our partner ecosystem will lead the way in developing innovative business solutions tailored to customers across various industries and sizes.</p>
</blockquote>



<p>Services partners from our Ecosystem have demonstrated expertise in delivering machine learning and generative AI solutions on OVHcloud. These partners offer a range of products, services and technologies, including specialized consulting services, managed services, and applications that are secure, efficient, and scalable across industries.</p>



<p>Today, several of our leading partners, <a href="https://www.cgi.com/france/fr-fr/partenariat/cgi-ovhcloud" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">CGI</a>, <a href="https://www.groupeonepoint.com/fr/actualites/nouvelle-solution-dia-generative-souveraine-sur-ovhcloud/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Onepoint</a>, Accenture, <a href="https://www.synaigy.com/details/ovhcloud-cloud-ki-zukunft" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">synaigy</a>, W&amp;B Asset Studio, <a href="https://www.soprasteria.com/fr/media/publications/details/sopra-steria-et-ovhcloud-etendent-leur-partenariat-afin-d-industrialiser-lintelligence-artificielle-et-accelerer-la-transformation-des-entreprises-dans-une-demarche-open-source" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Sopra Steria</a>, <a href="https://www.inetum.com/fr/presse/inetum-ovhcloud" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Inetum</a> and NEXiD are already providing key support in terms of OVHcloud generative AI advisory, implementation services and capabilities available to customers. These partners play an essential role in applying new AI capabilities to solve industry-specific challenges and helping enterprises build generative AI into their products and everyday business processes.</p>



<h2 class="wp-block-heading">OVHcloud and its broader ecosystem</h2>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="627" height="213" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110514433.png" alt="" class="wp-image-27957" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110514433.png 627w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110514433-300x102.png 300w" sizes="auto, (max-width: 627px) 100vw, 627px" /></figure>






<p>Our AI Ecosystem is part of our broader Ecosystem and includes a wide variety of startups and partners, ready to support customers with both current and future technological challenges. It does so by giving customers the means to innovate and develop their own competitive advantage.</p>



<p>Through these programs, we provide product support, marketing amplification, and co-selling opportunities to help our services and ISV partners bring these solutions to market faster, reach more customers, and grow their businesses.</p>



<p>Over the past 10+ years, we have launched the following initiatives:</p>



<h3 class="wp-block-heading">OVHcloud Partner Program</h3>



<p>OVHcloud partners play a key role in customers’ digital transformation, with the support and services they offer to help them meet the challenges involved. Over 700 companies have joined this program, providing a wide range of expertise and services to our customers.</p>



<p>Interested partners can go <a href="https://partner.ovhcloud.com/en-gb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a> to apply to join the program.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="744" height="117" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110907507.png" alt="" class="wp-image-27958" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110907507.png 744w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110907507-300x47.png 300w" sizes="auto, (max-width: 744px) 100vw, 744px" /></figure>






<h3 class="wp-block-heading">OVHcloud Startup Program</h3>



<p>We nurture tech entrepreneurs by deploying an array of business scaling opportunities within OVHcloud’s global ecosystem of trust.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Over 5,000 startups and scaleups from across the globe have already benefited from our program since its launch in 2015.</p>
</blockquote>



<p>To further support AI startups and accelerate their app development, we’re launching a new initiative called <strong>AI Accelerator</strong>, which recognizes select startups whose applications and platforms are optimized to run as-a-service on OVHcloud infrastructure and who are utilizing OVHcloud’s AI capabilities in new and helpful ways. The program provides dedicated access to OVHcloud expertise, training, and co-marketing support to help partners build capacity and go to market.</p>



<p>Interested AI startups can go <a href="https://startup.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a> to apply to join the program.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="744" height="117" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110816403.png" alt="" class="wp-image-27959" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110816403.png 744w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737110816403-300x47.png 300w" sizes="auto, (max-width: 744px) 100vw, 744px" /></figure>






<h3 class="wp-block-heading">Open Trusted Cloud</h3>



<p>This program is aimed at software publishers, as well as SaaS and PaaS solution providers. Its ambition is to work together on building an ecosystem of SaaS and PaaS services — hosted in the open, reversible and trusted cloud offered by OVHcloud. This will provide a common platform for competitive solutions, and hundreds have already joined.</p>



<p>You can browse some of the solutions available in our ecosystem <a href="https://opentrustedcloud.ovhcloud.com/en-gb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="744" height="141" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737111014479.png" alt="" class="wp-image-27960" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737111014479.png 744w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/1737111014479-300x57.png 300w" sizes="auto, (max-width: 744px) 100vw, 744px" /></figure>






<h3 class="wp-block-heading">OVHcloud Marketplace</h3>



<p>At the heart of the ecosystem, the Marketplace was designed to benefit everyone. OVHcloud Marketplace brings together the best solutions from SaaS and PaaS publishers in the ecosystem on an ethical and transparent cloud. Carry out the digital transformation of your company or subscribe to a solution for your personal use with complete peace of mind thanks to these trusted solutions.</p>



<figure class="wp-block-image"><a href="https://marketplace.ovhcloud.com/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external"><img decoding="async" src="https://media.licdn.com/dms/image/v2/D4E12AQGh-68Z2E6B2g/article-inline_image-shrink_400_744/article-inline_image-shrink_400_744/0/1737111193841?e=1743033600&amp;v=beta&amp;t=e6LoWR8Nbia6G43l2TDZnNXXjufFctq3csqRZBjZQcg" alt=""/></a></figure>






<h3 class="wp-block-heading">Technology partners</h3>



<p>The OVHcloud vision is to create a transparent, reversible and interoperable cloud. We work with the best players on the market to deliver solutions that meet the most demanding performance and security requirements.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>OVHcloud is committed to <strong>democratizing AI</strong> within organizations, through a wide range of solutions positioned at every price point, while advocating for Digital Sovereignty &amp; Sustainability.</p>



<p>If your company is considering OVHcloud, would like to know more about our vibrant AI Ecosystem, or has feedback to share, please feel free to contact me!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fintroducing-ovhclouds-trusted-and-innovative-ai-ecosystem%2F&amp;action_name=Introducing%20OVHcloud%E2%80%99s%20Trusted%20and%20Innovative%20AI%20Ecosystem&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Apply now for the Fast Forward AI Accelerator</title>
		<link>https://blog.ovhcloud.com/apply-now-for-the-fast-forward-ai-accelerator/</link>
		
		<dc:creator><![CDATA[Philip Marais]]></dc:creator>
		<pubDate>Tue, 22 Oct 2024 19:45:24 +0000</pubDate>
				<category><![CDATA[OVHcloud Startup Program]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Fast Forward]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Scaleups]]></category>
		<category><![CDATA[Startup Program]]></category>
		<category><![CDATA[Startups]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=27635</guid>

					<description><![CDATA[Today we’re launching our new AI Accelerator to meet the scaling needs of AI startups and shape the future of the AI industry. Building on the success of our Fast Forward Accelerator, designed to be light-touch in terms of your time but high-impact in terms of value, the AI Accelerator offers everything that is OVHcloud [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fapply-now-for-the-fast-forward-ai-accelerator%2F&amp;action_name=Apply%20now%20for%20the%20Fast%20Forward%20AI%20Accelerator&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>Today we’re launching our new AI Accelerator to meet the scaling needs of AI startups and shape the future of the AI industry.</p>



<p>Building on the success of our Fast Forward Accelerator, designed to be light-touch in terms of your time but high-impact in terms of value, the AI Accelerator offers everything that is OVHcloud (<a href="https://startup.ovhcloud.com/en-gb/lp/cloud-transparency/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">data sovereignty</a>, <a href="https://corporate.ovhcloud.com/en/sustainability/environment/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">energy efficiency</a>, <a href="https://startup.ovhcloud.com/en-gb/lp/tech-freedom/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">tech freedom</a>, <a href="https://startup.ovhcloud.com/en-gb/lp/personal-touch/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">personal touch</a>, price/performance) and more.</p>



<p>The 3-month program offers:</p>



<ul class="wp-block-list">
<li><strong>€50k in free cloud credits</strong> to use on our <a href="https://www.ovhcloud.com/en/public-cloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Public Cloud</a> and <a href="https://www.ovhcloud.com/en/public-cloud/ai-machine-learning/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI solutions</a>. This is in addition to Startup Program credits, but the maximum total that can be allocated remains €100k.</li>



<li><strong>AI technology deep-dives </strong>to solve technical challenges</li>



<li><strong>Workshops</strong> on AI, sales, investor readiness, and PR to enhance business and communication skills</li>



<li>1-on-1 mentoring from experts</li>



<li>Engagement with corporates for possible POCs</li>



<li>Engagement with Venture Capitalists (VCs) for possible funding</li>
</ul>



<p>Only 10-15 startups will be selected for the first cohort of the AI Accelerator, which will run from 13 January 2025 to 3 April 2025. Applications opened on 1 October; entries close on 24 November 2024 (<a href="https://startup.ovhcloud.com/en-gb/fast-forward-ai-accelerator/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Apply NOW!</a>) and selected participants will be announced on 16 December 2024.</p>



<p>The 3-month program is divided into 3 phases:</p>



<ol class="wp-block-list">
<li><strong>Phase 1 (Build): </strong>The Build Phase will focus on refining your product-market fit and cloud integration. This will include deep dives with our AI Team to make sure you get the best out of our AI Solutions.</li>



<li><strong>Phase 2 (Sell): </strong>The Sell Phase will focus on business development, corporate partnerships, and sales readiness. In this phase you will engage with potential corporate partners to investigate collaboration.</li>



<li><strong>Phase 3 (Scale):</strong> The Scale Phase will focus on investor readiness, growth strategy, and funding opportunities. This phase will culminate with a Showcase event where participants will pitch their funding needs to VCs.</li>
</ol>



<p>Based on feedback from our existing startup community, the program will have a particular focus on data sovereignty. This will include sessions with experts and other organizations to help the cohort understand and deal with sovereignty requirements better, particularly as we draw closer to the finalization of the European AI Act.</p>



<p>The accelerator program includes 1-on-1 mentoring from OVHcloud and external experts, who will be matched with participants based on their needs. The program is designed to be agile, requiring three hours a week or less, but can scale to support you as needed. It also includes a 1-year commitment to use OVHcloud’s products and solutions, to ensure continuity after exiting the Accelerator.</p>



<p>Applicants must meet the following criteria to be selected:</p>



<ul class="wp-block-list">
<li>You must be a Startup Program member that has been active in the program or as an OVHcloud customer for at least 3 months (not a member? <a href="https://startup.ovhcloud.com/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Apply now</a>)</li>



<li>You must have a need for GPUs and OVHcloud&#8217;s AI Solutions</li>



<li>Preference will be given to <a href="https://startup.ovhcloud.com/en-gb/scaleups/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Scale level</a> members of the Startup Program</li>
</ul>



<p>Startups like <a href="https://www.ovhcloud.com/en-gb/case-studies/cux-io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">CUX.io</a>, <a href="https://www.ovhcloud.com/en-gb/case-studies/combigo/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Combigo</a>, <a href="https://www.ovhcloud.com/en-gb/case-studies/superprotocol/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Super Protocol</a> and <a href="https://www.ovhcloud.com/en/case-studies/orpiva/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ORPIVA</a> have already enjoyed the benefits that the Fast Forward Accelerator offers.</p>



<p class="has-text-align-center"><em>&#8220;During the first weeks of data migration, it became clear that we needed additional training in terms of traffic modeling and routing. This ran outside the scope of our previous provider, but OVHcloud showed flexibility and willingness to help – following a quick technical consultation with an OVHcloud architect cleared any uncertainty. This was very important for us, because traffic control, next to the sheer amount of data held, are our biggest challenges. We have no doubt that OVHcloud is a great partner with whom we can always brainstorm with to come up with the best solution together.&#8221;</em> says Kamil Walkowiak, VP of R&amp;D at CUX.io<strong>.</strong></p>



<p class="has-text-align-center"><em>“We have been using OVHcloud solutions to develop and deploy our AI-powered applications, including LLM training and automatic AI video generation. We are very satisfied with their performance, reliability, and support.”</em> says Salman Valibeik, CEO and Co-Founder at ORPIVA.</p>



<p>Sign up to the <a href="https://startup.ovhcloud.com/en/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Startup Program </a>and <a href="https://startup.ovhcloud.com/en-gb/fast-forward-ai-accelerator/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Accelerator</a> today to benefit from a wealth of support – and scale your business faster.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><a href="https://startup.ovhcloud.com/en-gb/fast-forward-ai-accelerator/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img loading="lazy" decoding="async" width="548" height="150" src="https://blog.ovhcloud.com/wp-content/uploads/2024/10/Email-banner-FY25.jpg" alt="" class="wp-image-27636" style="width:642px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/10/Email-banner-FY25.jpg 548w, https://blog.ovhcloud.com/wp-content/uploads/2024/10/Email-banner-FY25-300x82.jpg 300w" sizes="auto, (max-width: 548px) 100vw, 548px" /></a></figure>



<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fapply-now-for-the-fast-forward-ai-accelerator%2F&amp;action_name=Apply%20now%20for%20the%20Fast%20Forward%20AI%20Accelerator&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>F.A.I.R. Principles in Data for AI</title>
		<link>https://blog.ovhcloud.com/f-a-i-r-principles-in-data-for-ai/</link>
		
		<dc:creator><![CDATA[Lex Avstreikh]]></dc:creator>
		<pubDate>Mon, 30 Sep 2024 22:44:43 +0000</pubDate>
				<category><![CDATA[OVHcloud Startup Program]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[MLops]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Startup Program]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=27422</guid>

					<description><![CDATA[How the FAIR Data Principles apply to Machine Learning Data and Infrastructure At Hopsworks, the FAIR Guiding Principles for scientific data management and stewardship have been a cornerstone of our approach to build a better machine learning platform. F.A.I.R. principles initially became prevalent in academia and diverse fields of research in an effort to make [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ff-a-i-r-principles-in-data-for-ai%2F&amp;action_name=F.A.I.R.%20Principles%20in%20Data%20for%20AI&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<h5 class="wp-block-heading"><em>How the FAIR Data Principles apply to Machine Learning Data and Infrastructure</em></h5>



<p>At Hopsworks, <a href="https://www.nature.com/articles/sdata201618" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">the FAIR Guiding Principles for scientific data management and stewardship</a> have been a cornerstone of our approach to building a better machine learning platform. F.A.I.R. principles first became prevalent in academia and diverse fields of research, in an effort to make sure that the ever-growing amount of data would remain usable and beneficial to society, and they have since been widely adopted. However, few people mention them in the context of machine learning systems and data management. Yet those principles are even more relevant today in the fast-moving AI and <a href="https://www.hopsworks.ai/dictionary/llms-large-language-models" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLMs</a> landscape, where <a href="https://www.hopsworks.ai/post/high-risk-ai-in-the-eu-ai-act" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">new legislation</a> is changing the rules of the game.&nbsp;</p>



<p>AI professionals should consider how questions of ethics, data management, and open frameworks may influence their choice of tools and machine learning platforms when implementing <a href="https://www.hopsworks.ai/post/mlops-to-ml-systems-with-fti-pipelines" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">modern ML systems</a>. At Hopsworks, we follow the F.A.I.R. principles in designing a platform for managing machine learning data and infrastructure.&nbsp;</p>



<h2 class="wp-block-heading"><strong>What are the Four Core Concepts of F.A.I.R.?&nbsp;</strong></h2>



<p><strong>Findable</strong>; mechanisms that make the data easily searchable and discoverable. Infrastructure, stakeholders, and projects need easy-to-use functionality for data discovery.&nbsp;</p>



<ul class="wp-block-list">
<li>Data needs to follow <a href="https://datamanagement.hms.harvard.edu/plan-design/file-naming-conventions#:~:text=The%20file%20name%20should%20be%20descriptive%20and%20provide%20just%20enough%20contextual%20information.&amp;text=A%20good%20format%20for%20date,filename%2C%20use%20the%20format%20YYYYMMDDThhmm." data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">clear naming conventions</a>, be indexed for free-text search, and have persistent, uniquely identified metadata that clearly and explicitly describe the data.&nbsp;</li>



<li>The design and curation of metadata needs to have good system support.</li>
</ul>
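<p>By way of illustration, the two points above can be sketched in a few lines of Python. The record layout, field names and naming scheme below are hypothetical (not a Hopsworks or OVHcloud API); they simply show a dated, descriptive name combined with a persistent unique identifier and free-text-searchable metadata:</p>

```python
import uuid
from datetime import datetime, timezone

def make_dataset_record(project, description, tags):
    """Build a findable dataset record: a descriptive name carrying a
    YYYYMMDDThhmm timestamp, a persistent unique identifier, and
    metadata suitable for free-text search."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M")
    return {
        "id": str(uuid.uuid4()),       # persistent, unique identifier
        "name": f"{project}_{stamp}",  # clear, dated naming convention
        "description": description,    # indexed for free-text search
        "tags": sorted(tags),
    }

record = make_dataset_record(
    project="churn_features",
    description="Monthly aggregated customer churn features",
    tags={"churn", "customer", "monthly"},
)
print(record["name"])  # e.g. churn_features_20240930T2244
```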



<p><strong>Accessible</strong>; allow access not only to the data itself, but also to its metadata and provenance.&nbsp;</p>



<ul class="wp-block-list">
<li>Open, free, and universally implementable protocols that allow access to the data itself, the metadata and its provenance,</li>



<li>Access control support is required when sharing data. Role-based access control is good, but attribute-based access control and/or dynamic role-based access control provides even more fine-grained support for data sharing and reuse.</li>
</ul>
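<p>To make the role-based vs attribute-based distinction concrete, here is a minimal Python sketch. The roles, attributes and rules are invented purely for illustration; real access-control systems evaluate far richer policies:</p>

```python
user = {"roles": {"data_steward"}, "project": "heap"}
dataset = {"project": "heap", "classification": "internal"}

def rbac_allow(user):
    """Role-based: the decision depends only on the user's role."""
    return "data_steward" in user["roles"]

def abac_allow(user, resource):
    """Attribute-based: the decision also weighs attributes of the user
    and the resource, giving finer-grained rules for data sharing."""
    return (
        "data_steward" in user["roles"]
        and user["project"] == resource["project"]
        and resource["classification"] != "restricted"
    )

print(rbac_allow(user))           # True: the role alone grants access
print(abac_allow(user, dataset))  # True: attributes also permit it
```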



<p><strong>Interoperable</strong>; data should be easily shared between different computer systems. This is achieved by implementing open standards and formats for the data.</p>



<ul class="wp-block-list">
<li>Open and accessible file formats and transport protocols for accessing the data.&nbsp;</li>
</ul>



<p><strong>Reusable</strong>; data produced by one system should be easy to reuse in <a href="https://www.hopsworks.ai/dictionary/downstream" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">downstream</a> systems, without copying the data. In order to reuse data, it’s important to include metadata covering data licenses, provenance, community standards, and any custom metadata that will allow other institutions, teams or groups to reuse the data.</p>



<ul class="wp-block-list">
<li>Versioning, cataloging, provenance/lineage, data integrity, and custom metadata make it easier for users of data to decide on whether they can use the shared data.</li>
</ul>



<h2 class="wp-block-heading"><strong>Why F.A.I.R. is challenging for AI platforms and ML Systems</strong></h2>



<p>Some of the FAIR principles are directly applicable in the context of machine learning systems: there are lots of open source frameworks, file systems, and programming languages that are used for the operation of AI products and services. Still, some very serious challenges do emerge that are specifically due to the way any <a href="https://www.hopsworks.ai/dictionary/ml-systems" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ML System</a> needs to operate.&nbsp;</p>



<p><strong>Findable</strong>; while strategies that apply metadata and clear nomenclature can be used in the context of operational machine learning systems, practitioners will find it challenging to create a clear, centralized logic across the different data sources and databases needed to operate such services; a modern ML system might need to connect to multiple sources, some of them real-time, or to <a href="https://www.hopsworks.ai/dictionary/vector-database" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vector databases</a> for large language models. Creating a clear structure for the assets and the metadata becomes a complex endeavor without a centralized solution capable of catering to the different scenarios.&nbsp;</p>



<p><strong>Interoperable &amp; Accessible</strong>; when open frameworks and open file formats are used, the core challenges of accessibility and interoperability are easier to resolve; it then becomes important to favour open standards and compute engines, and to avoid <a href="https://en.wikipedia.org/wiki/Domain-specific_language" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DSLs</a>. An additional challenge, stemming from the very nature of the underlying data, is keeping it accessible for auditing (for example: what data was the model in production last year trained on?), review and debugging while the system continuously updates and appends data.</p>



<p><strong>Reusable</strong>; finally, a fundamental characteristic of machine learning models is that some of them require the data processing to be directly tied to the model that will be trained; we call these <a href="https://www.hopsworks.ai/dictionary/model-dependent-transformations" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">model-dependent transformations</a>. This process essentially compromises the integrity of the data, and the resulting datasets can’t be reused in a different scenario. Not only does it prevent the reuse of the data itself, it also makes the data harder for a human to understand. This places significant limits on an organization’s ability to reuse its data across different models, leading to duplication and the creation of <a href="https://www.hopsworks.ai/dictionary/monolithic-ml-pipeline" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">monolithic pipelines</a> that are notoriously harder to scale.&nbsp;</p>
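<p>A small Python sketch of why this matters (the feature values and transformations are illustrative): keeping stored features untransformed, and applying each model-dependent transformation only at training time, leaves the shared data intact for reuse by other models:</p>

```python
def standardize(values):
    """Model-dependent transformation: z-score scaling whose mean and
    standard deviation are tied to one model's training set."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

# Reusable, model-independent feature as it should be stored/shared.
stored_feature = [120.0, 80.0, 100.0]

# Each model applies its own transformation at training time,
# without overwriting the shared feature data.
model_a_input = standardize(stored_feature)
model_b_input = [v / max(stored_feature) for v in stored_feature]

# The stored feature is untouched and stays reusable downstream.
assert stored_feature == [120.0, 80.0, 100.0]
```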



<h2 class="wp-block-heading"><strong>Making Data for AI F.A.I.R.</strong></h2>



<h3 class="wp-block-heading">Use case of <em>Hopsworks with the Human Exposome Assessment Platform</em></h3>



<p>At Hopsworks, we have a strong heritage of working with academia and research, participating in projects such as <a href="https://heap-exposome.eu/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">HEAP</a> (Human Exposome Assessment Project), which manages personal data from numerous medical institutes across the world. We have always been mindful of the evident privacy and security concerns, and of the need for efficiency, when managing data following FAIR principles. When approaching such a project, we treat those principles as a blueprint for refining our own software:</p>



<ul class="wp-block-list">
<li>Using open frameworks,&nbsp;</li>



<li>Using open languages,</li>



<li>Using modular technologies,&nbsp;</li>



<li>Using reusable file formats.</li>
</ul>



<p>Additionally, we strive to build strong abstractions and APIs that give users and organizations a better understanding of the models they are building, and more flexibility in reusing their data pipelines. These are core aspects of the Hopsworks platform, and we believe all state-of-the-art ML platforms should follow them to stay within the FAIR framework.</p>



<h2 class="wp-block-heading"><strong>FAIR principles in practice at Hopsworks</strong></h2>



<figure class="wp-block-image"><a href="https://cdn.prod.website-files.com/618399cd49d125734c8dec95/65bbad2d391979288c4153b7_FAIR_lightbox.png" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img decoding="async" src="https://cdn.prod.website-files.com/618399cd49d125734c8dec95/65bbad2d391979288c4153b7_FAIR_lightbox.png" alt="FAIR principles at Hopsworks"/></a></figure>



<h2 class="wp-block-heading">Sources</h2>



<ul class="wp-block-list">
<li><a href="https://direct.mit.edu/dint/article/2/1-2/10/10017/FAIR-Principles-Interpretations-and-Implementation" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">FAIR Principles: Interpretations and Implementation Considerations | Data Intelligence | MIT Press</a></li>



<li><a href="https://www.nature.com/articles/sdata2018118" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">A design framework and exemplar metrics for FAIRness | Scientific Data</a></li>



<li><a href="https://www.nature.com/articles/sdata201618" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">The FAIR Guiding Principles for scientific data management and stewardship</a></li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ff-a-i-r-principles-in-data-for-ai%2F&amp;action_name=F.A.I.R.%20Principles%20in%20Data%20for%20AI&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Adopting AI in SaaS: how can we move quickly without losing control?</title>
		<link>https://blog.ovhcloud.com/ai-saas-ovhcloud/</link>
		
		<dc:creator><![CDATA[Germain Masse]]></dc:creator>
		<pubDate>Wed, 14 Aug 2024 06:04:09 +0000</pubDate>
				<category><![CDATA[OVHcloud Product News]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[LLM Serving]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Saas]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=27229</guid>

					<description><![CDATA[The widespread use of AI poses numerous challenges. Including the risks of data leakage, the need for explainable results, handling it in SaaS. But also, the growing dependence on Big Tech. Not to mention the environmental toll linked to AI. No doubt the eco-design of digital services is becoming increasingly popular. Still, the efforts to [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fai-saas-ovhcloud%2F&amp;action_name=Adopting%20AI%20in%20SaaS%3A%20how%20can%20we%20move%20quickly%20without%20losing%20control%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>The widespread use of AI poses numerous challenges: the risk of data leakage, the need for explainable results, and how to handle AI in SaaS, but also the growing dependence on Big Tech, not to mention the environmental toll of AI.</p>



<p>No doubt the eco-design of digital services is becoming increasingly popular. Still, efforts to achieve digital sobriety seem marginal, especially compared to the energy consumed by training general-purpose LLMs. Is there a way to make AI greener? And what would a more “trusted AI” mean?</p>



<p>Here’s a roundup of challenges and solutions.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2024/08/AdobeStock_8327344081-1024x576.jpeg" alt="" class="wp-image-27235" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/08/AdobeStock_8327344081-1024x576.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2024/08/AdobeStock_8327344081-300x169.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/08/AdobeStock_8327344081-768x432.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2024/08/AdobeStock_8327344081-1536x864.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2024/08/AdobeStock_8327344081-2048x1152.jpeg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>Efficiency of specialised LLMs compared to general-purpose LLMs</strong></p>



<p>General-purpose LLMs, such as GPT-4 (OpenAI), Llama (Meta) and Gemini (Google), are currently in the spotlight. Versatile and seemingly omniscient, able to handle a wide variety of scenarios, they appear to meet every need: generating text or code, answering questions, translating content, even composing poems.</p>



<p>However, these general-purpose models have not yet eclipsed specialised LLMs,<a id="_ftnref1" href="#_ftn1"><sup>[1]</sup></a><sup> </sup>which target a narrower range of situations but perform much better in them. Techniques such as Retrieval-Augmented Generation (RAG) and transfer learning certainly make it possible to specialise a general-purpose LLM, with or without retraining the model. Still, the use of general-purpose LLMs continues to pose a range of challenges, starting with their generic results, unreliable quality and lack of reproducibility. This will prove even more challenging as the available sources of quality data may become scarce, due to legal actions<a id="_ftnref2" href="#_ftn2"><sup>[2]</sup></a> brought for unauthorised use of content and copyright infringement. Additionally, the use of general-purpose LLMs creates operator dependency and reinforces monopolies,<a id="_ftnref3" href="#_ftn3"><sup>[3]</sup></a><sup> </sup>which is unfavourable for long-term users.</p>
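<p>For context, RAG works by retrieving documents relevant to a query and injecting them into the model’s prompt, without changing the model’s weights. The toy Python sketch below uses bag-of-words retrieval purely for illustration (the documents and scoring are invented); a production system would use embeddings and a vector database:</p>

```python
from collections import Counter
from math import sqrt

documents = [
    "OVHcloud operates water-cooled datacentres in Europe.",
    "Retrieval-Augmented Generation injects retrieved context into the prompt.",
    "Specialised models can be chained to perform complex tasks.",
]

def vectorize(text):
    """Bag-of-words vector: token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Inject the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("injects retrieved context into the prompt", documents))
```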



<p><strong>The impact of AI on the environment<br></strong>Researchers from Hugging Face and the Allen Institute<a href="#_ftn4" id="_ftnref4"><sup>[4]</sup></a><sup> </sup><strong>have shown that in the case of servers with GPUs, the carbon emissions linked to machine use far exceed those linked to manufacturing the components, unlike traditional cloud computing.</strong><strong><sup> </sup></strong><strong>Generating an image using an AI model is one of the most energy-intensive uses, and requires as much electricity as fully charging a smartphone.</strong><a href="#_ftn5" id="_ftnref5"><strong><sup><strong><sup>[5]</sup></strong></sup></strong></a><strong><sup> </sup></strong>Reversing the distribution of carbon emissions throughout the lifecycle of servers in this way means that the power usage effectiveness (PUE) of the datacentres in which AI models are trained and inferred, as well as the energy mix of the countries in which they are located, are very significant selection criteria in calculating your application’s global carbon footprint.</p>



<p>This is a bonus for OVHcloud. Indeed, the Group has long been committed to reducing the carbon footprint of its datacentres.<a id="_ftnref6" href="#_ftn6"><sup>[6]</sup></a></p>



<p>As might be expected, general-purpose LLMs are more environmentally damaging than specialised models designed for specific tasks. This has been revealed in a series of comparative tests carried out by the same researchers.<a id="_ftnref7" href="#_ftn7"><sup>[7]</sup></a><sup> </sup>With thousands of billions of parameters, the largest LLMs are getting larger and more data-intensive.<a id="_ftnref8" href="#_ftn8"><sup>[8]</sup></a> An article in the<sup> </sup><em>New Scientist</em> recently explained that algorithmic advances are outpacing Moore’s Law: after eight months, a large language model needs only half the computing power to achieve the same level of performance.<a id="_ftnref9" href="#_ftn9"><sup>[9]</sup></a><sup> </sup>However, running a model like OpenAI’s ChatGPT today costs Microsoft around $700,000 per day<a id="_ftnref10" href="#_ftn10">[10]</a>, or an average of 36 cents per query. That remains unreasonable, from both an economic and an environmental point of view, for needs that are often precise and well-defined.</p>



<p>Specialised models, which can be chained to perform complex tasks (an approach referred to as agentisation), are therefore a more environmentally responsible alternative to general-purpose LLMs. On top of that, specialised models, which are more widely available as open source, are also easier to understand and to fine-tune. They seem more suitable for building, in a reversible way, innovations whose ROI is still very uncertain.</p>



<p><strong>Maintaining control: working towards developing a trusted AI</strong></p>



<p>While large companies quickly became aware of the risks of leaking confidential data when using digital services (like online translation, which they are beginning to ban), AI has intensified the temptation to send a company’s data outside and submit it to an algorithm: here to write a report more quickly, there to generate an image for a presentation on a confidential project. Samsung learned this the hard way, as the victim of three consecutive data leaks caused by employees using ChatGPT, notably copy/pasting source code to solve or optimise a problem.<br>You don’t need to disclose much information to say a lot about your intentions. What would a rival learn about your strategy from reading your ChatGPT prompts? After all, AI services can accidentally expose data submitted by users, as a security-impacting bug<a href="#_ftn11" id="_ftnref11"><sup>[11]</sup></a><sup> </sup>has shown. The same goes for datasets you might submit to AI platforms: will your data be used to train and refine the model? Could it benefit potential rivals?<br><br></p>



<p>Beyond this, there is also the question of the transparency of AI models, and with it the risk of outsourcing increasingly important tasks to sophisticated models that can become &#8220;black boxes&#8221;, making incomprehensible decisions or producing results skewed by the data they were trained on.</p>



<p>Let&#8217;s face the possibility that you may not have any problem with the results. Would you run the risk of relying on a service where you can’t explain in broad terms how it works? And that you couldn’t stop using without losing everything? Here, we encounter another problem – reversibility. </p>



<p>Suppose, for example, that the AI provider decides the party is over: the infrastructure it has long financed at a loss must now be made profitable, so it takes advantage of its monopoly and your dependency to raise its rates unreasonably. You could certainly cancel the service, but you would then lose the results of your data training and/or model specialisation, and you would have to start from scratch. In the current absence of standards for portability and interoperability between AI services, this issue is crucial – all the more so given that, for the moment, while open source is popular, proprietary models are very much in the majority.</p>



<p>There is no simple answer to the questions that have been raised. That’s because AI development is currently very empirical, based on a trial-and-error model, with no traceability of training data or model modifications.</p>



<p>This, incidentally, makes the “explainability”<a href="#_ftn12" id="_ftnref12"><sup>[12]</sup></a><sup> </sup>of an AI system’s results a real challenge, even though the AI Act establishes a duty of explainability (see below).<br><br></p>



<p>The development of a “trustworthy AI”, as it was termed in a 2019 paper<a href="#_ftn13" id="_ftnref13"><sup>[13]</sup></a><sup> </sup>by the Independent High-Level Expert Group on Artificial Intelligence (AI HLEG), is perhaps a direction to keep in mind. It defines a trustworthy AI with three main objectives, which OVHcloud aims to help you achieve: AI must be lawful (legislative or regulatory aspect), ethical (respect for ethical norms) and robust (from both a technical and social perspective).</p>



<p>In the meantime, ensuring swift regulatory compliance at the national, European, and international levels is a powerful lever for more responsible business practices, without compromising future prospects in the pursuit of innovation.</p>



<p><strong>1/ Complying with current and future regulations</strong></p>



<p>The EU was quick to respond to the democratisation of AI, proposing a draft European regulation on the subject on 21 April 2021. In March 2024, the AI Act was officially adopted. It now applies to all AI services used in the EU, regardless of where their providers are established.</p>



<p>The law divides AI systems into four risk categories, based on their impact on fundamental rights in the EU and on the safety of individuals, groups, and society. Each risk category carries its own prohibitions<a href="#_ftn14" id="_ftnref14"><sup>[14]</sup></a> and obligations, ranging from environmental sustainability to security, including the marking of AI-generated content.</p>



<p>An online “compliance checker” lets you quickly find out to what extent this European AI law applies to your projects: <a href="https://artificialintelligenceact.eu/assessment/eu-ai-act-compliance-checker/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://artificialintelligenceact.eu/assessment/eu-ai-act-compliance-checker/</a></p>



<p>Other national and European regulations on personal data protection, such as the GDPR, already apply to your AI projects, holding companies accountable for hosting and transferring personal data outside the EU.</p>



<p>Incidentally, those who complain about regulation being too burdensome in comparison to the American laissez-faire attitude or the Chinese spirit of conquest have the wrong end of the stick: the absence of a genuine European single market is a much bigger factor<a href="#_ftn15" id="_ftnref15"><sup>[15]</sup></a><sup> </sup>behind Europe’s innovation gap. So, too, is the weak support for public procurement, or the incomprehensible message sent by governments that claim to want to develop sovereign solutions by relying on investments by foreign stakeholders.<a href="#_ftn16" id="_ftnref16"><sup>[16]</sup></a></p>



<p>It&#8217;s also worth noting that the AI Act allows national competent authorities (the ICO in the UK) to set up “regulatory sandboxes”, i.e. controlled environments in which innovative technologies can be tested for a limited time, so that an AI system’s compliance can be verified without delaying its potential placing on the market. SMEs and startups have priority access to these sandboxes.</p>



<p>In short, regulations today do not hinder the development of projects that take advantage of the possibilities offered by AI, but rather strengthen companies’ obligations regarding the protection of personal data due to increased risks. These obligations will help to reassure users, once this brief period of carelessness and frivolity with AI has passed, and the inevitable first scandals start to surface. As Yoshua Bengio, researcher and founder of MILA (the Quebec Artificial Intelligence Institute), summed up: “We’re going too fast in an unfamiliar direction, and that could change the world in a very positive, or very dangerous, way.”<a href="#_ftn17" id="_ftnref17"><sup>[17]</sup></a><sup> </sup>Countries should therefore seek to regulate AI so that its development does not feel like the Wild West.</p>



<p>In this context, the preference for sovereign solutions will make it easier for your projects to comply with regulations, in addition to establishing a clear medium- and long-term vision. COVID and the current geopolitical instability have shown the cost of relying on foreign entities for essential services, and AI-based services will quickly follow the same path if they integrate the software we use every day, in such critical areas as health, education, transport, or logistics.</p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p><a href="#_ftnref1" id="_ftn1"><sup>[1]</sup></a> General-purpose and specialised LLMs can be distinguished by the number of parameters in their neural network: tens, hundreds or even thousands of billions of parameters for a general-purpose model versus “a few billion” for a specialised model.</p>



<p><a href="#_ftnref2" id="_ftn2"><sup>[2]</sup></a> <a href="https://www.usine-digitale.fr/article/openai-cible-par-deux-class-actions-aux-etats-unis.N2148412" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.usine-digitale.fr/article/openai-cible-par-deux-class-actions-aux-etats-unis.N2148412</a>; <a href="https://www.lefigaro.fr/secteur/high-tech/des-journaux-americains-poursuivent-openai-et-microsoft-en-justice-pour-violation-de-leurs-droits-d-auteur-20240430" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.lefigaro.fr/secteur/high-tech/des-journaux-americains-poursuivent-openai-et-microsoft-en-justice-pour-violation-de-leurs-droits-d-auteur-20240430</a></p>



<p><a id="_ftn3" href="#_ftnref3"><sup>[3]</sup></a> <a href="https://www.nytimes.com/2024/06/05/technology/nvidia-microsoft-openai-antitrust-doj-ftc.html" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.nytimes.com/2024/06/05/technology/nvidia-microsoft-openai-antitrust-doj-ftc.html</a></p>



<p><a id="_ftn4" href="#_ftnref4"><sup>[4]</sup></a> <a href="http://arxiv.org/pdf/2311.16863" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">http://arxiv.org/pdf/2311.16863</a></p>



<p><a id="_ftn5" href="#_ftnref5"><sup>[5]</sup></a> <a href="https://www.technologyreview.com/2023/12/01/1084189/making-an-image-with-generative-ai-uses-as-much-energy-as-charging-your-phone/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.technologyreview.com/2023/12/01/1084189/making-an-image-with-generative-ai-uses-as-much-energy-as-charging-your-phone/</a></p>



<p><a id="_ftn6" href="#_ftnref6"><sup>[6]</sup></a> <a href="https://corporate.ovhcloud.com/en/sustainability/environment/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://corporate.ovhcloud.com/en-gb/sustainability/environment/</a><br>For our <strong>PUE calculation methodology,</strong> refer to <a href="https://corporate.ovhcloud.com/sites/default/files/2024-01/methodo_carboncalc_0.pdf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://corporate.ovhcloud.com/sites/default/files/2024-01/methodo_carboncalc_0.pdf</a></p>



<p><a id="_ftn7" href="#_ftnref7"><sup>[7]</sup></a> <a href="https://www.silicon.fr/llm-generaliste-specialise-angle-environnemental-473911.html" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.silicon.fr/llm-generaliste-specialise-angle-environnemental-473911.html</a></p>



<p><a id="_ftn8" href="#_ftnref8"><sup>[8]</sup></a> <a href="https://www.radiofrance.fr/franceculture/podcasts/le-journal-de-l-eco/le-cout-environnemental-de-l-ia-est-colossal-et-sous-evalue-3781962" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.radiofrance.fr/franceculture/podcasts/le-journal-de-l-eco/le-cout-environnemental-de-l-ia-est-colossal-et-sous-evalue-3781962</a></p>



<p><a id="_ftn9" href="#_ftnref9"><sup>[9]</sup></a> <a href="https://www.newscientist.com/article/2424179-ai-chatbots-are-improving-at-an-even-faster-rate-than-computer-chips/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.newscientist.com/article/2424179-ai-chatbots-are-improving-at-an-even-faster-rate-than-computer-chips/</a></p>



<p><a id="_ftn10" href="#_ftnref10"><sup>[10]</sup></a> <a href="https://usbeketrica.com/fr/article/chatgpt-coute-t-il-vraiment-700-000-dollars-par-jour-a-openai" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://usbeketrica.com/fr/article/chatgpt-coute-t-il-vraiment-700-000-dollars-par-jour-a-openai</a></p>



<p><a id="_ftn11" href="#_ftnref11"><sup>[11]</sup></a> <a href="https://arstechnica.com/information-technology/2023/02/chatgpt-is-a-data-privacy-nightmare-and-you-ought-to-be-concerned/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://arstechnica.com/information-technology/2023/02/chatgpt-is-a-data-privacy-nightmare-and-you-ought-to-be-concerned/</a></p>



<p><a id="_ftn12" href="#_ftnref12"><sup>[12]</sup></a> <a href="https://www.cnil.fr/fr/definition/explicabilite-ia" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.cnil.fr/fr/definition/explicabilite-ia</a></p>



<p><a id="_ftn13" href="#_ftnref13"><sup>[13]</sup></a> <a href="https://op.europa.eu/en/publication-detail/-/publication/d3988569-0434-11ea-8c1f-01aa75ed71a1" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://op.europa.eu/en/publication-detail/-/publication/d3988569-0434-11ea-8c1f-01aa75ed71a1</a></p>



<p><a href="#_ftnref14" id="_ftn14"><sup>[14]</sup></a> AI systems are prohibited if they violate EU values by infringing on fundamental rights, such as:</p>

<ul>
<li>Subliminally manipulating behaviours</li>
<li>Exploiting individuals’ vulnerabilities in order to influence their behaviour</li>
<li>AI-based social scoring used by governments for general purposes</li>
<li>The use of “real-time” remote biometric identification systems in publicly accessible spaces for law enforcement purposes (with exceptions)</li>
</ul>



<p><a id="_ftn15" href="#_ftnref15"><sup>[15]</sup></a> <a href="https://twitter.com/hubertguillaud/status/1795001082843713968" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://twitter.com/hubertguillaud/status/1795001082843713968</a></p>



<p><a id="_ftn16" href="#_ftnref16"><sup>[16]</sup></a> <a href="https://twitter.com/canardenchaine/status/1795862230782640367" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://twitter.com/canardenchaine/status/1795862230782640367</a></p>



<p><a id="_ftn17" href="#_ftnref17"><sup>[17]</sup></a> <a href="https://ici.radio-canada.ca/ohdio/premiere/emissions/ils-ont-fait-annee/segments/entrevue/469120/robot-chatgpt-lois-securite-ordinateurs" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://ici.radio-canada.ca/ohdio/premiere/emissions/ils-ont-fait-annee/segments/entrevue/469120/robot-chatgpt-lois-securite-ordinateurs</a></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fai-saas-ovhcloud%2F&amp;action_name=Adopting%20AI%20in%20SaaS%3A%20how%20can%20we%20move%20quickly%20without%20losing%20control%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
