AI Archives - OVHcloud Blog

How to process large AI requests with Batch Mode on OVHcloud AI Endpoints

Stéphane Philippart — Mon, 01 Jun 2026 12:26:07 +0000

Let’s say you have 20,000 support tickets to classify before tomorrow morning, or a full product catalog to translate without manually sending each request one by one. That kind of workload can quickly become slow, repetitive and difficult to manage.

Batch Mode is designed to help in exactly this type of scenario.

What is Batch Mode?

When working with LLMs, you often send requests one by one through synchronous endpoints like /v1/chat/completions or /v1/responses. This works fine for real-time use cases, but what can you do if you need to process hundreds or thousands of prompts? Sending them individually is slow, and you quickly run into rate limits.

Batch mode solves this problem. Instead of sending requests one at a time, you upload a file containing all your requests, submit a batch job, and get the results back asynchronously, within a maximum of 24 hours. And here’s the cherry on top: batch mode is 50% cheaper than synchronous requests. Since the platform can schedule your workload more efficiently, you benefit from a significant cost reduction.

This is ideal for:

📊 Bulk classification or summarization tasks
🌍 Large-scale translation jobs
📝 Generating descriptions for a product catalog
🧪 Evaluating model outputs on a test dataset

ℹ️ The Batch API is compatible with the OpenAI Batch API format, so you can use the official OpenAI SDK to interact with it.

When not to use Batch Mode!

Batch Mode is designed for large workloads that do not need an immediate response. It is therefore not the right choice for real-time use cases such as chatbots, live customer support, interactive assistants or applications where users expect an answer within seconds. For those scenarios, synchronous endpoints remain more appropriate. Use Batch Mode when your requests can be processed asynchronously and retrieved later.

ℹ️ The Batch API is currently in beta. You can find more information about the beta on the dedicated page.

Prerequisites for using Batch Mode

Before getting started, you’ll need:

An AI Endpoints API key
Python 3.10+ installed
The openai Python package

⚠️ You can generate your API key from the AI Endpoints console.

Install the dependency:

pip install openai

Set up your environment variables:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN='your_api_key'
export OVH_AI_ENDPOINTS_BASE_URL='https://oai.endpoints.kepler.ai.cloud.ovh.net/v1'

Step 1: Prepare the Input File

The input file uses the JSON Lines format (.jsonl). Each line is a self-contained request with a unique custom_id that lets you match results to their original requests.

Here’s an example requests.jsonl:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Summarise the plot of Hamlet in two sentences."}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Translate 'Good morning' into French, Spanish and German."}]}}

Key points:

Each custom_id must be unique within a batch
The model field must reference a model available in the AI Endpoints catalog
The url field indicates which endpoint to call
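
For larger workloads you will probably generate this file rather than write it by hand. The snippet below is a minimal sketch of one way to do that, assuming your prompts are already available as a Python list. The helper names build_batch_lines and write_jsonl are hypothetical, and the model name is simply the one used in the example above.

```python
import json

# Hypothetical input data: in practice this would come from your own dataset.
prompts = [
    "Summarise the plot of Hamlet in two sentences.",
    "Translate 'Good morning' into French, Spanish and German.",
]


def build_batch_lines(prompts, model="gpt-oss-20b", url="/v1/chat/completions"):
    """Return one JSON string per prompt, each with a unique custom_id."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{index}",  # unique within the batch
            "method": "POST",
            "url": url,
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def write_jsonl(path, lines):
    """Write one JSON document per line, as the JSON Lines format expects."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


lines = build_batch_lines(prompts)
print(lines[0])
```

Calling write_jsonl("requests.jsonl", lines) would then produce a file with the same structure as the example above.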

Step 2: Upload the File and Create the Batch

Here’s the complete Python code that handles the full workflow: upload, create, poll, and download:

import os
import time

from openai import OpenAI

# Load environment variables
_OVH_AI_ENDPOINTS_ACCESS_TOKEN = os.environ["OVH_AI_ENDPOINTS_ACCESS_TOKEN"]
_OVH_AI_ENDPOINTS_BASE_URL = os.environ["OVH_AI_ENDPOINTS_BASE_URL"]

# Initialize the OpenAI-compatible client targeting OVHcloud AI Endpoints
client = OpenAI(
    base_url=_OVH_AI_ENDPOINTS_BASE_URL,
    api_key=_OVH_AI_ENDPOINTS_ACCESS_TOKEN,
)

# 1. Upload the input JSONL file with purpose="batch"
print("📤 Uploading input file...")
batch_input_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)
print(f"✅ Uploaded file id: {batch_input_file.id}")

# 2. Create the batch referencing the uploaded file
print("🚀 Creating batch...")
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "Batch mode example - OVHcloud AI Endpoints"},
)
print(f"✅ Batch created: {batch.id} (status: {batch.status})")

# 3. Poll the batch status until it reaches a terminal state
print("⏳ Polling batch status...")
while True:
    current = client.batches.retrieve(batch.id)
    print(f"   status={current.status} counts={current.request_counts}")
    if current.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(30)

# 4. Download the results (and errors if any)
final = client.batches.retrieve(batch.id)

if final.output_file_id:
    print("📥 Downloading results.jsonl...")
    output = client.files.content(final.output_file_id)
    with open("results.jsonl", "wb") as f:
        f.write(output.read())
    print("✅ Results written to results.jsonl")

if final.error_file_id:
    print("🐛 Downloading errors.jsonl...")
    errors = client.files.content(final.error_file_id)
    with open("errors.jsonl", "wb") as f:
        f.write(errors.read())
    print("🐛 Errors written to errors.jsonl")

print(f"🏁 Final batch status: {final.status}")

Let’s break down the key steps:

Upload the input file

batch_input_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

The purpose="batch" parameter tells the API that this file will be used as batch input.

Create the batch

batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

The completion_window="24h" parameter means the batch expires if it has not completed within 24 hours.

Poll the batch status

while True:
    current = client.batches.retrieve(batch.id)
    print(f"   status={current.status} counts={current.request_counts}")
    if current.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(30)

The client.batches.retrieve(batch.id) call returns the current state of the batch. The request_counts field gives you a breakdown of how many requests are completed, failed, or still in progress, useful for monitoring large batches.

The possible terminal states are:

completed: the batch has finished processing (individual requests may still have failed, in which case they are reported in the error file)
failed: the batch encountered a fatal error
expired: the batch exceeded the completion_window duration
cancelled: the batch was manually cancelled via the API

We poll every 30 seconds here, but you can adjust this interval depending on your use case. For very large batches, a longer interval (e.g., 60–120 seconds) is more appropriate.
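
If you would rather not pick a single fixed interval, one option is to start with a short wait and lengthen it as the batch keeps running. The sketch below is an illustration of that idea, not part of the original example: wait_for_batch is a hypothetical helper, and the doubling strategy, the 120-second cap and the 24-hour timeout are assumptions you can tune.

```python
import time

TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, interval=30.0, max_interval=120.0,
                   timeout=24 * 3600, sleep=time.sleep):
    """Poll until the batch reaches a terminal state, doubling the wait each time.

    `retrieve` is any callable that takes a batch id and returns an object
    with a `status` attribute, for example `client.batches.retrieve`.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0.0
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATES:
            return batch
        if waited >= timeout:
            raise TimeoutError(
                f"Batch {batch_id} still '{batch.status}' after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval
        # Back off progressively, but never wait longer than max_interval.
        interval = min(interval * 2, max_interval)
```

With the client from the script above, the call would look like final = wait_for_batch(client.batches.retrieve, batch.id).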

Download the results

final = client.batches.retrieve(batch.id)

if final.output_file_id:
    output = client.files.content(final.output_file_id)
    with open("results.jsonl", "wb") as f:
        f.write(output.read())

Once the batch is complete, the output_file_id field contains the ID of the results file. You download it using client.files.content(), which returns the raw file content.

Download the errors (if any)

if final.error_file_id:
    errors = client.files.content(final.error_file_id)
    with open("errors.jsonl", "wb") as f:
        f.write(errors.read())

If some requests in your batch failed (e.g., invalid model name, malformed input, token limit exceeded), their details will be available in a separate error file. The error_file_id will be None if all requests succeeded. Each line in errors.jsonl contains the custom_id of the failed request along with the error details, making it easy to identify and retry only the failed ones.
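
As an illustration of that retry workflow, here is a minimal sketch that keeps only the original requests whose custom_id appears in the error file. It relies solely on the custom_id field. The helper names are hypothetical, and the sample "error" payloads are placeholders rather than the actual error format.

```python
import json

# Illustrative data only: the "error" payload is a placeholder, not the real
# shape of an errors.jsonl entry. Only custom_id is relied upon.
request_lines = [
    json.dumps({"custom_id": "request-1", "method": "POST",
                "url": "/v1/chat/completions", "body": {"model": "gpt-oss-20b"}}),
    json.dumps({"custom_id": "request-2", "method": "POST",
                "url": "/v1/chat/completions", "body": {"model": "gpt-oss-20b"}}),
]
error_lines = [
    json.dumps({"custom_id": "request-2", "error": {"message": "example error"}}),
]


def failed_custom_ids(error_lines):
    """Collect the custom_id of every entry found in an errors.jsonl stream."""
    ids = set()
    for line in error_lines:
        line = line.strip()
        if line:  # ignore blank lines
            ids.add(json.loads(line)["custom_id"])
    return ids


def select_retries(request_lines, failed_ids):
    """Keep only the original request lines whose custom_id failed."""
    return [
        line.strip()
        for line in request_lines
        if line.strip() and json.loads(line)["custom_id"] in failed_ids
    ]


failed = failed_custom_ids(error_lines)
retries = select_retries(request_lines, failed)
print(f"{len(retries)} request(s) to retry: {sorted(failed)}")
```

Both helpers accept any iterable of lines, so in a real run you can pass the open errors.jsonl and requests.jsonl file objects directly and write the selected lines to a new input file.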

Step 3: Read the Results

The output file (results.jsonl) contains one JSON object per line. Each object includes:

The custom_id matching your original request
The full response body (same format as a synchronous /v1/chat/completions response)

Here’s what a result looks like:

{
  "id": "964e007472a557240221910ba143bb03",
  "custom_id": "request-1",
  "response": {
    "status_code": 200,
    "body": {
      "id": "chatcmpl-9879ebff777795a3",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Hamlet, the Prince of Denmark, is driven to madness and vengeance after learning that his father was murdered by his uncle Claudius..."
          },
          "finish_reason": "stop"
        }
      ],
      "model": "gpt-oss-20b",
      "usage": {
        "prompt_tokens": 78,
        "completion_tokens": 297,
        "total_tokens": 375
      }
    }
  },
  "error": null
}

If some requests fail, the errors.jsonl file will contain details about what went wrong for each failed request.
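
To turn the output file back into something usable, you typically index the answers by custom_id. The following is a minimal sketch based on the result structure shown above. The function name extract_answers is hypothetical, and treating any non-200 status or non-null error as a failure is an assumption.

```python
import json


def extract_answers(result_lines):
    """Map each custom_id to the assistant's reply, or to None if the request failed."""
    answers = {}
    for line in result_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        response = record.get("response") or {}
        if record.get("error") is None and response.get("status_code") == 200:
            message = response["body"]["choices"][0]["message"]
            answers[record["custom_id"]] = message["content"]
        else:
            answers[record["custom_id"]] = None
    return answers


# Sample line trimmed down from the result shown above.
sample = json.dumps({
    "custom_id": "request-1",
    "response": {
        "status_code": 200,
        "body": {
            "choices": [
                {
                    "index": 0,
                    "message": {"role": "assistant", "content": "Hamlet, the Prince of Denmark..."},
                    "finish_reason": "stop",
                }
            ]
        },
    },
    "error": None,
})

answers = extract_answers([sample])
print(answers["request-1"])
```

Since the function accepts any iterable of lines, you can pass it the open results.jsonl file object directly.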

Other Examples Available

The AI Endpoints – Batch mode guide also contains examples in:

JavaScript: using the OpenAI Node.js SDK
Pure HTTP requests: using curl without any framework, if you prefer a language-agnostic approach

These examples demonstrate that you can use the Batch API from any language or tool that can make HTTP requests, since it follows the standard OpenAI-compatible API format.

Conclusion

Batch mode is a powerful feature when you need to process large volumes of repetitive, non-time-sensitive inference requests without worrying about rate limits or timeouts. Upload your file, submit the batch, and come back later for the results: it’s as simple as that.

The OpenAI-compatible API makes it straightforward to integrate into existing workflows, and with examples available in Python, JavaScript, and raw HTTP, you can use whichever approach fits your stack best.

Join us in the dedicated #ai-endpoints channel on our Discord server. See you there!

For more info on AI Endpoints, find our previous blog posts.

Find the full code example in the GitHub repository: public-cloud-examples/ai/ai-endpoints/batch-mode.

Why AI Moves Fast but AI Deployment Still Takes Weeks

Dvs Shiv Kumar and Himanshu Saxena — Thu, 21 May 2026 06:59:13 +0000

This article is a joint collaboration between OVHcloud and Facets Cloud.

Over the past few years, the speed at which teams can build infrastructure has changed dramatically. Models that once took weeks to train can now be iterated in days. Tooling has improved, workflows have matured, and the overall friction in getting from idea to working output has reduced significantly.

But this acceleration often stops the moment teams try to deploy — and this can make AI deployment frustrating. In this blog, you will learn how Facets can help teams accelerate migration to OVHcloud by making deployment more structured, repeatable and predictable.

What first begins as a fast, iterative process slows down when it enters the infrastructure layer. The work is no longer about models or code. It shifts to provisioning environments, configuring pipelines, managing permissions, setting up networking, and ensuring that everything works reliably outside a controlled development setting. This phase is not inherently complex, but it is fragmented, and that introduces delay.

As a result, there is now a visible gap between how quickly teams can build and how slowly they can move to production. The model may be ready in hours, but the surrounding system required to run it still takes days, weeks, or in some cases even months, to put together.

AI deployment is slow because infrastructure is inconsistent, not because AI is limited

It is tempting to assume that deployment delays are still a technical limitation but, in most cases, that is no longer true. The bottleneck is not the model itself. It is everything around it.

Infrastructure is rarely standardised across teams. Each project tends to define its own setup, its own pipeline, and its own configuration. Even when teams use the same tools, the way those tools are applied differs just enough to create inconsistency.

Over time, these differences accumulate. Environments are recreated, pipelines are reconfigured, dependencies are re-evaluated, and access is managed manually. What should be a repeatable process becomes a fresh effort every time.

This is where deployment slows down. Not because teams cannot build quickly, but because the systems around deployment are not structured for reuse.

Faster deployment starts with reusable infrastructure

Teams that deploy faster do not necessarily rely on fundamentally different tools. They change how infrastructure is organised.

Instead of treating infrastructure as something that needs to be set up for every project, they define it once and reuse it. Environments are created from standard definitions rather than being handcrafted. Deployment workflows follow consistent patterns instead of being recreated each time. Access, policies and guardrails are built into the system rather than handled separately.

This does not eliminate complexity, but it contains it. Systems become easier to understand, easier to operate and easier to extend. Most importantly, they become predictable.

Once infrastructure behaves predictably, deployment stops being a recurring source of delay.

AI improves operations only when infrastructure is already structured as a system

There is growing interest in applying AI to DevOps and infrastructure management, but its role is often misunderstood.

AI does not fix fragmented systems. If the underlying setup is inconsistent, AI will only automate that inconsistency. It may reduce effort in specific tasks, but it does not create structure on its own.

AI becomes valuable when it operates on top of a well-defined system. It can help teams understand system state, identify issues faster, assist with debugging, and reduce manual operational work. But it works best when the infrastructure beneath it is already standardised, reusable and consistent.

Infrastructure needs to shift from project-level setup to a standardised system

As AI accelerates development, the limitations of infrastructure become more pronounced. Teams are able to build faster than they can deploy, and that imbalance introduces friction.

This is not a temporary phase. As development continues to speed up, deployment will remain the limiting factor unless the underlying structure changes.

A different approach is needed. Instead of treating infrastructure as a series of isolated setups, teams need systems that standardise how infrastructure is created, managed and reused.

Facets turns infrastructure into a reusable deployment system

Facets is designed around this model. It brings infrastructure, CI/CD, environments, workflows and guardrails into a single operating system for deployment. Teams can create environments on demand, follow predefined deployment workflows and retain operational context within the platform instead of relying on scattered documentation or individual knowledge.

This changes how teams interact with infrastructure. The focus shifts from repeatedly setting things up to using a system that is already structured to work.

Choosing the right cloud matters once infrastructure is standardised

Once infrastructure is structured as a system, the choice of where to run it becomes more important.

For AI workloads, teams need reliable compute, predictable cost structures and control over how environments are configured. OVHcloud provides a strong foundation for this, with access to GPU infrastructure, cost predictability, and flexibility in how environments are managed.

However, infrastructure alone does not remove friction. The real advantage comes when teams can move to that infrastructure quickly and operate it consistently without rebuilding their setup.

OVHcloud + Facets: Accelerating Deployment Through Standardisation

OVHcloud provides reliable, on-demand infrastructure for modern workloads. Facets helps teams make that infrastructure deployment-ready.

Together, they address both sides of the problem. OVHcloud provides the cloud foundation, while Facets defines how environments, workflows, policies and guardrails are created and operated on top of it.

Facets.cloud introduces reusable infrastructure blueprints that can be applied across projects. These blueprints define environments, workflows and operational guardrails in advance, allowing teams to deploy standardised systems instead of assembling infrastructure manually each time.

This shifts deployment from a configuration-heavy process to a repeatable, system-driven approach. Teams can reduce the time spent on setup, avoid recreating the same deployment patterns and move workloads onto OVHcloud with greater consistency.

The result is a faster path from migration planning to production-ready environments — helping teams migrate in days, not weeks.

AI speed is ultimately limited by how fast infrastructure can be deployed and operated

AI has already compressed the time it takes to build. The next bottleneck is AI deployment.

And that bottleneck is not solved by adding more tools or more people—it is solved by changing how infrastructure is structured. Because in practice, the speed of AI is defined not by how quickly models are built, but by how quickly they can be deployed and run.

Discover Facets.cloud

How Mia Experts Is Reinventing Medical Software with AI and Sovereign Cloud

Leonard Pommereau — Wed, 22 Apr 2026 16:05:58 +0000

The Context: Rethinking the Digital Tools of Physicians

Mia Experts is a new-generation medical software platform designed by a physician, for physicians. From the very beginning, the product was built to integrate artificial intelligence in a way that is useful, secure, and aligned with the realities of medical practice.

Today, many doctors spend a significant part of their day dealing with administrative tasks rather than focusing on patient care and clinical decision-making. Existing medical software is often outdated, poorly designed, and disconnected from how physicians actually work.

Mia Experts aims to change that. By leveraging artificial intelligence, the platform automates repetitive tasks and structures medical data in a meaningful and usable way. The goal is simple: give physicians back their time.

The solution primarily targets private practitioners, particularly in general medicine and surgical specialties, where efficient data management, reliability, and time savings are critical.

Built from Real Medical Experience

The idea behind Mia Experts originated from the daily experience of Vincent Salabi, a surgeon who repeatedly encountered the same issue: medical software that was slow, repetitive, and time-consuming.

Instead of helping doctors, these tools often added friction to their workflow.

At the same time, a major technological shift was occurring: artificial intelligence was becoming accessible in a way that could be deployed securely and within a sovereign regulatory framework.

Mia Experts team (from left to right): Julie Rognon, Willy Noël, Kajarooban Thiyagarajah, Vincent Salabi, Patrick Wong

Mia Experts was born from the collaboration of three co-founders with complementary expertise — medical, technical, and entrepreneurial — united by a shared ambition: to fundamentally rethink the physician’s digital workspace.

Early Milestones and Key Achievements

From the earliest stages, several key milestones helped shape the development of Mia Experts.

One of the first successes was designing the software architecture. The team built a simple, modular, and scalable architecture capable of intelligently interacting with both patient and physician data.

The objective was clear: eliminate unnecessary repetition, ensure every piece of data has meaning, and enable reliable data usage — whether for prescription generation or reducing medical errors.

Operating in the highly regulated healthcare sector also required building an infrastructure compliant with Health Data Hosting (HDS) regulations. Mia Experts chose OVHcloud, ensuring health data sovereignty and providing a robust and secure cloud foundation.

Infrastructure management is handled in partnership with Lecpac Consulting, allowing the team to meet regulatory requirements while focusing on product development and innovation.

Another major milestone came through early presentations at medical conferences, particularly in orthopedic and urological surgery. The response from physicians was extremely positive. The software’s usability and clinical logic quickly generated word-of-mouth interest — even among doctors who had not been directly approached.

Mia Experts also achieved several regulatory and technological milestones:

LAP certification for prescription software, obtained in collaboration with healthtech company Posos
INSi compliance, enabling integration with national health identity standards

Even before official product launch, the startup received around 50 pre-orders purely through demonstrations and conference discussions.

The platform is now entering its beta testing phase, with the first deployments planned soon.

Core Values Driving the Product

The development of Mia Experts is guided by a set of strong principles:

Simplicity – intuitive interfaces designed for real medical workflows
Pragmatism – AI must deliver measurable time savings
Data sovereignty – full control over hosting and infrastructure
Health data security – non-negotiable protection standards
Intelligent data structuring – ensuring reliable and actionable medical information

Business, Technical and Regulatory Complexity

Building a medical software platform involves navigating a unique combination of business, technological, and regulatory challenges.

From a business perspective, the first hurdle was securing funding while preserving technological independence. Mia Experts achieved this through an initial funding round involving physician investors, complemented by support from Bpifrance and the French Tech Grant program.

On the technical side, the strict healthcare regulatory environment posed significant challenges. Compliance with HDS standards required implementing strong guarantees around security, traceability, service availability, and access governance from the very beginning.

Another critical challenge involved health data interoperability. Medical data must follow standardized national frameworks and coding systems. Mia Experts needed to structure and transform this data so it could interact seamlessly with national health services such as secure messaging systems and health data platforms.

Yet the biggest challenge was balancing all these constraints with a smooth user experience.

The ambition was never to create software that was simply compliant but difficult to use. Instead, the goal was to design a platform that remains intuitive, efficient, and truly supportive of physicians’ daily work.

Why Mia Experts Chose the Cloud

Cloud infrastructure quickly became a natural choice for the project.

First, artificial intelligence requires scalable computing resources. Running AI endpoints, fine-tuning models, and processing medical voice data demand infrastructure that can scale dynamically while protecting sensitive data.

Second, the cloud offers strong advantages for security and regulatory compliance. As a medical software publisher, Mia Experts needed an infrastructure capable of guaranteeing both data sovereignty and regulatory compliance within the European framework.

Finally, the cloud enables a much more agile product strategy. Unlike traditional locally installed medical software, cloud-based architecture allows centralized updates and continuous product improvement without disrupting physicians’ workflows.

For a fast-growing startup, this flexibility is essential.

Leveraging OVHcloud to Build a Sovereign Health Infrastructure

Choosing OVHcloud was a strategic decision for Mia Experts, especially in a context where health data sovereignty is a critical issue.

Many solutions rely on non-European cloud providers. OVHcloud allowed the startup to build its infrastructure on a secure, sovereign European cloud, fully compliant with French and EU regulations.

This has become a strong differentiator — both from a regulatory standpoint and in terms of trust with physicians.

The OVHcloud Startup Program also played a key role during the early development phase by helping offset the high technical costs associated with innovation.

Mia Experts relies heavily on speech-to-text and AI models for generating medical reports. Fine-tuning these models to understand medical vocabulary requires substantial computing power. The program allowed the team to train and test these models without immediate financial pressure.

The Infrastructure Behind Mia Experts

Today, the platform runs on a robust cloud architecture built on OVHcloud services, including:

Managed Kubernetes for Dev, Pre-production, and Production environments
S3-compatible object storage for medical documents and AI models
GPU instances supporting real-time medical speech transcription
AI Endpoints for LLMs such as Mistral, Llama, and GPT-OSS
Dedicated Public Cloud instances hosting GitHub CI/CD runners

All infrastructure is hosted in France, ensuring compliance with GDPR and HDS requirements.

One major advantage of OVHcloud AI Endpoints is transparency: customer data is not used to train external models, a key concern in healthcare environments.

Tangible Results and Impact

The collaboration with OVHcloud has enabled several concrete achievements.

First, Mia Experts successfully deployed an infrastructure fully compliant with HDS health data hosting standards, guaranteeing high levels of security, availability, and traceability.

Second, the startup has been able to build and control its own AI capabilities, particularly around speech recognition and medical text generation. The voice recognition system has already been adapted to medical vocabulary, delivering strong accuracy in clinical contexts.

Another key outcome is AI sovereignty. By hosting AI inference within a controlled European environment, Mia Experts retains full control over its data, models, and algorithms.

Finally, the cloud infrastructure provides significant operational agility. The team can deploy updates quickly, iterate on AI models, and continuously improve application performance.

Accelerating Product Adoption

These technological choices have significantly strengthened Mia Experts’ positioning within the medical software ecosystem.

The cloud infrastructure makes the solution eligible for Ségur V2 standards, a key regulatory benchmark for healthcare software interoperability in France.

This strengthens credibility with physicians and facilitates integration into the national digital health ecosystem.

By maintaining full control over its AI pipeline — from hosting to model fine-tuning — Mia Experts can guarantee both data confidentiality and high-quality performance tailored to medical language.

What’s Next for Mia Experts

The next step is the progressive onboarding of the first users, with around 50 pre-registrations already secured before the official launch.

In the medium term, the startup aims to reach:

300 users within two years
500 users within three years

At the same time, Mia Experts plans to expand beyond surgical specialties with the launch of Mia Experts for General Practice, followed by extensions into additional medical disciplines.

The long-term vision is to build a modular medical platform adaptable to multiple specialties while sharing a unified technological foundation.

Advice for Other Startups

For startups building AI-driven products, the Mia Experts team highlights three key lessons.

First, anticipate your data strategy early. AI models are only as good as the data used to train them. Structuring and preparing datasets before accessing cloud resources can provide a major competitive advantage.

Second, do not underestimate regulatory complexity, especially in sectors like healthcare. Partnering with an experienced infrastructure manager can significantly accelerate deployment.

Finally, think of the cloud not only as hosting infrastructure but as a strategic platform for innovation and scalability.

Conclusion

The journey of Mia Experts shows that innovation in healthcare requires a careful balance between technological ambition, regulatory rigor, and practical usability.

By building on a sovereign and compliant cloud infrastructure from the outset, the startup has laid strong foundations for developing a medical platform that genuinely supports physicians.

The collaboration with OVHcloud has enabled Mia Experts to deploy a secure, scalable, and AI-ready infrastructure, ensuring full control over both health data and AI models.

For startups operating in highly regulated sectors, choosing the right cloud ecosystem can make all the difference — enabling innovation, accelerating growth, and building trust from day one.

Don’t let infrastructure costs limit your growth. We strongly urge other startups to join the OVHcloud Startup Program. Contact their team to build your own foundation for sustainable success.

If you’re a startup looking to transform your business, we encourage you to join the OVHcloud Startup Program or contact OVHcloud to discover how our solutions can support your journey!

Reference Architecture: Deploying a vision-language model with vLLM on OVHcloud MKS for high performance inference and full observability

Eléa Petton — Fri, 10 Apr 2026 07:48:53 +0000

Ensure complete digital sovereignty of your AI models with end-to-end control through open-source solutions on OVHcloud’s Managed Kubernetes Service.

vLLM on OVHcloud MKS for high availability and full observability

This reference architecture demonstrates how to deploy a Large Language Model (LLM) inference system using vLLM on OVHcloud Managed Kubernetes Service (MKS). The solution leverages NVIDIA L40S GPUs to serve the Qwen3-VL-8B-Instruct multimodal model (vision + text) with OpenAI-compatible API endpoints.

This comprehensive guide shows you how to deploy, automatically scale, and monitor vLLM-based LLM workloads on OVHcloud infrastructure.
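
To give an idea of what an OpenAI-compatible request to a vision-language model looks like, here is a minimal sketch that builds a chat completion payload mixing text and an image. The helper name, the image URL and the max_tokens value are placeholders. The payload is intended for the /v1/chat/completions route of the deployed service, and the model name you send must match the one the server was started with.

```python
import json


def build_vision_request(prompt, image_url,
                         model="Qwen/Qwen3-VL-8B-Instruct", max_tokens=256):
    """Build an OpenAI-style chat payload combining a text prompt and an image."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


# Placeholder image URL, for illustration only.
payload = build_vision_request("Describe this image.", "https://example.com/cat.png")
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST request, or passed field by field to the OpenAI SDK once the service is exposed.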

What are the key benefits?

Cost-effectiveness: Leverage managed services to minimise operational overhead
Real-time observability: Track Time-to-First-Token (TTFT), throughput, and resource utilisation
Sovereign infrastructure: Keep all metrics and data within European datacentres
Scalable by design: Automatically scale GPU inference replicas based on real workload demand

Context

Managed Kubernetes Service

OVHcloud MKS is a fully managed Kubernetes platform designed to help you deploy, operate, and scale containerised applications in production. It provides a secure and reliable Kubernetes environment without the operational overhead of managing the control plane.

How does this benefit you?

Cost-efficient: Pay only for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane
Fully managed Kubernetes: Certified upstream Kubernetes with automated control plane management, upgrades and high availability
Production-ready by design: Built-in integrations with OVHcloud Load Balancers, networking, and persistent storage
Scalable and flexible: Easily scale workloads and node pools to match application demand
Open and portable: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in

In the following guide, all services are deployed within the OVHcloud Public Cloud.

Architecture overview

This reference architecture demonstrates a basic deployment of vLLM for vision-language model inference on OVHcloud Managed Kubernetes Service, featuring:

High-availability deployment with 2 GPU nodes (NVIDIA L40S)
Optimised GPU utilisation with proper driver configuration
Scalable infrastructure supporting vision-language models
Comprehensive monitoring using Prometheus, Grafana, and DCGM
Full observability for both application and hardware metrics

Data flow:

Inference request:
- User → LoadBalancer → Gateway → NGINX Ingress → “Qwen3 VL” Service → vLLM Pod → GPU
- Response follows reverse path with streaming support
Metrics collection:
- vLLM Pods expose /metrics endpoint (port 8000)
- DCGM Exporters expose GPU metrics (port 9400)
- Prometheus scrapes both endpoints every 30 seconds
- Grafana queries Prometheus for visualization
Load distribution:
- NGINX Ingress uses cookie-based session affinity
- vLLM Service uses ClientIP session affinity
- Anti-affinity ensures 1 pod per GPU node

Prerequisites

Before you begin, ensure you have:

An OVHcloud Public Cloud account
An OpenStack user with the Administrator role
Hugging Face access – create a Hugging Face account and generate an access token
kubectl and helm installed (helm version 3.x or later)

🚀 Now you have all the ingredients, it’s time to deploy the recipe for Qwen/Qwen3-VL-8B-Instruct using vLLM and MKS!

Architecture guide: Native GPU deployment of vLLM on MKS with full stack observability

This reference architecture describes a Large Language Model deployment using vLLM inference server and Kubernetes, to enjoy the benefits of a service that’s both highly available and monitorable in real time.

Step 1 – Create MKS cluster and Node pools

From the OVHcloud Control Panel, create a Kubernetes cluster using MKS.

Navigate to: Public Cloud → Managed Kubernetes Service → Create a cluster

1. Configure cluster

Consider using the following configuration for the current use case:

Name: vllm-deployment-l40s-qwen3-8b
Location: 1-AZ Region – Gravelines (GRA11)
Plan: Free (or Standard)
Network: attach a Private network (e.g. 0000 - AI Private Network)
Version: Latest stable (e.g. 1.34)

2. Create GPU Node pool

During the cluster creation, configure the vLLM Node pool for GPUs:

Node pool name: vllm
Flavor: L40S-90
Number of nodes: 2
Autoscaling: Disabled (OFF)

Why L40S-90?

Cost-effective for single-model deployment (1 GPU per node)
Sufficient RAM (90GB) for Qwen3-VL-8B model

You should see your cluster (e.g. vllm-deployment-l40s-qwen3-8b) in the list, along with the following information:

You can now set up the node pool dedicated to monitoring.

3. Create CPU Node pool

From your cluster, click on Add a node pool and configure it as follows:

Node pool name: monitoring
Flavor: B2-15
Number of nodes: 1
Autoscaling: Disabled (OFF)

✅ Note

The monitoring stack can run on the GPU nodes if cost is a concern. A dedicated CPU node provides better isolation and resource management.

If the status is green with the OK label, you can proceed to the next step.

4. Configure Kubernetes access

Once your nodes have been provisioned, you can download the Kubeconfig file and configure kubectl with your MKS cluster.

# configure kubectl with your MKS cluster
export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml

# verify cluster connectivity
kubectl cluster-info
kubectl get nodes

Returning:

NAME                     STATUS   ROLES    AGE   VERSION
monitoring-node-xxxxxx   Ready    <none>   1d    v1.34.2
vllm-node-yyyyyy         Ready    <none>   1d    v1.34.2
vllm-node-zzzzzz         Ready    <none>   1d    v1.34.2

Before going further, add a label to the CPU node for monitoring workloads.

CPU_NODE=$(kubectl get nodes -o json | \
  jq -r '.items[] | select(.status.allocatable."nvidia.com/gpu" == null) | .metadata.name')
kubectl label node $CPU_NODE node-role=monitoring

Finally, check that the label was applied. Listing the nodes with their GPU count and role should give:

NAME                     GPU      ROLE
monitoring-node-xxxxxx      monitoring
vllm-node-yyyyyy         1        
vllm-node-zzzzzz         1

Once all nodes are in Ready status, you can proceed to the next step.
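The jq filter above and the NAME / GPU / ROLE listing can also be reproduced in Python, which may be handy in automation scripts. The following is a minimal sketch that works on the JSON returned by kubectl get nodes -o json. The SAMPLE document and the function names are illustrative assumptions, not part of the original tutorial.

```python
import json

# Hypothetical sample shaped like `kubectl get nodes -o json`,
# trimmed to the fields used below.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "monitoring-node-xxxxxx", "labels": {"node-role": "monitoring"}},
   "status": {"allocatable": {"cpu": "4"}}},
  {"metadata": {"name": "vllm-node-yyyyyy", "labels": {}},
   "status": {"allocatable": {"cpu": "15", "nvidia.com/gpu": "1"}}},
  {"metadata": {"name": "vllm-node-zzzzzz", "labels": {}},
   "status": {"allocatable": {"cpu": "15", "nvidia.com/gpu": "1"}}}
]}
""")


def summarize_nodes(node_list):
    """Return one (name, gpu_count, role) tuple per node."""
    rows = []
    for item in node_list["items"]:
        name = item["metadata"]["name"]
        # A missing nvidia.com/gpu key is reported as 0 GPUs.
        gpus = int(item["status"]["allocatable"].get("nvidia.com/gpu", 0))
        role = item["metadata"].get("labels", {}).get("node-role", "")
        rows.append((name, gpus, role))
    return rows


def cpu_only_nodes(node_list):
    """Nodes without an allocatable GPU (the jq filter tests for a missing key;
    a count of 0 is treated the same way here)."""
    return [name for name, gpus, _ in summarize_nodes(node_list) if gpus == 0]


# Print a listing similar to the NAME / GPU / ROLE output above.
for name, gpus, role in summarize_nodes(SAMPLE):
    print(f"{name:<26}{gpus or '':<6}{role}")
```

To run it against a real cluster, the SAMPLE value can be replaced by the parsed output of kubectl get nodes -o json. Keep in mind that the nvidia.com/gpu resource is advertised by the NVIDIA device plugin, so the GPU column stays empty on every node until that plugin is running.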

Step 2 – Install GPU operator

To start, consider setting up the GPU operator.

✅ Note

This step is based on this OVHcloud documentation: Deploying a GPU application on OVHcloud Managed Kubernetes Service

1. Add NVIDIA helm repository and create namespace

Add NVIDIA helm repo:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Then create the namespace as follows:

kubectl create namespace gpu-operator

2. Install GPU operator with correct configuration

The GPU Operator must be configured with specific driver versions to ensure compatibility with vLLM containers.

By default, the installation uses recent drivers (580.x with CUDA 13.x), which are incompatible with vLLM containers (CUDA 12.x).

Solution: Force driver version 535.183.01 (CUDA 12.2).

helm install gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --set driver.enabled=true \
  --set driver.version="535.183.01" \
  --set toolkit.enabled=true \
  --set operator.defaultRuntime=containerd \
  --set devicePlugin.enabled=true \
  --set dcgmExporter.enabled=true \
  --set dcgmExporter.image="dcgm-exporter" \
  --set dcgmExporter.version="3.1.7-3.1.4-ubuntu20.04" \
  --set gfd.enabled=true \
  --set migManager.enabled=false \
  --set nodeStatusExporter.enabled=true \
  --set validator.driver.enable=false \
  --set validator.toolkit.enable=false \
  --set validator.plugin.enable=false \
  --timeout 20m

✅ Note

Specifying the DCGM version may only be necessary if you encounter problems with the default image (e.g. ‘ImagePullBackOff’). If this is the case, add the following parameters:
--set dcgmExporter.repository="nvcr.io/nvidia/k8s" --set dcgmExporter.image="dcgm-exporter" --set dcgmExporter.version="3.1.7-3.1.4-ubuntu20.04"

Then check the status of the GPU operator pods:

kubectl get pods -n gpu-operator

Note that all pods should reach Running state in 5-10 minutes.

You can also check the GPU availability:

kubectl get nodes -o json | jq -r '.items[] | select(.status.allocatable."nvidia.com/gpu" != null) | "\(.metadata.name): \(.status.allocatable."nvidia.com/gpu") GPU(s)"'

Returning:

vllm-node-yyyyyy: 1 GPU(s)
vllm-node-zzzzzz: 1 GPU(s)

You can also run nvidia-smi to test the driver:

DRIVER_POD=$(kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset -o name | head -1)
kubectl exec -n gpu-operator $DRIVER_POD -- nvidia-smi

If the GPU tests are working properly, you can move on to the DCGM service configuration.

3. Configure DCGM service

Why is DCGM Exporter required?

DCGM (Data Center GPU Manager) is NVIDIA’s official tool for monitoring GPUs in production. The goal is to collect and display metrics from both GPU nodes.

GPU monitoring with DCGM

The metrics provided are:

DCGM_FI_DEV_GPU_UTIL – GPU utilisation (%)
DCGM_FI_DEV_GPU_TEMP – GPU temperature (°C)
DCGM_FI_DEV_FB_USED – VRAM used (MB)
DCGM_FI_DEV_FB_FREE – Free VRAM (MB)
DCGM_FI_DEV_POWER_USAGE – Power consumption (W)
And 50+ other GPU metrics

Next, ensure DCGM service has the correct labels and port configuration:

kubectl patch svc nvidia-dcgm-exporter -n gpu-operator --type merge -p '{
  "metadata": {
    "labels": {
      "app": "nvidia-dcgm-exporter"
    }
  },
  "spec": {
    "ports": [
      {
        "name": "metrics",
        "port": 9400,
        "targetPort": 9400,
        "protocol": "TCP"
      }
    ]
  }
}'

Verify the endpoints (should show 2 IPs, one per GPU node).

kubectl get endpoints nvidia-dcgm-exporter -n gpu-operator

NAME                   ENDPOINTS                   AGE
nvidia-dcgm-exporter   x.x.x.x:9400,x.x.x.x:9400   17d
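Each of these endpoints serves the metrics listed above in the Prometheus text exposition format. As an illustration of what Prometheus reads on port 9400, here is a minimal Python sketch that parses such a response. The SAMPLE_METRICS excerpt (values and label set) is made up for the example, and the parser only covers the simple cases needed here, not the full format.

```python
import re

# Hypothetical excerpt of a DCGM exporter /metrics response (values are made up).
SAMPLE_METRICS = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="vllm-node-yyyyyy"} 87
DCGM_FI_DEV_GPU_TEMP{gpu="0",Hostname="vllm-node-yyyyyy"} 61
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="vllm-node-yyyyyy"} 41250
"""

# metric_name{optional labels} value
LINE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and blank lines carry no sample
        match = LINE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def values_for(samples, metric):
    """Map Hostname label -> value for one metric name."""
    return {labels.get("Hostname", "?"): value
            for name, labels, value in samples if name == metric}


print(values_for(parse_metrics(SAMPLE_METRICS), "DCGM_FI_DEV_GPU_UTIL"))
```

In practice Prometheus does this parsing for you; the sketch is only meant to make the content of the scraped endpoint concrete.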

Step 3 – Deploy Qwen3 VL 8B with vLLM inference server

The deployment of the Qwen 3 VL 8B model on two L40S GPU nodes is carried out in several stages.

1. Create namespace and Hugging Face secret

Start by creating the namespace:

kubectl create namespace vllm

Next, retrieve your Hugging Face token and replace the HF_TOKEN value with your own:

export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Create your secret as follows:

kubectl create secret generic huggingface-secret \
  --from-literal=token=$HF_TOKEN \
  --namespace=vllm

Verify you obtain the following output by launching:

kubectl get secret huggingface-secret -n vllm

NAME                 TYPE     DATA   AGE
huggingface-secret   Opaque   1      14d
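The DATA column shows that the secret holds one key (token). Kubernetes stores Secret values base64-encoded in the data field, which is an encoding and not encryption. The short Python sketch below shows the round trip; the token value is a placeholder and the helper names are illustrative.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Encode a literal the way it appears under .data in a Secret manifest."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a base64 value read back from a Secret."""
    return base64.b64decode(encoded).decode("utf-8")


token = "hf_example_token"  # hypothetical placeholder, not a real token
stored = encode_secret_value(token)
print(stored)                       # what the API server returns under .data.token
print(decode_secret_value(stored))  # the original literal
```

The encoded value can be read back from the cluster with kubectl get secret huggingface-secret -n vllm -o jsonpath='{.data.token}' and passed to decode_secret_value to confirm the token was stored as expected.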

2. Create vLLM deployment configuration

First, create the vllm-deployment-2nodes.yaml file.

Deploy vLLM:

kubectl apply -f vllm-deployment-2nodes.yaml

You can monitor the deployment (it should take 8-10 minutes for model download and loading).

kubectl get pods -n vllm -o wide -w

Expected output after 10 minutes:

NAME               READY  STATUS   RESTARTS  AGE  IP       NODE  
qwen3-vl-xxxx-yyy  1/1    Running  0         1d   X.X.X.X  vllm-node-yyyyyy
qwen3-vl-xxxx-zzz  1/1    Running  0         1d   X.X.X.X  vllm-node-zzzzzz

You can also check the container logs:

kubectl logs -f -n vllm <pod-name>

You should find the following line in the logs: “Uvicorn running on http://0.0.0.0:8000”

Is everything installed correctly? Then let’s move on to the next step.

3. Add service label

Ensure the service has the correct label for ServiceMonitor discovery.

kubectl label svc qwen3-vl-service -n vllm app=qwen3-vl --overwrite

You can now verify by launching the following command.

kubectl get svc qwen3-vl-service -n vllm --show-labels | grep "app=qwen3-vl"

Returning:

qwen3-vl-service   ClusterIP   X.X.X.X   <none>   8000/TCP   1d   app=qwen3-vl

Step 4 – Install NGINX ingress controller

⚠️ Moving beyond Ingress

Follow this tutorial if you want to use Gateway instead of Ingress.

1. Add helm repository and configure Ingress

First, add the Helm repository:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

Create the ingress-nginx-values.yaml configuration file.

Then, install NGINX Ingress:

helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  -f ingress-nginx-values.yaml \
  --wait

Wait for LoadBalancer IP. The external IP assignment should take 1-2 minutes.

kubectl get svc -n ingress-nginx ingress-nginx-controller -w

Once EXTERNAL-IP is no longer <pending>, press Ctrl+C and export it:

export EXTERNAL_IP=<EXTERNAL-IP>
echo "API URL: http://$EXTERNAL_IP"

2. Create vLLM Ingress resource

Next, create vLLM Ingress using vllm-ingress.yaml.

Apply it as follows:

kubectl apply -f vllm-ingress.yaml

You can now test different API calls to verify that your deployment is functional.

3. Test API

Firstly, check if the model is available:

curl http://$EXTERNAL_IP/v1/models | jq

{
  "object": "list",
  "data": [
    {
      "id": "qwen3-vl-8b",
      "object": "model",
      "created": 1772472143,
      "owned_by": "vllm",
      "root": "Qwen/Qwen3-VL-8B-Instruct",
      "parent": null,
      "max_model_len": 8192,
      "permission": [
        {
          "id": "modelperm-8fb35cdd3208b068",
          "object": "model_permission",
          "created": 1772472143,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}

Next, test inference using the following request:

curl http://$EXTERNAL_IP/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl-8b",
    "messages": [{"role": "user", "content": "Count from 1 to 10."}],
    "max_tokens": 100
  }' | jq '.choices[0].message.content'

"1, 2, 3, 4, 5, 6, 7, 8, 9, 10"

Great! You’re almost there…
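Since Qwen3-VL is a vision-language model, you may also want to test it with an image. The following Python sketch builds a chat completion payload containing both text and an image. It assumes that the vLLM server accepts OpenAI-style image_url content parts with a base64 data URL; the function name, the placeholder bytes and the output file name are illustrative.

```python
import base64
import json


def build_vision_payload(image_bytes, question, model="qwen3-vl-8b",
                         mime="image/png", max_tokens=100):
    """Build a /v1/chat/completions body with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                # Assumption: the image is passed inline as a data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            ],
        }],
        "max_tokens": max_tokens,
    }


# Placeholder bytes; in practice read a real file, e.g. Path("image.png").read_bytes().
payload = build_vision_payload(b"placeholder image bytes", "Describe this image.")
with open("vision-payload.json", "w", encoding="utf-8") as f:
    json.dump(payload, f)
print(f"Payload written ({len(json.dumps(payload))} bytes)")
```

The generated file can then be sent with the same curl command as above, replacing the inline body with -d @vision-payload.json.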

Step 5 – Install Prometheus stack

Now, set up the monitoring stack that provides complete observability for application-level (vLLM) and hardware-level (GPU) metrics:

Monitoring architecture

1. Add helm repository and create namespace

Add Prometheus helm repo:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Then, create the monitoring Namespace.

kubectl create namespace monitoring

2. Create Prometheus deployment configuration and installation

First, create the prometheus.yaml file.

Install Prometheus stack:

helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f prometheus.yaml \
  --timeout 10m \
  --wait

Now, monitor its installation and wait until the pods are ready:

kubectl get pods -n monitoring -w

If all pods are running successfully, you can proceed to the next step.

3. Check that the installation is operational

First, forward the Grafana port in the background:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &

Test Grafana health:

curl -s http://localhost:3000/api/health | jq

{
  "database": "ok",
  "version": "12.3.3",
  "commit": "2a14494b2d6ab60f860d8b27603d0ccb264336f6"
}

You can now access Grafana locally at http://localhost:3000 with the following credentials:

Login: admin
Password: Admin123!vLLM

Well done! You can now proceed to the configuration step.

Step 6 – Configure ServiceMonitors

ServiceMonitors tell Prometheus which endpoints to scrape for metrics.

1. Create vLLM ServiceMonitor

Retrieve the file from the GitHub repository: vllm-servicemonitor.yaml.

Next, apply and check that the ServiceMonitor vllm-metrics exists:

kubectl apply -f vllm-servicemonitor.yaml
kubectl get servicemonitor -n vllm

2. Create DCGM ServiceMonitor

First, create the dcgm-servicemonitor.yaml file.

Once again, apply and verify:

kubectl apply -f dcgm-servicemonitor.yaml
kubectl get servicemonitor -n gpu-operator

gpu-operator                  1d
nvidia-dcgm-exporter          1d
nvidia-node-status-exporter   1d

3. Configure Prometheus for Cross-Namespace discovery

Apply a patch to allow Prometheus to discover ServiceMonitors in all namespaces.

kubectl patch prometheus prometheus-kube-prometheus-prometheus -n monitoring --type merge -p '{
  "spec": {
    "serviceMonitorNamespaceSelector": {},
    "podMonitorNamespaceSelector": {}
  }
}'

Now restart Prometheus so that it picks up the new configuration:

# Delete the Prometheus pod to force a configuration reload
kubectl delete pod prometheus-prometheus-kube-prometheus-prometheus-0 -n monitoring

# Wait for Prometheus to restart
kubectl wait --for=condition=Ready \
  pod/prometheus-prometheus-kube-prometheus-prometheus-0 \
  -n monitoring \
  --timeout=180s

Wait about 2 minutes for discovery and finally, verify targets:

kubectl port-forward -n monitoring \
  prometheus-prometheus-kube-prometheus-prometheus-0 9090:9090 &

Open http://localhost:9090/targets in your browser and search for:

vllm
dcgm

Note that the expected targets are:

serviceMonitor/vllm/vllm-metrics/0 (2/2 UP)
serviceMonitor/gpu-operator/nvidia-dcgm-exporter/0 (2/2 UP)
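The same check can be scripted against the Prometheus HTTP API (GET /api/v1/targets) through the port-forward opened above. The sketch below summarises the number of healthy targets per scrape pool from such a response. The SAMPLE response is hypothetical and trimmed to the two fields used here, and the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical, trimmed response shaped like GET /api/v1/targets.
SAMPLE = {
    "status": "success",
    "data": {"activeTargets": [
        {"scrapePool": "serviceMonitor/vllm/vllm-metrics/0", "health": "up"},
        {"scrapePool": "serviceMonitor/vllm/vllm-metrics/0", "health": "down"},
        {"scrapePool": "serviceMonitor/gpu-operator/nvidia-dcgm-exporter/0", "health": "up"},
        {"scrapePool": "serviceMonitor/gpu-operator/nvidia-dcgm-exporter/0", "health": "up"},
    ]},
}


def summarize_targets(response):
    """Return {scrape_pool: (targets_up, targets_total)}."""
    counts = defaultdict(lambda: [0, 0])
    for target in response["data"]["activeTargets"]:
        entry = counts[target["scrapePool"]]
        entry[1] += 1
        if target["health"] == "up":
            entry[0] += 1
    return {pool: tuple(entry) for pool, entry in counts.items()}


for pool, (up, total) in summarize_targets(SAMPLE).items():
    status = "OK" if up == total else "CHECK"
    print(f"{pool} ({up}/{total} UP) {status}")
```

With a live cluster, the SAMPLE dictionary would be replaced by the JSON body returned by http://localhost:9090/api/v1/targets.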

Step 7 – Create Grafana dashboards

In this final step, you create two Grafana dashboards: one for the software side, based on vLLM metrics, and one for the hardware side, covering GPU and system resource consumption.

1. vLLM application metrics

The dashboard provides insights into vLLM application performance, request handling, and resource utilization based on the following metrics:

Metric	Type	Description	Unit	Dashboard Usage
`vllm:request_success_total`	Counter	Total successful requests	count	Request Rate, Total Requests
`vllm:num_requests_running`	Gauge	Requests currently being processed	count	Queue Depth, Active Requests
`vllm:num_requests_waiting`	Gauge	Requests waiting in queue	count	Queue Depth, Queued Requests
`vllm:time_to_first_token_seconds`	Histogram	Latency until first token generated	seconds	TTFT P50/P95/P99
`vllm:e2e_request_latency_seconds`	Histogram	Total end-to-end latency	seconds	E2E Latency P50/P95/P99
`vllm:generation_tokens_total`	Counter	Total tokens generated (output)	count	Token Generation Rate, Throughput
`vllm:prompt_tokens_total`	Counter	Total prompt tokens (input)	count	Token Generation Rate, Avg Tokens
`vllm:kv_cache_usage_perc`	Gauge	GPU KV cache utilization	0-1 (0-100%)	KV Cache Usage
`vllm:prefix_cache_hits_total`	Counter	Number of prefix cache hits	count	Cache Hit Rate
`vllm:prefix_cache_queries_total`	Counter	Number of prefix cache queries	count	Cache Hit Rate
`vllm:request_queue_time_seconds`	Histogram	Time spent waiting in queue	seconds	Request Queue Time
`vllm:request_prefill_time_seconds`	Histogram	Prefill phase time	seconds	Prefill Time
`vllm:request_decode_time_seconds`	Histogram	Decode phase time	seconds	Decode Time
`vllm:inter_token_latency_seconds`	Histogram	Latency between each token	seconds	Inter-Token Latency
`vllm:num_preemptions_total`	Counter	Number of preemptions (OOM)	count	Preemptions
`vllm:prompt_tokens_cached_total`	Counter	Prompt tokens cached	count	Cached Tokens
`vllm:request_prompt_tokens`	Histogram	Prompt size distribution	count	(Table)
`vllm:request_generation_tokens`	Histogram	Generated tokens distribution	count	(Table)
`vllm:iteration_tokens_total`	Histogram	Tokens per iteration	count	(Advanced analysis)

This vLLM Grafana dashboard is composed of 23 panels:

Type	Count	Panels
Timeseries	12	Request Rate, Queue Depth, TTFT, E2E Latency, Token Gen, Cache Usage, Cache Hit, Queue Time, Prefill/Decode, Inter-Token, Preemptions, Avg Tokens
Stat	10	Throughput, TTFT P95, Active Req, Queued Req, Cache Hit Rate, Cache Usage, Total Req, Total Tokens, Cached Tokens, Preemptions
Table	1	Pod Performance
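Panels such as TTFT P95 are derived from histogram metrics. In PromQL this is typically written with histogram_quantile, for example over the rate of vllm:time_to_first_token_seconds_bucket; the exact queries used in the dashboard JSON are not shown here, so treat that as an assumption. The Python sketch below mirrors the underlying idea: find the bucket containing the requested rank, then interpolate linearly inside it. The bucket values are made up and the function is a simplified illustration, not the Prometheus implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), including (math.inf, total).
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank and count > prev_count:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical TTFT buckets in seconds: (le, cumulative count)
ttft_buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (2.5, 100), (math.inf, 100)]
print(f"P50 = {histogram_quantile(0.50, ttft_buckets):.2f}s")
print(f"P95 = {histogram_quantile(0.95, ttft_buckets):.2f}s")
```

Because of the interpolation, the reported percentile is only as precise as the bucket boundaries: a P95 that lands in a wide bucket is an estimate, not an exact measurement.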

Now create the dashboard using vllm-app-dashboard.json. Then, launch:

echo "Importing vLLM application dashboard..."
curl -X POST \
  'http://localhost:3000/api/dashboards/db' \
  -H 'Content-Type: application/json' \
  -u 'admin:Admin123!vLLM' \
  -d @vllm-app-dashboard.json | jq '.status, .url'

Next, you can access the vLLM dashboard and follow the metrics in real time:

For comprehensive monitoring, it is also essential to track hardware consumption.

2. GPU hardware metrics

Take advantage of the most useful DCGM metrics to check both the functioning and consumption of your hardware resources:

Metric	Type	Description	Unit	Normal Thresholds	Dashboard Usage
`DCGM_FI_DEV_GPU_UTIL`	Gauge	GPU utilization (compute)	% (0-100)	70-95% optimal	GPU Utilization
`DCGM_FI_DEV_GPU_TEMP`	Gauge	GPU temperature	°C	< 85°C normal	GPU Temperature
`DCGM_FI_DEV_FB_USED`	Gauge	VRAM used	MB	Variable by model	GPU Memory Used
`DCGM_FI_DEV_FB_FREE`	Gauge	VRAM free	MB	> 2GB recommended	GPU Memory Free
`DCGM_FI_DEV_POWER_USAGE`	Gauge	Power consumption	Watts	< 300W (L40S)	GPU Power Usage
`DCGM_FI_DEV_SM_CLOCK`	Gauge	GPU clock speed (compute)	MHz	Variable	GPU Clock Speed
`DCGM_FI_DEV_MEM_CLOCK`	Gauge	Memory clock speed	MHz	Variable	Memory Clock Speed
`DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL`	Counter	Total NVLink bandwidth	bytes/s	(If multi-GPU)	NVLink Bandwidth
`DCGM_FI_DEV_PCIE_TX_BYTES`	Counter	PCIe data transmitted	bytes	(I/O monitoring)	PCIe TX
`DCGM_FI_DEV_PCIE_RX_BYTES`	Counter	PCIe data received	bytes	(I/O monitoring)	PCIe RX
`DCGM_FI_DEV_ECC_DBE_VOL_TOTAL`	Counter	ECC double-bit errors	count	0 ideal	(Health check)
`DCGM_FI_DEV_ECC_SBE_VOL_TOTAL`	Counter	ECC single-bit errors	count	< 10/day acceptable	(Health check)

This hardware Grafana dashboard is composed of 13 panels with GPU hardware and system metrics. A detailed view is also available for GPU utilisation (%), temperature (°C), VRAM (GB) and power (W).

Type	Count	Panels
Timeseries	8	GPU Util, GPU Mem, GPU Temp, GPU Power, CPU Usage, RAM Usage, Network I/O, Disk I/O
Stat	4	Avg GPU Util, Avg GPU Temp, Total GPU Mem, Total GPU Power
Table	1	Hardware Status
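The "Normal Thresholds" column of the metrics table can be turned into a simple health check. The sketch below is one possible way to encode those thresholds in Python; the field names, the 2048 MB reading of "2GB" and the sample values are assumptions made for the example.

```python
def check_gpu(sample):
    """Return a list of warnings for one GPU reading.

    sample keys (illustrative names): util_pct, temp_c, fb_free_mb, power_w, ecc_dbe
    """
    warnings = []
    if sample["temp_c"] >= 85:
        warnings.append(f"temperature {sample['temp_c']}°C is not below 85°C")
    if sample["fb_free_mb"] <= 2048:  # assumption: 2GB taken as 2048 MB
        warnings.append(f"only {sample['fb_free_mb']} MB of VRAM free")
    if sample["power_w"] >= 300:
        warnings.append(f"power draw {sample['power_w']} W is not below 300 W")
    if sample["ecc_dbe"] > 0:
        warnings.append(f"{sample['ecc_dbe']} ECC double-bit error(s)")
    return warnings


def utilization_is_optimal(sample):
    """True when GPU utilisation sits in the 70-95% band from the table."""
    return 70 <= sample["util_pct"] <= 95


# Hypothetical reading for one L40S GPU
reading = {"util_pct": 88, "temp_c": 62, "fb_free_mb": 5200, "power_w": 240, "ecc_dbe": 0}
print(check_gpu(reading) or "all thresholds respected")
print("utilisation in optimal band:", utilization_is_optimal(reading))
```

In a production setup the same thresholds would more naturally live in Prometheus alerting rules; the Python version is only meant to make the table actionable at a glance.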

Load hardware-dashboard.json as follows:

echo "Importing hardware dashboard..."
curl -X POST \
  'http://localhost:3000/api/dashboards/db' \
  -H 'Content-Type: application/json' \
  -u 'admin:Admin123!vLLM' \
  -d @hardware-dashboard.json | jq '.status, .url'

Finally, track resource consumption using this hardware dashboard:

Congratulations! Everything is working. You can now test your model and track the various metrics in real time.

Step 8 – LLM testing and performance tracking

Start by installing Python dependencies:

pip3 install openai tqdm

Replace the APP_URL value with the external IP of your vLLM service and launch the performance test with the following Python code:

import time
import threading
import random
from statistics import mean
from openai import OpenAI
from tqdm import tqdm

APP_URL = "http://94.23.185.22/v1"
MODEL = "qwen3-vl-8b"

CONCURRENT_WORKERS = 500          # concurrency
REQUESTS_PER_WORKER = 10
MAX_TOKENS = 200                  # generation pressure

# some random prompts
SHORT_PROMPTS = [
    "Summarize the theory of relativity.",
    "Explain what a transformer model is.",
    "What is Kubernetes autoscaling?"
]

MEDIUM_PROMPTS = [
    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",
    "Describe how vLLM manages KV cache and why it impacts inference performance."
]

LONG_PROMPTS = [
    "Write a very detailed technical explanation of how large language models perform inference, "
    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "
    "GPU memory management, and how batching affects latency and throughput. Use examples.",
]

PROMPT_POOL = (
    SHORT_PROMPTS * 2 +
    MEDIUM_PROMPTS * 4 +
    LONG_PROMPTS * 6    # bias toward long prompts
)

# openai compliance
client = OpenAI(
    base_url=APP_URL,
    api_key="foo"
)

# basic metrics
latencies = []
errors = 0
lock = threading.Lock()

# worker
def worker(worker_id):
    global errors
    for _ in range(REQUESTS_PER_WORKER):
        prompt = random.choice(PROMPT_POOL)

        start = time.time()
        try:
            client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=MAX_TOKENS,
                temperature=0.7,
            )
            elapsed = time.time() - start

            with lock:
                latencies.append(elapsed)

        except Exception:
            # any failure (timeout, HTTP error, connection reset) counts as an error
            with lock:
                errors += 1

# run
threads = []
start_time = time.time()

print("\n-> STARTING PERFORMANCE TEST:")
print(f"Concurrency: {CONCURRENT_WORKERS}")
print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")

for i in range(CONCURRENT_WORKERS):
    t = threading.Thread(target=worker, args=(i,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

total_time = time.time() - start_time

# results
print("\n-> BENCH RESULTS:")
print(f"Total requests sent: {len(latencies) + errors}")
print(f"Successful requests: {len(latencies)}")
print(f"Errors: {errors}")
print(f"Total wall time: {total_time:.2f}s")

if latencies:
    print(f"Avg latency: {mean(latencies):.2f}s")
    print(f"Min latency: {min(latencies):.2f}s")
    print(f"Max latency: {max(latencies):.2f}s")
    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")

Returning:

-> STARTING PERFORMANCE TEST:
Concurrency: 500
Total requests: 5000

-> BENCH RESULTS:
Total requests sent: 5000
Successful requests: 5000
Errors: 0
Total wall time: 225.54s
Avg latency: 21.45s
Min latency: 6.06s
Max latency: 25.19s
Throughput: 22.17 req/s
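These figures can be cross-checked with a little arithmetic: throughput is the number of successful requests divided by the wall time (5000 / 225.54 ≈ 22.17 req/s), and Little's law (in-flight requests = throughput × average latency) gives roughly 476 requests in flight on average, which is consistent with 500 concurrent workers. The script above only prints the average, minimum and maximum latency, so the sketch below also shows one way to compute percentiles comparable to the P50/P95/P99 panels. The function names are illustrative.

```python
from statistics import mean, quantiles


def percentile(values, p):
    """Return the p-th percentile (p in 1..99) with linear interpolation."""
    return quantiles(values, n=100, method="inclusive")[p - 1]


def summarize(latencies, wall_time_s):
    """Summary statistics for a list of per-request latencies in seconds."""
    throughput = len(latencies) / wall_time_s
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "throughput_rps": throughput,
        # Little's law: average in-flight requests = throughput x average latency
        "avg_in_flight": throughput * mean(latencies),
    }


# Cross-check of the figures reported above
throughput = 5000 / 225.54
in_flight = throughput * 21.45
print(f"{throughput:.2f} req/s, about {in_flight:.0f} requests in flight")
```

Calling summarize(latencies, total_time) at the end of the benchmark script would add the percentile figures to its report.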

Don’t forget to track GPU and vLLM metrics in your Grafana dashboards!

Conclusion

This reference architecture demonstrates a vLLM deployment on OVHcloud Managed Kubernetes Service (MKS) with comprehensive GPU monitoring. Benefits include:

High Performance: GPU-accelerated inference with L40S
Scalability: Kubernetes-native, horizontal scaling-ready
Reliability: Health checks, auto-restart, monitoring
API Compatibility: OpenAI-compatible endpoints
Multimodality: Vision & text capabilities
Full stack monitoring: Complete vLLM application and hardware dashboards

Going Further

Your current architecture is functional, but it can be improved into a fully production-ready solution. To take production hardening a step further, consider the following enhancements:

Authentication & authorization
- vLLM API authentication
- Grafana authentication
- Prometheus security
High availability & load balancing
- Grafana high availability with multiple replicas and shared storage
- Prometheus high availability
- vLLM Horizontal Pod Autoscaling (HPA) based on custom metrics
Data persistence & backup
- Prometheus long-term storage with persistent storage
- Grafana Dashboard Backup
Observability enhancements
- Distributed tracing by adding OpenTelemetry for request tracing
- Alerting rules with production-ready alert rules

Extract Text from Images with OCR using Python and OVHcloud AI Endpoints

Stéphane Philippart — Wed, 01 Apr 2026 12:55:19 +0000

If you want more information on AI Endpoints, please read the following blog post. You can also have a look at our previous blog posts on how to use AI Endpoints.

You can find the full code example in the GitHub repository.

In this article, we will explore how to perform OCR (Optical Character Recognition) on images using a vision-capable LLM, the OpenAI Python library, and OVHcloud AI Endpoints.

Introduction to OCR with Vision Models

Optical Character Recognition has been around for decades, but traditional OCR engines often struggle with complex layouts, handwritten text, or noisy images. Vision-capable Large Language Models bring a new approach: instead of relying on specialized OCR pipelines, you can simply send an image to a model that understands both visual and textual content.

In this example, we use the OpenAI Python library to create a simple OCR script powered by a vision model hosted on OVHcloud AI Endpoints.

The whole application is a single Python file: no complex setup, just pip install openai and you’re ready to go.

Setting up the Environment Variables

Before running the script, you need to set the following environment variables:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN="your-access-token"
export OVH_AI_ENDPOINTS_MODEL_URL="https://your-model-url"
export OVH_AI_ENDPOINTS_VLLM_MODEL="your-vision-model-name"

You can find how to create your access token, model URL, and model name in the AI Endpoints catalog. Make sure to choose a vision-capable model.

Installing Dependencies

The only dependency is the OpenAI Python library:

pip install openai

Define the System Prompt

The first step is to define a system prompt that describes what our OCR service does. This prompt tells the model how to behave:

SYSTEM_PROMPT = """You are an expert OCR engine.
Extract every piece of text visible in the provided image.
Preserve the original layout as faithfully as possible (line breaks, columns, tables).
Do NOT interpret, summarise, or translate the content.
Use markdown formatting to represent the layout (e.g. tables, lists).
If the image contains no text, reply with: "No text found."
"""

We tell it to behave as an expert OCR engine, to preserve the original layout, and to use markdown formatting for structured content like tables or lists.

Load the Image

Before sending the image to the model, we need to encode it as a base64 string. Here is a simple helper function that reads a local PNG file and returns a base64-encoded string:

import base64
from pathlib import Path

def load_image_as_base64(path: Path) -> str:
    """Load a local image and encode it as base64."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

The base64-encoded data is what gets sent to the vision model as part of the prompt.

Extract Text from the Image

The extract_text function sends the image to the vision model and returns the extracted text:

def extract_text(client: OpenAI, image_base64: str, model: str) -> str:
    """Extract text from an image using the vision model."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{image_base64}"
                        }
                    }
                ]
            }
        ]
    )
    return response.choices[0].message.content

The image is passed as a data URL inside the image_url field, following the OpenAI Vision API format. The temperature is set to 0.0 because we want deterministic, faithful text extraction and not creative output.
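The data URL above hardcodes image/png, which matches the PNG file used in this example. If you want to process other formats such as JPEG, the MIME type should match the file. Here is a minimal sketch, assuming the file extension is reliable, that derives the MIME type with the standard mimetypes module. The function name is illustrative and not part of the original script.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Build a base64 data URL whose MIME type is guessed from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {filename}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Placeholder bytes; in practice use Path("scan.jpg").read_bytes()
print(to_data_url(b"abc", "scan.jpg"))
print(to_data_url(b"abc", "doc.png"))
```

The returned string can be used directly as the url value of the image_url field in extract_text.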

Configure the Client

This example uses a vision-capable model hosted on OVHcloud AI Endpoints. Since AI Endpoints exposes an OpenAI-compatible API, we use the OpenAI client and just point it to the OVHcloud endpoint:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),
)

model_name = os.getenv("OVH_AI_ENDPOINTS_VLLM_MODEL")

A few things to note:

The API key, base URL, and model name are read from environment variables.
The OpenAI library works with any OpenAI-compatible API, making it a good fit for AI Endpoints.
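Because os.getenv returns None when a variable is not set, a missing variable only surfaces later as a less direct error. A small check before creating the client can make the failure explicit. This is an optional sketch, not part of the original script; the load_settings name is illustrative.

```python
import os

REQUIRED_VARS = (
    "OVH_AI_ENDPOINTS_ACCESS_TOKEN",
    "OVH_AI_ENDPOINTS_MODEL_URL",
    "OVH_AI_ENDPOINTS_VLLM_MODEL",
)


def load_settings(env=os.environ):
    """Return the required settings, or raise listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment
example_env = {
    "OVH_AI_ENDPOINTS_ACCESS_TOKEN": "your-access-token",
    "OVH_AI_ENDPOINTS_MODEL_URL": "https://your-model-url",
    "OVH_AI_ENDPOINTS_VLLM_MODEL": "your-vision-model-name",
}
print(sorted(load_settings(example_env)))
```

In the script, calling load_settings() without arguments reads the real environment, and the returned values can be passed to OpenAI(api_key=..., base_url=...).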

Assemble and Run

With the client configured, extracting text from an image is straightforward:

image_base64 = load_image_as_base64(Path("./doc.png"))
result = extract_text(client, image_base64, model_name)
print(result)

And that’s it!

Here is the image used for this example:

And the result:

$ python ocr_demo.py
📄 Loading image: doc.png
🔍 Running OCR with Qwen2.5-VL-72B-Instruct via OVHcloud AI Endpoints...

📝 Extracted text 📝
Every month, the OVHcloud Developer Advocate team creates content, shares knowledge, and connects with the tech community. Here’s a look at what we did in March 2026. 🚀

🎙️ “Tranches de Tech” – Our monthly podcast

A new episode of our French-language podcast Tranches de Tech🥑 just dropped!

🎧 Episode 102: Tranches de Tech #26 – Architecte, c’est une bonne situation ça ?

This month we sat down with Alexandre Touret, Architect at Worldline to discuss the evolving role of software architects and the growing impact of AI on development practices. From Spotify’s claim that their devs no longer code, to agentic tools like OpenClaw and Claude Code reshaping workflows. We also cover ANSSI’s revised open-source policy, IBM tripling junior hires, and the critical responsibility of mentoring the next generation of developers in an AI-driven world.

📺 Live on Twitch

We streamed live on Twitch this month! Here’s what we covered:

🎥 Rémy Vandepoel discussed with Hugo Allabert and François Loiseau about our Public VCFaaS. Catch the replay on YouTube ▶️.

🎤 Conference Talks

The team hit the road (and the stage) at several conferences this month:

🇳🇱 KubeCon Amsterdam – Amsterdam, Netherlands 🇳🇱

Aurélie Vache gave a talk: The Ultimate Kubernetes Challenge: An Interactive Trivia Game

Conclusion

In this article, we have seen how to use a vision-capable LLM to perform OCR on images using the OpenAI Python library and OVHcloud AI Endpoints. The OpenAI library makes it very easy to send images to a vision model and extract text, and Python allows us to run the whole thing as a simple script.

You have a dedicated Discord channel (#ai-endpoints) on our Discord server (https://discord.gg/ovhcloud), see you there!

Pricing evolution of Public Cloud, Bare Metal and VPS at OVHcloud

Octave Klaba — Thu, 05 Mar 2026 12:59:25 +0000

For customers in the United States, the same article with US pricing is available here: https://us.ovhcloud.com/resources/blog/pricing-evolution-of-public-cloud-bare-metal-and-vps-at-ovhcloud/

Since autumn 2025, the global memory market has been going through a major disruption. Although barely noticeable to end users, these developments are radically changing the cost of computer hardware and, as a direct result, the cost of the cloud.

This article will decipher this structural crisis, its real-life impacts, and the strategic choices that OVHcloud is implementing to mitigate its effects.

An industrial shift towards GPUs

Globally, the three major memory manufacturers have redirected a significant portion of their production capacity to meet the massive demand for GPUs, particularly for AI-related and high-bandwidth computing applications.

This reallocation took place without a corresponding reduction in the historical demand for RAM and storage, generating pressure on several market segments simultaneously.

The consequences of this were immediate and noticeable:

pressure on supply, with reduced stock and extended lead times
continuous rise in RAM and disk prices since September 2025
long-term market instability, which is not expected to find a new balance until late 2026

A sustained inflation of memory components

Even after the market stabilises, prices are not expected to return to their historical levels before 2028, the amount of time needed for new production capacities to become truly operational.

This development profoundly disrupts the economic fundamentals of computer hardware, both for on-premises infrastructures and for the cloud. Depending on the volumes of memory and disk capacity deployed in a given configuration, the prices related to RAM and storage could increase by 15% to 300% compared to 2025 prices.

This change of scale is both abrupt and unprecedented, with no recent equivalent in the global market.

A market under pressure, even with higher prices

Paradoxically, the rise in prices is not enough to secure the availability of components. Currently, to guarantee the delivery of their desired volume of RAM or disks, cloud providers need to order up to 12 months in advance, without being told the final price at the time of purchase.

In practice, prices are only communicated one to two months after delivery, depending on the changes in supply and demand during the quarter in question. This uncertainty places unprecedented pressure on industry players and cloud providers, simultaneously affecting production and distribution.

Towards a new global balance of demand

This situation will inevitably have repercussions on the volumes ordered. Some customers will find the prices too high and limit their investments, while others, lacking alternatives, will continue to place orders regardless.

This interplay of opposing forces should lead to a new global balance, but at a significantly higher price point. Current projections anticipate a 250% to 300% increase in the price of RAM by the end of 2026, compared to September 2025.
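To make the projection concrete: read literally, a 250% increase means a price 3.5 times the September 2025 level, and a 300% increase means 4 times that level. Here is a minimal Python sketch of that arithmetic. The base value of 100 is a hypothetical index chosen for illustration, not a real component price.

```python
def price_after_increase(base: float, pct_increase: float) -> float:
    """Return the price after applying a percentage increase to a base price."""
    return base * (1 + pct_increase / 100)


# Hypothetical index: September 2025 price = 100 (illustrative, not a real price)
BASE_INDEX = 100.0

low_projection = price_after_increase(BASE_INDEX, 250)   # +250%
high_projection = price_after_increase(BASE_INDEX, 300)  # +300%

print(f"+250% -> index {low_projection:.0f} ({low_projection / BASE_INDEX:.1f}x)")
print(f"+300% -> index {high_projection:.0f} ({high_projection / BASE_INDEX:.1f}x)")
```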

Our strategy to soften the blow

In light of this reality, OVHcloud has chosen not to automatically pass on the entire price increase of components to its customers.

For the cloud deployed between 2026 and 2028 (including Public Cloud, Private Cloud, and Bare Metal), the average price increase will be limited – between 9% and 11% – despite significantly higher RAM and disk costs.

To offset this gap, a moderate increase of 2% to 6% is planned for solutions deployed before 2025, depending on the age of the equipment, as well as a change in IPv4 pricing. The latter should not have a significant impact on our customers’ budgets, as the cost of IP addresses is a small share compared to other resources in a cloud project.

Our objective is clear: to maintain pricing consistency across the entire range from 2021 to 2028, and to prepare for a gradual return to normal in 2029.

Continuous investments and developing solutions

Beyond pricing adjustments, this period will be characterised by sustained investments in our solutions and in the customer experience.

Despite the strong pressure from rising component costs, we are continuing to develop our services to provide more value to our customers.

In practical terms, this will result in:

a gradual strengthening of support mechanisms
an increase in resources included in certain ranges
a modernization of our computing and storage infrastructures

These initiatives demonstrate our commitment to keep improving our services throughout this period, even in a constrained economic context, rather than letting it be defined solely by cost increases.

Time frame and implementation procedures

Our clients have already received emails detailing the precise impacts on their services. The new prices will come into effect on 1 April 2026.

Until that date, it is possible to renew services at the current rates for a duration of up to 2 years. In all cases, the new prices will only apply at the end of the current contractual period.

A time of uncertainty and a strategic advantage

We are going through an exceptionally unpredictable period, where market visibility rarely lasts longer than one to two weeks. There remains hope that prices will stabilize on a long-term basis from 2026, so that we can avoid further unfavorable announcements.

In this tense context, having a global supply chain and two internal production facilities is a major strategic advantage. This allows us to continue receiving components and producing servers, while the memory shortage affects a large part of the market.

Our Prices

You will find below our new pricing:
– Public Cloud: The prices below are hourly and apply to instances running a Linux OS. Prices for monthly-billed virtual machine instances (b2, c2, r2), Savings Plan options (b3, c3, r3) and Windows licences are available on our Prices web page.
– All our VPS, Floating IPs, and Additional IP pricing.
– Bare Metal: The displayed prices correspond to a 1-month commitment; additional discounts apply for 12- or 24-month prepayments. The prices for options are for new orders only. The renewal of options, which has been communicated by email to our customers, will be limited to +10% for disk options and +15% for RAM options.

For existing subscriptions renewed before April 1st, you can secure your current pricing for the full duration of the commitment you choose, effective from your renewal date.
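The renewal cap on Bare Metal options mentioned above (+10% for disk options, +15% for RAM options) can be sketched as a small calculation. This is an illustrative sketch only: it assumes the cap applies to the previous public price of the option and that results are rounded to the cent, neither of which is spelled out here. The function and dictionary names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Maximum renewal increase per option type, as stated in the article
RENEWAL_CAP = {"disk": Decimal("0.10"), "ram": Decimal("0.15")}


def max_renewal_price(old_price: str, option_type: str) -> Decimal:
    """Upper bound of the renewal price for an existing option.

    Assumption: the cap is applied to the old monthly price and rounded
    to the nearest cent; the actual billing rule may differ.
    """
    cap = RENEWAL_CAP[option_type]
    capped = Decimal(old_price) * (1 + cap)
    return capped.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Example old prices taken from the ADVANCE-1 (2024) option rows further down
ram_renewal = max_renewal_price("12", "ram")    # 64GB RAM option, old price 12 €
disk_renewal = max_renewal_price("26", "disk")  # 4x 960GB NVMe option, old price 26 €

print(f"RAM option renewal, at most:  {ram_renewal} €")
print(f"Disk option renewal, at most: {disk_renewal} €")
```

Under these assumptions, a renewed option stays well below the new-order price listed for the same option in the tables (18 € and 42 € for these two examples).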

Please note that the following product categories are not affected by our pricing evolution:
– Public Cloud – Compute: Cloud GPUs and Metal Instances
– Public Cloud – Container: Managed Kubernetes, Managed Registries & Managed Rancher
– Public Cloud – Network: Load Balancer, Gateway. Public and Private network traffic remains included.
– Public Cloud – Storage: Object Storage, Block Storage.
– Public Cloud – Analytics: Data Platform
– Public Cloud – AI & Machine Learning: AI Solutions (AI Notebook, AI Training, AI Deploy) and AI Endpoints
– Public Cloud – Quantum: Emulators & QPUs
– Bare Metal: Kimsufi and SoYouStart ranges
– Bare Metal: All storage (Veeam Enterprise plus, HYCU, Back-up Agent, NAS-HA, Cloud Disk Array)
– Private Cloud: All VMware offers, all storage offers (Veeam Enterprise plus, HYCU, Back-up Agent)

Price tables

Public Cloud – Virtual Machine Instances

General Purpose

These are the standard hourly & monthly prices for the Linux version of the instances, without a Savings Plan or any other additional discount.

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
b3-8	0,0465 €	0,0512 €
b3-16	0,093 €	0,1023 €
b3-32	0,186 €	0,2046 €
b3-64	0,372 €	0,4092 €
b3-128	0,7439 €	0,819 €
b3-256	1,4878 €	1,637 €
b3-512	2,9756 €	3,274 €
b3-640	3,7195 €	4,092 €
b2-7	0,0681 €	0,0709 €
b2-15	0,129 €	0,1342 €
b2-30	0,261 €	0,2715 €
b2-60	0,505 €	0,526 €
b2-120	0,993 €	1,033 €
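To read the table above as percentages, each row can be turned into a relative increase. The following Python sketch does this for a few rows copied from the table. The helper names are illustrative, and the parsing assumes the comma decimal separator used in these tables.

```python
from decimal import Decimal


def parse_eur(text: str) -> Decimal:
    """Parse a price such as '0,0465 €' (comma as decimal separator)."""
    return Decimal(text.replace("€", "").strip().replace(",", "."))


def pct_increase(old: str, new: str) -> Decimal:
    """Percentage increase from the old to the new price, to one decimal place."""
    old_price, new_price = parse_eur(old), parse_eur(new)
    return ((new_price - old_price) / old_price * 100).quantize(Decimal("0.1"))


# (old price, new price) pairs copied from the General Purpose table
rows = {
    "b3-8": ("0,0465 €", "0,0512 €"),
    "b3-64": ("0,372 €", "0,4092 €"),
    "b2-7": ("0,0681 €", "0,0709 €"),
    "b2-120": ("0,993 €", "1,033 €"),
}

increases = {ref: pct_increase(old, new) for ref, (old, new) in rows.items()}
for ref, pct in increases.items():
    print(f"{ref}: +{pct}%")
```

For these sample rows, the b3 instances come out at roughly +10% and the b2 instances at roughly +4%.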

Compute Optimized

These are the standard hourly & monthly prices for the Linux version of the instances, without a Savings Plan or any other additional discount.

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
c3-4	0,0415 €	0,0457 €
c3-8	0,083 €	0,0913 €
c3-16	0,1659 €	0,1825 €
c3-32	0,3318 €	0,365 €
c3-64	0,6637 €	0,7301 €
c3-128	1,3274 €	1,461 €
c3-256	2,6547 €	2,921 €
c3-320	3,3184 €	3,651 €
c2-7	0,0978 €	0,1018 €
c2-15	0,19 €	0,1976 €
c2-30	0,383 €	0,3984 €
c2-60	0,749 €	0,779 €
c2-120	1,48 €	1,54 €

Memory Optimized

These are the standard hourly & monthly prices for the Linux version of the instances, without a Savings Plan or any other additional discount.

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
r3-16	0,0602 €	0,0663 €
r3-32	0,1203 €	0,1324 €
r3-64	0,2407 €	0,2648 €
r3-128	0,4813 €	0,53 €
r3-256	0,9627 €	1,059 €
r3-512	1,9254 €	2,118 €
r3-1024	3,8508 €	4,236 €
r2-15	0,0978 €	0,1018 €
r2-30	0,113 €	0,1176 €
r2-60	0,22 €	0,2288 €
r2-120	0,443 €	0,461 €
r2-240	0,871 €	0,906 €

Public Cloud – Databases

MySQL

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,068 €	0,0746 €	0,0746 €
Essential DB1-7	0,1346 €	0,1477 €	0,1477 €
Essential DB1-15	0,2705 €	0,2968 €	0,2968 €
Essential DB1-30	0,5436 €	0,5967 €	0,5967 €
Production B3-8	0,2129 €	0,223 €	0,446 €
Production B3-16	0,4258 €	0,4461 €	0,8922 €
Production B3-32	0,8515 €	0,8922 €	1,7844 €
Production B3-64	1,703 €	1,7844 €	3,5688 €
Production B3-128	3,4059 €	3,5688 €	7,1376 €
Production B3-256	6,8118 €	7,1377 €	14,2754 €
Business DB1-4	0,0865 €	0,0949 €	0,1898 €
Business DB1-7	0,173 €	0,1899 €	0,3798 €
Business DB1-15	0,346 €	0,3797 €	0,7594 €
Business DB1-30	0,6933 €	0,761 €	1,522 €
Business DB1-60	1,3878 €	1,5234 €	3,0468 €
Business DB1-120	2,777 €	3,0484 €	6,0968 €
Advanced B3-8	0,2295 €	0,2404 €	0,7212 €
Advanced B3-16	0,4589 €	0,4808 €	1,4424 €
Advanced B3-32	0,9177 €	0,9616 €	2,8848 €
Advanced B3-64	1,8354 €	1,9232 €	5,7696 €
Advanced B3-128	3,6708 €	3,8464 €	11,5392 €
Advanced B3-256	7,3416 €	7,6928 €	23,0784 €
Enterprise DB1-4	0,0879 €	0,0964 €	0,2892 €
Enterprise DB1-7	0,173 €	0,1899 €	0,5697 €
Enterprise DB1-15	0,346 €	0,3797 €	1,1391 €
Enterprise DB1-30	0,6933 €	0,761 €	2,283 €
Enterprise DB1-60	1,3878 €	1,5234 €	4,5702 €
Enterprise DB1-120	2,777 €	3,0484 €	9,1452 €
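In the database tables, the last column (price per hour) appears to be the per-node price multiplied by the number of nodes in the plan. The node counts are not stated in the tables. The sketch below infers them from a few MySQL rows, so treat the result as an observation about the figures rather than an official plan definition. The helper names are illustrative.

```python
from decimal import Decimal


def parse_eur(text: str) -> Decimal:
    """Parse a price such as '0,223 €' (comma as decimal separator)."""
    return Decimal(text.replace("€", "").strip().replace(",", "."))


def implied_node_count(per_node: str, per_cluster: str) -> int:
    """Infer the node count as cluster price / per-node price.

    Raises ValueError if the ratio is not a whole number, i.e. if the
    'price = per-node price x nodes' reading does not hold for that row.
    """
    ratio = parse_eur(per_cluster) / parse_eur(per_node)
    if ratio != ratio.to_integral_value():
        raise ValueError(f"non-integer ratio: {ratio}")
    return int(ratio)


# (new price per node, new price per hour) copied from the MySQL table
mysql_rows = {
    "Essential DB1-4": ("0,0746 €", "0,0746 €"),
    "Production B3-8": ("0,223 €", "0,446 €"),
    "Business DB1-4": ("0,0949 €", "0,1898 €"),
    "Advanced B3-8": ("0,2404 €", "0,7212 €"),
    "Enterprise DB1-4": ("0,0964 €", "0,2892 €"),
}

nodes = {plan: implied_node_count(n, c) for plan, (n, c) in mysql_rows.items()}
for plan, count in nodes.items():
    print(f"{plan}: {count} node(s) implied")
```

When comparing plans, the per-hour column is the one that reflects the full cost of a cluster, since the per-node column hides the multiplier.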

PostgreSQL

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,068 €	0,0746 €	0,0746 €
Essential DB1-7	0,1346 €	0,1477 €	0,1477 €
Essential DB1-15	0,2705 €	0,2968 €	0,2968 €
Essential DB1-30	0,5436 €	0,5967 €	0,5967 €
Production B3-8	0,2129 €	0,223 €	0,446 €
Production B3-16	0,4258 €	0,4461 €	0,8922 €
Production B3-32	0,8515 €	0,8922 €	1,7844 €
Production B3-64	1,703 €	1,7844 €	3,5688 €
Production B3-128	3,4059 €	3,5688 €	7,1376 €
Production B3-256	6,8118 €	7,1377 €	14,2754 €
Business DB1-4	0,0865 €	0,0949 €	0,1898 €
Business DB1-7	0,173 €	0,1899 €	0,3798 €
Business DB1-15	0,346 €	0,3797 €	0,7594 €
Business DB1-30	0,6933 €	0,761 €	1,522 €
Business DB1-60	1,3878 €	1,5234 €	3,0468 €
Business DB1-120	2,777 €	3,0484 €	6,0968 €
Advanced B3-8	0,2295 €	0,2404 €	0,7212 €
Advanced B3-16	0,4589 €	0,4808 €	1,4424 €
Advanced B3-32	0,9177 €	0,9616 €	2,8848 €
Advanced B3-64	1,8354 €	1,9232 €	5,7696 €
Advanced B3-128	3,6708 €	3,8464 €	11,5392 €
Advanced B3-256	7,3416 €	7,6928 €	23,0784 €
Enterprise DB1-4	0,0879 €	0,0964 €	0,2892 €
Enterprise DB1-7	0,173 €	0,1899 €	0,5697 €
Enterprise DB1-15	0,346 €	0,3797 €	1,1391 €
Enterprise DB1-30	0,6933 €	0,761 €	2,283 €
Enterprise DB1-60	1,3878 €	1,5234 €	4,5702 €
Enterprise DB1-120	2,777 €	3,0484 €	9,1452 €

Valkey

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,0591 €	0,0648 €	0,0648 €
Essential DB1-7	0,1195 €	0,1311 €	0,1311 €
Production B3-8	0,1409 €	0,1476 €	0,2952 €
Production B3-16	0,3147 €	0,3297 €	0,6594 €
Production B3-32	0,6295 €	0,6595 €	1,319 €
Production B3-64	1,2588 €	1,319 €	2,638 €
Production B3-128	2,5175 €	2,6379 €	5,2758 €
Production B3-256	5,0349 €	5,2757 €	10,5514 €
Business DB1-4	0,068 €	0,0746 €	0,1492 €
Business DB1-7	0,151 €	0,1658 €	0,3316 €
Business DB1-15	0,2252 €	0,2471 €	0,4942 €
Business DB1-30	0,4448 €	0,4882 €	0,9764 €
Business DB1-60	0,8895 €	0,9764 €	1,9528 €
Business DB1-120	1,7736 €	1,9468 €	3,8936 €

Kafka

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Production B3-8	0,2656 €	0,2782 €	0,8346 €
Production B3-16	0,5311 €	0,5565 €	1,6695 €
Production B3-32	1,0622 €	1,113 €	3,339 €
Business DB1-4	0,1469 €	0,1612 €	0,4836 €
Business DB1-7	0,2911 €	0,3195 €	0,9585 €
Business DB1-15	0,5532 €	0,6073 €	1,8219 €
Business DB1-30	1,0707 €	1,1753 €	3,5259 €
Business DB1-60	2,1428 €	2,3522 €	7,0566 €
Advanced B3-8	0,2656 €	0,2782 €	1,6692 €
Advanced B3-16	0,5311 €	0,5565 €	3,339 €
Advanced B3-32	1,0622 €	1,113 €	6,678 €
Enterprise DB1-7	0,2924 €	0,321 €	1,926 €
Enterprise DB1-15	0,5532 €	0,6073 €	3,6438 €
Enterprise DB1-30	1,0707 €	1,1753 €	7,0518 €
Enterprise DB1-60	2,1428 €	2,3522 €	14,1132 €

Kafka Connect

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,1044 €	0,1145 €	0,1145 €
Essential DB1-7	0,2101 €	0,2305 €	0,2305 €
Essential DB1-15	0,3913 €	0,4295 €	0,4295 €
Essential DB1-30	0,7084 €	0,7775 €	0,7775 €
Production B3-8	0,1917 €	0,2008 €	0,6024 €
Production B3-16	0,3862 €	0,4046 €	1,2138 €
Production B3-32	0,7027 €	0,7363 €	2,2089 €
Business DB1-7	0,2101 €	0,2305 €	0,6915 €
Business DB1-15	0,4022 €	0,4415 €	1,3245 €
Business DB1-30	0,7084 €	0,7775 €	2,3325 €
Advanced B3-8	0,1908 €	0,1999 €	1,1994 €
Advanced B3-16	0,3862 €	0,4046 €	2,4276 €
Advanced B3-32	0,7027 €	0,7363 €	4,4178 €
Enterprise DB1-7	0,2101 €	0,2305 €	1,383 €
Enterprise DB1-15	0,4022 €	0,4415 €	2,649 €
Enterprise DB1-30	0,7084 €	0,7775 €	4,665 €

Kafka Mirror Maker

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,1044 €	0,1145 €	0,1145 €
Essential DB1-7	0,2101 €	0,2305 €	0,2305 €
Essential DB1-15	0,3913 €	0,4295 €	0,4295 €
Essential DB1-30	0,7084 €	0,7775 €	0,7775 €
Production B3-8	0,1917 €	0,2008 €	0,6024 €
Production B3-16	0,3862 €	0,4046 €	1,2138 €
Production B3-32	0,7027 €	0,7363 €	2,2089 €
Business DB1-4	0,1057 €	0,116 €	0,348 €
Business DB1-7	0,2101 €	0,2305 €	0,6915 €
Business DB1-15	0,4022 €	0,4415 €	1,3245 €
Business DB1-30	0,7084 €	0,7775 €	2,3325 €
Advanced B3-8	0,1908 €	0,1999 €	1,1994 €
Advanced B3-16	0,3862 €	0,4046 €	2,4276 €
Advanced B3-32	0,7027 €	0,7363 €	4,4178 €
Enterprise DB1-7	0,2101 €	0,2305 €	1,383 €
Enterprise DB1-15	0,4022 €	0,4415 €	2,649 €
Enterprise DB1-30	0,7084 €	0,7775 €	4,665 €

OpenSearch

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,0742 €	0,0814 €	0,0814 €
Essential DB1-7	0,1497 €	0,1642 €	0,1642 €
Essential DB1-15	0,3007 €	0,33 €	0,33 €
Production B3-8	0,172 €	0,1801 €	0,5403 €
Production B3-16	0,3439 €	0,3603 €	1,0809 €
Production B3-32	0,6877 €	0,7205 €	2,1615 €
Production B3-64	1,3754 €	1,4411 €	4,3233 €
Business DB1-7	0,1607 €	0,1763 €	0,5289 €
Business DB1-15	0,3213 €	0,3526 €	1,0578 €
Business DB1-30	0,648 €	0,7112 €	2,1336 €
Business DB1-60	1,2972 €	1,424 €	4,272 €
Business DB1-120	2,6013 €	2,8555 €	8,5665 €
Advanced B3-8	0,1839 €	0,1927 €	1,1562 €
Advanced B3-16	0,3678 €	0,3854 €	2,3124 €
Advanced B3-32	0,7357 €	0,7708 €	4,6248 €
Advanced B3-64	1,4713 €	1,5416 €	9,2496 €
Enterprise DB1-7	0,162 €	0,1778 €	1,0668 €
Enterprise DB1-15	0,3254 €	0,3571 €	2,1426 €
Enterprise DB1-30	0,6521 €	0,7158 €	4,2948 €
Enterprise DB1-60	1,3014 €	1,4285 €	8,571 €
Enterprise DB1-120	2,6027 €	2,857 €	17,142 €

Managed Dashboard

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,0591 €	0,0648 €	0,0648 €
Essential DB1-7	0,1195 €	0,1311 €	0,1311 €
Production B3-8	0,1195 €	0,1251 €	0,1251 €

Floating IPs

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
Floating IPs	0.0025 €	0.0027 €

Bare Metal Options – Price Tables

Dedicated Servers & Options

These are the standard monthly prices of the servers, without prepayment or commitment discount. The prices for options are for new orders only. The renewal of options, which has been communicated by email to our customers, will be limited to +10% for disk options and +15% for RAM options.

Reference	Old Public Price (Excl. VAT / Month)	New Public Price (Excl. VAT / Month)
ADVANCE
ADVANCE-1 – 2024 – AMD EPYC 4244P	84.99 €	89.99 €
RAM
32GB DDR5 On-Die ECC 5200MHz	Included	Included
64GB DDR5 On-Die ECC 5200MHz	12 €	18 €
128GB DDR5 On-Die ECC 3600MHz	36 €	58 €
192GB DDR5 On-Die ECC 3600MHz	60 €	78 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-1 – 2026 – AMD EPYC 4245P	99.99 €	104.99 €
RAM
32GB DDR5 On-Die ECC 5600MHz	Included	Included
64GB DDR5 On-Die ECC 5600MHz	22 €	26 €
128GB DDR5 On-Die ECC 3600MHz	44 €	58 €
256GB DDR5 On-Die ECC 3600MHz	63 €	130 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-2 – 2024 – AMD EPYC 4344P	119.99 €	124.99 €
RAM
64GB DDR5 On-Die ECC 5200MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	24 €	40 €
192GB DDR5 On-Die ECC 3600MHz	48 €	60 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-2 – 2026 – AMD EPYC 4345P	119.99 €	134.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	52 €	112 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-3 – 2024 – AMD EPYC 4464P	149.99 €	169.99 €
RAM
64GB DDR5 On-Die ECC 5200MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	24 €	40 €
192GB DDR5 On-Die ECC 3600MHz	48 €	60 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-3 – 2026 – AMD EPYC 4464P	159.99 €	199.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	52 €	112 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-4 – 2024 – AMD EPYC 4584PX	199.99 €	219.99 €
RAM
64GB DDR5 On-Die ECC 5200MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	24 €	40 €
192GB DDR5 On-Die ECC 3600MHz	48 €	60 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-4 – 2026 – AMD EPYC 4585PX	199.99 €	239.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	52 €	112 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-5 – 2024 – AMD EPYC 8224P	249.99 €	289.99 €
RAM
96GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	36 €	90 €
384GB DDR5 ECC 4800MHz	108 €	318 €
576GB DDR5 ECC 4800MHz	180 €	552 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 4x SSD NVMe 7.68TB Datacenter Class Soft RAID	416 €	458 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 6x SSD NVMe 7.68TB Datacenter Class Soft RAID	624 €	687 €
8x SSD NVMe 7.68TB Enterprise Class Soft RAID	806 €	887 €
ADVANCE-STOR – 2024 – AMD EPYC 4344P	199.99 €	199.99 €
RAM
32GB DDR5 On-Die ECC 5200MHz	Included	Included
64GB DDR5 On-Die ECC 5200MHz	12 €	14 €
128GB DDR5 On-Die ECC 3600MHz	36 €	42 €
192GB DDR5 On-Die ECC 3600MHz	60 €	69 €
Storage
2x HDD SAS 22TB Enterprise Class Soft RAID	Included	Included
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
4x HDD SAS 22TB Enterprise Class Soft RAID	64 €	70 €
2x HDD SAS 22TB Enterprise Class Hard RAID	66 €	73 €
6x HDD SAS 22TB Enterprise Class Soft RAID	128 €	141 €
4x HDD SAS 22TB Enterprise Class Hard RAID	130 €	143 €
8x HDD SAS 22TB Enterprise Class Soft RAID	192 €	211 €
6x HDD SAS 22TB Enterprise Class Hard RAID	194 €	213 €
8x HDD SAS 22TB Enterprise Class Hard RAID	258 €	284 €
ADVANCE-STOR – 2026 – AMD EPYC 4345P	199.99 €	229.99 €
RAM
32GB DDR5 On-Die ECC 5600MHz	Included	Included
64GB DDR5 On-Die ECC 5600MHz	22 €	25 €
128GB DDR5 On-Die ECC 3600MHz	44 €	58 €
256GB DDR5 On-Die ECC 3600MHz	63 €	130 €
Storage
2x HDD SAS 24TB Enterprise Class Soft RAID	Included	Included
2x SSD NVMe 960GB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x HDD SAS 24TB Enterprise Class Hard RAID	66 €	73 €
4x HDD SAS 24TB Enterprise Class Soft RAID	64 €	94 €
4x HDD SAS 24TB Enterprise Class Hard RAID	130 €	143 €
6x HDD SAS 24TB Enterprise Class Soft RAID	128 €	188 €
6x HDD SAS 24TB Enterprise Class Hard RAID	194 €	248 €
8x HDD SAS 24TB Enterprise Class Soft RAID	192 €	282 €
8x HDD SAS 24TB Enterprise Class Hard RAID	258 €	362 €
RISE
RISE-L – 2025 – AMD RYZEN 9 9950X	134.99 €	149.99 €
RISE-M – 2025 – AMD RYZEN 9 9900X	94.99 €	99.99 €
RISE-S – 2025 – AMD RYZEN 7 9700X	54.99 €	64.99 €
RISE-XL – 2025 – AMD EPYC TURIN 9455	269.99 €	299.99 €
GAME
GAME-1 – 2026 – AMD RYZEN 7 9800X3D	129.99 €	139.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	63 €	112 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
GAME-2 – 2026 – AMD RYZEN 9 9950X3D	169.99 €	179.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	Included	Included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	63 €	112 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	Included	Included
SCALE-a
SCALE-a1 – 2024 – AMD EPYC GENOA 9124	349.99 €	369.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a1 – 2026 – AMD EPYC 9135	389.99 €	409.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a2 – 2024 – AMD EPYC GENOA 9254	379.99 €	389.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a2 – 2026 – AMD EPYC 9255	429.99 €	439.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a3 – 2024 – AMD EPYC GENOA 9354	419.99 €	449.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a3 – 2026 – AMD EPYC 9355	469.99 €	499.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a4 – 2024 – AMD EPYC GENOA 9454	449.99 €	459.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a4 – 2026 – AMD EPYC 9455	539.99 €	549.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a5 – 2024 – AMD EPYC GENOA 9554	499.99 €	539.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a5 – 2026 – AMD EPYC 9555	599.99 €	639.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a6 – 2024 – AMD EPYC GENOA 9654	579.99 €	629.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a6 – 2026 – AMD EPYC 9655	699.99 €	729.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a7 – 2026 – AMD EPYC 9755	809.99 €	829.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	192 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a8 – 2026 – AMD EPYC 9965	869.99 €	899.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	192 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a9 – 2026 – Dual AMD EPYC 9965	1349.99 €	1349.99 €
RAM
128GB DDR5 ECC 5600MHz	Included	Included
192GB DDR5 ECC 5600MHz	40 €	40 €
256GB DDR5 ECC 5600MHz	80 €	80 €
384GB DDR5 ECC 5600MHz	160 €	160 €
512GB DDR5 ECC 5600MHz	240 €	240 €
768GB DDR5 ECC 5600MHz	Included	400 €
1024GB DDR5 ECC 5600MHz	560 €	560 €
1.5TB DDR5 ECC 5600MHz	880 €	880 €
3TB DDR5 ECC 5600MHz	1840 €	1840 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	38 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	100 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	160 €
SCALE-i
SCALE-i1 – 2024 – Intel Xeon Gold 6426Y	349.99 €	369.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-i2 – 2024 – Intel Xeon Gold 6442Y	379.99 €	389.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-i3 – 2024 – Intel Xeon Gold 6438M	409.99 €	449.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-GPU
SCALE-GPU-1 – 2024 – AMD EPYC GENOA 9354	969.99 €	969.99 €
RAM
192GB DDR5 ECC 4800MHz	Included	Included
384GB DDR5 ECC 4800MHz	120 €	120 €
768GB DDR5 ECC 4800MHz	240 €	240 €
1.1TB DDR5 ECC 4800MHz	420 €	420 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	52 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	104 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	208 €
SCALE-GPU-2 – 2024 – AMD EPYC GENOA 9454	999.99 €	999.99 €
RAM
192GB DDR5 ECC 4800MHz	Included	Included
384GB DDR5 ECC 4800MHz	120 €	120 €
768GB DDR5 ECC 4800MHz	240 €	240 €
1.1TB DDR5 ECC 4800MHz	420 €	420 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	52 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	104 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	208 €
SCALE-GPU-3 – 2024 – AMD EPYC GENOA 9554	1029.99 €	1029.99 €
RAM
192GB DDR5 ECC 4800MHz	Included	Included
384GB DDR5 ECC 4800MHz	120 €	120 €
768GB DDR5 ECC 4800MHz	240 €	240 €
1.1TB DDR5 ECC 4800MHz	420 €	420 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	52 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	104 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	208 €
HGR
HGR-AI-2 – 2024 – DUAL AMD EPYC 9354	2969.99 €	2969.99 €
RAM
384GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	64 €	74 €
768GB DDR5 ECC 4800MHz	400 €	360 €
2304GB DDR5 ECC 4800MHz	960 €	2208 €
Storage
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	Included	Included
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	88 €	118 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	150 €	165 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	300 €	330 €
2x SSD NVMe 15.36TB Enterprise Class Soft RAID	308 €	339 €
4x SSD NVMe 15.36TB Enterprise Class Soft RAID	616 €	680 €
HGR-HCI-a1 – 2024 – DUAL AMD EPYC 9254	999.99 €	1119.99 €
RAM
256GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	128 €	240 €
1TB DDR5 ECC 4800MHz	384 €	800 €
1.5TB DDR5 ECC 4800MHz	512 €	1472 €
2304GB DDR5 ECC 4800MHz	1024 €	2408 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	Included	Included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-a2 – 2024 – DUAL AMD EPYC 9354	1139.99 €	1274.99 €
RAM
384GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	64 €	74 €
768GB DDR5 ECC 4800MHz	400 €	360 €
1TB DDR5 ECC 4800MHz	320 €	600 €
1.5TB DDR5 ECC 4800MHz	384 €	1272 €
2304GB DDR5 ECC 4800MHz	960 €	2208 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	Included	Included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i1 – 2024 – DUAL INTEL XEON GOLD 5515+	849.99 €	949.99 €
RAM
256GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	128 €	240 €
1TB DDR5 ECC 4800MHz	384 €	800 €
1.5TB DDR5 ECC 4800MHz	512 €	1472 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	Included	Included
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	Included	Included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i2 – 2024 – DUAL INTEL XEON GOLD 6526Y	929.99 €	1039.99 €
RAM
256GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	128 €	240 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	Included	Included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i3 – 2024 – DUAL INTEL XEON GOLD 6542Y	999.99 €	1119.99 €
RAM
256GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	128 €	240 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	Included	Included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i4 – 2024 – DUAL INTEL XEON GOLD 6554S	1079.99 €	1209.99 €
RAM
256GB DDR5 ECC 4800MHz	Included	Included
512GB DDR5 ECC 4800MHz	128 €	240 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	Included	Included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-SAP-1 – 2024 – DUAL INTEL XEON GOLD 6226R	1011.99 €	1254.99 €
RAM
192GB DDR4 ECC 2933MHz	Included	Included
384GB DDR4 ECC 2933MHz	96 €	216 €
Storage
6x SSD SAS 3.84TB Enterprise Class Hard RAID	Included	Included
2x SSD SATA 480GB	Included	Included
12x SSD SAS 3.84TB Enterprise Class Hard RAID	264 €	354 €
24x SSD SAS 3.84TB Enterprise Class Hard RAID	792 €	1062 €
HGR-SAP-2 – 2024 – DUAL INTEL XEON GOLD 6242R	1121.99 €	1391.99 €
RAM
384GB DDR4 ECC 2933MHz	Included	Included
768GB DDR4 ECC 2933MHz	192 €	312 €
Storage
6x SSD SAS 3.84TB Enterprise Class Hard RAID	Included	Included
2x SSD SATA 480GB	Included	Included
12x SSD SAS 3.84TB Enterprise Class Hard RAID	264 €	354 €
24x SSD SAS 3.84TB Enterprise Class Hard RAID	792 €	1062 €
HGR-SAP-3 – 2024 – DUAL INTEL XEON GOLD 6248R	1231.99 €	1527.99 €
RAM
768GB DDR4 ECC 2933MHz	192 €	Included
1.5TB DDR4 ECC 2933MHz	384 €	1032 €
Storage
6x SSD SAS 3.84TB Enterprise Class Hard RAID	Included	Included
2x SSD SATA 480GB	Included	Included
12x SSD SAS 3.84TB Enterprise Class Hard RAID	264 €	354 €
24x SSD SAS 3.84TB Enterprise Class Hard RAID	792 €	1062 €
HGR-SDS-1 – 2024 – DUAL INTEL XEON GOLD 5515+	999.99 €	1119.99 €
Storage
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	Included	Included
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	475 €	546 €
12x SSD NVMe 7.68TB Enterprise Class Soft RAID	450 €	630 €
18x SSD NVMe 7.68TB Enterprise Class Soft RAID	900 €	1260 €
12x SSD NVMe 15.36TB Enterprise Class Soft RAID	1399 €	1722 €
24x SSD NVMe 7.68TB Enterprise Class Soft RAID	1350 €	1890 €
18x SSD NVMe 15.36TB Enterprise Class Soft RAID	2323 €	2898 €
24x SSD NVMe 15.36TB Enterprise Class Soft RAID	3247 €	4074 €
HGR-SDS-2 – 2024 – DUAL INTEL XEON GOLD 6542Y	1149.99 €	1289.99 €
Storage
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	Included	Included
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	475 €	546 €
12x SSD NVMe 7.68TB Enterprise Class Soft RAID	450 €	630 €
18x SSD NVMe 7.68TB Enterprise Class Soft RAID	900 €	1260 €
12x SSD NVMe 15.36TB Enterprise Class Soft RAID	1399 €	1722 €
24x SSD NVMe 7.68TB Enterprise Class Soft RAID	1350 €	1890 €
18x SSD NVMe 15.36TB Enterprise Class Soft RAID	2323 €	2898 €
24x SSD NVMe 15.36TB Enterprise Class Soft RAID	3247 €	4074 €
HGR-STOR-1 – 2024 – INTEL XEON GOLD 6554S	1199.99 €	1399.99 €
RAM
128GB DDR5 ECC 4800MHz	Included	Included
256GB DDR5 ECC 4800MHz	64 €	200 €
512GB DDR5 ECC 4800MHz	192 €	440 €
768GB DDR5 ECC 4800MHz	400 €	760 €
768GB DDR5 ECC 4800MHz	320 €	760 €
Storage
24x HDD SAS 22TB Enterprise Class Soft RAID	Included	Included
24x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Soft RAID	88 €	118 €
24x HDD SAS 22TB Enterprise Class Hard RAID	66 €	120 €
24x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Soft RAID	150 €	210 €
24x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Hard RAID	154 €	238 €
24x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Hard RAID	216 €	330 €
24x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Soft RAID	308 €	392 €
24x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Hard RAID	374 €	512 €
36x HDD SAS 22TB Enterprise Class Soft RAID	384 €	516 €
36x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Soft RAID	472 €	634 €
36x HDD SAS 22TB Enterprise Class Hard RAID	450 €	696 €
36x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Soft RAID	534 €	726 €
36x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Hard RAID	538 €	814 €
36x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Hard RAID	600 €	906 €
36x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Soft RAID	692 €	908 €
36x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Hard RAID	758 €	1088 €

IPs

Reference	Old public price (Excl. VAT / Month)	New Public Price (Excl. VAT / Month)
Additional IPv4	1.50 €	2.00 €

VPS

Family	Reference	Commit	Old public price (Excl. VAT / Month)	New Public Price (Excl. VAT / Month)
VPS 2026	VPS-1	Monthly	4.49 €	6.49 €
VPS 2026	VPS-2	Monthly	6.99 €	9.99 €
VPS 2026	VPS-3	Monthly	13.99 €	19.99 €
VPS 2026	VPS-4	Monthly	24.99 €	36.99 €
VPS 2026	VPS-5	Monthly	36.99 €	54.99 €
VPS 2026	VPS-6	Monthly	48.99 €	72.99 €
VPS 2026	VPSLZ-1	Monthly	5.49 €	7.49 €
VPS 2026	VPS-1	prepayment 6 months	25.56 €	36.99 €
VPS 2026	VPS-2	prepayment 6 months	39.84 €	56.94 €
VPS 2026	VPS-3	prepayment 6 months	79.74 €	113.94 €
VPS 2026	VPS-4	prepayment 6 months	142.44 €	210.84 €
VPS 2026	VPS-5	prepayment 6 months	210.84 €	313.44 €
VPS 2026	VPS-6	prepayment 6 months	279.24 €	416.04 €
VPS 2026	VPSLZ-1	prepayment 6 months	31.26 €	42.69 €
VPS 2026	VPS-1	prepayment 12 months	45.72 €	66.19 €
VPS 2026	VPS-2	prepayment 12 months	71.28 €	101.89 €
VPS 2026	VPS-3	prepayment 12 months	142.68 €	203.89 €
VPS 2026	VPS-4	prepayment 12 months	254.88 €	377.29 €
VPS 2026	VPS-5	prepayment 12 months	377.28 €	560.89 €
VPS 2026	VPS-6	prepayment 12 months	499.68 €	744.49 €
VPS 2026	VPSLZ-1	prepayment 12 months	55.92 €	76.39 €
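
The prepayment rows can be compared with monthly billing by computing the discount they imply. The sketch below does this for VPS-1 using the new prices from the table above. The discount rates are derived from those figures rather than stated in the source, and the helper name `implied_discount` is hypothetical.

```python
from decimal import Decimal

# New public prices for VPS-1, copied from the table above (EUR, excl. VAT).
MONTHLY = Decimal("6.49")
PREPAID = {6: Decimal("36.99"), 12: Decimal("66.19")}


def implied_discount(monthly: Decimal, months: int, prepaid_total: Decimal) -> Decimal:
    """Return the discount (in percent, one decimal) of a prepaid total versus monthly billing."""
    full_price = monthly * months
    return ((full_price - prepaid_total) / full_price * 100).quantize(Decimal("0.1"))


for months, total in PREPAID.items():
    print(f"{months} months: {implied_discount(MONTHLY, months, total)} % below monthly billing")
```

For VPS-1, the 6-month prepayment works out to roughly 5% below monthly billing and the 12-month prepayment to roughly 15%.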

Pricing changes for Public Cloud, Bare Metal and VPS at OVHcloud

Octave Klaba — Thu, 05 Mar 2026 12:59:14 +0000

Since autumn 2025, the global memory market has been going through a major disruption. Still barely noticeable to end users, this shift is profoundly changing the cost of computer hardware and, as a direct consequence, the cost of cloud services.

This article breaks down this structural crisis, its concrete impacts and the strategic choices OVHcloud is making to limit its effects.

An industrial shift towards GPUs

Worldwide, the three major memory manufacturers have redirected a significant share of their production capacity to meet massive demand for GPUs, particularly for AI and high-performance computing workloads.

This reallocation took place without an equivalent drop in traditional demand for RAM and storage, putting simultaneous pressure on several market segments.

The consequences are immediate and visible:

supply constraints, with reduced stock and longer lead times;
a continuous rise in RAM and disk prices since September 2025;
lasting market instability, with a new equilibrium not expected before the end of 2026.

Lasting inflation for memory components

Even once the market stabilises, prices are not expected to return to their historical levels before 2028, the time needed for new production capacity to become fully operational.

This shift deeply disrupts the economic fundamentals of computer hardware, for on-premise infrastructure and cloud alike. Depending on the amount of memory and disk capacity deployed in a configuration, the price impact of RAM and storage could reach +15% to +300% compared with 2025 prices.

This change of scale is both abrupt and unprecedented, with no recent equivalent on the global market.

A market under pressure even at high prices

Paradoxically, higher prices are not enough to secure component availability. Today, to guarantee delivery of RAM or disk volumes, cloud providers have to place orders up to 12 months in advance, without knowing the final price at the time of purchase.

In practice, prices are only communicated one to two months after delivery, depending on how supply and demand evolve over the quarter in question. This uncertainty puts unprecedented pressure on manufacturers and cloud providers, affecting production and distribution at the same time.

Towards a new global demand equilibrium

This situation will inevitably affect order volumes. Some customers will find prices too high and cut back their investments, while others, lacking an alternative, will keep ordering regardless.

These opposing forces should lead to a new global equilibrium, but at a markedly higher price level. Current projections anticipate a RAM price increase of +250% to +300% by the end of 2026, compared with September 2025.

Our strategy to cushion the shock

Faced with this reality, OVHcloud has chosen not to automatically pass the full component price increase on to its customers.

For cloud deployed between 2026 and 2028 (including Public Cloud, Private Cloud and Bare Metal), the average price increase will be limited to between +9% and +11%, despite significantly higher RAM and disk costs.

To offset this gap, a moderate adjustment of +2% to +6% is planned for offers deployed before 2025, depending on the age of the hardware, along with a change in IPv4 pricing. The latter should not have a significant impact on our customers' budgets, since the cost of IP addresses represents a small share compared with the other resources of a cloud project.

Our objective is clear: maintain pricing consistency across all 2021-2028 ranges and prepare for a gradual return to normal in 2029.

Continued investment and evolving offers

Beyond the price adjustments, this period is marked by sustained investment in our offers and in the customer experience.

Despite strong pressure from rising component costs, we continue to evolve our services to deliver more value to our customers.

In concrete terms, this means:

a gradual strengthening of our support services;
more resources included in certain ranges;
a modernisation of our compute and storage infrastructure.

These initiatives reflect our determination not to reduce this phase to a simple pass-through of cost increases, but to keep improving our services, even in a constrained economic environment.

Timeline and terms of application

Our customers have already received emails detailing the precise impact on their services. The new prices will apply from 1 April 2026.

Until that date, services can be renewed at current prices for a period of up to 2 years. In all cases, the new prices will only apply at the end of the current commitment period.

A period of uncertainty and a strategic advantage

We are going through an exceptionally unpredictable phase, in which market visibility rarely extends beyond one to two weeks. We still hope that prices will stabilise durably from 2026, avoiding further unfavourable announcements.

In this tense context, having a global supply chain and two in-house production plants is a major strategic advantage. It allows us to keep receiving components and producing servers while the memory shortage affects a large part of the market.

Our prices

You will find our new prices below:
– Public Cloud: the prices below are shown per hour and with a Linux OS. Our Pricing page lists the prices of virtual machine instances billed monthly (b2, c2, r2) or under a Savings Plan (b3, c3, r3), as well as prices with Windows licences.
– All our VPS, Floating IP and Additional IP prices.
– Bare Metal: the prices shown correspond to a one-month commitment; additional discounts apply for 12- or 24-month prepayment. Option prices apply to new orders only. The renewal of options, which was communicated to our customers by email, will be limited to +10% on disk options and +15% on RAM options.
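
As a rough illustration of the renewal cap on Bare Metal options described above, the sketch below computes the highest price an option could reach at renewal. It assumes the cap applies to the option's previous monthly price, which the source does not spell out. The function name `max_renewal_price` and the sample prices are illustrative only.

```python
from decimal import Decimal

# Renewal caps stated in the article: +10% on disk options, +15% on RAM options.
RENEWAL_CAP = {"disk": Decimal("0.10"), "ram": Decimal("0.15")}


def max_renewal_price(old_price: Decimal, kind: str) -> Decimal:
    """Upper bound on a renewed option price, assuming the cap applies to the old price."""
    return (old_price * (1 + RENEWAL_CAP[kind])).quantize(Decimal("0.01"))


# Illustrative old monthly option prices in EUR, excl. VAT.
print(max_renewal_price(Decimal("52"), "disk"))  # 57.20
print(max_renewal_price(Decimal("80"), "ram"))   # 92.00
```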

For all existing subscriptions renewed before 1 April, you can keep your current price for the entire commitment period chosen, starting from your renewal date.

Please note that the following product categories are not affected by the price changes:
– Public Cloud – Compute: Cloud GPUs and Metal Instances
– Public Cloud – Container: Managed Kubernetes, Managed Registries & Managed Rancher
– Public Cloud – Network: Load Balancer, Gateway. Public and private network traffic remains included.
– Public Cloud – Storage: Object Storage, Block Storage.
– Public Cloud – Analytics: Data Platform
– Public Cloud – AI & Machine Learning: AI Solutions (AI Notebook, AI Training, AI Deploy) and AI Endpoints
– Public Cloud – Quantum: Emulators & QPUs
– Bare Metal – Storage: Veeam Enterprise Plus, HYCU, Back-up Agent, NAS-HA, Cloud Disk Array
– Bare Metal: Kimsufi and SoYouStart ranges
– Private Cloud: VMware offers and storage offers (Veeam Enterprise Plus, HYCU, Back-up Agent)

Price tables

Public Cloud – Virtual Machine Instances

General Purpose

Below are the standard hourly and monthly prices for instances with a Linux OS, without a Savings Plan or any other additional discount.

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
b3-8	0,0465 €	0,0512 €
b3-16	0,093 €	0,1023 €
b3-32	0,186 €	0,2046 €
b3-64	0,372 €	0,4092 €
b3-128	0,7439 €	0,819 €
b3-256	1,4878 €	1,637 €
b3-512	2,9756 €	3,274 €
b3-640	3,7195 €	4,092 €
b2-7	0,0681 €	0,0709 €
b2-15	0,129 €	0,1342 €
b2-30	0,261 €	0,2715 €
b2-60	0,505 €	0,526 €
b2-120	0,993 €	1,033 €
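
To see how these rows relate to the percentage ranges quoted in the article, the relative increase can be computed from the old and new hourly prices. The sketch below does this for a few General Purpose flavors; the selection of rows and the helper name `pct_increase` are illustrative choices, not part of the source.

```python
from decimal import Decimal

# (old, new) hourly prices in EUR excl. VAT, copied from the table above.
PRICES = {
    "b3-8": (Decimal("0.0465"), Decimal("0.0512")),
    "b3-64": (Decimal("0.372"), Decimal("0.4092")),
    "b2-7": (Decimal("0.0681"), Decimal("0.0709")),
    "b2-60": (Decimal("0.505"), Decimal("0.526")),
}


def pct_increase(old: Decimal, new: Decimal) -> Decimal:
    """Return the relative increase from old to new, in percent with one decimal."""
    return ((new - old) / old * 100).quantize(Decimal("0.1"))


for flavor, (old, new) in PRICES.items():
    print(f"{flavor}: +{pct_increase(old, new)} %")
```

With these figures, the b3 flavors come out at about +10% and the b2 flavors at about +4%.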

Compute Optimized

Below are the standard hourly and monthly prices for instances with a Linux OS, without a Savings Plan or any other additional discount.

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
c3-4	0,0415 €	0,0457 €
c3-8	0,083 €	0,0913 €
c3-16	0,1659 €	0,1825 €
c3-32	0,3318 €	0,365 €
c3-64	0,6637 €	0,7301 €
c3-128	1,3274 €	1,461 €
c3-256	2,6547 €	2,921 €
c3-320	3,3184 €	3,651 €
c2-7	0,0978 €	0,1018 €
c2-15	0,19 €	0,1976 €
c2-30	0,383 €	0,3984 €
c2-60	0,749 €	0,779 €
c2-120	1,48 €	1,54 €

Memory Optimized

Below are the standard hourly and monthly prices for instances with a Linux OS, without a Savings Plan or any other additional discount.

Reference	Old public price (Excl. VAT / Hour)	New Public Price (Excl. VAT / Hour)
r3-16	0,0602 €	0,0663 €
r3-32	0,1203 €	0,1324 €
r3-64	0,2407 €	0,2648 €
r3-128	0,4813 €	0,53 €
r3-256	0,9627 €	1,059 €
r3-512	1,9254 €	2,118 €
r3-1024	3,8508 €	4,236 €
r2-15	0,0978 €	0,1018 €
r2-30	0,113 €	0,1176 €
r2-60	0,22 €	0,2288 €
r2-120	0,443 €	0,461 €
r2-240	0,871 €	0,906 €

Public Cloud – Databases

MySQL

Reference	Old public price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour / Node)	New Public Price (Excl. VAT / Hour)
Essential DB1-4	0,068 €	0,0746 €	0,0746 €
Essential DB1-7	0,1346 €	0,1477 €	0,1477 €
Essential DB1-15	0,2705 €	0,2968 €	0,2968 €
Essential DB1-30	0,5436 €	0,5967 €	0,5967 €
Production B3-8	0,2129 €	0,223 €	0,446 €
Production B3-16	0,4258 €	0,4461 €	0,8922 €
Production B3-32	0,8515 €	0,8922 €	1,7844 €
Production B3-64	1,703 €	1,7844 €	3,5688 €
Production B3-128	3,4059 €	3,5688 €	7,1376 €
Production B3-256	6,8118 €	7,1377 €	14,2754 €
Business DB1-4	0,0865 €	0,0949 €	0,1898 €
Business DB1-7	0,173 €	0,1899 €	0,3798 €
Business DB1-15	0,346 €	0,3797 €	0,7594 €
Business DB1-30	0,6933 €	0,761 €	1,522 €
Business DB1-60	1,3878 €	1,5234 €	3,0468 €
Business DB1-120	2,777 €	3,0484 €	6,0968 €
Advanced B3-8	0,2295 €	0,2404 €	0,7212 €
Advanced B3-16	0,4589 €	0,4808 €	1,4424 €
Advanced B3-32	0,9177 €	0,9616 €	2,8848 €
Advanced B3-64	1,8354 €	1,9232 €	5,7696 €
Advanced B3-128	3,6708 €	3,8464 €	11,5392 €
Advanced B3-256	7,3416 €	7,6928 €	23,0784 €
Enterprise DB1-4	0,0879 €	0,0964 €	0,2892 €
Enterprise DB1-7	0,173 €	0,1899 €	0,5697 €
Enterprise DB1-15	0,346 €	0,3797 €	1,1391 €
Enterprise DB1-30	0,6933 €	0,761 €	2,283 €
Enterprise DB1-60	1,3878 €	1,5234 €	4,5702 €
Enterprise DB1-120	2,777 €	3,0484 €	9,1452 €

PostgreSQL

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Essential DB1-4	0,068 €	0,0746 €	0,0746 €
Essential DB1-7	0,1346 €	0,1477 €	0,1477 €
Essential DB1-15	0,2705 €	0,2968 €	0,2968 €
Essential DB1-30	0,5436 €	0,5967 €	0,5967 €
Production B3-8	0,2129 €	0,223 €	0,446 €
Production B3-16	0,4258 €	0,4461 €	0,8922 €
Production B3-32	0,8515 €	0,8922 €	1,7844 €
Production B3-64	1,703 €	1,7844 €	3,5688 €
Production B3-128	3,4059 €	3,5688 €	7,1376 €
Production B3-256	6,8118 €	7,1377 €	14,2754 €
Business DB1-4	0,0865 €	0,0949 €	0,1898 €
Business DB1-7	0,173 €	0,1899 €	0,3798 €
Business DB1-15	0,346 €	0,3797 €	0,7594 €
Business DB1-30	0,6933 €	0,761 €	1,522 €
Business DB1-60	1,3878 €	1,5234 €	3,0468 €
Business DB1-120	2,777 €	3,0484 €	6,0968 €
Advanced B3-8	0,2295 €	0,2404 €	0,7212 €
Advanced B3-16	0,4589 €	0,4808 €	1,4424 €
Advanced B3-32	0,9177 €	0,9616 €	2,8848 €
Advanced B3-64	1,8354 €	1,9232 €	5,7696 €
Advanced B3-128	3,6708 €	3,8464 €	11,5392 €
Advanced B3-256	7,3416 €	7,6928 €	23,0784 €
Enterprise DB1-4	0,0879 €	0,0964 €	0,2892 €
Enterprise DB1-7	0,173 €	0,1899 €	0,5697 €
Enterprise DB1-15	0,346 €	0,3797 €	1,1391 €
Enterprise DB1-30	0,6933 €	0,761 €	2,283 €
Enterprise DB1-60	1,3878 €	1,5234 €	4,5702 €
Enterprise DB1-120	2,777 €	3,0484 €	9,1452 €

Valkey

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Essential DB1-4	0,0591 €	0,0648 €	0,0648 €
Essential DB1-7	0,1195 €	0,1311 €	0,1311 €
Production B3-8	0,1409 €	0,1476 €	0,2952 €
Production B3-16	0,3147 €	0,3297 €	0,6594 €
Production B3-32	0,6295 €	0,6595 €	1,319 €
Production B3-64	1,2588 €	1,319 €	2,638 €
Production B3-128	2,5175 €	2,6379 €	5,2758 €
Production B3-256	5,0349 €	5,2757 €	10,5514 €
Business DB1-4	0,068 €	0,0746 €	0,1492 €
Business DB1-7	0,151 €	0,1658 €	0,3316 €
Business DB1-15	0,2252 €	0,2471 €	0,4942 €
Business DB1-30	0,4448 €	0,4882 €	0,9764 €
Business DB1-60	0,8895 €	0,9764 €	1,9528 €
Business DB1-120	1,7736 €	1,9468 €	3,8936 €

Kafka

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Production B3-8	0,2656 €	0,2782 €	0,8346 €
Production B3-16	0,5311 €	0,5565 €	1,6695 €
Production B3-32	1,0622 €	1,113 €	3,339 €
Business DB1-4	0,1469 €	0,1612 €	0,4836 €
Business DB1-7	0,2911 €	0,3195 €	0,9585 €
Business DB1-15	0,5532 €	0,6073 €	1,8219 €
Business DB1-30	1,0707 €	1,1753 €	3,5259 €
Business DB1-60	2,1428 €	2,3522 €	7,0566 €
Advanced B3-8	0,2656 €	0,2782 €	1,6692 €
Advanced B3-16	0,5311 €	0,5565 €	3,339 €
Advanced B3-32	1,0622 €	1,113 €	6,678 €
Enterprise DB1-7	0,2924 €	0,321 €	1,926 €
Enterprise DB1-15	0,5532 €	0,6073 €	3,6438 €
Enterprise DB1-30	1,0707 €	1,1753 €	7,0518 €
Enterprise DB1-60	2,1428 €	2,3522 €	14,1132 €

Kafka Connect

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Essential DB1-4	0,1044 €	0,1145 €	0,1145 €
Essential DB1-7	0,2101 €	0,2305 €	0,2305 €
Essential DB1-15	0,3913 €	0,4295 €	0,4295 €
Essential DB1-30	0,7084 €	0,7775 €	0,7775 €
Production B3-8	0,1917 €	0,2008 €	0,6024 €
Production B3-16	0,3862 €	0,4046 €	1,2138 €
Production B3-32	0,7027 €	0,7363 €	2,2089 €
Business DB1-7	0,2101 €	0,2305 €	0,6915 €
Business DB1-15	0,4022 €	0,4415 €	1,3245 €
Business DB1-30	0,7084 €	0,7775 €	2,3325 €
Advanced B3-8	0,1908 €	0,1999 €	1,1994 €
Advanced B3-16	0,3862 €	0,4046 €	2,4276 €
Advanced B3-32	0,7027 €	0,7363 €	4,4178 €
Enterprise DB1-7	0,2101 €	0,2305 €	1,383 €
Enterprise DB1-15	0,4022 €	0,4415 €	2,649 €
Enterprise DB1-30	0,7084 €	0,7775 €	4,665 €

Kafka MirrorMaker

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Essential DB1-4	0,1044 €	0,1145 €	0,1145 €
Essential DB1-7	0,2101 €	0,2305 €	0,2305 €
Essential DB1-15	0,3913 €	0,4295 €	0,4295 €
Essential DB1-30	0,7084 €	0,7775 €	0,7775 €
Production B3-8	0,1917 €	0,2008 €	0,6024 €
Production B3-16	0,3862 €	0,4046 €	1,2138 €
Production B3-32	0,7027 €	0,7363 €	2,2089 €
Business DB1-4	0,1057 €	0,116 €	0,348 €
Business DB1-7	0,2101 €	0,2305 €	0,6915 €
Business DB1-15	0,4022 €	0,4415 €	1,3245 €
Business DB1-30	0,7084 €	0,7775 €	2,3325 €
Advanced B3-8	0,1908 €	0,1999 €	1,1994 €
Advanced B3-16	0,3862 €	0,4046 €	2,4276 €
Advanced B3-32	0,7027 €	0,7363 €	4,4178 €
Enterprise DB1-7	0,2101 €	0,2305 €	1,383 €
Enterprise DB1-15	0,4022 €	0,4415 €	2,649 €
Enterprise DB1-30	0,7084 €	0,7775 €	4,665 €

OpenSearch

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Essential DB1-4	0,0742 €	0,0814 €	0,0814 €
Essential DB1-7	0,1497 €	0,1642 €	0,1642 €
Essential DB1-15	0,3007 €	0,33 €	0,33 €
Production B3-8	0,172 €	0,1801 €	0,5403 €
Production B3-16	0,3439 €	0,3603 €	1,0809 €
Production B3-32	0,6877 €	0,7205 €	2,1615 €
Production B3-64	1,3754 €	1,4411 €	4,3233 €
Business DB1-7	0,1607 €	0,1763 €	0,5289 €
Business DB1-15	0,3213 €	0,3526 €	1,0578 €
Business DB1-30	0,648 €	0,7112 €	2,1336 €
Business DB1-60	1,2972 €	1,424 €	4,272 €
Business DB1-120	2,6013 €	2,8555 €	8,5665 €
Advanced B3-8	0,1839 €	0,1927 €	1,1562 €
Advanced B3-16	0,3678 €	0,3854 €	2,3124 €
Advanced B3-32	0,7357 €	0,7708 €	4,6248 €
Advanced B3-64	1,4713 €	1,5416 €	9,2496 €
Enterprise DB1-7	0,162 €	0,1778 €	1,0668 €
Enterprise DB1-15	0,3254 €	0,3571 €	2,1426 €
Enterprise DB1-30	0,6521 €	0,7158 €	4,2948 €
Enterprise DB1-60	1,3014 €	1,4285 €	8,571 €
Enterprise DB1-120	2,6027 €	2,857 €	17,142 €

Managed Dashboard

Reference	Previous public price (excl. VAT / hour / node)	New public price (excl. VAT / hour / node)	New public price (excl. VAT / hour)
Essential DB1-4	0,0591 €	0,0648 €	0,0648 €
Essential DB1-7	0,1195 €	0,1311 €	0,1311 €
Production B3-8	0,1195 €	0,1251 €	0,1251 €

Floating IPs

Reference	Previous public price (excl. VAT / hour)	New public price (excl. VAT / hour)
Floating IPs	0.0025 €	0.0027 €

Bare Metal – Price tables

Dedicated Servers & Options

Below are the standard monthly prices for servers, with no prepayment or commitment-related discount. Option prices apply to new orders only. For renewals of existing options, which are communicated to our customers by email, the increase will be capped at +10% for disk options and +15% for RAM options.
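To make the renewal cap concrete, here is a minimal Python sketch. It assumes (this is an interpretation, not something the announcement states) that a renewed option is billed at the lower of the new list price and the old price increased by the cap. The function name `capped_renewal_price`, the `kind` labels and the rounding to two decimals are hypothetical choices made for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Renewal caps stated in the text: +10% for disk options, +15% for RAM options.
RENEWAL_CAPS = {"disk": Decimal("0.10"), "ram": Decimal("0.15")}


def capped_renewal_price(old_price: Decimal, new_price: Decimal, kind: str) -> Decimal:
    """Return the monthly renewal price of an option under the stated cap.

    Assumption (not confirmed by the source): the renewal price is the lower
    of the new list price and the old price increased by the cap.
    """
    ceiling = old_price * (Decimal("1") + RENEWAL_CAPS[kind])
    return min(new_price, ceiling).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 64GB RAM option on ADVANCE-1 2024: 12 € -> 18 € for new orders.
    print(capped_renewal_price(Decimal("12"), Decimal("18"), "ram"))
    # 4x 960GB NVMe option on ADVANCE-1 2024: 26 € -> 42 € for new orders.
    print(capped_renewal_price(Decimal("26"), Decimal("42"), "disk"))
```

Under these assumptions, an option whose list price rises by less than the cap would simply renew at the new list price.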

	Previous public price (excl. VAT / month)	New public price (excl. VAT / month)
ADVANCE
ADVANCE-1 – 2024 – AMD EPYC 4244P	84.99 €	89.99 €
RAM
32GB DDR5 On-Die ECC 5200MHz	included	included
64GB DDR5 On-Die ECC 5200MHz	12 €	18 €
128GB DDR5 On-Die ECC 3600MHz	36 €	58 €
192GB DDR5 On-Die ECC 3600MHz	60 €	78 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-1 – 2026 – AMD EPYC 4245P	99.99 €	104.99 €
RAM
32GB DDR5 On-Die ECC 5600MHz	included	included
64GB DDR5 On-Die ECC 5600MHz	22 €	26 €
128GB DDR5 On-Die ECC 3600MHz	44 €	58 €
256GB DDR5 On-Die ECC 3600MHz	63 €	130 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-2 – 2024 – AMD EPYC 4344P	119.99 €	124.99 €
RAM
64GB DDR5 On-Die ECC 5200MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	24 €	40 €
192GB DDR5 On-Die ECC 3600MHz	48 €	60 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-2 – 2026 – AMD EPYC 4345P	119.99 €	134.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	52 €	112 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-3 – 2024 – AMD EPYC 4464P	149.99 €	169.99 €
RAM
64GB DDR5 On-Die ECC 5200MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	24 €	40 €
192GB DDR5 On-Die ECC 3600MHz	48 €	60 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-3 – 2026 – AMD EPYC 4464P	159.99 €	199.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	52 €	112 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-4 – 2024 – AMD EPYC 4584PX	199.99 €	219.99 €
RAM
64GB DDR5 On-Die ECC 5200MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	24 €	40 €
192GB DDR5 On-Die ECC 3600MHz	48 €	60 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
ADVANCE-4 – 2026 – AMD EPYC 4585PX	199.99 €	239.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	52 €	112 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
4x SSD NVMe 960GB Datacenter Class Soft RAID	21.60 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	54.40 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	100 €	118 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	178.40 €	197 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	149.20 €	210 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	298.40 €	378 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 15.36TB Datacenter Class Soft RAID	299.99 €	392 €
ADVANCE-5 – 2024 – AMD EPYC 8224P	249.99 €	289.99 €
RAM
96GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	36 €	90 €
384GB DDR5 ECC 4800MHz	108 €	318 €
576GB DDR5 ECC 4800MHz	180 €	552 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
4x SSD NVMe 960GB Enterprise Class Soft RAID	26 €	42 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
4x SSD NVMe 1.92TB Enterprise Class Soft RAID	78 €	98 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
4x SSD NVMe 3.84TB Enterprise Class Soft RAID	182 €	200 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 2x SSD NVMe 7.68TB Datacenter Class Soft RAID	208 €	229 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	390 €	429 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 4x SSD NVMe 7.68TB Datacenter Class Soft RAID	416 €	458 €
2x SSD NVMe 960GB Datacenter Class Soft RAID + 6x SSD NVMe 7.68TB Datacenter Class Soft RAID	624 €	687 €
8x SSD NVMe 7.68TB Enterprise Class Soft RAID	806 €	887 €
ADVANCE-STOR – 2024 – AMD EPYC 4344P	199.99 €	199.99 €
RAM
32GB DDR5 On-Die ECC 5200MHz	included	included
64GB DDR5 On-Die ECC 5200MHz	12 €	14 €
128GB DDR5 On-Die ECC 3600MHz	36 €	42 €
192GB DDR5 On-Die ECC 3600MHz	60 €	69 €
Storage
2x HDD SAS 22TB Enterprise Class Soft RAID	included	included
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
4x HDD SAS 22TB Enterprise Class Soft RAID	64 €	70 €
2x HDD SAS 22TB Enterprise Class Hard RAID	66 €	73 €
6x HDD SAS 22TB Enterprise Class Soft RAID	128 €	141 €
4x HDD SAS 22TB Enterprise Class Hard RAID	130 €	143 €
8x HDD SAS 22TB Enterprise Class Soft RAID	192 €	211 €
6x HDD SAS 22TB Enterprise Class Hard RAID	194 €	213 €
8x HDD SAS 22TB Enterprise Class Hard RAID	258 €	284 €
ADVANCE-STOR – 2026 – AMD EPYC 4345P	199.99 €	229.99 €
RAM
32GB DDR5 On-Die ECC 5600MHz	included	included
64GB DDR5 On-Die ECC 5600MHz	22 €	25 €
128GB DDR5 On-Die ECC 3600MHz	44 €	58 €
256GB DDR5 On-Die ECC 3600MHz	63 €	130 €
Storage
2x HDD SAS 24TB Enterprise Class Soft RAID	included	included
2x SSD NVMe 960GB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x HDD SAS 24TB Enterprise Class Hard RAID	66 €	73 €
4x HDD SAS 24TB Enterprise Class Soft RAID	64 €	94 €
4x HDD SAS 24TB Enterprise Class Hard RAID	130 €	143 €
6x HDD SAS 24TB Enterprise Class Soft RAID	128 €	188 €
6x HDD SAS 24TB Enterprise Class Hard RAID	194 €	248 €
8x HDD SAS 24TB Enterprise Class Soft RAID	192 €	282 €
8x HDD SAS 24TB Enterprise Class Hard RAID	258 €	362 €
RISE
RISE-L – 2025 – AMD RYZEN 9 9950X	134.99 €	149.99 €
RISE-M – 2025 – AMD RYZEN 9 9900X	94.99 €	99.99 €
RISE-S – 2025 – AMD RYZEN 7 9700X	54.99 €	64.99 €
RISE-XL – 2025 – AMD EPYC TURIN 9455	269.99 €	299.99 €
GAME
GAME-1 – 2026 – AMD RYZEN 7 9800X3D	129.99 €	139.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	63 €	112 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
GAME-2 – 2026 – AMD RYZEN 9 9950X3D	169.99 €	179.99 €
RAM
64GB DDR5 On-Die ECC 5600MHz	included	included
128GB DDR5 On-Die ECC 3600MHz	22 €	40 €
256GB DDR5 On-Die ECC 3600MHz	63 €	112 €
Storage
2x SSD NVMe 960GB Enterprise Class Soft RAID	included	included
SCALE-a
SCALE-a1 – 2024 – AMD EPYC GENOA 9124	349.99 €	369.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a1 – 2026 – AMD EPYC 9135	389.99 €	409.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a2 – 2024 – AMD EPYC GENOA 9254	379.99 €	389.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a2 – 2026 – AMD EPYC 9255	429.99 €	439.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a3 – 2024 – AMD EPYC GENOA 9354	419.99 €	449.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a3 – 2026 – AMD EPYC 9355	469.99 €	499.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a4 – 2024 – AMD EPYC GENOA 9454	449.99 €	459.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a4 – 2026 – AMD EPYC 9455	539.99 €	549.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a5 – 2024 – AMD EPYC GENOA 9554	499.99 €	539.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a5 – 2026 – AMD EPYC 9555	599.99 €	639.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a6 – 2024 – AMD EPYC GENOA 9654	579.99 €	629.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-a6 – 2026 – AMD EPYC 9655	699.99 €	729.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	400 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a7 – 2026 – AMD EPYC 9755	809.99 €	829.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	192 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a8 – 2026 – AMD EPYC 9965	869.99 €	899.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
192GB DDR5 ECC 4800MHz	40 €	100 €
256GB DDR5 ECC 4800MHz	80 €	120 €
384GB DDR5 ECC 4800MHz	160 €	280 €
512GB DDR5 ECC 4800MHz	240 €	400 €
768GB DDR5 ECC 4800MHz	192 €	700 €
1TB DDR5 ECC 4800MHz	560 €	1368 €
1.5TB DDR5 ECC 4800MHz	880 €	2152 €
3TB DDR5 ECC 3600MHz	1840 €	4504 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	70 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	118 €
4x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	76 €	140 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	210 €
6x SSD NVMe 1.92TB Enterprise Class PCIe 5.0 Soft RAID	190 €	210 €
4x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	200 €	236 €
6x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	300 €	354 €
4x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	320 €	420 €
6x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	480 €	630 €
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	900 €	1176 €
SCALE-a9 – 2026 – Dual AMD EPYC 9965	1349.99 €	1349.99 €
RAM
128GB DDR5 ECC 5600MHz	included	included
192GB DDR5 ECC 5600MHz	40 €	40 €
256GB DDR5 ECC 5600MHz	80 €	80 €
384GB DDR5 ECC 5600MHz	160 €	160 €
512GB DDR5 ECC 5600MHz	240 €	240 €
768GB DDR5 ECC 5600MHz	included	400 €
1024GB DDR5 ECC 5600MHz	560 €	560 €
1.5TB DDR5 ECC 5600MHz	880 €	880 €
3TB DDR5 ECC 5600MHz	1840 €	1840 €
Storage
2x SSD NVMe 1.92TB Datacenter Class PCIe 5.0 Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	38 €	38 €
2x SSD NVMe 3.84TB Enterprise Class PCIe 5.0 Soft RAID	100 €	100 €
2x SSD NVMe 7.68TB Enterprise Class PCIe 5.0 Soft RAID	160 €	160 €
SCALE-i
SCALE-i1 – 2024 – Intel Xeon Gold 6426Y	349.99 €	369.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-i2 – 2024 – Intel Xeon Gold 6442Y	379.99 €	389.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-i3 – 2024 – Intel Xeon Gold 6438M	409.99 €	449.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	80 €	200 €
512GB DDR5 ECC 4800MHz	240 €	440 €
1TB DDR5 ECC 4800MHz	560 €	1000 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	70 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	118 €
4x SSD NVMe 1.92TB Datacenter Class Soft RAID	104 €	140 €
6x SSD NVMe 1.92TB Enterprise Class Soft RAID	156 €	210 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	229 €
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	208 €	236 €
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	312 €	354 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	416 €	458 €
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	624 €	687 €
SCALE-GPU
SCALE-GPU-1 – 2024 – AMD EPYC GENOA 9354	969.99 €	969.99 €
RAM
192GB DDR5 ECC 4800MHz	included	included
384GB DDR5 ECC 4800MHz	120 €	120 €
768GB DDR5 ECC 4800MHz	240 €	240 €
RAM 1.1TB DDR5 ECC 4800MHz	420 €	420 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	52 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	104 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	208 €
SCALE-GPU-2 – 2024 – AMD EPYC GENOA 9454	999.99 €	999.99 €
RAM
192GB DDR5 ECC 4800MHz	included	included
384GB DDR5 ECC 4800MHz	120 €	120 €
768GB DDR5 ECC 4800MHz	240 €	240 €
RAM 1.1TB DDR5 ECC 4800MHz	420 €	420 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	52 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	104 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	208 €
SCALE-GPU-3 – 2024 – AMD EPYC GENOA 9554	1029.99 €	1029.99 €
RAM
192GB DDR5 ECC 4800MHz	included	included
384GB DDR5 ECC 4800MHz	120 €	120 €
768GB DDR5 ECC 4800MHz	240 €	240 €
RAM 1.1TB DDR5 ECC 4800MHz	420 €	420 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
2x SSD NVMe 1.92TB Datacenter Class Soft RAID	52 €	52 €
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	104 €	104 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	208 €	208 €
HGR
HGR-AI-2 – 2024 – DUAL AMD EPYC 9354	2969.99 €	2969.99 €
RAM
384GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	64 €	74 €
768GB DDR5 ECC 4800MHz	400 €	360 €
RAM 2304GB DDR5 ECC 4800MHz	960 €	2208 €
Storage
2x SSD NVMe 3.84TB Datacenter Class Soft RAID	included	included
4x SSD NVMe 3.84TB Datacenter Class Soft RAID	88 €	118 €
2x SSD NVMe 7.68TB Enterprise Class Soft RAID	150 €	165 €
4x SSD NVMe 7.68TB Enterprise Class Soft RAID	300 €	330 €
2x SSD NVMe 15.36TB Enterprise Class Soft RAID	308 €	339 €
4x SSD NVMe 15.36TB Enterprise Class Soft RAID	616 €	680 €
HGR-HCI-a1 – 2024 – DUAL AMD EPYC 9254	999.99 €	1119.99 €
RAM
256GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	128 €	240 €
1TB DDR5 ECC 4800MHz	384 €	800 €
1.5TB DDR5 ECC 4800MHz	512 €	1472 €
RAM 2304GB DDR5 ECC 4800MHz	1024 €	2408 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	included	included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-a2 – 2024 – DUAL AMD EPYC 9354	1139.99 €	1274.99 €
RAM
384GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	64 €	74 €
768GB DDR5 ECC 4800MHz	400 €	360 €
1TB DDR5 ECC 4800MHz	320 €	600 €
1.5TB DDR5 ECC 4800MHz	384 €	1272 €
RAM 2304GB DDR5 ECC 4800MHz	960 €	2208 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	included	included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i1 – 2024 – DUAL INTEL XEON GOLD 5515+	849.99 €	949.99 €
RAM
256GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	128 €	240 €
1TB DDR5 ECC 4800MHz	384 €	800 €
1.5TB DDR5 ECC 4800MHz	512 €	1472 €
Storage
2x SSD NVMe 960GB Datacenter Class Soft RAID	included	included
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	included	included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i2 – 2024 – DUAL INTEL XEON GOLD 6526Y	929.99 €	1039.99 €
RAM
256GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	128 €	240 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	included	included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i3 – 2024 – DUAL INTEL XEON GOLD 6542Y	999.99 €	1119.99 €
RAM
256GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	128 €	240 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	included	included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-HCI-i4 – 2024 – DUAL INTEL XEON GOLD 6554S	1079.99 €	1209.99 €
RAM
256GB DDR5 ECC 4800MHz	included	included
512GB DDR5 ECC 4800MHz	128 €	240 €
Storage
6x SSD NVMe 3.84TB Enterprise Class Soft RAID	included	included
12x SSD NVMe 3.84TB Enterprise Class Soft RAID	264 €	354 €
18x SSD NVMe 3.84TB Enterprise Class Soft RAID	528 €	708 €
24x SSD NVMe 3.84TB Enterprise Class Soft RAID	792 €	1062 €
HGR-SAP-1 – 2024 – DUAL INTEL XEON GOLD 6226R	1011.99 €	1254.99 €
RAM
192GB DDR4 ECC 2933MHz	included	included
384GB DDR4 ECC 2933MHz	96 €	216 €
Storage
6x SSD SAS 3.84TB Enterprise Class Hard RAID	included	included
2x SSD SATA 480GB	included	included
12x SSD SAS 3.84TB Enterprise Class Hard RAID	264 €	354 €
24x SSD SAS 3.84TB Enterprise Class Hard RAID	792 €	1062 €
HGR-SAP-2 – 2024 – DUAL INTEL XEON GOLD 6242R	1121.99 €	1391.99 €
RAM
384GB DDR4 ECC 2933MHz	included	included
RAM 768GB DDR4 ECC 2933MHz	192 €	312 €
Storage
6x SSD SAS 3.84TB Enterprise Class Hard RAID	included	included
2x SSD SATA 480GB	included	included
12x SSD SAS 3.84TB Enterprise Class Hard RAID	264 €	354 €
24x SSD SAS 3.84TB Enterprise Class Hard RAID	792 €	1062 €
HGR-SAP-3 – 2024 – DUAL INTEL XEON GOLD 6248R	1231.99 €	1527.99 €
RAM
RAM 768GB DDR4 ECC 2933MHz	192 €	included
RAM 1.5TB DDR4 ECC 2933MHz	384 €	1032 €
Storage
6x SSD SAS 3.84TB Enterprise Class Hard RAID	included	included
2x SSD SATA 480GB	included	included
12x SSD SAS 3.84TB Enterprise Class Hard RAID	264 €	354 €
24x SSD SAS 3.84TB Enterprise Class Hard RAID	792 €	1062 €
HGR-SDS-1 – 2024 – DUAL INTEL XEON GOLD 5515+	999.99 €	1119.99 €
Storage
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	included	included
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	475 €	546 €
12x SSD NVMe 7.68TB Enterprise Class Soft RAID	450 €	630 €
18x SSD NVMe 7.68TB Enterprise Class Soft RAID	900 €	1260 €
12x SSD NVMe 15.36TB Enterprise Class Soft RAID	1399 €	1722 €
24x SSD NVMe 7.68TB Enterprise Class Soft RAID	1350 €	1890 €
18x SSD NVMe 15.36TB Enterprise Class Soft RAID	2323 €	2898 €
24x SSD NVMe 15.36TB Enterprise Class Soft RAID	3247 €	4074 €
HGR-SDS-2 – 2024 – DUAL INTEL XEON GOLD 6542Y	1149.99 €	1289.99 €
Storage
6x SSD NVMe 7.68TB Enterprise Class Soft RAID	included	included
6x SSD NVMe 15.36TB Enterprise Class Soft RAID	475 €	546 €
12x SSD NVMe 7.68TB Enterprise Class Soft RAID	450 €	630 €
18x SSD NVMe 7.68TB Enterprise Class Soft RAID	900 €	1260 €
12x SSD NVMe 15.36TB Enterprise Class Soft RAID	1399 €	1722 €
24x SSD NVMe 7.68TB Enterprise Class Soft RAID	1350 €	1890 €
18x SSD NVMe 15.36TB Enterprise Class Soft RAID	2323 €	2898 €
24x SSD NVMe 15.36TB Enterprise Class Soft RAID	3247 €	4074 €
HGR-STOR-1 – 2024 – INTEL XEON GOLD 6554S	1199.99 €	1399.99 €
RAM
128GB DDR5 ECC 4800MHz	included	included
256GB DDR5 ECC 4800MHz	64 €	200 €
512GB DDR5 ECC 4800MHz	192 €	440 €
768GB DDR5 ECC 4800MHz	400 €	760 €
768GB DDR5 ECC 4800MHz	320 €	760 €
Storage
24x HDD SAS 22TB Enterprise Class Soft RAID	included	included
24x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Soft RAID	88 €	118 €
24x HDD SAS 22TB Enterprise Class Hard RAID	66 €	120 €
24x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Soft RAID	150 €	210 €
24x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Hard RAID	154 €	238 €
24x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Hard RAID	216 €	330 €
24x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Soft RAID	308 €	392 €
24x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Hard RAID	374 €	512 €
36x HDD SAS 22TB Enterprise Class Soft RAID	384 €	516 €
36x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Soft RAID	472 €	634 €
36x HDD SAS 22TB Enterprise Class Hard RAID	450 €	696 €
36x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Soft RAID	534 €	726 €
36x HDD SAS 22TB + 2x SSD NVMe 3.84TB High perf. cache Enterprise Class Hard RAID	538 €	814 €
36x HDD SAS 22TB + 2x SSD NVMe 7.68TB High perf. cache Enterprise Class Hard RAID	600 €	906 €
36x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Soft RAID	692 €	908 €
36x HDD SAS 22TB + 2x SSD NVMe 15.36TB High perf. cache Enterprise Class Hard RAID	758 €	1088 €

IPs

Reference	Previous public price (excl. VAT / month)	New public price (excl. VAT / month)
Additional IPv4	1.50 €	2.00 €

VPS

Family	Reference	Commit	Previous public price (excl. VAT / month)	New public price (excl. VAT / month)
VPS 2026	VPS-1	Monthly	4.49 €	6.49 €
VPS 2026	VPS-2	Monthly	6.99 €	9.99 €
VPS 2026	VPS-3	Monthly	13.99 €	19.99 €
VPS 2026	VPS-4	Monthly	24.99 €	36.99 €
VPS 2026	VPS-5	Monthly	36.99 €	54.99 €
VPS 2026	VPS-6	Monthly	48.99 €	72.99 €
VPS 2026	VPSLZ-1	Monthly	5.49 €	7.49 €
VPS 2026	VPS-1	prepayment 6 months	25.56 €	36.99 €
VPS 2026	VPS-2	prepayment 6 months	39.84 €	56.94 €
VPS 2026	VPS-3	prepayment 6 months	79.74 €	113.94 €
VPS 2026	VPS-4	prepayment 6 months	142.44 €	210.84 €
VPS 2026	VPS-5	prepayment 6 months	210.84 €	313.44 €
VPS 2026	VPS-6	prepayment 6 months	279.24 €	416.04 €
VPS 2026	VPSLZ-1	prepayment 6 months	31.26 €	42.69 €
VPS 2026	VPS-1	prepayment 12 months	45.72 €	66.19 €
VPS 2026	VPS-2	prepayment 12 months	71.28 €	101.89 €
VPS 2026	VPS-3	prepayment 12 months	142.68 €	203.89 €
VPS 2026	VPS-4	prepayment 12 months	254.88 €	377.29 €
VPS 2026	VPS-5	prepayment 12 months	377.28 €	560.89 €
VPS 2026	VPS-6	prepayment 12 months	499.68 €	744.49 €
VPS 2026	VPSLZ-1	prepayment 12 months	55.92 €	76.39 €

Reference Architecture: Custom metric autoscaling for LLM inference with vLLM on OVHcloud AI Deploy and observability using MKS

Eléa Petton — Tue, 10 Feb 2026 08:51:11 +0000

Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability.

vLLM metrics monitoring and observability based on OVHcloud infrastructure

This reference architecture describes a comprehensive solution for deploying, autoscaling and monitoring vLLM-based LLM workloads on OVHcloud infrastructure. It combines AI Deploy, used for model serving with custom metric autoscaling, and Managed Kubernetes Service (MKS), which hosts the monitoring and observability stack.

By leveraging application-level Prometheus metrics exposed by vLLM, AI Deploy can automatically scale inference replicas based on real workload demand, ensuring high availability, consistent performance under load and efficient GPU utilisation. This autoscaling mechanism allows the platform to react dynamically to traffic spikes while maintaining predictable latency for end users.

On top of this scalable inference layer, the monitoring architecture provides observability through Prometheus, Grafana and Alertmanager. It enables real-time performance monitoring, capacity planning, and operational insights, while ensuring full data sovereignty for organisations running Large Language Models (LLMs) in production environments.

What are the key benefits?

Cost-effective: Leverage managed services to minimise operational overhead
Real-time observability: Track Time-to-First-Token (TTFT), throughput, and resource utilisation
Sovereign infrastructure: All metrics and data remain within European datacentres
Production-ready: Persistent storage, high availability, and automated monitoring

Context

AI Deploy

OVHcloud AI Deploy is a Container as a Service (CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications/APIs based on Machine Learning (ML), Deep Learning (DL) or Large Language Models (LLMs).

Key points to keep in mind:

Easy to use: Bring your own custom Docker image and deploy it with a single command line or in a few clicks
High-performance computing: A complete range of GPUs available (H100, A100, V100S, L40S and L4)
Scalability and flexibility: Supports automatic scaling, allowing your model to effectively handle fluctuating workloads
Cost-efficient: Billing per minute, no surcharges

Managed Kubernetes Service

What should you keep in mind?

Cost-efficient: Only pay for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane
Fully managed Kubernetes: Certified upstream Kubernetes with automated control plane management, upgrades and high availability
Production-ready by design: Built-in integrations with OVHcloud Load Balancers, networking and persistent storage
Scalability and flexibility: Easily scale workloads and node pools to match application demand
Open and portable: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in

In the following guide, all services are deployed within the OVHcloud Public Cloud.

Overview of the architecture

This reference architecture describes a complete, secure and scalable solution to:

Deploy an LLM with vLLM and AI Deploy, benefiting from automatic scaling based on custom metrics to ensure high service availability – vLLM exposes /metrics via its public HTTPS endpoint on AI Deploy
Collect, store and visualise these vLLM metrics using Prometheus and Grafana on MKS

vLLM metrics monitoring and observability architecture overview

Here you will find the main components of the architecture. The solution comprises three main layers:

Model serving layer with AI Deploy
- vLLM containers running on top of GPUs for LLM inference
- vLLM inference server exposing Prometheus metrics
- Automatic scaling based on custom metrics to ensure high availability
- HTTPS endpoints with Bearer token authentication
Monitoring and observability infrastructure using Kubernetes
- Prometheus for metrics collection and storage
- Grafana for visualisation and dashboards
- Persistent volume storage for long-term retention
Network layer
- Secure HTTPS communication between components
- OVHcloud LoadBalancer for external access

To go further, some prerequisites must be checked!

Prerequisites

Before you begin, ensure you have:

An OVHcloud Public Cloud account
An OpenStack user with the Administrator role
ovhai CLI available – install the ovhai CLI
Hugging Face access – create a Hugging Face account and generate an access token
kubectl installed and helm installed (at least version 3.x)

🚀 Now that you have all the ingredients for our recipe, it’s time to deploy Ministral 3 14B using AI Deploy and the vLLM Docker container!

Architecture guide: From autoscaling to observability for LLMs served by vLLM

Let’s set up and deploy this architecture!

Overview of the deployment workflow

✅ Note

In this example, mistralai/Ministral-3-14B-Instruct-2512 is used. Choose the open-source model of your choice and follow the same steps, adapting the model slug (from Hugging Face), the versions and the GPU(s) flavour.

Remember that all of the following steps can be automated using OVHcloud APIs!

Step 1 – Manage access tokens

Before introducing the monitoring stack, this architecture starts with the deployment of the Ministral 3 14B on OVHcloud AI Deploy, configured to autoscale based on custom Prometheus metrics exposed by vLLM itself.

Export your Hugging Face token.

export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx

Create a Bearer token to access your AI Deploy app once it’s been deployed.

ovhai token create --role operator ai_deploy_token=my_operator_token

This returns the following output:

Id: 47292486-fb98-4a5b-8451-600895597a2b
Created At: 20-01-26 11:53:05
Updated At: 20-01-26 11:53:05
Spec:
  Name: ai_deploy_token=my_operator_token
  Role: AiTrainingOperator
  Label Selector:
Status:
  Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Version: 1

You can now store and export your access token:

export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Step 2 – LLM deployment using AI Deploy

1. Define the targeted vLLM metric for autoscaling

Before proceeding with the deployment of the Ministral 3 14B endpoint, you have to choose the metric you want to use as the trigger for scaling.

Instead of relying solely on CPU/RAM utilisation, AI Deploy allows autoscaling decisions to be driven by application-level signals.

To do this, you can consult the metrics exposed by vLLM.

In this example, you can use a basic metric such as vllm:num_requests_running to scale the number of replicas based on real inference load.

This enables:

Faster reaction to traffic spikes
Better GPU utilisation
Reduced inference latency under load
Cost-efficient scaling

Finally, the configuration chosen for scaling this application is as follows:

Parameter	Value	Description
Metric source	`/metrics`	vLLM Prometheus endpoint
Metric name	`vllm:num_requests_running`	Number of in-flight requests
Aggregation	`AVERAGE`	Mean across replicas
Target value	`50`	Desired load per replica
Min replicas	`1`	Baseline capacity
Max replicas	`3`	Burst capacity

✅ Note

You can choose the metric that best suits your use case. You can also apply a patch to your AI Deploy deployment at any time to change the target metric for scaling.

When the average number of running requests exceeds 50, AI Deploy automatically provisions additional GPU-backed replicas.
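To make the threshold concrete, here is a minimal sketch of how a target-value autoscaler could turn the averaged metric into a replica count. It assumes the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × average / target)), clamped to the min/max values from the table above. AI Deploy's exact algorithm is not described in this article, and `desired_replicas` is a hypothetical helper written for illustration only.

```python
import math


def desired_replicas(per_replica_values, target=50.0, min_replicas=1, max_replicas=3):
    """Hypothetical helper: HPA-style proportional scaling on an averaged metric.

    per_replica_values holds one vllm:num_requests_running reading per replica.
    """
    current = len(per_replica_values)
    if current == 0:
        return min_replicas
    average = sum(per_replica_values) / current
    # Proportional rule (assumption): scale replicas by the ratio average/target.
    wanted = math.ceil(current * average / target)
    # Clamp to the configured bounds (min replicas = 1, max replicas = 3).
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas([80]))             # 2: one replica above the target of 50
print(desired_replicas([30, 20]))         # 1: average of 25 is well below the target
print(desired_replicas([120, 140, 130]))  # 3: more would be needed, capped at max
```

With a target of 50, a single replica handling 80 in-flight requests would lead to a second replica under this rule, while the cap of 3 bounds GPU cost during extreme bursts.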

2. Deploy Ministral 3 14B using AI Deploy

Now you can deploy the LLM using the ovhai CLI.

Key elements necessary for proper functioning:

GPU-based inference: 1 x H100
vLLM OpenAI-compatible Docker image: vllm/vllm-openai:v0.13.0
Custom autoscaling rules based on Prometheus metrics: vllm:num_requests_running

Below is the reference command used to deploy the mistralai/Ministral-3-14B-Instruct-2512:

ovhai app run \
  --name vllm-ministral-14B-autoscaling-custom-metric \
  --default-http-port 8000 \
  --label ai_deploy_token=my_operator_token \
  --gpu 1 \
  --flavor h100-1-gpu \
  -e OUTLINES_CACHE_DIR=/tmp/.outlines \
  -e HF_TOKEN=$MY_HF_TOKEN \
  -e HF_HOME=/hub \
  -e HF_DATASETS_TRUST_REMOTE_CODE=1 \
  -e HF_HUB_ENABLE_HF_TRANSFER=0 \
  -v standalone:/hub:rw \
  -v standalone:/workspace:rw \
  --liveness-probe-path /health \
  --liveness-probe-port 8000 \
  --liveness-initial-delay-seconds 300 \
  --probe-path /v1/models \
  --probe-port 8000 \
  --initial-delay-seconds 300 \
  --auto-min-replicas 1 \
  --auto-max-replicas 3 \
  --auto-custom-api-url "http://:8000/metrics" \
  --auto-custom-metric-format PROMETHEUS \
  --auto-custom-value-location vllm:num_requests_running \
  --auto-custom-target-value 50 \
  --auto-custom-metric-aggregation-type AVERAGE \
  vllm/vllm-openai:v0.13.0 \
  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \
    --model mistralai/Ministral-3-14B-Instruct-2512 \
    --tokenizer_mode mistral \
    --load_format mistral \
    --config_format mistral \
    --enable-auto-tool-choice \
    --tool-call-parser mistral \
    --enable-prefix-caching"

How to understand the different parameters of this command?

a. Start your AI Deploy app

Launch a new app using ovhai CLI and name it.

ovhai app run --name vllm-ministral-14B-autoscaling-custom-metric

b. Define access

Define the HTTP API port and restrict access to your token.

--default-http-port 8000
--label ai_deploy_token=my_operator_token

c. Configure GPU resources

Specify the hardware flavour (h100-1-gpu, which refers to an NVIDIA H100 GPU) and the number of GPUs (1).

--gpu 1 --flavor h100-1-gpu

⚠️WARNING! For this model, one H100 is sufficient, but if you want to deploy another model, you will need to check which GPU you need. Note that you can also access L40S and A100 GPUs for your LLM deployment.

d. Set up environment variables

Configure caching for the Outlines library (used for structured text generation):

-e OUTLINES_CACHE_DIR=/tmp/.outlines

Pass the Hugging Face token ($MY_HF_TOKEN) for model authentication and download:

-e HF_TOKEN=$MY_HF_TOKEN

Set the Hugging Face cache directory to /hub (where models will be stored):

-e HF_HOME=/hub

Allow execution of custom remote code from Hugging Face datasets (required for some model behaviours):

-e HF_DATASETS_TRUST_REMOTE_CODE=1

Disable Hugging Face Hub transfer acceleration (to use standard model downloading):

-e HF_HUB_ENABLE_HF_TRANSFER=0

e. Mount persistent volumes

Mount two persistent storage volumes:

/hub → Stores Hugging Face model files
/workspace → Main working directory

The rw flag means read-write access.

-v standalone:/hub:rw -v standalone:/workspace:rw

f. Health checks and readiness

Configure liveness and readiness probes:

/health verifies the container is alive
/v1/models confirms the model is loaded and ready to serve requests

The initial delays are deliberately long (300 seconds): they cover the startup time of vLLM and the loading of the model onto the GPU. You can reduce them if your model loads faster.

--liveness-probe-path /health
--liveness-probe-port 8000
--liveness-initial-delay-seconds 300
--probe-path /v1/models
--probe-port 8000
--initial-delay-seconds 300

g. Autoscaling configuration (custom metrics)

First set the minimum and maximum number of replicas.

--auto-min-replicas 1 --auto-max-replicas 3

This guarantees basic availability (one replica always up) while allowing for peak capacity.

Then enable autoscaling based on application-level metrics exposed by vLLM.

--auto-custom-api-url "http://:8000/metrics"
--auto-custom-metric-format PROMETHEUS
--auto-custom-value-location vllm:num_requests_running
--auto-custom-target-value 50
--auto-custom-metric-aggregation-type AVERAGE

AI Deploy:

Scrapes the local /metrics endpoint
Parses Prometheus-formatted metrics
Extracts the vllm:num_requests_running gauge
Computes the average value across replicas

Scaling behaviour:

When the average number of in-flight requests exceeds 50, AI Deploy adds replicas
When load decreases, replicas are scaled down

This approach ensures high availability and predictable latency under fluctuating traffic.
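The scrape, parse, extract and average steps listed above can be sketched in a few lines. This is an illustration of the idea, not AI Deploy's implementation: `gauge_value` and `average_across_replicas` are hypothetical helpers, the sample payloads are made up, and the parser is deliberately simplified (it does not handle label values containing `}`).

```python
def gauge_value(metrics_text, metric):
    """Sum all samples of `metric` found in one Prometheus text payload."""
    total, found = 0.0, False
    for raw in metrics_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comments.
        if not line or line.startswith("#"):
            continue
        # A sample looks like: name{label="x"} value [timestamp]
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name != metric:
            continue
        after_labels = line.rsplit("}", 1)[1] if "{" in line else line[len(name):]
        total += float(after_labels.split()[0])
        found = True
    return total if found else None


def average_across_replicas(scrapes, metric="vllm:num_requests_running"):
    """Average the gauge over every replica that exposes it."""
    values = [v for v in (gauge_value(s, metric) for s in scrapes) if v is not None]
    return sum(values) / len(values) if values else None


# Made-up payloads standing in for the /metrics output of two replicas.
REPLICA_A = """# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 60.0
vllm:num_requests_waiting{model_name="demo"} 3.0
"""
REPLICA_B = 'vllm:num_requests_running{model_name="demo"} 20.0\n'

print(average_across_replicas([REPLICA_A, REPLICA_B]))  # 40.0
```

In this example the average (40) stays below the target of 50, so no additional replica would be requested.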

h. Choose the target Docker image and the startup command

Use the official vLLM OpenAI-compatible Docker image.

vllm/vllm-openai:v0.13.0

Finally, run the model inside the container using a Python command to launch the vLLM API server:

python3 -m vllm.entrypoints.openai.api_server → Starts the OpenAI-compatible vLLM API server
--model mistralai/Ministral-3-14B-Instruct-2512 → Loads the Ministral 3 14B model from Hugging Face
--tokenizer_mode mistral → Uses the Mistral tokenizer
--load_format mistral → Uses Mistral’s model loading format
--config_format mistral → Ensures the model configuration follows Mistral’s standard
--enable-auto-tool-choice → Lets the model call tools automatically when needed (function/tool calling)
--tool-call-parser mistral → Parses tool calls using the Mistral format
--enable-prefix-caching → Prefix caching for improved throughput and reduced latency

You can now launch this command using ovhai CLI.

3. Check AI Deploy app status

You can now check if your AI Deploy app is alive:

ovhai app get

Is your app in RUNNING status? Perfect! You can check in the logs that the server is started:

ovhai app logs

⚠️WARNING! This step may take a little time as the LLM must be loaded.

4. Test that the deployment is functional

First, send a prompt to the LLM. Run the following query with the question of your choice:

curl https://.app.gra.ai.cloud.ovh.net/v1/chat/completions \
  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Ministral-3-14B-Instruct-2512",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}
    ],
    "stream": false
  }'

You can also verify access to vLLM metrics.

curl -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \
  https://.app.gra.ai.cloud.ovh.net/metrics

If both tests show that the model deployment is functional and you receive 200 HTTP responses, you are ready to move on to the next step!
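If you prefer to run the same check from a script, the chat completion call can be reproduced with the Python standard library. This is a minimal sketch: `build_chat_request` and `ask` are hypothetical helpers, and the base URL you pass in must be your own AI Deploy app URL (it is not filled in here).

```python
import json
import urllib.request


def build_chat_request(base_url, token, question,
                       model="mistralai/Ministral-3-14B-Instruct-2512"):
    """Build the same POST request as the curl example above."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(base_url, token, question, timeout=60):
    """Send the request and return the assistant's answer as text."""
    request = build_chat_request(base_url, token, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Usage (requires a running app; the URL below is your own app URL):
#   import os
#   print(ask(YOUR_APP_URL, os.environ["MY_OVHAI_ACCESS_TOKEN"], "Hello!"))
```

Separating request construction from the network call keeps the payload easy to inspect before anything is sent.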

The next step is to set up the observability and monitoring stack. Note that the autoscaling mechanism is fully independent of the Prometheus instance used for observability:

AI Deploy queries the local /metrics endpoint internally
Prometheus scrapes the same metrics endpoint externally for monitoring, dashboards and potentially alerting

This ensures:

A single source of truth for metrics
No duplication of exporters
Consistent signals for scaling and observability

Step 3 – Create an MKS cluster

From the OVHcloud Control Panel, create a Kubernetes cluster using MKS.

Consider using the following configuration for the current use case:

Location: GRA (Gravelines) – you can select the same region as for AI Deploy
Network: Public
Node pool:
- Flavour: b2-15 (or something similar)
- Number of nodes: 3
- Autoscaling: OFF
Name your node pool: monitoring

You should see your cluster (e.g. prometheus-vllm-metrics-ai-deploy) in the list, along with the following information:

If the status is green with the OK label, you can proceed to the next step.

Step 4 – Configure Kubernetes access

Download your kubeconfig file from the OVHcloud Control Panel and configure kubectl:

# configure kubectl with your MKS cluster
export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml

# verify cluster connectivity
kubectl cluster-info
kubectl get nodes

Now you can create the values-prometheus.yaml file:

# general configuration
nameOverride: "monitoring"
fullnameOverride: "monitoring"

# Prometheus configuration
prometheus:
  prometheusSpec:
    # data retention (15d)
    retention: 15d
    
    # scrape interval (15s)
    scrapeInterval: 15s
    
    # persistent storage (required for production deployment)
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: csi-cinder-high-speed  # OVHcloud storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi  # (can be modified according to your needs)
    
    # scrape vLLM metrics from your AI Deploy instance (Ministral 3 14B)
    additionalScrapeConfigs:
      - job_name: 'vllm-ministral'
        scheme: https
        metrics_path: '/metrics'
        scrape_interval: 15s
        scrape_timeout: 10s
        
        # authentication using the AI Deploy Bearer token stored in a Kubernetes Secret
        bearer_token_file: /etc/prometheus/secrets/vllm-auth-token/token
        static_configs:
          - targets:
              - '.app.gra.ai.cloud.ovh.net'  # /!\ PREFIX THIS HOSTNAME WITH YOUR OWN APP ID /!\
            labels:
              service: 'vllm'
              model: 'ministral'
              environment: 'production'
        
        # TLS configuration
        tls_config:
          insecure_skip_verify: false
    
    # kube-prometheus-stack mounts the secret under /etc/prometheus/secrets/ and makes it accessible to Prometheus
    secrets:
      - vllm-auth-token

# Grafana configuration (visualization layer)
grafana:
  enabled: true
  
  # disable automatic datasource provisioning
  sidecar:
    datasources:
      enabled: false
  
  # persistent dashboards
  persistence:
    enabled: true
    storageClassName: csi-cinder-high-speed
    size: 10Gi
  
  # /!\ DEFINE ADMIN PASSWORD - REPLACE "test" BY YOURS /!\
  adminPassword: "test"
  
  # access via OVHcloud LoadBalancer (public IP and managed LB)
  service:
    type: LoadBalancer
    port: 80
    annotations:
      # optional: restrict access to specific IPs
      # service.beta.kubernetes.io/ovh-loadbalancer-allowed-sources: "1.2.3.4/32"
  
# alertmanager (optional but recommended for production)
alertmanager:
  enabled: true
  
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: csi-cinder-high-speed
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

# cluster observability components
nodeExporter:
  enabled: true
  
kubeStateMetrics:
  enabled: true

✅ Note

On OVHcloud MKS, persistent storage is handled automatically through the Cinder CSI driver. When a PersistentVolumeClaim (PVC) references a supported storageClassName such as csi-cinder-high-speed, OVHcloud dynamically provisions the underlying Block Storage volume and attaches it to the node running the pod. This enables stateful components like Prometheus, Alertmanager and Grafana to persist data reliably without any manual volume management, making the architecture fully cloud-native and operationally simple.

Then create the monitoring namespace:

# create namespace
kubectl create namespace monitoring

# verify creation
kubectl get namespaces | grep monitoring

Finally, configure the Bearer token secret to access vLLM metrics.

# create bearer token secret
kubectl create secret generic vllm-auth-token \
  --from-literal=token="$MY_OVHAI_ACCESS_TOKEN" \
  -n monitoring

# verify secret creation
kubectl get secret vllm-auth-token -n monitoring

# test token (optional)
kubectl get secret vllm-auth-token -n monitoring \
  -o jsonpath='{.data.token}' | base64 -d

Right, if everything is working, let’s move on to deployment.

Step 5 – Deploy Prometheus stack

Add the Prometheus Helm repository and install the monitoring stack. The deployment creates:

Prometheus StatefulSet with persistent storage
Grafana deployment with LoadBalancer access
Alertmanager for future alert configuration (optional)
Supporting components (node exporters, kube-state-metrics)

# add Helm repository
helm repo add prometheus-community \
  https://prometheus-community.github.io/helm-charts
helm repo update

# install monitoring stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values-prometheus.yaml \
  --wait

Then you can retrieve the LoadBalancer IP address to access Grafana:

kubectl get svc -n monitoring monitoring-grafana
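When scripting this step, the address can be read from the JSON form of the same command (`kubectl get svc -n monitoring monitoring-grafana -o json`). The sketch below assumes the standard Kubernetes Service status layout (`status.loadBalancer.ingress`); `external_address` is a hypothetical helper and the sample document is made up, using a documentation-only IP address.

```python
import json


def external_address(service_json):
    """Return the LoadBalancer IP or hostname, or None while it is still pending."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None


# Made-up sample mimicking the output of `kubectl get svc ... -o json`.
SAMPLE = json.dumps({
    "kind": "Service",
    "metadata": {"name": "monitoring-grafana", "namespace": "monitoring"},
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
})

print(external_address(SAMPLE))  # 203.0.113.10
```

A `None` result simply means the load balancer has not been assigned an address yet, so the command can be retried a little later.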

Finally, open your browser at http://<EXTERNAL-IP> (the EXTERNAL-IP value returned by the previous command) and log in with:

Username: admin
Password: as configured in your values-prometheus.yaml file

Step 6 – Create Grafana dashboards

In this step, you will access the Grafana interface, add Prometheus as a new data source, then create a complete dashboard with different vLLM metrics.

1. Add a new data source in Grafana

First of all, create a new Prometheus connection inside Grafana:

Navigate to Connections → Data sources → Add data source
Select Prometheus
Configure URL: http://monitoring-prometheus:9090
Click Save & test

Now that your Prometheus has been configured as a new data source, you can create your Grafana dashboard.

2. Create your monitoring dashboard

To begin with, you can use the following pre-configured Grafana dashboard by downloading this JSON file locally:

In the left-hand menu, select Dashboards:

Navigate to Dashboards → Import
Upload the vLLM-metrics-grafana-monitoring.json file
Select Prometheus as the data source
Click Import

The dashboard provides real-time visibility into Ministral 3 14B deployed with the vLLM container on OVHcloud AI Deploy.

You can now track:

Performance metrics: TTFT, inter-token latency, end-to-end latency
Throughput indicators: Requests per second, token generation rates
Resource utilisation: KV cache usage, active/waiting requests
Capacity indicators: Queue depth, preemption rates

Here are the key metrics tracked and displayed in the Grafana dashboard:

Metric Category	Prometheus Metric	Description	Use case
Latency	`vllm:time_to_first_token_seconds`	Time until first token generation	User experience monitoring
Latency	`vllm:inter_token_latency_seconds`	Time between tokens	Throughput optimisation
Latency	`vllm:e2e_request_latency_seconds`	End-to-end request time	SLA monitoring
Throughput	`vllm:request_success_total`	Successful requests counter	Capacity planning
Resource	`vllm:kv_cache_usage_perc`	KV cache memory usage	Memory management
Queue	`vllm:num_requests_running`	Active requests	Load monitoring
Queue	`vllm:num_requests_waiting`	Queued requests	Overload detection
Capacity	`vllm:num_preemptions_total`	Request preemptions	Peak load indicator
Tokens	`vllm:prompt_tokens_total`	Input tokens processed	Usage analytics
Tokens	`vllm:generation_tokens_total`	Output tokens generated	Cost tracking
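To make the table more concrete, here is a minimal sketch of how these values can be read from the text that a /metrics endpoint returns. The SAMPLE payload, its label and its numbers are invented for illustration, and the parser assumes simple lines without trailing timestamps; it is not how Prometheus itself ingests the data.

```python
# Minimal sketch: parse a Prometheus text-format payload and read a few
# vLLM metrics. SAMPLE is invented for illustration (assumption).

SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 12.0
vllm:num_requests_waiting{model_name="demo"} 3.0
vllm:time_to_first_token_seconds_sum{model_name="demo"} 42.0
vllm:time_to_first_token_seconds_count{model_name="demo"} 120.0
"""


def parse_metrics(text):
    """Return {metric_name: value}, ignoring labels and comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field of the line.
        name_and_labels, _, raw_value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        values[name] = float(raw_value)
    return values


metrics = parse_metrics(SAMPLE)
running = metrics["vllm:num_requests_running"]
waiting = metrics["vllm:num_requests_waiting"]
# Histograms expose _sum and _count series: their ratio is the mean value.
mean_ttft = (
    metrics["vllm:time_to_first_token_seconds_sum"]
    / metrics["vllm:time_to_first_token_seconds_count"]
)
print(f"running={running:.0f} waiting={waiting:.0f} mean TTFT={mean_ttft:.3f}s")
```

Because labels are ignored, several series with the same name would overwrite each other here; a real dashboard relies on PromQL aggregation instead.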

Well done, you now have at your disposal:

An endpoint of the Ministral 3 14B model deployed with vLLM thanks to OVHcloud AI Deploy and its autoscaling strategies based on custom metrics
Prometheus for metrics collection and Grafana for visualisation/dashboards thanks to OVHcloud MKS

But how can you check that everything will work when the load increases?

Step 7 – Test autoscaling and real-time visualisation

The first objective here is to force AI Deploy to:

Increase vllm:num_requests_running
‘Saturate’ a single replica
Trigger the scale up
Observe replica increase + latency drop
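To give an intuition of what triggering the scale-up means, here is a minimal sketch that assumes a proportional rule similar to the one used by the Kubernetes Horizontal Pod Autoscaler. The exact algorithm applied by AI Deploy is not described in this article, and the target of 16 running requests per replica as well as the replica bounds are hypothetical values.

```python
import math


def desired_replicas(current_replicas, metric_per_replica, target_per_replica,
                     min_replicas=1, max_replicas=4):
    """HPA-style proportional rule (assumption, not the AI Deploy algorithm)."""
    raw = math.ceil(current_replicas * metric_per_replica / target_per_replica)
    # Clamp the result between the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# One replica handling 40 running requests, with a hypothetical target of 16.
print(desired_replicas(1, 40, 16))
# Three replicas at 10 running requests each: the load no longer justifies three.
print(desired_replicas(3, 10, 16))
```

With this kind of rule, saturating a single replica well above the target is what pushes the computed replica count up.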

1. Autoscaling testing strategy

The goal is to combine:

High concurrency
Long prompts (KVcache heavy)
Long generations
Bursty load

This is what vLLM autoscaling actually reacts to.

To do so, a Python script can simulate the expected behaviour:

import os
import random
import threading
import time
from statistics import mean

from openai import OpenAI

# /!\ replace <APP_ID> with your own AI Deploy app ID /!\
APP_URL = "https://<APP_ID>.app.gra.ai.cloud.ovh.net/v1"
MODEL = "mistralai/Ministral-3-14B-Instruct-2512"
# read the Bearer token exported earlier as MY_OVHAI_ACCESS_TOKEN
API_KEY = os.environ["MY_OVHAI_ACCESS_TOKEN"]

CONCURRENT_WORKERS = 500          # concurrency (main scaling trigger)
REQUESTS_PER_WORKER = 25
MAX_TOKENS = 768                  # generation pressure

# some random prompts
SHORT_PROMPTS = [
    "Summarize the theory of relativity.",
    "Explain what a transformer model is.",
    "What is Kubernetes autoscaling?"
]

MEDIUM_PROMPTS = [
    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",
    "Describe how vLLM manages KV cache and why it impacts inference performance."
]

LONG_PROMPTS = [
    "Write a very detailed technical explanation of how large language models perform inference, "
    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "
    "GPU memory management, and how batching affects latency and throughput. Use examples.",
]

PROMPT_POOL = (
    SHORT_PROMPTS * 2 +
    MEDIUM_PROMPTS * 4 +
    LONG_PROMPTS * 6    # bias toward long prompts
)

# openai compliance
client = OpenAI(
    base_url=APP_URL,
    api_key=API_KEY,
)

# basic metrics
latencies = []
errors = 0
lock = threading.Lock()

# worker
def worker(worker_id):
    global errors
    for _ in range(REQUESTS_PER_WORKER):
        prompt = random.choice(PROMPT_POOL)

        start = time.time()
        try:
            client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=MAX_TOKENS,
                temperature=0.7,
            )
            elapsed = time.time() - start

            with lock:
                latencies.append(elapsed)

        except Exception:  # count any failed request (timeout, HTTP error, etc.)
            with lock:
                errors += 1

# run
threads = []
start_time = time.time()

print("Starting autoscaling stress test...")
print(f"Concurrency: {CONCURRENT_WORKERS}")
print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")

for i in range(CONCURRENT_WORKERS):
    t = threading.Thread(target=worker, args=(i,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

total_time = time.time() - start_time

# results
print("\n=== AUTOSCALING BENCH RESULTS ===")
print(f"Total requests sent: {len(latencies) + errors}")
print(f"Successful requests: {len(latencies)}")
print(f"Errors: {errors}")
print(f"Total wall time: {total_time:.2f}s")

if latencies:
    print(f"Avg latency: {mean(latencies):.2f}s")
    print(f"Min latency: {min(latencies):.2f}s")
    print(f"Max latency: {max(latencies):.2f}s")
    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")
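Average latency can hide what happens at the tail while replicas are still starting. As a complement to the script above, here is a minimal sketch that computes percentiles from a list of latencies; the helper name latency_percentiles and the example data are illustrative and not part of the original script.

```python
from statistics import quantiles


def latency_percentiles(samples):
    """Return the p50, p95 and p99 latencies (in seconds) of a list of samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    cuts = quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Illustrative data only: 1.0 s, 2.0 s, ... 100.0 s.
example = [float(i) for i in range(1, 101)]
stats = latency_percentiles(example)
print(stats)
```

At the end of the load test, calling latency_percentiles(latencies) gives a view that is easier to compare with the latency panels in Grafana.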

How can you verify that autoscaling is working and that the load is being handled correctly without latency skyrocketing?

2. Hardware and platform-level monitoring

First, the AI Deploy Grafana answers the question ‘What resources are being used and how many replicas exist?’.

GPU utilisation, GPU memory, CPU, RAM and replica count are monitored through OVHcloud AI Deploy Grafana (monitoring URL), which exposes infrastructure and runtime metrics for the AI Deploy application. This layer provides visibility into resource saturation and scaling events managed by the AI Deploy platform itself.

Access it using the following URL (do not forget to replace <APP_ID> with your own app ID): https://monitoring.gra.ai.cloud.ovh.net/d/app/app-monitoring?var-app=<APP_ID>&orgId=1

For example, check GPU/RAM metrics:

You can also monitor scale ups and downs in real time, as well as information on HTTP calls and much more!

3. Software and application-level monitoring

Next, the combination of MKS + Prometheus + Grafana answers the question ‘How does the inference engine behave internally?’.

In fact, vLLM internal metrics (request concurrency, token throughput, latency indicators, KV cache pressure, etc.) are collected via the vLLM /metrics endpoint and scraped by Prometheus running on OVHcloud MKS, then visualised in a dedicated Grafana instance. This layer focuses on model behaviour and inference performance.

Find all these metrics via (just replace <EXTERNAL-IP> with the Grafana LoadBalancer address): http://<EXTERNAL-IP>/d/vllm-ministral-monitoring/ministral-14b-vllm-metrics-monitoring?orgId=1

Find key metrics such as TTFT, etc.:

You can also find some information about ‘Model load and throughput’:

To go further and add even more metrics, you can refer to the vLLM documentation on ‘Prometheus and Grafana’.

Conclusion

This reference architecture provides a scalable, production-ready approach for deploying LLM inference on OVHcloud using AI Deploy and its autoscaling on custom metrics feature.

OVHcloud MKS is dedicated to running Prometheus and Grafana, enabling secure scraping and visualisation of vLLM internal metrics exposed via the /metrics endpoint.

By scraping vLLM metrics securely from AI Deploy into Prometheus and exposing them through Grafana, the architecture provides full visibility into model behaviour, performance and load, enabling informed scaling analysis, troubleshooting and capacity planning in production environments.

Reference Architecture: build a sovereign n8n RAG workflow for AI agent using OVHcloud Public Cloud solutions

Eléa Petton — Tue, 27 Jan 2026 13:12:03 +0000

What if an n8n workflow, deployed in a sovereign environment, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection.

n8n workflow overview

In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with Large Language Models (LLMs) is becoming a strategic differentiator.

How? By building Agentic RAG systems capable of retrieving, reasoning, and acting autonomously based on external knowledge.

To make this possible, engineers need a way to connect retrieval pipelines (RAG) with tool-based orchestration.

This article outlines a reference architecture for building a fully automated RAG pipeline orchestrated by n8n, leveraging OVHcloud AI Endpoints and PostgreSQL with pgvector as core components.

The final result will be a system that automatically ingests Markdown documentation from Object Storage, creates embeddings with OVHcloud’s BGE-M3 model available on AI Endpoints, and stores them in a Managed Database PostgreSQL with pgvector extension.

Lastly, you’ll be able to build an AI Agent that lets you chat with an LLM (GPT-OSS-120B on AI Endpoints). This agent, utilising the RAG implementation carried out upstream, will be an expert on OVHcloud products.

You can further improve the process by using an LLM guard to protect the questions sent to the LLM, and set up a chat memory to use conversation history for higher response quality.

But what about n8n?

n8n, the open-source workflow automation tool, offers many benefits and connects seamlessly with over 300 APIs, apps, and services:

Open-source: n8n is a 100% self-hostable solution, which means you retain full data control;
Flexible: combines low-code nodes and custom JavaScript/Python logic;
AI-ready: includes useful integrations for LangChain, OpenAI, and embedding support capabilities;
Composable: enables simple connections between data, APIs, and models in minutes;
Sovereign by design: compliant with privacy-sensitive or regulated sectors.

This reference architecture serves as a blueprint for building a sovereign, scalable Retrieval Augmented Generation (RAG) platform using n8n and OVHcloud Public Cloud solutions.

This setup shows how to orchestrate data ingestion, generate embeddings, and enable conversational AI by combining OVHcloud Object Storage, Managed Databases with PostgreSQL, AI Endpoints and AI Deploy. The result? An AI environment that is fully integrated, protects privacy, and is exclusively hosted on OVHcloud’s European infrastructure.

Overview of the n8n workflow architecture for RAG

The workflow involves the following steps:

Ingestion: documentation in markdown format is fetched from OVHcloud Object Storage (S3);
Preprocessing: n8n cleans and normalises the text, removing YAML front-matter and encoding noise;
Vectorisation: each document is embedded using the BGE-M3 model, which is available via OVHcloud AI Endpoints;
Persistence: vectors and metadata are stored in OVHcloud PostgreSQL Managed Database using pgvector;
Retrieval: when a user sends a query, n8n triggers a LangChain Agent that retrieves relevant chunks from the database;
Reasoning and actions: the AI Agent node combines LLM reasoning, memory, and tool usage to generate a contextual response or trigger downstream actions (Slack reply, Notion update, API call, etc.).

In this tutorial, all services are deployed within the OVHcloud Public Cloud.

Prerequisites

Before you start, double-check that you have:

an OVHcloud Public Cloud account
an OpenStack user with the following roles:
- Administrator
- AI Operator
- Object Storage Operator
An API key for AI Endpoints
ovhai CLI available – install the ovhai CLI
Hugging Face access – create a Hugging Face account and generate an access token

🚀 Now that you have everything you need, you can start building your n8n workflow!

Architecture guide: n8n agentic RAG workflow

You’re all set to configure and deploy your n8n workflow.

⚙️ Keep in mind that the following steps can be completed using OVHcloud APIs!

Step 1 – Build the RAG data ingestion pipeline

This first step involves building the foundation of the entire RAG workflow by preparing the elements you need:

n8n deployment
Object Storage bucket creation
PostgreSQL database creation
and more

Remember to set up the proper credentials in n8n so the different elements can connect and function.

1. Deploy n8n on OVHcloud VPS

OVHcloud provides VPS solutions compatible with n8n. Get a ready-to-use virtual server with pre-installed n8n and start building automation workflows without manual setup. With plans ranging from 6 vCores / 12 GB RAM to 24 vCores / 96 GB RAM, you can choose the capacity that suits your workload.

How to set up n8n on a VPS?

Setting up n8n on an OVHcloud VPS generally involves:

Choosing and provisioning your OVHcloud VPS plan;
Connecting to your server via SSH and carrying out the initial server configuration, which includes updating the OS;
Installing n8n, typically with Docker (recommended for ease of management and updates), or npm by following this guide;
Configuring n8n with a domain name, SSL certificate for HTTPS, and any necessary environment variables for databases or settings.

While OVHcloud provides a robust VPS platform, you can find detailed n8n installation guides in the official n8n documentation.

Once the configuration is complete, you can configure the database and bucket in Object Storage.

2. Create Object Storage bucket

First, you have to set up your data source. Here you can store all your documentation in an S3-compatible Object Storage bucket.

Here, assume that all the documentation files are in Markdown format.

From the OVHcloud Control Panel, create a new Object Storage container with the S3-compatible API by following this guide.

When the bucket is ready, add your Markdown documentation to it.

Note: For this tutorial, we’re using the various OVHcloud product documentation available as open source in the GitHub repository maintained by OVHcloud members.

Click this link to access the repository.

How do you do that? Extract all the guide.en-gb.md files from the GitHub repository and rename each one to match its parent folder.

Example: the documentation about ovhai cli installation docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md is stored in ovhcloud-products-documentation-md bucket as cli_10_howto_install_cli.md
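The renaming rule described above can be sketched as follows. The helper name target_name is hypothetical, and the sketch only computes the new file name; it does not download or upload anything.

```python
from pathlib import PurePosixPath


def target_name(repo_path):
    """Map '<...>/<folder>/guide.en-gb.md' to '<folder>.md'."""
    path = PurePosixPath(repo_path)
    if path.name != "guide.en-gb.md":
        raise ValueError(f"not an English guide: {repo_path}")
    # The parent folder name becomes the object name in the bucket.
    return f"{path.parent.name}.md"


src = "docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md"
print(target_name(src))
```

If two guides in different sections share the same folder name, the resulting objects would collide, so it may be worth checking for duplicates before uploading.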

You should get an overview that looks like this:

Keep the following elements and create a new credential in n8n named OVHcloud S3 gra credentials:

S3 Endpoint: https://s3.gra.io.cloud.ovh.net/
Region: gra
Access Key ID:
Secret Access Key:

Then, create a new n8n node by selecting S3, then Get Multiple Files.
Configure this node as follows:

Connect the node to the previous one before moving on to the next step.

With the first phase done, you can now configure the vector DB.

3. Configure PostgreSQL Managed DB (pgvector)

In this step, you can set up the vector database that lets you store the embeddings generated from your documents.

How? By using OVHcloud Managed Databases for PostgreSQL with the pgvector extension. Go to your OVHcloud Control Panel and follow the steps.

1. Navigate to Databases & Analytics > Databases

2. Create a new database and select PostgreSQL and a datacenter location

3. Select Production plan and Instance type

4. Reset the user password and save it

5. Whitelist the IP of your n8n instance as follows

6. Take note of the following parameters

Make a note of this information and create a new credential in n8n named OVHcloud PGvector credentials:

Host:
Database: defaultdb
User: avnadmin
Password:
Port: 20184

Consider enabling the Ignore SSL Issues (Insecure) button as needed and setting the Maximum Number of Connections value to 1000.

✅ You’re now connected to the database! But what about the PGvector extension?

Add a PostgreSQL node with the Execute a SQL query action to your n8n workflow, and create the extension with a query that should look like this:

-- drop table as needed
DROP TABLE IF EXISTS md_embeddings;

-- activate pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- create table
CREATE TABLE md_embeddings (
    id SERIAL PRIMARY KEY,
    text TEXT,
    embedding vector(1024),
    metadata JSONB
);

You should get this n8n node:

Running this node creates the md_embeddings table. Add a Stop and Error node to handle any failure while setting up the table.

All set! Your vector DB is prepped and ready for data! Keep in mind, you still need an embeddings model for the RAG data ingestion pipeline.

4. Access to OVHcloud AI Endpoints

OVHcloud AI Endpoints is a managed service that provides ready-to-use APIs for AI models, including LLM, CodeLLM, embeddings, Speech-to-Text, and image models hosted within OVHcloud’s European infrastructure.

To vectorise the various documents in Markdown format, you have to select an embedding model: BGE-M3.

Usually, your AI Endpoints API key should already be created. If not, head to the AI Endpoints menu in your OVHcloud Control Panel to generate a new API key.

Once this is done, you can create new OpenAI credentials in your n8n.

Why do I need OpenAI credentials? Because AI Endpoints API is fully compatible with OpenAI’s, integrating it is simple and ensures the sovereignty of your data.

How? Thanks to a single endpoint https://oai.endpoints.kepler.ai.cloud.ovh.net/v1, you can request the different AI Endpoints models.

This means you can create a new n8n node by selecting Postgres PGVector Store and Add documents to Vector Store.
Set up this node as shown below:

Then configure the Data Loader with a custom text splitting and a JSON type.

For the text splitter, here are some options:

To finish, select the BGE-M3 embedding model from the model list and set the Dimensions to 1024.
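To illustrate what the custom text splitting does before the embeddings are computed, here is a minimal sketch of a character-based splitter with overlap. The function name and the default chunk_size and overlap values are assumptions; use the values configured in your own Data Loader.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears in full in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# Tiny values so the overlap is easy to see.
chunks = split_text("abcdefghij" * 3, chunk_size=12, overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk is then sent to the embedding model and stored as one row, which is why the chunk size directly affects both the number of requests and the retrieval granularity.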

You now have everything you need to build the ingestion pipeline.

5. Set up the ingestion pipeline loop

To make use of a fully automated document ingestion and vectorisation pipeline, you have to integrate some specific nodes, mainly:

a Loop Over Items that downloads each markdown file one by one so that it can be vectorised;
a Code in JavaScript that counts the number of files processed, which subsequently determines the number of requests sent to the embedding model;
an If condition that allows you to check when the 400 requests have been reached;
a Wait node that pauses after every 400 requests to avoid getting rate-limited;
an S3 block Download a file to download each markdown;
another Code in JavaScript to extract and process text from Markdown files by cleaning and removing special characters before sending it to the embeddings model;
a PostgreSQL node to Execute a SQL query to check that the table contains vectors after the process (loop) is complete.

5.1. Create a loop to process each documentation file

Begin by creating a Loop Over Items to process all the Markdown files one at a time. Set the batch size to 1 in this loop.

Add the Loop statement right after the S3 Get Many Files node as shown below:

Time to put the loop’s content into action!

5.2. Count the number of files using a code snippet

Next, choose the Code in JavaScript node from the list to see how many files have been processed. Set “Run Once for Each Item” Mode and “JavaScript” code Language, then add the following code snippet to the designated block.

// simple counter per item
const counter = $runIndex + 1;

return {
  counter
};

Make sure this code snippet is included in the loop.

You can start adding the if part to the loop now.

5.3. Add a condition that applies a rule every 400 requests

Here, you need to create an If node and add the following condition, set as an expression.

{{ (Number($json["counter"]) % 400) === 0 }}

Add it immediately after counting the files:

If this condition is true, trigger the Wait node.

5.4. Insert a pause after each set of 400 requests

Then insert a Wait node to pause the workflow before resuming. Set Resume to “After Time Interval” and the Wait Amount to “60:00” seconds.

Link it to the If condition when this is True.
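The combination of the counter, the If condition and the Wait node can be sketched as a simple loop. This is only an illustration of the logic, not n8n code: the ingest function, its parameters and the 60-second pause are assumptions.

```python
import time


def ingest(files, process, batch_size=400, pause_seconds=60, sleep=time.sleep):
    """Process files one by one, pausing each time the counter hits a multiple
    of batch_size. Returns the number of pauses taken."""
    pauses = 0
    for counter, file_key in enumerate(files, start=1):
        if counter % batch_size == 0:  # same test as the If node expression
            sleep(pause_seconds)       # plays the role of the Wait node
            pauses += 1
        process(file_key)              # download, clean and vectorise the file
    return pauses


processed = []
pauses = ingest(
    [f"doc_{i}.md" for i in range(1000)],
    processed.append,
    sleep=lambda seconds: None,  # no real waiting in this demo
)
print(len(processed), pauses)
```

Whether or not a pause occurs, every file ends up being processed, which is why both branches of the If node lead to the download step.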

Next, you can go ahead and download the Markdown file, and then process it.

5.5. Launch documentation download

To do this, create a new Download a file S3 node and configure it with this File Key expression:

{{ $('Process each documentation file').item.json.Key }}

Want to connect it? That’s easy: link it to both the output of the Wait node and the False output of the If node. This way, each file is downloaded either directly or, once 400 requests have been reached, after the pause, so the rate limit is not exceeded.

You’re almost done! Now you need to extract and process the text from the Markdown files – clean and remove any special characters before sending it to the embedding model.

5.6. Clean Markdown text content

Next, create another Code in JavaScript to process text from Markdown files:

// extract binary content
const binary = $input.item.binary.data;

// decoding into clean UTF-8 text
let text = Buffer.from(binary.data, 'base64').toString('utf8');

// cleaning - remove non-printable characters
text = text
  .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')
  .replace(/\s{2,}/g, ' ')
  .trim();

// truncate long documents to 14,000 characters
if (text.length > 14000) {
  text = text.slice(0, 14000);
}

return [{
  text,
  fileName: binary.fileName,
  mimeType: binary.mimeType
}];

Select the “Run Once for Each Item” Mode and place the previous code in the dedicated JavaScript block.
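If you want to check the cleaning rules outside n8n, the same logic can be sketched in Python. This is an illustrative port of the JavaScript snippet above, not something the workflow requires; the function name clean_markdown is an assumption.

```python
import re

MAX_CHARS = 14000
# Same character allow-list as the JavaScript snippet: tab, newlines,
# printable ASCII, Latin-1 letters and a few typographic symbols.
_DISALLOWED = re.compile(r"[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]")
_WHITESPACE = re.compile(r"\s{2,}")


def clean_markdown(raw: bytes) -> str:
    """Decode a Markdown file and normalise it before embedding."""
    text = raw.decode("utf-8", errors="replace")
    text = _DISALLOWED.sub(" ", text)            # drop unsupported characters
    text = _WHITESPACE.sub(" ", text).strip()    # collapse whitespace runs
    return text[:MAX_CHARS]                      # cap the text length


sample = "Hello\x00  world\n\n# Title ✅".encode("utf-8")
print(clean_markdown(sample))
```

Note that runs of two or more whitespace characters, including blank lines, are collapsed into a single space, so paragraph breaks are not preserved in the cleaned text.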

To finish, check that the output text has been sent to the document vectorisation system, which was set up in Step 3 – Configure PostgreSQL Managed DB (pgvector).

How do I confirm that the table contains all elements after vectorisation?

5.7. Double-check that the documents are in the table

To confirm that your RAG system is working, make sure your vector database contains vectors: add a PostgreSQL node with Execute a SQL query to your n8n workflow.

Then, run the following query:

-- count the number of elements
SELECT COUNT(*) FROM md_embeddings;

Next, link this element to the Done section of your Loop, so the elements are counted when the process is complete.

Congrats! You can now run the workflow to begin ingesting documents.

Click the Execute workflow button and wait until the vectorisation process is complete.

Remember, everything should be green when it’s finished ✅.

Step 2 – RAG chatbot

With the data ingestion and vectorisation steps completed, you can now begin implementing your AI agent.

This involves building a RAG-based AI Agent by simply starting a chat with an LLM.

1. Set up the chat box to start a conversation

First, configure your AI Agent based on the RAG system, and add a new node in the same n8n workflow: Chat Trigger.

This node will allow you to interact directly with your AI agent! But before that, you need to check that your message is safe.

2. Set up your LLM Guard with AI Deploy

To check whether a message is secure or not, use an LLM Guard.

What’s an LLM Guard? This is a safety and control layer that sits between users and an LLM, or between the LLM and an external connection. Its main goal is to filter, monitor, and enforce rules on what goes into or comes out of the model 🔐.

You can use AI Deploy from OVHcloud to deploy your desired LLM guard. With a single command line, this AI solution lets you deploy a Hugging Face model using vLLM Docker containers.

For more details, please refer to this blog.

For the use case covered in this article, you can use the open-source model meta-llama/Llama-Guard-3-8B available on Hugging Face.

2.1 Create a Bearer token to request your custom AI Deploy endpoint

Create a token to access your AI Deploy app once it’s deployed.

ovhai token create --role operator ai_deploy_token=my_operator_token

The following output is returned:

Id: 47292486-fb98-4a5b-8451-600895597a2b
Created At: 20-10-25 8:53:05
Updated At: 20-10-25 8:53:05
Spec:
  Name: ai_deploy_token=my_operator_token
  Role: AiTrainingOperator
  Label Selector:
Status:
  Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Version: 1

You can now store and export your access token to add it as a new credential in n8n.

export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

2.1 Start Llama Guard 3 model with AI Deploy

Using the ovhai CLI, run the following command to start the vLLM inference server.

ovhai app run \
  --name vllm-llama-guard3 \
  --default-http-port 8000 \
  --gpu 1 \
  --flavor l40s-1-gpu \
  --label ai_deploy_token=my_operator_token \
  --env OUTLINES_CACHE_DIR=/tmp/.outlines \
  --env HF_TOKEN=$MY_HF_TOKEN \
  --env HF_HOME=/hub \
  --env HF_DATASETS_TRUST_REMOTE_CODE=1 \
  --env HF_HUB_ENABLE_HF_TRANSFER=0 \
  --volume standalone:/workspace:RW \
  --volume standalone:/hub:RW \
  vllm/vllm-openai:v0.10.1.1 \
  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-Guard-3-8B \
    --tensor-parallel-size 1 \
    --dtype bfloat16"

Full command explained:

ovhai app run

This is the core command to run an app using the OVHcloud AI Deploy platform.

--name vllm-llama-guard3

Sets a custom name for the app, here vllm-llama-guard3.

--default-http-port 8000

Exposes port 8000 as the default HTTP endpoint. vLLM server typically runs on port 8000.

--gpu 1
--flavor l40s-1-gpu

Allocates 1 GPU L40S for the app. You can adjust the GPU type and number depending on the model you have to deploy.

--volume standalone:/workspace:RW
--volume standalone:/hub:RW

Mounts two persistent storage volumes: /workspace which is the main working directory and /hub to store Hugging Face model files.

--env OUTLINES_CACHE_DIR=/tmp/.outlines
--env HF_TOKEN=$MY_HF_TOKEN
--env HF_HOME=/hub
--env HF_DATASETS_TRUST_REMOTE_CODE=1
--env HF_HUB_ENABLE_HF_TRANSFER=0

These are Hugging Face environment variables you have to set. Please export your Hugging Face access token as environment variable before starting the app: export MY_HF_TOKEN=***********

vllm/vllm-openai:v0.10.1.1

Use the vllm/vllm-openai Docker image (a pre-configured vLLM OpenAI API server).

-- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"

Finally, runs a bash shell inside the container and executes a Python command to launch the vLLM API server.

2.2 Check to confirm your AI Deploy app is RUNNING

Replace <app_id> with your own app ID.

ovhai app get <app_id>

You should get:

History:
DATE               STATE
20-10-25 09:58:00  QUEUED
20-10-25 09:58:01  INITIALIZING
04-04-25 09:58:07  PENDING
04-04-25 10:03:10  RUNNING
Info:
Message: App is running

2.3 Create a new n8n credential with AI Deploy app URL and Bearer access token

First, using your <app_id>, retrieve your AI Deploy app URL.

ovhai app get <app_id> -o json | jq '.status.url' -r

Then, create a new OpenAI credential from your n8n workflow, using your AI Deploy URL and the Bearer token as an API key.

Don’t forget to replace 6e10e6a5-2862-4c82-8c08-26c458ca12c7 with your own <app_id>.

2.4 Create the LLM Guard node in n8n workflow

Create a new OpenAI node to Message a model and select the new AI Deploy credential for LLM Guard usage.

Next, create the prompt as follows:

{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}

Then, use an If node to determine if the scenario is safe or unsafe:

If the message is unsafe, send an error message right away to stop the workflow.

But if the message is safe, you can send the request to the AI Agent without issues 🔐.
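The If node relies on the text returned by the guard model. As a minimal sketch, assuming the usual Llama Guard 3 output format (a first line containing safe or unsafe, optionally followed by a line of violated category codes such as S1), the decision can be expressed as follows. The function name parse_guard_verdict is hypothetical.

```python
def parse_guard_verdict(raw_output):
    """Return (is_safe, categories) from the guard model's text output.

    Assumes the first non-empty line is 'safe' or 'unsafe', and that an
    unsafe verdict may be followed by comma-separated category codes.
    """
    lines = [line.strip() for line in raw_output.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty guard response")
    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        categories = lines[1].split(",") if len(lines) > 1 else []
        return False, [c.strip() for c in categories if c.strip()]
    # Anything unexpected is rejected rather than silently accepted.
    raise ValueError(f"unexpected guard response: {raw_output!r}")


print(parse_guard_verdict("safe"))
print(parse_guard_verdict("\n\nunsafe\nS1,S10"))
```

In the n8n workflow, the equivalent check is simply whether the model response contains unsafe; treating any unexpected answer as unsafe is the more cautious choice.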

3. Set up AI Agent

The AI Agent node in n8n acts as an intelligent orchestration layer that combines LLMs, memory, and external tools within an automated workflow.

It allows you to:

Connect a Large Language Model using APIs (e.g., LLMs from AI Endpoints);
Use tools such as HTTP requests, databases, or RAG retrievers so the agent can take actions or fetch real information;
Maintain conversational memory via PostgreSQL databases;
Integrate directly with chat platforms (e.g., Slack, Teams) for interactive assistants (optional).

Simply put, n8n becomes an agentic automation framework, enabling LLMs to not only provide answers, but also think, choose, and perform actions.

Please note that you can change and customise this n8n AI Agent node to fit your use cases, using features like function calling or structured output. This is the most basic configuration for the given use case. You can go even further with different agents.

🧑‍💻 How do I implement this RAG?

First, create an AI Agent node in n8n as follows:

Then, a series of steps are required, the first of which is creating prompts.

3.1 Create prompts

In the AI Agent node on your n8n workflow, edit the user and system prompts.

Begin by creating the prompt, which is also the user message:

{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}

Then create the System Message as shown below:

You have access to a retriever tool connected to a knowledge base.  
Before answering, always search for relevant documents using the retriever tool.  
Use the retrieved context to answer accurately.  
If no relevant documents are found, say that you have no information about it.

You should get a configuration like this:

🤔 Well, an LLM is now needed for this to work!

3.2 Select LLM using AI Endpoints API

First, add an OpenAI Chat Model node, and then set it as the Chat Model for your agent.

Next, select one of the OVHcloud AI Endpoints models from the list provided; this works because they are compatible with the OpenAI APIs.

✅ How? By using the right API https://oai.endpoints.kepler.ai.cloud.ovh.net/v1

The GPT OSS 120B model has been selected for this use case. Other models, such as Llama, Mistral, and Qwen, are also available.

⚠️ WARNING ⚠️

If you are using a recent version of n8n, you will likely encounter the /responses issue (linked to OpenAI compatibility). To resolve this, disable the Use Responses API toggle and everything will work correctly.

Tips to fix /responses issue

Your LLM is now set to answer your questions! Don’t forget, it needs access to the knowledge base.

3.3 Connect the knowledge base to the RAG retriever

As usual, the first step is to create a PGVector Vector Store node in n8n and enter your PGvector credentials.

Next, link this element to the Tools section of the AI Agent node.

Remember to connect your PG vector database so that the retriever can access the previously generated embeddings. Here’s an overview of what you’ll get.
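To give an idea of what the retriever does with the stored embeddings, here is a minimal sketch that ranks rows by cosine distance, which is the quantity computed by the pgvector <=> operator. The three-dimensional vectors and texts are toy values (the real embeddings have 1024 dimensions), and whether your n8n node uses cosine distance depends on its configuration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, as computed by the pgvector <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k=2):
    """Return the texts of the k rows closest to the query embedding."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row["embedding"]))
    return [row["text"] for row in ranked[:k]]


# Toy rows standing in for the md_embeddings table (illustrative values).
rows = [
    {"text": "Install the ovhai CLI", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Create an Object Storage bucket", "embedding": [0.0, 1.0, 0.0]},
    {"text": "AI Endpoints rate limits", "embedding": [0.1, 0.0, 0.9]},
]
print(top_k([1.0, 0.0, 0.0], rows, k=2))
```

The chunks returned this way are what the agent receives as context before it writes its answer.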

⏳Nearly done! The final step is to add the database memory.

3.4 Manage conversation history with database memory

Creating a Database Memory node in n8n (PostgreSQL) and linking it to your AI Agent lets the agent store and retrieve past conversation history. This enables the model to remember and use context from multiple interactions.

So link this PostgreSQL database to the Memory section of your AI agent.
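Conceptually, the memory keeps the most recent messages and replays them with each new question. Here is a minimal in-memory sketch of that idea; the WindowMemory class and its window size are assumptions used for illustration, whereas the n8n node persists the history in PostgreSQL.

```python
from collections import deque


class WindowMemory:
    """Keep only the last max_messages messages of a conversation."""

    def __init__(self, max_messages=10):
        # A bounded deque drops the oldest message automatically.
        self._messages = deque(maxlen=max_messages)

    def add(self, role, content):
        self._messages.append({"role": role, "content": content})

    def as_prompt_messages(self, system_message):
        """Return the system message followed by the remembered history."""
        return [{"role": "system", "content": system_message}, *self._messages]


memory = WindowMemory(max_messages=2)
memory.add("user", "What is AI Deploy?")
memory.add("assistant", "A service to deploy models and apps.")
memory.add("user", "Does it support autoscaling?")
for message in memory.as_prompt_messages("You are an OVHcloud product expert."):
    print(message["role"], "->", message["content"])
```

A larger window gives the model more context, at the cost of longer prompts for each request.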

Congrats! 🥳 Your n8n RAG workflow is now complete. Ready to test it?

4. Make the most of your automated workflow

Want to try it? It’s easy!

By clicking the orange Open chat button, you can ask the AI agent questions about OVHcloud products, particularly where you need technical assistance.

For example, you can ask the LLM about rate limits in OVHcloud AI Endpoints and get the information in seconds.

You can now build your own autonomous RAG system using OVHcloud Public Cloud, suited for a wide range of applications.

What’s next?

To sum up, this reference architecture provides a guide on using n8n with OVHcloud AI Endpoints, AI Deploy, Object Storage, and PostgreSQL + pgvector to build a fully controlled, autonomous RAG AI system.

By orchestrating ingestion, embedding generation, vector storage, retrieval, LLM safety checks, and reasoning within a single workflow, teams can build scalable AI assistants that work securely and independently in their cloud environment.

With the core architecture in place, you can add more features to improve the capabilities and robustness of your agentic RAG system:

Web search
Images with OCR
Audio files transcribed using the Whisper model

This delivers an extensive knowledge base and a wider variety of use cases!