<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Source Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/open-source/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Tue, 10 Feb 2026 08:51:12 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Open Source Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/open-source/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Reference Architecture: Custom metric autoscaling for LLM inference with vLLM on OVHcloud AI Deploy and observability using MKS</title>
		<link>https://blog.ovhcloud.com/reference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 10 Feb 2026 08:51:11 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[MKS]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30203</guid>

					<description><![CDATA[Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability. This reference architecture describes a comprehensive solution for deploying, autoscaling and monitoring vLLM-based LLM workloads on OVHcloud infrastructure. It combines AI Deploy, used for model serving with custom metric autoscaling, and Managed Kubernetes Service (MKS), which [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em><strong>Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability.</strong></em></p>



<figure class="wp-block-image aligncenter size-large"><img fetchpriority="high" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg" alt="" class="wp-image-30579" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3.jpg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability based on OVHcloud infrastructure</em></figcaption></figure>



<p>This reference architecture describes a comprehensive solution for <strong>deploying, autoscaling and monitoring vLLM-based LLM workloads</strong> on OVHcloud infrastructure. It combines <strong>AI Deploy</strong>, used for <strong>model serving with custom metric autoscaling</strong>, and <strong>Managed Kubernetes Service (MKS)</strong>, which hosts the monitoring and observability stack.</p>



<p>By leveraging <strong>application-level Prometheus metrics exposed by vLLM</strong>, AI Deploy can automatically scale inference replicas based on real workload demand, ensuring <strong>high availability, consistent performance under load and efficient GPU utilisation</strong>. This autoscaling mechanism allows the platform to react dynamically to traffic spikes while maintaining predictable latency for end users.</p>



<p>On top of this scalable inference layer, the monitoring architecture provides <strong>observability</strong> through <strong>Prometheus</strong>, <strong>Grafana</strong> and Alertmanager. It enables real-time performance monitoring, capacity planning, and operational insights, while ensuring <strong>full data sovereignty</strong> for organisations running Large Language Models (LLMs) in production environments.</p>



<p><strong>What are the key benefits</strong>?</p>



<ul class="wp-block-list">
<li><strong>Cost-effective</strong>: Leverage managed services to minimise operational overhead</li>



<li><strong>Real-time observability</strong>: Track Time-to-First-Token (TTFT), throughput, and resource utilisation</li>



<li><strong>Sovereign infrastructure</strong>: All metrics and data remain within European datacentres</li>



<li><strong>Production-ready</strong>: Persistent storage, high availability, and automated monitoring</li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>OVHcloud AI Deploy is a<strong>&nbsp;Container as a Service</strong>&nbsp;(CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications/APIs based on Machine Learning (ML), Deep Learning (DL) or Large Language Models (LLMs).</p>



<p><strong>Key points to keep in mind</strong>:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong>&nbsp;Bring your own custom Docker image and deploy it with a single command line or in a few clicks</li>



<li><strong>High-performance computing:</strong>&nbsp;A complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong>&nbsp;Supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong>&nbsp;Billing per minute, no surcharges</li>
</ul>



<h3 class="wp-block-heading">Managed Kubernetes Service</h3>



<p><strong>OVHcloud MKS</strong> is a fully managed Kubernetes platform designed to help you deploy, operate, and scale containerised applications in production. It provides a secure and reliable Kubernetes environment without the operational overhead of managing the control plane.</p>



<p><strong>What should you keep in mind?</strong></p>



<ul class="wp-block-list">
<li><strong>Cost-efficient</strong>: Only pay for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane</li>



<li><strong>Fully managed Kubernetes</strong>: Certified upstream Kubernetes with automated control plane management, upgrades and high availability</li>



<li><strong>Production-ready by design</strong>: Built-in integrations with OVHcloud Load Balancers, networking and persistent storage</li>



<li><strong>Scalability and flexibility</strong>: Easily scale workloads and node pools to match application demand</li>



<li><strong>Open and portable</strong>: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in</li>
</ul>



<p>In the following guide, all services are deployed within the&nbsp;<strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Overview of the architecture</h2>



<p>This reference architecture describes a <strong>complete, secure and scalable solution</strong> to:</p>



<ul class="wp-block-list">
<li>Deploy an LLM with vLLM and <strong>AI Deploy</strong>, benefiting from automatic scaling based on custom metrics to ensure high service availability &#8211; vLLM exposes <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>/metrics</strong></mark></code> via its public HTTPS endpoint on AI Deploy</li>



<li>Collect, store and visualise these vLLM metrics using Prometheus and Grafana on <strong>MKS</strong></li>
</ul>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" width="1200" height="630" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg" alt="" class="wp-image-30578" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-768x403.jpg 768w" sizes="(max-width: 1200px) 100vw, 1200px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability architecture overview</em></figcaption></figure>



<p>The solution comprises three main layers:</p>



<ol class="wp-block-list">
<li><strong>Model serving layer</strong> with AI Deploy
<ul class="wp-block-list">
<li>vLLM containers running on top of GPUs for LLM inference</li>



<li>vLLM inference server exposing Prometheus metrics</li>



<li>Automatic scaling based on custom metrics to ensure high availability</li>



<li>HTTPS endpoints with Bearer token authentication</li>
</ul>
</li>



<li><strong>Monitoring and observability infrastructure</strong> using Kubernetes
<ul class="wp-block-list">
<li>Prometheus for metrics collection and storage</li>



<li>Grafana for visualisation and dashboards</li>



<li>Persistent volume storage for long-term retention</li>
</ul>
</li>



<li><strong>Network layer</strong>
<ul class="wp-block-list">
<li>Secure HTTPS communication between components</li>



<li>OVHcloud LoadBalancer for external access</li>
</ul>
</li>
</ol>



<p>Before going further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">Administrator</mark></code></strong></a> role</li>



<li><strong>ovhai CLI available</strong> &#8211;&nbsp;<em>install the&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a></em></li>



<li>A <strong>Hugging Face access</strong> &#8211; <em>create a&nbsp;<a href="https://huggingface.co/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face account</a>&nbsp;and generate an&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">access token</a></em></li>



<li><code><strong><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></strong></code> and <code><strong><mark class="has-inline-color has-ast-global-color-0-color">helm</mark></strong></code> (version 3.x or later) installed</li>
</ul>



<p><strong>🚀 Now you have all the ingredients for our recipe, it’s time to deploy Ministral 3 14B using AI Deploy and the vLLM Docker container!</strong></p>



<h2 class="wp-block-heading">Architecture guide: From autoscaling to observability for LLMs served by vLLM</h2>



<p>Let’s set up and deploy this architecture!</p>



<figure class="wp-block-image aligncenter size-large"><img decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg" alt="" class="wp-image-30580" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2.jpg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Overview of the deployment workflow</em></figcaption></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>In this example, <a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a> is used. Choose the open-source model of your choice and follow the same steps, adapting the model slug (from Hugging Face), the versions and the GPU(s) flavour.</em></strong></p>
</blockquote>



<p><em>Remember that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Manage access tokens</h3>



<p>The first step is to manage the access tokens needed for this deployment: a <strong>Hugging Face token</strong> to download the model weights, and an <strong>AI Deploy application token</strong> to secure access to the deployed endpoint.</p>



<p>Export your&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Create a Bearer token</a>&nbsp;to access your AI Deploy app once it&#8217;s been deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>This returns the following output:</p>



<pre class="wp-block-code"><code class="">Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-01-26 11:53:05<br>Updated At: 20-01-26 11:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</code></pre>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h3 class="wp-block-heading">Step 2 &#8211; LLM deployment using AI Deploy</h3>



<p>Before introducing the monitoring stack, this architecture starts with the <strong>deployment of Ministral 3 14B on OVHcloud AI Deploy</strong>, configured to <strong>autoscale based on custom Prometheus metrics exposed by vLLM itself</strong>.</p>



<h4 class="wp-block-heading">1. Define the targeted vLLM metric for autoscaling</h4>



<p>Before proceeding with the deployment of the <strong>Ministral 3 14B</strong> endpoint, you have to choose the metric you want to use as the trigger for scaling.</p>



<p>Instead of relying solely on CPU/RAM utilisation, AI Deploy allows autoscaling decisions to be driven by <strong>application-level signals</strong>.</p>



<p>To do this, you can consult the <a href="https://docs.vllm.ai/en/latest/design/metrics/#v1-metrics" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">metrics exposed by vLLM</a>.</p>



<p>In this example, you can use a basic metric such as <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>vllm:num_requests_running</strong></mark></code> to scale the number of replicas based on <strong>real inference load</strong>.</p>



<p>This enables:</p>



<ul class="wp-block-list">
<li>Faster reaction to traffic spikes</li>



<li>Better GPU utilisation</li>



<li>Reduced inference latency under load</li>



<li>Cost-efficient scaling</li>
</ul>



<p>Finally, the configuration chosen for scaling this application is as follows:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Parameter</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td>Metric source</td><td><code>/metrics</code></td><td>vLLM Prometheus endpoint</td></tr><tr><td>Metric name</td><td><code>vllm:num_requests_running</code></td><td>Number of in-flight requests</td></tr><tr><td>Aggregation</td><td><code>AVERAGE</code></td><td>Mean across replicas</td></tr><tr><td>Target value</td><td><code>50</code></td><td>Desired load per replica</td></tr><tr><td>Min replicas</td><td><code>1</code></td><td>Baseline capacity</td></tr><tr><td>Max replicas</td><td><code>3</code></td><td>Burst capacity</td></tr></tbody></table></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><em><strong>You can choose the metric that best suits your use case. You can also apply a patch to your AI Deploy deployment at any time to change the target metric for scaling</strong></em>.</p>
</blockquote>



<p>When the <strong>average number of running requests exceeds 50</strong>, AI Deploy automatically provisions <strong>additional GPU-backed replicas</strong>.</p>
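<p>Conceptually, this target-value rule boils down to a small calculation. The sketch below is a minimal illustration of HPA-style scaling maths, not AI Deploy&#8217;s actual controller implementation:</p>

```python
import math

def desired_replicas(avg_requests_running: float, current_replicas: int,
                     target_per_replica: float = 50.0,
                     min_replicas: int = 1, max_replicas: int = 3) -> int:
    """Sketch of a target-value autoscaling rule (HPA-style).

    With AVERAGE aggregation, the controller tries to keep the mean of
    vllm:num_requests_running per replica at the target value.
    """
    # total in-flight requests across all replicas
    total = avg_requests_running * current_replicas
    # replicas needed so each one handles roughly target_per_replica requests
    needed = math.ceil(total / target_per_replica)
    # clamp to the configured bounds
    return max(min_replicas, min(max_replicas, needed))

# Example: 2 replicas averaging 80 in-flight requests each
# -> 160 total, 4 replicas needed, capped at max_replicas=3
print(desired_replicas(80, 2))  # 3
```

<p>The same formula explains why the deployment never drops below one replica (baseline availability) or exceeds three (burst capacity).</p>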



<h4 class="wp-block-heading">2. Deploy Ministral 3 14B using AI Deploy</h4>



<p>Now you can deploy the LLM using the <strong><code>ovhai</code> CLI</strong>.</p>



<p>Key elements required for the deployment to function properly:</p>



<ul class="wp-block-list">
<li>GPU-based inference: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">1 x H100</mark></code></strong></li>



<li>vLLM OpenAI-compatible Docker image: <a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.13.0</mark></code></strong></a></li>



<li>Custom autoscaling rules based on Prometheus metrics: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm:num_requests_running</mark></strong></code></li>
</ul>



<p>Below is the reference command used to deploy the <strong><a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a></strong>:</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>  --name vllm-ministral-14B-autoscaling-custom-metric \<br>  --default-http-port 8000 \<br>  --label ai_deploy_token=my_operator_token \<br>  --gpu 1 \<br>  --flavor h100-1-gpu \<br>  -e OUTLINES_CACHE_DIR=/tmp/.outlines \<br>  -e HF_TOKEN=$MY_HF_TOKEN \<br>  -e HF_HOME=/hub \<br>  -e HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>  -e HF_HUB_ENABLE_HF_TRANSFER=0 \<br>  -v standalone:/hub:rw \<br>  -v standalone:/workspace:rw \<br>  --liveness-probe-path /health \<br>  --liveness-probe-port 8000 \<br>  --liveness-initial-delay-seconds 300 \<br>  --probe-path /v1/models \<br>  --probe-port 8000 \<br>  --initial-delay-seconds 300 \<br>  --auto-min-replicas 1 \<br>  --auto-max-replicas 3 \<br>  --auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics" \<br>  --auto-custom-metric-format PROMETHEUS \<br>  --auto-custom-value-location vllm:num_requests_running \<br>  --auto-custom-target-value 50 \<br>  --auto-custom-metric-aggregation-type AVERAGE \<br>  vllm/vllm-openai:v0.13.0 \<br>  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \<br>    --model mistralai/Ministral-3-14B-Instruct-2512 \<br>    --tokenizer_mode mistral \<br>    --load_format mistral \<br>    --config_format mistral \<br>    --enable-auto-tool-choice \<br>    --tool-call-parser mistral \<br>    --enable-prefix-caching"</code></pre>



<p>Let’s break down the different parameters of this command.</p>



<h5 class="wp-block-heading"><strong>a. Start your AI Deploy app</strong></h5>



<p>Launch a new app using&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a>&nbsp;and name it.</p>



<p><code><strong>ovhai app run --name vllm-ministral-14B-autoscaling-custom-metric</strong></code></p>



<h5 class="wp-block-heading"><strong>b. Define access</strong></h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading"><strong>c. Configure GPU resources</strong></h5>



<p>Specify the hardware type (<code><strong>h100-1-gpu</strong></code>), which refers to an&nbsp;<strong>NVIDIA H100 GPU</strong>, and the number of GPUs (<strong>1</strong>).</p>



<p><code><strong>--gpu 1<br>--flavor h100-1-gpu</strong></code></p>



<p><strong><mark>⚠️WARNING!</mark></strong>&nbsp;For this model, one H100 is sufficient, but if you want to deploy another model, you will need to check which GPU you need. Note that you can also access L40S and A100 GPUs for your LLM deployment.</p>



<h5 class="wp-block-heading"><strong>d. Set up environment variables</strong></h5>



<p>Configure caching for the&nbsp;<strong>Outlines library</strong>&nbsp;(used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the&nbsp;<strong>Hugging Face token</strong>&nbsp;(<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the&nbsp;<strong>Hugging Face cache directory</strong>&nbsp;to&nbsp;<code>/hub</code>&nbsp;(where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of&nbsp;<strong>custom remote code</strong>&nbsp;from Hugging Face datasets (required for some model behaviours):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable&nbsp;<strong>Hugging Face Hub transfer acceleration</strong>&nbsp;(to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading"><strong>e. Mount persistent volumes</strong></h5>



<p>Mount&nbsp;<strong>two persistent storage volumes</strong>:</p>



<ol class="wp-block-list">
<li><code>/hub</code>&nbsp;→ Stores Hugging Face model files</li>



<li><code>/workspace</code>&nbsp;→ Main working directory</li>
</ol>



<p>The&nbsp;<code>rw</code>&nbsp;flag means&nbsp;<strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading"><strong>f. Health checks and readiness</strong></h5>



<p>Configure <strong>liveness and readiness probes</strong>:</p>



<ol class="wp-block-list">
<li><code>/health</code> verifies the container is alive</li>



<li><code>/v1/models</code> confirms the model is loaded and ready to serve requests</li>
</ol>



<p>The long initial delays (300 seconds) correspond to the startup time of vLLM and the loading of the model on the GPU; they can be adjusted to match the startup time of your own model.</p>



<p><code><strong>--liveness-probe-path /health<br>--liveness-probe-port 8000<br>--liveness-initial-delay-seconds 300<br><br>--probe-path /v1/models<br>--probe-port 8000<br>--initial-delay-seconds 300</strong></code></p>
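<p>The probe mechanics can be sketched as a simple polling loop. This is an illustrative simplification only (the real probes are run by the platform); the stub probe below stands in for a GET on <code>/v1/models</code>:</p>

```python
import time
from typing import Callable

def wait_until_ready(probe: Callable[[], bool],
                     initial_delay: float = 0.0,
                     period: float = 0.1,
                     timeout: float = 5.0) -> bool:
    """Simplified readiness-probe loop: True once probe() succeeds."""
    time.sleep(initial_delay)          # e.g. 300 s in the deployment above
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():                    # e.g. GET /v1/models returning HTTP 200
            return True
        time.sleep(period)             # wait one probe period between attempts
    return False

# Example with a stub that succeeds on the third attempt
calls = {"n": 0}
def stub_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until_ready(stub_probe))  # True
```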



<h5 class="wp-block-heading"><strong>g. Autoscaling configuration (custom metrics)</strong></h5>



<p>First set the minimum and maximum number of replicas.</p>



<p><strong><code>--auto-min-replicas 1<br>--auto-max-replicas 3</code></strong></p>



<p>This guarantees basic availability (one replica always up) while allowing for peak capacity.</p>



<p>Then enable autoscaling based on application-level metrics exposed by vLLM.</p>



<p><strong><code>--auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics"<br>--auto-custom-metric-format PROMETHEUS<br>--auto-custom-value-location vllm:num_requests_running<br>--auto-custom-target-value 50<br>--auto-custom-metric-aggregation-type AVERAGE</code></strong></p>



<p>AI Deploy:</p>



<ul class="wp-block-list">
<li>Scrapes the local <mark class="has-inline-color has-ast-global-color-0-color"><strong><code>/metrics</code></strong></mark> endpoint</li>



<li>Parses Prometheus-formatted metrics</li>



<li>Extracts the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>vllm:num_requests_running</code></mark></strong> gauge</li>



<li>Computes the average value across replicas</li>
</ul>



<p>Scaling behaviour:</p>



<ul class="wp-block-list">
<li>When the average number of in-flight requests exceeds <strong><code><mark class="has-inline-color has-ast-global-color-0-color">50</mark></code></strong>, AI Deploy adds replicas</li>



<li>When load decreases, replicas are scaled down</li>
</ul>



<p>This approach ensures high availability and predictable latency under fluctuating traffic.</p>



<h5 class="wp-block-heading"><strong>h. Choose the target Docker image and the startup command</strong></h5>



<p>Use the official <strong><a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM OpenAI-compatible Docker image</a></strong>.</p>



<p><strong><code>vllm/vllm-openai:v0.13.0</code></strong></p>



<p>Finally, run the model inside the container using a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong>&nbsp;→ Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Ministral-3-14B-Instruct-2512</code></strong>&nbsp;→ Loads the&nbsp;<strong>Ministral 3 14B</strong>&nbsp;model from Hugging Face</li>



<li><strong><code>--tokenizer_mode mistral</code></strong>&nbsp;→ Uses the&nbsp;<strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong>&nbsp;→ Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong>&nbsp;→ Ensures the model configuration follows Mistral’s standard</li>



<li><code><strong>--enable-auto-tool-choice </strong></code>→ Automatically invokes tools when needed (function/tool calling)</li>



<li><strong><code>--tool-call-parser mistral </code></strong>→ Tool calling support</li>



<li><strong><code>--enable-prefix-caching</code></strong> → Prefix caching for improved throughput and reduced latency</li>
</ul>



<p>You can now launch this command using <strong>ovhai CLI</strong>.</p>



<h4 class="wp-block-heading">3. Check AI Deploy app status</h4>



<p>You can now check if your&nbsp;<strong>AI Deploy</strong>&nbsp;app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p><strong>Is your app in&nbsp;<code>RUNNING</code>&nbsp;status?</strong>&nbsp;Perfect! You can check in the logs that the server is started:</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark>⚠️WARNING!</mark></strong>&nbsp;This step may take a little time as the LLM must be loaded.</p>



<h4 class="wp-block-heading">4. Test that the deployment is functional</h4>



<p>First, send a prompt to the LLM. Launch the following query, asking the question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \<br>  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  -H "Content-Type: application/json" \<br>  -d '{<br>    "model": "mistralai/Ministral-3-14B-Instruct-2512",<br>    "messages": [<br>      {"role": "system", "content": "You are a helpful assistant."},<br>      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}<br>    ],<br>    "stream": false<br>  }'</code></pre>
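<p>If you prefer scripting over <code>curl</code>, the same request can be assembled in Python. This is a sketch: the app ID, token and question are placeholders, and nothing is sent until you POST the result (for example with the <code>requests</code> library):</p>

```python
import json

def build_chat_request(app_id: str, token: str, question: str):
    """Build the URL, headers and JSON body for the chat/completions call above.

    `app_id` and `token` are placeholders for your AI Deploy app ID and
    Bearer token; this function only builds the request, it sends nothing.
    """
    url = f"https://{app_id}.app.gra.ai.cloud.ovh.net/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": "mistralai/Ministral-3-14B-Instruct-2512",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }).encode()
    return url, headers, body

url, headers, body = build_chat_request("my-app-id", "my-token",
                                        "Who founded OVHcloud?")
# e.g. requests.post(url, headers=headers, data=body, timeout=60)
```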



<p>You can also verify access to vLLM metrics.</p>



<pre class="wp-block-code"><code class="">curl -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/metrics</code></pre>
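<p>The <code>/metrics</code> response uses the Prometheus text exposition format. The following sketch shows how the autoscaling gauge could be extracted from such a payload; the sample values below are made up for illustration:</p>

```python
# Parse a Prometheus text-format exposition and extract one gauge.
# The SAMPLE payload is illustrative, not real vLLM output.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="mistralai/Ministral-3-14B-Instruct-2512"} 12.0
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="mistralai/Ministral-3-14B-Instruct-2512"} 3.0
"""

def read_gauge(payload: str, metric: str) -> float:
    """Return the first sample value for `metric` in a Prometheus text payload."""
    for line in payload.splitlines():
        if line.startswith("#"):       # skip HELP/TYPE comment lines
            continue
        # match the metric name followed by a label set or a space
        if line.startswith(metric) and line[len(metric)] in "{ ":
            return float(line.rsplit(" ", 1)[-1])
    raise KeyError(metric)

print(read_gauge(SAMPLE, "vllm:num_requests_running"))  # 12.0
```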



<p>If both tests return HTTP 200 responses, the model deployment is functional and you are ready to move on to the next step!</p>



<p>The next step is to set up the observability and monitoring stack. This autoscaling mechanism is <strong>fully independent</strong> of the Prometheus instance used for observability:</p>



<ul class="wp-block-list">
<li>AI Deploy queries the local <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code></mark></strong> endpoint internally</li>



<li>Prometheus scrapes the <strong>same metrics endpoint</strong> externally for monitoring, dashboards and potentially alerting</li>
</ul>



<p>This ensures:</p>



<ul class="wp-block-list">
<li>A single source of truth for metrics</li>



<li>No duplication of exporters</li>



<li>Consistent signals for scaling and observability</li>
</ul>



<h3 class="wp-block-heading">Step 3 &#8211; Create an MKS cluster</h3>



<p>From <a href="https://manager.eu.ovhcloud.com/#/hub/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Control Panel</a>, create a Kubernetes cluster using the <strong>MKS</strong>.</p>



<p>Consider using the following configuration for the current use case:</p>



<ul class="wp-block-list">
<li><strong>Location</strong>: GRA (Gravelines) &#8211; <em>you can select the same region as for AI Deploy</em></li>



<li><strong>Network</strong>: Public</li>



<li><strong>Node pool</strong>:
<ul class="wp-block-list">
<li>Flavour: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">b2-15</mark></strong></code> (or something similar)</li>



<li>Number of nodes: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">3</mark></code></strong></li>



<li>Autoscaling: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">OFF</mark></code></strong></li>
</ul>
</li>



<li><strong>Name your node pool:</strong> <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>monitoring</code></mark></strong></li>
</ul>



<p>You should see your cluster (e.g. <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>prometheus-vllm-metrics-ai-deploy</strong></mark></code>) in the list, along with the following information:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="632" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png" alt="" class="wp-image-30242" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-300x185.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-768x474.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1536x948.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-2048x1264.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the status is green with the <strong><mark style="color:#00d084" class="has-inline-color"><code>OK</code></mark></strong> label, you can proceed to the next step.</p>



<h3 class="wp-block-heading">Step 4 &#8211; Configure Kubernetes access</h3>



<p>Download your <strong>kubeconfig file</strong> from the OVHcloud Control Panel and configure <strong><code><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></code></strong>:</p>



<pre class="wp-block-code"><code class=""># configure kubectl with your MKS cluster<br>export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml<br><br># verify cluster connectivity<br>kubectl cluster-info<br>kubectl get nodes</code></pre>



<p>Now you can create the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>values-prometheus.yaml</code></mark></strong> file:</p>



<pre class="wp-block-code"><code class=""># general configuration<br>nameOverride: "monitoring"<br>fullnameOverride: "monitoring"<br><br># Prometheus configuration<br>prometheus:<br>  prometheusSpec:<br>    # data retention (15d)<br>    retention: 15d<br>    <br>    # scrape interval (15s)<br>    scrapeInterval: 15s<br>    <br>    # persistent storage (required for production deployment)<br>    storageSpec:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed  # OVHcloud storage class<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 50Gi  # (can be modified according to your needs)<br>    <br>    # scrape vLLM metrics from your AI Deploy instance (Ministral 3 14B)<br>    additionalScrapeConfigs:<br>      - job_name: 'vllm-ministral'<br>        scheme: https<br>        metrics_path: '/metrics'<br>        scrape_interval: 15s<br>        scrape_timeout: 10s<br>        <br>        # authentication using the AI Deploy Bearer token stored in a Kubernetes Secret<br>        bearer_token_file: /etc/prometheus/secrets/vllm-auth-token/token<br>        static_configs:<br>          - targets:<br>              - '&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net'  # /!\ REPLACE &lt;APP_ID&gt; with yours /!\<br>            labels:<br>              service: 'vllm'<br>              model: 'ministral'<br>              environment: 'production'<br>        <br>        # TLS configuration<br>        tls_config:<br>          insecure_skip_verify: false<br>    <br>    # kube-prometheus-stack mounts the secret under /etc/prometheus/secrets/ and makes it accessible to Prometheus<br>    secrets:<br>      - vllm-auth-token<br><br># Grafana configuration (visualization layer)<br>grafana:<br>  enabled: true<br>  <br>  # disable automatic datasource provisioning<br>  sidecar:<br>    datasources:<br>      enabled: false<br>  <br>  # persistent dashboards<br>  persistence:<br>    enabled: true<br>    storageClassName: csi-cinder-high-speed<br>    size: 10Gi<br>  <br>  # /!\ DEFINE ADMIN PASSWORD - REPLACE "test" WITH YOUR OWN /!\<br>  adminPassword: "test"<br>  <br>  # access via OVHcloud LoadBalancer (public IP and managed LB)<br>  service:<br>    type: LoadBalancer<br>    port: 80<br>    annotations:<br>      # optional: restrict access to specific IPs<br>      # service.beta.kubernetes.io/ovh-loadbalancer-allowed-sources: "1.2.3.4/32"<br>  <br># alertmanager (optional but recommended for production)<br>alertmanager:<br>  enabled: true<br>  <br>  alertmanagerSpec:<br>    storage:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 10Gi<br><br># cluster observability components<br>nodeExporter:<br>  enabled: true<br>  <br>kubeStateMetrics:<br>  enabled: true</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>On OVHcloud MKS, persistent storage is handled automatically through the Cinder CSI driver. When a PersistentVolumeClaim (PVC) references a supported <code>storageClassName</code> such as <code>csi-cinder-high-speed</code>, OVHcloud dynamically provisions the underlying Block Storage volume and attaches it to the node running the pod. This enables stateful components like Prometheus, Alertmanager and Grafana to persist data reliably without any manual volume management, making the architecture fully cloud-native and operationally simple.</em></strong></p>
</blockquote>
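<p>As an aside, you can verify this dynamic provisioning behaviour on its own before deploying the full stack. The following is a minimal sketch of a standalone PersistentVolumeClaim (the claim name <code>test-claim</code> is just an example, not part of the monitoring setup); applying it with <code>kubectl apply -f</code> should result in a Cinder Block Storage volume being created automatically:</p>

```yaml
# minimal sketch: on OVHcloud MKS, creating this claim triggers
# dynamic provisioning of a Block Storage volume via the Cinder CSI driver
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim        # hypothetical name, for illustration only
spec:
  storageClassName: csi-cinder-high-speed
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

<p>Once a pod references the claim, the volume is attached to the node running that pod; deleting the claim releases the volume according to the storage class reclaim policy.</p>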



<p>Then create the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">monitoring</mark></code></strong> namespace:</p>



<pre class="wp-block-code"><code class=""># create namespace<br>kubectl create namespace monitoring<br><br># verify creation<br>kubectl get namespaces | grep monitoring</code></pre>



<p>Finally, configure the Bearer token secret to access vLLM metrics:</p>



<pre class="wp-block-code"><code class=""># create bearer token secret<br>kubectl create secret generic vllm-auth-token \<br>  --from-literal=token="$MY_OVHAI_ACCESS_TOKEN" \<br>  -n monitoring<br><br># verify secret creation<br>kubectl get secret vllm-auth-token -n monitoring<br><br># test token (optional)<br>kubectl get secret vllm-auth-token -n monitoring \<br>  -o jsonpath='{.data.token}' | base64 -d</code></pre>



<p>Right, if everything is working, let&#8217;s move on to deployment.</p>



<h3 class="wp-block-heading">Step 5 &#8211; Deploy Prometheus stack</h3>



<p>Add the Prometheus Helm repository and install the monitoring stack. The deployment creates:</p>



<ul class="wp-block-list">
<li>Prometheus StatefulSet with persistent storage</li>



<li>Grafana deployment with LoadBalancer access</li>



<li>Alertmanager for future alert configuration (optional)</li>



<li>Supporting components (node exporters, kube-state-metrics)</li>
</ul>



<pre class="wp-block-code"><code class=""># add Helm repository<br>helm repo add prometheus-community \<br>  https://prometheus-community.github.io/helm-charts<br>helm repo update<br><br># install monitoring stack<br>helm install monitoring prometheus-community/kube-prometheus-stack \<br>  --namespace monitoring \<br>  --values values-prometheus.yaml \<br>  --wait</code></pre>



<p>Then you can retrieve the LoadBalancer IP address to access Grafana:</p>



<pre class="wp-block-code"><code class="">kubectl get svc -n monitoring monitoring-grafana</code></pre>



<p>Finally, open your browser to <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://&lt;EXTERNAL-IP&gt;</mark></strong></code> and login with:</p>



<ul class="wp-block-list">
<li><strong>Username</strong>: <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>admin</strong></mark></code></li>



<li><strong>Password</strong>: as configured in your <code><strong><mark class="has-inline-color has-ast-global-color-0-color">values-prometheus.yaml</mark></strong></code> file</li>
</ul>



<h3 class="wp-block-heading">Step 6 &#8211; Create Grafana dashboards</h3>



<p>In this step, you will access the Grafana interface, add your Prometheus as a new data source, then create a complete dashboard with the different vLLM metrics.</p>



<h4 class="wp-block-heading">1. Add a new data source in Grafana</h4>



<p>First of all, create a new Prometheus connection inside Grafana:</p>



<ul class="wp-block-list">
<li>Navigate to <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Connections</code></mark></strong> → <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Data sources</code></mark></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Add data source</mark></code></strong></li>



<li>Select <strong>Prometheus</strong></li>



<li>Configure URL: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://monitoring-prometheus:9090</mark></strong></code></li>



<li>Click <strong>Save &amp; test</strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="609" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png" alt="" class="wp-image-30247" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-300x178.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-768x457.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1536x913.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-2048x1218.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Now that your Prometheus has been configured as a new data source, you can create your Grafana dashboard.</p>



<h4 class="wp-block-heading">2. Create your monitoring dashboard</h4>



<p>To begin with, you can use the following pre-configured Grafana dashboard by downloading this JSON file locally:</p>





<p>In the left-hand menu, select <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboard</mark></code></strong>:</p>



<ol class="wp-block-list">
<li>Navigate to <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboards</mark></code></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Import</mark></code></strong></li>



<li>Upload the provided dashboard JSON</li>



<li>Select <strong>Prometheus</strong> as datasource</li>



<li>Click <strong>Import</strong> and select the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">vLLM-metrics-grafana-monitoring.json</mark></code></strong> file</li>
</ol>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="449" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png" alt="" class="wp-image-30250" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-300x131.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-768x337.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1536x673.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-2048x897.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The dashboard provides real-time visibility for <strong>Ministral 3 14B</strong> deployed with the vLLM container on OVHcloud AI Deploy.</p>



<p>You can now track:</p>



<ul class="wp-block-list">
<li><strong>Performance metrics</strong>: TTFT, inter-token latency, end-to-end latency</li>



<li><strong>Throughput indicators</strong>: Requests per second, token generation rates</li>



<li><strong>Resource utilisation</strong>: KV cache usage, active/waiting requests</li>



<li><strong>Capacity indicators</strong>: Queue depth, preemption rates</li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png" alt="" class="wp-image-30253" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Here are the key metrics tracked and displayed in the Grafana dashboard:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Metric Category</th><th>Prometheus Metric</th><th>Description</th><th>Use case</th></tr></thead><tbody><tr><td><strong>Latency</strong></td><td><code>vllm:time_to_first_token_seconds</code></td><td>Time until first token generation</td><td>User experience monitoring</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:inter_token_latency_seconds</code></td><td>Time between tokens</td><td>Throughput optimisation</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:e2e_request_latency_seconds</code></td><td>End-to-end request time</td><td>SLA monitoring</td></tr><tr><td><strong>Throughput</strong></td><td><code>vllm:request_success_total</code></td><td>Successful requests counter</td><td>Capacity planning</td></tr><tr><td><strong>Resource</strong></td><td><code>vllm:kv_cache_usage_perc</code></td><td>KV cache memory usage</td><td>Memory management</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_running</code></td><td>Active requests</td><td>Load monitoring</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_waiting</code></td><td>Queued requests</td><td>Overload detection</td></tr><tr><td><strong>Capacity</strong></td><td><code>vllm:num_preemptions_total</code></td><td>Request preemptions</td><td>Peak load indicator</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:prompt_tokens_total</code></td><td>Input tokens processed</td><td>Usage analytics</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:generation_tokens_total</code></td><td>Output tokens generated</td><td>Cost tracking</td></tr></tbody></table></figure>
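<p>As an illustration, the latency metrics in the table above are exposed as Prometheus histograms, so dashboard panels typically wrap them in <code>histogram_quantile</code>, while counters are turned into rates. A few example queries (sketches only; adjust the time window and label filters to your setup):</p>

```promql
# p95 time to first token over the last 5 minutes
histogram_quantile(0.95, sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le))

# request throughput (successful requests per second)
sum(rate(vllm:request_success_total[5m]))

# generated tokens per second
sum(rate(vllm:generation_tokens_total[5m]))

# current queue pressure (gauge, no rate needed)
vllm:num_requests_waiting
```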



<p>Well done, you now have at your disposal:</p>



<ul class="wp-block-list">
<li>An endpoint of the Ministral 3 14B model deployed with vLLM thanks to <strong>OVHcloud AI Deploy</strong> and its autoscaling strategies based on custom metrics</li>



<li>Prometheus for metrics collection and Grafana for visualisation/dashboards thanks to <strong>OVHcloud MKS</strong></li>
</ul>



<p><strong>But how can you check that everything will work when the load increases?</strong></p>



<h3 class="wp-block-heading">Step 7 &#8211; Test autoscaling and real-time visualisation</h3>



<p>The first objective here is to force AI Deploy to:</p>



<ul class="wp-block-list">
<li>Increase <code>vllm:num_requests_running</code></li>



<li>&#8216;Saturate&#8217; a single replica</li>



<li>Trigger the <strong>scale up</strong></li>



<li>Observe replica increase + latency drop</li>
</ul>



<h4 class="wp-block-heading">1. Autoscaling testing strategy</h4>



<p>The goal is to combine:</p>



<ul class="wp-block-list">
<li><strong>High concurrency</strong></li>



<li><strong>Long prompts</strong> (KV cache heavy)</li>



<li><strong>Long generations</strong></li>



<li><strong>Bursty load</strong></li>
</ul>



<p>This is what vLLM autoscaling actually reacts to.</p>



<p>To do so, a Python script can simulate the expected behaviour:</p>



<pre class="wp-block-code"><code class="">import os<br>import time<br>import threading<br>import random<br>from statistics import mean<br>from openai import OpenAI<br><br>APP_URL = "https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net/v1"  # /!\ REPLACE &lt;APP_ID&gt; with yours /!\<br>MODEL = "mistralai/Ministral-3-14B-Instruct-2512"<br>API_KEY = os.environ["MY_OVHAI_ACCESS_TOKEN"]<br><br>CONCURRENT_WORKERS = 500          # concurrency (main scaling trigger)<br>REQUESTS_PER_WORKER = 25<br>MAX_TOKENS = 768                  # generation pressure<br><br># some random prompts<br>SHORT_PROMPTS = [<br>    "Summarize the theory of relativity.",<br>    "Explain what a transformer model is.",<br>    "What is Kubernetes autoscaling?"<br>]<br><br>MEDIUM_PROMPTS = [<br>    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",<br>    "Describe how vLLM manages KV cache and why it impacts inference performance."<br>]<br><br>LONG_PROMPTS = [<br>    "Write a very detailed technical explanation of how large language models perform inference, "<br>    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "<br>    "GPU memory management, and how batching affects latency and throughput. Use examples.",<br>]<br><br>PROMPT_POOL = (<br>    SHORT_PROMPTS * 2 +<br>    MEDIUM_PROMPTS * 4 +<br>    LONG_PROMPTS * 6    # bias toward long prompts<br>)<br><br># OpenAI-compatible client<br>client = OpenAI(<br>    base_url=APP_URL,<br>    api_key=API_KEY,<br>)<br><br># basic metrics<br>latencies = []<br>errors = 0<br>lock = threading.Lock()<br><br># worker<br>def worker(worker_id):<br>    global errors<br>    for _ in range(REQUESTS_PER_WORKER):<br>        prompt = random.choice(PROMPT_POOL)<br><br>        start = time.time()<br>        try:<br>            client.chat.completions.create(<br>                model=MODEL,<br>                messages=[{"role": "user", "content": prompt}],<br>                max_tokens=MAX_TOKENS,<br>                temperature=0.7,<br>            )<br>            elapsed = time.time() - start<br><br>            with lock:<br>                latencies.append(elapsed)<br><br>        except Exception:<br>            with lock:<br>                errors += 1<br><br># run<br>threads = []<br>start_time = time.time()<br><br>print("Starting autoscaling stress test...")<br>print(f"Concurrency: {CONCURRENT_WORKERS}")<br>print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")<br><br>for i in range(CONCURRENT_WORKERS):<br>    t = threading.Thread(target=worker, args=(i,))<br>    t.start()<br>    threads.append(t)<br><br>for t in threads:<br>    t.join()<br><br>total_time = time.time() - start_time<br><br># results<br>print("\n=== AUTOSCALING BENCH RESULTS ===")<br>print(f"Total requests sent: {len(latencies) + errors}")<br>print(f"Successful requests: {len(latencies)}")<br>print(f"Errors: {errors}")<br>print(f"Total wall time: {total_time:.2f}s")<br><br>if latencies:<br>    print(f"Avg latency: {mean(latencies):.2f}s")<br>    print(f"Min latency: {min(latencies):.2f}s")<br>    print(f"Max latency: {max(latencies):.2f}s")<br>    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")</code></pre>
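<p>Averages can hide tail behaviour during a burst. As a small, hypothetical complement to the script above, latency percentiles (p50/p95/p99) can be derived from the collected <code>latencies</code> list with a nearest-rank helper:</p>

```python
# hypothetical add-on for the stress-test script above:
# report tail latencies, not just the average
def percentile(values, pct):
    """Return the pct-th percentile (0-100) using nearest-rank on sorted data."""
    if not values:
        raise ValueError("no latencies recorded")
    ordered = sorted(values)
    # nearest-rank index, clamped to the list bounds
    idx = min(len(ordered) - 1, max(0, round(pct / 100 * len(ordered)) - 1))
    return ordered[idx]

if __name__ == "__main__":
    # sample latencies in seconds (made-up values for illustration)
    sample = [0.8, 1.1, 1.3, 2.0, 2.4, 3.1, 3.5, 4.2, 5.0, 9.7]
    print(f"p50: {percentile(sample, 50):.2f}s")
    print(f"p95: {percentile(sample, 95):.2f}s")
    print(f"p99: {percentile(sample, 99):.2f}s")  # dominated by the slowest request
```

<p>A rising gap between p50 and p99 during the test is a good hint that a single replica is saturating before the scale-up kicks in.</p>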



<p><strong>How can you verify that autoscaling is working and that the load is being handled correctly without latency skyrocketing?</strong></p>



<h4 class="wp-block-heading">2. Hardware and platform-level monitoring</h4>



<p>First, <strong>AI Deploy Grafana</strong> answers <strong>&#8216;What resources are being used and how many replicas exist?&#8217;</strong>.</p>



<p>GPU utilisation, GPU memory, CPU, RAM and replica count are monitored through <strong>OVHcloud AI Deploy Grafana</strong> (monitoring URL), which exposes infrastructure and runtime metrics for the AI Deploy application. This layer provides visibility into <strong>resource saturation and scaling events</strong> managed by the AI Deploy platform itself.</p>



<p>Access it using the following URL (do not forget to replace <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>&lt;APP_ID&gt;</strong></mark></code> with your own): <strong><code>https://monitoring.gra.ai.cloud.ovh.net/d/app/app-monitoring?var-app=</code><mark class="has-inline-color has-ast-global-color-0-color"><code>&lt;APP_ID&gt;</code></mark><code>&amp;orgId=1</code></strong></p>



<p>For example, check GPU/RAM metrics:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png" alt="" class="wp-image-30260" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also monitor scale ups and downs in real time, as well as information on HTTP calls and much more!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png" alt="" class="wp-image-30261" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">3. Software and application-level monitoring</h4>



<p>Next, the combination of MKS + Prometheus + Grafana answers <strong>&#8216;How does the inference engine behave internally?&#8217;</strong>.</p>



<p>In fact, vLLM internal metrics (request concurrency, token throughput, latency indicators, KV cache pressure, etc.) are collected via the <strong>vLLM <code>/metrics</code> endpoint</strong> and scraped by <strong>Prometheus running on OVHcloud MKS</strong>, then visualised in a <strong>dedicated Grafana instance</strong>. This layer focuses on <strong>model behaviour and inference performance</strong>.</p>
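<p>Before wiring dashboard panels, it can help to sanity-check what the <code>/metrics</code> endpoint actually returns. Below is a minimal sketch of a parser for simple (non-histogram) samples in the Prometheus text exposition format, demonstrated on a made-up payload resembling vLLM output; fetching the real endpoint would use the same Bearer token as the Prometheus scrape configuration:</p>

```python
# minimal sketch: extract simple (non-histogram) samples from the
# Prometheus text exposition format returned by vLLM's /metrics endpoint
def parse_metric(payload, name):
    """Return the first value of `name` found in a metrics payload, or None."""
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip HELP/TYPE/comment lines
            continue
        metric, _, value = line.rpartition(" ")
        # match either a bare metric name or one carrying labels
        if metric == name or metric.startswith(name + "{"):
            return float(value)
    return None

# made-up sample payload, for illustration only
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="ministral"} 7.0
vllm:num_requests_waiting{model_name="ministral"} 42.0
"""

print(parse_metric(SAMPLE, "vllm:num_requests_running"))  # 7.0
```

<p>Note this naive <code>rpartition</code> split only works for label values without spaces; for anything serious, a proper client such as <code>prometheus_client</code>&#8217;s text parser is the better choice.</p>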



<p>Find all these metrics at the following URL (just replace <strong><code><mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark></code></strong>): <strong><code>http://<mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark>/d/vllm-ministral-monitoring/ministral-14b-vllm-metrics-monitoring?orgId=1</code></strong></p>



<p>Find key metrics such as TTFT (time to first token):</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png" alt="" class="wp-image-30263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also find some information about <strong>&#8216;Model load and throughput&#8217;</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png" alt="" class="wp-image-30264" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To go further and add even more metrics, you can refer to the vLLM documentation on &#8216;<a href="https://docs.vllm.ai/en/v0.7.2/getting_started/examples/prometheus_grafana.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Prometheus and Grafana</a>&#8216;.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>This reference architecture provides a scalable and production-ready approach for deploying LLM inference on OVHcloud using <strong>AI Deploy</strong> and the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-apps-deployments?id=kb_article_view&amp;sysparm_article=KB0047997#advanced-custom-metrics-for-autoscaling" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">autoscaling on custom metrics feature</a>.</p>



<p>OVHcloud <strong>MKS</strong> is dedicated to running Prometheus and Grafana, enabling secure scraping and visualisation of <strong>vLLM internal metrics</strong> exposed via the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code> </mark></strong>endpoint.</p>



<p>By scraping vLLM metrics securely from AI Deploy into Prometheus and exposing them through Grafana, the architecture provides full visibility into model behaviour, performance and load, enabling informed scaling analysis, troubleshooting and capacity planning in production environments.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: build a sovereign n8n RAG workflow for AI agent using OVHcloud Public Cloud solutions</title>
		<link>https://blog.ovhcloud.com/reference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 27 Jan 2026 13:12:03 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Managed Database]]></category>
		<category><![CDATA[n8n]]></category>
		<category><![CDATA[Object Storage]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[RAG]]></category>
		<category><![CDATA[S3]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29694</guid>

					<description><![CDATA[What if an n8n workflow, deployed in a&#160;sovereign environment, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection. In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&#160;Large Language Models&#160;(LLMs) [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>What if an n8n workflow, deployed in a&nbsp;<strong>sovereign environment</strong>, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection.</em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg" alt="" class="wp-image-30002" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-300x169.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-768x432.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1536x864.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag.jpg 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>n8n workflow overview</em></figcaption></figure>



<p>In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&nbsp;<strong>Large Language Models</strong>&nbsp;(LLMs) is becoming a strategic differentiator.</p>



<p><strong>How?</strong>&nbsp;By building&nbsp;<strong>Agentic RAG systems</strong>&nbsp;capable of retrieving, reasoning, and acting autonomously based on external knowledge.</p>



<p>To make this possible, engineers need a way to connect&nbsp;<strong>retrieval pipelines (RAG)</strong>&nbsp;with&nbsp;<strong>tool-based orchestration</strong>.</p>



<p>This article outlines a&nbsp;<strong>reference architecture</strong>&nbsp;for building a&nbsp;<strong>fully automated RAG pipeline orchestrated by n8n</strong>, leveraging&nbsp;<strong>OVHcloud AI Endpoints</strong>&nbsp;and&nbsp;<strong>PostgreSQL with pgvector</strong>&nbsp;as core components.</p>



<p>The final result will be a system that automatically ingests Markdown documentation from&nbsp;<strong>Object Storage</strong>, creates embeddings with OVHcloud’s&nbsp;<strong>BGE-M3</strong>&nbsp;model available on AI Endpoints, and stores them in a&nbsp;<strong>Managed Database PostgreSQL</strong>&nbsp;with pgvector extension.</p>



<p>Lastly, you’ll be able to build an AI Agent that lets you chat with an LLM (<strong>GPT-OSS-120B</strong>&nbsp;on AI Endpoints). This agent, utilising the RAG implementation carried out upstream, will be an expert on OVHcloud products.</p>



<p>You can further improve the process by using an&nbsp;<strong>LLM guard</strong>&nbsp;to protect the questions sent to the LLM, and set up a chat memory to use conversation history for higher response quality.</p>



<p><strong>But what about n8n?</strong></p>



<p><a href="https://n8n.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>n8n</strong></a>, the open-source workflow automation tool,&nbsp;offers many benefits and connects seamlessly with over&nbsp;<strong>300</strong>&nbsp;APIs, apps, and services:</p>



<ul class="wp-block-list">
<li><strong>Open-source</strong>: n8n is a 100% self-hostable solution, which means you retain full data control;</li>



<li><strong>Flexible</strong>: combines low-code nodes and custom JavaScript/Python logic;</li>



<li><strong>AI-ready</strong>: includes useful integrations for LangChain, OpenAI, and embedding support capabilities;</li>



<li><strong>Composable</strong>: enables simple connections between data, APIs, and models in minutes;</li>



<li><strong>Sovereign by design</strong>: compliant with privacy-sensitive or regulated sectors.</li>
</ul>



<p>This reference architecture serves as a blueprint for building a sovereign, scalable Retrieval Augmented Generation (<strong>RAG</strong>) platform using&nbsp;<strong>n8n</strong>&nbsp;and&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;solutions.</p>



<p>This setup shows how to orchestrate data ingestion, generate embeddings, and enable conversational AI by combining&nbsp;<strong>OVHcloud Object Storage</strong>,&nbsp;<strong>Managed Databases with PostgreSQL</strong>,&nbsp;<strong>AI Endpoints</strong>&nbsp;and&nbsp;<strong>AI Deploy</strong>. <strong>The result?</strong>&nbsp;An AI environment that is fully integrated, protects privacy, and is exclusively hosted on <strong>OVHcloud’s European infrastructure</strong>.</p>



<h2 class="wp-block-heading">Overview of the n8n workflow architecture for RAG </h2>



<p>The workflow involves the following steps:</p>



<ul class="wp-block-list">
<li><strong>Ingestion:</strong>&nbsp;documentation in markdown format is fetched from <strong>OVHcloud Object Storage (S3);</strong></li>



<li><strong>Preprocessing:</strong> n8n cleans and normalises the text, removing YAML front-matter and encoding noise;</li>



<li><strong>Vectorisation:</strong>&nbsp;each document is embedded using the <strong>BGE-M3</strong> model, which is available via <strong>OVHcloud AI Endpoints;</strong></li>



<li><strong>Persistence:</strong> vectors and metadata are stored in <strong>OVHcloud PostgreSQL Managed Database</strong> using pgvector;</li>



<li><strong>Retrieval:</strong> when a user sends a query, n8n triggers a <strong>LangChain Agent</strong> that retrieves relevant chunks from the database;</li>



<li><strong>Reasoning and actions:</strong>&nbsp;the <strong>AI Agent node</strong> combines LLM reasoning, memory, and tool usage to generate a contextual response or trigger downstream actions (Slack reply, Notion update, API call, etc.).</li>
</ul>
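<p>In code terms, the preprocessing step above amounts to stripping the YAML front-matter and normalising whitespace. Here is a minimal sketch in plain JavaScript (the kind of logic you would put in an n8n Code node); the exact delimiters and cleaning rules are assumptions, not the workflow’s literal implementation:</p>

```javascript
// Strip a leading YAML front-matter block (--- ... ---) and
// normalise whitespace before the text is sent for embedding.
function preprocessMarkdown(raw) {
  const withoutFrontMatter = raw.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, '');
  return withoutFrontMatter
    .replace(/\u00a0/g, ' ')    // non-breaking spaces from HTML exports
    .replace(/[ \t]+/g, ' ')    // collapse repeated spaces and tabs
    .replace(/\n{3,}/g, '\n\n') // squeeze excessive blank lines
    .trim();
}

const doc = '---\ntitle: Install the ovhai CLI\n---\n\n# Guide\n\nSome   text.\n';
console.log(preprocessMarkdown(doc)); // -> '# Guide\n\nSome text.'
```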



<p>In this tutorial, all services are deployed within the <strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you start, double-check that you have:</p>



<ul class="wp-block-list">
<li>an <strong>OVHcloud Public Cloud</strong> account</li>



<li>an <strong>OpenStack user</strong> with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">&nbsp;following roles</a>:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>



<li>An <strong>API key</strong> for <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a></li>



<li><strong>ovhai CLI available</strong> – <em>install the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>ovhai CLI</em></a></li>



<li><strong>Hugging Face access</strong> – <em>create a </em><a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Hugging Face account</em></a><em> and generate an </em><a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>access token</em></a></li>
</ul>



<p><strong>🚀 Now that you have everything you need, you can start building your n8n workflow!</strong></p>



<h2 class="wp-block-heading">Architecture guide: n8n agentic RAG workflow</h2>



<p>You’re all set to configure and deploy your n8n workflow.</p>



<p>⚙️<em> Keep in mind that the following steps can be completed using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Build the RAG data ingestion pipeline</h3>



<p>This first step involves building the foundation of the entire RAG workflow by preparing the elements you need:</p>



<ul class="wp-block-list">
<li>n8n deployment</li>



<li>Object Storage bucket creation</li>



<li>PostgreSQL database creation</li>



<li>and more</li>
</ul>



<p>Remember to set up the proper credentials in n8n so the different elements can connect and function.</p>



<h4 class="wp-block-heading">1. Deploy n8n on OVHcloud VPS</h4>



<p>OVHcloud provides <a href="https://www.ovhcloud.com/en-gb/vps/vps-n8n/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>VPS solutions compatible with n8n</strong></a><strong>.</strong> Get a ready-to-use virtual server with <strong>pre-installed n8n </strong>and start building automation workflows without manual setup. With plans ranging from <strong>6 vCores&nbsp;/&nbsp;12 GB RAM</strong> to <strong>24 vCores&nbsp;/&nbsp;96 GB RAM</strong>, you can choose the capacity that suits your workload.</p>



<p><strong>How to set up n8n on a VPS?</strong></p>



<p>Setting up n8n on an OVHcloud VPS generally involves:</p>



<ul class="wp-block-list">
<li>Choosing and provisioning your OVHcloud VPS plan;</li>



<li>Connecting to your server via SSH and carrying out the initial server configuration, which includes updating the OS;</li>



<li>Installing n8n, typically with Docker (recommended for ease of management and updates), or npm by following this <a href="https://help.ovhcloud.com/csm/en-gb-vps-install-n8n?id=kb_article_view&amp;sysparm_article=KB0072179" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>;</li>



<li>Configuring n8n with a domain name, SSL certificate for HTTPS, and any necessary environment variables for databases or settings.</li>
</ul>



<p>While OVHcloud provides a robust VPS platform, you can find detailed n8n installation guides in the <a href="https://docs.n8n.io/hosting/installation/docker/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">official n8n documentation</a>.</p>



<p>Once n8n is up and running, you can set up the database and the Object Storage bucket.</p>



<h4 class="wp-block-heading">2. Create Object Storage bucket</h4>



<p>First, you have to set up your data source. Here you can store all your documentation in an S3-compatible <a href="https://www.ovhcloud.com/en-gb/public-cloud/object-storage/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Object Storage</a> bucket.</p>



<p>In this tutorial, all the documentation files are assumed to be in Markdown format.</p>



<p>From <strong>OVHcloud Control Panel</strong>, create a new Object Storage container with <strong>S3-compatible API </strong>solution; follow this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-storage-s3-getting-started-object-storage?id=kb_article_view&amp;sysparm_article=KB0034674" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>.</p>



<p>When the bucket is ready, add your Markdown documentation to it.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png" alt="" class="wp-image-29733" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Note:</strong>&nbsp;For this tutorial, we’re using the various OVHcloud product documentation guides available as open source in the GitHub repository maintained by OVHcloud members.</p>



<p><em>Click this </em><a href="https://github.com/ovh/docs.git" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>link</em></a><em> to access the repository.</em></p>
</blockquote>



<p>How do you do that? Extract all the <strong>guide.en-gb.md</strong> files from the GitHub repository and rename each one to match its parent folder.</p>



<p>Example: the documentation about ovhai CLI installation, <code><strong>docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md</strong></code>, is stored in the <strong>ovhcloud-products-documentation-md</strong> bucket as <strong>cli_10_howto_install_cli.md</strong>.</p>
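<p>The renaming rule can be expressed as a tiny helper (hypothetical, shown only to make the convention explicit):</p>

```javascript
// Derive the bucket object key from a guide's repository path:
// the file is renamed after its parent folder.
function objectKeyFor(guidePath) {
  const parts = guidePath.split('/').filter(Boolean);
  const parent = parts[parts.length - 2]; // folder containing guide.en-gb.md
  return `${parent}.md`;
}

console.log(objectKeyFor(
  'docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md'
)); // -> cli_10_howto_install_cli.md
```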



<p>You should get an overview that looks like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png" alt="" class="wp-image-29735" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make a note of the following elements and create a new credential in n8n named <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">OVHcloud S3 gra credentials</mark></strong></code>:</p>



<ul class="wp-block-list">
<li>S3 Endpoint: <a href="https://s3.gra.io.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://s3.gra.io.cloud.ovh.net/</mark></code></strong></a></li>



<li>Region: <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">gra</mark></code></strong></li>



<li>Access Key ID: <strong><code>&lt;your_object_storage_user_access_key&gt;</code></strong></li>



<li>Secret Access Key: <strong><code>&lt;your_object_storage_user_secret_key&gt;</code></strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png" alt="" class="wp-image-29736" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, create a new n8n node by selecting&nbsp;<strong>S3</strong>, then&nbsp;<strong>Get Multiple Files</strong>.<br>Configure this node as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png" alt="" class="wp-image-29740" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Connect the node to the previous one before moving on to the next step.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png" alt="" class="wp-image-29741" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>With the first phase done, you can now configure the vector DB.</p>



<h4 class="wp-block-heading">3. Configure PostgreSQL Managed DB (pgvector)</h4>



<p>In this step, you can set up the vector database that lets you store the embeddings generated from your documents.</p>



<p>How? By using an OVHcloud managed&nbsp;<a href="https://www.ovhcloud.com/en-gb/public-cloud/postgresql/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PostgreSQL</a> database with the pgvector extension. Go to your OVHcloud Control Panel and follow the steps below.</p>



<p><strong>1. Navigate to&nbsp;<em>Databases &amp; Analytics &gt; Databases</em></strong></p>



<p><strong>2. Create a new database and select&nbsp;<em>PostgreSQL</em>&nbsp;and a datacenter location</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png" alt="" class="wp-image-29758" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>3. Select&nbsp;<em>Production</em>&nbsp;plan and&nbsp;<em>Instance type</em></strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png" alt="" class="wp-image-29759" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>4. Reset the user password and save it</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png" alt="" class="wp-image-29762" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>5. Whitelist the IP of your n8n instance as follows</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png" alt="" class="wp-image-29761" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>6. Take note of the following parameters</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png" alt="" class="wp-image-29760" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make a note of this information and create a new credential in n8n named&nbsp;<strong>OVHcloud PGvector credentials</strong>:</p>



<ul class="wp-block-list">
<li>Host:<strong>&nbsp;<code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_hostname&gt;</mark></code></strong></li>



<li>Database:&nbsp;<strong>defaultdb</strong></li>



<li>User:&nbsp;<code>avnadmin</code></li>



<li>Password:&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_password&gt;</mark></strong></code></li>



<li>Port:&nbsp;<strong>20184</strong></li>
</ul>



<p>Consider enabling the&nbsp;<strong>Ignore SSL Issues (Insecure)</strong>&nbsp;toggle if needed and setting the&nbsp;<strong>Maximum Number of Connections</strong>&nbsp;value to&nbsp;<strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1000</mark></code></strong>.</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png" alt="" class="wp-image-29763" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
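<p>For reference, the same settings map onto a plain connection object (here in JavaScript, e.g. for node-postgres); the host and password are placeholders from your Control Panel, and the <code>ssl</code> entry mirrors the Ignore SSL Issues toggle:</p>

```javascript
// Sketch of the equivalent connection configuration;
// DB_HOST / DB_PASSWORD stand in for <db_hostname> / <db_password>.
const dbConfig = {
  host: process.env.DB_HOST || '<db_hostname>',
  database: 'defaultdb',
  user: 'avnadmin',
  password: process.env.DB_PASSWORD || '<db_password>',
  port: 20184,
  max: 1000,                          // Maximum Number of Connections
  ssl: { rejectUnauthorized: false }, // "Ignore SSL Issues (Insecure)"
};

console.log(dbConfig.database);
```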



<p>✅ You’re now connected to the database! But what about the PGvector extension?</p>



<p>Add a PostgreSQL node to your n8n workflow (<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code>) and create the extension through an SQL query, which should look like this:</p>



<pre class="wp-block-code"><code class="">-- drop table as needed<br>DROP TABLE IF EXISTS md_embeddings;<br><br>-- activate pgvector<br>CREATE EXTENSION IF NOT EXISTS vector;<br><br>-- create table<br>CREATE TABLE md_embeddings (<br>    id SERIAL PRIMARY KEY,<br>    text TEXT,<br>    embedding vector(1024),<br>    metadata JSONB<br>);</code></pre>



<p>You should get this n8n node:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png" alt="" class="wp-image-29752" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
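<p>When inserting rows yourself later (for example from a Code node), a pgvector value can be passed as a bracketed literal matching the <code>md_embeddings</code> table above. This helper is illustrative, not part of the n8n node:</p>

```javascript
// Format a JavaScript array as a pgvector literal for the
// md_embeddings table, e.g. [0.1,0.2,0.3].
function toVectorLiteral(embedding) {
  return `[${embedding.map(Number).join(',')}]`;
}

// Parameterised INSERT matching the table created above.
const sql =
  'INSERT INTO md_embeddings (text, embedding, metadata) VALUES ($1, $2::vector, $3)';
const params = [
  'some chunk of documentation',
  toVectorLiteral([0.1, 0.2, 0.3]),
  JSON.stringify({ source: 'cli_10_howto_install_cli.md' }),
];
console.log(sql, params);
```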



<p>Finally, you can create a new table named&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">md_embeddings</mark></strong></code>&nbsp;using this node. Add a&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Stop and Error</mark></strong></code>&nbsp;node to halt the workflow if setting up the table fails.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png" alt="" class="wp-image-29753" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>All set! Your vector DB is prepped and ready for data! Keep in mind, you still need an&nbsp;<strong>embeddings model</strong> for the RAG data ingestion pipeline.</p>



<h4 class="wp-block-heading">4. Access to OVHcloud AI Endpoints</h4>



<p><strong>OVHcloud AI Endpoints</strong>&nbsp;is a managed service that provides&nbsp;<strong>ready-to-use APIs for AI models</strong>, including&nbsp;<strong>LLM, CodeLLM, embeddings, Speech-to-Text, and image models</strong>&nbsp;hosted within OVHcloud’s European infrastructure.</p>



<p>To vectorise the various documents in Markdown format, you have to select an embedding model:&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/models/bge-m3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>BGE-M3</strong></a>.</p>



<p>Usually, your AI Endpoints API key should already be created. If not, head to the AI Endpoints menu in your OVHcloud Control Panel to generate a new API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png" alt="" class="wp-image-29775" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Once this is done, you can create new OpenAI credentials in your n8n.</p>



<p>Why do I need OpenAI credentials? Because the <strong>AI Endpoints API&nbsp;</strong>is fully compatible with OpenAI’s, integration is simple and ensures the&nbsp;<strong>sovereignty of your data.</strong></p>



<p>How? Thanks to a single endpoint,&nbsp;<a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a>, you can query all the AI Endpoints models.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png" alt="" class="wp-image-29776" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
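<p>In practice, an embeddings call is a standard OpenAI-style request against that base URL. The sketch below only builds the request; the <code>bge-m3</code> model identifier is an assumption, so check the model’s page on AI Endpoints for the exact value:</p>

```javascript
// Build an OpenAI-compatible embeddings request for AI Endpoints.
const AI_ENDPOINTS_BASE = 'https://oai.endpoints.kepler.ai.cloud.ovh.net/v1';

function buildEmbeddingRequest(texts) {
  return {
    url: `${AI_ENDPOINTS_BASE}/embeddings`,
    body: { model: 'bge-m3', input: texts }, // model name: assumed identifier
  };
}

// Sending it is then a single HTTP POST with your API key, e.g.:
// fetch(url, { method: 'POST',
//   headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
//   body: JSON.stringify(body) });
console.log(buildEmbeddingRequest(['What is AI Deploy?']).url);
```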



<p>This means you can create a new n8n node by selecting&nbsp;<strong>Postgres PGVector Store</strong>&nbsp;and&nbsp;<strong>Add documents to Vector Store</strong>.<br>Set up this node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png" alt="" class="wp-image-29781" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then configure the <strong>Data Loader</strong> with a custom text splitting and a JSON type.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png" alt="" class="wp-image-29780" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For the text splitter, here are some options:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png" alt="" class="wp-image-29786" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
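<p>To make the idea concrete, a character-based splitter with overlap looks roughly like this; the chunk size and overlap values are illustrative, not the defaults of any n8n splitter:</p>

```javascript
// Split text into fixed-size chunks with a small overlap so that
// sentences cut at a boundary still appear in two chunks.
function splitText(text, chunkSize = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(splitText('abcdefghij', 4, 1)); // -> [ 'abcd', 'defg', 'ghij' ]
```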



<p>To finish, select the&nbsp;<strong>BGE-M3</strong> embedding model from the model list and set the&nbsp;<strong>Dimensions</strong> to 1024.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png" alt="" class="wp-image-29784" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You now have everything you need to build the ingestion pipeline.</p>



<h4 class="wp-block-heading">5. Set up the ingestion pipeline loop</h4>



<p>To fully automate document ingestion and vectorisation, you need to chain a few specific nodes, mainly:</p>



<ul class="wp-block-list">
<li>a <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></code></strong> node that iterates over the markdown files one by one so that each can be vectorised;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> that counts the number of files processed, which subsequently determines the number of requests sent to the embedding model;</li>



<li>an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition that checks whether 400 requests have been reached;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node that pauses after every 400 requests to avoid getting rate-limited;</li>



<li>an S3 block <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> to download each markdown;</li>



<li>another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to extract and process text from Markdown files by cleaning and removing special characters before sending it to the embeddings model;</li>



<li>a PostgreSQL node to <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL</mark></strong></code> query to check that the table contains vectors after the process (loop) is complete.</li>
</ul>
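<p>The counter, <code>If</code> and <code>Wait</code> trio boils down to a simple check; the 400-request threshold comes from the list above, while the pause duration is whatever you configure in the Wait node:</p>

```javascript
// After every 400 embedding requests, pause to avoid rate limiting.
const REQUEST_BATCH = 400;

function shouldPause(processedCount) {
  return processedCount > 0 && processedCount % REQUEST_BATCH === 0;
}

console.log(shouldPause(400)); // -> true
console.log(shouldPause(401)); // -> false
```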



<h5 class="wp-block-heading">5.1. Create a loop to process each documentation file</h5>



<p>Begin by creating a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></strong></code> to process all the Markdown files one at a time. Set the <strong>batch size</strong> to <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1</mark></code></strong> in this loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png" alt="" class="wp-image-29788" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Loop</code></mark></strong> statement right after the S3 <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Get Many Files</mark></code></strong> node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png" alt="" class="wp-image-29797" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Time to put the loop’s content into action!</p>



<h5 class="wp-block-heading">5.2. Count the number of files using a code snippet</h5>



<p>Next, choose the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> node from the list to count how many files have been processed. Set the <code><strong>Mode</strong></code> to &#8220;Run Once for Each Item&#8221; and the <strong>Language</strong> to &#8220;JavaScript&#8221;, then add the following code snippet to the designated block.</p>



<pre class="wp-block-code"><code class="">// simple counter per item<br>const counter = $runIndex + 1;<br><br>return {<br>  counter<br>};</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png" alt="" class="wp-image-29792" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make sure this code snippet is included in the loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png" alt="" class="wp-image-29798" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can start adding the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong><code>if</code></strong></mark> part to the loop now.</p>



<h5 class="wp-block-heading">5.3. Add a condition that applies a rule every 400 requests</h5>



<p>Here, you need to create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node and add the following condition, set as an expression.</p>



<pre class="wp-block-code"><code class="">{{ (Number($json["counter"]) % 400) === 0 }}</code></pre>
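<p>Outside n8n, the gate implemented by this expression is just a modulo check — a quick standalone sketch:</p>

```javascript
// True once every `batchLimit` files: the If node routes these items to the Wait node.
function shouldPause(counter, batchLimit = 400) {
  return Number(counter) % batchLimit === 0;
}

console.log(shouldPause(400)); // → true
console.log(shouldPause(399)); // → false
```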



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png" alt="" class="wp-image-29794" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add it immediately after counting the files:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png" alt="" class="wp-image-29800" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If this condition <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">is true</mark></strong></code>, trigger the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node.</p>



<h5 class="wp-block-heading">5.4. Insert a pause after each set of 400 requests</h5>



<p>Then insert a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node to pause before resuming. Set <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Resume</mark></strong></code> to &#8220;After Time Interval&#8221; and the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait Amount</mark></strong></code> to &#8220;60:00&#8221; seconds.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png" alt="" class="wp-image-29796" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Link it to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition when this is <strong>True</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png" alt="" class="wp-image-29801" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, you can go ahead and download the Markdown file, and then process it.</p>



<h5 class="wp-block-heading">5.5. Launch documentation download</h5>



<p>To do this, create a new <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> S3 node and configure it with this File Key expression:</p>



<pre class="wp-block-code"><code class="">{{ $('Process each documentation file').item.json.Key }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png" alt="" class="wp-image-29804" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Connecting it is easy: link it to the output of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node and to the <strong>False</strong> branch of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node, so that a file is processed only when the rate limit has not been exceeded.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png" alt="" class="wp-image-29805" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You’re almost done! Now you need to extract and process the text from the Markdown files – clean and remove any special characters before sending it to the embedding model.</p>



<h5 class="wp-block-heading">5.6. Clean Markdown text content</h5>



<p>Next, create another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to process text from Markdown files:</p>



<pre class="wp-block-code"><code class="">// extract binary content<br>const binary = $input.item.binary.data;<br><br>// decode into clean UTF-8 text<br>let text = Buffer.from(binary.data, 'base64').toString('utf8');<br><br>// cleaning - remove non-printable characters<br>text = text<br>  .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')<br>  .replace(/\s{2,}/g, ' ')<br>  .trim();<br><br>// check length<br>if (text.length &gt; 14000) {<br>  text = text.slice(0, 14000);<br>}<br><br>return [{<br>  text,<br>  fileName: binary.fileName,<br>  mimeType: binary.mimeType<br>}];</code></pre>
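<p>The cleaning step can also be exercised as a standalone function for quick local testing — same regexes and 14,000-character cap, minus the n8n binary handling:</p>

```javascript
// Strip non-printable characters, collapse whitespace runs, cap the length.
function cleanMarkdown(raw, maxLen = 14000) {
  const text = raw
    .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')
    .replace(/\s{2,}/g, ' ')
    .trim();
  return text.length > maxLen ? text.slice(0, maxLen) : text;
}

console.log(cleanMarkdown('# Title\u0000\u0007   with   noise')); // → "# Title with noise"
```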



<p>Select the <em>“Run Once for Each Item”</em> <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Mode</mark></strong></code> and place the previous code in the dedicated JavaScript block.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png" alt="" class="wp-image-29806" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To finish, check that the output text has been sent to the document vectorisation system, which was set up in <strong>Step 3 – Configure PostgreSQL Managed DB (pgvector)</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png" alt="" class="wp-image-29808" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>How do I confirm that the table contains all elements after vectorisation?</p>



<h5 class="wp-block-heading">5.7. Double-check that the documents are in the table</h5>



<p>To confirm that your RAG system is working, make sure your vector database actually contains vectors: use a PostgreSQL node with <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code> in your n8n workflow.</p>



<p>Then, run the following query:</p>



<pre class="wp-block-code"><code class="">-- count the number of elements<br>SELECT COUNT(*) FROM md_embeddings;</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png" alt="" class="wp-image-29818" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Done</mark></strong></code> section of your <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop</mark></strong>, so the elements are counted when the process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png" alt="" class="wp-image-29773" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! You can now run the workflow to begin ingesting documents.</p>



<p>Click the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute workflow</mark></strong></code> button and wait until the vectorisation process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png" alt="" class="wp-image-29823" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember, everything should be green when it’s finished ✅.</p>



<h3 class="wp-block-heading">Step 2 – RAG chatbot</h3>



<p>With the data ingestion and vectorisation steps completed, you can now begin implementing your AI agent.</p>



<p>This involves building a <strong>RAG-based AI Agent</strong>&nbsp;by simply starting a chat with an LLM.</p>



<h4 class="wp-block-heading">1. Set up the chat box to start a conversation</h4>



<p>First, configure your AI Agent based on the RAG system, and add a new node in the same n8n workflow: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Trigger</mark></strong></code>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png" alt="" class="wp-image-29834" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This node will allow you to interact directly with your AI agent! But before that, you need to check that your message is safe.</p>



<h4 class="wp-block-heading">2. Set up your LLM Guard with AI Deploy</h4>



<p>To check whether a message is secure or not, use an LLM Guard.</p>



<p><strong>What’s an LLM Guard?</strong>&nbsp;This is a safety and control layer that sits between users and an LLM, or between the LLM and an external connection. Its main goal is to filter, monitor, and enforce rules on what goes into or comes out of the model 🔐.</p>



<p>You can use <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-deploy" data-wpel-link="internal">AI Deploy</a> from OVHcloud to deploy your desired LLM guard. With a single command line, this AI solution lets you deploy a Hugging Face model using vLLM Docker containers.</p>



<p>For more details, please refer to this <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">blog</a>.</p>



<p>For the use case covered in this article, you can use the open-source model <strong>meta-llama/Llama-Guard-3-8B</strong> available on <a href="https://huggingface.co/meta-llama/Llama-Guard-3-8B" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a>.</p>
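<p>Llama Guard 3 replies with a short verdict: <code>safe</code>, or <code>unsafe</code> followed by the violated category code (e.g. <code>S1</code>). A small helper of the kind a downstream safe/unsafe check relies on could normalise that reply — a sketch, not part of the workflow itself:</p>

```javascript
// Normalise a Llama Guard verdict: "safe" → true, "unsafe\nS1" → false.
function isSafe(guardReply) {
  const verdict = guardReply.trim().toLowerCase();
  return verdict === 'safe' || verdict.startsWith('safe\n');
}

console.log(isSafe('safe'));       // → true
console.log(isSafe('unsafe\nS1')); // → false
```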



<h5 class="wp-block-heading">2.1 Create a Bearer token to request your custom AI Deploy endpoint</h5>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it’s deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>The following output is returned:</p>



<p><code><strong>Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-10-25 8:53:05<br>Updated At: 20-10-25 8:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</strong></code></p>



<p>You can now store and export your access token to add it as a new credential in n8n.</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h5 class="wp-block-heading">2.2 Start the Llama Guard 3 model with AI Deploy</h5>



<p>Using the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai</mark></strong></code> CLI, launch the following command to start the vLLM inference server.</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>  --name vllm-llama-guard3 \<br>  --default-http-port 8000 \<br>  --gpu 1 \<br>  --flavor l40s-1-gpu \<br>  --label ai_deploy_token=my_operator_token \<br>  --env OUTLINES_CACHE_DIR=/tmp/.outlines \<br>  --env HF_TOKEN=$MY_HF_TOKEN \<br>  --env HF_HOME=/hub \<br>  --env HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>  --env HF_HUB_ENABLE_HF_TRANSFER=0 \<br>  --volume standalone:/workspace:RW \<br>  --volume standalone:/hub:RW \<br>  vllm/vllm-openai:v0.10.1.1 \<br>  -- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai app run</mark></strong></code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run an app</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--name vllm-llama-guard3</mark></strong></code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the app, here&nbsp;<code>vllm-llama-guard3</code>.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--default-http-port 8000</mark></strong></code></li>
</ul>



<p>Exposes&nbsp;<strong>port 8000</strong>&nbsp;as the default HTTP endpoint. vLLM server typically runs on port 8000.</p>



<ul class="wp-block-list">
<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--gpu&nbsp;</code>1</mark></strong></li>



<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--flavor l40s-1-gpu</code></mark></strong></li>
</ul>



<p>Allocates&nbsp;<strong>one L40S GPU</strong>&nbsp;to the app. You can adjust the GPU type and count depending on the model you want to deploy.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/workspace:RW</mark></strong></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/hub:RW</mark></strong></code></li>
</ul>



<p>Mounts&nbsp;<strong>two persistent storage volumes</strong>: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/workspace</code></mark></strong>, the main working directory, and <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">/hub</mark></strong></code>, which stores the Hugging Face model files.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env OUTLINES_CACHE_DIR=/tmp/.outlines</mark></strong></code></li>



<li><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_TOKEN=$MY_HF_TOKEN</mark></code></strong></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HOME=/hub</mark></strong></code></li>



<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>--env HF_DATASETS_TRUST_REMOTE_CODE=1</strong></mark></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HUB_ENABLE_HF_TRANSFER=0</mark></strong></code></li>
</ul>



<p>These are Hugging Face&nbsp;<strong>environment variables</strong> you have to set. Export your Hugging Face access token as an environment variable before starting the app: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">export MY_HF_TOKEN=***********</mark></strong></code></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.10.1.1</mark></strong></code></li>
</ul>



<p>Use the&nbsp;<strong><code>vllm/vllm-openai</code></strong>&nbsp;Docker image (a pre-configured vLLM OpenAI API server).</p>



<ul class="wp-block-list">
<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>-- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</strong></mark></code></li>
</ul>



<p>Finally, this runs a<strong>&nbsp;bash shell</strong>&nbsp;inside the container and executes a Python command to launch the vLLM API server.</p>



<h5 class="wp-block-heading">2.3 Check that your AI Deploy app is RUNNING</h5>



<p>Replace <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></strong></code> with your own app ID.</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;app_id&gt;</code></pre>



<p>You should get:</p>



<p><code>History:<br>DATE STATE<br>20-10-25 09:58:00 QUEUED<br>20-10-25 09:58:01 INITIALIZING<br>20-10-25 09:58:07 PENDING<br>20-10-25 10:03:10&nbsp;<strong>RUNNING</strong><br>Info:<br>Message: App is running</code></p>



<h5 class="wp-block-heading">2.4 Create a new n8n credential with the AI Deploy app URL and Bearer access token</h5>



<p>First, using your <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>&lt;app_id></strong></mark></code>, retrieve your AI Deploy app URL.</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;app_id&gt; -o json | jq '.status.url' -r</code></pre>
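<p>Note that vLLM exposes its OpenAI-compatible API under the <code>/v1</code> path, so the base URL for the n8n credential is the app URL with <code>/v1</code> appended — a small sketch (the example hostname is hypothetical; verify the path against your deployment):</p>

```javascript
// Append /v1 to the AI Deploy app URL, avoiding a double slash.
function openaiBaseUrl(appUrl) {
  return appUrl.replace(/\/+$/, '') + '/v1';
}

// Hypothetical app URL, for illustration only.
console.log(openaiBaseUrl('https://my-app.example.net/')); // → https://my-app.example.net/v1
```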



<p>Then, create a new OpenAI credential from your n8n workflow, using your AI Deploy URL and the Bearer token as an API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png" alt="" class="wp-image-29837" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Don&#8217;t forget to replace <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>6e10e6a5-2862-4c82-8c08-26c458ca12c7</code></mark></strong> with your <span style="background-color: initial; font-family: inherit; font-size: inherit; text-align: initial; font-weight: inherit;"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></code></strong></span>.</p>



<h5 class="wp-block-heading">2.4 Create the LLM Guard node in n8n workflow</h5>



<p>Create a new <strong>OpenAI node</strong> to <strong>Message a model</strong> and select the new AI Deploy credential for LLM Guard usage.</p>



<p>Next, create the prompt as follows:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png" alt="" class="wp-image-29840" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, use an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node to determine if the scenario is <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>safe</code></mark></strong> or <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>unsafe</code></mark></strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png" alt="" class="wp-image-29842" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">unsafe</mark></strong></code>, send an error message right away to stop the workflow.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png" alt="" class="wp-image-29843" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>But if the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">safe</mark></strong></code>, you can send the request to the AI Agent without issues 🔐.</p>
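<p>The branching performed by the <code>If</code> node can be sketched as plain code. This assumes the guard model replies with a message starting with <code>safe</code> or <code>unsafe</code> (Llama Guard-style output), which may differ for your guard model:</p>

```python
def route(guard_reply: str) -> str:
    """Mimic the If node: 'unsafe' stops the workflow, 'safe' goes on."""
    verdict = guard_reply.strip().lower()
    if verdict.startswith("unsafe"):
        return "error"  # send the error message and stop the workflow
    return "agent"      # forward the user message to the AI Agent
```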



<h4 class="wp-block-heading">3. Set up AI Agent</h4>



<p>The&nbsp;<strong>AI Agent</strong>&nbsp;node in&nbsp;<strong>n8n</strong>&nbsp;acts as an intelligent orchestration layer that combines&nbsp;<strong>LLMs, memory, and external tools</strong>&nbsp;within an automated workflow.</p>



<p>It allows you to:</p>



<ul class="wp-block-list">
<li>Connect a <strong>Large Language Model</strong> using APIs (e.g., LLMs from AI Endpoints);</li>



<li>Use <strong>tools</strong> such as HTTP requests, databases, or RAG retrievers so the agent can take actions or fetch real information;</li>



<li>Maintain <strong>conversational memory</strong> via PostgreSQL databases;</li>



<li>Integrate directly with chat platforms (e.g., Slack, Teams) for interactive assistants (optional).</li>
</ul>



<p>Simply put, n8n becomes an&nbsp;<strong>agentic automation framework</strong>, enabling LLMs to not only provide answers, but also think, choose, and perform actions.</p>



<p>Please note that you can change and customise this n8n <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node to fit your use cases, using features like function calling or structured output. This is the most basic configuration for the given use case. You can go even further with different agents.</p>



<p>🧑‍💻&nbsp;<strong>How do I implement this RAG?</strong></p>



<p>First, create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node in <strong>n8n</strong> as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png" alt="" class="wp-image-29933" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, a series of steps is required, the first of which is creating the prompts.</p>



<h5 class="wp-block-heading">3.1 Create prompts</h5>



<p>In the AI Agent node on your n8n workflow, edit the user and system prompts.</p>



<p>Begin by creating the&nbsp;<strong>prompt</strong>,&nbsp;which is also the&nbsp;<strong>user message</strong>:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<p>Then create the <strong>System Message</strong> as shown below:</p>



<pre class="wp-block-code"><code class="">You have access to a retriever tool connected to a knowledge base.  <br>Before answering, always search for relevant documents using the retriever tool.  <br>Use the retrieved context to answer accurately.  <br>If no relevant documents are found, say that you have no information about it.</code></pre>



<p>You should get a configuration like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png" alt="" class="wp-image-29935" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>🤔 Well, an LLM is now needed for this to work!</p>



<h5 class="wp-block-heading">3.2 Select LLM using AI Endpoints API</h5>



<p>First, add an <strong>OpenAI Chat Model</strong> node, and then set it as the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Model</mark></strong></code> for your agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png" alt="" class="wp-image-29939" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, select one of the&nbsp;<a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Endpoints</a>&nbsp;from the list provided; they are compatible with the OpenAI API.</p>



<p>✅ <strong>How?</strong> By using the right API base URL: <a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png" alt="" class="wp-image-29936" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The <a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/gpt-oss-120b/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>GPT OSS 120B</strong></a> model has been selected for this use case. Other models, such as Llama, Mistral, and Qwen, are also available.</p>
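<p>Outside of n8n, the same model can be queried through the OpenAI-compatible chat completions route of AI Endpoints. The sketch below uses only the Python standard library; the model name <code>gpt-oss-120b</code> and the <code>OVH_AI_ENDPOINTS_TOKEN</code> environment variable are assumptions to adapt to your own setup:</p>

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completions route of OVHcloud AI Endpoints
ENDPOINT = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"

def build_chat_request(user_msg: str, model: str = "gpt-oss-120b") -> dict:
    """Assemble the OpenAI-style payload (model name is an assumption)."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are an OVHcloud product expert."},
            {"role": "user", "content": user_msg},
        ],
    }

def ask(user_msg: str) -> str:
    """Send the request; the token env var name is an assumption."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_chat_request(user_msg)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OVH_AI_ENDPOINTS_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```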



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><mark style="background-color:#fcb900" class="has-inline-color">⚠️ <strong>WARNING</strong> ⚠️</mark></p>



<p>If you are using a recent version of n8n, you will likely encounter the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/responses</code></mark></strong> issue (linked to OpenAI compatibility). To resolve this, disable the <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Use Responses API</mark></code></strong> toggle; everything will then work correctly.</p>
</blockquote>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="829" height="675" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg" alt="" class="wp-image-30352" style="aspect-ratio:1.2281554640124863;width:409px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg 829w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-300x244.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-768x625.jpg 768w" sizes="auto, (max-width: 829px) 100vw, 829px" /><figcaption class="wp-element-caption"><em>Tips to fix /responses issue</em></figcaption></figure>



<p>Your LLM is now set to answer your questions! Don’t forget, it needs access to the knowledge base.</p>



<h5 class="wp-block-heading">3.3 Connect the knowledge base to the RAG retriever</h5>



<p>As usual, the first step is to create an n8n node called <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">PGVector Vector Store node</mark></strong></code> and enter your PGVector credentials.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png" alt="" class="wp-image-29943" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Tools</code></mark></strong> section of the AI Agent node.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png" alt="" class="wp-image-29944" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember to connect your PGVector database so that the retriever can access the previously generated embeddings. Here’s an overview of what you’ll get.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png" alt="" class="wp-image-29945" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
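<p>Under the hood, the retriever runs a nearest-neighbour query against the pgvector table. A hedged sketch of such a query follows; the table and column names are assumptions, as n8n creates and manages its own schema:</p>

```python
def knn_query(table: str = "n8n_vectors") -> str:
    """Top-k cosine-distance search; execute with (query_embedding, k) params.

    `<=>` is pgvector's cosine-distance operator; the table and column
    names here are assumptions, not necessarily what n8n creates.
    """
    return (
        f"SELECT text, metadata, embedding <=> %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT %s"
    )
```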



<p>⏳Nearly done! The final step is to add the database memory.</p>



<h5 class="wp-block-heading">3.4 Manage conversation history with database memory</h5>



<p>Creating a&nbsp;<strong>Database Memory</strong>&nbsp;node in n8n (PostgreSQL) lets you link it to your AI Agent, so it can store and retrieve past conversation history. This enables the model to remember and use context across multiple interactions.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png" alt="" class="wp-image-29946" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then link this PostgreSQL database to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Memory</mark></strong></code> section of your AI Agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png" alt="" class="wp-image-29947" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
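<p>The memory node stores each exchange in a session-scoped PostgreSQL table. Here is a hedged sketch of the kind of query it relies on (table and column names are assumptions, since n8n manages this schema itself):</p>

```python
def history_query(session_id: str, limit: int = 20):
    """Fetch the most recent messages for one chat session.

    Table and column names are assumptions; n8n's Postgres memory node
    creates and manages its own schema.
    """
    sql = (
        "SELECT message FROM n8n_chat_histories "
        "WHERE session_id = %s ORDER BY id DESC LIMIT %s"
    )
    return sql, (session_id, limit)
```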



<p>Congrats! 🥳 Your&nbsp;<strong>n8n RAG workflow</strong>&nbsp;is now complete. Ready to test it?</p>



<h4 class="wp-block-heading">4. Make the most of your automated workflow</h4>



<p>Want to try it? It’s easy!</p>



<p>By clicking the orange <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Open chat</code></mark></strong> button, you can ask the AI agent questions about OVHcloud products, particularly when you need technical assistance.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4"></video></figure>



<p>For example, you can ask the LLM about rate limits in OVHcloud AI Endpoints and get the information in seconds.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4"></video></figure>



<p>You can now build your own autonomous RAG system using OVHcloud Public Cloud, suited for a wide range of applications.</p>



<h2 class="wp-block-heading">What’s next?</h2>



<p>To sum up, this reference architecture provides a guide on using&nbsp;<strong>n8n</strong> with&nbsp;<strong>OVHcloud AI Endpoints</strong>,&nbsp;<strong>AI Deploy</strong>,&nbsp;<strong>Object Storage</strong>, and&nbsp;<strong>PostgreSQL + pgvector</strong> to build a fully controlled, autonomous&nbsp;<strong>RAG AI system</strong>.</p>



<p>Teams can build scalable AI assistants that run securely and independently in their cloud environment by orchestrating ingestion, embedding generation, vector storage, retrieval, LLM safety checks, and reasoning within a single workflow.</p>



<p>With the core architecture in place, you can add more features to improve the capabilities and robustness of your agentic RAG system:</p>



<ul class="wp-block-list">
<li>Web search</li>



<li>Images with OCR</li>



<li>Audio files transcribed using the Whisper model</li>
</ul>



<p>This delivers an extensive knowledge base and a wider variety of use cases!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4" length="11190376" type="video/mp4" />
<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4" length="9881210" type="video/mp4" />

			</item>
		<item>
		<title>Celebrating 10 Years of Impact: Looking Forward to 2035</title>
		<link>https://blog.ovhcloud.com/celebrating-10-years-of-impact-looking-forward-to-2035/</link>
		
		<dc:creator><![CDATA[Philip Marais]]></dc:creator>
		<pubDate>Mon, 09 Jun 2025 10:40:26 +0000</pubDate>
				<category><![CDATA[OVHcloud Startup Program]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Startup Program]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29047</guid>

					<description><![CDATA[The Startup Program is 10 years old this year! As we mark our 10th anniversary, we are not just reflecting on the past decade – we are looking ahead to the future and the impact we can have by 2035. The key to achieving this vision lies with YOU, our valued members of OVHcloud’s unique [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fcelebrating-10-years-of-impact-looking-forward-to-2035%2F&amp;action_name=Celebrating%2010%20Years%20of%20Impact%3A%20Looking%20Forward%20to%202035&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>The <a href="https://startup.ovhcloud.com" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Startup Program</a> is 10 years old this year! As we mark our 10th anniversary, we are not just reflecting on the past decade – we are looking ahead to the future and the impact we can have by 2035. </p>



<p>The key to achieving this vision lies with YOU, our valued members of OVHcloud’s unique data sovereign ecosystem, including startups, scaleups, incubators, accelerators, venture capital companies, government agencies, technology partners, and other enablers. Together, we are united around a common vision of data freedom, innovation, and mutual growth.</p>



<h4 class="wp-block-heading">Global Report 2025: 10 Years of Impact</h4>



<p>To capture the essence of our unique ecosystem, we have compiled a comprehensive report, <strong>&#8220;Global Report 2025 &#8211; 10 Years of Impact&#8221;</strong>. This report showcases key stories from our ecosystem, including:</p>



<ul class="wp-block-list">
<li>Our support for <a href="https://harfanglab.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Harfanglab</a>, a French scaleup that&#8217;s developed cutting-edge technologies to anticipate and neutralise cyberattacks, raising almost €30m and leveraging OVHcloud and the Startup Program to drive innovation, data sovereignty, and cybersecurity excellence.</li>



<li>The success of <a href="https://internxt.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Internxt</a>, a Southern Europe scaleup alumni, which has become a recognized privacy-first alternative to mainstream cloud providers, offering secure, user-centric, and environmentally sustainable file-sharing and storage solutions that protect user privacy and data sovereignty.</li>



<li>The journey of female founders Jeanne Le Peillet and Cecile Doan, who developed a collaborative design SaaS solution, <a href="https://www.beink.fr/en/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Beink Dream</a>, selected for the France 2030 initiative.</li>



<li>The acquisition of Startup Program alumnus <a href="https://github.com/open-io" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenIO</a> by OVHcloud, which has become our high-performance object storage solution.</li>
</ul>



<p><em>“OVHcloud is a great partner if you are looking for a long-term, reliable, affordable and robust provider. The synergy between Internxt&#8217;s mission to protect user privacy and OVHcloud&#8217;s commitment to data sovereignty has been pivotal.”</em></p>



<p><em>Fran Villalba Segarra, Founder &amp; CEO at Internxt</em></p>



<h4 class="wp-block-heading">The Startup Program: A Decade of Growth</h4>



<p>The report also highlights the Startup Program&#8217;s journey over the last decade, including how we operate, our partnerships with incubators, accelerators, venture capital companies, and other enablers, what sets us apart, and how we have successfully supported over 5,000 members to date.</p>



<h4 class="wp-block-heading">Key Statistics</h4>



<figure class="wp-block-table aligncenter"><table><tbody><tr><td class="has-text-align-center" data-align="center"><strong>5000+</strong> <br>Startups have joined our program</td><td class="has-text-align-center" data-align="center"><strong>100+</strong> <br>Ecosystem enablers (Accelerators etc.)</td><td class="has-text-align-center" data-align="center"><strong>Thousands</strong> <br>of hours of free mentorship and support</td><td class="has-text-align-center" data-align="center"><strong>€ Millions</strong> <br>in free cloud credits given</td></tr></tbody></table></figure>



<h4 class="wp-block-heading">Personalised Support</h4>



<p>What sets our Startup Program apart is our personal touch. As <a href="https://www.linkedin.com/in/philip-marais/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Philip Marais</a>, Global Startup Program Director at OVHcloud, explains: <em>&#8220;You&#8217;re personally onboarded by a manager in your region, have free support from our engineers to solve technical and migration issues, and access to our unique ecosystem to grow your business.&#8221;</em></p>



<h4 class="wp-block-heading">Download the Report</h4>



<p>To learn more about our ecosystem, our plans for the future, and the impact we can have by 2035, download the <strong><a href="https://startup.ovhcloud.com/en/globalreport2025/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">&#8220;Global Report 2025 &#8211; 10 Years of Impact&#8221;</a></strong> now.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><a href="https://startup.ovhcloud.com/en/globalreport2025/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img loading="lazy" decoding="async" width="512" height="512" src="https://blog.ovhcloud.com/wp-content/uploads/2025/06/download.png" alt="" class="wp-image-29049" style="width:118px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/06/download.png 512w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/download-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/download-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/download-70x70.png 70w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="970" height="250" src="https://blog.ovhcloud.com/wp-content/uploads/2025/06/Email-Signature-–-1.jpg" alt="" class="wp-image-29054" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/06/Email-Signature-–-1.jpg 970w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/Email-Signature-–-1-300x77.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/Email-Signature-–-1-768x198.jpg 768w" sizes="auto, (max-width: 970px) 100vw, 970px" /></figure>



<p>Our 5000+ startups&#8217; journey with OVHcloud highlights how the right cloud partnership can help overcome challenges, achieve sustainable growth, and scale globally. If you’re a startup looking to transform your business, we encourage you to join the <strong><a href="https://startup.ovhcloud.com/en/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud Startup Program</a></strong> or contact OVHcloud to discover how our solutions can support your journey!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fcelebrating-10-years-of-impact-looking-forward-to-2035%2F&amp;action_name=Celebrating%2010%20Years%20of%20Impact%3A%20Looking%20Forward%20to%202035&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: set up MLflow Remote Tracking Server on OVHcloud</title>
		<link>https://blog.ovhcloud.com/mlflow-remote-tracking-server-ovhcloud-databases-object-storage-ai-solutions/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 15 Apr 2025 07:52:46 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Notebooks]]></category>
		<category><![CDATA[AI Training]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Managed Database]]></category>
		<category><![CDATA[MLflow]]></category>
		<category><![CDATA[Object Storage]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28564</guid>

					<description><![CDATA[Travel through the Data &#38; AI universe of OVHcloud with the MLflow integration. As Artificial Intelligence (AI) continues to grow in importance, Data Scientists and Machine Learning Engineers need a robust and scalable platform to manage the entire Machine Learning (ML) lifecycle. MLflow, an open-source platform, provides a comprehensive framework for managing ML experiments, models, [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmlflow-remote-tracking-server-ovhcloud-databases-object-storage-ai-solutions%2F&amp;action_name=Reference%20Architecture%3A%20set%20up%20MLflow%20Remote%20Tracking%20Server%20on%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>Travel through the Data &amp; AI universe of OVHcloud with the <em>MLflow</em> integration.</em></p>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/mlflow_ref_archi.svg" alt="" class="wp-image-28689"/><figcaption class="wp-element-caption"><em>Mlflow Remote Tracking Server on OVHcloud</em></figcaption></figure>



<p>As <strong>Artificial Intelligence</strong> (AI) continues to grow in importance, <em>Data Scientists</em> and <em>Machine Learning Engineers</em> need a robust and scalable platform to manage the entire Machine Learning (ML) lifecycle. <br><a href="https://mlflow.org/docs/latest/introduction/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">MLflow</a>, an open-source platform, provides a comprehensive framework for managing ML experiments, models, and deployments. </p>



<p><strong>MLflow</strong> provides a complete framework for ML lifecycle management, with features such as:</p>



<ul class="wp-block-list">
<li>Experiment tracking and model management</li>



<li>Reproducibility and collaboration</li>



<li>Scalability, flexibility, and integration</li>



<li>Automated ML and model serving capabilities</li>



<li>Improved model accuracy, faster time-to-market, and reduced costs.</li>
</ul>



<p>In this reference architecture, you will explore how to leverage remote experiment tracking with the <strong>MLflow Tracking Server</strong> on the <a href="https://www.ovhcloud.com/fr/public-cloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Public Cloud</a> infrastructure.<br>This way, you will be able to build a scalable and efficient ML platform, streamlining your ML workflow and accelerating model development using <strong>OVHcloud AI Notebooks</strong>, <strong>AI Training</strong>, <strong>Managed Databases (PostgreSQL)</strong>, and <strong>Object Storage</strong>.</p>



<p><strong>The result?</strong> A fully remote, <strong>production-ready ML experiment tracking pipeline</strong>, powered by OVHcloud&#8217;s Data &amp; Machine Learning Services (e.g. AI Notebooks and AI Training).</p>



<h2 class="wp-block-heading">Overview of the MLflow server architecture</h2>



<p>Here is how MLflow will be configured:</p>



<ul class="wp-block-list">
<li><strong>Development and training environment:</strong> create and train models with <strong>AI Notebooks</strong></li>



<li><strong>Remote Tracking Server</strong>: host in an <strong>AI Training</strong> job (Container as a Service)</li>



<li><strong>Backend Store</strong>: benefit from a managed <strong>PostgreSQL</strong> database (DBaaS).</li>



<li><strong>Artifact Store</strong>: use OVHcloud <strong>Object Storage</strong> (S3-compatible).</li>
</ul>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/mlflow_overview.svg" alt="" class="wp-image-28688"/><figcaption class="wp-element-caption"><em>MLflow remote server deployment steps</em></figcaption></figure>



<p>In the following tutorial, all services are deployed within the <strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An <strong>OVHcloud Public Cloud</strong> account</li>



<li>An <strong>OpenStack user</strong> with the following roles:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Training Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it’s time to set up your MLflow remote tracking server!</strong></p>



<h2 class="wp-block-heading">Architecture guide: MLflow remote tracking server</h2>



<p>Let’s move on to setting up and deploying your custom MLflow tracking tool!</p>



<p>⚙️<em> Also consider that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h4 class="wp-block-heading">Step 1 – Install <code>ovhai</code> CLI</h4>



<p>Firstly, start by setting up your CLI environment.</p>



<pre class="wp-block-code"><code class="">curl https://cli.gra.ai.cloud.ovh.net/install.sh | bash</code></pre>



<p>Secondly, login using your <strong>OpenStack credentials</strong>.</p>



<pre class="wp-block-code"><code class="">ovhai login -u &lt;openstack-username&gt; -p &lt;openstack-password&gt;</code></pre>



<p>Now, it&#8217;s time to create your bucket inside OVHcloud Object Storage!</p>



<h4 class="wp-block-heading">Step 2 – Provision Object Storage (Artifact Store)</h4>



<ol class="wp-block-list">
<li>Go to <strong>Public Cloud &gt; Storage &gt; Object Storage</strong> in the OVHcloud Control Panel.</li>



<li>Create a <strong>datastore</strong> and a new <strong>S3 bucket</strong> (e.g., <code>mlflow-s3-bucket</code>).</li>



<li>Register the datastore with the <code>ovhai</code> CLI:</li>
</ol>



<pre class="wp-block-code"><code class="">ovhai datastore add s3 &lt;ALIAS&gt; https://s3.gra.io.cloud.ovh.net/ gra &lt;my-access-key&gt; &lt;my-secret-key&gt; --store-credentials-locally</code></pre>



<h4 class="wp-block-heading">Step 3 – Create PostgreSQL Managed DB (Backend Store)</h4>



<p>1. Navigate to <strong>Databases &amp; Analytics &gt; Databases</strong></p>



<p><strong>2. Create a new <em>PostgreSQL</em> instance with <em>Essential plan</em></strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="627" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-1024x627.png" alt="" class="wp-image-28580" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-1024x627.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-300x184.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-768x470.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-1536x941.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-13-2048x1254.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>3. Select <em>Location</em> and <em>Node type</em></strong></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="661" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-1024x661.png" alt="" class="wp-image-28581" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-1024x661.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-300x194.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-768x495.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-1536x991.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-14-2048x1321.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>4. Reset the user password</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="2384" height="1340" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited.png" alt="" class="wp-image-28590" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited.png 2384w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-1536x863.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-15-edited-2048x1151.png 2048w" sizes="auto, (max-width: 2384px) 100vw, 2384px" /></figure>



<p><strong>5. Take note of the following parameters</strong></p>



<p>Go to your database dashboard:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="640" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-1024x640.png" alt="" class="wp-image-28583" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-1024x640.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-300x188.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-768x480.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-1536x960.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-16-2048x1280.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, copy the <strong>connection information</strong>:</p>



<pre class="wp-block-code"><code class="">&lt;db_hostname&gt;
&lt;db_username&gt;
&lt;db_password&gt;
&lt;db_name&gt;
&lt;db_port&gt;
&lt;ssl_mode&gt;</code></pre>



<p>Your <strong>Backend Store</strong> is now ready to use!</p>



<h4 class="wp-block-heading">Step 4 – Build your custom MLflow Docker image</h4>



<p><strong>1. Develop the MLflow launch script</strong></p>



<p>Firstly, write a bash script that launches the server: <strong><em>mlflow_server.sh</em></strong></p>



<pre class="wp-block-code"><code class="">#!/bin/bash

echo "The MLflow server is starting..."

mlflow server \
  --backend-store-uri postgresql://${POSTGRE_USER}:${POSTGRE_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DB}?sslmode=${SSL_MODE} \
  --default-artifact-root ${S3_BUCKET_NAME}/ \
  --host 0.0.0.0 \
  --port 5000</code></pre>
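

<p>Before baking this script into an image, you can sanity-check the URI interpolation. The snippet below is only an illustration (the hostname and password are placeholders of ours, not real values); it rebuilds the backend-store URI from the same variables as <em>mlflow_server.sh</em>:</p>

```python
# Placeholder values standing in for your real connection parameters
env = {
    "POSTGRE_USER": "avnadmin",
    "POSTGRE_PASSWORD": "secret",
    "PG_HOST": "postgresql-abc.database.cloud.ovh.net",
    "PG_PORT": "20184",
    "PG_DB": "defaultdb",
    "SSL_MODE": "require",
}

# Same interpolation as the bash script above
backend_store_uri = (
    "postgresql://{POSTGRE_USER}:{POSTGRE_PASSWORD}@{PG_HOST}:{PG_PORT}/{PG_DB}"
    "?sslmode={SSL_MODE}"
).format(**env)
print(backend_store_uri)
```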



<p><strong>2. Create Dockerfile</strong></p>



<p>Install the required Python dependency and give the rights on the<strong> /mlruns</strong> path to the OVHcloud user.</p>



<pre class="wp-block-code"><code class="">FROM ghcr.io/mlflow/mlflow:latest

# Install Python dependencies
RUN pip install psycopg2-binary

COPY mlflow_server.sh .

# Change the ownership of `mlruns` directory to the OVHcloud user (42420:42420)
RUN mkdir -p /mlruns
RUN chown -R 42420:42420 /mlruns

# Start MLflow server inside container
CMD ["bash", "mlflow_server.sh"]</code></pre>



<p><strong>3. Build your custom MLflow Docker image</strong></p>



<p>Build the Docker image from the previous Dockerfile.</p>



<pre class="wp-block-code"><code class="">docker build . -t mlflow-server-ai-training:latest</code></pre>



<p><strong>4. Tag and push the Docker image to your registry</strong></p>



<p>Finally, you can push the Docker image to your registry.</p>



<pre class="wp-block-code"><code class="">docker tag mlflow-server-ai-training:latest &lt;your-registry-address&gt;/mlflow-server-ai-training:latest</code></pre>



<pre class="wp-block-code"><code class="">docker push &lt;your-registry-address&gt;/mlflow-server-ai-training:latest</code></pre>



<p>Congrats! You can now use this Docker image to launch the MLflow server.</p>



<h4 class="wp-block-heading">Step 5 &#8211; Start the MLflow Tracking Server inside a container</h4>



<p>You can use AI Training to start the MLflow server inside a job.</p>



<p><strong>1. Using <code>ovhai</code> CLI, run the following command inside terminal</strong></p>



<pre class="wp-block-code"><code class="">ovhai job run --name mlflow-server \
              --default-http-port 5000 \
              --cpu 4 \
              -v mlflow-s3-bucket@DEMO/:/artifacts:RW:cache \
              -e POSTGRE_USER=avnadmin \
              -e POSTGRE_PASSWORD=&lt;db_password&gt; \
              -e S3_ENDPOINT=https://s3.gra.io.cloud.ovh.net/ \
              -e S3_BUCKET_NAME=mlflow-s3-bucket \
              -e PG_HOST=&lt;db_hostname&gt; \
              -e PG_DB=defaultdb \
              -e PG_PORT=20184 \
              -e SSL_MODE=require \
              &lt;your_registry_address&gt;/mlflow-server-ai-training:latest</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai job run</code></li>
</ul>



<p>This is the core command to <strong>run a job</strong> using the <strong>OVHcloud AI Training</strong> platform.</p>



<ul class="wp-block-list">
<li><code>--name mlflow-server</code></li>
</ul>



<p>Sets a <strong>custom name</strong> for the job. For example, <code>mlflow-server</code>.</p>



<ul class="wp-block-list">
<li><code>--default-http-port 5000</code></li>
</ul>



<p>Exposes <strong>port 5000</strong> as the default HTTP endpoint. MLflow’s web UI typically runs on port 5000, so this ensures the UI is accessible once the job is running.</p>



<ul class="wp-block-list">
<li><code>--cpu 4</code></li>
</ul>



<p>Allocates <strong>4 CPUs</strong> for the job. You can adjust this based on how heavy your MLflow workload is.</p>



<ul class="wp-block-list">
<li><code>-v mlflow-s3-bucket@DEMO/:/artifacts:RW:cache</code></li>
</ul>



<p>This mounts your <strong>OVHcloud Object Storage volume</strong> into the job’s file system:<br>&#8211; <code>mlflow-s3-bucket@DEMO/</code>: refers to your <strong>S3 bucket volume</strong> from the OVHcloud Object Storage<br>&#8211; <code>:/artifacts</code>: mounts the volume into the container under <code>/artifacts</code><br>&#8211; <code>RW</code>: enables <strong>Read/Write</strong> permissions<br>&#8211; <code>cache</code>: enables <strong>volume caching</strong>, improving performance for frequent reads/writes</p>
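

<p>This volume string packs four pieces of information into a single argument. If you script your job launches, a small helper (our own illustration, not part of the <code>ovhai</code> CLI) can make the format explicit:</p>

```python
def parse_volume_spec(spec):
    """Split an ovhai volume spec of the form
    bucket@ALIAS/:/mount/path:PERM:cache into its parts."""
    source, mount_path, permissions, caching = spec.split(":")
    bucket, alias = source.split("@")
    return {
        "bucket": bucket,
        "alias": alias.rstrip("/"),
        "mount_path": mount_path,
        "permissions": permissions,
        "caching": caching,
    }

vol = parse_volume_spec("mlflow-s3-bucket@DEMO/:/artifacts:RW:cache")
print(vol)
```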



<ul class="wp-block-list">
<li><code>-e POSTGRE_USER=avnadmin</code></li>



<li><code>-e POSTGRE_PASSWORD=&lt;db_password&gt;</code></li>



<li><code>-e PG_HOST=&lt;db_hostname&gt;</code></li>



<li><code>-e PG_DB=defaultdb</code></li>



<li><code>-e PG_PORT=20184</code></li>



<li><code>-e SSL_MODE=require</code></li>
</ul>



<p>These are <strong>environment variables</strong> for connecting to the <strong>PostgreSQL </strong>backend store:<br>&#8211; <code>avnadmin</code>: the default admin user for OVHcloud’s managed PostgreSQL<br>&#8211; <code>POSTGRE_PASSWORD</code>: must be replaced with your actual database password<br>&#8211; <code>PG_HOST</code>: the hostname of your managed PostgreSQL instance<br>&#8211; <code>PG_DB</code>: the name of the database to use (default: <code>defaultdb</code>)<br>&#8211; <code>PG_PORT</code>: the port your PostgreSQL server is listening on<br>&#8211; <code>SSL_MODE</code>: enforce SSL connection to secure DB traffic</p>
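

<p>A typo in one of these variables only surfaces once MLflow tries to reach the database, so a small pre-flight check can save a debugging round trip. This is a sketch of our own, not part of MLflow or the <code>ovhai</code> CLI:</p>

```python
# Variables the launch script above expects to find in the job environment
REQUIRED_VARS = [
    "POSTGRE_USER", "POSTGRE_PASSWORD",
    "PG_HOST", "PG_DB", "PG_PORT", "SSL_MODE",
]

def missing_vars(environ):
    # Report variables that are absent or empty
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example: a job launched without PG_HOST
example_env = {"POSTGRE_USER": "avnadmin", "POSTGRE_PASSWORD": "secret",
               "PG_DB": "defaultdb", "PG_PORT": "20184", "SSL_MODE": "require"}
print(missing_vars(example_env))
```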



<ul class="wp-block-list">
<li><code>-e S3_ENDPOINT=https://s3.gra.io.cloud.ovh.net/</code></li>
</ul>



<p>Tells MLflow where the <strong>S3-compatible endpoint</strong> is hosted. This is specific to OVHcloud&#8217;s GRA (Gravelines) region Object Storage.</p>



<ul class="wp-block-list">
<li><code>-e S3_BUCKET_NAME=mlflow-s3-bucket</code></li>
</ul>



<p>Sets the <strong>name of the S3 bucket</strong> where MLflow should store artifacts (models, metrics, etc.).</p>



<ul class="wp-block-list">
<li><code>&lt;your_registry_address&gt;/mlflow-server-ai-training:latest</code></li>
</ul>



<p>This is the <strong>custom MLflow Docker image</strong> you are running inside the job.</p>



<p><strong>2. Check if your AI Training job is RUNNING</strong></p>



<p>Replace the <code>&lt;job_id&gt;</code> by yours.</p>



<pre class="wp-block-code"><code class="">ovhai job get &lt;job_id&gt;</code></pre>



<p>You should obtain:</p>



<p><code>History:<br>    DATE                  STATE<br>    04-04-25 09:58:00     QUEUED<br>    04-04-25 09:58:01     INITIALIZING<br>    04-04-25 09:58:07     PENDING<br>    04-04-25 09:58:10     <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></strong><br>  Info:<br>    Message:   Job is running</code></p>



<p><strong>3. Recover the IP and external IP of your AI Training job</strong></p>



<p>Using, your <code>&lt;job_id&gt;</code>, you can retrieve your AI Training <strong>job IP</strong>.</p>



<pre class="wp-block-code"><code class="">ovhai job get &lt;job_id&gt; -o json | jq '.status.ip' -r</code></pre>



<p>For example, you can obtain something like that: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong></p>



<p>You also need the External IP:</p>



<pre class="wp-block-code"><code class="">ovhai job get &lt;job_id&gt; -o json | jq '.status.externalIp' -r</code></pre>



<p>This returns the IP address you will have to whitelist so that the job can connect to your database (e.g. <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>51.210.38.188</strong></mark>).</p>
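

<p>If you prefer not to depend on <code>jq</code>, the same two fields can be read with Python&#8217;s standard library. The JSON below is trimmed down to the two <code>.status</code> paths queried above (the real output contains many more fields):</p>

```python
import json

# Reduced example of the JSON returned by: ovhai job get JOB_ID -o json
raw = '{"status": {"ip": "10.42.80.176", "externalIp": "51.210.38.188"}}'

status = json.loads(raw)["status"]
job_ip = status["ip"]               # used as the MLflow tracking URI host
external_ip = status["externalIp"]  # address to whitelist on the database
print(job_ip, external_ip)
```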



<h4 class="wp-block-heading">Step 6 – Whitelist AI Training job IP in PostgreSQL DB</h4>



<p>From <strong>Databases &amp; Analytics &gt; Databases</strong>, edit your DB configuration to <strong>allow access from the job External IP</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="475" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-1024x475.png" alt="" class="wp-image-28593" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-1024x475.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-768x356.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-1536x712.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-19-2048x950.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, you can see that the job External IP is now whitelisted.</p>
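

<p>A single job IP corresponds to a <code>/32</code> CIDR mask. If the authorised-IPs form asks for CIDR notation (the exact format the panel accepts may vary, so treat this as a convenience check), Python&#8217;s standard library can validate the entry before you paste it:</p>

```python
import ipaddress

# Express a single host address as a /32 network
external_ip = "51.210.38.188"
allowed = ipaddress.ip_network(external_ip + "/32")
print(allowed, allowed.num_addresses)
```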



<p>Well done! Your MLflow server and the backend store are now connected.</p>



<h4 class="wp-block-heading">Step 7 – Create an AI Notebook</h4>



<p>It&#8217;s time to train and track your Machine Learning models using MLflow!</p>



<p>To do so, use the OVHcloud <code>ovhai</code> CLI and start a new AI Notebook with GPU.</p>



<pre class="wp-block-code"><code class="">ovhai notebook run conda jupyterlab \
  --name mlflow-notebook \
  --framework-version conda-py311-cudaDevel11.8 \
  --gpu 1</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai notebook run</code></li>
</ul>



<p>This is the core command to <strong>run a notebook</strong> using the <strong>OVHcloud AI Notebooks</strong> platform.</p>



<ul class="wp-block-list">
<li><code>--name mlflow-notebook</code></li>
</ul>



<p>Sets a <strong>custom name</strong> for the notebook. In this case, you can name it <code>mlflow-notebook</code>.</p>



<ul class="wp-block-list">
<li><code>--framework-version conda-py311-cudaDevel11.8</code></li>
</ul>



<p>Defines the framework and version you want to use in your notebook. Here, you are using Python 3.11 with the Conda framework and CUDA compatibility.</p>



<ul class="wp-block-list">
<li><code>--gpu 1</code></li>
</ul>



<p>Allocates <strong>1 GPU</strong> for the notebook, by default an NVIDIA <strong>Tesla V100S</strong> (<code>ai1-1-gpu</code>). You can select the flavor you want from the OVHcloud GPU range.</p>



<p>Then, check if your AI Notebook is RUNNING.</p>



<pre class="wp-block-code"><code class="">ovhai notebook get &lt;notebook_id&gt;</code></pre>



<p>Once your notebook is in RUNNING status, you should be able to access it using its URL:</p>



<p><code>State:          <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></strong><br>Duration:       1411412   <br>Url:            <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://&lt;notebook_id&gt;.notebook.gra.ai.cloud.ovh.net</mark></strong><br>Grpc Address:   &lt;notebook_id&gt;.nb-grpc.gra.ai.cloud.ovh.net:443<br>Info Url:       https://ui.gra.ai.cloud.ovh.net/notebook/&lt;notebook_id&gt;</code></p>



<p>You can now start developing your AI model inside the notebook.</p>



<h4 class="wp-block-heading">Step 8 – Model training inside Jupyter notebook</h4>



<p>To begin with, set up your notebook environment.</p>



<p><strong>1. Create the <code>requirements.txt</code> file</strong></p>



<pre class="wp-block-code"><code class="">numpy==2.2.3
scipy==1.15.2
mlflow==2.20.3
scikit-learn==1.6.1</code></pre>



<p><strong>2. Install dependencies</strong></p>



<p>From a notebook cell, launch the following command.</p>



<pre class="wp-block-code"><code class="">!pip3 install -r requirements.txt</code></pre>



<p>Perfect! You can start coding&#8230;</p>



<p><strong>3. Import Python libraries</strong></p>



<p>Here, you have to import os, mlflow and scikit-learn.</p>



<pre class="wp-block-code"><code class=""># import dependencies
import os
import mlflow
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor</code></pre>



<p>In another notebook cell, set the MLflow tracking URI. Note that you have to replace <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong> by your own <strong>job IP</strong>.</p>



<pre class="wp-block-code"><code class="">mlflow.set_tracking_uri("http://<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong>:5000")</code></pre>



<p>Then start training your model!</p>



<pre class="wp-block-code"><code class="">mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)</code></pre>



<p><strong>Output:</strong></p>



<p><code>🏃 View run dashing-foal-850 at: http://<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong>:5000/#/experiments/0/runs/e7dad7c073634ec28675c0defce2b9ec </code><br><code>🧪 View experiment at: http://<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">10.42.80.176</mark></strong>:5000/#/experiments/0</code></p>



<p>Congrats! You can now track your model training from<strong> MLflow remote server</strong>&#8230;</p>



<h4 class="wp-block-heading">Step 9 – Track and compare models from MLflow remote server</h4>



<p>Finally, access the MLflow dashboard using the job URL: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://&lt;job_id&gt;.job.gra.ai.cloud.ovh.net</code></mark></strong></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="578" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-1024x578.png" alt="" class="wp-image-28598" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-1024x578.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-768x433.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-1536x867.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-23-2048x1155.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, you can check your model trainings and evaluations:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="577" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-1024x577.png" alt="" class="wp-image-28599" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-1024x577.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-768x433.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-1536x866.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/image-24-2048x1154.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>What a success! You can finally use your MLflow server to evaluate, compare, and archive your training runs.</p>



<h4 class="wp-block-heading">Step 10 &#8211; Monitor everything remotely</h4>



<p>You now have a complete Machine Learning pipeline with remote experiment tracking. Access:</p>



<ul class="wp-block-list">
<li><strong>Metrics, Parameters, and Tags</strong> → PostgreSQL</li>



<li><strong>Artifacts (Models, Files)</strong> → S3 bucket</li>
</ul>



<p>This setup is reusable, automatable, and production-ready!</p>



<h2 class="wp-block-heading">What’s next?</h2>



<ul class="wp-block-list">
<li>Automate deployment with <strong><a href="https://eu.api.ovh.com/" data-wpel-link="exclude">OVHcloud APIs</a></strong></li>



<li>Run different training sessions in parallel and compare them with your <strong>remote MLflow tracking server</strong></li>



<li>Use <strong><a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a></strong> to serve your trained models</li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmlflow-remote-tracking-server-ovhcloud-databases-object-storage-ai-solutions%2F&amp;action_name=Reference%20Architecture%3A%20set%20up%20MLflow%20Remote%20Tracking%20Server%20on%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Bare Metal Pod: Genesis</title>
		<link>https://blog.ovhcloud.com/bare-metal-pod-genesis/</link>
		
		<dc:creator><![CDATA[David Mondon]]></dc:creator>
		<pubDate>Tue, 01 Apr 2025 07:10:26 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[bare metal]]></category>
		<category><![CDATA[engineering]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28439</guid>

					<description><![CDATA[Today, we&#8217;re going to embark on a journey of discovery, and unveil our latest product: Bare Metal Pod. You know us for the services we provide: bare metal servers, managed and unmanaged virtualisation platform, our 40+ public cloud services, domain names and telco. This is just the tip of the iceberg, and to understand why [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fbare-metal-pod-genesis%2F&amp;action_name=Bare%20Metal%20Pod%3A%20Genesis&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="683" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/Copy-of-Blog-post-1200x8001-1-1024x683.png" alt="" class="wp-image-28486" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/Copy-of-Blog-post-1200x8001-1-1024x683.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Copy-of-Blog-post-1200x8001-1-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Copy-of-Blog-post-1200x8001-1-768x512.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Copy-of-Blog-post-1200x8001-1.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Today, we&#8217;re going to embark on a journey of discovery, and unveil our latest product: <a href="https://www.ovhcloud.com/en-ie/bare-metal/secnumcloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Bare Metal Pod</a>.</p>



<p>You know us for the services we provide: bare metal servers, managed and unmanaged virtualisation platform, our 40+ public cloud services, domain names and telco.</p>



<p>This is just the tip of the iceberg, and to understand why we built and now offer Bare Metal Pod, we have to dig deeper.</p>



<p>So let’s begin this journey exploring the origins of Bare Metal Pod, and in later articles we’ll cover the more technical details—there’s a lot to touch on.</p>



<h3 class="wp-block-heading"><strong>The OVHcloud way: more than just servers</strong></h3>



<p>As a cloud services provider, we supply the different platforms mentioned above. But most importantly, we have to take care of the infrastructure dedicated to these services, from the buildings, power and cooling to the software stack and automation required.</p>



<p>And we’ve been doing just this since 2001. It all started with the opening of our first datacentre in Paris, then building our own servers the next year, and our proprietary water-cooling solution the year after that.</p>



<p>At the core, we are all about <strong>efficiency, automation, and sustainability</strong>:</p>



<ul class="wp-block-list">
<li><strong>Repurposing buildings</strong> as datacentres</li>



<li><strong>Designing our own servers</strong> to optimise performance and cost</li>



<li><strong>Maximising cooling efficiency</strong> to cut waste</li>



<li><strong>Automating everything</strong> to reduce errors and delays</li>
</ul>



<p>And, in all modesty&#8230; we&#8217;re pretty good at these.</p>



<h3 class="wp-block-heading"><strong>Optimising datacentres like a pro</strong></h3>



<p>Basically, building our own servers in our Croix (FR) and Beauharnois (CA) plants means packing <strong>a ton of servers into a square metre. </strong>We’re talking about 4 custom racks, each hosting 48 servers, all in just 3 sq.m and using up to 160kW of 12V DC power. This gives us a server density of about 5000W per sq/ft, which beats out 90% of the industry.</p>



<p>And on top of that, we’ve got our proprietary water-cooling system—we save energy by not using AC for our servers. To further optimise air cooling, each of our racks is equipped with a large condenser (we call it a <strong>chilled door</strong>) at the rear of the rack, dissipating regular server heat into our water system. This keeps the datacentre comfortably warm for our staff and the network equipment, and extends hardware lifespan (less maintenance, fewer replacements, fewer outages&#8230; so <strong>more savings</strong>).</p>



<p>In addition to the physical optimisations we’ve just mentioned is our <strong>automation system</strong>. When a server or a cluster of servers have been assembled and tested in our plant, it’s sent to the datacentre, racked and connected to power, network, and water-cooling systems by our DC staff.</p>



<p>And from there, everything is automated. From server power management, discovery, testing, and readiness checks, to the moment it’s selected by a customer using their Control Panel, and then configured. No human interaction is required, meaning no delay and no error.</p>



<p>And these operations have been optimised and refined for over 20 years.</p>



<h3 class="wp-block-heading"><strong>Enter Project Gold-o-rack</strong></h3>



<p>So in June 2023, a small team was assembled to review, analyse and build a new version of this system. We had 3 goals:</p>



<ul class="wp-block-list">
<li>Provide customers with dedicated <strong>on-premises autonomous racks</strong></li>



<li>Offer custom-built, plug-and-play <strong>Bare Metal Pods</strong></li>



<li>Upgrade the automation and security of our <strong>own datacentres</strong></li>
</ul>



<p>And that’s how <strong>Project Gold-o-rack</strong> came to be—a tribute to <strong>Goldorak (Grendizer)</strong>, the legendary <strong>70s anime mecha</strong> that crushed its enemies with style. Like its namesake, our system is <strong>powerful, autonomous, and unstoppable</strong>.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/Final.png" alt="" class="wp-image-28440" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/Final.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Final-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Final-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Final-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Final-70x70.png 70w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Using open-source technology was a must, as we absolutely can’t do without transparency and community support. So we went for <strong>OpenStack</strong>, <strong>Netbox</strong> and <strong>Grafana</strong>, developed our own network management and automation system, and much more.</p>



<p>By <strong>September 2023</strong>—just <strong>three months later</strong>—we had a fully functional <strong>24U rack</strong>, deployable and operational in <strong>25 minutes</strong>. That’s not just fast—that’s <strong>insanely fast</strong>.</p>



<p>Security was a top priority since these racks would be installed in <strong>third-party datacentres</strong>. We quickly applied for <strong>SecNumCloud qualification</strong>, leveraging our existing compliance expertise.</p>



<p>Then, it hit us: <strong>why not offer this as a full-fledged product?</strong> And that’s how <strong>Bare Metal Pod</strong> came to be—dedicated, secure, and fully automated.</p>



<p>We structured the product into <strong>three key components</strong>:</p>



<ol class="wp-block-list" start="1">
<li><strong>On-Prem Cloud Platform (OPCP):</strong> The autonomous rack, with its own <strong>KMS and encryption mechanisms</strong></li>



<li><strong>Bare Metal Pod:</strong> Built on <strong>OPCP</strong>, hosted in <strong>our datacentres</strong>, and <strong>SecNumCloud-compliant</strong></li>



<li><strong>Cloud Store:</strong> A software catalogue enabling automated deployment within the rack</li>
</ol>



<p>In June 2024, OPCP was ready, just 12 months after the first meeting… and shortly afterwards we got the green light from ANSSI, allowing us to pursue the SecNumCloud qualification process.</p>



<p>And if you were at, or watched our Summit Keynote in November 2024, you definitely saw it live…</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="576" height="577" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/Capture-decran-2025-03-28-094957.png" alt="BM POD Summit 2024" class="wp-image-28470" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/Capture-decran-2025-03-28-094957.png 576w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Capture-decran-2025-03-28-094957-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Capture-decran-2025-03-28-094957-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/Capture-decran-2025-03-28-094957-70x70.png 70w" sizes="auto, (max-width: 576px) 100vw, 576px" /></figure>






<h3 class="wp-block-heading"><strong>What’s under the hood?</strong></h3>



<p>As an autonomous rack, it contains:</p>



<ul class="wp-block-list">
<li> Power Distribution Units</li>



<li> Network equipment for internal and external connectivity</li>



<li> Servers, including a <strong>Pod Controller</strong></li>
</ul>



<p>There are <strong>9 Bare Metal server models</strong> available, ranging from 16 to <strong>256 cores</strong>, from 128 GB to <strong>2.5 TB of memory</strong>, up to 792 TB of raw NVMe SSD storage, and <strong>Nvidia L4 and L40S GPUs</strong>, depending on your needs.</p>



<p>And the best part is that you can mix and match them to build and manage the perfect autonomous rack, while keeping <strong>full control over security and resources</strong>.</p>



<p>We’ve got a total of 607 models in Bare Metal Pod, enough for nearly any configuration and need. And with up to 1500 servers in a single Pod, the possibilities are endless.</p>



<p>And on top of these servers, we are building an automated software library: <strong>the Cloud Store</strong>. Enclosed in the Bare Metal Pod, the Cloud Store will offer the Pod admin a selection of OSes, virtualisation platforms and various software that can be <strong>pushed, installed, and configured automatically on the servers</strong> in the Pod, with built-in <strong>security, monitoring, and logging</strong> integrated into the Pod monitoring tools.</p>



<p>And herein<sup data-fn="116cf438-18fd-4e6b-9424-87a974fecaf9" class="fn"><a href="#116cf438-18fd-4e6b-9424-87a974fecaf9" id="116cf438-18fd-4e6b-9424-87a974fecaf9-link">1</a></sup> lies the main challenge: making sure an entire collection of software from various vendors can coexist and interact with a single, open-source monitoring platform, a KMS, and an IAM without breaking anything…</p>



<h3 class="wp-block-heading"><strong>Coming up next…</strong></h3>



<p>That’s a wrap for now! In the next article, we’ll deep-dive into <strong>hardware, networking, and security</strong>. Stay tuned!</p>



<h3 class="wp-block-heading">Some of the Bare Metal servers options:</h3>



<ul class="wp-block-list">
<li><strong>Scale A1 &#8211; A8</strong>: Equipped with 4th Gen Intel Xeon Gold or AMD EPYC 9004 series processors, these servers provide between 16 to 256 cores and 128 GB to 1 TB of DDR5 ECC RAM. They are suitable for:
<ul class="wp-block-list">
<li>Hosting SaaS and PaaS solutions</li>



<li>Virtualisation</li>



<li>Database hosting</li>



<li>Containerisation and orchestration</li>



<li>Confidential computing</li>



<li>High-performance computing</li>
</ul>
</li>



<li><strong>Scale-GPU 1 &#8211; 3</strong>: Featuring NVIDIA L4 GPU cards (x2 or x4) and up to 1.2 TB of DDR5 ECC RAM, these servers are ideal for:
<ul class="wp-block-list">
<li>3D modelling</li>



<li>Media streaming</li>



<li>Virtual Desktop Infrastructure (VDI)</li>



<li>Data inference</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li><strong>HGR-HCI I1 &#8211; I4</strong>: With dual 5th Gen Intel Xeon Gold or 4th Gen AMD EPYC 9004 series processors, these servers provide between 16 to 72 cores and up to 2.5 TB of DDR5 ECC RAM. They are suitable for:
<ul class="wp-block-list">
<li>Hyperconverged infrastructure</li>



<li>Virtualisation</li>



<li>Database hosting</li>



<li>Containerisation and orchestration</li>



<li>Confidential computing</li>



<li>High-performance computing</li>
</ul>
</li>



<li><strong>HGR-SDS 1 &#8211; 2</strong>: Equipped with dual 5th Gen Intel Xeon Gold processors, these servers offer between 16 to 48 cores and up to 1.5 TB of DDR5 ECC RAM. They are ideal for:
<ul class="wp-block-list">
<li>Software-defined storage solutions</li>



<li>Object storage solutions</li>



<li>Big data</li>



<li>Database hosting</li>
</ul>
</li>



<li><strong>HGR-STOR 1 &#8211; 2</strong>: Featuring a 5th Gen Intel Xeon Gold processor with 36 cores and up to 512 GB of DDR5 ECC RAM, these servers are designed for:
<ul class="wp-block-list">
<li>Archiving</li>



<li>Database hosting</li>



<li>Backup and disaster recovery plans</li>
</ul>
</li>



<li><strong>HGR-AI-2</strong>: Equipped with NVIDIA L40s GPU cards (x2 or x4) and up to 2.3 TB of DDR5 ECC RAM, these servers are optimized for:
<ul class="wp-block-list">
<li>Machine learning</li>



<li>Deep learning</li>
</ul>
</li>
</ul>



<p>(And many other options… you get the idea.)</p>


<ol class="wp-block-footnotes"><li id="116cf438-18fd-4e6b-9424-87a974fecaf9"><a href="https://www.collinsdictionary.com/dictionary/english/herein" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external"> </a>My editor liked the word and I found it cool too. <a href="https://www.collinsdictionary.com/dictionary/english/herein" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.collinsdictionary.com/dictionary/english/herein</a> <a href="#116cf438-18fd-4e6b-9424-87a974fecaf9-link" aria-label="Jump to footnote reference 1">↩︎</a></li></ol><img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fbare-metal-pod-genesis%2F&amp;action_name=Bare%20Metal%20Pod%3A%20Genesis&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Deep Dive into DeepSeek-R1 &#8211; Part 1</title>
		<link>https://blog.ovhcloud.com/deep-dive-into-deepseek-r1-part-1/</link>
		
		<dc:creator><![CDATA[Fabien Ric]]></dc:creator>
		<pubDate>Thu, 06 Mar 2025 09:56:20 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[DeepSeek]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28199</guid>

					<description><![CDATA[Introduction A few weeks ago, the release of the open-source large language model DeepSeek-R1 has taken the AI world by storm. The Chinese research team claimed their new reasoning model was on par with OpenAI&#8217;s flagship model o1, open-sourced the model and gave details about the work behind it. In this blog post series, we [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-dive-into-deepseek-r1-part-1%2F&amp;action_name=Deep%20Dive%20into%20DeepSeek-R1%20%26%238211%3B%20Part%201&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="512" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1024x512.png" alt="A cute whale with a baseball cap, using a computer, representing DeepSeek." class="wp-image-28353" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1024x512.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-300x150.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-768x384.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1536x768.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Introduction</h2>



<p>A few weeks ago, the release of the open-source large language model DeepSeek-R1 took the AI world by storm. The Chinese research team claimed their new reasoning model was on par with OpenAI&#8217;s flagship model o1, open-sourced the model, and gave details about the work behind it.</p>



<p>In this blog post series, we will dive into the DeepSeek-R1 model family and see how you can run it on OVHcloud to build a simple chatbot that handles reasoning.</p>



<p>The &#8220;R&#8221; in DeepSeek-R1 stands for &#8220;Reasoning&#8221;, so let&#8217;s start by defining what a reasoning model is.</p>



<h2 class="wp-block-heading">What are reasoning models?</h2>



<p>Reasoning models are large language models (LLMs) capable of reflecting on a problem before generating an answer. Traditionally, LLMs have been improved by spending more compute at training time (more data, more parameters, more training iterations): this is <strong>training-time compute</strong>. Reasoning models, however, differ from standard LLMs in their use of <strong>test-time compute</strong>: during inference, they spend more time and resources to generate and refine a better answer.</p>



<p>Reasoning models excel at tasks that require understanding and working through a problem step-by-step, such as mathematics, riddles, puzzles, coding, planning tasks and agentic workflows. They may be counterproductive for use cases that don&#8217;t require reasoning capabilities, such as knowledge facts (for example, <em>who discovered penicillin)</em>.</p>



<p>In a classroom, a reasoning model would be the student who takes time to understand the question, splits the problem into manageable steps and details the resolution process, instead of rushing to write down the answer.</p>



<p>Here is a comparison between the outputs of a standard LLM and a reasoning LLM, on an example prompt:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd1c9fa4ed1&quot;}" data-wp-interactive="core/image" data-wp-key="69cd1c9fa4ed1" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1029" height="492" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14.png" alt="A diagram showing the differences between standard LLM and reasoning LLM outputs for a given prompt." class="wp-image-28318" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14.png 1029w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-300x143.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-1024x490.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-768x367.png 768w" sizes="auto, (max-width: 1029px) 100vw, 1029px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>The reasoning model has generated more tokens, showing how it plans to solve the problem, before the actual answer. In the case of DeepSeek-R1, you can see it generates its reasoning content inside <code>&lt;think&gt;...&lt;/think&gt;</code> tags.</p>



<p>A standard LLM can also show reasoning abilities, which are often more visible when using a technique called <a href="https://arxiv.org/abs/2201.11903" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Chain-of-Thought (CoT) prompting</a>, by adding phrases such as &#8220;let&#8217;s think step-by-step&#8221; to the prompt.</p>



<p>However, a reasoning LLM has been trained to behave this way. Its reasoning skill is internalized, so it doesn&#8217;t require specific prompting techniques to trigger the chain-of-thought process.</p>
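<p>As a side note, when consuming such a model in an application, you often want to separate the reasoning trace from the final answer. A minimal parser for this (the <code>split_reasoning</code> helper is our own illustrative name, not part of any DeepSeek tooling) could look like this:</p>

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a reasoning model's output into (reasoning, answer).

    Assumes the model wraps its chain of thought in <think>...</think>
    tags, as DeepSeek-R1 does.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()          # no reasoning block found
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # everything after </think>
    return reasoning, answer

output = "<think>\nThe user wants Pi. Monte Carlo works.\n</think>\n\nUse the Monte Carlo method."
reasoning, answer = split_reasoning(output)
```

<p>Chat interfaces typically collapse or hide the reasoning part and only display the final answer to the user.</p>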



<p>It&#8217;s important to note that DeepSeek-R1 is not the first reasoning model; OpenAI led the way by releasing their o1 model in September 2024.</p>



<p>The two main reasons why DeepSeek-R1 made headlines are its open-source nature and the paper released by the research team, which gives many details on how they trained the model, with valuable insights for the open-source community on creating reasoning models. In particular, the key highlight of the paper is the observation that reasoning behaviour can emerge through Reinforcement Learning (RL) alone, without supervised fine-tuning.</p>



<h2 class="wp-block-heading">The DeepSeek-R1 model family</h2>



<p>You may have heard about DeepSeek-R1, but it&#8217;s not the only model of the DeepSeek family: DeepSeek-V3, DeepSeek-R1-Zero and distilled models are also available. So what are the differences between these models?</p>



<p>First, let&#8217;s go through some definitions and an overview of how language models are trained.</p>



<h3 class="wp-block-heading">Language model training overview</h3>



<p>The large language models available in apps and playgrounds are usually trained in 3 steps:</p>



<ol class="wp-block-list">
<li>A <strong>base model</strong> is trained on an unsupervised language modeling task (for instance, next token prediction) with a dataset of trillions of tokens (also called <em>pre-training</em>),</li>



<li>An <strong>instruct model</strong> is trained from the base model, by fine-tuning it on a massive dataset of instructions, conversations, questions and answers, to improve the performance of the model on the prompts frequently encountered in a chat,</li>



<li>The <strong>final model</strong> is the instruct model further trained to better handle human preferences, avoid generating harmful content, etc., with techniques such as RLHF (reinforcement learning from human feedback) and DPO (direct preference optimization).</li>
</ol>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd1c9fa5617&quot;}" data-wp-interactive="core/image" data-wp-key="69cd1c9fa5617" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1459" height="239" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image.png" alt="A diagram showing the 3 training steps of a LLM." class="wp-image-28268" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image.png 1459w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-300x49.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-1024x168.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-768x126.png 768w" sizes="auto, (max-width: 1459px) 100vw, 1459px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>






<h3 class="wp-block-heading">DeepSeek-V3 training</h3>



<p>According to the <a href="https://arxiv.org/pdf/2412.19437" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">technical report provided by DeepSeek</a>, DeepSeek-V3 is a mixture-of-experts (MoE) language model trained with the same kind of process, which is described in the image below:</p>



<ul class="wp-block-list">
<li><strong>DeepSeek-V3-Base</strong> is trained with 14.8 trillion tokens,</li>



<li>A dataset of 1.5 million instructions examples is used to fine-tune the base model,</li>



<li>This instruct model goes through reinforcement learning with several reward models. The final model is <strong>DeepSeek-V3</strong>.</li>
</ul>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd1c9fa5c7d&quot;}" data-wp-interactive="core/image" data-wp-key="69cd1c9fa5c7d" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1453" height="242" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8.png" alt="A diagram showing the 3 training steps of DeepSeek-V3." class="wp-image-28288" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8.png 1453w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-300x50.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-1024x171.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-768x128.png 768w" sizes="auto, (max-width: 1453px) 100vw, 1453px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>For the reinforcement learning step, DeepSeek uses their algorithm called <strong>GRPO</strong> (<a href="https://arxiv.org/pdf/2402.03300" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">group relative policy optimization</a>), which uses several reward models to assess the quality of the content generated by the model. The score given by each reward model is combined into a final score, used to update the model so that it maximizes its global score the next time.</p>
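<p>To give an intuition of the &#8220;group relative&#8221; part, here is a simplified sketch (our own illustration, not DeepSeek&#8217;s code) of the scoring step: responses sampled for the same prompt are scored against their own group, by normalising the combined rewards with the group&#8217;s mean and standard deviation. The full GRPO objective then uses these advantages in a clipped policy update, which is omitted here:</p>

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Turn a group's combined reward scores into relative advantages.

    Responses scoring above the group average get a positive advantage,
    those below get a negative one. Assumes the rewards in the group are
    not all identical (otherwise the standard deviation would be zero).
    """
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / sigma for r in rewards]

# Hypothetical combined scores for 4 responses sampled from one prompt:
rewards = [0.2, 0.8, 0.5, 0.9]
advantages = group_relative_advantages(rewards)
```

<p>Because each response is judged relative to its own group, GRPO avoids training a separate value network to estimate a baseline, which keeps the reinforcement learning step cheaper.</p>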



<h3 class="wp-block-heading">DeepSeek-R1 model series training</h3>



<p><strong>DeepSeek-R1</strong> models are built with a different training pipeline, using the base model of DeepSeek-V3. The diagram below shows the main steps of the process designed by DeepSeek to create several reasoning models mentioned in their <a href="https://arxiv.org/pdf/2501.12948" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">technical report</a>:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd1c9fa626a&quot;}" data-wp-interactive="core/image" data-wp-key="69cd1c9fa626a" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1262" height="1323" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12.png" alt="A diagram showing the training process of DeepSeek-R1, DeepSeek-R1-Zero and DeepSeek-Distill models." class="wp-image-28301" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12.png 1262w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-286x300.png 286w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-977x1024.png 977w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-768x805.png 768w" sizes="auto, (max-width: 1262px) 100vw, 1262px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>Let&#8217;s walk through it step-by-step (no pun intended):</p>



<p>1. The main breakthrough described in DeepSeek&#8217;s paper: they managed to train the DeepSeek-V3-Base 671B model to learn the reasoning capability with reinforcement learning only, which doesn&#8217;t require labeled data, as opposed to supervised fine-tuning. They use the same GRPO algorithm as before, with two rewards. The first one scores the accuracy of the generated content, using &#8220;rule-based&#8221; experts instead of full reward models, which would themselves need to be trained and would require significant resources. For example, to assess whether the model generated correct Python code, one expert could compile the generated code and give a score based on the number of errors, while another could generate test cases and check whether the generated code passes them. The second reward concerns the format of the model&#8217;s responses, which must enclose the reasoning content in <code>&lt;think&gt;...&lt;/think&gt;</code> tags. The resulting model is <strong>DeepSeek-R1-Zero</strong>. However, it has limitations that make it unsuitable for direct use, such as language mixing and poor readability.</p>



<p>2. To overcome these limitations, DeepSeek uses DeepSeek-R1-Zero to create a cold-start reasoning dataset, augmented with other data from sources not explicitly mentioned. DeepSeek-V3-Base is trained with this cold-start data, before applying a new round of reinforcement learning.</p>



<p>3. They use the same RL approach to get a new reasoning model that generates better-quality output. Using this model, they build a 100x bigger reasoning dataset, growing from 5k to 600k samples, with DeepSeek-V3 acting as a quality judge. This dataset is then completed with 200k samples generated with DeepSeek-V3 on non-reasoning tasks.</p>



<p>4. A second stage of supervised fine-tuning is done with the dataset built earlier.</p>



<p>5. The model is then aligned with human preferences through a final round of reinforcement learning, using a dedicated human-preference reward. The resulting model is <strong>DeepSeek-R1</strong>.</p>



<p>6. Finally, DeepSeek experimented with fine-tuning much smaller models than DeepSeek-V3 (LLaMa 3.3 70B, Qwen 2.5 32B&#8230;) with the dataset built at step 3. In the paper, they call this process <strong>distillation</strong>. However, it must not be mistaken for the <em>knowledge distillation</em> technique frequently used in deep learning, where a student model learns from the probability distribution of a teacher model. Here, the term &#8220;distillation&#8221; refers to the fact that the reasoning skill is &#8220;distilled&#8221; into the base model, but it&#8217;s plain old supervised fine-tuning. This is how the <strong>DeepSeek-R1-Distill</strong> model series is trained. The quality of the dataset enables the resulting distilled models to beat much larger models on reasoning tasks, as shown in the benchmark below:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd1c9fa6b03&quot;}" data-wp-interactive="core/image" data-wp-key="69cd1c9fa6b03" class="wp-block-image aligncenter size-full is-resized wp-lightbox-container"><img loading="lazy" decoding="async" width="770" height="312" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13.png" alt="A screen capture of benchmark data table." class="wp-image-28310" style="width:750px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13.png 770w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13-300x122.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13-768x311.png 768w" sizes="auto, (max-width: 770px) 100vw, 770px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption"><em>Benchmark of distilled models on several reasoning tasks (source: DeepSeek R1 technical paper)</em></figcaption></figure>



<h3 class="wp-block-heading">Recap</h3>



<p>The table below summarizes the differences between the models of the DeepSeek-R1 series:</p>



<figure class="wp-block-table"><table><tbody><tr><td>Model</td><td>Description</td></tr><tr><td>DeepSeek-R1-Zero</td><td>Intermediate 671B reasoning model trained from DeepSeek-V3 exclusively with reinforcement learning, and used to bootstrap DeepSeek-R1 training.</td></tr><tr><td>DeepSeek-R1</td><td>671B reasoning model trained from DeepSeek-V3.</td></tr><tr><td>DeepSeek-R1-Distill</td><td>Smaller models fine-tuned for reasoning with a dataset generated by an intermediate version of DeepSeek-R1.</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">Run DeepSeek-R1 on OVHcloud</h2>



<p>Now that we&#8217;ve seen the differences between all DeepSeek models, let&#8217;s try to use them!</p>



<h3 class="wp-block-heading">AI Endpoints</h3>



<p>The fastest way to test DeepSeek-R1 is to use OVHcloud<strong> AI Endpoints</strong>.</p>



<p><strong>DeepSeek-R1-Distill-Llama-70B</strong> is already available, ready to use and optimized for inference speed. Check it out here: <a href="https://endpoints.ai.cloud.ovh.net/models/a011515c-0042-41b2-9a00-ec8b5d34462d" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://endpoints.ai.cloud.ovh.net/models/a011515c-0042-41b2-9a00-ec8b5d34462d</a></p>



<p>AI Endpoints makes it easy to integrate AI into your applications with a simple API call, without the need for deep AI expertise or infrastructure management. And while it’s in beta, it’s <strong>free</strong>!</p>



<p>Here is an example cURL command to use DeepSeek-R1 Distill Llama 70B on the OpenAI compatible endpoint provided by OVHcloud AI Endpoints:</p>



<pre class="wp-block-code"><code class="">curl -X 'POST' \
  'https://deepseek-r1-distill-llama-70b.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_tokens": 4096,
  "messages": [
    {
      "content": "How can I calculate an approximation of Pi in Python?",
      "role": "user"
    }
  ],
  "model": null,
  "seed": null,
  "stream": false,
  "temperature": 0.7,
  "top_p": 1
}'</code></pre>



<p>We can see in the output the thinking process followed by the answer; both have been truncated for clarity.</p>



<pre class="wp-block-code"><code class="">{
    "id": "chatcmpl-8c21b2e3fac44d43b63c06fa25e58091",
    "object": "chat.completion",
    "created": 1741199564,
    "model": "DeepSeek-R1-Distill-Llama-70B",
    "choices":
    [
        {
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "&lt;think&gt;\nOkay, the user is asking how to approximate Pi using Python. I need to think about different methods they can use. Let's see, there are a few common approaches. \n\nFirst, there's the Monte Carlo method. ... Let me structure the response with each method as a separate section, explaining what it is, how it works, and providing the code. Then, the user can pick which one they prefer based on their situation.\n&lt;/think&gt;\n\nThere are several ways to approximate the value of Pi (π) using Python. Below are a few methods:\n\n### 1. Using the Monte Carlo Method..."
            },
            "finish_reason": "stop",
            "logprobs": null
        }
    ],
    "usage":
    {
        "prompt_tokens": 14,
        "completion_tokens": 1377,
        "total_tokens": 1391
    }
}</code></pre>
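<p>As the response above shows, the reasoning arrives wrapped in <code>&lt;think&gt;&#8230;&lt;/think&gt;</code> tags inside the same <code>content</code> field as the answer. Here is a minimal sketch of how a client could separate the two (the helper name <code>split_reasoning</code> is ours, purely for illustration):</p>

```python
# Split a DeepSeek-R1 style completion into its reasoning and its final answer.
# The model wraps its chain of thought in <think>...</think> tags, as seen in
# the JSON response above.

def split_reasoning(content: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a raw assistant message."""
    start, end = "<think>", "</think>"
    if start in content and end in content:
        before, _, rest = content.partition(start)
        thinking, _, answer = rest.partition(end)
        return thinking.strip(), (before + answer).strip()
    # No reasoning tags: everything is the answer.
    return "", content.strip()

raw = "<think>\nLet me consider the Monte Carlo method...\n</think>\n\nHere is the code."
thinking, answer = split_reasoning(raw)
print(answer)  # Here is the code.
```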



<p>Stéphane Philippart, Developer Relations Advocate at OVHcloud, has written a blog post covering everything you need to know to get up to speed with AI Endpoints and run this model: <a href="https://blog.ovhcloud.com/release-of-deepseek-r1-on-ovhcloud-ai-endpoints/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Release of DeepSeek-R1 on OVHcloud AI Endpoints</a></p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>What if you want to run another version of DeepSeek-R1, such as the Qwen 7B distilled version?</p>



<p>You can use another OVHcloud AI product, <strong>AI Deploy</strong>, to create your own serving endpoint, with <a href="https://docs.vllm.ai/en/stable/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">vLLM</a> as the inference engine. It is open-source, fast and well maintained, ensuring maximal compatibility with even the most recent AI models.</p>



<p>Eléa Petton, Solution Architect at OVHcloud, has written a blog post explaining in detail how to serve an open-source model with vLLM on AI Deploy. Just replace the Mistral Small model with the DeepSeek distilled version you want to use (e.g. <strong>deepseek-ai/DeepSeek-R1-Distill-Qwen-7B</strong>) and adapt the number of L40S cards needed (one is enough for the 7B version): <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Mistral Small 24B served with vLLM and AI Deploy – a single command to deploy an LLM (Part 1)</a></p>



<h3 class="wp-block-heading">Next up, creating a reasoning chatbot with DeepSeek-R1</h3>



<p>In part 2 of this blog post series, we will use a DeepSeek-R1-Distill model to create a chatbot that will handle reasoning gracefully, by showing the thinking process of the model.</p>



<p>We will develop our chatbot with OVHcloud AI Endpoints and the Python library <a href="https://www.gradio.app/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Gradio</a>, which makes it quick to create simple chat interfaces.</p>



<p>Here is a screenshot of the finalized chatbot we will build:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69cd1c9fa8eb3&quot;}" data-wp-interactive="core/image" data-wp-key="69cd1c9fa8eb3" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="723" height="1173" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot.png" alt="A screenshot of a chatbot application developed with DeepSeek-R1 and Gradio in Python." class="wp-image-28328" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot.png 723w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot-185x300.png 185w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot-631x1024.png 631w" sizes="auto, (max-width: 723px) 100vw, 723px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>Stay tuned for the next article in this DeepSeek-R1 series. In the meantime, try out DeepSeek-R1 on AI Endpoints and AI Deploy and let us know what you &lt;think&gt;!</p>



<h3 class="wp-block-heading">Resources</h3>



<p>If you want to learn more about DeepSeek-R1 and the topics we covered in this blog post, such as test-time compute, GRPO, reinforcement learning and reasoning models, we suggest having a look at these resources:</p>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/pdf/2501.12948" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">DeepSeek-R1 technical report</a>, by the DeepSeek team</li>



<li><a href="https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">The Illustrated DeepSeek-R1</a>, by Jay Alamar</li>



<li><a href="https://magazine.sebastianraschka.com/p/understanding-reasoning-llms" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Understanding Reasoning LLMs</a>, by Sebastian Raschka</li>



<li><a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-reasoning-llms" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">A Visual Guide to Reasoning LLMs</a>, by Maarten Grootendorst</li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-dive-into-deepseek-r1-part-1%2F&amp;action_name=Deep%20Dive%20into%20DeepSeek-R1%20%26%238211%3B%20Part%201&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Managed Valkey: Our Commitment to Open Source and Customer choice.</title>
		<link>https://blog.ovhcloud.com/valkey-open-source-commitment/</link>
		
		<dc:creator><![CDATA[Jonathan Clarke]]></dc:creator>
		<pubDate>Mon, 03 Mar 2025 10:04:51 +0000</pubDate>
				<category><![CDATA[Accelerating with OVHcloud]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[customer]]></category>
		<category><![CDATA[Open Source]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28254</guid>

					<description><![CDATA[OVHcloud strives for openness, expert takes and customer centricity. As part of this commitment, we keep adapting to next-gen industry shifts. Our goal remains the same: to provide our community with top market solutions. Up to now, OVHcloud had offered a Managed Caching service based on the renowned Redis engine. We helped our customers to [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fvalkey-open-source-commitment%2F&amp;action_name=Managed%20Valkey%3A%20Our%20Commitment%20to%20Open%20Source%20and%20Customer%20choice.&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>OVHcloud strives for <strong>openness, expert takes and customer centricity</strong>. As part of this commitment, we keep adapting to next-gen industry shifts. Our goal remains the same: to provide our community with top market solutions.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="841" height="561" src="https://blog.ovhcloud.com/wp-content/uploads/2025/02/Image1.png" alt="" class="wp-image-28243" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/02/Image1.png 841w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/Image1-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/Image1-768x512.png 768w" sizes="auto, (max-width: 841px) 100vw, 841px" /></figure>



<p>Up to now, OVHcloud had offered a <strong>Managed Caching</strong> service based on the renowned <strong>Redis engine</strong>. We helped our customers to effortlessly scale their caching or real-time data management instances. A particularly useful asset for example in e-commerce sites and apps.</p>



<p>As <strong>Redis changed its licensing model in 2024</strong>, introducing <a href="https://blog.ovhcloud.com/new-redis-licensing-model-and-ovhcloud-managed-databases-for-caching/" data-wpel-link="internal"><strong>dual source-available licences</strong></a> with <a href="https://redis.io/blog/redis-adopts-dual-source-available-licensing/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Redis 7.4</a>, we took the strategic decision to <strong>discontinue this Managed Caching service</strong>. We will therefore transition to <strong>Managed Valkey</strong>, its open-source alternative, in the Spring of 2025.</p>



<h3 class="wp-block-heading"><strong>Why Valkey? A fully open-source alternative</strong></h3>



<p><a href="https://valkey.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Valkey</a> is an open-source fork of Redis developed under the Linux Foundation. Through Valkey, our users benefit from a seamless and reliable alternative to Redis OSS. Its mission aligns with OVHcloud’s core values:</p>



<ul class="wp-block-list">
<li><strong>100% open-source</strong>: unlike Redis’ new licensing model, Valkey remains fully open-source. True to our word: ensuring freedom, transparency, security and long-term sustainability.</li>



<li><strong>Seamless compatibility</strong>: Valkey is designed as a <strong>drop-in replacement</strong> for Redis. This means customers can continue to use the same commands, data structures, and configurations.</li>



<li><strong>Community-driven innovation</strong>: as an Open-Source project, Valkey benefits from a thriving developer ecosystem, allowing for <strong>collaborative advancements</strong> and continuous improvements.</li>
</ul>



<p>By choosing <strong>Managed Valkey</strong>, OVHcloud is ensuring that customers have access to a <strong>high-performance caching solution</strong>. Both <strong>future-proof and aligned with open-source values</strong>.</p>



<p>On top of that, this choice aligns with our partner <a href="https://aiven.io/blog/introducing-aiven-for-valkey" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Aiven’s decision</a> to stand firmly as an early supporter of and committer to Valkey. Thanks to our <strong>long-term partnership with this unicorn</strong>, OVHcloud is now able to provide multiple managed services to its ecosystem. We recently demonstrated this partnership at the last OVHcloud Summit, where <a href="https://ovh.to/RLz2ghA" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">we presented the <strong>many specific use cases</strong></a> we cover together.</p>



<h3 class="wp-block-heading"><strong>What does this mean for OVHcloud customers?</strong></h3>



<p><strong>Managed Valkey</strong>, which we will run on version 8.0, ensures full compatibility with Redis OSS 7.2.4, making it easier for users to transition their existing applications without disruption:</p>



<ul class="wp-block-list">
<li><strong>Effortless migration</strong>: our teams have designed an <strong>automated transition process</strong>. This requires no changes on the customer’s end and no service disruption.</li>



<li><strong>Full support &amp; documentation</strong>: comprehensive <strong>guides, migration tools, and expert assistance</strong> will be available to facilitate the switch.</li>
</ul>



<h3 class="wp-block-heading"><strong>Looking ahead: OVHcloud’s vision for Managed Databases</strong></h3>



<p>Beyond Valkey, we are constantly expanding our <a href="https://www.ovhcloud.com/en/public-cloud/databases/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>Managed Databases portfolio</strong></a>. Our goal? To provide <strong>cost-efficient, high-performance, secured and transparent solutions</strong> for our customers. Recent developments include:</p>



<ul class="wp-block-list">
<li><strong>Expanding database engines to new regions</strong> to ensure high availability and low latency worldwide, more recently in Singapore and the USA.</li>



<li><strong>Upcoming ClickHouse Deployment</strong>, enhancing our analytics database functionality by Q3 2025.</li>
</ul>



<p>Our <strong>Databases Public Cloud Roadmap </strong><a href="https://github.com/orgs/ovh/projects/16/views/1?sliceBy%5Bvalue%5D=Managed+Databases" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">on GitHub</a> remains available for customers who want to stay informed about our future developments.</p>



<p>🚀 <strong>Let’s build the future of open source databases – together!</strong></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fvalkey-open-source-commitment%2F&amp;action_name=Managed%20Valkey%3A%20Our%20Commitment%20to%20Open%20Source%20and%20Customer%20choice.&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Mistral Small 24B served with vLLM and AI Deploy &#8211; a single command to deploy an LLM (Part 1)</title>
		<link>https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Mon, 24 Feb 2025 10:08:37 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28212</guid>

					<description><![CDATA[You are not dreaming! You can deploy open-source LLM in a single command line. Deploying advanced language models can be a challenge! But this sometimes this arduous task is becoming increasingly accessible, enabling developers to integrate sophisticated AI capabilities into their applications. In this guide, we will walk through deploying the Mistral-Small-24B-Instruct-2501 model using vLLM [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm%2F&amp;action_name=Mistral%20Small%2024B%20served%20with%20vLLM%20and%20AI%20Deploy%20%26%238211%3B%20a%20single%20command%20to%20deploy%20an%20LLM%20%28Part%201%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><strong><em>You are not dreaming! You can deploy an open-source LLM in a single command line</em>.</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="724" src="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1024x724.png" alt="Rocket in MistralAI colors in a data center with a French rooster showing rapid LLM deployment" class="wp-image-28219" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1024x724.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-300x212.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-768x543.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1536x1086.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy.png 2000w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Deploying advanced language models can be a challenge! But this sometimes arduous task is becoming increasingly accessible, enabling developers to integrate sophisticated AI capabilities into their applications.</p>



<p>In this guide, we will walk through deploying the <strong><a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Instruct-2501</a></strong> model using <strong>vLLM</strong> on OVHcloud&#8217;s <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy platform</a>. This combination offers a powerful solution for efficient and scalable AI model serving.</p>



<p>Deploying a model is great, but doing it quickly is even better!</p>



<p>🤯 <strong>What if a single command line was enough?</strong> That&#8217;s the challenge we&#8217;re tackling today!</p>



<h2 class="wp-block-heading">Context</h2>



<p>Before deployment, let’s take a closer look at our key technologies!</p>



<h3 class="wp-block-heading">Mistral Small</h3>



<p>The <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> is a 24-billion-parameter instruction-fine-tuned model, renowned for its compact size and performance comparable to larger models.</p>



<p>This model, from <a href="https://mistral.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">MistralAI</a>, is an instruction-fine-tuned version of the base model:&nbsp;<a href="https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Base-2501</a>.</p>



<p>To serve this model efficiently, we will utilize vLLM, an open-source library for <strong>LLM inference</strong>.</p>



<h3 class="wp-block-heading">vLLM</h3>



<p><a href="https://docs.vllm.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> (<strong>Virtual LLM</strong>) is a highly optimized serving engine designed to run large language models efficiently. It takes advantage of several key optimizations, such as:</p>



<ul class="wp-block-list">
<li><strong>PagedAttention:</strong> an attention mechanism that reduces memory fragmentation and enables more efficient use of GPU memory</li>



<li><strong>Continuous Batching:</strong> vLLM dynamically adjusts batch sizes in real time, ensuring that the GPU is always used efficiently, even with multiple simultaneous requests</li>



<li><strong>Tensor parallelism:</strong> enables model inference across multiple GPUs to boost performance</li>



<li><strong>Optimized kernel implementations:</strong> vLLM uses custom CUDA kernels for faster execution, reducing latency compared to traditional inference frameworks</li>
</ul>



<p>These features make vLLM one of the best choices for large models such as Mistral Small 24B, enabling low-latency, high-throughput inference on the latest GPUs.</p>



<p>With OVHcloud&#8217;s AI Deploy platform, you can deploy this model in a single command line.</p>



<h3 class="wp-block-heading">AI Deploy </h3>



<p>OVHcloud AI Deploy is a <strong>Container as a Service</strong> (CaaS) platform designed to help you deploy, manage and scale AI models. It allows you to optimally deploy your applications and APIs based on Machine Learning (ML), Deep Learning (DL) or LLMs.</p>



<p>The key benefits are:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong> bring your own custom Docker image and deploy it with a single command line or a few clicks</li>



<li><strong>High-performance computing:</strong> a complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong> supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong> billing per minute, no surcharges</li>
</ul>



<p>✅ To go further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure that you have:</p>



<ul class="wp-block-list">
<li><strong>OVHcloud account</strong>: access to the&nbsp;<a href="https://www.ovh.com/auth/?action=gotomanager&amp;from=https://www.ovh.co.uk/&amp;ovhSubsidiary=GB" data-wpel-link="exclude">OVHcloud Control Panel</a></li>



<li><strong>ovhai CLI available:</strong> install the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ovhai CLI</a></li>



<li><strong>AI Deploy access</strong>: ensure you have a <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">user for AI Deploy</a></li>



<li><strong>Hugging Face access</strong>: create a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face account</a> and generate an <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">access token</a></li>



<li><strong>Gated model authorization</strong>: be sure you have been granted access to <a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Instruct-2501</a> model</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it&#8217;s time to deploy!</strong></p>



<h2 class="wp-block-heading">Deployment of the Mistral Small 24B LLM</h2>



<p>Let&#8217;s deploy the <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> model.</p>



<h3 class="wp-block-heading">Manage access tokens</h3>



<p>Export your <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it is deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>Returning the following output:</p>



<pre class="wp-block-code"><code class="">Id:         47292486-fb98-4a5b-8451-600895597a2b
Created At: 20-02-25 11:53:05
Updated At: 20-02-25 11:53:05
Spec:
  Name:           ai_deploy_token=my_operator_token
  Role:           AiTrainingOperator
  Label Selector: 
Status:
  Value:   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Version: 1</code></pre>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h3 class="wp-block-heading">Launch Mistral Small LLM with AI Deploy</h3>



<p>You are ready to start<strong> Mistral-Small-24B</strong> using vLLM and AI Deploy:</p>



<pre class="wp-block-code"><code class="">ovhai app run --name vllm-mistral-small \
              --default-http-port 8000 \
              --label ai_deploy_token=my_operator_token \
              --gpu 2 \
              --flavor l40s-1-gpu \
              -e OUTLINES_CACHE_DIR=/tmp/.outlines \
              -e HF_TOKEN=$MY_HF_TOKEN \
              -e HF_HOME=/hub \
              -e HF_DATASETS_TRUST_REMOTE_CODE=1 \
              -e HF_HUB_ENABLE_HF_TRANSFER=0 \
              -v standalone:/hub:rw \
              -v standalone:/workspace:rw \
              vllm/vllm-openai:v0.8.2 \
              -- bash -c "python3 -m vllm.entrypoints.openai.api_server \
                        --model mistralai/Mistral-Small-24B-Instruct-2501 \
                        --tensor-parallel-size 2 \
                        --tokenizer_mode mistral \
                        --load_format mistral \
                        --config_format mistral \
                        --dtype half"</code></pre>



<p><strong>What do the different parameters of this command mean?</strong></p>



<h5 class="wp-block-heading">1. Start your AI Deploy app</h5>



<p>Launch a new app using <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ovhai CLI</a> and name it.</p>



<p><code><strong>ovhai app run --name vllm-mistral-small</strong></code></p>



<h5 class="wp-block-heading">2. Define access</h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading">3. Configure GPU resources</h5>



<p>Specify the hardware flavor (<code><strong>l40s-1-gpu</strong></code>), which refers to an <strong>NVIDIA L40S GPU</strong>, and the number of GPUs (<code><strong>2</strong></code>).</p>



<p><code><strong>--gpu 2<br>--flavor l40s-1-gpu</strong></code></p>



<p><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">⚠️WARNING!</mark></strong> For this model, two L40S cards are sufficient, but if you want to deploy another model, you will need to check which GPUs you need. Note that you also have access to A100 and H100 GPUs for your larger models.</p>
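<p>A quick back-of-the-envelope calculation shows why two 48&nbsp;GB L40S cards make sense here. This is only a rough sketch of the weight footprint in FP16 (the <code>--dtype half</code> used below); it deliberately ignores the KV cache and activation overhead, which is exactly the headroom the second GPU provides:</p>

```python
# Back-of-the-envelope VRAM estimate for serving a model in FP16.
# Weights only: KV cache and activations add on top of this.

def fp16_weight_gb(n_params_billion: float) -> float:
    bytes_per_param = 2  # FP16 stores each parameter in 2 bytes
    return n_params_billion * 1e9 * bytes_per_param / 1e9  # decimal GB

L40S_VRAM_GB = 48  # per-card memory of an NVIDIA L40S

weights = fp16_weight_gb(24)        # Mistral Small 24B
print(weights)                      # 48.0 GB of weights alone
print(weights / (2 * L40S_VRAM_GB)) # 0.5: half the 2-GPU budget,
                                    # leaving room for the KV cache
```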



<h5 class="wp-block-heading">4. Set up environment variables</h5>



<p>Configure caching for the <strong>Outlines library</strong> (used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the <strong>Hugging Face token</strong> (<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the <strong>Hugging Face cache directory</strong> to <code>/hub</code> (where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of <strong>custom remote code</strong> from Hugging Face datasets (required for some model behaviors):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable <strong>Hugging Face Hub transfer acceleration</strong> (to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading">5. Mount persistent volumes</h5>



<p>Mounts <strong>two persistent storage volumes</strong>:</p>



<ul class="wp-block-list">
<li><code>/hub</code> → Stores Hugging Face model files</li>



<li><code>/workspace</code> → Main working directory</li>
</ul>



<p>The <code>rw</code> flag means <strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading">6. Choose the target Docker image</h5>



<p>Uses the <strong><code>vllm/vllm-openai:v0.8.2</code></strong> Docker image (a pre-configured vLLM OpenAI API server).</p>



<p><strong><code>vllm/vllm-openai:v0.8.2</code></strong></p>



<h5 class="wp-block-heading">7. Run the model inside the container</h5>



<p>Runs a<strong> bash shell</strong> inside the container and executes a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong> → Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Mistral-Small-24B-Instruct-2501</code></strong> → Loads the <strong>Mistral Small 24B</strong> model from Hugging Face</li>



<li><strong><code>--tensor-parallel-size 2</code></strong> → Distributes the model across <strong>2 GPUs</strong></li>



<li><strong><code>--tokenizer_mode mistral</code></strong> → Uses the <strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong> → Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong> → Ensures the model configuration follows Mistral&#8217;s standard</li>



<li><strong><code>--dtype half</code></strong> → Uses <strong>FP16 (half-precision floating point)</strong> for optimized GPU performance</li>
</ul>



<p>You can now check if your <strong>AI Deploy</strong> app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p>💡<strong>Is your app in <code>RUNNING</code> status?</strong> Perfect! You can check in the logs that the server is started&#8230;</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">⚠️WARNING!</mark></strong> This step may take a little time as the model must be loaded&#8230;<br>After a few minutes, you should get the following information in the logs:</p>



<pre class="wp-block-code"><code class="">2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Started server process [13]
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Waiting for application startup.
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Application startup complete.
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)</code></pre>



<p>🚦 <strong>Are all the indicators green? </strong>Then it&#8217;s off to inference!</p>



<h3 class="wp-block-heading">Request and send prompt to the LLM</h3>



<p>Launch the following query by asking the question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \
  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}
    ],
    "stream": false
  }'</code></pre>



<p>Returning the following result:</p>



<pre class="wp-block-code"><code class="">{
  "id":"chatcmpl-d6ea734b524bd851668e71d4111ba496",
  "object":"chat.completion",
  "created":1740059807,
  "model":"mistralai/Mistral-Small-24B-Instruct-2501",
  "choices":[
    {
      "index":0,
      "message":{
        "role":"assistant",
        "reasoning_content":null, 
        "content":"The founder of OVHcloud is Octave Klaba.",
        "tool_calls":[]
      },
      "logprobs":null,
      "finish_reason":"stop",
      "stop_reason":null
    }
  ],
  "usage":{
    "prompt_tokens":22,
    "total_tokens":35,
    "completion_tokens":13,
    "prompt_tokens_details":null
  },
  "prompt_logprobs":null
}</code></pre>
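<p>If you want to script against this endpoint, the assistant&#8217;s answer can be pulled out of the JSON response directly in the shell. A minimal sketch (a proper JSON parser such as <code>jq</code> is more robust; this <code>sed</code> one-liner only handles simple, unescaped <code>content</code> fields, and the sample response below is hard-coded for illustration):</p>

```shell
# Extract the assistant's answer from a chat completion response.
# NOTE: the response is hard-coded here for illustration; in practice,
# pipe the output of the curl command shown above into the same sed command.
response='{"choices":[{"message":{"content":"The founder of OVHcloud is Octave Klaba."}}]}'
answer=$(printf '%s' "$response" | sed -n 's/.*"content":"\([^"]*\)".*/\1/p')
echo "$answer"
```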



<h2 class="wp-block-heading">Conclusion</h2>



<p>By following these steps, you have successfully deployed the <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> model using <strong>vLLM</strong> on OVHcloud&#8217;s AI Deploy platform. This setup provides a scalable and efficient solution for serving advanced language models in production environments.</p>



<p>For further customization and optimization, refer to the <a href="https://help.ovhcloud.com/csm/en-ie-documentation-public-cloud-ai-and-machine-learning-ai-deploy?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=3241efc6a052d910f078d4b4ef43651f&amp;spa=1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM documentation</a> and OVHcloud AI Deploy resources.</p>



<p>💪 <strong>Challenges taken!</strong> You can now enjoy the power of your LLM deployed in a single command line!</p>



<p>Want even more simplicity? You can also use ready-to-use APIs with <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>!</p>



<p><strong><em>But… what’s next?</em></strong></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Enhancing Kubernetes Security: Detecting Threats in OVHcloud Managed Kubernetes cluster (MKS) Audit Logs with Falco</title>
		<link>https://blog.ovhcloud.com/enhancing-kubernetes-security-detecting-threats-in-ovhcloud-managed-kubernetes-cluster-mks-audit-logs-with-falco/</link>
		
		<dc:creator><![CDATA[Aurélie Vache]]></dc:creator>
		<pubDate>Tue, 11 Feb 2025 08:58:40 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=27886</guid>

					<description><![CDATA[Several months ago we discovered Falco, a Cloud Native, near real-time threat detection tool, and we saw how to install it on an OVHcloud MKS cluster. Today we will connect our Falco instance to a MKS cluster in order to retrieve Kubernetes Audit Logs events and check that everything is OK in our cluster. Concretely, [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="484" src="https://blog.ovhcloud.com/wp-content/uploads/2025/02/falco-blogpost-plugin-mks-1-1024x484.jpg" alt="" class="wp-image-28194" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/02/falco-blogpost-plugin-mks-1-1024x484.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/falco-blogpost-plugin-mks-1-300x142.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/falco-blogpost-plugin-mks-1-768x363.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/falco-blogpost-plugin-mks-1-1536x725.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/falco-blogpost-plugin-mks-1.jpg 1749w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Several months ago we discovered <a href="https://falco.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Falco</a>, a Cloud Native, near real-time threat detection tool, and we saw <a href="https://blog.ovhcloud.com/near-real-time-threats-detection-with-falco-on-ovhcloud-managed-kubernetes/" data-wpel-link="internal">how to install it on an OVHcloud MKS cluster</a>.</p>



<p>Today we will connect our Falco instance to a MKS cluster in order to retrieve <strong>Kubernetes Audit Logs</strong> events and check that everything is OK in our cluster.</p>



<p>Concretely, in this blog post we will:</p>



<ul class="wp-block-list">
<li>deploy an OVHcloud LDP (Logs Data Platform)</li>



<li>create a data stream into this LDP</li>



<li>connect an OVHcloud MKS cluster to the data stream (to send Audit Logs into it)</li>



<li>use the <strong>k8saudit-ovh</strong> Falco plugin to retrieve the Audit Logs of a MKS cluster in real time</li>



<li>test a rule and detect security events based on MKS audit logs activity</li>
</ul>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>This blog post presupposes that you already have a working&nbsp;<a href="https://www.ovhcloud.com/fr/public-cloud/kubernetes/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud Managed Kubernetes</a>&nbsp;(MKS) cluster, and a running instance of Falco.</p>



<p>If that is not the case, follow the <a href="https://blog.ovhcloud.com/near-real-time-threats-detection-with-falco-on-ovhcloud-managed-kubernetes/" data-wpel-link="internal">Near real-time threats detection with Falco on OVHcloud Managed Kubernetes</a> blog post.</p>



<h2 class="wp-block-heading">Deploying a Logs Data Platform (LDP)</h2>



<p>LDP is OVHcloud&#8217;s managed platform for collecting, processing, analyzing and storing the logs of OVHcloud products. To access our Kubernetes cluster&#8217;s Audit Logs, we need to deploy an LDP.</p>



<p>Find more information on our dedicated&nbsp;<a href="https://www.ovhcloud.com/en/identity-security-operations/logs-data-platform/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">LDP page</a>.</p>



<p>We can deploy an LDP through the OVHcloud Control Panel or the API. In this blog post, we will deploy it through the Control Panel.</p>



<p>First, you have to log in to the&nbsp;<a href="https://www.ovh.com/manager/#/dedicated/dbaas/logs/order" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">OVHcloud Control Panel</a>, click on the <strong>Bare Metal Cloud</strong> section located at the top in the header and then click on the <strong>Logs Data Platform</strong> in the sidebar.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="529" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-1-1024x529.png" alt="" class="wp-image-27901" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-1-1024x529.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-1-300x155.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-1-768x396.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-1-1536x793.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-1-2048x1057.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Choose the LDP plan you want: <em>Standard</em> (free) or <em>Enterprise</em>, depending on your needs.</p>



<p>Select a <strong>region</strong> (<em>North America</em> or <em>Europe</em>). We will choose &#8220;<strong>GRA</strong>&#8221; for this blog post. Click on the <strong>Order</strong> button and follow the instructions.</p>



<p>After several minutes your LDP will be created. </p>



<p>Refresh the page, click on the newly deployed LDP, then enter a password and click on the <strong>Save</strong> button.</p>



<h2 class="wp-block-heading">Creating a Data stream and retrieving the Websocket URL</h2>



<p>Our Kubernetes Audit Logs will be stored in a data stream so click on the <strong>Data stream</strong> tab and then click on the <strong>Add data stream</strong> button.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="466" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-3-1024x466.png" alt="" class="wp-image-27905" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-3-1024x466.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-3-300x137.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-3-768x350.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-3-1536x700.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-3-2048x933.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Choose a name for the data stream. I like to use the name of my MKS cluster followed by &#8220;-audit-logs&#8221;, so it is easy to see what the data stream is for. My MKS cluster&#8217;s name is &#8220;my-rancher-mks-cluster&#8221;, so let&#8217;s name it &#8220;my-rancher-mks-cluster-audit-logs&#8221;. Fill in the description (mandatory).</p>



<p>The OVHcloud Audit Logs Falco plugin you will use receives the audit logs through a Websocket, so you need to enable <strong>Websocket broadcasting</strong>, then click on the <strong>Save</strong> button.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="730" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-5-1024x730.png" alt="" class="wp-image-27909" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-5-1024x730.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-5-300x214.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-5-768x548.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-5-1536x1095.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-5-2048x1460.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Now, to retrieve the Websocket URL of your data stream, click on the <strong>Data stream</strong> tab, then click on the <strong>&#8230;</strong> button (on the right of your data stream&#8217;s row), and click on the <strong>Monitor in real time</strong> action.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="674" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-6-1024x674.png" alt="" class="wp-image-27913" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-6-1024x674.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-6-300x197.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-6-768x505.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-6-1536x1011.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-6-2048x1347.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Finally, click on the <strong>Action</strong> button, then on <strong>Copy Websocket address</strong>, and save the LDP Websocket URL somewhere ;-).</p>



<p>Note that the Websocket address has the following format: <code>wss://&lt;region&gt;.logs.ovh.com/tail/?tk=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx</code></p>
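<p>As a small sketch, you can derive the value expected by the <strong>k8saudit-ovh</strong> Falco plugin from this address by stripping the <code>wss://</code> scheme (the plugin&#8217;s <code>open_params</code> takes the address without the scheme); the region and token below are placeholders, not real values:</p>

```shell
# Hypothetical Websocket address copied from the Control Panel (placeholder token).
LDP_WSS_URL='wss://gra1.logs.ovh.com/tail/?tk=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx'

# The plugin's open_params expects the address without the wss:// scheme.
OPEN_PARAMS="${LDP_WSS_URL#wss://}"
echo "$OPEN_PARAMS"
```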



<h2 class="wp-block-heading">Connect a MKS cluster to a LDP data stream</h2>



<p>Now we need to send the Kubernetes Audit Logs of our MKS cluster into the data stream.</p>



<p>For that, in the OVHcloud Control Panel, click on the <strong>Public Cloud</strong> section in the header and then in <strong>Managed Kubernetes Service</strong> in the sidebar.</p>



<p>Click on your Kubernetes cluster (my-rancher-mks-cluster for example), then in the <strong>Logs</strong> tab and click on the <strong>Subscribe</strong> button.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="500" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-7-1024x500.png" alt="" class="wp-image-27917" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-7-1024x500.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-7-300x146.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-7-768x375.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-7-1536x750.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-7.png 2040w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click on the <strong>Add data stream</strong> button to visualize the Audit Logs of your cluster in real time. Then select the LDP instance and click on the <strong>Subscribe</strong> button for the data stream you created:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="544" src="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-8-1024x544.png" alt="" class="wp-image-27918" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-8-1024x544.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-8-300x159.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-8-768x408.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-8-1536x815.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/01/image-8.png 2046w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Retrieve the MKS Audit Logs with Falco</h2>



<p>Falco can receive <strong>Events</strong>, compare them to a set of <strong>Rules</strong> to determine the actions to perform and generate <strong>Alerts</strong> to different endpoints. </p>



<p>Thanks to the <strong>k8saudit-ovh</strong> plugin, Falco can receive a new sort of <strong>Events</strong>: the Audit Logs of your MKS cluster. These events also come with their own <a href="https://github.com/falcosecurity/plugins/blob/main/plugins/k8saudit/rules/k8s_audit_rules.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">set of rules</a>.</p>



<p>Concretely, when a user executes <strong>kubectl</strong> commands in an OVHcloud MKS cluster, Audit Logs are generated. Falco listens to them and, depending on the configured rules, generates alerts.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="961" height="327" src="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image.png" alt="" class="wp-image-28190" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image.png 961w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image-300x102.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image-768x261.png 768w" sizes="auto, (max-width: 961px) 100vw, 961px" /></figure>



<p>Let&#8217;s install (or update) Falco in a MKS cluster with a configuration that uses this plugin.</p>



<p>Create a <strong>values.yaml</strong> file with the following content:</p>



<pre class="wp-block-code"><code class="">tty: true
kubernetes: false

# Just a Deployment with 1 replica (instead of a Daemonset) to have only one Pod that pulls the MKS Audit Logs from a OVHcloud LDP
controller:
  kind: deployment
  deployment:
    replicas: 1

falco:
  rule_matching: all
  rules_files:
    - /etc/falco/k8s_audit_rules.yaml
    - /etc/falco/rules.d
  plugins:
    - name: k8saudit-ovh
      library_path: libk8saudit-ovh.so
      open_params: "&lt;region&gt;.logs.ovh.com/tail/?tk=&lt;ID&gt;" # Replace with your LDP Websocket URL
    - name: json
      library_path: libjson.so
      init_config: ""
  # Plugins that Falco will load. Note: the same plugins are installed by the falcoctl-artifact-install init container.
  load_plugins: [k8saudit-ovh, json]

driver:
  enabled: false
collectors:
  enabled: false

# use falcoctl to install automatically the plugin and the rules
falcoctl:
  artifact:
    install:
      enabled: true
    follow:
      enabled: true
  config:
    indexes:
    - name: falcosecurity
      url: https://falcosecurity.github.io/falcoctl/index.yaml
    artifact:
      allowedTypes:
        - plugin
        - rulesfile
      install:
        resolveDeps: false
        refs: [k8saudit-rules:0, k8saudit-ovh:0.1, json:0]
      follow:
        refs: [k8saudit-rules:0]</code></pre>



<p>This <strong>values.yaml </strong>file will install Falco with the <strong>k8saudit-ovh</strong> and the <strong>json</strong> plugins. </p>



<p>Install the latest version of Falco with the&nbsp;<code>helm install</code>&nbsp;command:</p>



<pre class="wp-block-code"><code class="">$ helm install falco --create-namespace --namespace falco --values=values.yaml falcosecurity/falco</code></pre>



<p>This command will install the latest version of Falco, with the k8saudit-ovh and json plugins, and create a new&nbsp;<code>falco</code>&nbsp;namespace:</p>



<pre class="wp-block-code"><code class="">$ helm install falco --create-namespace --namespace falco --values=values.yaml falcosecurity/falco

NAME: falco
LAST DEPLOYED: Mon Feb 10 10:15:20 2025
NAMESPACE: falco
STATUS: deployed
REVISION: 1
NOTES:
No further action should be required.</code></pre>



<p>Or, if you already have Falco deployed in a Kubernetes cluster, you can use the <code>helm upgrade</code> command instead:</p>



<pre class="wp-block-code"><code class="">$ helm upgrade falco --create-namespace --namespace falco --values=values.yaml falcosecurity/falco</code></pre>



<p>You can check if the Falco pods are correctly running:</p>



<pre class="wp-block-code"><code class="">$ kubectl get pods -n falco

NAME                                      READY   STATUS    RESTARTS   AGE
falco-6b8bc77d8b-v24jr                    2/2     Running   0          96s
falco-falcosidekick-67877d6946-4hmbn      1/1     Running   0          96s
falco-falcosidekick-67877d6946-tpjk6      1/1     Running   0          96s
falco-falcosidekick-ui-78b96fd57d-4wb6q   1/1     Running   0          96s
falco-falcosidekick-ui-78b96fd57d-v7rnm   1/1     Running   0          96s
falco-falcosidekick-ui-redis-0            1/1     Running   0          96s</code></pre>



<p>Wait and execute the command again if the pods are in “Init” or “ContainerCreating” state.</p>



<p>Once the Falco pod is ready, run the following command to see the logs:</p>



<pre class="wp-block-code"><code class="">kubectl logs -l app.kubernetes.io/name=falco -n falco -c falco</code></pre>



<p>You should see logs like this:</p>



<pre class="wp-block-code"><code class="">$ kubectl logs -l app.kubernetes.io/name=falco -n falco -c falco

Mon Feb 10 09:15:35 2025:    /etc/falco/k8s_audit_rules.yaml | schema validation: ok
Mon Feb 10 09:15:35 2025: Hostname value has been overridden via environment variable to: my-pool-1-node-921b61
Mon Feb 10 09:15:35 2025: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Mon Feb 10 09:15:35 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Mon Feb 10 09:15:35 2025: Loaded event sources: syscall, k8s_audit
Mon Feb 10 09:15:35 2025: Enabled event sources: k8s_audit
Mon Feb 10 09:15:35 2025: Opening 'k8s_audit' source with plugin 'k8saudit-ovh'
{"hostname":"my-pool-1-node-921b61","output":"09:15:40.698757000: Warning K8s Operation performed by user not in allowed list of users (user=csi-cinder-controller target=csi-6afb06dce281b86b7bab718b5d966dc261b2b1554941ae449519a128cb2e3fb3/volumeattachments verb=patch uri=/apis/storage.k8s.io/v1/volumeattachments/csi-6afb06dce281b86b7bab718b5d966dc261b2b1554941ae449519a128cb2e3fb3/status resp=200)","output_fields":{"evt.time":1739178940698757000,"ka.response.code":"200","ka.target.name":"csi-6afb06dce281b86b7bab718b5d966dc261b2b1554941ae449519a128cb2e3fb3","ka.target.resource":"volumeattachments","ka.uri":"/apis/storage.k8s.io/v1/volumeattachments/csi-6afb06dce281b86b7bab718b5d966dc261b2b1554941ae449519a128cb2e3fb3/status","ka.user.name":"csi-cinder-controller","ka.verb":"patch"},"priority":"Warning","rule":"Disallowed K8s User","source":"k8s_audit","tags":["k8s"],"time":"2025-02-10T09:15:40.698757000Z"}
{"hostname":"my-pool-1-node-921b61","output":"09:15:57.508657000: Warning K8s Operation performed by user not in allowed list of users (user=yacht target=my-pool-1.18051c0a88716868/events verb=patch uri=/api/v1/namespaces/default/events/my-pool-1.18051c0a88716868 resp=403)","output_fields":{"evt.time":1739178957508657000,"ka.response.code":"403","ka.target.name":"my-pool-1.18051c0a88716868","ka.target.resource":"events","ka.uri":"/api/v1/namespaces/default/events/my-pool-1.18051c0a88716868","ka.user.name":"yacht","ka.verb":"patch"},"priority":"Warning","rule":"Disallowed K8s User","source":"k8s_audit","tags":["k8s"],"time":"2025-02-10T09:15:57.508657000Z"}
{"hostname":"my-pool-1-node-921b61","output":"09:15:57.807013000: Warning K8s Operation performed by user not in allowed list of users (user=yacht target=my-pool-1/nodepools verb=update uri=/apis/kube.cloud.ovh.com/v1alpha1/nodepools/my-pool-1/status resp=200)","output_fields":{"evt.time":1739178957807013000,"ka.response.code":"200","ka.target.name":"my-pool-1","ka.target.resource":"nodepools","ka.uri":"/apis/kube.cloud.ovh.com/v1alpha1/nodepools/my-pool-1/status","ka.user.name":"yacht","ka.verb":"update"},"priority":"Warning","rule":"Disallowed K8s User","source":"k8s_audit","tags":["k8s"],"time":"2025-02-10T09:15:57.807013000Z"}</code></pre>



<p>The logs confirm that Falco <strong>k8saudit-ovh</strong> plugin and the <strong>k8saudit</strong> rules have been loaded correctly 💪.</p>
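<p>To get a quick overview of which rules are firing, you can summarize the alerts by rule name straight from the pod logs. A minimal sketch (the sample lines below are trimmed stand-ins for the real JSON alerts; on the cluster you would pipe <code>kubectl logs -l app.kubernetes.io/name=falco -n falco -c falco</code> into the same function; <code>jq</code> would be cleaner if installed):</p>

```shell
# Count Falco alerts per rule name from JSON alert lines on stdin.
summarize_rules() {
  sed -n 's/.*"rule":"\([^"]*\)".*/\1/p' | sort | uniq -c | sort -rn
}

# Trimmed sample alert lines, for illustration only.
summary=$(printf '%s\n' \
  '{"rule":"Disallowed K8s User","priority":"Warning"}' \
  '{"rule":"Disallowed K8s User","priority":"Warning"}' \
  '{"rule":"Attach/Exec Pod","priority":"Notice"}' \
  | summarize_rules)
echo "$summary"
```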



<h2 class="wp-block-heading">Testing Falco</h2>



<p>In order to test Falco we need to know which rules are installed by default. In our case, as defined in the values.yaml file, the <strong>k8saudit-ovh</strong> plugin follows the rules in the <a href="https://github.com/falcosecurity/plugins/blob/main/plugins/k8saudit/rules/k8s_audit_rules.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">k8s_audit_rules.yaml</a> file. You can take a look at it to get familiar with them.</p>



<p>In this blog post we will test one of the well-known default k8s audit rules:</p>



<pre class="wp-block-code"><code class="">- rule: Attach/Exec Pod
  desc: &gt;
    Detect any attempt to attach/exec to a pod
  condition: kevt_started and pod_subresource and (kcreate or kget) and ka.target.subresource in (exec,attach) and not user_known_exec_pod_activities
  output: Attach/Exec to pod (user=%ka.user.name pod=%ka.target.name resource=%ka.target.resource ns=%ka.target.namespace action=%ka.target.subresource command=%ka.uri.param[command])
  priority: NOTICE
  source: k8s_audit
  tags: [k8s]</code></pre>



<p>This rule is interesting because an event will be generated when a user executes commands in a pod.</p>



<p>Let&#8217;s test the rule!</p>



<p>In a tab of your terminal, watch the coming logs:</p>



<pre class="wp-block-code"><code class="">$ kubectl logs -l app.kubernetes.io/name=falco -n falco -c falco -f</code></pre>



<p>In another tab of your terminal, create a Nginx pod and execute a command into it:</p>



<pre class="wp-block-code"><code class="">$ kubectl run nginx --image=nginx

$ kubectl exec -it nginx -- cat /etc/shadow</code></pre>



<p>Several seconds later, you should see this <strong>Attach/Exec to pod</strong> alert in the logs:</p>



<pre class="wp-block-code"><code class="">...
{"hostname":"my-pool-1-node-921b61","output":"09:29:46.302906000: Notice Attach/Exec to pod (user=kubernetes-admin pod=nginx-676b6c5bbc-4xc6t resource=pods ns=hello-app action=exec command=cat)","output_fields":{"evt.time":1739179786302906000,"ka.target.name":"nginx-676b6c5bbc-4xc6t","ka.target.namespace":"hello-app","ka.target.resource":"pods","ka.target.subresource":"exec","ka.uri.param[command]":"cat","ka.user.name":"kubernetes-admin"},"priority":"Notice","rule":"Attach/Exec Pod","source":"k8s_audit","tags":["k8s"],"time":"2025-02-10T09:29:46.302906000Z"}
...</code></pre>



<p>🎉</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Ensuring the security of Kubernetes clusters is important. Audit Logs contain a lot of information that often goes unused, so don&#8217;t hesitate to use this new plugin.</p>



<p>We installed the new k8saudit-ovh plugin in an OVHcloud MKS cluster, but note that you can deploy it in a Kubernetes cluster at another Cloud provider, and even in a Falco instance running locally 💪.</p>



<p>We visualized the logs/the events in the terminal but you can also visualize them in the <a href="https://github.com/falcosecurity/falcosidekick" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">sidekick</a> UI, create a custom rule and even use <a href="https://github.com/falcosecurity/falco-talon" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Talon</a> to execute some actions.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Backdoor in xz/liblzma (CVE-2024-3094)</title>
		<link>https://blog.ovhcloud.com/backdoor-in-xz-liblzma-cve-2024-3094/</link>
		
		<dc:creator><![CDATA[Julien Levrard]]></dc:creator>
		<pubDate>Tue, 02 Apr 2024 12:37:49 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=26523</guid>

					<description><![CDATA[On March 29th, Andres Freund, a Postgres developer working at Microsoft, noticed that response times while authenticating to OpenSSH on a Debian Sid installation were about 500 ms longer than usual. He investigated the behaviour and concluded that liblzma, part of the xz library, was compromised by a complex backdoor injected into distribution packages [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>On March 29th, <a href="https://twitter.com/AndresFreundTec" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Andres Freund</a>, a Postgres developer working at Microsoft, noticed that response times while authenticating to OpenSSH on a Debian Sid installation were about 500 ms longer than usual. <a href="https://www.openwall.com/lists/oss-security/2024/03/29/4" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">He investigated</a> the behaviour and concluded that liblzma, part of the xz library, was compromised by a complex backdoor injected into distribution packages during build. Versions 5.6.0 and 5.6.1 of the library are impacted. Further investigations led to the discovery of an elaborate supply chain attack. The maintainer team seems to have been infiltrated over a long period of time (several years) by malevolent actors.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="967" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2024/04/cri-967x1024.png" alt="" class="wp-image-26531" style="width:400px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/04/cri-967x1024.png 967w, https://blog.ovhcloud.com/wp-content/uploads/2024/04/cri-283x300.png 283w, https://blog.ovhcloud.com/wp-content/uploads/2024/04/cri-768x814.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2024/04/cri.png 1011w" sizes="auto, (max-width: 967px) 100vw, 967px" /></figure>



<p>The story of this backdoor deserves a deep analysis, which is out of scope here, but it raises a lot of questions for open-source communities and the whole IT sector.</p>



<h2 class="wp-block-heading">What systems are impacted?</h2>



<p>Since the vulnerability was detected relatively quickly, no major distribution had yet integrated those versions of the XZ library.</p>



<p>Only the distributions with a very fast pace of software integration (Rolling releases, testing, so-called &#8220;unstable&#8221;) had integrated the corrupted version at detection time.</p>



<h2 class="wp-block-heading">As an OVHcloud customer, what are the risks?</h2>



<p>No Linux images provided by OVHcloud to customers for automated installation are impacted, so no customer should be vulnerable to this backdoor when using images provided by OVHcloud.</p>



<p>In some corner cases, the backdoor might have been installed on your system:</p>



<ul class="wp-block-list">
<li>If you installed a vulnerable distribution yourself, during the timespan when the compromise was not yet discovered, outside of the OVHcloud automated installation process (for instance, a Linux distribution in &#8220;rolling release&#8221; mode)</li>
</ul>



<ul class="wp-block-list">
<li>If you activated edge repositories on your&nbsp;system (for instance, &#8220;experimental&#8221;, &#8220;unstable&#8221; or &#8220;testing&#8221; for Debian, &#8220;edge&#8221; for Alpine, &#8220;update-proposed&#8221; for Ubuntu)</li>
</ul>



<ul class="wp-block-list">
<li>If you installed software that packages the vulnerable version of the library</li>
</ul>



<ul class="wp-block-list">
<li>If you use an alternative package manager&nbsp;(for instance Homebrew)</li>
</ul>



<p>The backdoor is quite complex, so even in such cases, you might have deployed the corrupted version of the XZ library without your system actually being vulnerable. Refer to your distribution/software security advisory page to get more information.</p>



<h2 class="wp-block-heading">How can I check if I use a backdoored version of the library?</h2>



<p>Check your active version of the XZ library:</p>



<pre class="wp-block-code"><code class="">debian@lab:~$ strings `which xz` | grep "(XZ Utils)"
xz (XZ Utils) 5.2.5</code></pre>



<p>Note: the command &#8220;xz -V&#8221; would provide similar output. However, it is not good practice to execute a binary that might be compromised.</p>



<p>Ensure the active version of the XZ library is not one of the known vulnerable ones (5.6.0 and 5.6.1). If you have a compromised version of XZ, follow the security recommendations from your distribution. In some cases, a patch has been released to correct the vulnerability; in other cases, reverting to an older version of the library is recommended.</p>
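<p>If you need to run this check on many machines, the comparison can be wrapped in a small helper. A minimal sketch, feeding it the version string reported by the <code>strings</code> command above rather than executing the binary itself:</p>

```shell
# Flag the known-backdoored xz versions (5.6.0 and 5.6.1).
# This only checks the version string; it does not tell you whether
# the system is actually exploitable (see your distribution's advisory).
check_xz_version() {
  case "$1" in
    5.6.0|5.6.1) echo "COMPROMISED: xz $1, follow your distribution's advisory" ;;
    *)           echo "OK: xz $1 is not a known-backdoored version" ;;
  esac
}

check_xz_version 5.2.5
check_xz_version 5.6.1
```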



<h2 class="wp-block-heading">In any case, apply the following recommendations:</h2>



<ul class="wp-block-list">
<li>Reduce the exposure of your server&#8217;s administration interfaces: filter at the network level which source IPs are allowed to connect over SSH.</li>
<li>Use a bastion host to connect to your server for administration (for instance: <a href="https://github.com/ovh/the-bastion" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/the-bastion</a>).</li>
<li>Perform regular backups of your data and system configurations, and regularly test your ability to rebuild your service from those backups.</li>
</ul>
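As one possible way to implement the SSH filtering above, here is a hedged nftables ruleset fragment (a sketch, not OVHcloud guidance; the table/chain names and the admin address 203.0.113.10, taken from the documentation range, are placeholders to replace with your own):

```
table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;

    # Accept SSH only from the trusted admin address (placeholder)
    ip saddr 203.0.113.10 tcp dport 22 accept

    # Drop SSH connection attempts from everywhere else
    tcp dport 22 drop
  }
}
```

Such a fragment could be loaded with `nft -f <file>`; adapt it to your existing firewall setup rather than applying it as-is.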



<h2 class="wp-block-heading">External references:</h2>



<p><a href="https://www.openwall.com/lists/oss-security/2024/03/29/4" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.openwall.com/lists/oss-security/2024/03/29/4</a></p>



<p><a href="https://www.cisa.gov/news-events/alerts/2024/03/29/reported-supply-chain-compromise-affecting-xz-utils-data-compression-library-cve-2024-3094" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.cisa.gov/news-events/alerts/2024/03/29/reported-supply-chain-compromise-affecting-xz-utils-data-compression-library-cve-2024-3094</a></p>



<p><a href="https://lists.debian.org/debian-security-announce/2024/msg00057.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://lists.debian.org/debian-security-announce/2024/msg00057.html</a></p>



<p><a href="https://news.opensuse.org/2024/03/29/xz-backdoor/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://news.opensuse.org/2024/03/29/xz-backdoor/</a></p>



<p><a href="https://access.redhat.com/security/cve/CVE-2024-3094#cve-cvss-v3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://access.redhat.com/security/cve/CVE-2024-3094#cve-cvss-v3</a></p>



<p><a href="https://archlinux.org/news/the-xz-package-has-been-backdoored/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://archlinux.org/news/the-xz-package-has-been-backdoored/</a></p>



<p><a href="https://boehs.org/node/everything-i-know-about-the-xz-backdoor" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://boehs.org/node/everything-i-know-about-the-xz-backdoor</a></p>



<p><a href="https://gynvael.coldwind.pl/?lang=en&amp;id=782" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://gynvael.coldwind.pl/?lang=en&amp;id=782</a></p>



<p><a href="https://www.wiz.io/blog/cve-2024-3094-critical-rce-vulnerability-found-in-xz-utils" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.wiz.io/blog/cve-2024-3094-critical-rce-vulnerability-found-in-xz-utils</a></p>



<p><a href="https://research.swtch.com/xz-timeline" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://research.swtch.com/xz-timeline</a></p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
