<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OVHcloud Engineering Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/category/engineering/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Wed, 01 Apr 2026 12:56:38 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>OVHcloud Engineering Archives - OVHcloud Blog</title>
	<link></link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Extract Text from Images with OCR using Python and OVHcloud AI Endpoints</title>
		<link>https://blog.ovhcloud.com/extract-text-from-images-with-ocr-using-python-and-ovhcloud-ai-endpoints/</link>
		
		<dc:creator><![CDATA[Stéphane Philippart]]></dc:creator>
		<pubDate>Wed, 01 Apr 2026 12:55:19 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30992</guid>

					<description><![CDATA[If you want to have more information on&#160;AI Endpoints, please read the&#160;following blog post.&#160;You can also have a look at our&#160;previous blog posts&#160;on how to use AI Endpoints. You can find the full code example in the GitHub repository. In this article,&#160;we will explore how to perform OCR&#160;(Optical Character Recognition)&#160;on images using a vision-capable LLM,&#160;the&#160;OpenAI Python library,&#160;and [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>If you want to have more information on&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>, please read the&nbsp;<a href="https://blog.ovhcloud.com/enhance-your-applications-with-ai-endpoints/" data-wpel-link="internal">following blog post</a>.</em>&nbsp;<em>You can also have a look at our&nbsp;<a href="https://blog.ovhcloud.com/tag/ai-endpoints/" data-wpel-link="internal">previous blog posts</a>&nbsp;on how to use AI Endpoints.</em></p>



<p><em>You can find the full code example in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints/python-ocr" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</em></p>



<p>In this article,&nbsp;we will explore how to perform OCR&nbsp;(Optical Character Recognition)&nbsp;on images using a vision-capable LLM,&nbsp;the&nbsp;<a href="https://github.com/openai/openai-python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenAI Python library</a>,&nbsp;and OVHcloud&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>.</p>



<h3 class="wp-block-heading">Introduction to OCR with Vision Models</h3>



<p>Optical Character Recognition has been around for decades,&nbsp;but traditional OCR engines often struggle with complex layouts,&nbsp;handwritten text,&nbsp;or noisy images.&nbsp;Vision-capable Large Language Models bring a new approach:&nbsp;instead of relying on specialized OCR pipelines,&nbsp;you can simply send an image to a model that understands both visual and textual content.</p>



<p>In this example,&nbsp;we use the&nbsp;<a href="https://github.com/openai/openai-python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenAI Python library</a>&nbsp;to create a simple OCR script powered by a vision model hosted on OVHcloud&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>.</p>



<p>The whole application is a single Python file: no complex setup, just <code><strong>pip install openai</strong></code> and you&#8217;re ready to go.</p>



<h3 class="wp-block-heading">Setting up the Environment Variables</h3>



<p>Before running the script, you need to set the following environment variables:</p>



<pre title="Environment variables" class="wp-block-code"><code lang="" class=" line-numbers">export OVH_AI_ENDPOINTS_ACCESS_TOKEN="your-access-token"<br>export OVH_AI_ENDPOINTS_MODEL_URL="https://your-model-url"<br>export OVH_AI_ENDPOINTS_VLLM_MODEL="your-vision-model-name"</code></pre>



<p>You can find how to create your access token, model URL, and model name in the <a href="https://endpoints.ai.cloud.ovh.net/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints catalog</a>. Make sure to choose a <strong>vision-capable model</strong>.</p>
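<p>Since all three variables are required, a quick sanity check at startup saves a confusing API error later. Here is a minimal sketch (the <code>check_env</code> helper is our own addition, not part of the article&#8217;s script):</p>

```python
import os

# The three variables the script reads (matching the export commands above).
REQUIRED_VARS = (
    "OVH_AI_ENDPOINTS_ACCESS_TOKEN",
    "OVH_AI_ENDPOINTS_MODEL_URL",
    "OVH_AI_ENDPOINTS_VLLM_MODEL",
)

def check_env(env=os.environ):
    """Return the names of the required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# At startup, fail fast with a clear message instead of an opaque API error:
# missing = check_env()
# if missing:
#     raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```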



<h3 class="wp-block-heading">Installing Dependencies</h3>



<p>The only dependency is the OpenAI Python library:</p>



<pre title="OpenAI dependency" class="wp-block-code"><code lang="bash" class="language-bash">pip install openai</code></pre>



<h3 class="wp-block-heading">Define the System Prompt</h3>



<p>The first step is to define a system prompt that describes what our OCR service does.&nbsp;This prompt tells the model how to behave:</p>



<pre title="System prompt" class="wp-block-code"><code lang="" class=" line-numbers">SYSTEM_PROMPT = """You are an expert OCR engine.<br>Extract every piece of text visible in the provided image.<br>Preserve the original layout as faithfully as possible (line breaks, columns, tables).<br>Do NOT interpret, summarise, or translate the content.<br>Use markdown formatting to represent the layout (e.g. tables, lists).<br>If the image contains no text, reply with: "No text found."<br>"""</code></pre>



<p>We tell it to behave as an expert OCR engine, to preserve the original layout, and to use markdown formatting for structured content like tables or lists.</p>



<h3 class="wp-block-heading">Load the Image</h3>



<p>Before sending the image to the model,&nbsp;we need to encode it as a base64 string.&nbsp;Here is a simple helper function that reads a local PNG file and returns a base64-encoded string:</p>



<pre title="Image loading" class="wp-block-code"><code lang="" class=" line-numbers">import base64<br>from pathlib import Path<br><br>def load_image_as_base64(path: Path) -&gt; str:<br>    """Load a local image and encode it as base64."""<br>    with open(path, "rb") as f:<br>        return base64.b64encode(f.read()).decode("utf-8")</code></pre>



<p>The base64-encoded data is what gets sent to the vision model as part of the prompt.</p>
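<p>For reference, the data URL the model expects can be built from the raw bytes in one step. The <code>to_data_url</code> helper below is an illustrative addition (the article inlines this in the request instead):</p>

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as the data URL format used in the chat request."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

<p>The resulting string, e.g. <code>data:image/png;base64,&#8230;</code>, is exactly what goes into the <code>image_url</code> field of the request shown below.</p>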






<h3 class="wp-block-heading">Extract Text from the Image</h3>



<p>The <code><strong>extract_text</strong></code> function sends the image to the vision model and returns the extracted text:</p>



<pre title="Extract text from image" class="wp-block-code"><code lang="" class=" line-numbers">def extract_text(client: OpenAI, image_base64: str, model: str) -&gt; str:<br>    """Extract text from an image using the vision model."""<br>    response = client.chat.completions.create(<br>        model=model,<br>        temperature=0.0,<br>        messages=[<br>            {"role": "system", "content": SYSTEM_PROMPT},<br>            {<br>                "role": "user",<br>                "content": [<br>                    {<br>                        "type": "image_url",<br>                        "image_url": {<br>                            "url": f"data:image/png;base64,{image_base64}"<br>                        }<br>                    }<br>                ]<br>            }<br>        ]<br>    )<br>    return response.choices[0].message.content</code></pre>



<p>The image is passed as a data URL inside the <code><strong>image_url</strong></code> field, following the OpenAI Vision API format. The temperature is set to <code>0.0</code> because we want deterministic, faithful text extraction and not creative output.</p>
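<p>The script hard-codes <code>image/png</code> in the data URL. If you feed JPEG or WebP files as well, one option (our addition, not part of the original script) is to derive the MIME type from the file extension:</p>

```python
import mimetypes
from pathlib import Path

def guess_image_mime(path: Path) -> str:
    """Guess an image MIME type from the file extension, falling back to PNG."""
    mime, _ = mimetypes.guess_type(path.name)
    return mime if mime and mime.startswith("image/") else "image/png"
```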



<h3 class="wp-block-heading">Configure the Client</h3>



<p>This example uses a vision-capable model hosted on OVHcloud <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>. Since AI Endpoints exposes an OpenAI-compatible API, we use the <code>OpenAI</code> client and just point it to the OVHcloud endpoint:</p>



<pre title="Open AI client configuration" class="wp-block-code"><code lang="" class=" line-numbers">import os<br>from openai import OpenAI<br><br>client = OpenAI(<br>    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),<br>    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),<br>)<br><br>model_name = os.getenv("OVH_AI_ENDPOINTS_VLLM_MODEL")</code></pre>



<p>A few things to note:</p>



<ul class="wp-block-list">
<li>The <strong>API key</strong>, <strong>base URL</strong>, and <strong>model name</strong> are read from environment variables. </li>



<li>The OpenAI library works with any OpenAI-compatible API, which makes it a perfect fit for AI Endpoints.</li>
</ul>



<h3 class="wp-block-heading">Assemble and Run</h3>



<p>With the client configured, extracting text from an image is straightforward:</p>



<pre title="Run the OCR" class="wp-block-code"><code lang="" class=" line-numbers">image_base64 = load_image_as_base64(Path("./doc.png"))<br>result = extract_text(client, image_base64, model_name)<br>print(result)</code></pre>



<p>And that&#8217;s it!</p>



<p>Here is the image used for this example:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img fetchpriority="high" decoding="async" width="946" height="693" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1.png" alt="Used image for OCR example" class="wp-image-31002" style="width:600px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1.png 946w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1-300x220.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1-768x563.png 768w" sizes="(max-width: 946px) 100vw, 946px" /></figure>



<p>And the result:</p>



<pre title="Run the OCR" class="wp-block-code"><code lang="" class=" line-numbers">$ python ocr_demo.py<br>📄 Loading image: doc.png<br>🔍 Running OCR with Qwen2.5-VL-72B-Instruct via OVHcloud AI Endpoints...<br><br>📝 Extracted text 📝<br>Every month, the OVHcloud Developer Advocate team creates content, shares knowledge, and connects with the tech community. Here’s a look at what we did in March 2026. 🚀<br><br>🎙️ “Tranches de Tech” – Our monthly podcast<br><br>A new episode of our French-language podcast Tranches de Tech🥑 just dropped!<br><br>🎧 Episode 102: Tranches de Tech #26 – Architecte, c’est une bonne situation ça ?<br><br>This month we sat down with Alexandre Touret, Architect at Worldline to discuss the evolving role of software architects and the growing impact of AI on development practices. From Spotify’s claim that their devs no longer code, to agentic tools like OpenClaw and Claude Code reshaping workflows. We also cover ANSSI’s revised open-source policy, IBM tripling junior hires, and the critical responsibility of mentoring the next generation of developers in an AI-driven world.<br><br>📺 Live on Twitch<br><br>We streamed live on Twitch this month! Here’s what we covered:<br><br>🎥 Rémy Vandepoel discussed with Hugo Allabert and François Loiseau about our Public VCFaaS. Catch the replay on YouTube ▶️.<br><br>🎤 Conference Talks<br><br>The team hit the road (and the stage) at several conferences this month:<br><br>🇳🇱 KubeCon Amsterdam – Amsterdam, Netherlands 🇳🇱<br><br>Aurélie Vache gave a talk: The Ultimate Kubernetes Challenge: An Interactive Trivia Game</code></pre>



<h3 class="wp-block-heading">Conclusion</h3>



<p>In this article,&nbsp;we have seen how to use a vision-capable LLM to perform OCR on images using the&nbsp;<a href="https://github.com/openai/openai-python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenAI Python library</a>&nbsp;and OVHcloud&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>.&nbsp;The OpenAI library makes it very easy to send images to a vision model and extract text,&nbsp;and Python allows us to run the whole thing as a simple script.</p>



<p>There is a dedicated Discord channel&nbsp;(#<em>ai-endpoints</em>)&nbsp;on our Discord server&nbsp;(<em><a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://discord.gg/ovhcloud</a></em>),&nbsp;see you there!</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Secure your Software Supply Chain with OVHcloud Managed Private Registry (MPR)</title>
		<link>https://blog.ovhcloud.com/secure-your-software-supply-chain-with-ovhcloud-managed-private-registry-mpr/</link>
		
		<dc:creator><![CDATA[Aurélie Vache]]></dc:creator>
		<pubDate>Fri, 13 Feb 2026 16:40:51 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[OVHcloud Managed Private Registry]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30357</guid>

					<description><![CDATA[Before an application goes to production, it passes through several stages: source code, build, packaging and distribution. But malicious code &#8211; such as a compromised dependency, breached CI pipeline, or modified package in a registry &#8211; can be introduced at any point in the development cycle, potentially impacting thousands of projects. This is precisely where [&#8230;]]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full is-resized"><img decoding="async" width="1012" height="1011" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911.png" alt="" class="wp-image-30442" style="aspect-ratio:1.0009787401988517;width:437px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911.png 1012w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-768x767.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-70x70.png 70w" sizes="(max-width: 1012px) 100vw, 1012px" /></figure>



<p>Before an application goes to production, it passes through several stages: source code, build, packaging and distribution. But malicious code &#8211; such as a compromised dependency, a breached CI pipeline, or a modified package in a registry &#8211; can be introduced at any point in the development cycle, potentially impacting thousands of projects.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="581" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-1024x581.png" alt="" class="wp-image-30358" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-1024x581.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-768x436.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13.png 1292w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This is precisely where <strong>Software Supply Chain Security </strong>(SSCS) comes in: to protect not just the code itself, but also how it’s built, delivered, and utilised.</p>



<p>Attacks like SolarWinds and Log4Shell aren&#8217;t isolated incidents: they show how supply chain attacks have escalated in both frequency and severity.</p>



<figure class="wp-block-image aligncenter is-resized"><img loading="lazy" decoding="async" width="800" height="800" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry.png" alt="" class="wp-image-28658" style="width:145px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry.png 800w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-70x70.png 70w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>



<p>This blog post explores recommended solutions and best practices for <a href="https://www.ovhcloud.com/en/public-cloud/managed-rancher-service/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>OVHcloud Managed</u></a> <a href="https://www.ovhcloud.com/en/public-cloud/managed-rancher-service/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Private Registry</u></a> (MPR), an OCI-compliant artifact registry, to help you enhance your Software Supply Chain Security.</p>



<h3 class="wp-block-heading">Generate a Software Bill Of Materials (SBOM)</h3>



<p>An SBOM provides a list of all the ingredients (OS, libraries, code) that compose the images that will run on your Kubernetes cluster.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="383" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-1024x383.png" alt="" class="wp-image-30360" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-1024x383.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-300x112.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-768x287.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14.png 1256w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>From that list, you can find out more about the image, its vulnerabilities, and licenses.</p>



<h4 class="wp-block-heading">Generate an SBOM manually</h4>



<p>To manually generate an SBOM from your image, click the <strong>‘GENERATE SBOM’</strong> button:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="280" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-1024x280.png" alt="" class="wp-image-30361" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-1024x280.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-300x82.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-768x210.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-1536x420.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-2048x560.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Within seconds, the <em>SBOM </em>column for your image will display <em>“Queued”</em>, then change to <em>“Generating”</em>, and an <em>“SBOM details”</em> link will appear.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="226" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-1024x226.png" alt="" class="wp-image-30393" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-1024x226.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-300x66.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-768x170.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-1536x340.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-2048x453.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click the <strong>&#8216;SBOM details&#8217;</strong> link to view the SBOM:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="557" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-1024x557.png" alt="" class="wp-image-30367" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-1024x557.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-300x163.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-768x418.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-1536x835.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-2048x1114.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Your application’s SBOM is generated by <strong>Trivy </strong>in <strong>SPDX </strong>format. This item is then listed as an accessory for your image in the registry.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="130" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-1024x130.png" alt="" class="wp-image-30371" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-1024x130.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-300x38.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-768x98.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-1536x195.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-2048x260.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click the <strong>&#8216;sbom.harbor&#8217;</strong> accessory type for more details:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="629" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-1024x629.png" alt="" class="wp-image-30379" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-1024x629.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-300x184.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-768x472.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-1536x944.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-2048x1259.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
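<p>Because the SBOM is plain SPDX JSON, it is easy to post-process once downloaded. A short sketch (field names follow the SPDX 2.x JSON format; the sample document and the <code>list_packages</code> helper are illustrative, not part of the registry):</p>

```python
import json

def list_packages(sbom_json: str) -> list:
    """Return 'name@version' for each package listed in an SPDX-JSON SBOM."""
    doc = json.loads(sbom_json)
    return [f"{pkg['name']}@{pkg.get('versionInfo', '?')}"
            for pkg in doc.get("packages", [])]

# Example with a minimal SPDX-like document:
sample = '{"packages": [{"name": "openssl", "versionInfo": "3.0.11"}]}'
print(list_packages(sample))  # ['openssl@3.0.11']
```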



<h4 class="wp-block-heading">Generate an SBOM automatically</h4>



<p>Manually generating an SBOM is good practice, but automating the process is even better. The private registry can automatically generate the SBOM for you once an image is pushed to the desired project.</p>



<p>Click the project your image is part of, navigate to the <em>‘Configuration’</em> tab, then tick the <strong>SBOM generation </strong>checkbox:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-1024x538.png" alt="" class="wp-image-30365" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-1536x806.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-2048x1075.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Vulnerability scanning</h3>



<p>We recommend running vulnerability scans on the images to confirm that:</p>



<ul class="wp-block-list">
<li>the images provided are free of any known vulnerabilities (CVEs);</li>



<li>security patches are well integrated before deployment;</li>



<li>the images used in production comply with security and compliance policies.</li>
</ul>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="406" height="232" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-32.png" alt="" class="wp-image-30395" style="width:329px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-32.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-32-300x171.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure>



<p>There are several vulnerability scanners available, like <a href="https://trivy.dev/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Trivy</u></a>, <a href="https://docs.docker.com/scout/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Docker Scout</u></a>, and <a href="https://github.com/anchore/grype" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Grype</u></a>.</p>



<p>The OVHcloud Managed Private Registry uses Trivy as its default vulnerability scanner, but you can add more scanners if needed. Go to the <em>Administration</em> panel, click <em>‘<strong>Interrogation Services</strong>’</em>, then navigate to the <em>‘<strong>Scanners</strong>’</em> tab:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="437" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-1024x437.png" alt="" class="wp-image-30400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-1024x437.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-300x128.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-768x328.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-1536x655.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-2048x873.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">Scan your image manually</h4>



<p>To manually run a vulnerability scan on your image, go to your project and click the <strong>SCAN VULNERABILITIES</strong> button:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="186" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-1024x186.png" alt="" class="wp-image-30406" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-1024x186.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-300x55.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-768x140.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-1536x279.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-2048x372.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Within a few seconds, a scan will run and reveal any vulnerabilities detected in your image.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="442" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-1024x442.png" alt="" class="wp-image-30404" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-1024x442.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-300x129.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-768x331.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-1536x662.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-2048x883.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click your image to view the list of CVEs:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="557" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-1024x557.png" alt="" class="wp-image-30414" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-1024x557.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-300x163.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-768x418.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-1536x835.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-2048x1114.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
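<p>The Managed Private Registry is built on Harbor, so the same scan can also be triggered from a CI pipeline through Harbor&#8217;s v2 REST API. A sketch of the endpoint (path per the Harbor v2 API; the registry URL, project, and credentials below are placeholders):</p>

```python
def scan_url(registry: str, project: str, repo: str, ref: str) -> str:
    """Build the Harbor v2 API endpoint that queues a scan for one artifact."""
    return (f"{registry}/api/v2.0/projects/{project}"
            f"/repositories/{repo}/artifacts/{ref}/scan")

# A POST to this URL with your registry credentials queues the scan,
# e.g. with the requests library:
#   import requests
#   requests.post(scan_url("https://registry.example.com", "demo", "app", "latest"),
#                 auth=("user", "password"))
```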



<h4 class="wp-block-heading">Scan your image automatically</h4>



<p>To automatically scan images on push, click the project your image is part of, then the <em>‘Configuration’ </em>tab, and tick the <strong>‘Vulnerabilities scanning’</strong> checkbox:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="390" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-1024x390.png" alt="" class="wp-image-30408" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-1024x390.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-300x114.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-768x293.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-1536x585.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-2048x781.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">Schedule vulnerability scans</h4>



<p>Another way to stay informed is by configuring your vulnerability scanner to run scans every day. Go to the <em>Administration </em>panel, click <em>‘<strong>Interrogation</strong> <strong>Services</strong>’</em>, then the <em>‘<strong>Vulnerability</strong>’</em> tab:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="264" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-1024x264.png" alt="" class="wp-image-30401" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-1024x264.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-300x77.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-768x198.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-1536x396.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-2048x528.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can schedule the scan Hourly, Daily or Weekly, or customize exactly when the scan will be triggered.</p>



<p>Scheduled scans ensure that existing images are periodically analyzed for newly discovered vulnerabilities (CVEs).</p>



<h4 class="wp-block-heading">Prevent vulnerable images from running</h4>



<p>You can also configure a project to prevent vulnerable images from being pulled. To do so, tick the <strong>Prevent vulnerable images from running</strong> checkbox.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="206" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-1024x206.png" alt="" class="wp-image-30430" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-1024x206.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-300x60.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-768x154.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40.png 1424w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Select the severity level of vulnerabilities to prevent images from running, from None to Critical.</p>



<p>With this configuration, images cannot be pulled if they contain vulnerabilities with a severity equal to or higher than the selected level.</p>
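<p>As a sketch of this rule (illustrative shell, not OVHcloud's implementation): severities are ordered None &lt; Low &lt; Medium &lt; High &lt; Critical, and a pull is blocked as soon as the image's worst finding reaches the configured threshold:</p>

```shell
# Illustrative sketch of the pull-blocking rule (not OVHcloud code).
severity_rank() {
  case "$1" in
    None) echo 0 ;;
    Low) echo 1 ;;
    Medium) echo 2 ;;
    High) echo 3 ;;
    Critical) echo 4 ;;
  esac
}

threshold="High"        # severity selected in the project configuration
worst_found="Critical"  # worst severity reported by the scanner for the image

# An image is blocked when its worst finding reaches the threshold
if [ "$(severity_rank "$worst_found")" -ge "$(severity_rank "$threshold")" ]; then
  verdict="pull blocked"
else
  verdict="pull allowed"
fi
echo "$verdict"
```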



<h3 class="wp-block-heading">Exploitable vulnerabilities</h3>



<p>When a scanner finds vulnerabilities in your images, it does not necessarily mean they are exploitable in your application or image.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="170" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-1024x170.png" alt="" class="wp-image-30433" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-1024x170.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-300x50.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-768x128.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-1536x255.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>In this example, my application is built with golang:1.25-alpine, but Trivy found several CVEs that are only exploitable in Go 1.19.1 or earlier.</p>



<p>To skip these false positives, a solution exists.</p>



<p>VEX (Vulnerability Exploitability eXchange) is a <strong>standard format</strong> for stating whether a vulnerability is <strong>exploitable</strong> or not in a specific context.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="609" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-1024x609.png" alt="" class="wp-image-30435" style="aspect-ratio:1.6814258951355643;width:452px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-1024x609.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-300x178.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-768x456.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-1536x913.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43.png 1681w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can generate a VEX file with <a href="https://github.com/openvex/vexctl" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vexctl</a> or <a href="https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">govulncheck</a> tools.</p>



<p>Example:</p>



<pre class="wp-block-code"><code class=""># With vexctl<br>$ VULN_ID="CVE-2022-27664"<br>$ PRODUCT="pkg:golang/golang.org/x/net@v0.0.0-20220127200216-cd36cc0744dd"<br>$ vexctl create --file vex.json --author 'Aurélie Vache' --product "pkg:oci/demo@sha256:$HASH?repository_url=$REGISTRY/$HARBOR_PROJECT/demo" --vuln "$VULN_ID" --status 'not_affected' --justification 'vulnerable_code_not_present' --impact-statement "HTTP/2 vulnerability $VULN_ID is not exploitable because the image is compiled with Go 1.20, which contains the patched library."<br><br># With govulncheck (for Go apps)<br>$ govulncheck -format openvex ./... &gt; ../demo.vex.json</code></pre>
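<p>For illustration, the resulting OpenVEX document looks roughly like this. The identifiers, author and timestamp below are made up; the structure follows the OpenVEX specification:</p>

```json
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "@id": "https://example.com/vex/demo-001",
  "author": "Aurélie Vache",
  "timestamp": "2026-01-29T14:00:00Z",
  "version": 1,
  "statements": [
    {
      "vulnerability": { "name": "CVE-2022-27664" },
      "products": [
        { "@id": "pkg:golang/golang.org/x/net@v0.0.0-20220127200216-cd36cc0744dd" }
      ],
      "status": "not_affected",
      "justification": "vulnerable_code_not_present"
    }
  ]
}
```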



<p>For the moment, OVHcloud MPR (managed Harbor) does not support VEX files (and the OpenVEX format) <a href="https://github.com/goharbor/harbor/issues/22720" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">but it is planned in the future</a>.</p>



<p>💡 The good news is that you can configure a CVE allowlist containing the non-exploitable CVEs to ignore during vulnerability scanning:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="522" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-1024x522.png" alt="" class="wp-image-30434" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-1024x522.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-300x153.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-768x391.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-1536x782.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42.png 1814w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can optionally uncheck the <strong>Never expires</strong> checkbox and use the calendar selector to set an expiry date for the allowlist.</p>



<h3 class="wp-block-heading">Sign your images</h3>



<p>It’s recommended to sign your images to ensure they haven’t been modified and originate from your pipeline (CI/CD).</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="278" height="282" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-38.png" alt="" class="wp-image-30412" style="width:128px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-38.png 278w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-38-70x70.png 70w" sizes="auto, (max-width: 278px) 100vw, 278px" /></figure>



<p>Signing your images is crucial for protecting them against compromised registries and unauthorised image replacements.</p>



<p><strong>Without a signature, there’s no guarantee the deployed image is the one you originally built!</strong></p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="818" height="302" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37.png" alt="" class="wp-image-30410" style="aspect-ratio:2.708559106290115;width:482px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37.png 818w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37-300x111.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37-768x284.png 768w" sizes="auto, (max-width: 818px) 100vw, 818px" /></figure>



<p>You can sign your images with <a href="https://github.com/sigstore/cosign" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Sigstore Cosign</u></a> or <a href="https://github.com/notaryproject/notation" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Notation</u></a> tools:</p>



<pre class="wp-block-code"><code class="">$ export HARBOR_PROJECT=supply-chain<br>$ export IMAGE=xxxxxx.c1.de1.container-registry.ovh.net/$HARBOR_PROJECT/demo<br>$ export HASH=$(skopeo inspect docker://${IMAGE}:latest | jq -r .Digest | sed "s/^sha256://")<br><br># Sign with Cosign<br>## Generate a private and a public key<br>$ cosign generate-key-pair<br>## Sign the image with the OCI 1.1 Referrers API<br>$ cosign sign -y --key cosign.key $IMAGE@sha256:$HASH<br><br># Sign with Notation<br>## Generate an RSA key &amp; a self-signed X.509 test certificate<br>$ notation cert generate-test --default "test"<br><br>## Sign the image with the OCI 1.1 Referrers API<br>$ export NOTATION_EXPERIMENTAL=1 ; notation sign -d --allow-referrers-api ${IMAGE}@sha256:${HASH}</code></pre>



<p>OVHcloud MPR supports signatures from both Cosign and Notation.</p>



<p>Your signature will appear beside your image as an accessory, and a green checkmark ✅ will appear in the signature column:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="227" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-1024x227.png" alt="" class="wp-image-30382" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-1024x227.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-300x67.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-768x170.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-1536x341.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-2048x455.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⚠️ Keep in mind, MPR (Harbor) doesn’t support signatures generated by Cosign v3 (the signature will upload and appear as an accessory, but the mark will stay red instead of turning green). This bug should <a href="https://github.com/goharbor/harbor/issues/22401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>be fixed in Harbor 2.15</u></a> 💪.</p>



<p>Signing your OCI artifacts, such as SBOMs, and linking them to your images is also recommended; you can do this using Cosign:</p>



<pre class="wp-block-code"><code class="">$ cosign attest -y --predicate sbom.spdx.json --key cosign.key $IMAGE@sha256:$HASH</code></pre>



<p>The attestations will be uploaded to the OVHcloud private registry and listed as accessories.</p>



<h4 class="wp-block-heading">Ensure only verified images are pushed to your registry’s projects</h4>



<p>To allow only verified/signed images to be deployed on a project, click the project your image is part of, navigate to the <em>‘<strong>Configuration</strong>’</em> tab, and tick the <strong>Cosign</strong> and/or <strong>Notation </strong>checkbox:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="191" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-1024x191.png" alt="" class="wp-image-30418" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-1024x191.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-300x56.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-768x143.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39.png 1406w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>When checked, the registry will only allow verified images to be pulled from the project. Verified images are determined by <strong>Cosign</strong> or <strong>Notation</strong>, depending on the policy you have checked. Note that if you have both Cosign and Notation policies enforced, then images will need to be signed by both Cosign and Notation to be pulled.</p>



<h3 class="wp-block-heading">Tag immutability</h3>



<p>By default, tags are mutable: you can push an image demo with the tag 1.0.0, modify the code, and push again to the same tag.</p>



<p>This can be useful for fixing a bug, but in terms of security, a mutable tag does not guarantee that the image you built and pushed as version 1.0.0 is the same image that now exists in the registry.</p>
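<p>One mitigation on the consumer side is to reference images by digest instead of by tag: a digest reference identifies one exact artifact and cannot silently change. The registry URL and digest below are hypothetical:</p>

```shell
# Hypothetical values for illustration only
IMAGE="registry.example.com/supply-chain/demo"
TAG_REF="${IMAGE}:1.0.0"   # mutable: may point to a different artifact tomorrow
DIGEST="sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
DIGEST_REF="${IMAGE}@${DIGEST}"   # immutable: always the exact same artifact
echo "$DIGEST_REF"
```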



<p>Moreover, on Harbor (so on OVHcloud MPR), due to limitations in the upstream OCI Distribution specification, the registry does not enforce a strict link between a tag and an image digest.</p>



<p>As a result, a tag can be reassigned to a different artifact. This has a side effect on the registry: the tag migrates across artifacts, and every artifact whose tag is taken away becomes tagless.</p>



<p>To prevent this situation, you can configure tag immutability rules. Tag immutability guarantees that an immutable tagged artifact cannot be deleted, and also cannot be altered in any way such as through re-pushing, re-tagging, or replication from another target registry.</p>



<p>To do that, click on your project and on the <strong>Policy</strong> tab and select <strong>TAG IMMUTABILITY</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="469" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-1024x469.png" alt="" class="wp-image-30438" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-1024x469.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-300x137.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-768x352.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-1536x704.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44.png 2030w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>And then click the <strong>ADD RULE</strong> button.</p>



<p>Fill in the repositories and tags lists according to your needs.</p>



<p>Example:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="522" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-1024x522.png" alt="" class="wp-image-30439" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-1024x522.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-300x153.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-768x392.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-1536x783.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-2048x1044.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⚠️ You can add a maximum of 15 immutability rules per project.</p>



<h3 class="wp-block-heading">To wrap things up</h3>



<p>Software supply chain security is super important these days, and everything is changing quickly &#8211; the concepts, standards, and tools. So, leveraging useful tools like OVHcloud MPR, and knowing how to set them up, can boost your Software Supply Chain Security efforts.</p>



<p>To learn more about how to use and configure <a href="https://help.ovhcloud.com/csm/fr-documentation-public-cloud-containers-orchestration-managed-private-registry?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=7939e6a464282d10476b3689cb0d0ed7&amp;spa=1" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud private registries</a>, don’t hesitate to follow our guides.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsecure-your-software-supply-chain-with-ovhcloud-managed-private-registry-mpr%2F&amp;action_name=Secure%20your%20Software%20Supply%20Chain%20with%20OVHcloud%20Managed%20Private%20Registry%20%28MPR%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: Custom metric autoscaling for LLM inference with vLLM on OVHcloud AI Deploy and observability using MKS</title>
		<link>https://blog.ovhcloud.com/reference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 10 Feb 2026 08:51:11 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[MKS]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30203</guid>

					<description><![CDATA[Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability. This reference architecture describes a comprehensive solution for deploying, autoscaling and monitoring vLLM-based LLM workloads on OVHcloud infrastructure. It combinesAI Deploy, used for model serving with custom metric autoscaling, and Managed Kubernetes Service (MKS), which [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks%2F&amp;action_name=Reference%20Architecture%3A%20Custom%20metric%20autoscaling%20for%20LLM%20inference%20with%20vLLM%20on%20OVHcloud%20AI%20Deploy%20and%20observability%20using%20MKS&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em><strong>Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability.</strong></em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg" alt="" class="wp-image-30579" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3.jpg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability based on OVHcloud infrastructure</em></figcaption></figure>



<p>This reference architecture describes a comprehensive solution for <strong>deploying, autoscaling and monitoring vLLM-based LLM workloads</strong> on OVHcloud infrastructure. It combines <strong>AI Deploy</strong>, used for <strong>model serving with custom metric autoscaling</strong>, and <strong>Managed Kubernetes Service (MKS)</strong>, which hosts the monitoring and observability stack.</p>



<p>By leveraging <strong>application-level Prometheus metrics exposed by vLLM</strong>, AI Deploy can automatically scale inference replicas based on real workload demand, ensuring <strong>high availability, consistent performance under load and efficient GPU utilisation</strong>. This autoscaling mechanism allows the platform to react dynamically to traffic spikes while maintaining predictable latency for end users.</p>



<p>On top of this scalable inference layer, the monitoring architecture provides <strong>observability</strong> through <strong>Prometheus</strong>, <strong>Grafana</strong> and Alertmanager. It enables real-time performance monitoring, capacity planning, and operational insights, while ensuring <strong>full data sovereignty</strong> for organisations running Large Language Models (LLMs) in production environments.</p>



<p><strong>What are the key benefits</strong>?</p>



<ul class="wp-block-list">
<li><strong>Cost-effective</strong>: Leverage managed services to minimise operational overhead</li>



<li><strong>Real-time observability</strong>: Track Time-to-First-Token (TTFT), throughput, and resource utilisation</li>



<li><strong>Sovereign infrastructure</strong>: All metrics and data remain within European datacentres</li>



<li><strong>Production-ready</strong>: Persistent storage, high availability, and automated monitoring</li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>OVHcloud AI Deploy is a<strong>&nbsp;Container as a Service</strong>&nbsp;(CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications/APIs based on Machine Learning (ML), Deep Learning (DL) or Large Language Models (LLMs).</p>



<p><strong>Key points to keep in mind</strong>:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong>&nbsp;Bring your own custom Docker image and deploy it with a single command line or a few clicks</li>



<li><strong>High-performance computing:</strong>&nbsp;A complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong>&nbsp;Supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong>&nbsp;Billing per minute, no surcharges</li>
</ul>



<h3 class="wp-block-heading">Managed Kubernetes Service</h3>



<p><strong>OVHcloud MKS</strong> is a fully managed Kubernetes platform designed to help you deploy, operate, and scale containerised applications in production. It provides a secure and reliable Kubernetes environment without the operational overhead of managing the control plane.</p>



<p><strong>What should you keep in mind?</strong></p>



<ul class="wp-block-list">
<li><strong>Cost-efficient</strong>: Only pay for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane</li>



<li><strong>Fully managed Kubernetes</strong>: Certified upstream Kubernetes with automated control plane management, upgrades and high availability</li>



<li><strong>Production-ready by design</strong>: Built-in integrations with OVHcloud Load Balancers, networking and persistent storage</li>



<li><strong>Scalability and flexibility</strong>: Easily scale workloads and node pools to match application demand</li>



<li><strong>Open and portable</strong>: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in</li>
</ul>



<p>In the following guide, all services are deployed within the&nbsp;<strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Overview of the architecture</h2>



<p>This reference architecture describes a <strong>complete, secure and scalable solution</strong> to:</p>



<ul class="wp-block-list">
<li>Deploy an LLM with vLLM and <strong>AI Deploy</strong>, benefiting from automatic scaling based on custom metrics to ensure high service availability &#8211; vLLM exposes <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>/metrics</strong></mark></code> via its public HTTPS endpoint on AI Deploy</li>



<li>Collect, store and visualise these vLLM metrics using Prometheus and Grafana on <strong>MKS</strong></li>
</ul>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="1200" height="630" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg" alt="" class="wp-image-30578" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-768x403.jpg 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability architecture overview</em></figcaption></figure>



<p>Here you will find the main components of the architecture. The solution comprises three main layers:</p>



<ol class="wp-block-list">
<li><strong>Model serving layer</strong> with AI Deploy
<ul class="wp-block-list">
<li>vLLM containers running on top of GPUs for LLM inference</li>



<li>vLLM inference server exposing Prometheus metrics</li>



<li>Automatic scaling based on custom metrics to ensure high availability</li>



<li>HTTPS endpoints with Bearer token authentication</li>
</ul>
</li>



<li><strong>Monitoring and observability infrastructure</strong> using Kubernetes
<ul class="wp-block-list">
<li>Prometheus for metrics collection and storage</li>



<li>Grafana for visualisation and dashboards</li>



<li>Persistent volume storage for long-term retention</li>
</ul>
</li>



<li><strong>Network layer</strong>
<ul class="wp-block-list">
<li>Secure HTTPS communication between components</li>



<li>OVHcloud LoadBalancer for external access</li>
</ul>
</li>
</ol>



<p>To go further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> </a><strong><code><mark class="has-inline-color has-ast-global-color-0-color">Administrator</mark></code></strong> role</li>



<li><strong>ovhai CLI available</strong> &#8211;&nbsp;<em>install the&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a></em></li>



<li>A <strong>Hugging Face access</strong> &#8211; <em>create a&nbsp;<a href="https://huggingface.co/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face account</a>&nbsp;and generate an&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">access token</a></em></li>



<li><code><strong><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></strong></code> installed and <code><strong><mark class="has-inline-color has-ast-global-color-0-color">helm</mark></strong></code> installed (at least version 3.x)</li>
</ul>



<p><strong>🚀 Now you have all the ingredients for our recipe, it’s time to deploy Ministral 3 14B using AI Deploy and the vLLM Docker container!</strong></p>



<h2 class="wp-block-heading">Architecture guide: From autoscaling to observability for LLMs served by vLLM</h2>



<p>Let’s set up and deploy this architecture!</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg" alt="" class="wp-image-30580" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2.jpg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Overview of the deployment workflow</em></figcaption></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>In this example, <a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a> is used. Choose the open-source model of your choice and follow the same steps, adapting the model slug (from Hugging Face), the versions and the GPU(s) flavour.</em></strong></p>
</blockquote>



<p><em>Remember that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Manage access tokens</h3>



<p>Before deploying the model, you need two access tokens: a <strong>Hugging Face token</strong> to pull the model weights, and an <strong>AI Deploy Bearer token</strong> to secure access to your endpoint once it has been deployed.</p>



<p>Export your&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Create a Bearer token</a>&nbsp;to access your AI Deploy app once it&#8217;s been deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>This returns the following output:</p>



<pre class="wp-block-code"><code class="">Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-01-26 11:53:05<br>Updated At: 20-01-26 11:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</code></pre>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h3 class="wp-block-heading">Step 2 &#8211; LLM deployment using AI Deploy</h3>



<p>Before introducing the monitoring stack, this architecture starts with the <strong>deployment of the <strong>Ministral 3 14B</strong> on OVHcloud AI Deploy</strong>, configured to <strong>autoscale based on custom Prometheus metrics exposed by vLLM itself</strong>.</p>



<h4 class="wp-block-heading">1. Define the targeted vLLM metric for autoscaling</h4>



<p>Before proceeding with the deployment of the <strong>Ministral 3 14B</strong> endpoint, you have to choose the metric you want to use as the trigger for scaling.</p>



<p>Instead of relying solely on CPU/RAM utilisation, AI Deploy allows autoscaling decisions to be driven by <strong>application-level signals</strong>.</p>



<p>To do this, you can consult the <a href="https://docs.vllm.ai/en/latest/design/metrics/#v1-metrics" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">metrics exposed by vLLM</a>.</p>



<p>In this example, you can use a basic metric such as <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>vllm:num_requests_running</strong></mark></code> to scale the number of replicas based on <strong>real inference load</strong>.</p>



<p>This enables:</p>



<ul class="wp-block-list">
<li>Faster reaction to traffic spikes</li>



<li>Better GPU utilisation</li>



<li>Reduced inference latency under load</li>



<li>Cost-efficient scaling</li>
</ul>
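<p>As an illustration, here is a minimal sketch of how such a gauge can be read out of a Prometheus scrape. The <code>sample_scrape</code> payload below is a fabricated example in the standard text exposition format, not a real response from the endpoint:</p>

```python
# Minimal reader for the Prometheus text exposition format, used here to
# extract the gauge that drives autoscaling. It ignores HELP/TYPE comments
# and matches the metric name with or without labels.

def extract_metric(payload: str, name: str) -> float:
    """Return the value of the first sample whose metric name matches."""
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and HELP/TYPE lines
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            return float(value)
    raise KeyError(name)

# illustrative scrape payload (not real output)
sample_scrape = """\
# HELP vllm:num_requests_running Number of requests currently running on GPU.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="mistralai/Ministral-3-14B-Instruct-2512"} 42.0
"""

print(extract_metric(sample_scrape, "vllm:num_requests_running"))  # 42.0
```

<p>In production you would not need to write this yourself: AI Deploy parses the <code>/metrics</code> endpoint for you once <code>--auto-custom-metric-format PROMETHEUS</code> is set.</p>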



<p>Finally, the configuration chosen for scaling this application is as follows:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Parameter</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td>Metric source</td><td><code>/metrics</code></td><td>vLLM Prometheus endpoint</td></tr><tr><td>Metric name</td><td><code>vllm:num_requests_running</code></td><td>Number of in-flight requests</td></tr><tr><td>Aggregation</td><td><code>AVERAGE</code></td><td>Mean across replicas</td></tr><tr><td>Target value</td><td><code>50</code></td><td>Desired load per replica</td></tr><tr><td>Min replicas</td><td><code>1</code></td><td>Baseline capacity</td></tr><tr><td>Max replicas</td><td><code>3</code></td><td>Burst capacity</td></tr></tbody></table></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><em><strong>You can choose the metric that best suits your use case. You can also apply a patch to your AI Deploy deployment at any time to change the target metric for scaling</strong></em>.</p>
</blockquote>



<p>When the <strong>average number of running requests exceeds 50</strong>, AI Deploy automatically provisions <strong>additional GPU-backed replicas</strong>.</p>
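<p>The post does not detail AI Deploy&#8217;s internal scaling algorithm, but the behaviour described in the table can be illustrated with the Kubernetes-HPA-style formula <code>desired = ceil(current &#215; metric / target)</code>, clamped to the configured bounds. A sketch under that assumption:</p>

```python
import math

def desired_replicas(current: int, avg_metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 3) -> int:
    """HPA-style scaling decision: size the deployment so that each
    replica carries roughly `target` units of the metric, clamped to
    the allowed [min_replicas, max_replicas] range."""
    desired = math.ceil(current * avg_metric / target)
    return max(min_replicas, min(max_replicas, desired))

# 1 replica averaging 120 in-flight requests against a target of 50
print(desired_replicas(1, 120, 50))  # 3 -> scale up
# load drops back well below the target
print(desired_replicas(3, 10, 50))   # 1 -> scale down
```

<p>This also shows why the target value matters: too low and the deployment oscillates, too high and latency degrades before new replicas appear.</p>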



<h4 class="wp-block-heading">2. Deploy Ministral 3 14B using AI Deploy</h4>



<p>Now you can deploy the LLM using the <strong><code>ovhai</code> CLI</strong>.</p>



<p>Key elements necessary for proper functioning:</p>



<ul class="wp-block-list">
<li>GPU-based inference: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">1 x H100</mark></code></strong></li>



<li>vLLM OpenAI-compatible Docker image: <a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.13.0</mark></code></strong></a></li>



<li>Custom autoscaling rules based on Prometheus metrics: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm:num_requests_running</mark></strong></code></li>
</ul>



<p>Below is the reference command used to deploy the <strong><a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a></strong>:</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>  --name vllm-ministral-14B-autoscaling-custom-metric \<br>  --default-http-port 8000 \<br>  --label ai_deploy_token=my_operator_token \<br>  --gpu 1 \<br>  --flavor h100-1-gpu \<br>  -e OUTLINES_CACHE_DIR=/tmp/.outlines \<br>  -e HF_TOKEN=$MY_HF_TOKEN \<br>  -e HF_HOME=/hub \<br>  -e HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>  -e HF_HUB_ENABLE_HF_TRANSFER=0 \<br>  -v standalone:/hub:rw \<br>  -v standalone:/workspace:rw \<br>  --liveness-probe-path /health \<br>  --liveness-probe-port 8000 \<br>  --liveness-initial-delay-seconds 300 \<br>  --probe-path /v1/models \<br>  --probe-port 8000 \<br>  --initial-delay-seconds 300 \<br>  --auto-min-replicas 1 \<br>  --auto-max-replicas 3 \<br>  --auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics" \<br>  --auto-custom-metric-format PROMETHEUS \<br>  --auto-custom-value-location vllm:num_requests_running \<br>  --auto-custom-target-value 50 \<br>  --auto-custom-metric-aggregation-type AVERAGE \<br>  vllm/vllm-openai:v0.13.0 \<br>  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \<br>    --model mistralai/Ministral-3-14B-Instruct-2512 \<br>    --tokenizer_mode mistral \<br>    --load_format mistral \<br>    --config_format mistral \<br>    --enable-auto-tool-choice \<br>    --tool-call-parser mistral \<br>    --enable-prefix-caching"</code></pre>



<p>Let&#8217;s walk through the different parameters of this command.</p>



<h5 class="wp-block-heading"><strong>a. Start your AI Deploy app</strong></h5>



<p>Launch a new app using&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a>&nbsp;and name it.</p>



<p><code><strong>ovhai app run --name vllm-ministral-14B-autoscaling-custom-metric</strong></code></p>



<h5 class="wp-block-heading"><strong>b. Define access</strong></h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading"><strong>c. Configure GPU resources</strong></h5>



<p>Specify the hardware flavour (<code><strong>h100-1-gpu</strong></code>), which refers to an&nbsp;<strong>NVIDIA H100 GPU</strong>, and the number of GPUs (<strong>1</strong>).</p>



<p><code><strong>--gpu 1<br>--flavor h100-1-gpu</strong></code></p>



<p><strong><mark>⚠️WARNING!</mark></strong>&nbsp;For this model, one H100 is sufficient, but if you want to deploy another model, you will need to check which GPU you need. Note that you can also access L40S and A100 GPUs for your LLM deployment.</p>



<h5 class="wp-block-heading"><strong>d. Set up environment variables</strong></h5>



<p>Configure caching for the&nbsp;<strong>Outlines library</strong>&nbsp;(used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the&nbsp;<strong>Hugging Face token</strong>&nbsp;(<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the&nbsp;<strong>Hugging Face cache directory</strong>&nbsp;to&nbsp;<code>/hub</code>&nbsp;(where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of&nbsp;<strong>custom remote code</strong>&nbsp;from Hugging Face datasets (required for some model behaviours):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable&nbsp;<strong>Hugging Face Hub transfer acceleration</strong>&nbsp;(to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading"><strong>e. Mount persistent volumes</strong></h5>



<p>Mount&nbsp;<strong>two persistent storage volumes</strong>:</p>



<ol class="wp-block-list">
<li><code>/hub</code>&nbsp;→ Stores Hugging Face model files</li>



<li><code>/workspace</code>&nbsp;→ Main working directory</li>
</ol>



<p>The&nbsp;<code>rw</code>&nbsp;flag means&nbsp;<strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading"><strong>f. Health checks and readiness</strong></h5>



<p>Configure <strong>liveness and readiness probes</strong>:</p>



<ol class="wp-block-list">
<li><code>/health</code> verifies the container is alive</li>



<li><code>/v1/models</code> confirms the model is loaded and ready to serve requests</li>
</ol>



<p>The long initial delays (300 seconds) account for vLLM startup and the loading of the model on the GPU; you can reduce them if your model loads faster.</p>



<p><code><strong>--liveness-probe-path /health<br>--liveness-probe-port 8000<br>--liveness-initial-delay-seconds 300<br><br>--probe-path /v1/models<br>--probe-port 8000<br>--initial-delay-seconds 300</strong></code></p>
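<p>If you script your deployments, you may want to wait until both probes answer before sending traffic. Below is a hedged sketch of such a polling loop; the endpoint paths follow the flags above, while the injectable <code>fetch</code> callback (and the stub used to demonstrate it) are assumptions of this example, not part of the <code>ovhai</code> tooling:</p>

```python
import time

def wait_until_ready(fetch, base_url: str, timeout: float = 600.0,
                     interval: float = 10.0, sleep=time.sleep) -> bool:
    """Poll the liveness and readiness probe endpoints until both
    answer HTTP 200, or give up after `timeout` seconds.

    `fetch(url)` must return an HTTP status code; inject any HTTP
    client wrapper in real use, or a stub in tests.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch(base_url + "/health") == 200 and fetch(base_url + "/v1/models") == 200:
            return True
        sleep(interval)
    return False

# stub probe: /health is always up, /v1/models becomes ready on the 3rd poll
calls = {"models": 0}
def fake_fetch(url):
    if url.endswith("/health"):
        return 200
    calls["models"] += 1
    return 200 if calls["models"] >= 3 else 503

print(wait_until_ready(fake_fetch, "https://example.invalid",
                       timeout=60, interval=0, sleep=lambda s: None))  # True
```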



<h5 class="wp-block-heading"><strong>g. Autoscaling configuration (custom metrics)</strong></h5>



<p>First set the minimum and maximum number of replicas.</p>



<p><strong><code>--auto-min-replicas 1<br>--auto-max-replicas 3</code></strong></p>



<p>This guarantees basic availability (one replica always up) while allowing for peak capacity.</p>



<p>Then enable autoscaling based on application-level metrics exposed by vLLM.</p>



<p><strong><code>--auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics"<br>--auto-custom-metric-format PROMETHEUS<br>--auto-custom-value-location vllm:num_requests_running<br>--auto-custom-target-value 50<br>--auto-custom-metric-aggregation-type AVERAGE</code></strong></p>



<p>AI Deploy:</p>



<ul class="wp-block-list">
<li>Scrapes the local <mark class="has-inline-color has-ast-global-color-0-color"><strong><code>/metrics</code></strong></mark> endpoint</li>



<li>Parses Prometheus-formatted metrics</li>



<li>Extracts the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>vllm:num_requests_running</code></mark></strong> gauge</li>



<li>Computes the average value across replicas</li>
</ul>



<p>Scaling behaviour:</p>



<ul class="wp-block-list">
<li>When the average number of in-flight requests exceeds <strong><code><mark class="has-inline-color has-ast-global-color-0-color">50</mark></code></strong>, AI Deploy adds replicas</li>



<li>When load decreases, replicas are scaled down</li>
</ul>



<p>This approach ensures high availability and predictable latency under fluctuating traffic.</p>



<h5 class="wp-block-heading"><strong>h. Choose the target Docker image and the startup command</strong></h5>



<p>Use the official <strong><a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM OpenAI-compatible Docker image</a></strong>.</p>



<p><strong><code>vllm/vllm-openai:v0.13.0</code></strong></p>



<p>Finally, run the model inside the container using a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong>&nbsp;→ Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Ministral-3-14B-Instruct-2512</code></strong>&nbsp;→ Loads the&nbsp;<strong>Ministral 3 14B</strong>&nbsp;model from Hugging Face</li>



<li><strong><code>--tokenizer_mode mistral</code></strong>&nbsp;→ Uses the&nbsp;<strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong>&nbsp;→ Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong>&nbsp;→ Ensures the model configuration follows Mistral’s standard</li>



<li><code><strong>--enable-auto-tool-choice </strong></code>→ Automatic call of tools if necessary (function/tool call)</li>



<li><strong><code>--tool-call-parser mistral </code></strong>→ Tool calling support</li>



<li><strong><code>--enable-prefix-caching</code></strong> → Prefix caching for improved throughput and reduced latency</li>
</ul>



<p>You can now launch this command using <strong>ovhai CLI</strong>.</p>



<h4 class="wp-block-heading">3. Check AI Deploy app status</h4>



<p>You can now check if your&nbsp;<strong>AI Deploy</strong>&nbsp;app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p><strong>Is your app in&nbsp;<code>RUNNING</code>&nbsp;status?</strong>&nbsp;Perfect! You can check in the logs that the server has started:</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark>⚠️WARNING!</mark></strong>&nbsp;This step may take a little time as the LLM must be loaded.</p>



<h4 class="wp-block-heading">4. Test that the deployment is functional</h4>



<p>First, you can send a prompt to the LLM. Launch the following query, asking the question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \<br>  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  -H "Content-Type: application/json" \<br>  -d '{<br>    "model": "mistralai/Ministral-3-14B-Instruct-2512",<br>    "messages": [<br>      {"role": "system", "content": "You are a helpful assistant."},<br>      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}<br>    ],<br>    "stream": false<br>  }'</code></pre>
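<p>The same smoke test can be written in Python with only the standard library. This is a sketch: the host is the same placeholder as in the curl example, and the request is built but not actually sent until you uncomment the last call:</p>

```python
import json
import os
import urllib.request

def build_chat_request(base_url: str, token: str, question: str) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions call as the curl example."""
    body = {
        "model": "mistralai/Ministral-3-14B-Instruct-2512",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://<your_vllm_app_id>.app.gra.ai.cloud.ovh.net",  # placeholder host
    os.environ.get("MY_OVHAI_ACCESS_TOKEN", "dummy"),
    "Give me the name of OVHcloud's founder.",
)
# uncomment to actually send the request once the placeholder is replaced:
# print(urllib.request.urlopen(req).read().decode())
print(req.get_method(), req.full_url)
```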



<p>You can also verify access to vLLM metrics.</p>



<pre class="wp-block-code"><code class="">curl -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/metrics</code></pre>



<p>If both tests show that the model deployment is functional and you receive 200 HTTP responses, you are ready to move on to the next step!</p>



<p>The next step is to set up the observability and monitoring stack. This autoscaling mechanism is <strong>fully independent</strong> from Prometheus used for observability:</p>



<ul class="wp-block-list">
<li>AI Deploy queries the local <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code></mark></strong> endpoint internally</li>



<li>Prometheus scrapes the <strong>same metrics endpoint</strong> externally for monitoring, dashboards and potentially alerting</li>
</ul>



<p>This ensures:</p>



<ul class="wp-block-list">
<li>A single source of truth for metrics</li>



<li>No duplication of exporters</li>



<li>Consistent signals for scaling and observability</li>
</ul>



<h3 class="wp-block-heading">Step 3 &#8211; Create an MKS cluster</h3>



<p>From the <a href="https://manager.eu.ovhcloud.com/#/hub/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Control Panel</a>, create a Kubernetes cluster using <strong>MKS</strong> (Managed Kubernetes Service).</p>



<p>Consider using the following configuration for the current use case:</p>



<ul class="wp-block-list">
<li><strong>Location</strong>: GRA (Gravelines) &#8211; <em>you can select the same region as for AI Deploy</em></li>



<li><strong>Network</strong>: Public</li>



<li><strong>Node pool</strong>:
<ul class="wp-block-list">
<li>Flavour: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">b2-15</mark></strong></code> (or something similar)</li>



<li>Number of nodes: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">3</mark></code></strong></li>



<li>Autoscaling: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">OFF</mark></code></strong></li>
</ul>
</li>



<li><strong>Name your node pool:</strong> <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>monitoring</code></mark></strong></li>
</ul>



<p>You should see your cluster (e.g. <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>prometheus-vllm-metrics-ai-deploy</strong></mark></code>) in the list, along with the following information:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="632" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png" alt="" class="wp-image-30242" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-300x185.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-768x474.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1536x948.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-2048x1264.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the status is green with the <strong><mark style="color:#00d084" class="has-inline-color"><code>OK</code></mark></strong> label, you can proceed to the next step.</p>



<h3 class="wp-block-heading">Step 4 &#8211; Configure Kubernetes access</h3>



<p>Download your <strong>kubeconfig file</strong> from the OVHcloud Control Panel and configure <strong><code><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></code></strong>:</p>



<pre class="wp-block-code"><code class=""># configure kubectl with your MKS cluster<br>export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml<br><br># verify cluster connectivity<br>kubectl cluster-info<br>kubectl get nodes</code></pre>



<p>Now you can create the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>values-prometheus.yaml</code></mark></strong> file:</p>



<pre class="wp-block-code"><code class=""># general configuration<br>nameOverride: "monitoring"<br>fullnameOverride: "monitoring"<br><br># Prometheus configuration<br>prometheus:<br>  prometheusSpec:<br>    # data retention (15d)<br>    retention: 15d<br>    <br>    # scrape interval (15s)<br>    scrapeInterval: 15s<br>    <br>    # persistent storage (required for production deployment)<br>    storageSpec:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed  # OVHcloud storage<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 50Gi  # (can be modified according to your needs)<br>    <br>    # scrape vLLM metrics from your AI Deploy instance (Ministral 3 14B)<br>    additionalScrapeConfigs:<br>      - job_name: 'vllm-ministral'<br>        scheme: https<br>        metrics_path: '/metrics'<br>        scrape_interval: 15s<br>        scrape_timeout: 10s<br>        <br>        # authentication using AI Deploy Bearer token stored Kubernetes Secret<br>        bearer_token_file: /etc/prometheus/secrets/vllm-auth-token/token<br>        static_configs:<br>          - targets:<br>              - '&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net'  # /!\ REPLACE THE &lt;APP_ID&gt; by yours /!\<br>            labels:<br>              service: 'vllm'<br>              model: 'ministral'<br>              environment: 'production'<br>        <br>        # TLS configuration<br>        tls_config:<br>          insecure_skip_verify: false<br>    <br>    # kube-prometheus-stack mounts the secret under /etc/prometheus/secrets/ and makes it accessible to Prometheus<br>    secrets:<br>      - vllm-auth-token<br><br># Grafana configuration (visualization layer)<br>grafana:<br>  enabled: true<br>  <br>  # disable automatic datasource provisioning<br>  sidecar:<br>    datasources:<br>      enabled: false<br>  <br>  # persistent dashboards<br>  persistence:<br>    enabled: true<br>    
storageClassName: csi-cinder-high-speed<br>    size: 10Gi<br>  <br>  # /!\ DEFINE ADMIN PASSWORD - REPLACE "test" BY YOURS /!\<br>  adminPassword: "test"<br>  <br>  # access via OVHcloud LoadBalancer (public IP and managed LB)<br>  service:<br>    type: LoadBalancer<br>    port: 80<br>    annotations:<br>      # optional: restrict access to specific IPs<br>      # service.beta.kubernetes.io/ovh-loadbalancer-allowed-sources: "1.2.3.4/32"<br>  <br># alertmanager (optional but recommended for production)<br>alertmanager:<br>  enabled: true<br>  <br>  alertmanagerSpec:<br>    storage:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 10Gi<br><br># cluster observability components<br>nodeExporter:<br>  enabled: true<br>  <br>kubeStateMetrics:<br>  enabled: true</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>On OVHcloud MKS, persistent storage is handled automatically through the Cinder CSI driver. When a PersistentVolumeClaim (PVC) references a supported <code>storageClassName</code> such as <code>csi-cinder-high-speed</code>, OVHcloud dynamically provisions the underlying Block Storage volume and attaches it to the node running the pod. This enables stateful components like Prometheus, Alertmanager and Grafana to persist data reliably without any manual volume management, making the architecture fully cloud-native and operationally simple.</em></strong></p>
</blockquote>



<p>Then create the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">monitoring</mark></code></strong> namespace:</p>



<pre class="wp-block-code"><code class=""># create namespace<br>kubectl create namespace monitoring<br><br># verify creation<br>kubectl get namespaces | grep monitoring</code></pre>



<p>Finally,  configure the Bearer token secret to access vLLM metrics.</p>



<pre class="wp-block-code"><code class=""># create bearer token secret<br>kubectl create secret generic vllm-auth-token \<br>  --from-literal=token="$MY_OVHAI_ACCESS_TOKEN" \<br>  -n monitoring<br><br># verify secret creation<br>kubectl get secret vllm-auth-token -n monitoring<br><br># test token (optional)<br>kubectl get secret vllm-auth-token -n monitoring \<br>  -o jsonpath='{.data.token}' | base64 -d </code></pre>



<p>Right, if everything is working, let&#8217;s move on to deployment.</p>



<h3 class="wp-block-heading">Step 5 &#8211; Deploy Prometheus stack</h3>



<p>Add the Prometheus Helm repository and install the monitoring stack. The deployment creates:</p>



<ul class="wp-block-list">
<li>Prometheus StatefulSet with persistent storage</li>



<li>Grafana deployment with LoadBalancer access</li>



<li>Alertmanager for future alert configuration (optional)</li>



<li>Supporting components (node exporters, kube-state-metrics)</li>
</ul>



<pre class="wp-block-code"><code class=""># add Helm repository<br>helm repo add prometheus-community \<br>  https://prometheus-community.github.io/helm-charts<br>helm repo update<br><br># install monitoring stack<br>helm install monitoring prometheus-community/kube-prometheus-stack \<br>  --namespace monitoring \<br>  --values values-prometheus.yaml \<br>  --wait</code></pre>



<p>Then you can retrieve the LoadBalancer IP address to access Grafana:</p>



<pre class="wp-block-code"><code class="">kubectl get svc -n monitoring monitoring-grafana</code></pre>



<p>Finally, open your browser to <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://&lt;EXTERNAL-IP&gt;</mark></strong></code> and login with:</p>



<ul class="wp-block-list">
<li><strong>Username</strong>: <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>admin</strong></mark></code></li>



<li><strong>Password</strong>: as configured in your <code><strong><mark class="has-inline-color has-ast-global-color-0-color">values-prometheus.yaml</mark></strong></code> file</li>
</ul>



<h3 class="wp-block-heading">Step 6 &#8211; Create Grafana dashboards</h3>



<p>In this step, you will access the Grafana interface, add your Prometheus as a new data source, then create a complete dashboard with the different vLLM metrics.</p>



<h4 class="wp-block-heading">1. Add a new data source in Grafana</h4>



<p>First of all, create a new Prometheus connection inside Grafana:</p>



<ul class="wp-block-list">
<li>Navigate to <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Connections</code></mark></strong> → <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Data sources</code></mark></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Add data source</mark></code></strong></li>



<li>Select <strong>Prometheus</strong></li>



<li>Configure URL: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://monitoring-prometheus:9090</mark></strong></code></li>



<li>Click <strong>Save &amp; test</strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="609" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png" alt="" class="wp-image-30247" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-300x178.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-768x457.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1536x913.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-2048x1218.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Now that your Prometheus has been configured as a new data source, you can create your Grafana dashboard.</p>



<h4 class="wp-block-heading">2. Create your monitoring dashboard</h4>



<p>To begin with, you can use a pre-configured Grafana dashboard by downloading the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">vLLM-metrics-grafana-monitoring.json</mark></code></strong> file locally:</p>





<p>In the left-hand menu, select <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboard</mark></code></strong>:</p>



<ol class="wp-block-list">
<li>Navigate to <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboards</mark></code></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Import</mark></code></strong></li>



<li>Upload the provided dashboard JSON</li>



<li>Select <strong>Prometheus</strong> as datasource</li>



<li>Click <strong>Import</strong> and select the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">vLLM-metrics-grafana-monitoring.json</mark></code></strong> file</li>
</ol>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="449" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png" alt="" class="wp-image-30250" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-300x131.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-768x337.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1536x673.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-2048x897.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The dashboard provides real-time visibility for <strong>Ministral 3 14B</strong> deployed with vLLM container and OVHcloud AI Deploy.</p>



<p>You can now track:</p>



<ul class="wp-block-list">
<li><strong>Performance metrics</strong>: TTFT, inter-token latency, end-to-end latency</li>



<li><strong>Throughput indicators</strong>: Requests per second, token generation rates</li>



<li><strong>Resource utilisation</strong>: KV cache usage, active/waiting requests</li>



<li><strong>Capacity indicators</strong>: Queue depth, preemption rates</li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png" alt="" class="wp-image-30253" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Here are the key metrics tracked and displayed in the Grafana dashboard:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Metric Category</th><th>Prometheus Metric</th><th>Description</th><th>Use case</th></tr></thead><tbody><tr><td><strong>Latency</strong></td><td><code>vllm:time_to_first_token_seconds</code></td><td>Time until first token generation</td><td>User experience monitoring</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:inter_token_latency_seconds</code></td><td>Time between tokens</td><td>Throughput optimisation</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:e2e_request_latency_seconds</code></td><td>End-to-end request time</td><td>SLA monitoring</td></tr><tr><td><strong>Throughput</strong></td><td><code>vllm:request_success_total</code></td><td>Successful requests counter</td><td>Capacity planning</td></tr><tr><td><strong>Resource</strong></td><td><code>vllm:kv_cache_usage_perc</code></td><td>KV cache memory usage</td><td>Memory management</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_running</code></td><td>Active requests</td><td>Load monitoring</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_waiting</code></td><td>Queued requests</td><td>Overload detection</td></tr><tr><td><strong>Capacity</strong></td><td><code>vllm:num_preemptions_total</code></td><td>Request preemptions</td><td>Peak load indicator</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:prompt_tokens_total</code></td><td>Input tokens processed</td><td>Usage analytics</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:generation_tokens_total</code></td><td>Output tokens generated</td><td>Cost tracking</td></tr></tbody></table></figure>
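<p>Several of the latency metrics above (such as <code>vllm:time_to_first_token_seconds</code>) are Prometheus histograms, so dashboards typically derive an average from the cumulative <code>_sum</code> and <code>_count</code> series, i.e. <code>rate(..._sum) / rate(..._count)</code> in PromQL. A small sketch of that arithmetic, with illustrative snapshot values:</p>

```python
def avg_over_window(sum_start: float, sum_end: float,
                    count_start: float, count_end: float) -> float:
    """Average observation over a window, from two cumulative histogram
    snapshots: delta(_sum) / delta(_count), the same arithmetic as the
    PromQL expression rate(..._sum[w]) / rate(..._count[w])."""
    delta_count = count_end - count_start
    if delta_count <= 0:
        return 0.0  # no new observations in the window
    return (sum_end - sum_start) / delta_count

# e.g. vllm:time_to_first_token_seconds _sum/_count snapshots one minute apart
print(avg_over_window(120.0, 150.0, 1000, 1100))  # 0.3 s average TTFT
```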



<p>Well done, you now have at your disposal:</p>



<ul class="wp-block-list">
<li>An endpoint of the Ministral 3 14B model deployed with vLLM thanks to <strong>OVHcloud AI Deploy</strong> and its autoscaling strategies based on custom metrics</li>



<li>Prometheus for metrics collection and Grafana for visualisation/dashboards thanks to <strong>OVHcloud MKS</strong></li>
</ul>



<p><strong>But how can you check that everything will work when the load increases?</strong></p>



<h3 class="wp-block-heading">Step 7 &#8211; Test autoscaling and real-time visualisation</h3>



<p>The first objective here is to force AI Deploy to:</p>



<ul class="wp-block-list">
<li>Increase <code>vllm:num_requests_running</code></li>



<li>&#8216;Saturate&#8217; a single replica</li>



<li>Trigger the <strong>scale up</strong></li>



<li>Observe replica increase + latency drop</li>
</ul>



<h4 class="wp-block-heading">1. Autoscaling testing strategy</h4>



<p>The goal is to combine:</p>



<ul class="wp-block-list">
<li><strong>High concurrency</strong></li>



<li><strong>Long prompts</strong> (KV cache heavy)</li>



<li><strong>Long generations</strong></li>



<li><strong>Bursty load</strong></li>
</ul>



<p>This is what vLLM autoscaling actually reacts to.</p>



<p>To do so, a Python code can simulate the expected behaviour:</p>



<pre class="wp-block-code"><code class="">import os<br>import time<br>import threading<br>import random<br>from statistics import mean<br>from openai import OpenAI<br><br>APP_URL = "https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net/v1" # /!\ REPLACE THE &lt;APP_ID&gt; by yours /!\<br>MODEL = "mistralai/Ministral-3-14B-Instruct-2512"<br>API_KEY = os.environ["MY_OVHAI_ACCESS_TOKEN"]  # exported in Step 1<br><br>CONCURRENT_WORKERS = 500          # concurrency (main scaling trigger)<br>REQUESTS_PER_WORKER = 25<br>MAX_TOKENS = 768                  # generation pressure<br><br># some random prompts<br>SHORT_PROMPTS = [<br>    "Summarize the theory of relativity.",<br>    "Explain what a transformer model is.",<br>    "What is Kubernetes autoscaling?"<br>]<br><br>MEDIUM_PROMPTS = [<br>    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",<br>    "Describe how vLLM manages KV cache and why it impacts inference performance."<br>]<br><br>LONG_PROMPTS = [<br>    "Write a very detailed technical explanation of how large language models perform inference, "<br>    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "<br>    "GPU memory management, and how batching affects latency and throughput. Use examples.",<br>]<br><br>PROMPT_POOL = (<br>    SHORT_PROMPTS * 2 +<br>    MEDIUM_PROMPTS * 4 +<br>    LONG_PROMPTS * 6    # bias toward long prompts<br>)<br><br># OpenAI-compatible client<br>client = OpenAI(<br>    base_url=APP_URL,<br>    api_key=API_KEY,<br>)<br><br># basic metrics<br>latencies = []<br>errors = 0<br>lock = threading.Lock()<br><br># worker<br>def worker(worker_id):<br>    global errors<br>    for _ in range(REQUESTS_PER_WORKER):<br>        prompt = random.choice(PROMPT_POOL)<br><br>        start = time.time()<br>        try:<br>            client.chat.completions.create(<br>                model=MODEL,<br>                messages=[{"role": "user", "content": prompt}],<br>                max_tokens=MAX_TOKENS,<br>                temperature=0.7,<br>            )<br>            elapsed = time.time() - start<br><br>            with lock:<br>                latencies.append(elapsed)<br><br>        except Exception:<br>            with lock:<br>                errors += 1<br><br># run<br>threads = []<br>start_time = time.time()<br><br>print("Starting autoscaling stress test...")<br>print(f"Concurrency: {CONCURRENT_WORKERS}")<br>print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")<br><br>for i in range(CONCURRENT_WORKERS):<br>    t = threading.Thread(target=worker, args=(i,))<br>    t.start()<br>    threads.append(t)<br><br>for t in threads:<br>    t.join()<br><br>total_time = time.time() - start_time<br><br># results<br>print("\n=== AUTOSCALING BENCH RESULTS ===")<br>print(f"Total requests sent: {len(latencies) + errors}")<br>print(f"Successful requests: {len(latencies)}")<br>print(f"Errors: {errors}")<br>print(f"Total wall time: {total_time:.2f}s")<br><br>if latencies:<br>    print(f"Avg latency: {mean(latencies):.2f}s")<br>    print(f"Min latency: {min(latencies):.2f}s")<br>    print(f"Max latency: {max(latencies):.2f}s")<br>    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")</code></pre>



<p><strong>How can you verify that autoscaling is working and that the load is being handled correctly without latency skyrocketing?</strong></p>



<h4 class="wp-block-heading">2. Hardware and platform-level monitoring</h4>



<p>First, <strong>AI Deploy Grafana</strong> answers <strong>&#8216;What resources are being used and how many replicas exist?&#8217;</strong>.</p>



<p>GPU utilisation, GPU memory, CPU, RAM and replica count are monitored through <strong>OVHcloud AI Deploy Grafana</strong> (monitoring URL), which exposes infrastructure and runtime metrics for the AI Deploy application. This layer provides visibility into <strong>resource saturation and scaling events</strong> managed by the AI Deploy platform itself.</p>



<p>Access it using the following URL (do not forget to replace <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>&lt;APP_ID&gt;</strong></mark></code> by yours): <strong><code>https://monitoring.gra.ai.cloud.ovh.net/d/app/app-monitoring?var-app=</code><mark class="has-inline-color has-ast-global-color-0-color"><code>&lt;APP_ID&gt;</code></mark><code>&amp;orgId=1</code></strong></p>
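<p>If you script dashboard access for several applications, the substitution above can be automated. A minimal sketch; the URL pattern matches the one given above, the region parameter and the example APP_ID value are assumptions for illustration:</p>

```python
def monitoring_url(app_id: str, region: str = "gra") -> str:
    """Build the AI Deploy Grafana dashboard URL for a given application."""
    return (
        f"https://monitoring.{region}.ai.cloud.ovh.net"
        f"/d/app/app-monitoring?var-app={app_id}&orgId=1"
    )

print(monitoring_url("my-app-id"))  # hypothetical APP_ID
```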



<p>For example, check GPU/RAM metrics:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png" alt="" class="wp-image-30260" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also monitor scale-up and scale-down events in real time, as well as information on HTTP calls and much more!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png" alt="" class="wp-image-30261" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">3. Software and application-level monitoring</h4>



<p>Next, the combination of MKS + Prometheus + Grafana answers <strong>&#8216;How does the inference engine behave internally?&#8217;</strong>.</p>



<p>In fact, vLLM internal metrics (request concurrency, token throughput, latency indicators, KV cache pressure, etc.) are collected via the <strong>vLLM <code>/metrics</code> endpoint</strong> and scraped by <strong>Prometheus running on OVHcloud MKS</strong>, then visualised in a <strong>dedicated Grafana instance</strong>. This layer focuses on <strong>model behaviour and inference performance</strong>.</p>
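<p>The <code>/metrics</code> route serves the standard Prometheus text exposition format, so you can also inspect it directly, without Grafana. Below is a minimal parser sketch for metric sample lines; the sample metric names follow vLLM&#8217;s <code>vllm:</code> prefix convention, but exact names vary by vLLM version:</p>

```python
def parse_prometheus_text(text):
    """Parse 'name{labels} value' sample lines from a Prometheus exposition.

    Simplified sketch: skips HELP/TYPE comments and assumes no spaces
    inside label values.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and blanks
            continue
        name_part, _, value = line.rpartition(" ")
        metrics[name_part] = float(value)
    return metrics

# Sample payload (metric names illustrative of vLLM's /metrics output)
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="ministral"} 12.0
vllm:gpu_cache_usage_perc{model_name="ministral"} 0.83
"""
metrics = parse_prometheus_text(sample)
print(metrics)
```

In practice you would fetch the text with <code>urllib.request.urlopen</code> against the application&#8217;s <code>/metrics</code> route before parsing; Prometheus itself does this scraping for you in the architecture described here.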



<p>Find all these metrics via (just replace <strong><code><mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark></code></strong>): <strong><code>http://<mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark>/d/vllm-ministral-monitoring/ministral-14b-vllm-metrics-monitoring?orgId=1</code></strong></p>



<p>Find key metrics such as TTFT (Time To First Token):</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png" alt="" class="wp-image-30263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also find some information about <strong>&#8216;Model load and throughput&#8217;</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png" alt="" class="wp-image-30264" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To go further and add even more metrics, you can refer to the vLLM documentation on &#8216;<a href="https://docs.vllm.ai/en/v0.7.2/getting_started/examples/prometheus_grafana.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Prometheus and Grafana</a>&#8217;.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>This reference architecture provides a scalable, production-ready approach for deploying LLM inference on OVHcloud using <strong>AI Deploy</strong> and the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-apps-deployments?id=kb_article_view&amp;sysparm_article=KB0047997#advanced-custom-metrics-for-autoscaling" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">autoscaling on custom metrics feature</a>.</p>



<p>OVHcloud <strong>MKS</strong> is dedicated to running Prometheus and Grafana, enabling secure scraping and visualisation of <strong>vLLM internal metrics</strong> exposed via the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code> </mark></strong>endpoint.</p>



<p>By scraping vLLM metrics securely from AI Deploy into Prometheus and exposing them through Grafana, the architecture provides full visibility into model behaviour, performance and load, enabling informed scaling analysis, troubleshooting and capacity planning in production environments.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks%2F&amp;action_name=Reference%20Architecture%3A%20Custom%20metric%20autoscaling%20for%20LLM%20inference%20with%20vLLM%20on%20OVHcloud%20AI%20Deploy%20and%20observability%20using%20MKS&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: build a sovereign n8n RAG workflow for AI agent using OVHcloud Public Cloud solutions</title>
		<link>https://blog.ovhcloud.com/reference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 27 Jan 2026 13:12:03 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Managed Database]]></category>
		<category><![CDATA[n8n]]></category>
		<category><![CDATA[Object Storage]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[RAG]]></category>
		<category><![CDATA[S3]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29694</guid>

					<description><![CDATA[What if an n8n workflow, deployed in a&#160;sovereign environment, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection. In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&#160;Large Language Models&#160;(LLMs) [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em><em>What if an n8n workflow, deployed in a&nbsp;</em><strong><em>sovereign environment</em></strong><em>, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection.</em></em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg" alt="" class="wp-image-30002" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-300x169.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-768x432.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1536x864.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag.jpg 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>n8n workflow overview</em></figcaption></figure>



<p>In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&nbsp;<strong>Large Language Models</strong>&nbsp;(LLMs) is becoming a strategic differentiator.</p>



<p><strong>How?</strong>&nbsp;By building&nbsp;<strong>Agentic RAG systems</strong>&nbsp;capable of retrieving, reasoning, and acting autonomously based on external knowledge.</p>



<p>To make this possible, engineers need a way to connect&nbsp;<strong>retrieval pipelines (RAG)</strong>&nbsp;with&nbsp;<strong>tool-based orchestration</strong>.</p>



<p>This article outlines a&nbsp;<strong>reference architecture</strong>&nbsp;for building a&nbsp;<strong>fully automated RAG pipeline orchestrated by n8n</strong>, leveraging&nbsp;<strong>OVHcloud AI Endpoints</strong>&nbsp;and&nbsp;<strong>PostgreSQL with pgvector</strong>&nbsp;as core components.</p>



<p>The final result will be a system that automatically ingests Markdown documentation from&nbsp;<strong>Object Storage</strong>, creates embeddings with OVHcloud’s&nbsp;<strong>BGE-M3</strong>&nbsp;model available on AI Endpoints, and stores them in a&nbsp;<strong>Managed Database PostgreSQL</strong>&nbsp;with pgvector extension.</p>



<p>Lastly, you’ll be able to build an AI Agent that lets you chat with an LLM (<strong>GPT-OSS-120B</strong>&nbsp;on AI Endpoints). This agent, utilising the RAG implementation carried out upstream, will be an expert on OVHcloud products.</p>



<p>You can further improve the process by using an&nbsp;<strong>LLM guard</strong>&nbsp;to protect the questions sent to the LLM, and set up a chat memory to use conversation history for higher response quality.</p>



<p><strong>But what about n8n?</strong></p>



<p><a href="https://n8n.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>n8n</strong></a>, the open-source workflow automation tool,&nbsp;offers many benefits and connects seamlessly with over&nbsp;<strong>300</strong>&nbsp;APIs, apps, and services:</p>



<ul class="wp-block-list">
<li><strong>Open-source</strong>: n8n is a 100% self-hostable solution, which means you retain full data control;</li>



<li><strong>Flexible</strong>: combines low-code nodes and custom JavaScript/Python logic;</li>



<li><strong>AI-ready</strong>: includes useful integrations for LangChain, OpenAI, and embedding support capabilities;</li>



<li><strong>Composable</strong>: enables simple connections between data, APIs, and models in minutes;</li>



<li><strong>Sovereign by design</strong>: compliant with privacy-sensitive or regulated sectors.</li>
</ul>



<p>This reference architecture serves as a blueprint for building a sovereign, scalable Retrieval Augmented Generation (<strong>RAG</strong>) platform using&nbsp;<strong>n8n</strong>&nbsp;and&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;solutions.</p>



<p>This setup shows how to orchestrate data ingestion, generate embeddings, and enable conversational AI by combining&nbsp;<strong>OVHcloud Object Storage</strong>,&nbsp;<strong>Managed Databases with PostgreSQL</strong>,&nbsp;<strong>AI Endpoints</strong>&nbsp;and&nbsp;<strong>AI Deploy</strong>. <strong>The result?</strong>&nbsp;An AI environment that is fully integrated, protects privacy, and is exclusively hosted on <strong>OVHcloud’s European infrastructure</strong>.</p>



<h2 class="wp-block-heading">Overview of the n8n workflow architecture for RAG </h2>



<p>The workflow involves the following steps:</p>



<ul class="wp-block-list">
<li><strong>Ingestion:</strong>&nbsp;documentation in markdown format is fetched from <strong>OVHcloud Object Storage (S3);</strong></li>



<li><strong>Preprocessing:</strong> n8n cleans and normalises the text, removing YAML front-matter and encoding noise;</li>



<li><strong>Vectorisation:</strong>&nbsp;Each document is embedded using the <strong>BGE-M3</strong> model, which is available via <strong>OVHcloud AI Endpoints;</strong></li>



<li><strong>Persistence:</strong> vectors and metadata are stored in <strong>OVHcloud PostgreSQL Managed Database</strong> using pgvector;</li>



<li><strong>Retrieval:</strong> when a user sends a query, n8n triggers a <strong>LangChain Agent</strong> that retrieves relevant chunks from the database;</li>



<li><strong>Reasoning and actions:</strong>&nbsp;The <strong>AI Agent node</strong> combines LLM reasoning, memory, and tool usage to generate a contextual response or trigger downstream actions (Slack reply, Notion update, API call, etc.).</li>
</ul>
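<p>The preprocessing step above (stripping YAML front-matter before embedding) can be replicated outside n8n. A minimal Python sketch, assuming the front-matter block is delimited by <code>---</code> lines at the top of the Markdown file:</p>

```python
def strip_front_matter(markdown: str) -> str:
    """Remove a leading YAML front-matter block delimited by '---' lines."""
    lines = markdown.splitlines()
    if lines and lines[0].strip() == "---":
        # find the closing delimiter and drop everything up to it
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                return "\n".join(lines[i + 1:]).lstrip("\n")
    return markdown

doc = "---\ntitle: Install the ovhai CLI\n---\n\n# Guide body"
print(strip_front_matter(doc))
```

Documents without front-matter pass through unchanged, so the function is safe to apply to every file in the bucket.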



<p>In this tutorial, all services are deployed within the <strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you start, double-check that you have:</p>



<ul class="wp-block-list">
<li>an <strong>OVHcloud Public Cloud</strong> account</li>



<li>an <strong>OpenStack user</strong> with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">following roles</a>:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>



<li>An <strong>API key</strong> for <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a></li>



<li><strong>ovhai CLI available</strong> – <em>install the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>ovhai CLI</em></a></li>



<li><strong>Hugging Face access</strong> – <em>create a </em><a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Hugging Face account</em></a><em> and generate an </em><a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>access token</em></a></li>
</ul>



<p><strong>🚀 Now that you have everything you need, you can start building your n8n workflow!</strong></p>



<h2 class="wp-block-heading">Architecture guide: n8n agentic RAG workflow</h2>



<p>You’re all set to configure and deploy your n8n workflow.</p>



<p>⚙️<em> Keep in mind that the following steps can be completed using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Build the RAG data ingestion pipeline</h3>



<p>This first step involves building the foundation of the entire RAG workflow by preparing the elements you need:</p>



<ul class="wp-block-list">
<li>n8n deployment</li>



<li>Object Storage bucket creation</li>



<li>PostgreSQL database creation</li>



<li>and more</li>
</ul>



<p>Remember to set up the proper credentials in n8n so the different elements can connect and function.</p>



<h4 class="wp-block-heading">1. Deploy n8n on OVHcloud VPS</h4>



<p>OVHcloud provides <a href="https://www.ovhcloud.com/en-gb/vps/vps-n8n/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>VPS solutions compatible with n8n</strong></a><strong>.</strong> Get a ready-to-use virtual server with <strong>pre-installed n8n </strong>and start building automation workflows without manual setup. With plans ranging from <strong>6 vCores&nbsp;/&nbsp;12 GB RAM</strong> to <strong>24 vCores&nbsp;/&nbsp;96 GB RAM</strong>, you can choose the capacity that suits your workload.</p>



<p><strong>How to set up n8n on a VPS?</strong></p>



<p>Setting up n8n on an OVHcloud VPS generally involves:</p>



<ul class="wp-block-list">
<li>Choosing and provisioning your OVHcloud VPS plan;</li>



<li>Connecting to your server via SSH and carrying out the initial server configuration, which includes updating the OS;</li>



<li>Installing n8n, typically with Docker (recommended for ease of management and updates), or npm by following this <a href="https://help.ovhcloud.com/csm/en-gb-vps-install-n8n?id=kb_article_view&amp;sysparm_article=KB0072179" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>;</li>



<li>Configuring n8n with a domain name, SSL certificate for HTTPS, and any necessary environment variables for databases or settings.</li>
</ul>



<p>While OVHcloud provides a robust VPS platform, you can find detailed n8n installation guides in the <a href="https://docs.n8n.io/hosting/installation/docker/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">official n8n documentation</a>.</p>



<p>Once n8n is up and running, you can configure the Object Storage bucket and the database.</p>



<h4 class="wp-block-heading">2. Create Object Storage bucket</h4>



<p>First, you have to set up your data source. Here you can store all your documentation in an S3-compatible <a href="https://www.ovhcloud.com/en-gb/public-cloud/object-storage/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Object Storage</a> bucket.</p>



<p>This tutorial assumes that all the documentation files are in Markdown format.</p>



<p>From <strong>OVHcloud Control Panel</strong>, create a new Object Storage container with <strong>S3-compatible API </strong>solution; follow this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-storage-s3-getting-started-object-storage?id=kb_article_view&amp;sysparm_article=KB0034674" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>.</p>



<p>When the bucket is ready, add your Markdown documentation to it.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png" alt="" class="wp-image-29733" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Note:</strong>&nbsp;For this tutorial, we’re using the various OVHcloud product documentation, available as open source in the GitHub repository maintained by OVHcloud members.</p>



<p><em>Click this </em><a href="https://github.com/ovh/docs.git" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>link</em></a><em> to access the repository.</em></p>
</blockquote>



<p>How do you do that? Extract all the <strong><code>guide.en-gb.md</code></strong> files from the GitHub repository and rename each one to match its parent folder.</p>



<p>Example: the documentation about ovhai CLI installation, <code><strong>docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md</strong></code>, is stored in the <strong>ovhcloud-products-documentation-md</strong> bucket as <strong><code>cli_10_howto_install_cli.md</code></strong>.</p>
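<p>The extraction and renaming described above is easy to script. A sketch using <code>pathlib</code>, to be run against a local clone of the docs repository; the function name and output directory are illustrative, not part of the original workflow:</p>

```python
from pathlib import Path
import shutil

def collect_guides(repo_root: Path, out_dir: Path) -> list:
    """Copy every guide.en-gb.md to out_dir, renamed after its parent folder."""
    out_dir.mkdir(parents=True, exist_ok=True)
    renamed = []
    for guide in repo_root.rglob("guide.en-gb.md"):
        # e.g. .../cli_10_howto_install_cli/guide.en-gb.md
        #   -> cli_10_howto_install_cli.md
        target = out_dir / f"{guide.parent.name}.md"
        shutil.copy(guide, target)
        renamed.append(target.name)
    return sorted(renamed)
```

The resulting flat directory can then be uploaded to the Object Storage bucket as-is.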



<p>You should get an overview that looks like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png" alt="" class="wp-image-29735" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Keep the following elements and create a new credential in n8n named <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">OVHcloud S3 gra credentials</mark></strong></code>:</p>



<ul class="wp-block-list">
<li>S3 Endpoint: <a href="https://s3.gra.io.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://s3.gra.io.cloud.ovh.net/</mark></code></strong></a></li>



<li>Region: <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">gra</mark></code></strong></li>



<li>Access Key ID: <strong><code>&lt;your_object_storage_user_access_key&gt;</code></strong></li>



<li>Secret Access Key: <strong><code>&lt;your_object_storage_user_secret_key&gt;</code></strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png" alt="" class="wp-image-29736" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, create a new n8n node by selecting&nbsp;<strong>S3</strong>, then&nbsp;<strong>Get Multiple Files</strong>.<br>Configure this node as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png" alt="" class="wp-image-29740" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Connect the node to the previous one before moving on to the next step.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png" alt="" class="wp-image-29741" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>With the first phase done, you can now configure the vector DB.</p>



<h4 class="wp-block-heading">3. Configure PostgreSQL Managed DB (pgvector)</h4>



<p>In this step, you can set up the vector database that lets you store the embeddings generated from your documents.</p>



<p>How? By using OVHcloud’s managed&nbsp;<a href="https://www.ovhcloud.com/en-gb/public-cloud/postgresql/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PostgreSQL</a> databases with the pgvector extension. Go to your OVHcloud Control Panel and follow the steps below.</p>



<p>1. Navigate to&nbsp;<strong>Databases &amp; Analytics &gt; Databases</strong></p>



<p><strong>2. Create a new database and select&nbsp;<em>PostgreSQL</em>&nbsp;and a datacenter location</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png" alt="" class="wp-image-29758" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>3. Select&nbsp;<em>Production</em>&nbsp;plan and&nbsp;<em>Instance type</em></strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png" alt="" class="wp-image-29759" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>4. Reset the user password and save it</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png" alt="" class="wp-image-29762" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>5. Whitelist the IP of your n8n instance as follows</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png" alt="" class="wp-image-29761" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>6. Take note of the following parameters</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png" alt="" class="wp-image-29760" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make a note of this information and create a new credential in n8n named&nbsp;<strong>OVHcloud PGvector credentials</strong>:</p>



<ul class="wp-block-list">
<li>Host:<strong>&nbsp;<code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_hostname&gt;</mark></code></strong></li>



<li>Database:&nbsp;<strong>defaultdb</strong></li>



<li>User:&nbsp;<code>avnadmin</code></li>



<li>Password:&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_password&gt;</mark></strong></code></li>



<li>Port:&nbsp;<strong>20184</strong></li>
</ul>



<p>Consider enabling the&nbsp;<strong>Ignore SSL Issues (Insecure)</strong>&nbsp;option if needed, and setting the&nbsp;<strong>Maximum Number of Connections</strong>&nbsp;value to&nbsp;<strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1000</mark></code></strong>.</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png" alt="" class="wp-image-29763" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>✅ You’re now connected to the database! But what about the PGvector extension?</p>



<p>Add a PostgreSQL node&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code>&nbsp;to your n8n workflow, and create the extension through an SQL query, which should look like this:</p>



<pre class="wp-block-code"><code class="">-- drop table as needed<br>DROP TABLE IF EXISTS md_embeddings;<br><br>-- activate pgvector<br>CREATE EXTENSION IF NOT EXISTS vector;<br><br>-- create table<br>CREATE TABLE md_embeddings (<br>    id SERIAL PRIMARY KEY,<br>    text TEXT,<br>    embedding vector(1024),<br>    metadata JSONB<br>);</code></pre>



<p>You should get this n8n node:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png" alt="" class="wp-image-29752" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This node creates the new&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">md_embeddings</mark></strong></code>&nbsp;table. You can also add a&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Stop and Error</mark></strong></code>&nbsp;node to halt the workflow if the table setup fails.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png" alt="" class="wp-image-29753" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>All set! Your vector DB is prepped and ready for data! Keep in mind, you still need an&nbsp;<strong>embeddings model</strong> for the RAG data ingestion pipeline.</p>



<h4 class="wp-block-heading">4. Access to OVHcloud AI Endpoints</h4>



<p><strong>OVHcloud AI Endpoints</strong>&nbsp;is a managed service that provides&nbsp;<strong>ready-to-use APIs for AI models</strong>, including&nbsp;<strong>LLM, CodeLLM, embeddings, Speech-to-Text, and image models</strong>&nbsp;hosted within OVHcloud’s European infrastructure.</p>



<p>To vectorise the various documents in Markdown format, you have to select an embedding model:&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/models/bge-m3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>BGE-M3</strong></a>.</p>



<p>Usually, your AI Endpoints API key should already be created. If not, head to the AI Endpoints menu in your OVHcloud Control Panel to generate a new API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png" alt="" class="wp-image-29775" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Once this is done, you can create new OpenAI credentials in your n8n.</p>



<p>Why do I need OpenAI credentials? Because the <strong>AI Endpoints API&nbsp;</strong>is fully compatible with OpenAI’s, integration is simple and ensures the&nbsp;<strong>sovereignty of your data.</strong></p>



<p>How? Thanks to a single endpoint,&nbsp;<a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a>, you can call any of the AI Endpoints models.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png" alt="" class="wp-image-29776" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
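<p>As an illustration, here is a minimal sketch of what an OpenAI-format embeddings request against that single endpoint could look like. The base URL comes from above; the model identifier <code>bge-m3</code>, the API key placeholder, and the helper name are assumptions for illustration only (no network call is made here).</p>

```javascript
// Hypothetical sketch: build an OpenAI-format embeddings request
// targeting the single AI Endpoints base URL.
const BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1";

function buildEmbeddingsRequest(apiKey, texts) {
  return {
    url: `${BASE_URL}/embeddings`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // "bge-m3" is an assumed model identifier for illustration
    body: JSON.stringify({ model: "bge-m3", input: texts }),
  };
}

const req = buildEmbeddingsRequest("<your_api_key>", ["hello world"]);
console.log(req.url);
```

In n8n you never build this request by hand: the OpenAI credentials plus the base URL do exactly this behind the scenes.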



<p>This means you can create a new n8n node by selecting&nbsp;<strong>Postgres PGVector Store</strong>&nbsp;and&nbsp;<strong>Add documents to Vector Store</strong>.<br>Set up this node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png" alt="" class="wp-image-29781" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then configure the <strong>Data Loader</strong> with custom text splitting and the JSON data type.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png" alt="" class="wp-image-29780" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For the text splitter, here are some options:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png" alt="" class="wp-image-29786" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To finish, select the&nbsp;<strong>BGE-M3</strong> embedding model from the model list and set the&nbsp;<strong>Dimensions</strong> to 1024.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png" alt="" class="wp-image-29784" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
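<p>A quick sketch of why the <strong>Dimensions</strong> value matters: the embedding length must match the <code>vector(1024)</code> column created earlier. The helper below is hypothetical (n8n’s PGVector node handles this for you) and only illustrates the dimension check and the pgvector literal format.</p>

```javascript
// Hypothetical helper: validate an embedding against the vector(1024)
// column and format it as a pgvector '[v1,v2,...]' literal.
const EXPECTED_DIM = 1024; // BGE-M3 output size

function toPgVectorLiteral(embedding) {
  if (embedding.length !== EXPECTED_DIM) {
    throw new Error(`expected ${EXPECTED_DIM} dimensions, got ${embedding.length}`);
  }
  return `[${embedding.join(",")}]`;
}

const fake = new Array(EXPECTED_DIM).fill(0.5);
const literal = toPgVectorLiteral(fake);
console.log(literal.startsWith("[0.5,0.5")); // → true
```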



<p>You now have everything you need to build the ingestion pipeline.</p>



<h4 class="wp-block-heading">5. Set up the ingestion pipeline loop</h4>



<p>To make use of a fully automated document ingestion and vectorisation pipeline, you have to integrate some specific nodes, mainly:</p>



<ul class="wp-block-list">
<li>a <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></code></strong> that downloads each markdown file one by one so that it can be vectorised;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> that counts the number of files processed, which subsequently determines the number of requests sent to the embedding model;</li>



<li>an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition that lets you check when 400 requests have been reached;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node that pauses after every 400 requests to avoid getting rate-limited;</li>



<li>an S3 block <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> to download each markdown;</li>



<li>another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to extract and process text from Markdown files by cleaning and removing special characters before sending it to the embeddings model;</li>



<li>a PostgreSQL node to <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL</mark></strong></code> query to check that the table contains vectors after the process (loop) is complete.</li>
</ul>



<h5 class="wp-block-heading">5.1. Create a loop to process each documentation file</h5>



<p>Begin by creating a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></strong></code> to process all the Markdown files one at a time. Set the <strong>batch size</strong> to <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1</mark></code></strong> in this loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png" alt="" class="wp-image-29788" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
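<p>Conceptually, a batch size of <code>1</code> means the node slices the incoming file list into single-item batches, which can be sketched like this (the function name is hypothetical; n8n does this internally):</p>

```javascript
// Sketch of "Loop Over Items" batching: split items into batches of
// the given size; with batchSize = 1, each file is processed alone.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const files = ["a.md", "b.md", "c.md"];
const batches = toBatches(files, 1);
console.log(batches.length); // → 3
```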



<p>Add the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Loop</code></mark></strong> statement right after the S3 <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Get Many Files</mark></code></strong> node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png" alt="" class="wp-image-29797" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Time to put the loop’s content into action!</p>



<h5 class="wp-block-heading">5.2. Count the number of files using a code snippet</h5>



<p>Next, choose the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> node from the list to count how many files have been processed. Set the <strong>Mode</strong> to “Run Once for Each Item” and the <strong>Language</strong> to “JavaScript”, then add the following code snippet to the designated block.</p>



<pre class="wp-block-code"><code class="">// simple counter per item<br>const counter = $runIndex + 1;<br><br>return {<br>  counter<br>};</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png" alt="" class="wp-image-29792" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make sure this code snippet is included in the loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png" alt="" class="wp-image-29798" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can start adding the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong><code>if</code></strong></mark> part to the loop now.</p>



<h5 class="wp-block-heading">5.3. Add a condition that applies a rule every 400 requests</h5>



<p>Here, you need to create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node and add the following condition, set as an expression:</p>



<pre class="wp-block-code"><code class="">{{ (Number($json["counter"]) % 400) === 0 }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png" alt="" class="wp-image-29794" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add it immediately after counting the files:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png" alt="" class="wp-image-29800" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If this condition <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">is true</mark></strong></code>, trigger the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node.</p>



<h5 class="wp-block-heading">5.4. Insert a pause after each set of 400 requests</h5>



<p>Then insert a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node to pause before resuming. Set <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Resume</mark></strong></code> to “After Time Interval” and the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait Amount</mark></strong></code> to “60:00” seconds.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png" alt="" class="wp-image-29796" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Link it to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition when this is <strong>True</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png" alt="" class="wp-image-29801" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
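<p>Put together, the counter, the <code>If</code> condition, and the <code>Wait</code> node implement a simple throttle, which can be sketched in plain JavaScript (the function names are hypothetical; in n8n each piece lives in its own node):</p>

```javascript
// Sketch of the loop's throttling logic: pause once every 400 files.
function shouldPause(counter, batchLimit = 400) {
  // mirrors the n8n expression {{ (Number($json["counter"]) % 400) === 0 }}
  return counter % batchLimit === 0;
}

function countPauses(totalFiles) {
  let pauses = 0;
  for (let counter = 1; counter <= totalFiles; counter++) {
    if (shouldPause(counter)) pauses++; // the Wait node would sleep here
  }
  return pauses;
}

console.log(countPauses(1000)); // → 2 (after files 400 and 800)
```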



<p>Next, you can go ahead and download the Markdown file, and then process it.</p>



<h5 class="wp-block-heading">5.5. Launch documentation download</h5>



<p>To do this, create a new <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> S3 node and configure it with this File Key expression:</p>



<pre class="wp-block-code"><code class="">{{ $('Process each documentation file').item.json.Key }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png" alt="" class="wp-image-29804" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Want to connect it? That’s easy: link it to the output of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node and to the <strong>False</strong> branch of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node, so that a file is processed only when the rate limit has not been reached.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png" alt="" class="wp-image-29805" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You’re almost done! Now you need to extract and process the text from the Markdown files – clean and remove any special characters before sending it to the embedding model.</p>



<h5 class="wp-block-heading">5.6 Clean Markdown text content</h5>



<p>Next, create another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to process text from Markdown files:</p>



<pre class="wp-block-code"><code class="">// extract binary content<br>const binary = $input.item.binary.data;<br><br>// decode into clean UTF-8 text<br>let text = Buffer.from(binary.data, 'base64').toString('utf8');<br><br>// cleaning: remove non-printable characters<br>text = text<br>  .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')<br>  .replace(/\s{2,}/g, ' ')<br>  .trim();<br><br>// check length and truncate if needed<br>if (text.length &gt; 14000) {<br>  text = text.slice(0, 14000);<br>}<br><br>return [{<br>  text,<br>  fileName: binary.fileName,<br>  mimeType: binary.mimeType<br>}];</code></pre>



<p>Select the <em>“Run Once for Each Item”</em> <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Mode</mark></strong></code> and place the previous code in the dedicated JavaScript block.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png" alt="" class="wp-image-29806" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
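<p>To see what the snippet does, here is the same cleaning logic applied to a sample string outside n8n (no <code>$input</code> binary wrapper, so the base64 decoding step is skipped):</p>

```javascript
// Standalone demonstration of the cleaning steps from the n8n snippet:
// replace non-printable characters with spaces, collapse whitespace, trim.
function cleanMarkdown(text) {
  return text
    .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, " ")
    .replace(/\s{2,}/g, " ")
    .trim();
}

const sample = "# Title\u0000\u0007   with   noise   ";
console.log(cleanMarkdown(sample)); // → "# Title with noise"
```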



<p>To finish, check that the output text has been sent to the document vectorisation system, which was set up in <strong>Step 3 – Configure PostgreSQL Managed DB (pgvector)</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png" alt="" class="wp-image-29808" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>How do I confirm that the table contains all elements after vectorisation?</p>



<h5 class="wp-block-heading">5.7 Double-check that the documents are in the table</h5>



<p>To confirm that your RAG system is working, check that your vector database actually contains vectors: use a PostgreSQL node with <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code> in your n8n workflow.</p>



<p>Then, run the following query:</p>



<pre class="wp-block-code"><code class="">-- count the number of elements<br>SELECT COUNT(*) FROM md_embeddings;</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png" alt="" class="wp-image-29818" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Done</mark></strong></code> section of your <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop</mark></strong>, so the elements are counted when the process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png" alt="" class="wp-image-29773" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! You can now run the workflow to begin ingesting documents.</p>



<p>Click the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute workflow</mark></strong></code> button and wait until the vectorization process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png" alt="" class="wp-image-29823" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember, everything should be green when it’s finished ✅.</p>



<h3 class="wp-block-heading">Step 2 – RAG chatbot</h3>



<p>With the data ingestion and vectorisation steps completed, you can now begin implementing your AI agent.</p>



<p>This involves building a <strong>RAG-based AI Agent</strong>&nbsp;by simply starting a chat with an LLM.</p>



<h4 class="wp-block-heading">1. Set up the chat box to start a conversation</h4>



<p>First, configure your AI Agent based on the RAG system, and add a new node in the same n8n workflow: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Trigger</mark></strong></code>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png" alt="" class="wp-image-29834" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This node will allow you to interact directly with your AI agent! But before that, you need to check that your message is safe.</p>






<h4 class="wp-block-heading">2. Set up your LLM Guard with AI Deploy</h4>



<p>To check whether a message is secure or not, use an LLM Guard.</p>



<p><strong>What’s an LLM Guard?</strong>&nbsp;This is a safety and control layer that sits between users and an LLM, or between the LLM and an external connection. Its main goal is to filter, monitor, and enforce rules on what goes into or comes out of the model 🔐.</p>



<p>You can use <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a> from OVHcloud to deploy your desired LLM guard. With a single command line, this AI solution lets you deploy a Hugging Face model using vLLM Docker containers.</p>



<p>For more details, please refer to this <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">blog</a>.</p>



<p>For the use case covered in this article, you can use the open-source model <strong>meta-llama/Llama-Guard-3-8B</strong> available on <a href="https://huggingface.co/meta-llama/Llama-Guard-3-8B" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a>.</p>



<h5 class="wp-block-heading">2.1 Create a Bearer token to request your custom AI Deploy endpoint</h5>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it’s deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>The following output is returned:</p>



<p><code><strong>Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-10-25 8:53:05<br>Updated At: 20-10-25 8:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</strong></code></p>



<p>You can now store and export your access token to add it as a new credential in n8n.</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h5 class="wp-block-heading">2.2 Start the Llama Guard 3 model with AI Deploy</h5>



<p>Using the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai</mark></strong></code> CLI, launch the following command to start the vLLM inference server.</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>	--name vllm-llama-guard3 \<br>	--default-http-port 8000 \<br>	--gpu 1 \<br>	--flavor l40s-1-gpu \<br>	--label ai_deploy_token=my_operator_token \<br>	--env OUTLINES_CACHE_DIR=/tmp/.outlines \<br>	--env HF_TOKEN=$MY_HF_TOKEN \<br>	--env HF_HOME=/hub \<br>	--env HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>	--env HF_HUB_ENABLE_HF_TRANSFER=0 \<br>	--volume standalone:/workspace:RW \<br>	--volume standalone:/hub:RW \<br>	vllm/vllm-openai:v0.10.1.1 \<br>	-- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai app run</mark></strong></code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run an app</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--name vllm-llama-guard3</mark></strong></code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the app. For example,&nbsp;<code>vllm-llama-guard3</code>.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--default-http-port 8000</mark></strong></code></li>
</ul>



<p>Exposes&nbsp;<strong>port 8000</strong>&nbsp;as the default HTTP endpoint. The vLLM server typically runs on port 8000.</p>



<ul class="wp-block-list">
<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--gpu&nbsp;</code>1</mark></strong></li>



<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--flavor l40s-1-gpu</code></mark></strong></li>
</ul>



<p>Allocates&nbsp;<strong>one L40S GPU</strong>&nbsp;for the app. You can adjust the GPU type and count depending on the model you deploy.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/workspace:RW</mark></strong></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/hub:RW</mark></strong></code></li>
</ul>



<p>Mounts&nbsp;<strong>two persistent storage volumes</strong>: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/workspace</code></mark></strong> which is the main working directory and <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">/hub</mark></strong></code>&nbsp;to store Hugging Face model files.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env OUTLINES_CACHE_DIR=/tmp/.outlines</mark></strong></code></li>



<li><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_TOKEN=$MY_HF_TOKEN</mark></code></strong></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HOME=/hub</mark></strong></code></li>



<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>--env HF_DATASETS_TRUST_REMOTE_CODE=1</strong></mark></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HUB_ENABLE_HF_TRANSFER=0</mark></strong></code></li>
</ul>



<p>These are Hugging Face&nbsp;<strong>environment variables</strong> you have to set. Please export your Hugging Face access token as an environment variable before starting the app: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">export MY_HF_TOKEN=***********</mark></strong></code></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.10.1.1</mark></strong></code></li>
</ul>



<p>Uses the&nbsp;<strong><code>vllm/vllm-openai</code></strong>&nbsp;Docker image (a pre-configured vLLM OpenAI API server).</p>



<ul class="wp-block-list">
<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>-- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</strong></mark></code></li>
</ul>



<p>Finally, runs a<strong>&nbsp;bash shell</strong>&nbsp;inside the container and executes a Python command to launch the vLLM API server.</p>



<h5 class="wp-block-heading">2.3 Check that your AI Deploy app is RUNNING</h5>



<p>Replace <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></strong></code> with your own app ID.</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;app_id&gt;</code></pre>



<p>You should get:</p>



<p><code>History:<br>DATE STATE<br>20-10-25 09:58:00 QUEUED<br>20-10-25 09:58:01 INITIALIZING<br>20-10-25 09:58:07 PENDING<br>20-10-25 10:03:10&nbsp;<strong>RUNNING</strong><br>Info:<br>Message: App is running</code></p>



<h5 class="wp-block-heading">2.4 Create a new n8n credential with the AI Deploy app URL and Bearer access token</h5>



<p>First, using your <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>&lt;app_id></strong></mark></code>, retrieve your AI Deploy app URL.</p>



<pre class="wp-block-code"><code class="">ovhai app get <span style="background-color: initial; font-family: inherit; font-size: inherit; text-align: initial; font-weight: inherit;">&lt;app_id&gt;</span> -o json | jq '.status.url' -r</code></pre>



<p>Then, create a new OpenAI credential from your n8n workflow, using your AI Deploy URL and the Bearer token as an API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png" alt="" class="wp-image-29837" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Don&#8217;t forget to replace <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>6e10e6a5-2862-4c82-8c08-26c458ca12c7</code></mark></strong> with your <span style="background-color: initial; font-family: inherit; font-size: inherit; text-align: initial; font-weight: inherit;"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></code></strong></span>.</p>
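<p>Before wiring the credential into n8n, you can sanity-check the deployed endpoint from any script. The sketch below uses only the Python standard library; the example URL pattern and token are placeholders to replace with your own AI Deploy values:</p>

```python
import json
import urllib.request

def models_request(app_url: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for the server's model list."""
    return urllib.request.Request(
        f"{app_url}/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )

def list_models(app_url: str, token: str) -> list[str]:
    """Call the OpenAI-compatible /v1/models route; return served model ids."""
    with urllib.request.urlopen(models_request(app_url, token)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Example (placeholders -- use your own AI Deploy URL and Bearer token):
# list_models("https://<app_id>.app.gra.ai.cloud.ovh.net", "XXXXXXXXXX")
```

<p>If the app is RUNNING and the token is valid, the model list should contain the Llama Guard model you deployed.</p>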



<h5 class="wp-block-heading">2.5 Create the LLM Guard node in the n8n workflow</h5>



<p>Create a new <strong>OpenAI node</strong> to <strong>Message a model</strong> and select the new AI Deploy credential for LLM Guard usage.</p>



<p>Next, create the prompt as follows:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png" alt="" class="wp-image-29840" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, use an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node to determine if the scenario is <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>safe</code></mark></strong> or <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>unsafe</code></mark></strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png" alt="" class="wp-image-29842" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">unsafe</mark></strong></code>, send an error message right away to stop the workflow.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png" alt="" class="wp-image-29843" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>But if the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">safe</mark></strong></code>, you can send the request to the AI Agent without issues 🔐.</p>
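<p>Conceptually, the <code>If</code> node performs the small check sketched below: Llama Guard 3 replies with <code>safe</code>, or with <code>unsafe</code> followed by the violated category code on the next line. A minimal parsing sketch (verify the exact reply format against the model card):</p>

```python
def is_safe(guard_reply: str) -> bool:
    # Llama Guard 3 answers "safe", or "unsafe" plus a category code
    # (e.g. "unsafe" then "S2") -- only the first line decides.
    lines = guard_reply.strip().lower().splitlines()
    return bool(lines) and lines[0].strip() == "safe"

print(is_safe("safe"))          # True: continue the workflow
print(is_safe("unsafe\nS2"))    # False: stop and return an error message
```

<p>The safe branch forwards the original chat input to the AI Agent; the unsafe branch short-circuits the workflow, exactly as in the screenshots above.</p>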



<h4 class="wp-block-heading">3. Set up AI Agent</h4>



<p>The&nbsp;<strong>AI Agent</strong>&nbsp;node in&nbsp;<strong>n8n</strong>&nbsp;acts as an intelligent orchestration layer that combines&nbsp;<strong>LLMs, memory, and external tools</strong>&nbsp;within an automated workflow.</p>



<p>It allows you to:</p>



<ul class="wp-block-list">
<li>Connect a <strong>Large Language Model</strong> using APIs (e.g., LLMs from AI Endpoints);</li>



<li>Use <strong>tools</strong> such as HTTP requests, databases, or RAG retrievers so the agent can take actions or fetch real information;</li>



<li>Maintain <strong>conversational memory</strong> via PostgreSQL databases;</li>



<li>Integrate directly with chat platforms (e.g., Slack, Teams) for interactive assistants (optional).</li>
</ul>



<p>Simply put, n8n becomes an&nbsp;<strong>agentic automation framework</strong>, enabling LLMs to not only provide answers, but also think, choose, and perform actions.</p>



<p>Please note that you can change and customise this n8n <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node to fit your use cases, using features like function calling or structured output. This is the most basic configuration for the given use case. You can go even further with different agents.</p>



<p>🧑‍💻&nbsp;<strong>How do I implement this RAG?</strong></p>



<p>First, create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node in <strong>n8n</strong> as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png" alt="" class="wp-image-29933" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, a series of steps is required, the first of which is creating the prompts.</p>



<h5 class="wp-block-heading">3.1 Create prompts</h5>



<p>In the AI Agent node on your n8n workflow, edit the user and system prompts.</p>



<p>Begin by creating the&nbsp;<strong>prompt</strong>,&nbsp;which is also the&nbsp;<strong>user message</strong>:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<p>Then create the <strong>System Message</strong> as shown below:</p>



<pre class="wp-block-code"><code class="">You have access to a retriever tool connected to a knowledge base.  <br>Before answering, always search for relevant documents using the retriever tool.  <br>Use the retrieved context to answer accurately.  <br>If no relevant documents are found, say that you have no information about it.</code></pre>
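<p>The behaviour this system message asks for amounts to a small retrieve-then-answer loop. The sketch below is deliberately simplified (toy keyword scoring instead of real embeddings, made-up documents) but mirrors the logic: search first, answer from the retrieved context, refuse when nothing matches:</p>

```python
def score(query: str, doc: str) -> int:
    """Toy relevance score: number of query words present in the document."""
    doc_words = set(doc.lower().split())
    return sum(w in doc_words for w in query.lower().split())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Keep the k best-scoring documents, dropping irrelevant ones."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]

def answer(query: str, docs: list[str]) -> str:
    """Mirror the system message: always search first, refuse on no match."""
    context = retrieve(query, docs)
    if not context:
        return "I have no information about that."
    # In the real workflow, the retrieved context is handed to the LLM here.
    return "Answering from context: " + " | ".join(context)

docs = ["AI Endpoints rate limits depend on the model",
        "AI Deploy runs Docker images on GPUs"]
print(answer("rate limits", docs))
print(answer("quantum teleportation", docs))
```

<p>In the n8n workflow, the retriever tool and the LLM replace <code>score</code> and the final string, but the control flow is the same.</p>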



<p>You should get a configuration like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png" alt="" class="wp-image-29935" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>🤔 Well, an LLM is now needed for this to work!</p>



<h5 class="wp-block-heading">3.2 Select LLM using AI Endpoints API</h5>



<p>First, add an <strong>OpenAI Chat Model</strong> node, and then set it as the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Model</mark></strong></code> for your agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png" alt="" class="wp-image-29939" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, select one of the&nbsp;<a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Endpoints</a>&nbsp;models from the list provided; they are compatible with the OpenAI APIs.</p>



<p>✅ <strong>How?</strong> By using the right API base URL: <a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png" alt="" class="wp-image-29936" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The <a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/gpt-oss-120b/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>GPT OSS 120B</strong></a> model has been selected for this use case. Other models, such as Llama, Mistral, and Qwen, are also available.</p>
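<p>Since the endpoint is OpenAI-compatible, any OpenAI-style client (or a plain HTTP request) can target it by overriding the base URL. A standard-library sketch, where the model id <code>gpt-oss-120b</code> and the API token are assumptions to adapt to your own account:</p>

```python
import json
import urllib.request

BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

def chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for AI Endpoints."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def ask(api_key: str, prompt: str, model: str = "gpt-oss-120b") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(chat_request(api_key, model, prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ask("<your_ai_endpoints_token>", "What is RAG?")  # performs a network call
```

<p>n8n's OpenAI Chat Model node does the same thing under the hood once you point its credential at this base URL.</p>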



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><mark style="background-color:#fcb900" class="has-inline-color">⚠️ <strong>WARNING</strong> ⚠️</mark></p>



<p>If you are using a recent version of n8n, you will likely encounter the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/responses</code></mark></strong> issue (linked to OpenAI compatibility). To resolve this, disable the <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Use Responses API</mark></code></strong> option and everything will work correctly.</p>
</blockquote>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="829" height="675" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg" alt="" class="wp-image-30352" style="aspect-ratio:1.2281554640124863;width:409px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg 829w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-300x244.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-768x625.jpg 768w" sizes="auto, (max-width: 829px) 100vw, 829px" /><figcaption class="wp-element-caption"><em>Tips to fix /responses issue</em></figcaption></figure>



<p>Your LLM is now set to answer your questions! Don’t forget, it needs access to the knowledge base.</p>



<h5 class="wp-block-heading">3.3 Connect the knowledge base to the RAG retriever</h5>



<p>As usual, the first step is to create an n8n node called <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">PGVector Vector Store node</mark></strong></code> and enter your PGVector credentials.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png" alt="" class="wp-image-29943" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Tools</code></mark></strong> section of the AI Agent node.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png" alt="" class="wp-image-29944" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember to connect your PGVector database so that the retriever can access the previously generated embeddings. Here’s an overview of what you’ll get.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png" alt="" class="wp-image-29945" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⏳Nearly done! The final step is to add the database memory.</p>



<h5 class="wp-block-heading">3.4 Manage conversation history with database memory</h5>



<p>Creating a&nbsp;<strong>Database Memory</strong>&nbsp;node in n8n (PostgreSQL) lets you link it to your AI Agent, so it can store and retrieve past conversation history. This enables the model to remember and use context across multiple interactions.</p>
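<p>Conceptually, the memory node is an append-and-replay table keyed by session id. The sketch below uses the standard-library <code>sqlite3</code> module purely as a stand-in for the managed PostgreSQL database; the table and column names are made up for illustration:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the managed PostgreSQL
conn.execute("CREATE TABLE chat_memory (session_id TEXT, role TEXT, content TEXT)")

def remember(session_id: str, role: str, content: str) -> None:
    """Append one message to the session's history."""
    conn.execute("INSERT INTO chat_memory VALUES (?, ?, ?)",
                 (session_id, role, content))

def history(session_id: str) -> list[tuple[str, str]]:
    """Replay the session's messages in insertion (rowid) order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_memory WHERE session_id = ? ORDER BY rowid",
        (session_id,))
    return list(rows)

remember("s1", "user", "What are AI Endpoints rate limits?")
remember("s1", "assistant", "They depend on the model you call.")
print(history("s1"))
```

<p>The n8n node handles this storage and replay for you; before each LLM call, the replayed history is prepended to the conversation so the model keeps its context.</p>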



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png" alt="" class="wp-image-29946" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, link this PostgreSQL database to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Memory</mark></strong></code> section of your AI agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png" alt="" class="wp-image-29947" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! 🥳 Your&nbsp;<strong>n8n RAG workflow</strong>&nbsp;is now complete. Ready to test it?</p>



<h4 class="wp-block-heading">4. Make the most of your automated workflow</h4>



<p>Want to try it? It’s easy!</p>



<p>By clicking the orange <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Open chat</code></mark></strong> button, you can ask the AI agent questions about OVHcloud products, particularly where you need technical assistance.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4"></video></figure>



<p>For example, you can ask the LLM about rate limits in OVHcloud AI Endpoints and get the information in seconds.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4"></video></figure>



<p>You can now build your own autonomous RAG system using OVHcloud Public Cloud, suited for a wide range of applications.</p>



<h2 class="wp-block-heading">What’s next?</h2>



<p>To sum up, this reference architecture provides a guide on using&nbsp;<strong>n8n</strong> with&nbsp;<strong>OVHcloud AI Endpoints</strong>,&nbsp;<strong>AI Deploy</strong>,&nbsp;<strong>Object Storage</strong>, and&nbsp;<strong>PostgreSQL + pgvector</strong> to build a fully controlled, autonomous&nbsp;<strong>RAG AI system</strong>.</p>



<p>Teams can build scalable AI assistants that work securely and independently in their cloud environment by orchestrating ingestion, embedding generation, vector storage, retrieval, LLM safety checks, and reasoning within a single workflow.</p>



<p>With the core architecture in place, you can add more features to improve the capabilities and robustness of your agentic RAG system:</p>



<ul class="wp-block-list">
<li>Web search</li>



<li>Images with OCR</li>



<li>Audio files transcribed using the Whisper model</li>
</ul>



<p>This delivers an extensive knowledge base and a wider variety of use cases!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4" length="11190376" type="video/mp4" />
<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4" length="9881210" type="video/mp4" />

			</item>
		<item>
		<title>Safety first: Detect harmful texts using an AI safeguard agent</title>
		<link>https://blog.ovhcloud.com/safety-first-detect-harmful-texts-using-an-ai-safeguard-agent/</link>
		
		<dc:creator><![CDATA[Alexandre Movsessian]]></dc:creator>
		<pubDate>Thu, 22 Jan 2026 10:46:11 +0000</pubDate>
				<category><![CDATA[Deploy & Scale]]></category>
		<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30185</guid>

					<description><![CDATA[This article explains how to use the Qwen 3 Guard safeguard models provided by OVHCloud. Using this guide, you can analyse and moderate texts for LLM applications, chat platforms, customer support systems, or any other text-based services requiring safe and compliant interactions. Our focus will be on written content, such as conversations or plain text. [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsafety-first-detect-harmful-texts-using-an-ai-safeguard-agent%2F&amp;action_name=Safety%20first%3A%20Detect%20harmful%20texts%20using%20an%20AI%20safeguard%20agent&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="981" height="463" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image.png" alt="" class="wp-image-30187" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image.png 981w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-300x142.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-768x362.png 768w" sizes="auto, (max-width: 981px) 100vw, 981px" /></figure>



<p class="has-text-align-left"><strong>This article explains how to use the Qwen 3 Guard safeguard models provided by OVHcloud.</strong></p>



<p>Using this guide, you can analyse and moderate texts for LLM applications, chat platforms, customer support systems, or any other text-based services requiring safe and compliant interactions.</p>



<p>Our focus will be on written content, such as conversations or plain text. Although image moderators exist, they won’t be covered here.</p>



<h2 class="wp-block-heading"><strong>Introduction</strong></h2>



<p>As <strong>Large Language Models</strong> (LLMs) continue to grow, access to information has become more seamless, but this ease of access makes it easier to generate, and be exposed to, harmful or toxic content.</p>



<p>LLMs can be prompted with malicious queries (e.g., “How do I make a bomb?”) and some models might comply by generating potentially dangerous responses. This risk is particularly concerning given the widespread availability of LLMs, to both minors and malicious actors alike.</p>



<p>To combat this, LLM providers train their models to reject toxic prompts, and integrate safety features to prevent the creation of harmful content. Even so, users often craft ‘<strong>jailbreaks</strong>’, which are specific prompts designed to get around these safety measures.</p>



<p>As a result, providers have created <strong>specialised safeguard models</strong> to find and remove toxic content in writing.</p>



<h1 class="wp-block-heading">What is toxicity?</h1>



<p>Toxicity is inherently difficult to define, as perceptions vary depending on factors such as individual sensitivity, cultural background, age, and personal experience.</p>



<p>Perceptions of content can vary widely. For example, some users may find certain jokes offensive, while others consider them perfectly acceptable. Similarly, roleplaying with an AI chat may be enjoyable for some, yet deemed inappropriate by others depending on the context.</p>



<p>Furthermore, each moderation system focuses on different categories of harmful content, based on the specific data and instructions it was trained on. For instance, models developed in the United States tend to be highly sensitive to hate speech, political content, and other related categories.</p>



<p>Because jailbreak attempts are a fairly new issue, existing moderation models often fail to address them.</p>



<p>Below are the toxicity categories for the Qwen 3 Guard models:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>Name</strong></td><td><strong>Description</strong></td></tr><tr><td><em>Violent</em></td><td>Content that provides detailed instructions, methods, or advice on how to commit acts of violence, including the manufacture, acquisition, or use of weapons. Also includes depictions of violence.</td></tr><tr><td><em>Nonviolent illegal acts</em></td><td>Content providing guidance or advice for nonviolent criminal activities like hacking, unauthorised drug manufacturing, or theft.</td></tr><tr><td><em>Sexual content or sexual acts</em></td><td>Content with sexual depictions, references, or descriptions of people. Also includes content with explicit sexual imagery, references, or descriptions of illegal or unethical sexual acts, such as rape, bestiality, incest, and sexual slavery.</td></tr><tr><td><em>Personally identifiable information</em></td><td>Content that shares or discloses sensitive personal identifying information without authorisation, such as name, ID number, address, phone number, medical records, financial details, and account passwords.</td></tr><tr><td><em>Suicide &amp; self-harm</em></td><td>Content advocating, directly encouraging, or detailing methods for self-harm, suicide, or dangerous activities that could lead to serious injury or death.</td></tr><tr><td><em>Unethical acts</em></td><td>Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotype, injustice, hate speech, offensive language, harassment, insults, threat, defamation, extremism, misinformation regarding ethics, and other behaviours that, while not illegal, are still considered unethical.</td></tr><tr><td><em>Politically sensitive topics</em></td><td>The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.</td></tr><tr><td><em>Copyright violation</em></td><td>Content that includes unauthorised reproduction, distribution, public display, or derivative use of copyrighted materials, such as novels, scripts, lyrics, and other legally protected creative works, without the copyright holder&#8217;s clear consent.</td></tr><tr><td><em>Jailbreak</em></td><td>Content that explicitly attempts to override the model&#8217;s system prompt or model conditioning.</td></tr></tbody></table></figure>



<p>These categories are <strong>not mutually exclusive</strong>. A text may very well contain both Unethical Acts and Violence, for example. Most notably, jailbreaks often include another kind of toxic query, as they are designed to bypass security guardrails. The Qwen 3 Guard moderator, however, will only return one category.</p>



<p>These categories were chosen by Qwen 3 Guard&#8217;s creators; they can’t be changed, but <strong>you may choose to ignore some</strong> depending on your use case.</p>



<h1 class="wp-block-heading">Metrics</h1>



<p><em>Attack</em>: An attack refers to any attempt to produce harmful or toxic content. This is either a prompt crafted to make an LLM generate harmful output, or just a user’s toxic message in a chat system.</p>



<p><em>Attack Success Rates (ASR)</em>: This is a metric used to assess the effectiveness of a moderation system. It represents the <strong>proportion of attacks that successfully bypass the moderator</strong> and go undetected. A lower ASR indicates a more robust moderation system.</p>



<p><em>False positive</em>: A false positive occurs when benign, nontoxic content is incorrectly flagged as harmful by the moderator.</p>



<p><em>False Positive Rate (FPR)</em>: The FPR measures how often a moderation system misclassifies safe content as toxic. It complements the ASR by reflecting the <strong>model’s ability to correctly allow harmless content through</strong>. A lower FPR indicates better reliability.</p>
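<p>As a worked example (with made-up counts, not benchmark results), both rates reduce to simple ratios:</p>

```python
def attack_success_rate(total_attacks: int, detected_attacks: int) -> float:
    """ASR: proportion of attacks that bypass the moderator undetected."""
    return (total_attacks - detected_attacks) / total_attacks

def false_positive_rate(total_benign: int, flagged_benign: int) -> float:
    """FPR: proportion of benign texts wrongly flagged as toxic."""
    return flagged_benign / total_benign

# Hypothetical evaluation run: 500 attacks of which 400 were caught,
# and 1,000 benign texts of which 40 were flagged.
print(attack_success_rate(500, 400))   # 0.2
print(false_positive_rate(1000, 40))   # 0.04
```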



<h2 class="wp-block-heading">Qwen 3 Guard</h2>



<p>Qwen 3 Guard was launched in October 2025 by Qwen, Alibaba’s AI team. After extensive testing and evaluation, we found this model to be the most effective in safeguarding content.</p>



<p>Besides being efficient, Qwen 3 Guard can detect toxicity across nine categories, including jailbreak attempts, a feature that isn’t common in safeguard models.</p>



<p>It also provides explanations by specifying the exact category detected.</p>



<h3 class="wp-block-heading">Specs</h3>



<ul class="wp-block-list">
<li>Base model: Qwen 3</li>



<li>Flavours: 0.6B, 4B, 8B</li>



<li>Context size: 32,768 tokens</li>



<li>Languages: English, French and 117 other languages and dialects</li>



<li>Tasks:<ul><li>Detection of toxicity in raw text</li><li>Detection of toxicity in LLM dialogue</li><li>Detection of answer refusal (LLM dialogue only)</li><li>Classification of toxicity</li></ul></li>
</ul>



<h3 class="wp-block-heading">Availability</h3>



<p><a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog</a></p>



<p>There are two flavours of Qwen 3 Guard available on OVHcloud:</p>



<p><strong><em>Qwen 3 Guard 0.6B</em></strong>: This lightweight model is very effective at detecting overt toxic content.</p>



<p><strong><em>Qwen 3 Guard 8B</em></strong>: This heavier model comes in handy when confronted with more nuanced examples.</p>



<h3 class="wp-block-heading">Scores</h3>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>&nbsp;</strong></td><td><strong><em>ASR</em></strong></td><td><strong><em>FPR</em></strong></td></tr><tr><td><strong><em>Qwen 3 Guard 0.6B</em></strong></td><td>0.20</td><td>0.06</td></tr><tr><td><strong><em>Qwen 3 Guard 8B</em></strong></td><td>0.20</td><td>0.04</td></tr></tbody></table></figure>






<h3 class="wp-block-heading">Notes</h3>



<ul class="wp-block-list">
<li>The Qwen 3 Guard models have three safety labels for more precise moderation: Safe, Controversial, Unsafe</li>



<li>Although the model can moderate chats, it is recommended to process each part of the dialogue individually rather than submitting the entire conversation at once. Guard models, like all LLMs, detect toxicity more reliably when the input is kept short.</li>



<li>Since Qwen Guard is developed by a Chinese company, its interpretation of toxic content may differ from yours. If necessary, you can overlook certain categories.</li>
</ul>



<h1 class="wp-block-heading">How do I set up my own moderator?</h1>



<p>First, you need to choose the flavour you want:</p>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 0.6B</em></strong> is <strong>lightweight</strong>, <strong>fast</strong>, <strong>efficient</strong> and is great at detecting <strong>overt toxic content</strong>, like <em>Sexual Content</em> or <em>Violence</em> in texts.</li>
</ul>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 8B</em></strong> is heavier, slightly slower but it is more effective against <strong>more nuanced toxic content </strong>like <em>Jailbreak</em> or <em>Unethical Acts</em>, and has a <strong>lower false positive rate</strong>.</li>
</ul>



<p>Your use case is the key to choosing the right model. Do you need to moderate a large volume of text? Is processing speed a priority? How crucial is it to minimise false positives? Are you dealing with nuanced toxic content, or is it more overt?</p>



<p>Carefully considering these questions will help you determine which of the two models is most suitable for your needs.</p>



<p>Both models can be tested on the playground:</p>



<p><a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog</a></p>



<p>Once you’ve made your choice, you need to send the texts you want checked to the AI Endpoints API.</p>



<p>First install the <em>requests</em> library:</p>



<pre class="wp-block-code"><code class="">pip install requests</code></pre>



<p>Next, export your access token to the <em>OVH_AI_ENDPOINTS_ACCESS_TOKEN</em> environment variable:</p>



<pre class="wp-block-code"><code class="">export OVH_AI_ENDPOINTS_ACCESS_TOKEN=&lt;your-access-token&gt;</code></pre>



<p><em>If you don’t have an access token key yet, follow the steps in the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints – Getting Started</em></a> <em>guide</em></p>



<p>Finally, run the following Python code:</p>



<pre class="wp-block-code"><code class="">import os<br>import requests<br><br>url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"<br><br>payload = {<br>    "messages": [{"role": "user", "content": "How do I cook meth?"}],<br>    "model": "Qwen/Qwen3Guard-Gen-0.6B",  # or "Qwen/Qwen3Guard-Gen-8B"<br>    "seed": 21<br>}<br><br>headers = {<br>    "Content-Type": "application/json",<br>    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",<br>}<br><br>response = requests.post(url, json=payload, headers=headers)<br>if response.status_code == 200:<br>    # Parse the JSON response and print the moderation verdict<br>    response_data = response.json()<br>    for choice in response_data["choices"]:<br>        print(choice["message"]["content"])<br>else:<br>    print("Error:", response.status_code, response.text)</code></pre>



<p>The model will respond with a label (Safe, Controversial, Unsafe) and if the text is Controversial or Unsafe, it will return the associated category.</p>



<pre class="wp-block-code"><code class="">Safety: Unsafe<br>Categories: Nonviolent Illegal Acts</code></pre>
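<p>If you need the verdict in a structured form, the plain-text reply can be parsed with a small helper. The sketch below (a hypothetical <code>parse_verdict</code> function, not part of any OVHcloud SDK; the reply format is assumed from the example above) also shows how to ignore categories that don’t matter for your use case:</p>

```python
def parse_verdict(text: str, ignored_categories: set = frozenset()) -> dict:
    """Parse a 'Safety: ... / Categories: ...' reply into a dict."""
    verdict = {"safety": "Safe", "categories": []}
    for line in text.splitlines():
        if line.startswith("Safety:"):
            verdict["safety"] = line.split(":", 1)[1].strip()
        elif line.startswith("Categories:"):
            categories = [c.strip() for c in line.split(":", 1)[1].split(",")]
            # Drop categories you have chosen to overlook for your use case
            verdict["categories"] = [c for c in categories if c not in ignored_categories]
    return verdict

print(parse_verdict("Safety: Unsafe\nCategories: Nonviolent Illegal Acts"))
# {'safety': 'Unsafe', 'categories': ['Nonviolent Illegal Acts']}
```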



<p>Our moderation models are available for free during the beta phase. You can test them via the API or within the playground.</p>



<h2 class="wp-block-heading"><strong>Conclusion</strong></h2>



<p>Two models are currently available for OVHcloud moderation users:<br><strong>•</strong> Qwen 3 Guard 0.6B: <strong>lightweight</strong>, <strong>fast</strong>, <strong>efficient,</strong> great at detecting <strong>overt toxic content</strong><br><strong>•</strong> Qwen 3 Guard 8B: <strong>heavier and slightly slower, but more effective against nuanced toxic content</strong><br><br>Which model should you choose? It depends on your use cases, teams and needs.<br><br>As we&#8217;ve seen in this blog post, OVHcloud AI Endpoints users can start using these models right away, safely and free of charge.<br><br>They are still in the beta phase for now, so we&#8217;d appreciate your feedback!</p>



<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsafety-first-detect-harmful-texts-using-an-ai-safeguard-agent%2F&amp;action_name=Safety%20first%3A%20Detect%20harmful%20texts%20using%20an%20AI%20safeguard%20agent&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Moving Beyond Ingress: Why should OVHcloud Managed Kubernetes Service (MKS) users start looking at the Gateway API?</title>
		<link>https://blog.ovhcloud.com/moving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api/</link>
		
		<dc:creator><![CDATA[Aurélie Vache&nbsp;and&nbsp;Antonin Anchisi]]></dc:creator>
		<pubDate>Mon, 15 Dec 2025 09:26:36 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[OVHcloud Managed Kubernetes]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30016</guid>

					<description><![CDATA[For years, the Kubernetes Ingress API, and the popular Ingress NGINX controller (ingress-nginx), have been the default way to expose applications running inside a Kubernetes cluster. But the ecosystem is changing: the Kubernetes SIG network has announced the retirement of Ingress NGINX in March 2026. After March 2026 the Ingress NGINX will no longer get [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmoving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api%2F&amp;action_name=Moving%20Beyond%20Ingress%3A%20Why%20should%20OVHcloud%20Managed%20Kubernetes%20Service%20%28MKS%29%20users%20start%20looking%20at%20the%20Gateway%20API%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="680" src="https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631-1024x680.png" alt="" class="wp-image-30084" style="width:669px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631-1024x680.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631-300x199.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631.png 1505w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For years, the Kubernetes <strong>Ingress</strong> API, and the popular Ingress NGINX controller (ingress-nginx), have been the default way to expose applications running inside a Kubernetes cluster.</p>



<p>But the ecosystem is changing: the Kubernetes SIG network has announced the <a href="https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">retirement of Ingress NGINX</a> in March 2026.</p>



<p>After <strong>March 2026</strong>, Ingress NGINX will no longer receive new features, new releases, security patches or bug fixes.</p>



<p>Furthermore, the <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Kubernetes project <strong>recommends using Gateway instead of Ingress</strong></a>.</p>



<p>The Ingress API has already been frozen, which means it is no longer being developed, and will have no further changes or updates made to it. The Kubernetes project has no plans to remove Ingress from Kubernetes.</p>



<p>While OVHcloud Managed Kubernetes Service (MKS) does not yet provide a native <strong>GatewayClass</strong>, you can already benefit from Gateway API capabilities today by deploying your own controller 💪 .</p>



<p>Also, until Gateway API becomes fully integrated with OpenStack providers, there is an <strong>intermediate option</strong>: using a <strong>modern, actively maintained Ingress controller</strong> other than ingress-nginx.</p>



<h3 class="wp-block-heading">The limitations of the current Ingress controller model</h3>



<p>The traditional Kubernetes Ingress model was intentionally simple: define an <code>Ingress</code>, install an <code>Ingress Controller</code>, and let it configure a single proxy (usually Nginx) to route traffic.</p>



<p>This design works, but it comes with limitations:</p>



<p>&#8211; Single monolithic entry point: all HTTP routing for the entire cluster goes through <strong>one shared proxy</strong>, which adds complexity, configuration conflicts and scaling challenges.<br>&#8211; Protocol limitations: only <strong>HTTP and HTTPS</strong>. Support for gRPC, HTTP/2, TCP, UDP or TLS passthrough is inconsistent and controller-specific.<br>&#8211; Heavy reliance on annotations: advanced features (timeouts, rewrites, header handling&#8230;) rely on custom annotations.<br>&#8211; Fragmented third-party and cloud Load Balancer support: each of the <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/#additional-controllers" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ingress controllers</a> comes with its own specialised annotations.</p>



<p>Finally, as mentioned, the most used Ingress controller, Ingress NGINX, will be retired in March 2026.</p>



<h3 class="wp-block-heading">A Transitional Solution: Using a Modern Ingress Controller (Traefik, Contour, HAProxy…)</h3>



<p>Before moving to the Gateway API, as a transitional solution, OVHcloud MKS users can simply replace Ingress Nginx with a <strong>modern, actively maintained Ingress controller</strong>.</p>



<p>This allows you to:</p>



<p>&#8211; keep using your existing <code>Ingress</code> manifests<br>&#8211; keep the same architecture: Service type LoadBalancer → OVHcloud Public Cloud Load Balancer → Ingress Controller<br>&#8211; avoid relying on unsupported or deprecated components<br>&#8211; gain features (better gRPC support, built‑in dashboards, improved L7 behaviour&#8230;)</p>



<h4 class="wp-block-heading">Popular alternatives:</h4>



<p><a href="https://doc.traefik.io/traefik/providers/kubernetes-ingress/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>Traefik</strong></a>:<br>&#8211; Very easy to deploy<br>&#8211; Excellent support for HTTP/2, gRPC, WebSockets<br>&#8211; Built‑in dashboard<br>&#8211; Supports both Ingress and Gateway API<br>&#8211; Actively maintained<br>&#8211; Seamless migration from NGINX Ingress Controller to Traefik with <a href="https://doc.traefik.io/traefik/reference/routing-configuration/kubernetes/ingress-nginx/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">NGINX annotation compatibility</a></p>



<p><strong><a href="https://projectcontour.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Contour</a> (Envoy)</strong>:<br>&#8211; Envoy-based Ingress Controller<br>&#8211; Excellent performance<br>&#8211; Good stepping‑stone toward Gateway API</p>



<p><a href="https://www.haproxy.com/documentation/kubernetes-ingress/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>HAProxy Ingress</strong></a>:<br>&#8211; Extremely performant<br>&#8211; Enterprise-grade L7 routing<br>&#8211; Optional Gateway API support</p>



<p><strong><a href="https://docs.nginx.com/nginx-gateway-fabric/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">NGINX Gateway Fabric</a> (NGF)</strong>:<br>&#8211; The successor to Ingress NGINX<br>&#8211; Built directly around Gateway API<br>&#8211; Still maturing but a strong long‑term candidate</p>



<p>If you are interested, you can read a <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">more exhaustive list of Ingress controllers</a>.</p>



<h3 class="wp-block-heading">Installing an Alternative Ingress Controller on OVHcloud MKS</h3>



<p>We will show you how to install <strong>Traefik</strong> as an alternative Ingress controller, and use it to provision a single OVHcloud Public Cloud Load Balancer (based on OpenStack Octavia).</p>



<p>Install Traefik:</p>



<pre class="wp-block-code"><code class="">helm repo add traefik https://traefik.github.io/charts<br>helm repo update<br><br>helm install traefik traefik/traefik --namespace traefik --create-namespace --set service.type=LoadBalancer</code></pre>



<p>This automatically triggers:<br>&#8211; the OpenStack CCM (used by OVHcloud)<br>&#8211; the creation of an OVHcloud Public Cloud Load Balancer<br>&#8211; exposure of Traefik through a public IP</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="179" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-1024x179.png" alt="" class="wp-image-30035" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-1024x179.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-300x52.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-768x134.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-1536x268.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-2048x358.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>After several seconds, the Load Balancer will be active.</p>



<p>Check that Traefik is running:</p>



<pre class="wp-block-code"><code class="">$ kubectl get all -n traefik<br>NAME                           READY   STATUS    RESTARTS   AGE<br>pod/traefik-6777c5db85-pddd6   1/1     Running   0          31s<br><br>NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE<br>service/traefik   LoadBalancer   10.3.129.188   &lt;pending&gt;     80:30267/TCP,443:30417/TCP   31s<br><br>NAME                      READY   UP-TO-DATE   AVAILABLE   AGE<br>deployment.apps/traefik   1/1     1            1           31s<br><br>NAME                                 DESIRED   CURRENT   READY   AGE<br>replicaset.apps/traefik-6777c5db85   1         1         1       31s</code></pre>



<p>Then in order to use it, create an <code>ingress.yaml</code> file with the following content:</p>



<pre class="wp-block-code"><code class="">apiVersion: networking.k8s.io/v1<br>kind: Ingress<br>metadata:<br>  name: my-app-ingress<br>  namespace: default<br>spec:<br>  ingressClassName: traefik  # Specifies Traefik as the ingress controller<br>  rules:<br>    - host: my-app.local<br>      http:<br>        paths:<br>          - path: /<br>            pathType: Prefix<br>            backend:<br>              service:<br>                name: my-app-service<br>                port:<br>                  number: 80</code></pre>



<p>And apply it in your cluster:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f ingress.yaml</code></pre>



<p>Using this type of alternative provides a <strong>fully supported, modern Ingress Controller</strong> while you prepare a long‑term transition to the Gateway API.</p>



<h3 class="wp-block-heading">Gateway API: A modern, flexible networking model</h3>



<p>The <strong>Gateway API</strong> is the next-generation Kubernetes networking specification. It introduces clearer roles and more flexible architectures.</p>



<p>Gateway API splits responsibilities across:<br>&#8211; <strong>GatewayClass</strong>: defines the type of gateway and which controller manages it<br>&#8211; <strong>Gateway</strong>: the actual entry point (e.g., a Load Balancer)<br>&#8211; <strong>Routes</strong>: routing rules, protocol-specific (HTTPRoute, TLSRoute, GRPCRoute, TCPRoute…)</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="800" height="700" src="https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1.png" alt="" class="wp-image-30065" style="width:558px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1.png 800w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1-300x263.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1-768x672.png 768w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>



<p>Gateway API supports:<br>&#8211; HTTP(S)<br>&#8211; HTTP/2<br>&#8211; gRPC<br>&#8211; TCP<br>&#8211; TLS passthrough<br>…in a consistent and portable way.</p>



<p>Unlike Ingress, Gateway API is explicitly designed to allow providers like OVHcloud, AWS, GCP, Azure to:<br>&#8211; provision Load Balancers (LB)<br>&#8211; manage listeners<br>&#8211; expose multiple ports<br>&#8211; integrate with their LB features<br>This paves the way for native OVHcloud <strong>GatewayClass</strong> support.</p>



<h3 class="wp-block-heading">How does it work today on OVHcloud MKS?</h3>



<p>OVHcloud MKS relies on the OpenStack Cloud Controller Manager (CCM) to provision OVHcloud <strong>Public Cloud</strong> Load Balancers in response to a Service of type <code>LoadBalancer</code>.</p>



<p>Since MKS does not yet include a native <code>GatewayClass</code>, you can use Gateway API today as follows:</p>



<p>1. You deploy an existing Gateway Controller (Envoy Gateway, Traefik, Contour/Envoy…) and its GatewayClass.<br>2. The controller deploys a Data Plane proxy inside the cluster.<br>3. To expose that proxy, you still have to create a <code>Service</code> of type <strong>LoadBalancer</strong> (and your app of course).<br>4. The CCM provisions an OVHcloud Public Cloud Load Balancer and forwards traffic to your proxy.</p>



<p>Thanks to that, you will have a fully functional Gateway API. The workflow is very similar to the one required for the NGINX Ingress controller.</p>



<h3 class="wp-block-heading">Using the Gateway API on OVHcloud MKS today</h3>



<p>You can already use the Gateway API by deploying your preferred controller.</p>



<p>Here’s an example using<a href="https://gateway.envoyproxy.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> Envoy Gateway</a>, one of the most future-proof options.</p>



<p>Install Gateway API CRDs:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml</code></pre>



<p>Deploy Envoy Gateway:</p>



<pre class="wp-block-code"><code class="">helm install eg oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace</code></pre>



<p>You should see output like this:</p>



<pre class="wp-block-code"><code class="">$ helm install eg oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace<br><br>Pulled: docker.io/envoyproxy/gateway-helm:1.6.0<br>Digest: sha256:5c55e7844ae8cff3152ca00330234ef61b1f9fa3d466f50db2c63a279f1cd1df<br>NAME: eg<br>LAST DEPLOYED: Mon Dec  1 16:27:07 2025<br>NAMESPACE: envoy-gateway-system<br>STATUS: deployed<br>REVISION: 1<br>TEST SUITE: None<br>NOTES:<br>**************************************************************************<br>*** PLEASE BE PATIENT: Envoy Gateway may take a few minutes to install ***<br>**************************************************************************<br><br>Envoy Gateway is an open source project for managing Envoy Proxy as a standalone or Kubernetes-based application gateway.<br><br>Thank you for installing Envoy Gateway! 🎉<br><br>Your release is named: eg. 🎉<br><br>Your release is in namespace: envoy-gateway-system. 🎉<br><br>To learn more about the release, try:<br><br>  $ helm status eg -n envoy-gateway-system<br>  $ helm get all eg -n envoy-gateway-system<br><br>To have a quickstart of Envoy Gateway, please refer to https://gateway.envoyproxy.io/latest/tasks/quickstart.<br><br>To get more details, please visit https://gateway.envoyproxy.io and https://github.com/envoyproxy/gateway.</code></pre>



<p>Check that Envoy Gateway is running:</p>



<pre class="wp-block-code"><code class="">$ kubectl get po -n envoy-gateway-system<br>NAME                            READY   STATUS    RESTARTS   AGE<br>envoy-gateway-9cbbc577c-5h5qw   1/1     Running   0          16m</code></pre>



<p>As a quickstart, you can directly install the <a href="https://gateway-api.sigs.k8s.io/api-types/gatewayclass/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GatewayClass</a>, <a href="https://gateway-api.sigs.k8s.io/api-types/gateway/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gateway</a>, <a href="https://gateway-api.sigs.k8s.io/api-types/httproute/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">HTTPRoute</a> and an example app:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/latest/quickstart.yaml -n default</code></pre>



<p>This command deploys a <code>GatewayClass</code>, a <code>Gateway</code>, an <code>HTTPRoute</code>, and an app deployed via a Deployment and exposed through a Service:</p>



<pre class="wp-block-code"><code class="">gatewayclass.gateway.networking.k8s.io/eg created<br>gateway.gateway.networking.k8s.io/eg created<br>serviceaccount/backend created<br>service/backend created<br>deployment.apps/backend created<br>httproute.gateway.networking.k8s.io/backend created</code></pre>



<p>As you can see, a GatewayClass has been deployed:</p>



<pre class="wp-block-code"><code class="">$ kubectl get gatewayclass -o yaml | kubectl neat<br>apiVersion: v1<br>items:<br>- apiVersion: gateway.networking.k8s.io/v1<br>  kind: GatewayClass<br>  metadata:<br>    name: eg<br>  spec:<br>    controllerName: gateway.envoyproxy.io/gatewayclass-controller<br>kind: List<br>metadata:<br>  resourceVersion: ""</code></pre>



<p>Note that a GatewayClass is a cluster-wide resource so you don&#8217;t have to specify any namespace.</p>



<p>A Gateway has also been deployed:</p>



<pre class="wp-block-code"><code class="">$ kubectl get gateway -o yaml -n default | kubectl neat<br>apiVersion: v1<br>items:<br>- apiVersion: gateway.networking.k8s.io/v1<br>  kind: Gateway<br>  metadata:<br>    name: eg<br>    namespace: default<br>  spec:<br>    gatewayClassName: eg<br>    listeners:<br>    - allowedRoutes:<br>        namespaces:<br>          from: Same<br>      name: http<br>      port: 80<br>      protocol: HTTP<br>kind: List<br>metadata:<br>  resourceVersion: ""</code></pre>



<p>An HTTPRoute as well:</p>



<pre class="wp-block-code"><code class="">$ kubectl get httproute -o yaml -n default | kubectl neat<br>apiVersion: v1<br>items:<br>- apiVersion: gateway.networking.k8s.io/v1<br>  kind: HTTPRoute<br>  metadata:<br>    name: backend<br>    namespace: default<br>  spec:<br>    hostnames:<br>    - www.example.com<br>    parentRefs:<br>    - group: gateway.networking.k8s.io<br>      kind: Gateway<br>      name: eg<br>    rules:<br>    - backendRefs:<br>      - group: ""<br>        kind: Service<br>        name: backend<br>        port: 3000<br>        weight: 1<br>      matches:<br>      - path:<br>          type: PathPrefix<br>          value: /<br>kind: List<br>metadata:<br>  resourceVersion: ""</code></pre>



<p>To retrieve the external IP of the external Load Balancer, you just have to get information about the Gateway and export it to an environment variable:</p>



<pre class="wp-block-code"><code class="">$ kubectl get gateway eg<br>NAME   CLASS   ADDRESS        PROGRAMMED   AGE<br>eg     eg      xx.xxx.xx.xxx   True        18m<br><br>$ export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')<br><br>$ echo $GATEWAY_HOST<br>xx.xxx.xx.xxx</code></pre>



<p>And finally, a <code>backend</code> Service has been deployed along with its Deployment:</p>



<pre class="wp-block-code"><code class="">$ kubectl get pod,svc -l app=backend -n default<br>NAME                           READY   STATUS    RESTARTS   AGE<br>pod/backend-765694d47f-zr6hh   1/1     Running   0          21m<br><br>NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE<br>service/backend   ClusterIP   10.3.114.179   &lt;none&gt;        3000/TCP   21m</code></pre>



<p>In order to create your own <code>Gateway</code> and <code>*Route</code> resources, don&#8217;t hesitate to take a look at the <a href="https://gateway-api.sigs.k8s.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gateway API website</a>.</p>
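<p>For instance, a minimal custom <code>HTTPRoute</code> attaching to the <code>eg</code> Gateway from the quickstart could look like this (the hostname, route name and backend Service are hypothetical):</p>

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app
  namespace: default
spec:
  parentRefs:
    - name: eg               # the Gateway created by the quickstart
  hostnames:
    - app.example.com        # hypothetical hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: my-app       # hypothetical Service in the same namespace
          port: 8080
```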



<h3 class="wp-block-heading">Conclusion</h3>



<p>Two migration paths are currently available for OVHcloud MKS users:</p>



<ul class="wp-block-list">
<li>Short-term: switch to a modern Ingress Controller (Traefik, Contour, HAProxy, NGF&#8230;). It provides full support for current Ingress usage, without requiring API changes.</li>



<li>Long-term: adopt the Gateway API. Gateway API brings multi‑protocol support, clearer separation of roles, and is the strategic direction of Kubernetes networking.</li>
</ul>



<p>Which approach and which tool should you choose? Well, it’s up to you, depending on your use cases, your teams, your needs… 🙂</p>



<p>As we have seen in this blog post, OVHcloud MKS users can begin adopting these technologies today, safely and incrementally.</p>



<p>This ecosystem is evolving quickly, so stay tuned to find out about the coming release of a pre-installed official GatewayClass (based on OpenStack Octavia) 💪.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmoving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api%2F&amp;action_name=Moving%20Beyond%20Ingress%3A%20Why%20should%20OVHcloud%20Managed%20Kubernetes%20Service%20%28MKS%29%20users%20start%20looking%20at%20the%20Gateway%20API%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Industrial Excellence meets Artificial Intelligence: Behind the Scenes with Smart Datacenter</title>
		<link>https://blog.ovhcloud.com/industrial-excellence-meets-artificial-intelligence-behind-the-scenes-with-smart-datacenter/</link>
		
		<dc:creator><![CDATA[Ali Chehade,&nbsp;Julien Jay&nbsp;and&nbsp;Christian Sharp]]></dc:creator>
		<pubDate>Fri, 12 Dec 2025 14:35:42 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[cooling]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30107</guid>

					<description><![CDATA[At OVHcloud, we are constantly looking for ways to improve our operations and reduce our impact on the environment. This has been a defining part of the company since 1999 and is a key part of our organisational DNA and our commercial model. We are very proud to present the new Smart Datacenter cooling system, [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Findustrial-excellence-meets-artificial-intelligence-behind-the-scenes-with-smart-datacenter%2F&amp;action_name=Industrial%20Excellence%20meets%20Artificial%20Intelligence%3A%20Behind%20the%20Scenes%20with%20Smart%20Datacenter&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p></p>



<p>At OVHcloud, we are constantly looking for ways to improve our operations and reduce our impact on the environment. This has been a defining part of the company since 1999 and is a key part of our organisational DNA and our commercial model.</p>



<p>We are very proud to present the new Smart Datacenter cooling system, which significantly improves energy and water efficiency while delivering a significant reduction in carbon impact across the entire cooling chain, from manufacturing and transport to daily operations.</p>



<p>The system is a new way of building and deploying datacenter infrastructure, changing how we manage and monitor water supply and demand, using a combination of industrial design, IoT sensors and AI innovation, specifically in our smart racks, advanced cooling distribution units (CDUs) and intelligent dry coolers.</p>



<p>Smart Datacenter delivers a reduction in power consumption of up to 50% across the entire cooling loop, from server water blocks to dry coolers, and consumes 30% less water compared to OVHcloud’s earliest design, driving major sustainability benefits. The system also uses complex mathematical models capturing detailed rack-level and environmental data to optimize cooling performance in real time. Furthermore, all operational data is fed into a centralized data lake, enabling cutting-edge artificial intelligence to predict, adapt, and enhance system efficiency and reliability.</p>



<h2 class="wp-block-heading">Let’s get into the detail.</h2>



<p>The system has three main components:</p>



<ol start="1" class="wp-block-list">
<li><strong>Smart Racks: </strong>These are designed with an innovative hydraulic “pull” architecture, where each rack autonomously draws exactly the water flow, pressure, and temperature it needs, dynamically adapting to server load and performance.</li>



<li><strong>Advanced Cooling Distribution Unit (CDU): </strong>This is a compact, next-generation primary loop unit that autonomously balances flow and pressure across all racks without manual intervention or any electrical communication. It uses only hydraulic signals (pressure, flow and temperature of water) to “understand” rack demands and continuously optimizes operation for lowest power consumption and extended pump lifespan.</li>



<li><strong>Intelligent Dry Cooler: </strong>This is operated seamlessly by the CDU, eliminating the need for separate control systems (“brains”) on both the dry cooler and the CDU. This unified control architecture ensures optimized, coordinated performance across the entire cooling infrastructure.</li>
</ol>



<p>OVHcloud’s new Single-Circuit System (SCS) replaces the previous Dual-Circuit System cooling architecture (DCS), which consisted of a primary facility loop and a secondary in-rack loop separated by an in-rack Coolant Distribution Unit (CDU), installed inline directly after the rear door heat exchangers (RDHX), as shown in Figure 1. The CDU housed multiple pumps, several plate heat exchangers (PHEX), and a network of valves and sensors.</p>



<figure class="wp-block-video aligncenter"><video height="1080" style="aspect-ratio: 1920 / 1080;" width="1920" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/12/OVH-cooling-loop.mp4"></video></figure>



<p>Figure 1. Dual-Circuit System cooling architecture (DCS) vs Single-Circuit system (SCS).</p>



<p>That previous design maintained turbulent flow through water blocks (WBs) using the in-rack CDU to regulate flow and temperature differences, ensuring performance despite OVHcloud’s ΔT of 20 K on the primary loop (far higher than the typical market value around 5 K).</p>
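<p>To see why that ΔT matters, recall that for liquid cooling the removed heat is Q = ṁ · c<sub>p</sub> · ΔT, so the required mass flow is inversely proportional to the temperature difference. A quick sketch with illustrative numbers (the 100 kW load is hypothetical):</p>

```python
# Illustrative only: water mass flow required for a given heat load and delta-T.
# Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT)

C_P_WATER = 4186.0  # specific heat capacity of water, J/(kg.K)

def mass_flow(q_watts: float, delta_t_kelvin: float) -> float:
    """Mass flow (kg/s) needed to remove q_watts at the given temperature rise."""
    return q_watts / (C_P_WATER * delta_t_kelvin)

q = 100_000.0  # hypothetical 100 kW heat load
print(f"dT=20K: {mass_flow(q, 20.0):.2f} kg/s")  # OVHcloud's primary-loop delta-T
print(f"dT=5K:  {mass_flow(q, 5.0):.2f} kg/s")   # typical market delta-T
# A 20 K design moves 4x less water than a 5 K design for the same heat load.
```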



<p>Removing the in-rack CDU — replaced by a Pressure Independent Control Valve (PICV), a flow meter, and two temperature sensors on each rack — simplifies the system to a single closed-loop, where the flow rate through servers is dictated directly by the primary loop, adapting dynamically to rack load density. On the rack side, the system adapts the exact flow the rack requires by analyzing the water behavior and performing iterative, predictive thermal optimization considering IT components and the supplied water temperature and flow. This results in lower inlet water temperatures at the server level due to the elimination of the in-rack CDU’s approach temperature difference, and reduces electrical consumption, CAPEX, carbon footprint, and rack footprint.</p>



<p>To prevent laminar flow and maintain heat transfer efficiency at low flow rates, OVHcloud introduced a passive hydraulic innovation by arranging servers into clusters connected in series with servers inside each cluster connected in parallel, rather than all servers in parallel. This ensures higher water flow through individual servers even when the rack density is low. While this increases system pressure drops depending on cluster configuration, it results in better thermal performance and all servers receive water at temperatures equal to or lower than in the previous DCS design.</p>



<p>The racks operate on a novel hydraulic “pull” principle — where each rack draws exactly the hydraulic power it requires, rather than being pushed by the system. The CDU then dynamically adapts the overall hydraulic performance of the primary loop, balancing flow and pressure in real time to match the actual demand of the entire data center.</p>



<p>A key breakthrough is the CDU’s communication-free operation: it requires no cables, radio waves, or other electronic communication with racks. Instead, it analyzes hydraulic signals — pressure, flow, and temperature fluctuations within the water itself — to understand each rack’s cooling needs and adapt accordingly. This eliminates complex telemetry infrastructure, reduces operational risks, and enhances system reliability. To ensure water quality and system longevity, water supplied to the data center is filtered at 25 microns, and multiple sophisticated high-precision sensors continuously monitor water quality in real time.</p>



<p>The CDU is 50% smaller than the previous generation and manages the entire thermal path — from chip-level water blocks, through the racks and CDU, to the dry coolers.</p>



<p>The newly designed dry cooler is also 50% smaller than the previous model and features one of the lowest density footprints worldwide. Thanks to years of thermal studies on heat exchangers by the OVHcloud R&amp;D team, it has 50% fewer fans, resulting in very low energy consumption, while also reducing noise. Its compact size means that we can also transport more units in the same truck!  This design achieves a 30% reduction in water consumption compared to OVHcloud’s earliest dry cooler design. A key innovation in the dry cooler is its advanced adiabatic cooling pads system, which cools incoming hot air before it passes through the heat exchangers. This high-precision water injection system is the first of its kind, and adjusts water application based on multiple sensors and extensive iterative calculations, including data center load, ambient temperature, and humidity levels.</p>



<p>Unlike traditional adiabatic systems, the pads’ system does not use a conventional recirculation loop. Instead, water is injected when needed onto the pads via a simple setup consisting of a solenoid valve and a flow meter, eliminating complex hydraulics such as pumps, filters, storage tanks, level sensors, and conductivity sensors. The system maintains water quality and physical/chemical properties through careful design, drastically simplifying operation and reducing maintenance needs.</p>



<p>The CDU continuously analyzes data from up to 36 sensors distributed across the CDU itself and the associated dry cooler. It also collects operational data from solenoid valves, pumps, and dry cooler fans across the infrastructure loop. All components are monitored and managed by the system’s central intelligence—the CDU’s control panel—providing a comprehensive understanding of the entire system’s behavior, from the data center interior to the external ambient environment, ensuring real-time performance oversight and precise thermal regulation.</p>



<p>Through this iterative and precise control of water injection, the system optimizes cooling performance and Water Usage Effectiveness (WUE), ensuring minimal water consumption without sacrificing thermal effectiveness.</p>



<h2 class="wp-block-heading"><strong>Advanced System Analytics, Learning &amp; AI Integration</strong></h2>



<p>The entire system is designed to continuously analyze the thermal, hydraulic, and aerodynamic behaviors of the various fluids along the cooling path. It uses daily operational data to learn and adapt its performance dynamically, optimizing cooling efficiency and reliability over time.</p>



<p>The CDU’s brain—the control panel—aggregates data from 36 sensors distributed across the CDU and dry cooler, as well as operational data from solenoid valves, pumps, and dry cooler fans within the infrastructure loop. It also collects critical rack-level information, including flow rates, temperatures, and IPMI data that reflect IT equipment behavior and performance. All this operational data is pushed to a centralized data lake for parallel analysis, which forms the foundation for the next step: integrating cutting-edge artificial intelligence (AI). This AI will leverage the continuously gathered data and learning processes to enhance predictive capabilities, optimize future operating points, and enable fully autonomous decision-making.</p>



<p>This combination of real-time learning and AI-powered analytics will provide advanced diagnostics, predictive maintenance, and proactive management — maximizing uptime, reducing costs, and driving ever-greater sustainability.</p>



<h2 class="wp-block-heading"><strong>Iterative Control System Innovation</strong></h2>



<p>The iterative control system manages all aspects in real time, hands-free, continuously learning from sensor data and operational feedback. It applies algorithms to the pump speed on the CDU, the fans on the dry cooler and the solenoid valve controlling water injection on the adiabatic pads.</p>



<p>On the rack side, the system uses a PICV valve, flow meter, and two temperature sensors to adapt the exact hydraulic flow needed by each rack, considering IT load and incoming water conditions, iteratively optimizing thermal performance and energy efficiency.</p>



<p>On the CDU, the system analyzes combined hydraulic signals from all racks alongside ambient data center conditions, dynamically balancing flow and pressure across the entire data center infrastructure without human intervention.</p>



<p>Furthermore, OVHcloud’s cooling system integrates intelligent communication between cooling line-ups to enhance failure detection and simplify maintenance. This is achieved through embedded freeze-guard and resilience-switch mechanisms that ensure continuous operation and system resilience. The freeze-guard system is designed to protect the dry coolers in sub-zero ambient conditions by keeping water circulating through their heat exchangers. If the overall loop flow drops below a predefined threshold, the system automatically opens a normally closed bypass valve to maintain circulation—preventing freezing despite the use of pure water (without glycol) as the cooling medium. The resilience-switch system maintains redundancy by hydraulically linking multiple cooling lines. In the event of failure or overload on one line, normally open solenoid valves isolate the affected line, while bypass valves on neighboring lines open to redistribute water flow and maintain cooling performance. This dynamic and autonomous valve management ensures uninterrupted service and rapid fault response.</p>



<p>Drawing inspiration from autonomous control methodologies in leading-edge industries, the system predicts future behavior based on iterative calculations, dynamically adapting pump speed, fan speeds and solenoid valve openings to converge rapidly on optimal operating points. It also adjusts performance based on external constraints such as noise limits, water availability, or energy costs — for example, consuming more energy to save water in water-stressed regions or balancing noise restrictions in urban deployments.</p>



<p>This unique, self-optimizing end-to-end control system maximizes energy efficiency, sustainability, and operational simplicity, extending pump life cycles and ensuring the most environmentally responsible data center cooling solution available today.</p>



<p>This vertically integrated, autonomous system — including smart racks, the advanced CDU, and the intelligent dry cooler — represents a world-first in end-to-end, intelligent, sustainable, communication-free, and data-driven data center cooling.</p>



<h2 class="wp-block-heading"><strong>Why is this important?</strong></h2>



<p>This innovation is critical because it marks a decisive step toward radically more sustainable, efficient, and autonomous data center cooling — addressing the growing demands of digital infrastructure while reducing its environmental footprint.</p>



<p>By using fewer, smaller components, we are saving power, cutting transport costs and reducing carbon impact. Using fewer fans on the dry cooler means up to 50% lower energy consumption on the cooling cycle – and the new pad system means 30% lower water consumption in the cooling system. The system is fully autonomous, avoiding human error. A temperature gradient of 20K on the primary loop – four times higher than the industry average – means that flow rates can be lower and water efficiency is higher. The system doesn’t rely on Wi-Fi or cabling, and the predictive control constantly adapts to external conditions or situational goals, feeding into a data lake to help continuously optimize performance.</p>



<p>Today’s world is built on technology, and datacenters are a key part of that technology, but there is a pressing need to ensure we can maintain human progress without incurring a significant carbon footprint. Power and water efficiency is a key part of this equation in the datacenter industry, and our innovation in the Smart Datacenter continues our trajectory of supporting today’s needs without compromising the world of tomorrow.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="575" src="https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-6-1024x575.png" alt="" class="wp-image-30116" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-6-1024x575.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-6-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-6-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-6.png 1502w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Findustrial-excellence-meets-artificial-intelligence-behind-the-scenes-with-smart-datacenter%2F&amp;action_name=Industrial%20Excellence%20meets%20Artificial%20Intelligence%3A%20Behind%20the%20Scenes%20with%20Smart%20Datacenter&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/12/OVH-cooling-loop.mp4" length="4050958" type="video/mp4" />

			</item>
		<item>
		<title>Manage your secrets using OVHcloud Secret Manager with External Secrets Operator (ESO) on OVHcloud Managed Kubernetes Service (MKS)</title>
		<link>https://blog.ovhcloud.com/manage-your-secrets-through-ovhcloud-secret-manager-thanks-to-external-secrets-operator-eso-on-ovhcloud-managed-kubernetes-service-mks/</link>
		
		<dc:creator><![CDATA[Aurélie Vache]]></dc:creator>
		<pubDate>Tue, 25 Nov 2025 14:44:52 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[IAM]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[MKS]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Secret Manager]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29374</guid>

					<description><![CDATA[Secrets resources in Kubernetes help us keep sensitive information like logins, passwords, tokens, credentials and certificates secure. But just a heads up: Secrets in Kubernetes are base64 encoded, not encrypted so anyone can read and decode them if they know how. The good news is that OVHcloud has just launched the Secret Manager Beta, which [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmanage-your-secrets-through-ovhcloud-secret-manager-thanks-to-external-secrets-operator-eso-on-ovhcloud-managed-kubernetes-service-mks%2F&amp;action_name=Manage%20your%20secrets%20using%20OVHcloud%20Secret%20Manager%20with%20External%20Secrets%20Operator%20%28ESO%29%20on%20OVHcloud%20Managed%20Kubernetes%20Service%20%28MKS%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="675" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/IMG_1547-1-1024x675.jpg" alt="" class="wp-image-30006" style="width:638px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/IMG_1547-1-1024x675.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/IMG_1547-1-300x198.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/IMG_1547-1-768x507.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/IMG_1547-1.jpg 1536w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Secrets resources in Kubernetes help us keep sensitive information like logins, passwords, tokens, credentials and certificates secure. But just a heads up: Secrets in Kubernetes are base64 encoded, not encrypted so anyone can read and decode them if they know how.</p>
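<p>To illustrate, anyone with read access to a Secret can recover its values with nothing more than <code>base64</code> (the secret name and key in the comment below are hypothetical):</p>

```shell
# Kubernetes Secrets are only base64 encoded, not encrypted:
echo 'bXktc3VwZXItcGFzc3dvcmQ=' | base64 -d
# -> my-super-password
# In a cluster, the equivalent would be something like:
#   kubectl get secret my-secret -o jsonpath='{.data.password}' | base64 -d
```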



<p>The good news is that OVHcloud has just launched the<a href="https://www.ovhcloud.com/fr/identity-security-operations/secret-manager/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> Secret Manager</a> Beta, which you can use within your Kubernetes clusters via the External Secrets Operator (ESO) 🎉.</p>



<h2 class="wp-block-heading">External Secrets Operator</h2>



<p>The External Secrets Operator (ESO) extends Kubernetes with Custom Resource Definitions (CRDs) that define <strong>where</strong> secrets are and <strong>how</strong> to sync them.</p>



<p>The controller <strong>retrieves secrets from an external API</strong> and <strong>creates Kubernetes Secrets</strong>. If the secret changes in the external API, the controller updates the secret in the Kubernetes cluster.</p>



<p>Basically, the ESO can connect to an external Secret Manager like OVHcloud, Vault, AWS, or GCP using a (Cluster)SecretStore, and an ExternalSecret to figure out which Secret it needs to fetch. It then creates a Secret in the Kubernetes cluster with the fetched secret’s value.</p>
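<p>In practice, that means a small manifest per synced secret. Here is a sketch of an <code>ExternalSecret</code> referencing a hypothetical <code>ClusterSecretStore</code> named <code>ovh-secret-manager</code> (all names and keys are illustrative, and the API version may differ depending on your ESO release):</p>

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: my-app-credentials
  namespace: default
spec:
  refreshInterval: 1h              # how often ESO re-syncs from the external store
  secretStoreRef:
    kind: ClusterSecretStore
    name: ovh-secret-manager       # hypothetical store pointing at OVHcloud Secret Manager
  target:
    name: my-app-credentials       # the Kubernetes Secret that ESO creates
  data:
    - secretKey: password          # key in the resulting Kubernetes Secret
      remoteRef:
        key: my-app                # secret name in the external store
        property: password         # field inside that secret
```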



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1020" height="942" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-10.png" alt="" class="wp-image-29378" style="width:435px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-10.png 1020w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-10-300x277.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-10-768x709.png 768w" sizes="auto, (max-width: 1020px) 100vw, 1020px" /></figure>



<p>Plus, it can sync secrets across all the namespaces in your Kubernetes cluster (I love this feature ❤️):</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="577" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-11-1024x577.png" alt="" class="wp-image-29380" style="width:502px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-11-1024x577.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-11-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-11-768x433.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/image-11.png 1282w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can use External Secrets with different<a href="https://external-secrets.io/latest/provider/aws-secrets-manager/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> Providers</a>, including AWS Secrets Manager, HashiCorp Vault and Google Secret Manager. In this blog I’ll show you how to create a secret in the new OVHcloud Secret Manager and consume it through the<a href="https://external-secrets.io/latest/provider/hashicorp-vault/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> Hashicorp Vault</a> provider.</p>



<p>For more details, read the<a href="https://external-secrets.io/v0.8.5/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> ESO official documentation</a>.</p>



<h2 class="wp-block-heading">Let&#8217;s jump in!</h2>



<h3 class="wp-block-heading">Create an IAM local user</h3>



<p>To fetch secrets from Secret Manager, you’ll need an IAM user with the right permissions. You can either create a new one or use an existing one.</p>



<p>In the<a href="https://www.ovh.com/manager" data-wpel-link="exclude"> OVHcloud Control Panel</a> (UI), go to ‘Identity and Access Management’, then ‘Identities’.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="760" height="636" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/identity.png" alt="" class="wp-image-29967" style="width:232px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/identity.png 760w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/identity-300x251.png 300w" sizes="auto, (max-width: 760px) 100vw, 760px" /></figure>



<p>Click the ‘Add user’ button to create an IAM local user and complete the fields as shown below:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="907" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-9-2-1024x907.png" alt="" class="wp-image-29994" style="width:561px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-9-2-1024x907.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-9-2-300x266.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-9-2-768x681.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-9-2.png 1194w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="473" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-10-1-1024x473.png" alt="" class="wp-image-29995" style="width:560px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-10-1-1024x473.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-10-1-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-10-1-768x355.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-10-1.png 1194w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Quick note: I’ve named the user ‘secretmanager-’ followed by the ID of the OKMS domain I want to use.</p>



<p>The user needs to be an ADMIN or, ideally, should have a policy granting the following actions:</p>



<pre class="wp-block-code"><code class="">okms:apikms:secret/create<br>okms:apikms:secret/version/getData<br>okms:apiovh:secret/get</code></pre>



<h3 class="wp-block-heading">Get the Personal Access Token (PAT)</h3>



<p>The ESO ClusterSecretStore needs permission to fetch secrets from Secret Manager, so you’ll need a Personal Access Token (PAT) for this user.</p>



<p>You can generate it via our API, which you’ll find here: <a href="https://eu.api.ovh.com/console/?section=%2Fme&amp;branch=v1#post-/me/identity/user/-user-/token" data-wpel-link="exclude">https://eu.api.ovh.com/console/?section=%2Fme&amp;branch=v1#post-/me/identity/user/-user-/token</a></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="542" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-3-1024x542.png" alt="" class="wp-image-29997" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-3-1024x542.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-3-300x159.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-3-768x406.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-3-1536x813.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-3.png 1546w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>Path parameters</strong></p>



<p>user: secretmanager-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx</p>



<p><strong>Request body:</strong></p>



<pre class="wp-block-code"><code class="">{<br>  "description": "PAT secretmanager-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",<br>  "name": "pat-secretmanager-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"<br>}</code></pre>



<p>You should obtain a response like this:</p>



<pre class="wp-block-code"><code class="">{<br>  "creation": "2025-11-07T14:02:56.679157188Z",<br>  "description": "PAT secretmanager-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",<br>  "expiresAt": null,<br>  "lastUsed": null,<br>  "name": "pat-secretmanager-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",<br>  "token": "eyJhbGciOiJ...punpVAg"<br>}</code></pre>



<p>Save the token value, because you’ll need it in a bit.</p>



<h3 class="wp-block-heading">Create a secret in the Secret Manager</h3>



<p>Here’s how to create a secret holding OVHcloud Managed Private Registry (MPR) credentials for use in your Kubernetes cluster(s).</p>



<p>In the<a href="https://www.ovh.com/manager" data-wpel-link="exclude"> OVHcloud Control Panel</a> (UI), go to ‘Secret Manager’, then create a secret ‘prod/va1/dockerconfigjson’ in the Europe region (France – Paris) eu-west-par:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="309" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1-1024x309.png" alt="" class="wp-image-29973" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1-1024x309.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1-300x91.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1-768x232.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1-1536x464.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1-2048x618.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You’ll need to activate the region if you’re selecting it for the first time:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="569" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/Capture-decran-2025-11-07-a-14.03.20-1024x569.png" alt="" class="wp-image-29911" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/Capture-decran-2025-11-07-a-14.03.20-1024x569.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/Capture-decran-2025-11-07-a-14.03.20-300x167.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/Capture-decran-2025-11-07-a-14.03.20-768x426.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/Capture-decran-2025-11-07-a-14.03.20-1536x853.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/Capture-decran-2025-11-07-a-14.03.20-2048x1137.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Select an OKMS domain:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="260" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-3-1024x260.png" alt="" class="wp-image-29996" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-3-1024x260.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-3-300x76.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-3-768x195.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-3.png 1384w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Enter the path and value of your secret. For example:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="708" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1-1024x708.png" alt="" class="wp-image-29975" style="width:558px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1-1024x708.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1-300x208.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1-768x531.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1.png 1402w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
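<p>Since this secret stores Docker registry credentials, its value is the raw Docker config JSON. Here is a minimal sketch of building that value, using a hypothetical registry and hypothetical credentials (replace them with your MPR values); the <code>auth</code> field is the base64 encoding of “username:password”:</p>

```shell
# Hypothetical credentials, for illustration only.
AUTH=$(printf '%s' 'user:pass' | base64)
cat <<EOF
{"auths":{"registry.example.com":{"username":"user","password":"pass","auth":"$AUTH"}}}
EOF
```

<p>The resulting JSON is what you paste as the secret value in the Control Panel.</p>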



<p>Your secret is all set!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="417" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2-1024x417.png" alt="" class="wp-image-29990" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2-1024x417.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2-300x122.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2-768x313.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2-1536x625.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2-2048x834.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Install External Secrets Operators on your cluster</h3>



<p>Add the External Secrets chart repository through Helm:</p>



<pre class="wp-block-code"><code class="">helm repo add external-secrets https://charts.external-secrets.io
helm repo update</code></pre>



<p>Install from the chart repository:</p>



<pre class="wp-block-code"><code class="">helm install external-secrets \<br>   external-secrets/external-secrets \<br>    -n external-secrets \<br>    --create-namespace \<br>    --set installCRDs=true</code></pre>



<p>Your result should look something like this:</p>



<pre class="wp-block-code"><code class="">$ helm install external-secrets \<br>   external-secrets/external-secrets \<br>    -n external-secrets \<br>    --create-namespace \<br>    --set installCRDs=true<br><br>NAME: external-secrets<br>LAST DEPLOYED: Mon Nov 24 17:08:58 2025<br>NAMESPACE: external-secrets<br>STATUS: deployed<br>REVISION: 1<br>TEST SUITE: None<br>NOTES:<br>external-secrets has been deployed successfully in namespace external-secrets!<br><br>In order to begin using ExternalSecrets, you will need to set up a SecretStore<br>or ClusterSecretStore resource (for example, by creating a 'vault' SecretStore).<br><br>More information on the different types of SecretStores and how to configure them<br>can be found in our Github: https://github.com/external-secrets/external-secrets</code></pre>



<p>The External Secrets Operator is now installed in your cluster.</p>



<p>Check ESO is running:</p>



<pre class="wp-block-code"><code class="">$ kubectl get all -n external-secrets<br>NAME                                                    READY   STATUS    RESTARTS   AGE<br>pod/external-secrets-6b9f8ff5d4-jwd6g                   1/1     Running   0          25m<br>pod/external-secrets-cert-controller-7bf8fd894c-d24xb   1/1     Running   0          25m<br>pod/external-secrets-webhook-df488ddff-2xv4t            1/1     Running   0          25m<br><br>NAME                               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE<br>service/external-secrets-webhook   ClusterIP   10.3.106.32   &lt;none&gt;        443/TCP   25m<br><br>NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE<br>deployment.apps/external-secrets                   1/1     1            1           25m<br>deployment.apps/external-secrets-cert-controller   1/1     1            1           25m<br>deployment.apps/external-secrets-webhook           1/1     1            1           25m<br><br>NAME                                                          DESIRED   CURRENT   READY   AGE<br>replicaset.apps/external-secrets-6b9f8ff5d4                   1         1         1       25m<br>replicaset.apps/external-secrets-cert-controller-7bf8fd894c   1         1         1       25m<br>replicaset.apps/external-secrets-webhook-df488ddff            1         1         1       25m</code></pre>



<h3 class="wp-block-heading">Create a Secret containing the PAT</h3>



<p>Encode the PAT in base64:</p>



<pre class="wp-block-code"><code class="">$ echo -n "&lt;token&gt;" | base64<br><br>ZXlKaG...wVkFn</code></pre>
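<p>If you want to make sure the encoded value is correct before pasting it into a manifest, a quick round-trip check (shown here with a placeholder token) looks like this:</p>

```shell
# Placeholder PAT value, for illustration only.
TOKEN="example-token-value"
# GNU base64 wraps its output at 76 characters by default; -w0 disables
# wrapping, which matters for long JWT-style tokens.
ENCODED=$(printf '%s' "$TOKEN" | base64 -w0)
# Decoding must return the original token byte-for-byte.
printf '%s' "$ENCODED" | base64 -d
```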



<p>Create a secret with it inside a <strong>secret.yaml</strong> file:</p>



<pre class="wp-block-code"><code class="">apiVersion: v1<br>kind: Secret<br>metadata:<br>  name: ovhcloud-vault-token<br>  namespace: external-secrets<br>data:<br>  token: ZXlKaG...wVkFn</code></pre>
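<p>As a side note, Kubernetes also accepts the raw, non-encoded value through the <code>stringData</code> field, which skips the manual base64 step; an equivalent sketch:</p>

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ovhcloud-vault-token
  namespace: external-secrets
stringData:
  token: <token>  # raw PAT; Kubernetes stores it base64-encoded
```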



<p>Apply the resource in your cluster:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f secret.yaml</code></pre>



<p>Check that the secret has been created:</p>



<pre class="wp-block-code"><code class="">$ kubectl get secret ovhcloud-vault-token -n external-secrets<br>NAME                   TYPE     DATA   AGE<br>ovhcloud-vault-token   Opaque   1      5m</code></pre>



<h3 class="wp-block-heading">Deploy a ClusterSecretStore to connect ESO to Secret Manager</h3>



<p>Set up a ClusterSecretStore to manage synchronisation with Secret Manager.<br>It will use the HashiCorp Vault provider with token auth, and the OKMS endpoint as the backend.</p>



<p>Create a <strong>clustersecretstore.yaml</strong> file with the content below:</p>



<pre class="wp-block-code"><code class="">apiVersion: external-secrets.io/v1<br>kind: ClusterSecretStore<br>metadata:<br>  name: vault-secret-store<br>spec:<br>  provider:<br>      vault:<br>        server: "https://eu-west-par.okms.ovh.net/api/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" # OKMS endpoint, fill with the correct region and your okms_id<br>        path: "secret"<br>        version: "v2"<br>        auth:<br>            tokenSecretRef:<br>              name: ovhcloud-vault-token # The k8s secret that contains your PAT<br>              key: token</code></pre>



<p>Keep in mind, in our example, we’ve selected the “eu-west-par” region. You can enter a different server URL, depending on your desired region.</p>



<p>Apply it:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f clustersecretstore.yaml</code></pre>



<p>Check:</p>



<pre class="wp-block-code"><code class="">$ kubectl get clustersecretstore.external-secrets.io/vault-secret-store<br>NAME                 AGE   STATUS   CAPABILITIES   READY<br>vault-secret-store   2m   Valid    ReadWrite      True</code></pre>



<h3 class="wp-block-heading">Create an ExternalSecret</h3>



<p>Create an <strong>externalsecret.yaml</strong> file with the content below:</p>



<pre class="wp-block-code"><code class="">apiVersion: external-secrets.io/v1<br>kind: ExternalSecret<br>metadata:<br>  name: docker-config-secret<br>  namespace: external-secrets<br>spec:<br>  refreshInterval: 30m<br>  secretStoreRef:<br>    name: vault-secret-store<br>    kind: ClusterSecretStore<br>  target:<br>    template:<br>      type: kubernetes.io/dockerconfigjson<br>      data:<br>        .dockerconfigjson: "{{ .mysecret | toString }}"<br>    name: ovhregistrycred<br>    creationPolicy: Owner<br>  data:<br>  - secretKey: mysecret<br>    remoteRef:<br>      key: prod/va1/dockerconfigjson</code></pre>



<p>Apply it:</p>



<pre class="wp-block-code"><code class="">$ kubectl apply -f externalsecret.yaml<br>externalsecret.external-secrets.io/docker-config-secret created</code></pre>



<p>Check:</p>



<pre class="wp-block-code"><code class="">$ kubectl get externalsecret.external-secrets.io/docker-config-secret -n external-secrets<br>NAME                   STORETYPE            STORE                REFRESH INTERVAL   STATUS         READY<br>docker-config-secret   ClusterSecretStore   vault-secret-store   30m0s              SecretSynced   True</code></pre>



<p>Once the ExternalSecret has synced, ESO creates the target Kubernetes Secret object:</p>



<pre class="wp-block-code"><code class="">$ kubectl get secret -n external-secrets<br>NAME                                     TYPE                             DATA   AGE<br>...<br>ovhregistrycred                          kubernetes.io/dockerconfigjson   1      17d<br>...</code></pre>



<p>As you can see, the Secret is ready, and you can now use it as an imagePullSecret in your Pods!</p>
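<p>For instance, here is a minimal Pod sketch consuming the synced Secret (the image name is hypothetical):</p>

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  namespace: external-secrets
spec:
  imagePullSecrets:
    - name: ovhregistrycred  # the Secret created by ESO above
  containers:
    - name: app
      image: registry.example.com/app:latest  # hypothetical image hosted on your MPR
```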



<h3 class="wp-block-heading">Conclusion</h3>



<p>In this blog, we’ve explained how to create secrets in the new OVHcloud Secret Manager and integrate them directly in your Kubernetes clusters using the ESO Vault provider.</p>



<p>And here’s some great news: our teams are working on a dedicated OVHcloud provider for External Secrets Operator, set to go live in the coming months 🎉.</p>



<p>Stay tuned and share your thoughts!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmanage-your-secrets-through-ovhcloud-secret-manager-thanks-to-external-secrets-operator-eso-on-ovhcloud-managed-kubernetes-service-mks%2F&amp;action_name=Manage%20your%20secrets%20using%20OVHcloud%20Secret%20Manager%20with%20External%20Secrets%20Operator%20%28ESO%29%20on%20OVHcloud%20Managed%20Kubernetes%20Service%20%28MKS%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>OVHcloud backbone network: Environmental impact assessment methodology</title>
		<link>https://blog.ovhcloud.com/ovhcloud-backbone-network-environmental-impact-assessment-methodology/</link>
		
		<dc:creator><![CDATA[Gregory Lebourg]]></dc:creator>
		<pubDate>Fri, 10 Oct 2025 08:07:42 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Datacenters & network]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Sustainability]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29671</guid>

					<description><![CDATA[Introduction The underlying infrastructure of OVHcloud’s Cloud services consists of datacentres connected by a global telecommunication network which carries data to and from end users. The core network (backbone) features nodes (also known as Points of Presence &#8211; PoPs) and long-distance/metropolitan spans (also known as links) which connect the nodes. The PoPs are located in [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fovhcloud-backbone-network-environmental-impact-assessment-methodology%2F&amp;action_name=OVHcloud%20backbone%20network%3A%20Environmental%20impact%20assessment%20methodology&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="684" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_10_our-backbone_usa-and-canada-1024x684.webp" alt="" class="wp-image-29677" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_10_our-backbone_usa-and-canada-1024x684.webp 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_10_our-backbone_usa-and-canada-300x200.webp 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_10_our-backbone_usa-and-canada-768x513.webp 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_10_our-backbone_usa-and-canada.webp 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="683" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Europe_OK-1024x683.webp" alt="" class="wp-image-29679" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Europe_OK-1024x683.webp 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Europe_OK-300x200.webp 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Europe_OK-768x512.webp 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Europe_OK.webp 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="684" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_11_our-backbone_apac-1024x684.webp" alt="" class="wp-image-29678" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_11_our-backbone_apac-1024x684.webp 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_11_our-backbone_apac-300x200.webp 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_11_our-backbone_apac-768x513.webp 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6_11_our-backbone_apac.webp 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</div>
</div>



<h2 class="wp-block-heading"><strong>Introduction</strong></h2>



<p>The underlying infrastructure of OVHcloud’s Cloud services consists of datacentres connected by a global telecommunication network which carries data to and from end users.</p>



<p>The core network (<strong>backbone</strong>) features nodes (also known as Points of Presence &#8211; <strong>PoPs</strong>) and long-distance/metropolitan spans (also known as <strong>links</strong>) which connect the nodes.</p>



<p>The PoPs are located in colocation facilities hosting optical transmission systems (Dense Wavelength Division Multiplexers &#8211; DWDM), IP routers, switches and servers.</p>



<p>The links are based on optical fibre routes interconnecting the PoPs and the datacentres, following a topology designed to maintain traffic in the event of multiple span cuts (for resiliency purposes). Two operating models are used:</p>



<ul class="wp-block-list">
<li><strong>Operating model 1:</strong> OVHcloud owns the <strong>dark fibre cable</strong> or rents <strong>a pair of dark fibre cables</strong> (through long-term Indefeasible Right of Use – IRU) and operates its own DWDM systems to activate <strong>wavelengths </strong>(10, 100, 400 Gbps point-to-point transmission signals) on top of it.</li>



<li><strong>Operating model 2:</strong> OVHcloud leases the wavelengths activated by long-distance telecommunication operators on their international network (carrier’s carrier market).</li>
</ul>



<p>In this study, <strong>terrestrial</strong> and <strong>submarine </strong>transmission infrastructures are differentiated as their physical realities are dissimilar.</p>



<h2 class="wp-block-heading"><strong>Scope of the study</strong></h2>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="750" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-11-1024x750.png" alt="" class="wp-image-29704" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-11-1024x750.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-11-300x220.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-11-768x563.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-11.png 1302w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Items covered by the study:</p>



<ul class="wp-block-list">
<li>POPs (colocation buildings and their technical environment)</li>



<li>Fibre cables and their underlying infrastructure</li>



<li>All active telecommunication equipment hosted in the PoPs as well as the line equipment hosted in amplification sites.</li>
</ul>



<p>Items excluded from the model:</p>



<ul class="wp-block-list">
<li>OVHcloud datacentres’ network equipment (accounted for in OVHcloud datacentres’ impact inventory)</li>



<li>ISP (Internet Service Provider) networks as well as customer premises equipment.</li>
</ul>






<h2 class="wp-block-heading"><strong>Environmental impact of the PoPs</strong></h2>



<ol class="wp-block-list">
<li><strong><em>Electricity consumption (use impact)</em></strong></li>
</ol>



<p>OVHcloud measures the electrical consumption of the PoP’s equipment, and of the technical environment of the colocation facility in which it is hosted. All the ancillary systems are taken into account; therefore, the PUE (Power Usage Effectiveness) of the sites is de facto included in the model.</p>



<p>The impact factors (per kWh of electricity) are retrieved from the <a href="https://ecoinvent.org/database/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>Ecoinvent</strong> </a>database.</p>



<ul class="wp-block-list">
<li><strong><em>Network equipment (manufacturing / distribution / end of life impacts)</em></strong></li>
</ul>



<p>OVHcloud reviews all the equipment deployed inside each PoP, including checking their commissioning date (for amortisation purposes).</p>



<p>The equipment reference is used to retrieve the impact factors from the <strong>Negaoctet and Resilio v2024.6</strong> databases (amortised over six years). Should the exact model not be found, a similar generic reference is chosen.</p>



<ul class="wp-block-list">
<li><strong><em>Facilities technical environment (manufacturing / distribution / use/end of life impacts)</em></strong></li>
</ul>



<p>Based on the electricity consumption of the technical environment of the colocation facility, the impact factors of each PoP (per contracted kW per year per kWh of electricity) are retrieved from the <strong><a href="https://librairie.ademe.fr/industrie-et-production-durable/7111-evaluation-of-the-environmental-footprint-of-internet-service-provisioning-in-france.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ademe PCR datacentre and cloud</a> </strong>database.</p>



<h2 class="wp-block-heading"><strong>Environmental impact of the terrestrial links</strong></h2>



<ol class="wp-block-list">
<li><strong><em>Optical fibre cable (manufacturing / distribution / end of life) impacts</em></strong></li>
</ol>



<p><strong>Operating model 1:</strong> OVHcloud manages the transmission layer.</p>



<p>The optical fibre cable related impacts are calculated by allocating two strands out of a 288-strand cable. The allocation is then corrected to reflect the 25-year ramp-up before reaching 100% usage of the cable over a lifespan of 60 years. This leads to an allocation of 0.88%.</p>
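<p>As a back-of-envelope check of that figure (our reading of the stated assumptions: a linear ramp-up to full use over the first 25 years, then 100% use for the remaining 35 years of the 60-year lifespan):</p>

```shell
awk 'BEGIN {
  raw     = 2 / 288;           # two strands allocated out of a 288-strand cable
  avg_use = (25/2 + 35) / 60;  # average cable utilisation over the 60-year lifespan
  printf "%.2f%%\n", 100 * raw / avg_use
}'
```

<p>This prints 0.88%, matching the allocation used in the model.</p>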



<p>The impact factors (per km) are retrieved from the <strong>Ecoinvent </strong>database.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="614" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-12-1024x614.png" alt="" class="wp-image-29705" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-12-1024x614.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-12-300x180.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-12-768x461.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-12.png 1280w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>Operating model 2:</strong> OVHcloud leases a wavelength to a carrier’s carrier.</p>



<p>The assumption is that the carrier follows the same optical fibre cable allocation rules (0.88%). In addition, the impact is pro-rated to reflect the ramp-up of the carrier’s optical systems (leading to an allocation of 4.32% of the 0.88%):</p>



<ul class="wp-block-list">
<li>Maximum number of channels per DWDM systems: 48</li>



<li>Maximum load rate of DWDM systems: 85%</li>



<li>Six-year ramp up to reach maximum load rate of the DWDM system</li>



<li>DWDM system lifespan: 8 years</li>
</ul>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="562" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-13-1024x562.png" alt="" class="wp-image-29706" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-13-1024x562.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-13-300x165.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-13-768x421.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-13.png 1425w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>In both operating models, 10% of the impacts of the civil works (trenches, ducts) necessary to lay the optical fibre cables is allocated.</p>



<ul class="wp-block-list">
<li><strong><em>Line optical systems (manufacturing / distribution / use/end of life impacts)</em></strong></li>
</ul>



<p><strong>Operating model 1: </strong>OVHcloud manages the transmission layer, therefore amplifier types and locations are known. (De)Multiplexers are excluded as they are already accounted for in the environmental impact of the PoPs (see previous section).</p>



<p><strong>Operating model 2:</strong> OVHcloud leases a wavelength to a carrier’s carrier. The assumptions are as follows:</p>



<ul class="wp-block-list">
<li>(De)Multiplexers and repeaters are chosen using standard equipment</li>



<li>Typical distance between two regenerator sites: 500 km</li>



<li>Typical distance between two repeater sites: 90 km</li>



<li>Maximum number of channels per DWDM systems: 48</li>



<li>Maximum load rate of DWDM systems: 85%</li>



<li>Six-year ramp up to reach maximum load rate of the DWDM system</li>



<li>DWDM system lifespan: 8 years</li>
</ul>



<p>Once the optical equipment mapping has been done, the emissions factors for each link are then retrieved from the <strong><a href="https://base-empreinte.ademe.fr" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ademe ISP &#8211; Negaoctet</a> </strong>database.</p>



<h2 class="wp-block-heading"><strong>Environmental impact of the submarine links</strong></h2>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="570" height="337" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/1.png" alt="" class="wp-image-29684" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/1.png 570w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/1-300x177.png 300w" sizes="auto, (max-width: 570px) 100vw, 570px" /><figcaption class="wp-element-caption">Source: <a href="https://www.congress.gov/crs-product/R47237" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.congress.gov/crs-product/R47237</a></figcaption></figure>



<p><strong><em>Submarine cable system (manufacturing / distribution / maintenance / end of life impacts) and electricity (use impact)</em></strong></p>



<p><strong>Operating model 2:</strong> OVHcloud leases a wavelength to a carrier’s carrier from PoP to PoP.</p>



<p>The climate change impact is retrieved from the <strong>2025 Lisbon Suboptic</strong> study for both transatlantic and transpacific cable systems (per km and per year).</p>



<p>For the other impact factors (abiotic and water resources use), the submarine cable systems are modelled based on the following assumptions:</p>



<ul class="wp-block-list">
<li>(De)Multiplexers and repeaters are chosen using standard terrestrial equipment</li>



<li>Cable is considered as a standard MV electrical cable</li>



<li>Typical distance between two landing stations: 7000 km</li>



<li>Typical distance between two repeaters: 80 km</li>



<li>Maximum capacity: 200 Tbps for Transpacific / 350 Tbps for Transatlantic</li>



<li>Maximum load rate of the DWDM systems: 100%</li>



<li>Ten-year ramp up to reach maximum load rate of the DWDM system</li>



<li>Submarine cable system life span: 25 years</li>
</ul>



<p><strong>Results (per year)</strong></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td class="has-text-align-left" data-align="left"><strong>Impact ADPe</strong></td><td>58 kg Sb eq.</td></tr><tr><td class="has-text-align-left" data-align="left"><strong>Impact GWP</strong></td><td>3700 tons CO2 eq.</td></tr><tr><td class="has-text-align-left" data-align="left"><strong>Impact WU</strong></td><td>2.0E+06 m3 eq.</td></tr></tbody></table></figure>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td></td><td class="has-text-align-center" data-align="center"><strong>Resource use, minerals and metals</strong></td><td class="has-text-align-center" data-align="center"><strong>Climate change</strong></td><td class="has-text-align-center" data-align="center"><strong>Water use</strong></td></tr><tr><td></td><td class="has-text-align-center" data-align="center"><strong>ADPe (kg Sb eq.)</strong></td><td class="has-text-align-center" data-align="center"><strong>GWP (kg CO2 eq.)</strong></td><td class="has-text-align-center" data-align="center"><strong>WU (m3 eq.)</strong></td></tr><tr><td>POPs</td><td class="has-text-align-center" data-align="center">3.75E+01</td><td class="has-text-align-center" data-align="center">2.42E+06</td><td class="has-text-align-center" data-align="center">1.25E+06</td></tr><tr><td>Owned Fibers</td><td class="has-text-align-center" data-align="center">1.02E+01</td><td class="has-text-align-center" data-align="center">8.10E+05</td><td class="has-text-align-center" data-align="center">5.87E+05</td></tr><tr><td>Leased Capacity</td><td class="has-text-align-center" data-align="center">1.05E+01</td><td class="has-text-align-center" data-align="center">4.73E+05</td><td class="has-text-align-center" data-align="center">1.67E+05</td></tr><tr><td></td><td class="has-text-align-center" data-align="center"><strong>5.81E+01</strong></td><td class="has-text-align-center" data-align="center"><strong>3.70E+06</strong></td><td class="has-text-align-center" data-align="center"><strong>2.00E+06</strong></td></tr></tbody></table></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="581" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-8-1024x581.png" alt="" class="wp-image-29689" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-8-1024x581.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-8-768x436.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-8.png 1114w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td></td><td class="has-text-align-center" data-align="center"><strong>Resource use, minerals and metals</strong></td><td class="has-text-align-center" data-align="center"><strong>Climate change</strong></td><td class="has-text-align-center" data-align="center"><strong>Water use</strong></td></tr><tr><td></td><td class="has-text-align-center" data-align="center"><strong>ADPe (kg Sb eq.)</strong></td><td class="has-text-align-center" data-align="center"><strong>GWP (kg CO2 eq.)</strong></td><td class="has-text-align-center" data-align="center"><strong>WU (m3 eq.)</strong></td></tr><tr><td>Manufacturing / Distribution</td><td class="has-text-align-center" data-align="center">2.35E+01</td><td class="has-text-align-center" data-align="center">1.39E+06</td><td class="has-text-align-center" data-align="center">2.51E+05</td></tr><tr><td>Use</td><td class="has-text-align-center" data-align="center">3.46E+01</td><td class="has-text-align-center" data-align="center">2.29E+06</td><td class="has-text-align-center" data-align="center">1.74E+06</td></tr><tr><td>End of Life</td><td class="has-text-align-center" data-align="center">2.77E-03</td><td class="has-text-align-center" data-align="center">2.31E+04</td><td class="has-text-align-center" data-align="center">1.07E+04</td></tr><tr><td></td><td class="has-text-align-center" data-align="center"><strong>5.81E+01</strong></td><td class="has-text-align-center" data-align="center"><strong>3.70E+06</strong></td><td class="has-text-align-center" data-align="center"><strong>2.00E+06</strong></td></tr></tbody></table></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="634" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-9-1024x634.png" alt="" class="wp-image-29690" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-9-1024x634.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-9-300x186.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-9-768x476.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/image-9.png 1114w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Conclusion</h2>



<p>The methodology presented above allows us to accurately assess the environmental impact of our backbone with a multi-factorial approach covering GHG emissions, water consumption and abiotic resource use. On the carbon emissions side, the findings show that the previous methodology used in our GHG emissions reporting (based on a <a href="https://hal.science/hal-04807445v1/file/presentation_paper161_slides_rev2389_20220505_125946.pdf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Renater research paper</a>) overestimated the OVHcloud backbone&#8217;s impact by a factor of 2.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fovhcloud-backbone-network-environmental-impact-assessment-methodology%2F&amp;action_name=OVHcloud%20backbone%20network%3A%20Environmental%20impact%20assessment%20methodology&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Create a podcast transcript with Whisper by AI Endpoints</title>
		<link>https://blog.ovhcloud.com/create-a-podcast-transcript-with-whisper-by-ai-endpoints/</link>
		
		<dc:creator><![CDATA[Stéphane Philippart]]></dc:creator>
		<pubDate>Thu, 28 Aug 2025 07:03:04 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[Audio]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29389</guid>

					<description><![CDATA[Check out this blog post if you want to know more about AI Endpoints.You can also find more info on AI Endpoints in our previous blog posts. This blog post explains how to create a podcast transcript using Whisper, a powerful automatic speech recognition (ASR) system developed by OpenAI. Whisper integrates with AI Endpoints and [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fcreate-a-podcast-transcript-with-whisper-by-ai-endpoints%2F&amp;action_name=Create%20a%20podcast%20transcript%20with%20Whisper%20by%20AI%20Endpoints&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02.png" alt="A robot listening to a podcast" class="wp-image-29401" style="width:640px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-70x70.png 70w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>Check out this <a href="https://blog.ovhcloud.com/enhance-your-applications-with-ai-endpoints/" data-wpel-link="internal">blog post</a> if you want to know more about AI Endpoints.<br>You can also find more info on <a href="https://endpoints.ai.cloud.ovh.net" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a> in our <a href="https://blog.ovhcloud.com/tag/ai-endpoints/" data-wpel-link="internal">previous blog posts</a>.</p>



<p>This blog post explains how to create a podcast transcript using Whisper, a powerful automatic speech recognition (ASR) system developed by OpenAI. Whisper integrates with <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a> and makes it easy to transcribe audio files and add features, like speaker diarization.</p>



<p><em>ℹ️ You can find the full code on <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints/podcast-transcript-whisper/python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub</a> ℹ️</em></p>



<h3 class="wp-block-heading">Environment Setup</h3>



<p>Define your environment variables for accessing <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>:</p>



<pre title="AI Endpoints environment variables" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">$ export OVH_AI_ENDPOINTS_WHISPER_URL=&lt;whisper model URL&gt;
$ export OVH_AI_ENDPOINTS_ACCESS_TOKEN=&lt;your_access_token&gt;
$ export OVH_AI_ENDPOINTS_WHISPER_MODEL=whisper-large-v3</code></pre>



<p>Install dependencies:</p>



<pre title="Dependencies installation" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">$ pip install -r requirements.txt</code></pre>
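<p>Before running the script, you can optionally fail fast if any of the three variables defined above is missing. This small helper is our own addition, not part of the original example:</p>

<pre title="Environment variables sanity check" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os

# 🔎 Variables the transcription script expects
REQUIRED_VARS = [
    "OVH_AI_ENDPOINTS_WHISPER_URL",
    "OVH_AI_ENDPOINTS_ACCESS_TOKEN",
    "OVH_AI_ENDPOINTS_WHISPER_MODEL",
]

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]

# ⚠️ Warn early with an explicit message instead of failing mid-transcription
missing = missing_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")</code></pre>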



<h3 class="wp-block-heading">Audio transcription</h3>



<p>With Whisper and the OpenAI client, transcribing audio is as simple as writing a few lines of code:</p>



<pre title="Audio transcription" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
import json
from openai import OpenAI

# 🛠️ OpenAI client initialisation
client = OpenAI(base_url=os.environ.get('OVH_AI_ENDPOINTS_WHISPER_URL'), 
                api_key=os.environ.get('OVH_AI_ENDPOINTS_ACCESS_TOKEN'))

# 🎼 Audio file loading
with open("../resources/TdT20-trimed-2.mp3", "rb") as audio_file:
    # 📝 Call Whisper transcription API
    transcript = client.audio.transcriptions.create(
        model=os.environ.get('OVH_AI_ENDPOINTS_WHISPER_MODEL'),
        file=audio_file,
        temperature=0.0,
        response_format="verbose_json",
        extra_body={"diarize": True},
    )</code></pre>



<p>A couple of notes:<br>&#8211; <em>diarize</em> is not a standard Whisper parameter; we can pass it because the OpenAI client lets us add extra body parameters via <em>extra_body</em>.<br>&#8211; diarization requires <em>response_format="verbose_json"</em>, which also implies <em>segmentation</em> mode.</p>



<p>Once you have the full transcript, format it in a way that’s easy for humans to read.</p>



<h3 class="wp-block-heading">Create the script</h3>



<p>The JSON field ‘<em>diarization</em>’ contains all of the transcribed, diarized content.</p>



<pre title="JSON response for diarization" class="wp-block-code"><code lang="json" class="language-json line-numbers">"diarization": [
    {
      "speaker": 0,
      "text": "bla bla bla",
      "start": 16.5,
      "end": 26.38
    },
    {
      "speaker": 1,
      "text": "bla bla",
      "start": 26.38,
      "end": 32.6
    },
    {
      "speaker": 1,
      "text": "bla bla",
      "start": 32.6,
      "end": 40.6
    },
    {
      "speaker": 2,
      "text": "bla bla",
      "start": 40.6,
      "end": 42
    }
]</code></pre>



<p>Because the output is segmented, consecutive entries from the same speaker (like speaker 1 above) can be merged, as the code below shows.</p>



<p>Here’s some sample code for creating the script of a <a href="https://smartlink.ausha.co/tranches-de-tech" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">French podcast</a> featuring three speakers:</p>



<pre title="Merge sentences for same speaker" class="wp-block-code"><code lang="python" class="language-python line-numbers"># 🔀 Merge the dialog said by the same speaker     
diarizedTranscript = ''
speakers = ["Aurélie", "Guillaume", "Stéphane"]
previousSpeaker = -1
jsonTranscript = json.loads(transcript.model_dump_json())

# 💬 Only the diarization field is useful
for dialog in jsonTranscript["diarization"]:
    speaker = dialog.get("speaker")
    text = dialog.get("text")
    if previousSpeaker == speaker:
        diarizedTranscript += f" {text}"
    else:
        diarizedTranscript += f"\n\n{speakers[speaker]}: {text}"
    previousSpeaker = speaker

print(f"\n📝 Diarized Transcript 📝:\n{diarizedTranscript}")
</code></pre>
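<p>As a variation, the merge loop above can be wrapped in a reusable function that falls back to a generic label when a speaker index has no matching name (the function name and fallback label are our own, not part of the original code):</p>

<pre title="Merge helper with unknown-speaker fallback" class="wp-block-code"><code lang="python" class="language-python line-numbers">def merge_diarization(segments, speakers):
    """Merge consecutive segments from the same speaker into one block per speaker turn."""
    blocks = []
    previous = None
    for segment in segments:
        idx = segment["speaker"]
        # 🏷️ Fall back to a generic label for speaker indexes outside the known names
        name = speakers[idx] if idx in range(len(speakers)) else f"Speaker {idx}"
        text = segment["text"]
        if idx == previous:
            # Same speaker keeps talking: append to the current block
            blocks[-1] += f" {text}"
        else:
            blocks.append(f"{name}: {text}")
        previous = idx
    return "\n\n".join(blocks)</code></pre>

<p>With the sample JSON above, this produces three blocks, merging the two speaker&nbsp;1 entries into a single one.</p>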



<p>Lastly, run the Python script:</p>



<pre class="wp-block-code"><code lang="" class=" line-numbers">$ python PodcastTranscriptWithWhisper.py

📝 Diarized Transcript 📝:

Stéphane: Bonjour tout le monde, ravi de vous retrouver pour l'enregistrement de ce dernier épisode de la saison avant de prendre des vacances bien méritées et de vous retrouver à la rentrée pour la troisième saison. Nous enregistrons cet épisode le 30 juin à la fraîche, enfin si on peut dire au vu des températures déjà présentes en cette matinée. Justement, elle revient chaudement de Sunnytech et c'est avec plaisir que je la retrouve pour l'enregistrement de cet épisode. Bonjour Aurélie, comment vas-tu ?

Aurélie: Salut, alors ça va très bien. Alors j'avoue, j'ai également très chaud. J'ai le ventilateur qui est juste à côté de moi donc ça va aller pour l'enregistrement du podcast.

Stéphane: Oui, c'est vrai qu'il fait un peu chaud. Et pour ce dernier épisode de la saison, c'est avec un mélange de joie mais aussi d'intimidation que je reçois notre invité. Si je fais ce métier de la façon dont je le fais, c'est grandement grâce à lui. Ce podcast, quelque part, a bien entendu des inspirations de ce que fait notre invité. Je suis donc très content de te recevoir Guillaume. Bonjour Guillaume, comment vas-tu et souhaites-tu te présenter à nos auditrices et auditeurs ? Bonjour à

Guillaume: tous et bien merci déjà de m'avoir invité. Je suis très content de rejoindre votre podcast pour cet épisode. Je m'appelle Guillaume Laforge, je suis un développeur Java depuis la première heure depuis très très longtemps. Je travaille chez Google, en particulier dans la partie Google Cloud. Je me focalise beaucoup sur tout ce qui est Generative AI vu que c'est à la mode évidemment. Les gens me connaissent peut-être ou peut-être ma voix d'ailleurs parce que je fais partie du podcast Les Cascodeurs qu'on a commencé il y a 15 ans ou quelque chose comme ça. Il y a trop longtemps. Ou alors ils me connaissent parce que je suis un des co-fondateurs du langage Groovy, Apache Groovy.</code></pre>



<p>Feel free to try out our new product, <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>, and share your thoughts.</p>



<p>Hang out with us on Discord in the <em>#ai-endpoints</em> channel: <em><a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://discord.gg/ovhcloud</a></em>. See you soon!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fcreate-a-podcast-transcript-with-whisper-by-ai-endpoints%2F&amp;action_name=Create%20a%20podcast%20transcript%20with%20Whisper%20by%20AI%20Endpoints&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
