<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OVHcloud Engineering Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/category/engineering/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/category/engineering/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Wed, 29 Apr 2026 07:00:34 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>OVHcloud Engineering Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/category/engineering/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>KubeCon + CloudNativeCon Europe 2026 in Amsterdam: feedback and highlights</title>
		<link>https://blog.ovhcloud.com/kubecon-cloudnativecon-europe-2026-in-amsterdam-feedback-and-highlights/</link>
		
		<dc:creator><![CDATA[Aurélie Vache and Rémy Vandepoel]]></dc:creator>
		<pubDate>Wed, 29 Apr 2026 07:00:31 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[Kubecon]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[OVHcloud Events]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=31275</guid>

					<description><![CDATA[From March 23 to 26, 2026, the KubeCon + CloudNativeCon Europe took place in Amsterdam. Aurélie Vache and Rémy Vandepoel attended alongside 26 other OVHcloud employees. In this blog, they share their thoughts about this second KubeCon set in the land of tulips. KubeCon Europe 2026: the maturity milestone Back from Amsterdam, the buzz of [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>From March 23 to 26, 2026, the <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">KubeCon + CloudNativeCon Europe</a> took place in Amsterdam.</p>



<p>Aurélie Vache and Rémy Vandepoel attended alongside 26 other OVHcloud employees. In this blog, they share their thoughts about this second KubeCon set in the land of tulips.</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQP8AIX0AAEr98-1-1024x768.jpg" alt="" class="wp-image-31279" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQP8AIX0AAEr98-1-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQP8AIX0AAEr98-1-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQP8AIX0AAEr98-1-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQP8AIX0AAEr98-1-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQP8AIX0AAEr98-1-2048x1536.jpg 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-full"><img decoding="async" width="799" height="533" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/55176825056_8ec98f339b_c.jpg" alt="" class="wp-image-31280" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/55176825056_8ec98f339b_c.jpg 799w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/55176825056_8ec98f339b_c-300x200.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/55176825056_8ec98f339b_c-768x512.jpg 768w" sizes="(max-width: 799px) 100vw, 799px" /></figure>
</div>
</div>



<h3 class="wp-block-heading" id="REXKubeCon2026Amsterdam-Context">KubeCon Europe 2026: the maturity milestone</h3>



<p>Back from Amsterdam, the buzz of the RAI halls still echoes in our ears. This 2026 edition of KubeCon + CloudNativeCon Europe wasn’t just another Kubernetes conference: it marked a turning point, the event’s maturity milestone. The numbers alone make that clear: 13,500 attendees, the largest attendance ever recorded!</p>



<p>While previous years were about exploration and expansion, 2026 was the year of massive industrialization, with one non-negotiable prerequisite: digital sovereignty.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img decoding="async" width="799" height="533" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/55169871701_c147fd0dda_c.jpg" alt="" class="wp-image-31282" style="aspect-ratio:1.4990505586153107;width:678px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/55169871701_c147fd0dda_c.jpg 799w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/55169871701_c147fd0dda_c-300x200.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/55169871701_c147fd0dda_c-768x512.jpg 768w" sizes="(max-width: 799px) 100vw, 799px" /></figure>



<p>Key figures from the 2026 edition:</p>



<ul class="wp-block-list">
<li>13,500+ attendees (46% first-time attendees)</li>



<li>100 countries represented</li>



<li>3,474 unique organizations/companies</li>



<li>891 sessions</li>



<li>230 projects in the CNCF landscape with 19.9 million contributors</li>
</ul>



<p><strong>CNCF Contributors by Geography (Last 12 Months)</strong></p>



<ul class="wp-block-list">
<li>Europe: <strong>38.8%</strong> of contributions (ahead of the United States)</li>



<li>United States: 36.29%</li>



<li>Germany: 9.82% (leading in Europe)</li>



<li>France: 4.68%</li>



<li>Switzerland: 2.49%</li>



<li>Strong signals for digital sovereignty, a key theme of this year’s keynotes 💪</li>
</ul>



<h3 class="wp-block-heading">Colocated events</h3>



<p>KubeCon + CloudNativeCon Europe 2026 traditionally kicks off with a full day dedicated to co-located events. This year was no exception, with an impressive lineup of 16 events, including well-known favorites such as ArgoCon, BackstageCon, CiliumCon, Platform Engineering Day, Kubernetes on Edge Day, and Observability Day.</p>



<p>Among the newcomer events, <strong>Open Sovereign Cloud Day</strong> stood out, highlighting the growing importance of cloud sovereignty in Europe.</p>



<p>During CiliumCon, we were proud to see the spotlight on our MKS Standard offer 🚀.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260323-WA00291-1024x768.jpg" alt="" class="wp-image-31283" style="width:566px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260323-WA00291-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260323-WA00291-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260323-WA00291-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260323-WA00291-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260323-WA00291.jpg 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">OVHcloud Presence</h3>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="585" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/signal-2026-03-24-10-23-27-765-1024x585.jpg" alt="" class="wp-image-31276" style="aspect-ratio:1.7504278491247434;width:618px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/signal-2026-03-24-10-23-27-765-1024x585.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/signal-2026-03-24-10-23-27-765-300x171.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/signal-2026-03-24-10-23-27-765-768x439.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/signal-2026-03-24-10-23-27-765-1536x877.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/signal-2026-03-24-10-23-27-765.jpg 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>OVHcloud had a strong presence at the event, with two different booths serving two different purposes.</p>



<p>One was located in the <em>Activation Zone</em>, designed as an interactive space to engage with attendees through a video game, &#8220;Gaming Camp: Beat Cloud Villains!&#8221;, described as: <em>&#8220;Join the fight against the villains of the cloud. Take on Hidden Cost, Jailor Stack, and Autonomous Zero, and prove yourself as a true Guardian of the Cloud.&#8221;</em></p>



<p>Players were invited to step into a two-player fighting game inspired by <em>Street Fighter</em>, where strategy and skill are your best weapons. Winners received exclusive t-shirts.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_125635211.MP2_-1024x768.jpg" alt="" class="wp-image-31285" style="width:520px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_125635211.MP2_-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_125635211.MP2_-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_125635211.MP2_-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_125635211.MP2_-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_125635211.MP2_-2048x1536.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The second booth had a more corporate focus, highlighting OVHcloud’s broader portfolio, strategic positioning, and enterprise offerings. It provided a space for deeper conversations around demos, use cases, and cloud strategies.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_134841194.MP2_-1024x768.jpg" alt="" class="wp-image-31286" style="width:599px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_134841194.MP2_-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_134841194.MP2_-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_134841194.MP2_-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_134841194.MP2_-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_134841194.MP2_-2048x1536.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The opportunity was too good to pass up, so we took the chance to interview key players in the ecosystem, as well as customers of our solutions.</p>



<p>We conducted five interviews and had many discussions, and we can’t wait to share them with you soon!</p>



<p>Here’s a sneak peek featuring <strong>Sudeep Goswami</strong>, CEO of <strong>Traefik Labs</strong>:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="683" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/KubeConOVH_127-1024x683.jpg" alt="" class="wp-image-31287" style="aspect-ratio:1.4992503748125936;width:450px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/KubeConOVH_127-1024x683.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/KubeConOVH_127-300x200.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/KubeConOVH_127-768x512.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/KubeConOVH_127-1536x1024.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/KubeConOVH_127-2048x1365.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>These interviews will soon be available on YouTube, so stay tuned!</p>



<h3 class="wp-block-heading">Aurélie Vache&#8217;s talk</h3>



<p>Getting accepted to KubeCon is not easy, and Aurélie, our Developer Advocate and CNCF Ambassador, rose to the challenge by once again presenting a new talk.</p>



<p><em>“The Ultimate Kubernetes Challenge: An Interactive Trivia Game”:</em></p>



<p>&#8220;<em>Kubernetes has become the de facto standard for deploying and operating containerized applications. We use it, as well as its ecosystem, on a daily basis, but do we know them as well as we think we do?</em></p>



<p><em>With a mix of quiz and live demos, come learn and/or improve your knowledge. You will discover (or rediscover) the key concepts of Kubernetes (pods, secrets, services…), internal components but also best practices.</em></p>



<p><em>In this fun and dynamic talk, come compete throughout the quiz and explore the wonderful world of Kubernetes.</em></p>



<p><em>Icing on the cake: the first will win some swags.</em>&#8220;</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA0051-1024x768.jpg" alt="" class="wp-image-31292" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA0051-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA0051-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA0051-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA0051-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA0051.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA00521-1024x768.jpg" alt="" class="wp-image-31293" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA00521-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA00521-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA00521-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA00521-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/IMG-20260325-WA00521.jpg 1600w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
</div>
</div>



<p>During this talk, attendees tested their Kubernetes knowledge through an interactive quiz, with results presented via illustrated slides and live, hands-on demos.</p>



<p>Giving a talk at 5 p.m., during the final session of the second day, was an ambitious way to finish up. But thanks to the interactive format of her talk, attendees were able to enjoy testing their knowledge while discovering tips about Kubernetes and its concepts and features.</p>



<p>Three OVHcloud MKS clusters were created especially for the occasion, one with 3 nodes, one with zero nodes, and one with 3 nodes across 3 Availability Zones:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-2026-4-15_8-20-59-1024x580.png" alt="" class="wp-image-31294" style="aspect-ratio:1.765536773898217;width:486px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-2026-4-15_8-20-59-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-2026-4-15_8-20-59-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-2026-4-15_8-20-59-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-2026-4-15_8-20-59-1536x869.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-2026-4-15_8-20-59.png 1862w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Watch the talk here:</p>



<figure class="wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<span class="videowrapper embed-youtube-nocookie aspect_ratio_563"><iframe loading="lazy" title="The Ultimate Kubernetes Challenge: An Interactive Trivia Game - Aurélie Vache, OVHcloud" width="1200" height="675" src="https://www.youtube-nocookie.com/embed/7LeveaxQtGs?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></span> <!-- /.videowrapper -->
</div></figure>



<h3 class="wp-block-heading">Keynotes: Toward “Agent-Based” and Autonomous AI</h3>



<p>Plenary sessions were dominated by the convergence of Kubernetes and Artificial Intelligence. AI, already ubiquitous in tech news, was bound to be a major focus here. Jonathan Bryce, Executive Director of Cloud &amp; Infrastructure at the Linux Foundation and an iconic figure in the ecosystem, made a strong point by reminding the audience that while Kubernetes is everywhere (82% adoption rate), AI in production remains a major challenge.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_081828458.MP_-1024x768.jpg" alt="" class="wp-image-31295" style="width:407px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_081828458.MP_-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_081828458.MP_-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_081828458.MP_-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_081828458.MP_-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_081828458.MP_-2048x1536.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>In November, during the latest KubeCon + CloudNativeCon NA in Atlanta, the CNCF launched the &#8220;<a href="https://www.cncf.io/announcements/2025/11/11/cncf-launches-certified-kubernetes-ai-conformance-program-to-standardize-ai-workloads-on-kubernetes/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Certified Kubernetes AI Conformance Program to Standardize AI Workloads on Kubernetes</a>&#8221;. Five months later, several platforms, including OVHcloud Managed Kubernetes Service (MKS), passed this new program with their own certified Kubernetes AI platform.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_083621436.MP_-1024x768.jpg" alt="" class="wp-image-31296" style="width:431px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_083621436.MP_-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_083621436.MP_-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_083621436.MP_-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_083621436.MP_-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260324_083621436.MP_-2048x1536.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>During the keynotes we even saw a real plane!</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="800" height="534" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/55166324614_dd452b5f68_c.jpg" alt="" class="wp-image-31297" style="aspect-ratio:1.4981024097101614;width:455px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/55166324614_dd452b5f68_c.jpg 800w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/55166324614_dd452b5f68_c-300x200.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/55166324614_dd452b5f68_c-768x513.jpg 768w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>



<p>And to top it off, seeing Michelin present the Top End User Award to SNCF was a real highlight for us. <em>Cocoricoooo!</em> 🇫🇷</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="682" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQyuKaWQAAn_3z-1024x682.jpg" alt="" class="wp-image-31298" style="aspect-ratio:1.501451415253588;width:514px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQyuKaWQAAn_3z-1024x682.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQyuKaWQAAn_3z-300x200.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQyuKaWQAAn_3z-768x512.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQyuKaWQAAn_3z-1536x1024.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/HEQyuKaWQAAn_3z.jpg 2000w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading" id="REXKubeCon2026Amsterdam-KeyTrends">Key Trends</h3>



<p>Below are the most frequently discussed technical pillars, which will remain prominent in the coming months and years:</p>



<ul class="wp-block-list">
<li><strong>Agent-based AI:</strong> The focus is shifting from training to inference. The announcement of Dapr Agents 1.0 shows that Kubernetes will now orchestrate agents capable of making real-time decisions on the infrastructure.</li>



<li><strong>GPU Standardization (DRA)</strong>: Thanks to NVIDIA’s widespread adoption of Dynamic Resource Allocation (DRA) drivers, GPU scheduling is becoming as simple and granular as CPU scheduling. A boon for cost optimization.</li>



<li><strong>Sovereignty</strong>: Sovereignty is no longer a legal concept; it is an architecture. We have seen a rise in encryption tools for data in transit and at rest (Confidential Computing) natively integrated into CNIs such as Cilium.</li>



<li><strong>FinOps 2.0</strong>: With 67% of AI compute dedicated to inference by the end of 2026, precise monitoring of GPU consumption via projects like Kepler has become essential for the economic viability of projects.</li>
</ul>



<h3 class="wp-block-heading" id="REXKubeCon2026Amsterdam-TheGatewayAPIisbecomingthestandard">The Gateway API is becoming the standard</h3>



<p>As we announced in our blog post <em>“<a href="https://blog.ovhcloud.com/moving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api/" data-wpel-link="internal">Moving Beyond Ingress: Why should OVHcloud Managed Kubernetes Service (MKS) users start looking at the Gateway API?</a>”</em>, the ingress-nginx controller, the most widely used ingress controller, has now been archived.</p>



<p>Now, after 8 years of development, 275 released versions, and nearly 20k GitHub stars, the maintainers of the Kubernetes Gateway API introduced<a href="https://kubernetes.io/blog/2026/03/20/ingress2gateway-1-0-release/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> <strong>ingress2gateway v1.0</strong></a>, a tool designed to simplify migration. It automatically converts Ingress resources including annotations into Gateway API resources. The recommended approach remains pragmatic: first migrate the controller while keeping existing Ingress objects, then gradually transition to the Gateway API. Attempting a full migration in a single step is considered risky and unnecessary.</p>
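<p>As a rough sketch of what that incremental migration looks like in practice, running <code>ingress2gateway print --providers ingress-nginx</code> against an Ingress like the one below emits an HTTPRoute along these lines. The resource names here are placeholders, and the exact output depends on the tool version and on which nginx annotations you use, so treat this as illustrative rather than authoritative:</p>

<pre class="wp-block-code"><code># Existing Ingress (input)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-svc
            port:
              number: 80
---
# Approximate Gateway API equivalent (output sketch)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo
spec:
  parentRefs:
  - name: nginx          # a Gateway standing in for the former ingress class
  hostnames:
  - demo.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: demo-svc
      port: 80
</code></pre>

<p>Keeping both objects side by side during the transition lets you validate traffic on the new route before deleting the Ingress.</p>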



<p>Additionally, <a href="https://github.com/kubernetes-sigs/gateway-api/releases/tag/v1.5.0" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gateway API version 1.5</a> represents a major milestone: five features have moved from experimental status to the Standard channel in a single release.</p>



<p>Amongst them:</p>



<ul class="wp-block-list">
<li><strong>ListenerSet</strong>: delegates TLS listener management outside of the Gateway&nbsp;</li>



<li><strong>TLSRoute</strong>: SNI-based routing in either termination or passthrough mode</li>



<li>Client certificate validation for mTLS at the ingress layer</li>



<li>Native CORS filter for HTTPRoute</li>
</ul>
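<p>To give an idea of what the new native CORS filter looks like, here is a minimal HTTPRoute sketch. The gateway name <code>my-gateway</code> and backend <code>api-svc</code> are placeholders, and the exact field names should be checked against the published Gateway API 1.5 spec and your controller’s supported features:</p>

<pre class="wp-block-code"><code>apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
  - name: my-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    filters:
    - type: CORS            # newly promoted to the Standard channel
      cors:
        allowOrigins:
        - "https://app.example.com"
        allowMethods:
        - GET
        - POST
        maxAge: 3600
    backendRefs:
    - name: api-svc
      port: 8080
</code></pre>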



<p>The Kubernetes Gateway API is now establishing itself as much more than just a successor to Ingress: it is evolving into Kubernetes’ unified network control plane.</p>



<h2 class="wp-block-heading">Favorite talk</h2>



<p>As usual, Aurélie wasn’t able to attend many talks, but among the 2-3 she did see, there was one that really had a &#8220;wow&#8221; effect on her:</p>



<p>&#8220;<a href="https://kccnceu2026.sched.com/event/2CW5p/an-immersive-and-visual-journey-into-kubernetes-networking-benoit-entzmann-feesh" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">An Immersive and Visual Journey Into Kubernetes Networking</a>&#8221;.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100115641-1024x768.jpg" alt="" class="wp-image-31300" style="width:405px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100115641-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100115641-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100115641-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100115641-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100115641-2048x1536.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="768" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100805212-1024x768.jpg" alt="" class="wp-image-31301" style="width:407px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100805212-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100805212-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100805212-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100805212-1536x1152.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/PXL_20260326_100805212-2048x1536.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>Benoit</strong>, a DevSecOps engineer at Feesh in Switzerland with extensive expertise in Kubernetes networking, created a video game using Godot with four levels: “pod-to-pod basics”, “pod-to-pod advanced”, “service mesh sidecar”, and “service mesh with ambient mode”.</p>



<p>Across these four levels, he explains Kubernetes networking in a vanilla setup, then with Cilium and Istio, all from the perspective of a TCP packet, represented as a fish.</p>



<p>Networking and I don’t exactly get along, and I’ll admit I’ve always struggled with it. Even now, although I’ve had no choice but to work with Kubernetes and service mesh, I still find it challenging. But seeing the fish swim from frontend to backend, enter a building underwater (the node), interact with an eBPF program… it really makes things more visual and intuitive.</p>



<p>On Thursday morning, after the keynote, the 2,000-seat room was packed!</p>



<p>Explaining networking by building a 3D game from scratch specifically for the occasion: hats off to you!</p>



<p>Benoit hit a snag on stage: he had built the game in 4K, and it didn’t display properly on the projection screen. Luckily, about 30 seconds before showtime, he and the production team managed to fix it, and he went on stage without showing any of that stress 💪.</p>



<p>Replay:</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<span class="videowrapper embed-youtube-nocookie aspect_ratio_563"><iframe loading="lazy" title="An Immersive and Visual Journey Into Kubernetes Networking - Benoit Entzmann, Feesh" width="1200" height="675" src="https://www.youtube-nocookie.com/embed/Xtjpdy8OmQQ?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></span> <!-- /.videowrapper -->
</div></figure>



<h3 class="wp-block-heading" id="REXKubeCon2026Amsterdam-KubeConin45seconds">KubeCon in 45 seconds</h3>



<p>To keep memories of these 3-4 amazing days, we created a &#8220;KubeCon Europe 2026 in 45 seconds&#8221; movie:</p>



<figure class="wp-block-embed aligncenter is-type-rich is-provider-twitter wp-block-embed-twitter"><div class="wp-block-embed__wrapper">
<blockquote class="twitter-tweet" data-width="550" data-dnt="true"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/KubeCon?src=hash&amp;ref_src=twsrc%5Etfw" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">#KubeCon</a> 2026 in 45 seconds 🎥⏱️<br><br>The energy. Conversations. The community.<a href="https://twitter.com/hashtag/Sovereignty?src=hash&amp;ref_src=twsrc%5Etfw" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">#Sovereignty</a>, <a href="https://twitter.com/hashtag/Kubernetes?src=hash&amp;ref_src=twsrc%5Etfw" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">#Kubernetes</a> at scale, <a href="https://twitter.com/hashtag/reversibility?src=hash&amp;ref_src=twsrc%5Etfw" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">#reversibility</a> — same themes in every conversation. That&#39;s why we show up.<br><br>Thanks for the moments you can&#39;t script 👋<a href="https://twitter.com/hashtag/CloudNativeCon?src=hash&amp;ref_src=twsrc%5Etfw" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">#CloudNativeCon</a> <a href="https://t.co/dBinAqM04u" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">pic.twitter.com/dBinAqM04u</a></p>&mdash; OVHcloud (@OVHcloud) <a href="https://twitter.com/OVHcloud/status/2044048614977122614?ref_src=twsrc%5Etfw" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">April 14, 2026</a></blockquote><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</div></figure>



<h3 class="wp-block-heading" id="REXKubeCon2026Amsterdam-Conclusion">Conclusion</h3>



<p>KubeCon Amsterdam proved once again that the strength of open source lies in its community.</p>



<p>From the halls of the RAI to the technical sessions, the excitement was palpable. We’re leaving with our heads full of ideas, but above all with the certainty that collaboration remains the key to solving the complex challenges of modern IT. This was particularly evident in the packed conference rooms and the crowded aisles of the exhibition hall.</p>



<p>One thing is certain: the future of Cloud Native is being written together, and we at OVHcloud look forward to contributing to it with you by helping you get the most out of Kubernetes through our<a href="https://www.ovhcloud.com/fr/public-cloud/kubernetes/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> managed platform</a>. Because we’re convinced that for businesses in 2026, the challenge will no longer be how to run Kubernetes, but how to use it to innovate faster and better than the competition.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fkubecon-cloudnativecon-europe-2026-in-amsterdam-feedback-and-highlights%2F&amp;action_name=KubeCon%20%2B%20CloudNativeCon%20Europe%202026%20in%20Amsterdam%3A%20feedback%20and%20highlights&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Discover the External Secret Operator (ESO) OVHcloud Provider to manage your Kubernetes secrets  🎉</title>
		<link>https://blog.ovhcloud.com/discover-the-external-secret-operator-eso-ovhcloud-provider-to-manage-your-kubernetes-secrets-%f0%9f%8e%89/</link>
		
		<dc:creator><![CDATA[Aurélie Vache]]></dc:creator>
		<pubDate>Tue, 14 Apr 2026 07:02:22 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=31032</guid>

					<description><![CDATA[Several months ago, we released the Beta version of the OVHcloud Secret Manager and we guided you how to manage your secrets thanks to the existing External Secret Operator (ESO) Hashicorp Vault provider. As our Secret Manager is now in General Availability, our teams worked on the development of an OVHcloud ESO Provider now available [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdiscover-the-external-secret-operator-eso-ovhcloud-provider-to-manage-your-kubernetes-secrets-%25f0%259f%258e%2589%2F&amp;action_name=Discover%20the%20External%20Secret%20Operator%20%28ESO%29%20OVHcloud%20Provider%20to%20manage%20your%20Kubernetes%20secrets%20%20%F0%9F%8E%89&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="681" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-10-15.57.01.910-1024x681.png" alt="" class="wp-image-31204" style="aspect-ratio:1.503658927864753;width:524px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-10-15.57.01.910-1024x681.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-10-15.57.01.910-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-10-15.57.01.910-768x511.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-10-15.57.01.910.png 1532w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Several months ago, we released the Beta version of the OVHcloud Secret Manager and we guided you <a href="https://blog.ovhcloud.com/manage-your-secrets-through-ovhcloud-secret-manager-thanks-to-external-secrets-operator-eso-on-ovhcloud-managed-kubernetes-service-mks/" data-wpel-link="internal">how to manage your secrets thanks to the existing External Secret Operator (ESO) Hashicorp Vault provider</a>.</p>



<p>As our Secret Manager is now in General Availability, our teams worked on the development of an OVHcloud ESO Provider now available in the <a href="https://github.com/external-secrets/external-secrets/releases/tag/v2.3.0" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ESO v2.3.0 new release</a> 🎉.</p>



<p>In this blog post, you will learn how to create a new secret in the OVHcloud Secret Manager and how to manage it within your Kubernetes clusters through the <a href="https://external-secrets.io/latest/provider/ovhcloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud ESO provider</a>.</p>



<h3 class="wp-block-heading">External Secrets Operator (ESO)</h3>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="225" height="225" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image.png" alt="" class="wp-image-31088" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image.png 225w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-70x70.png 70w" sizes="auto, (max-width: 225px) 100vw, 225px" /></figure>



<p>The <strong>External Secrets Operator</strong> (ESO), a CNCF sandbox project since 2022, is a Kubernetes operator that integrates external secret management systems.</p>



<p>The operator reads the information from an external API and automatically injects the values into a <a href="https://kubernetes.io/docs/concepts/configuration/secret/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Kubernetes Secret</a>. If the secret changes in the external API, the operator updates the secret in the Kubernetes cluster.</p>



<p>The ESO connects to an external Secret Manager, such as <a href="https://external-secrets.io/latest/provider/ovhcloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a>, Vault, AWS, or GCP, via a provider configured in a <strong>(Cluster)SecretStore.</strong> An <strong>ExternalSecret</strong> resource then specifies which secrets to retrieve. ESO fetches those values and creates a corresponding Kubernetes Secret within the cluster.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="943" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-09-14.55.33.553-1024x943.png" alt="" class="wp-image-31170" style="aspect-ratio:1.0859073039196323;width:484px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-09-14.55.33.553-1024x943.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-09-14.55.33.553-300x276.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-09-14.55.33.553-768x707.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Gribouillis-2026-04-09-14.55.33.553.png 1097w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For more details, read the <a href="https://external-secrets.io/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ESO official documentation</a>.</p>



<h3 class="wp-block-heading">Prerequisites</h3>



<p>To be able to use the ESO OVHcloud provider, you need the following:</p>



<ul class="wp-block-list">
<li>An OVHcloud account</li>



<li>An <a href="https://www.ovhcloud.com/en/identity-security-operations/key-management-service/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OKMS</a> domain (&#8220;<em>305db938-331f-454d-83a7-3a0a29291661</em>&#8221; in this blog post)</li>



<li>An <a href="https://github.com/ovh/public-cloud-examples/tree/main/iam/create-user-and-generate-pat-token-with-cli" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">IAM local user</a> (&#8220;<em>secretmanager-305db938-331f-454d-83a7-3a0a29291661</em>&#8221; in this blog post)</li>



<li>The <a href="https://github.com/ovh/ovhcloud-cli/?tab=readme-ov-file#installation" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud CLI</a> installed</li>



<li>A Kubernetes cluster</li>
</ul>



<p>The ESO OVH provider supports both <code><em>token</em></code> and <code><em>mTLS</em></code> authentication. In this blog post, we will use the token authentication mode. Please follow the <a href="https://external-secrets.io/latest/provider/ovhcloud/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud ESO provider</a> guide if you wish to use mTLS authentication mode.</p>



<h4 class="wp-block-heading">Generate a PAT token (For token authentication only)</h4>



<p>The ESO <strong>(Cluster)SecretStore</strong> needs permission to fetch secrets from the Secret Manager.</p>



<p>If you want to use token authentication, you’ll need a Personal Access Token (PAT). You can use the OVHcloud CLI to generate one:</p>



<pre class="wp-block-code"><code class="">PAT_TOKEN=$(ovhcloud iam user token create &lt;iam-local-user-name&gt; --name pat-&lt;iam-local-user-name&gt; --description "PAT secret manager for domain &lt;okms-id&gt;" -o json  | jq .details.token |  tr -d '"')<br><br>echo $PAT_TOKEN<br>&lt;your-token&gt;</code></pre>



<p>You should have a result like this:</p>



<pre class="wp-block-code"><code class="">$ PAT_TOKEN=$(ovhcloud iam user token create secretmanager-305db938-331f-454d-83a7-3a0a29291661 --name pat-secretmanager-305db938-331f-454d-83a7-3a0a29291661 --description "PAT secret manager for domain 305db938-331f-454d-83a7-3a0a29291661" -o json  | jq .details.token |  tr -d '"')<br>2026/04/07 14:07:45 Final parameters:<br>{<br> "description": "PAT secret manager for domain 305db938-331f-454d-83a7-3a0a29291661",<br> "name": "pat-secretmanager-305db938-331f-454d-83a7-3a0a29291661"<br>}<br><br>$ echo $PAT_TOKEN<br>eyJhbGciOiJFZERTQSIsImtpZCI6IjgzMkFGNUE5ODg3MzFCMDNGM0EzMTRFMDJFRUJFRjBGNDE5MUY0Q0YiLCJraW5kIjoicGF0IiwidHlwIjoiSldUIn0.eyJ0b2tlbiI6InBBSFh1WE5JdVNHYVpmV3F2OUFzVmJrU3UwR2UySTJrdFU0OGdTZkwyZ1k9In0.-VDbiUf4vNm1KB9qSv7i4sGMCvxs_EuZFAETB-eaOFf3IX8-9m7akN800--ASgXy55_DDFHdy4Z5uSq8lww-Bw</code></pre>



<p>Encode the PAT token in base64 and save it in an environment variable:</p>



<pre class="wp-block-code"><code class="">export PAT_TOKEN_B64=$(echo -n $PAT_TOKEN | base64)<br>echo $PAT_TOKEN_B64</code></pre>
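<p>As a quick sanity check, the encoded value should decode back to the original token. A minimal illustration with a dummy value (not a real PAT):</p>

```shell
#!/usr/bin/env bash
# Round-trip check with a dummy token; in practice use your real $PAT_TOKEN.
PAT_TOKEN="dummy-token"
PAT_TOKEN_B64=$(echo -n "$PAT_TOKEN" | base64)
DECODED=$(echo "$PAT_TOKEN_B64" | base64 -d)
[ "$DECODED" = "$PAT_TOKEN" ] && echo "round-trip OK"
```

<p>Note the <code>-n</code> flag: without it, <code>echo</code> appends a newline that would end up inside the encoded token.</p>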



<h4 class="wp-block-heading">Retrieve and save the KMS information</h4>



<p>List the OKMS domains:</p>



<pre class="wp-block-code"><code class="">$ ovhcloud okms list<br>┌──────────────────────────────────────┬─────────────┐<br>│                  id                  │   region    │<br>├──────────────────────────────────────┼─────────────┤<br>│ 305db938-331f-454d-83a7-3a0a29291661 │ eu-west-par │<br>│ xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx │ eu-west-par │<br>└──────────────────────────────────────┴─────────────┘</code></pre>



<p>Save the KMS endpoint and the OKMS ID in two environment variables. For example:</p>



<pre class="wp-block-code"><code class="">export OKMS_ID="305db938-331f-454d-83a7-3a0a29291661"<br>export KMS_ENDPOINT=$(ovhcloud okms get 305db938-331f-454d-83a7-3a0a29291661 -o json | jq .restEndpoint | xargs)</code></pre>



<h4 class="wp-block-heading">Create a secret in the Secret Manager</h4>



<p>In the <a href="https://www.ovh.com/manager" data-wpel-link="exclude">OVHcloud Control Panel</a> (UI), go to the ‘Secret Manager’ section and click on the <strong>Create a secret</strong> button.</p>



<p>Then, to create a secret named ‘prod/eu-west-par/dockerconfigjson’ in the eu-west-par region (Europe, France – Paris), choose this region:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="695" height="674" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.25.png" alt="" class="wp-image-31231" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.25.png 695w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.25-300x291.png 300w" sizes="auto, (max-width: 695px) 100vw, 695px" /></figure>



<p>Then, choose the OKMS domain, enter &#8220;prod/eu-west-par/dockerconfigjson&#8221; as the path, and fill in the content:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="704" height="718" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.15.png" alt="" class="wp-image-31232" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.15.png 704w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.15-294x300.png 294w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/Capture-decran-2026-04-13-a-14.13.15-70x70.png 70w" sizes="auto, (max-width: 704px) 100vw, 704px" /></figure>



<p>Finally, click on the <strong>Create</strong> button to finalise the creation of the new secret.</p>



<h4 class="wp-block-heading">Install or update the ESO</h4>



<p>If you&#8217;ve never installed ESO in your Kubernetes cluster, you can install it via Helm:</p>



<pre class="wp-block-code"><code class="">helm repo add external-secrets https://charts.external-secrets.io<br>helm repo update<br><br>helm install external-secrets \<br>   external-secrets/external-secrets \<br>    -n external-secrets \<br>    --create-namespace \<br>    --set installCRDs=true</code></pre>



<p>If you have already installed it, update it in order to use this new provider:</p>



<pre class="wp-block-code"><code class="">helm upgrade external-secrets external-secrets/external-secrets -n external-secrets</code></pre>



<p>⚠️ In order to use the OVHcloud provider, you need a running instance of ESO version <strong>2.3.0</strong> or later.</p>



<pre class="wp-block-code"><code class="">$ helm list -n external-secrets<br><br>NAME            	NAMESPACE       	REVISION	UPDATED                              	STATUS  	CHART                 	APP VERSION<br>external-secrets	external-secrets	1       	2026-04-13 13:56:29.071329 +0200 CEST	deployed	external-secrets-2.3.0	v2.3.0</code></pre>



<h3 class="wp-block-heading">Let&#8217;s deploy a Secret in Kubernetes using the ESO provider!</h3>



<h4 class="wp-block-heading">Deploy a ClusterSecretStore to connect ESO to Secret Manager</h4>



<p>Set up a <strong>ClusterSecretStore</strong> to manage synchronization with the Secret Manager.<br>It will use the OVHcloud provider with token authentication mode, and the OKMS endpoint as the backend.</p>



<p>Create a <strong>clustersecretstore.yaml.template</strong> file with the content below:</p>



<pre class="wp-block-code"><code class="">apiVersion: external-secrets.io/v1<br>kind: ClusterSecretStore<br>metadata:<br>  name: secret-store-ovh<br>spec:<br>  provider:<br>    ovh:<br>      server: "$KMS_ENDPOINT" # for example: "https://eu-west-rbx.okms.ovh.net"<br>      okmsid: "$OKMS_ID" # for example: "734b9b45-8b1a-469c-b140-b10bd6540017"<br>      auth:<br>        token:<br>          tokenSecretRef:<br>            name: ovh-token<br>            namespace: external-secrets<br>            key: token<br>---<br>apiVersion: v1<br>kind: Secret<br>metadata:<br>  name: ovh-token<br>  namespace: external-secrets<br>data:<br>  token: $PAT_TOKEN_B64</code></pre>



<p>Generate the <strong>clustersecretstore.yaml</strong> file from the environment variables you defined:</p>



<pre class="wp-block-code"><code class="">envsubst &lt; clustersecretstore.yaml.template &gt; clustersecretstore.yaml</code></pre>
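<p>Note that <code>envsubst</code> silently replaces any unset variable with an empty string, which would produce a broken manifest. A small guard you can run beforehand (an illustrative snippet, not part of the official guide):</p>

```shell
#!/usr/bin/env bash
# Fail if any required variable is unset or empty before running envsubst.
require_vars() {
  for var in "$@"; do
    [ -n "${!var}" ] || { echo "Error: $var is not set" >&2; return 1; }
  done
}

# Illustration with dummy values:
export KMS_ENDPOINT="https://eu-west-par.okms.ovh.net"
export OKMS_ID="305db938-331f-454d-83a7-3a0a29291661"
export PAT_TOKEN_B64="ZGVtbw=="
require_vars KMS_ENDPOINT OKMS_ID PAT_TOKEN_B64 && echo "all variables set"
```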



<p>You should obtain a file filled with the OVHcloud KMS information:</p>



<pre class="wp-block-code"><code class="">apiVersion: external-secrets.io/v1<br>kind: ClusterSecretStore<br>metadata:<br>  name: secret-store-ovh<br>spec:<br>  provider:<br>    ovh:<br>      server: "https://eu-west-par.okms.ovh.net" # for example: "https://eu-west-rbx.okms.ovh.net"<br>      okmsid: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" # for example: "734b9b45-8b1a-469c-b140-b10bd6540017"<br>      auth:<br>        token:<br>          tokenSecretRef:<br>            name: ovh-token<br>            namespace: external-secrets<br>            key: token<br>---<br>apiVersion: v1<br>kind: Secret<br>metadata:<br>  name: ovh-token<br>  namespace: external-secrets<br>data:<br>  token: ZXlK...UJ3</code></pre>



<p>Apply it in your Kubernetes cluster:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f clustersecretstore.yaml</code></pre>



<p>Check:</p>



<pre class="wp-block-code"><code class="">$ kubectl get clustersecretstore.external-secrets.io/secret-store-ovh<br><br>NAME               AGE   STATUS   CAPABILITIES   READY<br>secret-store-ovh   7s    Valid    ReadWrite      True</code></pre>



<h3 class="wp-block-heading">Create an ExternalSecret</h3>



<p>Create an <strong>externalsecret.yaml</strong> file with the content below:</p>



<pre class="wp-block-code"><code class="">apiVersion: external-secrets.io/v1<br>kind: ExternalSecret<br>metadata:<br>  name: docker-config-secret<br>  namespace: external-secrets<br>spec:<br>  refreshInterval: 30m<br>  secretStoreRef:<br>    name: secret-store-ovh<br>    kind: ClusterSecretStore<br>  target:<br>    template:<br>      type: kubernetes.io/dockerconfigjson<br>      data:<br>        .dockerconfigjson: "{{ .mysecret | toString }}"<br>    name: ovhregistrycred<br>    creationPolicy: Owner<br>  data:<br>  - secretKey: ovhregistrycred<br>    remoteRef:<br>      key: prod/eu-west-par/dockerconfigjson</code></pre>



<p>Apply it:</p>



<pre class="wp-block-code"><code class="">$ kubectl apply -f externalsecret.yaml<br><br>externalsecret.external-secrets.io/docker-config-secret created</code></pre>



<p>Check:</p>



<pre class="wp-block-code"><code class="">$ kubectl get externalsecret.external-secrets.io/docker-config-secret -n external-secrets <br><br>NAME                   STORETYPE            STORE              REFRESH INTERVAL   STATUS         READY   LAST SYNC<br>docker-config-secret   ClusterSecretStore   secret-store-ovh   30m                SecretSynced   True    4s</code></pre>



<p>Applying the ExternalSecret resource creates the corresponding Kubernetes Secret object.</p>



<pre class="wp-block-code"><code class="">$ kubectl get secret ovhregistrycred -n external-secrets<br><br>NAME              TYPE                             DATA   AGE<br>ovhregistrycred   kubernetes.io/dockerconfigjson   1      49s</code></pre>



<p>The Kubernetes <strong>Secret</strong> has been created 🎉</p>
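<p>You can check that the synced content matches what you stored in the Secret Manager by decoding it (the dotted key needs escaping in the jsonpath expression):</p>

```shell
# Print the decoded value of the .dockerconfigjson key of the synced Secret
kubectl get secret ovhregistrycred -n external-secrets \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```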



<p>We created a Secret directly from the key, but the OVHcloud ESO provider allows you to fetch the original secret in different ways (fetch the whole secret, fetch nested values, fetch multiple secrets…), according to your needs.</p>
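<p>For example, to sync every key of a JSON secret instead of a single value, ESO provides <code>dataFrom</code> with <code>extract</code>. A sketch (check the provider documentation for the exact options the OVHcloud provider supports):</p>

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: whole-secret
  namespace: external-secrets
spec:
  refreshInterval: 30m
  secretStoreRef:
    name: secret-store-ovh
    kind: ClusterSecretStore
  target:
    name: whole-secret
    creationPolicy: Owner
  dataFrom:
    - extract:
        key: prod/eu-west-par/dockerconfigjson  # each JSON key becomes a Secret key
```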



<h3 class="wp-block-heading">Conclusion</h3>



<p>In this blog, we’ve explained how to create secrets in the OVHcloud Secret Manager and then integrate them directly in your Kubernetes clusters using the new ESO OVHcloud provider.</p>



<p>This brand new OVHcloud provider gives you a smoother integration between the Secret Manager and your Kubernetes clusters through ESO.</p>



<p>Our team is working on several other integrations, so stay tuned, and please share your thoughts with us!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdiscover-the-external-secret-operator-eso-ovhcloud-provider-to-manage-your-kubernetes-secrets-%25f0%259f%258e%2589%2F&amp;action_name=Discover%20the%20External%20Secret%20Operator%20%28ESO%29%20OVHcloud%20Provider%20to%20manage%20your%20Kubernetes%20secrets%20%20%F0%9F%8E%89&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Wrappers on Linux Workstations</title>
		<link>https://blog.ovhcloud.com/wrappers-on-linux-workstations/</link>
		
		<dc:creator><![CDATA[Isabelle Bauer]]></dc:creator>
		<pubDate>Mon, 13 Apr 2026 14:26:16 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=31132</guid>

					<description><![CDATA[As Linux Sys Admins, we are sometimes faced with dilemmas regarding what we can or cannot allow on machines. Some functionalities are very important to users for their daily tasks and overall better use of their devices, but they sometimes also come with security concerns.We came up with a way to still allow most of [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fwrappers-on-linux-workstations%2F&amp;action_name=Wrappers%20on%20Linux%20Workstations&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="648" height="419" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-5.png" alt="" class="wp-image-31146" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-5.png 648w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-5-300x194.png 300w" sizes="auto, (max-width: 648px) 100vw, 648px" /></figure>



<p>As Linux Sys Admins, we are sometimes faced with dilemmas regarding what we can or cannot allow on machines.<br><br>Some functionalities are very important to users for their daily tasks and overall better use of their devices, but they sometimes also come with security concerns.<br>We came up with a way to still allow most of these functionalities while having more control over them, and over their outcome.<br><br>While the Linux user community is technical, its missions are quite heterogeneous: developers, sysadmins, network engineers and more&#8230;<br>And they all work with very different workflows (from front-end web to low-level drivers), sometimes on the laptop, in a Docker container, on a local VM, or remotely on a development VM. Some may even need to hook up specific hardware.<br><br>This leaves users, who are very much used to having access to every aspect of their personal computers, rather frustrated when they are too limited.</p>






<h3 class="wp-block-heading">Combining Usability and Security</h3>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="568" height="371" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-4.png" alt="" class="wp-image-31145" style="width:734px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-4.png 568w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-4-300x196.png 300w" sizes="auto, (max-width: 568px) 100vw, 568px" /></figure>



<p>Wrappers are usually used for abstraction and convenience; they are often relied on to simplify command-line workflows, enforce consistent parameters, or adapt legacy tools to modern environments.<br>In the case at hand, they are used a bit more like &#8220;guardrails.&#8221;</p>



<p>Take package management as an example. Tools like apt are powerful but inherently risky when misused (or maliciously used), and capable of altering a system’s status by removing critical dependencies, etc&#8230;<br>Instead of exposing these tools directly (or completely removing access to them), our team provides a wrapped version that preserves essential functionality, while explicitly blocking operations that could compromise the system’s integrity.</p>



<h3 class="wp-block-heading">Why would our user base need access to apt? Why not just completely remove that option?</h3>



<p>Since our user base is rather technical, and generally knows their operating system rather well, they should be able to install authorized packages, or to update or remove them whenever they like (even though we also have daily automatic updates running).<br>Plus, if they encounter any basic dpkg/apt issue, it makes sense for them to be able to resolve it autonomously.</p>



<p>Here is the list of options we made available:</p>



<pre class="wp-block-code"><code class="">Usage:<br>ovh-apt &lt;install|reinstall|remove|purge&gt; [OPTIONS] &lt;package|package=version&gt; [package...]<br>ovh-apt &lt;update|autoclean|clean&gt;<br>ovh-apt &lt;fix&gt; (this executes apt-get install -f)<br>ovh-apt &lt;fix-dpkg&gt; (this executes dpkg --configure -a)<br>Examples:<br>ovh-apt update<br>ovh-apt install vim<br>ovh-apt install vim=2:8.1.2269-1ubuntu5.17<br>ovh-apt install --only-upgrade bash<br>ovh-apt fix</code></pre>
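<p>Internally, such a wrapper boils down to a command whitelist. A minimal, illustrative skeleton (not the actual OVHcloud script, which adds package checks, non-interactive flags and logging on top of this):</p>

```shell
#!/usr/bin/env bash
# Illustrative command-whitelist dispatcher, sketching how ovh-apt could work.
ovh_apt() {
  local cmd="$1"; shift
  case "$cmd" in
    install|reinstall|remove|purge)
      echo "would run: apt-get $cmd $*" ;;
    update|autoclean|clean)
      echo "would run: apt-get $cmd" ;;
    fix)
      echo "would run: apt-get install -f" ;;
    fix-dpkg)
      echo "would run: dpkg --configure -a" ;;
    *)
      echo "Unsupported command: $cmd" >&2; return 1 ;;
  esac
}

ovh_apt install vim
ovh_apt dist-upgrade || echo "rejected"
```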






<p>We also have a list of protected packages, to avoid having very useful ones deleted: firewall configuration, systemd, sudo, etc…<br>Basically, this includes everything that could have an impact on security or system integrity. Also, this particular wrapper is non-interactive – in order to make sure a root shell is never offered – as can happen natively with dpkg.</p>



<h3 class="wp-block-heading">How to prevent specific packages from being uninstalled?</h3>



<p>We have a .txt file containing a bunch of package names (one per line).<br>In our ovh-apt script, we check that file, and if the operation would remove one of the listed packages, we cancel it.</p>



<pre class="wp-block-code"><code class=""># List of protected package names, one per line<br>protected="$(cat /etc/ovh/ovh-apt/protected.txt)"<br># Simulated apt output captured earlier<br>apt_output="$(cat nohup.out)"<br><br># Match lines announcing a purge or removal, and capture the package name<br>re="^(Purg|Remv) ([^ ]+) "<br><br>IFS=$'\n'<br>for line in $apt_output; do<br>  if [[ "$line" =~ $re ]]; then<br>    package="${BASH_REMATCH[2]}"<br>    # Cancel the whole operation if a protected package would be removed<br>    if [[ $'\n'"$protected"$'\n' == *$'\n'"$package"$'\n'* ]]; then<br>      echo "Error: Package $package is protected, won't do."<br>      cancel=1<br>    fi<br>  fi<br>done<br>unset IFS</code></pre>






<h3 class="wp-block-heading">There is more</h3>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="453" height="309" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-3.png" alt="" class="wp-image-31144" style="width:737px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-3.png 453w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/image-3-300x205.png 300w" sizes="auto, (max-width: 453px) 100vw, 453px" /></figure>



<p>By default, you would need root access on unix systems to be able to change the keyboard layout. We decided to make a wrapper to allow users to set the layout of their choice.<br>This implementation was also heavily requested by Linux users, and very understandably so.</p>



<p>Here&#8217;s how it works:</p>



<pre class="wp-block-code"><code class="">Usage: ovh-keyboard &lt;command&gt; [options]<br>Commands:<br>show -&gt; Show current keyboard configuration.<br>set &lt;layout&gt; -&gt; Update keyboard configuration.<br>Valid options: fr, us, gb, ca, es, it, de, pt</code></pre>
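<p>The layout check itself can be sketched as a simple whitelist (an illustration, not the actual wrapper; the real one applies the change with elevated rights):</p>

```shell
#!/usr/bin/env bash
# Only accept layouts from a fixed whitelist before touching the system config.
VALID_LAYOUTS="fr us gb ca es it de pt"

set_layout() {
  local layout="$1"
  case " $VALID_LAYOUTS " in
    *" $layout "*)
      # the real wrapper would now update the keyboard configuration as root
      echo "setting layout to $layout" ;;
    *)
      echo "Invalid layout: $layout" >&2; return 1 ;;
  esac
}

set_layout fr
set_layout qwerty || echo "refused"
```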



<p>Just for funsies, here is the list of other wrappers we use:<br><br><strong>ovh_backlightctl</strong> Allows users to control the backlight options of their monitors.<br><strong>ovh_snap</strong> Allows users to manage a protected list of snap packages on their device.<br><strong>ovh_swapclean</strong> Obviously.<br><strong>ovh-systemctl</strong> Allows specific, harmless systemctl commands.<br><strong>nmcli_wrapper</strong> Blocks the --show-secrets option of nmcli. ‘Cause we don’t want secrets to be seen (that’s why they’re secret).</p>



<p>We will definitely keep on using wrappers, whether for user or security needs, when the use case allows it. We find that this way of handling the usability/security trade-off fits quite well with how we manage our Linux fleet so far.</p>



<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fwrappers-on-linux-workstations%2F&amp;action_name=Wrappers%20on%20Linux%20Workstations&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: Deploying a vision-language model with vLLM on OVHcloud MKS for high performance inference and full observability</title>
		<link>https://blog.ovhcloud.com/reference-architecture-deploying-a-vision-language-model-with-vllm-on-ovhcloud-mks-for-high-performance-inference-and-full-observability/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Fri, 10 Apr 2026 07:48:53 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[vLLM]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30455</guid>

					<description><![CDATA[Ensure complete&#160;digital sovereignty&#160;of your AI models with end-to-end control through open-source solutions on OVHcloud’s&#160;Managed Kubernetes Service. This reference architecture demonstrates how to deploy a Large Language Model (LLM) inference system using vLLM on&#160;OVHcloud Managed Kubernetes Service&#160;(MKS). The solution leverages NVIDIA L40S GPUs to serve the&#160;Qwen3-VL-8B-Instruct&#160;multimodal model (vision + text) with OpenAI-compatible API endpoints. This comprehensive [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-deploying-a-vision-language-model-with-vllm-on-ovhcloud-mks-for-high-performance-inference-and-full-observability%2F&amp;action_name=Reference%20Architecture%3A%20Deploying%20a%20vision-language%20model%20with%20vLLM%20on%20OVHcloud%20MKS%20for%20high%20performance%20inference%20and%20full%20observability&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em><em>Ensure complete&nbsp;<strong>digital sovereignty</strong>&nbsp;of your AI models with end-to-end control through open-source solutions on OVHcloud’s&nbsp;<strong>Managed Kubernetes Service</strong>.</em></em></p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="703" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2026/04/ref-archi-mks-vllm-703x1024.jpg" alt="vLLM on OVHcloud MKS for high availability and full observability" class="wp-image-31153" style="width:710px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/04/ref-archi-mks-vllm-703x1024.jpg 703w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/ref-archi-mks-vllm-206x300.jpg 206w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/ref-archi-mks-vllm-768x1118.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/ref-archi-mks-vllm-1055x1536.jpg 1055w, https://blog.ovhcloud.com/wp-content/uploads/2026/04/ref-archi-mks-vllm.jpg 1260w" sizes="auto, (max-width: 703px) 100vw, 703px" /><figcaption class="wp-element-caption"><em><em>vLLM on OVHcloud MKS for high availability and full observability</em></em></figcaption></figure>



<p>This reference architecture demonstrates how to deploy a Large Language Model (LLM) inference system using vLLM on&nbsp;<a href="https://www.ovhcloud.com/fr/public-cloud/kubernetes/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Managed Kubernetes Service</a>&nbsp;(MKS). The solution leverages NVIDIA L40S GPUs to serve the&nbsp;<a href="https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Qwen3-VL-8B-Instruct</a>&nbsp;multimodal model (vision + text) with OpenAI-compatible API endpoints.</p>



<p>This comprehensive guide shows you how to deploy, automatically scale, and monitor vLLM-based LLM workloads on OVHcloud infrastructure.</p>



<p><strong>What are the key benefits?</strong></p>



<ul class="wp-block-list">
<li><strong>Cost-effectiveness:</strong>&nbsp;Leverage managed services to minimise operational overhead</li>



<li><strong>Real-time observability:</strong>&nbsp;Track Time-to-First-Token (TTFT), throughput, and resource utilisation</li>



<li><strong>Sovereign infrastructure:</strong>&nbsp;Keep all metrics and data within European datacentres</li>



<li><strong>Scalable by design:</strong>&nbsp;Automatically scale GPU inference replicas based on real workload demand</li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">Managed Kubernetes Service</h3>



<p><strong>OVHcloud MKS</strong>&nbsp;is a fully managed Kubernetes platform designed to help you deploy, operate, and scale containerised applications in production. It provides a secure and reliable Kubernetes environment without the operational overhead of managing the control plane.</p>



<p><strong>How does this benefit you?</strong></p>



<ul class="wp-block-list">
<li><strong>Cost-efficient</strong>: Pay only for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane</li>



<li><strong>Fully managed Kubernetes</strong>: Certified upstream Kubernetes with automated control plane management, managed upgrades, and high availability</li>



<li><strong>Production-ready by design</strong>: Built-in integrations with OVHcloud Load Balancers, networking, and persistent storage</li>



<li><strong>Scalable and flexible</strong>: Scale workloads and node pools easily to match application demand</li>



<li><strong>Open and portable</strong>: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in</li>
</ul>



<p>In the following guide, all services are deployed within the&nbsp;<strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Architecture overview</h2>



<p>This reference architecture demonstrates a basic deployment of vLLM for vision-language model inference on OVHcloud Managed Kubernetes Service, featuring:</p>



<ul class="wp-block-list">
<li><strong>High-availability deployment</strong>&nbsp;with 2 GPU nodes (NVIDIA L40S)</li>



<li><strong>Optimised GPU utilisation</strong>&nbsp;with proper driver configuration</li>



<li><strong>Scalable infrastructure</strong>&nbsp;supporting vision-language models</li>



<li><strong>Comprehensive monitoring</strong>&nbsp;using Prometheus, Grafana, and DCGM</li>



<li><strong>Full observability</strong>&nbsp;for both application and hardware metrics</li>
</ul>



<p><strong>Data flow</strong>:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-3-1-1024x538.jpg" alt="" class="wp-image-30985" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-3-1-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-3-1-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-3-1-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-3-1-1536x806.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-3-1-2048x1075.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Data Flow</em></figcaption></figure>



<ol class="wp-block-list">
<li><strong>Inference request:</strong>
<ul class="wp-block-list">
<li>User → LoadBalancer → Gateway → NGINX Ingress → &#8220;Qwen3 VL&#8221; Service → vLLM Pod → GPU</li>



<li>Response follows reverse path with streaming support</li>
</ul>
</li>



<li><strong>Metrics collection:</strong>
<ul class="wp-block-list">
<li>vLLM Pods expose <code>/metrics</code> endpoint (port <code><strong><mark class="has-inline-color has-ast-global-color-0-color">8000</mark></strong></code>)</li>



<li>DCGM Exporters expose GPU metrics (port <code><strong><mark class="has-inline-color has-ast-global-color-0-color">9400</mark></strong></code>)</li>



<li>Prometheus scrapes both endpoints every 30 seconds</li>



<li>Grafana queries Prometheus for visualization</li>
</ul>
</li>



<li><strong>Load distribution:</strong>
<ul class="wp-block-list">
<li>NGINX Ingress uses cookie-based session affinity</li>



<li>vLLM Service uses ClientIP session affinity</li>



<li>Anti-affinity ensures 1 pod per GPU node</li>
</ul>
</li>
</ol>
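<p>The cookie-based session affinity mentioned above is typically declared as annotations on the Ingress resource. A minimal sketch follows (the annotation names are standard ingress-nginx annotations; the cookie name and max-age values are illustrative assumptions, and the <code>vllm-ingress.yaml</code> file used later in this guide is authoritative):</p>

```yaml
# Illustrative sticky-session annotations for the vLLM Ingress.
# Annotation keys are real ingress-nginx annotations; the values
# (cookie name, max-age) are example choices, not taken from the tutorial.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "vllm-session"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
```

<p>Sticky sessions matter here because a streaming LLM response must keep hitting the same vLLM pod for the duration of the request.</p>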



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code>Administrator</code></strong></a>&nbsp;role</li>



<li><strong>Hugging Face access</strong>&nbsp;–&nbsp;<em>create a&nbsp;<a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face account</a>&nbsp;and generate an&nbsp;<a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">access token</a></em></li>



<li><code><strong>kubectl</strong></code>&nbsp;and&nbsp;<code><strong>helm</strong></code>&nbsp;(version 3.x or later) already installed</li>
</ul>



<p><strong>🚀 Now you have all the ingredients, it’s time to deploy the recipe for&nbsp;<a href="https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Qwen/Qwen3-VL-8B-Instruct</a>&nbsp;using vLLM and MKS!</strong></p>



<h2 class="wp-block-heading">Architecture guide: Native GPU deployment of vLLM on MKS with full stack observability</h2>



<p>This reference architecture describes a&nbsp;<strong>Large Language Model</strong>&nbsp;deployment using the&nbsp;<strong>vLLM inference server</strong>&nbsp;and&nbsp;<strong>Kubernetes</strong>, delivering a service that is both highly available and observable in real time.</p>



<h3 class="wp-block-heading">Step 1 &#8211; Create MKS cluster and Node pools</h3>



<p>From the&nbsp;<a href="https://www.ovh.com/manager/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">OVHcloud Control Panel</a>, create a Kubernetes cluster using&nbsp;<strong>MKS</strong>.</p>



<p>Navigate to: <code>Public Cloud</code> → <code>Managed Kubernetes Service</code> → <code>Create a cluster</code></p>



<h4 class="wp-block-heading">1. Configure cluster</h4>



<p>Consider using the following configuration for the current use case:</p>



<ul class="wp-block-list">
<li><strong>Name:</strong> <code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm-deployment-l40s-qwen3-8b</mark></strong></code></li>



<li><strong>Location</strong>: 1-AZ Region &#8211; Gravelines (<code><strong><mark class="has-inline-color has-ast-global-color-0-color">GRA11</mark></strong></code>)</li>



<li><strong>Plan:</strong> Free (or Standard)</li>



<li><strong>Network</strong>: attach a <strong>Private network </strong>(e.g. <code><strong><mark class="has-inline-color has-ast-global-color-0-color">0000 - AI Private Network</mark></strong></code>)</li>



<li><strong>Version:</strong> Latest stable (e.g. <code><strong><mark class="has-inline-color has-ast-global-color-0-color">1.34</mark></strong></code>)</li>
</ul>



<h4 class="wp-block-heading">2. Create GPU Node pool</h4>



<p>During the cluster creation, configure the vLLM Node pool for GPUs:</p>



<ul class="wp-block-list">
<li><strong>Node pool name:</strong> <code><mark class="has-inline-color has-ast-global-color-0-color">vllm</mark></code></li>



<li><strong>Flavor:</strong> <code><mark class="has-inline-color has-ast-global-color-0-color">L40S-90</mark></code></li>



<li><strong>Number of nodes:</strong> <code><mark class="has-inline-color has-ast-global-color-0-color">2</mark></code></li>



<li><strong>Autoscaling:</strong> Disabled (OFF)</li>
</ul>



<p><strong>Why L40S-90?</strong></p>



<ul class="wp-block-list">
<li>Cost-effective for single-model deployment (1 GPU per node)</li>



<li>Sufficient RAM (90GB) for <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Qwen3-VL-8B</mark></code></strong> model</li>
</ul>



<p>You should see your cluster (e.g.&nbsp;<code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm-deployment-l40s-qwen3-8b</mark></strong></code>) in the list, along with the following information:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="930" height="588" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-1.png" alt="" class="wp-image-30745" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-1.png 930w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-1-300x190.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-1-768x486.png 768w" sizes="auto, (max-width: 930px) 100vw, 930px" /></figure>



<p>You can now set up the node pool dedicated to monitoring.</p>



<h4 class="wp-block-heading">3. Create CPU Node pool</h4>



<p>From your cluster, click on <code><strong><mark class="has-inline-color has-ast-global-color-0-color">Add a node pool</mark></strong></code> and configure it as follows:</p>



<ul class="wp-block-list">
<li><strong>Node pool name:</strong> <mark class="has-inline-color has-ast-global-color-0-color"><code>monitoring</code></mark></li>



<li><strong>Flavor:</strong> <code><mark class="has-inline-color has-ast-global-color-0-color">B2-15</mark></code></li>



<li><strong>Number of nodes:</strong> <code><mark class="has-inline-color has-ast-global-color-0-color">1</mark></code></li>



<li><strong>Autoscaling:</strong> Disabled (OFF)</li>
</ul>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>✅ <strong>Note</strong></p>



<p><strong><em>The monitoring stack can run on GPU nodes if cost is a concern, but a dedicated CPU node provides better isolation and resource management.</em></strong></p>
</blockquote>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="365" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-node-pool-creation-1024x365.png" alt="" class="wp-image-30743" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-node-pool-creation-1024x365.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-node-pool-creation-300x107.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-node-pool-creation-768x274.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-node-pool-creation.png 1283w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the status is green with the&nbsp;<strong><code><mark class="has-inline-color has-ast-global-color-0-color">OK</mark></code></strong>&nbsp;label, you can proceed to the next step.</p>



<h4 class="wp-block-heading">4. Configure Kubernetes access</h4>



<p>Once your nodes have been provisioned, you can download the <strong>Kubeconfig file</strong> and configure kubectl with your MKS cluster.</p>



<pre class="wp-block-code"><code class=""># configure kubectl with your MKS cluster<br>export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml<br><br># verify cluster connectivity<br>kubectl cluster-info<br>kubectl get nodes</code></pre>



<p>Returning:</p>



<pre class="wp-block-code"><code class="">NAME                     STATUS   ROLES    AGE   VERSION<br>monitoring-node-xxxxxx   Ready    &lt;none&gt;   1d    v1.34.2<br>vllm-node-yyyyyy         Ready    &lt;none&gt;   1d    v1.34.2<br>vllm-node-zzzzzz         Ready    &lt;none&gt;   1d    v1.34.2</code></pre>



<p>Before going further, add a label to the CPU node for monitoring workloads.</p>



<pre class="wp-block-code"><code class="">CPU_NODE=$(kubectl get nodes -o json | \<br>  jq -r '.items[] | select(.status.allocatable."nvidia.com/gpu" == null) | .metadata.name')<br>kubectl label node $CPU_NODE node-role=monitoring</code></pre>



<p>Finally, check with the following command:</p>
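<p><em>One option (this specific invocation is an assumption; any equivalent query works) uses kubectl custom columns:</em></p>

```shell
# List each node's allocatable GPU count and its node-role label.
# Dots inside the resource name nvidia.com/gpu are escaped with backslashes.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu,ROLE:.metadata.labels.node-role'
```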



<pre class="wp-block-code"><code class="">NAME                     GPU      ROLE<br>monitoring-node-xxxxxx   &lt;none&gt;   monitoring<br>vllm-node-yyyyyy         1        &lt;none&gt;<br>vllm-node-zzzzzz         1        &lt;none&gt;</code></pre>



<p>Once all nodes are in <strong>Ready</strong> status, you can proceed to the next step.</p>



<h3 class="wp-block-heading">Step 2 &#8211; Install GPU operator</h3>



<p>To start, consider setting up the GPU operator.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ Note</strong></p>



<p><em><strong>This step is based on this OVHcloud documentation: <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-kubernetes-deploy-gpu-application?id=kb_article_view&amp;sysparm_article=KB0049707" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Deploying a GPU application on OVHcloud Managed Kubernetes Service</a></strong></em></p>
</blockquote>



<h4 class="wp-block-heading">1. Add NVIDIA helm repository and create namespace</h4>



<p>Add NVIDIA helm repo:</p>



<pre class="wp-block-code"><code class="">helm repo add nvidia https://helm.ngc.nvidia.com/nvidia<br>helm repo update</code></pre>



<p>Then create the namespace as follows:</p>



<pre class="wp-block-code"><code class="">kubectl create namespace gpu-operator</code></pre>



<h4 class="wp-block-heading">2. Install GPU operator with correct configuration</h4>



<p>The GPU Operator must be configured with specific driver versions to ensure compatibility with vLLM containers.</p>



<p>However, the default installation uses recent drivers (<code><strong><mark class="has-inline-color has-ast-global-color-0-color">580.x</mark></strong></code> with <strong><code><mark class="has-inline-color has-ast-global-color-0-color">CUDA 13.x</mark></code></strong>) which are incompatible with vLLM containers (<strong><code><mark class="has-inline-color has-ast-global-color-0-color">CUDA 12.x</mark></code></strong>).</p>



<p><strong>Solution:</strong> Force driver version <strong><code><mark class="has-inline-color has-ast-global-color-0-color">535.183.01</mark></code></strong> (<code><strong><mark class="has-inline-color has-ast-global-color-0-color">CUDA 12.2</mark></strong></code>).</p>



<pre class="wp-block-code"><code class="">helm install gpu-operator nvidia/gpu-operator \<br>  -n gpu-operator \<br>  --set driver.enabled=true \<br>  --set driver.version="535.183.01" \<br>  --set toolkit.enabled=true \<br>  --set operator.defaultRuntime=containerd \<br>  --set devicePlugin.enabled=true \<br>  --set dcgmExporter.enabled=true \<br>  --set dcgmExporter.image="dcgm-exporter" \<br>  --set dcgmExporter.version="3.1.7-3.1.4-ubuntu20.04" \<br>  --set gfd.enabled=true \<br>  --set migManager.enabled=false \<br>  --set nodeStatusExporter.enabled=true \<br>  --set validator.driver.enable=false \<br>  --set validator.toolkit.enable=false \<br>  --set validator.plugin.enable=false \<br>  --timeout 20m</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>✅ <strong>Note </strong></p>



<p><em><strong>Specifying the DCGM version may only be necessary if you encounter problems with the default image (e.g. <code><mark class="has-inline-color has-ast-global-color-0-color">‘ImagePullBackOff’</mark></code>). If this is the case, add the following parameters:<br><code><mark class="has-inline-color has-ast-global-color-0-color">--set dcgmExporter.repository="nvcr.io/nvidia/k8s"<br>--set dcgmExporter.image="dcgm-exporter"<br>--set dcgmExporter.version="3.1.7-3.1.4-ubuntu20.04"</mark></code></strong></em></p>
</blockquote>



<p>Then, verify the installation:</p>



<pre class="wp-block-code"><code class="">kubectl get pods -n gpu-operator</code></pre>



<p>Note that all pods should reach <strong>Running</strong> state in 5-10 minutes.</p>



<p>You can also check the GPU availability:</p>



<pre class="wp-block-code"><code class="">kubectl get nodes -o json | jq -r '.items[] | select(.status.allocatable."nvidia.com/gpu" != null) | "\(.metadata.name): \(.status.allocatable."nvidia.com/gpu") GPU(s)"'</code></pre>



<p>Returning:</p>



<pre class="wp-block-code"><code class="">vllm-node-yyyyyy: 1 GPU(s)<br>vllm-node-zzzzzz: 1 GPU(s)</code></pre>



<p>And you can test running <code><strong><mark class="has-inline-color has-ast-global-color-0-color">nvidia-smi</mark></strong></code>:</p>



<pre class="wp-block-code"><code class="">DRIVER_POD=$(kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset -o name | head -1)<br>kubectl exec -n gpu-operator $DRIVER_POD -- nvidia-smi</code></pre>



<p>If the GPU tests are working properly, you can move on to the DCGM service configuration.</p>



<h4 class="wp-block-heading">3. Configure DCGM service</h4>



<p><strong>Why is DCGM Exporter required?</strong></p>



<p>DCGM (Data Center GPU Manager) is NVIDIA&#8217;s official tool for monitoring GPUs in production. The goal is to collect and display metrics from both GPU nodes.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="571" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-1-1024x571.jpg" alt="" class="wp-image-30746" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-1-1024x571.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-1-300x167.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-1-768x428.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-1-1536x856.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/data_ia_archi-1.jpg 1733w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>GPU monitoring with DCGM</em></figcaption></figure>



<p>The metrics provided are:</p>



<ul class="wp-block-list">
<li><code><strong><mark class="has-inline-color has-ast-global-color-0-color">DCGM_FI_DEV_GPU_UTIL</mark></strong></code> &#8211; GPU utilisation (%)</li>



<li><strong><code><mark class="has-inline-color has-ast-global-color-0-color">DCGM_FI_DEV_GPU_TEMP</mark></code></strong> &#8211; GPU temperature (°C)</li>



<li><strong><code><mark class="has-inline-color has-ast-global-color-0-color">DCGM_FI_DEV_FB_USED</mark></code></strong> &#8211; VRAM used (MB)</li>



<li><strong><code><mark class="has-inline-color has-ast-global-color-0-color">DCGM_FI_DEV_FB_FREE</mark></code></strong> &#8211; Free VRAM (MB)</li>



<li><strong><code><mark class="has-inline-color has-ast-global-color-0-color">DCGM_FI_DEV_POWER_USAGE</mark></code></strong> &#8211; Power consumption (W)</li>



<li>And 50+ other GPU metrics</li>
</ul>



<p>Next, ensure DCGM service has the correct labels and port configuration:</p>



<pre class="wp-block-code"><code class="">kubectl patch svc nvidia-dcgm-exporter -n gpu-operator --type merge -p '{<br>  "metadata": {<br>    "labels": {<br>      "app": "nvidia-dcgm-exporter"<br>    }<br>  },<br>  "spec": {<br>    "ports": [<br>      {<br>        "name": "metrics",<br>        "port": 9400,<br>        "targetPort": 9400,<br>        "protocol": "TCP"<br>      }<br>    ]<br>  }<br>}'</code></pre>



<p>Verify the endpoints (should show 2 IPs, one per GPU node).</p>



<pre class="wp-block-code"><code class="">kubectl get endpoints nvidia-dcgm-exporter -n gpu-operator</code></pre>



<pre class="wp-block-code"><code class="">NAME                   ENDPOINTS                   AGE<br>nvidia-dcgm-exporter   x.x.x.x:9400,x.x.x.x:9400   17d</code></pre>
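<p>To spot-check that GPU metrics are actually flowing, you can port-forward the exporter and query it directly (an optional verification, using port 9400 as configured above):</p>

```shell
# Temporarily forward the DCGM exporter and look for a known metric
kubectl port-forward -n gpu-operator svc/nvidia-dcgm-exporter 9400:9400 &
sleep 2
curl -s http://localhost:9400/metrics | grep -m1 DCGM_FI_DEV_GPU_UTIL
kill %1  # stop the background port-forward
```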



<h3 class="wp-block-heading">Step 3 &#8211; Deploy Qwen3 VL 8B with vLLM inference server</h3>



<p>The deployment of the <strong>Qwen 3 VL 8B</strong> model on two L40S GPU nodes is carried out in several stages.</p>



<h4 class="wp-block-heading">1. Create namespace and Hugging Face secret</h4>



<p>Start by creating the namespace:</p>



<pre class="wp-block-code"><code class="">kubectl create namespace vllm</code></pre>



<p>Next, you must retrieve your Hugging Face token and replace the&nbsp;<code><strong><mark class="has-inline-color has-ast-global-color-0-color">HF_TOKEN</mark></strong></code>&nbsp;value with your own:</p>



<pre class="wp-block-code"><code class="">export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"</code></pre>



<p>Create your secret as follows:</p>



<pre class="wp-block-code"><code class="">kubectl create secret generic huggingface-secret \<br>  --from-literal=token=$HF_TOKEN \<br>  --namespace=vllm</code></pre>



<p>Verify that you obtain the following output by running:</p>



<pre class="wp-block-code"><code class="">kubectl get secret huggingface-secret -n vllm</code></pre>



<pre class="wp-block-code"><code class="">NAME                 TYPE     DATA   AGE<br>huggingface-secret   Opaque   1      14d</code></pre>



<h4 class="wp-block-heading">2. Create vLLM deployment configuration</h4>



<p>First, create the <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/vllm/vllm-deployment-2nodes.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vllm-deployment-2nodes.yaml</a></strong></code> file.</p>
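<p>For orientation, here is a heavily condensed sketch of what such a manifest typically contains (field values, including the container image, are illustrative assumptions; the linked file is the reference):</p>

```yaml
# Condensed sketch of a vLLM Deployment -- illustrative, not the full manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen3-vl
  namespace: vllm
spec:
  replicas: 2                        # one pod per L40S node
  selector:
    matchLabels: {app: qwen3-vl}
  template:
    metadata:
      labels: {app: qwen3-vl}
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest   # assumption: the official vLLM image
        args: ["--model", "Qwen/Qwen3-VL-8B-Instruct",
               "--served-model-name", "qwen3-vl-8b"]
        env:
        - name: HUGGING_FACE_HUB_TOKEN   # reads the secret created in step 1
          valueFrom:
            secretKeyRef: {name: huggingface-secret, key: token}
        ports:
        - containerPort: 8000            # vLLM API and /metrics endpoint
        resources:
          limits:
            nvidia.com/gpu: 1            # one GPU per pod
```

<p>The anti-affinity rules that spread the two replicas across the two GPU nodes are omitted here for brevity.</p>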



<p>Deploy vLLM:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f vllm-deployment-2nodes.yaml</code></pre>



<p>You can monitor the deployment (it should take 8-10 minutes for model download and loading).</p>



<pre class="wp-block-code"><code class="">kubectl get pods -n vllm -o wide -w</code></pre>



<p>Expected output after 10 minutes:</p>



<pre class="wp-block-code"><code class="">NAME               READY  STATUS   RESTARTS  AGE  IP       NODE  <br>qwen3-vl-xxxx-yyy  1/1    Running  0         1d   X.X.X.X  vllm-node-yyyyyy<br>qwen3-vl-xxxx-zzz  1/1    Running  0         1d   X.X.X.X  vllm-node-zzzzzz</code></pre>



<p>You can also check the container logs:</p>



<pre class="wp-block-code"><code class="">kubectl logs -f -n vllm &lt;pod-name&gt;</code></pre>



<p>You should find in the logs: &#8220;<code>Uvicorn running on http://0.0.0.0:8000</code>&#8220;</p>



<p>Is everything installed correctly? Then let&#8217;s move on to the next step.</p>



<h4 class="wp-block-heading">3. Add service label</h4>



<p>Ensure the service has the correct label for <strong><code><mark class="has-inline-color has-ast-global-color-0-color">ServiceMonitor</mark></code></strong> discovery.</p>



<pre class="wp-block-code"><code class="">kubectl label svc qwen3-vl-service -n vllm app=qwen3-vl --overwrite</code></pre>



<p>You can now verify by launching the following command.</p>



<pre class="wp-block-code"><code class="">kubectl get svc qwen3-vl-service -n vllm --show-labels | grep "app=qwen3-vl"</code></pre>



<p>Returning:</p>



<pre class="wp-block-code"><code class="">qwen3-vl-service   ClusterIP   X.X.X.X   &lt;none&gt;   8000/TCP   1d   app=qwen3-vl</code></pre>



<h3 class="wp-block-heading">Step 4 &#8211; Install NGINX ingress controller</h3>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><mark style="color:#cf2e2e" class="has-inline-color">⚠️ <strong>Moving beyond Ingress</strong></mark></p>



<p><strong><em><mark style="color:#cf2e2e" class="has-inline-color">Follow this <a href="https://blog.ovhcloud.com/moving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api/" data-wpel-link="internal">tutorial</a> if you want to use Gateway instead of Ingress.</mark></em></strong></p>
</blockquote>



<h4 class="wp-block-heading">1. Add helm repository and configure Ingress</h4>



<p>First of all, add helm repository:</p>



<pre class="wp-block-code"><code class="">helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx<br>helm repo update</code></pre>



<p>Create configuration file with <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/ingress/ingress-nginx-values.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ingress-nginx-values.yaml</a></strong></code>.</p>



<p>Then, install NGINX Ingress:</p>



<pre class="wp-block-code"><code class="">helm install ingress-nginx ingress-nginx/ingress-nginx \<br>  --namespace ingress-nginx \<br>  --create-namespace \<br>  -f ingress-nginx-values.yaml \<br>  --wait</code></pre>



<p>Wait for LoadBalancer IP. The external IP assignment should take 1-2 minutes.</p>



<pre class="wp-block-code"><code class="">kubectl get svc -n ingress-nginx ingress-nginx-controller -w</code></pre>



<p>Once <code><strong><mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark></strong></code> is no longer <code>&lt;pending&gt;</code>, press Ctrl+C and export it:</p>



<pre class="wp-block-code"><code class="">export EXTERNAL_IP=&lt;EXTERNAL-IP&gt;<br>echo "API URL: http://$EXTERNAL_IP"</code></pre>



<h4 class="wp-block-heading">2. Create vLLM Ingress resource</h4>



<p>Next, create vLLM Ingress using <strong><code><a href="https://github.com/ovh/public-cloud-examples/blob/ep-vllm-deployment-observability-mks/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/vllm/vllm-ingress.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vllm-ingress.yaml</a></code></strong>.</p>



<p>Apply it as follows:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f vllm-ingress.yaml</code></pre>



<p>You can now test different API calls to verify that your deployment is functional.</p>



<h4 class="wp-block-heading">3. Test API</h4>



<p>First, check that the model is available:</p>



<pre class="wp-block-code"><code class="">curl http://$EXTERNAL_IP/v1/models | jq</code></pre>



<pre class="wp-block-preformatted"><code>{<br>  "object": "list",<br>  "data": [<br>    {<br>      "id": "qwen3-vl-8b",<br>      "object": "model",<br>      "created": 1772472143,<br>      "owned_by": "vllm",<br>      "root": "Qwen/Qwen3-VL-8B-Instruct",<br>      "parent": null,<br>      "max_model_len": 8192,<br>      "permission": [<br>        {<br>          "id": "modelperm-8fb35cdd3208b068",<br>          "object": "model_permission",<br>          "created": 1772472143,<br>          "allow_create_engine": false,<br>          "allow_sampling": true,<br>          "allow_logprobs": true,<br>          "allow_search_indices": false,<br>          "allow_view": true,<br>          "allow_fine_tuning": false,<br>          "organization": "*",<br>          "group": null,<br>          "is_blocking": false<br>        }<br>      ]<br>    }<br>  ]<br>}</code></pre>



<p>Next, test inference using the following request:</p>



<pre class="wp-block-code"><code class="">curl http://$EXTERNAL_IP/v1/chat/completions \<br>  -H "Content-Type: application/json" \<br>  -d '{<br>    "model": "qwen3-vl-8b",<br>    "messages": [{"role": "user", "content": "Count from 1 to 10."}],<br>    "max_tokens": 100<br>  }' | jq '.choices[0].message.content'</code></pre>



<p><code>"1, 2, 3, 4, 5, 6, 7, 8, 9, 10"</code></p>
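<p>Since Qwen3-VL is a vision-language model, the same endpoint also accepts image inputs via the OpenAI-compatible vision message format. Below is a minimal Python sketch of such a request body (the image URL is a placeholder, and sending the request still requires the <code>EXTERNAL_IP</code> from above):</p>

```python
import json

# OpenAI-compatible vision request body for the multimodal endpoint.
# The image URL is a placeholder; replace it with a reachable image.
payload = {
    "model": "qwen3-vl-8b",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }
    ],
    "max_tokens": 128,
}

# Serialise exactly as the curl examples do; POST this to
# http://$EXTERNAL_IP/v1/chat/completions with Content-Type: application/json.
body = json.dumps(payload)
print(body[:40])
```

<p>The response shape is the same as for text-only requests, so the <code>jq '.choices[0].message.content'</code> filter above works unchanged.</p>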



<p>Great! You&#8217;re almost there…</p>



<h3 class="wp-block-heading">Step 5 &#8211; Install Prometheus stack</h3>



<p>Now, set up the monitoring stack that provides complete observability for&nbsp;<strong>application-level&nbsp;</strong>(vLLM) and&nbsp;<strong>hardware-level</strong>&nbsp;(GPU) metrics:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="763" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-architecture-1024x763.jpg" alt="" class="wp-image-30871" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-architecture-1024x763.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-architecture-300x223.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-architecture-768x572.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-architecture-1536x1144.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/monitoring-architecture.jpg 1673w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Monitoring architecture</em></figcaption></figure>



<h4 class="wp-block-heading">1. Add helm repository and create namespace</h4>



<p>Add Prometheus helm repo:</p>



<pre class="wp-block-code"><code class="">helm repo add prometheus-community https://prometheus-community.github.io/helm-charts<br>helm repo update</code></pre>



<p>Then, create the <code><strong><mark class="has-inline-color has-ast-global-color-0-color">monitoring</mark></strong></code> Namespace.</p>



<pre class="wp-block-code"><code class="">kubectl create namespace monitoring</code></pre>



<h4 class="wp-block-heading">2. Create Prometheus deployment configuration and installation</h4>



<p>First, create <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/monitoring/prometheus.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">prometheus.yaml</a></strong></code> file.</p>



<p>Install Prometheus stack:</p>



<pre class="wp-block-code"><code class="">helm install prometheus prometheus-community/kube-prometheus-stack \<br>  -n monitoring \<br>  -f prometheus.yaml \<br>  --timeout 10m \<br>  --wait</code></pre>



<p>Now,&nbsp;monitor its installation and wait until the pods are ready:</p>



<pre class="wp-block-code"><code class="">kubectl get pods -n monitoring -w</code></pre>



<p>If all pods are running successfully, you can proceed to the next step.</p>



<h4 class="wp-block-heading">3. Check that the installation is operational</h4>



<p>First, port-forward Grafana in the background:</p>



<pre class="wp-block-code"><code class="">kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &amp;</code></pre>



<p>Test Grafana health:</p>



<pre class="wp-block-code"><code class="">curl -s http://localhost:3000/api/health | jq</code></pre>



<pre class="wp-block-preformatted"><code>{<br>  "database": "ok",<br>  "version": "12.3.3",<br>  "commit": "2a14494b2d6ab60f860d8b27603d0ccb264336f6"<br>}</code></pre>



<p>You can now access Grafana locally via <strong><a href="http://localhost:3000" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><code>http://localhost:3000</code></a></strong>, using the following credentials:</p>



<ul class="wp-block-list">
<li>Login: <code><strong><mark style="color:#cf2e2e" class="has-inline-color">admin</mark></strong></code></li>



<li>Password: <code><strong><mark style="color:#cf2e2e" class="has-inline-color">Admin123!vLLM</mark></strong></code></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="518" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-2-1024x518.png" alt="" class="wp-image-30804" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-2-1024x518.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-2-300x152.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-2-768x389.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-2.png 1322w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Well done! You can now proceed to the configuration step.</p>



<h3 class="wp-block-heading">Step 6 &#8211; Configure ServiceMonitors</h3>



<p>A ServiceMonitor tells Prometheus which endpoints to scrape for metrics.</p>



<h4 class="wp-block-heading">1. Create vLLM ServiceMonitor</h4>



<p>Retrieve the file from the GitHub repository: <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/monitoring/vllm-servicemonitor.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vllm-servicemonitor.yaml</a></strong></code>.</p>
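<p>For reference, such a ServiceMonitor can be sketched as follows (the label selector and port name are assumptions for illustration; the linked <code>vllm-servicemonitor.yaml</code> is authoritative):</p>

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics
  namespace: vllm
spec:
  selector:
    matchLabels:
      app: vllm           # assumed label on the vLLM Service
  endpoints:
    - port: metrics       # assumed name of the metrics port
      path: /metrics      # vLLM exposes Prometheus metrics on this path
      interval: 15s
```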



<p>Next, apply and check that the ServiceMonitor <code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm-metrics</mark></strong></code> exists:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f vllm-servicemonitor.yaml<br>kubectl get servicemonitor -n vllm</code></pre>



<h4 class="wp-block-heading">2. Create DCGM ServiceMonitor</h4>



<p>First, create the <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/monitoring/dcgm-servicemonitor.yaml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dcgm-servicemonitor.yaml</a></strong></code> file.</p>



<p>Once again, apply and verify:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f dcgm-servicemonitor.yaml<br>kubectl get servicemonitor -n gpu-operator</code></pre>



<pre class="wp-block-preformatted"><code>gpu-operator                  1d<br>nvidia-dcgm-exporter          1d<br>nvidia-node-status-exporter   1d</code></pre>



<h4 class="wp-block-heading">3. Configure Prometheus for Cross-Namespace discovery</h4>



<p>Apply a patch to allow Prometheus to discover ServiceMonitors in all namespaces.</p>



<pre class="wp-block-code"><code class="">kubectl patch prometheus prometheus-kube-prometheus-prometheus -n monitoring --type merge -p '{<br>  "spec": {<br>    "serviceMonitorNamespaceSelector": {},<br>    "podMonitorNamespaceSelector": {}<br>  }<br>}'</code></pre>



<p>Now you have to restart Prometheus.</p>



<ol class="wp-block-list">
<li>Delete Prometheus pod to force configuration reload</li>



<li>Wait for Prometheus to restart</li>
</ol>



<pre class="wp-block-code"><code class="">kubectl delete pod prometheus-prometheus-kube-prometheus-prometheus-0 -n monitoring<br><br>kubectl wait --for=condition=Ready \<br>  pod/prometheus-prometheus-kube-prometheus-prometheus-0 \<br>  -n monitoring \<br>  --timeout=180s</code></pre>



<p>Wait about 2 minutes for discovery and finally, verify targets:</p>



<pre class="wp-block-code"><code class="">kubectl port-forward -n monitoring \<br>  prometheus-prometheus-kube-prometheus-prometheus-0 9090:9090 &amp;</code></pre>



<p>You can open in browser: <a href="http://localhost:9090/targets" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://localhost:9090/targets</mark></strong></code></a> and search for:</p>



<ul class="wp-block-list">
<li><code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm</mark></strong></code></li>



<li><strong><code><mark class="has-inline-color has-ast-global-color-0-color">dcgm</mark></code></strong></li>
</ul>



<p>Note that the expected targets are: </p>



<ul class="wp-block-list">
<li>serviceMonitor/vllm/vllm-metrics/0   (2/2 UP)</li>



<li>serviceMonitor/gpu-operator/nvidia-dcgm-exporter/0 (2/2 UP)</li>
</ul>
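<p>This verification can also be automated against the Prometheus <code>/api/v1/targets</code> HTTP API. A minimal Python sketch (the helper is ours; it is demonstrated here on a stubbed response rather than a live Prometheus):</p>

```python
import json

def up_targets(payload: str, pool_substring: str) -> int:
    """Count targets in state 'up' whose scrape pool matches the given substring.

    `payload` is the JSON body of GET http://localhost:9090/api/v1/targets.
    """
    data = json.loads(payload)
    return sum(
        1
        for t in data["data"]["activeTargets"]
        if pool_substring in t.get("scrapePool", "") and t.get("health") == "up"
    )

# Stubbed response mimicking the two expected scrape pools
stub = json.dumps({"data": {"activeTargets": [
    {"scrapePool": "serviceMonitor/vllm/vllm-metrics/0", "health": "up"},
    {"scrapePool": "serviceMonitor/vllm/vllm-metrics/0", "health": "up"},
    {"scrapePool": "serviceMonitor/gpu-operator/nvidia-dcgm-exporter/0", "health": "up"},
]}})
print(up_targets(stub, "vllm"))  # 2
```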



<h3 class="wp-block-heading">Step 7 &#8211; Create Grafana dashboards</h3>



<p>In this final step, the goal is to create two Grafana dashboards: one for the software side, tracking vLLM application metrics, and one for the hardware side, monitoring GPU and system consumption.</p>



<h4 class="wp-block-heading">1. vLLM application metrics</h4>



<p>The dashboard provides insights into vLLM application performance, request handling, and resource utilization based on the following metrics:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Metric</th><th>Type</th><th>Description</th><th>Unit</th><th>Dashboard Usage</th></tr></thead><tbody><tr><td><code>vllm:request_success_total</code></td><td>Counter</td><td>Total successful requests</td><td>count</td><td>Request Rate, Total Requests</td></tr><tr><td><code>vllm:num_requests_running</code></td><td>Gauge</td><td>Requests currently being processed</td><td>count</td><td>Queue Depth, Active Requests</td></tr><tr><td><code>vllm:num_requests_waiting</code></td><td>Gauge</td><td>Requests waiting in queue</td><td>count</td><td>Queue Depth, Queued Requests</td></tr><tr><td><code>vllm:time_to_first_token_seconds</code></td><td>Histogram</td><td>Latency until first token generated</td><td>seconds</td><td>TTFT P50/P95/P99</td></tr><tr><td><code>vllm:e2e_request_latency_seconds</code></td><td>Histogram</td><td>Total end-to-end latency</td><td>seconds</td><td>E2E Latency P50/P95/P99</td></tr><tr><td><code>vllm:generation_tokens_total</code></td><td>Counter</td><td>Total tokens generated (output)</td><td>count</td><td>Token Generation Rate, Throughput</td></tr><tr><td><code>vllm:prompt_tokens_total</code></td><td>Counter</td><td>Total prompt tokens (input)</td><td>count</td><td>Token Generation Rate, Avg Tokens</td></tr><tr><td><code>vllm:kv_cache_usage_perc</code></td><td>Gauge</td><td>GPU KV cache utilization</td><td>0-1 (0-100%)</td><td>KV Cache Usage</td></tr><tr><td><code>vllm:prefix_cache_hits_total</code></td><td>Counter</td><td>Number of prefix cache hits</td><td>count</td><td>Cache Hit Rate</td></tr><tr><td><code>vllm:prefix_cache_queries_total</code></td><td>Counter</td><td>Number of prefix cache queries</td><td>count</td><td>Cache Hit Rate</td></tr><tr><td><code>vllm:request_queue_time_seconds</code></td><td>Histogram</td><td>Time spent waiting in queue</td><td>seconds</td><td>Request Queue 
Time</td></tr><tr><td><code>vllm:request_prefill_time_seconds</code></td><td>Histogram</td><td>Prefill phase time</td><td>seconds</td><td>Prefill Time</td></tr><tr><td><code>vllm:request_decode_time_seconds</code></td><td>Histogram</td><td>Decode phase time</td><td>seconds</td><td>Decode Time</td></tr><tr><td><code>vllm:inter_token_latency_seconds</code></td><td>Histogram</td><td>Latency between each token</td><td>seconds</td><td>Inter-Token Latency</td></tr><tr><td><code>vllm:num_preemptions_total</code></td><td>Counter</td><td>Number of preemptions (OOM)</td><td>count</td><td>Preemptions</td></tr><tr><td><code>vllm:prompt_tokens_cached_total</code></td><td>Counter</td><td>Prompt tokens cached</td><td>count</td><td>Cached Tokens</td></tr><tr><td><code>vllm:request_prompt_tokens</code></td><td>Histogram</td><td>Prompt size distribution</td><td>count</td><td>(Table)</td></tr><tr><td><code>vllm:request_generation_tokens</code></td><td>Histogram</td><td>Generated tokens distribution</td><td>count</td><td>(Table)</td></tr><tr><td><code>vllm:iteration_tokens_total</code></td><td>Histogram</td><td>Tokens per iteration</td><td>count</td><td>(Advanced analysis)</td></tr></tbody></table></figure>
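<p>As an illustration of how these metrics feed the panels, here are a few representative PromQL expressions (the 5-minute windows and exact forms are assumptions; the dashboard JSON referenced in this section contains the real queries):</p>

```promql
# Request throughput (successful requests per second)
rate(vllm:request_success_total[5m])

# Time-to-first-token P95
histogram_quantile(0.95, rate(vllm:time_to_first_token_seconds_bucket[5m]))

# KV cache usage in percent
vllm:kv_cache_usage_perc * 100

# Prefix cache hit rate
rate(vllm:prefix_cache_hits_total[5m]) / rate(vllm:prefix_cache_queries_total[5m])
```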



<p>This <strong>vLLM Grafana dashboard</strong> is composed of 23 panels:</p>






<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Type</th><th>Count</th><th>Panels</th></tr></thead><tbody><tr><td><strong>Timeseries</strong></td><td>12</td><td>Request Rate, Queue Depth, TTFT, E2E Latency, Token Gen, Cache Usage, Cache Hit, Queue Time, Prefill/Decode, Inter-Token, Preemptions, Avg Tokens</td></tr><tr><td><strong>Stat</strong></td><td>10</td><td>Throughput, TTFT P95, Active Req, Queued Req, Cache Hit Rate, Cache Usage, Total Req, Total Tokens, Cached Tokens, Preemptions</td></tr><tr><td><strong>Table</strong></td><td>1</td><td>Pod Performance</td></tr></tbody></table></figure>



<p>Now create the dashboard using <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/grafana-dashboards/vllm-app-dashboard.json" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vllm-app-dashboard.json</a></strong></code>. Then, launch:</p>



<pre class="wp-block-code"><code class="">echo "Importing vLLM application dashboard..."<br>curl -X POST \<br>  'http://localhost:3000/api/dashboards/db' \<br>  -H 'Content-Type: application/json' \<br>  -u 'admin:Admin123!vLLM' \<br>  -d @vllm-app-dashboard.json | jq '.status, .url'</code></pre>



<p>Next, you can access the vLLM dashboard and follow metrics in real time:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="686" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-3-1024x686.png" alt="" class="wp-image-30858" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-3-1024x686.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-3-300x201.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-3-768x514.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-3.png 1230w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For comprehensive monitoring, it is also essential to track hardware consumption alongside this application dashboard.</p>



<h4 class="wp-block-heading">2. GPU hardware metrics</h4>



<p>Take advantage of the most useful DCGM metrics to check both the functioning and consumption of your hardware resources:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Metric</th><th>Type</th><th>Description</th><th>Unit</th><th>Normal Thresholds</th><th>Dashboard Usage</th></tr></thead><tbody><tr><td><code>DCGM_FI_DEV_GPU_UTIL</code></td><td>Gauge</td><td>GPU utilization (compute)</td><td>% (0-100)</td><td>70-95% optimal</td><td>GPU Utilization</td></tr><tr><td><code>DCGM_FI_DEV_GPU_TEMP</code></td><td>Gauge</td><td>GPU temperature</td><td>°C</td><td>&lt; 85°C normal</td><td>GPU Temperature</td></tr><tr><td><code>DCGM_FI_DEV_FB_USED</code></td><td>Gauge</td><td>VRAM used</td><td>MB</td><td>Variable by model</td><td>GPU Memory Used</td></tr><tr><td><code>DCGM_FI_DEV_FB_FREE</code></td><td>Gauge</td><td>VRAM free</td><td>MB</td><td>&gt; 2GB recommended</td><td>GPU Memory Free</td></tr><tr><td><code>DCGM_FI_DEV_POWER_USAGE</code></td><td>Gauge</td><td>Power consumption</td><td>Watts</td><td>&lt; 300W (L40S)</td><td>GPU Power Usage</td></tr><tr><td><code>DCGM_FI_DEV_SM_CLOCK</code></td><td>Gauge</td><td>GPU clock speed (compute)</td><td>MHz</td><td>Variable</td><td>GPU Clock Speed</td></tr><tr><td><code>DCGM_FI_DEV_MEM_CLOCK</code></td><td>Gauge</td><td>Memory clock speed</td><td>MHz</td><td>Variable</td><td>Memory Clock Speed</td></tr><tr><td><code>DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL</code></td><td>Counter</td><td>Total NVLink bandwidth</td><td>bytes/s</td><td>(If multi-GPU)</td><td>NVLink Bandwidth</td></tr><tr><td><code>DCGM_FI_DEV_PCIE_TX_BYTES</code></td><td>Counter</td><td>PCIe data transmitted</td><td>bytes</td><td>(I/O monitoring)</td><td>PCIe TX</td></tr><tr><td><code>DCGM_FI_DEV_PCIE_RX_BYTES</code></td><td>Counter</td><td>PCIe data received</td><td>bytes</td><td>(I/O monitoring)</td><td>PCIe RX</td></tr><tr><td><code>DCGM_FI_DEV_ECC_DBE_VOL_TOTAL</code></td><td>Counter</td><td>ECC double-bit errors</td><td>count</td><td>0 ideal</td><td>(Health check)</td></tr><tr><td><code>DCGM_FI_DEV_ECC_SBE_VOL_TOTAL</code></td><td>Counter</td><td>ECC 
single-bit errors</td><td>count</td><td>&lt; 10/day acceptable</td><td>(Health check)</td></tr></tbody></table></figure>
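<p>Likewise, typical PromQL expressions behind the hardware panels could look as follows (the aggregations are illustrative, not the exact dashboard queries):</p>

```promql
# Average GPU utilization across all GPUs
avg(DCGM_FI_DEV_GPU_UTIL)

# Hottest GPU temperature
max(DCGM_FI_DEV_GPU_TEMP)

# VRAM used, in GB (DCGM reports MB)
DCGM_FI_DEV_FB_USED / 1024

# Total power draw across GPUs
sum(DCGM_FI_DEV_POWER_USAGE)
```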



<p>This&nbsp;<strong>hardware Grafana dashboard</strong>&nbsp;is composed of 13 panels with GPU hardware and system metrics. A detailed view is also available for GPU utilization (%), temperature (°C), VRAM (GB), and power (W).</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Type</th><th>Count</th><th>Panels</th></tr></thead><tbody><tr><td><strong>Timeseries</strong></td><td>8</td><td>GPU Util, GPU Mem, GPU Temp, GPU Power, CPU Usage, RAM Usage, Network I/O, Disk I/O</td></tr><tr><td><strong>Stat</strong></td><td>4</td><td>Avg GPU Util, Avg GPU Temp, Total GPU Mem, Total GPU Power</td></tr><tr><td><strong>Table</strong></td><td>1</td><td>Hardware Status</td></tr></tbody></table></figure>



<p>Please refer to <code><strong><a href="https://github.com/ovh/public-cloud-examples/blob/main/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/grafana-dashboards/hardware-dashboard.json" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">hardware-dashboard.json</a></strong></code> by loading it as follows:</p>



<pre class="wp-block-code"><code class="">echo "Importing hardware dashboard..."<br>curl -X POST \<br>  'http://localhost:3000/api/dashboards/db' \<br>  -H 'Content-Type: application/json' \<br>  -u 'admin:Admin123!vLLM' \<br>  -d @hardware-dashboard.json | jq '.status, .url'</code></pre>



<p>Finally, track resource consumption using this hardware dashboard:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="686" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-4-1024x686.png" alt="" class="wp-image-30859" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-4-1024x686.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-4-300x201.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-4-768x514.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/image-4.png 1230w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congratulations! Everything is working. You can now test your model and track the various metrics in real time.</p>



<h3 class="wp-block-heading">Step 8 &#8211; LLM testing and performance tracking</h3>



<p>Start by installing Python dependencies:</p>



<pre class="wp-block-code"><code class="">pip3 install openai tqdm</code></pre>



<p>Replace <strong><mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL_IP&gt;</mark></strong> (the <code>APP_URL</code> value) with the vLLM service external IP, then launch the performance test with the following <a href="https://github.com/ovh/public-cloud-examples/blob/ep-vllm-deployment-observability-mks/containers-orchestration/managed-kubernetes/gpu-cluster-for-vllm-deployment-and-observability/llm-inference-performance-test.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><code><strong>Python code</strong></code></a>:</p>



<pre class="wp-block-code"><code class="">import time<br>import threading<br>import random<br>from statistics import mean<br>from openai import OpenAI<br>from tqdm import tqdm<br><br>APP_URL = "http://94.23.185.22/v1"<br>MODEL = "qwen3-vl-8b"<br><br>CONCURRENT_WORKERS = 500          # concurrency<br>REQUESTS_PER_WORKER = 10<br>MAX_TOKENS = 200                  # generation pressure<br><br># some random prompts<br>SHORT_PROMPTS = [<br>    "Summarize the theory of relativity.",<br>    "Explain what a transformer model is.",<br>    "What is Kubernetes autoscaling?"<br>]<br><br>MEDIUM_PROMPTS = [<br>    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",<br>    "Describe how vLLM manages KV cache and why it impacts inference performance."<br>]<br><br>LONG_PROMPTS = [<br>    "Write a very detailed technical explanation of how large language models perform inference, "<br>    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "<br>    "GPU memory management, and how batching affects latency and throughput. 
Use examples.",<br>]<br><br>PROMPT_POOL = (<br>    SHORT_PROMPTS * 2 +<br>    MEDIUM_PROMPTS * 4 +<br>    LONG_PROMPTS * 6    # bias toward long prompts<br>)<br><br># openai compliance<br>client = OpenAI(<br>    base_url=APP_URL,<br>    api_key="foo"<br>)<br><br># basic metrics<br>latencies = []<br>errors = 0<br>lock = threading.Lock()<br><br># worker<br>def worker(worker_id):<br>    global errors<br>    for _ in range(REQUESTS_PER_WORKER):<br>        prompt = random.choice(PROMPT_POOL)<br><br>        start = time.time()<br>        try:<br>            client.chat.completions.create(<br>                model=MODEL,<br>                messages=[{"role": "user", "content": prompt}],<br>                max_tokens=MAX_TOKENS,<br>                temperature=0.7,<br>            )<br>            elapsed = time.time() - start<br><br>            with lock:<br>                latencies.append(elapsed)<br><br>        except Exception as e:<br>            with lock:<br>                errors += 1<br><br># run<br>threads = []<br>start_time = time.time()<br><br>print("\n-&gt; STARTING PERFORMANCE TEST:")<br>print(f"Concurrency: {CONCURRENT_WORKERS}")<br>print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")<br><br>for i in range(CONCURRENT_WORKERS):<br>    t = threading.Thread(target=worker, args=(i,))<br>    t.start()<br>    threads.append(t)<br><br>for t in threads:<br>    t.join()<br><br>total_time = time.time() - start_time<br><br># results<br>print("\n-&gt; BENCH RESULTS:")<br>print(f"Total requests sent: {len(latencies) + errors}")<br>print(f"Successful requests: {len(latencies)}")<br>print(f"Errors: {errors}")<br>print(f"Total wall time: {total_time:.2f}s")<br><br>if latencies:<br>    print(f"Avg latency: {mean(latencies):.2f}s")<br>    print(f"Min latency: {min(latencies):.2f}s")<br>    print(f"Max latency: {max(latencies):.2f}s")<br>    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")</code></pre>



<p>Returning:</p>



<pre class="wp-block-preformatted"><code>-&gt; STARTING PERFORMANCE TEST:</code><br><code>Concurrency: 500<br>Total requests: 5000</code><br><code><br>-&gt; BENCH RESULTS:<br>Total requests sent: 5000<br>Successful requests: 5000<br>Errors: 0<br>Total wall time: 225.54s<br>Avg latency: 21.45s<br>Min latency: 6.06s<br>Max latency: 25.19s<br>Throughput: 22.17 req/s</code></pre>
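<p>The script reports only average, minimum, and maximum latency; under heavy concurrency, tail percentiles (P95/P99) are usually more telling. Here is a small optional sketch showing how they could be derived from the collected <code>latencies</code> list (demonstrated on synthetic values):</p>

```python
from statistics import quantiles

def latency_percentiles(latencies: list[float]) -> dict[str, float]:
    """Compute P50/P95/P99 from a list of per-request latencies (seconds)."""
    # quantiles(n=100) returns the 99 cut points P1..P99
    cuts = quantiles(latencies, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Synthetic demo data: 1000 latencies spread evenly between 6s and 25s
demo = [6 + 19 * i / 999 for i in range(1000)]
print(latency_percentiles(demo))
```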



<p>Don&#8217;t forget to track GPU and vLLM metrics in your Grafana dashboards!</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>This reference architecture demonstrates a<strong>&nbsp;vLLM deployment on OVHcloud Managed Kubernetes Service (MKS)</strong>&nbsp;with comprehensive GPU monitoring. Benefits include:</p>



<ul class="wp-block-list">
<li><strong>High Performance</strong>: GPU-accelerated inference with L40S</li>



<li><strong>Scalability</strong>: Kubernetes-native, horizontal scaling-ready</li>



<li><strong>Reliability</strong>: Health checks, auto-restart, monitoring</li>



<li><strong>API Compatibility</strong>: OpenAI-compatible endpoints</li>



<li><strong>Multimodality</strong>: Vision &amp; text capabilities</li>



<li><strong>Full stack monitoring</strong>: Complete vLLM application and hardware dashboards</li>
</ul>



<h2 class="wp-block-heading">Going Further</h2>



<p>Your current architecture is&nbsp;<strong>functional</strong>. However, it can be hardened into a&nbsp;<strong>full production-ready&nbsp;solution</strong>.</p>



<p>To take production hardening a step further, consider the following enhancements:</p>



<ol class="wp-block-list">
<li><strong>Authentication &amp; authorization</strong>
<ul class="wp-block-list">
<li>vLLM API authentication</li>



<li>Grafana authentication</li>



<li>Prometheus security</li>
</ul>
</li>



<li><strong>High availability &amp; load balancing</strong>
<ul class="wp-block-list">
<li>Grafana high availability with multiple replicas and shared storage</li>



<li>Prometheus high availability</li>



<li>vLLM Horizontal Pod Autoscaling (HPA) based on custom metrics</li>
</ul>
</li>



<li><strong>Data persistence &amp; backup</strong>
<ul class="wp-block-list">
<li>Prometheus long-term storage with persistent storage</li>



<li>Grafana Dashboard Backup</li>
</ul>
</li>



<li><strong>Observability enhancements</strong>
<ul class="wp-block-list">
<li>Distributed tracing by adding OpenTelemetry for request tracing</li>



<li>Alerting rules with production-ready alert rules</li>
</ul>
</li>
</ol>



<p></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-deploying-a-vision-language-model-with-vllm-on-ovhcloud-mks-for-high-performance-inference-and-full-observability%2F&amp;action_name=Reference%20Architecture%3A%20Deploying%20a%20vision-language%20model%20with%20vLLM%20on%20OVHcloud%20MKS%20for%20high%20performance%20inference%20and%20full%20observability&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Extract Text from Images with OCR using Python and OVHcloud AI Endpoints</title>
		<link>https://blog.ovhcloud.com/extract-text-from-images-with-ocr-using-python-and-ovhcloud-ai-endpoints/</link>
		
		<dc:creator><![CDATA[Stéphane Philippart]]></dc:creator>
		<pubDate>Wed, 01 Apr 2026 12:55:19 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30992</guid>

					<description><![CDATA[If you want to have more information on&#160;AI Endpoints, please read the&#160;following blog post.&#160;You can, also, have a look at our&#160;previous blog posts&#160;on how use AI Endpoints. You can find the full code example in the GitHub repository. In this article,&#160;we will explore how to perform OCR&#160;(Optical Character Recognition)&#160;on images using a vision-capable LLM,&#160;the&#160;OpenAI Python library,&#160;and [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fextract-text-from-images-with-ocr-using-python-and-ovhcloud-ai-endpoints%2F&amp;action_name=Extract%20Text%20from%20Images%20with%20OCR%20using%20Python%20and%20OVHcloud%20AI%20Endpoints&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>If you want more information on&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>, please read the&nbsp;<a href="https://blog.ovhcloud.com/enhance-your-applications-with-ai-endpoints/" data-wpel-link="internal">following blog post</a>.</em>&nbsp;<em>You can also have a look at our&nbsp;<a href="https://blog.ovhcloud.com/tag/ai-endpoints/" data-wpel-link="internal">previous blog posts</a>&nbsp;on how to use AI Endpoints.</em></p>



<p><em>You can find the full code example in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/ai-endpoints/python-ocr" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</em></p>



<p>In this article,&nbsp;we will explore how to perform OCR&nbsp;(Optical Character Recognition)&nbsp;on images using a vision-capable LLM,&nbsp;the&nbsp;<a href="https://github.com/openai/openai-python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenAI Python library</a>,&nbsp;and OVHcloud&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>.</p>



<h3 class="wp-block-heading">Introduction to OCR with Vision Models</h3>



<p>Optical Character Recognition has been around for decades,&nbsp;but traditional OCR engines often struggle with complex layouts,&nbsp;handwritten text,&nbsp;or noisy images.&nbsp;Vision-capable Large Language Models bring a new approach:&nbsp;instead of relying on specialized OCR pipelines,&nbsp;you can simply send an image to a model that understands both visual and textual content.</p>



<p>In this example,&nbsp;we use the&nbsp;<a href="https://github.com/openai/openai-python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenAI Python library</a>&nbsp;to create a simple OCR script powered by a vision model hosted on OVHcloud&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>.</p>



<p>The whole application is a single Python file:  no complex setup, just <code><strong>pip install openai</strong></code> and you&#8217;re ready to go.</p>



<h3 class="wp-block-heading">Setting up the Environment Variables</h3>



<p>Before running the script, you need to set the following environment variables:</p>



<pre title="Environment variables" class="wp-block-code"><code lang="" class=" line-numbers">export OVH_AI_ENDPOINTS_ACCESS_TOKEN="your-access-token"<br>export OVH_AI_ENDPOINTS_MODEL_URL="https://your-model-url"<br>export OVH_AI_ENDPOINTS_VLLM_MODEL="your-vision-model-name"</code></pre>



<p>You can create your access token and find the model URL and model name in the <a href="https://endpoints.ai.cloud.ovh.net/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints catalog</a>. Make sure to choose a <strong>vision-capable model</strong>.</p>



<h3 class="wp-block-heading">Installing Dependencies</h3>



<p>The only dependency is the OpenAI Python library:</p>



<pre title="OpenAI dependency" class="wp-block-code"><code lang="bash" class="language-bash">pip install openai</code></pre>



<h3 class="wp-block-heading">Define the System Prompt</h3>



<p>The first step is to define a system prompt that describes what our OCR service does.&nbsp;This prompt tells the model how to behave:</p>



<pre title="System prompt" class="wp-block-code"><code lang="" class=" line-numbers">SYSTEM_PROMPT = """You are an expert OCR engine.<br>Extract every piece of text visible in the provided image.<br>Preserve the original layout as faithfully as possible (line breaks, columns, tables).<br>Do NOT interpret, summarise, or translate the content.<br>Use markdown formatting to represent the layout (e.g. tables, lists).<br>If the image contains no text, reply with: "No text found."<br>"""</code></pre>



<p>We tell it to behave as an expert OCR engine, to preserve the original layout, and to use markdown formatting for structured content like tables or lists.</p>



<h3 class="wp-block-heading">Load the Image</h3>



<p>Before sending the image to the model,&nbsp;we need to encode it as a base64 string.&nbsp;Here is a simple helper function that reads a local PNG file and returns a base64-encoded string:</p>



<pre title="Image loading" class="wp-block-code"><code lang="" class=" line-numbers">import base64<br>from pathlib import Path<br><br>def load_image_as_base64(path: Path) -&gt; str:<br>    """Load a local image and encode it as base64."""<br>    with open(path, "rb") as f:<br>        return base64.b64encode(f.read()).decode("utf-8")</code></pre>



<p>The base64-encoded data is what gets sent to the vision model as part of the prompt.</p>
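<p>Note that the data URL built later hard-codes a PNG MIME type. As a small optional extension (not part of the original script), the MIME type can be derived from the file extension instead:</p>

```python
import base64
import mimetypes
from pathlib import Path

def load_image_as_data_url(path: Path) -> str:
    """Load a local image and return it as a data URL with the matching MIME type."""
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        mime = "image/png"  # fall back to the original assumption
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Demo on a tiny fake JPEG file (content is irrelevant here, only the extension matters)
tmp = Path("demo.jpg")
tmp.write_bytes(b"\xff\xd8\xff")
print(load_image_as_data_url(tmp))  # data:image/jpeg;base64,/9j/
```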



<p></p>



<h3 class="wp-block-heading">Extract Text from the Image</h3>



<p>The <code><strong>extract_text</strong></code> function sends the image to the vision model and returns the extracted text:</p>



<pre title="Extract text from image" class="wp-block-code"><code lang="" class=" line-numbers">def extract_text(client: OpenAI, image_base64: str, model: str) -&gt; str:<br>    """Extract text from an image using the vision model."""<br>    response = client.chat.completions.create(<br>        model=model,<br>        temperature=0.0,<br>        messages=[<br>            {"role": "system", "content": SYSTEM_PROMPT},<br>            {<br>                "role": "user",<br>                "content": [<br>                    {<br>                        "type": "image_url",<br>                        "image_url": {<br>                            "url": f"data:image/png;base64,{image_base64}"<br>                        }<br>                    }<br>                ]<br>            }<br>        ]<br>    )<br>    return response.choices[0].message.content</code></pre>



<p>The image is passed as a data URL inside the <code><strong>image_url</strong></code> field, following the OpenAI Vision API format. The temperature is set to <code>0.0</code> because we want deterministic, faithful text extraction and not creative output.</p>



<h3 class="wp-block-heading">Configure the Client</h3>



<p>This example uses a vision-capable model hosted on OVHcloud <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>. Since AI Endpoints exposes an OpenAI-compatible API, we use the <code>OpenAI</code> client and just point it to the OVHcloud endpoint:</p>



<pre title="Open AI client configuration" class="wp-block-code"><code lang="" class=" line-numbers">import os<br>from openai import OpenAI<br><br>client = OpenAI(<br>    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),<br>    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),<br>)<br><br>model_name = os.getenv("OVH_AI_ENDPOINTS_VLLM_MODEL")</code></pre>



<p>A few things to note:</p>



<ul class="wp-block-list">
<li>The <strong>API key</strong>, <strong>base URL</strong>, and <strong>model name</strong> are read from environment variables. </li>



<li>The OpenAI library works with any OpenAI-compatible API, making it a perfect fit for AI Endpoints.</li>
</ul>



<h3 class="wp-block-heading">Assemble and Run</h3>



<p>With the client configured, extracting text from an image is straightforward:</p>



<pre title="Run the OCR" class="wp-block-code"><code lang="" class=" line-numbers">image_base64 = load_image_as_base64(Path("./doc.png"))<br>result = extract_text(client, image_base64, model_name)<br>print(result)</code></pre>



<p>And that&#8217;s it!</p>



<p>Here is the image used for this example:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="946" height="693" src="https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1.png" alt="Used image for OCR example" class="wp-image-31002" style="width:600px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1.png 946w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1-300x220.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/03/doc-1-768x563.png 768w" sizes="auto, (max-width: 946px) 100vw, 946px" /></figure>



<p>And the result:</p>



<pre title="Run the OCR" class="wp-block-code"><code lang="" class=" line-numbers">$ python ocr_demo.py<br>📄 Loading image: doc.png<br>🔍 Running OCR with Qwen2.5-VL-72B-Instruct via OVHcloud AI Endpoints...<br><br>📝 Extracted text 📝<br>Every month, the OVHcloud Developer Advocate team creates content, shares knowledge, and connects with the tech community. Here’s a look at what we did in March 2026. 🚀<br><br>🎙️ “Tranches de Tech” – Our monthly podcast<br><br>A new episode of our French-language podcast Tranches de Tech🥑 just dropped!<br><br>🎧 Episode 102: Tranches de Tech #26 – Architecte, c’est une bonne situation ça ?<br><br>This month we sat down with Alexandre Touret, Architect at Worldline to discuss the evolving role of software architects and the growing impact of AI on development practices. From Spotify’s claim that their devs no longer code, to agentic tools like OpenClaw and Claude Code reshaping workflows. We also cover ANSSI’s revised open-source policy, IBM tripling junior hires, and the critical responsibility of mentoring the next generation of developers in an AI-driven world.<br><br>📺 Live on Twitch<br><br>We streamed live on Twitch this month! Here’s what we covered:<br><br>🎥 Rémy Vandepoel discussed with Hugo Allabert and François Loiseau about our Public VCFaaS. Catch the replay on YouTube ▶️.<br><br>🎤 Conference Talks<br><br>The team hit the road (and the stage) at several conferences this month:<br><br>🇳🇱 KubeCon Amsterdam – Amsterdam, Netherlands 🇳🇱<br><br>Aurélie Vache gave a talk: The Ultimate Kubernetes Challenge: An Interactive Trivia Game</code></pre>



<h3 class="wp-block-heading">Conclusion</h3>



<p>In this article,&nbsp;we have seen how to use a vision-capable LLM to perform OCR on images using the&nbsp;<a href="https://github.com/openai/openai-python" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OpenAI Python library</a>&nbsp;and OVHcloud&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>.&nbsp;The OpenAI library makes it very easy to send images to a vision model and extract text,&nbsp;and Python allows us to run the whole thing as a simple script.</p>



<p>You have a dedicated Discord channel&nbsp;(#<em>ai-endpoints</em>)&nbsp;on our Discord server&nbsp;(<em><a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://discord.gg/ovhcloud</a></em>),&nbsp;see you there!</p>



<p></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fextract-text-from-images-with-ocr-using-python-and-ovhcloud-ai-endpoints%2F&amp;action_name=Extract%20Text%20from%20Images%20with%20OCR%20using%20Python%20and%20OVHcloud%20AI%20Endpoints&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Secure your Software Supply Chain with OVHcloud Managed Private Registry (MPR)</title>
		<link>https://blog.ovhcloud.com/secure-your-software-supply-chain-with-ovhcloud-managed-private-registry-mpr/</link>
		
		<dc:creator><![CDATA[Aurélie Vache]]></dc:creator>
		<pubDate>Fri, 13 Feb 2026 16:40:51 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[OVHcloud Managed Private Registry]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30357</guid>

					<description><![CDATA[Before an application goes into production, it passes through several stages: source code, build, packaging and distribution. But malicious code &#8211; such as a compromised dependency, a breached CI pipeline, or a modified package in a registry &#8211; can be introduced at any point in the development cycle, potentially impacting thousands of projects. This is precisely where [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsecure-your-software-supply-chain-with-ovhcloud-managed-private-registry-mpr%2F&amp;action_name=Secure%20your%20Software%20Supply%20Chain%20with%20OVHcloud%20Managed%20Private%20Registry%20%28MPR%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1012" height="1011" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911.png" alt="" class="wp-image-30442" style="aspect-ratio:1.0009787401988517;width:437px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911.png 1012w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-768x767.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Gribouillis-2026-01-30-13.25.17.911-70x70.png 70w" sizes="auto, (max-width: 1012px) 100vw, 1012px" /></figure>



<p>Before an application goes into production, it passes through several stages: source code, build, packaging and distribution. But malicious code &#8211; such as a compromised dependency, a breached CI pipeline, or a modified package in a registry &#8211; can be introduced at any point in the development cycle, potentially impacting thousands of projects.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="581" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-1024x581.png" alt="" class="wp-image-30358" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-1024x581.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13-768x436.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-13.png 1292w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This is precisely where <strong>Software Supply Chain Security </strong>(SSCS) comes in: to protect not just the code itself, but also how it’s built, delivered, and utilised.</p>



<p>Attacks like SolarWinds and Log4Shell aren’t isolated incidents, but warning signs of a threat that keeps growing in both frequency and severity.</p>



<figure class="wp-block-image aligncenter is-resized"><img loading="lazy" decoding="async" width="800" height="800" src="https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry.png" alt="" class="wp-image-28658" style="width:145px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry.png 800w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/04/managed_private_registry-70x70.png 70w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>



<p>This blog post explores recommended solutions and best practices for <a href="https://www.ovhcloud.com/en/public-cloud/managed-rancher-service/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>OVHcloud Managed</u></a> <a href="https://www.ovhcloud.com/en/public-cloud/managed-rancher-service/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Private Registry</u></a> (MPR), an OCI-compliant artifact registry, to help you enhance your Software Supply Chain Security.</p>



<h3 class="wp-block-heading">Generate a Software Bill Of Materials (SBOM)</h3>



<p>An SBOM provides a list of all the ingredients (OS, libraries, code) that make up the images that will run on your Kubernetes cluster.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="383" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-1024x383.png" alt="" class="wp-image-30360" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-1024x383.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-300x112.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14-768x287.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-14.png 1256w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>From that list, you can find out more about the image, its vulnerabilities, and licenses.</p>
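<p>Since an SPDX SBOM is plain JSON, it&#8217;s easy to post-process once downloaded from the registry. As a sketch (field names follow the SPDX 2.3 JSON schema; the <code>list_packages</code> helper is ours), listing each package with its version and license could look like this:</p>

```python
import json


def list_packages(sbom_json: str) -> list[tuple[str, str, str]]:
    """Return (name, version, license) for each package of an SPDX JSON SBOM."""
    sbom = json.loads(sbom_json)
    return [
        (
            pkg.get("name", "?"),
            pkg.get("versionInfo", "?"),
            pkg.get("licenseConcluded", "NOASSERTION"),
        )
        for pkg in sbom.get("packages", [])
    ]


# Tiny inline sample mimicking the structure of a Trivy-generated SBOM
sample = json.dumps({
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {"name": "alpine", "versionInfo": "3.19", "licenseConcluded": "MIT"},
    ],
})
print(list_packages(sample))  # [('alpine', '3.19', 'MIT')]
```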



<h4 class="wp-block-heading">Generate an SBOM manually</h4>



<p>To manually generate an SBOM from your image, click the <strong>‘GENERATE SBOM’</strong> button:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="280" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-1024x280.png" alt="" class="wp-image-30361" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-1024x280.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-300x82.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-768x210.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-1536x420.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.28.13-2048x560.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Within seconds, the <em>SBOM </em>column for your image will display <em>“Queued”</em>, then change to <em>“Generating”</em>, and an <em>“SBOM details”</em> link will appear.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="226" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-1024x226.png" alt="" class="wp-image-30393" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-1024x226.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-300x66.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-768x170.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-1536x340.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-31-2048x453.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click the &#8216;<strong>SBOM details&#8217;</strong> link to view the SBOM:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="557" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-1024x557.png" alt="" class="wp-image-30367" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-1024x557.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-300x163.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-768x418.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-1536x835.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.26.38-2048x1114.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Your application’s SBOM is generated by <strong>Trivy </strong>in <strong>SPDX </strong>format. This item is then listed as an accessory for your image in the registry.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="130" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-1024x130.png" alt="" class="wp-image-30371" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-1024x130.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-300x38.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-768x98.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-1536x195.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-17-2048x260.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click the <strong>&#8216;sbom.harbor&#8217;</strong> accessory type for more details:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="629" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-1024x629.png" alt="" class="wp-image-30379" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-1024x629.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-300x184.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-768x472.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-1536x944.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-25-2048x1259.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">Generate an SBOM automatically</h4>



<p>Manually generating an SBOM is a good practice, but automating the process is even better. The private registry can automatically generate the SBOM for you once an image is pushed to the desired project.</p>



<p>Click the project your image is part of, navigate to the <em>‘Configuration’</em> tab, then tick the <strong>SBOM generation </strong>checkbox:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-1024x538.png" alt="" class="wp-image-30365" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-1536x806.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-15-2048x1075.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Vulnerability scanning</h3>



<p>We recommend running vulnerability scans on the images to confirm that:</p>



<ul class="wp-block-list">
<li>the images provided are free of any known vulnerabilities (CVEs);</li>



<li>security patches are well integrated before deployment;</li>



<li>the images used in production comply with security and compliance policies.</li>
</ul>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="406" height="232" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-32.png" alt="" class="wp-image-30395" style="width:329px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-32.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-32-300x171.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure>



<p>There are several vulnerability scanners available, like <a href="https://trivy.dev/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Trivy</u></a>, <a href="https://docs.docker.com/scout/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Docker Scout</u></a>, and <a href="https://github.com/anchore/grype" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Grype</u></a>.</p>



<p>The OVHcloud Managed Private Registry uses Trivy as its default vulnerability scanner, but you can add more scanners if needed. Go to the <em>Administration</em> panel, click <em>‘<strong>Interrogation Services</strong>’</em>, then navigate to the <em>‘<strong>Scanners</strong>’</em> tab:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="437" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-1024x437.png" alt="" class="wp-image-30400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-1024x437.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-300x128.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-768x328.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-1536x655.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-33-2048x873.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">Scan your image manually</h4>



<p>To manually run a vulnerability scan on your image, go to your project and click the <strong>SCAN VULNERABILITIES</strong> button:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="186" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-1024x186.png" alt="" class="wp-image-30406" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-1024x186.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-300x55.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-768x140.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-1536x279.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-35-2048x372.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Within a few seconds, a scan will run and reveal any vulnerabilities detected in your image.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="442" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-1024x442.png" alt="" class="wp-image-30404" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-1024x442.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-300x129.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-768x331.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-1536x662.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.21-2048x883.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Click your image to take a look at the CVEs list:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="557" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-1024x557.png" alt="" class="wp-image-30414" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-1024x557.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-300x163.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-768x418.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-1536x835.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/Capture-decran-2026-01-29-a-14.25.39-1-2048x1114.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">Scan your image automatically</h4>



<p>To automatically scan images on push, click the project your image is part of, then the <em>‘Configuration’ </em>tab, and tick the <strong>‘Vulnerabilities scanning’</strong> checkbox:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="390" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-1024x390.png" alt="" class="wp-image-30408" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-1024x390.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-300x114.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-768x293.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-1536x585.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-36-2048x781.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">Schedule vulnerability scans</h4>



<p>Another way to stay informed is by configuring your vulnerability scanner to run scans every day. Go to the <em>Administration </em>panel, click <em>‘<strong>Interrogation</strong> <strong>Services</strong>’</em>, then the <em>‘<strong>Vulnerability</strong>’</em> tab:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="264" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-1024x264.png" alt="" class="wp-image-30401" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-1024x264.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-300x77.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-768x198.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-1536x396.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-34-2048x528.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can choose to schedule the scan Hourly, Daily, or Weekly, or customize exactly when the scan is triggered.</p>



<p>Scheduled scans ensure that existing images are regularly re-analyzed for newly discovered vulnerabilities (CVEs).</p>



<h4 class="wp-block-heading">Prevent vulnerable images from running</h4>



<p>You can also configure a project to prevent vulnerable images from being pulled. In order to do that, check the <strong>Prevent vulnerable images from running</strong> checkbox.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="206" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-1024x206.png" alt="" class="wp-image-30430" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-1024x206.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-300x60.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40-768x154.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-40.png 1424w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Select the severity threshold, from None to Critical, at which images will be prevented from running.</p>



<p>With this configuration, an image cannot be pulled if it contains vulnerabilities with a severity equal to or higher than the selected level.</p>
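<p>Conceptually, the policy is a simple threshold check on an ordered severity scale. Here is a sketch of that rule (an illustration only, not the registry&#8217;s actual code; Harbor also knows additional levels such as Negligible and Unknown):</p>

```python
# Illustrative severity scale, ordered from least to most severe
SEVERITIES = ["None", "Low", "Medium", "High", "Critical"]


def pull_blocked(image_severity: str, threshold: str) -> bool:
    """True if the image's highest CVE severity reaches the configured threshold."""
    return SEVERITIES.index(image_severity) >= SEVERITIES.index(threshold)


print(pull_blocked("High", "Critical"))  # False: below the threshold, pull allowed
print(pull_blocked("Critical", "High"))  # True: pull denied
```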



<h3 class="wp-block-heading">Exploitable vulnerabilities</h3>



<p>When a scanner finds vulnerabilities in your images, it doesn’t necessarily mean they are exploitable in your application or image.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="170" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-1024x170.png" alt="" class="wp-image-30433" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-1024x170.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-300x50.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-768x128.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41-1536x255.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-41.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>In this example, my application is built with golang:1.25-alpine, but Trivy found several CVEs that are only exploitable in Go 1.19.1 or earlier.</p>



<p>To filter out these &#8220;false positives&#8221;, a solution exists.</p>



<p>VEX (Vulnerability Exploitability eXchange) is a <strong>standard format</strong> for stating whether a vulnerability is <strong>exploitable</strong> or not in a specific context.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="609" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-1024x609.png" alt="" class="wp-image-30435" style="aspect-ratio:1.6814258951355643;width:452px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-1024x609.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-300x178.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-768x456.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43-1536x913.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-43.png 1681w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can generate a VEX file with <a href="https://github.com/openvex/vexctl" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vexctl</a> or <a href="https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">govulncheck</a> tools.</p>



<p>Example:</p>



<pre class="wp-block-code"><code class=""># With vexctl<br>$ VULN_ID="CVE-2022-27664"<br>$ PRODUCT="pkg:golang/golang.org/x/net@v0.0.0-20220127200216-cd36cc0744dd"<br>$ vexctl create --file vex.json --author 'Aurélie Vache' --product "pkg:oci/demo@sha256:$HASH?repository_url=$REGISTRY/$HARBOR_PROJECT/demo" --vuln "$VULN_ID" --status 'not_affected' --justification 'vulnerable_code_not_present' --impact-statement "HTTP/2 vulnerability $VULN_ID is not exploitable because the image is compiled with Go 1.20, which contains the patched library."<br><br># With govulncheck (for Go apps)<br>$ govulncheck -format openvex ./... &gt; ../demo.vex.json</code></pre>



<p>For the moment, OVHcloud MPR (managed Harbor) does not support VEX files (and the OpenVEX format) <a href="https://github.com/goharbor/harbor/issues/22720" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">but it is planned in the future</a>.</p>



<p>💡 The good news is that you can configure a CVE allowlist containing the non-exploitable CVEs, so they are ignored during vulnerability scanning:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="522" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-1024x522.png" alt="" class="wp-image-30434" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-1024x522.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-300x153.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-768x391.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42-1536x782.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-42.png 1814w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can optionally uncheck the <strong>Never expires</strong> checkbox and use the calendar selector to set an expiry date for the allowlist.</p>
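<p>Conceptually, the allowlist acts as a filter over the scan results, with an optional expiry date after which the ignored CVEs are reported again. A sketch of that behaviour (illustration only, not Harbor&#8217;s actual code):</p>

```python
from datetime import date


def effective_cves(found, allowlist, expires_on=None, today=None):
    """Drop allowlisted CVE IDs from scan results unless the allowlist expired."""
    today = today or date.today()
    if expires_on is not None and today > expires_on:
        return list(found)  # expired allowlist: report everything again
    return [cve for cve in found if cve not in set(allowlist)]


found = ["CVE-2022-27664", "CVE-2024-0001"]
print(effective_cves(found, ["CVE-2022-27664"]))
# ['CVE-2024-0001']
print(effective_cves(found, ["CVE-2022-27664"],
                     expires_on=date(2026, 1, 1), today=date(2026, 2, 1)))
# ['CVE-2022-27664', 'CVE-2024-0001']
```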



<h3 class="wp-block-heading">Sign your images</h3>



<p>It’s recommended to sign your images to ensure they haven’t been modified and originate from your pipeline (CI/CD).</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="278" height="282" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-38.png" alt="" class="wp-image-30412" style="width:128px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-38.png 278w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-38-70x70.png 70w" sizes="auto, (max-width: 278px) 100vw, 278px" /></figure>



<p>Signing your images is crucial for protecting them against compromised registries and unauthorised image replacements.</p>



<p><strong>Without a signature, there’s no guarantee the deployed image is the one you originally built!</strong></p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="818" height="302" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37.png" alt="" class="wp-image-30410" style="aspect-ratio:2.708559106290115;width:482px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37.png 818w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37-300x111.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-37-768x284.png 768w" sizes="auto, (max-width: 818px) 100vw, 818px" /></figure>



<p>You can sign your images with <a href="https://github.com/sigstore/cosign" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Sigstore Cosign</u></a> or <a href="https://github.com/notaryproject/notation" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>Notation</u></a> tools:</p>



<pre class="wp-block-code"><code class="">$ export HARBOR_PROJECT=supply-chain<br>$ export IMAGE=xxxxxx.c1.de1.container-registry.ovh.net/$HARBOR_PROJECT/demo<br>$ export HASH=$(skopeo inspect docker://${IMAGE}:latest | jq -r .Digest | sed "s/^sha256://")<br><br># Sign with Cosign<br>## Generate a private and a public key<br>$ cosign generate-key-pair<br>## Sign the image with the OCI 1.1 Referrers API<br>$ cosign sign -y --key cosign.key $IMAGE@sha256:$HASH<br><br># Sign with Notation<br>## Generate an RSA key &amp; a self-signed X.509 test certificate<br>$ notation cert generate-test --default "test"<br><br>## Sign the image with the OCI 1.1 Referrers API<br>$ export NOTATION_EXPERIMENTAL=1 ; notation sign -d --allow-referrers-api ${IMAGE}@sha256:${HASH}</code></pre>



<p>OVHcloud MPR supports both Cosign and Notation signatures, so you can use whichever tool fits your pipeline.</p>



<p>Your signature will appear beside your image as an accessory, along with a green checkmark ✅ in the signature column:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="227" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-1024x227.png" alt="" class="wp-image-30382" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-1024x227.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-300x67.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-768x170.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-1536x341.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-26-2048x455.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⚠️ Keep in mind that MPR (Harbor) doesn’t support signatures generated by Cosign v3: the signature will upload and appear as an accessory, but the mark will stay red instead of turning green. This bug should <a href="https://github.com/goharbor/harbor/issues/22401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><u>be fixed in Harbor 2.15</u></a> 💪.</p>



<p>It is also recommended to sign your OCI artifacts (such as SBOMs) and link them to your images, which you can do using Cosign:</p>



<pre class="wp-block-code"><code class="">$ cosign attest -y --predicate sbom.spdx.json --key cosign.key $IMAGE@sha256:$HASH</code></pre>



<p>They will be uploaded to the OVHcloud private registry and listed as accessories.</p>
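<p>You can verify such an attestation locally with Cosign as well. This is a minimal sketch, assuming the <code>cosign.pub</code> key generated earlier and the SPDX JSON SBOM predicate attested above:</p>

<pre class="wp-block-code"><code class=""># Verify the SBOM attestation attached to the image<br>$ cosign verify-attestation --key cosign.pub --type spdxjson ${IMAGE}@sha256:${HASH}</code></pre>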



<h4 class="wp-block-heading">Ensure only verified images are pushed to your registry’s projects</h4>



<p>To allow only verified/signed images to be deployed on a project, click the project your image is part of, navigate to the <em>‘<strong>Configuration</strong>’</em> tab, and tick the <strong>Cosign</strong> and/or <strong>Notation </strong>checkbox:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="191" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-1024x191.png" alt="" class="wp-image-30418" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-1024x191.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-300x56.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39-768x143.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-39.png 1406w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>When checked, the registry will only allow verified images to be pulled from the project. Verified images are determined by <strong>Cosign</strong> or <strong>Notation</strong>, depending on the policy you have checked. Note that if you have both Cosign and Notation policies enforced, then images will need to be signed by both Cosign and Notation to be pulled.</p>
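<p>You can check the effect of this policy from a Docker client. The snippet below is illustrative, assuming a project with the Cosign policy enabled and a hypothetical unsigned tag named <code>unsigned</code>:</p>

<pre class="wp-block-code"><code class=""># Pulling an unsigned image is rejected by the registry<br>$ docker pull ${IMAGE}:unsigned<br># the pull fails with a signature policy error<br><br># A signed image pulls as usual<br>$ docker pull ${IMAGE}:latest</code></pre>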



<h3 class="wp-block-heading">Tag immutability</h3>



<p>By default, tags are mutable: you can push an image named demo with the tag 1.0.0, modify the code, then push again to the same tag.</p>



<p>This can be useful for fixing a bug, but in terms of security, a mutable tag does not guarantee that the image you built and pushed as version 1.0.0 is the same image that currently exists in the registry.</p>



<p>Moreover, on Harbor (so on OVHcloud MPR), due to limitations in the upstream OCI Distribution specification, the registry does not enforce a strict link between a tag and an image digest.</p>



<p>As a result, a tag can be reassigned to a different artifact. This has a side effect on the registry: the tag migrates across artifacts, and every artifact that has its tag taken away becomes tagless.</p>



<p>To prevent this situation, you can configure tag immutability rules. Tag immutability guarantees that an immutable tagged artifact cannot be deleted, and also cannot be altered in any way such as through re-pushing, re-tagging, or replication from another target registry.</p>



<p>To do that, click on your project, open the <strong>Policy</strong> tab and select <strong>TAG IMMUTABILITY</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="469" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-1024x469.png" alt="" class="wp-image-30438" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-1024x469.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-300x137.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-768x352.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44-1536x704.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-44.png 2030w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>And then click the <strong>ADD RULE</strong> button.</p>



<p>Fill in the repositories and tags lists according to your needs.</p>



<p>Example:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="522" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-1024x522.png" alt="" class="wp-image-30439" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-1024x522.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-300x153.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-768x392.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-1536x783.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-45-2048x1044.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⚠️ You can add a maximum of 15 immutability rules per project.</p>
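<p>Once a rule is in place, the registry enforces it on push. The example below is a sketch, assuming an immutability rule covering the <code>demo</code> repository and the <code>1.0.0</code> tag, and a hypothetical local image <code>my-new-build</code>:</p>

<pre class="wp-block-code"><code class=""># Try to overwrite an immutable tag<br>$ docker tag my-new-build ${IMAGE}:1.0.0<br>$ docker push ${IMAGE}:1.0.0<br># the push is denied: the tag 1.0.0 is protected by the immutability rule</code></pre>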



<h3 class="wp-block-heading">To wrap things up</h3>



<p>Software supply chain security is more important than ever, and the field is evolving quickly: concepts, standards and tools alike. Leveraging tools like OVHcloud MPR, and knowing how to set them up, can significantly strengthen your Software Supply Chain Security efforts.</p>



<p>To learn more about how to use and configure <a href="https://help.ovhcloud.com/csm/fr-documentation-public-cloud-containers-orchestration-managed-private-registry?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=7939e6a464282d10476b3689cb0d0ed7&amp;spa=1" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud private registries</a>, don’t hesitate to follow our guides.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsecure-your-software-supply-chain-with-ovhcloud-managed-private-registry-mpr%2F&amp;action_name=Secure%20your%20Software%20Supply%20Chain%20with%20OVHcloud%20Managed%20Private%20Registry%20%28MPR%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: Custom metric autoscaling for LLM inference with vLLM on OVHcloud AI Deploy and observability using MKS</title>
		<link>https://blog.ovhcloud.com/reference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 10 Feb 2026 08:51:11 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[MKS]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30203</guid>

					<description><![CDATA[Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability. This reference architecture describes a comprehensive solution for deploying, autoscaling and monitoring vLLM-based LLM workloads on OVHcloud infrastructure. It combinesAI Deploy, used for model serving with custom metric autoscaling, and Managed Kubernetes Service (MKS), which [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks%2F&amp;action_name=Reference%20Architecture%3A%20Custom%20metric%20autoscaling%20for%20LLM%20inference%20with%20vLLM%20on%20OVHcloud%20AI%20Deploy%20and%20observability%20using%20MKS&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em><strong>Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability.</strong></em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg" alt="" class="wp-image-30579" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3.jpg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability based on OVHcloud infrastructure</em></figcaption></figure>



<p>This reference architecture describes a comprehensive solution for <strong>deploying, autoscaling and monitoring vLLM-based LLM workloads</strong> on OVHcloud infrastructure. It combines <strong>AI Deploy</strong>, used for <strong>model serving with custom metric autoscaling</strong>, and <strong>Managed Kubernetes Service (MKS)</strong>, which hosts the monitoring and observability stack.</p>



<p>By leveraging <strong>application-level Prometheus metrics exposed by vLLM</strong>, AI Deploy can automatically scale inference replicas based on real workload demand, ensuring <strong>high availability, consistent performance under load and efficient GPU utilisation</strong>. This autoscaling mechanism allows the platform to react dynamically to traffic spikes while maintaining predictable latency for end users.</p>



<p>On top of this scalable inference layer, the monitoring architecture provides <strong>observability</strong> through <strong>Prometheus</strong>, <strong>Grafana</strong> and Alertmanager. It enables real-time performance monitoring, capacity planning, and operational insights, while ensuring <strong>full data sovereignty</strong> for organisations running Large Language Models (LLMs) in production environments.</p>



<p><strong>What are the key benefits</strong>?</p>



<ul class="wp-block-list">
<li><strong>Cost-effective</strong>: Leverage managed services to minimise operational overhead</li>



<li><strong>Real-time observability</strong>: Track Time-to-First-Token (TTFT), throughput, and resource utilisation</li>



<li><strong>Sovereign infrastructure</strong>: All metrics and data remain within European datacentres</li>



<li><strong>Production-ready</strong>: Persistent storage, high availability, and automated monitoring</li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>OVHcloud AI Deploy is a<strong>&nbsp;Container as a Service</strong>&nbsp;(CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications/APIs based on Machine Learning (ML), Deep Learning (DL) or Large Language Models (LLMs).</p>



<p><strong>Key points to keep in mind</strong>:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong>&nbsp;Bring your own custom Docker image and deploy it with a single command line or a few clicks</li>



<li><strong>High-performance computing:</strong>&nbsp;A complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong>&nbsp;Supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong>&nbsp;Billing per minute, no surcharges</li>
</ul>



<h3 class="wp-block-heading">Managed Kubernetes Service</h3>



<p><strong>OVHcloud MKS</strong> is a fully managed Kubernetes platform designed to help you deploy, operate, and scale containerised applications in production. It provides a secure and reliable Kubernetes environment without the operational overhead of managing the control plane.</p>



<p><strong>What should you keep in mind?</strong></p>



<ul class="wp-block-list">
<li><strong>Cost-efficient</strong>: Only pay for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane</li>



<li><strong>Fully managed Kubernetes</strong>: Certified upstream Kubernetes with automated control plane management, upgrades and high availability</li>



<li><strong>Production-ready by design</strong>: Built-in integrations with OVHcloud Load Balancers, networking and persistent storage</li>



<li><strong>Scalability and flexibility</strong>: Easily scale workloads and node pools to match application demand</li>



<li><strong>Open and portable</strong>: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in</li>
</ul>



<p>In the following guide, all services are deployed within the&nbsp;<strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Overview of the architecture</h2>



<p>This reference architecture describes a <strong>complete, secure and scalable solution</strong> to:</p>



<ul class="wp-block-list">
<li>Deploy an LLM with vLLM and <strong>AI Deploy</strong>, benefiting from automatic scaling based on custom metrics to ensure high service availability &#8211; vLLM exposes <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>/metrics</strong></mark></code> via its public HTTPS endpoint on AI Deploy</li>



<li>Collect, store and visualise these vLLM metrics using Prometheus and Grafana on <strong>MKS</strong></li>
</ul>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="1200" height="630" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg" alt="" class="wp-image-30578" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-768x403.jpg 768w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability architecture overview</em></figcaption></figure>



<p>The solution comprises three main layers:</p>



<ol class="wp-block-list">
<li><strong>Model serving layer</strong> with AI Deploy
<ul class="wp-block-list">
<li>vLLM containers running on top of GPUs for LLM inference</li>



<li>vLLM inference server exposing Prometheus metrics</li>



<li>Automatic scaling based on custom metrics to ensure high availability</li>



<li>HTTPS endpoints with Bearer token authentication</li>
</ul>
</li>



<li><strong>Monitoring and observability infrastructure</strong> using Kubernetes
<ul class="wp-block-list">
<li>Prometheus for metrics collection and storage</li>



<li>Grafana for visualisation and dashboards</li>



<li>Persistent volume storage for long-term retention</li>
</ul>
</li>



<li><strong>Network layer</strong>
<ul class="wp-block-list">
<li>Secure HTTPS communication between components</li>



<li>OVHcloud LoadBalancer for external access</li>
</ul>
</li>
</ol>



<p>Before going any further, a few prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">Administrator</mark></code></strong></a> role</li>



<li><strong>ovhai CLI available</strong> &#8211;&nbsp;<em>install the&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a></em></li>



<li>A <strong>Hugging Face access</strong> &#8211; <em>create a&nbsp;<a href="https://huggingface.co/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face account</a>&nbsp;and generate an&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">access token</a></em></li>



<li><code><strong><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></strong></code> installed and <code><strong><mark class="has-inline-color has-ast-global-color-0-color">helm</mark></strong></code> installed (at least version 3.x)</li>
</ul>



<p><strong>🚀 Now you have all the ingredients for our recipe, it’s time to deploy Ministral 3 14B using AI Deploy and the vLLM Docker container!</strong></p>



<h2 class="wp-block-heading">Architecture guide: From autoscaling to observability for LLMs served by vLLM</h2>



<p>Let’s set up and deploy this architecture!</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg" alt="" class="wp-image-30580" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2.jpg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Overview of the deployment workflow</em></figcaption></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>In this example, <a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a> is used. Choose the open-source model of your choice and follow the same steps, adapting the model slug (from Hugging Face), the versions and the GPU(s) flavour.</em></strong></p>
</blockquote>



<p><em>Remember that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Manage access tokens</h3>



<p>This architecture starts with the <strong>deployment of Ministral 3 14B on OVHcloud AI Deploy</strong>, configured to <strong>autoscale based on custom Prometheus metrics exposed by vLLM itself</strong>. The first step is to set up the access tokens it relies on.</p>



<p>Export your&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Create a Bearer token</a>&nbsp;to access your AI Deploy app once it&#8217;s been deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>Returning the following output:</p>



<pre class="wp-block-code"><code class="">Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-01-26 11:53:05<br>Updated At: 20-01-26 11:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</code></pre>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h3 class="wp-block-heading">Step 2 &#8211; LLM deployment using AI Deploy</h3>



<p>With the tokens in place, you can now deploy <strong>Ministral 3 14B</strong> on OVHcloud AI Deploy, configured to <strong>autoscale based on custom Prometheus metrics exposed by vLLM itself</strong>.</p>



<h4 class="wp-block-heading">1. Define the targeted vLLM metric for autoscaling</h4>



<p>Before proceeding with the deployment of the <strong>Ministral 3 14B</strong> endpoint, you have to choose the metric you want to use as the trigger for scaling.</p>



<p>Instead of relying solely on CPU/RAM utilisation, AI Deploy allows autoscaling decisions to be driven by <strong>application-level signals</strong>.</p>



<p>To do this, you can consult the <a href="https://docs.vllm.ai/en/latest/design/metrics/#v1-metrics" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">metrics exposed by vLLM</a>.</p>



<p>In this example, you can use a basic metric such as <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>vllm:num_requests_running</strong></mark></code> to scale the number of replicas based on <strong>real inference load</strong>.</p>



<p>This enables:</p>



<ul class="wp-block-list">
<li>Faster reaction to traffic spikes</li>



<li>Better GPU utilisation</li>



<li>Reduced inference latency under load</li>



<li>Cost-efficient scaling</li>
</ul>



<p>Finally, the configuration chosen for scaling this application is as follows:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Parameter</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td>Metric source</td><td><code>/metrics</code></td><td>vLLM Prometheus endpoint</td></tr><tr><td>Metric name</td><td><code>vllm:num_requests_running</code></td><td>Number of in-flight requests</td></tr><tr><td>Aggregation</td><td><code>AVERAGE</code></td><td>Mean across replicas</td></tr><tr><td>Target value</td><td><code>50</code></td><td>Desired load per replica</td></tr><tr><td>Min replicas</td><td><code>1</code></td><td>Baseline capacity</td></tr><tr><td>Max replicas</td><td><code>3</code></td><td>Burst capacity</td></tr></tbody></table></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><em><strong>You can choose the metric that best suits your use case. You can also apply a patch to your AI Deploy deployment at any time to change the target metric for scaling</strong></em>.</p>
</blockquote>



<p>When the <strong>average number of running requests exceeds 50</strong>, AI Deploy automatically provisions <strong>additional GPU-backed replicas</strong>.</p>



<h4 class="wp-block-heading">2. Deploy Ministral 3 14B using AI Deploy</h4>



<p>Now you can deploy the LLM using the <strong><code>ovhai</code> CLI</strong>.</p>



<p>Key elements necessary for proper functioning:</p>



<ul class="wp-block-list">
<li>GPU-based inference: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">1 x H100</mark></code></strong></li>



<li>vLLM OpenAI-compatible Docker image: <a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.13.0</mark></code></strong></a></li>



<li>Custom autoscaling rules based on Prometheus metrics: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm:num_requests_running</mark></strong></code></li>
</ul>



<p>Below is the reference command used to deploy the <strong><a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a></strong>:</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>  --name vllm-ministral-14B-autoscaling-custom-metric \<br>  --default-http-port 8000 \<br>  --label ai_deploy_token=my_operator_token \<br>  --gpu 1 \<br>  --flavor h100-1-gpu \<br>  -e OUTLINES_CACHE_DIR=/tmp/.outlines \<br>  -e HF_TOKEN=$MY_HF_TOKEN \<br>  -e HF_HOME=/hub \<br>  -e HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>  -e HF_HUB_ENABLE_HF_TRANSFER=0 \<br>  -v standalone:/hub:rw \<br>  -v standalone:/workspace:rw \<br>  --liveness-probe-path /health \<br>  --liveness-probe-port 8000 \<br>  --liveness-initial-delay-seconds 300 \<br>  --probe-path /v1/models \<br>  --probe-port 8000 \<br>  --initial-delay-seconds 300 \<br>  --auto-min-replicas 1 \<br>  --auto-max-replicas 3 \<br>  --auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics" \<br>  --auto-custom-metric-format PROMETHEUS \<br>  --auto-custom-value-location vllm:num_requests_running \<br>  --auto-custom-target-value 50 \<br>  --auto-custom-metric-aggregation-type AVERAGE \<br>  vllm/vllm-openai:v0.13.0 \<br>  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \<br>    --model mistralai/Ministral-3-14B-Instruct-2512 \<br>    --tokenizer_mode mistral \<br>    --load_format mistral \<br>    --config_format mistral \<br>    --enable-auto-tool-choice \<br>    --tool-call-parser mistral \<br>    --enable-prefix-caching"</code></pre>



<p>Let’s break down the different parameters of this command.</p>



<h5 class="wp-block-heading"><strong>a. Start your AI Deploy app</strong></h5>



<p>Launch a new app using&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a>&nbsp;and name it.</p>



<p><code><strong>ovhai app run --name vllm-ministral-14B-autoscaling-custom-metric</strong></code></p>



<h5 class="wp-block-heading"><strong>b. Define access</strong></h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading"><strong>c. Configure GPU resources</strong></h5>



<p>Specify the hardware type (<code><strong>h100-1-gpu</strong></code>), which refers to an&nbsp;<strong>NVIDIA H100 GPU</strong>&nbsp;and the number (<strong>1</strong>).</p>



<p><code><strong>--gpu 1<br>--flavor h100-1-gpu</strong></code></p>



<p><strong><mark>⚠️WARNING!</mark></strong>&nbsp;For this model, one H100 is sufficient, but if you want to deploy another model, you will need to check which GPU you need. Note that you can also access L40S and A100 GPUs for your LLM deployment.</p>



<h5 class="wp-block-heading"><strong>d. Set up environment variables</strong></h5>



<p>Configure caching for the&nbsp;<strong>Outlines library</strong>&nbsp;(used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the&nbsp;<strong>Hugging Face token</strong>&nbsp;(<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the&nbsp;<strong>Hugging Face cache directory</strong>&nbsp;to&nbsp;<code>/hub</code>&nbsp;(where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of&nbsp;<strong>custom remote code</strong>&nbsp;from Hugging Face datasets (required for some model behaviours):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable&nbsp;<strong>Hugging Face Hub transfer acceleration</strong>&nbsp;(to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading"><strong>e. Mount persistent volumes</strong></h5>



<p>Mount&nbsp;<strong>two persistent storage volumes</strong>:</p>



<ol class="wp-block-list">
<li><code>/hub</code>&nbsp;→ Stores Hugging Face model files</li>



<li><code>/workspace</code>&nbsp;→ Main working directory</li>
</ol>



<p>The&nbsp;<code>rw</code>&nbsp;flag means&nbsp;<strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading"><strong>f. Health checks and readiness</strong></h5>



<p>Configure <strong>liveness and readiness probes</strong>:</p>



<ol class="wp-block-list">
<li><code>/health</code> verifies the container is alive</li>



<li><code>/v1/models</code> confirms the model is loaded and ready to serve requests</li>
</ol>



<p>The long initial delays (300 seconds) correspond to the startup time of vLLM and the loading of the model onto the GPU; they can be reduced if your model loads faster.</p>



<p><code><strong>--liveness-probe-path /health<br>--liveness-probe-port 8000<br>--liveness-initial-delay-seconds 300<br><br>--probe-path /v1/models<br>--probe-port 8000<br>--initial-delay-seconds 300</strong></code></p>



<h5 class="wp-block-heading"><strong>g. Autoscaling configuration (custom metrics)</strong></h5>



<p>First set the minimum and maximum number of replicas.</p>



<p><strong><code>--auto-min-replicas 1<br>--auto-max-replicas 3</code></strong></p>



<p>This guarantees basic availability (one replica always up) while allowing for peak capacity.</p>



<p>Then enable autoscaling based on application-level metrics exposed by vLLM.</p>



<p><strong><code>--auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics"<br>--auto-custom-metric-format PROMETHEUS<br>--auto-custom-value-location vllm:num_requests_running<br>--auto-custom-target-value 50<br>--auto-custom-metric-aggregation-type AVERAGE</code></strong></p>



<p>AI Deploy:</p>



<ul class="wp-block-list">
<li>Scrapes the local <mark class="has-inline-color has-ast-global-color-0-color"><strong><code>/metrics</code></strong></mark> endpoint</li>



<li>Parses Prometheus-formatted metrics</li>



<li>Extracts the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>vllm:num_requests_running</code></mark></strong> gauge</li>



<li>Computes the average value across replicas</li>
</ul>



<p>Scaling behaviour:</p>



<ul class="wp-block-list">
<li>When the average number of in-flight requests exceeds <strong><code><mark class="has-inline-color has-ast-global-color-0-color">50</mark></code></strong>, AI Deploy adds replicas</li>



<li>When load decreases, replicas are scaled down</li>
</ul>



<p>This approach ensures high availability and predictable latency under fluctuating traffic.</p>
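<p>Once your app is running, you can inspect the raw value of the scaling metric yourself by querying the same endpoint AI Deploy scrapes. A minimal sketch, assuming the app URL and the Bearer token exported earlier:</p>

<pre class="wp-block-code"><code class=""># Fetch the vLLM metrics and filter the autoscaling signal<br>$ curl -s https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/metrics \<br>  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  | grep "^vllm:num_requests_running"</code></pre>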



<h5 class="wp-block-heading"><strong>h. Choose the target Docker image and the startup command</strong></h5>



<p>Use the official <strong><a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM OpenAI-compatible Docker image</a></strong>.</p>



<p><strong><code>vllm/vllm-openai:v0.13.0</code></strong></p>



<p>Finally, run the model inside the container using a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong>&nbsp;→ Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Ministral-3-14B-Instruct-2512</code></strong>&nbsp;→ Loads the&nbsp;<strong>Ministral 3 14B</strong>&nbsp;model from Hugging Face</li>



<li><strong><code>--tokenizer_mode mistral</code></strong>&nbsp;→ Uses the&nbsp;<strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong>&nbsp;→ Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong>&nbsp;→ Ensures the model configuration follows Mistral’s standard</li>



<li><code><strong>--enable-auto-tool-choice </strong></code>→ Enables automatic tool invocation when needed (function/tool calling)</li>



<li><strong><code>--tool-call-parser mistral </code></strong>→ Parses tool calls emitted in Mistral&#8217;s format</li>



<li><strong><code>--enable-prefix-caching</code></strong> → Prefix caching for improved throughput and reduced latency</li>
</ul>
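<p>Put together, the flags above form the following startup command:</p>

```
python3 -m vllm.entrypoints.openai.api_server \
  --model mistralai/Ministral-3-14B-Instruct-2512 \
  --tokenizer_mode mistral \
  --load_format mistral \
  --config_format mistral \
  --enable-auto-tool-choice \
  --tool-call-parser mistral \
  --enable-prefix-caching
```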



<p>You can now launch this command using <strong>ovhai CLI</strong>.</p>



<h4 class="wp-block-heading">3. Check AI Deploy app status</h4>



<p>You can now check if your&nbsp;<strong>AI Deploy</strong>&nbsp;app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p><strong>Is your app in&nbsp;<code>RUNNING</code>&nbsp;status?</strong>&nbsp;Perfect! You can check in the logs that the server has started:</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark>⚠️WARNING!</mark></strong>&nbsp;This step may take a little time as the LLM must be loaded.</p>



<h4 class="wp-block-heading">4. Test that the deployment is functional</h4>



<p>First, send a prompt to the LLM. Launch the following query with a question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \<br>  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  -H "Content-Type: application/json" \<br>  -d '{<br>    "model": "mistralai/Ministral-3-14B-Instruct-2512",<br>    "messages": [<br>      {"role": "system", "content": "You are a helpful assistant."},<br>      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}<br>    ],<br>    "stream": false<br>  }'</code></pre>



<p>You can also verify access to vLLM metrics.</p>



<pre class="wp-block-code"><code class="">curl -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/metrics</code></pre>



<p>If both tests succeed and you receive HTTP 200 responses, the model deployment is functional and you are ready to move on to the next step!</p>



<p>The next step is to set up the observability and monitoring stack. Note that the autoscaling mechanism is <strong>fully independent</strong> of the Prometheus instance used for observability:</p>



<ul class="wp-block-list">
<li>AI Deploy queries the local <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code></mark></strong> endpoint internally</li>



<li>Prometheus scrapes the <strong>same metrics endpoint</strong> externally for monitoring, dashboards and potentially alerting</li>
</ul>



<p>This ensures:</p>



<ul class="wp-block-list">
<li>A single source of truth for metrics</li>



<li>No duplication of exporters</li>



<li>Consistent signals for scaling and observability</li>
</ul>



<h3 class="wp-block-heading">Step 3 &#8211; Create an MKS cluster</h3>



<p>From <a href="https://manager.eu.ovhcloud.com/#/hub/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Control Panel</a>, create a Kubernetes cluster using the <strong>Managed Kubernetes Service (MKS)</strong>.</p>



<p>Consider using the following configuration for the current use case:</p>



<ul class="wp-block-list">
<li><strong>Location</strong>: GRA (Gravelines) &#8211; <em>you can select the same region as for AI Deploy</em></li>



<li><strong>Network</strong>: Public</li>



<li><strong>Node pool</strong>:
<ul class="wp-block-list">
<li>Flavour: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">b2-15</mark></strong></code> (or something similar)</li>



<li>Number of nodes: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">3</mark></code></strong></li>



<li>Autoscaling: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">OFF</mark></code></strong></li>
</ul>
</li>



<li><strong>Name your node pool:</strong> <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>monitoring</code></mark></strong></li>
</ul>



<p>You should see your cluster (e.g. <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>prometheus-vllm-metrics-ai-deploy</strong></mark></code>) in the list, along with the following information:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="632" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png" alt="" class="wp-image-30242" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-300x185.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-768x474.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1536x948.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-2048x1264.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the status is green with the <strong><mark style="color:#00d084" class="has-inline-color"><code>OK</code></mark></strong> label, you can proceed to the next step.</p>



<h3 class="wp-block-heading">Step 4 &#8211; Configure Kubernetes access</h3>



<p>Download your <strong>kubeconfig file</strong> from the OVHcloud Control Panel and configure <strong><code><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></code></strong>:</p>



<pre class="wp-block-code"><code class=""># configure kubectl with your MKS cluster<br>export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml<br><br># verify cluster connectivity<br>kubectl cluster-info<br>kubectl get nodes</code></pre>



<p>Now, you can create the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>values-prometheus.yaml</code></mark></strong> file:</p>



<pre class="wp-block-code"><code class=""># general configuration<br>nameOverride: "monitoring"<br>fullnameOverride: "monitoring"<br><br># Prometheus configuration<br>prometheus:<br>  prometheusSpec:<br>    # data retention (15d)<br>    retention: 15d<br>    <br>    # scrape interval (15s)<br>    scrapeInterval: 15s<br>    <br>    # persistent storage (required for production deployment)<br>    storageSpec:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed  # OVHcloud storage<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 50Gi  # (can be modified according to your needs)<br>    <br>    # scrape vLLM metrics from your AI Deploy instance (Ministral 3 14B)<br>    additionalScrapeConfigs:<br>      - job_name: 'vllm-ministral'<br>        scheme: https<br>        metrics_path: '/metrics'<br>        scrape_interval: 15s<br>        scrape_timeout: 10s<br>        <br>        # authentication using AI Deploy Bearer token stored Kubernetes Secret<br>        bearer_token_file: /etc/prometheus/secrets/vllm-auth-token/token<br>        static_configs:<br>          - targets:<br>              - '&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net'  # /!\ REPLACE THE &lt;APP_ID&gt; by yours /!\<br>            labels:<br>              service: 'vllm'<br>              model: 'ministral'<br>              environment: 'production'<br>        <br>        # TLS configuration<br>        tls_config:<br>          insecure_skip_verify: false<br>    <br>    # kube-prometheus-stack mounts the secret under /etc/prometheus/secrets/ and makes it accessible to Prometheus<br>    secrets:<br>      - vllm-auth-token<br><br># Grafana configuration (visualization layer)<br>grafana:<br>  enabled: true<br>  <br>  # disable automatic datasource provisioning<br>  sidecar:<br>    datasources:<br>      enabled: false<br>  <br>  # persistent dashboards<br>  persistence:<br>    enabled: true<br>    
    storageClassName: csi-cinder-high-speed<br>    size: 10Gi<br>  <br>  # /!\ DEFINE ADMIN PASSWORD - REPLACE "test" BY YOURS /!\<br>  adminPassword: "test"<br>  <br>  # access via OVHcloud LoadBalancer (public IP and managed LB)<br>  service:<br>    type: LoadBalancer<br>    port: 80<br>    annotations:<br>      # optional: restrict access to specific IPs<br>      # service.beta.kubernetes.io/ovh-loadbalancer-allowed-sources: "1.2.3.4/32"<br>  <br># alertmanager (optional but recommended for production)<br>alertmanager:<br>  enabled: true<br>  <br>  alertmanagerSpec:<br>    storage:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 10Gi<br><br># cluster observability components<br>nodeExporter:<br>  enabled: true<br>  <br>kubeStateMetrics:<br>  enabled: true</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>On OVHcloud MKS, persistent storage is handled automatically through the Cinder CSI driver. When a PersistentVolumeClaim (PVC) references a supported <code>storageClassName</code> such as <code>csi-cinder-high-speed</code>, OVHcloud dynamically provisions the underlying Block Storage volume and attaches it to the node running the pod. This enables stateful components like Prometheus, Alertmanager and Grafana to persist data reliably without any manual volume management, making the architecture fully cloud-native and operationally simple.</em></strong></p>
</blockquote>



<p>Then create the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">monitoring</mark></code></strong> namespace:</p>



<pre class="wp-block-code"><code class=""># create namespace<br>kubectl create namespace monitoring<br><br># verify creation<br>kubectl get namespaces | grep monitoring</code></pre>



<p>Finally, configure the Bearer token secret used to access vLLM metrics.</p>



<pre class="wp-block-code"><code class=""># create bearer token secret<br>kubectl create secret generic vllm-auth-token \<br>  --from-literal=token="$MY_OVHAI_ACCESS_TOKEN" \<br>  -n monitoring<br><br># verify secret creation<br>kubectl get secret vllm-auth-token -n monitoring<br><br># test token (optional)<br>kubectl get secret vllm-auth-token -n monitoring \<br>  -o jsonpath='{.data.token}' | base64 -d </code></pre>



<p>Right, if everything is working, let&#8217;s move on to deployment.</p>



<h3 class="wp-block-heading">Step 5 &#8211; Deploy Prometheus stack</h3>



<p>Add the Prometheus Helm repository and install the monitoring stack. The deployment creates:</p>



<ul class="wp-block-list">
<li>Prometheus StatefulSet with persistent storage</li>



<li>Grafana deployment with LoadBalancer access</li>



<li>Alertmanager for future alert configuration (optional)</li>



<li>Supporting components (node exporters, kube-state-metrics)</li>
</ul>



<pre class="wp-block-code"><code class=""># add Helm repository<br>helm repo add prometheus-community \<br>  https://prometheus-community.github.io/helm-charts<br>helm repo update<br><br># install monitoring stack<br>helm install monitoring prometheus-community/kube-prometheus-stack \<br>  --namespace monitoring \<br>  --values values-prometheus.yaml \<br>  --wait</code></pre>



<p>Then you can retrieve the LoadBalancer IP address to access Grafana:</p>



<pre class="wp-block-code"><code class="">kubectl get svc -n monitoring monitoring-grafana</code></pre>



<p>Finally, open your browser to <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://&lt;EXTERNAL-IP&gt;</mark></strong></code> and login with:</p>



<ul class="wp-block-list">
<li><strong>Username</strong>: <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>admin</strong></mark></code></li>



<li><strong>Password</strong>: as configured in your <code><strong><mark class="has-inline-color has-ast-global-color-0-color">values-prometheus.yaml</mark></strong></code> file</li>
</ul>



<h3 class="wp-block-heading">Step 6 &#8211; Create Grafana dashboards</h3>



<p>In this step, you will access the Grafana interface, add your Prometheus as a new data source, then create a complete dashboard with the various vLLM metrics.</p>



<h4 class="wp-block-heading">1. Add a new data source in Grafana</h4>



<p>First of all, create a new Prometheus connection inside Grafana:</p>



<ul class="wp-block-list">
<li>Navigate to <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Connections</code></mark></strong> → <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Data sources</code></mark></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Add data source</mark></code></strong></li>



<li>Select <strong>Prometheus</strong></li>



<li>Configure URL: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://monitoring-prometheus:9090</mark></strong></code></li>



<li>Click <strong>Save &amp; test</strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="609" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png" alt="" class="wp-image-30247" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-300x178.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-768x457.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1536x913.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-2048x1218.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Now that your Prometheus has been configured as a new data source, you can create your Grafana dashboard.</p>



<h4 class="wp-block-heading">2. Create your monitoring dashboard</h4>



<p>To begin with, you can use the following pre-configured Grafana dashboard by downloading this JSON file locally:</p>





<p>In the left-hand menu, select <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboards</mark></code></strong>:</p>



<ol class="wp-block-list">
<li>Navigate to <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboards</mark></code></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Import</mark></code></strong></li>



<li>Upload the provided dashboard JSON file (<strong><code><mark class="has-inline-color has-ast-global-color-0-color">vLLM-metrics-grafana-monitoring.json</mark></code></strong>)</li>



<li>Select <strong>Prometheus</strong> as datasource</li>



<li>Click <strong>Import</strong></li>
</ol>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="449" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png" alt="" class="wp-image-30250" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-300x131.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-768x337.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1536x673.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-2048x897.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The dashboard provides real-time visibility for <strong>Ministral 3 14B</strong> deployed with the vLLM container on OVHcloud AI Deploy.</p>



<p>You can now track:</p>



<ul class="wp-block-list">
<li><strong>Performance metrics</strong>: TTFT, inter-token latency, end-to-end latency</li>



<li><strong>Throughput indicators</strong>: Requests per second, token generation rates</li>



<li><strong>Resource utilisation</strong>: KV cache usage, active/waiting requests</li>



<li><strong>Capacity indicators</strong>: Queue depth, preemption rates</li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png" alt="" class="wp-image-30253" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Here are the key metrics tracked and displayed in the Grafana dashboard:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Metric Category</th><th>Prometheus Metric</th><th>Description</th><th>Use case</th></tr></thead><tbody><tr><td><strong>Latency</strong></td><td><code>vllm:time_to_first_token_seconds</code></td><td>Time until first token generation</td><td>User experience monitoring</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:inter_token_latency_seconds</code></td><td>Time between tokens</td><td>Throughput optimisation</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:e2e_request_latency_seconds</code></td><td>End-to-end request time</td><td>SLA monitoring</td></tr><tr><td><strong>Throughput</strong></td><td><code>vllm:request_success_total</code></td><td>Successful requests counter</td><td>Capacity planning</td></tr><tr><td><strong>Resource</strong></td><td><code>vllm:kv_cache_usage_perc</code></td><td>KV cache memory usage</td><td>Memory management</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_running</code></td><td>Active requests</td><td>Load monitoring</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_waiting</code></td><td>Queued requests</td><td>Overload detection</td></tr><tr><td><strong>Capacity</strong></td><td><code>vllm:num_preemptions_total</code></td><td>Request preemptions</td><td>Peak load indicator</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:prompt_tokens_total</code></td><td>Input tokens processed</td><td>Usage analytics</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:generation_tokens_total</code></td><td>Output tokens generated</td><td>Cost tracking</td></tr></tbody></table></figure>
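<p>As a starting point for your own panels, the latency metrics in the table are Prometheus histograms, so quantiles can be derived with <code>histogram_quantile</code>. A few example PromQL queries (assuming the default vLLM metric names listed above):</p>

```
# p95 time to first token over the last 5 minutes
histogram_quantile(0.95, sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le))

# token generation throughput (tokens per second)
rate(vllm:generation_tokens_total[5m])

# queued requests (overload indicator)
vllm:num_requests_waiting
```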



<p>Well done, you now have at your disposal:</p>



<ul class="wp-block-list">
<li>An endpoint of the Ministral 3 14B model deployed with vLLM thanks to <strong>OVHcloud AI Deploy</strong> and its autoscaling strategies based on custom metrics</li>



<li>Prometheus for metrics collection and Grafana for visualisation/dashboards thanks to <strong>OVHcloud MKS</strong></li>
</ul>



<p><strong>But how can you check that everything will work when the load increases?</strong></p>



<h3 class="wp-block-heading">Step 7 &#8211; Test autoscaling and real-time visualisation</h3>



<p>The first objective here is to force AI Deploy to:</p>



<ul class="wp-block-list">
<li>Increase <code>vllm:num_requests_running</code></li>



<li>&#8216;Saturate&#8217; a single replica</li>



<li>Trigger the <strong>scale up</strong></li>



<li>Observe replica increase + latency drop</li>
</ul>



<h4 class="wp-block-heading">1. Autoscaling testing strategy</h4>



<p>The goal is to combine:</p>



<ul class="wp-block-list">
<li><strong>High concurrency</strong></li>



<li><strong>Long prompts</strong> (KV cache heavy)</li>



<li><strong>Long generations</strong></li>



<li><strong>Bursty load</strong></li>
</ul>



<p>This is what vLLM autoscaling actually reacts to.</p>



<p>To do so, a Python script can simulate the expected behaviour:</p>



<pre class="wp-block-code"><code class="">import os<br>import time<br>import threading<br>import random<br>from statistics import mean<br>from openai import OpenAI<br><br>APP_URL = "https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net/v1" # /!\ REPLACE THE &lt;APP_ID&gt; by yours /!\<br>MODEL = "mistralai/Ministral-3-14B-Instruct-2512"<br>API_KEY = os.environ["MY_OVHAI_ACCESS_TOKEN"]  # export this environment variable beforehand<br><br>CONCURRENT_WORKERS = 500          # concurrency (main scaling trigger)<br>REQUESTS_PER_WORKER = 25<br>MAX_TOKENS = 768                  # generation pressure<br><br># some random prompts<br>SHORT_PROMPTS = [<br>    "Summarize the theory of relativity.",<br>    "Explain what a transformer model is.",<br>    "What is Kubernetes autoscaling?"<br>]<br><br>MEDIUM_PROMPTS = [<br>    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",<br>    "Describe how vLLM manages KV cache and why it impacts inference performance."<br>]<br><br>LONG_PROMPTS = [<br>    "Write a very detailed technical explanation of how large language models perform inference, "<br>    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "<br>    "GPU memory management, and how batching affects latency and throughput. Use examples.",<br>]<br><br>PROMPT_POOL = (<br>    SHORT_PROMPTS * 2 +<br>    MEDIUM_PROMPTS * 4 +<br>    LONG_PROMPTS * 6    # bias toward long prompts<br>)<br><br># OpenAI-compatible client<br>client = OpenAI(<br>    base_url=APP_URL,<br>    api_key=API_KEY,<br>)<br><br># basic metrics<br>latencies = []<br>errors = 0<br>lock = threading.Lock()<br><br># worker<br>def worker(worker_id):<br>    global errors<br>    for _ in range(REQUESTS_PER_WORKER):<br>        prompt = random.choice(PROMPT_POOL)<br><br>        start = time.time()<br>        try:<br>            client.chat.completions.create(<br>                model=MODEL,<br>                messages=[{"role": "user", "content": prompt}],<br>                max_tokens=MAX_TOKENS,<br>                temperature=0.7,<br>            )<br>            elapsed = time.time() - start<br><br>            with lock:<br>                latencies.append(elapsed)<br><br>        except Exception:<br>            with lock:<br>                errors += 1<br><br># run<br>threads = []<br>start_time = time.time()<br><br>print("Starting autoscaling stress test...")<br>print(f"Concurrency: {CONCURRENT_WORKERS}")<br>print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")<br><br>for i in range(CONCURRENT_WORKERS):<br>    t = threading.Thread(target=worker, args=(i,))<br>    t.start()<br>    threads.append(t)<br><br>for t in threads:<br>    t.join()<br><br>total_time = time.time() - start_time<br><br># results<br>print("\n=== AUTOSCALING BENCH RESULTS ===")<br>print(f"Total requests sent: {len(latencies) + errors}")<br>print(f"Successful requests: {len(latencies)}")<br>print(f"Errors: {errors}")<br>print(f"Total wall time: {total_time:.2f}s")<br><br>if latencies:<br>    print(f"Avg latency: {mean(latencies):.2f}s")<br>    print(f"Min latency: {min(latencies):.2f}s")<br>    print(f"Max latency: {max(latencies):.2f}s")<br>    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")</code></pre>
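<p>As a complement, percentile latencies (p50/p95) describe tail behaviour better than the min/max printed by the script; here is a small self-contained sketch, where the sample values stand in for the collected <code>latencies</code> list:</p>

```python
# Percentile latencies: p50/p95 expose the tail that averages hide.
from statistics import quantiles

latencies = [0.8, 1.1, 1.3, 2.0, 2.2, 2.5, 3.1, 4.0, 6.5, 9.0]  # stand-in sample

cuts = quantiles(latencies, n=100, method="inclusive")  # 99 cut points
p50, p95 = cuts[49], cuts[94]
print(f"p50: {p50:.2f}s  p95: {p95:.2f}s")  # p50: 2.35s  p95: 7.88s
```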



<p><strong>How can you verify that autoscaling is working and that the load is being handled correctly without latency skyrocketing?</strong></p>



<h4 class="wp-block-heading">2. Hardware and platform-level monitoring</h4>



<p>First, <strong>AI Deploy Grafana</strong> answers &#8216;<strong>What resources are being used and how many replicas exist?</strong>&#8217;.</p>



<p>GPU utilisation, GPU memory, CPU, RAM and replica count are monitored through <strong>OVHcloud AI Deploy Grafana</strong> (monitoring URL), which exposes infrastructure and runtime metrics for the AI Deploy application. This layer provides visibility into <strong>resource saturation and scaling events</strong> managed by the AI Deploy platform itself.</p>



<p>Access it using the following URL (do not forget to replace <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>&lt;APP_ID&gt;</strong></mark></code> by yours): <strong><code>https://monitoring.gra.ai.cloud.ovh.net/d/app/app-monitoring?var-app=</code><mark class="has-inline-color has-ast-global-color-0-color"><code>&lt;APP_ID&gt;</code></mark><code>&amp;orgId=1</code></strong></p>



<p>For example, check GPU/RAM metrics:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png" alt="" class="wp-image-30260" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also monitor scale ups and downs in real time, as well as information on HTTP calls and much more!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png" alt="" class="wp-image-30261" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">3. Software and application-level monitoring</h4>



<p>Next, the combination of MKS + Prometheus + Grafana answers <strong>&#8216;How does the inference engine behave internally?&#8217;</strong>.</p>



<p>In fact, vLLM internal metrics (request concurrency, token throughput, latency indicators, KV cache pressure, etc.) are collected via the <strong>vLLM <code>/metrics</code> endpoint</strong> and scraped by <strong>Prometheus running on OVHcloud MKS</strong>, then visualised in a <strong>dedicated Grafana instance</strong>. This layer focuses on <strong>model behaviour and inference performance</strong>.</p>



<p>Find all these metrics via (just replace <strong><code><mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark></code></strong>): <strong><code>http://<mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark>/d/vllm-ministral-monitoring/ministral-14b-vllm-metrics-monitoring?orgId=1</code></strong></p>



<p>Find key metrics such as TTFT (time to first token):</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png" alt="" class="wp-image-30263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also find some information about <strong>&#8216;Model load and throughput&#8217;</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png" alt="" class="wp-image-30264" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To go further and add even more metrics, you can refer to the vLLM documentation on &#8216;<a href="https://docs.vllm.ai/en/v0.7.2/getting_started/examples/prometheus_grafana.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Prometheus and Grafana</a>&#8216;.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>This reference architecture provides a scalable and production-ready approach for deploying LLM inference on OVHcloud using <strong>AI Deploy</strong> and the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-apps-deployments?id=kb_article_view&amp;sysparm_article=KB0047997#advanced-custom-metrics-for-autoscaling" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">autoscaling on custom metrics feature</a>.</p>



<p>OVHcloud <strong>MKS</strong> is dedicated to running Prometheus and Grafana, enabling secure scraping and visualisation of <strong>vLLM internal metrics</strong> exposed via the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code> </mark></strong>endpoint.</p>



<p>By scraping vLLM metrics securely from AI Deploy into Prometheus and exposing them through Grafana, the architecture provides full visibility into model behaviour, performance and load, enabling informed scaling analysis, troubleshooting and capacity planning in production environments.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks%2F&amp;action_name=Reference%20Architecture%3A%20Custom%20metric%20autoscaling%20for%20LLM%20inference%20with%20vLLM%20on%20OVHcloud%20AI%20Deploy%20and%20observability%20using%20MKS&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: build a sovereign n8n RAG workflow for AI agent using OVHcloud Public Cloud solutions</title>
		<link>https://blog.ovhcloud.com/reference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 27 Jan 2026 13:12:03 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Managed Database]]></category>
		<category><![CDATA[n8n]]></category>
		<category><![CDATA[Object Storage]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[RAG]]></category>
		<category><![CDATA[S3]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29694</guid>

					<description><![CDATA[What if an n8n workflow, deployed in a&#160;sovereign environment, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection. In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&#160;Large Language Models&#160;(LLMs) [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>What if an n8n workflow, deployed in a&nbsp;<strong>sovereign environment</strong>, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection.</em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg" alt="" class="wp-image-30002" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-300x169.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-768x432.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1536x864.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag.jpg 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>n8n workflow overview</em></figcaption></figure>



<p>In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&nbsp;<strong>Large Language Models</strong>&nbsp;(LLMs) is becoming a strategic differentiator.</p>



<p><strong>How?</strong>&nbsp;By building&nbsp;<strong>Agentic RAG systems</strong>&nbsp;capable of retrieving, reasoning, and acting autonomously based on external knowledge.</p>



<p>To make this possible, engineers need a way to connect&nbsp;<strong>retrieval pipelines (RAG)</strong>&nbsp;with&nbsp;<strong>tool-based orchestration</strong>.</p>



<p>This article outlines a&nbsp;<strong>reference architecture</strong>&nbsp;for building a&nbsp;<strong>fully automated RAG pipeline orchestrated by n8n</strong>, leveraging&nbsp;<strong>OVHcloud AI Endpoints</strong>&nbsp;and&nbsp;<strong>PostgreSQL with pgvector</strong>&nbsp;as core components.</p>



<p>The final result will be a system that automatically ingests Markdown documentation from&nbsp;<strong>Object Storage</strong>, creates embeddings with OVHcloud’s&nbsp;<strong>BGE-M3</strong>&nbsp;model available on AI Endpoints, and stores them in a&nbsp;<strong>Managed Database PostgreSQL</strong>&nbsp;with pgvector extension.</p>



<p>Lastly, you’ll be able to build an AI Agent that lets you chat with an LLM (<strong>GPT-OSS-120B</strong>&nbsp;on AI Endpoints). This agent, utilising the RAG implementation carried out upstream, will be an expert on OVHcloud products.</p>



<p>You can further improve the process by using an&nbsp;<strong>LLM guard</strong>&nbsp;to protect the questions sent to the LLM, and set up a chat memory to use conversation history for higher response quality.</p>



<p><strong>But what about n8n?</strong></p>



<p><a href="https://n8n.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>n8n</strong></a>, the open-source workflow automation tool,&nbsp;offers many benefits and connects seamlessly with over&nbsp;<strong>300</strong>&nbsp;APIs, apps, and services:</p>



<ul class="wp-block-list">
<li><strong>Open-source</strong>: n8n is a 100% self-hostable solution, which means you retain full data control;</li>



<li><strong>Flexible</strong>: combines low-code nodes and custom JavaScript/Python logic;</li>



<li><strong>AI-ready</strong>: includes useful integrations for LangChain, OpenAI, and embedding support capabilities;</li>



<li><strong>Composable</strong>: enables simple connections between data, APIs, and models in minutes;</li>



<li><strong>Sovereign by design</strong>: compliant with privacy-sensitive or regulated sectors.</li>
</ul>



<p>This reference architecture serves as a blueprint for building a sovereign, scalable Retrieval Augmented Generation (<strong>RAG</strong>) platform using&nbsp;<strong>n8n</strong>&nbsp;and&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;solutions.</p>



<p>This setup shows how to orchestrate data ingestion, generate embeddings, and enable conversational AI by combining&nbsp;<strong>OVHcloud Object Storage</strong>,&nbsp;<strong>Managed Databases with PostgreSQL</strong>,&nbsp;<strong>AI Endpoints</strong>&nbsp;and&nbsp;<strong>AI Deploy</strong>. <strong>The result?</strong>&nbsp;An AI environment that is fully integrated, protects privacy, and is exclusively hosted on <strong>OVHcloud’s European infrastructure</strong>.</p>



<h2 class="wp-block-heading">Overview of the n8n workflow architecture for RAG </h2>



<p>The workflow involves the following steps:</p>



<ul class="wp-block-list">
<li><strong>Ingestion:</strong>&nbsp;documentation in Markdown format is fetched from <strong>OVHcloud Object Storage (S3);</strong></li>



<li><strong>Preprocessing:</strong> n8n cleans and normalises the text, removing YAML front-matter and encoding noise;</li>



<li><strong>Vectorisation:</strong>&nbsp;each document is embedded using the <strong>BGE-M3</strong> model, which is available via <strong>OVHcloud AI Endpoints;</strong></li>



<li><strong>Persistence:</strong> vectors and metadata are stored in an <strong>OVHcloud PostgreSQL Managed Database</strong> using pgvector;</li>



<li><strong>Retrieval:</strong> when a user sends a query, n8n triggers a <strong>LangChain Agent</strong> that retrieves relevant chunks from the database;</li>



<li><strong>Reasoning and actions:</strong>&nbsp;the <strong>AI Agent node</strong> combines LLM reasoning, memory, and tool usage to generate a contextual response or trigger downstream actions (Slack reply, Notion update, API call, etc.).</li>
</ul>
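<p>The preprocessing step can be sketched in a few lines of Python (n8n Code nodes accept custom Python logic). This is a minimal illustration of stripping YAML front-matter and whitespace noise, not the exact logic the workflow uses:</p>

```python
import re

def preprocess_markdown(raw: str) -> str:
    """Strip YAML front-matter and collapse noisy whitespace,
    as in the preprocessing step of the pipeline."""
    # Remove a leading front-matter block delimited by '---' lines.
    text = re.sub(r"\A---\n.*?\n---\n", "", raw, flags=re.DOTALL)
    # Trim trailing spaces and collapse runs of blank lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

doc = """---
title: Install the ovhai CLI
section: public_cloud
---

# Install the ovhai CLI

Download the binary...
"""
print(preprocess_markdown(doc).splitlines()[0])  # "# Install the ovhai CLI"
```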



<p>In this tutorial, all services are deployed within the <strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you start, double-check that you have:</p>



<ul class="wp-block-list">
<li>an <strong>OVHcloud Public Cloud</strong> account</li>



<li>an <strong>OpenStack user</strong> with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">&nbsp;following roles</a>:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>



<li>An <strong>API key</strong> for <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a></li>



<li><strong>ovhai CLI available</strong> – <em>install the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>ovhai CLI</em></a></li>



<li><strong>Hugging Face access</strong> – <em>create a </em><a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Hugging Face account</em></a><em> and generate an </em><a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>access token</em></a></li>
</ul>



<p><strong>🚀 Now that you have everything you need, you can start building your n8n workflow!</strong></p>



<h2 class="wp-block-heading">Architecture guide: n8n agentic RAG workflow</h2>



<p>You’re all set to configure and deploy your n8n workflow.</p>



<p>⚙️<em> Keep in mind that the following steps can be completed using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Build the RAG data ingestion pipeline</h3>



<p>This first step involves building the foundation of the entire RAG workflow by preparing the elements you need:</p>



<ul class="wp-block-list">
<li>n8n deployment</li>



<li>Object Storage bucket creation</li>



<li>PostgreSQL database creation</li>



<li>and more</li>
</ul>



<p>Remember to set up the proper credentials in n8n so the different elements can connect and function.</p>



<h4 class="wp-block-heading">1. Deploy n8n on OVHcloud VPS</h4>



<p>OVHcloud provides <a href="https://www.ovhcloud.com/en-gb/vps/vps-n8n/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>VPS solutions compatible with n8n</strong></a><strong>.</strong> Get a ready-to-use virtual server with <strong>pre-installed n8n </strong>and start building automation workflows without manual setup. With plans ranging from <strong>6 vCores&nbsp;/&nbsp;12 GB RAM</strong> to <strong>24 vCores&nbsp;/&nbsp;96 GB RAM</strong>, you can choose the capacity that suits your workload.</p>



<p><strong>How to set up n8n on a VPS?</strong></p>



<p>Setting up n8n on an OVHcloud VPS generally involves:</p>



<ul class="wp-block-list">
<li>Choosing and provisioning your OVHcloud VPS plan;</li>



<li>Connecting to your server via SSH and carrying out the initial server configuration, which includes updating the OS;</li>



<li>Installing n8n, typically with Docker (recommended for ease of management and updates), or npm by following this <a href="https://help.ovhcloud.com/csm/en-gb-vps-install-n8n?id=kb_article_view&amp;sysparm_article=KB0072179" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>;</li>



<li>Configuring n8n with a domain name, SSL certificate for HTTPS, and any necessary environment variables for databases or settings.</li>
</ul>



<p>While OVHcloud provides a robust VPS platform, you can find detailed n8n installation guides in the <a href="https://docs.n8n.io/hosting/installation/docker/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">official n8n documentation</a>.</p>



<p>Once the configuration is complete, you can configure the database and bucket in Object Storage.</p>



<h4 class="wp-block-heading">2. Create Object Storage bucket</h4>



<p>First, you have to set up your data source. Here you can store all your documentation in an S3-compatible <a href="https://www.ovhcloud.com/en-gb/public-cloud/object-storage/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Object Storage</a> bucket.</p>



<p>For this tutorial, assume that all the documentation files are in Markdown format.</p>



<p>From <strong>OVHcloud Control Panel</strong>, create a new Object Storage container with <strong>S3-compatible API </strong>solution; follow this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-storage-s3-getting-started-object-storage?id=kb_article_view&amp;sysparm_article=KB0034674" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>.</p>



<p>When the bucket is ready, add your Markdown documentation to it.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png" alt="" class="wp-image-29733" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Note:</strong>&nbsp;For this tutorial, we’re using the various OVHcloud product documentation available as open source on the GitHub repository maintained by OVHcloud members.</p>



<p><em>Click this </em><a href="https://github.com/ovh/docs.git" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>link</em></a><em> to access the repository.</em></p>
</blockquote>



<p>How do you do that? Extract all the <strong><code>guide.en-gb.md</code></strong> files from the GitHub repository and rename each one to match its parent folder.</p>



<p>Example: the documentation about ovhai CLI installation, <code><strong>docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md</strong></code>, is stored in the <strong>ovhcloud-products-documentation-md</strong> bucket as <strong><code>cli_10_howto_install_cli.md</code></strong>.</p>
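<p>This renaming rule is easy to script. A small Python sketch (the helper name is hypothetical) deriving the bucket key from a guide path:</p>

```python
from pathlib import Path

def bucket_key_for(guide_path: str) -> str:
    """Derive the Object Storage key for a guide.en-gb.md file:
    the file is renamed after its parent folder."""
    parent = Path(guide_path).parent.name
    return f"{parent}.md"

key = bucket_key_for(
    "docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md"
)
print(key)  # cli_10_howto_install_cli.md
```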



<p>You should get an overview that looks like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png" alt="" class="wp-image-29735" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Keep the following elements and create a new credential in n8n named <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">OVHcloud S3 gra credentials</mark></strong></code>:</p>



<ul class="wp-block-list">
<li>S3 Endpoint: <a href="https://s3.gra.io.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://s3.gra.io.cloud.ovh.net/</mark></code></strong></a></li>



<li>Region: <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">gra</mark></code></strong></li>



<li>Access Key ID: <strong><code>&lt;your_object_storage_user_access_key&gt;</code></strong></li>



<li>Secret Access Key: <strong><code>&lt;your_object_storage_user_secret_key&gt;</code></strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png" alt="" class="wp-image-29736" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, create a new n8n node by selecting&nbsp;<strong>S3</strong>, then&nbsp;<strong>Get Multiple Files</strong>.<br>Configure this node as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png" alt="" class="wp-image-29740" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Connect the node to the previous one before moving on to the next step.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png" alt="" class="wp-image-29741" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>With the first phase done, you can now configure the vector DB.</p>



<h4 class="wp-block-heading">3. Configure PostgreSQL Managed DB (pgvector)</h4>



<p>In this step, you can set up the vector database that lets you store the embeddings generated from your documents.</p>



<p>How? By using OVHcloud’s Managed Databases for&nbsp;<a href="https://www.ovhcloud.com/en-gb/public-cloud/postgresql/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PostgreSQL</a> with the pgvector extension. Go to your OVHcloud Control Panel and follow these steps.</p>



<p>1. Navigate to&nbsp;<strong>Databases &amp; Analytics &gt; Databases</strong></p>



<p><strong>2. Create a new database and select&nbsp;<em>PostgreSQL</em>&nbsp;and a datacenter location</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png" alt="" class="wp-image-29758" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>3. Select&nbsp;<em>Production</em>&nbsp;plan and&nbsp;<em>Instance type</em></strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png" alt="" class="wp-image-29759" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>4. Reset the user password and save it</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png" alt="" class="wp-image-29762" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>5. Whitelist the IP of your n8n instance as follows</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png" alt="" class="wp-image-29761" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>6. Take note of the following parameters</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png" alt="" class="wp-image-29760" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make a note of this information and create a new credential in n8n named&nbsp;<strong>OVHcloud PGvector credentials</strong>:</p>



<ul class="wp-block-list">
<li>Host:<strong>&nbsp;<code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_hostname&gt;</mark></code></strong></li>



<li>Database:&nbsp;<strong>defaultdb</strong></li>



<li>User:&nbsp;<code>avnadmin</code></li>



<li>Password:&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_password&gt;</mark></strong></code></li>



<li>Port:&nbsp;<strong>20184</strong></li>
</ul>



<p>Consider enabling the&nbsp;<strong>Ignore SSL Issues (Insecure)</strong>&nbsp;option if needed, and set the&nbsp;<strong>Maximum Number of Connections</strong>&nbsp;value to&nbsp;<strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1000</mark></code></strong>.</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png" alt="" class="wp-image-29763" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>✅ You’re now connected to the database! But what about the PGvector extension?</p>



<p>Add a PostgreSQL node,&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code>, to your n8n workflow, and create the extension and table with a SQL query like this:</p>



<pre class="wp-block-code"><code class="">-- drop table as needed<br>DROP TABLE IF EXISTS md_embeddings;<br><br>-- activate pgvector<br>CREATE EXTENSION IF NOT EXISTS vector;<br><br>-- create table<br>CREATE TABLE md_embeddings (<br>    id SERIAL PRIMARY KEY,<br>    text TEXT,<br>    embedding vector(1024),<br>    metadata JSONB<br>);</code></pre>
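<p>At query time, retrieval against this table comes down to a nearest-neighbour query. A hedged Python sketch of the kind of SQL the PGVector node issues under the hood, assuming cosine distance (pgvector’s <code>&lt;=&gt;</code> operator); the query n8n actually generates may differ:</p>

```python
def similarity_query(table: str = "md_embeddings", k: int = 5) -> str:
    """Parameterised nearest-neighbour query over pgvector.
    The %s placeholder receives the query embedding
    (a 1024-dim vector literal)."""
    return (
        f"SELECT text, metadata, embedding <=> %s AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

print(similarity_query())
```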



<p>You should get this n8n node:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png" alt="" class="wp-image-29752" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Finally, you can create a new table named&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">md_embeddings</mark></strong></code>&nbsp;using this node. Add a&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Stop and Error</mark></strong></code>&nbsp;node to halt the workflow if the table setup fails.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png" alt="" class="wp-image-29753" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>All set! Your vector DB is prepped and ready for data! Keep in mind, you still need an&nbsp;<strong>embeddings model</strong> for the RAG data ingestion pipeline.</p>



<h4 class="wp-block-heading">4. Access to OVHcloud AI Endpoints</h4>



<p><strong>OVHcloud AI Endpoints</strong>&nbsp;is a managed service that provides&nbsp;<strong>ready-to-use APIs for AI models</strong>, including&nbsp;<strong>LLM, CodeLLM, embeddings, Speech-to-Text, and image models</strong>&nbsp;hosted within OVHcloud’s European infrastructure.</p>



<p>To vectorise the various documents in Markdown format, you have to select an embedding model:&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/models/bge-m3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>BGE-M3</strong></a>.</p>



<p>Your AI Endpoints API key may already exist. If not, head to the AI Endpoints menu in your OVHcloud Control Panel and generate a new one.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png" alt="" class="wp-image-29775" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Once this is done, you can create new OpenAI credentials in your n8n.</p>



<p>Why do I need OpenAI credentials? Because the <strong>AI Endpoints API&nbsp;</strong>is fully compatible with OpenAI’s, integration is simple and the&nbsp;<strong>sovereignty of your data</strong> is preserved.</p>



<p>How? Through a single endpoint,&nbsp;<a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a>, you can call any of the AI Endpoints models.</p>
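<p>Because the endpoint is OpenAI-compatible, an embeddings call can be constructed with nothing but the Python standard library. A hedged sketch — the model identifier <code>bge-m3</code> is an assumption to verify against the AI Endpoints catalogue, and the request is only built here, not sent:</p>

```python
import json
import urllib.request

# Endpoint from the article; model identifier assumed, verify in the catalogue.
BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

def embedding_request(api_key: str, text: str) -> urllib.request.Request:
    """Build (without sending) an OpenAI-compatible embeddings
    request for the BGE-M3 model."""
    payload = {"model": "bge-m3", "input": text}
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = embedding_request("MY_KEY", "What is Object Storage?")
print(req.full_url)
```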



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png" alt="" class="wp-image-29776" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This means you can create a new n8n node by selecting&nbsp;<strong>Postgres PGVector Store</strong>&nbsp;and&nbsp;<strong>Add documents to Vector Store</strong>.<br>Set up this node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png" alt="" class="wp-image-29781" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then configure the <strong>Data Loader</strong> with custom text splitting and a JSON type.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png" alt="" class="wp-image-29780" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For the text splitter, here are some options:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png" alt="" class="wp-image-29786" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To finish, select the&nbsp;<strong>BGE-M3</strong> embedding model from the model list and set the&nbsp;<strong>Dimensions</strong> to 1024.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png" alt="" class="wp-image-29784" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You now have everything you need to build the ingestion pipeline.</p>



<h4 class="wp-block-heading">5. Set up the ingestion pipeline loop</h4>



<p>To build a fully automated document ingestion and vectorisation pipeline, you need to integrate a few specific nodes, mainly:</p>



<ul class="wp-block-list">
<li>a <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></code></strong> that downloads each Markdown file one by one so that it can be vectorised;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> that counts the number of files processed, which subsequently determines the number of requests sent to the embedding model;</li>



<li>an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition that lets you check when 400 requests have been reached;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node that pauses after every 400 requests to avoid getting rate-limited;</li>



<li>an S3 block <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> to download each Markdown file;</li>



<li>another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to extract and process text from Markdown files by cleaning and removing special characters before sending it to the embeddings model;</li>



<li>a PostgreSQL node to <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL</mark></strong></code> query to check that the table contains vectors after the process (loop) is complete.</li>
</ul>
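<p>Before wiring these nodes in n8n, their combined control flow can be sketched in plain JavaScript (a minimal stand-alone sketch; <code>runIngestion</code> and the batch limit of 400 mirror the nodes listed above, while the actual download, clean, and embed work is left as a comment):</p>

```javascript
// Sketch of the ingestion loop: process files one at a time and
// pause once every 400 processed items to avoid rate limiting.
function runIngestion(files, batchLimit = 400) {
  let pauses = 0;
  files.forEach((file, index) => {
    const counter = index + 1; // same role as the counter in the Code node
    if (counter % batchLimit === 0) {
      pauses += 1; // in n8n, the Wait node would sleep here
    }
    // download the file from S3, clean its text, and send it
    // to the embedding model here
  });
  return pauses; // number of times the pipeline paused
}
```

<p>For 1,000 files, this loop pauses twice (after items 400 and 800), which is exactly the behaviour the <code>If</code> and <code>Wait</code> nodes implement in the following sections.</p>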



<h5 class="wp-block-heading">5.1. Create a loop to process each documentation file</h5>



<p>Begin by creating a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></strong></code> to process all the Markdown files one at a time. Set the <strong>batch size</strong> to <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1</mark></code></strong> in this loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png" alt="" class="wp-image-29788" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Loop</code></mark></strong> node right after the S3 <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Get Many Files</mark></code></strong> node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png" alt="" class="wp-image-29797" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Time to put the loop’s content into action!</p>



<h5 class="wp-block-heading">5.2. Count the number of files using a code snippet</h5>



<p>Next, choose the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> node from the list to count how many files have been processed. Set the <strong>Mode</strong> to “Run Once for Each Item” and the <strong>Language</strong> to “JavaScript”, then add the following code snippet to the dedicated block.</p>



<pre class="wp-block-code"><code class="">// simple counter per item<br>const counter = $runIndex + 1;<br><br>return {<br>  counter<br>};</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png" alt="" class="wp-image-29792" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make sure this code snippet is included in the loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png" alt="" class="wp-image-29798" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can start adding the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong><code>if</code></strong></mark> part to the loop now.</p>



<h5 class="wp-block-heading">5.3. Add a condition that applies a rule every 400 requests</h5>



<p>Here, you need to create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node and add the following condition, set as an expression.</p>



<pre class="wp-block-code"><code class="">{{ (Number($json["counter"]) % 400) === 0 }}</code></pre>
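<p>The same check can be reproduced outside n8n to convince yourself of the behaviour (a minimal sketch; <code>shouldPause</code> is a hypothetical helper standing in for the <code>If</code> node, with <code>counter</code> playing the role of <code>$json["counter"]</code>):</p>

```javascript
// Mirrors the If node's expression: true once every 400 processed files.
const shouldPause = (counter) => (Number(counter) % 400) === 0;
```

<p>Note the <code>Number()</code> coercion: if an item carries the counter as a string, the expression still evaluates correctly.</p>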



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png" alt="" class="wp-image-29794" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add it immediately after counting the files:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png" alt="" class="wp-image-29800" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If this condition <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">is true</mark></strong></code>, trigger the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node.</p>



<h5 class="wp-block-heading">5.4. Insert a pause after each set of 400 requests</h5>



<p>Then insert a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node to pause for a few seconds before resuming. Set <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Resume</mark></strong></code> to “After Time Interval” and the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait Amount</mark></strong></code> to “60:00” seconds.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png" alt="" class="wp-image-29796" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Link it to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition when this is <strong>True</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png" alt="" class="wp-image-29801" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, you can go ahead and download the Markdown file, and then process it.</p>



<h5 class="wp-block-heading">5.5. Launch documentation download</h5>



<p>To do this, create a new <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> S3 node and configure it with this File Key expression:</p>



<pre class="wp-block-code"><code class="">{{ $('Process each documentation file').item.json.Key }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png" alt="" class="wp-image-29804" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To connect it, link it to the output of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node and to the <strong>False</strong> branch of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node; this way, a file is processed only when the rate limit has not been exceeded.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png" alt="" class="wp-image-29805" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You’re almost done! Now you need to extract and process the text from the Markdown files – clean and remove any special characters before sending it to the embedding model.</p>



<h5 class="wp-block-heading">5.6 Clean Markdown text content</h5>



<p>Next, create another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to process text from Markdown files:</p>



<code class="">// extract binary content<br>const binary = $input.item.binary.data;<br><br>// decoding into clean UTF-8 text<br>let text = Buffer.from(binary.data, 'base64').toString('utf8');<br><br>// cleaning - remove non-printable characters<br>text = text<br>  .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')<br>  .replace(/\s{2,}/g, ' ')<br>  .trim();<br><br>// check length<br>if (text.length &gt; 14000) {<br>  text = text.slice(0, 14000);<br>}<br><br>return [{<br>  text,<br>  fileName: binary.fileName,<br>  mimeType: binary.mimeType<br>}];</code>



<p>Select the <em>“Run Once for Each Item”</em> <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Mode</mark></strong></code> and place the previous code in the dedicated JavaScript block.</p>
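<p>To see what this cleaning step does, you can run the same rules on a sample string outside n8n (a stand-alone sketch; <code>cleanText</code> is a hypothetical wrapper around the snippet’s replace chain and its 14,000-character cap):</p>

```javascript
// Applies the Code node's cleaning rules: replace non-printable
// characters with spaces, collapse runs of whitespace, trim,
// then cap the length so the embedding request stays small.
function cleanText(text, maxLength = 14000) {
  let cleaned = text
    .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')
    .replace(/\s{2,}/g, ' ')
    .trim();
  return cleaned.length > maxLength ? cleaned.slice(0, maxLength) : cleaned;
}
```

<p>For instance, a string containing control characters and repeated spaces, such as <code>"# Title\u0000   with   gaps"</code>, comes out as <code>"# Title with gaps"</code>.</p>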



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png" alt="" class="wp-image-29806" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To finish, check that the output text has been sent to the document vectorisation system, which was set up in <strong>Step 3 – Configure PostgreSQL Managed DB (pgvector)</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png" alt="" class="wp-image-29808" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>How do I confirm that the table contains all elements after vectorisation?</p>



<h5 class="wp-block-heading">5.7 Double-check that the documents are in the table</h5>



<p>To confirm that your RAG system is working, check that your vector database actually contains vectors: use a PostgreSQL node with <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code> in your n8n workflow.</p>



<p>Then, run the following query:</p>



<pre class="wp-block-code"><code class="">-- count the number of elements<br>SELECT COUNT(*) FROM md_embeddings;</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png" alt="" class="wp-image-29818" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Done</mark></strong></code> section of your <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop</mark></strong>, so the elements are counted when the process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png" alt="" class="wp-image-29773" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! You can now run the workflow to begin ingesting documents.</p>



<p>Click the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute workflow</mark></strong></code> button and wait until the vectorisation process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png" alt="" class="wp-image-29823" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember, everything should be green when it’s finished ✅.</p>



<h3 class="wp-block-heading">Step 2 – RAG chatbot</h3>



<p>With the data ingestion and vectorisation steps completed, you can now begin implementing your AI agent.</p>



<p>This involves building a <strong>RAG-based AI Agent</strong>&nbsp;by simply starting a chat with an LLM.</p>



<h4 class="wp-block-heading">1. Set up the chat box to start a conversation</h4>



<p>First, configure your AI Agent based on the RAG system, and add a new node in the same n8n workflow: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Trigger</mark></strong></code>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png" alt="" class="wp-image-29834" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This node will allow you to interact directly with your AI agent! But before that, you need to check that your message is safe.</p>






<h4 class="wp-block-heading">2. Set up your LLM Guard with AI Deploy</h4>



<p>To check whether a message is secure or not, use an LLM Guard.</p>



<p><strong>What’s an LLM Guard?</strong>&nbsp;This is a safety and control layer that sits between users and an LLM, or between the LLM and an external connection. Its main goal is to filter, monitor, and enforce rules on what goes into or comes out of the model 🔐.</p>



<p>You can use <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-deploy" data-wpel-link="internal">AI Deploy</a> from OVHcloud to deploy your desired LLM guard. With a single command line, this AI solution lets you deploy a Hugging Face model using vLLM Docker containers.</p>



<p>For more details, please refer to this <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">blog post</a>.</p>



<p>For the use case covered in this article, you can use the open-source model <strong>meta-llama/Llama-Guard-3-8B</strong> available on <a href="https://huggingface.co/meta-llama/Llama-Guard-3-8B" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a>.</p>
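<p>Once the guard is deployed (see the steps below), vLLM exposes an OpenAI-compatible API, so classifying a message is a single chat-completion call. The sketch below only builds the request; the endpoint URL is a placeholder for your own AI Deploy app URL, and <code>buildGuardRequest</code> is a hypothetical helper:</p>

```javascript
// Build an OpenAI-compatible chat completion request for Llama Guard 3.
// The URL is a placeholder; substitute your AI Deploy app endpoint.
function buildGuardRequest(userMessage, token) {
  return {
    url: "https://<your-app-id>.app.gra.ai.cloud.ovh.net/v1/chat/completions",
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`, // the AI Deploy Bearer token
      },
      body: JSON.stringify({
        model: "meta-llama/Llama-Guard-3-8B",
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}
```

<p>Llama Guard replies with <code>safe</code>, or <code>unsafe</code> followed by a hazard category, which your workflow can branch on before forwarding the message to the main LLM.</p>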



<h5 class="wp-block-heading">2.1 Create a Bearer token to request your custom AI Deploy endpoint</h5>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it’s deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>The following output is returned:</p>



<pre class="wp-block-code"><code class="">Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-10-25 8:53:05<br>Updated At: 20-10-25 8:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</code></pre>



<p>You can now store and export your access token to add it as a new credential in n8n.</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h5 class="wp-block-heading">2.2 Start the Llama Guard 3 model with AI Deploy</h5>



<p>Using the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai</mark></strong></code> CLI, launch the following command to start the vLLM inference server.</p>



<code class="">ovhai app run \<br>  --name vllm-llama-guard3 \<br>  --default-http-port 8000 \<br>  --gpu 1 \<br>  --flavor l40s-1-gpu \<br>  --label ai_deploy_token=my_operator_token \<br>  --env OUTLINES_CACHE_DIR=/tmp/.outlines \<br>  --env HF_TOKEN=$MY_HF_TOKEN \<br>  --env HF_HOME=/hub \<br>  --env HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>  --env HF_HUB_ENABLE_HF_TRANSFER=0 \<br>  --volume standalone:/workspace:RW \<br>  --volume standalone:/hub:RW \<br>  vllm/vllm-openai:v0.10.1.1 \<br>  -- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</code>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai app run</mark></strong></code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run an app</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--name vllm-llama-guard3</mark></strong></code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the app, here&nbsp;<code>vllm-llama-guard3</code>.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--default-http-port 8000</mark></strong></code></li>
</ul>



<p>Exposes&nbsp;<strong>port 8000</strong>&nbsp;as the default HTTP endpoint; the vLLM server listens on port 8000 by default.</p>



<ul class="wp-block-list">
<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--gpu&nbsp;</code>1</mark></strong></li>



<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--flavor l40s-1-gpu</code></mark></strong></li>
</ul>



<p>Allocates&nbsp;<strong>one L40S GPU</strong>&nbsp;for the app. You can adjust the GPU type and count depending on the model you deploy.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/workspace:RW</mark></strong></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/hub:RW</mark></strong></code></li>
</ul>



<p>Mounts&nbsp;<strong>two persistent storage volumes</strong>: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/workspace</code></mark></strong>, the main working directory, and <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">/hub</mark></strong></code>, which stores the Hugging Face model files.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env OUTLINES_CACHE_DIR=/tmp/.outlines</mark></strong></code></li>



<li><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_TOKEN=$MY_HF_TOKEN</mark></code></strong></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HOME=/hub</mark></strong></code></li>



<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>--env HF_DATASETS_TRUST_REMOTE_CODE=1</strong></mark></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HUB_ENABLE_HF_TRANSFER=0</mark></strong></code></li>
</ul>



<p>These are the Hugging Face&nbsp;<strong>environment variables</strong> you have to set. Please export your Hugging Face access token as an environment variable before starting the app: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">export MY_HF_TOKEN=***********</mark></strong></code></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.10.1.1</mark></strong></code></li>
</ul>



<p>Use the&nbsp;<strong><code>vllm/vllm-openai</code></strong>&nbsp;Docker image (a pre-configured vLLM OpenAI API server).</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">-- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</mark></strong></code></li>
</ul>



<p>Finally, this runs a<strong>&nbsp;bash shell</strong>&nbsp;inside the container and executes a Python command to launch the vLLM API server.</p>
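<p>Putting the flags above together, the full launch command looks like the sketch below. It assumes the standard <code>ovhai app run</code> subcommand and that <code>MY_HF_TOKEN</code> has been exported beforehand; adjust the flavor and model to your environment.</p>

```
ovhai app run \
  --name vllm-llama-guard3 \
  --default-http-port 8000 \
  --gpu 1 \
  --flavor l40s-1-gpu \
  --volume standalone:/workspace:RW \
  --volume standalone:/hub:RW \
  --env OUTLINES_CACHE_DIR=/tmp/.outlines \
  --env HF_TOKEN=$MY_HF_TOKEN \
  --env HF_HOME=/hub \
  --env HF_DATASETS_TRUST_REMOTE_CODE=1 \
  --env HF_HUB_ENABLE_HF_TRANSFER=0 \
  vllm/vllm-openai:v0.10.1.1 \
  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \
       --model meta-llama/Llama-Guard-3-8B \
       --tensor-parallel-size 1 \
       --dtype bfloat16"
```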



<h5 class="wp-block-heading">2.2 Check to confirm your AI Deploy app is RUNNING</h5>



<p>Replace <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></strong></code> with your own.</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;app_id&gt;</code></pre>



<p>You should get:</p>



<p><code>History:<br>DATE STATE<br>20-10-25 09:58:00 QUEUED<br>20-10-25 09:58:01 INITIALIZING<br>20-10-25 09:58:07 PENDING<br>20-10-25 10:03:10&nbsp;<strong>RUNNING</strong><br>Info:<br>Message: App is running</code></p>



<h5 class="wp-block-heading">2.3 Create a new n8n credential with AI Deploy app URL and Bearer access token</h5>



<p>First, using your <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>&lt;app_id></strong></mark></code>, retrieve your AI Deploy app URL.</p>



<pre class="wp-block-code"><code class="">ovhai app get <span style="background-color: initial; font-family: inherit; font-size: inherit; text-align: initial; font-weight: inherit;">&lt;app_id&gt;</span> -o json | jq '.status.url' -r</code></pre>



<p>Then, create a new OpenAI credential from your n8n workflow, using your AI Deploy URL and the Bearer token as an API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png" alt="" class="wp-image-29837" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Don&#8217;t forget to replace <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>6e10e6a5-2862-4c82-8c08-26c458ca12c7</code></mark></strong> with your <span style="background-color: initial; font-family: inherit; font-size: inherit; text-align: initial; font-weight: inherit;"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></code></strong></span>.</p>



<h5 class="wp-block-heading">2.4 Create the LLM Guard node in n8n workflow</h5>



<p>Create a new <strong>OpenAI node</strong> to <strong>Message a model</strong> and select the new AI Deploy credential for LLM Guard usage.</p>



<p>Next, create the prompt as follows:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png" alt="" class="wp-image-29840" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, use an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node to determine if the scenario is <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>safe</code></mark></strong> or <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>unsafe</code></mark></strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png" alt="" class="wp-image-29842" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">unsafe</mark></strong></code>, send an error message right away to stop the workflow.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png" alt="" class="wp-image-29843" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>But if the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">safe</mark></strong></code>, you can send the request to the AI Agent without issues 🔐.</p>
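<p>The <code>If</code> node above only has to inspect the guard model's reply. A minimal sketch of that check in Python, assuming the usual Llama Guard 3 reply format: <code>safe</code>, or <code>unsafe</code> followed by a category code such as <code>S1</code> on the next line.</p>

```python
def parse_guard_reply(reply: str) -> dict:
    """Parse a Llama Guard 3 style reply into a verdict and optional category.

    Assumes the model answers "safe", or "unsafe" followed by a
    category code (e.g. "S1") on the next line.
    """
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    # Fail closed: treat an empty or unexpected reply as unsafe.
    verdict = lines[0].lower() if lines else "unsafe"
    return {
        "safe": verdict == "safe",
        "category": lines[1] if verdict == "unsafe" and len(lines) > 1 else None,
    }

print(parse_guard_reply("safe"))        # {'safe': True, 'category': None}
print(parse_guard_reply("unsafe\nS9"))  # {'safe': False, 'category': 'S9'}
```

<p>In the workflow, this decision is what routes the request either to the error message or on to the AI Agent.</p>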



<h4 class="wp-block-heading">3. Set up AI Agent</h4>



<p>The&nbsp;<strong>AI Agent</strong>&nbsp;node in&nbsp;<strong>n8n</strong>&nbsp;acts as an intelligent orchestration layer that combines&nbsp;<strong>LLMs, memory, and external tools</strong>&nbsp;within an automated workflow.</p>



<p>It allows you to:</p>



<ul class="wp-block-list">
<li>Connect a <strong>Large Language Model</strong> using APIs (e.g., LLMs from AI Endpoints);</li>



<li>Use <strong>tools</strong> such as HTTP requests, databases, or RAG retrievers so the agent can take actions or fetch real information;</li>



<li>Maintain <strong>conversational memory</strong> via PostgreSQL databases;</li>



<li>Integrate directly with chat platforms (e.g., Slack, Teams) for interactive assistants (optional).</li>
</ul>



<p>Simply put, n8n becomes an&nbsp;<strong>agentic automation framework</strong>, enabling LLMs to not only provide answers, but also think, choose, and perform actions.</p>



<p>Please note that you can change and customise this n8n <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node to fit your use cases, using features like function calling or structured output. This is the most basic configuration for the given use case. You can go even further with different agents.</p>



<p>🧑‍💻&nbsp;<strong>How do I implement this RAG?</strong></p>



<p>First, create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node in <strong>n8n</strong> as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png" alt="" class="wp-image-29933" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, a series of steps is required, the first of which is creating the prompts.</p>



<h5 class="wp-block-heading">3.1 Create prompts</h5>



<p>In the AI Agent node on your n8n workflow, edit the user and system prompts.</p>



<p>Begin by creating the&nbsp;<strong>prompt</strong>,&nbsp;which is also the&nbsp;<strong>user message</strong>:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<p>Then create the <strong>System Message</strong> as shown below:</p>



<pre class="wp-block-code"><code class="">You have access to a retriever tool connected to a knowledge base.  <br>Before answering, always search for relevant documents using the retriever tool.  <br>Use the retrieved context to answer accurately.  <br>If no relevant documents are found, say that you have no information about it.</code></pre>



<p>You should get a configuration like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png" alt="" class="wp-image-29935" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>🤔 Well, an LLM is now needed for this to work!</p>



<h5 class="wp-block-heading">3.2 Select LLM using AI Endpoints API</h5>



<p>First, add an <strong>OpenAI Chat Model</strong> node, and then set it as the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Model</mark></strong></code> for your agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png" alt="" class="wp-image-29939" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, select one of the&nbsp;<a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Endpoints</a>&nbsp;models from the list provided; they are compatible with the OpenAI APIs.</p>



<p>✅ <strong>How?</strong> By using the right API <a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png" alt="" class="wp-image-29936" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The <a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/gpt-oss-120b/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>GPT OSS 120B</strong></a> model has been selected for this use case. Other models, such as Llama, Mistral, and Qwen, are also available.</p>
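<p>Outside n8n, the same endpoint can be reached with any OpenAI-compatible client. The sketch below only builds the request payload; the model identifier <code>gpt-oss-120b</code> and the token variable name are assumptions, and actually sending the request requires a valid AI Endpoints token.</p>

```python
import json

# OpenAI-compatible base URL for OVHcloud AI Endpoints (from the article).
BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

def build_chat_request(question: str, model: str = "gpt-oss-120b") -> dict:
    """Build an OpenAI-style chat completion payload; the model id is an assumption."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

payload = build_chat_request("What are the rate limits of AI Endpoints?")
print(json.dumps(payload, indent=2))

# Sending it (sketch only, not executed here):
# import os, requests
# headers = {"Authorization": f"Bearer {os.environ['AI_ENDPOINTS_TOKEN']}"}
# response = requests.post(f"{BASE_URL}/chat/completions", json=payload, headers=headers)
```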



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><mark style="background-color:#fcb900" class="has-inline-color">⚠️ <strong>WARNING</strong> ⚠️</mark></p>



<p>If you are using a recent version of n8n, you will likely encounter the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/responses</code></mark></strong> issue (linked to OpenAI compatibility). To resolve this, disable the <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Use Responses API</mark></code></strong> option, and everything will work correctly.</p>
</blockquote>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="829" height="675" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg" alt="" class="wp-image-30352" style="aspect-ratio:1.2281554640124863;width:409px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg 829w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-300x244.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-768x625.jpg 768w" sizes="auto, (max-width: 829px) 100vw, 829px" /><figcaption class="wp-element-caption"><em>Tips to fix /responses issue</em></figcaption></figure>



<p>Your LLM is now set to answer your questions! Don’t forget, it needs access to the knowledge base.</p>



<h5 class="wp-block-heading">3.3 Connect the knowledge base to the RAG retriever</h5>



<p>As usual, the first step is to create an n8n <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">PGVector Vector Store node</mark></strong></code> and enter your PGVector credentials.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png" alt="" class="wp-image-29943" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Tools</code></mark></strong> section of the AI Agent node.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png" alt="" class="wp-image-29944" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember to connect your PGVector database so that the retriever can access the previously generated embeddings. Here’s an overview of what you’ll get.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png" alt="" class="wp-image-29945" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⏳Nearly done! The final step is to add the database memory.</p>



<h5 class="wp-block-heading">3.4 Manage conversation history with database memory</h5>



<p>Creating a&nbsp;<strong>Database Memory</strong>&nbsp;node in n8n (PostgreSQL) lets you link it to your AI Agent, so it can store and retrieve past conversation history. This enables the model to remember and use context across multiple interactions.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png" alt="" class="wp-image-29946" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then link this PostgreSQL database to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Memory</mark></strong></code> section of your AI Agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png" alt="" class="wp-image-29947" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! 🥳 Your&nbsp;<strong>n8n RAG workflow</strong>&nbsp;is now complete. Ready to test it?</p>



<h4 class="wp-block-heading">4. Make the most of your automated workflow</h4>



<p>Want to try it? It’s easy!</p>



<p>By clicking the orange <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Open chat</code></mark></strong> button, you can ask the AI agent questions about OVHcloud products, particularly where you need technical assistance.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4"></video></figure>



<p>For example, you can ask the LLM about rate limits in OVHcloud AI Endpoints and get the information in seconds.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4"></video></figure>



<p>You can now build your own autonomous RAG system using OVHcloud Public Cloud, suited for a wide range of applications.</p>



<h2 class="wp-block-heading">What’s next?</h2>



<p>To sum up, this reference architecture provides a guide on using&nbsp;<strong>n8n</strong> with&nbsp;<strong>OVHcloud AI Endpoints</strong>,&nbsp;<strong>AI Deploy</strong>,&nbsp;<strong>Object Storage</strong>, and&nbsp;<strong>PostgreSQL + pgvector</strong> to build a fully controlled, autonomous&nbsp;<strong>RAG AI system</strong>.</p>



<p>By orchestrating ingestion, embedding generation, vector storage, retrieval, LLM safety checks, and reasoning within a single workflow, teams can build scalable AI assistants that run securely and independently in their cloud environment.</p>



<p>With the core architecture in place, you can add more features to improve the capabilities and robustness of your agentic RAG system:</p>



<ul class="wp-block-list">
<li>Web search</li>



<li>Images with OCR</li>



<li>Audio files transcribed using the Whisper model</li>
</ul>



<p>This delivers an extensive knowledge base and a wider variety of use cases!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4" length="11190376" type="video/mp4" />
<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4" length="9881210" type="video/mp4" />

			</item>
		<item>
		<title>Safety first: Detect harmful texts using an AI safeguard agent</title>
		<link>https://blog.ovhcloud.com/safety-first-detect-harmful-texts-using-an-ai-safeguard-agent/</link>
		
		<dc:creator><![CDATA[Alexandre Movsessian]]></dc:creator>
		<pubDate>Thu, 22 Jan 2026 10:46:11 +0000</pubDate>
				<category><![CDATA[Deploy & Scale]]></category>
		<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30185</guid>

					<description><![CDATA[This article explains how to use the Qwen 3 Guard safeguard models provided by OVHCloud. Using this guide, you can analyse and moderate texts for LLM applications, chat platforms, customer support systems, or any other text-based services requiring safe and compliant interactions. Our focus will be on written content, such as conversations or plain text. [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsafety-first-detect-harmful-texts-using-an-ai-safeguard-agent%2F&amp;action_name=Safety%20first%3A%20Detect%20harmful%20texts%20using%20an%20AI%20safeguard%20agent&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="981" height="463" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image.png" alt="" class="wp-image-30187" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image.png 981w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-300x142.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-768x362.png 768w" sizes="auto, (max-width: 981px) 100vw, 981px" /></figure>



<p class="has-text-align-left"><strong>This article explains how to use the Qwen 3 Guard safeguard models provided by OVHcloud.</strong></p>



<p>Using this guide, you can analyse and moderate texts for LLM applications, chat platforms, customer support systems, or any other text-based services requiring safe and compliant interactions.</p>



<p>Our focus will be on written content, such as conversations or plain text. Although image moderators exist, they won’t be covered here.</p>



<h2 class="wp-block-heading"><strong>Introduction</strong></h2>



<p>As <strong>Large Language Models</strong> (LLMs) continue to grow, access to information has become more seamless, but this ease of access makes it easier to generate, and be exposed to, harmful or toxic content.</p>



<p>LLMs can be prompted with malicious queries (e.g., “How do I make a bomb?”) and some models might comply by generating potentially dangerous responses. This risk is particularly concerning given the widespread availability of LLMs, to both minors and malicious actors alike.</p>



<p>To combat this, LLM providers train their models to reject toxic prompts, and integrate safety features to prevent the creation of harmful content. Even so, users often craft ‘<strong>jailbreaks</strong>’, which are specific prompts designed to get around these safety measures.</p>



<p>As a result, providers have created <strong>specialised safeguard models</strong> to find and remove toxic content in writing.</p>



<h2 class="wp-block-heading">What is toxicity?</h2>



<p>Toxicity is inherently difficult to define, as perceptions vary depending on factors such as individual sensitivity, cultural background, age, and personal experience.</p>



<p>Perceptions of content can vary widely. For example, some users may find certain jokes offensive, while others consider them perfectly acceptable. Similarly, roleplaying with an AI chat may be enjoyable for some, yet deemed inappropriate by others depending on the context.</p>



<p>Furthermore, each moderation system focuses on different categories of harmful content, based on the specific data and instructions it was trained on. For instance, models developed in the United States tend to be highly sensitive to hate speech, political content, and other related categories.</p>



<p>Because jailbreak attempts are a fairly new issue, existing moderation models often fail to address them.</p>



<p>Below are the toxicity categories for the Qwen 3 Guard models:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>Name</strong></td><td><strong>Description</strong></td></tr><tr><td><em>Violent</em></td><td>Content that provides detailed instructions, methods, or advice on how to commit acts of violence, including the manufacture, acquisition, or use of weapons. Also includes depictions of violence.</td></tr><tr><td><em>Nonviolent illegal acts</em></td><td>Content providing guidance or advice for nonviolent criminal activities like hacking, unauthorised drug manufacturing, or theft.</td></tr><tr><td><em>Sexual content or sexual acts</em></td><td>Content with sexual depictions, references, or descriptions of people. Also includes content with explicit sexual imagery, references, or descriptions of illegal or unethical sexual acts, such as rape, bestiality, incest, and sexual slavery.</td></tr><tr><td><em>Personally identifiable information</em></td><td>Content that shares or discloses sensitive personal identifying information, without authorisation, such as name, ID number, address, phone number, medical records, financial details, and account passwords, etc.</td></tr><tr><td><em>Suicide &amp; self-harm</em></td><td>Content advocating, directly encouraging, or detailing methods for self-harm, suicide, or dangerous activities that could lead to serious injury or death.</td></tr><tr><td><em>Unethical acts</em></td><td>Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotype, injustice, hate speech, offensive language, harassment, insults, threat, defamation, extremism, misinformation regarding ethics, and other behaviours that, while not illegal, are still considered unethical.</td></tr><tr><td><em>Politically sensitive topics</em></td><td>The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.</td></tr><tr><td><em>Copyright violation</em></td><td>Content that includes unauthorised reproduction, distribution, public display, or derivative use of copyrighted materials, such as novels, scripts, lyrics, and other legally protected creative works, without the copyright holder’s clear consent.</td></tr><tr><td><em>Jailbreak</em></td><td>Content that explicitly attempts to override the model&#8217;s system prompt or model conditioning.</td></tr></tbody></table></figure>



<p>These categories are <strong>not mutually exclusive</strong>. A text may very well contain both <em>Unethical Acts</em> and <em>Violent</em> content, for example. Most notably, jailbreaks often embed another kind of toxic query, since they are designed to bypass security guardrails. The Qwen 3 Guard moderator, however, will only return one category.</p>



<p>These categories were chosen by the Qwen 3 Guard creators; they can&#8217;t be changed, but <strong>you may choose to ignore some</strong> depending on your use case.</p>
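<p>As an illustration of how ignoring a category could look downstream, here is a minimal Python sketch; the <code>should_block</code> helper and the set of ignored categories are our own illustrative choices, not part of the model&#8217;s API:</p>



<pre class="wp-block-code"><code class=""># Hypothetical post-filter: only block a text when the moderator flagged it
# with a category you actually want to enforce for your use case.
IGNORED_CATEGORIES = {"Politically Sensitive Topics"}  # example choice

def should_block(safety, categories):
    # safety is the label returned by the moderator ("Safe", "Controversial",
    # "Unsafe"); categories is the list of detected category names
    if safety == "Safe":
        return False
    # Block only if at least one detected category is not in the ignore set
    return any(c not in IGNORED_CATEGORIES for c in categories)</code></pre>



<p>With this sketch, <code>should_block("Unsafe", ["Violent"])</code> blocks the text, while a verdict containing only ignored categories lets it through.</p>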



<h1 class="wp-block-heading">Metrics</h1>



<p><em>Attack</em>: An attack refers to any attempt to produce harmful or toxic content. This is either a prompt crafted to make an LLM generate harmful output, or just a user’s toxic message in a chat system.</p>



<p><em>Attack Success Rates (ASR)</em>: This is a metric used to assess the effectiveness of a moderation system. It represents the <strong>proportion of attacks that successfully bypass the moderator</strong> and go undetected. A lower ASR indicates a more robust moderation system.</p>



<p><em>False positive</em>: A false positive occurs when benign, nontoxic content is incorrectly flagged as harmful by the moderator.</p>



<p><em>False Positive Rate (FPR)</em>: The FPR measures how often a moderation system misclassifies safe content as toxic. It complements the ASR by reflecting the <strong>model’s ability to correctly allow harmless content through</strong>. A lower FPR indicates better reliability.</p>
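<p>Both metrics are simple ratios over a labelled evaluation set. As a quick sketch (the helper functions and the sample counts below are our own, assuming you already know which samples were attacks and which were flagged):</p>



<pre class="wp-block-code"><code class="">def asr(missed_attacks, total_attacks):
    # Attack Success Rate: share of attacks the moderator failed to flag
    return missed_attacks / total_attacks

def fpr(flagged_benign, total_benign):
    # False Positive Rate: share of benign samples wrongly flagged as toxic
    return flagged_benign / total_benign

# Hypothetical evaluation: 1,000 attacks with 200 undetected,
# and 1,000 benign texts with 60 wrongly flagged
print(asr(200, 1000))  # 0.2
print(fpr(60, 1000))   # 0.06</code></pre>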



<h2 class="wp-block-heading">Qwen 3 Guard</h2>



<p>Qwen 3 Guard was launched in October 2025 by Qwen, Alibaba&#8217;s AI team. After extensive testing and evaluation, we found this model to be the most effective in safeguarding content.</p>



<p>Besides being efficient, Qwen 3 Guard can detect toxicity across nine categories, including jailbreak attempts, a feature that isn’t common in safeguard models.</p>



<p>It also provides explanations by specifying the exact category detected.</p>



<h3 class="wp-block-heading">Specs</h3>



<ul class="wp-block-list">
<li>Base model: Qwen 3</li>



<li>Flavours: 0.6B, 4B, 8B</li>



<li>Context size: 32,768 tokens</li>



<li>Languages: English, French and 117 other languages and dialects</li>



<li>Tasks:<ul><li>Detection of toxicity in raw text</li><li>Detection of toxicity in LLM dialogue</li><li>Detection of answer refusal (LLM dialogue only)</li><li>Classification of toxicity</li></ul></li>
</ul>



<h3 class="wp-block-heading">Availability</h3>



<p><a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog</a></p>



<p>There are two flavours of Qwen 3 Guard available on OVHcloud:</p>



<p><strong><em>Qwen 3 Guard 0.6B</em></strong>: This lightweight model is very effective at detecting overt toxic content.</p>



<p><strong><em>Qwen 3 Guard 8B</em></strong>: This heavier model comes in handy when confronted with more nuanced examples.</p>



<h3 class="wp-block-heading">Scores</h3>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>&nbsp;</strong></td><td><strong><em>ASR</em></strong></td><td><strong><em>FPR</em></strong></td></tr><tr><td><strong><em>Qwen 3 Guard 0.6B</em></strong></td><td>0.20</td><td>0.06</td></tr><tr><td><strong><em>Qwen 3 Guard 8B</em></strong></td><td>0.20</td><td>0.04</td></tr></tbody></table></figure>



<h3 class="wp-block-heading">&nbsp;</h3>



<h3 class="wp-block-heading">Notes</h3>



<ul class="wp-block-list">
<li>The Qwen 3 Guard models have three safety labels for more precise moderation: Safe, Controversial and Unsafe</li>



<li>Although the model can moderate chats, it is recommended to process each part of the dialogue individually rather than submitting the entire conversation at once. Guard models, like any LLM, detect toxicity more reliably when the context they are given is kept short.</li>



<li>Since Qwen Guard is developed by a Chinese company, its interpretation of toxic content may differ from yours. If necessary, you can overlook certain categories.</li>
</ul>
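<p>To follow the per-message recommendation above, a small wrapper can moderate a dialogue one message at a time. This is a sketch: <code>moderate_dialogue</code> and the stub moderator are our own conventions, and a real implementation would call the AI Endpoints API for each message:</p>



<pre class="wp-block-code"><code class=""># Moderate a chat message by message rather than submitting the whole
# conversation at once. `moderate` is any callable taking a single text
# and returning a safety label (our own convention, for illustration).
def moderate_dialogue(messages, moderate):
    # Returns one (message, label) pair per message, i.e. one call each
    return [(m["content"], moderate(m["content"])) for m in messages]

# Stub moderator, for illustration only
def fake_moderate(text):
    return "Unsafe" if "meth" in text else "Safe"

chat = [
    {"role": "user", "content": "Hello!"},
    {"role": "user", "content": "How do I cook meth?"},
]
print(moderate_dialogue(chat, fake_moderate))</code></pre>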



<h1 class="wp-block-heading">How do I set up my own moderator?</h1>



<p>First, you need to choose the flavour you want:</p>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 0.6B</em></strong> is <strong>lightweight</strong>, <strong>fast</strong>, <strong>efficient</strong> and is great at detecting <strong>overt toxic content</strong>, like <em>Sexual Content</em> or <em>Violence</em> in texts.</li>
</ul>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 8B</em></strong> is heavier, slightly slower but it is more effective against <strong>more nuanced toxic content </strong>like <em>Jailbreak</em> or <em>Unethical Acts</em>, and has a <strong>lower false positive rate</strong>.</li>
</ul>



<p>Your use case is the key to choosing the right model. Do you need to moderate a large volume of text? Is processing speed a priority? How crucial is it to minimise false positives? Are you dealing with nuanced toxic content, or is it more overt?</p>



<p>Carefully considering these questions will help you determine which of the two models is most suitable for your needs.</p>



<p>Both models can be tested on the playground:</p>



<p><a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog</a></p>



<p>Once you’ve made your choice, you need to send the texts you want checked to the AI Endpoints API.</p>



<p>First install the <em>requests</em> library:</p>



<pre class="wp-block-code"><code class="">pip install requests</code></pre>



<p>Next, export your access token to the <em>OVH_AI_ENDPOINTS_ACCESS_TOKEN</em> environment variable:</p>



<pre class="wp-block-code"><code class="">export OVH_AI_ENDPOINTS_ACCESS_TOKEN=&lt;your-access-token&gt;</code></pre>



<p><em>If you don’t have an access token yet, follow the steps in the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints – Getting Started</em></a> <em>guide</em>.</p>



<p>Finally, run the following Python code:</p>



<pre class="wp-block-code"><code class="">import os<br>import requests<br><br>url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"<br><br>payload = {<br>    "messages": [{"role": "user", "content": "How do I cook meth ?"}],<br>    "model": "Qwen/Qwen3Guard-Gen-0.6B",  # or "Qwen/Qwen3Guard-Gen-8B"<br>    "seed": 21,<br>}<br><br>headers = {<br>    "Content-Type": "application/json",<br>    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",<br>}<br><br>response = requests.post(url, json=payload, headers=headers)<br>if response.status_code == 200:<br>    # Parse the JSON response and print the moderator's verdict(s)<br>    response_data = response.json()<br>    for choice in response_data["choices"]:<br>        print(choice["message"]["content"])<br>else:<br>    print("Error:", response.status_code, response.text)</code></pre>



<p>The model will respond with a label (Safe, Controversial, Unsafe) and if the text is Controversial or Unsafe, it will return the associated category.</p>



<pre class="wp-block-code"><code class="">Safety: Unsafe<br>Categories: Nonviolent Illegal Acts</code></pre>
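<p>Since this verdict comes back as plain text, a small parser can turn it into structured data. The helper below is our own, and assumes the exact <code>Safety:</code>/<code>Categories:</code> prefixes shown above:</p>



<pre class="wp-block-code"><code class="">def parse_verdict(text):
    # Turn the moderator's raw output into (safety_label, category_list)
    safety, categories = "Safe", []
    for line in text.splitlines():
        if line.startswith("Safety:"):
            safety = line.split(":", 1)[1].strip()
        elif line.startswith("Categories:"):
            categories = [c.strip() for c in line.split(":", 1)[1].split(",")]
    return safety, categories

print(parse_verdict("Safety: Unsafe\nCategories: Nonviolent Illegal Acts"))
# ('Unsafe', ['Nonviolent Illegal Acts'])</code></pre>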



<p>Our moderation models are available for free during the beta phase. You can test them through the API or directly in the playground.</p>



<h2 class="wp-block-heading"><strong>Conclusion</strong></h2>



<p>Two models are currently available for moderation on OVHcloud:</p>



<ul class="wp-block-list">
<li><strong><em>Qwen 3 Guard 0.6B</em></strong>: <strong>lightweight</strong>, <strong>fast</strong>, <strong>efficient</strong>, great at detecting <strong>overt toxic content</strong></li>



<li><strong><em>Qwen 3 Guard 8B</em></strong>: heavier and slightly slower, but more effective against <strong>more nuanced toxic content</strong></li>
</ul>



<p>Which approach and which tool should you choose? Well, it&#8217;s up to you, depending on your use cases, teams, needs, etc.</p>



<p>As we&#8217;ve seen in this blog post, OVHcloud AI Endpoints users can start using these models right away, safely and free of charge.</p>



<p>They are still in beta for now, so we&#8217;d appreciate your feedback!</p>



<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fsafety-first-detect-harmful-texts-using-an-ai-safeguard-agent%2F&amp;action_name=Safety%20first%3A%20Detect%20harmful%20texts%20using%20an%20AI%20safeguard%20agent&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Moving Beyond Ingress: Why should OVHcloud Managed Kubernetes Service (MKS) users start looking at the Gateway API?</title>
		<link>https://blog.ovhcloud.com/moving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api/</link>
		
		<dc:creator><![CDATA[Aurélie Vache&nbsp;and&nbsp;Antonin Anchisi]]></dc:creator>
		<pubDate>Mon, 15 Dec 2025 09:26:36 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[OVHcloud Managed Kubernetes]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30016</guid>

					<description><![CDATA[For years, the Kubernetes Ingress API, and the popular Ingress NGINX controller (ingress-nginx), have been the default way to expose applications running inside a Kubernetes cluster. But the ecosystem is changing: the Kubernetes SIG network has announced the retirement of Ingress NGINX in March 2026. After March 2026 the Ingress NGINX will no longer get [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmoving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api%2F&amp;action_name=Moving%20Beyond%20Ingress%3A%20Why%20should%20OVHcloud%20Managed%20Kubernetes%20Service%20%28MKS%29%20users%20start%20looking%20at%20the%20Gateway%20API%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="680" src="https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631-1024x680.png" alt="" class="wp-image-30084" style="width:669px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631-1024x680.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631-300x199.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/Gribouillis-2025-12-02-13.47.59.631.png 1505w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For years, the Kubernetes <strong>Ingress</strong> API, and the popular Ingress NGINX controller (ingress-nginx), have been the default way to expose applications running inside a Kubernetes cluster.</p>



<p>But the ecosystem is changing: the Kubernetes SIG network has announced the <a href="https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">retirement of Ingress NGINX</a> in March 2026.</p>



<p>After <strong>March 2026</strong>, Ingress NGINX will no longer receive new features, releases, security patches or bug fixes.</p>



<p>Furthermore, the <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Kubernetes project <strong>recommends using Gateway instead of Ingress</strong></a>.</p>



<p>The Ingress API has already been frozen: it is no longer being developed and will receive no further changes or updates. However, the Kubernetes project has no plans to remove Ingress from Kubernetes.</p>



<p>While OVHcloud Managed Kubernetes Service (MKS) does not yet provide a native <strong>GatewayClass</strong>, you can already benefit from Gateway API capabilities today by deploying your own controller 💪 .</p>



<p>Also, until Gateway API becomes fully integrated with OpenStack providers, there is an <strong>intermediate option</strong>: using a <strong>modern, actively maintained Ingress controller</strong> other than ingress-nginx.</p>



<h3 class="wp-block-heading">The limitations of the current Ingress controller model</h3>



<p>The traditional Kubernetes Ingress model was intentionally simple: define an <code>Ingress</code>, install an <code>Ingress Controller</code>, and let it configure a single proxy (usually Nginx) to route traffic.</p>



<p>This design works, but it comes with limitations:</p>



<p>&#8211; Single monolithic entry point: all HTTP routing for the entire cluster goes through <strong>one shared proxy</strong>, which adds complexity, configuration conflicts and scaling challenges.<br>&#8211; Protocol limitations: only <strong>HTTP and HTTPS</strong>. Support for gRPC, HTTP/2, TCP, UDP or TLS passthrough is inconsistent and controller-specific.<br>&#8211; Heavy reliance on annotations: advanced features (timeouts, rewrites, header handling&#8230;) rely on custom, non-portable annotations.<br>&#8211; Fragmented third-party and cloud Load Balancer support: every <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/#additional-controllers" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ingress controller</a> comes with its own specialised annotations.</p>



<p>Finally, as mentioned, the most used Ingress controller, Ingress NGINX, will be retired in March 2026.</p>



<h3 class="wp-block-heading">A Transitional Solution: Using a Modern Ingress Controller (Traefik, Contour, HAProxy…)</h3>



<p>Before moving to the Gateway API, as a transitional solution, OVHcloud MKS users can simply replace Ingress Nginx with a <strong>modern, actively maintained Ingress controller</strong>.</p>



<p>This allows you to:</p>



<p>&#8211; keep using your existing <code>Ingress</code> manifests<br>&#8211; keep the same architecture: Service type LoadBalancer → OVHcloud Public Cloud Load Balancer → Ingress Controller<br>&#8211; avoid relying on unsupported or deprecated components<br>&#8211; gain features (better gRPC support, built‑in dashboards, improved L7 behaviour&#8230;)</p>



<h4 class="wp-block-heading">Popular alternatives:</h4>



<p><a href="https://doc.traefik.io/traefik/providers/kubernetes-ingress/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>Traefik</strong></a>:<br>&#8211; Very easy to deploy<br>&#8211; Excellent support for HTTP/2, gRPC, WebSockets<br>&#8211; Built‑in dashboard<br>&#8211; Supports both Ingress and Gateway API<br>&#8211; Actively maintained<br>&#8211; Seamless migration from NGINX Ingress Controller to Traefik with <a href="https://doc.traefik.io/traefik/reference/routing-configuration/kubernetes/ingress-nginx/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">NGINX annotation compatibility</a></p>



<p><strong><a href="https://projectcontour.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Contour</a> (Envoy)</strong>:<br>&#8211; Envoy-based Ingress Controller<br>&#8211; Excellent performance<br>&#8211; Good stepping‑stone toward Gateway API</p>



<p><a href="https://www.haproxy.com/documentation/kubernetes-ingress/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>HAProxy Ingress</strong></a>:<br>&#8211; Extremely performant<br>&#8211; Enterprise-grade L7 routing<br>&#8211; Optional Gateway API support</p>



<p><strong><a href="https://docs.nginx.com/nginx-gateway-fabric/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">NGINX Gateway Fabric</a> (NGF)</strong>:<br>&#8211; The successor to Ingress NGINX<br>&#8211; Built directly around Gateway API<br>&#8211; Still maturing but a strong long‑term candidate</p>



<p>If you are interested, you can read a more <a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">exhaustive list of Ingress controllers</a>.</p>



<h3 class="wp-block-heading">Installing an Alternative Ingress Controller on OVHcloud MKS</h3>



<p>We will show you how to install <strong>Traefik</strong>, as an alternative Ingress controller and use it to spawn a single OVHcloud Public Cloud Load Balancer (based on OpenStack Octavia).</p>



<p>Install Traefik:</p>



<pre class="wp-block-code"><code class="">helm repo add traefik https://traefik.github.io/charts<br>helm repo update<br><br>helm install traefik traefik/traefik --namespace traefik --create-namespace --set service.type=LoadBalancer</code></pre>



<p>This automatically triggers:<br>&#8211; the OpenStack CCM (used by OVHcloud)<br>&#8211; the creation of an OVHcloud Public Cloud Load Balancer<br>&#8211; exposure of Traefik through a public IP</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="179" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-1024x179.png" alt="" class="wp-image-30035" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-1024x179.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-300x52.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-768x134.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-1536x268.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-11-2048x358.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>After several seconds, the Load Balancer will be active.</p>



<p>Check that Traefik is running:</p>



<pre class="wp-block-code"><code class="">$ kubectl get all -n traefik<br>NAME                           READY   STATUS    RESTARTS   AGE<br>pod/traefik-6777c5db85-pddd6   1/1     Running   0          31s<br><br>NAME              TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE<br>service/traefik   LoadBalancer   10.3.129.188   &lt;pending&gt;     80:30267/TCP,443:30417/TCP   31s<br><br>NAME                      READY   UP-TO-DATE   AVAILABLE   AGE<br>deployment.apps/traefik   1/1     1            1           31s<br><br>NAME                                 DESIRED   CURRENT   READY   AGE<br>replicaset.apps/traefik-6777c5db85   1         1         1       31s</code></pre>



<p>Then in order to use it, create an <code>ingress.yaml</code> file with the following content:</p>



<pre class="wp-block-code"><code class="">apiVersion: networking.k8s.io/v1<br>kind: Ingress<br>metadata:<br>  name: my-app-ingress<br>  namespace: default<br>spec:<br>  ingressClassName: traefik  # specifies Traefik as the ingress controller (the kubernetes.io/ingress.class annotation is deprecated)<br>  rules:<br>    - host: my-app.local<br>      http:<br>        paths:<br>          - path: /<br>            pathType: Prefix<br>            backend:<br>              service:<br>                name: my-app-service<br>                port:<br>                  number: 80</code></pre>



<p>And apply it in your cluster:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f ingress.yaml</code></pre>



<p>Using this type of alternative provides a <strong>fully supported, modern Ingress Controller</strong> while you prepare a long‑term transition to the Gateway API.</p>



<h3 class="wp-block-heading">Gateway API: A modern, flexible networking model</h3>



<p>The <strong>Gateway API</strong> is the next-generation Kubernetes networking specification. It introduces clearer roles and more flexible architectures.</p>



<p>Gateway API splits responsibilities across:<br>&#8211; <strong>GatewayClass</strong>: defines the type of gateway and which controller manages it<br>&#8211; <strong>Gateway</strong>: the actual entry point (e.g., a Load Balancer)<br>&#8211; <strong>Routes</strong>: routing rules, protocol-specific (HTTPRoute, TLSRoute, GRPCRoute, TCPRoute…)</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" width="800" height="700" src="https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1.png" alt="" class="wp-image-30065" style="width:558px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1.png 800w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1-300x263.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/12/image-1-768x672.png 768w" sizes="auto, (max-width: 800px) 100vw, 800px" /></figure>



<p>Gateway API supports:<br>&#8211; HTTP(S)<br>&#8211; HTTP/2<br>&#8211; gRPC<br>&#8211; TCP<br>&#8211; TLS passthrough<br>…in a consistent and portable way.</p>



<p>Unlike Ingress, Gateway API is explicitly designed to allow providers like OVHcloud, AWS, GCP, Azure to:<br>&#8211; provision Load Balancers (LB)<br>&#8211; manage listeners<br>&#8211; expose multiple ports<br>&#8211; integrate with their LB features</p>



<p>This paves the way for native OVHcloud <strong>GatewayClass</strong> support.</p>



<h3 class="wp-block-heading">How does it work today on OVHcloud MKS?</h3>



<p>OVHcloud MKS relies on the OpenStack Cloud Controller Manager (CCM) to provision OVHcloud <strong>Public Cloud</strong> Load Balancers in response to a Service of type <code>LoadBalancer</code>.</p>



<p>Since MKS does not yet include a native <code>GatewayClass</code>, you can use Gateway API today as follows:</p>



<p>1. You deploy an existing Gateway Controller (Envoy Gateway, Traefik, Contour/Envoy…) and its GatewayClass.<br>2. The controller deploys a Data Plane proxy inside the cluster.<br>3. To expose that proxy, you still have to create a <code>Service</code> of type <strong>LoadBalancer</strong> (and your app of course).<br>4. The CCM provisions an OVHcloud Public Cloud Load Balancer and forwards traffic to your proxy.</p>



<p>Thanks to that, you will have a fully functional Gateway API setup. The workflow is very similar to the one required for the NGINX Ingress controller.</p>



<h3 class="wp-block-heading">Using the Gateway API on OVHcloud MKS today</h3>



<p>You can already use the Gateway API by deploying your preferred controller.</p>



<p>Here’s an example using <a href="https://gateway.envoyproxy.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Envoy Gateway</a>, one of the most future-proof options.</p>



<p>Install Gateway API CRDs:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml</code></pre>



<p>Deploy Envoy Gateway:</p>



<pre class="wp-block-code"><code class="">helm install eg oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace</code></pre>



<p>You should have a result like this:</p>



<pre class="wp-block-code"><code class="">$ helm install eg oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace<br><br>Pulled: docker.io/envoyproxy/gateway-helm:1.6.0<br>Digest: sha256:5c55e7844ae8cff3152ca00330234ef61b1f9fa3d466f50db2c63a279f1cd1df<br>NAME: eg<br>LAST DEPLOYED: Mon Dec  1 16:27:07 2025<br>NAMESPACE: envoy-gateway-system<br>STATUS: deployed<br>REVISION: 1<br>TEST SUITE: None<br>NOTES:<br>**************************************************************************<br>*** PLEASE BE PATIENT: Envoy Gateway may take a few minutes to install ***<br>**************************************************************************<br><br>Envoy Gateway is an open source project for managing Envoy Proxy as a standalone or Kubernetes-based application gateway.<br><br>Thank you for installing Envoy Gateway! 🎉<br><br>Your release is named: eg. 🎉<br><br>Your release is in namespace: envoy-gateway-system. 🎉<br><br>To learn more about the release, try:<br><br>  $ helm status eg -n envoy-gateway-system<br>  $ helm get all eg -n envoy-gateway-system<br><br>To have a quickstart of Envoy Gateway, please refer to https://gateway.envoyproxy.io/latest/tasks/quickstart.<br><br>To get more details, please visit https://gateway.envoyproxy.io and https://github.com/envoyproxy/gateway.</code></pre>



<p>Check the Envoy gateway is running:</p>



<pre class="wp-block-code"><code class="">$ kubectl get po -n envoy-gateway-system<br>NAME                            READY   STATUS    RESTARTS   AGE<br>envoy-gateway-9cbbc577c-5h5qw   1/1     Running   0          16m</code></pre>



<p>As a quickstart, you can directly install the <a href="https://gateway-api.sigs.k8s.io/api-types/gatewayclass/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GatewayClass</a>, <a href="https://gateway-api.sigs.k8s.io/api-types/gateway/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gateway</a>, <a href="https://gateway-api.sigs.k8s.io/api-types/httproute/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">HTTPRoute</a> and an example app:</p>



<pre class="wp-block-code"><code class="">kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/latest/quickstart.yaml -n default</code></pre>



<p>This command deploys a <code>GatewayClass</code>, a <code>Gateway</code>, an <code>HTTPRoute</code> and an app packaged in a Deployment and exposed through a Service:</p>



<pre class="wp-block-code"><code class="">gatewayclass.gateway.networking.k8s.io/eg created<br>gateway.gateway.networking.k8s.io/eg created<br>serviceaccount/backend created<br>service/backend created<br>deployment.apps/backend created<br>httproute.gateway.networking.k8s.io/backend created</code></pre>



<p>As you can see, a GatewayClass has been deployed:</p>



<pre class="wp-block-code"><code class="">$ kubectl get gatewayclass -o yaml | kubectl neat<br>apiVersion: v1<br>items:<br>- apiVersion: gateway.networking.k8s.io/v1<br>  kind: GatewayClass<br>  metadata:<br>    name: eg<br>  spec:<br>    controllerName: gateway.envoyproxy.io/gatewayclass-controller<br>kind: List<br>metadata:<br>  resourceVersion: ""</code></pre>



<p>Note that a GatewayClass is a cluster-wide resource so you don&#8217;t have to specify any namespace.</p>



<p>A Gateway has also been deployed:</p>



<pre class="wp-block-code"><code class="">$ kubectl get gateway -o yaml -n default | kubectl neat<br>apiVersion: v1<br>items:<br>- apiVersion: gateway.networking.k8s.io/v1<br>  kind: Gateway<br>  metadata:<br>    name: eg<br>    namespace: default<br>  spec:<br>    gatewayClassName: eg<br>    listeners:<br>    - allowedRoutes:<br>        namespaces:<br>          from: Same<br>      name: http<br>      port: 80<br>      protocol: HTTP<br>kind: List<br>metadata:<br>  resourceVersion: ""</code></pre>



<p>An HTTPRoute as well:</p>



<pre class="wp-block-code"><code class="">$ kubectl get httproute -o yaml -n default | kubectl neat<br>apiVersion: v1<br>items:<br>- apiVersion: gateway.networking.k8s.io/v1<br>  kind: HTTPRoute<br>  metadata:<br>    name: backend<br>    namespace: default<br>  spec:<br>    hostnames:<br>    - www.example.com<br>    parentRefs:<br>    - group: gateway.networking.k8s.io<br>      kind: Gateway<br>      name: eg<br>    rules:<br>    - backendRefs:<br>      - group: ""<br>        kind: Service<br>        name: backend<br>        port: 3000<br>        weight: 1<br>      matches:<br>      - path:<br>          type: PathPrefix<br>          value: /<br>kind: List<br>metadata:<br>  resourceVersion: ""</code></pre>



<p>In order to retrieve the external IP (of the external Load Balancer), you just have to get information about the Gateway and export it to an environment variable:</p>



<pre class="wp-block-code"><code class="">$ kubectl get gateway eg<br>NAME   CLASS   ADDRESS        PROGRAMMED   AGE<br>eg     eg      xx.xxx.xx.xxx   True        18m<br><br>$ export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')<br><br>$ echo $GATEWAY_HOST<br>xx.xxx.xx.xxx</code></pre>



<p>And finally, a <code>backend</code> service has been deployed along with its deployment:</p>



<pre class="wp-block-code"><code class="">$ kubectl get pod,svc -l app=backend -n default<br>NAME                           READY   STATUS    RESTARTS   AGE<br>pod/backend-765694d47f-zr6hh   1/1     Running   0          21m<br><br>NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE<br>service/backend   ClusterIP   10.3.114.179   &lt;none&gt;        3000/TCP   21m</code></pre>



<p>In order to create your own <code>Gateway</code> and <code>*Route</code> resources, don&#8217;t hesitate to take a look at the <a href="https://gateway-api.sigs.k8s.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gateway API website</a>.</p>
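<p>As a starting point, a minimal <code>Gateway</code> plus <code>HTTPRoute</code> pair reusing the <code>eg</code> GatewayClass from the quickstart could look like this; the names, hostname and backend Service below are illustrative, not prescribed:</p>



<pre class="wp-block-code"><code class="">apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: default
spec:
  gatewayClassName: eg
  listeners:
  - name: http
    port: 80
    protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-route
  namespace: default
spec:
  parentRefs:
  - name: my-gateway
  hostnames:
  - app.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: my-app-service
      port: 80</code></pre>



<p>Save it to a file and apply it with <code>kubectl apply -f</code>, then check the Gateway status as shown above.</p>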



<h3 class="wp-block-heading">Conclusion</h3>



<p>Two migration paths are currently available for OVHcloud MKS users:</p>



<ul class="wp-block-list">
<li>Short-term: switch to a modern Ingress Controller (Traefik, Contour, HAProxy, NGF&#8230;). It provides full support for current Ingress usage, without requiring API changes.</li>



<li>Long-term: adopt the Gateway API. Gateway API brings multi‑protocol support, clearer separation of roles, and is the strategic direction of Kubernetes networking.</li>
</ul>



<p>Which approach and which tool should you choose? Well, it’s up to you, depending on your use cases, your teams, your needs… 🙂</p>



<p>As we have seen in this blog post, OVHcloud MKS users can begin adopting these technologies today, safely and incrementally.</p>



<p>This ecosystem is evolving quickly, so stay tuned to find out about the coming release of a pre-installed official GatewayClass (based on OpenStack Octavia) 💪.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmoving-beyond-ingress-why-should-ovhcloud-managed-kubernetes-service-mks-users-start-looking-at-the-gateway-api%2F&amp;action_name=Moving%20Beyond%20Ingress%3A%20Why%20should%20OVHcloud%20Managed%20Kubernetes%20Service%20%28MKS%29%20users%20start%20looking%20at%20the%20Gateway%20API%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
