<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI Deploy Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/ai-deploy/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/ai-deploy/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Tue, 10 Feb 2026 08:51:12 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>AI Deploy Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/ai-deploy/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Reference Architecture: Custom metric autoscaling for LLM inference with vLLM on OVHcloud AI Deploy and observability using MKS</title>
		<link>https://blog.ovhcloud.com/reference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 10 Feb 2026 08:51:11 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[MKS]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=30203</guid>

					<description><![CDATA[Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability. This reference architecture describes a comprehensive solution for deploying, autoscaling and monitoring vLLM-based LLM workloads on OVHcloud infrastructure. It combines AI Deploy, used for model serving with custom metric autoscaling, and Managed Kubernetes Service (MKS), which [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em><strong>Take your LLM (Large Language Model) deployment to production level with comprehensive custom autoscaling configuration and advanced vLLM metrics observability.</strong></em></p>



<figure class="wp-block-image aligncenter size-large"><img fetchpriority="high" decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg" alt="" class="wp-image-30579" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/3.jpg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability based on OVHcloud infrastructure</em></figcaption></figure>



<p>This reference architecture describes a comprehensive solution for <strong>deploying, autoscaling and monitoring vLLM-based LLM workloads</strong> on OVHcloud infrastructure. It combines <strong>AI Deploy</strong>, used for <strong>model serving with custom metric autoscaling</strong>, and <strong>Managed Kubernetes Service (MKS)</strong>, which hosts the monitoring and observability stack.</p>



<p>By leveraging <strong>application-level Prometheus metrics exposed by vLLM</strong>, AI Deploy can automatically scale inference replicas based on real workload demand, ensuring <strong>high availability, consistent performance under load and efficient GPU utilisation</strong>. This autoscaling mechanism allows the platform to react dynamically to traffic spikes while maintaining predictable latency for end users.</p>



<p>On top of this scalable inference layer, the monitoring architecture provides <strong>observability</strong> through <strong>Prometheus</strong>, <strong>Grafana</strong> and Alertmanager. It enables real-time performance monitoring, capacity planning, and operational insights, while ensuring <strong>full data sovereignty</strong> for organisations running Large Language Models (LLMs) in production environments.</p>



<p><strong>What are the key benefits</strong>?</p>



<ul class="wp-block-list">
<li><strong>Cost-effective</strong>: Leverage managed services to minimise operational overhead</li>



<li><strong>Real-time observability</strong>: Track Time-to-First-Token (TTFT), throughput, and resource utilisation</li>



<li><strong>Sovereign infrastructure</strong>: All metrics and data remain within European datacentres</li>



<li><strong>Production-ready</strong>: Persistent storage, high availability, and automated monitoring</li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>OVHcloud AI Deploy is a<strong>&nbsp;Container as a Service</strong>&nbsp;(CaaS) platform designed to help you deploy, manage and scale AI models. It allows you to optimally deploy applications and APIs based on Machine Learning (ML), Deep Learning (DL) or Large Language Models (LLMs).</p>



<p><strong>Key points to keep in mind</strong>:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong>&nbsp;Bring your own custom Docker image and deploy it with a single command line or in a few clicks</li>



<li><strong>High-performance computing:</strong>&nbsp;A complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong>&nbsp;Supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong>&nbsp;Billing per minute, no surcharges</li>
</ul>



<h3 class="wp-block-heading">Managed Kubernetes Service</h3>



<p><strong>OVHcloud MKS</strong> is a fully managed Kubernetes platform designed to help you deploy, operate, and scale containerised applications in production. It provides a secure and reliable Kubernetes environment without the operational overhead of managing the control plane.</p>



<p><strong>What should you keep in mind?</strong></p>



<ul class="wp-block-list">
<li><strong>Cost-efficient</strong>: Only pay for worker nodes and consumed resources, with no additional charge for the Kubernetes control plane</li>



<li><strong>Fully managed Kubernetes</strong>: Certified upstream Kubernetes with automated control plane management, upgrades and high availability</li>



<li><strong>Production-ready by design</strong>: Built-in integrations with OVHcloud Load Balancers, networking and persistent storage</li>



<li><strong>Scalability and flexibility</strong>: Easily scale workloads and node pools to match application demand</li>



<li><strong>Open and portable</strong>: Based on standard Kubernetes APIs, enabling seamless integration with open-source ecosystems and avoiding vendor lock-in</li>
</ul>



<p>In the following guide, all services are deployed within the&nbsp;<strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Overview of the architecture</h2>



<p>This reference architecture describes a <strong>complete, secure and scalable solution</strong> to:</p>



<ul class="wp-block-list">
<li>Deploy an LLM with vLLM and <strong>AI Deploy</strong>, benefiting from automatic scaling based on custom metrics to ensure high service availability &#8211; vLLM exposes <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>/metrics</strong></mark></code> via its public HTTPS endpoint on AI Deploy</li>



<li>Collect, store and visualise these vLLM metrics using Prometheus and Grafana on <strong>MKS</strong></li>
</ul>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" width="1200" height="630" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg" alt="" class="wp-image-30578" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/1.jpg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/1-768x403.jpg 768w" sizes="(max-width: 1200px) 100vw, 1200px" /><figcaption class="wp-element-caption"><em>vLLM metrics monitoring and observability architecture overview</em></figcaption></figure>



<p>Here are the main components of the architecture. The solution comprises three layers:</p>



<ol class="wp-block-list">
<li><strong>Model serving layer</strong> with AI Deploy
<ul class="wp-block-list">
<li>vLLM containers running on top of GPUs for LLM inference</li>



<li>vLLM inference server exposing Prometheus metrics</li>



<li>Automatic scaling based on custom metrics to ensure high availability</li>



<li>HTTPS endpoints with Bearer token authentication</li>
</ul>
</li>



<li><strong>Monitoring and observability infrastructure</strong> using Kubernetes
<ul class="wp-block-list">
<li>Prometheus for metrics collection and storage</li>



<li>Grafana for visualisation and dashboards</li>



<li>Persistent volume storage for long-term retention</li>
</ul>
</li>



<li><strong>Network layer</strong>
<ul class="wp-block-list">
<li>Secure HTTPS communication between components</li>



<li>OVHcloud LoadBalancer for external access</li>
</ul>
</li>
</ol>



<p>Before going further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">Administrator</mark></code></strong></a> role</li>



<li><strong>ovhai CLI available</strong> &#8211;&nbsp;<em>install the&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a></em></li>



<li>A <strong>Hugging Face access</strong> &#8211; <em>create a&nbsp;<a href="https://huggingface.co/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face account</a>&nbsp;and generate an&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">access token</a></em></li>



<li><code><strong><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></strong></code> and <code><strong><mark class="has-inline-color has-ast-global-color-0-color">helm</mark></strong></code> (version 3.x or later) installed</li>
</ul>



<p><strong>🚀 Now that you have all the ingredients for our recipe, it’s time to deploy Ministral 3 14B using AI Deploy and the vLLM Docker container!</strong></p>



<h2 class="wp-block-heading">Architecture guide: From autoscaling to observability for LLMs served by vLLM</h2>



<p>Let’s set up and deploy this architecture!</p>



<figure class="wp-block-image aligncenter size-large"><img decoding="async" width="1024" height="538" src="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg" alt="" class="wp-image-30580" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2-768x403.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/02/2.jpg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Overview of the deployment workflow</em></figcaption></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>In this example, <a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a> is used. Choose the open-source model of your choice and follow the same steps, adapting the model slug (from Hugging Face), the versions and the GPU(s) flavour.</em></strong></p>
</blockquote>



<p><em>Remember that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Manage access tokens</h3>



<p>Before deploying the model, set up the two access tokens this architecture relies on: a <strong>Hugging Face token</strong> to download the model weights, and an <strong>AI Deploy Bearer token</strong> to secure access to the deployed endpoint.</p>



<p>Export your&nbsp;<a href="https://huggingface.co/settings/tokens" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Create a Bearer token</a>&nbsp;to access your AI Deploy app once it&#8217;s been deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>This returns the following output:</p>



<p><code><strong>Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-01-26 11:53:05<br>Updated At: 20-01-26 11:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</strong></code></p>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h3 class="wp-block-heading">Step 2 &#8211; LLM deployment using AI Deploy</h3>



<p>Before introducing the monitoring stack, this architecture starts with the <strong>deployment of Ministral 3 14B on OVHcloud AI Deploy</strong>, configured to <strong>autoscale based on custom Prometheus metrics exposed by vLLM itself</strong>.</p>



<h4 class="wp-block-heading">1. Define the targeted vLLM metric for autoscaling</h4>



<p>Before proceeding with the deployment of the <strong>Ministral 3 14B</strong> endpoint, you have to choose the metric you want to use as the trigger for scaling.</p>



<p>Instead of relying solely on CPU/RAM utilisation, AI Deploy allows autoscaling decisions to be driven by <strong>application-level signals</strong>.</p>



<p>To do this, you can consult the <a href="https://docs.vllm.ai/en/latest/design/metrics/#v1-metrics" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">metrics exposed by vLLM</a>.</p>



<p>In this example, you can use a basic metric such as <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>vllm:num_requests_running</strong></mark></code> to scale the number of replicas based on <strong>real inference load</strong>.</p>



<p>This enables:</p>



<ul class="wp-block-list">
<li>Faster reaction to traffic spikes</li>



<li>Better GPU utilisation</li>



<li>Reduced inference latency under load</li>



<li>Cost-efficient scaling</li>
</ul>



<p>Finally, the configuration chosen for scaling this application is as follows:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Parameter</th><th>Value</th><th>Description</th></tr></thead><tbody><tr><td>Metric source</td><td><code>/metrics</code></td><td>vLLM Prometheus endpoint</td></tr><tr><td>Metric name</td><td><code>vllm:num_requests_running</code></td><td>Number of in-flight requests</td></tr><tr><td>Aggregation</td><td><code>AVERAGE</code></td><td>Mean across replicas</td></tr><tr><td>Target value</td><td><code>50</code></td><td>Desired load per replica</td></tr><tr><td>Min replicas</td><td><code>1</code></td><td>Baseline capacity</td></tr><tr><td>Max replicas</td><td><code>3</code></td><td>Burst capacity</td></tr></tbody></table></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><em><strong>You can choose the metric that best suits your use case. You can also apply a patch to your AI Deploy deployment at any time to change the target metric for scaling</strong></em>.</p>
</blockquote>



<p>When the <strong>average number of running requests exceeds 50</strong>, AI Deploy automatically provisions <strong>additional GPU-backed replicas</strong>.</p>
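<p><em>To make this scaling rule concrete, here is a minimal sketch (illustrative only, not the actual AI Deploy implementation) of the usual target-value arithmetic: the desired replica count is the observed average divided by the target, rounded up and clamped to the configured bounds.</em></p>

<pre class="wp-block-code"><code class="">import math<br><br>def desired_replicas(avg_running: float, target: float = 50,<br>                     min_replicas: int = 1, max_replicas: int = 3) -&gt; int:<br>    """Classic target-value rule: ceil(observed / target), clamped to bounds."""<br>    wanted = math.ceil(avg_running / target)<br>    return max(min_replicas, min(max_replicas, wanted))<br><br>print(desired_replicas(40))   # below target -&gt; 1 replica<br>print(desired_replicas(120))  # ceil(120 / 50) -&gt; 3 replicas<br>print(desired_replicas(400))  # capped at max_replicas -&gt; 3</code></pre>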



<h4 class="wp-block-heading">2. Deploy Ministral 3 14B using AI Deploy</h4>



<p>Now you can deploy the LLM using the <strong><code>ovhai</code> CLI</strong>.</p>



<p>Key elements necessary for proper functioning:</p>



<ul class="wp-block-list">
<li>GPU-based inference: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">1 x H100</mark></code></strong></li>



<li>vLLM OpenAI-compatible Docker image: <a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.13.0</mark></code></strong></a></li>



<li>Custom autoscaling rules based on Prometheus metrics: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">vllm:num_requests_running</mark></strong></code></li>
</ul>



<p>Below is the reference command used to deploy the <strong><a href="https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">mistralai/Ministral-3-14B-Instruct-2512</a></strong>:</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>  --name vllm-ministral-14B-autoscaling-custom-metric \<br>  --default-http-port 8000 \<br>  --label ai_deploy_token=my_operator_token \<br>  --gpu 1 \<br>  --flavor h100-1-gpu \<br>  -e OUTLINES_CACHE_DIR=/tmp/.outlines \<br>  -e HF_TOKEN=$MY_HF_TOKEN \<br>  -e HF_HOME=/hub \<br>  -e HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>  -e HF_HUB_ENABLE_HF_TRANSFER=0 \<br>  -v standalone:/hub:rw \<br>  -v standalone:/workspace:rw \<br>  --liveness-probe-path /health \<br>  --liveness-probe-port 8000 \<br>  --liveness-initial-delay-seconds 300 \<br>  --probe-path /v1/models \<br>  --probe-port 8000 \<br>  --initial-delay-seconds 300 \<br>  --auto-min-replicas 1 \<br>  --auto-max-replicas 3 \<br>  --auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics" \<br>  --auto-custom-metric-format PROMETHEUS \<br>  --auto-custom-value-location vllm:num_requests_running \<br>  --auto-custom-target-value 50 \<br>  --auto-custom-metric-aggregation-type AVERAGE \<br>  vllm/vllm-openai:v0.13.0 \<br>  -- bash -c "python3 -m vllm.entrypoints.openai.api_server \<br>    --model mistralai/Ministral-3-14B-Instruct-2512 \<br>    --tokenizer_mode mistral \<br>    --load_format mistral \<br>    --config_format mistral \<br>    --enable-auto-tool-choice \<br>    --tool-call-parser mistral \<br>    --enable-prefix-caching"</code></pre>



<p>What do the different parameters of this command mean?</p>



<h5 class="wp-block-heading"><strong>a. Start your AI Deploy app</strong></h5>



<p>Launch a new app using&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">ovhai CLI</a>&nbsp;and name it.</p>



<p><code><strong>ovhai app run --name vllm-ministral-14B-autoscaling-custom-metric</strong></code></p>



<h5 class="wp-block-heading"><strong>b. Define access</strong></h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading"><strong>c. Configure GPU resources</strong></h5>



<p>Specify the hardware type (<code><strong>h100-1-gpu</strong></code>), which refers to an&nbsp;<strong>NVIDIA H100 GPU</strong>&nbsp;and the number (<strong>1</strong>).</p>



<p><code><strong>--gpu 1<br>--flavor h100-1-gpu</strong></code></p>



<p><strong><mark>⚠️ WARNING!</mark></strong>&nbsp;For this model, one H100 is sufficient, but if you want to deploy another model, you will need to check which GPU you need. Note that you can also access L40S and A100 GPUs for your LLM deployment.</p>



<h5 class="wp-block-heading"><strong>d. Set up environment variables</strong></h5>



<p>Configure caching for the&nbsp;<strong>Outlines library</strong>&nbsp;(used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the&nbsp;<strong>Hugging Face token</strong>&nbsp;(<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the&nbsp;<strong>Hugging Face cache directory</strong>&nbsp;to&nbsp;<code>/hub</code>&nbsp;(where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of&nbsp;<strong>custom remote code</strong>&nbsp;from Hugging Face datasets (required for some model behaviours):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable&nbsp;<strong>Hugging Face Hub transfer acceleration</strong>&nbsp;(to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading"><strong>e. Mount persistent volumes</strong></h5>



<p>Mount&nbsp;<strong>two persistent storage volumes</strong>:</p>



<ol class="wp-block-list">
<li><code>/hub</code>&nbsp;→ Stores Hugging Face model files</li>



<li><code>/workspace</code>&nbsp;→ Main working directory</li>
</ol>



<p>The&nbsp;<code>rw</code>&nbsp;flag means&nbsp;<strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading"><strong>f. Health checks and readiness</strong></h5>



<p>Configure <strong>liveness and readiness probes</strong>:</p>



<ol class="wp-block-list">
<li><code>/health</code> verifies the container is alive</li>



<li><code>/v1/models</code> confirms the model is loaded and ready to serve requests</li>
</ol>



<p>The long initial delays (300 seconds) correspond to the startup time of vLLM and the loading of the model onto the GPU; adjust them to match your model’s actual startup time.</p>



<p><code><strong>--liveness-probe-path /health<br>--liveness-probe-port 8000<br>--liveness-initial-delay-seconds 300<br><br>--probe-path /v1/models<br>--probe-port 8000<br>--initial-delay-seconds 300</strong></code></p>



<h5 class="wp-block-heading"><strong>g. Autoscaling configuration (custom metrics)</strong></h5>



<p>First set the minimum and maximum number of replicas.</p>



<p><strong><code>--auto-min-replicas 1<br>--auto-max-replicas 3</code></strong></p>



<p>This guarantees basic availability (one replica always up) while allowing for peak capacity.</p>



<p>Then enable autoscaling based on application-level metrics exposed by vLLM.</p>



<p><strong><code>--auto-custom-api-url "http://&lt;SELF&gt;:8000/metrics"<br>--auto-custom-metric-format PROMETHEUS<br>--auto-custom-value-location vllm:num_requests_running<br>--auto-custom-target-value 50<br>--auto-custom-metric-aggregation-type AVERAGE</code></strong></p>



<p>AI Deploy:</p>



<ul class="wp-block-list">
<li>Scrapes the local <mark class="has-inline-color has-ast-global-color-0-color"><strong><code>/metrics</code></strong></mark> endpoint</li>



<li>Parses Prometheus-formatted metrics</li>



<li>Extracts the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>vllm:num_requests_running</code></mark></strong> gauge</li>



<li>Computes the average value across replicas</li>
</ul>



<p>Scaling behaviour:</p>



<ul class="wp-block-list">
<li>When the average number of in-flight requests exceeds <strong><code><mark class="has-inline-color has-ast-global-color-0-color">50</mark></code></strong>, AI Deploy adds replicas</li>



<li>When load decreases, replicas are scaled down</li>
</ul>



<p>This approach ensures high availability and predictable latency under fluctuating traffic.</p>
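<p><em>For illustration, here is a minimal sketch of that scrape-parse-average loop in Python (the replica URLs are hypothetical; AI Deploy performs this internally and you do not need to run it yourself):</em></p>

<pre class="wp-block-code"><code class="">import urllib.request<br><br># hypothetical replica endpoints; AI Deploy scrapes its own replicas internally<br>REPLICA_URLS = ["http://replica-1:8000/metrics", "http://replica-2:8000/metrics"]<br>METRIC = "vllm:num_requests_running"<br><br>def read_gauge(url: str, name: str) -&gt; float:<br>    """Fetch a Prometheus text-format page and return the first sample of `name`."""<br>    body = urllib.request.urlopen(url, timeout=5).read().decode()<br>    for line in body.splitlines():<br>        # sample lines look like: vllm:num_requests_running{model_name="..."} 12.0<br>        if line.startswith(name):<br>            return float(line.rsplit(" ", 1)[1])<br>    raise KeyError(name)<br><br>values = [read_gauge(u, METRIC) for u in REPLICA_URLS]<br>print(f"AVERAGE across replicas: {sum(values) / len(values):.1f}")</code></pre>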



<h5 class="wp-block-heading"><strong>h. Choose the target Docker image and the startup command</strong></h5>



<p>Use the official <strong><a href="https://hub.docker.com/r/vllm/vllm-openai/tags" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM OpenAI-compatible Docker image</a></strong>.</p>



<p><strong><code>vllm/vllm-openai:v0.13.0</code></strong></p>



<p>Finally, run the model inside the container using a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong>&nbsp;→ Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Ministral-3-14B-Instruct-2512</code></strong>&nbsp;→ Loads the&nbsp;<strong>Ministral 3 14B</strong>&nbsp;model from Hugging Face</li>



<li><strong><code>--tokenizer_mode mistral</code></strong>&nbsp;→ Uses the&nbsp;<strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong>&nbsp;→ Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong>&nbsp;→ Ensures the model configuration follows Mistral’s standard</li>



<li><code><strong>--enable-auto-tool-choice </strong></code>→ Automatic call of tools if necessary (function/tool call)</li>



<li><strong><code>--tool-call-parser mistral </code></strong>→ Tool calling support</li>



<li><strong><code>--enable-prefix-caching</code></strong> → Prefix caching for improved throughput and reduced latency</li>
</ul>



<p>You can now launch this command using <strong>ovhai CLI</strong>.</p>



<h4 class="wp-block-heading">3. Check AI Deploy app status</h4>



<p>You can now check if your&nbsp;<strong>AI Deploy</strong>&nbsp;app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p><strong>Is your app in&nbsp;<code>RUNNING</code>&nbsp;status?</strong>&nbsp;Perfect! You can check in the logs that the server has started:</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark>⚠️ WARNING!</mark></strong>&nbsp;This step may take a little time as the LLM must be loaded.</p>
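<p><em>Rather than re-running the logs command by hand, you can poll the readiness endpoint until the model answers. A small sketch, assuming the app URL and Bearer token from the previous steps (replace the placeholders with yours):</em></p>

<pre class="wp-block-code"><code class="">import time<br>import urllib.request<br><br>APP_URL = "https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net"  # replace with yours<br>TOKEN = "XXXX"  # your $MY_OVHAI_ACCESS_TOKEN<br><br>req = urllib.request.Request(<br>    APP_URL + "/v1/models",<br>    headers={"Authorization": f"Bearer {TOKEN}"},<br>)<br><br># poll until the model is loaded and /v1/models answers 200<br>while True:<br>    try:<br>        with urllib.request.urlopen(req, timeout=10) as resp:<br>            if resp.status == 200:<br>                print("Model is loaded and ready")<br>                break<br>    except Exception as exc:<br>        print(f"Not ready yet ({exc}), retrying in 30s...")<br>    time.sleep(30)</code></pre>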



<h4 class="wp-block-heading">4. Test that the deployment is functional</h4>



<p>First, send a prompt to the LLM. Launch the following query, asking the question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \<br>  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  -H "Content-Type: application/json" \<br>  -d '{<br>    "model": "mistralai/Ministral-3-14B-Instruct-2512",<br>    "messages": [<br>      {"role": "system", "content": "You are a helpful assistant."},<br>      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}<br>    ],<br>    "stream": false<br>  }'</code></pre>



<p>You can also verify access to vLLM metrics.</p>



<pre class="wp-block-code"><code class="">curl -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \<br>  https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/metrics</code></pre>



<p>If both tests show that the model deployment is functional and you receive 200 HTTP responses, you are ready to move on to the next step!</p>



<p>The next step is to set up the observability and monitoring stack. This autoscaling mechanism is <strong>fully independent</strong> of the Prometheus instance used for observability:</p>



<ul class="wp-block-list">
<li>AI Deploy queries the local <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code></mark></strong> endpoint internally</li>



<li>Prometheus scrapes the <strong>same metrics endpoint</strong> externally for monitoring, dashboards and potentially alerting</li>
</ul>



<p>This ensures:</p>



<ul class="wp-block-list">
<li>A single source of truth for metrics</li>



<li>No duplication of exporters</li>



<li>Consistent signals for scaling and observability</li>
</ul>



<h3 class="wp-block-heading">Step 3 &#8211; Create an MKS cluster</h3>



<p>From <a href="https://manager.eu.ovhcloud.com/#/hub/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Control Panel</a>, create a Kubernetes cluster using the <strong>MKS</strong>.</p>



<p>Consider using the following configuration for the current use case:</p>



<ul class="wp-block-list">
<li><strong>Location</strong>: GRA (Gravelines) &#8211; <em>you can select the same region as for AI Deploy</em></li>



<li><strong>Network</strong>: Public</li>



<li><strong>Node pool</strong>:
<ul class="wp-block-list">
<li>Flavour: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">b2-15</mark></strong></code> (or something similar)</li>



<li>Number of nodes: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">3</mark></code></strong></li>



<li>Autoscaling: <strong><code><mark class="has-inline-color has-ast-global-color-0-color">OFF</mark></code></strong></li>
</ul>
</li>



<li><strong>Name your node pool:</strong> <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>monitoring</code></mark></strong></li>
</ul>



<p>You should see your cluster (e.g. <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>prometheus-vllm-metrics-ai-deploy</strong></mark></code>) in the list, along with the following information:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="632" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png" alt="" class="wp-image-30242" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1024x632.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-300x185.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-768x474.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-1536x948.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-3-2048x1264.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the status is green with the <strong><mark style="color:#00d084" class="has-inline-color"><code>OK</code></mark></strong> label, you can proceed to the next step.</p>



<h3 class="wp-block-heading">Step 4 &#8211; Configure Kubernetes access</h3>



<p>Download your <strong>kubeconfig file</strong> from the OVHcloud Control Panel and configure <strong><code><mark class="has-inline-color has-ast-global-color-0-color">kubectl</mark></code></strong>:</p>



<pre class="wp-block-code"><code class=""># configure kubectl with your MKS cluster<br>export KUBECONFIG=/path/to/your/kubeconfig-xxxxxx.yml<br><br># verify cluster connectivity<br>kubectl cluster-info<br>kubectl get nodes</code></pre>



<p>Now, you can create the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>values-prometheus.yaml</code></mark></strong> file:</p>



<pre class="wp-block-code"><code class=""># general configuration<br>nameOverride: "monitoring"<br>fullnameOverride: "monitoring"<br><br># Prometheus configuration<br>prometheus:<br>  prometheusSpec:<br>    # data retention (15d)<br>    retention: 15d<br>    <br>    # scrape interval (15s)<br>    scrapeInterval: 15s<br>    <br>    # persistent storage (required for production deployment)<br>    storageSpec:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed  # OVHcloud storage<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 50Gi  # (can be modified according to your needs)<br>    <br>    # scrape vLLM metrics from your AI Deploy instance (Ministral 3 14B)<br>    additionalScrapeConfigs:<br>      - job_name: 'vllm-ministral'<br>        scheme: https<br>        metrics_path: '/metrics'<br>        scrape_interval: 15s<br>        scrape_timeout: 10s<br>        <br>        # authentication using AI Deploy Bearer token stored Kubernetes Secret<br>        bearer_token_file: /etc/prometheus/secrets/vllm-auth-token/token<br>        static_configs:<br>          - targets:<br>              - '&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net'  # /!\ REPLACE THE &lt;APP_ID&gt; by yours /!\<br>            labels:<br>              service: 'vllm'<br>              model: 'ministral'<br>              environment: 'production'<br>        <br>        # TLS configuration<br>        tls_config:<br>          insecure_skip_verify: false<br>    <br>    # kube-prometheus-stack mounts the secret under /etc/prometheus/secrets/ and makes it accessible to Prometheus<br>    secrets:<br>      - vllm-auth-token<br><br># Grafana configuration (visualization layer)<br>grafana:<br>  enabled: true<br>  <br>  # disable automatic datasource provisioning<br>  sidecar:<br>    datasources:<br>      enabled: false<br>  <br>  # persistent dashboards<br>  persistence:<br>    enabled: true<br>    storageClassName: csi-cinder-high-speed<br>    size: 10Gi<br>  <br>  # /!\ DEFINE ADMIN PASSWORD - REPLACE "test" BY YOURS /!\<br>  adminPassword: "test"<br>  <br>  # access via OVHcloud LoadBalancer (public IP and managed LB)<br>  service:<br>    type: LoadBalancer<br>    port: 80<br>    annotations:<br>      # optional : limiter l'accès à certaines IPs<br>      # service.beta.kubernetes.io/ovh-loadbalancer-allowed-sources: "1.2.3.4/32"<br>  <br># alertmanager (optional but recommended for production)<br>alertmanager:<br>  enabled: true<br>  <br>  alertmanagerSpec:<br>    storage:<br>      volumeClaimTemplate:<br>        spec:<br>          storageClassName: csi-cinder-high-speed<br>          accessModes: ["ReadWriteOnce"]<br>          resources:<br>            requests:<br>              storage: 10Gi<br><br># cluster observability components<br>nodeExporter:<br>  enabled: true<br>  <br>kubeStateMetrics:<br>  enabled: true</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ <em>Note</em></strong></p>



<p><strong><em>On OVHcloud MKS, persistent storage is handled automatically through the Cinder CSI driver. When a PersistentVolumeClaim (PVC) references a supported <code>storageClassName</code> such as <code>csi-cinder-high-speed</code>, OVHcloud dynamically provisions the underlying Block Storage volume and attaches it to the node running the pod. This enables stateful components like Prometheus, Alertmanager and Grafana to persist data reliably without any manual volume management, making the architecture fully cloud-native and operationally simple.</em></strong></p>
</blockquote>



<p>Then create the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">monitoring</mark></code></strong> namespace:</p>



<pre class="wp-block-code"><code class=""># create namespace<br>kubectl create namespace monitoring<br><br># verify creation<br>kubectl get namespaces | grep monitoring</code></pre>



<p>Finally, configure the Bearer token secret to access vLLM metrics.</p>



<pre class="wp-block-code"><code class=""># create bearer token secret<br>kubectl create secret generic vllm-auth-token \<br>  --from-literal=token='"$MY_OVHAI_ACCESS_TOKEN"' \<br>  -n monitoring<br><br># verify secret creation<br>kubectl get secret vllm-auth-token -n monitoring<br><br># test token (optional)<br>kubectl get secret vllm-auth-token -n monitoring \<br>  -o jsonpath='{.data.token}' | base64 -d </code></pre>



<p>Right, if everything is working, let&#8217;s move on to deployment.</p>



<h3 class="wp-block-heading">Step 5 &#8211; Deploy Prometheus stack</h3>



<p>Add the Prometheus Helm repository and install the monitoring stack. The deployment creates:</p>



<ul class="wp-block-list">
<li>Prometheus StatefulSet with persistent storage</li>



<li>Grafana deployment with LoadBalancer access</li>



<li>Alertmanager for future alert configuration (optional)</li>



<li>Supporting components (node exporters, kube-state-metrics)</li>
</ul>



<pre class="wp-block-code"><code class=""># add Helm repository<br>helm repo add prometheus-community \<br>  https://prometheus-community.github.io/helm-charts<br>helm repo update<br><br># install monitoring stack<br>helm install monitoring prometheus-community/kube-prometheus-stack \<br>  --namespace monitoring \<br>  --values values-prometheus.yaml \<br>  --wait</code></pre>



<p>Then you can retrieve the LoadBalancer IP address to access Grafana:</p>



<pre class="wp-block-code"><code class="">kubectl get svc -n monitoring monitoring-grafana</code></pre>



<p>Finally, open your browser to <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://&lt;EXTERNAL-IP&gt;</mark></strong></code> and log in with:</p>



<ul class="wp-block-list">
<li><strong>Username</strong>: <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>admin</strong></mark></code></li>



<li><strong>Password</strong>: as configured in your <code><strong><mark class="has-inline-color has-ast-global-color-0-color">values-prometheus.yaml</mark></strong></code> file</li>
</ul>



<h3 class="wp-block-heading">Step 6 &#8211; Create Grafana dashboards</h3>



<p>In this step, you will access the Grafana interface, add Prometheus as a new data source, then create a complete dashboard with different vLLM metrics.</p>



<h4 class="wp-block-heading">1. Add a new data source in Grafana</h4>



<p>First of all, create a new Prometheus connection inside Grafana:</p>



<ul class="wp-block-list">
<li>Navigate to <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Connections</code></mark></strong> → <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>Data sources</code></mark></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Add data source</mark></code></strong></li>



<li>Select <strong>Prometheus</strong></li>



<li>Configure URL: <code><strong><mark class="has-inline-color has-ast-global-color-0-color">http://monitoring-prometheus:9090</mark></strong></code></li>



<li>Click <strong>Save &amp; test</strong></li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="609" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png" alt="" class="wp-image-30247" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1024x609.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-300x178.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-768x457.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-1536x913.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-4-2048x1218.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Now that your Prometheus has been configured as a new data source, you can create your Grafana dashboard.</p>



<h4 class="wp-block-heading">2. Create your monitoring dashboard</h4>



<p>To begin with, you can use a pre-configured Grafana dashboard by downloading the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">vLLM-metrics-grafana-monitoring.json</mark></code></strong> file locally.</p>





<p>In the left-hand menu, select <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboards</mark></code></strong>:</p>



<ol class="wp-block-list">
<li>Navigate to <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Dashboards</mark></code></strong> → <strong><code><mark class="has-inline-color has-ast-global-color-0-color">Import</mark></code></strong></li>



<li>Upload the provided dashboard JSON</li>



<li>Select <strong>Prometheus</strong> as datasource</li>



<li>Click <strong>Import</strong> and select the <strong><code><mark class="has-inline-color has-ast-global-color-0-color">vLLM-metrics-grafana-monitoring.json</mark></code></strong> file</li>
</ol>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="449" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png" alt="" class="wp-image-30250" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1024x449.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-300x131.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-768x337.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-1536x673.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-6-2048x897.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The dashboard provides real-time visibility for <strong>Ministral 3 14B</strong> deployed with the vLLM container on OVHcloud AI Deploy.</p>



<p>You can now track:</p>



<ul class="wp-block-list">
<li><strong>Performance metrics</strong>: TTFT, inter-token latency, end-to-end latency</li>



<li><strong>Throughput indicators</strong>: Requests per second, token generation rates</li>



<li><strong>Resource utilisation</strong>: KV cache usage, active/waiting requests</li>



<li><strong>Capacity indicators</strong>: Queue depth, preemption rates</li>
</ul>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png" alt="" class="wp-image-30253" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-7-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Here are the key metrics tracked and displayed in the Grafana dashboard:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Metric Category</th><th>Prometheus Metric</th><th>Description</th><th>Use case</th></tr></thead><tbody><tr><td><strong>Latency</strong></td><td><code>vllm:time_to_first_token_seconds</code></td><td>Time until first token generation</td><td>User experience monitoring</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:inter_token_latency_seconds</code></td><td>Time between tokens</td><td>Throughput optimisation</td></tr><tr><td><strong>Latency</strong></td><td><code>vllm:e2e_request_latency_seconds</code></td><td>End-to-end request time</td><td>SLA monitoring</td></tr><tr><td><strong>Throughput</strong></td><td><code>vllm:request_success_total</code></td><td>Successful requests counter</td><td>Capacity planning</td></tr><tr><td><strong>Resource</strong></td><td><code>vllm:kv_cache_usage_perc</code></td><td>KV cache memory usage</td><td>Memory management</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_running</code></td><td>Active requests</td><td>Load monitoring</td></tr><tr><td><strong>Queue</strong></td><td><code>vllm:num_requests_waiting</code></td><td>Queued requests</td><td>Overload detection</td></tr><tr><td><strong>Capacity</strong></td><td><code>vllm:num_preemptions_total</code></td><td>Request preemptions</td><td>Peak load indicator</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:prompt_tokens_total</code></td><td>Input tokens processed</td><td>Usage analytics</td></tr><tr><td><strong>Tokens</strong></td><td><code>vllm:generation_tokens_total</code></td><td>Output tokens generated</td><td>Cost tracking</td></tr></tbody></table></figure>



<p>Well done, you now have at your disposal:</p>



<ul class="wp-block-list">
<li>An endpoint of the Ministral 3 14B model deployed with vLLM thanks to <strong>OVHcloud AI Deploy</strong> and its autoscaling strategies based on custom metrics</li>



<li>Prometheus for metrics collection and Grafana for visualisation/dashboards thanks to <strong>OVHcloud MKS</strong></li>
</ul>



<p><strong>But how can you check that everything will work when the load increases?</strong></p>



<h3 class="wp-block-heading">Step 7 &#8211; Test autoscaling and real-time visualisation</h3>



<p>The first objective here is to force AI Deploy to:</p>



<ul class="wp-block-list">
<li>Increase <code>vllm:num_requests_running</code></li>



<li>&#8216;Saturate&#8217; a single replica</li>



<li>Trigger the <strong>scale up</strong></li>



<li>Observe replica increase + latency drop</li>
</ul>



<h4 class="wp-block-heading">1. Autoscaling testing strategy</h4>



<p>The goal is to combine:</p>



<ul class="wp-block-list">
<li><strong>High concurrency</strong></li>



<li><strong>Long prompts</strong> (KV cache heavy)</li>



<li><strong>Long generations</strong></li>



<li><strong>Bursty load</strong></li>
</ul>



<p>This is what vLLM autoscaling actually reacts to.</p>



<p>To do so, a Python script can simulate the expected behaviour:</p>



<pre class="wp-block-code"><code class="">import time<br>import threading<br>import random<br>from statistics import mean<br>from openai import OpenAI<br>from tqdm import tqdm<br><br>APP_URL = "https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net/v1" # /!\ REPLACE THE &lt;APP_ID&gt; by yours /!\<br>MODEL = "mistralai/Ministral-3-14B-Instruct-2512"<br>API_KEY = $MY_OVHAI_ACCESS_TOKEN<br><br>CONCURRENT_WORKERS = 500          # concurrency (main scaling trigger)<br>REQUESTS_PER_WORKER = 25<br>MAX_TOKENS = 768                  # generation pressure<br><br># some random prompts<br>SHORT_PROMPTS = [<br>    "Summarize the theory of relativity.",<br>    "Explain what a transformer model is.",<br>    "What is Kubernetes autoscaling?"<br>]<br><br>MEDIUM_PROMPTS = [<br>    "Explain how attention mechanisms work in transformer-based models, including self-attention and multi-head attention.",<br>    "Describe how vLLM manages KV cache and why it impacts inference performance."<br>]<br><br>LONG_PROMPTS = [<br>    "Write a very detailed technical explanation of how large language models perform inference, "<br>    "including tokenization, embedding lookup, transformer layers, attention computation, KV cache usage, "<br>    "GPU memory management, and how batching affects latency and throughput. Use examples.",<br>]<br><br>PROMPT_POOL = (<br>    SHORT_PROMPTS * 2 +<br>    MEDIUM_PROMPTS * 4 +<br>    LONG_PROMPTS * 6    # bias toward long prompts<br>)<br><br># openai compliance<br>client = OpenAI(<br>    base_url=APP_URL,<br>    api_key=API_KEY,<br>)<br><br># basic metrics<br>latencies = []<br>errors = 0<br>lock = threading.Lock()<br><br># worker<br>def worker(worker_id):<br>    global errors<br>    for _ in range(REQUESTS_PER_WORKER):<br>        prompt = random.choice(PROMPT_POOL)<br><br>        start = time.time()<br>        try:<br>            client.chat.completions.create(<br>                model=MODEL,<br>                messages=[{"role": "user", "content": prompt}],<br>                max_tokens=MAX_TOKENS,<br>                temperature=0.7,<br>            )<br>            elapsed = time.time() - start<br><br>            with lock:<br>                latencies.append(elapsed)<br><br>        except Exception as e:<br>            with lock:<br>                errors += 1<br><br># run<br>threads = []<br>start_time = time.time()<br><br>print("Starting autoscaling stress test...")<br>print(f"Concurrency: {CONCURRENT_WORKERS}")<br>print(f"Total requests: {CONCURRENT_WORKERS * REQUESTS_PER_WORKER}")<br><br>for i in range(CONCURRENT_WORKERS):<br>    t = threading.Thread(target=worker, args=(i,))<br>    t.start()<br>    threads.append(t)<br><br>for t in threads:<br>    t.join()<br><br>total_time = time.time() - start_time<br><br># results<br>print("\n=== AUTOSCALING BENCH RESULTS ===")<br>print(f"Total requests sent: {len(latencies) + errors}")<br>print(f"Successful requests: {len(latencies)}")<br>print(f"Errors: {errors}")<br>print(f"Total wall time: {total_time:.2f}s")<br><br>if latencies:<br>    print(f"Avg latency: {mean(latencies):.2f}s")<br>    print(f"Min latency: {min(latencies):.2f}s")<br>    print(f"Max latency: {max(latencies):.2f}s")<br>    print(f"Throughput: {len(latencies)/total_time:.2f} req/s")</code></pre>



<p><strong>How can you verify that autoscaling is working and that the load is being handled correctly without latency skyrocketing?</strong></p>



<h4 class="wp-block-heading">2. Hardware and platform-level monitoring</h4>



<p>First, <strong>AI Deploy Grafana</strong> answers <strong>&#8216;What resources are being used and how many replicas exist?&#8217;</strong></p>



<p>GPU utilisation, GPU memory, CPU, RAM and replica count are monitored through <strong>OVHcloud AI Deploy Grafana</strong> (monitoring URL), which exposes infrastructure and runtime metrics for the AI Deploy application. This layer provides visibility into <strong>resource saturation and scaling events</strong> managed by the AI Deploy platform itself.</p>



<p>Access it using the following URL (do not forget to replace <code><mark class="has-inline-color has-ast-global-color-0-color"><strong>&lt;APP_ID&gt;</strong></mark></code> by yours): <strong><code>https://monitoring.gra.ai.cloud.ovh.net/d/app/app-monitoring?var-app=</code><mark class="has-inline-color has-ast-global-color-0-color"><code>&lt;APP_ID&gt;</code></mark><code>&amp;orgId=1</code></strong></p>



<p>For example, check GPU/RAM metrics:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png" alt="" class="wp-image-30260" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-8-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also monitor scale-ups and scale-downs in real time, as well as information on HTTP calls and much more!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png" alt="" class="wp-image-30261" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-9-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h4 class="wp-block-heading">3. Software and application-level monitoring</h4>



<p>Next, the combination of MKS + Prometheus + Grafana answers <strong>&#8216;How does the inference engine behave internally?&#8217;</strong></p>



<p>In fact, vLLM internal metrics (request concurrency, token throughput, latency indicators, KV cache pressure, etc.) are collected via the <strong>vLLM <code>/metrics</code> endpoint</strong> and scraped by <strong>Prometheus running on OVHcloud MKS</strong>, then visualised in a <strong>dedicated Grafana instance</strong>. This layer focuses on <strong>model behaviour and inference performance</strong>.</p>



<p>Find all these metrics via (just replace <strong><code><mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark></code></strong>): <strong><code>http://<mark class="has-inline-color has-ast-global-color-0-color">&lt;EXTERNAL-IP&gt;</mark>/d/vllm-ministral-monitoring/ministral-14b-vllm-metrics-monitoring?orgId=1</code></strong></p>



<p>Find key metrics such as TTFT (Time-to-First-Token):</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png" alt="" class="wp-image-30263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-10-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can also find some information about <strong>&#8216;Model load and throughput&#8217;</strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="540" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png" alt="" class="wp-image-30264" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1024x540.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-768x405.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-1536x811.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/image-11-2048x1081.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To go further and add even more metrics, you can refer to the vLLM documentation on &#8216;<a href="https://docs.vllm.ai/en/v0.7.2/getting_started/examples/prometheus_grafana.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Prometheus and Grafana</a>&#8216;.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>This reference architecture provides a scalable and production-ready approach for deploying LLM inference on OVHcloud using <strong>AI Deploy</strong> and the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-apps-deployments?id=kb_article_view&amp;sysparm_article=KB0047997#advanced-custom-metrics-for-autoscaling" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">custom metric autoscaling feature</a>.</p>



<p>OVHcloud <strong>MKS</strong> is dedicated to running Prometheus and Grafana, enabling secure scraping and visualisation of <strong>vLLM internal metrics</strong> exposed via the <strong><mark class="has-inline-color has-ast-global-color-0-color"><code>/metrics</code></mark></strong> endpoint.</p>



<p>By scraping vLLM metrics securely from AI Deploy into Prometheus and exposing them through Grafana, the architecture provides full visibility into model behaviour, performance and load, enabling informed scaling analysis, troubleshooting and capacity planning in production environments.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-custom-metric-autoscaling-for-llm-inference-with-vllm-on-ovhcloud-ai-deploy-and-observability-using-mks%2F&amp;action_name=Reference%20Architecture%3A%20Custom%20metric%20autoscaling%20for%20LLM%20inference%20with%20vLLM%20on%20OVHcloud%20AI%20Deploy%20and%20observability%20using%20MKS&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: build a sovereign n8n RAG workflow for AI agent using OVHcloud Public Cloud solutions</title>
		<link>https://blog.ovhcloud.com/reference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Tue, 27 Jan 2026 13:12:03 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Managed Database]]></category>
		<category><![CDATA[n8n]]></category>
		<category><![CDATA[Object Storage]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<category><![CDATA[RAG]]></category>
		<category><![CDATA[S3]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29694</guid>

					<description><![CDATA[What if an n8n workflow, deployed in a&#160;sovereign environment, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection. In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&#160;Large Language Models&#160;(LLMs) [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>What if an n8n workflow, deployed in a&nbsp;<strong>sovereign environment</strong>, saved you time while giving you peace of mind? From document ingestion to targeted response generation, n8n acts as the conductor of your RAG pipeline without compromising data protection.</em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg" alt="" class="wp-image-30002" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1024x576.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-300x169.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-768x432.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag-1536x864.jpg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/ref-archi-n8n-rag.jpg 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>n8n workflow overview</em></figcaption></figure>



<p>In the current landscape of AI agents and knowledge assistants, connecting your internal documentation with&nbsp;<strong>Large Language Models</strong>&nbsp;(LLMs) is becoming a strategic differentiator.</p>



<p><strong>How?</strong>&nbsp;By building&nbsp;<strong>Agentic RAG systems</strong>&nbsp;capable of retrieving, reasoning, and acting autonomously based on external knowledge.</p>



<p>To make this possible, engineers need a way to connect&nbsp;<strong>retrieval pipelines (RAG)</strong>&nbsp;with&nbsp;<strong>tool-based orchestration</strong>.</p>



<p>This article outlines a&nbsp;<strong>reference architecture</strong>&nbsp;for building a&nbsp;<strong>fully automated RAG pipeline orchestrated by n8n</strong>, leveraging&nbsp;<strong>OVHcloud AI Endpoints</strong>&nbsp;and&nbsp;<strong>PostgreSQL with pgvector</strong>&nbsp;as core components.</p>



<p>The final result will be a system that automatically ingests Markdown documentation from&nbsp;<strong>Object Storage</strong>, creates embeddings with OVHcloud’s&nbsp;<strong>BGE-M3</strong>&nbsp;model available on AI Endpoints, and stores them in a&nbsp;<strong>Managed Database PostgreSQL</strong>&nbsp;with pgvector extension.</p>



<p>Lastly, you’ll be able to build an AI Agent that lets you chat with an LLM (<strong>GPT-OSS-120B</strong>&nbsp;on AI Endpoints). This agent, utilising the RAG implementation carried out upstream, will be an expert on OVHcloud products.</p>



<p>You can further improve the process by using an&nbsp;<strong>LLM guard</strong>&nbsp;to protect the questions sent to the LLM, and set up a chat memory to use conversation history for higher response quality.</p>



<p><strong>But what about n8n?</strong></p>



<p><a href="https://n8n.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>n8n</strong></a>, the open-source workflow automation tool,&nbsp;offers many benefits and connects seamlessly with over&nbsp;<strong>300</strong>&nbsp;APIs, apps, and services:</p>



<ul class="wp-block-list">
<li><strong>Open-source</strong>: n8n is a 100% self-hostable solution, which means you retain full data control;</li>



<li><strong>Flexible</strong>: combines low-code nodes and custom JavaScript/Python logic;</li>



<li><strong>AI-ready</strong>: includes useful integrations for LangChain, OpenAI, and embedding support capabilities;</li>



<li><strong>Composable</strong>: enables simple connections between data, APIs, and models in minutes;</li>



<li><strong>Sovereign by design</strong>: compliant with privacy-sensitive or regulated sectors.</li>
</ul>



<p>This reference architecture serves as a blueprint for building a sovereign, scalable Retrieval Augmented Generation (<strong>RAG</strong>) platform using&nbsp;<strong>n8n</strong>&nbsp;and&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;solutions.</p>



<p>This setup shows how to orchestrate data ingestion, generate embeddings, and enable conversational AI by combining&nbsp;<strong>OVHcloud Object Storage</strong>,&nbsp;<strong>Managed Databases with PostgreSQL</strong>,&nbsp;<strong>AI Endpoints</strong>&nbsp;and&nbsp;<strong>AI Deploy</strong>. <strong>The result?</strong>&nbsp;An AI environment that is fully integrated, protects privacy, and is hosted exclusively on <strong>OVHcloud’s European infrastructure</strong>.</p>



<h2 class="wp-block-heading">Overview of the n8n workflow architecture for RAG </h2>



<p>The workflow involves the following steps:</p>



<ul class="wp-block-list">
<li><strong>Ingestion:</strong>&nbsp;documentation in markdown format is fetched from <strong>OVHcloud Object Storage (S3);</strong></li>



<li><strong>Preprocessing:</strong> n8n cleans and normalises the text, removing YAML front-matter and encoding noise;</li>



<li><strong>Vectorisation:</strong>&nbsp;each document is embedded using the <strong>BGE-M3</strong> model, which is available via <strong>OVHcloud AI Endpoints;</strong></li>



<li><strong>Persistence:</strong> vectors and metadata are stored in <strong>OVHcloud PostgreSQL Managed Database</strong> using pgvector;</li>



<li><strong>Retrieval:</strong> when a user sends a query, n8n triggers a <strong>LangChain Agent</strong> that retrieves relevant chunks from the database;</li>



<li><strong>Reasoning and actions:</strong>&nbsp;the <strong>AI Agent node</strong> combines LLM reasoning, memory, and tool usage to generate a contextual response or trigger downstream actions (Slack reply, Notion update, API call, etc.).</li>
</ul>



<p>In this tutorial, all services are deployed within the <strong>OVHcloud Public Cloud</strong>.</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you start, double-check that you have:</p>



<ul class="wp-block-list">
<li>an <strong>OVHcloud Public Cloud</strong> account</li>



<li>an <strong>OpenStack user</strong> with the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">&nbsp;following roles</a>:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>



<li>An <strong>API key</strong> for <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&amp;sysparm_article=KB0065401" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a></li>



<li><strong>ovhai CLI available</strong> – <em>install the </em><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>ovhai CLI</em></a></li>



<li><strong>Hugging Face access</strong> – <em>create a </em><a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Hugging Face account</em></a><em> and generate an </em><a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>access token</em></a></li>
</ul>



<p><strong>🚀 Now that you have everything you need, you can start building your n8n workflow!</strong></p>



<h2 class="wp-block-heading">Architecture guide: n8n agentic RAG workflow</h2>



<p>You’re all set to configure and deploy your n8n workflow.</p>



<p>⚙️<em> Keep in mind that the following steps can be completed using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Build the RAG data ingestion pipeline</h3>



<p>This first step involves building the foundation of the entire RAG workflow by preparing the elements you need:</p>



<ul class="wp-block-list">
<li>n8n deployment</li>



<li>Object Storage bucket creation</li>



<li>PostgreSQL database creation</li>



<li>and more</li>
</ul>



<p>Remember to set up the proper credentials in n8n so the different elements can connect and function.</p>



<h4 class="wp-block-heading">1. Deploy n8n on OVHcloud VPS</h4>



<p>OVHcloud provides <a href="https://www.ovhcloud.com/en-gb/vps/vps-n8n/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>VPS solutions compatible with n8n</strong></a><strong>.</strong> Get a ready-to-use virtual server with <strong>pre-installed n8n </strong>and start building automation workflows without manual setup. With plans ranging from <strong>6 vCores&nbsp;/&nbsp;12 GB RAM</strong> to <strong>24 vCores&nbsp;/&nbsp;96 GB RAM</strong>, you can choose the capacity that suits your workload.</p>



<p><strong>How to set up n8n on a VPS?</strong></p>



<p>Setting up n8n on an OVHcloud VPS generally involves:</p>



<ul class="wp-block-list">
<li>Choosing and provisioning your OVHcloud VPS plan;</li>



<li>Connecting to your server via SSH and carrying out the initial server configuration, which includes updating the OS;</li>



<li>Installing n8n, typically with Docker (recommended for ease of management and updates), or npm by following this <a href="https://help.ovhcloud.com/csm/en-gb-vps-install-n8n?id=kb_article_view&amp;sysparm_article=KB0072179" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>;</li>



<li>Configuring n8n with a domain name, SSL certificate for HTTPS, and any necessary environment variables for databases or settings.</li>
</ul>



<p>While OVHcloud provides a robust VPS platform, you can find detailed n8n installation guides in the <a href="https://docs.n8n.io/hosting/installation/docker/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">official n8n documentation</a>.</p>



<p>Once n8n is up and running, you can move on to setting up the Object Storage bucket and the database.</p>



<h4 class="wp-block-heading">2. Create Object Storage bucket</h4>



<p>First, you have to set up your data source. Here you can store all your documentation in an S3-compatible <a href="https://www.ovhcloud.com/en-gb/public-cloud/object-storage/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Object Storage</a> bucket.</p>



<p>Here, assume that all the documentation files are in Markdown format.</p>



<p>From <strong>OVHcloud Control Panel</strong>, create a new Object Storage container with <strong>S3-compatible API </strong>solution; follow this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-storage-s3-getting-started-object-storage?id=kb_article_view&amp;sysparm_article=KB0034674" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">guide</a>.</p>



<p>When the bucket is ready, add your Markdown documentation to it.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png" alt="" class="wp-image-29733" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>Note:</strong>&nbsp;For this tutorial, we’re using the various OVHcloud product documentation available in Open-Source on the GitHub repository maintained by OVHcloud members.</p>



<p><em>Click this </em><a href="https://github.com/ovh/docs.git" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>link</em></a><em> to access the repository.</em></p>
</blockquote>
</blockquote>



<p>How do you do that? Extract all the <strong><code>guide.en-gb.md</code></strong> files from the GitHub repository and rename each one to match its parent folder.</p>



<p>Example: the documentation about ovhai CLI installation, <code><strong>docs/pages/public_cloud/ai_machine_learning/cli_10_howto_install_cli/guide.en-gb.md</strong></code>, is stored in the <strong>ovhcloud-products-documentation-md</strong> bucket as <strong><code>cli_10_howto_install_cli.md</code></strong>.</p>
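

<p>The short script below can automate this extraction; it is a sketch only, assuming a local clone of the <code>ovh/docs</code> repository and Node.js 20+ (the folder paths are illustrative):</p>



<pre class="wp-block-code"><code class="">// sketch: collect every guide.en-gb.md from a local clone of ovh/docs<br>// and rename it after its parent folder (Node.js 20+, built-in modules only)<br>import { readdirSync, copyFileSync, mkdirSync } from 'node:fs';<br>import { join, basename, dirname } from 'node:path';<br><br>const SRC = './docs/pages';   // local clone of https://github.com/ovh/docs<br>const DEST = './to-upload';   // staging folder for the S3 upload<br>mkdirSync(DEST, { recursive: true });<br><br>for (const entry of readdirSync(SRC, { recursive: true })) {<br>  if (basename(entry) === 'guide.en-gb.md') {<br>    const folder = basename(dirname(entry)); // e.g. cli_10_howto_install_cli<br>    copyFileSync(join(SRC, entry), join(DEST, `${folder}.md`));<br>  }<br>}</code></pre>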



<p>You should get an overview that looks like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png" alt="" class="wp-image-29735" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Keep the following elements and create a new credential in n8n named <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">OVHcloud S3 gra credentials</mark></strong></code>:</p>



<ul class="wp-block-list">
<li>S3 Endpoint: <a href="https://s3.gra.io.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://s3.gra.io.cloud.ovh.net/</mark></code></strong></a></li>



<li>Region: <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">gra</mark></code></strong></li>



<li>Access Key ID: <strong><code>&lt;your_object_storage_user_access_key&gt;</code></strong></li>



<li>Secret Access Key: <strong><code>&lt;your_object_storage_user_secret_key&gt;</code></strong></li>
</ul>
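

<p>To double-check these credentials outside n8n before saving them, a few lines of JavaScript are enough; a sketch, assuming Node.js 18+ run as an ES module and the <code>@aws-sdk/client-s3</code> package:</p>



<pre class="wp-block-code"><code class="">// sketch: verify the OVHcloud S3 credentials used by n8n<br>import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';<br><br>const s3 = new S3Client({<br>  endpoint: 'https://s3.gra.io.cloud.ovh.net/',<br>  region: 'gra',<br>  forcePathStyle: true, // often required by S3-compatible providers<br>  credentials: {<br>    accessKeyId: process.env.S3_ACCESS_KEY,     // &lt;your_object_storage_user_access_key&gt;<br>    secretAccessKey: process.env.S3_SECRET_KEY, // &lt;your_object_storage_user_secret_key&gt;<br>  },<br>});<br><br>// list the Markdown files stored in the documentation bucket<br>const out = await s3.send(<br>  new ListObjectsV2Command({ Bucket: 'ovhcloud-products-documentation-md' })<br>);<br>for (const obj of out.Contents ?? []) console.log(obj.Key);</code></pre>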



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png" alt="" class="wp-image-29736" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, create a new n8n node by selecting&nbsp;<strong>S3</strong>, then&nbsp;<strong>Get Multiple Files</strong>.<br>Configure this node as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png" alt="" class="wp-image-29740" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.20.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Connect the node to the previous one before moving on to the next step.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png" alt="" class="wp-image-29741" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-15-a-16.18.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>With the first phase done, you can now configure the vector DB.</p>



<h4 class="wp-block-heading">3. Configure PostgreSQL Managed DB (pgvector)</h4>



<p>In this step, you can set up the vector database that lets you store the embeddings generated from your documents.</p>



<p>How? By using an OVHcloud managed&nbsp;<a href="https://www.ovhcloud.com/en-gb/public-cloud/postgresql/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PostgreSQL</a>&nbsp;database with the pgvector extension. Go to your OVHcloud Control Panel and follow the steps.</p>



<p>1. Navigate to&nbsp;<strong>Databases &amp; Analytics &gt; Databases</strong></p>



<p><strong>2. Create a new database and select&nbsp;<em>PostgreSQL</em>&nbsp;and a datacenter location</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png" alt="" class="wp-image-29758" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>3. Select&nbsp;<em>Production</em>&nbsp;plan and&nbsp;<em>Instance type</em></strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png" alt="" class="wp-image-29759" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>4. Reset the user password and save it</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png" alt="" class="wp-image-29762" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-1-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>5. Whitelist the IP of your n8n instance as follows</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png" alt="" class="wp-image-29761" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p><strong>6. Take note of the following parameters</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png" alt="" class="wp-image-29760" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make a note of this information and create a new credential in n8n named&nbsp;<strong>OVHcloud PGvector credentials</strong>:</p>



<ul class="wp-block-list">
<li>Host:<strong>&nbsp;<code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_hostname&gt;</mark></code></strong></li>



<li>Database:&nbsp;<strong>defaultdb</strong></li>



<li>User:&nbsp;<code>avnadmin</code></li>



<li>Password:&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;db_password&gt;</mark></strong></code></li>



<li>Port:&nbsp;<strong>20184</strong></li>
</ul>



<p>Consider enabling the&nbsp;<strong>Ignore SSL Issues (Insecure)</strong>&nbsp;option as needed and setting the&nbsp;<strong>Maximum Number of Connections</strong>&nbsp;value to&nbsp;<strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1000</mark></code></strong>.</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png" alt="" class="wp-image-29763" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>✅ You’re now connected to the database! But what about the PGvector extension?</p>



<p>Add a PostgreSQL node (<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code>) to your n8n workflow and create the extension with an SQL query, which should look like this:</p>



<pre class="wp-block-code"><code class="">-- drop table as needed<br>DROP TABLE IF EXISTS md_embeddings;<br><br>-- activate pgvector<br>CREATE EXTENSION IF NOT EXISTS vector;<br><br>-- create table<br>CREATE TABLE md_embeddings (<br>    id SERIAL PRIMARY KEY,<br>    text TEXT,<br>    embedding vector(1024),<br>    metadata JSONB<br>);</code></pre>



<p>You should get this n8n node:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png" alt="" class="wp-image-29752" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.43.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Finally, running this node creates the&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">md_embeddings</mark></strong></code>&nbsp;table. Add a&nbsp;<code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Stop and Error</mark></strong></code>&nbsp;node to handle any errors when setting up the table.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png" alt="" class="wp-image-29753" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-16-a-14.51.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>All set! Your vector DB is prepped and ready for data! Keep in mind, you still need an&nbsp;<strong>embeddings model</strong> for the RAG data ingestion pipeline.</p>



<h4 class="wp-block-heading">4. Access to OVHcloud AI Endpoints</h4>



<p><strong>OVHcloud AI Endpoints</strong>&nbsp;is a managed service that provides&nbsp;<strong>ready-to-use APIs for AI models</strong>, including&nbsp;<strong>LLM, CodeLLM, embeddings, Speech-to-Text, and image models</strong>&nbsp;hosted within OVHcloud’s European infrastructure.</p>



<p>To vectorise the various documents in Markdown format, you have to select an embedding model:&nbsp;<a href="https://endpoints.ai.cloud.ovh.net/models/bge-m3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>BGE-M3</strong></a>.</p>



<p>Usually, your AI Endpoints API key should already be created. If not, head to the AI Endpoints menu in your OVHcloud Control Panel to generate a new API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png" alt="" class="wp-image-29775" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/ref-archi-n8n-3-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Once this is done, you can create new OpenAI credentials in your n8n.</p>



<p>Why do I need OpenAI credentials? Because <strong>AI Endpoints API&nbsp;</strong>is fully compatible with OpenAI’s, integrating it is simple and ensures the&nbsp;<strong>sovereignty of your data.</strong></p>



<p>How? Thanks to a single endpoint&nbsp;<a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a>, you can query any of the AI Endpoints models.</p>
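

<p>For example, a single BGE-M3 embedding can be requested with a plain <code>fetch</code> call; a sketch, assuming Node.js 18+ run as an ES module (the exact model identifier may differ, so check the AI Endpoints model catalogue):</p>



<pre class="wp-block-code"><code class="">// sketch: request one BGE-M3 embedding through the OpenAI-compatible endpoint<br>// (the model identifier is an assumption, check the AI Endpoints catalogue)<br>const res = await fetch('https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/embeddings', {<br>  method: 'POST',<br>  headers: {<br>    Authorization: `Bearer ${process.env.AI_ENDPOINTS_API_KEY}`,<br>    'Content-Type': 'application/json',<br>  },<br>  body: JSON.stringify({ model: 'bge-m3', input: 'Hello OVHcloud!' }),<br>});<br><br>const data = await res.json();<br>console.log(data.data[0].embedding.length); // 1024 dimensions for BGE-M3</code></pre>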



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png" alt="" class="wp-image-29776" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.45.33-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This means you can create a new n8n node by selecting&nbsp;<strong>Postgres PGVector Store</strong>&nbsp;and&nbsp;<strong>Add documents to Vector Store</strong>.<br>Set up this node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png" alt="" class="wp-image-29781" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then configure the <strong>Data Loader</strong> with custom text splitting and the JSON data type.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png" alt="" class="wp-image-29780" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.38-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>For the text splitter, here are some options:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png" alt="" class="wp-image-29786" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-12.02.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To finish, select the&nbsp;<strong>BGE-M3</strong> embedding model from the model list and set the&nbsp;<strong>Dimensions</strong> to 1024.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png" alt="" class="wp-image-29784" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.50.51-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You now have everything you need to build the ingestion pipeline.</p>



<h4 class="wp-block-heading">5. Set up the ingestion pipeline loop</h4>



<p>To build a fully automated document ingestion and vectorisation pipeline, you have to integrate some specific nodes (a plain-JavaScript summary of the whole loop follows this list), mainly:</p>



<ul class="wp-block-list">
<li>a <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></code></strong> that downloads each markdown file one by one so that it can be vectorised;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> that counts the number of files processed, which subsequently determines the number of requests sent to the embedding model;</li>



<li>an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition that checks when 400 requests have been reached;</li>



<li>a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node that pauses after every 400 requests to avoid getting rate-limited;</li>



<li>an S3 block <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> to download each markdown;</li>



<li>another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to extract and process text from Markdown files by cleaning and removing special characters before sending it to the embeddings model;</li>



<li>a PostgreSQL node to <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL</mark></strong></code> query to check that the table contains vectors after the process (loop) is complete.</li>
</ul>
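

<p>Before wiring up the nodes one by one, here is the control flow of the loop summarised as plain JavaScript; a readability sketch only, not an n8n node, with <code>downloadAndEmbed</code> as a hypothetical stand-in for the download, cleaning and vector store steps:</p>



<pre class="wp-block-code"><code class="">// plain-JavaScript summary of the loop's rate-limiting logic (not an n8n node)<br>const BATCH = 400; // pause after every 400 embedding requests<br>const sleep = ms =&gt; new Promise(resolve =&gt; setTimeout(resolve, ms));<br><br>// stub standing in for: S3 download + Markdown cleaning + vectorisation<br>async function downloadAndEmbed(file) {<br>  console.log('processing', file);<br>}<br><br>async function ingest(files) {<br>  let counter = 0;<br>  for (const file of files) {    // "Loop Over Items", batch size 1<br>    counter += 1;                // "Code in JavaScript" counter node<br>    if (counter % BATCH === 0) { // "If" node<br>      await sleep(60_000);       // "Wait" node: pause before resuming<br>    }<br>    await downloadAndEmbed(file); // "Download a file" + cleaning + vector store<br>  }<br>}<br><br>await ingest(['guide-1.md', 'guide-2.md']);</code></pre>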



<h5 class="wp-block-heading">5.1. Create a loop to process each documentation file</h5>



<p>Begin by creating a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop Over Items</mark></strong></code> to process all the Markdown files one at a time. Set the <strong>batch size</strong> to <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">1</mark></code></strong> in this loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png" alt="" class="wp-image-29788" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-10.50.13-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Loop</code></mark></strong> statement right after the S3 <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Get Many Files</mark></code></strong> node as shown below:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png" alt="" class="wp-image-29797" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.30.00-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Time to put the loop’s content into action!</p>



<h5 class="wp-block-heading">5.2. Count the number of files using a code snippet</h5>



<p>Next, choose the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> node from the list to count how many files have been processed. Set the <code><strong>Mode</strong></code> to “Run Once for Each Item” and the <strong>Language</strong> to “JavaScript”, then add the following code snippet to the designated block.</p>



<pre class="wp-block-code"><code class="">// simple counter per item<br>const counter = $runIndex + 1;<br><br>return {<br>  counter<br>};</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png" alt="" class="wp-image-29792" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.05.47-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Make sure this code snippet is included in the loop.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png" alt="" class="wp-image-29798" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.33.57-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You can start adding the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong><code>if</code></strong></mark> part to the loop now.</p>



<h5 class="wp-block-heading">5.3. Add a condition that applies a rule every 400 requests</h5>



<p>Here, you need to create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node and add the following condition, set as an expression.</p>



<pre class="wp-block-code"><code class="">{{ (Number($json["counter"]) % 400) === 0 }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png" alt="" class="wp-image-29794" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.11.42-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Add it immediately after counting the files:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png" alt="" class="wp-image-29800" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.44.10-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If this condition <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">is true</mark></strong></code>, trigger the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node.</p>



<h5 class="wp-block-heading">5.4. Insert a pause after each set of 400 requests</h5>



<p>Then insert a <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node to pause before resuming: set <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Resume</mark></strong></code> to “After Time Interval” and the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait Amount</mark></strong></code> to “60:00” seconds.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png" alt="" class="wp-image-29796" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.23.39-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Link it to the <strong>True</strong> output of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> condition.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png" alt="" class="wp-image-29801" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-11.45.08-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, you can go ahead and download the Markdown file, and then process it.</p>



<h5 class="wp-block-heading">5.5. Launch documentation download</h5>



<p>To do this, create a new <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Download a file</mark></strong></code> S3 node and configure it with this File Key expression:</p>



<pre class="wp-block-code"><code class="">{{ $('Process each documentation file').item.json.Key }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png" alt="" class="wp-image-29804" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.42.12-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Want to connect it? That’s easy: link it to the output of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Wait</mark></strong></code> node and to the <strong>False</strong> output of the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node; this way, a file is only processed when the rate limit has not been reached.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png" alt="" class="wp-image-29805" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-16.49.05-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>You’re almost done! Now you need to extract and process the text from the Markdown files – clean and remove any special characters before sending it to the embedding model.</p>



<h5 class="wp-block-heading">5.6 Clean Markdown text content</h5>



<p>Next, create another <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Code in JavaScript</mark></strong></code> to process text from Markdown files:</p>



<pre class="wp-block-code"><code class="">// extract binary content<br>const binary = $input.item.binary.data;<br><br>// decoding into clean UTF-8 text<br>let text = Buffer.from(binary.data, 'base64').toString('utf8');<br><br>// cleaning - remove non-printable characters<br>text = text<br>  .replace(/[^\x09\x0A\x0D\x20-\x7EÀ-ÿ€£¥•–—‘’“”«»©®™°±§¶÷×]/g, ' ')<br>  .replace(/\s{2,}/g, ' ')<br>  .trim();<br><br>// check lenght<br>if (text.length &gt; 14000) {<br>  text = text.slice(0, 14000);<br>}<br><br>return [{<br>  text,<br>  fileName: binary.fileName,<br>  mimeType: binary.mimeType<br>}];</code></pre>



<p>Select the <em>“Run Once for Each Item”</em> <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Mode</mark></strong></code> and place the previous code in the dedicated JavaScript block.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png" alt="" class="wp-image-29806" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.02.04-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>To finish, check that the output text is sent to the document vectorisation system, which was set up in section <strong>3. Configure PostgreSQL Managed DB (pgvector)</strong>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png" alt="" class="wp-image-29808" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-17.15.45-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>How do I confirm that the table contains all elements after vectorisation?</p>



<h5 class="wp-block-heading">5.7 Double-check that the documents are in the table</h5>



<p>To confirm that your RAG system is working, make sure your vector database contains the expected vectors; use a PostgreSQL node with <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute a SQL query</mark></strong></code> in your n8n workflow.</p>



<p>Then, run the following query:</p>



<pre class="wp-block-code"><code class="">-- count the number of elements<br>SELECT COUNT(*) FROM md_embeddings;</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png" alt="" class="wp-image-29818" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-20-a-20.28.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Done</mark></strong></code> section of your <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Loop</mark></strong>, so the elements are counted when the process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png" alt="" class="wp-image-29773" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-17-a-11.14.41-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! You can now run the workflow to begin ingesting documents.</p>



<p>Click the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Execute workflow</mark></strong></code> button and wait until the vectorization process is complete.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png" alt="" class="wp-image-29823" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-11.41.52-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember, everything should be green when it’s finished ✅.</p>



<h3 class="wp-block-heading">Step 2 – RAG chatbot</h3>



<p>With the data ingestion and vectorisation steps completed, you can now begin implementing your AI agent.</p>



<p>This involves building a <strong>RAG-based AI Agent</strong>&nbsp;by simply starting a chat with an LLM.</p>



<h4 class="wp-block-heading">1. Set up the chat box to start a conversation</h4>



<p>First, configure your AI Agent based on the RAG system, and add a new node in the same n8n workflow: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Trigger</mark></strong></code>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png" alt="" class="wp-image-29834" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.31.24-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This node will allow you to interact directly with your AI agent! But before that, you need to check that your message is safe.</p>



<h4 class="wp-block-heading">2. Set up your LLM Guard with AI Deploy</h4>



<p>To check whether a message is secure or not, use an LLM Guard.</p>



<p><strong>What’s an LLM Guard?</strong>&nbsp;This is a safety and control layer that sits between users and an LLM, or between the LLM and an external connection. Its main goal is to filter, monitor, and enforce rules on what goes into or comes out of the model 🔐.</p>



<p>You can use <a href="file:///Users/jdutse/Downloads/www.ovhcloud.com/en-gb/public-cloud/ai-deploy" data-wpel-link="internal">AI Deploy</a> from OVHcloud to deploy your desired LLM guard. With a single command line, this AI solution lets you deploy a Hugging Face model using vLLM Docker containers.</p>



<p>For more details, please refer to this <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">blog</a>.</p>



<p>For the use case covered in this article, you can use the open-source model <strong>meta-llama/Llama-Guard-3-8B</strong> available on <a href="https://huggingface.co/meta-llama/Llama-Guard-3-8B" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a>.</p>
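


<p>For context, Llama Guard models don’t answer like a chat assistant: they return a short verdict, either <code>safe</code>, or <code>unsafe</code> followed by the violated hazard category code(s) on the next line (the codes, e.g. <code>S9</code>, are documented on the model card). A typical unsafe verdict looks like this:</p>



<pre class="wp-block-code"><code class="">unsafe<br>S9</code></pre>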



<h5 class="wp-block-heading">2.1 Create a Bearer token to request your custom AI Deploy endpoint</h5>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it’s deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>The following output is returned:</p>



<p><code><strong>Id: 47292486-fb98-4a5b-8451-600895597a2b<br>Created At: 20-10-25 8:53:05<br>Updated At: 20-10-25 8:53:05<br>Spec:<br>Name: ai_deploy_token=my_operator_token<br>Role: AiTrainingOperator<br>Label Selector:<br>Status:<br>Value: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br>Version: 1</strong></code></p>



<p>You can now store and export your access token to add it as a new credential in n8n.</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</code></pre>



<h5 class="wp-block-heading">2.1 Start Llama Guard 3 model with AI Deploy</h5>



<p>Using the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai</mark></strong></code> CLI, launch the following command to start the vLLM inference server.</p>



<pre class="wp-block-code"><code class="">ovhai app run \<br>	--name vllm-llama-guard3 \<br>        --default-http-port 8000 \<br>        --gpu 1 \<br>	--flavor l40s-1-gpu \<br>        --label ai_deploy_token=my_operator_token \<br>	--env OUTLINES_CACHE_DIR=/tmp/.outlines \<br>	--env HF_TOKEN=$MY_HF_TOKEN \<br>	--env HF_HOME=/hub \<br>	--env HF_DATASETS_TRUST_REMOTE_CODE=1 \<br>	--env HF_HUB_ENABLE_HF_TRANSFER=0 \<br>	--volume standalone:/workspace:RW \<br>	--volume standalone:/hub:RW \<br>	vllm/vllm-openai:v0.10.1.1 \<br>	-- bash -c python3 -m vllm.entrypoints.openai.api_server                       <br>                           --model meta-llama/Llama-Guard-3-8B \                     <br>                           --tensor-parallel-size 1 \                     <br>                           --dtype bfloat16</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">ovhai app run</mark></strong></code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run an app</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--name vllm-llama-guard3</mark></strong></code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the app, for example&nbsp;<code>vllm-llama-guard3</code>.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--default-http-port 8000</mark></strong></code></li>
</ul>



<p>Exposes&nbsp;<strong>port 8000</strong>&nbsp;as the default HTTP endpoint. vLLM server typically runs on port 8000.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--gpu 1</mark></strong></code></li>



<li><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>--flavor l40s-1-gpu</code></mark></strong></li>
</ul>



<p>Allocates&nbsp;<strong>one L40S GPU</strong>&nbsp;for the app. You can adjust the GPU type and count depending on the model you need to deploy.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/workspace:RW</mark></strong></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--volume standalone:/hub:RW</mark></strong></code></li>
</ul>



<p>Mounts&nbsp;<strong>two persistent storage volumes</strong>: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/workspace</code></mark></strong> which is the main working directory and <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">/hub</mark></strong></code>&nbsp;to store Hugging Face model files.</p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env OUTLINES_CACHE_DIR=/tmp/.outlines</mark></strong></code></li>



<li><strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_TOKEN=$MY_HF_TOKEN</mark></code></strong></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HOME=/hub</mark></strong></code></li>



<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>--env HF_DATASETS_TRUST_REMOTE_CODE=1</strong></mark></code></li>



<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">--env HF_HUB_ENABLE_HF_TRANSFER=0</mark></strong></code></li>
</ul>



<p>These are Hugging Face&nbsp;<strong>environment variables</strong> you have to set. Please export your Hugging Face access token as an environment variable before starting the app: <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">export MY_HF_TOKEN=***********</mark></strong></code></p>



<ul class="wp-block-list">
<li><code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">vllm/vllm-openai:v0.10.1.1</mark></strong></code></li>
</ul>



<p>Use the&nbsp;<strong><code>vllm/vllm-openai</code></strong>&nbsp;Docker image (a pre-configured vLLM OpenAI API server).</p>



<ul class="wp-block-list">
<li><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>-- bash -c "python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-Guard-3-8B --tensor-parallel-size 1 --dtype bfloat16"</strong></mark></code></li>
</ul>



<p>Finally, this runs a&nbsp;<strong>bash shell</strong>&nbsp;inside the container and executes the Python command that launches the vLLM API server.</p>



<h5 class="wp-block-heading">2.2 Check to confirm your AI Deploy app is RUNNING</h5>



<p>Replace <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></strong></code> with your own.</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;app_id&gt;</code></pre>



<p>You should get:</p>



<p><code>History:<br>DATE STATE<br>20-10-25 09:58:00 QUEUED<br>20-10-25 09:58:01 INITIALIZING<br>20-10-25 09:58:07 PENDING<br>20-10-25 10:03:10&nbsp;<strong>RUNNING</strong><br>Info:<br>Message: App is running</code></p>



<h5 class="wp-block-heading">2.3 Create a new n8n credential with AI Deploy app URL and Bearer access token</h5>



<p>First, using your <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>&lt;app_id></strong></mark></code>, retrieve your AI Deploy app URL.</p>



<pre class="wp-block-code"><code class="">ovhai app get <span style="background-color: initial; font-family: inherit; font-size: inherit; text-align: initial; font-weight: inherit;">&lt;app_id&gt;</span> -o json | jq '.status.url' -r</code></pre>



<p>Then, create a new OpenAI credential from your n8n workflow, using your AI Deploy URL and the Bearer token as an API key.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png" alt="" class="wp-image-29837" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-16.49.14-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Don&#8217;t forget to replace <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>6e10e6a5-2862-4c82-8c08-26c458ca12c7</code></mark></strong> with your <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">&lt;app_id></mark></code></strong>.</p>



<h5 class="wp-block-heading">2.4 Create the LLM Guard node in n8n workflow</h5>



<p>Create a new <strong>OpenAI node</strong> to <strong>Message a model</strong> and select the new AI Deploy credential for LLM Guard usage.</p>



<p>Next, create the prompt as follows:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png" alt="" class="wp-image-29840" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.09.43-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, use an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">If</mark></strong></code> node to determine if the scenario is <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>safe</code></mark></strong> or <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>unsafe</code></mark></strong>:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png" alt="" class="wp-image-29842" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.25.29-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>If the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">unsafe</mark></strong></code>, send an error message right away to stop the workflow.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png" alt="" class="wp-image-29843" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/10/Capture-decran-2025-10-21-a-18.26.49-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>But if the message is <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">safe</mark></strong></code>, you can send the request to the AI Agent without issues 🔐.</p>



<h4 class="wp-block-heading">3. Set up AI Agent</h4>



<p>The&nbsp;<strong>AI Agent</strong>&nbsp;node in&nbsp;<strong>n8n</strong>&nbsp;acts as an intelligent orchestration layer that combines&nbsp;<strong>LLMs, memory, and external tools</strong>&nbsp;within an automated workflow.</p>



<p>It allows you to:</p>



<ul class="wp-block-list">
<li>Connect a <strong>Large Language Model</strong> using APIs (e.g., LLMs from AI Endpoints);</li>



<li>Use <strong>tools</strong> such as HTTP requests, databases, or RAG retrievers so the agent can take actions or fetch real information;</li>



<li>Maintain <strong>conversational memory</strong> via PostgreSQL databases;</li>



<li>Integrate directly with chat platforms (e.g., Slack, Teams) for interactive assistants (optional).</li>
</ul>



<p>Simply put, n8n becomes an&nbsp;<strong>agentic automation framework</strong>, enabling LLMs to not only provide answers, but also think, choose, and perform actions.</p>



<p>Please note that you can change and customise this n8n <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node to fit your use cases, using features like function calling or structured output. This is the most basic configuration for the given use case. You can go even further with different agents.</p>



<p>🧑‍💻&nbsp;<strong>How do I implement this RAG?</strong></p>



<p>First, create an <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">AI Agent</mark></strong></code> node in <strong>n8n</strong> as follows:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png" alt="" class="wp-image-29933" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Then, a few steps are required, the first of which is creating the prompts.</p>



<h5 class="wp-block-heading">3.1 Create prompts</h5>



<p>In the AI Agent node on your n8n workflow, edit the user and system prompts.</p>



<p>Begin by creating the&nbsp;<strong>prompt</strong>,&nbsp;which is also the&nbsp;<strong>user message</strong>:</p>



<pre class="wp-block-code"><code class="">{{ $('Chat with the OVHcloud product expert').item.json.chatInput }}</code></pre>



<p>Then create the <strong>System Message</strong> as shown below:</p>



<pre class="wp-block-code"><code class="">You have access to a retriever tool connected to a knowledge base.  <br>Before answering, always search for relevant documents using the retriever tool.  <br>Use the retrieved context to answer accurately.  <br>If no relevant documents are found, say that you have no information about it.</code></pre>



<p>You should get a configuration like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png" alt="" class="wp-image-29935" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-1-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>🤔 Well, an LLM is now needed for this to work!</p>



<h5 class="wp-block-heading">3.2 Select LLM using AI Endpoints API</h5>



<p>First, add an <strong>OpenAI Chat Model</strong> node, and then set it as the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Chat Model</mark></strong></code> for your agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png" alt="" class="wp-image-29939" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-3-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, select one of the&nbsp;<a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Endpoints</a>&nbsp;models from the list provided, as they are compatible with the OpenAI API.</p>



<p>✅ <strong>How?</strong> By using the right API base URL: <a href="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>https://oai.endpoints.kepler.ai.cloud.ovh.net/v1</code></mark></strong></a></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png" alt="" class="wp-image-29936" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-2-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The <a href="https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/gpt-oss-120b/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>GPT OSS 120B</strong></a> model has been selected for this use case. Other models, such as Llama, Mistral, and Qwen, are also available.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><mark style="background-color:#fcb900" class="has-inline-color">⚠️ <strong>WARNING</strong> ⚠️</mark></p>



<p>If you are using a recent version of n8n, you will likely encounter the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>/responses</code></mark></strong> issue (linked to OpenAI compatibility). To resolve this, disable the <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Use Responses API</mark></code></strong> toggle, and everything will work correctly.</p>
</blockquote>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="829" height="675" src="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg" alt="" class="wp-image-30352" style="aspect-ratio:1.2281554640124863;width:409px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1.jpg 829w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-300x244.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2026/01/02_44_08-1-768x625.jpg 768w" sizes="auto, (max-width: 829px) 100vw, 829px" /><figcaption class="wp-element-caption"><em>Tips to fix /responses issue</em></figcaption></figure>



<p>Your LLM is now set to answer your questions! Don’t forget, it needs access to the knowledge base.</p>



<h5 class="wp-block-heading">3.3 Connect the knowledge base to the RAG retriever</h5>



<p>As usual, the first step is to create an n8n node called <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">PGVector Vector Store node</mark></strong></code> and enter your pgvector credentials.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png" alt="" class="wp-image-29943" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-4-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Next, link this element to the <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Tools</code></mark></strong> section of the AI Agent node.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png" alt="" class="wp-image-29944" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-5-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Remember to connect your pgvector database so that the retriever can access the previously generated embeddings. Here’s an overview of what you’ll get.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png" alt="" class="wp-image-29945" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-6-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>⏳Nearly done! The final step is to add the database memory.</p>



<h5 class="wp-block-heading">3.4 Manage conversation history with database memory</h5>



<p>Creating a&nbsp;<strong>Database Memory</strong>&nbsp;node in n8n (PostgreSQL) lets you link it to your AI Agent, so it can store and retrieve past conversation history. This enables the model to remember and use context across multiple interactions.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png" alt="" class="wp-image-29946" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-7-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>So link this PostgreSQL database to the <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">Memory</mark></strong></code> section of your AI agent.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="580" src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png" alt="" class="wp-image-29947" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1024x580.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-300x170.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-768x435.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-1536x870.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/11/image-8-2048x1160.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Congrats! 🥳 Your&nbsp;<strong>n8n RAG workflow</strong>&nbsp;is now complete. Ready to test it?</p>



<h4 class="wp-block-heading">4. Make the most of your automated workflow</h4>



<p>Want to try it? It’s easy!</p>



<p>By clicking the orange <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Open chat</code></mark></strong> button, you can ask the AI agent questions about OVHcloud products, particularly where you need technical assistance.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4"></video></figure>



<p>For example, you can ask the LLM about rate limits in OVHcloud AI Endpoints and get the information in seconds.</p>



<figure class="wp-block-video"><video height="1660" style="aspect-ratio: 2930 / 1660;" width="2930" controls src="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4"></video></figure>



<p>You can now build your own autonomous RAG system using OVHcloud Public Cloud, suited for a wide range of applications.</p>



<h2 class="wp-block-heading">What’s next?</h2>



<p>To sum up, this reference architecture provides a guide on using&nbsp;<strong>n8n</strong> with&nbsp;<strong>OVHcloud AI Endpoints</strong>,&nbsp;<strong>AI Deploy</strong>,&nbsp;<strong>Object Storage</strong>, and&nbsp;<strong>PostgreSQL + pgvector</strong> to build a fully controlled, autonomous&nbsp;<strong>RAG AI system</strong>.</p>



<p>Teams can build scalable AI assistants that work securely and independently in their cloud environment by orchestrating ingestion, embedding generation, vector storage, retrieval, LLM safety checks, and reasoning within a single workflow.</p>



<p>With the core architecture in place, you can add more features to improve the capabilities and robustness of your agentic RAG system:</p>



<ul class="wp-block-list">
<li>Web search</li>



<li>Images with OCR</li>



<li>Audio files transcribed using the Whisper model</li>
</ul>



<p>This delivers an extensive knowledge base and a wider variety of use cases!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-build-a-sovereign-n8n-rag-workflow-for-ai-agent-using-ovhcloud-public-cloud-solutions%2F&amp;action_name=Reference%20Architecture%3A%20build%20a%20sovereign%20n8n%20RAG%20workflow%20for%20AI%20agent%20using%20OVHcloud%20Public%20Cloud%20solutions&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n1.mp4" length="11190376" type="video/mp4" />
<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2025/11/video-n8n2.mp4" length="9881210" type="video/mp4" />

			</item>
		<item>
		<title>Fine tune an LLM with Axolotl and OVHcloud Machine Learning Services</title>
		<link>https://blog.ovhcloud.com/fine-tune-an-llm-with-axolotl-and-ovhcloud-machine-learning-services/</link>
		
		<dc:creator><![CDATA[Stéphane Philippart]]></dc:creator>
		<pubDate>Fri, 25 Jul 2025 13:07:40 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[Tranches de Tech & co]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Notebook]]></category>
		<category><![CDATA[Fine Tuning]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29408</guid>

					<description><![CDATA[There are many ways to train a model,📚 using detailed instructions, system prompts, Retrieval Augmented Generation, or function calling One way is fine-tuning, which is what this blog is about! ✨ Two years back we posted a blog on fine-tuning Llama models—it’s not nearly as complicated as it was before 😉.  This time we’re using the [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ffine-tune-an-llm-with-axolotl-and-ovhcloud-machine-learning-services%2F&amp;action_name=Fine%20tune%20an%20LLM%20with%20Axolotl%20and%20OVHcloud%20Machine%20Learning%20Services&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1.png" alt="A robot with a car tuning style" class="wp-image-29462" style="width:600px" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-768x768.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/red-cat-02-1-70x70.png 70w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>There are many ways to adapt a model 📚: detailed instructions, system prompts, Retrieval Augmented Generation, or function calling.</p>



<p>One way is fine-tuning, which is what this blog is about! ✨</p>



<p>Two years back we posted a <a href="https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/" data-wpel-link="internal">blog</a> on fine-tuning Llama models, and it’s not nearly as complicated as it used to be 😉. This time we’re using the <a href="https://docs.axolotl.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Axolotl</a> framework, so hopefully there’s less to manage.</p>



<h3 class="wp-block-heading">So what’s the plan?</h3>



<p>For this blog, I’d like to fine-tune a small model, <a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Llama-3.2-1B-Instruct</a>, and then test it out on a few questions about our <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Endpoints</a> product 📝.</p>



<p>Before we fine-tune, let’s try it out! Deploying a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> model is super easy with <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a> from <a href="https://www.ovhcloud.com/fr/public-cloud/ai-machine-learning/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Machine Learning Services</a> 🥳.</p>



<p>And thanks to a <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">previous blog post</a>, we know how to use <a href="https://docs.vllm.ai/en/v0.7.3/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> and <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a>.</p>



<pre title="Deploy a model thanks to vLLM and AI Deploy" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">ovhai app run --name $1 \
	--flavor l40s-1-gpu \
	--gpu 2 \
	--default-http-port 8000 \
	--env OUTLINES_CACHE_DIR=/tmp/.outlines \
	--env HF_TOKEN=$MY_HUGGING_FACE_TOKEN \
	--env HF_HOME=/hub \
	--env HF_DATASETS_TRUST_REMOTE_CODE=1 \
	--env HF_HUB_ENABLE_HF_TRANSFER=0 \
	--volume standalone:/hub:rw \
	--volume standalone:/workspace:rw \
	vllm/vllm-openai:v0.8.2 \
	-- bash	-c "vllm serve meta-llama/Llama-3.2-1B-Instruct"</code></pre>



<p class="has-text-align-center"><strong><strong>⚠️ Make sure you’ve agreed to the terms of use for the model’s license from Hugging Face ⚠️</strong></strong></p>



<p>Check out the <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" data-wpel-link="internal">blog</a> I mentioned earlier for all the details you need on the command and its parameters.</p>



<p>To test our different chatbots we will use a simple <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/chatbot/chatbot.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gradio application</a>:</p>



<pre title="Chatbot" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Application to compare answers generation from OVHcloud AI Endpoints exposed model and fine tuned model.
# ⚠️ Do not used in production!! ⚠️

import gradio as gr
import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# 📜 Prompts templates 📜
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "{system_prompt}"),
        ("human", "{user_prompt}"),
    ]
)

def chat(prompt, system_prompt, temperature, top_p, model_name, model_url, api_key):
    """
    Function to generate a chat response using the provided prompt, system prompt, temperature, top_p, model name, model URL and API key.
    """

    # ⚙️ Initialize the OpenAI model ⚙️
    llm = ChatOpenAI(api_key=api_key, 
                 model=model_name, 
                 base_url=model_url,
                 temperature=temperature,
                 top_p=top_p
                 )

    # 📜 Apply the prompt to the model 📜
    chain = prompt_template | llm
    ai_msg = chain.invoke(
        {
            "system_prompt": system_prompt,
            "user_prompt": prompt
        }
    )

    # 🤖 Return answer in a compatible format for Gradio component.
    return [{"role": "user", "content": prompt}, {"role": "assistant", "content": ai_msg.content}]

# 🖥️ Main application 🖥️
with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            system_prompt = gr.Textbox(value="""You are a specialist on OVHcloud products.
If you can't find any sure and relevant information about the product asked, answer with "This product doesn't exist in OVHcloud""", 
                label="🧑‍🏫 System Prompt 🧑‍🏫")
            temperature = gr.Slider(minimum=0.0, maximum=2.0, step=0.01, label="Temperature", value=0.5)
            top_p = gr.Slider(minimum=0.0, maximum=1.0, step=0.01, label="Top P", value=0.0)
            model_name = gr.Textbox(label="🧠 Model Name 🧠", value='Llama-3.1-8B-Instruct')
            model_url = gr.Textbox(label="🔗 Model URL 🔗", value='https://oai.endpoints.kepler.ai.cloud.ovh.net/v1')
            api_key = gr.Textbox(label="🔑 OVH AI Endpoints Access Token 🔑", value=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"), type="password")

        with gr.Column():
            chatbot = gr.Chatbot(type="messages", label="🤖 Response 🤖")
            prompt = gr.Textbox(label="📝 Prompt 📝", value='How many requests by minutes can I do with AI Endpoints?')
            submit = gr.Button("Submit")

    submit.click(chat, inputs=[prompt, system_prompt, temperature, top_p, model_name, model_url, api_key], outputs=chatbot)

demo.launch()</code></pre>



<p>ℹ️ You can find all resources to build and run this application in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/chatbot/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</p>



<p>Let&#8217;s test with a simple question: &#8220;How many requests by minutes can I do with AI Endpoints?&#8221;.<br>The first test is with <a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Llama-3.2-1B-Instruct</a> from <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> deployed with <a href="https://docs.vllm.ai/en/v0.7.3/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> and <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Deploy</a>.</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="474" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-1024x474.png" alt="Ask for AI Endpoints rate limit with a Llama-3.2-1B-Instruct model" class="wp-image-29448" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-1024x474.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-768x356.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-1536x712.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-13.19.16-2048x949.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The response isn’t exactly what we expected. 😅</p>



<p>FYI, according to the official <a href="https://help.ovhcloud.com/csm/fr-public-cloud-ai-endpoints-capabilities?id=kb_article_view&amp;sysparm_article=KB0065424#limitations" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud guide</a>, the correct answer is:<br> &#8211; <strong>Anonymous</strong>: 2 requests per minute, per IP and per model.<br> &#8211; <strong>Authenticated with an API access key</strong>: 400 requests per minute, per Public Cloud project and per model.</p>



<h3 class="wp-block-heading"><strong>What’s the best way to feed the model fresh data?</strong></h3>



<p>I bet you already know this—you can use some data during the inference step, using Retrieval Augmented Generation (RAG). You can learn how to set up RAG by reading our <a href="https://blog.ovhcloud.com/rag-chatbot-using-ai-endpoints-and-langchain/" data-wpel-link="internal">past blog post</a>. 📗</p>



<p>Another way to feed a model fresh data is fine-tuning. ✨</p>



<p>In a nutshell, fine-tuning is when you take a pre-trained machine learning model and train it further on additional data, so it can do a specific job. It’s quicker and easier than building a model from scratch. 😉</p>



<p>For this, I’m picking <a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Llama-3.2-1B-Instruct</a> from Hugging Face as the base model.</p>



<p><em>ℹ️ The more parameters your base model has, the more computing power you need. In this case, the model needs between 3GB and 4GB of memory, which is why we’ll be using a single <a href="https://www.ovhcloud.com/fr/public-cloud/prices/#5260" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">L4 GPU</a> (we need an <a href="https://www.nvidia.com/en-us/data-center/ampere-architecture/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ampere-compatible architecture</a>).</em></p>
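


<p>As a quick back-of-the-envelope check of that figure (an illustration, assuming roughly 1.24 billion parameters stored in bfloat16):</p>



<pre title="Rough memory estimate" class="wp-block-code"><code lang="python" class="language-python line-numbers"># Back-of-the-envelope memory estimate for Llama-3.2-1B-Instruct.
# Assumption: ~1.24e9 parameters stored in bfloat16 (2 bytes per parameter).
params = 1.24e9
weights_gb = params * 2 / 1e9  # ≈ 2.5 GB for the weights alone
print(f"weights ≈ {weights_gb:.1f} GB")
# Activations and KV cache push the total into the 3-4 GB range mentioned above.</code></pre>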



<h3 class="wp-block-heading">When data is your gold</h3>



<p>To train a model, you need enough good-quality data.</p>



<p>The first part is easy; I get the OVHcloud AI Endpoints official documentation in a markdown format from our <a href="https://github.com/ovh/docs/tree/develop/pages/public_cloud/ai_machine_learning" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">public cloud documentation repository</a> (by the way, would you like to contribute?). 📚</p>



<p>First, create a dataset in the right format. Axolotl offers several <a href="https://docs.axolotl.ai/docs/dataset-formats/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dataset formats</a>; I prefer the <a href="https://docs.axolotl.ai/docs/dataset-formats/conversation.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">conversation format</a> because it’s the easiest for my use case, so I’m going with that. 😉</p>



<pre title="Conersation format dataset" class="wp-block-code"><code lang="json" class="language-json line-numbers"><a href="https://docs.axolotl.ai/docs/dataset-formats/conversation.html#cb1-1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>{
   "messages": [
     {"role": "...", "content": "..."}, 
     {"role": "...", "content": "..."}, 
     ...]
}</code></pre>



<p>And rather than creating it manually, I use an LLM to convert the markdown data into a well-formed dataset and add the relevant information. 🤖</p>



<p>Here we’re using a <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/ai/llm-fine-tune/dataset/DatasetCreation.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Python script</a> 🐍:</p>



<pre title="Dataset creation with LLM" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
from pathlib import Path
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

# 🗺️ Define the JSON schema for the response 🗺️
message_schema = {
    "type": "object",
    "properties": {
        "role": {"type": "string"},
        "content": {"type": "string"}
    },
    "required": ["role", "content"]
}

response_format = {
    "type": "json_object",
    "json_schema": {
        "name": "Messages",
        "description": "A list of messages with role and content",
        "properties": {
            "messages": {
                "type": "array",
                "items": message_schema
            }
        }
    }
}

# ⚙️ Initialize the chat model with AI Endpoints configuration ⚙️
chat_model = ChatOpenAI(
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),
    model_name=os.getenv("OVH_AI_ENDPOINTS_MODEL_NAME"),
    temperature=0.0
)

# 📂 Define the directory path 📂
directory_path = "docs/pages/public_cloud/ai_machine_learning"
directory = Path(directory_path)

# 🗃️ Walk through the directory and its subdirectories 🗃️
for path in directory.rglob("*"):
    # Check if the current path is a directory
    if path.is_dir():
        # Get the name of the subdirectory
        sub_directory = path.name

        # Construct the path to the "guide.en-gb.md" file in the subdirectory
        guide_file_path = path / "guide.en-gb.md"

        # Check if the "guide.en-gb.md" file exists in the subdirectory
        if "endpoints" in sub_directory and guide_file_path.exists():
            print(f"📗 Guide processed: {sub_directory}")
            with open(guide_file_path, 'r', encoding='utf-8') as file:
                raw_data = file.read()

            user_message = HumanMessage(content=f"""
With the markdown following, generate a JSON file composed as follows: a list named "messages" composed of tuples with a key "role" which can have the value "user" when it's the question and "assistant" when it's the response. To split the document, base it on the markdown chapter titles to create the question, seems like a good idea.
Keep the language English.
I don't need to know the code to do it but I want the JSON result file.
For the "user" field, don't just repeat the title but make a real question, for example "What are the requirements for OVHcloud AI Endpoints?"
Be sure to add OVHcloud with AI Endpoints so that it's clear that OVHcloud creates AI Endpoints.
Generate the entire JSON file.
An example of what it should look like: messages [{{"role":"user", "content":"What is AI Endpoints?"}}]
There must always be a question followed by an answer, never two questions or two answers in a row.
The source markdown file:
{raw_data}
""")
            chat_response = chat_model.invoke([user_message], response_format=response_format)
            
            with open(f"./generated/{sub_directory}.json", 'w', encoding='utf-8') as output_file:
                output_file.write(chat_response.content)
                print(f"✅ Dataset generated: ./generated/{sub_directory}.json")

</code></pre>



<p><em>ℹ️ You can find all resources to build and run this application in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</em></p>



<p>Here’s a sample of the file created as the dataset:</p>



<pre title="Dataset example" class="wp-block-code"><code lang="json" class="language-json line-numbers">[
  {
    "role": "user",
    "content": "What are the requirements for using OVHcloud AI Endpoints?"
  },
  {
    "role": "assistant",
    "content": "To use OVHcloud AI Endpoints, you need the following: \n1. A Public Cloud project in your OVHcloud account \n2. A payment method defined on your Public Cloud project. Access keys created from Public Cloud projects in Discovery mode (without a payment method) cannot use the service."
  },
  {
    "role": "user",
    "content": "What are the rate limits for using OVHcloud AI Endpoints?"
  },
  {
    "role": "assistant",
    "content": "The rate limits for OVHcloud AI Endpoints are as follows:\n- Anonymous: 2 requests per minute, per IP and per model.\n- Authenticated with an API access key: 400 requests per minute, per PCI project and per model."
  }, 
   ...]
}</code></pre>



<p>As for quantity, it’s a bit tricky. How can we generate the right data for training without lowering data quality?</p>



<p>To do this, I’ve created synthetic data, using an LLM to generate it from the original data. The trick is to generate more data on the same topic by rephrasing it differently while keeping the same idea.</p>



<p>Here is the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/DatasetAugmentation.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Python script</a> 🐍 to do the data augmentation:</p>



<pre title="Data augmentation" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
import json
import uuid
from pathlib import Path
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage
from jsonschema import validate, ValidationError

# 🗺️ Define the JSON schema for the response 🗺️
message_schema = {
    "type": "object",
    "properties": {
        "role": {"type": "string"},
        "content": {"type": "string"}
    },
    "required": ["role", "content"]
}

response_format = {
    "type": "json_object",
    "json_schema": {
        "name": "Messages",
        "description": "A list of messages with role and content",
        "properties": {
            "messages": {
                "type": "array",
                "items": message_schema
            }
        }
    }
}

# ✅ JSON validity verification ❌
def is_valid(json_data):
    """
    Test the validity of the JSON data against the schema.
    Argument:
        json_data (dict): The JSON data to validate.  
    Raises:
        ValidationError: If the JSON data does not conform to the specified schema.  
    """
    try:
        validate(instance=json_data, schema=response_format["json_schema"])
        return True
    except ValidationError as e:
        print(f"❌ Validation error: {e}")
        return False

# ⚙️ Initialize the chat model with AI Endpoints configuration ⚙️
chat_model = ChatOpenAI(
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
    base_url=os.getenv("OVH_AI_ENDPOINTS_MODEL_URL"),
    model_name=os.getenv("OVH_AI_ENDPOINTS_MODEL_NAME"),
    temperature=0.0
)

# 📂 Define the directory path 📂
directory_path = "generated"
print(f"📂 Directory path: {directory_path}")
directory = Path(directory_path)

# 🗃️ Walk through the directory and its subdirectories 🗃️
for path in directory.rglob("*"):
    print(f"📜 Processing file: {path}")
    # Check if the current path is a valid file
    if path.is_file() and path.name.__contains__ ("endpoints"):
        # Read the raw data from the file
        with open(path, 'r', encoding='utf-8') as file:
            raw_data = file.read()

        try:
            json_data = json.loads(raw_data)
        except json.JSONDecodeError:
            print(f"❌ Failed to decode JSON from file: {path.name}")
            continue

        if not is_valid(json_data):
            print(f"❌ Dataset non valide: {path.name}")
            continue
        print(f"✅ Input dataset valide: {path.name}")

        user_message = HumanMessage(content=f"""
        Given the following JSON, generate a similar JSON file where you paraphrase each question in the content attribute
        (when the role attribute is user) and also paraphrase the value of the response to the question stored in the content attribute
        when the role attribute is assistant.
        The objective is to create synthetic datasets based on existing datasets.
        I do not need to know the code to do this, but I want the resulting JSON file.
        It is important that the term OVHcloud is present as much as possible, especially when the terms AI Endpoints are mentioned
        either in the question or in the response.
        There must always be a question followed by an answer, never two questions or two answers in a row.
        It is IMPERATIVE to keep the language in English.
        The source JSON file:
        {raw_data}
        """)

        chat_response = chat_model.invoke([user_message], response_format=response_format)

        output = chat_response.content

        # Replace unauthorized characters
        output = output.replace("\\t", " ")

        generated_file_name = f"{uuid.uuid4()}_{path.name}"
        with open(f"./generated/synthetic/{generated_file_name}", 'w', encoding='utf-8') as output_file:
            output_file.write(output)

        if not is_valid(json.loads(output)):
            print(f"❌ ERROR: File {generated_file_name} is not valid")
        else:
            print(f"✅ Successfully generated file: {generated_file_name}")</code></pre>



<p><em>ℹ️ Again, you can find all resources to build and run this application in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/dataset/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</em></p>



<h3 class="wp-block-heading">Fine-tune the model</h3>



<p>Now that we have enough training data, let’s fine-tune!</p>



<p><em>ℹ️ It’s hard to say exactly how much data is needed to train a model properly. It all depends on the model, the data, the topic, and so on.<br>The only option is to test and adapt. 🔁</em></p>



<p>I use a <a href="https://jupyter.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Jupyter notebook</a>, created with <a href="https://www.ovhcloud.com/fr/public-cloud/ai-notebooks/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Notebooks</a>, to fine-tune my models.</p>



<pre title="Jupyter notebook creation" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">ovhai notebook run conda jupyterlab \
	--name axolotl-llm-fine-tune \
	--framework-version 25.3.1-py312-cudadevel128-gpu \
	--flavor l4-1-gpu \
	--gpu 1 \
	--envvar HF_TOKEN=$MY_HF_TOKEN \
	--envvar WANDB_TOKEN=$MY_WANDB_TOKEN \
	--unsecure-http</code></pre>



<p><em>ℹ️ For more details on how to create a Jupyter notebook with <a href="https://www.ovhcloud.com/fr/public-cloud/ai-notebooks/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Notebooks</a>, read the <a href="https://help.ovhcloud.com/csm/fr-documentation-public-cloud-ai-and-machine-learning-ai-notebooks?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=c8441955f49801102d4ca4d466a7fd58&amp;spa=1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</em></p>



<p class="has-text-align-left">⚙️ The <strong>HF_TOKEN</strong> environment variable is used to pull and push the trained model to <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> <br>⚙️ The <strong>WANDB_TOKEN</strong> environment variable helps you track training quality in <a href="https://wandb.ai" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Weight &amp; Biases</a></p>



<p>Once the notebook is set up, you can start coding the model’s training with Axolotl.</p>



<p>To start, install the Axolotl CLI and its dependencies. 🧰</p>



<pre title="Axolot installation" class="wp-block-code"><code lang="bash" class="language-bash"># Axolotl need these dependencies
!pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# Axolotl CLI installation
!pip install --no-build-isolation axolotl[flash-attn,deepspeed]

# Verify Axolotl version and installation
!axolotl --version</code></pre>






<p>The next step is to configure the Hugging Face CLI. 🤗</p>



<pre title="Hugging Face configurartion" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!pip install -U "huggingface_hub[cli]"

!huggingface-cli --version</code></pre>



<pre title="Hugging Face hub authentication " class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
from huggingface_hub import login

login(os.getenv("HF_TOKEN"))</code></pre>






<p>Then, configure your Weights &amp; Biases access.</p>



<pre title="Weight &amp; Biases configuration" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">pip install wandb

!wandb login $WANDB_TOKEN</code></pre>






<p>Once all that’s done, it’s time to train the model.</p>



<pre title="Train the model" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!axolotl train /workspace/instruct-lora-1b-ai-endpoints.yml</code></pre>



<p>You only need to type this one line to train it. How cool is that? 😎</p>



<p><em>ℹ️ With one L4 card, 10 epochs, and roughly 2000 questions and answers in the datasets, it ran for about 90 minutes.</em></p>



<p>Basically, the command line needs just one parameter: the Axolotl config file. You can find everything you need to set up Axolotl in the <a href="https://docs.axolotl.ai/docs/config-reference.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">official documentation</a>.📜<br>Here’s what the model was trained on:</p>



<pre title="Axolotl configuration" class="wp-block-code"><code lang="yaml" class="language-yaml">base_model: meta-llama/Llama-3.2-1B-Instruct
# optionally might have model_type or tokenizer_type
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

load_in_8bit: true
load_in_4bit: false

datasets:
  - path: /workspace/ai-endpoints-doc/
    type: chat_template
      
    field_messages: messages
    message_property_mappings:
      role: role
      content: content
    roles:
      user:
        - user
      assistant:
        - assistant

dataset_prepared_path:
val_set_size: 0.01
output_dir: /workspace/out/llama-3.2-1b-ai-endpoints

sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

wandb_project: ai_endpoints_training
wandb_entity: &lt;user id&gt;
wandb_mode: 
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: false

gradient_checkpointing: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
special_tokens:
   pad_token: &lt;|end_of_text|&gt;
</code></pre>



<p>🔎 Some key points (only the fields modified from the <a href="https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/llama-3/instruct-lora-8b.yml" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">given templates</a>):<br>&#8211; <strong>base_model: meta-llama/Llama-3.2-1B-Instruct</strong>: before you download the base model from Hugging Face, be sure to accept the licence’s terms of use<br>&#8211; <strong>path: /workspace/ai-endpoints-doc/</strong>: the folder where the generated dataset is uploaded<br>&#8211; <strong>wandb_project: ai_endpoints_training</strong> &amp; <strong>wandb_entity: &lt;user id></strong>: to configure Weights &amp; Biases<br>&#8211; <strong>num_epochs: 10</strong>: number of epochs for the training</p>



<p>After the training, you can test the new model 🤖:</p>



<pre title="New model testing" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!echo "What is OVHcloud AI Endpoints and how to use it?" | axolotl inference /workspace/instruct-lora-1b-ai-endpoints.yml --lora-model-dir="/workspace/out/llama-3.2-1b-ai-endpoints" </code></pre>






<p>When you’re satisfied with the result, merge the weights and upload the new model to Hugging Face:</p>



<pre title="Push the model" class="wp-block-code"><code lang="bash" class="language-bash line-numbers">!axolotl merge-lora /workspace/instruct-lora-1b-ai-endpoints.yml

%cd /workspace/out/llama-3.2-1b-ai-endpoints/merged

!huggingface-cli upload wildagsx/Llama-3.2-1B-Instruct-AI-Endpoints-v0.6 .</code></pre>



<p>ℹ️ <em>You can find all resources to create and run the notebook in the <a href="https://github.com/ovh/public-cloud-examples/tree/main/ai/llm-fine-tune/notebook/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">dedicated folder</a> in the GitHub repository.</em></p>



<h3 class="wp-block-heading">Test the new model</h3>



<p>Once you have pushed your model to Hugging Face, you can once again deploy it with vLLM and AI Deploy to test it ⚡️.</p>
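<p>As an illustration, here is a minimal Python sketch that queries such a deployment through its OpenAI-compatible API. The app URL placeholder, the token variable and the served model name are assumptions based on the examples above; adapt them to your own deployment.</p>



<pre title="Query the fine-tuned model (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers">import os
from openai import OpenAI

# ⚠️ Hypothetical values: replace with your own AI Deploy app URL and access token
client = OpenAI(
    base_url="https://&lt;ai_deploy_app_id&gt;.app.gra.ai.cloud.ovh.net/v1",
    api_key=os.environ["MY_OVHAI_TOKEN"],
)

# With vLLM, the served model name is usually the Hugging Face repository id
response = client.chat.completions.create(
    model="wildagsx/Llama-3.2-1B-Instruct-AI-Endpoints-v0.6",
    messages=[
        {"role": "user", "content": "What is OVHcloud AI Endpoints and how to use it?"}
    ],
)
print(response.choices[0].message.content)</code></pre>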



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="474" src="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-1024x474.png" alt="" class="wp-image-29459" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-1024x474.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-300x139.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-768x356.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-1536x712.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/07/Screenshot-2025-07-23-at-14.58.02-2048x949.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Ta-da! 🥳 Our little Llama model is now an OVHcloud AI Endpoints pro!</p>






<p>Feel free to try out OVHcloud Machine Learning products, and share your thoughts on our Discord server (<em><a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://discord.gg/ovhcloud</a></em>), see you soon! 👋</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ffine-tune-an-llm-with-axolotl-and-ovhcloud-machine-learning-services%2F&amp;action_name=Fine%20tune%20an%20LLM%20with%20Axolotl%20and%20OVHcloud%20Machine%20Learning%20Services&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Reference Architecture: deploying the Mistral Large 123B model in a sovereign environment with OVHcloud</title>
		<link>https://blog.ovhcloud.com/reference-architecture-deploy-mistral-large-model-in-sovereign-environment-ovhcloud/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Wed, 18 Jun 2025 12:45:51 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Training]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=29186</guid>

					<description><![CDATA[Are you ready to think bigger with the Mistral Large model 🚀 ? As Artificial Intelligence (AI) becomes a strategic pillar for both enterprises and public institutions, data sovereignty and infrastructure control have become essential. Deploying advanced large language models (LLMs) like Mistral Large, under a commercial license, requires a secure, high-performance environment that complies [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-deploy-mistral-large-model-in-sovereign-environment-ovhcloud%2F&amp;action_name=Reference%20Architecture%3A%C2%A0deploying%20the%20Mistral%20Large%20123B%20model%20in%20a%20sovereign%20environment%20with%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em><strong>Are you ready to think bigger with the Mistral Large model 🚀 ?</strong></em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="461" src="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-1024x461.png" alt="" class="wp-image-29249" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-1024x461.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-300x135.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-768x346.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref-1536x691.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_ref.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Mistral Large model deployed on OVHcloud infrastructure<br></em></figcaption></figure>



<p>As Artificial Intelligence (<strong>AI</strong>) becomes a strategic pillar for both enterprises and public institutions, <strong>data sovereignty</strong> and <strong>infrastructure control</strong> have become essential. Deploying advanced large language models (LLMs) like <strong>Mistral Large</strong>, under a commercial license, requires a secure, high-performance environment that complies with <strong>European data regulations</strong>.</p>



<p><strong>OVHcloud Machine Learning Services</strong> offer a trusted solution for deploying AI models in a <strong>fully sovereign cloud environment</strong> — hosted in Europe, under <strong>EU jurisdiction</strong>, and fully <strong>GDPR-compliant</strong>.</p>



<p>This <strong>Reference Architecture</strong> will show you how to:</p>



<ul class="wp-block-list">
<li>Access Mistral AI registry using your own license</li>



<li>Download the Mistral Large 123B model automatically using <strong>AI Training</strong></li>



<li>Store the model into a dedicated bucket with <strong>OVHcloud Object Storage</strong></li>



<li>Deploy a production-ready inference API for <strong>Mistral Large</strong> using <strong>AI Deploy</strong> </li>
</ul>



<h2 class="wp-block-heading">Context</h2>



<h3 class="wp-block-heading">Mistral Large model</h3>



<p>The <strong>Mistral Large</strong> model is a <strong>state-of-the-art large language model (LLM)</strong> developed by <strong><a href="https://mistral.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral AI</a>,</strong> a French AI company. It&#8217;s designed to compete with top-tier models like GPT-4 and Claude, while emphasizing performance and efficiency.</p>



<p>This is a model with <strong>123 billion</strong> parameters. <strong>Mistral AI</strong> recommends deploying this model in FP8 with 4 H100 GPUs. For more information, refer to <a href="https://help.mistral.ai/en/articles/235545-mistral-models" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral documentation</a>.</p>



<p>This model requires the use of a <strong>commercial licence</strong>. To do this, you need to create an account on <a href="https://console.mistral.ai/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">La Plateforme</a> via the Mistral AI console (<strong>console.mistral.ai</strong>).</p>



<h3 class="wp-block-heading">AI Training </h3>



<p><strong>OVHcloud AI Training</strong> is a fully managed platform designed to help you <strong>train and tune</strong> Machine Learning (ML), Deep Learning (DL), and Large Language Models (LLMs) efficiently. Whether you&#8217;re working on computer vision, NLP, or tabular data, this solution lets you launch training jobs on high-performance GPUs in seconds.</p>



<p><strong>What are the key benefits?</strong></p>



<ul class="wp-block-list">
<li><strong>Easy to use</strong>: launch processing or training jobs in one CLI command or a few clicks using your own Docker image</li>



<li><strong>High-performance computing</strong>: access GPUs like H100, A100, V100S, L40S, and L4 as of June 2025 &#8211; new references are added regularly</li>



<li><strong>Cost-efficient</strong>:<strong> </strong>pay-per-minute billing with no upfront commitment. You only pay for compute time used, with precise control over resources thanks to automatic job stop and synchronisation</li>
</ul>



<p><strong>💡 Why do we need AI Training? </strong>To download the Mistral Large model automatically and efficiently, using a single command to launch the job.</p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>OVHcloud AI Deploy is a<strong>&nbsp;Container as a Service</strong>&nbsp;(CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications / APIs based on Machine Learning (ML), Deep Learning (DL) or LLMs.</p>



<p><strong>The key benefits are:</strong></p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong>&nbsp;bring your own custom Docker image and deploy it in a command line or a few clicks</li>



<li><strong>High-performance computing:</strong>&nbsp;a complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong>&nbsp;supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong>&nbsp;billing per minute, no surcharges</li>
</ul>



<p>✅ To go further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Overview of the Mistral Large deployment architecture</h2>



<p>Here is how <strong>Mistral Large 123B</strong> will be deployed:</p>



<ol class="wp-block-list">
<li>Install the <strong>ovhai CLI</strong></li>



<li>Create a bucket for <strong>model storage</strong></li>



<li>Retrieve the <strong>license information</strong> from <a href="https://console.mistral.ai/on-premise/licenses" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral Console</a></li>



<li>Configure and set up the<strong> environment</strong></li>



<li>Download the <strong>Mistral Large model weights</strong></li>



<li>Deploy the <strong>Mistral Large service</strong></li>



<li>Test it with a simple request and <strong>advanced usage</strong> thanks to LangChain</li>
</ol>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="173" src="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-1024x173.png" alt="" class="wp-image-29251" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-1024x173.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-300x51.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-768x130.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process-1536x259.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/06/mistral_large_archi_process.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Let’s go for the setup and deployment of your own Mistral Large service!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure you have:</p>



<ul class="wp-block-list">
<li>A <strong><a href="https://console.mistral.ai/on-premise/licenses" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral AI license</a></strong> to access to the <strong>Mistral Large model</strong></li>



<li>An&nbsp;<strong>OVHcloud Public Cloud</strong>&nbsp;account</li>



<li>An&nbsp;<strong>OpenStack user</strong>&nbsp;with the following roles:
<ul class="wp-block-list">
<li>Administrator</li>



<li>AI Training Operator</li>



<li>Object Storage Operator</li>
</ul>
</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it’s time to deploy the Mistral Large model on 4 H100 GPUs!</strong></p>



<h2 class="wp-block-heading">Architecture guide:&nbsp;Mistral Large on OVHcloud infrastructure</h2>



<p>Let’s go for the setup and deployment of the <strong>Mistral Large</strong> model!</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong>✅ Note</strong></p>
<cite><strong>In this example, the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>Mistral Large 25.02</code></mark> is used. Choose the Mistral model under the licence of your choice and repeat the same steps, adapting the model name and version.</strong></cite></blockquote>



<p>⚙️<em>&nbsp;Also consider that all of the following steps can be automated using OVHcloud APIs!</em></p>



<h3 class="wp-block-heading">Step 1 &#8211; Install&nbsp;<code>ovhai</code>&nbsp;CLI</h3>



<p>If the <code><strong>ovhai</strong></code> CLI is not installed, start by setting up your CLI environment.</p>



<pre class="wp-block-code"><code class="">curl https://cli.gra.ai.cloud.ovh.net/install.sh | bash</code></pre>



<p>Secondly, login using your&nbsp;<strong>OpenStack credentials</strong>.</p>



<pre class="wp-block-code"><code class="">ovhai login -u &lt;openstack-username&gt; -p &lt;openstack-password&gt;</code></pre>



<p>Now, it’s time to create your bucket inside OVHcloud Object Storage!</p>



<h3 class="wp-block-heading">Step 2 – Provision Object Storage</h3>



<ol class="wp-block-list">
<li>Go to&nbsp;<strong>Public Cloud &gt; Storage &gt; Object Storage</strong>&nbsp;in the OVHcloud Control Panel.</li>



<li>Create a&nbsp;<strong>datastore</strong>&nbsp;and a new&nbsp;<strong>S3 bucket</strong>&nbsp;(e.g.,&nbsp;<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>s3-mistral-large-model</code>)</mark></strong>.</li>



<li>Register the datastore with the&nbsp;<code>ovhai</code>&nbsp;CLI:</li>
</ol>



<pre class="wp-block-code"><code class="">ovhai datastore add s3 &lt;ALIAS&gt; https://s3.gra.perf.cloud.ovh.net/ gra &lt;my-access-key&gt; &lt;my-secret-key&gt; --store-credentials-locally</code></pre>



<p>💡 <em>Note that, for this use case, we recommend the <strong>High Performance Object Storage</strong> range using <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong>https://s3.gra.perf.cloud.ovh.net/</strong></mark></code> instead of <code>https://s3.gra.io.cloud.ovh.net/</code></em></p>



<h3 class="wp-block-heading">Step 3 &#8211; Access the Mistral AI registry</h3>



<p><em>⚠️ Please note that you must have a <strong>licence for the Mistral Large model </strong>to be able to carry out the following steps.</em></p>



<ul class="wp-block-list">
<li>Go to the Mistral AI platform: <strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://console.mistral.ai/home</mark></strong></li>



<li>Retrieve <strong>credentials</strong> and the <strong>license key</strong> from the Mistral console:<strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"> https://console.mistral.ai/on-premise/licenses</mark></strong></li>



<li>Authenticate to the Mistral AI Docker registry:</li>
</ul>



<pre class="wp-block-code"><code class="">docker login &lt;mistral-ai-registry&gt; --username $DOCKER_USERNAME --password $DOCKER_PASSWORD</code></pre>



<ul class="wp-block-list">
<li>Add the private registry to the config using the <code><strong>ovhai</strong></code> CLI:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai registry add &lt;mistral-ai-registry&gt;</code></pre>



<ul class="wp-block-list">
<li>Check that it is present in the list:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai registry list</code></pre>



<h3 class="wp-block-heading">Step 4 &#8211; Define environment variables</h3>



<p>The next step is to define an<mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"> <strong><code>.env</code></strong></mark> file that will list all the environment variables required to download and deploy the Mistral Large model.</p>



<ul class="wp-block-list">
<li>Create the <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><strong><code>.env</code></strong></mark> file and enter the following information:</li>
</ul>



<pre class="wp-block-code"><code class=""><code>SERVED_MODEL=mistral-large-2502
RECIPES_VERSION=v0.0.76TP_SIZE=4
LICENSE_KEY=&lt;your-mistral-license-key&gt;
DOCKER_IMAGE_INFERENCE_ENGINE=&lt;<span style="background-color: initial; font-family: inherit; font-size: inherit; font-weight: inherit;">mistral-inference-server</span>-docker-image&gt;
DOCKER_IMAGE_MISTRAL_UTILS=<span style="background-color: rgba(248, 248, 242, 0.2); font-family: inherit; font-size: inherit; font-weight: inherit;">&lt;</span><span style="font-family: inherit; font-size: inherit; font-weight: inherit; background-color: initial;">mistral-utils</span><span style="background-color: rgba(248, 248, 242, 0.2); font-family: inherit; font-size: inherit; font-weight: inherit;">-docker-image&gt;</span></code></code></pre>



<ul class="wp-block-list">
<li>Then, create a script to load these environment variables easily. Name it <code><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">load_env.sh</mark></strong></code>:</li>
</ul>



<pre class="wp-block-code"><code class="">#!/bin/bash

# Vérifie si le fichier .env existe
if [ ! -f .env ]; then
  echo "Error: .env not found"
  exit 1
fi

# Exporter toutes les variables du .env
export $(grep -v '^#' .env | xargs)

echo "Environment variables are loaded from .env"</code></pre>



<ul class="wp-block-list">
<li>Now, launch this script:</li>
</ul>



<pre class="wp-block-code"><code class="">source load_env.sh</code></pre>



<p>✅ You have everything you need to start the implementation!</p>



<h3 class="wp-block-heading">Step 5 &#8211; Download Mistral Large model weights</h3>



<p>The aim here is to download the model and its artefacts into the S3 bucket created earlier.</p>



<p>To achieve this, you can launch a download job that will run automatically with AI Training.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p><strong> 💡 Here&#8217;s a tip! </strong></p>
<cite><strong>Note that here you are not using AI Training to train models, but as an easy-to-use Container as a Service solution. With a single command line, you can launch a one-shot download of the Mistral Large model with automatic synchronisation to Object Storage.</strong></cite></blockquote>



<ul class="wp-block-list">
<li>Launch the <strong>AI Training</strong> download job by attaching the object container:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai job run --name DOWNLOAD_MISTRAL_LARGE_123B \
              --cpu 12 \
              --volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW \
              -e RECIPES_VERSION=$RECIPES_VERSION \
              $<span style="background-color: initial; font-family: inherit; font-size: inherit; font-weight: inherit;">DOCKER_IMAGE_MISTRAL_UTILS</span> \
                -- bash -c "cd /app/mistral-rclone &amp;&amp; \ 
                  poetry run python mistral-rclone.py \
                  --license-key $LICENSE_KEY \
                  --download-model $SERVED_MODEL"</code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai job run</code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run a job</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Training</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code>--name DOWNLOAD_MISTRAL_LARGE_123B</code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the job. For example,&nbsp;<code>DOWNLOAD_MISTRAL_LARGE_123B</code>.</p>



<ul class="wp-block-list">
<li><code>--cpu&nbsp;12</code></li>
</ul>



<p>Allocates&nbsp;<strong>12 CPUs</strong>&nbsp;for the job.</p>



<ul class="wp-block-list">
<li><code>--volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW</code></li>
</ul>



<p>This mounts your&nbsp;<strong>OVHcloud Object Storage volume</strong>&nbsp;into the job’s file system:<br>–&nbsp;<code>s3-mistral-large-model@&lt;ALIAS&gt;/</code>: refers to your&nbsp;<strong>S3 bucket volume</strong>&nbsp;from the OVHcloud Object Storage<br>–&nbsp;<code>/opt/ml/model</code>: mounts the volume into the container under&nbsp;<code>/opt/ml/model</code><br>–&nbsp;<code>RW</code>: enables&nbsp;<strong>Read/Write</strong>&nbsp;permissions</p>



<ul class="wp-block-list">
<li><code>-e RECIPES_VERSION=$RECIPES_VERSION</code></li>
</ul>



<p>This passes one of the <strong>environment variables</strong>&nbsp;defined previously.</p>



<ul class="wp-block-list">
<li><code>$DOCKER_IMAGE_MISTRAL_UTILS</code></li>
</ul>



<p>This is the<strong>&nbsp;Mistral Large utils Docker image</strong>&nbsp;you are running inside the job.</p>



<ul class="wp-block-list">
<li><code>-- bash -c "cd /app/mistral-rclone &amp;&amp; \</code><br><code>               poetry run python mistral-rclone.py \</code><br><code>                   --license-key $LICENSE_KEY \</code><br><code>                   --download-model $SERVED_MODEL"</code></li>
</ul>



<p>Refers to the specific command to <strong>launch the model download</strong>.</p>



<p><em>Note that synchronisation with Object Storage will be <strong>automatic at the end of the AI Training job</strong>.</em></p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>⚠️ <strong>WARNING!</strong></p>
<cite><strong>Wait for the job to go to <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">DONE</mark></code> before proceeding to the next step</strong>.</cite></blockquote>



<ul class="wp-block-list">
<li>Check that the various elements are present in the bucket:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai bucket object list s3-mistral-large-model@&lt;ALIAS&gt;</code></pre>



<p>The bucket must be organized and split into 4 different folders:</p>



<ul class="wp-block-list">
<li>grammars</li>



<li>recipes</li>



<li>tokenizers</li>



<li>weights</li>
</ul>



<p>Note that a total of 6 elements must be present.</p>



<p>🚀 It&#8217;s all there? So let&#8217;s move on to the <strong>deployment of the Mistral Large model</strong>!</p>



<h3 class="wp-block-heading">Step 6 &#8211; Deploy Mistral Large service</h3>



<p>To deploy the Mistral Large 123B model using the previously downloaded weights, you will use OVHcloud&#8217;s <strong>AI Deploy </strong>product.</p>



<p>But first you need to create an API key that will allow you to consume the model and query it, in particular using OpenAI compatibility.</p>



<ul class="wp-block-list">
<li>Create an access token:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai token create --role read mistral_large=api_key_reader</code></pre>



<ul class="wp-block-list">
<li>Export this token as an environment variable:</li>
</ul>



<pre class="wp-block-code"><code class="">export MY_OVHAI_MISTRAL_LARGE_TOKEN=&lt;your_ovh_access_token_value&gt;</code></pre>



<ul class="wp-block-list">
<li>Launch the <strong>Mistral Large service</strong> with <strong>AI Deploy </strong>by running the following command:</li>
</ul>



<pre class="wp-block-code"><code class="">ovhai app run --name DEPLOY_MISTRAL_LARGE_123B \
              --gpu 4 \
              --flavor h100-1-gpu \
              --default-http-port 5000 \
              --label mistral_large=api_key_reader \
              -e SERVED_MODEL=$SERVED_MODEL \
              -e RECIPES_VERSION=$RECIPES_VERSION \
              -e TP_SIZE=$TP_SIZE \
              --volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW \
              --volume standalone:/tmp:RW \
              --volume standalone:/workspace:RW \
              $<span style="background-color: initial; font-family: inherit; font-size: inherit; font-weight: inherit;">DOCKER_IMAGE_INFERENCE_ENGINE</span></code></pre>



<p><em>Full command explained:</em></p>



<ul class="wp-block-list">
<li><code>ovhai app run</code></li>
</ul>



<p>This is the core command to&nbsp;<strong>run an app / API</strong>&nbsp;using the&nbsp;<strong>OVHcloud AI Deploy</strong>&nbsp;platform.</p>



<ul class="wp-block-list">
<li><code>--name DEPLOY_MISTRAL_LARGE_123B</code></li>
</ul>



<p>Sets a&nbsp;<strong>custom name</strong>&nbsp;for the app. For example,&nbsp;<code>DEPLOY_MISTRAL_LARGE_123B</code>.</p>



<ul class="wp-block-list">
<li><code>--default-http-port 5000</code></li>
</ul>



<p>Exposes&nbsp;<strong>port 5000</strong>&nbsp;as the default HTTP endpoint.</p>



<ul class="wp-block-list">
<li><code>--gpu 4</code></li>
</ul>



<p>Allocates&nbsp;<strong>4 GPUs</strong>&nbsp;for the app.</p>



<ul class="wp-block-list">
<li><code>--flavor h100-1-gpu</code></li>
</ul>



<p>Chooses&nbsp;<strong>H100 GPUs</strong>&nbsp;for the app.</p>



<ul class="wp-block-list">
<li><code>--volume s3-mistral-large-model@&lt;ALIAS&gt;/:/opt/ml/model:RW</code></li>
</ul>



<p>This mounts your&nbsp;<strong>OVHcloud Object Storage volume</strong>&nbsp;into the app’s file system:<br>–&nbsp;<code>s3-mistral-large-model@&lt;ALIAS&gt;/</code>: refers to your&nbsp;<strong>S3 bucket volume</strong>&nbsp;from the OVHcloud Object Storage<br>–&nbsp;<code>/opt/ml/model</code>: mounts the volume into the container under&nbsp;<code>/opt/ml/model</code><br>–&nbsp;<code>RW</code>: enables&nbsp;<strong>Read/Write</strong>&nbsp;permissions</p>



<ul class="wp-block-list">
<li><code>--label mistral_large=api_key_reader</code></li>
</ul>



<p>Means that access is restricted to your token.</p>



<ul class="wp-block-list">
<li><code>-e SERVED_MODEL=$SERVED_MODEL</code></li>



<li><code>-e RECIPES_VERSION=$RECIPES_VERSION</code></li>



<li><code>-e TP_SIZE=$TP_SIZE</code></li>
</ul>



<p>These are&nbsp;<strong>environment variables</strong>&nbsp;defined previously.</p>



<ul class="wp-block-list">
<li><code>--volume standalone:/tmp:RW</code></li>



<li><code>--volume standalone:/workspace:RW</code></li>
</ul>



<p>Mounts&nbsp;<strong>two persistent storage volumes</strong>:<br>&#8211; <code>/tmp</code>&nbsp;→ Temporary files<br>&#8211; <code>/workspace</code>&nbsp;→ Main working directory</p>



<ul class="wp-block-list">
<li><code>$DOCKER_IMAGE_INFERENCE_ENGINE</code></li>
</ul>



<p>This is the<strong>&nbsp;Mistral Large inference Docker image</strong>&nbsp;you are running inside the app.</p>



<p><em>It may take a few minutes for the resources to be allocated and for the <strong>Docker image</strong> to be pulled.</em></p>



<p>To check the progress and get additional information about the <strong>AI Deploy app</strong>, run the following command:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;ai_deploy_mistral_app_id&gt;</code></pre>



<p>Once in <strong><code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></code></strong> status, the model will be loaded. To check that the load was successful, you can inspect the container logs:</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;ai_deploy_mistral_app_id&gt;</code></pre>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>⚠️ <strong>WARNING!</strong></p>
<cite><strong>To consume the service, you must wait for the app to go into <code><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">RUNNING</mark></code> status, AND for the model to finish loading.</strong></cite></blockquote>



<p>🎉 Is that it? Everything ready? Then it’s time to start playing with the model!</p>



<h3 class="wp-block-heading">Step 7 &#8211; Test the Mistral Large model by sending your first requests</h3>



<ul class="wp-block-list">
<li>Access the API doc via your app URL:</li>
</ul>



<p><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code><strong>https://&lt;ai_deploy_mistral_app_id>.app.gra.ai.cloud.ovh.net/docs</strong></code></mark></p>



<p>To find the information, please refer to <a href="https://console.mistral.ai/on-premise/licenses" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">https://console.mistral.ai/on-premise/licenses</mark></strong></a></p>



<ul class="wp-block-list">
<li>Test with a basic cURL:</li>
</ul>



<pre class="wp-block-code"><code class="">curl -X 'POST' \
'https://&lt;ai_deploy_mistral_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $MY_OVHAI_MISTRAL_LARGE_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "mistral-large-&lt;version&gt;",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant!"
    },
    {
      "role": "user",
      "content": "What is the capital of France?"     
    }
  ]
}'</code></pre>



<p><strong>⚠️ Note that you also have to replace <mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code>&lt;version&gt;</code></mark> in the model name with the one you are using: </strong><br><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color"><code><strong>"model": "mistral-large-&lt;version&gt;"</strong></code></mark></p>



<p>To take the implementation a step further and take advantage of all the features of this endpoint, you can also integrate it with <strong>LangChain</strong> thanks to its full OpenAI compatibility.</p>



<ul class="wp-block-list">
<li>LangChain integration:</li>
</ul>



<pre class="wp-block-code"><code class="">import time
import os 
from langchain.chat_models import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

def chat_completion_basic(new_message: str):

  model = ChatOpenAI(model_name="mistral-large-&lt;version&gt;",
                        openai_api_key=$MY_OVHAI_MISTRAL_LARGE_TOKEN,
                        openai_api_base='https://&lt;ai_deploy_mistral_app_id&gt;.app.gra.ai.cloud.ovh.net/v1',
                       )

  prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant!"),
    ("human", "{question}"),
  ])

  chain = prompt | model

  print("🤖: ")
  for r in chain.stream({"question", new_message}):
    print(r.content, end="", flush=True)
    time.sleep(0.150)

chat_completion_basic("What is the capital of France?)</code></pre>



<p>🥹 Congratulations! You have successfully completed the deployment!</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>You can now consume your <strong>Mistral Large 123B</strong> in a secure environment!</p>



<p>The result of your implementation? The deployment of a sovereign, scalable, production-quality 123B LLM, powered by <strong>OVHcloud AI Deploy</strong>.</p>



<p>➡️ <strong>To go further? </strong></p>



<ul class="wp-block-list">
<li>Update your model in a single command line and without interruption following this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-update-custom-docker-image?id=kb_article_view&amp;sysparm_article=KB0057968" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a></li>



<li>Go to the next replica in the event of a heavy load to ensure high availability using this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-apps-deployments?id=kb_article_view&amp;sysparm_article=KB0047997" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">method</a></li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Freference-architecture-deploy-mistral-large-model-in-sovereign-environment-ovhcloud%2F&amp;action_name=Reference%20Architecture%3A%C2%A0deploying%20the%20Mistral%20Large%20123B%20model%20in%20a%20sovereign%20environment%20with%20OVHcloud&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Deep Dive into DeepSeek-R1 &#8211; Part 1</title>
		<link>https://blog.ovhcloud.com/deep-dive-into-deepseek-r1-part-1/</link>
		
		<dc:creator><![CDATA[Fabien Ric]]></dc:creator>
		<pubDate>Thu, 06 Mar 2025 09:56:20 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[DeepSeek]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28199</guid>

					<description><![CDATA[Introduction A few weeks ago, the release of the open-source large language model DeepSeek-R1 has taken the AI world by storm. The Chinese research team claimed their new reasoning model was on par with OpenAI&#8217;s flagship model o1, open-sourced the model and gave details about the work behind it. In this blog post series, we [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-dive-into-deepseek-r1-part-1%2F&amp;action_name=Deep%20Dive%20into%20DeepSeek-R1%20%26%238211%3B%20Part%201&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="512" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1024x512.png" alt="A cute whale with a baseball cap, using a computer, representing DeepSeek." class="wp-image-28353" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1024x512.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-300x150.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-768x384.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16-1536x768.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-16.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Introduction</h2>



<p>A few weeks ago, the release of the open-source large language model DeepSeek-R1 took the AI world by storm. The Chinese research team claimed their new reasoning model was on par with OpenAI&#8217;s flagship model o1, open-sourced the model and gave details about the work behind it.</p>



<p>In this blog post series, we will dive into the DeepSeek-R1 model family and see how you can run it on OVHcloud to build a simple chatbot that handles reasoning.</p>



<p>The &#8220;R&#8221; in DeepSeek-R1 stands for &#8220;Reasoning&#8221;, so let&#8217;s start by defining what a reasoning model is.</p>



<h2 class="wp-block-heading">What are reasoning models?</h2>



<p>Reasoning models are large language models (LLMs) capable of reflecting on a problem before generating an answer. Traditionally, LLMs have been improved by spending more compute at training time (more data, more parameters, more training iterations): this is <strong>training-time compute</strong>. Reasoning models, however, differ from standard LLMs in the way they use <strong>test-time compute</strong>: during inference, they spend more time and resources to generate and refine a better answer.</p>



<p>Reasoning models excel at tasks that require understanding and working through a problem step-by-step, such as mathematics, riddles, puzzles, coding, planning tasks and agentic workflows. They may be counterproductive for use cases that don&#8217;t require reasoning capabilities, such as knowledge facts (for example, <em>who discovered penicillin)</em>.</p>



<p>In a classroom, a reasoning model would be the student who takes time to understand the question, splits the problem into manageable steps and details the resolution process, instead of rushing to write the answer.</p>



<p>Here is a comparison between the outputs of a standard LLM and a reasoning LLM, on an example prompt:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69e9aa4ff2041&quot;}" data-wp-interactive="core/image" data-wp-key="69e9aa4ff2041" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1029" height="492" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14.png" alt="A diagram showing the differences between standard LLM and reasoning LLM outputs for a given prompt." class="wp-image-28318" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14.png 1029w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-300x143.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-1024x490.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-14-768x367.png 768w" sizes="auto, (max-width: 1029px) 100vw, 1029px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>The reasoning model has generated more tokens, showing how it plans to solve the problem, before the actual answer. You can see it generates reasoning content into <code>&lt;think&gt;...&lt;/think&gt;</code> tags, in the case of DeepSeek-R1.</p>
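<p>If you consume a reasoning model programmatically, you will usually want to separate the reasoning trace from the final answer. Here is a minimal Python sketch (our own illustration, not code from DeepSeek) that splits a completion on the <code>&lt;think&gt;...&lt;/think&gt;</code> tags:</p>



<pre title="Split reasoning from the answer (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers">import re

def split_reasoning(completion):
    """Split a DeepSeek-R1-style completion into (reasoning, answer)."""
    match = re.search(r"&lt;think&gt;(.*?)&lt;/think&gt;", completion, flags=re.DOTALL)
    if match is None:
        # No reasoning block: the whole completion is the answer
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("&lt;think&gt;2 + 2 = 4&lt;/think&gt;The answer is 4.")
print(reasoning)  # 2 + 2 = 4
print(answer)     # The answer is 4.</code></pre>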



<p>A standard LLM can also show reasoning abilities, that are often more visible when using a technique called <a href="https://arxiv.org/abs/2201.11903" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Chain-of-Thought prompting (CoT)</a>, by adding phrases such as &#8220;let&#8217;s think step-by-step&#8221; in the prompt.</p>



<p>However, a reasoning LLM has been trained to behave this way. Its reasoning skill is internalized, so it doesn&#8217;t require specific prompting techniques to trigger the chain of thoughts process.</p>



<p>It&#8217;s important to note that DeepSeek-R1 is not the first reasoning model; OpenAI led the way by releasing their o1 model in September 2024.</p>



<p>The two main reasons why DeepSeek-R1 made the headlines are its open-source nature, and the paper released by the research team, which gives many details on how they trained the model, with valuable insights for the open-source community to create reasoning models. In particular, the key highlight of the paper is the observation that reasoning behavior can emerge through Reinforcement Learning (RL) alone, without supervised fine-tuning.</p>



<h2 class="wp-block-heading">The DeepSeek-R1 model family</h2>



<p>You may have heard about DeepSeek-R1 but it&#8217;s not the only model of the DeepSeek family: DeepSeek-V3, DeepSeek-R1-Zero, and distilled models are also available. So what are the differences between those models?</p>



<p>First, let&#8217;s go through some definitions and an overview of how language models are trained.</p>



<h3 class="wp-block-heading">Language model training overview</h3>



<p>The large language models available in apps and playgrounds are usually trained in 3 steps:</p>



<ol class="wp-block-list">
<li>A <strong>base model</strong> is trained on an unsupervised language modeling task (for instance, next token prediction) with a dataset of trillions of tokens (also called <em>pre-training</em>),</li>



<li>An <strong>instruct model </strong>is trained from the base model, by fine-tuning it on a massive dataset of instructions, conversations, questions and answers, to improve the performance of the model with the prompts frequently encountered in a chat,</li>



<li>The <strong>final model</strong> is the instruct model trained to better handle human preferences, avoid the generation of harmful content, etc. with techniques such as RLHF (reinforcement learning from human feedback) and DPO (direct preference optimization).</li>
</ol>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69e9aa4ff26f2&quot;}" data-wp-interactive="core/image" data-wp-key="69e9aa4ff26f2" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1459" height="239" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image.png" alt="A diagram showing the 3 training steps of a LLM." class="wp-image-28268" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image.png 1459w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-300x49.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-1024x168.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-768x126.png 768w" sizes="auto, (max-width: 1459px) 100vw, 1459px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>






<h3 class="wp-block-heading">DeepSeek-V3 training</h3>



<p>According to the <a href="https://arxiv.org/pdf/2412.19437" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">technical report provided by DeepSeek</a>, DeepSeek-V3 is a mixture-of-experts (MoE) language model trained with the same kind of process, which is described in the image below:</p>



<ul class="wp-block-list">
<li><strong>DeepSeek-V3-Base</strong> is trained with 14.8 trillion tokens,</li>



<li>A dataset of 1.5 million instructions examples is used to fine-tune the base model,</li>



<li>This instruct model goes through reinforcement learning with several reward models. The final model is <strong>DeepSeek-V3</strong>.</li>
</ul>



<figure data-wp-context="{&quot;imageId&quot;:&quot;69e9aa4ff2d9c&quot;}" data-wp-interactive="core/image" data-wp-key="69e9aa4ff2d9c" class="wp-block-image aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="1453" height="242" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8.png" alt="A diagram showing the 3 training steps of DeepSeek-V3." class="wp-image-28288" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8.png 1453w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-300x50.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-1024x171.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-8-768x128.png 768w" sizes="auto, (max-width: 1453px) 100vw, 1453px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			aria-label="Enlarge"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.imageButtonRight"
			data-wp-style--top="state.imageButtonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p>For the reinforcement learning step, DeepSeek uses their algorithm called <strong>GRPO</strong> (<a href="https://arxiv.org/pdf/2402.03300" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">group relative policy optimization</a>), which uses several reward models to assess the quality of the content generated by the model. The score given by each reward model is combined into a final score, used to update the model so that it maximizes its global score the next time.</p>
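<p>To make the &#8220;group relative&#8221; part more concrete, here is a tiny sketch of the idea behind GRPO advantages: several completions are sampled for the same prompt, each gets a reward, and each completion&#8217;s advantage is its reward normalized against the mean and standard deviation of its group. This is a simplified illustration of the published algorithm, not DeepSeek&#8217;s actual training code.</p>



<pre title="Group-relative advantages (sketch)" class="wp-block-code"><code lang="python" class="language-python line-numbers">import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against the group mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Rewards of 4 completions sampled for the same prompt
print(group_relative_advantages([0.2, 0.9, 0.4, 0.5]))</code></pre>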



<h3 class="wp-block-heading">DeepSeek-R1 model series training</h3>



<p><strong>DeepSeek-R1</strong> models are built with a different training pipeline, using the base model of DeepSeek-V3. The diagram below shows the main steps of the process designed by DeepSeek to create several reasoning models mentioned in their <a href="https://arxiv.org/pdf/2501.12948" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">technical report</a>:</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="1262" height="1323" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12.png" alt="A diagram showing the training process of DeepSeek-R1, DeepSeek-R1-Zero and DeepSeek-Distill models." class="wp-image-28301" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12.png 1262w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-286x300.png 286w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-977x1024.png 977w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-12-768x805.png 768w" sizes="auto, (max-width: 1262px) 100vw, 1262px" /></figure>



<p>Let&#8217;s walk through it step-by-step (no pun intended):</p>



<p>1. The main breakthrough described in DeepSeek&#8217;s paper: they managed to train the DeepSeek-V3-Base 671B model to learn the reasoning capability with reinforcement learning only, which, unlike supervised fine-tuning, doesn&#8217;t require labeled data. They use the same GRPO algorithm as before, with two rewards. The first one scores the accuracy of the generated content, using &#8220;rule-based&#8221; experts instead of full reward models, which themselves must be trained and require significant resources. For example, to assess whether the model generated correct Python code, one expert could compile the generated code and give a score based on the number of errors, while another could generate test cases and check that the code passes them. The second reward concerns the format of the model&#8217;s responses, which must enclose the reasoning content in <code>&lt;think&gt;...&lt;/think&gt;</code> tags. The resulting model is <strong>DeepSeek-R1-Zero.</strong> However, it has limitations that make it unsuitable for direct use, such as language mixing and poor readability.</p>
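


<p>As an illustration of these two rewards, here is a toy Python sketch of what they could look like (the function names and scoring scheme are hypothetical, not DeepSeek&#8217;s published code): one expert runs the generated code against test cases, another checks the <code>&lt;think&gt;</code> format:</p>



<pre class="wp-block-code"><code lang="python" class="language-python">import subprocess
import tempfile

def accuracy_reward(generated_code: str, test_cases: list) -&gt; float:
    """Toy rule-based expert: run the generated Python code on each test
    case and score the fraction of expected outputs it reproduces."""
    passed = 0
    for stdin_data, expected_output in test_cases:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code)
            script_path = f.name
        try:
            result = subprocess.run(
                ["python3", script_path], input=stdin_data,
                capture_output=True, text=True, timeout=5,
            )
            if result.stdout.strip() == expected_output.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a hanging program earns no reward
    return passed / len(test_cases)

def format_reward(completion: str) -&gt; float:
    """Reward completions that enclose their reasoning in &lt;think&gt; tags."""
    return 1.0 if "&lt;think&gt;" in completion and "&lt;/think&gt;" in completion else 0.0</code></pre>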



<p>2. To overcome these limitations, DeepSeek uses DeepSeek-R1-Zero to create a cold-start reasoning dataset, augmented with other data from sources not explicitly mentioned. DeepSeek-V3-Base is fine-tuned on this cold-start data before a new round of reinforcement learning is applied.</p>



<p>3. They use the same RL approach to get a new reasoning model that generates higher-quality output. Using this model, they build a reasoning dataset over 100x bigger, growing from 5k to 600k samples, using DeepSeek-V3 as a quality judge. This dataset is then complemented with 200k samples generated with DeepSeek-V3 on non-reasoning tasks.</p>



<p>4. A second stage of supervised fine-tuning is performed with the 800k-sample dataset (600k reasoning + 200k non-reasoning) built earlier.</p>



<p>5. The model is then aligned with human preferences through a final round of reinforcement learning, using a dedicated human-preference reward. The resulting model is <strong>DeepSeek-R1</strong>.</p>



<p>6. Finally, DeepSeek experimented with fine-tuning much smaller models than DeepSeek-V3 (LLaMa 3.3 70B, Qwen 2.5 32B&#8230;) with the dataset built at step 3. In the paper, they call this process <strong>distillation</strong>. However, it must not be confused with the <em>knowledge distillation</em> technique frequently used in deep learning, where a student model learns from the probability distribution of a teacher model. Here, the term &#8220;distillation&#8221; refers to the fact that the reasoning skill is &#8220;distilled&#8221; into the base model, but it&#8217;s plain old supervised fine-tuning. This is how the <strong>DeepSeek-R1-Distill </strong>model series is trained. The quality of the dataset enables the resulting distilled models to beat much larger models on reasoning tasks, as shown in the benchmark below:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" width="770" height="312" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13.png" alt="A screen capture of benchmark data table." class="wp-image-28310" style="width:750px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13.png 770w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13-300x122.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/image-13-768x311.png 768w" sizes="auto, (max-width: 770px) 100vw, 770px" /><figcaption class="wp-element-caption"><em>Benchmark of distilled models on several reasoning tasks (source: DeepSeek R1 technical paper)</em></figcaption></figure>
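


<p>The difference between the two meanings of &#8220;distillation&#8221; is easy to see in code. Below is a minimal, hypothetical PyTorch sketch (our own illustration): classic knowledge distillation matches the teacher&#8217;s full output distribution, while the &#8220;distillation&#8221; behind DeepSeek-R1-Distill is plain cross-entropy on the teacher-generated tokens:</p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torch
import torch.nn.functional as F

def sft_loss(student_logits: torch.Tensor, target_ids: torch.Tensor) -&gt; torch.Tensor:
    """What DeepSeek calls distillation here: plain supervised fine-tuning,
    i.e. cross-entropy against the hard tokens of the teacher-generated text."""
    return F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                           target_ids.view(-1))

def knowledge_distillation_loss(student_logits: torch.Tensor,
                                teacher_logits: torch.Tensor,
                                temperature: float = 2.0) -&gt; torch.Tensor:
    """Classic knowledge distillation: match the teacher's full probability
    distribution over the vocabulary, softened by a temperature."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2</code></pre>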



<h3 class="wp-block-heading">Recap</h3>



<p>The table below summarizes the differences between the models of the DeepSeek-R1 series:</p>



<figure class="wp-block-table"><table><tbody><tr><td>Model</td><td>Description</td></tr><tr><td>DeepSeek-R1-Zero</td><td>Intermediate 671B reasoning model trained from DeepSeek-V3 exclusively with reinforcement learning, and used to bootstrap DeepSeek-R1 training.</td></tr><tr><td>DeepSeek-R1</td><td>671B reasoning model trained from DeepSeek-V3.</td></tr><tr><td>DeepSeek-R1-Distill</td><td>Smaller models fine-tuned for reasoning with a dataset generated by an intermediate version of DeepSeek-R1.</td></tr></tbody></table></figure>



<h2 class="wp-block-heading">Run DeepSeek-R1 on OVHcloud</h2>



<p>Now that we&#8217;ve seen the differences between all DeepSeek models, let&#8217;s try to use them!</p>



<h3 class="wp-block-heading">AI Endpoints</h3>



<p>The fastest way to test DeepSeek-R1 is to use OVHcloud<strong> AI Endpoints</strong>.</p>



<p><strong>DeepSeek-R1-Distill-Llama-70B</strong> is already available, ready to use and optimized for inference speed. Check it out here: <a href="https://endpoints.ai.cloud.ovh.net/models/a011515c-0042-41b2-9a00-ec8b5d34462d" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://endpoints.ai.cloud.ovh.net/models/a011515c-0042-41b2-9a00-ec8b5d34462d</a></p>



<p>AI Endpoints makes it easy to integrate AI into your applications with a simple API call, without the need for deep AI expertise or infrastructure management. And while it’s in beta, it’s <strong>free</strong>!</p>



<p>Here is an example cURL command to use DeepSeek-R1 Distill Llama 70B on the OpenAI-compatible endpoint provided by OVHcloud AI Endpoints:</p>



<pre class="wp-block-code"><code class="">curl -X 'POST' \
  'https://deepseek-r1-distill-llama-70b.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "max_tokens": 4096,
  "messages": [
    {
      "content": "How can I calculate an approximation of Pi in Python?",
      "role": "user"
    }
  ],
  "model": null,
  "seed": null,
  "stream": false,
  "temperature": 0.7,
  "top_p": 1
}'</code></pre>



<p>In the output, we can see the thinking process followed by the answer; both have been truncated here for clarity.</p>



<pre class="wp-block-code"><code class="">{
    "id": "chatcmpl-8c21b2e3fac44d43b63c06fa25e58091",
    "object": "chat.completion",
    "created": 1741199564,
    "model": "DeepSeek-R1-Distill-Llama-70B",
    "choices":
    [
        {
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "&lt;think&gt;\nOkay, the user is asking how to approximate Pi using Python. I need to think about different methods they can use. Let's see, there are a few common approaches. \n\nFirst, there's the Monte Carlo method. ... Let me structure the response with each method as a separate section, explaining what it is, how it works, and providing the code. Then, the user can pick which one they prefer based on their situation.\n&lt;/think&gt;\n\nThere are several ways to approximate the value of Pi (π) using Python. Below are a few methods:\n\n### 1. Using the Monte Carlo Method..."
            },
            "finish_reason": "stop",
            "logprobs": null
        }
    ],
    "usage":
    {
        "prompt_tokens": 14,
        "completion_tokens": 1377,
        "total_tokens": 1391
    }
}</code></pre>
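


<p>Since the endpoint is OpenAI-compatible, you can also call it from Python with the official <code>openai</code> client. Here is a minimal sketch (the base URL comes from the cURL example above; the API key value is a placeholder, to be replaced with your AI Endpoints token if you have one):</p>



<pre class="wp-block-code"><code lang="python" class="language-python">from openai import OpenAI

client = OpenAI(
    base_url="https://deepseek-r1-distill-llama-70b.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1",
    api_key="placeholder",  # replace with your AI Endpoints token if needed
)

response = client.chat.completions.create(
    model="DeepSeek-R1-Distill-Llama-70B",
    messages=[{"role": "user", "content": "How can I calculate an approximation of Pi in Python?"}],
    max_tokens=4096,
    temperature=0.7,
)

# Split the reasoning (inside &lt;think&gt;...&lt;/think&gt; tags) from the final answer.
full_text = response.choices[0].message.content
reasoning, _, answer = full_text.partition("&lt;/think&gt;")
print(answer.strip())</code></pre>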



<p>Stéphane Philippart, Developer Relations Advocate at OVHcloud, has written a blog post covering everything you need to know to get up to speed with AI Endpoints and run this model: <a href="https://blog.ovhcloud.com/release-of-deepseek-r1-on-ovhcloud-ai-endpoints/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Release of DeepSeek-R1 on OVHcloud AI Endpoints</a></p>



<h3 class="wp-block-heading">AI Deploy</h3>



<p>What if you want to run another version of DeepSeek-R1, such as the Qwen 7B distilled version?</p>



<p>You can use another OVHcloud AI product, <strong>AI Deploy</strong>, to create your own serving endpoint, with <a href="https://docs.vllm.ai/en/stable/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">vLLM</a> as the inference engine. It is open-source, fast and well-maintained, ensuring maximum compatibility with even the most recent AI models.</p>



<p>Eléa Petton, Solution Architect at OVHcloud, has written a blog post explaining in detail how to serve an open-source model with vLLM on AI Deploy. Just replace the Mistral Small model with the DeepSeek distilled version you want to use (e.g. <strong>deepseek-ai/DeepSeek-R1-Distill-Qwen-7B</strong>) and adapt the number of L40S cards needed (1 is enough for the 7B version): <a href="https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Mistral Small 24B served with vLLM and AI Deploy – a single command to deploy an LLM (Part 1)</a></p>



<h3 class="wp-block-heading">Next up, creating a reasoning chatbot with DeepSeek-R1</h3>



<p>In part 2 of this blog post series, we will use a DeepSeek-R1-Distill model to create a chatbot that will handle reasoning gracefully, by showing the thinking process of the model.</p>



<p>We will develop our chatbot with OVHcloud AI Endpoints and the Python library <a href="https://www.gradio.app/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Gradio</a>, which makes it quick to create simple chat interfaces.</p>
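


<p>As a small taste of part 2, here is a minimal, hypothetical Gradio sketch (the echo function is a placeholder; the real chatbot will call the DeepSeek-R1 endpoint and surface the reasoning):</p>



<pre class="wp-block-code"><code lang="python" class="language-python">import gradio as gr

def chat_fn(message, history):
    # Placeholder logic: in part 2, this function will call the
    # DeepSeek-R1 endpoint and display the &lt;think&gt;...&lt;/think&gt; content.
    return f"(echo) {message}"

gr.ChatInterface(chat_fn, title="Reasoning chatbot (preview)").launch()</code></pre>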



<p>Here is a screenshot of the finalized chatbot we will build:</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="723" height="1173" src="https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot.png" alt="A screenshot of a chatbot application developed with DeepSeek-R1 and Gradio in Python." class="wp-image-28328" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot.png 723w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot-185x300.png 185w, https://blog.ovhcloud.com/wp-content/uploads/2025/03/chatbot-631x1024.png 631w" sizes="auto, (max-width: 723px) 100vw, 723px" /></figure>



<p>Stay tuned for the next article in this DeepSeek-R1 series. In the meantime, try out DeepSeek-R1 on AI Endpoints and AI Deploy and let us know what you &lt;think&gt;!</p>



<h3 class="wp-block-heading">Resources</h3>



<p>If you want to learn more about DeepSeek-R1 and the topics we covered in this blog post, such as test-time compute, GRPO, reinforcement learning and reasoning models, we suggest having a look at these resources:</p>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/pdf/2501.12948" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">DeepSeek-R1 technical report</a>, by the DeepSeek team</li>



<li><a href="https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">The Illustrated DeepSeek-R1</a>, by Jay Alamar</li>



<li><a href="https://magazine.sebastianraschka.com/p/understanding-reasoning-llms" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Understanding Reasoning LLMs</a>, by Sebastian Raschka</li>



<li><a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-reasoning-llms" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">A Visual Guide to Reasoning LLMs</a>, by Maarten Grootendorst</li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-dive-into-deepseek-r1-part-1%2F&amp;action_name=Deep%20Dive%20into%20DeepSeek-R1%20%26%238211%3B%20Part%201&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Mistral Small 24B served with vLLM and AI Deploy &#8211; a single command to deploy an LLM (Part 1)</title>
		<link>https://blog.ovhcloud.com/mistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Mon, 24 Feb 2025 10:08:37 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[LLM]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[OVHcloud]]></category>
		<category><![CDATA[Public Cloud]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=28212</guid>

					<description><![CDATA[You are not dreaming! You can deploy an open-source LLM in a single command line. Deploying advanced language models can be a challenge! But this sometimes arduous task is becoming increasingly accessible, enabling developers to integrate sophisticated AI capabilities into their applications. In this guide, we will walk through deploying the Mistral-Small-24B-Instruct-2501 model using vLLM [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm%2F&amp;action_name=Mistral%20Small%2024B%20served%20with%20vLLM%20and%20AI%20Deploy%20%26%238211%3B%20a%20single%20command%20to%20deploy%20an%20LLM%20%28Part%201%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><strong><em>You are not dreaming! You can deploy an open-source LLM in a single command line</em>.</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="724" src="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1024x724.png" alt="Rocket in MistralAI colors in a data center with a French rooster showing rapid LLM deployment" class="wp-image-28219" srcset="https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1024x724.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-300x212.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-768x543.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy-1536x1086.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2025/02/image_blog_post_mistral_small_ai_deploy.png 2000w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Deploying advanced language models can be a challenge! But this sometimes arduous task is becoming increasingly accessible, enabling developers to integrate sophisticated AI capabilities into their applications.</p>



<p>In this guide, we will walk through deploying the <strong><a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Instruct-2501</a></strong> model using <strong>vLLM</strong> on OVHcloud&#8217;s <a href="https://www.ovhcloud.com/fr/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy platform</a>. This combination offers a powerful solution for efficient and scalable AI model serving.</p>



<p>Deploying a model is great, but doing it quickly is even better!</p>



<p>🤯 <strong>What if a single command line was enough?</strong> That&#8217;s the challenge we&#8217;re tackling today!</p>



<h2 class="wp-block-heading">Context</h2>



<p>Before deployment, let’s take a closer look at our key technologies!</p>



<h3 class="wp-block-heading">Mistral Small</h3>



<p><code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> is a 24-billion-parameter instruction-fine-tuned model, renowned for its compact size and for performance comparable to that of larger models.</p>



<p>This model, from <a href="https://mistral.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">MistralAI</a>, is an instruction-fine-tuned version of the base model:&nbsp;<a href="https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Base-2501</a>.</p>



<p>To serve this model efficiently, we will utilize vLLM, an open-source library for <strong>LLM inference</strong>.</p>



<h3 class="wp-block-heading">vLLM</h3>



<p><a href="https://docs.vllm.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM</a> (<strong>Virtual LLM</strong>) is a highly optimized service engine designed to efficiently run large language models. It takes advantage of several key optimizations, such as:</p>



<ul class="wp-block-list">
<li><strong>PagedAttention:</strong> an attention mechanism that reduces memory fragmentation and enables more efficient use of GPU memory</li>



<li><strong>Continuous Batching:</strong> vLLM dynamically adjusts batch sizes in real time, ensuring that the GPU is always used efficiently, even with multiple simultaneous requests</li>



<li><strong>Tensor parallelism:</strong> enables model inference across multiple GPUs to boost performance</li>



<li><strong>Optimized kernel implementations:</strong> vLLM uses custom CUDA kernels for faster execution, reducing latency compared to traditional inference frameworks</li>
</ul>



<p>These features make vLLM one of the best choices for large models such as Mistral Small 24B, enabling low-latency, high-throughput inference on the latest GPUs.</p>
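


<p>As a side note, vLLM can also be used as a plain Python library for offline batch inference. Here is a minimal sketch with the vLLM Python API, reusing the same model and options as the deployment below (it assumes 2 GPUs and access to the gated model):</p>



<pre class="wp-block-code"><code lang="python" class="language-python">from vllm import LLM, SamplingParams

# Load the model across 2 GPUs in half precision, with the
# Mistral-specific tokenizer and weight formats.
llm = LLM(
    model="mistralai/Mistral-Small-24B-Instruct-2501",
    tensor_parallel_size=2,
    dtype="half",
    tokenizer_mode="mistral",
    load_format="mistral",
    config_format="mistral",
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Give me the name of OVHcloud's founder."], params)
print(outputs[0].outputs[0].text)</code></pre>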



<p>By deploying on OVHcloud&#8217;s AI Deploy platform, you can deploy this model in a single command line.</p>



<h3 class="wp-block-heading">AI Deploy </h3>



<p>OVHcloud AI Deploy is a<strong> Container as a Service</strong> (CaaS) platform designed to help you deploy, manage and scale AI models. It provides a solution that allows you to optimally deploy your applications / APIs based on Machine Learning (ML), Deep Learning (DL) or LLMs.</p>



<p>The key benefits are:</p>



<ul class="wp-block-list">
<li><strong>Easy to use:</strong> bring your own custom Docker image and deploy it with a single command line or a few clicks</li>



<li><strong>High-performance computing:</strong> a complete range of GPUs available (H100, A100, V100S, L40S and L4)</li>



<li><strong>Scalability and flexibility:</strong> supports automatic scaling, allowing your model to effectively handle fluctuating workloads</li>



<li><strong>Cost-efficient:</strong> billing per minute, no surcharges</li>
</ul>



<p>✅ To go further, some prerequisites must be checked!</p>



<h2 class="wp-block-heading">Prerequisites</h2>



<p>Before you begin, ensure that you have:</p>



<ul class="wp-block-list">
<li><strong>OVHcloud account</strong>: access to the&nbsp;<a href="https://www.ovh.com/auth/?action=gotomanager&amp;from=https://www.ovh.co.uk/&amp;ovhSubsidiary=GB" data-wpel-link="exclude">OVHcloud Control Panel</a></li>



<li><strong>ovhai CLI available:</strong> install the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ovhai CLI</a></li>



<li><strong>AI Deploy access</strong>: ensure you have a <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">user for AI Deploy</a></li>



<li><strong>Hugging Face access</strong>: create a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face account</a> and generate an <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">access token</a></li>



<li><strong>Gated model authorization</strong>: be sure you have been granted access to <a href="https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Mistral-Small-24B-Instruct-2501</a> model</li>
</ul>



<p><strong>🚀 Having all the ingredients for our recipe, it&#8217;s time to deploy!</strong></p>



<h2 class="wp-block-heading">Deployment of the Mistral Small 24B LLM</h2>



<p>Let&#8217;s go for the deployment of the <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> model!</p>



<h3 class="wp-block-heading">Manage access tokens</h3>



<p>Export your <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face token</a>.</p>



<pre class="wp-block-code"><code class="">export MY_HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx</code></pre>



<p><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-app-token?id=kb_article_view&amp;sysparm_article=KB0035280" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Create a token</a> to access your AI Deploy app once it will be deployed.</p>



<pre class="wp-block-code"><code class="">ovhai token create --role operator ai_deploy_token=my_operator_token</code></pre>



<p>This returns the following output:</p>



<pre class="wp-block-code"><code class="">Id:         47292486-fb98-4a5b-8451-600895597a2b
Created At: 20-02-25 11:53:05
Updated At: 20-02-25 11:53:05
Spec:
  Name:           ai_deploy_token=my_operator_token
  Role:           AiTrainingOperator
  Label Selector: 
Status:
  Value:   XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  Version: 1</code></pre>



<p>You can now store and export your access token:</p>



<pre class="wp-block-code"><code class="">export MY_OVHAI_ACCESS_TOKEN=<span style="background-color: initial; font-family: inherit; font-size: inherit; font-weight: inherit;">XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</span></code></pre>



<h3 class="wp-block-heading">Launch Mistral Small LLM with AI Deploy</h3>



<p>You are ready to start<strong> Mistral-Small-24B</strong> using vLLM and AI Deploy:</p>



<pre class="wp-block-code"><code class="">ovhai app run --name vllm-mistral-small \
              --default-http-port 8000 \
              --label ai_deploy_token=my_operator_token \
              --gpu 2 \
              --flavor l40s-1-gpu \
              -e OUTLINES_CACHE_DIR=/tmp/.outlines \
              -e HF_TOKEN=$MY_HF_TOKEN \
              -e HF_HOME=/hub \
              -e HF_DATASETS_TRUST_REMOTE_CODE=1 \
              -e HF_HUB_ENABLE_HF_TRANSFER=0 \
              -v standalone:/hub:rw \
              -v standalone:/workspace:rw \
              vllm/vllm-openai:v0.8.2 \
              -- bash -c "python3 -m vllm.entrypoints.openai.api_server \
                        --model mistralai/Mistral-Small-24B-Instruct-2501 \
                        --tensor-parallel-size 2 \
                        --tokenizer_mode mistral \
                        --load_format mistral \
                        --config_format mistral \
                        --dtype half"</code></pre>



<p><strong>Let&#8217;s break down the different parameters of this command.</strong></p>



<h5 class="wp-block-heading">1. Start your AI Deploy app</h5>



<p>Launch a new app using the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ovhai CLI</a> and give it a name.</p>



<p><code><strong>ovhai app run --name vllm-mistral-small</strong></code></p>



<h5 class="wp-block-heading">2. Define access</h5>



<p>Define the HTTP API port and restrict access to your token.</p>



<p><strong><code>--default-http-port 8000</code><br><code>--label ai_deploy_token=my_operator_token</code></strong></p>



<h5 class="wp-block-heading">3. Configure GPU resources</h5>



<p>Specify the hardware type (<code><strong>l40s-1-gpu</strong></code>), which refers to an <strong>NVIDIA L40S GPU</strong>, and the number of GPUs (<code><strong>2</strong></code>).</p>



<p><code><strong>--gpu 2<br>--flavor l40s-1-gpu</strong></code></p>



<p><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">⚠️WARNING!</mark></strong> For this model, two L40S are sufficient, but if you want to deploy another model, you will need to check which GPUs it requires. Note that you can also access A100 and H100 GPUs for your larger models.</p>



<h5 class="wp-block-heading">4. Set up environment variables</h5>



<p>Configure caching for the <strong>Outlines library</strong> (used for efficient text generation):</p>



<p><code><strong>-e OUTLINES_CACHE_DIR=/tmp/.outlines</strong></code></p>



<p>Pass the <strong>Hugging Face token</strong> (<code>$MY_HF_TOKEN</code>) for model authentication and download:</p>



<p><code><strong>-e HF_TOKEN=$MY_HF_TOKEN</strong></code></p>



<p>Set the <strong>Hugging Face cache directory</strong> to <code>/hub</code> (where models will be stored):</p>



<p><code><strong>-e HF_HOME=/hub</strong></code></p>



<p>Allow execution of <strong>custom remote code</strong> from Hugging Face datasets (required for some model behaviors):</p>



<p><code><strong>-e HF_DATASETS_TRUST_REMOTE_CODE=1</strong></code></p>



<p>Disable <strong>Hugging Face Hub transfer acceleration</strong> (to use standard model downloading):</p>



<p><code><strong>-e HF_HUB_ENABLE_HF_TRANSFER=0</strong></code></p>



<h5 class="wp-block-heading">5. Mount persistent volumes</h5>



<p>Mounts <strong>two persistent storage volumes</strong>:</p>



<ul class="wp-block-list">
<li><code>/hub</code> → Stores Hugging Face model files</li>



<li><code>/workspace</code> → Main working directory</li>
</ul>



<p>The <code>rw</code> flag means <strong>read-write access</strong>.</p>



<p><code><strong>-v standalone:/hub:rw<br>-v standalone:/workspace:rw</strong></code></p>



<h5 class="wp-block-heading">6. Choose the target Docker image</h5>



<p>Uses the <strong><code>vllm/vllm-openai:v0.8.2</code></strong> Docker image (a pre-configured vLLM OpenAI API server).</p>



<p><strong><code>vllm/vllm-openai:v0.8.2</code></strong></p>



<h5 class="wp-block-heading">7. Running the model inside the container</h5>



<p>Runs a<strong> bash shell</strong> inside the container and executes a Python command to launch the vLLM API server:</p>



<ul class="wp-block-list">
<li><strong><code>python3 -m vllm.entrypoints.openai.api_server</code></strong> → Starts the OpenAI-compatible vLLM API server</li>



<li><strong><code>--model mistralai/Mistral-Small-24B-Instruct-2501</code></strong> → Loads the <strong>Mistral Small 24B</strong> model from Hugging Face</li>



<li><strong><code>--tensor-parallel-size 2</code></strong> → Distributes the model across <strong>2 GPUs</strong></li>



<li><strong><code>--tokenizer_mode mistral</code></strong> → Uses the <strong>Mistral tokenizer</strong></li>



<li><strong><code>--load_format mistral</code></strong> → Uses Mistral’s model loading format</li>



<li><strong><code>--config_format mistral</code></strong> → Ensures the model configuration follows Mistral&#8217;s standard</li>



<li><strong><code>--dtype half</code></strong> → Uses <strong>FP16 (half-precision floating point)</strong> for optimized GPU performance</li>
</ul>



<p>You can now check if your <strong>AI Deploy</strong> app is alive:</p>



<pre class="wp-block-code"><code class="">ovhai app get &lt;your_vllm_app_id&gt;</code></pre>



<p>💡<strong>Is your app in <code>RUNNING</code> status?</strong> Perfect! You can check in the logs that the server has started&#8230;</p>



<pre class="wp-block-code"><code class="">ovhai app logs &lt;your_vllm_app_id&gt;</code></pre>



<p><strong><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-ast-global-color-0-color">⚠️WARNING!</mark></strong> This step may take a little time as the model must be loaded&#8230;<br>After a few minutes, you should get the following information in the logs:</p>



<pre class="wp-block-code"><code class="">2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Started server process [13]
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Waiting for application startup.
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Application startup complete.
2025-02-20T13:48:07Z [app] [tcmzt] INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)</code></pre>



<p>🚦 <strong>Are all the indicators green? </strong>Then it&#8217;s off to inference!</p>



<h3 class="wp-block-heading">Request and send prompt to the LLM</h3>



<p>Launch the following query by asking the question of your choice:</p>



<pre class="wp-block-code"><code class="">curl https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net/v1/chat/completions \
  -H "Authorization: Bearer $MY_OVHAI_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Give me the name of OVHcloud’s founder."}
    ],
    "stream": false
  }'</code></pre>



<p>This returns the following result:</p>



<pre class="wp-block-code"><code class="">{
  "id":"chatcmpl-d6ea734b524bd851668e71d4111ba496",
  "object":"chat.completion",
  "created":1740059807,
  "model":"mistralai/Mistral-Small-24B-Instruct-2501",
  "choices":[
    {
      "index":0,
      "message":{
        "role":"assistant",
        "reasoning_content":null, 
        "content":"The founder of OVHcloud is Octave Klaba.",
        "tool_calls":[]
      },
      "logprobs":null,
      "finish_reason":"stop",
      "stop_reason":null
    }
  ],
  "usage":{
    "prompt_tokens":22,
    "total_tokens":35,
    "completion_tokens":13,
    "prompt_tokens_details":null
  },
  "prompt_logprobs":null
}</code></pre>
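


<p>The same request can be sent from Python with the <code>requests</code> library. Here is a minimal sketch (replace the app ID and token with your own values):</p>



<pre class="wp-block-code"><code lang="python" class="language-python">import requests

APP_URL = "https://&lt;your_vllm_app_id&gt;.app.gra.ai.cloud.ovh.net"
TOKEN = "&lt;value of MY_OVHAI_ACCESS_TOKEN&gt;"

payload = {
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Give me the name of OVHcloud's founder."},
    ],
    "stream": False,
}

response = requests.post(
    f"{APP_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])</code></pre>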



<h2 class="wp-block-heading">Conclusion</h2>



<p>By following these steps, you have successfully deployed the <code><strong>mistralai/Mistral-Small-24B-Instruct-2501</strong></code> model using <strong>vLLM</strong> on OVHcloud&#8217;s AI Deploy platform. This setup provides a scalable and efficient solution for serving advanced language models in production environments.</p>



<p>For further customization and optimization, refer to the <a href="https://docs.vllm.ai/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">vLLM documentation</a> and the <a href="https://help.ovhcloud.com/csm/en-ie-documentation-public-cloud-ai-and-machine-learning-ai-deploy?id=kb_browse_cat&amp;kb_id=574a8325551974502d4c6e78b7421938&amp;kb_category=3241efc6a052d910f078d4b4ef43651f&amp;spa=1" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud AI Deploy resources</a>.</p>



<p>💪 <strong>Challenges taken!</strong> You can now enjoy the power of your LLM deployed in a single command line!</p>



<p>Want even more simplicity? You can also use ready-to-use APIs with <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a>!</p>



<p><strong><em>But… what’s next?</em></strong></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fmistral-small-24b-served-with-vllm-and-ai-deploy-one-command-to-deploy-llm%2F&amp;action_name=Mistral%20Small%2024B%20served%20with%20vLLM%20and%20AI%20Deploy%20%26%238211%3B%20a%20single%20command%20to%20deploy%20an%20LLM%20%28Part%201%29&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to serve LLMs with vLLM and OVHcloud AI Deploy</title>
		<link>https://blog.ovhcloud.com/how-to-serve-llms-with-vllm-and-ovhcloud-ai-deploy/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Wed, 29 May 2024 12:22:26 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[LLaMA]]></category>
		<category><![CDATA[LLaMA 3]]></category>
		<category><![CDATA[LLM Serving]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[Mixtral]]></category>
		<category><![CDATA[vLLM]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=26762</guid>

					<description><![CDATA[In this tutorial, we will learn how to serve Large Language Models (LLMs) using vLLM and the OVHcloud AI Products.<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fhow-to-serve-llms-with-vllm-and-ovhcloud-ai-deploy%2F&amp;action_name=How%20to%20serve%20LLMs%20with%20vLLM%20and%20OVHcloud%20AI%20Deploy&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>In this tutorial, we will walk you through the process of serving large language models (LLMs), providing step-by-step instructions</em>.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" width="1024" height="345" src="https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-1024x345.png" alt="" class="wp-image-25615" style="width:750px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-1024x345.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-300x101.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-768x259.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-1536x518.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-2048x690.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p></p>



<h3 class="wp-block-heading">Introduction</h3>



<p>In recent years, <strong>large language models</strong> (LLMs) have become increasingly <strong>popular</strong>, with <strong>open-source</strong> models like <em>Mistral</em> and <em>LLaMA</em> gaining widespread attention. In particular, the <em>LLaMA 3</em> model, released on <em>April 18, 2024</em>, is one of today&#8217;s most powerful open-source LLMs.</p>



<p>However, <strong>serving these LLMs can be challenging</strong>, particularly on hardware with limited resources. Indeed, even on expensive hardware, LLMs can be surprisingly slow, with high VRAM utilization and throughput limitations.</p>



<p>This is where<strong><em> </em></strong><em><a href="https://github.com/vllm-project/vllm" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>vLLM</strong></a></em> comes in. <em><strong>vLLM</strong></em> is an <strong>open-source project</strong> that enables <strong>fast and easy-to-use LLM inference and serving</strong>. Designed for optimal performance and resource utilization, <em>vLLM</em> supports a range of <a href="https://docs.vllm.ai/en/latest/models/supported_models.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLM architectures</a> and offers <a href="https://docs.vllm.ai/en/latest/models/engine_args.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">flexible customization options</a>. That&#8217;s why we are going to use it to efficiently deploy and scale our LLMs.</p>



<h3 class="wp-block-heading">Objective</h3>



<p>In this guide, you will discover how to deploy an LLM thanks to <a href="https://github.com/vllm-project/vllm" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>vLLM</em></a> and the <strong><em>AI Deploy</em></strong> <em>OVHcloud</em> solution. This will enable you to benefit from <em>vLLM</em>&#8216;s optimisations and <em>OVHcloud</em>&#8216;s GPU computing resources. Your LLM will then be exposed through a secure API.</p>



<p>🎁 And for those who do not want to bother with the deployment process, <strong>a surprise awaits you at the <a href="#AI-ENDPOINTS">end of the article</a></strong>. We are going to introduce you to our new solution for using LLMs, called <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>AI Endpoints</strong></a>. This product makes it easy to integrate AI capabilities into your applications with a simple API call, without the need for deep AI expertise or infrastructure management. And while it&#8217;s in alpha, it&#8217;s <strong>free</strong>!</p>



<h3 class="wp-block-heading">Requirements</h3>



<p>To deploy your <em>vLLM</em> server, you need:</p>



<ul class="wp-block-list">
<li>An <em>OVHcloud</em> account to access the <a href="https://www.ovh.com/auth/?action=gotomanager&amp;from=https://www.ovh.co.uk/&amp;ovhSubsidiary=GB" data-wpel-link="exclude"><em>OVHcloud Control Panel</em></a></li>



<li>A <em>Public Cloud</em> project</li>



<li>A <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">user for the AI Products</a>, related to this <em>Public Cloud</em> project</li>



<li><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">The <em>OVHcloud AI CLI</em></a> installed on your local computer (to interact with the AI products by running commands). </li>



<li><a href="https://www.docker.com/get-started" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker</a> installed on your local computer, <strong>or</strong> access to a Debian Docker Instance, which is available on the <a href="https://www.ovh.com/manager/public-cloud/" data-wpel-link="exclude"><em>Public Cloud</em></a></li>
</ul>



<p>Once these conditions have been met, you are ready to serve your LLMs.</p>



<h3 class="wp-block-heading">Building a Docker image</h3>



<p>Since the <a href="https://www.ovhcloud.com/en/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>OVHcloud AI Deploy</em></a> solution is based on <a href="https://www.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Docker</em></a> images, we will be using a <em>Docker</em> image to deploy our <em>vLLM</em> inference server. </p>



<p>As a reminder, <em>Docker</em> is a platform that allows you to create, deploy, and run applications in containers. <em>Docker</em> containers are standalone and executable packages that include everything needed to run an application (code, libraries, system tools).</p>



<p>To create this <em>Docker</em> image, we will need to write the following <em><strong>Dockerfile</strong></em> into a new folder:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">mkdir my_vllm_image
nano Dockerfile</code></pre>



<pre class="wp-block-code"><code lang="bash" class="language-bash"># 🐳 Base image
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

# 👱 Set the working directory inside the container
WORKDIR /workspace

# 📚 Install missing system packages (git) so we can clone the vLLM project repository
RUN apt-get update &amp;&amp; apt-get install -y git
RUN git clone https://github.com/vllm-project/vllm/

# 📚 Install the Python dependencies
RUN pip3 install --upgrade pip
RUN pip3 install vllm 

# 🔑 Give correct access rights to the OVHcloud user
ENV HOME=/workspace
RUN chown -R 42420:42420 /workspace</code></pre>



<p>Let&#8217;s take a closer look at this <em>Dockerfile</em> to understand it:</p>



<ul class="wp-block-list">
<li><strong>FROM</strong>: Specify the base image for our <em>Docker</em> image. We choose the <em>PyTorch</em> image since it comes with <em>CUDA</em>, <em>CuDNN</em> and <em>torch</em>, which are needed by <em>vLLM</em>. </li>



<li><strong>WORKDIR /workspace</strong>: We set the working directory for the <em>Docker</em> container to <em>/workspace</em>, which is the default folder when we use <em>AI Deploy</em>.</li>



<li><strong>RUN</strong>: These instructions upgrade <em>pip</em> to the latest version to make sure we have access to the latest libraries and dependencies, install the <em>vLLM</em> library, and install <em>git</em>, which enables us to clone the <a href="https://github.com/vllm-project/vllm/tree/main" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>vLLM</em> repository</a> into the <em>/workspace</em> directory.</li>



<li><strong>ENV</strong> HOME=/workspace: This sets the <em>HOME</em> environment variable to <em>/workspace</em>. This is a requirement to use the <em>OVHcloud</em> AI Products.</li>



<li><strong>RUN chown -R 42420:42420 /workspace</strong>: This changes the owner of the <em>/workspace</em> directory to the user and group with IDs of <em>42420</em> (<em>OVHcloud</em> user). This is also a requirement to use the <em>OVHcloud</em> AI Products.</li>
</ul>



<p>This <em>Dockerfile</em> does not contain a <strong>CMD</strong> instruction and therefore does not launch our <em>vLLM</em> server. Do not worry about that, we will do it directly from <a href="https://www.ovhcloud.com/en/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a>&nbsp;to have more flexibility.</p>



<p>Once your Dockerfile is written, launch the following command to build your image:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">docker build . -t vllm_image:latest</code></pre>



<h3 class="wp-block-heading">Push the image into the shared registry</h3>



<p>Once you have built the Docker image, you will need to push it to a <strong>registry</strong> to make it accessible from <em>AI Deploy</em>. A <strong>registry</strong> is a service that allows you to store and distribute <em>Docker</em> images, making it easy to deploy them in different environments.</p>



<p>Several registries can be used (<em><a href="https://www.ovhcloud.com/en-gb/public-cloud/managed-private-registry/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Managed Private Registry</a>, <a href="https://hub.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker Hub</a>, <a href="https://github.com/features/packages" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub packages</a>, &#8230;</em>). In this tutorial, we will use the <strong><em>OVHcloud</em> <em>shared registry</em></strong>. More information are available in the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-manage-registries?id=kb_article_view&amp;sysparm_article=KB0057949" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Registries documentation</a>.</p>



<p>To find the address of your shared registry, use the following command (<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>ovhai CLI</em></a> needs to be installed on your computer):</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">ovhai registry list</code></pre>



<p>Then, log in to your <em>shared registry</em> with your usual <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Platform user</em></a> credentials:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">docker login -u &lt;user&gt; -p &lt;password&gt; &lt;shared-registry-address&gt;</code></pre>



<p>Once you are logged in to the registry, tag the image you built and push it into your shared registry:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">docker tag vllm_image:latest &lt;shared-registry-address&gt;/vllm_image:latest
docker push &lt;shared-registry-address&gt;/vllm_image:latest</code></pre>



<h3 class="wp-block-heading">vLLM inference server deployment</h3>



<p>Once your image has been pushed, it can be used with <em>AI Deploy</em>, using either the <em>ovhai CLI</em> or the <em>OVHcloud Control Panel (UI)</em>.</p>



<h5 class="wp-block-heading">Creating an access token </h5>



<p>Tokens are used as unique authenticators to securely access the <em>AI Deploy</em> apps. By creating a token, you can ensure that only authorized requests are allowed to interact with the <em>vLLM</em> endpoint. You can create this token by using the <em>OVHcloud Control Panel (UI)</em> or by running the following command:</p>



<pre class="wp-block-code"><code lang="" class="">ovhai token create vllm --role operator --label-selector name=vllm</code></pre>



<p>This will give you a token that you will need to keep.</p>



<h5 class="wp-block-heading">Creating a Hugging Face token (optionnal)</h5>



<p>Note that some models, such as <a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLaMA 3</a>, require you to accept their license. Hence, you need to create a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face account</a>, accept the model&#8217;s license, and generate a <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">token</a> from your account settings; this token will allow you to access the model.</p>



<p>For example, when visiting the Hugging Face <a href="https://huggingface.co/google/gemma-2b" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gemma model page</a>, you&#8217;ll see this (if you are logged in):</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="716" height="312" src="https://blog.ovhcloud.com/wp-content/uploads/2024/05/Screenshot-2024-05-22-at-14.15.21.png" alt="accept_model_conditions_hugging_face" class="wp-image-26768" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/05/Screenshot-2024-05-22-at-14.15.21.png 716w, https://blog.ovhcloud.com/wp-content/uploads/2024/05/Screenshot-2024-05-22-at-14.15.21-300x131.png 300w" sizes="auto, (max-width: 716px) 100vw, 716px" /></figure>



<p>If you want to use this model, you will have to acknowledge the license, and then make sure to create a token in the <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">tokens section</a>.</p>



<p>In the next step, we will set this token as an environment variable (named  <code>HF_TOKEN</code>). Doing this will enable us to use any LLM whose conditions of use we have accepted.</p>



<h5 class="wp-block-heading">Run the AI Deploy application</h5>



<p>Run the following command to deploy your <em>vLLM</em> server by running your customized <em>Docker</em> image:</p>



<pre class="wp-block-code"><code lang="" class="">ovhai app run &lt;shared-registry-address&gt;/vllm_image:latest \
  --name vllm_app \
  --flavor h100-1-gpu \
  --gpu 1 \
  --env HF_TOKEN="&lt;YOUR_HUGGING_FACE_TOKEN&gt;" \
  --label name=vllm \
  --default-http-port 8080 \
  -- python -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8080 --model &lt;model&gt; --dtype half</code></pre>



<p><em>You just need to change the address of your registry to the one you used, and the name of the LLM you want to use. Also pay attention to the name of the image, its tag, and the label selector of your label if you haven&#8217;t used the same ones as those given in this tutorial.</em></p>



<p><strong>Parameters explanation</strong></p>



<ul class="wp-block-list">
<li><code>&lt;shared-registry-address&gt;/vllm_image:latest</code> is the image on which the app is based.</li>



<li><code>--name vllm_app</code> is an optional argument that allows you to give your app a custom name, making it easier to manage all your apps.</li>



<li><code>--flavor h100-1-gpu</code> indicates that we want to run our app on H100 GPU(s). You can access the full list of available GPUs by running <code>ovhai capabilities flavor list</code></li>



<li><code>--gpu 1</code> indicates that we request 1 GPU for that app.</li>



<li><code>--env HF_TOKEN</code> is an optional argument that allows us to set our Hugging Face token as an environment variable. This gives us access to models for which we have accepted the conditions.</li>



<li><code>--label name=vllm</code> allows us to restrict access to our LLM to requests carrying the token created for the label selector <code>name=vllm</code>.</li>



<li><code>--default-http-port 8080</code> indicates that the port to reach on the app URL is <code>8080</code>.</li>



<li><code>python -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8080 --model &lt;model&gt;</code> (passed after the <code>--</code> separator) starts the vLLM API server. The specified &lt;model&gt; will be downloaded from Hugging Face. Here is a list of those that are <a href="https://docs.vllm.ai/en/latest/models/supported_models.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">supported by vLLM</a>. <a href="https://docs.vllm.ai/en/latest/models/engine_args.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Many arguments</a> can be used to optimize your inference.</li>
</ul>



<p>When this <code>ovhai app run</code> command is executed, several pieces of information will appear in your terminal. Get the ID of your application, and open the Info URL in a new tab. Wait a few minutes for your application to launch. When it is <strong>RUNNING</strong>, you can stream its logs by executing:</p>



<pre class="wp-block-code"><code class="">ovhai app logs -f &lt;APP_ID&gt;</code></pre>



<p>This will allow you to track the server launch, the model download and any errors you may encounter if you have used a model for which you have not accepted the user contract. </p>



<p>If all goes well, you should see the following output, which means that your server is up and running:</p>



<pre class="wp-block-code"><code class="">Started server process [11]
Waiting for application startup.
Application startup complete.
Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)</code></pre>



<h3 class="wp-block-heading">Interacting with your LLM</h3>



<p>Once the server is up and running, we can interact with our LLM by hitting the <code>/generate</code> endpoint.</p>



<p><strong>Using cURL</strong></p>



<p><em>Make sure you change the ID to that of your application so that you target the right endpoint. In order for the request to be accepted, also specify the token that you generated previously by executing</em> <code>ovhai token create</code>. Feel free to adapt the parameters of the request (<em>prompt</em>, <em>max_tokens</em>, <em>temperature</em>, &#8230;)</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">curl --request POST \                                             
  --url https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net/generate \
  --header 'Authorization: Bearer &lt;AI_TOKEN_generated_with_CLI&gt;' \
  --header 'Content-Type: application/json' \
  --data '{
        "prompt": "&lt;YOUR_PROMPT&gt;",
        "max_tokens": 50,
        "n": 1,
        "stream": false
}'</code></pre>



<p><strong>Using Python</strong></p>



<p><em>Here too, you need to add your personal token and the correct link for your application.</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import requests
import json

# adapt with your app URL and the token generated with the CLI
APP_URL = "https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net"
TOKEN = "&lt;AI_TOKEN_generated_with_CLI&gt;"

url = f"{APP_URL}/generate"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {TOKEN}"
}
data = {
    "prompt": "What a LLM is in AI?",
    "max_tokens": 100,
    "temperature": 0
}

response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.json()["text"][0])</code></pre>



<h3 class="wp-block-heading" id="AI-ENDPOINTS">OVHcloud AI Endpoints</h3>



<p>If you are not interested in building your own image and deploying your own LLM inference server, you can use OVHcloud&#8217;s new <em><strong><a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a></strong></em> product, which will definitely make your life easier!</p>



<p><a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints</em></a> is a serverless solution that provides AI APIs, enabling you to easily use pre-trained and optimized AI models in your applications. </p>



<figure class="wp-block-video"><video height="1400" style="aspect-ratio: 2560 / 1400;" width="2560" controls src="https://blog.ovhcloud.com/wp-content/uploads/2024/05/demo-ai-endpoints.mp4"></video></figure>



<p class="has-text-align-center"><em>Overview of AI Endpoints</em></p>



<p>You can use LLM as a Service, choosing the desired model (such as <em>LLaMA</em>, <em>Mistral</em>, or <em>Mixtral</em>) and making an API call to use it in your application. This will allow you to interact with these models without even having to deploy them!</p>



<p>In addition to LLM capabilities, <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints</em></a> also offers a range of other AI models, including speech-to-text, translation, summarization, embeddings and computer vision. </p>



<p>Best of all, <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints</em></a> is currently in alpha phase and is <strong>free to use</strong>, making it an accessible and affordable solution for developers seeking to explore the possibilities of AI. Check <a href="https://blog.ovhcloud.com/enhance-your-applications-with-ai-endpoints/" data-wpel-link="internal">this article</a> and try it out today to discover the power of AI!</p>



<p>Join our <a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Discord server</a> to interact with the community and send us your feedback (#<em>ai-endpoints</em> channel)!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fhow-to-serve-llms-with-vllm-and-ovhcloud-ai-deploy%2F&amp;action_name=How%20to%20serve%20LLMs%20with%20vLLM%20and%20OVHcloud%20AI%20Deploy&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2024/05/demo-ai-endpoints.mp4" length="14424826" type="video/mp4" />

			</item>
		<item>
		<title>Understanding Image Generation: A Beginner&#8217;s Guide to Generative Adversarial Networks</title>
		<link>https://blog.ovhcloud.com/understanding-image-generation-beginner-guide-generative-adversarial-networks-gan/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Tue, 05 Sep 2023 09:21:57 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Streamlit]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=25664</guid>

					<description><![CDATA[How to train a generative adversarial network (GAN) to generate images?
How to train a DCGAN?
How do GANs and DCGANs work?<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Funderstanding-image-generation-beginner-guide-generative-adversarial-networks-gan%2F&amp;action_name=Understanding%20Image%20Generation%3A%20A%20Beginner%26%238217%3Bs%20Guide%20to%20Generative%20Adversarial%20Networks&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>All the code related to this article is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/computer-vision/image-generation/miniconda/dcgan-image-generation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>. You can reproduce all the experiments with</em> <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Notebooks</a>.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/dcgan_evolution.gif" alt="" class="wp-image-25680" style="width:549px;height:549px" width="549" height="549"/><figcaption class="wp-element-caption"><em>Fake samples generated by the model during training</em></figcaption></figure>



<p>Have you ever been amazed by what generative artificial intelligence could do, and wondered how it can generate realistic images 🤯🎨?</p>



<p>In this tutorial, we will embark on an exciting journey into the world of <strong>Generative Adversarial Networks (GANs)</strong>, a revolutionary concept in generative AI. No prior experience is necessary to follow along. We will walk you through every step, starting with the basic concepts and gradually building up to the implementation of <strong>Deep Convolutional GANs (DCGANs)</strong>.</p>



<p><em><strong>By the end of this tutorial, you will be able to generate your own images!</strong></em></p>



<h3 class="wp-block-heading">Introduction</h3>



<p>GANs were introduced by Ian Goodfellow et al. in 2014 in the paper <a href="https://arxiv.org/abs/1406.2661" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Generative Adversarial Nets</em></a>. They have become very popular in recent years, allowing us, for example, to:</p>



<ul class="wp-block-list">
<li>Generate high-resolution images (avatars, objects and scenes)</li>



<li>Augment our data (generating synthetic (fake) data samples for limited datasets)</li>



<li>Enhance the resolution of low-resolution images (upscaling images)</li>



<li>Transfer the style of one image to another (black and white to color)</li>



<li>Predict facial appearances at different ages (Face Aging)</li>
</ul>



<h4 class="wp-block-heading">What is a GAN and how it works?</h4>



<p>A GAN is composed of two main components: a <strong>generator <em>G</em></strong> and a <strong>discriminator <em>D</em></strong>.</p>



<p>Each component is a neural network, but their roles are different:</p>



<ul class="wp-block-list">
<li>The purpose of the generator <em>G</em> is to <strong>reproduce the distribution of the training data <em>𝑥</em></strong>, in order to <strong>generate synthetic samples</strong> from that same distribution. These data are often images, but can also be audio or text.</li>
</ul>



<ul class="wp-block-list">
<li>On the other hand, the discriminator <em>D</em> is a kind of judge who will <strong>estimate whether a sample <em>𝑥</em> is real or fake</strong> (has been generated). It is in fact a <strong>classifier</strong> that will say if a sample comes from the real data distribution or the generator.</li>
</ul>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/GAN_Illustration-1024x311.png" alt="" class="wp-image-25667" style="width:1201px;height:365px" width="1201" height="365" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/GAN_Illustration-1024x311.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/GAN_Illustration-300x91.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/GAN_Illustration-768x233.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/GAN_Illustration-1536x467.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/GAN_Illustration.png 1882w" sizes="auto, (max-width: 1201px) 100vw, 1201px" /></figure>



<p class="has-text-align-center"><em>Illustration of GAN training</em></p>



<p>During training, the generator starts with a <strong>vector of random noise</strong> (z) as input and produces synthetic samples G(z).</p>



<p>As training progresses, it refines its output, making the generated data G(z) more and more similar to the real data. The goal of the generator is to <strong>outsmart</strong> the discriminator into classifying its generated samples as real.</p>



<p>Meanwhile, the discriminator is presented with both real samples from the training data and fake samples from the generator. As it learns to discriminate between the two, it <strong>provides feedback</strong> to the generator about the quality of its generated samples. This is why the term <em>&#8220;<strong>adversarial</strong>&#8220;</em> is used here.</p>



<h4 class="wp-block-heading">Mathematical approach</h4>



<p>In fact, GANs come from game theory, where <em>D</em> and <em>G</em> are playing a two-player <em>minimax</em> game with the following value function:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/eq-1024x59.png" alt="value-objective-function" class="wp-image-25668" style="width:802px;height:46px" width="802" height="46" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/eq-1024x59.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/eq-300x17.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/eq-768x45.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/eq.png 1447w" sizes="auto, (max-width: 802px) 100vw, 802px" /></figure>



<p></p>



<p>As we can observe, the <strong>discriminator aims to maximize the V function</strong>. To do this, it must maximize each of the two parts of the equation that are added together. This means maximizing <em>log(D(x))</em>, so <em>D(x)</em>, in the first part (probability of real data), and minimizing <em>D(G(z))</em> in the second part (probability of fake data), which amounts to maximizing <em>log(1 &#8211; D(G(z)))</em>.</p>



<p>Simultaneously, the <strong>generator tries to minimize the function</strong>. It only comes into play in the second part of the function, where it tries to obtain the highest value of <em>D(G(z))</em> in order to fool the discriminator.</p>



<p>This constant confrontation between the generator and the discriminator creates an <strong>iterative learning process</strong>, where the generator gradually improves to produce increasingly realistic G(z) samples, and the discriminator becomes increasingly accurate in its distinction of the data presented to it.</p>



<p>In an <strong>ideal scenario</strong>, this iterative process would reach an <strong>equilibrium point</strong>, where the generator produces data that is indistinguishable from real data, and the discriminator&#8217;s performance is 50% (random guessing).</p>



<p>GANs may not always reach this equilibrium, since the training process is sensitive to many factors (architecture, hyperparameters, dataset complexity). The generator and discriminator may reach a dead end, oscillating between solutions or facing <strong>mode collapse</strong>, resulting in limited sample diversity. Also, it is important that the discriminator does not start off too strong, otherwise the generator will not get any information on how to improve itself, since it does not know what the real data looks like, as shown in the illustration above.</p>



<h4 class="wp-block-heading">DCGAN (Deep Convolutional GANs)</h4>



<p><strong>DCGAN</strong> was introduced in 2016 by Alec Radford et al. in the paper <em><a href="https://arxiv.org/abs/1511.06434" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks</a>.</em></p>



<p>Its new convolutional architecture has considerably improved the quality and stability of image synthesis compared to classical GANs. Here are the major changes:</p>



<ul class="wp-block-list">
<li>Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator), making them exceptionally well-suited for image generation tasks.</li>



<li>Use batchnorm in both the generator and the discriminator.</li>



<li>Remove fully connected hidden layers for deeper architectures.</li>



<li>Use ReLU activation in generator for all layers except for the output, which uses tanh.</li>



<li>Use LeakyReLU activation in the discriminator for all layers.</li>
</ul>



<p><em>The operation principles</em> <em>of these layers will not be explained in this tutorial.</em></p>



<h3 class="wp-block-heading">Use case &amp; Objective</h3>



<p>Now that we know the concept of image generation, let’s try to put it into practice!</p>



<p>In this tutorial, we will <strong>implement a</strong> <strong>DCGAN</strong> architecture and <strong>train it on a medical dataset</strong> to generate new images. This dataset is the <a href="https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Chest X-Ray Pneumonia</em></a>. All the code explained here will run on <strong>a single GPU</strong>, linked to <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Notebooks</a>, and is given in our <em><a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/computer-vision/image-generation/miniconda/dcgan-image-generation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a><a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/speech-to-text" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">.</a></em></p>



<h3 class="wp-block-heading">1 &#8211; Explore dataset and prepare it for training</h3>



<p><em>The </em><a href="https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Chest X-Ray Pneumonia</em> dataset</a> contains <strong>5,863 X-Ray images</strong>. This may not be sufficient for training a robust DCGAN, but we are going to try! Indeed, the DCGAN research paper conducted its study on datasets of over 60,000 images.</p>



<p>Additionally, it is important to consider that the dataset contains two classes (Pneumonia/Normal). We will not separate these classes, so as to keep as much data as possible, even though doing so could improve our network&#8217;s performance. Furthermore, it is advisable to verify that the classes are well balanced.</p>



<p>Only the training subset will be used here (5,221 images). Let&#8217;s take a look at our images:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/training_images.png" alt="chest-x-ray-pneumonia-dataset-images" class="wp-image-25669" style="width:366px;height:366px" width="366" height="366" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/training_images.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/training_images-150x150.png 150w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/training_images-70x70.png 70w" sizes="auto, (max-width: 366px) 100vw, 366px" /><figcaption class="wp-element-caption"><em>Chest X-Ray Pneumonia dataset real samples</em></figcaption></figure>



<p>We notice that we have quite similar images. The <strong>backgrounds are identical</strong>, and the chests are <strong>often centered in the same way</strong>, which should help the network learn.</p>



<h4 class="wp-block-heading">Preprocessing</h4>



<p><strong>Data pre-processing</strong> is a crucial step when you want to facilitate and accelerate model convergence and obtain high-quality results. This pre-processing can be broken down into various generic operations that are commonly applied.</p>



<p>Each image in the dataset will be <strong>transformed</strong>. Images are then assembled in packets of 128, which we call <strong>batches</strong>. This avoids loading the dataset all at once, which could use up a lot of memory. This also makes the most of <strong>GPU parallelism</strong>.</p>



<p>The applied <strong>transformation</strong> will:</p>



<ul class="wp-block-list">
<li><strong>Resize</strong> <strong>images</strong> to (64x64xchannels), the dimensions expected by our DCGAN. This avoids keeping the original dimensions of the images, which are all different. It also reduces the number of pixels, which accelerates model training (computation cost).</li>



<li><strong>Convert images to tensors</strong> (format expected by models).</li>



<li><strong>Standardize &amp; normalize the images&#8217; pixel values</strong>, which improves training performance.</li>
</ul>



<p><em>If original images are smaller than the desired size, transformation will pad the images to reach the specified size.</em></p>



<p><em>We won&#8217;t show you the code for these transformations in full here, but as mentioned earlier, you can find it in its entirety on our <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/computer-vision/image-generation/miniconda/dcgan-image-generation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>. You can reproduce all the experiments with <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Notebooks</a>.</em></p>
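

<p><em>To give a rough idea anyway, a minimal version of such a pipeline with <code>torchvision</code> could look like the sketch below (the dataset path and the exact values are illustrative and may differ from the repository code):</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

image_size = 64   # target resolution expected by our DCGAN
batch_size = 128  # images are grouped into batches of 128

transform = transforms.Compose([
    transforms.Resize(image_size),                          # resize images
    transforms.CenterCrop(image_size),                      # force a 64x64 output
    transforms.ToTensor(),                                  # convert to tensors
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # map pixels to [-1, 1]
])

# "chest_xray/train" is an illustrative path to the extracted dataset
dataset = dset.ImageFolder(root="chest_xray/train", transform=transform)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)</code></pre>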



<h3 class="wp-block-heading">Step 2 &#8211; Define the models</h3>



<p>Now that the images are ready, we can define our DCGAN:</p>



<h4 class="wp-block-heading">Generator implementation</h4>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/09/Generator-Frame1-1.svg" alt="" class="wp-image-25742" style="width:1200px;height:319px" width="1200" height="319"/></figure>



<p>As shown in the image above, the generator architecture is designed to take a random noise vector z as input and transform it into a (3x64x64) image, which is the same size as the images in the training dataset.</p>



<p>To do this, it uses <strong>transposed convolutions</strong> (sometimes incorrectly called deconvolutions) to progressively upsample the noise vector <em>z</em> until it reaches the desired output image size. These transposed convolutions help the generator capture complex patterns and generate realistic images during the training process.</p>



<p>The final <em>Tanh()</em> activation function ensures that the pixel values of the generated images are in the range <em>[-1, 1]</em>, which also corresponds to our transformed training images (we had normalized them).</p>



<p><em>The code for implementing this generator is given in its entirety on our <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/computer-vision/image-generation/miniconda/dcgan-image-generation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.</em></p>
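

<p><em>To make things more concrete, here is a minimal PyTorch sketch of such a generator. The names <code>nz</code>, <code>ngf</code> and <code>nc</code> (noise size, base number of feature maps, output channels) are illustrative and may differ from the repository code:</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # noise size, base feature maps, output channels

generator = nn.Sequential(
    # input: noise vector z of shape (nz, 1, 1)
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(True),
    # state size: (ngf*8) x 4 x 4
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 4),
    nn.ReLU(True),
    # state size: (ngf*4) x 8 x 8
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf * 2),
    nn.ReLU(True),
    # state size: (ngf*2) x 16 x 16
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ngf),
    nn.ReLU(True),
    # state size: ngf x 32 x 32
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
    nn.Tanh()  # pixel values in [-1, 1], matching the normalized training images
    # output: nc x 64 x 64
)</code></pre>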



<h3 class="wp-block-heading">Discriminator implementation</h3>



<p>As a reminder, the discriminator acts as a sample classifier. Its aim is to distinguish the data generated by the generator from the real data in the training dataset.</p>



<figure class="wp-block-image size-full"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/09/Discriminator-Frame.svg" alt="DCGAN architecture discriminator" class="wp-image-25743"/></figure>



<p></p>



<p>As shown in the image above, the discriminator takes an input image of size (3x64x64) and <strong>outputs a probability</strong>, indicating if the input image is real (1) or fake (0).</p>



<p>To do this, it uses convolutional layers, batch normalization layers, and LeakyReLU functions, which are presented in the paper as architecture guidelines to follow. Each convolutional block is designed to capture features of the input images, moving from low-level features such as edges and textures for the first blocks, to more abstract and complex features such as shapes and objects for the last.</p>



<p>Probability is obtained thanks to the use of the sigmoid activation, which squashes the output to the range <em>[0, 1]</em>.</p>



<p><em>The code for implementing this discriminator is given in its entirety on our <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/computer-vision/image-generation/miniconda/dcgan-image-generation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.</em></p>
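

<p><em>As for the generator, here is a minimal PyTorch sketch of such a discriminator, with <code>ndf</code> an illustrative base number of feature maps:</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torch.nn as nn

ndf, nc = 64, 3  # base feature maps, input channels (illustrative values)

discriminator = nn.Sequential(
    # input: nc x 64 x 64 image
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 2),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 4),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
    nn.BatchNorm2d(ndf * 8),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
    nn.Sigmoid()  # probability that the input image is real
)</code></pre>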



<h4 class="wp-block-heading">Define loss function and labels</h4>



<p>Now that we have our adversarial networks, we need to define the <strong>loss function</strong>. </p>



<p>The adversarial loss <em>V(D, G)</em> can be approximated using the<strong> <em>Binary Cross Entropy (BCE)</em></strong> loss function, which is commonly used for GANs because it measures the binary cross-entropy between the discriminator&#8217;s output (probability) and the ground truth labels during training (here we fix real=1 or fake=0). It will calculate the loss for both the generator and the discriminator during <strong>backpropagation</strong>.</p>



<p><em>BCE Loss</em> is computed with the following equation, where <em>target</em> is the ground truth label (1 or 0), and <em>ŷ</em> is the discriminator&#8217;s probability output:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/09/bce_eq-1024x102.png" alt="Binary Cross Entropy loss function" class="wp-image-25744" style="width:616px;height:61px" width="616" height="61" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/09/bce_eq-1024x102.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/09/bce_eq-300x30.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/09/bce_eq-768x76.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/09/bce_eq.png 1246w" sizes="auto, (max-width: 616px) 100vw, 616px" /></figure>



<p>If we compare this equation to our previous <em>V(D, G)</em> objective, we can see that BCE loss term for real data samples corresponds to the first term in <em>V(D, G)</em>, <em>log(D(x))</em>, and the BCE loss term for fake data samples corresponds to the second term in V(D, G), log(1 &#8211; D(G(z))).</p>



<p>In this binary case, the BCE can be represented by two distinct curves, which describe how the loss varies as a function of the predictions ŷ of the model. The first shows the loss as a function of the calculated probability, for a synthetic sample (label y = 0). The second describes the loss for a real sample (label y = 1).</p>



<figure class="wp-block-image aligncenter size-full"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/09/BCE-LOSS1.svg" alt="" class="wp-image-25747"/><figcaption class="wp-element-caption"><em>Variations of BCE loss over the interval ]0;1[ for different targeted labels (y =0 and y = 1)</em></figcaption></figure>



<p>We can see that<strong> the further the prediction ŷ is from the actual label assigned (target), the greater the loss</strong>. On the other hand, a prediction that is close to the truth will generate a loss very close to zero, which will not impact the model since it appears to classify the samples successfully.</p>



<p>During training, <strong>the goal is to minimize the BCE loss</strong>. This way, the discriminator will learn to correctly classify real and generated samples, while the generator will learn to generate samples that can &#8220;fool&#8221; the discriminator into classifying them as real.</p>
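

<p><em>In PyTorch, this loss is available as <code>nn.BCELoss</code>. A minimal sketch of its use, with the labels fixed above (real = 1, fake = 0) and dummy predictions, could be:</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torch
import torch.nn as nn

criterion = nn.BCELoss()  # computes -[y.log(ŷ) + (1 - y).log(1 - ŷ)]

batch_size = 128
real_targets = torch.full((batch_size,), 1.0)  # y = 1 for real samples
fake_targets = torch.full((batch_size,), 0.0)  # y = 0 for generated samples

# ŷ: probabilities that would come out of the discriminator (dummy values here)
predictions = torch.rand(batch_size)
loss_on_real = criterion(predictions, real_targets)
print(loss_on_real.item())</code></pre>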



<h4 class="wp-block-heading">Hyperparameters</h4>



<p>Hyperparameters were chosen according to the indications given in the <a href="https://arxiv.org/abs/1511.06434" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DCGAN paper</a>.</p>



<h3 class="wp-block-heading">Step 3 &#8211; Train the model</h3>



<p>We are now ready to train our DCGAN!</p>



<p>To monitor the generator&#8217;s learning progress, we will create a <strong>constant noise vector</strong>, denoted as <code><strong>fixed_noise</strong></code>. </p>



<p>During the training loop, we will regularly feed this <code><strong>fixed_noise</strong></code> into the generator. Using the same constant vector makes it possible to generate comparable images each time, and to observe the evolution of the samples produced by the generator over the training cycles.</p>



<pre class="wp-block-code"><code lang="python" class="language-python">fixed_noise = torch.randn(64, nz, 1, 1, device=device)</code></pre>



<p>Also, we will compute the <strong>BCE Loss</strong> of the Discriminator and the Generator separately. This will enable them to improve over the training cycles. For each batch, these losses will be calculated and saved into lists, enabling us to plot the losses after training for each training iteration.</p>
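

<p><em>Schematically, one iteration of the training loop combines the two updates as in the rough sketch below, assuming the networks, <code>criterion</code>, optimizers, <code>dataloader</code> and loss lists have been created beforehand (the full version is in the repository):</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torch

for epoch in range(num_epochs):
    for real_images, _ in dataloader:
        b_size = real_images.size(0)

        # 1. Update the discriminator: maximize log(D(x)) + log(1 - D(G(z)))
        discriminator.zero_grad()
        out_real = discriminator(real_images).view(-1)
        loss_real = criterion(out_real, torch.full((b_size,), 1.0))
        noise = torch.randn(b_size, nz, 1, 1)
        fake_images = generator(noise)
        out_fake = discriminator(fake_images.detach()).view(-1)
        loss_fake = criterion(out_fake, torch.full((b_size,), 0.0))
        loss_D = loss_real + loss_fake
        loss_D.backward()
        optimizerD.step()

        # 2. Update the generator: fake samples are labelled as real (1),
        # so that minimizing the BCE loss pushes D(G(z)) towards 1
        generator.zero_grad()
        out_fake = discriminator(fake_images).view(-1)
        loss_G = criterion(out_fake, torch.full((b_size,), 1.0))
        loss_G.backward()
        optimizerG.step()

        # keep track of both losses to plot them after training
        D_losses.append(loss_D.item())
        G_losses.append(loss_G.item())</code></pre>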



<h4 class="wp-block-heading">Training Process</h4>



<p>Thanks to our fixed noise vector, we were able to capture the evolution of the generated images, providing an overview of how the model learned to reproduce the distribution of training data over time.</p>



<p>Here are the samples generated by our model during training, when fed with the fixed noise, over 100 epochs. For visualization, a display of 9 generated images was chosen:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="381" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Results-Chests-1024x381.png" alt="generated-samples-epoch" class="wp-image-25678" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Results-Chests-1024x381.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Results-Chests-300x112.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Results-Chests-768x286.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Results-Chests-1536x572.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Results-Chests-2048x762.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Evolution of the synthetic samples produced by the generator over time, from a constant random vector of noise z</em></figcaption></figure>



<p>At the start of the training process (epoch 1), the images generated show the characteristics of the random noise vector. </p>



<p>As the training progresses, the <strong>weights</strong> of the discriminator and generator <strong>are updated</strong>. Noticeable changes occur in the generated images. Epochs 5, 10 and 20 show quick and subtle evolution of the model, which begins to capture more distinct shapes and structures.</p>



<p>The next epochs show an improvement in edges and details. Generated samples become sharper and more identifiable, and by epoch 100 the images are quite realistic despite the limited data available (5,221 images).</p>



<p><em>Do not hesitate to play with the hyperparameters to try and vary your results! You can also check out the <a href="https://github.com/soumith/ganhacks" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GAN hacks repo</a>, which shares many tips dedicated to training GANs. Training time will vary according to your resources and the number of images.</em></p>



<h3 class="wp-block-heading">Step 4 &#8211; Results &amp; Inference</h3>



<p>Once the generator has been trained over 100 epochs, we are free to generate unlimited new images, based on a new random noise vector each time.</p>



<p>In order to retain only relevant samples, a data <strong>post-processing</strong> step was set up to assess the quality of the images generated. All generated images were sent to the trained discriminator. Its job is to evaluate the probability of the generated samples, and keep only those which have obtained a probability greater than a fixed threshold (0.8 for example).</p>
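

<p><em>As an illustration, this post-processing step can be sketched in a few lines, reusing the trained <code>generator</code> and <code>discriminator</code> from the previous steps (with <code>nz</code> and <code>device</code> as defined earlier):</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import torch

threshold = 0.8  # minimum "real" probability for a generated image to be kept

with torch.no_grad():
    noise = torch.randn(64, nz, 1, 1, device=device)  # 64 new random vectors
    samples = generator(noise)                        # candidate images
    scores = discriminator(samples).view(-1)          # probability of being real
    kept = samples[scores &gt; threshold]                # keep convincing samples only

print(f"{kept.size(0)} images kept out of {samples.size(0)}")</code></pre>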



<p>This way, we have obtained the following images, compared to the original ones. We can see that despite the small number of images in our dataset, the model was able to identify and learn the distribution of the real images and reproduce it in a realistic way:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="498" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/results-1024x498.png" alt="real-images-vs-generated" class="wp-image-25679" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/results-1024x498.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/results-300x146.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/results-768x373.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/results-1536x747.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/results.png 1888w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p class="has-text-align-center"><em>O<em>riginal dataset images (left)</em>, compared with images selected from generated samples (right) </em></p>



<h3 class="wp-block-heading">Step 5 &#8211; Evaluate the model</h3>



<p>A DCGAN model (and GANs in general) can be evaluated in several ways. A <a href="https://arxiv.org/abs/1802.03446" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">research paper</a> has been published on this subject.</p>



<h3 class="wp-block-heading">Quantitative measures</h3>



<p>On the <strong>quantitative</strong> side, the <strong>evolution of the BCE losses</strong> of the generator and the discriminator provides an indication of the quality of the model during training.</p>



<p>The evolution of these losses is illustrated in the figure below, where the discriminator losses are shown in orange and the generator losses in blue, over a total of 4100 iterations. Each iteration corresponds to one batch of 128 images; the dataset is split into 41 such batches, so one epoch (a complete pass over the dataset) produces 41 iterations. Since the model has been trained over 100 epochs, loss tracking is available over 4100 iterations (41*100).</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/loss-1024x568.png" alt="generator-discriminator-loss" class="wp-image-25677" style="width:706px;height:392px" width="706" height="392" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/loss-1024x568.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/loss-300x167.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/loss-768x426.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/loss-1536x853.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/loss.png 1668w" sizes="auto, (max-width: 706px) 100vw, 706px" /></figure>



<p>At the start of training, both curves show high loss values, indicating an <strong>unstable start</strong> of the DCGAN. This results in very <strong>unrealistic images being generated</strong>, where the nature of the <strong>random noise is still too present </strong>(see epoch 1 on the previous image). The discriminator is therefore too powerful for the moment.</p>



<p>A few iterations later, the losses converge towards lower values, demonstrating the improvement in the model&#8217;s performance.</p>



<p>However, from epoch 10, a trend emerges. The discriminator loss begins to decrease very slightly, indicating an improvement in its ability to determine which samples are genuine and which are synthetic. On the other hand, the generator&#8217;s loss shows a slight increase, suggesting that it needs to improve in order to generate images capable of deceiving its adversary.</p>



<p>More generally, fluctuations are observed throughout training due to the competitive nature of the network, where the generator and discriminator are constantly adjusting relative to each other. These moments of fluctuation may reflect attempts to adjust the two networks. Unfortunately, they do not ultimately appear to lead to an overall reduction in network loss.</p>



<h4 class="wp-block-heading">Qualitative measures</h4>



<p>Losses are not the only performance indicator. They are often insufficient to assess the visual quality of the images generated.</p>



<p>This is confirmed by an analysis of the previous graphs, where we inevitably notice that the images generated at epoch 10 are not the most realistic, while the loss is approximately the same as that obtained at epoch 100.</p>



<p>One commonly used method is <strong>human visual</strong> assessment. However, this manual assessment has a number of limitations. It is subjective, does not fully reflect the capabilities of the models, cannot be reproduced and is <strong>expensive</strong>.</p>



<p>Research is therefore focusing on finding new, more reliable and less costly methods. This is particularly the case with <strong>CAPTCHAs</strong>, tests designed to check whether a user is a human or a robot before accessing content. These tests sometimes present pairs of generated and real images where the user has to indicate which of the two seems more authentic. This ultimately amounts to training a discriminator and a generator manually.</p>



<p class="has-text-align-center"><em>All the code related to this article is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/tree/main/notebooks/computer-vision/image-generation/miniconda/dcgan-image-generation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a><a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/speech-to-text" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">.</a> You can reproduce all the experiments with</em> <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Notebooks</a>.</p>



<h3 class="wp-block-heading">Conclusion</h3>



<p>I hope you have enjoyed this post!</p>



<p>You are now more comfortable with image generation and the concept of Generative Adversarial Networks! Now you know how to generate images from your own dataset, even if it&#8217;s not very large!</p>



<p>You can train your own network on your dataset and generate images of faces, objects and landscapes. Happy GANning! 🎨🚀</p>



<p>You can check our other computer vision articles to learn how to:</p>



<ul class="wp-block-list">
<li><a href="https://blog.ovhcloud.com/image-segmentation-train-a-u-net-model-to-segment-brain-tumors/" data-wpel-link="internal">Perform Brain tumor segmentation using U-Net</a></li>



<li><a href="https://blog.ovhcloud.com/object-detection-train-yolov5-on-a-custom-dataset/" target="_blank" rel="noreferrer noopener" data-wpel-link="internal">Train YOLOv5 on a custom dataset Object detection:</a></li>
</ul>



<h3 class="wp-block-heading">Paper references</h3>



<ul class="wp-block-list">
<li><a href="https://arxiv.org/abs/1406.2661" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Generative Adversarial Nets, Ian Goodfellow, 2014</a></li>



<li><a href="https://arxiv.org/abs/1511.06434" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Alec Radford et al., 2016</a></li>



<li><a href="https://arxiv.org/abs/1802.03446" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Pros and Cons of GAN Evaluation Measures, Ali Borji, 2018</a></li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Funderstanding-image-generation-beginner-guide-generative-adversarial-networks-gan%2F&amp;action_name=Understanding%20Image%20Generation%3A%20A%20Beginner%26%238217%3Bs%20Guide%20to%20Generative%20Adversarial%20Networks&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Create your solution for Sign Language recognition with OVHcloud AI tools</title>
		<link>https://blog.ovhcloud.com/create-your-solution-for-sign-language-recognition-with-ovhcloud-ai-tools/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Fri, 01 Sep 2023 09:27:49 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Notebooks]]></category>
		<category><![CDATA[AI Training]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=25709</guid>

					<description><![CDATA[A guide to build a solution for sign language interpretation based on a Computer Vision algorithm: YOLOv7. Introduction In the field of Artificial Intelligence, we often talk about Computer Vision and Object Detection, but what role do these AI techniques play in the vast field of healthcare? We&#8217;ll see that data plays a key role [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fcreate-your-solution-for-sign-language-recognition-with-ovhcloud-ai-tools%2F&amp;action_name=Create%20your%20solution%20for%20Sign%20Language%20recognition%20with%20OVHcloud%20AI%20tools&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to build a solution for sign language interpretation based on a <strong>Computer Vision</strong> algorithm: <a href="https://github.com/WongKinYiu/yolov7" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">YOLOv7</a>.</em></p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="617" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.52-1024x617.png" alt="" class="wp-image-25717" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.52-1024x617.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.52-300x181.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.52-768x463.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.52-1536x925.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.52.png 1738w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Sign Language recognition with OVHcloud AI tools</em></figcaption></figure>



<h2 class="wp-block-heading">Introduction</h2>



<p>In the field of Artificial Intelligence, we often talk about <strong>Computer Vision</strong> and <strong>Object Detection</strong>, but what role do these AI techniques play in the vast field of healthcare? We&#8217;ll see that data plays a key role in AI applications for the medical-social sector. </p>



<p><strong>Have you ever wondered if AI could be the solution to understand sign language?</strong></p>



<p>Through this article, you will see that it is possible to use an AI model to detect signed letters. How? Thanks to the power of <strong>Computer Vision</strong> and <strong>Transfer Learning</strong>!</p>



<p><strong>The article is organized as follows:</strong></p>



<ul class="wp-block-list">
<li>Objectives</li>



<li>American Sign Language Dataset</li>



<li>Fine-Tune YOLOv7 model for Sign Language detection</li>



<li>Deploy custom YOLOv7 model for real time detection</li>
</ul>



<p><em>All the code for this blogpost is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/object-detection/miniconda/yolov7/notebook_object_detection_yolov7_asl.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>. You can <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-notebooks-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057517" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Fine-Tune YOLOv7</a> to detect signs with the <strong>AI Notebooks</strong> tool and <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-streamlit-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057491" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">deploy the custom model</a> for real-time detection with <strong>AI Deploy</strong>.</em></p>



<h2 class="wp-block-heading">Objectives</h2>



<p>The purpose of this article is to show how it is possible to deploy a solution for <strong>Sign Language recognition</strong> thanks to AI. </p>



<p>An <strong>Object Detection</strong> algorithm will be used to detect the various signs and categorize them. Although closely related, <strong>Object Detection</strong> goes a step further than <strong>Image Classification</strong>: it locates each object within the image and classifies it.</p>



<p>In this article, you will learn how to <strong><a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/object-detection/miniconda/yolov7/notebook_object_detection_yolov7_asl.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Fine-Tune YOLOv7</a></strong> model for <strong>Sign Language</strong> detection.</p>



<p>Once the model has been trained, what do you think about deploying a web app? <strong>Streamlit</strong> is the answer to your needs! In the end, AI will enable you to understand Sign Language, with <strong>real-time detection</strong> and written transcription.</p>



<h2 class="wp-block-heading">American Sign Language Dataset</h2>



<p>First of all, let&#8217;s talk data!</p>



<p><a href="https://public.roboflow.com/object-detection/american-sign-language-letters/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">American Sign Language Letters Dataset v1</a> is a public set of alphabet images and their labels created by <strong>David Lee</strong>.</p>



<p>This dataset is composed of <strong>1728 images</strong> and <strong>26 classes</strong> with the alphabet letters from A to Z.</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.00-1.png" alt="" class="wp-image-25725" style="width:377px;height:390px" width="377" height="390" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.00-1.png 935w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.00-1-290x300.png 290w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.00-1-768x794.png 768w" sizes="auto, (max-width: 377px) 100vw, 377px" /><figcaption class="wp-element-caption"><em>ASL dataset</em></figcaption></figure>
</div>
</div>



<p>This dataset is composed of <strong>images</strong> and their corresponding <strong>labels</strong>, which are in <strong>txt</strong> format and give information about the location of the object thanks to the <em>x</em>, <em>y</em> coordinates as well as the <em>height</em> and <em>width</em> of the bounding box.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-1024x635.png" alt="" class="wp-image-21645" style="width:1024px;height:635px" width="1024" height="635" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-1024x635.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-300x186.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-768x477.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation-1536x953.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/mug_annotation.png 1713w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Label components of the ASL dataset for YOLOv7 usage</em></figcaption></figure>



<p>This data format is ideal for training a <strong>YOLO</strong> type Object Detection model.</p>
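

<p><em>As an illustration, a label file contains one line per annotated object: the class index, followed by the bounding box coordinates normalized between 0 and 1 (the values below are made up):</em></p>



<pre class="wp-block-code"><code class=""># &lt;class_id&gt; &lt;x_center&gt; &lt;y_center&gt; &lt;width&gt; &lt;height&gt;
0 0.513 0.447 0.218 0.306</code></pre>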



<h2 class="wp-block-heading">Fine-Tune YOLOv7 model for Sign Language recognition</h2>



<p>How can the model YOLOv7 be trained to recognize American Sign Language letters? </p>



<h6 class="wp-block-heading"><strong>Object Detection with YOLOv7 </strong></h6>



<p><strong>YOLOv7</strong> is part of the &#8220;YOLO family&#8221; of algorithms, whose name actually means &#8220;<em>You Only Look Once</em>.&#8221; Unlike many detection algorithms, YOLO is a single end-to-end neural network that evaluates both the position and the class of identified objects, detecting classes through a fully connected layer.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.02.47-1024x991.png" alt="" class="wp-image-25722" style="width:533px;height:515px" width="533" height="515" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.02.47-1024x991.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.02.47-300x290.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.02.47-768x743.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.02.47.png 1059w" sizes="auto, (max-width: 533px) 100vw, 533px" /><figcaption class="wp-element-caption"><em>Object Detection</em></figcaption></figure>



<p>Therefore, YOLO models pass only once over each image to detect the objects. This Object Detection model is particularly known for its <strong>speed</strong> and <strong>accuracy</strong>, and allows <strong>real-time recognition</strong>.</p>



<p>But how can the model YOLOv7 be trained to recognize American Sign Language letters? Follow the next steps and let the magic work!</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="266" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.27-1024x266.png" alt="" class="wp-image-25719" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.27-1024x266.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.27-300x78.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.27-768x200.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.27-1536x400.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-11.23.27.png 1841w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Fine-Tuning of YOLOv7</em></figcaption></figure>



<p><em>The full notebook is available on the following <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/object-detection/miniconda/yolov7/notebook_object_detection_yolov7_asl.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</em></p>



<h4 class="wp-block-heading">Import dependencies</h4>



<p>Firstly, import the dependencies you need.</p>



<pre class="wp-block-code"><code class="">import torch
import os
import yaml
import torchvision
from IPython.display import Image, clear_output</code></pre>



<h4 class="wp-block-heading">Check GPU availability</h4>



<p>Then, check the GPU availability. Indeed, training a model like YOLOv7 requires a <strong>GPU</strong>; in this case, a Tesla V100S is used.</p>



<pre class="wp-block-code"><code class="">print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))</code></pre>



<p><code>Setup complete. Using torch 1.12.1+cu102 _CudaDeviceProperties(name='Tesla V100S-PCIE-32GB', major=7, minor=0, total_memory=32510MB, multi_processor_count=80)</code></p>



<h4 class="wp-block-heading">Extract the dataset information</h4>



<p>Next, you can access the <code>data.yaml</code> file.</p>



<p>This file contains vital information about the dataset, especially the number of classes. Here we have 26 classes, with the letters from A to Z.</p>
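

<p><em>For illustration, a <code>data.yaml</code> file for this dataset typically has the structure sketched below (the paths are illustrative and depend on how the dataset was exported):</em></p>



<pre class="wp-block-code"><code class="">train: ../train/images
val: ../valid/images

nc: 26
names: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']</code></pre>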



<pre class="wp-block-code"><code class=""># go to the directory where the data.yaml file is located to extract the number of classes
%cd /workspace/data
with open("data.yaml", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['nc'])</code></pre>



<p>Now, it&#8217;s time to train YOLOv7 model!</p>



<h4 class="wp-block-heading">Recover YOLOv7 weights</h4>



<p>In this tutorial, you can use the&nbsp;<strong>Transfer Learning</strong>&nbsp;method by using YOLOv7 weights pre-trained on the&nbsp;<a href="https://cocodataset.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">COCO dataset</a>.</p>



<p><strong>How to define Transfer Learning?</strong></p>



<p>For both humans and machines, learning something new takes time and practice. However, it is easier to perform tasks similar to those already learned. As with humans, AI will be able to identify patterns from previous knowledge and apply them to new learning.</p>



<p>If a model is trained on a database, there is no need to re-train the model from scratch to fit a new set of similar data.</p>



<p><strong>Main advantages of Transfer Learning:</strong></p>



<ul class="wp-block-list">
<li>saving resources</li>



<li>improving efficiency</li>



<li>model training facilitation</li>



<li>saving time</li>
</ul>



<p>You can now download the pre-trained weights:</p>



<pre class="wp-block-code"><code class=""># YOLOv7 path
%cd /workspace/yolov7
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt</code></pre>



<p><code>Saving to: ‘yolov7_training.pt’<br>yolov7_training.pt 100%[===================&gt;] 72.12M 12.0MB/s in 5.5s</code></p>



<h4 class="wp-block-heading">Run YOLOv7 training on ASL Letters Dataset</h4>



<p>You can therefore set the following parameters.</p>



<ul class="wp-block-list">
<li><em><strong>workers:</strong></em> maximum number of dataloader workers.</li>



<li><em><strong>device:</strong></em> cuda device.</li>



<li><strong><em>batch-size:</em></strong> refers to the batch size (number of training examples used in one iteration).</li>



<li><strong><em>data:</em></strong> refers to the path to the yaml file.</li>



<li><strong><em>img:</em></strong> refers to the input images size.</li>



<li><strong><em>cfg:</em></strong> define the model configuration.</li>



<li><strong><em>weights:</em></strong> initial weights path.</li>



<li><strong><em>name:</em></strong> save to project/name.</li>



<li><strong><em>hyp:</em></strong> hyperparameters path.</li>



<li><strong><em>epochs:</em></strong> refers to the number of training epochs. An epoch corresponds to one cycle through the full training dataset.</li>
</ul>



<pre class="wp-block-code"><code class=""># time the performance
%time

# train yolov7 on custom data for 100 epochs
!python /workspace/yolov7/train.py \
          --workers 8 \
          --device 0 \
          --batch-size 8 \
          --data '/workspace/data/data.yaml' \
          --img 416 416 \
          --cfg '/workspace/yolov7/cfg/training/yolov7.yaml' \
          --weights '/workspace/yolov7/yolov7_training.pt' \
          --name yolov7-asl \
          --hyp '/workspace/yolov7/data/hyp.scratch.custom.yaml' \
          --epochs 100</code></pre>



<h4 class="wp-block-heading">Display results of YOLOv7 training on ASL Letters dataset</h4>



<p>Then you can display the results of the training and check the evolution of the metrics.</p>



<pre class="wp-block-code"><code class=""># display images
Image(filename='/workspace/yolov7/runs/train/yolov7-asl/results.png', width=1000)  # view results</code></pre>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="512" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/image-1024x512.png" alt="" class="wp-image-25713" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/image-1024x512.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/image-300x150.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/image-768x384.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/image-1536x768.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/image-2048x1024.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>YOLOv7 training overview</em></figcaption></figure>



<h4 class="wp-block-heading">Export new weights for future inference</h4>



<p>Finally, you can extract the <strong>new weights</strong> resulting from the YOLOv7 training on the ASL Alphabet dataset. The goal is to save the model weights in a cloud object storage bucket so they can be reused in a dedicated application.</p>



<p>Firstly, rename the PyTorch model with the name you want.</p>



<pre class="wp-block-code"><code class="">%cd /workspace/yolov7/runs/train/yolov7-asl/weights/
os.rename("best.pt","yolov7.pt")</code></pre>



<p><code>/workspace/yolov7/runs/train/yolov7-asl/weights</code></p>



<p>Secondly, copy it into a new folder gathering all the weights generated during your trainings.</p>



<pre class="wp-block-code"><code class="">%cp /workspace/yolov7/runs/train/yolov7-asl/weights/yolov7.pt /workspace/asl-volov7-model/yolov7.pt</code></pre>



<p><strong>Is your model ready?</strong> It&#8217;s now time to deploy a web app to use the model and benefit from real-time detection 🎉 !</p>



<h2 class="wp-block-heading">Deploy custom YOLOv7 model for real time detection</h2>



<p>Once this <strong>YOLOv7 model</strong> is trained, it can be used for inference. If you want to quickly build an app to serve your AI model, the <strong>Streamlit</strong> framework may be right for you.</p>



<h6 class="wp-block-heading"><strong>What is Streamlit?</strong></h6>



<p>Now, it&#8217;s time to discuss the framework used to create the web app: <strong>Streamlit</strong>!</p>



<p><a href="https://streamlit.io/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Streamlit</a>&nbsp;allows you to transform data scripts into quickly shareable web applications using only the&nbsp;<strong>Python</strong>&nbsp;language. Moreover, this framework does not require front-end skills.</p>



<p>This is a time-saver for data scientists who want to deploy an app around their data!</p>
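


<p>As a taste of how little code a Streamlit page requires, here is a minimal, self-contained example (unrelated to the final app, just an illustration):</p>



<pre class="wp-block-code"><code class=""># hello_app.py -- run with: streamlit run hello_app.py
import streamlit as st

st.title("Hello Streamlit")
name = st.text_input("Your name")
if name:
    st.write(f"Hello, {name}!")</code></pre>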



<p>To make this app accessible, you need to containerize it using&nbsp;<strong>Docker</strong>.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.00.50-1024x960.png" alt="" class="wp-image-25723" style="width:601px;height:564px" width="601" height="564" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.00.50-1024x960.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.00.50-300x281.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.00.50-768x720.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/Capture-decran-2023-08-28-a-14.00.50.png 1098w" sizes="auto, (max-width: 601px) 100vw, 601px" /><figcaption class="wp-element-caption"><em>Streamlit web app</em></figcaption></figure>



<p>By creating an app, you will enable anyone to <strong>understand Sign Language</strong>, with Real-Time detection and written transcription.</p>



<p>Let&#8217;s go for the implementation!</p>



<h4 class="wp-block-heading">Create the interface with Streamlit</h4>



<p>First of all, we must build the <strong>web interface</strong> that takes a photo, along with the various functions that analyze the signs present in this image.</p>
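


<p>The snippets below are extracts from the full app (linked at the end of this section) and assume the following imports:</p>



<pre class="wp-block-code"><code class="">import io

import cv2
import numpy as np
import streamlit as st
import torch
from PIL import Image</code></pre>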



<ul class="wp-block-list">
<li><code>load_model</code>: this function is cached so that the model only has to be loaded once</li>
</ul>



<pre class="wp-block-code"><code class="">@st.cache
def load_model():

    custom_yolov7_model = torch.hub.load("WongKinYiu/yolov7", 'custom', '/workspace/asl-volov7-model/yolov7.pt')

    return custom_yolov7_model</code></pre>



<ul class="wp-block-list">
<li><code>get_prediction</code>: the model analyzes the image and returns the result of the prediction</li>
</ul>



<pre class="wp-block-code"><code class="">def get_prediction(img_bytes, model):

    img = Image.open(io.BytesIO(img_bytes))
    results = model(img, size=640)

    return results</code></pre>



<ul class="wp-block-list">
<li><code>analyse_image</code>: the image is processed before and after the model analysis</li>
</ul>



<pre class="wp-block-code"><code class="">def analyse_image(image, model):

    if image is not None:

        img = Image.open(image)

        bytes_data = image.getvalue()
        img_bytes = np.asarray(bytearray(bytes_data), dtype=np.uint8)
        result = get_prediction(img_bytes, model)
        result.render()

        for img in result.imgs:
            RGB_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            im_arr = cv2.imencode('.jpg', RGB_img)[1]
            st.image(im_arr.tobytes())

        result_list = list((result.pandas().xyxy[0])["name"])

    else:
        st.write("no asl letters were detected!")
        result_list = []

    return result_list</code></pre>



<ul class="wp-block-list">
<li><code>display_letters</code>: the letters are recovered and displayed to form the final word</li>
</ul>



<pre class="wp-block-code"><code class="">def display_letters(letters_list):

    word = ''.join(letters_list)
    path_file = "/workspace/word_file.txt"
    with open(path_file, "a") as f:
        f.write(word)

    return path_file</code></pre>
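


<p>A minimal way to wire these functions together in the Streamlit page could look like the sketch below (the real app in the linked repository is more complete; <code>st.camera_input</code> is the widget used to take a photo):</p>



<pre class="wp-block-code"><code class=""># Sketch of the page layout; the full app adds more UI and error handling
model = load_model()

st.title("ASL letters recognition")
image = st.camera_input("Take a picture of the sign")

letters = analyse_image(image, model)
if letters:
    st.write("Detected letters:", letters)</code></pre>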



<p><em>To access the full code of the app, refer to this <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/sign-language-recognition-yolov7-app" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub</a><a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/sign-language-recognition-yolov7-app/main.py" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"> repository</a>.</em></p>



<h4 class="wp-block-heading">Containerize your app with Docker</h4>



<p>Once the app code has been created, it&#8217;s time to containerize it!</p>



<p>Containerization is based on building a Docker image; several steps must be completed before this image is usable (example commands follow the list below).</p>



<p><strong>What are the containerization steps 🐳 ?</strong></p>



<p><em>The following steps refer to this <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-streamlit-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057491" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a> where you can find detailed information.</em></p>



<ul class="wp-block-list">
<li>Write the <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/sign-language-recognition-yolov7-app/requirements.txt" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">requirements.txt</a> file</li>



<li>Create the <a href="https://github.com/ovh/ai-training-examples/blob/main/apps/streamlit/sign-language-recognition-yolov7-app/Dockerfile" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Dockerfile</a></li>



<li><a href="https://help.ovhcloud.com/csm/fr-public-cloud-ai-deploy-streamlit-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057495#build-the-docker-image-from-the-dockerfile" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Build the Docker image</a></li>



<li><a href="https://help.ovhcloud.com/csm/fr-public-cloud-ai-deploy-streamlit-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057495#push-the-image-into-the-shared-registry" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Tag and push the Docker image on a registry</a></li>
</ul>
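


<p>Under the assumption that the linked <code>Dockerfile</code> sits in the current directory, the build, tag and push steps typically look like this (the image name matches the one used in the deployment command below):</p>



<pre class="wp-block-code"><code class=""># build the image from the Dockerfile in the current directory
docker build . -t yolov7-streamlit-asl-recognition:latest

# tag it for your registry, then push it
docker tag yolov7-streamlit-asl-recognition:latest &lt;shared-registry-address&gt;/yolov7-streamlit-asl-recognition:latest
docker push &lt;shared-registry-address&gt;/yolov7-streamlit-asl-recognition:latest</code></pre>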



<p>Has your Docker image been created successfully? You are ready to launch your app 🚀 !</p>



<h4 class="wp-block-heading">Deploy your app and make it accessible</h4>



<p>The following command starts a new AI Deploy app running your Streamlit web interface.</p>



<pre class="wp-block-code"><code class="">ovhai app run
       --gpu 1 \
       --default-http-port 8501 \
       --volume asl-volov7-model@GRA/:/workspace/asl-volov7-model:RO \
       &lt;shared-registry-address&gt;/yolov7-streamlit-asl-recognition:latest</code></pre>



<p>In this command line, you can set up several parameters:</p>



<ul class="wp-block-list">
<li><code>resources</code>: choose between CPUs or GPUs</li>



<li><code>default HTTP port</code>: specify the Streamlit default port &#8211; 8501</li>



<li><code>data</code>: link the bucket containing your model</li>



<li><code>docker image</code>: add your Docker image address</li>
</ul>



<p>When your app is up and running, you can access the following page:</p>



<figure class="wp-block-image aligncenter size-large"><img loading="lazy" decoding="async" width="648" height="1024" src="https://blog.ovhcloud.com/wp-content/uploads/2023/08/overview-streamlit-yolov7-asl-648x1024.png" alt="" class="wp-image-25720" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/08/overview-streamlit-yolov7-asl-648x1024.png 648w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/overview-streamlit-yolov7-asl-190x300.png 190w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/overview-streamlit-yolov7-asl-768x1214.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/overview-streamlit-yolov7-asl-972x1536.png 972w, https://blog.ovhcloud.com/wp-content/uploads/2023/08/overview-streamlit-yolov7-asl.png 988w" sizes="auto, (max-width: 648px) 100vw, 648px" /><figcaption class="wp-element-caption"><em>Resulting Streamlit app</em></figcaption></figure>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Well done 🎉&nbsp;! You have learned how to create <strong>your own solution for Sign Language recognition</strong> with OVHcloud AI tools.</p>



<p>You have been able to <strong>fine-tune the YOLOv7 model</strong> thanks to <em>AI Notebooks</em> and to <strong>deploy a real-time recognition app</strong> with <em>AI Deploy</em>.</p>



<h4 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h4>



<h6 class="wp-block-heading"><strong>Notebook</strong></h6>



<p>You want to access the notebook? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/object-detection/miniconda/yolov7/notebook_object_detection_yolov7_asl.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</p>



<p>To launch this notebook with&nbsp;<strong>AI Notebook</strong>, please refer to&nbsp;our&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-notebooks-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057517" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>



<h6 class="wp-block-heading"><strong>App</strong></h6>



<p>You want to access the full code to create the Streamlit app? Refer to the&nbsp;<a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/sign-language-recognition-yolov7-app" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.<br><br>To deploy this app with&nbsp;<strong>AI Deploy</strong>, please refer to&nbsp;our&nbsp;<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-deploy-streamlit-yolov7-sign-language?id=kb_article_view&amp;sysparm_article=KB0057491" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>



<h2 class="wp-block-heading">References</h2>



<ul class="wp-block-list">
<li><a href="https://public.roboflow.com/object-detection/american-sign-language-letters" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ASL Alphabet Dataset V1</a></li>



<li><a href="https://github.com/WongKinYiu/yolov7" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">YOLOv7 GitHub repository</a></li>



<li><a href="https://blog.ovhcloud.com/object-detection-train-yolov5-on-a-custom-dataset/" data-wpel-link="internal">Object detection: train YOLOv5 on a custom dataset</a></li>



<li><a href="https://medium.com/@prishanga1/yolov7-training-on-custom-data-c6d8ec030e13" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">YoloV7 Training on Custom Data</a></li>
</ul>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fcreate-your-solution-for-sign-language-recognition-with-ovhcloud-ai-tools%2F&amp;action_name=Create%20your%20solution%20for%20Sign%20Language%20recognition%20with%20OVHcloud%20AI%20tools&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Image segmentation: Train a U-Net model to segment brain tumors</title>
		<link>https://blog.ovhcloud.com/image-segmentation-train-a-u-net-model-to-segment-brain-tumors/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Wed, 19 Apr 2023 12:03:29 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[Streamlit]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=24637</guid>

					<description><![CDATA[brain tumor segmentation tutorial with BraTS2020 dataset and U-Net]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to discover image segmentation and train a convolutional neural network on medical images to segment brain tumors</em></p>



<p>All the code related to this article is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/image-segmentation/tensorflow/brain-tumor-segmentation-unet/notebook_image_segmentation_unet.ipynb" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>. You can reproduce all the experiments with <strong><a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">AI Notebooks</a></strong>.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/segmentations_compare.gif" alt="Graphical comparison of the original brain tumor segmentation, ground truth, and prediction for the BraTS2020 dataset" class="wp-image-25024" width="1188" height="397"/><figcaption class="wp-element-caption"><em>Comparison of the original and predicted segmentation, with non-enhancing tumors in blue, edema in green and enhancing tumors in yellow.</em></figcaption></figure>



<p>Over the past few years, the field of <strong>computer vision</strong> has experienced significant growth. It encompasses a wide range of methods for acquiring, processing, analyzing and understanding digital images.</p>



<p>Among these methods, one is called <strong>image segmentation</strong>.</p>



<h3 class="wp-block-heading"><strong>What is Image Segmentation?</strong> 🤔</h3>



<p>Image segmentation is a technique used to <strong>separate an image into multiple segments or regions</strong>, each of which corresponds to a different object or part of the image.</p>



<p>The goal is to simplify the image and make it easier to analyze, so that a computer can better understand and interpret the content of the image, which can be really useful!</p>



<p><strong>Application fields</strong></p>



<p>Indeed, image segmentation has many application fields, such as <strong>object detection &amp; recognition, medical imaging, and self-driving systems</strong>. In all these cases, the understanding of the image content by the computer is essential.</p>



<p><strong>Example</strong></p>



<p>In an image of a street with cars, the segmentation algorithm would be able to divide the image into different regions, with one for the cars, one for the road, another for the sky, one for the trees and so on.</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/02/Image_segmentation.png" alt="illustration of semantic image segmentation" class="wp-image-24755" width="470" height="354" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/02/Image_segmentation.png 512w, https://blog.ovhcloud.com/wp-content/uploads/2023/02/Image_segmentation-300x226.png 300w" sizes="auto, (max-width: 470px) 100vw, 470px" /></figure>



<p class="has-text-align-center"><em>Semantic image segmentation from <a href="https://commons.wikimedia.org/wiki/File:Image_segmentation.png" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Wikipedia Creative Commons</a></em></p>



<h5 class="wp-block-heading"><strong>Different types of segmentation</strong></h5>



<p>There are two main types of image segmentation: <strong>semantic segmentation</strong> and <strong>instance segmentation</strong>.</p>



<ul class="wp-block-list">
<li><strong>Semantic segmentation</strong> is the task of assigning a class label to each pixel in an image. For example, in an image of a city, the task of semantic segmentation would be to label each pixel as belonging to a certain class, such as &#8220;building&#8221;, &#8220;road&#8221;, &#8220;sky&#8221;, &#8230;, as shown in the image above.</li>
</ul>



<ul class="wp-block-list">
<li><strong>Instance segmentation</strong> not only assigns a class label to each pixel, but also differentiates instances of the same class within an image. In the previous example, the task would be to not only label each pixel as belonging to a certain class, such as &#8220;building&#8221;, &#8220;road&#8221;, &#8230;, but also to distinguish different instances of the same class, such as different buildings in the image. Each building will then be represented by a different color.</li>
</ul>



<h3 class="wp-block-heading"><strong>Use case &amp; Objective</strong></h3>



<p>Now that we know the concept of image segmentation, let&#8217;s try to put it into practice!</p>



<p>In this article, we will focus on <strong>medical imaging</strong>. Our goal will be to <strong>segment brain tumors</strong>. To do this, we will use the <strong><a href="https://www.kaggle.com/datasets/awsaf49/brats20-dataset-training-validation" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">BraTS2020 Dataset</a></strong>.</p>



<h3 class="wp-block-heading">1 &#8211; <strong>BraTS2020 dataset exploration</strong></h3>



<p>This dataset <strong>contains magnetic resonance imaging (MRI) scans of brain tumors</strong>.</p>



<p>To be more specific, each patient in this dataset is represented through <strong>four different MRI scans / modalities, named T1, T1CE, T2 and FLAIR</strong>. These 4 images come with the ground truth segmentation of the tumoral and non-tumoral regions of the brain, which has been manually produced by experts.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/02/braTS2020_dataset_overview-1024x212.png" alt="Display of 4 MRI images from the BraTS2020 dataset, and a tumor segmentation" class="wp-image-24644" width="1195" height="248" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/02/braTS2020_dataset_overview-1024x212.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/02/braTS2020_dataset_overview-300x62.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/02/braTS2020_dataset_overview-768x159.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/02/braTS2020_dataset_overview-1536x318.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/02/braTS2020_dataset_overview.png 1606w" sizes="auto, (max-width: 1195px) 100vw, 1195px" /><figcaption class="wp-element-caption"><em>Display of the 4 modalities of a patient and its segmentation</em></figcaption></figure>



<p><strong>Why 4 modalities ?</strong></p>



<p>As you can see, the four modalities bring out <strong>different aspects</strong> of the same patient. To be more specific, here is a description of what each one contributes:</p>



<ul class="wp-block-list">
<li><strong>T1:</strong> Shows the structure and composition of different types of tissue.</li>



<li><strong>T1CE:</strong> Similar to T1 images but with the injection of a contrast agent, which enhances the visibility of abnormalities.</li>



<li><strong>T2:</strong> Shows the fluid content of different types of tissue.</li>



<li><strong>FLAIR:</strong> Used to suppress this fluid content, to better identify lesions and tumors that are not clearly visible on T1 or T2 images.</li>
</ul>



<p>For an expert, it can be useful to have these 4 modalities in order to analyze the tumor more precisely, and to confirm its presence or not.</p>



<p>But for our artificial approach, <strong>using only two modalities instead of four is interesting</strong>, since it reduces the amount of manipulated data and therefore the computational and memory requirements of the segmentation task, making it faster and more efficient.</p>



<p>That is why we will <strong>exclude T1</strong>, since we have its enhanced version T1CE. We will also <strong>exclude the T2 modality</strong>: the fluids it brings out could degrade our predictions. These fluids are suppressed in the FLAIR version, which highlights the affected regions much better and will therefore be much more useful for our training.</p>



<p><strong>Images format</strong></p>



<p>It is important to understand that all these MRI scans are <strong><em>NIfTI</em> <em>files</em></strong> (<em>.nii format)</em>. A NIfTI image is a digital representation of a 3D object, such as a brain in our case. Indeed, our modalities and our annotations have a 3-dimensional (240, 240, 155) shape.</p>
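


<p>For instance, such a volume can be loaded with the <code>nibabel</code> library (the file name below follows the BraTS2020 naming convention and is given as an example):</p>



<pre class="wp-block-code"><code class="">import nibabel as nib

# Load one FLAIR volume as a NumPy array
flair = nib.load('BraTS20_Training_001/BraTS20_Training_001_flair.nii').get_fdata()
print(flair.shape)  # (240, 240, 155)</code></pre>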



<p>Each dimension is composed of a series of two-dimensional images, known as <strong>slices</strong>, which all contain the same number of pixels, and are stacked together to create a 3D representation. That is why we have been able to display 2D images just above. Indeed, we have displayed the <strong>100th slice</strong> of a dimension for the 4 modalities and the segmentation.</p>



<p>Here is a quick presentation of these 3 planes:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/body_planes-1024x493.png" alt="illustration of planes of the body" class="wp-image-24957" width="982" height="473" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/body_planes-1024x493.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/body_planes-300x144.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/body_planes-768x370.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/body_planes.png 1188w" sizes="auto, (max-width: 982px) 100vw, 982px" /></figure>



<p class="has-text-align-center"><em>Planes of the body</em> <em>from <a href="https://commons.wikimedia.org/wiki/File:Planes_of_Body.jpg" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Wikipedia Creative Commons</a></em></p>



<p>&#8211; <strong>Sagittal Plane</strong>: Divides the body into left and right sections and is often referred to as a &#8220;front-back&#8221; plane.</p>



<p>&#8211; <strong>Coronal Plane</strong>: Divides the body into front and back sections and is often referred to as a &#8220;side-side&#8221; plane.</p>



<p>&#8211; <strong>Axial or Transverse Plane</strong>: Divides the body into top and bottom sections and is often referred to as a &#8220;head-toe&#8221; plane.</p>



<p>Each modality can then be displayed through its different planes. For example, we will display the 3 axes of the T1 modality:</p>
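


<p>Here is a sketch of how such a display can be produced with <code>matplotlib</code>, assuming <code>t1</code> is a volume loaded with <code>nibabel</code> as above (the exact axis-to-plane mapping depends on the scan orientation):</p>



<pre class="wp-block-code"><code class="">import matplotlib.pyplot as plt

slice_id = 100
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, img, title in zip(
        axes,
        [t1[slice_id, :, :], t1[:, slice_id, :], t1[:, :, slice_id]],
        ["Sagittal", "Coronal", "Axial"]):
    ax.imshow(img.T, cmap='gray', origin='lower')
    ax.set_title(title)
plt.show()</code></pre>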



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_planes_slice_resized-1024x360.png" alt="MRI scan viewed in the 3 planes of the human body" class="wp-image-25017" width="1024" height="360" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_planes_slice_resized-1024x360.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_planes_slice_resized-300x106.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_planes_slice_resized-768x270.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_planes_slice_resized.png 1284w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>100th slice of the T1 modality of the first patient, in the 3 planes of the human body</em></figcaption></figure>



<p><strong>Why choose to display the 100th slice?</strong></p>



<p>Now that we know why we have three dimensions, let&#8217;s try to understand why we chose to display a specific slice.</p>



<p>To do this, we will display all the slices of a modality:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_slices_of_a_plane-1024x667.png" alt="all the slices of a BraTS2020 MRI modality" class="wp-image-24959" width="1024" height="667" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_slices_of_a_plane-1024x667.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_slices_of_a_plane-300x195.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_slices_of_a_plane-768x500.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/all_slices_of_a_plane.png 1227w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Display of all slices of T1 of the first patient in the sagittal plane</em></figcaption></figure>



<p>As you can see, <strong>two black parts are present</strong> on each side of our montage. <strong>These black parts correspond to slices</strong>, which means that a large part of the slices does not contain much information. This is not surprising, since the MRI scanner goes through the brain gradually.</p>



<p>This analysis is the same for all other modalities, all planes, and also for the images segmented by the experts. Indeed, there was nothing for them to segment on the slices that contain little information.</p>



<p>This is why we can exclude these slices from our analysis, in order to reduce the number of manipulated images and speed up our training. Indeed, you can see that the <strong>(60:135) slice range is much more interesting</strong>:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/selected_slices_of_a_plane-1024x667.png" alt="some slices of a BraTS2020 MRI modality" class="wp-image-24962" width="1024" height="667" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/selected_slices_of_a_plane-1024x667.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/selected_slices_of_a_plane-300x195.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/selected_slices_of_a_plane-768x500.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/selected_slices_of_a_plane.png 1227w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em>Display of slices 60 to 135 of T1 of the first patient in the sagittal plane</em></figcaption></figure>



<p><strong>What about segmentations?</strong></p>



<p>Now, let&#8217;s focus on the segmentations provided by the experts. What information do they give us?</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/random_segmentation-edited.png" alt="segmentation classes from BraTS2020 dataset" class="wp-image-25027" width="555" height="416" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/random_segmentation-edited.png 555w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/random_segmentation-edited-300x225.png 300w" sizes="auto, (max-width: 555px) 100vw, 555px" /><figcaption class="wp-element-caption"><em><em>100th slice of the segmentation modality of the first patient</em></em></figcaption></figure>



<p>Regardless of the plane you are viewing, you will notice that some slices have multiple colors, which means that the experts have assigned multiple values / classes to the segmentation (one color represents one value).</p>



<p>Actually, we only have 4 possible pixels values in this dataset. <strong>These 4 values will form our 4 classes</strong>. Here is what they correspond to:</p>



<figure class="wp-block-table aligncenter"><table><tbody><tr><td class="has-text-align-center" data-align="center"><strong>Class value</strong></td><td class="has-text-align-center" data-align="center"><strong>Class color</strong></td><td class="has-text-align-center" data-align="center"><strong>Class meaning</strong></td></tr><tr><td class="has-text-align-center" data-align="center">0</td><td class="has-text-align-center" data-align="center">Purple</td><td class="has-text-align-center" data-align="center">Not tumor (healthy zone or image background)</td></tr><tr><td class="has-text-align-center" data-align="center">1</td><td class="has-text-align-center" data-align="center">Blue</td><td class="has-text-align-center" data-align="center">Necrotic and non-enhancing tumor</td></tr><tr><td class="has-text-align-center" data-align="center">2</td><td class="has-text-align-center" data-align="center">Green</td><td class="has-text-align-center" data-align="center">Peritumoral Edema</td></tr><tr><td class="has-text-align-center" data-align="center">4</td><td class="has-text-align-center" data-align="center">Yellow</td><td class="has-text-align-center" data-align="center">Enhancing Tumor</td></tr></tbody></table></figure>



<p class="has-text-align-center"><em>Explanation of the BraTS2020 dataset classes</em></p>



<p>As you can see, class 3 does not exist: the labels jump directly from 2 to 4. We will therefore correct this &#8220;error&#8221; before sending the data to our model, as sketched below.</p>
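


<p>With NumPy, this relabelling is a one-liner (assuming <code>seg</code> is the segmentation volume loaded above):</p>



<pre class="wp-block-code"><code class="">import numpy as np

seg = seg.astype(np.uint8)
seg[seg == 4] = 3  # remap label 4 to 3 so that classes are contiguous (0, 1, 2, 3)</code></pre>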



<p>Our goal is to predict and segment each of these 4 classes for new patients to find out whether or not they have a brain tumor and which areas are affected.</p>



<p><strong>To summarize data exploration:</strong></p>



<ul class="wp-block-list">
<li>We have for each patient 4 different modalities (T1, T1CE, T2 &amp; FLAIR), accompanied by a segmentation that indicates tumor areas.</li>



<li>Modalities <strong>T1CE</strong> and <strong>FLAIR</strong> are the more interesting to keep, since these 2 provide complementary information about the anatomy and tissue contrast of the patient&#8217;s brain.</li>



<li>Each image is 3D, and can therefore be analyzed through 3 different planes that are composed of 2D slices.</li>



<li>Many slices contain little or no information. We will <strong>only</strong> <strong>keep the (60:135)</strong> <strong>slices</strong> range.</li>



<li>A segmentation image contains 1 to 4 classes.</li>



<li>Class number 4 must be reassigned to 3 since value 3 is missing.</li>
</ul>



<p>Now that we know more about our data, it is time to prepare the training of our model.</p>



<h3 class="wp-block-heading">2 &#8211; Training preparation</h3>



<p><strong>Split data into 3 sets</strong></p>



<p>In the world of AI, the quality of a model is determined by its <strong>ability to make accurate predictions on new, unseen data</strong>. To achieve this, it is important to divide our data into three sets: <strong>Training, Validation and Test</strong>.</p>



<p>Reminder of their usefulness:</p>



<ul class="wp-block-list">
<li><strong>Training set</strong> is used to train the model. During training, the model is exposed to the training data and adjusts its parameters to minimize the error between its predictions and the Ground truth (original segmentations).</li>



<li><strong>Validation set</strong> is used to fine-tune the hyperparameters of our model, which are set before training and determine the behavior of our model. The aim is to compare different hyperparameters and select the best configuration for our model.</li>



<li><strong>Test set</strong> is used to evaluate the performance of our model after it has been trained, to see how well it performs on data that was not used during the training of the model.</li>
</ul>



<p>The dataset contains 369 different patients. Here is the distribution chosen for the 3 data sets:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/BraTS_data_distribution.png" alt="Data distribution for BraTS2020 dataset" class="wp-image-25006" width="430" height="340" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/BraTS_data_distribution.png 398w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/BraTS_data_distribution-300x237.png 300w" sizes="auto, (max-width: 430px) 100vw, 430px" /></figure>



<p><strong>Data preprocessing</strong></p>



<p>In order to train a neural network to segment objects in images, it is necessary to feed it with both the raw image data (X) and the ground truth segmentations (y). By combining these two types of data, the neural network can learn to recognize tumor patterns and make accurate predictions about the contents of a patient&#8217;s scan.</p>



<p>Unfortunately, our modalities images (X) and our segmentations (y) <strong>cannot be sent directly to the AI model</strong>. Indeed, loading all these 3D images would overload the memory of our environment, and will lead to shape mismatch errors. We have to do some image <strong>preprocessing</strong> before, which will be done by using a<strong> Data Generator</strong>, where we will perform any operation that we think is necessary when loading the images.</p>



<p>As we have explained, we will, for each sample:</p>



<ul class="wp-block-list">
<li>Retrieve the paths of its 2 selected modalities (T1CE &amp; FLAIR) and of its ground truth (original segmentation)</li>



<li>Load modalities &amp; segmentation</li>



<li>Create an X array (image) that will contain all the selected slices (60-135) of these 2 modalities.</li>



<li>Generate a y array (image) that will contain all the selected slices (60-135) of the ground truth.</li>



<li>Assign to all the 4 in the y array the value 3 (in order to correct the class 3 missing case).</li>
</ul>



<p>In addition to these preprocessing steps, we will:</p>



<p><strong>Work in the axial plane</strong></p>



<p>The images are square (240&#215;240) in this plane. And since we will manipulate a range of slices, we will still be able to visualize the predictions in the 3 planes, so this choice doesn&#8217;t really have an impact.</p>



<p><strong>Apply a One-Hot Encoder to the y array</strong></p>



<p>Since our goal is to segment regions that are represented as different classes (0 to 3), we must use One-Hot Encoding to convert our categorical variables (classes) into a numerical representation that can be used by our neural network (since they are based on mathematical equations).</p>



<p>Indeed, from a mathematical point of view, sending the y array as it is would mean that some classes are superior to others, while there is no superiority link between them. For example, class 1 is inferior to class 4 since 1 &lt; 4. A One-Hot encoder will allow us to manipulate only 0 and 1.</p>
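


<p>With Keras, this encoding is a one-liner (assuming <code>y</code> holds the integer class labels 0 to 3):</p>



<pre class="wp-block-code"><code class="">from tensorflow.keras.utils import to_categorical

# Adds a trailing dimension of size 4: one binary channel per class
y_one_hot = to_categorical(y, num_classes=4)</code></pre>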



<p>Here is what it consists of, for one slice: </p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/one-hot-encoding-1024x576.png" alt="One-Hot encoding applied to the BraTS2020 dataset" class="wp-image-25058" width="1204" height="677" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/one-hot-encoding-1024x576.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/one-hot-encoding-300x169.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/one-hot-encoding-768x432.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/one-hot-encoding.png 1280w" sizes="auto, (max-width: 1204px) 100vw, 1204px" /><figcaption class="wp-element-caption"><em>One-Hot encoding applied to the BraTS2020 dataset</em></figcaption></figure>



<p><strong>Resize each slice of our images</strong> from (240&#215;240) to a (128, 128) shape.</p>



<p>Resizing is needed since we need image shapes that are a power of two (2<sup>n</sup>, where n is an integer). This is due to the fact that we will use pooling layers (MaxPooling2D) in our convolutional neural network (CNN), which reduce the spatial resolution by 2.</p>



<p>You may wonder why we didn&#8217;t resize the images in a (256, 256) shape, which also is a power of 2 and is closer to 240 than 128 is.</p>



<p>Indeed, resizing images to (256, 256) may preserve more information than resizing to (128, 128), which could lead to better performance. However, this larger size also means that the model will have more parameters, which will increase the training time and memory requirements. This is why we will choose the (128, 128) shape.</p>



<p><strong>To summarize the preprocessing steps:</strong> </p>



<ul class="wp-block-list">
<li>We use a data generator to be able to process and send our data to our neural network, since all our images cannot be stored in memory at once (a minimal sketch follows this list).</li>



<li>For each epoch (single pass of the entire training dataset through a neural network), the model will receive 250 samples (those contained in our training dataset).</li>



<li>For each sample, the model will have to analyze 150 slices (since there are two modalities, and 75 selected slices for both of them), received in a (128, 128) shape, as an X array of a (128, 128, 75, 2) shape. This array will be provided with the ground truth segmentation of the patient, which will be One-Hot encoded and will then have a (75, 128, 128, 4) shape.</li>
</ul>
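


<p>Putting these steps together, a data generator along these lines can feed the network. This is a minimal sketch with assumed paths and file names; for simplicity it uses a (slices, height, width, channels) layout, whereas the exact shapes of the original notebook are given above:</p>



<pre class="wp-block-code"><code class="">import cv2
import nibabel as nib
import numpy as np
from tensorflow import keras

class BratsGenerator(keras.utils.Sequence):
    """Minimal sketch: one patient per batch, T1CE + FLAIR slices 60-135."""

    def __init__(self, patient_ids, data_dir):
        self.patient_ids = patient_ids
        self.data_dir = data_dir

    def __len__(self):
        return len(self.patient_ids)

    def __getitem__(self, idx):
        pid = self.patient_ids[idx]
        load = lambda mod: nib.load(f'{self.data_dir}/{pid}/{pid}_{mod}.nii').get_fdata()
        t1ce, flair, seg = load('t1ce'), load('flair'), load('seg')

        X = np.zeros((75, 128, 128, 2), dtype=np.float32)
        y = np.zeros((75, 128, 128), dtype=np.uint8)
        for i, s in enumerate(range(60, 135)):  # keep only the informative slices
            X[i, :, :, 0] = cv2.resize(t1ce[:, :, s], (128, 128))
            X[i, :, :, 1] = cv2.resize(flair[:, :, s], (128, 128))
            y[i] = cv2.resize(seg[:, :, s], (128, 128), interpolation=cv2.INTER_NEAREST)

        y[y == 4] = 3  # fix the missing class 3
        return X / (X.max() + 1e-6), keras.utils.to_categorical(y, num_classes=4)</code></pre>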



<h3 class="wp-block-heading">3 &#8211; Define the model</h3>



<p>Now that our data is ready, we can define our segmentation model.</p>



<p><strong>U-Net</strong></p>



<p>We will use the <a href="https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">U-Net architecture</a>. This <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">convolutional neural network (CNN)</a> is designed for biomedical image segmentation, and is particularly well-suited for segmentation tasks where the regions of interest are small and have complex shapes (such as tumors in MRI scans).</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/u-net-architecture-1024x682.png" alt="U-Net architecture" class="wp-image-25056" width="793" height="528" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/u-net-architecture-1024x682.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/u-net-architecture-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/u-net-architecture-768x512.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/u-net-architecture-1536x1023.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/u-net-architecture.png 1555w" sizes="auto, (max-width: 793px) 100vw, 793px" /><figcaption class="wp-element-caption"><em>U-Net architecture</em></figcaption></figure>



<p><em>This neural network was first introduced in 2015 by Olaf Ronneberger, Philipp Fischer, Thomas Brox and reported in the paper <a href="https://arxiv.org/abs/1505.04597" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">U-Net: Convolutional Networks for Biomedical Image Segmentation</a>.</em></p>



<p><strong>Loss function</strong></p>



<p>When training a CNN, it&#8217;s important to choose a loss function that accurately reflects the performance of the network. Indeed, this function will allow to compare the predicted pixels to those of the ground truth for each patient. At each epoch, the goal is to update the weights of our model in a way that minimizes this loss function, and therefore improves the accuracy of its predictions.</p>



<p>A commonly used loss function for multi-class classification problems is <strong>categorical cross-entropy</strong>, which measures the difference between the predicted probability distribution of each pixel and the real value of the one-hot encoded ground truth. Note that segmentations models sometimes use the <strong>dice loss function</strong> as well.</p>



<p><strong>Output activation function</strong></p>



<p>To get this probability distribution over the different classes for each pixel, we apply a <strong>softmax</strong> activation function to the output layer of our neural network. </p>



<p>This means that during training, our CNN will adjust its weights to minimize our loss function, which compares predicted probabilities given by the softmax function with those of the ground truth segmentation.</p>
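


<p>To make this concrete, here is a deliberately shallow U-Net sketch in Keras, using the categorical cross-entropy loss and softmax output described above (the real model is deeper and wider; the (128, 128, 2) input shape follows the preprocessing section):</p>



<pre class="wp-block-code"><code class="">from tensorflow.keras import Model, layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

def build_unet(input_shape=(128, 128, 2), n_classes=4):
    inputs = layers.Input(input_shape)

    # Contracting path (encoder)
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)          # 128 to 64
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)          # 64 to 32

    b = conv_block(p2, 128)                 # bottleneck

    # Expanding path (decoder) with skip connections
    u2 = layers.concatenate([layers.UpSampling2D()(b), c2])
    c3 = conv_block(u2, 64)
    u1 = layers.concatenate([layers.UpSampling2D()(c3), c1])
    c4 = conv_block(u1, 32)

    # One softmax probability per class and per pixel
    outputs = layers.Conv2D(n_classes, 1, activation='softmax')(c4)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])</code></pre>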



<p><strong>Other metrics</strong></p>



<p>It is also important to monitor the model&#8217;s performance using evaluation metrics. </p>



<p>We will of course use <strong>accuracy</strong>, which is a very popular measure. However, this metric can be misleading when working with imbalanced datasets like BraTS2020, where the background class is over-represented. To address this issue, we will use other metrics such as the <strong>intersection over union (IoU), the Dice coefficient, precision, sensitivity, and specificity</strong> (a hand-written dice metric is sketched after the list below).</p>



<ul class="wp-block-list">
<li><strong>Accuracy</strong>: Measures the overall proportion of correctly classified pixels, including both positive and negative pixels.</li>



<li><strong>IoU: </strong>Measures the overlap between the predicted and ground truth segmentations.</li>



<li><strong>Precision</strong> (positive predictive value): Measures the proportion of predicted positive pixels that are actually positive.</li>



<li><strong>Sensitivity</strong> (true positive rate): Measures the proportion of positive ground truth pixels that were correctly predicted as positive.</li>



<li><strong>Specificity</strong> (true negative rate): Measures the proportion of negative ground truth pixels that were correctly predicted as negative.</li>
</ul>
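


<p>Accuracy, precision and recall have built-in Keras implementations, but the dice coefficient is usually written by hand. One common formulation (one of several variants, shown here as an illustration) is:</p>



<pre class="wp-block-code"><code class="">import tensorflow.keras.backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    # Overlap between prediction and ground truth, summed over all classes
    intersection = K.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)

# Passed to model.compile(..., metrics=[dice_coefficient]) alongside the others</code></pre>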



<h3 class="wp-block-heading">4 &#8211; <strong>Analysis of training metrics</strong></h3>



<p><em>Model has been trained on 35 epochs.</em></p>



<figure class="wp-block-image aligncenter size-large is-resized"><img decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/metrics_result-edited.png" alt="Training metrics of a segmentation model for the BraTS2020 dataset" class="wp-image-25047" width="1197" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/metrics_result-edited.png 1407w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/metrics_result-edited-300x150.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/metrics_result-edited-1024x512.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/metrics_result-edited-768x384.png 768w" sizes="(max-width: 1407px) 100vw, 1407px" /><figcaption class="wp-element-caption"><em>Graphical display of training metrics over epochs</em></figcaption></figure>



<p>On the accuracy graph, we can see that both training accuracy and validation accuracy are increasing over epochs and reaching a plateau. This indicates that the model is learning from the data (training set) and generalizing well to new one (validation set). It does not seem that we are facing overfitting since both metrics are improving.</p>



<p>Then, we can see that our model is clearly learning from the training data, since both losses decrease over time on the second graph. We also notice that the best version of our model is reached around epoch 26. This conclusion is reinforced by the third graph, where both dice coefficients are increasing over epochs.</p>



<h3 class="wp-block-heading">5 &#8211; <strong>Segmentation results</strong></h3>



<p>Once the training is completed, we can look at how the model behaves against the<strong> test set </strong>by calling the <em><strong>.evaluate() </strong></em>function:</p>



<figure class="wp-block-table aligncenter"><table><tbody><tr><td class="has-text-align-center" data-align="center">Metric</td><td class="has-text-align-center" data-align="center">Score</td></tr><tr><td class="has-text-align-center" data-align="center">Categorical cross-entropy loss</td><td class="has-text-align-center" data-align="center">0.0206</td></tr><tr><td class="has-text-align-center" data-align="center">Accuracy</td><td class="has-text-align-center" data-align="center">0.9935</td></tr><tr><td class="has-text-align-center" data-align="center">MeanIOU</td><td class="has-text-align-center" data-align="center">0.8176</td></tr><tr><td class="has-text-align-center" data-align="center">Dice coefficient</td><td class="has-text-align-center" data-align="center">0.6008</td></tr><tr><td class="has-text-align-center" data-align="center">Precision</td><td class="has-text-align-center" data-align="center">0.9938</td></tr><tr><td class="has-text-align-center" data-align="center">Sensitivity</td><td class="has-text-align-center" data-align="center">0.9922</td></tr><tr><td class="has-text-align-center" data-align="center">Specificity</td><td class="has-text-align-center" data-align="center">0.9979</td></tr></tbody></table></figure>



<p>We can conclude that the model <strong>performed very well on the test dataset</strong>, achieving a <strong>low test loss </strong>(0.0206), <strong>a correct dice coefficient</strong> (0.6008) for an image segmentation task, and <strong>good scores on other metrics</strong> which indicate that the model has good generalization performance on unseen data.</p>



<p>To understand a little better what is behind these scores, let&#8217;s try to plot some randomly selected patient predicted segmentations:</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_wo_processing-1024x640.png" alt="Predicted segmentation vs ground truth segmentation for the BraTS2020 dataset" class="wp-image-25052" width="902" height="564" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_wo_processing-1024x640.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_wo_processing-300x188.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_wo_processing-768x480.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_wo_processing.png 1280w" sizes="auto, (max-width: 902px) 100vw, 902px" /><figcaption class="wp-element-caption"><em>Graphical comparison of original and predicted segmentations for randomly selected patients</em></figcaption></figure>



<p>Predicted segmentations <strong>seem quite accurate</strong>, but we need to do some <strong>post-processing</strong> to convert the probabilities given by the softmax function into a single class per pixel, corresponding to the class that obtained the highest probability.</p>



<p>The <em><strong>argmax()</strong></em> function is chosen here. Applying this function will also allow us to <strong>remove some false positive cases</strong>, and to <strong>plot the same colors</strong> for the original segmentation and the prediction, which makes them easier to compare than just above.</p>
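


<p>In NumPy terms, this post-processing collapses the last (class) dimension of the predictions (a sketch, assuming <code>X_test</code> is a preprocessed batch):</p>



<pre class="wp-block-code"><code class="">import numpy as np

pred = model.predict(X_test)             # (..., 128, 128, 4) softmax probabilities
pred_classes = np.argmax(pred, axis=-1)  # one class per pixel, values in {0, 1, 2, 3}</code></pre>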



<p>For the same patients as before, we obtain: </p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_processed1-1024x640.png" alt="Post-processed predicted segmentation vs ground truth segmentation for the BraTS2020 dataset" class="wp-image-25054" width="908" height="568" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_processed1-1024x640.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_processed1-300x188.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_processed1-768x480.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/04/truth_vs_pred_processed1.png 1280w" sizes="auto, (max-width: 908px) 100vw, 908px" /><figcaption class="wp-element-caption"><em>Graphical comparison of original and post-processed predicted segmentations for randomly selected patients</em></figcaption></figure>



<h3 class="wp-block-heading">Conclusion</h3>



<p>I hope you have enjoyed this tutorial, and that you are now more comfortable with image segmentation!</p>



<p>Keep in mind that even if our results seem accurate, we have some false positives in our predictions. In a field like medical imaging, it is crucial to evaluate the balance between true positives and false positives and to assess the risks and benefits of an artificial approach.</p>



<p>As we have seen, post-processing techniques can be used to solve this problem. However, we must be careful with the results of these methods, since they can lead to a loss of information.</p>



<h3 class="wp-block-heading">Want to find out more?</h3>



<ul class="wp-block-list">
<li><strong>Notebook</strong></li>
</ul>



<p>All the code is available on our <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/computer-vision/image-segmentation/tensorflow/brain-tumor-segmentation-unet/notebook_image_segmentation_unet.ipynb" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>.</p>



<ul class="wp-block-list">
<li><strong>App</strong></li>
</ul>



<p>A Streamlit application was created around this use case to predict and observe the predictions generated by the model. Find the <a href="https://github.com/ovh/ai-training-examples/tree/main/apps/streamlit/image-segmentation-brain-tumors" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">segmentation app&#8217;s code here</a>.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fimage-segmentation-train-a-u-net-model-to-segment-brain-tumors%2F&amp;action_name=Image%20segmentation%3A%20Train%20a%20U-Net%20model%20to%20segment%20brain%20tumors&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
