<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Deep learning Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/deep-learning/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/deep-learning/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Wed, 29 May 2024 12:36:13 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Deep learning Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/deep-learning/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>How to serve LLMs with vLLM and OVHcloud AI Deploy</title>
		<link>https://blog.ovhcloud.com/how-to-serve-llms-with-vllm-and-ovhcloud-ai-deploy/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Wed, 29 May 2024 12:22:26 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Deploy]]></category>
		<category><![CDATA[AI Endpoints]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[LLaMA]]></category>
		<category><![CDATA[LLaMA 3]]></category>
		<category><![CDATA[LLM Serving]]></category>
		<category><![CDATA[Mistral]]></category>
		<category><![CDATA[Mixtral]]></category>
		<category><![CDATA[vLLM]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=26762</guid>

					<description><![CDATA[In this tutorial, we will learn how to serve Large Language Models (LLMs) using vLLM and the OVHcloud AI Products.<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fhow-to-serve-llms-with-vllm-and-ovhcloud-ai-deploy%2F&amp;action_name=How%20to%20serve%20LLMs%20with%20vLLM%20and%20OVHcloud%20AI%20Deploy&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>In this tutorial, we will walk you through the process of serving large language models (LLMs), providing step-by-step instructions</em>.</p>



<figure class="wp-block-image aligncenter size-large is-resized"><img fetchpriority="high" decoding="async" width="1024" height="345" src="https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-1024x345.png" alt="" class="wp-image-25615" style="width:750px;height:auto" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-1024x345.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-300x101.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-768x259.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-1536x518.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/LLaMA2_finetuning_OVHcloud_resized-2048x690.png 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>






<h3 class="wp-block-heading">Introduction</h3>



<p>In recent years, <strong>large language models</strong> (LLMs) have become increasingly <strong>popular</strong>, with <strong>open-source</strong> models like <em>Mistral</em> and <em>LLaMA</em> gaining widespread attention. In particular, the <em>LLaMA 3</em> model, released on <em>April 18, 2024</em>, is one of today&#8217;s most powerful open-source LLMs.</p>



<p>However, <strong>serving these LLMs can be challenging</strong>, particularly on hardware with limited resources. Indeed, even on expensive hardware, LLMs can be surprisingly slow, with high VRAM utilization and throughput limitations.</p>



<p>This is where<strong><em> </em></strong><em><a href="https://github.com/vllm-project/vllm" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>vLLM</strong></a></em> comes in. <em><strong>vLLM</strong></em> is an <strong>open-source project</strong> that enables <strong>fast and easy-to-use LLM inference and serving</strong>. Designed for optimal performance and resource utilization, <em>vLLM</em> supports a range of <a href="https://docs.vllm.ai/en/latest/models/supported_models.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLM architectures</a> and offers <a href="https://docs.vllm.ai/en/latest/models/engine_args.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">flexible customization options</a>. That&#8217;s why we are going to use it to efficiently deploy and scale our LLMs.</p>



<h3 class="wp-block-heading">Objective</h3>



<p>In this guide, you will discover how to deploy an LLM thanks to <a href="https://github.com/vllm-project/vllm" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>vLLM</em></a> and the <strong><em>AI Deploy</em></strong> <em>OVHcloud</em> solution. This will enable you to benefit from <em>vLLM</em>&#8216;s optimisations and <em>OVHcloud</em>&#8216;s GPU computing resources. Your LLM will then be exposed through a secure API.</p>



<p>🎁 And for those who do not want to bother with the deployment process, <strong>a surprise awaits you at the <a href="#AI-ENDPOINTS">end of the article</a></strong>. We are going to introduce you to our new solution for using LLMs, called <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><strong>AI Endpoints</strong></a>. This product makes it easy to integrate AI capabilities into your applications with a simple API call, without the need for deep AI expertise or infrastructure management. And while it&#8217;s in alpha, it&#8217;s <strong>free</strong>!</p>



<h3 class="wp-block-heading">Requirements</h3>



<p>To deploy your <em>vLLM</em> server, you need:</p>



<ul class="wp-block-list">
<li>An <em>OVHcloud</em> account to access the <a href="https://www.ovh.com/auth/?action=gotomanager&amp;from=https://www.ovh.co.uk/&amp;ovhSubsidiary=GB" data-wpel-link="exclude"><em>OVHcloud Control Panel</em></a></li>



<li>A <em>Public Cloud</em> project</li>



<li>A <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">user for the AI Products</a>, related to this <em>Public Cloud</em> project</li>



<li><a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">The <em>OVHcloud AI CLI</em></a> installed on your local computer (to interact with the AI products by running commands). </li>



<li><a href="https://www.docker.com/get-started" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker</a> installed on your local computer, <strong>or</strong> access to a Debian Docker Instance, which is available on the <a href="https://www.ovh.com/manager/public-cloud/" data-wpel-link="exclude"><em>Public Cloud</em></a></li>
</ul>



<p>Once these conditions have been met, you are ready to serve your LLMs.</p>



<h3 class="wp-block-heading">Building a Docker image</h3>



<p>Since the <a href="https://www.ovhcloud.com/en/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>OVHcloud AI Deploy</em></a> solution is based on <a href="https://www.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>Docker</em></a> images, we will be using a <em>Docker</em> image to deploy our <em>vLLM</em> inference server. </p>



<p>As a reminder, <em>Docker</em> is a platform that allows you to create, deploy, and run applications in containers. <em>Docker</em> containers are standalone and executable packages that include everything needed to run an application (code, libraries, system tools).</p>



<p>To create this <em>Docker</em> image, we will need to write the following <em><strong>Dockerfile</strong></em> into a new folder:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">mkdir my_vllm_image
cd my_vllm_image
nano Dockerfile</code></pre>



<pre class="wp-block-code"><code lang="bash" class="language-bash"># 🐳 Base image
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

# 👱 Set the working directory inside the container
WORKDIR /workspace

# 📚 Install missing system packages (git) so we can clone the vLLM project repository
RUN apt-get update &amp;&amp; apt-get install -y git
RUN git clone https://github.com/vllm-project/vllm/

# 📚 Install the Python dependencies
RUN pip3 install --upgrade pip
RUN pip3 install vllm 

# 🔑 Give correct access rights to the OVHcloud user
ENV HOME=/workspace
RUN chown -R 42420:42420 /workspace</code></pre>



<p>Let&#8217;s take a closer look at this <em>Dockerfile</em> to understand it:</p>



<ul class="wp-block-list">
<li><strong>FROM</strong>: Specify the base image for our <em>Docker</em> Image. We choose the <em>PyTorch</em> image since it comes with <em>CUDA</em>, <em>CuDNN</em> and <em>torch</em>, which is needed by <em>vLLM</em>. </li>



<li><strong>WORKDIR /workspace</strong>: We set the working directory for the <em>Docker</em> container to <em>/workspace</em>, which is the default folder when we use <em>AI Deploy</em>.</li>



<li><strong>RUN</strong>: Allows us to execute commands while building the image. Here, we install <em>git</em> and use it to clone the <a href="https://github.com/vllm-project/vllm/tree/main" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>vLLM</em> repository</a> into the <em>/workspace</em> directory, then upgrade <em>pip</em> to the latest version and install the <em>vLLM</em> library.</li>



<li><strong>ENV</strong> HOME=/workspace: This sets the <em>HOME</em> environment variable to <em>/workspace</em>. This is a requirement to use the <em>OVHcloud</em> AI Products.</li>



<li><strong>RUN chown -R 42420:42420 /workspace</strong>: This changes the owner of the <em>/workspace</em> directory to the user and group with IDs of <em>42420</em> (<em>OVHcloud</em> user). This is also a requirement to use the <em>OVHcloud</em> AI Products.</li>
</ul>



<p>This <em>Dockerfile</em> does not contain a <strong>CMD</strong> instruction and therefore does not launch our <em>vLLM</em> server. Do not worry about that: we will do it directly from <a href="https://www.ovhcloud.com/en/public-cloud/ai-deploy/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Deploy</a>&nbsp;to have more flexibility.</p>



<p>Once your Dockerfile is written, launch the following command to build your image:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">docker build . -t vllm_image:latest</code></pre>



<h3 class="wp-block-heading">Push the image into the shared registry</h3>



<p>Once you have built the Docker image, you will need to push it to a <strong>registry</strong> to make it accessible from <em>AI Deploy</em>. A <strong>registry</strong> is a service that allows you to store and distribute <em>Docker</em> images, making it easy to deploy them in different environments.</p>



<p>Several registries can be used (<em><a href="https://www.ovhcloud.com/en-gb/public-cloud/managed-private-registry/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Managed Private Registry</a>, <a href="https://hub.docker.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Docker Hub</a>, <a href="https://github.com/features/packages" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub packages</a>, &#8230;</em>). In this tutorial, we will use the <strong><em>OVHcloud</em> <em>shared registry</em></strong>. More information is available in the <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-manage-registries?id=kb_article_view&amp;sysparm_article=KB0057949" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Registries documentation</a>.</p>



<p>To find the address of your shared registry, use the following command (<a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-cli-install-client?id=kb_article_view&amp;sysparm_article=KB0047844" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>ovhai CLI</em></a> needs to be installed on your computer):</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">ovhai registry list</code></pre>



<p>Then, log in on your <em>shared registry</em> with your usual <a href="https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-users?id=kb_article_view&amp;sysparm_article=KB0048170" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Platform user</em></a> credentials:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">docker login -u &lt;user&gt; -p &lt;password&gt; &lt;shared-registry-address&gt;</code></pre>



<p>Once you are logged in to the registry, tag the built image and push it to your shared registry:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">docker tag vllm_image:latest &lt;shared-registry-address&gt;/vllm_image:latest
docker push &lt;shared-registry-address&gt;/vllm_image:latest</code></pre>



<h3 class="wp-block-heading">vLLM inference server deployment</h3>



<p>Once your image has been pushed, it can be used with <em>AI Deploy</em>, using either the <em>ovhai CLI</em> or the <em>OVHcloud Control Panel (UI)</em>.</p>



<h5 class="wp-block-heading">Creating an access token </h5>



<p>Tokens are used as unique authenticators to securely access the <em>AI Deploy</em> apps. By creating a token, you can ensure that only authorized requests are allowed to interact with the <em>vLLM</em> endpoint. You can create this token by using the <em>OVHcloud Control Panel (UI)</em> or by running the following command:</p>



<pre class="wp-block-code"><code lang="" class="">ovhai token create vllm --role operator --label-selector name=vllm</code></pre>



<p>This will give you a token that you will need to keep.</p>



<h5 class="wp-block-heading">Creating a Hugging Face token (optional)</h5>



<p>Note that some models, such as <a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLaMA 3</a>, require you to accept their license. In that case, you need to create a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face account</a>, accept the model&#8217;s license, and generate a <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">token</a> from your account settings; this token will allow you to access the model.</p>



<p>For example, when visiting the Hugging Face <a href="https://huggingface.co/google/gemma-2b" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gemma model page</a>, you&#8217;ll see this (if you are logged in):</p>



<figure class="wp-block-image size-full"><img decoding="async" width="716" height="312" src="https://blog.ovhcloud.com/wp-content/uploads/2024/05/Screenshot-2024-05-22-at-14.15.21.png" alt="accept_model_conditions_hugging_face" class="wp-image-26768" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/05/Screenshot-2024-05-22-at-14.15.21.png 716w, https://blog.ovhcloud.com/wp-content/uploads/2024/05/Screenshot-2024-05-22-at-14.15.21-300x131.png 300w" sizes="(max-width: 716px) 100vw, 716px" /></figure>



<p>If you want to use this model, you will have to acknowledge the license, and then make sure to create a token in the <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">tokens section</a>.</p>



<p>In the next step, we will set this token as an environment variable (named <code>HF_TOKEN</code>). Doing this will enable us to use any LLM whose conditions of use we have accepted.</p>
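<p>Before deploying, it can save a failed launch to check locally that the token is set and looks plausible. The helper below is a sketch of our own (<code>require_hf_token</code> is a hypothetical name, not part of any library); it only assumes that current Hugging Face user tokens start with the <code>hf_</code> prefix.</p>

```python
import os

# Hypothetical helper: fail fast when HF_TOKEN is missing or malformed,
# instead of discovering the problem later in the AI Deploy logs.
def require_hf_token(env=None):
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "")
    # Current Hugging Face user tokens start with the "hf_" prefix.
    if not token.startswith("hf_"):
        raise RuntimeError("HF_TOKEN is missing or does not look like a Hugging Face token")
    return token

# Validate a dummy token without touching the real environment.
print(require_hf_token({"HF_TOKEN": "hf_example"}))
```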



<h5 class="wp-block-heading">Run the AI Deploy application</h5>



<p>Run the following command to deploy your <em>vLLM</em> server by running your customized <em>Docker</em> image:</p>



<pre class="wp-block-code"><code lang="" class="">ovhai app run &lt;shared-registry-address&gt;/vllm_image:latest \
  --name vllm_app \
  --flavor h100-1-gpu \
  --gpu 1 \
  --env HF_TOKEN="&lt;YOUR_HUGGING_FACE_TOKEN&gt;" \
  --label name=vllm \
  --default-http-port 8080 \
  -- python -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8080 --model &lt;model&gt; --dtype half</code></pre>



<p><em>You just need to change the registry address to the one you used, and the model name to the LLM you want to serve. Also pay attention to the name of the image, its tag, and the label selector if you haven&#8217;t used the same ones as those given in this tutorial.</em></p>



<p><strong>Parameters explanation</strong></p>



<ul class="wp-block-list">
<li><code>&lt;shared-registry-address&gt;/vllm_image:latest</code> is the image on which the app is based.</li>



<li><code>--name vllm_app</code> is an optional argument that allows you to give your app a custom name, making it easier to manage all your apps.</li>



<li><code>--flavor h100-1-gpu</code> indicates that we want to run our app on H100 GPU(s). You can access the full list of available GPUs by running <code>ovhai capabilities flavor list</code>.</li>



<li><code>--gpu 1</code> indicates that we request 1 GPU for that app.</li>



<li><code>--env HF_TOKEN</code> is an optional argument that allows us to set our Hugging Face token as an environment variable. This gives us access to models for which we have accepted the conditions.</li>



<li><code>--label name=vllm</code> restricts access to our LLM: only requests carrying a token created for the label selector <code>name=vllm</code> will be accepted.</li>



<li><code>--default-http-port 8080</code> indicates that the port to reach on the app URL is <code>8080</code>.</li>



<li><code>python -m vllm.entrypoints.api_server --host 0.0.0.0 --port 8080 --model &lt;model&gt; --dtype half</code> (everything after the <code>--</code> separator) is the command that starts the vLLM API server inside the container. The specified &lt;model&gt; will be downloaded from Hugging Face. Here is a list of those that are <a href="https://docs.vllm.ai/en/latest/models/supported_models.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">supported by vLLM</a>. <a href="https://docs.vllm.ai/en/latest/models/engine_args.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Many arguments</a> can be used to optimize your inference.</li>
</ul>



<p>When this <code>ovhai app run</code> command is executed, several pieces of information will appear in your terminal. Get the ID of your application, and open the Info URL in a new tab. Wait a few minutes for your application to launch. When it is <strong>RUNNING</strong>, you can stream its logs by executing:</p>



<pre class="wp-block-code"><code class="">ovhai app logs -f &lt;APP_ID&gt;</code></pre>



<p>This will allow you to track the server launch, the model download and any errors you may encounter if you have used a model for which you have not accepted the user contract. </p>



<p>If all goes well, you should see the following output, which means that your server is up and running:</p>



<pre class="wp-block-code"><code class="">Started server process [11]
Waiting for application startup.
Application startup complete.
Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)</code></pre>



<h3 class="wp-block-heading">Interacting with your LLM</h3>



<p>Once the server is up and running, we can interact with our LLM by hitting the <code>/generate</code> endpoint.</p>



<p><strong>Using cURL</strong></p>



<p><em>Make sure you change the ID to that of your application so that you target the right endpoint. In order for the request to be accepted, also specify the token that you generated previously by executing</em> <code>ovhai token create</code>. Feel free to adapt the parameters of the request (<em>prompt</em>, <em>max_tokens</em>, <em>temperature</em>, &#8230;)</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash">curl --request POST \
  --url https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net/generate \
  --header 'Authorization: Bearer &lt;AI_TOKEN_generated_with_CLI&gt;' \
  --header 'Content-Type: application/json' \
  --data '{
        "prompt": "&lt;YOUR_PROMPT&gt;",
        "max_tokens": 50,
        "n": 1,
        "stream": false
}'</code></pre>



<p><strong>Using Python</strong></p>



<p><em>Here too, you need to add your personal token and the correct link for your application.</em></p>



<pre class="wp-block-code"><code lang="python" class="language-python">import requests
import json

# change for your host
APP_URL = "https://&lt;APP_ID&gt;.app.gra.ai.cloud.ovh.net"
TOKEN = "AI_TOKEN_generated_with_CLI"

url = f"{APP_URL}/generate"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {TOKEN}"
}
data = {
    "prompt": "What a LLM is in AI?",
    "max_tokens": 100,
    "temperature": 0
}

response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.json()["text"][0])</code></pre>
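<p>If you plan to query the model repeatedly, it can help to wrap the call in a small function. The sketch below is our own convenience wrapper (<code>build_payload</code>, <code>extract_text</code> and <code>query_llm</code> are hypothetical names, not part of vLLM); it only assumes the <code>{"text": [...]}</code> response shape used in the example above.</p>

```python
import json
import requests

def build_payload(prompt, max_tokens=100, temperature=0.0):
    # Mirror the request parameters used in the examples above.
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}

def extract_text(response_json):
    # The /generate endpoint answers with {"text": ["completion", ...]}.
    return response_json["text"][0]

def query_llm(app_url, token, prompt, **params):
    # Hypothetical wrapper around the POST request shown above.
    response = requests.post(
        f"{app_url}/generate",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        data=json.dumps(build_payload(prompt, **params)),
    )
    response.raise_for_status()
    return extract_text(response.json())
```

<p>For example, <code>query_llm(APP_URL, TOKEN, "What is an LLM?", max_tokens=50)</code> would return only the generated text.</p>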



<h3 class="wp-block-heading" id="AI-ENDPOINTS">OVHcloud AI Endpoints</h3>



<p>If you are not interested in building your own image and deploying your own LLM inference server, you can use OVHcloud&#8217;s new <em><strong><a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">AI Endpoints</a></strong> </em>product which will make your life definitely easier!</p>



<p><a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints</em></a> is a serverless solution that provides AI APIs, enabling you to easily use pre-trained and optimized AI models in your applications. </p>



<figure class="wp-block-video"><video height="1400" style="aspect-ratio: 2560 / 1400;" width="2560" controls src="https://blog.ovhcloud.com/wp-content/uploads/2024/05/demo-ai-endpoints.mp4"></video></figure>



<p class="has-text-align-center"><em>Overview of AI Endpoints</em></p>



<p>You can use LLMs as a Service, choosing the desired model (such as <em>LLaMA</em>, <em>Mistral</em>, or <em>Mixtral</em>) and making an API call to use it in your application. This allows you to interact with these models without even having to deploy them!</p>



<p>In addition to LLM capabilities, <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints</em></a> also offers a range of other AI models, including speech-to-text, translation, summarization, embeddings and computer vision. </p>



<p>Best of all, <a href="https://endpoints.ai.cloud.ovh.net/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><em>AI Endpoints</em></a> is currently in alpha phase and is <strong>free to use</strong>, making it an accessible and affordable solution for developers seeking to explore the possibilities of AI. Check <a href="https://blog.ovhcloud.com/enhance-your-applications-with-ai-endpoints/" data-wpel-link="internal">this article</a> and try it out today to discover the power of AI!</p>



<p>Join our <a href="https://discord.gg/ovhcloud" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Discord server</a> to interact with the community and send us your feedback (#<em>ai-endpoints</em> channel)!</p>
<img decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fhow-to-serve-llms-with-vllm-and-ovhcloud-ai-deploy%2F&amp;action_name=How%20to%20serve%20LLMs%20with%20vLLM%20and%20OVHcloud%20AI%20Deploy&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		<enclosure url="https://blog.ovhcloud.com/wp-content/uploads/2024/05/demo-ai-endpoints.mp4" length="14424826" type="video/mp4" />

			</item>
		<item>
		<title>Fine-Tuning LLaMA 2 Models using a single GPU, QLoRA and AI Notebooks</title>
		<link>https://blog.ovhcloud.com/fine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks/</link>
		
		<dc:creator><![CDATA[Mathieu Busquet]]></dc:creator>
		<pubDate>Fri, 21 Jul 2023 15:04:00 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Notebooks]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[Fine-tuning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[LLaMa 2]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PyTorch]]></category>
		<category><![CDATA[QLoRA]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=25613</guid>

					<description><![CDATA[In this tutorial, we will walk you through the process of fine-tuning LLaMA 2 models, providing step-by-step instructions. All the code related to this article is available in our dedicated GitHub repository. You can reproduce all the experiments with OVHcloud AI Notebooks. Introduction On July 18, 2023, Meta released LLaMA 2, the latest version of [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Ffine-tuning-llama-2-models-using-a-single-gpu-qlora-and-ai-notebooks%2F&amp;action_name=Fine-Tuning%20LLaMA%202%20Models%20using%20a%20single%20GPU%2C%20QLoRA%20and%20AI%20Notebooks&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><em>In this tutorial, we will walk you through the process of fine-tuning <a href="https://ai.meta.com/llama/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLaMA 2</a> models, providing step-by-step instructions.</em> </p>



<figure class="wp-block-image aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/07/IMG_1564-1024x538.jpg" alt="Fine-Tuning LLaMA 2 Models with a single GPU and OVHcloud" class="wp-image-25629" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/07/IMG_1564-1024x538.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/IMG_1564-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/IMG_1564-768x404.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/IMG_1564.jpg 1199w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure>



<p class="has-text-align-center"><em>All the code related to this article is available in our dedicated <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/natural-language-processing/llm/miniconda/llama2-fine-tuning/llama_2_finetuning.ipynb" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub repository</a>. You can reproduce all the experiments with</em> <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Notebooks</a>.</p>



<h3 class="wp-block-heading">Introduction</h3>



<p>On July 18, 2023, <a href="https://about.meta.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Meta</a> released <a href="https://ai.meta.com/llama/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LLaMA 2</a>, the latest version of their <strong>Large Language Model </strong>(LLM).</p>



<p>Trained between January 2023 and July 2023 on 2 trillion tokens, these new models outperform other LLMs on many benchmarks, including reasoning, coding, proficiency, and knowledge tests. This release comes in different flavors, with parameter sizes of <strong>7B</strong>, <strong>13B</strong>, and a mind-blowing <strong>70B</strong>. The models are free for both commercial and research use in English.</p>



<p>To adapt these models to any text generation need and fine-tune them, we will use <a href="https://arxiv.org/abs/2305.14314" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">QLoRA (Efficient Finetuning of Quantized LLMs)</a>, a highly efficient fine-tuning technique that quantizes a pretrained LLM to just 4 bits and adds small &#8220;Low-Rank Adapters&#8221;. This unique approach allows for fine-tuning LLMs <strong>using just a single GPU</strong>! This technique is supported by the <a href="https://huggingface.co/docs/peft/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PEFT library</a>.</p>



<p>To fine-tune our model, we will create an <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Notebook</a> with only 1 GPU.</p>



<h3 class="wp-block-heading">Mandatory requirements</h3>



<p>To successfully fine-tune LLaMA 2 models, you will need the following:</p>



<ul class="wp-block-list">
<li>Fill Meta&#8217;s form to <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">request access to the next version of Llama</a>. Indeed, the use of Llama 2 is governed by the Meta license, which you must accept in order to download the model weights and tokenizer.<br></li>



<li>Have a <a href="https://huggingface.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face</a> account (with the same email address you entered in Meta&#8217;s form).<br></li>



<li>Have a <a href="https://huggingface.co/settings/tokens" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Hugging Face token</a>.<br></li>



<li>Visit the page of one of the LLaMA 2 available models (version <a href="https://huggingface.co/meta-llama/Llama-2-7b-hf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">7B</a>, <a href="https://huggingface.co/meta-llama/Llama-2-13b-hf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">13B</a> or <a href="https://huggingface.co/meta-llama/Llama-2-70b-hf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">70B</a>), and accept Hugging Face&#8217;s license terms and acceptable use policy.<br></li>



<li>Log in to the Hugging Face model Hub from your notebook&#8217;s terminal by running the <code>huggingface-cli login</code> command, and enter your token. You will not need to add your token as a git credential.<br></li>



<li>Powerful Computing Resources: Fine-tuning the Llama 2 model requires substantial computational power. Ensure you are running code on GPU(s) when using <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-notebooks/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">AI Notebooks</a> or <a href="https://www.ovhcloud.com/en/public-cloud/ai-training/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">AI Training</a>.</li>
</ul>



<h3 class="wp-block-heading">Set up your Python environment</h3>



<p>Create the following <code>requirements.txt</code> file:</p>



<pre class="wp-block-code"><code lang="" class="">torch
accelerate @ git+https://github.com/huggingface/accelerate.git
bitsandbytes
datasets==2.13.1
transformers @ git+https://github.com/huggingface/transformers.git
peft @ git+https://github.com/huggingface/peft.git
trl @ git+https://github.com/lvwerra/trl.git
scipy</code></pre>



<p>Then install and import the installed libraries:</p>



<pre class="wp-block-code"><code class="">pip install -r requirements.txt</code></pre>



<pre class="wp-block-code"><code lang="python" class="language-python">import argparse
import bitsandbytes as bnb
from datasets import load_dataset
from functools import partial
import os
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, AutoPeftModelForCausalLM
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed, Trainer, TrainingArguments, BitsAndBytesConfig, \
    DataCollatorForLanguageModeling, Trainer, TrainingArguments
from datasets import load_dataset</code></pre>



<h3 class="wp-block-heading">Download LLaMA 2 model</h3>



<p>As mentioned before, LLaMA 2 models come in three sizes: 7B, 13B, and 70B parameters. Your choice can be influenced by your computational resources: larger models require more memory, processing power, and training time.</p>



<p>To download the model you have been granted access to, <strong>make sure you are logged in to the Hugging Face model hub</strong>. As mentioned in the requirements step, you need to use the <code>huggingface-cli login</code> command.</p>



<p>The following function will help us to download the model and its tokenizer. It requires a bitsandbytes configuration that we will define later.</p>



<pre class="wp-block-code"><code lang="python" class="language-python">def load_model(model_name, bnb_config):
    n_gpus = torch.cuda.device_count()
    max_memory = f'{40960}MB'

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto", # efficiently dispatch the model across the available resources
        max_memory = {i: max_memory for i in range(n_gpus)},
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=True)

    # Needed for LLaMA tokenizer
    tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer</code></pre>



<h3 class="wp-block-heading">Download a Dataset</h3>



<p>There are many datasets that can help you fine-tune your model. You can even use your own dataset!</p>



<p>In this tutorial, we are going to download and use the <a href="https://huggingface.co/datasets/databricks/databricks-dolly-15k" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Databricks Dolly 15k dataset</a>, which contains <strong>15,000 prompt/response pairs</strong>. It was crafted by over 5,000 <a href="https://www.databricks.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Databricks</a> employees during March and April of 2023.</p>



<p>This dataset is designed specifically for fine-tuning large language models. Released under the <a href="https://creativecommons.org/licenses/by-sa/3.0/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">CC BY-SA 3.0 license</a>, it can be used, modified, and extended by any individual or company, even for commercial applications. So it&#8217;s a perfect fit for our use case!</p>



<p>However, like most datasets, this one has <strong>its limitations</strong>. Pay attention to the following points:</p>



<ul class="wp-block-list">
<li>It consists of content collected from the public internet, which means it may contain objectionable, incorrect or biased content and typos, which could influence the behavior of models fine-tuned on this dataset.<br></li>



<li>Since the dataset was created by Databricks&#8217; own employees, it reflects the interests and semantic choices of those employees, which may not be representative of the global population at large.<br></li>



<li>We only have access to the <code>train</code> split of the dataset, which is its largest subset.</li>
</ul>



<pre class="wp-block-code"><code lang="python" class="language-python"># Load the databricks dataset from Hugging Face
from datasets import load_dataset

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")</code></pre>



<h3 class="wp-block-heading">Explore dataset</h3>



<p>Once the dataset is downloaded, we can take a look at it to understand what it contains:</p>



<pre class="wp-block-code"><code lang="python" class="language-python">print(f'Number of prompts: {len(dataset)}')
print(f'Column names are: {dataset.column_names}')

*** OUTPUT ***
Number of prompts: 15011
Column names are: ['instruction', 'context', 'response', 'category']</code></pre>



<p>As we can see, each sample is a dictionary that contains:</p>



<ul class="wp-block-list">
<li><strong>An instruction</strong>: what the user could enter, such as a question</li>



<li><strong>A context</strong>: additional information that helps to interpret the sample</li>



<li><strong>A response</strong>: the answer to the instruction</li>



<li><strong>A category</strong>: classifies the sample as Open Q&amp;A, Closed Q&amp;A, Extract information from Wikipedia, Summarize information from Wikipedia, Brainstorming, Classification, or Creative writing</li>
</ul>
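<p>To make this structure concrete, here is what one record looks like. The field values below are shortened and purely illustrative, not an exact record from the dataset; in practice you would simply inspect <code>dataset[0]</code>:</p>

```python
# Illustrative record mimicking the structure of databricks-dolly-15k samples
# (values are made up for this example; inspect dataset[0] for real data)
sample = {
    "instruction": "When did Virgin Australia start operating?",
    "context": "Virgin Australia commenced services on 31 August 2000.",
    "response": "Virgin Australia started operating on 31 August 2000.",
    "category": "closed_qa",
}

# Every sample exposes the same four fields
print(sorted(sample.keys()))
# → ['category', 'context', 'instruction', 'response']
```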



<h3 class="wp-block-heading">Pre-processing dataset</h3>



<p><strong>Instruction fine-tuning</strong> is a common technique used to fine-tune a base LLM for a specific downstream use-case.</p>



<p>It will help us to format our prompts as follows: </p>



<pre class="wp-block-code"><code class="">Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Sea or Mountain

### Response:
I believe Mountain are more attractive but Ocean has it's own beauty and this tropical weather definitely turn you on! SO 50% 50%

### End</code></pre>



<p>To delimit each part of the prompt with these markers, we can use the following function:</p>



<pre class="wp-block-code"><code lang="python" class="language-python">def create_prompt_formats(sample):
    """
    Format various fields of the sample ('instruction', 'context', 'response')
    Then concatenate them using two newline characters 
    :param sample: Sample dictionary
    """

    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruction:"
    INPUT_KEY = "Input:"
    RESPONSE_KEY = "### Response:"
    END_KEY = "### End"
    
    blurb = f"{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}\n{sample['instruction']}"
    input_context = f"{INPUT_KEY}\n{sample['context']}" if sample["context"] else None
    response = f"{RESPONSE_KEY}\n{sample['response']}"
    end = f"{END_KEY}"
    
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    
    sample["text"] = formatted_prompt

    return sample</code></pre>



<p>Now, we will use our <strong>model tokenizer to process these prompts into tokenized ones</strong>. </p>



<p>The goal is to create input sequences of uniform length, which are suitable for fine-tuning the language model because they maximize efficiency and minimize computational overhead. These sequences must not exceed the model&#8217;s maximum token limit.</p>



<pre class="wp-block-code"><code lang="python" class="language-python"># SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(model.config, length_setting, None)
        if max_length:
            print(f"Found max length: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )


# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed, dataset):
    """Format &amp; tokenize the dataset so it is ready for training
    :param tokenizer (AutoTokenizer): Model tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    :param seed: Seed used to shuffle the dataset
    :param dataset: Dataset to preprocess
    """
    
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
    
    # Apply preprocessing to each batch of the dataset &amp; and remove 'instruction', 'context', 'response', 'category' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=["instruction", "context", "response", "text", "category"],
    )

    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) &lt; max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset</code></pre>



<p>With these functions, our dataset will be ready for fine-tuning!</p>



<h3 class="wp-block-heading">Create a bitsandbytes configuration</h3>



<p>This will allow us to load our LLM in 4 bits. This way, we can divide the memory used by 4 and load the model on smaller devices. We choose the bfloat16 compute data type and nested quantization for memory-saving purposes.</p>



<pre class="wp-block-code"><code lang="python" class="language-python">def create_bnb_config():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    return bnb_config</code></pre>
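<p>To get a feel for the savings, here is a back-of-the-envelope estimate of the weight memory for a 7B-parameter model. The figures are illustrative and cover the weights only, ignoring activations, optimizer state and quantization constants:</p>

```python
n_params = 7_000_000_000  # order of magnitude of LLaMA 2 7B

# 16-bit weights take 2 bytes per parameter, 4-bit NF4 weights take 0.5 byte
fp16_gb = n_params * 2 / 1024**3
nf4_gb = n_params * 0.5 / 1024**3

print(f"fp16: ~{fp16_gb:.1f} GB, 4-bit: ~{nf4_gb:.1f} GB")
```

<p>This rough ratio of 4 is why the quantized model fits on a single GPU.</p>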



<p>To leverage the LoRA method, we need to wrap the model as a PeftModel.</p>



<p>To do this, we need to implement a <a href="https://huggingface.co/docs/peft/conceptual_guides/lora" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">LoRA configuration</a>:</p>



<pre class="wp-block-code"><code lang="python" class="language-python">def create_peft_config(modules):
    """
    Create Parameter-Efficient Fine-Tuning config for your model
    :param modules: Names of the modules to apply Lora to
    """
    config = LoraConfig(
        r=16,  # dimension of the updated matrices
        lora_alpha=64,  # parameter for scaling
        target_modules=modules,
        lora_dropout=0.1,  # dropout probability for layers
        bias="none",
        task_type="CAUSAL_LM",
    )

    return config</code></pre>



<p>The previous function needs the <strong>target modules</strong> on which to apply the LoRA updates. The following function will get them for our model:</p>



<pre class="wp-block-code"><code lang="python" class="language-python"># SOURCE https://github.com/artidoro/qlora/blob/main/qlora.py

def find_all_linear_names(model):
    cls = bnb.nn.Linear4bit #if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if 'lm_head' in lora_module_names:  # needed for 16-bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)</code></pre>



<p>Once everything is set up and the base model is prepared, we can use the <em>print_trainable_parameters()</em> helper function to see how many trainable parameters are in the model. </p>



<pre class="wp-block-code"><code lang="python" class="language-python">def print_trainable_parameters(model, use_4bit=False):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        num_params = param.numel()
        # if using DS Zero 3 and the weights are initialized empty
        if num_params == 0 and hasattr(param, "ds_numel"):
            num_params = param.ds_numel

        all_param += num_params
        if param.requires_grad:
            trainable_params += num_params
    if use_4bit:
        trainable_params /= 2
    print(
        f"all params: {all_param:,d} || trainable params: {trainable_params:,d} || trainable%: {100 * trainable_params / all_param}"
    )</code></pre>



<p>We expect the LoRA model to have far fewer trainable parameters than the original one, since only the low-rank adapters are updated during fine-tuning.</p>
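<p>As a rough illustration (with a hypothetical layer size, not figures measured on LLaMA 2), a rank-16 adapter on a single 4096&#215;4096 projection only adds r &#215; (d_in + d_out) trainable parameters on top of the frozen matrix:</p>

```python
d_in, d_out = 4096, 4096  # hypothetical projection dimensions
r = 16                    # LoRA rank, as set in create_peft_config

frozen = d_in * d_out              # original weight matrix, kept frozen
trainable = r * d_in + r * d_out   # LoRA factors A (r x d_in) and B (d_out x r)

print(f"trainable: {trainable:,} / frozen: {frozen:,} "
      f"({100 * trainable / frozen:.2f}%)")
# → trainable: 131,072 / frozen: 16,777,216 (0.78%)
```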



<h3 class="wp-block-heading">Train</h3>



<p>Now that everything is ready, we can pre-process our dataset and load our model using the set configurations: </p>



<pre class="wp-block-code"><code lang="python" class="language-python"># Load model from HF with user's token and with bitsandbytes config

model_name = "meta-llama/Llama-2-7b-hf" 

bnb_config = create_bnb_config()

model, tokenizer = load_model(model_name, bnb_config)</code></pre>



<pre class="wp-block-code"><code lang="python" class="language-python">## Preprocess dataset

max_length = get_max_length(model)

seed = 42  # fixed seed so the dataset shuffle is reproducible
dataset = preprocess_dataset(tokenizer, max_length, seed, dataset)</code></pre>



<p>Then, we can run our fine-tuning process: </p>



<pre class="wp-block-code"><code lang="python" class="language-python">def train(model, tokenizer, dataset, output_dir):
    # Apply preprocessing to the model to prepare it by
    # 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
    model.gradient_checkpointing_enable()

    # 2 - Using the prepare_model_for_kbit_training method from PEFT
    model = prepare_model_for_kbit_training(model)

    # Get lora module names
    modules = find_all_linear_names(model)

    # Create PEFT config for these modules and wrap the model to PEFT
    peft_config = create_peft_config(modules)
    model = get_peft_model(model, peft_config)
    
    # Print information about the percentage of trainable parameters
    print_trainable_parameters(model)
    
    # Training parameters
    trainer = Trainer(
        model=model,
        train_dataset=dataset,
        args=TrainingArguments(
            per_device_train_batch_size=1,
            gradient_accumulation_steps=4,
            warmup_steps=2,
            max_steps=20,
            learning_rate=2e-4,
            fp16=True,
            logging_steps=1,
            output_dir="outputs",
            optim="paged_adamw_8bit",
        ),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
    )
    
    model.config.use_cache = False  # re-enable for inference to speed up predictions for similar inputs
    
    ### SOURCE https://github.com/artidoro/qlora/blob/main/qlora.py
    # Verifying the datatypes before training
    
    dtypes = {}
    for _, p in model.named_parameters():
        dtype = p.dtype
        if dtype not in dtypes: dtypes[dtype] = 0
        dtypes[dtype] += p.numel()
    total = 0
    for k, v in dtypes.items(): total+= v
    for k, v in dtypes.items():
        print(k, v, v/total)
     
    do_train = True
    
    # Launch training
    print("Training...")
    
    if do_train:
        train_result = trainer.train()
        metrics = train_result.metrics
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)
        trainer.save_state()
        print(metrics)    
    
    ###
    
    # Saving model
    print("Saving last checkpoint of the model...")
    os.makedirs(output_dir, exist_ok=True)
    trainer.model.save_pretrained(output_dir)
    
    # Free memory for merging weights
    del model
    del trainer
    torch.cuda.empty_cache()
    
    
output_dir = "results/llama2/final_checkpoint"
train(model, tokenizer, dataset, output_dir)</code></pre>



<p><em>If you prefer to specify a number of epochs (passes of the entire training dataset through the model) instead of a number of training steps (forward and backward passes through the model with one batch of data), you can replace the <code>max_steps</code> argument with <code>num_train_epochs</code>.</em></p>
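<p>For reference, the conversion between the two is straightforward. With the 15,011 prompts of this dataset, a per-device batch size of 1 and gradient accumulation over 4 batches, one epoch corresponds to roughly 3,750 optimizer steps (a sketch assuming a single GPU and no filtered-out samples):</p>

```python
import math

n_samples = 15_011   # databricks-dolly-15k, before length filtering
batch_size = 1       # per_device_train_batch_size
grad_accum = 4       # gradient_accumulation_steps

steps_per_epoch = math.ceil(n_samples / (batch_size * grad_accum))
print(steps_per_epoch)  # → 3753
```

<p>So a short run of 20 steps only sees a tiny fraction of the dataset; increase <code>max_steps</code> (or switch to <code>num_train_epochs</code>) for a more thorough fine-tuning.</p>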



<p>To later load and use the model for inference, we have used the <code>trainer.model.save_pretrained(output_dir)</code> function, which saves the fine-tuned adapter weights and their configuration.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2023/07/finetuning-llama2-results-1024x498.png" alt="" class="wp-image-25619" width="870" height="422" srcset="https://blog.ovhcloud.com/wp-content/uploads/2023/07/finetuning-llama2-results-1024x498.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/finetuning-llama2-results-300x146.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/finetuning-llama2-results-768x374.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2023/07/finetuning-llama2-results.png 1320w" sizes="auto, (max-width: 870px) 100vw, 870px" /></figure>



<p class="has-text-align-center">Fine-tuning llama2 results on <a href="https://huggingface.co/datasets/databricks/databricks-dolly-15k" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">databricks-dolly-15k</a> dataset</p>



<p>Unfortunately, the latest weights are not necessarily the best ones. To solve this problem, you can add an <code>EarlyStoppingCallback</code>, from transformers, during your fine-tuning. This will enable you to regularly evaluate your model on a validation set, if you have one, and keep only the best weights.</p>



<h3 class="wp-block-heading">Merge weights</h3>



<p>Once we have our fine-tuned weights, we can build our fine-tuned model and save it to a new directory, with its associated tokenizer. By performing these steps, we can have a memory-efficient fine-tuned model and tokenizer ready for inference!</p>



<pre class="wp-block-code"><code lang="python" class="language-python">model = AutoPeftModelForCausalLM.from_pretrained(output_dir, device_map="auto", torch_dtype=torch.bfloat16)
model = model.merge_and_unload()

output_merged_dir = "results/llama2/final_merged_checkpoint"
os.makedirs(output_merged_dir, exist_ok=True)
model.save_pretrained(output_merged_dir, safe_serialization=True)

# save tokenizer for easy inference
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained(output_merged_dir)</code></pre>



<h3 class="wp-block-heading">Conclusion</h3>



<p>We hope you have enjoyed this article!</p>



<p>You are now able to fine-tune LLaMA 2 models on your own datasets!</p>



<p>In our next tutorial, you will discover how to <strong>Deploy your Fine-tuned LLM on <a href="https://www.ovhcloud.com/en/public-cloud/ai-deploy/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">OVHcloud AI Deploy</a> for inference</strong>!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Notebooks: analyze and classify sounds with AI</title>
		<link>https://blog.ovhcloud.com/ai-notebooks-analyze-and-classify-sounds-with-ai/</link>
		
		<dc:creator><![CDATA[Eléa Petton]]></dc:creator>
		<pubDate>Fri, 04 Mar 2022 08:57:00 +0000</pubDate>
				<category><![CDATA[OVHcloud Engineering]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[AI Notebook]]></category>
		<category><![CDATA[AI Solutions]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=21594</guid>

					<description><![CDATA[A guide to analyze and classify marine mammal sounds. Since you&#8217;re reading a blog post from a technology company, I bet you&#8217;ve heard about AI, Machine and Deep Learning many times before. Audio or sound classification is a technique with multiple applications in the field of AI and data science. Use cases Acoustic data classification: [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><em>A guide to analyze and classify <strong>marine mammal sounds</strong>.</em></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-1024x537.jpeg" alt="AI Notebooks: analyze and classify sounds with AI" class="wp-image-22610" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-1024x537.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0834.jpeg 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Since you&#8217;re reading a blog post from a technology company, I bet you&#8217;ve heard about AI, Machine and Deep Learning many times before.</p>



<p>Audio or sound classification is a technique with multiple applications in the field of AI and data science.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-1024x467.png" alt="AI Notebooks: analyze and classify sounds with AI" class="wp-image-22611" width="768" height="350" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-1024x467.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-300x137.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835-768x350.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0835.png 1322w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading" id="Use-cases:">Use cases</h3>



<ul class="wp-block-list"><li><strong>Acoustic data classification:</strong></li></ul>



<p>&#8211; identifies location<br>&#8211; differentiates environments<br>&#8211; has a role in ecosystem monitoring</p>



<ul class="wp-block-list"><li><strong>Environmental sound classification:</strong></li></ul>



<p>&#8211; recognition of urban sounds<br>&#8211; used in security system<br>&#8211; used in predictive maintenance<br>&#8211; used to differentiate animal sounds</p>



<ul class="wp-block-list"><li><strong>Music classification:</strong></li></ul>



<p>&#8211; classify music<br>&#8211; <em>key role in:</em> audio libraries organisation by genre, improvement of recommandation algorithms, discovery of trends, listener preferences through data analysis, &#8230;</p>



<ul class="wp-block-list"><li><strong>Natural language classification:</strong></li></ul>



<p>&#8211; human speech classification<br>&#8211; <em>common in:</em> chatbots, virtual assistants, tech-to-speech application, &#8230;</p>



<p>In this article we will look at the <strong>classification of marine mammal sounds</strong>.</p>



<h3 class="wp-block-heading" id="objective">Objective</h3>



<p>The purpose of this article is to explain how to train a model to classify audio files using <em>AI Notebooks</em>.<br><br>In this tutorial, the sounds in the dataset are in <em>.wav</em> format. To be able to use them and obtain results, it is necessary to pre-process this data by following different steps.</p>



<ul class="wp-block-list" id="block-c53a8333-8cfa-4558-81f3-827e57035439"><li>Analyse one of these audio recordings</li><li>Transform each sound file into a <em>.csv</em> file</li><li>Train your model from the <em>.csv</em> file</li></ul>



<p><strong>USE CASE:</strong> <a href="https://www.kaggle.com/shreyj1729/best-of-watkins-marine-mammal-sound-database/version/3" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Best of Watkins Marine Mammal Sound Database</a></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-1024x617.png" alt="" class="wp-image-21968" width="781" height="471" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-1024x617.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-300x181.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36-768x463.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/Capture-décran-2022-02-01-à-14.40.36.png 1031w" sizes="auto, (max-width: 781px) 100vw, 781px" /></figure></div>



<p>This dataset is composed of <strong>55 different folders </strong>corresponding to the marine mammals. In each folder are stored several sound files of each animal.<br><br>You can get more information about this dataset on this <a href="https://cis.whoi.edu/science/B/whalesounds/index.cfm" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">website</a>.<br><br>The data distribution is as follows:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-1024x681.png" alt="The data distribution " class="wp-image-22615" width="512" height="341" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-1024x681.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837-768x511.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0837.png 1114w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p class="has-ast-global-color-5-color has-ast-global-color-0-background-color has-text-color has-background">⚠️ <em>For this example, we choose only the </em><strong>first 45 classes</strong><em> (or folders).</em></p>



<p>Let&#8217;s follow the different steps!</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="188" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-1024x188.png" alt="Data analysis and classification" class="wp-image-22613" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-1024x188.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-300x55.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-768x141.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-1536x282.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0836-2048x376.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading" id="audio-libraries">Audio libraries</h3>



<h4 class="wp-block-heading" id="1-loading-an-audio-file-with-librosa">1. Loading an audio file with Librosa</h4>



<p><em>Librosa</em> is a Python module for audio signal analysis. By using <em>Librosa</em>, you can extract key features from the audio samples such as Tempo, Chroma Energy Normalized, Mel-Frequency Cepstral Coefficients, Spectral Centroid, Spectral Contrast, Spectral Rolloff, and Zero Crossing Rate. If you want to know more about this library, refer to the <a href="https://librosa.org/doc/latest/index.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>



<pre class="wp-block-code"><code class="">import librosa
import librosa.display as lplt
import matplotlib.pyplot as plt</code></pre>



<p>You can start by looking at your data by displaying different parameters using the <em>Librosa</em> library.<br><br>First, you can do a test on a file.</p>



<pre class="wp-block-code"><code class="">test_sound = "data/AtlanticSpottedDolphin/61025001.wav"</code></pre>



<p>Load and decode the audio:</p>



<pre class="wp-block-code"><code class="">data, sr = librosa.load(test_sound)
print(type(data), type(sr))</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>&lt;class 'numpy.ndarray'&gt; &lt;class 'int'&gt;</code></p>



<pre class="wp-block-code"><code class="">librosa.load(test_sound ,sr = 45600)</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>(array([-0.0739522 , -0.06588229, -0.06673266, ..., 0.03021295, 0.05592792, 0. ], dtype=float32), 45600)</code></p>
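<p>The second value returned by <em>librosa.load</em> is the sampling rate, in samples per second. From it, you can recover the duration of the clip; a quick sketch with made-up numbers:</p>

```python
n_samples = 91_200  # hypothetical length of the decoded array, i.e. len(data)
sr = 45_600         # sampling rate requested from librosa.load above

duration_s = n_samples / sr  # duration in seconds
print(duration_s)  # → 2.0
```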



<h4 class="wp-block-heading" id="2-playing-audio-with-ipython-display-audio">2. Playing Audio with IPython.display.Audio</h4>



<p><a href="https://ipython.org/ipython-doc/stable/api/generated/IPython.display.html#IPython.display.Audio" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">IPython.display.Audio</a> lets you play audio directly in a Jupyter notebook.<br><br>Use <em>IPython.display.Audio</em> to play the audio:</p>



<pre class="wp-block-code"><code class="">import IPython

IPython.display.Audio(data, rate = sr)</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0838.png" alt=" Playing the audio" class="wp-image-22618" width="518" height="130" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0838.png 690w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0838-300x75.png 300w" sizes="auto, (max-width: 518px) 100vw, 518px" /></figure></div>



<h3 class="wp-block-heading" id="visualizing-audio">Visualizing Audio</h3>



<h4 class="wp-block-heading" id="1-waveforms">1. Waveforms</h4>



<p><strong>Waveforms</strong> are visual representations of sound, with time on the x-axis and amplitude on the y-axis. They allow for a quick first analysis of audio data.<br><br>You can plot the audio array using <em>librosa.display.waveplot</em>.</p>



<pre class="wp-block-code"><code class="">librosa.display.waveplot(data)
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/waveforms.png" alt="" class="wp-image-21601" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/waveforms.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/waveforms-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="2-spectrograms">2. Spectrograms</h4>



<p>A <strong>spectrogram</strong> is a visual way of representing the intensity of a signal over time at various frequencies present in a particular waveform.</p>



<pre class="wp-block-code"><code class="">stft = librosa.stft(data)
plt.colorbar(librosa.display.specshow(abs(stft), sr = sr, x_axis = 'time', y_axis = 'hz'))</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms1.png" alt="" class="wp-image-21602" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms1.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms1-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<pre class="wp-block-code"><code class="">stft_db = librosa.amplitude_to_db(abs(stft))
plt.colorbar(librosa.display.specshow(stft_db, sr = sr, x_axis = 'time', y_axis = 'hz'))</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms2.png" alt="" class="wp-image-21603" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms2.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectrograms2-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>
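<p>Under the hood, <em>librosa.stft</em> slides a window over the signal and takes the Fourier transform of each frame. A minimal numpy sketch of the idea (using librosa&#8217;s default <code>n_fft = 2048</code> and <code>hop_length = 512</code>):</p>



<pre class="wp-block-code"><code class="">import numpy as np

def stft(signal, n_fft = 2048, hop = 512):
    # slide a Hann window over the signal and FFT each frame
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies: n_fft // 2 + 1 bins
    return np.fft.rfft(frames, axis = 1).T

sr = 22050
t = np.arange(sr) / sr                 # one second of audio
tone = np.sin(2 * np.pi * 440 * t)     # a 440 Hz tone
S = stft(tone)
print(S.shape)  # (1025, 40): frequency bins x frames</code></pre>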



<h4 class="wp-block-heading" id="3-spectral-rolloff">3. Spectral Rolloff</h4>



<p><strong>Spectral Rolloff</strong> is the frequency below which a specified percentage of the total spectral energy lies.<br><br><em>librosa.feature.spectral_rolloff</em> computes the roll-off frequency for each frame of a signal.</p>



<pre class="wp-block-code"><code class="">spectral_rolloff = librosa.feature.spectral_rolloff(y = data + 0.01, sr = sr)[0]
librosa.display.waveplot(data, sr = sr, alpha = 0.4)
plt.show()</code></pre>
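<p>The roll-off frequency itself is easy to compute from a magnitude spectrum: accumulate the energy over the frequency bins and find the bin where the chosen percentage (85% by default in <em>librosa</em>) is reached. A numpy sketch, run here on a hypothetical flat spectrum:</p>



<pre class="wp-block-code"><code class="">import numpy as np

def spectral_rolloff(magnitudes, freqs, roll_percent = 0.85):
    # frequency below which roll_percent of the spectral energy lies
    cumulative = np.cumsum(magnitudes)
    threshold = roll_percent * cumulative[-1]
    return freqs[np.searchsorted(cumulative, threshold)]

freqs = np.linspace(0, 11025, 1025)   # bins up to the Nyquist frequency
magnitudes = np.ones(1025)            # hypothetical flat spectrum
print(spectral_rolloff(magnitudes, freqs))  # ~9378, i.e. 85% of Nyquist</code></pre>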



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectral_rolloff.png" alt="" class="wp-image-21604" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectral_rolloff.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/spectral_rolloff-300x199.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="4-chroma-feature">4. Chroma Feature</h4>



<p>A <strong>chroma feature</strong> projects the spectrum onto the twelve pitch classes. This tool is well suited to analyzing musical signals whose pitches can be meaningfully categorized and whose tuning is close to the equal-tempered scale.</p>



<pre class="wp-block-code"><code class="">chroma = librosa.feature.chroma_stft(y = data, sr = sr)
lplt.specshow(chroma, sr = sr, x_axis = "time", y_axis = "chroma", cmap = "coolwarm")
plt.colorbar()
plt.title("Chroma Features")
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="280" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/chroma_feature.png" alt="" class="wp-image-21605" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/chroma_feature.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/chroma_feature-300x207.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h4 class="wp-block-heading" id="5-zero-crossing-rate">5. Zero Crossing Rate</h4>



<p>A <strong>zero crossing</strong> occurs if successive samples have different algebraic signs.</p>



<ul class="wp-block-list"><li>The rate at which zero crossings occur is a simple measure of the frequency content of a signal.</li><li>The number of zero-crossings measures the number of times in a time interval that the amplitude of speech signals passes through a zero value.</li></ul>
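<p>The zero-crossing rate can be computed directly with numpy by comparing the signs of consecutive samples; this is a simplified sketch of what <em>librosa.feature.zero_crossing_rate</em> measures per frame:</p>



<pre class="wp-block-code"><code class="">import numpy as np

def zero_crossing_rate(x):
    # fraction of consecutive sample pairs whose signs differ
    signs = np.signbit(x)
    return np.mean(signs[1:] != signs[:-1])

# a signal that changes sign at every sample crosses zero at every step
alternating = np.array([1.0, -1.0] * 100)
print(zero_crossing_rate(alternating))  # 1.0</code></pre>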



<pre class="wp-block-code"><code class="">start = 1000
end = 1200
plt.plot(data[start:end])
plt.grid()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="406" height="255" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/zero_crossing_rate.png" alt="" class="wp-image-21606" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/zero_crossing_rate.png 406w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/zero_crossing_rate-300x188.png 300w" sizes="auto, (max-width: 406px) 100vw, 406px" /></figure></div>



<h2 class="wp-block-heading" id="data-preprocessing"><strong>Data preprocessing</strong></h2>



<h4 class="wp-block-heading" id="1-data-transformation">1. Data transformation</h4>



<p>To train your model, the data must first be preprocessed: convert the <em>.wav</em> files into a single <em>.csv</em> file of features.</p>



<ul class="wp-block-list"><li>Define the column names:</li></ul>



<pre class="wp-block-code"><code class="">header = "filename length chroma_stft_mean chroma_stft_var rms_mean rms_var spectral_centroid_mean spectral_centroid_var spectral_bandwidth_mean spectral_bandwidth_var rolloff_mean rolloff_var zero_crossing_rate_mean zero_crossing_rate_var harmony_mean harmony_var perceptr_mean perceptr_var tempo mfcc1_mean mfcc1_var mfcc2_mean mfcc2_var mfcc3_mean mfcc3_var mfcc4_mean mfcc4_var label".split()</code></pre>



<ul class="wp-block-list"><li>Create the <em>data.csv</em> file:</li></ul>



<pre class="wp-block-code"><code class="">import csv

file = open('data.csv', 'w', newline = '')
with file:
    writer = csv.writer(file)
    writer.writerow(header)</code></pre>



<ul class="wp-block-list"><li>Define the list of the 45 marine mammal species:</li></ul>



<p>There are 45 different marine mammals, i.e. 45 classes.</p>



<pre class="wp-block-code"><code class="">marine_mammals = "AtlanticSpottedDolphin BeardedSeal Beluga_WhiteWhale BlueWhale BottlenoseDolphin Boutu_AmazonRiverDolphin BowheadWhale ClymeneDolphin Commerson'sDolphin CommonDolphin Dall'sPorpoise DuskyDolphin FalseKillerWhale Fin_FinbackWhale FinlessPorpoise Fraser'sDolphin Grampus_Risso'sDolphin GraySeal GrayWhale HarborPorpoise HarbourSeal HarpSeal Heaviside'sDolphin HoodedSeal HumpbackWhale IrawaddyDolphin JuanFernandezFurSeal KillerWhale LeopardSeal Long_FinnedPilotWhale LongBeaked(Pacific)CommonDolphin MelonHeadedWhale MinkeWhale Narwhal NewZealandFurSeal NorthernRightWhale PantropicalSpottedDolphin RibbonSeal RingedSeal RossSeal Rough_ToothedDolphin SeaOtter Short_Finned(Pacific)PilotWhale SouthernRightWhale SpermWhale".split()</code></pre>



<ul class="wp-block-list"><li>Transform each <em>.wav</em> file into a <em>.csv</em> row:</li></ul>



<pre class="wp-block-code"><code class="">for animal in marine_mammals:

  for filename in os.listdir(f"/workspace/data/{animal}/"):

    sound_name = f"/workspace/data/{animal}/{filename}"
    y, sr = librosa.load(sound_name, mono = True, duration = 30)
    chroma_stft = librosa.feature.chroma_stft(y = y, sr = sr)
    rmse = librosa.feature.rms(y = y)
    spec_cent = librosa.feature.spectral_centroid(y = y, sr = sr)
    spec_bw = librosa.feature.spectral_bandwidth(y = y, sr = sr)
    rolloff = librosa.feature.spectral_rolloff(y = y, sr = sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    mfcc = librosa.feature.mfcc(y = y, sr = sr)
    to_append = f'{filename} {np.mean(chroma_stft)} {np.mean(rmse)} {np.mean(spec_cent)} {np.mean(spec_bw)} {np.mean(rolloff)} {np.mean(zcr)}'
    
    for e in mfcc:
        to_append += f' {np.mean(e)}'

    to_append += f' {animal}'
    file = open('data.csv', 'a', newline = '')
    
    with file:
        writer = csv.writer(file)
        writer.writerow(to_append.split())</code></pre>



<ul class="wp-block-list"><li>Display the <em>data.csv</em> file:</li></ul>



<pre class="wp-block-code"><code class="">df = pd.read_csv('data.csv')
df.head()</code></pre>



<h4 class="wp-block-heading" id="2-features-extraction">2. Features extraction</h4>



<p>In the preprocessing of the data, <em>feature extraction</em> is necessary before running the training. The purpose is to define the <strong>inputs</strong> and <strong>outputs </strong>of the neural network.</p>



<ul class="wp-block-list"><li><strong>OUTPUT</strong> (y): last column which is the <strong><em>label</em></strong>.</li></ul>



<p>You cannot use text directly for training: before running a model, this kind of categorical text data must be converted into numerical data that the model can understand.<br><br>You will encode these labels with the <strong>LabelEncoder()</strong> function of <em>sklearn.preprocessing</em>.</p>



<pre class="wp-block-code"><code class="">from sklearn.preprocessing import LabelEncoder

class_list = df.iloc[:,-1]
converter = LabelEncoder()
y = converter.fit_transform(class_list)
print("y: ", y)</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>y : [ 0 0 0 ... 44 44 44]</code></p>



<ul class="wp-block-list"><li><strong>INPUTS</strong> (X): all other columns are input parameters to the neural network.</li></ul>



<p>Remove the first column, which does not provide any useful information for training (the filename), and the last one, which corresponds to the output.</p>



<pre class="wp-block-code"><code class="">from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(np.array(df.iloc[:, 1:26], dtype = float))</code></pre>
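<p><em>StandardScaler</em> standardises each feature column to zero mean and unit variance. On a hypothetical toy matrix, it is equivalent to:</p>



<pre class="wp-block-code"><code class="">import numpy as np

X_raw = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

# subtract the per-column mean and divide by the per-column std
X_scaled = (X_raw - X_raw.mean(axis = 0)) / X_raw.std(axis = 0)

print(X_scaled.mean(axis = 0))  # ~[0. 0.]
print(X_scaled.std(axis = 0))   # ~[1. 1.]</code></pre>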



<h4 class="wp-block-heading" id="3-split-dataset-for-training">3. Split dataset for training</h4>



<pre class="wp-block-code"><code class="">from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)</code></pre>



<h2 class="wp-block-heading" id="building-the-model"><strong>Building the model</strong></h2>



<p id="block-08360fbe-4253-417f-ab02-9a4ee8b0d753">The first step is to build the model and display its summary.<br><br>In this neural network, all hidden layers use a <strong>ReLU</strong> activation function, the output layer a <strong>Softmax</strong> function, and <strong>Dropout</strong> is used to avoid overfitting.</p>



<pre class="wp-block-code"><code class="">import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation = 'relu', input_shape = (X_train.shape[1],)),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(256, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(128, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(64, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),

    tf.keras.layers.Dense(45, activation = 'softmax'),
])

model.summary()</code></pre>
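<p>The <strong>ReLU</strong> and <strong>Softmax</strong> activations mentioned above are simple functions; a numpy sketch:</p>



<pre class="wp-block-code"><code class="">import numpy as np

def relu(x):
    # ReLU zeroes out negative activations
    return np.maximum(0.0, x)

def softmax(x):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(relu(np.array([-3.0, 0.5])))          # [0.  0.5]
probs = softmax(np.array([2.0, 1.0, -1.0]))
print(probs.sum())                          # ~1.0: a distribution over classes</code></pre>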



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="560" height="427" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/model_summary.png" alt="" class="wp-image-21598" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/model_summary.png 560w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/model_summary-300x229.png 300w" sizes="auto, (max-width: 560px) 100vw, 560px" /></figure></div>



<h2 class="wp-block-heading" id="model-training-and-evaluation"><strong>Model training and evaluation</strong></h2>



<p>The <strong>Adam</strong> optimizer is used to train the model over <em>100 epochs</em>, as it gave the best results here.<br><br>The loss is calculated with the <strong>sparse_categorical_crossentropy</strong> function.</p>



<pre class="wp-block-code"><code class="">def trainModel(model, epochs, optimizer):
    batch_size = 128
    model.compile(optimizer = optimizer, loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
    return model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs = epochs, batch_size = batch_size)</code></pre>
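<p>The sparse categorical cross-entropy is the mean negative log-probability that the model assigns to the true class (labels are integer class ids, not one-hot vectors). A numpy sketch with hypothetical predictions:</p>



<pre class="wp-block-code"><code class="">import numpy as np

def sparse_categorical_crossentropy(y_true, y_pred):
    # mean negative log-probability of the true class
    return -np.mean(np.log(y_pred[np.arange(len(y_true)), y_true]))

y_true = np.array([0, 2])                  # integer class ids
y_pred = np.array([[0.7, 0.2, 0.1],        # softmax outputs
                   [0.1, 0.1, 0.8]])
print(sparse_categorical_crossentropy(y_true, y_pred))  # ~0.29</code></pre>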



<p>Now, launch the training!</p>



<pre class="wp-block-code"><code class="">model_history = trainModel(model = model, epochs = 100, optimizer = 'adam')</code></pre>



<ul class="wp-block-list"><li>Display <strong>loss</strong> curves</li></ul>



<pre class="wp-block-code"><code class="">loss_train_curve = model_history.history["loss"]
loss_val_curve = model_history.history["val_loss"]
plt.plot(loss_train_curve, label = "Train")
plt.plot(loss_val_curve, label = "Validation")
plt.legend(loc = 'upper right')
plt.title("Loss")
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="390" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/02/loss.png" alt="" class="wp-image-22523" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/02/loss.png 390w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/loss-300x207.png 300w" sizes="auto, (max-width: 390px) 100vw, 390px" /></figure></div>



<ul class="wp-block-list"><li>Display <strong>accuracy</strong> curves</li></ul>



<pre class="wp-block-code"><code class="">acc_train_curve = model_history.history["accuracy"]
acc_val_curve = model_history.history["val_accuracy"]
plt.plot(acc_train_curve, label = "Train")
plt.plot(acc_val_curve, label = "Validation")
plt.legend(loc = 'lower right')
plt.title("Accuracy")
plt.show()</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="390" height="269" src="https://blog.ovhcloud.com/wp-content/uploads/2022/02/accuracy.png" alt="" class="wp-image-22524" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/02/accuracy.png 390w, https://blog.ovhcloud.com/wp-content/uploads/2022/02/accuracy-300x207.png 300w" sizes="auto, (max-width: 390px) 100vw, 390px" /></figure></div>



<pre class="wp-block-code"><code class="">test_loss, test_acc = model.evaluate(X_test, y_test, batch_size = 128)
print("The test loss is: ", test_loss)
print("The best accuracy is: ", test_acc * 100)</code></pre>



<p class="has-ast-global-color-3-color has-text-color"><code>20/20 [==============================] - 0s 3ms/step - loss: 0.2854 - accuracy: 0.9371 </code><br><code>The test loss is: </code>0.24700121581554413<br><code>The best accuracy is: </code>93.71269345283508</p>



<h2 class="wp-block-heading"><strong>Save the model for future inference</strong></h2>



<h4 class="wp-block-heading">1. Save and store the model in an OVHcloud Object Container</h4>



<pre class="wp-block-code"><code class="">model.save('/workspace/model-marine-mammal-sounds/saved_model/my_model')</code></pre>



<p>You can check your model directory.</p>



<pre class="wp-block-code"><code class="">%ls /workspace/model-marine-mammal-sounds/saved_model</code></pre>



<p>The <strong><em>saved_model</em></strong> directory contains an <em>assets</em> folder, a <em>saved_model.pb</em> file and a <em>variables</em> folder.</p>



<pre class="wp-block-code"><code class="">%ls /workspace/model-marine-mammal-sounds/saved_model/my_model</code></pre>



<p>Then, you are able to load this model.</p>



<pre class="wp-block-code"><code class="">model = tf.keras.models.load_model('/workspace/model-marine-mammal-sounds/saved_model/my_model')</code></pre>



<p><strong>Do you want to use this model in a Streamlit app?</strong> Refer to our <a href="https://github.com/ovh/ai-training-examples/tree/main/jobs/streamlit/marine_sounds_classification_app" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="509" src="https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-1024x509.png" alt="" class="wp-image-22553" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-1024x509.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-300x149.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-768x382.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00-1536x763.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2022/03/Capture-décran-2022-02-28-à-21.01.00.png 1906w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption>Streamlit app overview</figcaption></figure></div>



<h3 class="wp-block-heading" id="conclusion">Conclusion</h3>



<p>The accuracy of the model can be improved by increasing the number of epochs, but beyond a certain point the accuracy plateaus, so the number of epochs should be chosen accordingly.<br><br>The accuracy obtained on the test set is <strong>93.71 %</strong>, which is a satisfactory result.</p>



<h4 class="wp-block-heading" id="want-to-find-out-more">Want to find out more?</h4>



<p>If you want to access the notebook, refer to the <a href="https://github.com/ovh/ai-training-examples/blob/main/notebooks/tensorflow/tuto/notebook-marine-sound-classification.ipynb" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub repository</a>.<br><br>To launch and test this notebook with <strong>AI Notebooks</strong>, please refer to our <a href="https://docs.ovh.com/gb/en/publiccloud/ai/notebooks/" data-wpel-link="exclude">documentation</a>.</p>



<p>You can also look at this presentation done at a <a href="https://startup.ovhcloud.com/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud Startup Program</a> event at <a href="https://stationf.co/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Station F</a>:</p>


<div class="lazyblock-you-tube-gdpr-compliant-Z2oJhGR wp-block-lazyblock-you-tube-gdpr-compliant"><script type="module">
  import 'https://blog.ovhcloud.com/wp-content/assets/ovhcloud-gdrp-compliant-embedding-widgets/src/ovhcloud-gdrp-compliant-youtube.js';
</script>
      
      <ovhcloud-gdrp-compliant-youtube
          video="EN7XKmPpi78"
          debug></ovhcloud-gdrp-compliant-youtube>

</div>


<p class="has-text-align-center"><em><strong>I hope you have enjoyed this article. Try for yourself!</strong></em></p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-1024x386.png" alt="AI Notebooks: analyze and classify sounds with AI" class="wp-image-22620" width="768" height="290" srcset="https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-1024x386.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-300x113.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839-768x290.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2022/01/IMG_0839.png 1219w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading" id="references">References</h3>



<p><a href="https://blog.clairvoyantsoft.com/music-genre-classification-using-cnn-ef9461553726" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://blog.clairvoyantsoft.com/music-genre-classification-using-cnn-ef9461553726</a></p>



<p><a href="https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://towardsdatascience.com/music-genre-classification-with-python-c714d032f0d8</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Managing GPU pools efficiently in AI pipelines</title>
		<link>https://blog.ovhcloud.com/managing-gpu-pools-efficiently-in-ai-pipelines/</link>
		
		<dc:creator><![CDATA[Bastien Verdebout]]></dc:creator>
		<pubDate>Tue, 22 Dec 2020 16:18:36 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=20146</guid>

					<description><![CDATA[A growing number of companies are using artificial intelligence on a daily basis — and dealing with the back-end architecture can reveal some unexpected challenges. Whether the machine learning workload involves fraud detection, forecasts, chatbots, computer vision or NLP, it will need frequent access to computing power for training and fine-tuning. GPUs have proven to [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>A growing number of companies are using artificial intelligence on a daily basis — and dealing with the back-end architecture can reveal some unexpected challenges.</p>



<p>Whether the machine learning workload involves fraud detection, forecasts, chatbots, computer vision or NLP, it will need frequent access to computing power for training and fine-tuning.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/12/IMG_0420-1024x537.png" alt="Managing GPU pools efficiently in AI pipelines" class="wp-image-20449" width="768" height="403" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0420-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0420-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0420-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0420.png 1200w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>GPUs have proven to be a game-changer for deep learning. If you&#8217;re wondering why, you can find out more by reading our blog post about <a href="https://www.ovh.com/blog/understanding-the-anatomy-of-gpus-using-pokemon/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">GPU architectures</a>. A few years ago, manufacturers such as NVIDIA began to develop specific ranges for cloud datacentres. You may be familiar with the NVIDIA TITAN RTX for gaming — and in our datacentres, we use NVIDIA A100, V100, Tesla and DGX GPUs for enterprise-grade workloads.</p>



<p>In short, GPUs are perfect for tasks that can be solved or improved by AI and require a lot of processing power. They offer excellent compute performance, which is why they are so widely used in deep learning, and why they seem to be the natural choice for the growing number of companies adopting AI.</p>



<p>However, when dealing with pools of GPUs, the back-end architecture can be really tricky.  </p>



<p><strong>So how do we use them to benefit a company with minimal hassle and headaches?</strong> <strong>On-premise or in the cloud?</strong></p>



<p>These are good questions that I&#8217;m keen to discuss here, from both a business and technical perspective.</p>



<p></p>



<h3 class="wp-block-heading">Dealing with GPU pools&#8230; The struggle is real.</h3>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/12/IMG_0419.png" alt="One does not simply set up GPUs for Deep Learning" class="wp-image-20443" width="603" height="430" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0419.png 804w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0419-300x214.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/IMG_0419-768x547.png 768w" sizes="auto, (max-width: 603px) 100vw, 603px" /></figure></div>



<p>For anyone who has had to deploy and manage more than 1 GPU for a data-AI team, I&#8217;m sure this topic will bring tears to your eyes, and make your voice tremble. Yes, it is indeed complicated.</p>



<p>I can talk about it on our blog, because our team of data scientists here at OVHcloud had to deal with the exact same annoying issues. Thankfully, we solved all of them — stay tuned!</p>



<p><strong>GPU sharing is hard</strong>. Even if one GPU is better than none, in most cases it will not be sufficient, and a GPU pool will be far more effective. From a tech perspective, dealing with a GPU pool — or worse, allowing your team to use this pool simultaneously — is very tricky. The market is really mature for CPU sharing (via hypervisors), but by design, a GPU has to be attached to a VM or container. This means that quite often, it needs to be &#8220;booked&#8221; for a specific workload. To get around this issue, you&#8217;ll need to provide a scale-out with orchestration, so that you can dynamically assign GPUs to jobs over time. Whenever you tell yourself &#8220;<em>I want to launch this task with 4 GPUs for 2 days</em>&#8220;, you should simply be able to ask, and the back-end should work its magic for you.</p>
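<p>Concretely, with a container orchestrator such as Kubernetes, asking for 4 GPUs boils down to a declarative resource request that the scheduler satisfies for you. A minimal, hypothetical sketch (the pod name and image are placeholders):</p>



<pre class="wp-block-code"><code lang="yaml" class="language-yaml"># A hypothetical training job requesting 4 GPUs; the scheduler
# finds a node with enough free GPUs and books them for the pod.
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  restartPolicy: Never
  containers:
    - name: train
      image: my-registry/train:latest
      resources:
        limits:
          nvidia.com/gpu: 4   # whole GPUs are booked, not shared
</code></pre>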



<p><strong>Setting up and maintaining an architecture is time-consuming.&nbsp; </strong>So you&#8217;ve deployed servers with GPU, updated and upgraded your Linux distros, installed your main AI packages, CUDA drivers, and now you want to move on to something else. But wait — a new TensorFlow version has been released, and you also have a security patch to apply. What you initially thought to be a single task is now taking up 4-5 hours of your time per week.</p>



<p><strong>Diagnosing is quite complex</strong>. If, for whatever reason, something isn&#8217;t working as it should — good luck. You barely know who is doing what, and you can&#8217;t track jobs or usage unless you connect to the platform yourself and set up monitoring tools. Remember to grab your snorkel set, because you&#8217;ll need to deep-dive.</p>



<p><strong>Bottlenecks are almost inevitable</strong>. Imagine setting up a pool of GPUs based on your current AI project workloads. Your infrastructure is not really designed to scale automatically, and as soon as the AI workloads increase, your jobs have to be scheduled while the GPU fleet is being updated constantly. A backlog starts to accumulate, and a bottleneck is created as a result.</p>



<p><strong>Providing tools for teams to work collaboratively on code is mandatory.</strong> Usually, your team will need to share their data experimentations — and the best way to do this for now is with <strong>JupyterLab Notebooks</strong> (we love them) or <strong>VSCode. </strong>But you&#8217;ll need to keep in mind that this is more software to set up and maintain.</p>



<p><strong>Securing data access is essential. </strong>The required data must be easily accessible, and sensitive data must be covered by security guarantees.</p>



<p><strong>Cost control is difficult. </strong>Even worse, for one reason or another (who said holidays?), you might need to stop almost all your GPU servers for a week or two — but to do this, you would need to wait for any ongoing jobs to be completed.</p>



<p>All jokes aside, while we may be passionate about tech and hardware, we have other things to do. Data engineers cannot achieve their full potential and talent in maintenance-based or billing-based tasks.</p>



<h3 class="wp-block-heading">Kubeflow to the rescue?</h3>



<p>Kubernetes 1.0 was launched 5 years ago. Whatever your opinion of it, in five years it has become the de facto standard for container orchestration in enterprise environments.</p>



<p>Data scientists use containers for portability, agility, and community — but Kubernetes was made to orchestrate services, not data experimentations.</p>



<p>Kubernetes alone is not tailored for a data team. It presents too much complexity, with the sole benefit of solving the orchestration issue.</p>



<p><strong>We need something that not only improves orchestration, but also code contribution, tests and deployments.</strong></p>



<p>Luckily, <a href="https://www.kubeflow.org/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external"><strong>Kubeflow</strong> </a>appeared 2 years ago, open-sourced by Google. Its main promise is to simplify complex ML workflows, for example <code>data processing =&gt; data labeling =&gt; training =&gt; serving</code>, and to complement them with notebooks.</p>



<p>I do really love the promise, and the way they simplify ML pipelines. Kubeflow can be run over K8s clusters on-premise or in the cloud, and can also be set up on a single VM or even on a workstation (Linux/Mac/Windows).</p>



<p>Students can easily have their own ML environment. However, for the most advanced uses, a workstation or a single VM might be out of the question, and you would need a K8s cluster with Kubeflow installed on top of that. You&#8217;ll have a nice UI for starting notebooks and creating ML pipelines (processing/training/inference), <strong>but still zero GPU support by default</strong>.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.kubeflow.org/docs/images/central-ui.png" alt="" width="480" height="319"/><figcaption>Central Dashboard / Image property of Kubeflow.org</figcaption></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.kubeflow.org/docs/images/pipelines-xgboost-graph.png" alt="" width="480" height="270"/><figcaption>XGBoost pipeline / Image property of Kubeflow.org</figcaption></figure></div>



<p>Your GPU support will depend on your setup. It may differ if you host it on GCP, AWS, Azure, OVHcloud, on-premise, MicroK8s, or anything else.</p>



<p>For example, on AWS EKS, you need to declare GPU pools in your Kubeflow manifest:</p>



<pre class="wp-block-code"><code lang="yaml" class="language-yaml"># Official doc: https://www.kubeflow.org/docs/aws/customizing-aws/

# NodeGroup holds all configuration attributes that are specific to a node group
# You can have several node groups in your cluster.
nodeGroups:
  - name: eks-gpu
    instanceType: p2.xlarge
    availabilityZones: ["us-west-2b"]
    desiredCapacity: 2
    minSize: 0
    maxSize: 2
    volumeSize: 30
    ssh:
      allow: true
      publicKeyPath: '~/.ssh/id_rsa.pub'</code></pre>



<p>On GCP GKE, you will need to run this command to create a GPU pool:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash"># Official doc: https://www.kubeflow.org/docs/gke/customizing-gke/#common-customizations
 
export GPU_POOL_NAME=&lt;name of the new GPU pool&gt;
 
gcloud container node-pools create ${GPU_POOL_NAME} \
--accelerator type=nvidia-tesla-k80,count=1 \
--zone us-central1-a --cluster ${KF_NAME} \
--num-nodes=1 --machine-type=n1-standard-4 --min-nodes=0 --max-nodes=5 --enable-autoscaling</code></pre>



<p>You will then need to install NVIDIA drivers on all the GPU nodes. NVIDIA maintains a <em>DaemonSet</em> which enables you to install them easily:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash"># Official doc: https://www.kubeflow.org/docs/gke/customizing-gke/#common-customizations
 
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml</code></pre>



<p>Once you have done this, you will be able to create GPU pools (don&#8217;t forget to check your quotas beforehand — with a basic account, you are restricted by default, and you will need to contact their support to raise the limits).</p>



<h3 class="wp-block-heading">Okay, but do things get easier from here?</h3>



<p>As we say in France, especially in Normandy, yes but no.</p>



<p>Yes, Kubeflow does resolve some of the challenges we&#8217;ve mentioned — but some of the biggest challenges are yet to come, and they will take up a lot of your daily routine. Many manual operations will still require you to dig into specific K8s documentation, or guides published by cloud providers.</p>



<p>Below is a summary of <strong>Kubeflow vs GPU pool challenges</strong>.</p>



<figure class="wp-block-table is-style-stripes"><table><thead><tr><th>Challenges</th><th>Status</th></tr></thead><tbody><tr><td><strong>GPU pool with sharing option</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span></strong> but will require manual configuration (declaration in manifest, driver installation, etc.).</td></tr><tr><td><strong>Collaborative tools</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span></strong> definitely. Notebooks are provided via Kubeflow.</td></tr><tr><td><strong>Infrastructure maintenance</strong></td><td>Definitely <strong><span class="has-inline-color has-vivid-red-color">NO</span></strong>.<br>Now you have a Kubeflow cluster to maintain and operate.</td></tr><tr><td><strong>Infrastructure diagnosis</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span> BUT <span class="has-inline-color has-vivid-red-color">NO</span></strong>. Activity dashboard and reporting tools (based on Spartakus), logs, etc.<br>But these tools are aimed at data engineers, not data scientists, who may still come back to you.</td></tr><tr><td><strong>Infrastructure agility/flexibility</strong></td><td><strong>TRICKY</strong>. It will depend on your hosting implementation. If it&#8217;s on-premise, definitely no: you&#8217;ll need to buy hardware (an NVIDIA V100 costs approximately $10K, before chassis, electricity usage, etc.).<br>Some cloud providers can provide &#8220;auto-scaling GPU pools&#8221; from 0 to n, which is nice.</td></tr><tr><td><strong>Secured data access</strong></td><td><strong>TRICKY</strong>. It will depend on where your data is located and the technology used. It&#8217;s not a ready-to-use solution.</td></tr><tr><td><strong>Cost control</strong></td><td><strong>TRICKY.</strong> Again, it will depend on your hosting implementation. It&#8217;s not easy, since you need to take care of the infrastructure. Some hidden costs can appear, too (network traffic, monitoring, etc.).</td></tr></tbody></table><figcaption>Kubeflow vs Challenges</figcaption></figure>



<h3 class="wp-block-heading">Forget infrastructure, welcome to GPU platforms made for AI</h3>



<p>You can now find various third-party solutions on the market that go one step further. Instead of dealing with the architecture and the Kubernetes cluster, what if you simply focused on your machine learning or deep learning code?</p>



<p>There are well-known solutions such as <strong>Paperspace Gradient</strong> — or smaller ones, like <strong>Run:AI</strong> — and we&#8217;re pleased to offer another option on the market: <strong>AI Training</strong>. We&#8217;re using this post as a self-promotion opportunity (it&#8217;s our blog after all), but the logic remains the same for competitors.</p>



<p>What are the concepts behind it?</p>



<h4 class="wp-block-heading" id="id-[Blogpost]ManagingGPUspoolsefficentlyinAIpipelines-Noinfrastructuretomanage">No infrastructure to manage</h4>



<p>You don&#8217;t need to set up and manage a K8s cluster, or a Kubeflow cluster.</p>



<p>You don&#8217;t need to declare GPU pools in your manifest.</p>



<p>You don&#8217;t need to install NVIDIA drivers on the nodes.</p>



<p>With GPU Platforms like OVHcloud AI Training, your neural network training is as simple as this:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash"># Upload data directly to Object Storage
ovhai data upload myBucket@GRA train.zip

# Launch a job with 4 GPUs on a Pytorch environment, with an Object Storage bucket directly linked to it
ovhai job run \
    --gpu 4 \
    --volume myBucket@GRA:/data:RW \
    ovhcom/ai-training-pytorch:1.6.0</code></pre>



<p>These commands will provide you with a JupyterLab notebook directly plugged into a pool of 4x NVIDIA GPUs, with the Pytorch environment installed. This is all you need to do, and the entire process takes around 15 seconds.</p>



<h4 class="wp-block-heading" id="id-[Blogpost]ManagingGPUspoolsefficentlyinAIpipelines-Parralelizationforthewin">Parallel computing — a great advantage</h4>



<p>One of the most significant benefits is that since the infrastructure is not on your premises, you can count on the provider to scale it.</p>



<p>So you can run dozens of jobs simultaneously. A classic use case is to fine-tune all of your models once a week or once a month, with a few lines of bash:</p>



<pre class="wp-block-code"><code lang="bash" class="language-bash"># Start a basic loop
for model in my_models_listing
do

# Launch a job with 3 GPUs on a Pytorch environment, with an Object Storage bucket directly linked to it
echo "starting training of $model"
ovhai job run \
--gpu 3 \
--volume myBucket@GRA:/data:RW \
my_docker_repository/$model

done</code></pre>



<p>If you have 10 models, it will launch 10 jobs of 3 GPUs each in a few seconds, and stop each one once its job completes. You move from sequential to parallel work.</p>



<h5 class="wp-block-heading">Collaboration out of the box</h5>



<p>All of these platforms natively include notebooks, directly plugged into GPU power. With OVHcloud AI Training, we also provide pre-installed environments for TensorFlow, Hugging Face, Pytorch, MXnet, Fast.AI — and others will be added to this list soon.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="571" src="https://www.ovh.com/blog/wp-content/uploads/2020/12/nbook-1024x571.png" alt="" class="wp-image-20259" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/12/nbook-1024x571.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/nbook-300x167.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/nbook-768x429.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/nbook-1536x857.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/nbook.png 1672w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption>JupyterLab Notebook</figcaption></figure></div>



<h4 class="wp-block-heading" id="id-[Blogpost]ManagingGPUspoolsefficentlyinAIpipelines-Datasetaccessmadeeasy">Data set access made easy</h4>



<p>I haven&#8217;t tested all the GPU platforms on the market, but usually they provide some useful ways to access data. We aim to provide the best work environment for data science teams, so we are also offering an easy way for them to access their data — by enabling them to attach object storage containers during the job launch.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="271" src="https://www.ovh.com/blog/wp-content/uploads/2020/12/container-1024x271.png" alt="" class="wp-image-20260" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/12/container-1024x271.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/container-300x79.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/container-768x204.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/12/container.png 1536w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption>OVHcloud AI Training : attach Object Storage containers to notebooks</figcaption></figure></div>



<h4 class="wp-block-heading" id="id-[Blogpost]ManagingGPUspoolsefficentlyinAIpipelines-Lastbutnotleast...CostControlsisareality">Cost control for users</h4>



<p>Third-party GPU platforms quite often provide clear pricing. This is the case for Paperspace, but not for Run:AI (I was unable to find their price list). This is also the case for OVHcloud AI Training.</p>



<ul class="wp-block-list"><li><strong>GPU power</strong>: You pay £1.58/hour/NVIDIA V100s GPU</li><li><strong>Storage</strong>: Standard price of OVHcloud Object Storage (compliant with AWS S3 protocol)</li><li><strong>Notebooks</strong>: Included</li><li><strong>Observability tools</strong>: Logs and metrics included</li><li><strong>Subscription</strong>: No, it&#8217;s pay-as-you-go, per minute</li></ul>
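<p>Since the pricing is linear, a back-of-the-envelope budget is easy to script. Below is a minimal sketch, reusing the £1.58/hour/GPU price quoted above; the job sizes are hypothetical examples, not real workloads:</p>

```shell
# Hedged sketch: estimate the GPU cost of a training campaign,
# using the pay-as-you-go price of 1.58 pounds per GPU per hour
# quoted above. The job sizes below are made-up examples.

PRICE_PER_GPU_HOUR=1.58

# cost <gpus> <hours> -> prints the GPU cost in pounds
cost() {
  awk -v g="$1" -v h="$2" -v p="$PRICE_PER_GPU_HOUR" \
      'BEGIN { printf "%.2f\n", g * h * p }'
}

cost 4 2      # one job, 4 GPUs for 2 hours    -> 12.64
cost 3 0.5    # one job, 3 GPUs for 30 minutes -> 2.37
```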



<p>And there we go — cost and budget estimation is now simple. Try it out for yourself!</p>



<h4 class="wp-block-heading" id="id-[Blogpost]ManagingGPUspoolsefficentlyinAIpipelines-Missioncomplete?">Mission complete?</h4>



<p>Below is a summary addressing the major challenges to resolve when dealing with GPU pool sharing. It&#8217;s a big yes!</p>



<figure class="wp-block-table is-style-stripes"><table><thead><tr><th>Challenges</th><th>Status</th></tr></thead><tbody><tr><td><strong>GPU pool with sharing option</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span></strong> definitely. In fact, even many GPU pools in parallel, if you want to.</td></tr><tr><td><strong>Collaborative tools</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span></strong> definitely. Notebooks are always provided, as far as I know.</td></tr><tr><td><strong>Infrastructure maintenance</strong></td><td><span class="has-inline-color has-vivid-green-cyan-color"><strong>YES</strong> </span>definitely. Infrastructure is managed by the provider. You will not even need to connect via SSH to debug.</td></tr><tr><td><strong>Infrastructure diagnosis</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span>. </strong>Logs and metrics provided on our side, at least.</td></tr><tr><td><strong>Infrastructure agility/flexibility</strong></td><td><strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span> </strong>definitely. Scale up or down one or more GPU pools, use them for 10 minutes or a full month, etc.</td></tr><tr><td><strong>Secured data access</strong></td><td>Depends on the solution you choose, but usually it&#8217;s a <strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span></strong> via simplified object storage access.</td></tr><tr><td><strong>Cost control</strong></td><td>Depends on the solution you choose, but usually it&#8217;s a <strong><span class="has-inline-color has-vivid-green-cyan-color">YES</span></strong> with packaged prices and zero investments to make (zero CAPEX).</td></tr></tbody></table></figure>






<h3 class="wp-block-heading" id="id-[Blogpost]ManagingGPUspoolsefficentlyinAIpipelines-Conclusion">Conclusion</h3>



<p>If we go back to the main challenges faced by a company that requires shared GPU pools, we can say without a doubt that <strong>Kubernetes is the market standard for AI pipeline orchestration</strong>.</p>



<p>An <strong>on-premise K8s cluster with Kubeflow</strong> is really interesting if the data cannot be processed in the cloud (e.g. banking, hospitals, any kind of sensitive data) or if your team has steady (and modest) GPU requirements. You can invest in a few GPUs and manage the fleet yourself with software on top. But if you need more power, <strong>very soon the cloud will become the only viable option</strong>. Hardware investments, hardware obsolescence, electricity usage and scaling will give you some headaches.</p>



<p>Then, depending on the situation, <strong>Kubeflow in the cloud might be really useful</strong>. It delivers powerful pipeline features, notebooks, and enables users to manage virtual GPU pools. </p>



<p>But if you want to avoid infrastructure tasks, control your spending, and focus on your added value and code, <strong>you might consider GPU platforms as your first choice</strong>.</p>



<p>However, there is no such thing as magic — and without knowing exactly what you want, even the best platform won&#8217;t be able to meet your needs. Yet some start-ups, not listed here, can offer a combination of platforms and expertise to help you with your projects, infrastructure and use cases.</p>



<p>Thank you for reading, and don&#8217;t forget that we also offer inference at scale with ML Serving. This is the next logical step after training.</p>



<h5 class="wp-block-heading">Want to find out more?</h5>



<ul class="wp-block-list"><li>Solution page: <a href="https://www.ovhcloud.com/en-gb/public-cloud/ai-training/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://www.ovhcloud.com/en-gb/public-cloud/ai-training/</a></li><li>Public documentation: <a href="https://docs.ovh.com/gb/en/ai-training/" data-wpel-link="exclude">https://docs.ovh.com/gb/en/ai-training/</a></li><li>Community: <a href="http://community.ovh.com/en/" data-wpel-link="exclude">community.ovh.com/en/</a></li></ul>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>A journey through the wondrous land of Machine Learning or &#8220;Can I really buy a palace in Paris for 100,000€?&#8221; (Part 2)</title>
		<link>https://blog.ovhcloud.com/a-journey-through-the-wondrous-land-of-machine-learning-or-can-i-really-buy-a-palace-in-paris-for-100000e-part-2/</link>
		
		<dc:creator><![CDATA[Guillaume Ruty]]></dc:creator>
		<pubDate>Thu, 03 Sep 2020 14:52:09 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[A journey into the wondrous land of Machine Learning]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[Machine learning]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=19078</guid>

<description><![CDATA[Spoiler alert, no you can&#8217;t. A few months ago, I explained how to use Dataiku &#8211; a well-known interactive AI studio &#8211; and how to use data, made available by the French government, to build a Machine Learning model predicting the market value of real estate. Sadly, it failed miserably: when I tried it on [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>Spoiler alert, no you can&#8217;t.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/08/12B8273B-DCF4-4705-9E08-5FE05760102F-1024x537.png" alt="A journey through the wondrous land of Machine Learning or &quot;Can I really buy a palace in Paris for 100,000€?&quot; (Part 2)" class="wp-image-19147" width="768" height="403" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/08/12B8273B-DCF4-4705-9E08-5FE05760102F-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/12B8273B-DCF4-4705-9E08-5FE05760102F-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/12B8273B-DCF4-4705-9E08-5FE05760102F-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/12B8273B-DCF4-4705-9E08-5FE05760102F.png 1200w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure>



<p><a href="https://www.ovh.com/blog/a-journey-into-the-wondrous-land-of-machine-learning-or-did-i-get-ripped-off-part-1/" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">A few months ago</a>, I explained how to use Dataiku &#8211; a well-known interactive AI studio &#8211; and how to use data, made available by the French government, to build a Machine Learning model predicting the market value of real estate. Sadly, it failed miserably: when I tried it on the transactions made in my street, the same year I bought my flat, the model predicted that all of them had the same market value. </p>



<p>In this blog post, I will point out several reasons why our experiment failed, and then I will try to train a new model, taking into account what we will have learned.</p>



<h3 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-Whyourmodelfailed">Why our model failed</h3>



<p>There are several reasons why our model failed. Three of them stand out:</p>



<ul class="wp-block-list"><li>The open data format layout</li><li>The data variety</li><li>Dataiku&#8217;s default model parameters</li></ul>



<h4 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-OpenDataFormatLayout">Open Data Format Layout</h4>



<p>You can find a description of the data layout on the dedicated <a href="https://www.data.gouv.fr/fr/datasets/5cc1b94a634f4165e96436c1/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">webpage</a>. I will not list all the columns of the schema (there are 40 of them), but the most important one is the first one: id_mutation. This information is a unique transaction number, and not an unusual column to find. </p>



<p>However, if you look at the dataset itself, you will see that some transactions are spread over multiple lines. These correspond to transactions covering multiple parcels. In the example of my own transaction, there are two lines: one for the flat itself, and one for a separate basement under the building.</p>



<p>The problem is that the full price is found on every such line. From the point of view of my AI studio, which only sees a set of lines it interprets as data points, it looks like my basement and my flat are two different properties that cost an equal amount! This gets worse for properties that have land and several buildings attached to them. How can we expect our algorithm to learn appropriately under these conditions?</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/08/8AD89D9B-B3AE-4D6E-B92D-9601A091B47B-1024x766.png" alt="Open Data Format Layout" class="wp-image-19145" width="768" height="575" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/08/8AD89D9B-B3AE-4D6E-B92D-9601A091B47B-1024x766.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/8AD89D9B-B3AE-4D6E-B92D-9601A091B47B-300x225.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/8AD89D9B-B3AE-4D6E-B92D-9601A091B47B-768x575.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/8AD89D9B-B3AE-4D6E-B92D-9601A091B47B.png 1248w" sizes="auto, (max-width: 768px) 100vw, 768px" /><figcaption>Dataiku doesn&#8217;t naturally understand that a data point can consist of multiple lines!</figcaption></figure></div>



<h4 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-DataVariety">Data Variety</h4>



<p>In this case, we are trying to predict the price of a flat in Paris. However, the data we gave the algorithm covers every real estate transaction made in France over the last few years. While you might think that more data is always better, this is not necessarily the case.</p>



<p>The real estate market changes according to where you are, and Paris is a very specific case in France, with prices being much higher than in other big cities and the rest of France. Of course, this can be seen in the data, but the training algorithm does not know that in advance, and it is very hard for it to learn how to price a small flat in Paris and a farm with acres of land in Lozère at the same time.&nbsp;</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/08/D673B22D-30EB-43F8-BC7B-D8EE0D25EEED-1024x563.png" alt="Data variety" class="wp-image-19141" width="768" height="422" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/08/D673B22D-30EB-43F8-BC7B-D8EE0D25EEED-1024x563.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/D673B22D-30EB-43F8-BC7B-D8EE0D25EEED-300x165.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/D673B22D-30EB-43F8-BC7B-D8EE0D25EEED-768x422.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/D673B22D-30EB-43F8-BC7B-D8EE0D25EEED.png 1253w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure>



<h4 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-Modeltrainingparameters">Model training parameters</h4>



<p>In the last blog post, you saw how easy it is to use Dataiku. But this ease comes at a price: the default settings work for very simple use-cases, and are not suited for complex tasks &#8211; like predicting real-estate prices. I myself do not have much experience with Dataiku. However, by digging deeper into the details, I was able to correct a few obvious mistakes:</p>



<ul class="wp-block-list"><li>Data types: A lot of the columns in the dataset have specific types: integers, geographic coordinates, dates etc. Most of them are correctly identified by Dataiku, but some of them &#8211; such as geographic coordinates, or dates &#8211; are not.<br></li><li>Data analysis: If you remember the previous post, at one point we were looking at different models trained by the algorithm. We didn&#8217;t take the time to look at the model design automated by the studio. This section allows us to tweak several elements, such as the types of algorithms we run, the learning parameters, the choice of the dataset etc&#8230;<br><br>With so many features present in the dataset, Dataiku tried to reduce the number of features it would analyze, in order to simplify the learning algorithm. But it made poor choices. For example, it considers the street number but not the street itself. Even worse, it doesn&#8217;t even look at the date, or the parcels&#8217; surface area (but it does consider land surface when present&#8230;), which is by far the most important factor in most cities!</li></ul>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/09/IMG_0255-1024x418.png" alt="Auto Machine Learning did bad choices" class="wp-image-19234" width="768" height="314" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0255-1024x418.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0255-300x122.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0255-768x313.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0255.png 1476w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-Howtofixallofthat">How to fix all of that</h3>



<p>Fortunately, there are ways to solve these issues. Dataiku integrates tools to transform and filter your datasets before running your algorithms. It also allows you to change training parameters. Rather than walking you through all the steps, I&#8217;m going to summarize what I did for each of the issues we identified earlier:</p>



<h4 class="wp-block-heading">Data layout</h4>



<ul class="wp-block-list"><li>First, I grouped the lines that corresponded with the same transactions. Depending on the fields, I either summed them up (when it was a living area surface, for example), kept one of them (address), or concatenated them (when it was the identifier for an outbuilding, for example).</li><li>Second, I removed several unnecessary or redundant fields that add noise to the algorithm; such as street name (there are already per-city-unique street codes), street number suffix (&#8220;Bis&#8221; or &#8220;Ter&#8221; commonly found in an address after a street number) or other administration-related information.</li><li>Finally, some transactions contain not only several parcels (on several lines) but also several subparcels per parcel, each with its own surface and subparcel number. This subdivision is mostly administrative, and subparcels are often previously adjoining flats that have been reunited. To simplify the data, I cut the subparcel numbers and summed their respective surfaces, before regrouping the lines.</li></ul>
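<p>As an illustration of that grouping step, the snippet below collapses a made-up three-column extract (transaction id, price, surface) into one line per <code>id_mutation</code>, summing the surfaces and keeping the duplicated full price once. It is a sketch of the idea only, not the actual Dataiku recipe:</p>

```shell
# Minimal sketch of the grouping step (illustration only, not the real
# Dataiku recipe): one output line per id_mutation, surfaces summed,
# the duplicated full price kept once.
cat > transactions.csv <<'EOF'
id_mutation,valeur_fonciere,surface
2018-001,300000,60
2018-001,300000,8
2018-002,150000,35
EOF

awk -F, 'NR > 1 { price[$1] = $2; surface[$1] += $3 }
         END { for (id in price) print id "," price[id] "," surface[id] }' \
    transactions.csv | sort
# -> 2018-001,300000,68
#    2018-002,150000,35
```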



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/09/IMG_0254-935x1024.png" alt="Cleaning data layout" class="wp-image-19232" width="701" height="768" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0254-935x1024.png 935w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0254-274x300.png 274w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0254-768x841.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0254.png 1248w" sizes="auto, (max-width: 701px) 100vw, 701px" /></figure></div>



<h4 class="wp-block-heading">Data variety</h4>



<ul class="wp-block-list"><li>First, as we are trying to train a model to estimate the price of Parisian flats, I filtered out all the transactions that didn&#8217;t happen in Paris (which as you can expect is most of it).<br></li><li>Second, I removed all the transactions that had incomplete data for important fields (such as surface or address).<br></li><li>Finally, I removed outliers: transactions corresponding to properties that don&#8217;t correspond to standard flats; such as houses, commercial land, very high-end flats etc&#8230;</li></ul>
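<p>The filtering steps can be sketched the same way. Below is a minimal illustration on a made-up extract with simplified columns (in the real dataset the department lives in a <code>code_departement</code> column, and Paris is department 75): keep only Parisian flats with a non-empty surface.</p>

```shell
# Minimal sketch of the filtering step (illustration only): keep only
# Parisian flats ("Appartement") with a non-empty surface, on a made-up
# extract with columns: id, code_departement, type_local, surface, price.
cat > dvf_extract.csv <<'EOF'
id,code_departement,type_local,surface,price
1,75,Appartement,42,450000
2,14,Maison,120,230000
3,75,Appartement,,380000
4,75,Local commercial,60,900000
EOF

awk -F, 'NR > 1 && $2 == "75" && $3 == "Appartement" && $4 != ""' dvf_extract.csv
# -> 1,75,Appartement,42,450000
```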



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/08/674C0C3E-33DE-4AF1-B682-83654DD42F3B-1024x701.png" alt="Tackling data variety" class="wp-image-19146" width="768" height="526" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/08/674C0C3E-33DE-4AF1-B682-83654DD42F3B-1024x701.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/674C0C3E-33DE-4AF1-B682-83654DD42F3B-300x205.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/674C0C3E-33DE-4AF1-B682-83654DD42F3B-768x526.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/674C0C3E-33DE-4AF1-B682-83654DD42F3B.png 1527w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h4 class="wp-block-heading">Model training parameters</h4>



<ul class="wp-block-list"><li>First, I made sure that the model considered all the features. Note: rather than removing unnecessary fields from the dataset, I could have just told the algorithm to ignore the corresponding features. However, my preference is to increase the readability of the dataset to make it easier to explore. Moreover, Dataiku loads data in RAM to process it, so making it run on a clean dataset makes it more RAM-efficient.<br></li><li>Second, I trained the algorithm on different sets of features: in some cases I kept the district but not the street. As there are a lot of different streets in Paris, this is a categorical feature with high cardinality (lots of different possibilities that can&#8217;t easily be turned into numbers).<br></li><li>Finally, I tried different families of Machine Learning algorithms: Random Forest &#8211; basically building decision trees; XGBoost &#8211; gradient boosting; SVM (Support Vector Machine) &#8211; a generalization of linear classifiers; and KNN (K-Nearest-Neighbours) &#8211; which tries to categorize data points by looking at their neighbors according to different metrics.</li></ul>



<h3 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-Diditwork?">Did it work?</h3>



<p>So, after all that, how did we fare? Well, first off, let us look at the R2 score of our models. Depending on the training session, our best models have an R2 score between 0.8 and 0.85. As a reminder, an R2 score of 1 would mean that the model perfectly predicts the price of every data point used in the training evaluation phase. The best models in our previous tries had an R2 score between 0.1 and 0.2, so we are already clearly doing better here. Let us now look at a few predictions from this model.</p>
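<p>For reference, the R2 score can be computed by hand: it is 1 minus the ratio of the residual sum of squares to the total sum of squares. A tiny sketch on made-up (actual, predicted) price pairs, purely for illustration (Dataiku computes this for you during evaluation):</p>

```shell
# R2 = 1 - SS_res / SS_tot, on made-up (actual, predicted) pairs.
# Illustration only; Dataiku reports this score during evaluation.
cat > preds.csv <<'EOF'
actual,predicted
100,110
200,190
300,310
400,380
EOF

awk -F, 'NR > 1 { a[NR] = $1; p[NR] = $2; sum += $1; n++ }
         END {
           mean = sum / n
           for (i in a) { ss_res += (a[i]-p[i])^2; ss_tot += (a[i]-mean)^2 }
           printf "R2 = %.3f\n", 1 - ss_res / ss_tot
         }' preds.csv
# -> R2 = 0.986
```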



<p>First, I re-checked all the transactions from my street. This time, the prediction for my flat is ~16% lower than the price I paid. But unlike last time, every flat has a different estimate and these estimates are all in the correct order of magnitude. Most values have less than 20% error when compared to the real price, and the worst estimates have ~50% error. Obviously, this margin of error is unacceptable when investing in a flat. However, when compared to the previous iteration of our model &#8211; that returned the same estimate for all the flats in my street &#8211; we are making significant progress.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/09/IMG_0250-1024x893.png" alt="A journey through the wondrous land of Machine Learning  - Did it work?" class="wp-image-19228" width="768" height="670" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0250-1024x893.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0250-300x262.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0250-768x669.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0250.png 1224w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>So, now that we at least have the correct order of magnitude, let&#8217;s try and tweak some values in our input dataset to see if the model reacts predictably. To do this, I took the data point of my own transaction and created new data points, each time by changing one of the features of the original data point:</p>



<ol class="wp-block-list"><li>the surface, reducing it</li><li>the coordinates (street name, street code, geographic coordinates, etc.), to put it in a cheaper district</li><li>the date of transaction, set to year 2015 (3 years prior to the real date)</li></ol>



<p>With each of these modifications, we would expect the new estimate to be lower than the original one (real estate prices in Paris have been rising continuously). Let us look at the results:</p>



<figure class="wp-block-table"><table><thead><tr><th class="has-text-align-center" data-align="center">Real Price</th><th class="has-text-align-center" data-align="center">Original Estimate</th><th class="has-text-align-center" data-align="center">Reduced Surface Estimate</th><th class="has-text-align-center" data-align="center">Other District Estimate</th><th class="has-text-align-center" data-align="center">Older Estimate</th></tr></thead><tbody><tr><td class="has-text-align-center" data-align="center">100%</td><td class="has-text-align-center" data-align="center">84%</td><td class="has-text-align-center" data-align="center">45%</td><td class="has-text-align-center" data-align="center">61%</td><td class="has-text-align-center" data-align="center">76%</td></tr></tbody></table></figure>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/09/IMG_0251-1024x562.png" alt="Tweaking values in the data-set" class="wp-image-19230" width="768" height="422" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0251-1024x562.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0251-300x165.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0251-768x422.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0251-1536x843.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/09/IMG_0251.png 1554w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>At least the model behaves in an appropriate way, qualitatively speaking.</p>
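<p>This kind of sanity check is easy to automate: perturb one feature at a time and verify that each new estimate is lower than the original. The sketch below uses an invented toy linear pricer as a stand-in for the real Dataiku model, so every name and coefficient here is hypothetical:</p>

```python
def toy_price_model(flat):
    """Hypothetical pricer: surface * district price/m2, discounted ~3%/year before 2018."""
    base = flat["surface"] * flat["district_price_per_m2"]
    return base * (1 - 0.03 * (2018 - flat["year"]))

original = {"surface": 50, "district_price_per_m2": 10_000, "year": 2018}

variants = {
    "reduced surface":   {**original, "surface": 30},
    "cheaper district":  {**original, "district_price_per_m2": 7_000},
    "older transaction": {**original, "year": 2015},
}

ref = toy_price_model(original)
for name, flat in variants.items():
    estimate = toy_price_model(flat)
    assert estimate < ref, name          # every perturbation should lower the price
    print(f"{name}: {estimate / ref:.0%} of the original estimate")
```
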



<h1 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-Howcouldwedobetter?">How could we do better?</h1>



<p>At this point, we have used common sense to significantly improve our previous results and build a model that gives predictions in the right order of magnitude and that behaves as we expect when we tweak the features of data points. However, the remaining margin of error makes it unsuitable for real-world applications. But why, and what could we do to keep improving our model? Well, there are several reasons:</p>



<p>Data complexity: I am going to contradict myself a little. While complex data is harder to digest for a Machine Learning algorithm, it is necessary to preserve this complexity if it reflects a complexity in the final result. In this case, we might not only have oversimplified the data, but the original data itself may lack a lot of relevant information.</p>



<p>We trained our algorithm on general location and surface, which admittedly are the most important criteria, but our dataset lacks very important information such as floor, exposure, construction year, insulation diagnostics, condition, accessibility, view, general state of the flats, etc.</p>



<p>There are private datasets built by notarial offices that are more complete than our open dataset, but while those might have features such as floor or construction year, they would probably lack more subjective information, such as general state or view.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/08/A72E1C7E-A66E-410A-A587-73AF9D538C95-1024x951.png" alt="Data complexity" class="wp-image-19143" width="768" height="713" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/08/A72E1C7E-A66E-410A-A587-73AF9D538C95-1024x951.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/A72E1C7E-A66E-410A-A587-73AF9D538C95-300x279.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/A72E1C7E-A66E-410A-A587-73AF9D538C95-768x713.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/08/A72E1C7E-A66E-410A-A587-73AF9D538C95.png 1395w" sizes="auto, (max-width: 768px) 100vw, 768px" /><figcaption>The dataset lacks very important information about the flats.</figcaption></figure>



<ul class="wp-block-list"><li>Data amount: Even if we had very complete data, we would need a vast amount of it. The more features we include in our training, the more data we need. And for such a complex task, the ~150K transactions per year we have in Paris are probably not enough. A solution could be to create artificial data points: flats that don&#8217;t really exist, but that human experts would still be able to evaluate. <br><br>But there are three issues with that: first, any bias in the experts would inevitably be passed on to the model. Second, we would have to generate a huge number of artificial, but realistic, data points and then would need the help of multiple human experts to label them. Finally, the aforementioned experts would label this artificial data based on their current experience. It would be very hard for them to remember the market prices from a few years ago. This means that, to have a homogeneous dataset over the years, we would have to create this artificial data over time, at the same pace as the real transactions happen.<br></li><li>Skills: Finally, being a data scientist is a full-time job that requires experience and skill. A real data scientist would probably be able to reach better results than I obtained, by adjusting the learning parameters and choosing the most appropriate algorithms.<br><br>Furthermore, even good data scientists would have to know their way around real estate and its pricing. It&#8217;s very hard to build advanced Machine Learning models without a good comprehension of the topic at hand.</li></ul>



<h1 class="wp-block-heading" id="AjourneyinthewondrouslandofMachineLearning,or&quot;DoesaPalaceinParisreallycost100000€?&quot;(Part2)-Summary">Summary</h1>



<p>In this blog post, we discussed why our previous attempt at training a model to predict the price of flats in Paris failed. The data we used was not cleaned enough and we used Dataiku&#8217;s default training parameters rather than verifying that they made sense. </p>



<p>After that, we corrected our mistakes, cleaned the data and tweaked the training parameters. This improved the result of our model a lot, but not enough to use it realistically. There are ways to improve the model further, but the available datasets lack some information and the amount of data itself may not be sufficient to build a robust model. </p>



<p>Fortunately, the intent of this series was never to predict the price of flats in Paris perfectly. If it were possible, there would be no more real estate agencies. Instead, it serves as an illustration of how anyone can take raw data, find a problem related to the data and train a model to tackle this problem.</p>



<p>However, the dataset that we used in this example was quite small: only a few gigabytes. Everything happened on a single VM and we had to do everything manually, on a fixed dataset. What would I do if I wanted to handle petabytes of data? If I wanted to handle continuously streaming data? If I wanted to expose my model so that external applications could query it? </p>



<p>That is what we are going to look at next time, in the final blog post of the series.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How PCI-Express works and why you should care? #GPU</title>
		<link>https://blog.ovhcloud.com/how-pci-express-works-and-why-you-should-care-gpu/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Thu, 09 Jul 2020 10:16:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PCIe]]></category>
		<guid isPermaLink="false">https://blog.ovh.com/fr/blog/?p=14485</guid>

<description><![CDATA[What is PCI-Express ? Everyone, and I mean everyone, should pay attention when they do intensive Machine Learning / Deep Learning Training. As I explained in a previous blog post, GPUs have accelerated Artificial Intelligence evolution massively. However, building a GPUs server is not that easy. And failing to create an appropriate infrastructure can have [&#8230;]]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-1024x538.jpeg" alt="How PCI-Express works and why you should care? #GPU" class="wp-image-18783" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-1024x538.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-300x158.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461.jpeg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<h2 class="wp-block-heading">What is PCI-Express?</h2>



<p>Everyone, and I mean everyone, should pay attention when they do intensive Machine Learning / Deep Learning Training. </p>



<p>As I explained in a previous blog post, GPUs have accelerated Artificial Intelligence evolution massively.  </p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/understanding-the-anatomy-of-gpus-using-pokemon/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png" alt="" class="wp-image-18103" width="334" height="254" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png 668w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED-300x228.png 300w" sizes="auto, (max-width: 334px) 100vw, 334px" /></a></figure></div>



<p>However, building a GPU server is not that easy, and failing to create an appropriate infrastructure can have consequences on training time.</p>



<p>If you use GPUs, you should know that there are 2 ways to connect them to the motherboard, so that they can communicate with the other components (network, CPU, storage device). Solution 1 is through <strong>PCI Express </strong>and solution 2 through <strong>SXM2</strong>. We will talk about <strong>SXM2</strong> in the future. Today, we will focus on <strong>PCI Express</strong>, because it has a strong dependency on the choice of adjacent hardware, such as the PCI bus or the CPU.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>                     NVIDIA V100 with SXM2 design</th><th class="has-text-align-center" data-align="center">                          NVIDIA V100 with PCI express design</th></tr></thead><tbody><tr><td><img loading="lazy" decoding="async" width="609" height="644" class="wp-image-18763" style="width: 500px" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-01.jpg" alt="NVIDIA V100 with SXM2 design" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-01.jpg 609w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-01-284x300.jpg 284w" sizes="auto, (max-width: 609px) 100vw, 609px" /><br>Source : <a aria-label="undefined (opens in a new tab)" href="https://www.ebizpc.com/NVIDIA-Tesla-V100-900-2G502-0300-000-16GB-GPU-p/900-2g503-0310-000.htm" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.ebizpc.com/NVIDIA-Tesla-V100-900-2G502-0300-000-16GB-GPU-p/900-2g503-0310-000.htm</a></td><td class="has-text-align-center" data-align="center"><img loading="lazy" decoding="async" width="450" height="450" class="wp-image-18764" style="width: 500px" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-02.jpg" alt="NVIDIA V100 with PCI express design" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02.jpg 450w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02-300x300.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02-150x150.jpg 150w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02-70x70.jpg 70w" sizes="auto, (max-width: 450px) 100vw, 450px" /><br>Source : <a aria-label="undefined (opens in a new tab)" href="https://nvidiastore.com.br/nvidia-tesla-v100-16gb" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://nvidiastore.com.br/nvidia-tesla-v100-16gb</a></td></tr></tbody></table><figcaption>SXM2 design VS PCI Express 
Design</figcaption></figure>



<p>This is a major element to consider when talking about deep learning: the data loading phase is wasted compute time, so bandwidth between components and GPUs is a key bottleneck in most deep learning training contexts.</p>



<h2 class="wp-block-heading">How does PCI-Express work, and why should you care about the number of PCIe lanes?</h2>



<h3 class="wp-block-heading">What is a PCI-Express lane and are there any associated CPU limitations?</h3>



<p>Each V100 GPU uses 16 PCIe lanes. What does that mean exactly?</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="618" height="442" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-03.png" alt="Extract from NVidia V100 product specification sheet" class="wp-image-18767" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03.png 618w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03-300x215.png 300w" sizes="auto, (max-width: 618px) 100vw, 618px" /><figcaption>Extract from NVidia V100 product specification <a href="https://images.nvidia.com/content/technologies/volta/pdf/tesla-volta-v100-datasheet-letter-fnl-web.pdf" target="_blank" aria-label="undefined (opens in a new tab)" rel="noreferrer noopener nofollow external" data-wpel-link="external">sheet</a></figcaption></figure></div>



<p>The <strong><em>&#8220;x16&#8221;</em></strong> means that the PCIe device has 16 dedicated lanes. So&#8230; next question: what is a PCI Express lane?</p>



<h4 class="wp-block-heading">What&#8217;s a PCI Express lane?</h4>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3.jpeg" alt="2 PCI Express Devices with its interconnexion" class="wp-image-18779" width="424" height="299" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3.jpeg 848w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3-300x211.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3-768x541.jpeg 768w" sizes="auto, (max-width: 424px) 100vw, 424px" /><figcaption>2 PCI Express Devices with its interconnexion : figure inspired of the awesome <a aria-label="undefined (opens in a new tab)" href="https://www.phhsnews.com/what-is-chipset-and-why-should-i-care3538" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">article</a> &#8211; what is chipset and why should I care</figcaption></figure></div>



<p>PCIe lanes are used to communicate between PCIe devices, or between a PCIe device and the CPU. A lane is composed of two pairs of wires: one pair for inbound communications and one pair for outbound, so each lane can send and receive at full speed simultaneously.</p>



<p>Lane communications are similar to network Layer 1 communications &#8211; it’s all about transferring bits as fast as possible through electrical wires! However, the technique used for a PCIe link is a bit different, as a PCIe device is composed of xN lanes. In our previous example N=16, but it could be any power of 2 from 1 to 16 (1/2/4/8/16).</p>



<h3 class="wp-block-heading">So… if PCIe is similar to a network architecture, it means that PCIe layers exist, doesn&#8217;t it?</h3>



<p>Yes! You are right, PCIe has 4 layers:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="724" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-1024x724.jpeg" alt="" class="wp-image-18723" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-1024x724.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-300x212.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-768x543.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02.jpeg 1280w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p></p>



<h4 class="wp-block-heading"><strong>The Physical Layer (aka <em>the Big Negotiation Layer</em>)</strong></h4>



<p>The<strong><em> Physical Layer (PL)</em></strong> is responsible for negotiating the terms and conditions for receiving the raw packets (PLPs, for Physical Layer Packets), i.e. the lane width and the frequency, with the other device.</p>



<p>You should be aware that only the smallest number of lanes of the two devices will be used. This is why choosing the appropriate CPU is so important. CPUs have a limited number of lanes that they can manage so <strong>having a nice GPU with 16 PCIe Lanes and having a CPU with 8 PCIe Bus lanes will be as efficient as throwing away half your money because it doesn’t fit in your wallet.</strong></p>
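<p>In other words, the link trains to the narrower of the two devices. A trivial sketch of that rule (the ~1 GB/s per-lane figure for PCIe 3.0 is used just for illustration):</p>

```python
def usable_bandwidth_gb_s(gpu_lanes, cpu_lanes, per_lane_gb_s=1.0):
    """Effective link bandwidth is bounded by the narrower device."""
    return min(gpu_lanes, cpu_lanes) * per_lane_gb_s

print(usable_bandwidth_gb_s(16, 16))  # 16.0 -> the GPU runs at full width
print(usable_bandwidth_gb_s(16, 8))   # 8.0  -> half the GPU's capacity is wasted
```
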



<p>Packets received at the <strong><em>Physical Layer (aka PHY)</em></strong> come from other PCIe devices or from the system (via <strong><em>Direct Memory Access (DMA)</em></strong>, or from the CPU for instance) and are encapsulated in a frame. </p>



<p>The purpose of a Start-of-Frame is to say: “I am sending you data, this is the beginning,” and it takes just 1 byte to say that!</p>



<p>The <strong><em>End-of-Frame</em> </strong>word is also 1 byte to say “goodbye I’m done with it”.</p>



<p>This layer implements an <strong><em>8b/10b or 128b/130b encoding</em></strong> that we will explain later, mainly used for <strong><em>clock recovery</em></strong>.</p>



<h4 class="wp-block-heading"><strong>The Data Link Layer Packet (aka <em>Let’s put this mess in the right&nbsp;order</em>)</strong></h4>



<p>The <strong><em>Data Link Layer Packet (DLLP)</em></strong> starts with a <strong><em>Packet Sequence Number</em></strong>. This is really important, as a packet might get corrupted at some point, so it may need to be uniquely identified for retry purposes. The <strong><em>Sequence Number</em></strong> is coded on 2 bytes.</p>



<p>The <strong><em>Data Link Layer Packet</em></strong> then wraps the <strong><em>Transaction Layer Packet</em></strong> and is closed with the <strong><em>LCRC (Link Cyclic Redundancy Check)</em></strong>, which is used to check the integrity of the <strong><em>Transaction Layer Packet</em> (meaning the actual payload)</strong>.</p>



<p>If the <strong><em>LCRC</em></strong> is validated, then the <em><strong>Data Link Layer</strong></em> sends an <strong><em>ACK (ACKnowledge)</em></strong> signal to the <em><strong>emitter</strong></em> through the <strong><em>Physical Layer</em></strong>. Otherwise it sends a <strong><em>NAK (Not AcKnowledge)</em></strong> signal to the emitter, which will resend the frame associated with the <strong><em>sequence number</em></strong>; this retry is served from a replay buffer kept on the <em><strong>emitter</strong></em> side.</p>
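<p>The ACK/NAK retry loop can be sketched as follows. This toy simulation uses CRC32 as a stand-in for the real LCRC and greatly simplifies the framing; it only illustrates the replay mechanism, not the actual protocol:</p>

```python
import zlib

def make_frame(seq, payload: bytes):
    """Build a toy frame: sequence number, payload, and a CRC over the payload."""
    return {"seq": seq, "payload": payload, "lcrc": zlib.crc32(payload)}

def receive(frame):
    """Receiver side: ACK when the CRC matches the payload, NAK otherwise."""
    ok = zlib.crc32(frame["payload"]) == frame["lcrc"]
    return "ACK" if ok else "NAK"

# Emitter side: the replay buffer keeps frames until they are acknowledged.
replay_buffer = {}
frame = make_frame(seq=1, payload=b"transaction layer packet")
replay_buffer[frame["seq"]] = frame

corrupted = dict(frame, payload=b"transaction layer XXXXXX")  # bit errors on the wire
assert receive(corrupted) == "NAK"   # receiver rejects the corrupted frame
resent = replay_buffer[1]            # emitter replays the frame from its buffer
assert receive(resent) == "ACK"      # clean copy goes through
del replay_buffer[1]                 # the ACK frees the replay slot
```
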



<h4 class="wp-block-heading"><strong>The Transaction Layer</strong></h4>



<p>The<strong><em> Transaction Layer</em></strong> is responsible for <strong>managing the actual payload (Header + Data)</strong>, as well as the (optional) message digest, the <strong><em>ECRC (End-to-End Cyclic Redundancy Check)</em></strong>. This <strong><em>Transaction Layer Packet</em></strong> comes from the <strong><em>Data Link Layer</em></strong>, where it has been <strong>decapsulated</strong>.</p>



<p>An <strong>integrity check</strong> is performed if needed/requested. This step checks the integrity of the business logic and ensures no packet corruption when passing data from the<strong><em> Data Link Layer</em></strong> to the <em><strong>Transaction Layer</strong></em>.</p>



<p>The header is describing the type of transaction such as:</p>



<ul class="wp-block-list"><li>Memory Transaction</li><li>I/O Transaction</li><li>Configuration Transaction</li><li>or Message Transaction</li></ul>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-1024x600.jpeg" alt="PCIe Layers" class="wp-image-18781" width="512" height="300" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-1024x600.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-300x176.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-768x450.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E.jpeg 1368w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<h4 class="wp-block-heading"><strong>The Application Layer</strong></h4>



<p>The role of the <em><strong>application layer</strong></em> is to handle the <strong><em>User Logic</em></strong>. This layer sends the <strong><em>Header and the data payload</em></strong> to the <strong><em>Transaction Layer</em></strong>. The magic happens in this layer, where data is routed to the different hardware components.</p>



<h3 class="wp-block-heading">How does PCIe communicate with the rest of the&nbsp;world?</h3>



<p>A PCIe link uses the <strong>packet-switching concept found in networking, in full duplex mode.</strong></p>



<p>PCIe devices have an <strong>internal clock to orchestrate PCIe </strong><em><strong>Data Transfer Cycles</strong></em>. This <strong><em>Data Transfer Cycle</em></strong> is also orchestrated thanks to the <strong><em>Reference Clock</em></strong>. The latter sends a signal through a <strong><em>Dedicated Lane</em> (which is not part of the x1/2/4/8/16/32 mentioned above)</strong>. This clock helps both the receiving and emitting devices to synchronise for packet communications.</p>



<p><strong>Each PCIe lane is used to send bytes in parallel with the other lanes</strong>. The<strong><em> Clock Synchronization </em></strong>mentioned above helps the receiver to put those bytes back in the right order.</p>
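<p>Byte striping can be illustrated with a toy sketch: the transmitter deals consecutive bytes round-robin across the lanes, and the receiver interleaves them back in lane order (which is exactly what the clock synchronization guarantees). This is a conceptual illustration only, not the actual framing:</p>

```python
def stripe(data: bytes, lanes: int):
    """Deal consecutive bytes round-robin over `lanes` per-lane streams."""
    return [data[i::lanes] for i in range(lanes)]

def unstripe(parts):
    """Reassemble the original byte stream from the per-lane streams."""
    lanes = len(parts)
    out = bytearray(sum(len(p) for p in parts))
    for lane, part in enumerate(parts):
        out[lane::lanes] = part   # lane i carries bytes i, i+N, i+2N, ...
    return bytes(out)

data = b"PCI Express stripes bytes across lanes"
parts = stripe(data, 16)
assert unstripe(parts) == data    # round-trips regardless of lane count
```
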



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="618" height="442" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-03.png" alt="" class="wp-image-18767" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03.png 618w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03-300x215.png 300w" sizes="auto, (max-width: 618px) 100vw, 618px" /><figcaption>x16 means 16 lanes of parallel communication on generation 3 of PCIe&nbsp;protocol</figcaption></figure></div>



<h3 class="wp-block-heading">You may have the bytes in order, but do you have data integrity at the physical layer?</h3>



<p>To ensure <strong>integrity</strong>, PCIe devices use an <strong>8b/10b encoding scheme for PCIe generations 1 and 2</strong>, or a <strong>128b/130b encoding scheme for generations 3 and 4.</strong></p>



<p>These encodings are used to prevent the loss of temporal landmarks, especially when transmitting consecutive similar bits. This process is called “<strong><em>Clock Recovery</em></strong>”.</p>



<p>With 128b/130b, 128 bits of payload data are sent with 2 control bits prepended to them.</p>



<h4 class="wp-block-heading">Quick examples</h4>



<p><em>Let’s simplify it with an 8b/10b example:</em> according to IEEE 802.3 clause 36, table 36–1a, based on Ethernet specifications, here is the 8b/10b encoding table:</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="600" height="546" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-04.png" alt="IEEE 802.3 clause 36, table 36–1a - 8b/10b encoding table" class="wp-image-18770" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-04.png 600w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-04-300x273.png 300w" sizes="auto, (max-width: 600px) 100vw, 600px" /><figcaption>IEEE 802.3 clause 36, table 36–1a &#8211; 8b/10b encoding table</figcaption></figure></div>



<p>So how can the receiver tell the difference between all those repeating 0s (Code Group Name D0.0)?</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-1024x819.png" alt="Repeating bits everywhere" class="wp-image-18777" width="512" height="410" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-1024x819.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-300x240.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-768x615.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4.png 1381w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>8b/10b encoding is composed of 5b/6b + 3b/4b encodings.</p>



<p>Therefore <strong>00000 000</strong> will be encoded into <strong>100111 0100</strong>: the first 5 bits of the original data, <strong>00000</strong>, are encoded to <strong>100111</strong> using 5b/6b encoding (<strong>rd-</strong> column); the second group of 3 bits of original data, <strong>000</strong>, is encoded into <strong>0100</strong> using 3b/4b encoding (<strong>rd+</strong> column, since the unbalanced 6-bit group flipped the running disparity).</p>



<p>It could also have been <strong>5b/6b encoding rd+ </strong>and<strong> 3b/4b encoding rd-</strong>, turning <strong>00000 000</strong> into <strong>011000 1011</strong>.</p>



<p><strong>Therefore the original data, which was 8 bits, is now 10 bits due to the control bits (1 control bit for 5b/6b and 1 for 3b/4b).</strong></p>



<p>But don&#8217;t worry, I will draft a dedicated blog post about encoding later.</p>
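<p>In the meantime, here is a minimal sketch of that D0.0 example. It tabulates only this one code group (a real encoder covers all 256 data codes plus the control codes) and tracks the running disparity (RD), which flips after each unbalanced sub-group:</p>

```python
# Minimal 8b/10b sketch for a single code group (D0.0 = data byte 0x00).
FIVE_SIX = {  # EDCBA=00000 -> abcdei, per current running disparity
    ('00000', '-'): ('100111', '+'),  # four 1s vs two 0s: RD flips to +
    ('00000', '+'): ('011000', '-'),  # two 1s vs four 0s: RD flips to -
}
THREE_FOUR = {  # HGF=000 -> fghj
    ('000', '-'): ('1011', '+'),
    ('000', '+'): ('0100', '-'),
}

def encode_d00(rd):
    """Encode data byte 0x00 (D0.0) given the current running disparity."""
    six, rd = FIVE_SIX[('00000', rd)]
    four, rd = THREE_FOUR[('000', rd)]
    return six + ' ' + four, rd

code, rd = encode_d00('-')   # start with negative running disparity
print(code)  # 100111 0100
```

<p>Starting from the opposite disparity yields the complementary code group, 011000 1011, exactly as in the example above.</p>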



<p><strong>PCIe Generations 1 and 2 were designed with 8b/10b encoding</strong>, meaning that the <strong>actual data transmitted was only 80% of the total load</strong> (as 20%, i.e. 2 bits out of every 10, are used for clock synchronization).</p>



<p><strong>PCIe Gen 3 &amp; 4 were designed with 128b/130b</strong>, meaning that the <strong>control bits now represent only about 1.5% of the transmitted bits (2 out of 130)</strong>. Quite good, isn’t it?</p>



<h3 class="wp-block-heading">Let’s calculate the PCIe bandwidth together</h3>



<p>Here is the table of PCIe version specifications:</p>



<figure class="wp-block-table"><table><thead><tr><th>Number of Lanes</th><th>PCIe 1.0 (2003)</th><th>PCIe 2.0 (2007)</th><th><strong>PCIe 3.0 (2010)</strong></th><th><strong>PCIe 4.0 (2017)</strong></th><th>PCIe 5.0 (2019)</th><th>PCIe 6.0 (2021)</th></tr></thead><tbody><tr><td>x1</td><td>250 MB/s</td><td>500 MB/s</td><td>1 GB/s</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td></tr><tr><td>x2</td><td>500 MB/s</td><td>1 GB/s</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td><td>16 GB/s</td></tr><tr><td>x4</td><td>1 GB/s</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td><td>16 GB/s</td><td>32 GB/s</td></tr><tr><td>x8</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td><td>16 GB/s</td><td>32 GB/s</td><td>64 GB/s</td></tr><tr><td><strong>x16</strong></td><td>4 GB/s</td><td>8 GB/s</td><td><strong>16 GB/s</strong></td><td>32 GB/s</td><td>64 GB/s</td><td>128 GB/s</td></tr></tbody></table><figcaption>consortium PCI-SIG PCIe theoretical bandwidth/Lane/Way specification sheet</figcaption></figure>



<figure class="wp-block-table"><table><thead><tr><th>                                </th><th>PCIe 1.0 (2003)</th><th>PCIe 2.0 (2007)</th><th>PCIe 3.0 (2010)</th><th>PCIe 4.0 (2017)</th><th>PCIe 5.0 (2019)</th><th>PCIe 6.0 (2021)</th></tr></thead><tbody><tr><td><strong>Frequency</strong></td><td>2.5 GT/s</td><td>5.0 GT/s</td><td>8.0 GT/s</td><td>16 GT/s</td><td>32 GT/s</td><td>64 GT/s</td></tr></tbody></table><figcaption>consortium PCI-SIG PCIe theoretical raw bit rate specification sheet</figcaption></figure>



<p>To obtain these numbers, let&#8217;s look at the general bandwidth formula:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-1024x155.jpeg" alt="" class="wp-image-18793" width="512" height="78" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-1024x155.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-300x46.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-768x117.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB.jpeg 1298w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<ul class="wp-block-list"><li>BW stands for Bandwidth</li><li>MT/s&nbsp;: Mega Transfers per second</li><li>Encoding could be 4b/5b, 8b/10b, 128b/130b,&nbsp;…</li></ul>
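<p>To make the formula concrete, here is a small Python sketch (again, just an illustration) that reproduces the per-lane figures from the tables above:</p>

```python
# Effective bandwidth per lane = transfer rate * encoding efficiency / 8 bits per byte.
def pcie_bw_per_lane_mb_s(rate_mt_s: float, payload_bits: int, total_bits: int) -> float:
    """Per-lane bandwidth in MB/s, from a rate in MT/s and the line encoding."""
    return rate_mt_s * (payload_bits / total_bits) / 8

# PCIe 1.0: 2500 MT/s with 8b/10b encoding -> 250 MB/s per lane.
gen1 = pcie_bw_per_lane_mb_s(2500, 8, 10)

# PCIe 3.0: 8000 MT/s with 128b/130b encoding -> ~984.6 MB/s per lane.
gen3 = pcie_bw_per_lane_mb_s(8000, 128, 130)

# An x16 slot (e.g. for a GPU) multiplies the per-lane figure by 16.
gen3_x16_gb = gen3 * 16 / 1000  # ~15.75 GB/s per direction
print(gen1, gen3, gen3_x16_gb)
```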



<h4 class="wp-block-heading">For PCIe v1.0:</h4>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-1024x170.jpeg" alt="BW/lane\ (MB/s) = \ 2\ 500\ (MT/s)\ *\ \frac{8\ bits}{10\ bits} * \frac{1\ Byte}{8\ bits" class="wp-image-18785" width="512" height="85" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-1024x170.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-300x50.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-768x127.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227.jpeg 1231w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB.jpeg" alt="BW/lane\ (MB/s) = \ 250\ (MB/s)" class="wp-image-18788" width="347" height="79" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB.jpeg 806w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB-300x67.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB-768x172.jpeg 768w" sizes="auto, (max-width: 347px) 100vw, 347px" /></figure></div>



<h4 class="wp-block-heading">For PCIe v3.0 (the one that interests us for the NVIDIA V100):</h4>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-1024x154.jpeg" alt="BW/lane\ (MB/s) = \ 8\ 000\ (MT/s)\ *\ \frac{128\ bits}{130\ bits} * \frac{1\ Byte}{8\ bits}" class="wp-image-18795" width="512" height="77" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-1024x154.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-300x45.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-768x115.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A.jpeg 1292w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F.jpeg" alt="BW/lane\ (MB/s) = \ 984.6\ (MB/s)" class="wp-image-18796" width="355" height="63" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F.jpeg 802w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F-300x53.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F-768x136.jpeg 768w" sizes="auto, (max-width: 355px) 100vw, 355px" /></figure></div>



<p>Therefore, with <strong>16 lanes for an NVIDIA V100 connected in PCIe v3.0</strong>, we have an effective data transfer rate (data bandwidth)<strong> of nearly 16 GB/s per direction </strong>(<strong>the actual bandwidth is 15.75 GB/s per direction</strong>).</p>



<p>Be careful not to get confused: total bandwidth can also be quoted as the bidirectional (two-way) figure, in which case the total x16 bandwidth is around 32 GB/s.</p>



<p><em><strong>Note :</strong></em> Another element that we haven&#8217;t considered is that the maximum theoretical bandwidth needs to be reduced by around 1 Gb/s for error correction protocols (<strong><em>ECRC</em></strong> and <strong><em>LCRC</em></strong>) as well as the <strong><em>Headers</em></strong> (<strong><em>Start tag, Sequence tag, Header</em></strong>) and <strong><em>Footer</em></strong> (<em><strong>End</strong></em> tag) overheads explained earlier in this blog post.</p>



<h3 class="wp-block-heading">In conclusion</h3>



<p>We have seen that PCI Express has evolved a lot, and that it&#8217;s based on the same concepts as networking. To get the best from PCIe devices, it is necessary to understand the fundamentals of the underlying infrastructure.</p>



<p>Failing to choose the right underlying motherboard, CPU or bus can lead to major performance bottlenecks and GPU under-performance.</p>



<p>To sum up:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>Friends don&#8217;t let friends build their own GPUs hosts 😉</p><cite>Jean-Louis Quéguiner July 1<sup>st</sup>, 2020</cite></blockquote>



<p>If you liked this post but you want to drill down a bit into the Deep Learning and AI aspect of things don&#8217;t hesitate to check out my other blog posts:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png" alt="" class="wp-image-18099" width="515" height="376" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png 748w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028-300x219.png 300w" sizes="auto, (max-width: 515px) 100vw, 515px" /></a></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/what-does-training-neural-networks-mean/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png" alt="What does training neural networks mean?" class="wp-image-17932" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1.png 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/distributed-training-in-a-deep-learning-context/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png" alt="Distributed Learning in a Deep Learning context" class="wp-image-18106" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D.png 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a></figure></div>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fhow-pci-express-works-and-why-you-should-care-gpu%2F&amp;action_name=How%20PCI-Express%20works%20and%20why%20you%20should%20care%3F%20%23GPU&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Distributed Training in a Deep Learning Context</title>
		<link>https://blog.ovhcloud.com/distributed-training-in-a-deep-learning-context/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Tue, 05 May 2020 10:14:07 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Neural networks]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17871</guid>

					<description><![CDATA[Previously on OVHcloud Blog &#8230; In previous blog posts we have discussed a high level approach to deep learning as well as what is meant by &#8216;training&#8217; in relation to Deep Learning. Following the article, I had lots of questions entering my twitter inbox, especially regarding how GPUs actually works. I decided, therefore, to write [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdistributed-training-in-a-deep-learning-context%2F&amp;action_name=Distributed%20Training%20in%20a%20Deep%20Learning%20Context&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="537" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png" alt="Distributed Learning in a Deep Learning context" class="wp-image-18106" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Previously on OVHcloud Blog &#8230;</h3>



<p>In previous blog posts we have discussed a <a href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" data-wpel-link="exclude">high level approach to deep learning</a> as well as what is meant by &#8216;training&#8217; in relation to Deep Learning.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png" alt="" class="wp-image-18099" width="374" height="273" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png 748w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028-300x219.png 300w" sizes="auto, (max-width: 374px) 100vw, 374px" /></a></figure></div>



<p>Following that article, I had lots of questions arriving in my Twitter inbox, especially regarding how GPUs actually work.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="410" height="157" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/image.png" alt="" class="wp-image-17882" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/image.png 410w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/image-300x115.png 300w" sizes="auto, (max-width: 410px) 100vw, 410px" /><figcaption>Don&#8217;t worry it&#8217;s a friend, he is ok with me sharing the DM 😉</figcaption></figure></div>



<p>I decided, therefore, to write an article on how GPUs work:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/understanding-the-anatomy-of-gpus-using-pokemon/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png" alt="" class="wp-image-18103" width="334" height="254" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png 668w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED-300x228.png 300w" sizes="auto, (max-width: 334px) 100vw, 334px" /></a></figure></div>



<p>During our R&amp;D process around hardware and AI models, the question of distributed training came up (quickly). But before looking in-depth at distributed training, I invite you to read the following article to understand how Deep Learning training actually works:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/what-does-training-neural-networks-mean/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png" alt="What does training neural networks mean?" class="wp-image-17932" width="476" height="249" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1.png 1200w" sizes="auto, (max-width: 476px) 100vw, 476px" /></a></figure></div>



<p>As previously discussed, Neural Network training depends on:</p>



<ul class="wp-block-list"><li>Input Data</li><li>Neural Network architecture composed of &#8216;Layers&#8217;</li><li>Weights</li><li>Learning Rate (step used to adjust neural network weights)</li></ul>



<h2 class="wp-block-heading">Why do we need distributed Learning</h2>



<p>Deep Learning is mainly used for learning patterns in unstructured data. <strong>Unstructured data &#8211; such as text corpora, images, video or sound &#8211; can represent a huge amount of data to train on.</strong></p>



<p>Training on such data can take days or even weeks because of the size of the data and/or the size of the network.</p>



<p>Multiple distributed learning approaches can be considered.</p>



<h2 class="wp-block-heading">The different Distributed Learning approaches</h2>



<p>There are two main categories for distributed training when it comes to Deep Learning and both of them are based on the <strong><a rel="noreferrer noopener nofollow external" href="https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm" target="_blank" data-wpel-link="external">divide and conquer paradigm.</a></strong></p>



<p>The first category is named <strong>&#8220;Distributed Data Parallelism&#8221;</strong>, where the <strong>data is split across multiple GPUs</strong>.</p>



<p>The second category is called <strong>&#8220;Model Parallelism&#8221;</strong>, where the deep learning <strong>model is split across multiple GPUs</strong>.</p>



<p>However, <strong>Distributed Data Parallelism</strong> is the most common approach, as it <strong>fits almost any problem</strong>. The second approach has some serious technical limitations when it comes to splitting a model. Splitting a model is a highly technical exercise, as you need to know the space used by each part of the network in the <strong>DRAM</strong> of the GPU. Once you have the <strong>DRAM usage per slice</strong>, you need to enforce the computation by <strong>hard coding the placement of the Neural Network Layers onto the desired GPUs</strong>. <strong>This approach makes it hardware-related</strong>, as the DRAM may vary from one GPU to another, while <strong>Distributed Data Parallelism</strong> just requires <strong>data size adjustments (usually the batch size), which is relatively simple</strong>.</p>



<p>The <strong>Distributed Data Parallelism</strong> model has two variants, each of which has its advantages and disadvantages. The first variant lets you train a model with <strong>synchronous weight adjustment</strong>: <strong>each GPU processes its training batch and returns the corrections</strong> that need to be made to the model, and <strong>the system waits until all the workers have finished their task before producing the new set of weights</strong> that will be used for the next training batch.</p>



<p>The second variant lets you work in an <strong>asynchronous way</strong>: each batch on each GPU reports the corrections that need to be made to the neural network, and the <strong>weights coordinator</strong> sends back a <strong>new set of weights without waiting for the other GPUs to finish training their own batches</strong>.</p>
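<p>The difference between the two variants can be sketched in a few lines of Python (a toy single-process simulation on one scalar weight, not a real distributed framework): in the synchronous variant the update waits for every worker&#8217;s gradient, while in the asynchronous one each gradient is applied as soon as it arrives.</p>

```python
# Toy illustration of synchronous vs asynchronous data parallelism
# on a single scalar weight (no real GPUs or networking involved).

def synchronous_step(weight, worker_gradients, lr=0.1):
    """All workers finish, gradients are averaged, then one update is applied."""
    avg_grad = sum(worker_gradients) / len(worker_gradients)
    return weight - lr * avg_grad

def asynchronous_steps(weight, worker_gradients, lr=0.1):
    """Each worker's gradient is applied as soon as it arrives,
    possibly against weights already changed by other workers."""
    for grad in worker_gradients:
        weight = weight - lr * grad
    return weight

grads = [0.5, 1.0, 1.5]                   # gradients reported by 3 workers
w_sync = synchronous_step(1.0, grads)     # one averaged update
w_async = asynchronous_steps(1.0, grads)  # three independent updates
print(w_sync, w_async)  # 0.9  0.7
```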



<h2 class="wp-block-heading">3 cheat sheets to better understand Distributed Deep Learning</h2>



<p>In these cheat sheets, let&#8217;s assume you&#8217;re using Docker with a volume attached.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="942" height="1024" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/type1-942x1024.png" alt="" class="wp-image-18048" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1-942x1024.png 942w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1-276x300.png 276w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1-768x835.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1.png 1004w" sizes="auto, (max-width: 942px) 100vw, 942px" /></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="904" height="1024" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/type2-904x1024.png" alt="" class="wp-image-18049" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2-904x1024.png 904w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2-265x300.png 265w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2-768x870.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2.png 945w" sizes="auto, (max-width: 904px) 100vw, 904px" /></figure>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1543" height="2182" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/distrib-training1.jpeg" alt="" class="wp-image-18036" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1.jpeg 1543w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-212x300.jpeg 212w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-724x1024.jpeg 724w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-768x1086.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-1086x1536.jpeg 1086w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-1448x2048.jpeg 1448w" sizes="auto, (max-width: 1543px) 100vw, 1543px" /></figure>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/63EDA175-2E61-4AC2-9157-97C18A973B78.png" alt="" class="wp-image-18096" width="320" height="322" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/63EDA175-2E61-4AC2-9157-97C18A973B78.png 640w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/63EDA175-2E61-4AC2-9157-97C18A973B78-150x150.png 150w" sizes="auto, (max-width: 320px) 100vw, 320px" /><figcaption>Now you need to choose your Distributed Training strategy (wisely)</figcaption></figure></div>



<p></p>



<h2 class="wp-block-heading">Further Readings</h2>



<p>While we have covered a lot in this blog post, we haven&#8217;t covered nearly all the aspects of Deep Learning distributed training &#8211; including prior work, history and the associated mathematics.</p>



<p>I highly suggest that you read the great paper<em> <a href="https://stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Parallel and Distributed Deep Learning</a></em> by <strong>Vishakh Hegde</strong> and <strong>Sheema</strong> <strong>Usmani</strong> (both from Stanford University).</p>



<p>As well as the article <em><a href="https://arxiv.org/pdf/1802.09941.pdf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis</a></em>, written by <strong>Tal Ben-Nun </strong>and <strong>Torsten Hoefler</strong> of ETH Zurich, Switzerland. I suggest that you start by jumping to <strong>section 6.3</strong>.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdistributed-training-in-a-deep-learning-context%2F&amp;action_name=Distributed%20Training%20in%20a%20Deep%20Learning%20Context&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What does Training Neural Networks mean?</title>
		<link>https://blog.ovhcloud.com/what-does-training-neural-networks-mean/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Wed, 22 Apr 2020 16:37:25 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Neural networks]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17859</guid>

					<description><![CDATA[In a previous blog post we discussed general concepts surrounding Deep Learning. In this blog post, we will go deeper into the basic concepts of training a (deep) Neural Network. Where does &#8220;Neural&#8221; comes from ? As you should know, a biological neuron is composed of multiple dendrites, a nucleus and a axon (if only [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fwhat-does-training-neural-networks-mean%2F&amp;action_name=What%20does%20Training%20Neural%20Networks%20mean%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>In a previous<a rel="noreferrer noopener" href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" target="_blank" data-wpel-link="exclude"> blog post</a> we discussed general concepts surrounding Deep Learning. In this blog post, we will go deeper into the basic concepts of training a (deep) Neural Network.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png" alt="" class="wp-image-17932" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Where does &#8220;Neural&#8221; come from?</h2>



<p>As you should know, a <strong>biological neuron</strong> is composed of multiple <strong>dendrites</strong>, a <strong>nucleus</strong> and an <strong>axon</strong> (if only you had paid attention in your biology classes). When a stimulus is sent to the brain, it is received through the <strong>synapses</strong> located at the extremities of the dendrites.</p>



<p>When a <strong>stimulus</strong> arrives at the brain, it is transmitted to the neuron via the <strong>synaptic receptors</strong>, which<strong> adjust the strength of the signal sent to the nucleus</strong>. This message is <strong>transported</strong> by the <strong>dendrites</strong> to the <strong>nucleus</strong>, to then be <strong>processed</strong> in <strong>combination</strong> with other signals emanating from receptors on the other dendrites. <strong>Thus the combination of all these signals takes place in the nucleus.</strong> After processing all these signals, <strong>the nucleus emits an output signal through its single axon</strong>. The axon then streams this signal to several other downstream neurons via its <strong>axon terminations</strong>. Thus a neuron&#8217;s analysis is pushed to the subsequent layers of neurons. When you are confronted with the complexity and efficiency of this system, you can only imagine the millennia of biological evolution that brought us here.</p>



<p>On the other hand, <strong>artificial neural networks </strong>are built on the principle of bio-mimicry. <strong>External stimuli (the data), </strong>whose signal strength is adjusted by the <strong>neuronal weights </strong>(remember the <strong>synapse</strong>?), <strong>circulate to the neuron</strong> (the place where the mathematical calculation happens) via the dendrites. The result of the calculation &#8211; called the <strong>output</strong> &#8211; is then re-transmitted (via the axon) to several other neurons in subsequent layers, where it is combined with other outputs, and so on.</p>



<p>Therefore, there is a clear parallel between biological neurons and artificial neural networks, as presented in the figure below.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6-699x1024.png" alt="" class="wp-image-17933" width="350" height="512" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6-699x1024.png 699w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6-205x300.png 205w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6.png 747w" sizes="auto, (max-width: 350px) 100vw, 350px" /><figcaption>Based on https://medium.com/swlh/learning-paradigms-in-neural-networks-30854975aa8d</figcaption></figure></div>



<h2 class="wp-block-heading">The Artificial Neural Network Recipe</h2>



<p>To build a good Artificial Neural Network (ANN), you will need the following ingredients:</p>



<h4 class="wp-block-heading"> Ingredients:</h4>



<ul class="wp-block-list"><li><strong>Artificial Neurons</strong> (processing node) composed of:<ul><li>(many) <strong>input </strong>neuron connection(s) (dendrites)</li><li>a <strong>computation unit </strong>(nucleus) composed of:<ul><li>a <strong>linear function</strong> (ax+b)</li><li>an <strong>activation function</strong> (equivalent to the <strong>synapse</strong>)</li></ul></li><li>an <strong>output</strong> (axon)</li></ul></li></ul>
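<p>These ingredients map directly to a few lines of code. Here is a minimal single-neuron sketch in plain Python (the inputs, weights and choice of a sigmoid activation are made up for illustration):</p>

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a linear function (a*x + b) followed by
    an activation function (here a sigmoid, playing the synapse's role)."""
    # Linear part: weighted sum of the inputs plus a bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation part: squashes the result into (0, 1).
    return 1 / (1 + math.exp(-z))

# Example: 3 dendrite-like inputs feeding a single nucleus.
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, 0.1], bias=0.0)
print(output)
```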



<h2 class="wp-block-heading">Preparation to get an ANN for image classification training:</h2>



<ol class="wp-block-list"><li>Decide on the<strong> number of output classes </strong>(meaning the number of image classes &#8211; for example, two for cat vs dog)</li><li>Draw as many computation units as the <strong>number of output classes</strong> (congrats, you just created the <strong>Output Layer</strong> of the ANN)</li><li>Add as many <strong>Hidden Layers</strong> as needed within the defined <strong>architecture</strong> (for instance <a rel="noreferrer noopener nofollow external" href="https://neurohive.io/en/popular-networks/vgg16/" target="_blank" data-wpel-link="external">vgg16</a> or <a rel="noreferrer noopener nofollow external" href="https://neurohive.io/en/popular-networks/" target="_blank" data-wpel-link="external">any other popular architecture</a>). Tip &#8211; <strong>Hidden Layers</strong> are just sets of neighbouring <strong>Compute Units</strong>; they are not linked together.</li><li>Stack those<strong> Hidden Layers </strong>onto the <strong>Output Layer</strong> using <strong>Neural Connections</strong></li><li>It is important to understand that the <strong>Input Layer</strong> is basically a layer of data ingestion</li><li>Add an <strong>Input Layer</strong> that is adapted to ingest your data (or adapt your data format to the pre-defined architecture)</li><li>Assemble many Artificial Neurons together in such a way that the <strong>output</strong> (axon) of a <strong>Neuron</strong> on a given <strong>Layer</strong> is (one of) the <strong>input</strong>(s) of another <strong>Neuron</strong> on a subsequent <strong>Layer</strong>. As a consequence, the <strong>Input Layer </strong>is linked to the <strong>Hidden Layers</strong>, which are then linked to the <strong>Output Layer</strong> (as shown in the picture below) using <strong>Neural Connections</strong> (also shown in the picture below).</li><li>Enjoy your meal</li></ol>
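<p>The recipe above can be sketched as code: stack hidden layers onto an output layer, with each neuron&#8217;s output feeding the next layer. This is a toy fully-connected network with made-up sizes and random weights, nothing like a real architecture such as vgg16:</p>

```python
import math
import random

random.seed(0)

def layer(n_inputs, n_neurons):
    """A fully-connected layer: one weight per input connection plus a bias
    for each neuron. Neurons within a layer are not linked together."""
    return [([random.uniform(-1, 1) for _ in range(n_inputs)], 0.0)
            for _ in range(n_neurons)]

def forward(inputs, layers):
    """Feed the ingested data through each hidden layer to the output layer."""
    for lyr in layers:
        inputs = [1 / (1 + math.exp(-(sum(w * x for w, x in zip(weights, inputs)) + bias)))
                  for weights, bias in lyr]
    return inputs

# Input of size 4, one hidden layer of 3 neurons, output layer with 2 classes.
network = [layer(4, 3), layer(3, 2)]
scores = forward([0.1, 0.2, 0.3, 0.4], network)
print(scores)  # two values, one per output class
```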



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62.png" alt="" class="wp-image-17934" width="476" height="371" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62.png 951w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62-300x234.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62-768x598.png 768w" sizes="auto, (max-width: 476px) 100vw, 476px" /><figcaption>simplified schema of a neural network architecture</figcaption></figure></div>



<h2 class="wp-block-heading">What does it mean to train an Artificial Neural Network ?</h2>



<p>All <strong>Neurons</strong> of a given <strong>Layer</strong> generate an <strong>Output</strong>, but they don&#8217;t all have the same <strong>Weight</strong> for the next<strong> Neurons Layer</strong>. This means that if a Neuron on a layer observes a given pattern, it might mean less for the overall picture, and will be partially or completely muted. This is what we call <strong>Weighting</strong>: a <strong>big weight means that the Input is important</strong>, and of course <strong>a small weight means that we should ignore it</strong>. Every <strong>Neural Connection</strong> between <strong>Neurons</strong> has <strong>an associated Weight</strong>.</p>



<p>And this is the magic of<strong> Neural Network Adaptability</strong>: <strong>Weights</strong> will be adjusted over the training to fit the <strong>objectives</strong> we have set (recognize that a dog is a dog and that a cat is a cat). <strong>In simple terms: Training a Neural Network means finding the appropriate Weights of the Neural Connections thanks to a feedback loop called Gradient Backpropagation&#8230; and that&#8217;s it, folks.</strong></p>



<h2 class="wp-block-heading">Parallel between Control Theory and Deep Learning Training</h2>



<p>The engineering field of <strong>control theory</strong> defines similar principles to the mechanism used for training neural networks.</p>



<h3 class="wp-block-heading">Control Theory general concepts</h3>



<p>In control systems, a <strong>setpoint</strong> is the target value<strong> for the system.</strong><br><br>A <strong>setpoint</strong> (<strong>input</strong>) is defined and then processed by a controller, which adjusts the setpoint&#8217;s value according to the feedback loop (<strong>Manipulated Variable</strong>). Once the <strong>setpoint</strong> has been <strong>adjusted</strong> it is then sent to the <strong>controlled system</strong> which will <strong>produce an output.</strong> This output is monitored using an appropriate metric which is then <strong>compared (comparator) to the original input </strong>via a <strong>feedback loop</strong>. This allows the <strong>controller</strong> to define the <strong>level of adjustment (Manipulated Variable) </strong>of the original setpoint.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-1024x381.jpeg" alt="" class="wp-image-17889" width="512" height="191" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-1024x381.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-300x112.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-768x286.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-1536x572.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-2048x762.jpeg 2048w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure>



<h3 class="wp-block-heading">Control Theory applied to a radiator</h3>



<p>Let&#8217;s take the example of a <strong>resistance (controlled system)</strong> in a radiator. Imagine you decide to <strong>set the room temperature to 20°C (setpoint)</strong>. The radiator starts up and supplies the <strong>resistance</strong> with a <strong>certain intensity</strong> defined by the <strong>controller</strong>. A <strong>probe (thermometer)</strong> then takes the ambient temperature (<strong>feedback elements</strong>), which is <strong>compared (comparator)</strong> <strong>to the setpoint </strong>(desired temperature); the <strong>controller</strong> then adjusts the electric intensity sent to the resistance. The adjustment of the new intensity is deployed via an <strong>incremental adjustment step</strong>.</p>
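<p>The radiator loop can be simulated in a few lines. This is a hypothetical proportional controller; the gain and heat-loss figures below are invented for illustration:</p>

```python
setpoint = 20.0  # desired room temperature in °C
temp = 10.0      # ambient temperature, read by the probe (feedback element)
gain = 0.3       # controller: how aggressively to correct the error

for _ in range(50):
    error = setpoint - temp               # comparator
    power = gain * error                  # manipulated variable (intensity)
    temp += power - 0.05 * (temp - 10.0)  # controlled system + heat loss

print(round(temp, 1))  # settles near, but slightly below, 20°C
```

<p>A purely proportional controller like this one settles slightly below the setpoint (a steady-state error); real thermostats add an integral term to close that gap.</p>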



<h3 class="wp-block-heading">Control Theory applied to Neural Network Training</h3>



<p>The training of a neural network is similar to the radiator insofar as the controlled system is the cat or dog detection model.<br><br>The objective is no longer to minimize the difference between the setpoint temperature and the actual temperature, but to <strong>minimize the error (Loss) between the classification of the incoming data (a cat is a cat) and the one made by the neural network.</strong><br><br>In order to achieve this, the system will have to look at the <strong>input</strong> (<strong>setpoint</strong>) and <strong>compute an output </strong>(<strong>controlled system</strong>) based on the parameters defined in the algorithm. This phase is called the<strong> forward pass.</strong></p>



<p>Once the <strong>output</strong> has been calculated, the system will <strong>re-propagate the evaluation error</strong> using <strong>Gradient Backpropagation </strong>(<strong>Feedback Elements</strong>). While the temperature difference between the setpoint and the thermometer was converted into electrical intensity for the radiator, here <strong>the system will adjust the weights of the different inputs of each neuron with a given step (learning rate)</strong>.</p>
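<p>Stripped of everything else, one weight adjustment is: compute the gradient of the error, then move the weight against it by the learning rate. A one-parameter sketch (the loss function here is arbitrary):</p>

```python
# Minimise loss(w) = (w - 3)**2, whose minimum sits at w = 3.
w = 0.0
learning_rate = 0.1

for _ in range(100):
    grad = 2 * (w - 3)         # backward pass: d(loss)/dw
    w -= learning_rate * grad  # adjust the weight with the given step

print(round(w, 3))  # converges towards 3.0
```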



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-1024x383.jpeg" alt="" class="wp-image-17888" width="512" height="192" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-1024x383.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-300x112.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-768x288.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-1536x575.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-2048x767.jpeg 2048w" sizes="auto, (max-width: 512px) 100vw, 512px" /><figcaption>Parallel between electrical engineering controlled system and neural network training process</figcaption></figure>



<h2 class="wp-block-heading">One thing to consider: The Valley Problem</h2>



<p>When training the system, the backward propagation will lead the system to reduce the error it&#8217;s making to best fit the objectives you have set (finding that a dog is a dog&#8230;).</p>



<p>Choosing the learning rate at which you will adjust your weights (what is called the<strong> adjustment step</strong> in <a rel="noreferrer noopener nofollow external" href="https://en.wikipedia.org/wiki/Control_theory" target="_blank" data-wpel-link="external">Control Theory</a>) is therefore critical.</p>



<p>Just as is the case in control theory, the control system can face several issues if it is not designed correctly:</p>



<ul class="wp-block-list"><li>If the <strong>correction step (learning rate)</strong> is too small, it will lead to very slow convergence (i.e. it will take a very long time to get your room to 20°C&#8230;).</li><li>Too small a <strong>learning rate</strong> can also leave you <strong>stuck in a local minimum</strong></li><li>If the <strong>correction step (learning rate)</strong> is too high, the system will never converge (it will beat around the bush) &#8211; i.e. the radiator will oscillate between being too hot and too cold</li><li>Worse, the system could enter a resonance state (<strong>divergence</strong>).</li></ul>
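<p>These regimes are easy to reproduce on a one-dimensional toy loss. The function and the rates below are chosen purely to expose the three behaviours:</p>

```python
def final_w(learning_rate, steps=100):
    """Gradient descent on loss(w) = w**2; the true minimum is w = 0."""
    w = 1.0
    for _ in range(steps):
        w -= learning_rate * 2 * w  # gradient of w**2 is 2*w
    return w

print(final_w(0.001))  # too small: w has barely moved after 100 steps
print(final_w(0.5))    # well chosen: lands exactly on the minimum
print(final_w(1.5))    # too high: w overshoots on every step and diverges
```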



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-787x1024.jpeg" alt="Why Training a Neural Network Is Hard" class="wp-image-17868" width="394" height="512" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-787x1024.jpeg 787w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-230x300.jpeg 230w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-768x1000.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-1180x1536.jpeg 1180w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-1574x2048.jpeg 1574w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-scaled.jpeg 1967w" sizes="auto, (max-width: 394px) 100vw, 394px" /></figure>



<h2 class="wp-block-heading">In the end, training an Artificial Neural Network (ANN) requires just a few steps:</h2>



<ol class="wp-block-list"><li>First an ANN will require a <strong>random weight initialization</strong></li><li>Split the dataset in <strong>batches</strong> <strong>(batch size)</strong></li><li>Send the batches one by one to the GPU</li><li>Calculate the<strong> forward pass</strong> (what would be the output with the current weights)</li><li>Compare the calculated output to the expected output <strong>(loss)</strong></li><li>Adjust the <strong>weights</strong> (using the <strong>learning rate</strong> increment or decrement) according to the <strong>backward pass (backward gradient propagation)</strong>.</li><li>Go back to step 2</li></ol>
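<p>The seven steps map almost one-to-one onto a training loop. Below is a minimal NumPy sketch on a toy regression task (the data, sizes and learning rate are illustrative, and the &#8220;send to GPU&#8221; of step 3 is of course omitted):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: learn y = 2*x from 100 examples.
X = rng.normal(size=(100, 1))
y = 2 * X

w = rng.normal(size=(1, 1))  # 1. random weight initialization
learning_rate = 0.1
batch_size = 10              # 2. split the dataset in batches

for epoch in range(20):                          # 7. back to step 2
    for start in range(0, len(X), batch_size):   # 3. take batches one by one
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        pred = xb @ w                            # 4. forward pass
        loss = ((pred - yb) ** 2).mean()         # 5. compare to expected (loss)
        grad = 2 * xb.T @ (pred - yb) / len(xb)  # 6. backward gradient
        w -= learning_rate * grad                #    adjust the weights

print(round(float(w[0, 0]), 2))  # close to the true value 2.0
```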



<h2 class="wp-block-heading">Further notice</h2>



<p>That’s all folks, you are now all set to read our future blog post which focuses on <strong>Distributed Training in a Deep Learning Context.</strong></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fwhat-does-training-neural-networks-mean%2F&amp;action_name=What%20does%20Training%20Neural%20Networks%20mean%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Understanding the anatomy of GPUs using Pokémon</title>
		<link>https://blog.ovhcloud.com/understanding-the-anatomy-of-gpus-using-pokemon/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Wed, 13 Mar 2019 16:25:32 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<guid isPermaLink="false">https://blog.ovh.com/fr/blog/?p=14482</guid>

					<description><![CDATA[Please welcome this beautiful new born in GPGPU Nvidia Family Ampere BLOG UPDATE FROM MAY 14, 2020 In the previous episode&#8230; In our previous blog post about&#160;Deep Learning, we explained that this technology is all about massive parallel matrix computation, and that these computations are simplistic operations: + and x. Fact 1:&#160; GPUs are good [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Funderstanding-the-anatomy-of-gpus-using-pokemon%2F&amp;action_name=Understanding%20the%20anatomy%20of%20GPUs%20using%20Pok%C3%A9mon&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>Please welcome this beautiful new born in GPGPU Nvidia Family <strong>Ampere</strong></p><cite>BLOG UPDATE FROM MAY 14, 2020</cite></blockquote>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="581" height="854" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.25.01.png" alt="" class="wp-image-18271" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.25.01.png 581w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.25.01-204x300.png 204w" sizes="auto, (max-width: 581px) 100vw, 581px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.36.57-1024x750.png" alt="" class="wp-image-18277" width="768" height="563" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.36.57-1024x750.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.36.57-300x220.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.36.57-768x562.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/Capture-d’écran-2020-05-15-à-17.36.57.png 1146w" sizes="auto, (max-width: 768px) 100vw, 768px" /><figcaption>Congratulations</figcaption></figure></div>



<h3 class="wp-block-heading">In the previous <a href="https://www.ovh.com/fr/blog/deep-learning-explained-to-my-8-year-old-daughter/" rel="nofollow" data-wpel-link="exclude">episode</a>&#8230;</h3>



<p>In our previous blog post about&nbsp;<a href="https://www.ovh.com/fr/blog/deep-learning-explained-to-my-8-year-old-daughter/" rel="nofollow" data-wpel-link="exclude">Deep Learning,</a> we explained that this technology is all about massive parallel matrix computation, and that these computations are simplistic operations: + and x.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="537" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/46A55CAC-42D2-4782-B6D8-03F9A8C49C40-1024x537.png" alt="" class="wp-image-18283" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/46A55CAC-42D2-4782-B6D8-03F9A8C49C40-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/46A55CAC-42D2-4782-B6D8-03F9A8C49C40-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/46A55CAC-42D2-4782-B6D8-03F9A8C49C40-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/46A55CAC-42D2-4782-B6D8-03F9A8C49C40.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<h3 class="wp-block-heading">Fact 1:&nbsp; GPUs are good for (drum roll)&#8230;</h3>



<p>Once you get that Deep Learning is just massive parallel matrix multiplications and additions, the magic happens. General Purpose Graphic Processing Units (GPGPU) (i.e. GPUs, or variants of GPUs, designed for something other than graphic processing) are perfect for&#8230;</p>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="475" height="263" src="/blog/wp-content/uploads/2019/02/ComplicatedOldIcterinewarbler-size_restricted.gif" alt="" class="wp-image-14672"/></figure></div>



<p>matrix multiplications and additions!</p>
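<p>The reason is that every cell of a matrix product is an independent multiply-and-add, so thousands of them can be computed simultaneously. A toy product makes this visible:</p>

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Each output cell is a small dot product, e.g. C[0][0] = 1*5 + 2*7 = 19,
# and no cell depends on any other cell's result: perfect for parallel
# hardware.
C = A @ B
print(C)
```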



<p>Perfect, isn&#8217;t it? But why? Let me tell you a little story&#8230;</p>



<h3 class="wp-block-heading">Fact 2: There was a time when GPUs were just GPUs</h3>



<p>Yes, you read that correctly&#8230;</p>



<p>The first GPUs in the 90s were designed in a very linear way. Engineers took the process used for graphical rendering and implemented it directly in the hardware.</p>



<p>To keep it simple, this is what a graphical rendering process looks like:</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/03/IMG_0148-1024x841.png" alt="Graphical rendering process" class="wp-image-15125" width="768" height="631" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_0148-1024x841.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_0148-300x246.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_0148-768x630.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_0148-1200x985.png 1200w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_0148.png 1871w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>Uses for GPUs included transformation, building lighting effects, building triangle setups and clipping, and integrating rendering engines at a scale that was not achievable at the time (tens of millions of polygons per second).</p>



<p>The first GPUs integrated the various steps of image processing and rendering in a linear way. Each part of the process had predefined hardware components associated with vertex shaders, tessellation modules, geometry shaders, etc.</p>



<p>In short, graphics cards were initially designed to perform graphical processing. What a surprise!</p>



<h3 class="wp-block-heading">Fact 3: CPUs are sports cars, GPUs are massive trucks</h3>



<p>As explained earlier, for image processing and rendering, you don&#8217;t want your image to be generated pixel by pixel &#8211; you want it in a single shot. That means that every pixel of the image &#8211; representing every object in the camera&#8217;s view, at a given time, in a given position &#8211; needs to be calculated at once.</p>



<p>This is in complete contrast to <strong>CPU</strong> logic, where operations are meant to be achieved in a sequential way. As a result, <strong>GPGPUs</strong> needed a massively parallel general-purpose architecture to be able to process all the points (vertex), build all the meshes (tessellation), build the lighting, perform the object transformation from the absolute referential, apply texture, and perform shading (I&#8217;m still probably missing some parts!). However, the purpose of this blog post is not to look in-depth at image processing and rendering, as we will do that in another blog post in the future.</p>



<p>As explained in our previous post, CPUs are like sports cars, able to calculate a chunk of data really fast with minimal latency, while GPUs are trucks, moving lots of data at once, but suffering from latency as a result.</p>



<p>Mythbusters made a nice video in which the two concepts of CPU and GPU are demonstrated side by side.</p>


<h3 class="wp-block-heading">Fact 4: 2006&nbsp;–&nbsp;NVIDIA killed the image processing Taylorism</h3>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://thumbs.gfycat.com/DirtyHastyAustrianpinscher-size_restricted.gif" alt="Image search result for &quot;modern times gif&quot;"/></figure></div>



<p>The previous method for performing image processing was done using specialised manpower (hardware) at every stage of the production line in the image factory.</p>



<p>This all changed in 2006, when NVIDIA decided to introduce General Purpose Graphical Processing Units using Arithmetic Logical Units (ALUs), aka CUDA cores, which were able to run multi-purpose computations (a bit like a Jean-Claude Van Damme of GPU computation units!).</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://media.lelombrik.net/t/64037621a78f86abb4c7c4e53a6c2b89/p/01.gif" alt=""/><figcaption>GoDaddy Commercial (2013) featuring Jean-Claude Van Damme Source : https://imgur.com/r/gifs/PvuZxBZ</figcaption></figure></div>



<p>Even today,&nbsp;<a href="https://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf" rel="nofollow external noopener noreferrer" data-wpel-link="external" target="_blank">modern GPU architectures</a> (such as Fermi, Kepler or Volta) are composed of non-general cores, named Special Function Units (SFUs), &nbsp;to run high-performance mathematical graphical operations, such as sin, cosine, reciprocal, and square root, as well as Texture Mapping Units (TMUs) for the high-dimension matrix operations involved in image texture mapping.</p>



<h3 class="wp-block-heading">Fact 5: GPGPUs can be explained simply with Pokémon!</h3>



<p>GPU architectures can seem difficult to understand at first, but trust me&#8230; they are not!</p>



<p>Here is my gift to you: a <a href="https://bulbapedia.bulbagarden.net/wiki/Pok%C3%A9dex" rel="nofollow external noopener noreferrer" data-wpel-link="external" target="_blank">Pokédex</a> to help you understand GPUs in simple terms.</p>



<h3 class="wp-block-heading">The&nbsp;<em>Micro-Architecture </em>Family</h3>



<h4 class="wp-block-heading">Here&#8217;s how you use it&#8230;</h4>



<p>You basically have four families of cards:</p>



<p>This family will already be known to many of you. We are, of course, talking about Fermi, Maxwell, Kepler, Volta, Ampere, etc.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/gpu-families-387x1024.jpg" alt="" class="wp-image-18293" width="387" height="1024" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/gpu-families-387x1024.jpg 387w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/gpu-families-113x300.jpg 113w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/gpu-families-768x2032.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/gpu-families-581x1536.jpg 581w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/gpu-families-774x2048.jpg 774w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/gpu-families-scaled.jpg 968w" sizes="auto, (max-width: 387px) 100vw, 387px" /><figcaption>A beautiful picture of the newborn with all the other families</figcaption></figure></div>



<h4 class="wp-block-heading">The <em>Architecture</em> Family</h4>



<p>This is the center, where the magic happens: orchestration, cache, workload scheduling&#8230; It&#8217;s the brain of the GPU.</p>



<figure class="wp-block-table"><table><tbody><tr><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15084" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.56.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.56.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.56-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15083" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.58.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.58.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.58-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15082" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.00.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.00.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.00-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td></tr></tbody></table></figure>



<h4 class="wp-block-heading">The <em>Multi-Core Units</em>&nbsp;<i>(aka CUDA Cores)&nbsp;</i>Family</h4>



<p>This represents the physical core, where the maths computations actually happen.</p>



<figure class="wp-block-table"><table><tbody><tr><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15081" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.02.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.02.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.02-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15080" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.04.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.04.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.04-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15079" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.06.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.06.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.06-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td></tr><tr><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15078" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.09.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.09.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.09-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter 
wp-image-15077" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.12.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.12.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.12-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15076" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.14.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.14.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.14-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td></tr><tr><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15075" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.16.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.16.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.16-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15074" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.26.png" alt="" width="200" height="263" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.26.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.30.26-228x300.png 228w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></td><td>&nbsp;</td></tr></tbody></table></figure>



<h4 class="wp-block-heading">The<em>&nbsp;Programming Model</em> Family</h4>



<p>The different layers of the programming model are used to abstract the GPU&#8217;s parallel computation for the programmer. They also make the code portable to any GPU architecture.</p>



<figure class="wp-block-table"><table><tbody><tr><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15089" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.46.png" alt="" width="342" height="450" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.46.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.46-228x300.png 228w" sizes="auto, (max-width: 342px) 100vw, 342px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15088" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.48.png" alt="" width="352" height="464" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.48.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.48-228x300.png 228w" sizes="auto, (max-width: 352px) 100vw, 352px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15087" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.49.png" alt="" width="346" height="456" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.49.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.49-228x300.png 228w" sizes="auto, (max-width: 346px) 100vw, 346px" /></figure></td></tr><tr><td><figure><img loading="lazy" decoding="async" class="aligncenter wp-image-15086" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.51.png" alt="" width="352" height="464" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.51.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.51-228x300.png 228w" sizes="auto, (max-width: 352px) 100vw, 352px" /></figure></td><td><figure><img loading="lazy" decoding="async" class="aligncenter 
wp-image-15085" src="/blog/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.53.png" alt="" width="344" height="453" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.53.png 764w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Screen-Shot-2019-03-08-at-14.29.53-228x300.png 228w" sizes="auto, (max-width: 344px) 100vw, 344px" /></figure></td></tr></tbody></table></figure>
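<p>To make the <em>Programming Model</em> cards less abstract: a CUDA kernel is launched over a grid of blocks, each block containing threads, and each thread computes &#8220;its&#8221; element from those indices. The snippet below simulates that indexing sequentially in plain Python (sizes are illustrative):</p>

```python
# CUDA-style indexing, simulated: a grid of blocks, each block holding
# threads; every "thread" handles exactly one element of the array.
block_dim = 4        # threads per block
grid_dim = 3         # blocks in the grid
a = list(range(12))  # grid_dim * block_dim elements
out = [0] * len(a)

for block_idx in range(grid_dim):        # on a GPU these all run...
    for thread_idx in range(block_dim):  # ...at the same time
        i = block_idx * block_dim + thread_idx  # global index, as in CUDA
        out[i] = a[i] * 2                       # per-thread work

print(out)  # every element doubled
```

<p>On real hardware the two loops do not exist: all twelve threads execute the body concurrently, which is exactly the massive parallelism discussed above.</p>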



<h3 class="wp-block-heading">How to play</h3>



<ol class="wp-block-list"><li>Start by choosing a card from the <em>Micro-Architecture</em> family</li><li>Look at the components, and choose the appropriate card from the <em>Architecture</em>&nbsp;family</li><li>Look at the components within the<em> Micro-Architecture</em> family and pick them from the <i>Multi-Core Units </i>family, then place them under the <em>Architecture</em>&nbsp;card</li><li>Now, if you want to know how to program a GPU, place the <i>Programming Model &#8211; Multi-Core Units</i>&nbsp;special card on top of the&nbsp;<em>Multi-Core Units&nbsp;</em>cards</li><li>Finally, on top of the <i>Programming Model &#8211; Multi-Core Units </i>special card, place all the <i>Programming Model&nbsp;</i>cards near the <em>SM</em></li><li>You should then have something that looks like this:</li></ol>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2139.jpg" alt="" class="wp-image-15108" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2139.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2139-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2139-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2139-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2139-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<h3 class="wp-block-heading">Examples of card configurations:</h3>



<h4 class="wp-block-heading">Fermi</h4>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2148.jpg" alt="" class="wp-image-15098" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2148.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2148-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2148-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2148-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2148-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<h4 class="wp-block-heading">Kepler</h4>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2147.jpg" alt="" class="wp-image-15100" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2147.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2147-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2147-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2147-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2147-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<h4 class="wp-block-heading">Maxwell</h4>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2146.jpg" alt="" class="wp-image-15101" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2146.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2146-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2146-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2146-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2146-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<h4 class="wp-block-heading">Pascal</h4>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2143.jpg" alt="" class="wp-image-15104" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2143.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2143-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2143-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2143-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2143-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<h4 class="wp-block-heading">Volta</h4>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2140.jpg" alt="" class="wp-image-15107" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2140.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2140-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2140-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2140-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2140-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<h4 class="wp-block-heading">Turing</h4>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="4032" height="3024" src="/blog/wp-content/uploads/2019/03/IMG_2145.jpg" alt="" class="wp-image-15102" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2145.jpg 4032w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2145-300x225.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2145-768x576.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2145-1024x768.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/IMG_2145-1200x900.jpg 1200w" sizes="auto, (max-width: 4032px) 100vw, 4032px" /></figure></div>



<p><br>After playing around with different&nbsp;<em>Micro-Architectures</em>, <em>Architectures</em> and <em>Multi-Core Units</em> for a bit, you should see that GPUs are just as simple as Pokémon!</p>



<p>Enjoy the attached PDF, which will allow you to print your own GPU Pokédex.&nbsp;You can download it here: <a href="https://www.ovh.com/blog/wp-content/uploads/2020/05/GPU-Cards-1.pdf" target="_blank" rel="noreferrer noopener" data-wpel-link="exclude">GPU Cards Game</a></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Deep Learning explained to my 8-year-old daughter</title>
		<link>https://blog.ovhcloud.com/deep-learning-explained-to-my-8-year-old-daughter/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Fri, 15 Feb 2019 14:56:56 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Neural networks]]></category>
		<guid isPermaLink="false">https://blog.ovh.com/fr/blog/?p=14481</guid>

					<description><![CDATA[Machine Learning and especially Deep Learning&#160;are hot topics and you are sure to have come across the buzzword &#8220;Artificial Intelligence&#8221; in the media. Yet these are not new concepts. The first Artificial Neural Network (ANN) was introduced in the 40s. So why all the recent interest around neural networks&#160;and Deep Learning?&#160; We will explore this [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p><strong>Machine Learning</strong> and especially <strong><a href="https://www.kdnuggets.com/2016/01/seven-steps-deep-learning.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Deep Learning</a></strong>&nbsp;are hot topics and you are sure to have come across the buzzword &#8220;Artificial Intelligence&#8221; in the media.</p>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="885" height="508" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0057.jpg" alt="Deep Learning: A new hype" class="wp-image-14620" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0057.jpg 885w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0057-300x172.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0057-768x441.jpg 768w" sizes="auto, (max-width: 885px) 100vw, 885px" /></figure></div>



<p>Yet these are not new concepts. The first <strong>Artificial Neural Network</strong> (ANN) was introduced in the 40s. So why all the recent interest around neural networks&nbsp;and Deep Learning?<strong>&nbsp;</strong></p>



<p>We will explore this and other concepts in a series of blog posts on&nbsp;<strong>GPUs and Machine Learning</strong>.</p>



<h2 class="wp-block-heading"><strong>YABAIR &#8211; Yet Another Blog About Image Recognition</strong></h2>



<p>In the 80s, I remember my father building character recognition for bank checks. He used primitives and derivatives based on pixel darkness levels. Examining so many different types of handwriting was a real pain, because he needed a single equation that applied to all the variations.</p>



<p>In the last few years, it has become clear that the best way to deal with this type of problem is through Convolutional Neural Networks. Equations designed by humans can no longer handle the infinite variety of handwriting patterns.</p>



<p>Let&#8217;s take a look at one of the most classic examples: building a number recognition system, a neural network to recognise handwritten digits.</p>



<h3 class="wp-block-heading">Fact 1: It&#8217;s as simple as counting</h3>



<p>We&#8217;ll start by counting how many times the small red shapes in the top row can be seen in each of the black, hand-written digits (in the left-hand column).</p>



<div class="wp-block-image wp-image-14651"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0067.jpg" alt="Simplified matrix for handwritten numbers" class="wp-image-14651" width="337" height="400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0067.jpg 674w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0067-253x300.jpg 253w" sizes="auto, (max-width: 337px) 100vw, 337px" /><figcaption>Simplified matrix for handwritten numbers</figcaption></figure></div>



<p>Now let&#8217;s try to recognise (infer) a new hand-written digit, by counting the number of matches with the same red shapes. We&#8217;ll then compare this to our previous table, in order to identify which number has the most correspondences:</p>



<div class="wp-block-image size-medium wp-image-14652"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0069.jpg" alt="Matching shapes for handwritten numbers " class="wp-image-14652" width="443" height="400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0069.jpg 885w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0069-300x271.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0069-768x693.jpg 768w" sizes="auto, (max-width: 443px) 100vw, 443px" /><figcaption>Matching shapes for handwritten numbers</figcaption></figure></div>



<p>Congratulations! You&#8217;ve just built the world&#8217;s simplest neural network system for recognising hand-written digits.</p>
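<p>As a sketch, this counting game fits in a few lines of Python. The shape counts below are invented for illustration (they are not the exact table from the figures above), but the logic is the same: count correspondences, then pick the digit with the most.</p>

```python
# Hypothetical table: for each digit, how many times each of three
# reference shapes (say: a loop, a vertical bar, a horizontal bar)
# appears. The numbers are invented for illustration.
REFERENCE_COUNTS = {
    0: (1, 0, 0),  # one loop
    1: (0, 1, 0),  # one vertical bar
    7: (0, 1, 1),  # a vertical bar plus a horizontal bar
    8: (2, 0, 0),  # two stacked loops
}

def recognise(shape_counts):
    """Return the digit whose reference counts best match the input:
    the score is the number of shape categories that agree."""
    def score(digit):
        return sum(1 for a, b in zip(REFERENCE_COUNTS[digit], shape_counts)
                   if a == b)
    return max(REFERENCE_COUNTS, key=score)

print(recognise((2, 0, 0)))  # two loops seen -> 8
```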



<h3 class="wp-block-heading">Fact 2: An image is just a matrix</h3>



<p class="graf graf--p">A computer views an image as a&nbsp;<strong>matrix</strong>. A black and white image is a 2D matrix.</p>



<p>Let&#8217;s consider an image. To keep it simple, let&#8217;s take a small, square, black and white image of an 8, 28 pixels on each side.</p>



<p>Every cell of the matrix represents the intensity of the pixel, from 0 (a black pixel) to 255 (a pure white pixel).</p>



<p>The image will therefore be represented as the following 28 x 28 pixel matrix.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="723" height="504" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/1_cLsTCWtUL1GYBUv8vnbOxw.jpeg" alt="Image of a handwritten 8 and the associated intensity matrix" class="wp-image-18492" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/1_cLsTCWtUL1GYBUv8vnbOxw.jpeg 723w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/1_cLsTCWtUL1GYBUv8vnbOxw-300x209.jpeg 300w" sizes="auto, (max-width: 723px) 100vw, 723px" /><figcaption>Image of a handwritten 8 and the associated intensity matrix</figcaption></figure></div>
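<p>In Python, such a matrix is just a list of rows. Here is a toy 4 x 4 crop rather than the full 28 x 28 (the values are made up):</p>

```python
# A toy 4x4 grayscale crop: each cell is a pixel intensity,
# 0 = black, 255 = pure white (values invented for illustration).
image = [
    [  0,  50, 200, 255],
    [ 30, 120, 180, 255],
    [ 10,  80, 220, 255],
    [  0,  40, 190, 255],
]

height, width = len(image), len(image[0])
print(height, width)  # -> 4 4

# Every pixel is within the 8-bit intensity range.
assert all(0 <= px <= 255 for row in image for px in row)
```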



<h3 class="wp-block-heading">Fact 3: Convolutional layers are just bat-signals</h3>



<p class="graf graf--p">To work out which pattern is displayed in a picture (in this case the handwritten 8) we will use a kind of bat-signal/flashlight. In machine learning, the flashlight is called a filter. The filter is used to perform a classic convolution matrix calculation, as found in common image-processing software such as&nbsp;<a href="https://docs.gimp.org/2.8/en/plug-in-convmatrix.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gimp</a>.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://media2.giphy.com/media/l0NwGpoOVLTAyUJSo/giphy.gif" alt="Batman bat-signal lighting up the sky"/></figure></div>



<p>The filter will <strong>scan the picture</strong>&nbsp;in order to <strong>find the pattern</strong> in the image, and will trigger <strong>positive feedback</strong> if a match is found. It works a bit like a toddler&#8217;s shape-sorting box: the triangle filter matches the triangle hole, the square filter matches the square hole, and so on.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://multimedia.bbycastatic.ca/multimedia/products/500x500/103/10319/10319838.jpg" alt="Image filters work like a child's shape-sorting box"/><figcaption>Image filters work like a child&#8217;s shape-sorting box</figcaption></figure></div>



<h3 class="wp-block-heading">Fact 4: Filter matching is an embarrassingly&nbsp;parallel task</h3>



<p class="graf graf--p">To be more scientific the image filtering process looks a bit like the animation below. As you can see, <strong>every step</strong> of the filter scanning is <strong>independent</strong>, which means that this task can be <strong>highly parallelised</strong>.</p>



<p>It&#8217;s important to note that <strong>tens of filters</strong>&nbsp;will be applied at the same time,&nbsp;<strong>in parallel</strong>, as none of them depend on each other.</p>



<div class="wp-block-image"><figure class="aligncenter"><a href="https://cdn-images-1.medium.com/max/800/0*rKUDc--RZg1v66wq" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img decoding="async" src="https://cdn-images-1.medium.com/max/800/0*rKUDc--RZg1v66wq" alt="Convolution Filter over an input image"/></a><figcaption>https://github.com/vdumoulin</figcaption></figure></div>



<h3 class="wp-block-heading">Fact 5: Just repeat the filtering operation (matrix convolution) as many times as possible</h3>



<p>We just saw that the input image/matrix is filtered using multiple matrix convolutions.</p>



<p>To improve the accuracy of the image recognition just take the filtered image from the previous operation and filter again and again and again&#8230;</p>



<p>Of course, we are oversimplifying things somewhat, but generally the more filters you apply, and the more you repeat this operation in sequence, the more precise your results will be.</p>



<p>It&#8217;s like creating new abstraction layers to get a clearer and clearer description of the object, going from primitive filters to filters that respond to edges, wheels, squares, cubes, and so on.</p>



<h3 class="wp-block-heading">Fact 6: Matrix convolutions are just <em>x</em>&nbsp;and <em>+</em></h3>



<p>An image is worth a thousand words: the following picture is a simplistic view of a source image (8×8) filtered with a convolution filter (3×3). The projection of the torch light (in this example a Sobel Gx Filter) provides one value.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://i.stack.imgur.com/YDusp.png" alt=""/><figcaption>Example of a convolution filter (Sobel Gx) applied to an input matrix (Source : https://datascience.stackexchange.com/questions/23183/why-convolutions-always-use-odd-numbers-as-filter-size/23186)</figcaption></figure></div>



<p>This is where the magic happens: these simple matrix operations are highly parallelisable, which fits perfectly with the use case for a General-Purpose Graphics Processing Unit.</p>
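<p>To make the multiply-and-add concrete, here is a minimal plain-Python sketch of that filtering step, using the Sobel Gx kernel from the figure and a small image with a vertical edge (note that, like deep learning frameworks, it slides the kernel without flipping it):</p>

```python
# Sobel Gx filter, as in the figure above.
SOBEL_GX = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def convolve(image, kernel):
    """Valid (no padding) 2D convolution: slide the kernel over the
    image; each output cell is a sum of element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical edge (dark left half, bright right half) lights up under Gx.
img = [[0, 0, 255, 255]] * 4
print(convolve(img, SOBEL_GX))  # -> [[1020, 1020], [1020, 1020]]
```

<p>Every output cell is computed from its own window, independently of all the others, which is exactly why the operation parallelises so well.</p>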



<h3 class="wp-block-heading">Fact 7: Need to simplify and summarise what&#8217;s been detected? Just use max()</h3>



<p class="graf graf--figure">We need to <strong>summarise</strong>&nbsp;what&#8217;s been detected by the filters in order to <strong>generalise the knowledge</strong>.</p>



<p class="graf graf--figure">To do so, we will sample the output of the previous filtering operation.</p>



<p class="graf graf--figure">This operation is called&nbsp;<strong>pooling</strong>&nbsp;or <strong>downsampling</strong>; in essence, it is about reducing the size of the matrix.</p>



<p class="graf graf--figure">You can use any reduction operation, such as max, min, average, count, median or sum.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://qph.fs.quoracdn.net/main-qimg-8afedfb2f82f279781bfefa269bc6a90.webp" alt=""/><figcaption>Example of a max pooling layer (Source : Stanford&#8217;s CS231n)</figcaption></figure></div>
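<p>A minimal sketch of max pooling in plain Python: split the matrix into non-overlapping 2 x 2 blocks and keep only the largest value of each block (the input values here are just an example):</p>

```python
def max_pool(matrix, size=2):
    """Downsample: keep the max of each non-overlapping size x size block."""
    return [
        [
            max(matrix[i + di][j + dj]
                for di in range(size)
                for dj in range(size))
            for j in range(0, len(matrix[0]), size)
        ]
        for i in range(0, len(matrix), size)
    ]

feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool(feature_map))  # -> [[6, 8], [3, 4]]
```

<p>Swapping <code>max</code> for <code>min</code>, <code>sum</code> or an average gives the other reduction variants.</p>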



<h3 class="wp-block-heading">Fact 8: Flatten everything to get on your feet</h3>



<p>Let&#8217;s not forget the main purpose of the neural network we are working on: building an image recognition system, also called <strong>image classification</strong>.</p>



<p>If the purpose of the neural network is to detect hand-written digits, there will be 10 classes at the end to map the input image to: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]</p>



<p>To map this input to a class after passing through all those filters and downsampling layers, we will have just 10 neurons (each representing one class), and each will connect to the last subsampled layer.</p>



<p>Below is an overview of the original LeNet-5 Convolutional Neural Network, designed by <a href="https://en.wikipedia.org/wiki/Yann_LeCun" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Yann LeCun</a>, one of the early adopters of this technology for image recognition.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="713" height="213" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/Architecture-of-CNN-by-LeCun-et-al-LeNet5.png" alt="" class="wp-image-18491" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/Architecture-of-CNN-by-LeCun-et-al-LeNet5.png 713w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/Architecture-of-CNN-by-LeCun-et-al-LeNet5-300x90.png 300w" sizes="auto, (max-width: 713px) 100vw, 713px" /><figcaption>LeNet-5 architecture published in the original paper (source : http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf).</figcaption></figure></div>
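<p>The flattening step and the final 10 class neurons can be sketched like this. The weights below are random placeholders, just to show the shape of the computation; in a trained network like LeNet-5 they would have been learned:</p>

```python
import random

random.seed(0)  # fixed seed so the placeholder weights are reproducible

def flatten(matrix):
    """Turn a 2D feature map into a flat list of values."""
    return [value for row in matrix for value in row]

# Four values left after the filtering and pooling stages (made up).
features = flatten([[0.1, 0.9], [0.4, 0.7]])

# One weight vector per class 0..9. These are random placeholders;
# a trained network would have learned them via backpropagation.
weights = [[random.uniform(-1, 1) for _ in features] for _ in range(10)]

# Each class neuron's score is just a weighted sum of the features.
scores = [sum(w * f for w, f in zip(ws, features)) for ws in weights]
predicted_digit = scores.index(max(scores))
print(predicted_digit)
```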



<h3 class="graf graf--figure wp-block-heading"><b>Fact 9: Deep Learning is just LEAN &#8211; continuous&nbsp;improvement based on a feedback loop</b></h3>



<p class="graf graf--figure">The beauty of the technology does not come from the convolution alone, but from the network&#8217;s capacity to learn and adapt by itself. By implementing a feedback loop called&nbsp;<em><strong><a href="https://en.wikipedia.org/wiki/Backpropagation" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">backpropagation</a></strong></em>, the network will amplify or inhibit some &#8220;neurons&#8221; in the different layers using&nbsp;<span style="text-decoration: underline;"><em><strong><a href="https://www.quora.com/What-does-weight-mean-in-terms-of-neural-networks" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">weights</a></strong></em></span>.</p>



<p class="graf graf--figure">Let&#8217;s KISS (keep it simple): we look at the output of the network, if the guess (the output 0,1,2,3,4,5,6,7,8 or 9) is wrong, we look at which filter(s) &#8220;made a mistake&#8221;, we give this filter or filters a small weight so they will not make the same mistake next time. And voila! The system learns and keeps improving itself.</p>



<h3 class="wp-block-heading"><b>Fact 10: It all amounts to the fact that Deep Learning is embarrassingly&nbsp;parallel</b></h3>



<p>Ingesting thousands of images, running tens of filters, applying downsampling, flattening the output&#8230; all of these steps can be done in parallel, which makes the system <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">embarrassingly parallel</a>. &#8220;Embarrassingly&#8221; here really means <strong><em>perfectly parallel</em></strong>, and it is an ideal use case for <em><strong>GPGPUs (General-Purpose Graphics Processing Units)</strong></em>, which&nbsp;are built for massively parallel computing.</p>



<h3 class="wp-block-heading"><strong>Fact 11: Need more precision? Just go deeper</strong></h3>



<p>Of course, this is a bit of an oversimplification, but if we look at the main &#8220;image recognition competition&#8221;, known as the <a href="https://en.wikipedia.org/wiki/ImageNet#ImageNet_Challenge" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ImageNet challenge</a>, we can see that the error rate has decreased as networks have got deeper. It is generally acknowledged that, among other factors, the depth of the network leads to a better capacity for generalisation and precision.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://cdn-images-1.medium.com/max/800/1*DBXf6dzNB78QPHGDofHA4Q.png" alt=""/><figcaption>Imagenet competition winner error rates VS number of layers in the network (source : https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5)</figcaption></figure></div>



<h3 class="wp-block-heading"><strong>In conclusion&nbsp;&nbsp;</strong></h3>



<p>We have taken a brief look at the concept of Deep Learning as applied to image recognition. It&#8217;s worth noting that almost every new architecture for image recognition (medical, satellite, autonomous driving, &#8230;) uses these same principles, with a different number of layers, different types of filters, different initialisation points, different matrix sizes and different tricks (such as image augmentation, dropout, weight compression, &#8230;). The concepts remain the same:</p>



<div class="wp-block-image wp-image-14654"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="885" height="469" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0070.jpg" alt="Number detection process" class="wp-image-14654" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0070.jpg 885w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0070-300x159.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0070-768x407.jpg 768w" sizes="auto, (max-width: 885px) 100vw, 885px" /><figcaption>Number detection process</figcaption></figure></div>



<p>In other words, we saw that the training and inference of deep learning models come down to lots and lots of basic matrix operations that can be done in parallel, and this is exactly what our good old graphics processors (GPUs) are made for.</p>



<p>In the next post, we will look at exactly how a GPU works and how deep learning is implemented on it.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
