<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Neural networks Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/neural-networks/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/neural-networks/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Wed, 10 Jun 2020 12:42:23 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Neural networks Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/neural-networks/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Distributed Training in a Deep Learning Context</title>
		<link>https://blog.ovhcloud.com/distributed-training-in-a-deep-learning-context/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Tue, 05 May 2020 10:14:07 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Neural networks]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17871</guid>

					<description><![CDATA[Previously on OVHcloud Blog &#8230; In previous blog posts we have discussed a high level approach to deep learning as well as what is meant by &#8216;training&#8217; in relation to Deep Learning. Following the article, I had lots of questions entering my twitter inbox, especially regarding how GPUs actually works. I decided, therefore, to write [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdistributed-training-in-a-deep-learning-context%2F&amp;action_name=Distributed%20Training%20in%20a%20Deep%20Learning%20Context&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="537" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png" alt="Distributed Learning in a Deep Learning context" class="wp-image-18106" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D.png 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<h3 class="wp-block-heading">Previously on OVHcloud Blog &#8230;</h3>



<p>In previous blog posts we have discussed a <a href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" data-wpel-link="exclude">high level approach to deep learning</a> as well as what is meant by &#8216;training&#8217; in relation to Deep Learning.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" data-wpel-link="exclude"><img decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png" alt="" class="wp-image-18099" width="374" height="273" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png 748w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028-300x219.png 300w" sizes="(max-width: 374px) 100vw, 374px" /></a></figure></div>



<p>Following that article, I received lots of questions in my Twitter inbox, especially regarding how GPUs actually work.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="410" height="157" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/image.png" alt="" class="wp-image-17882" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/image.png 410w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/image-300x115.png 300w" sizes="(max-width: 410px) 100vw, 410px" /><figcaption>Don&#8217;t worry it&#8217;s a friend, he is ok with me sharing the DM 😉</figcaption></figure></div>



<p>I decided, therefore, to write an article on how GPUs work:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/understanding-the-anatomy-of-gpus-using-pokemon/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png" alt="" class="wp-image-18103" width="334" height="254" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png 668w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED-300x228.png 300w" sizes="auto, (max-width: 334px) 100vw, 334px" /></a></figure></div>



<p>During our R&amp;D process around hardware and AI models, the question of distributed training came up (quickly). But before looking in-depth at distributed training, I invite you to read the following article to understand how Deep Learning training actually works:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/what-does-training-neural-networks-mean/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png" alt="What does training neural networks mean?" class="wp-image-17932" width="476" height="249" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1.png 1200w" sizes="auto, (max-width: 476px) 100vw, 476px" /></a></figure></div>



<p>As previously discussed, Neural Network training depends on:</p>



<ul class="wp-block-list"><li>Input Data</li><li>Neural Network architecture composed of &#8216;Layers&#8217;</li><li>Weights</li><li>Learning Rate (step used to adjust neural network weights)</li></ul>
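<p>As a toy illustration (plain Python, not tied to any framework; all names here are made up for the example), those ingredients boil down to a loop that nudges a weight by a learning-rate-sized step:</p>

```python
# Toy illustration of the ingredients: input data, a minimal "network"
# (a single weight acting as one layer), and a learning rate.
def train_step(w, x, target, learning_rate):
    prediction = w * x                   # forward pass through the one-weight network
    error = prediction - target          # how far off we are
    gradient = 2 * error * x             # derivative of the squared error w.r.t. w
    return w - learning_rate * gradient  # adjust the weight by a learning-rate step

w = 0.0  # initial weight
for _ in range(100):
    w = train_step(w, x=2.0, target=6.0, learning_rate=0.05)
print(round(w, 3))  # the weight converges to 3.0, since 3.0 * 2.0 == 6.0
```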



<h2 class="wp-block-heading">Why do we need distributed learning?</h2>



<p>Deep Learning is mainly used for learning patterns in unstructured data. <strong>Unstructured data &#8211; such as text corpora, images, video or sound &#8211; can represent a huge amount of data to train on.</strong></p>



<p>Training on such a corpus can take days or even weeks because of the size of the data and/or the size of the network.</p>



<p>Multiple distributed learning approaches can be considered.</p>



<h2 class="wp-block-heading">The different Distributed Learning approaches</h2>



<p>There are two main categories for distributed training when it comes to Deep Learning and both of them are based on the <strong><a rel="noreferrer noopener nofollow external" href="https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm" target="_blank" data-wpel-link="external">divide and conquer paradigm.</a></strong></p>



<p>The first category is named <strong>&#8220;Distributed Data Parallelism&#8221;</strong>, where the <strong>data is split across multiple GPUs</strong>.</p>



<p>The second category is called <strong>&#8220;Model Parallelism&#8221;</strong>, where the deep learning <strong>model is split across multiple GPUs</strong>.</p>



<p>However, <strong>Distributed Data Parallelism</strong> is the most common approach, as it <strong>fits almost any problem</strong>. The second approach has some serious technical limitations related to model splitting. Splitting a model is a highly technical exercise, as you need to know the space used by each part of the network in the <strong>DRAM</strong> of the GPU. Once you have the <strong>DRAM usage per slice</strong>, you need to enforce the computation by <strong>hard-coding the placement of Neural Network Layers onto the desired GPUs</strong>. <strong>This approach makes the training hardware-dependent</strong>, as the DRAM may vary from one GPU to another, while <strong>Distributed Data Parallelism</strong> just requires <strong>data size adjustments (usually the batch size), which is relatively simple</strong>.</p>
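<p>To make the more common approach concrete, here is a schematic of Distributed Data Parallelism in plain Python, with the "GPUs" reduced to plain functions and the model reduced to a single weight (an illustrative sketch, not a real multi-GPU implementation):</p>

```python
# Schematic of Distributed Data Parallelism: every "GPU" (worker) holds a full
# copy of the model (here a single weight) and sees only a shard of each batch.
def shard_gradient(w, shard):
    # mean gradient of the squared error (w*x - y)^2 over the shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2x
n_workers = 2
shards = [batch[i::n_workers] for i in range(n_workers)]  # the data is split

w = 0.0
for _ in range(200):
    grads = [shard_gradient(w, s) for s in shards]  # each worker computes locally
    w -= 0.05 * sum(grads) / n_workers              # average the corrections, step
print(round(w, 3))  # recovers the slope 2.0, as full-batch training would
```

<p>With equal-sized shards, averaging the per-shard gradients is exactly the full-batch gradient, which is why this approach fits almost any problem without touching the model itself.</p>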



<p>The <strong>Distributed Data Parallelism</strong> model has two variants, each of which has its advantages and disadvantages. The first variant trains the model with <strong>synchronous weight adjustment</strong>. That is to say, <strong>each training batch on each GPU returns the corrections</strong> that need to be made to the model in order for it to be trained, and the system <strong>has to wait until all the workers have finished their task before producing a new set of weights</strong> to be used for the next training batch.</p>



<p>The second variant lets you work in an <strong>asynchronous way</strong>. This means each batch on each GPU reports the corrections that need to be made to the neural network, and the <strong>weights coordinator</strong> sends a <strong>new set of weights</strong> <strong>without waiting for the other GPUs to finish training their own</strong> batch.</p>
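<p>The two variants can be contrasted with a toy simulation (plain Python again, with the workers and the weights coordinator simulated in-process; all names are illustrative):</p>

```python
import random
random.seed(0)

def grad(w):          # gradient of the toy loss (w - 5)^2
    return 2 * (w - 5)

# Synchronous variant: the weights coordinator waits for all 4 workers,
# averages their corrections, then publishes one new set of weights per round.
w_sync = 0.0
for _ in range(50):
    grads = [grad(w_sync) for _ in range(4)]  # every worker sees the same weights
    w_sync -= 0.1 * sum(grads) / len(grads)

# Asynchronous variant: each worker pushes its correction as soon as it is
# done, possibly computed from a slightly stale copy of the weights.
w_async, stale_copy = 0.0, 0.0
for _ in range(50):
    for _ in range(4):
        w_async -= 0.1 * grad(stale_copy)     # applied without waiting for others
        if random.random() < 0.5:
            stale_copy = w_async              # this worker refreshes its copy
print(round(w_sync, 2))  # converges to 5.0
print(round(w_async, 2)) # also approaches 5, despite the staleness
```

<p>The trade-off shown here is the real one: the synchronous variant wastes time waiting for the slowest worker, while the asynchronous variant applies corrections computed from stale weights.</p>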



<h2 class="wp-block-heading">3 cheat sheets to better understand Distributed Deep Learning</h2>



<p>In these cheat sheets, let&#8217;s assume you&#8217;re using Docker with a volume attached.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="942" height="1024" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/type1-942x1024.png" alt="" class="wp-image-18048" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1-942x1024.png 942w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1-276x300.png 276w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1-768x835.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type1.png 1004w" sizes="auto, (max-width: 942px) 100vw, 942px" /></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="904" height="1024" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/type2-904x1024.png" alt="" class="wp-image-18049" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2-904x1024.png 904w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2-265x300.png 265w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2-768x870.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/type2.png 945w" sizes="auto, (max-width: 904px) 100vw, 904px" /></figure>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1543" height="2182" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/distrib-training1.jpeg" alt="" class="wp-image-18036" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1.jpeg 1543w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-212x300.jpeg 212w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-724x1024.jpeg 724w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-768x1086.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-1086x1536.jpeg 1086w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/distrib-training1-1448x2048.jpeg 1448w" sizes="auto, (max-width: 1543px) 100vw, 1543px" /></figure>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/63EDA175-2E61-4AC2-9157-97C18A973B78.png" alt="" class="wp-image-18096" width="320" height="322" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/63EDA175-2E61-4AC2-9157-97C18A973B78.png 640w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/63EDA175-2E61-4AC2-9157-97C18A973B78-150x150.png 150w" sizes="auto, (max-width: 320px) 100vw, 320px" /><figcaption>Now you need to choose your Distributed Training strategy (wisely)</figcaption></figure></div>






<h2 class="wp-block-heading">Further Readings</h2>



<p>While we have covered a lot in this blog post, we have not covered nearly all the aspects of distributed Deep Learning training &#8211; including prior work, history and the associated mathematics.</p>



<p>I highly suggest that you read the great paper <em><a href="https://stanford.edu/~rezab/classes/cme323/S16/projects_reports/hedge_usmani.pdf" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Parallel and Distributed Deep Learning</a></em> by <strong>Vishakh Hegde</strong> and <strong>Sheema</strong> <strong>Usmani</strong> (both from Stanford University).</p>



<p>As well as the article <em><a href="https://arxiv.org/pdf/1802.09941.pdf" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis</a></em> written by <strong>Tal Ben-Nun</strong> and <strong>Torsten Hoefler</strong> of ETH Zurich, Switzerland. I suggest that you start by jumping to <strong>section 6.3</strong>.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdistributed-training-in-a-deep-learning-context%2F&amp;action_name=Distributed%20Training%20in%20a%20Deep%20Learning%20Context&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>What does Training Neural Networks mean?</title>
		<link>https://blog.ovhcloud.com/what-does-training-neural-networks-mean/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Wed, 22 Apr 2020 16:37:25 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Neural networks]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17859</guid>

					<description><![CDATA[In a previous blog post we discussed general concepts surrounding Deep Learning. In this blog post, we will go deeper into the basic concepts of training a (deep) Neural Network. Where does &#8220;Neural&#8221; comes from ? As you should know, a biological neuron is composed of multiple dendrites, a nucleus and a axon (if only [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fwhat-does-training-neural-networks-mean%2F&amp;action_name=What%20does%20Training%20Neural%20Networks%20mean%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p>In a previous<a rel="noreferrer noopener" href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" target="_blank" data-wpel-link="exclude"> blog post</a> we discussed general concepts surrounding Deep Learning. In this blog post, we will go deeper into the basic concepts of training a (deep) Neural Network.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="538" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png" alt="" class="wp-image-17932" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1.png 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Where does &#8220;Neural&#8221; come from?</h2>



<p>As you should know, a <strong>biological neuron</strong> is composed of multiple <strong>dendrites</strong>, a <strong>nucleus</strong> and an <strong>axon</strong> (if only you had paid attention in your biology classes). When a stimulus is sent to the brain, it is received through the <strong>synapses</strong> located at the extremities of the dendrites.</p>



<p>When a <strong>stimulus</strong> arrives at the brain, it is transmitted to the neuron via the <strong>synaptic receptors</strong>, which <strong>adjust the strength of the signal sent to the nucleus</strong>. The message is <strong>transported</strong> by the <strong>dendrites</strong> to the <strong>nucleus</strong>, to then be <strong>processed</strong> in <strong>combination</strong> with other signals emanating from receptors on the other dendrites. <strong>Thus the combination of all these signals takes place in the nucleus.</strong> After processing all these signals, <strong>the nucleus emits an output signal through its single axon</strong>. The axon then streams this signal to several other downstream neurons via its <strong>axon terminations</strong>. Thus a neuron&#8217;s analysis is pushed to the subsequent layers of neurons. When you are confronted with the complexity and efficiency of this system, you can only imagine the millennia of biological evolution that brought us here.</p>



<p>On the other hand, <strong>artificial neural networks</strong> are built on the principle of bio-mimicry. <strong>External stimuli (the data)</strong>, whose signal strength is adjusted by the <strong>neuronal weights</strong> (remember the <strong>synapse</strong>?), <strong>circulate to the neuron</strong> (the place where the mathematical calculation happens) via the dendrites. The result of the calculation &#8211; called the <strong>output</strong> &#8211; is then re-transmitted (via the axon) to several other neurons on subsequent layers, where it is combined with other signals, and so on.</p>



<p>Therefore, there is a clear parallel between biological neurons and artificial neural networks, as presented in the figure below.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6-699x1024.png" alt="" class="wp-image-17933" width="350" height="512" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6-699x1024.png 699w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6-205x300.png 205w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9A5100D7-A350-46FA-B1EB-190CFE0E9AF6.png 747w" sizes="auto, (max-width: 350px) 100vw, 350px" /><figcaption>Based on https://medium.com/swlh/learning-paradigms-in-neural-networks-30854975aa8d</figcaption></figure></div>



<h2 class="wp-block-heading">The Artificial Neural Network Recipe</h2>



<p>To build a good Artificial Neural Network (ANN) you will need the following ingredients:</p>



<h4 class="wp-block-heading">Ingredients:</h4>



<ul class="wp-block-list"><li><strong>Artificial Neurons</strong> (processing nodes) composed of:<ul><li>(many) <strong>input</strong> neuron connection(s) (dendrites)</li><li>a <strong>computation unit</strong> (nucleus) composed of:<ul><li>a <strong>linear function</strong> (ax+b)</li><li>an <strong>activation function</strong> (equivalent to the <strong>synapse</strong>)</li></ul></li><li>an <strong>output</strong> (axon)</li></ul></li></ul>
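<p>As a sketch of how these ingredients fit together (plain Python, with a sigmoid standing in for the activation function; the numbers are arbitrary):</p>

```python
import math

def neuron(inputs, weights, bias):
    # computation unit (nucleus): a linear function a.x + b over the inputs
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # activation function (the artificial "synapse"), here a sigmoid
    # squashing the output (axon) between 0 and 1
    return 1 / (1 + math.exp(-z))

# two input connections (dendrites), each with its own weight
print(neuron(inputs=[0.5, -1.0], weights=[0.8, 0.2], bias=0.1))
```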



<h2 class="wp-block-heading">Preparing an ANN for image classification training:</h2>



<ol class="wp-block-list"><li>Decide on the <strong>number of output classes</strong> (meaning the number of image classes &#8211; for example, two for cat vs dog)</li><li>Draw as many computation units as the <strong>number of output classes</strong> (congrats, you just created the <strong>Output Layer</strong> of the ANN)</li><li>Add as many <strong>Hidden Layers</strong> as needed within the defined <strong>architecture</strong> (for instance <a rel="noreferrer noopener nofollow external" href="https://neurohive.io/en/popular-networks/vgg16/" target="_blank" data-wpel-link="external">vgg16</a> or <a rel="noreferrer noopener nofollow external" href="https://neurohive.io/en/popular-networks/" target="_blank" data-wpel-link="external">any other popular architecture</a>). Tip &#8211; <strong>Hidden Layers</strong> are just sets of neighbouring <strong>Compute Units</strong>; the units within a layer are not linked together.</li><li>Stack those <strong>Hidden Layers</strong> onto the <strong>Output Layer</strong> using <strong>Neural Connections</strong></li><li>It is important to understand that the <strong>Input Layer</strong> is basically a layer of data ingestion</li><li>Add an <strong>Input Layer</strong> that is adapted to ingest your data (or adapt your data format to the pre-defined architecture)</li><li>Assemble many Artificial Neurons together in a way where the <strong>output</strong> (axon) of a <strong>Neuron</strong> on a given <strong>Layer</strong> is one of the <strong>inputs</strong> of another <strong>Neuron</strong> on a subsequent <strong>Layer</strong>. As a consequence, the <strong>Input Layer</strong> is linked to the <strong>Hidden Layers</strong>, which are then linked to the <strong>Output Layer</strong> (as shown in the picture below) using <strong>Neural Connections</strong> (also shown in the picture below).</li><li>Enjoy your meal</li></ol>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62.png" alt="" class="wp-image-17934" width="476" height="371" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62.png 951w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62-300x234.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/9C2D3CB9-4385-4348-963A-3DF79E3C3C62-768x598.png 768w" sizes="auto, (max-width: 476px) 100vw, 476px" /><figcaption>simplified schema of a neural network architecture</figcaption></figure></div>



<h2 class="wp-block-heading">What does it mean to train an Artificial Neural Network ?</h2>



<p>All <strong>Neurons</strong> of a given <strong>Layer</strong> generate an <strong>Output</strong>, but they don&#8217;t all have the same <strong>Weight</strong> for the next <strong>Neuron Layer</strong>. This means that if a Neuron on a layer observes a given pattern, it might mean less for the overall picture, and will be partially or completely muted. This is what we call <strong>Weighting</strong>: a <strong>big weight means that the Input is important</strong> and, of course, <strong>a small weight means that we should ignore it</strong>. Every <strong>Neural Connection</strong> between <strong>Neurons</strong> has <strong>an associated Weight</strong>.</p>



<p>And this is the magic of<strong> Neural Network Adaptability</strong>: <strong>Weights</strong> will be adjusted over the training to fit the <strong>objectives</strong> we have set (recognize that a dog is a dog and that a cat is a cat). <strong>In simple terms: Training a Neural Network means finding the appropriate Weights of the Neural Connections thanks to a feedback loop called Gradient Backward propagation &#8230; and that&#8217;s it</strong> <strong>folks.</strong></p>
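<p>In plain Python, that feedback loop for a single neuron with a single connection looks something like this (a toy sketch; the correction is fed backward through the sigmoid via the chain rule):</p>

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# One neuron, one connection: training means finding the connection weight w
# such that the neuron's output matches the target for the given input.
w, learning_rate = 0.5, 1.0
for _ in range(1000):
    out = sigmoid(w * 1.0)                # forward pass on input x = 1.0
    error = out - 0.9                     # we want the neuron to output 0.9
    grad = error * out * (1 - out) * 1.0  # backward pass: chain rule through the sigmoid
    w -= learning_rate * grad             # weight adjustment by a learning-rate step
print(round(sigmoid(w), 2))  # the trained neuron now outputs roughly 0.9
```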



<h2 class="wp-block-heading">Parallel between Control Theory and Deep Learning Training</h2>



<p>The engineering field of <strong>control theory</strong> defines similar principles to the mechanism used for training neural networks.</p>



<h3 class="wp-block-heading">Control Theory general concepts</h3>



<p>In control systems, a <strong>setpoint</strong> is the target value<strong> for the system.</strong><br><br>A <strong>setpoint</strong> (<strong>input</strong>) is defined and then processed by a controller, which adjusts the setpoint&#8217;s value according to the feedback loop (<strong>Manipulated Variable</strong>). Once the <strong>setpoint</strong> has been <strong>adjusted</strong> it is then sent to the <strong>controlled system</strong> which will <strong>produce an output.</strong> This output is monitored using an appropriate metric which is then <strong>compared (comparator) to the original input </strong>via a <strong>feedback loop</strong>. This allows the <strong>controller</strong> to define the <strong>level of adjustment (Manipulated Variable) </strong>of the original setpoint.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-1024x381.jpeg" alt="" class="wp-image-17889" width="512" height="191" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-1024x381.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-300x112.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-768x286.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-1536x572.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/E4ADE5AC-8E87-47D2-A575-90B7D703A512_1_201_a-2048x762.jpeg 2048w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure>



<h3 class="wp-block-heading">Control Theory applied to a radiator</h3>



<p>Let&#8217;s take the example of a <strong>resistance (controlled system)</strong> in a radiator. Imagine you decide to <strong>set the room temperature to 20°C (setpoint)</strong>. The radiator starts up and supplies the <strong>resistance</strong> with a <strong>certain intensity</strong> defined by the <strong>controller</strong>. A <strong>probe (thermometer)</strong> then takes the ambient temperature (<strong>feedback elements</strong>), which is <strong>compared (comparator)</strong> <strong>to the setpoint</strong> (desired temperature) and adjusts <strong>(controller)</strong> the electric intensity sent to the resistance. The adjustment of the new intensity is deployed via an <strong>incremental adjustment step</strong>.</p>
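<p>A minimal simulation of that loop (plain Python, with a simple proportional controller standing in for the real electronics) shows the temperature settling at the setpoint:</p>

```python
# Toy proportional controller: the feedback loop compares the thermometer
# reading to the setpoint and adjusts the heating intensity accordingly.
setpoint = 20.0       # desired room temperature
temperature = 12.0    # current room temperature (feedback element)
for _ in range(60):
    error = setpoint - temperature  # comparator
    intensity = 0.2 * error         # controller: proportional adjustment step
    temperature += intensity        # controlled system: the room heats (or cools)
print(round(temperature, 1))  # settles at 20.0, the setpoint
```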



<h3 class="wp-block-heading">Control Theory applied to Neural Network Training</h3>



<p>The training of a neural network is similar to a radiator insofar as the controlled system is the cat or dog detection model.<br><br>The objective is no longer to minimise the difference between the setpoint temperature and the actual temperature, but to <strong>minimise the error (Loss) between the classification of the incoming data (a cat is a cat) and the one made by the neural network.</strong><br><br>In order to achieve this, the system will have to look at the <strong>input</strong> (<strong>setpoint</strong>) and <strong>compute an output</strong> (<strong>controlled system</strong>) based on the parameters defined in the algorithm. This phase is called the <strong>forward pass</strong>.</p>



<p><br>Once the <strong>output</strong> has been calculated, the system will <strong>re-propagate the evaluation error</strong> using <strong>Gradient Retro-propagation </strong>(<strong>Feedback Elements</strong>). While the temperature difference between the setpoint and the thermometer was converted into electrical intensity for the radiator, here <strong>the system will adjust the weights of the different inputs into each neuron with a given step (learning rate)</strong>.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-1024x383.jpeg" alt="" class="wp-image-17888" width="512" height="192" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-1024x383.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-300x112.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-768x288.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-1536x575.jpeg 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/BFB7FD92-22D8-4600-A211-14A29E191A70_1_201_a-2048x767.jpeg 2048w" sizes="auto, (max-width: 512px) 100vw, 512px" /><figcaption>Parallel between electrical engineering controlled system and neural network training process</figcaption></figure>



<h2 class="wp-block-heading">One thing to consider: The Valley Problem</h2>



<p>When training the system, the backward propagation will lead the system to reduce the error it&#8217;s making to best fit the objectives you have set (finding that a dog is a dog&#8230;).</p>



<p>Choosing the learning rate at which you adjust your weights (what one calls the<strong> adjustment step</strong> in <a rel="noreferrer noopener nofollow external" href="https://en.wikipedia.org/wiki/Control_theory" target="_blank" data-wpel-link="external">Control Theory</a>) is therefore critical.</p>



<p>Just as is the case in control theory, the control system can face several issues if it is not designed correctly:</p>



<ul class="wp-block-list"><li>If the <strong>correction step (learning rate)</strong> is too small, it will lead to very slow convergence (i.e. it will take a very long time to get your room to 20°C&#8230;).</li><li>Too small a <strong>learning rate</strong> can also leave you <strong>stuck in a local minimum</strong></li><li>If the <strong>correction step (learning rate)</strong> is too high, the system will never converge (it will beat around the bush) or worse (i.e. the radiator will oscillate between being either too hot or too cold)</li><li>The system could enter a resonance state (<strong>divergence</strong>).</li></ul>
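<p>You can observe these failure modes on the simplest possible system: gradient descent on f(x) = x&#178; (a toy sketch in plain Python, with arbitrary learning rates chosen for the demonstration):</p>

```python
def descend(learning_rate, steps=50):
    """Gradient descent on f(x) = x^2 (gradient 2x), starting from x = 10."""
    x = 10.0
    for _ in range(steps):
        x -= learning_rate * 2 * x  # correction step scaled by the learning rate
    return x

print(descend(0.001))  # too small: after 50 steps, still far from the minimum at 0
print(descend(0.4))    # reasonable: converges very close to the minimum
print(descend(1.1))    # too high: each step overshoots and the system diverges
```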



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-787x1024.jpeg" alt="Why Training a Neural Network Is Hard" class="wp-image-17868" width="394" height="512" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-787x1024.jpeg 787w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-230x300.jpeg 230w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-768x1000.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-1180x1536.jpeg 1180w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-1574x2048.jpeg 1574w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/866F67B0-DF11-485B-9CC4-0A1C50752625-scaled.jpeg 1967w" sizes="auto, (max-width: 394px) 100vw, 394px" /></figure>



<h2 class="wp-block-heading">In the end, training an Artificial Neural Network (ANN) requires just a few steps:</h2>



<ol class="wp-block-list"><li>First, the ANN requires a <strong>random weight initialization</strong></li><li>Split the dataset into <strong>batches</strong> <strong>(batch size)</strong></li><li>Send the batches one by one to the GPU</li><li>Calculate the <strong>forward pass</strong> (what the output would be with the current weights)</li><li>Compare the calculated output to the expected output <strong>(loss)</strong></li><li>Adjust the <strong>weights</strong> (using the <strong>learning rate</strong> increment or decrement) according to the <strong>backward pass (backward gradient propagation)</strong></li><li>Go back to step 2</li></ol>
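<p>The steps above can be sketched as a plain-Python training loop for a single linear "neuron" learning <em>y = 3x</em> from toy data. This is a minimal caricature, not a real framework API; the variable names and the toy dataset are made up for illustration:</p>

```python
import random

# Minimal sketch of the training steps: random init, batches, forward pass,
# loss, backward pass, weight update. Learns y = 3x from toy data.
random.seed(0)
data = [(x, 3 * x) for x in range(100)]          # toy dataset
w = random.uniform(-1, 1)                        # 1. random weight initialization
learning_rate = 1e-5
batch_size = 10

for epoch in range(20):                          # 7. go back and repeat
    batches = [data[i:i + batch_size]
               for i in range(0, len(data), batch_size)]  # 2. split into batches
    for batch in batches:                        # 3. send batches one by one
        grad = 0.0
        for x, y in batch:
            y_hat = w * x                        # 4. forward pass
            error = y_hat - y                    # 5. compare to expected output (loss)
            grad += 2 * error * x                # 6. backward pass (gradient of squared loss)
        w -= learning_rate * grad / len(batch)   #    adjust weight with the learning rate

print(round(w, 3))  # close to 3.0
```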



<h2 class="wp-block-heading">Further notice</h2>



<p>That’s all folks, you are now all set to read our future blog post which focuses on <strong>Distributed Training in a Deep Learning Context.</strong></p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fwhat-does-training-neural-networks-mean%2F&amp;action_name=What%20does%20Training%20Neural%20Networks%20mean%3F&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Deep Learning explained to my 8-year-old daughter</title>
		<link>https://blog.ovhcloud.com/deep-learning-explained-to-my-8-year-old-daughter/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Fri, 15 Feb 2019 14:56:56 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[Neural networks]]></category>
		<guid isPermaLink="false">https://blog.ovh.com/fr/blog/?p=14481</guid>

					<description><![CDATA[Machine Learning and especially Deep Learning&#160;are hot topics and you are sure to have come across the buzzword &#8220;Artificial Intelligence&#8221; in the media. Yet these are not new concepts. The first Artificial Neural Network (ANN) was introduced in the 40s. So why all the recent interest around neural networks&#160;and Deep Learning?&#160; We will explore this [&#8230;]<img src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-learning-explained-to-my-8-year-old-daughter%2F&amp;action_name=Deep%20Learning%20explained%20to%20my%208-year-old%20daughter&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></description>
										<content:encoded><![CDATA[
<p><strong>Machine Learning</strong> and especially <strong><a href="https://www.kdnuggets.com/2016/01/seven-steps-deep-learning.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Deep Learning</a></strong>&nbsp;are hot topics and you are sure to have come across the buzzword &#8220;Artificial Intelligence&#8221; in the media.</p>



<div class="wp-block-image"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="885" height="508" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0057.jpg" alt="Deep Learning: A new hype" class="wp-image-14620" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0057.jpg 885w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0057-300x172.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0057-768x441.jpg 768w" sizes="auto, (max-width: 885px) 100vw, 885px" /></figure></div>



<p>Yet these are not new concepts. The first <strong>Artificial Neural Network</strong> (ANN) was introduced in the 40s. So why all the recent interest around neural networks&nbsp;and Deep Learning?<strong>&nbsp;</strong></p>



<p>We will explore this and other concepts in a series of blog posts on&nbsp;<strong>GPUs and Machine Learning</strong>.</p>



<h2 class="wp-block-heading"><strong>YABAIR &#8211; Yet Another Blog About Image Recognition</strong></h2>



<p>In the 80s, I remember my father building character recognition for bank checks. He used primitives and derivatives around pixel darkness level. Examining so many different types of handwriting was a real pain because he needed one equation to apply to all the variations.</p>



<p>In the last few years, it has become clear that the best way to deal with this type of problem is through Convolutional Neural Networks. Equations designed by humans are no longer fit to handle the infinite variety of handwriting patterns.</p>



<p>Let&#8217;s take a look at one of the most classic examples: building a number recognition system, a neural network to recognise handwritten digits.</p>



<h3 class="wp-block-heading">Fact 1: It&#8217;s as simple as counting</h3>



<p>We&#8217;ll start by counting how many times the small red shapes in the top row can be seen in each of the black, hand-written digits (in the left-hand column).</p>



<div class="wp-block-image wp-image-14651"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0067.jpg" alt="Simplified matrix for handwritten numbers" class="wp-image-14651" width="337" height="400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0067.jpg 674w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0067-253x300.jpg 253w" sizes="auto, (max-width: 337px) 100vw, 337px" /><figcaption>Simplified matrix for handwritten numbers</figcaption></figure></div>



<p>Now let&#8217;s try to recognise (infer) a new hand-written digit, by counting the number of matches with the same red shapes. We&#8217;ll then compare this to our previous table, in order to identify which number has the most correspondences:</p>



<div class="wp-block-image size-medium wp-image-14652"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0069.jpg" alt="Matching shapes for handwritten numbers " class="wp-image-14652" width="443" height="400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0069.jpg 885w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0069-300x271.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0069-768x693.jpg 768w" sizes="auto, (max-width: 443px) 100vw, 443px" /><figcaption>Matching shapes for handwritten numbers</figcaption></figure></div>



<p>Congratulations! You&#8217;ve just built the world&#8217;s simplest neural network system for recognising hand-written digits.</p>
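<p>That counting procedure can be written down in a few lines: each digit is summarised by a vector of shape counts, and a new digit is classified by the closest count vector. The counts below are made-up illustrative numbers, not values read off the figure:</p>

```python
# Hypothetical shape-count table: for each known digit, how many times each
# of four small reference shapes appears in it (illustrative values only).
templates = {
    0: [2, 0, 2, 0],
    1: [0, 0, 0, 2],
    8: [4, 0, 4, 0],
}

def classify(counts):
    # Pick the digit whose template differs least from the observed counts.
    def distance(digit):
        return sum(abs(a - b) for a, b in zip(templates[digit], counts))
    return min(templates, key=distance)

print(classify([4, 0, 3, 0]))  # 8: closest to the template for 8
```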



<h3 class="wp-block-heading">Fact 2: An image is just a matrix</h3>



<p class="graf graf--p">A computer views an image as a&nbsp;<strong>matrix</strong>. A black and white image is a 2D matrix.</p>



<p>Let&#8217;s consider an image. To keep it simple, let&#8217;s take a small black and white image of an 8, with square dimensions of 28 pixels.</p>



<p>Every cell of the matrix represents the intensity of the pixel from 0 (which represents black), to 255 (which represents a pure white pixel).</p>



<p>The image will therefore be represented as the following 28 x 28 pixel matrix.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="723" height="504" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/1_cLsTCWtUL1GYBUv8vnbOxw.jpeg" alt="Image of a handwritten 8 and the associated intensity matrix" class="wp-image-18492" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/1_cLsTCWtUL1GYBUv8vnbOxw.jpeg 723w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/1_cLsTCWtUL1GYBUv8vnbOxw-300x209.jpeg 300w" sizes="auto, (max-width: 723px) 100vw, 723px" /><figcaption>Image of a handwritten 8 and the associated intensity matrix</figcaption></figure></div>
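<p>In code, such an image is literally just a 2D array of intensities. A tiny sketch with a toy 4×4 matrix instead of the full 28×28:</p>

```python
import numpy as np

# A black and white image is a 2D matrix of pixel intensities,
# 0 = black, 255 = pure white. A toy 4x4 example:
image = np.array([
    [  0,  64, 128, 255],
    [  0, 255, 255,   0],
    [  0, 255, 255,   0],
    [  0,  64, 128, 255],
], dtype=np.uint8)

print(image.shape)               # (4, 4)
print(image.min(), image.max())  # intensities stay within the 0-255 range
```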



<h3 class="wp-block-heading">Fact 3: Convolutional layers are just bat-signals</h3>



<p class="graf graf--p">To work out which pattern is displayed in a picture (in this case the handwritten 8) we will use a kind of bat-signal/flashlight. In machine learning, the flashlight is called a filter. The filter is used to perform a classic convolution matrix calculation used in usual image processing software such as&nbsp;<a href="https://docs.gimp.org/2.8/en/plug-in-convmatrix.html" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Gimp.</a></p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://media2.giphy.com/media/l0NwGpoOVLTAyUJSo/giphy.gif" alt="Batman bat-signal searchlight in the sky"/></figure></div>



<p>The filter will <strong>scan the picture</strong> in order to <strong>find the pattern</strong> in the image and will trigger a <strong>positive feedback</strong> if a match is found. It works a bit like a toddler&#8217;s shape sorting box: the triangle filter matching the triangle hole, the square filter matching the square hole, and so on.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://multimedia.bbycastatic.ca/multimedia/products/500x500/103/10319/10319838.jpg" alt="Image filters work like children shape sorting boxes"/><figcaption>Image filters work like children shape sorting boxes</figcaption></figure></div>



<h3 class="wp-block-heading">Fact 4: Filter matching is an embarrassingly&nbsp;parallel task</h3>



<p class="graf graf--p">To be more scientific the image filtering process looks a bit like the animation below. As you can see, <strong>every step</strong> of the filter scanning is <strong>independent</strong>, which means that this task can be <strong>highly parallelised</strong>.</p>



<p>It&#8217;s important to note that <strong>tens of filters</strong> will be applied at the same time, <strong>in parallel</strong>, as none of them depend on each other.</p>



<div class="wp-block-image"><figure class="aligncenter"><a href="https://cdn-images-1.medium.com/max/800/0*rKUDc--RZg1v66wq" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"><img decoding="async" src="https://cdn-images-1.medium.com/max/800/0*rKUDc--RZg1v66wq" alt="Convolution Filter over an input image"/></a><figcaption>https://github.com/vdumoulin</figcaption></figure></div>



<h3 class="wp-block-heading">Fact 5: Just repeat the filtering operation (matrix convolution) as many times as possible</h3>



<p>We just saw that the input image/matrix is filtered using multiple matrix convolutions.</p>



<p>To improve the accuracy of the image recognition just take the filtered image from the previous operation and filter again and again and again&#8230;</p>



<p>Of course, we are oversimplifying things somewhat, but generally the more filters you apply, and the more you repeat this operation in sequence, the more precise your results will be.</p>



<p>It&#8217;s like creating new abstraction layers to get a clearer and clearer object filter description, starting from primitive filters and moving up to filters that look like edges, wheels, squares, cubes, &#8230;</p>



<h3 class="wp-block-heading">Fact 6: Matrix convolutions are just <em>x</em>&nbsp;and <em>+</em></h3>



<p>An image is worth a thousand words: the following picture is a simplistic view of a source image (8×8) filtered with a convolution filter (3×3). The projection of the torch light (in this example a Sobel Gx Filter) provides one value.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://i.stack.imgur.com/YDusp.png" alt=""/><figcaption>Example of a convolution filter (Sobel Gx) applied to an input matrix (Source : https://datascience.stackexchange.com/questions/23183/why-convolutions-always-use-odd-numbers-as-filter-size/23186)</figcaption></figure></div>



<p>This is where the magic happens: simple matrix operations that are highly parallelisable, which fits perfectly with the General Purpose Graphics Processing Unit use case.</p>
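<p>The "just <em>x</em> and <em>+</em>" claim can be checked directly. The sketch below applies the Sobel Gx filter from the figure to a toy image with a single vertical edge; each output value is nothing more than nine multiplications and a sum:</p>

```python
import numpy as np

# Sobel Gx filter, as in the figure above.
sobel_gx = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]])

def convolve(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):         # every output position is independent,
        for j in range(out.shape[1]):     # which is why this parallelises so well
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise x, then +
    return out

# Toy image: dark left half, bright right half (a vertical edge).
image = np.array([[10, 10, 80, 80],
                  [10, 10, 80, 80],
                  [10, 10, 80, 80],
                  [10, 10, 80, 80]])
print(convolve(image, sobel_gx))
# every 3x3 patch straddles the vertical edge, so each output value is 280
```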



<h3 class="wp-block-heading">Fact 7: Need to simplify and summarise what&#8217;s been detected? Just use max()</h3>



<p class="graf graf--figure">We need to <strong>summarise</strong>&nbsp;what&#8217;s been detected by the filters in order to <strong>generalise the knowledge</strong>.</p>



<p class="graf graf--figure">To do so, we will sample the output of the previous filtering operation.</p>



<p class="graf graf--figure">This operation is called&nbsp;<strong>pooling</strong>&nbsp;or <strong>downsampling</strong>, but in fact it&#8217;s simply about reducing the size of the matrix.</p>



<p class="graf graf--figure">You can use any reducing operation such as: max, min, average, count, median, sum and so on.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://qph.fs.quoracdn.net/main-qimg-8afedfb2f82f279781bfefa269bc6a90.webp" alt=""/><figcaption>Example of a max pooling layer (Source : Stanford&#8217;s CS231n)</figcaption></figure></div>
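<p>A 2×2 max pooling like the one in the figure takes only a few lines with NumPy; the input matrix below is the one from the CS231n example:</p>

```python
import numpy as np

def max_pool(matrix, size=2):
    # Downsample by keeping only the max of each (size x size) block.
    h, w = matrix.shape
    return matrix.reshape(h // size, size, w // size, size).max(axis=(1, 3))

features = np.array([[1, 1, 2, 4],
                     [5, 6, 7, 8],
                     [3, 2, 1, 0],
                     [1, 2, 3, 4]])
print(max_pool(features))
# [[6 8]
#  [3 4]]
```

Swapping <code>.max</code> for <code>.mean</code> or <code>.sum</code> gives average or sum pooling, as mentioned above.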



<h3 class="wp-block-heading">Fact 8: Flatten everything to get on your feet</h3>



<p>Let&#8217;s not forget the main purpose of the neural network we are working on: building an image recognition system, also called <strong>image classification</strong>.</p>



<p>If the purpose of the neural network is to detect hand-written digits, there will be 10 classes at the end to map the input image to: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]</p>



<p>To map this input to a class after passing through all those filters and downsampling layers, we will have just 10 neurons (each of them representing a class), each connected to the last subsampled layer.</p>
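<p>Picking the class is then just a matter of finding the output neuron with the strongest activation. The activation values below are made up for illustration:</p>

```python
import numpy as np

# Hypothetical activations of the 10 output neurons, one per digit class 0-9.
activations = np.array([0.01, 0.02, 0.05, 0.01, 0.03,
                        0.02, 0.01, 0.04, 0.79, 0.02])

predicted_digit = int(np.argmax(activations))  # index of the strongest response
print(predicted_digit)  # 8
```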



<p>Below is an overview of the original LeNet-5 Convolutional Neural Network designed by <a href="https://en.wikipedia.org/wiki/Yann_LeCun" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Yann LeCun</a>, one of the early adopters of this technology for image recognition.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="713" height="213" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/Architecture-of-CNN-by-LeCun-et-al-LeNet5.png" alt="" class="wp-image-18491" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/Architecture-of-CNN-by-LeCun-et-al-LeNet5.png 713w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/Architecture-of-CNN-by-LeCun-et-al-LeNet5-300x90.png 300w" sizes="auto, (max-width: 713px) 100vw, 713px" /><figcaption>LeNet-5 architecture published in the original paper (source : http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf).</figcaption></figure></div>



<h3 class="graf graf--figure wp-block-heading"><b>Fact 9: Deep Learning is just LEAN &#8211; continuous&nbsp;improvement based on a feedback loop</b></h3>



<p class="graf graf--figure">The beauty of the technology does not only come from the convolution but from the capacity of the network to learn and adapt by itself. By implementing a feedback loop called&nbsp;<em><strong><a href="https://en.wikipedia.org/wiki/Backpropagation" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">backpropagation</a>&nbsp;</strong></em>the network will mitigate and&nbsp;inhibit some &#8220;neurons&#8221; in the different layers using&nbsp;<span style="text-decoration: underline;"><em><strong><a href="https://www.quora.com/What-does-weight-mean-in-terms-of-neural-networks" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">weights</a></strong></em></span><em>.&nbsp;</em></p>



<p class="graf graf--figure">Let&#8217;s KISS (keep it simple): we look at the output of the network, if the guess (the output 0,1,2,3,4,5,6,7,8 or 9) is wrong, we look at which filter(s) &#8220;made a mistake&#8221;, we give this filter or filters a small weight so they will not make the same mistake next time. And voila! The system learns and keeps improving itself.</p>



<h3 class="wp-block-heading"><b>Fact 10: It all amounts to the fact that Deep Learning is embarrassingly&nbsp;parallel</b></h3>



<p>Ingesting thousands of images, running tens of filters, applying downsampling, flattening the output &#8230; all of these steps can be done in parallel, which makes the system <a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">embarrassingly parallel</a>. &#8220;Embarrassingly&#8221; in reality means a <strong><em>perfectly parallel&nbsp;</em></strong>problem, and it&#8217;s a perfect use case for <em><strong>GPGPUs (General Purpose Graphics Processing Units)</strong></em>, which&nbsp;are built for massively parallel computing.</p>



<h3 class="wp-block-heading"><strong>Fact 11: Need more precision? Just go deeper</strong></h3>



<p>Of course it is a bit of an oversimplification, but if we look at the main &#8220;image recognition competition&#8221;, known as the <a href="https://en.wikipedia.org/wiki/ImageNet#ImageNet_Challenge" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">ImageNet challenge,</a> we can see that the error rate has decreased with the depth of the neural network. It is generally acknowledged that, among other elements, the depth of the network will lead to a better capacity for generalisation and precision.</p>



<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://cdn-images-1.medium.com/max/800/1*DBXf6dzNB78QPHGDofHA4Q.png" alt=""/><figcaption>Imagenet competition winner error rates VS number of layers in the network (source : https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5)</figcaption></figure></div>



<h3 class="wp-block-heading"><strong>In conclusion&nbsp;&nbsp;</strong></h3>



<p>We have taken a brief look at the concept of Deep Learning as applied to image recognition. It&#8217;s worth noting that almost every new architecture for image recognition (medical, satellite, autonomous driving, &#8230;) uses these same principles with a different number of layers, different types of filters, different initialisation points, different matrix sizes and different tricks (like image augmentation, dropout, weight compression, &#8230;). The concepts remain the same:</p>



<div class="wp-block-image wp-image-14654"><figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="885" height="469" src="https://www.ovh.com/blog/wp-content/uploads/2019/02/IMG_0070.jpg" alt="Number detection process" class="wp-image-14654" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0070.jpg 885w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0070-300x159.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/02/IMG_0070-768x407.jpg 768w" sizes="auto, (max-width: 885px) 100vw, 885px" /><figcaption>Number detection process</figcaption></figure></div>



<p>In other words, we saw that the training and inference of deep learning models comes down to lots and lots of basic matrix operations that can be done in parallel, and this is exactly what our good old graphical processors (GPU) are made for.</p>



<p>In the next post we will discuss precisely how a GPU works and how, technically, deep learning is implemented on it.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdeep-learning-explained-to-my-8-year-old-daughter%2F&amp;action_name=Deep%20Learning%20explained%20to%20my%208-year-old%20daughter&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
