<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hardware Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/hardware/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/hardware/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Thu, 16 Jul 2020 07:37:37 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>Hardware Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/hardware/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>How PCI-Express works and why you should care? #GPU</title>
		<link>https://blog.ovhcloud.com/how-pci-express-works-and-why-you-should-care-gpu/</link>
		
		<dc:creator><![CDATA[Jean-Louis Queguiner]]></dc:creator>
		<pubDate>Thu, 09 Jul 2020 10:16:00 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Deep learning]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Machine learning]]></category>
		<category><![CDATA[PCIe]]></category>
		<guid isPermaLink="false">https://blog.ovh.com/fr/blog/?p=14485</guid>

					<description><![CDATA[What is PCI-Express ? Everyone, and I mean everyone, should pay attention when they do intensive Machine Learning / Deep Learning Training. As I explained in a previous blog post, GPUs have accelerated Artificial Intelligence evolution massively. However, building a GPUs server is not that easy. And failing to create an appropriate infrastructure can have [&#8230;]]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-image"><figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="1024" height="538" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-1024x538.jpeg" alt="How PCI-Express works and why you should care? #GPU" class="wp-image-18783" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-1024x538.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-300x158.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/69659375-3553-40C9-A201-73C4CDED2461.jpeg 1200w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div>



<h2 class="wp-block-heading">What is PCI-Express?</h2>



<p>Everyone, and I mean everyone, who does intensive Machine Learning / Deep Learning training should pay attention to PCI Express.</p>



<p>As I explained in a previous blog post, GPUs have massively accelerated the evolution of Artificial Intelligence.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/understanding-the-anatomy-of-gpus-using-pokemon/" data-wpel-link="exclude"><img decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png" alt="" class="wp-image-18103" width="334" height="254" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED.png 668w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/EEAD0A02-DFCA-4745-802B-E36BC517EFED-300x228.png 300w" sizes="(max-width: 334px) 100vw, 334px" /></a></figure></div>



<p>However, building a GPU server is not that easy, and failing to create an appropriate infrastructure can significantly lengthen training time.</p>



<p>If you use GPUs, you should know that there are two ways to connect them to the motherboard, and through it to the other components (network, CPU, storage devices): solution 1 is <strong>PCI Express </strong>and solution 2 is <strong>SXM2</strong>. We will talk about <strong>SXM2</strong> in a future post. Today, we will focus on <strong>PCI Express</strong>, because its performance depends strongly on the choice of adjacent hardware, such as the PCIe bus and the CPU.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>                     NVIDIA V100 with SXM2 design</th><th class="has-text-align-center" data-align="center">                          NVIDIA V100 with PCI express design</th></tr></thead><tbody><tr><td><img decoding="async" width="609" height="644" class="wp-image-18763" style="width: 500px" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-01.jpg" alt="NVIDIA V100 with SXM2 design" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-01.jpg 609w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-01-284x300.jpg 284w" sizes="(max-width: 609px) 100vw, 609px" /><br>Source : <a aria-label="undefined (opens in a new tab)" href="https://www.ebizpc.com/NVIDIA-Tesla-V100-900-2G502-0300-000-16GB-GPU-p/900-2g503-0310-000.htm" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://www.ebizpc.com/NVIDIA-Tesla-V100-900-2G502-0300-000-16GB-GPU-p/900-2g503-0310-000.htm</a></td><td class="has-text-align-center" data-align="center"><img loading="lazy" decoding="async" width="450" height="450" class="wp-image-18764" style="width: 500px" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-02.jpg" alt="NVIDIA V100 with PCI express design" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02.jpg 450w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02-300x300.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02-150x150.jpg 150w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-02-70x70.jpg 70w" sizes="auto, (max-width: 450px) 100vw, 450px" /><br>Source : <a aria-label="undefined (opens in a new tab)" href="https://nvidiastore.com.br/nvidia-tesla-v100-16gb" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">https://nvidiastore.com.br/nvidia-tesla-v100-16gb</a></td></tr></tbody></table><figcaption>SXM2 design VS PCI Express 
Design</figcaption></figure>



<p>This matters a lot for deep learning: time spent loading data is time the GPUs are not computing, so the bandwidth between the GPUs and the other components is a key bottleneck in most deep learning training contexts.</p>



<h2 class="wp-block-heading">How does PCI-Express work and why should you care about the number of PCIe lanes?</h2>



<h3 class="wp-block-heading">What are PCI-Express lanes, and are there any associated CPU limitations?</h3>



<p>Each NVIDIA V100 GPU uses 16 PCIe lanes. What does that mean exactly?</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="618" height="442" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-03.png" alt="Extract from NVidia V100 product specification sheet" class="wp-image-18767" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03.png 618w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03-300x215.png 300w" sizes="auto, (max-width: 618px) 100vw, 618px" /><figcaption>Extract from NVidia V100 product specification <a href="https://images.nvidia.com/content/technologies/volta/pdf/tesla-volta-v100-datasheet-letter-fnl-web.pdf" target="_blank" aria-label="undefined (opens in a new tab)" rel="noreferrer noopener nofollow external" data-wpel-link="external">sheet</a></figcaption></figure></div>



<p>The <strong><em>&#8220;x16&#8221;</em></strong> means that the PCIe link has 16 dedicated lanes. So&#8230; next question: what is a PCI Express lane?</p>



<h4 class="wp-block-heading">What&#8217;s a PCI Express lane?</h4>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3.jpeg" alt="Two PCI Express devices and their interconnection" class="wp-image-18779" width="424" height="299" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3.jpeg 848w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3-300x211.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/72DFDF80-DC39-4253-BAB3-CEB351B627D3-768x541.jpeg 768w" sizes="auto, (max-width: 424px) 100vw, 424px" /><figcaption>Two PCI Express devices and their interconnection: figure inspired by the awesome <a aria-label="undefined (opens in a new tab)" href="https://www.phhsnews.com/what-is-chipset-and-why-should-i-care3538" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">article</a> &#8211; what is chipset and why should I care</figcaption></figure></div>



<p>PCIe lanes are used to communicate between PCIe devices, or between a PCIe device and the CPU. A lane is made of two differential signal pairs (four wires): one pair for inbound communications and one for outbound, each with the same bandwidth, so a lane can send and receive at the same time.</p>



<p>Lane communications are similar to network Layer 1 communications &#8211; it’s all about transferring bits as fast as possible through electrical wires! However, a PCIe link differs in that it aggregates several lanes, noted xN. In our previous example N=16, but a link can also be x1, x2, x4 or x8 (wider x32 links are also defined, but rarely used).</p>



<h3 class="wp-block-heading">So… if PCIe is similar to network architecture it means that PCIe layers exist, doesn&#8217;t it?</h3>



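<p>Yes! You are right. The PCIe specification defines three layers (Physical, Data Link and Transaction), and the application logic sits on top of them as a fourth layer:</p>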



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="724" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-1024x724.jpeg" alt="" class="wp-image-18723" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-1024x724.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-300x212.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02-768x543.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/photo_2020-07-02-15.08.02.jpeg 1280w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>






<h4 class="wp-block-heading"><strong>The Physical Layer (aka <em>the Big Negotiation Layer</em>)</strong></h4>



<p>The<strong><em> Physical Layer (PL)</em></strong> is responsible for negotiating with the other device the terms and conditions for exchanging raw packets (PLP, for Physical Layer Packets), i.e. the link width and the transfer rate.</p>



<p>You should be aware that only the smaller lane count of the two devices will be used. This is why choosing the appropriate CPU is so important. CPUs can only manage a limited number of PCIe lanes, so <strong>pairing a nice GPU that has 16 PCIe lanes with a CPU that can only give it 8 lanes is as efficient as throwing away half your money because it doesn’t fit in your wallet.</strong></p>
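<p>To make the lane negotiation concrete, here is a minimal Python sketch. It simplifies link training to "the link uses the smaller lane count of the two ends" and assumes PCIe 3.0 signalling (8 GT/s, 128b/130b encoding); the function names and the x8 CPU allocation are hypothetical illustrations, not part of any real tool.</p>

```python
# Minimal sketch: a PCIe link runs at the smaller lane count of its two ends.
# Assumption: PCIe 3.0 signalling (8 GT/s, 128b/130b encoding).

PCIE3_GBPS_PER_LANE = 8.0 * 128 / 130 / 8  # ~0.985 GB/s per lane, per direction


def negotiated_width(device_lanes: int, host_lanes: int) -> int:
    """Simplified link training: use the lane count both ends support."""
    return min(device_lanes, host_lanes)


def link_bandwidth_gbps(device_lanes: int, host_lanes: int) -> float:
    """Approximate one-direction PCIe 3.0 data bandwidth of the trained link, in GB/s."""
    return negotiated_width(device_lanes, host_lanes) * PCIE3_GBPS_PER_LANE


if __name__ == "__main__":
    gpu_lanes = 16   # e.g. a V100 PCIe card (x16)
    cpu_lanes = 8    # hypothetical: only 8 lanes available from the CPU for this slot
    print(f"link trains at x{negotiated_width(gpu_lanes, cpu_lanes)}")
    print(f"~{link_bandwidth_gbps(gpu_lanes, cpu_lanes):.2f} GB/s per direction "
          f"instead of ~{link_bandwidth_gbps(gpu_lanes, gpu_lanes):.2f} GB/s")
```

<p>Under these assumptions, the x16 card behind an x8 allocation gets roughly half of its potential bandwidth: about 7.88 GB/s instead of about 15.75 GB/s per direction.</p>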



<p>Packets received at the <strong><em>Physical Layer (aka PHY) </em></strong>come from other PCIe devices or from the system (via <strong><em>Direct Memory Access (DMA)</em></strong>, or from the CPU for instance) and are encapsulated in a frame. </p>



<p>The purpose of the <strong><em>Start-of-Frame</em></strong> marker is to say: “I am sending you data, this is the beginning,” and it takes just 1 byte to say that!</p>



<p>The <strong><em>End-of-Frame</em> </strong>word is also 1 byte to say “goodbye I’m done with it”.</p>



<p>This layer implements the <strong><em>8b/10b or 128b/130b encoding</em></strong> that we will explain later, which is mainly used for <strong><em>clock recovery.</em></strong></p>



<h4 class="wp-block-heading"><strong>The Data Link Layer (aka <em>Let’s put this mess in the right&nbsp;order</em>)</strong></h4>



<p>The <strong><em>Data Link Layer</em></strong> starts each packet with a <strong><em>Packet Sequence Number.</em></strong> This is really important, as a packet might get corrupted at some point and must then be uniquely identified for retry purposes. The <strong><em>Sequence Number </em></strong>is coded on 2 bytes (12 bits are actually used).</p>



<p>The <strong><em>Sequence Number</em></strong> is then followed by the <strong><em>Transaction Layer Packet</em></strong>, and the whole is closed with the <strong><em>LCRC (Link Cyclic Redundancy Check)</em></strong>, which is used to check the integrity of the <strong><em>Transaction Layer Packet (meaning the actual Payload)</em></strong>.</p>
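<p>The encapsulation described so far (sequence number, Transaction Layer Packet and LCRC, wrapped by the 1-byte start and end markers of the Physical Layer) can be sketched in Python as follows. This is an illustration under stated assumptions, not a bit-accurate implementation: zlib.crc32 stands in for the real LCRC computation, the start and end markers are represented by placeholder bytes, and the 16-byte header is made up.</p>

```python
import struct
import zlib

# Placeholders for the Gen1/2 framing symbols STP (K27.7) and END (K29.7).
# On the wire these are special 10-bit control code groups, not data bytes.
STP = b"\xfb"
END = b"\xfd"


def data_link_wrap(seq: int, tlp: bytes) -> bytes:
    """Data Link Layer: 2-byte sequence field + TLP + 4-byte LCRC.

    zlib.crc32 is only a stand-in: the real LCRC uses the same CRC-32
    polynomial but follows the PCIe specification's bit-ordering rules.
    """
    seq_field = struct.pack(">H", seq & 0x0FFF)  # 12-bit number, upper 4 bits reserved
    lcrc = struct.pack(">I", zlib.crc32(seq_field + tlp))
    return seq_field + tlp + lcrc


def physical_frame(dll_packet: bytes) -> bytes:
    """Physical Layer: wrap the packet between 1-byte start and end markers."""
    return STP + dll_packet + END


def lcrc_ok(dll_packet: bytes) -> bool:
    """Receiver side: recompute the CRC over sequence field + TLP and compare."""
    body, received = dll_packet[:-4], dll_packet[-4:]
    return struct.pack(">I", zlib.crc32(body)) == received


if __name__ == "__main__":
    tlp = bytes(16) + b"payload!"   # hypothetical 16-byte header + 8 data bytes
    frame = physical_frame(data_link_wrap(42, tlp))
    print(len(tlp), "bytes of TLP ->", len(frame), "bytes on the link")
    inner = frame[1:-1]             # strip the start/end markers
    print("LCRC valid:", lcrc_ok(inner))
    flipped = inner[:5] + bytes([inner[5] ^ 0x01]) + inner[6:]   # one corrupted bit
    print("LCRC valid after a bit flip:", lcrc_ok(flipped))
```

<p>A 24-byte TLP becomes 32 bytes on the link (1 + 2 + 24 + 4 + 1), and a single flipped bit makes the CRC check fail, which is what triggers the retry mechanism.</p>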



<p>If the <strong><em>LCRC</em></strong> is valid, the <em><strong>Data Link Layer</strong></em> sends an <strong><em>ACK (ACKnowledge)</em></strong> signal back to the <em><strong>emitter</strong></em> through the <strong><em>Physical Layer</em>.</strong> Otherwise it sends a <strong><em>NAK (Not AcKnowledge) </em></strong>signal, and the emitter resends the frame associated with the <strong><em>sequence number</em></strong>. To make this retry possible, the <em><strong>emitter</strong></em> keeps every transmitted packet in a replay buffer until it has been acknowledged.</p>
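<p>The ACK/NAK exchange can be modelled with a small transmitter-side replay buffer. This sketch assumes the cumulative (go-back-N) behaviour used by PCIe, where an ACK or NAK carries the sequence number of the last packet received correctly and, on NAK, the emitter replays every unacknowledged packet from that point on. It ignores the replay timer and flow control, and the class and method names are hypothetical.</p>

```python
from __future__ import annotations

from collections import deque

SEQ_MODULO = 4096  # PCIe sequence numbers are 12 bits wide


class ReplayBuffer:
    """Transmitter-side sketch of the ACK/NAK retry mechanism (timers and flow control ignored)."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: deque[tuple[int, bytes]] = deque()  # unacknowledged (seq, TLP) pairs

    def send(self, tlp: bytes) -> int:
        """Assign the next sequence number and keep a copy until it is acknowledged."""
        seq = self.next_seq
        self.pending.append((seq, tlp))
        self.next_seq = (seq + 1) % SEQ_MODULO
        return seq

    def _purge_through(self, seq: int) -> None:
        """Drop every buffered TLP up to and including `seq` (ACK/NAK are cumulative)."""
        if all(s != seq for s, _ in self.pending):
            return  # unknown or already purged sequence number: nothing to do
        while self.pending:
            s, _ = self.pending.popleft()
            if s == seq:
                break

    def ack(self, seq: int) -> None:
        """ACK: every TLP up to `seq` arrived intact."""
        self._purge_through(seq)

    def nak(self, last_good_seq: int) -> list[int]:
        """NAK: TLPs up to `last_good_seq` are fine; replay everything after them."""
        self._purge_through(last_good_seq)
        return [s for s, _ in self.pending]


if __name__ == "__main__":
    tx = ReplayBuffer()
    for payload in (b"a", b"b", b"c", b"d"):
        tx.send(payload)            # sequence numbers 0, 1, 2, 3
    tx.ack(0)                       # TLP 0 arrived intact
    print("replay:", tx.nak(1))     # TLP 2 failed its LCRC check: replay [2, 3]
```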



<h4 class="wp-block-heading"><strong>The Transaction Layer</strong></h4>



<p>The<strong><em> Transaction Layer</em></strong> is responsible for <strong>managing the actual payload (Header + Data)</strong> as well as the (optional) message digest <strong><em>ECRC (End-to-End Cyclic Redundancy Check)</em></strong>. On the receiving side, this <strong><em>Transaction Layer Packet </em></strong>comes from the <strong><em>Data Link Layer</em></strong>, where it has been <strong>decapsulated</strong>.</p>



<p>An <strong>integrity check</strong> is performed if needed/requested. Unlike the LCRC, which only protects a single link, the ECRC is computed by the original emitter and checked by the final receiver, ensuring that no packet was corrupted along the way, including when data is passed from the<strong><em> Data Link Layer</em></strong> to the <em><strong>Transaction Layer.</strong></em></p>



<p>The header describes the type of transaction, such as:</p>



<ul class="wp-block-list"><li>Memory Transaction</li><li>I/O Transaction</li><li>Configuration Transaction</li><li>or Message Transaction</li></ul>
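<p>To illustrate how the header encodes these transaction types, here is a sketch that decodes the first header byte. It assumes the PCIe 2.0+ layout, in which the 3-bit Fmt field sits in bits 7:5 and the 5-bit Type field in bits 4:0; only the common encodings are covered, and completions (the replies to read requests) are included as well. The function name is hypothetical.</p>

```python
def classify_tlp(byte0: int) -> str:
    """Decode the Fmt and Type fields of the first TLP header byte (common cases only)."""
    fmt = (byte0 >> 5) & 0b111     # bits 7:5
    typ = byte0 & 0b11111          # bits 4:0
    if fmt == 0b100:
        return "TLP prefix"
    has_data = bool(fmt & 0b010)   # Fmt bit 1: the TLP carries a data payload
    header_dw = 4 if fmt & 0b001 else 3  # Fmt bit 0: 4-DW header (e.g. 64-bit addressing)
    direction = "Write" if has_data else "Read"
    if typ in (0b00000, 0b00001):
        kind = f"Memory {direction}"
    elif typ == 0b00010:
        kind = f"I/O {direction}"
    elif typ in (0b00100, 0b00101):
        kind = f"Configuration Type {typ & 1} {direction}"
    elif typ >> 3 == 0b10:
        kind = "Message"           # the low 3 bits encode how the message is routed
    elif typ in (0b01010, 0b01011):
        kind = "Completion"
    else:
        kind = "Other/reserved"
    suffix = ", with data" if has_data else ""
    return f"{kind} ({header_dw}DW header{suffix})"


if __name__ == "__main__":
    for b in (0x00, 0x40, 0x04, 0x30, 0x4A):
        print(hex(b), "->", classify_tlp(b))
```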



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-1024x600.jpeg" alt="PCIe Layers" class="wp-image-18781" width="512" height="300" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-1024x600.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-300x176.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E-768x450.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/5E282911-B63F-410D-A2CD-AD52B928C62E.jpeg 1368w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<h4 class="wp-block-heading"><strong>The Application Layer</strong></h4>



<p>The role of the <em><strong>application layer</strong></em> is to handle the<strong><em> User Logic</em></strong>. This layer sends the <strong><em>Header</em></strong> <strong><em>and the data payload </em></strong>to the <strong><em>Transaction Layer</em></strong>. The magic happens in this layer, where data is routed to the different hardware components.</p>



<h3 class="wp-block-heading">How does PCIe communicate with the rest of the&nbsp;world?</h3>



<p>A PCIe link uses the <strong>packet switching concept of networks, in full-duplex mode.</strong></p>



<p>PCIe devices have an <strong>internal clock to orchestrate PCIe </strong><em><strong>Data Transfer Cycles</strong>.</em> These <strong><em>Data Transfer Cycles</em></strong> are also orchestrated thanks to the <strong><em>Reference Clock.</em></strong> The latter sends its signal through a <strong><em>Dedicated Lane</em> (which is not part of the x1/2/4/8/16/32 lanes mentioned above)</strong>. This clock helps both the receiving and the emitting devices synchronize for packet communications.</p>



<p><strong>Each PCIe lane is used to send bytes in parallel with the other lanes</strong>: consecutive bytes of a packet are striped across the lanes (byte 0 on lane 0, byte 1 on lane 1, and so on). The<strong><em> Clock Synchronization </em></strong>mentioned above helps the receiver put those bytes back in the right order.</p>
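<p>Byte striping can be sketched in a few lines of Python. This is a simplified model: it only shows the round-robin distribution of bytes across lanes and their reassembly, and ignores the per-lane encoding, scrambling and framing symbols that real PCIe adds. The function names are hypothetical.</p>

```python
from __future__ import annotations


def stripe(data: bytes, lanes: int) -> list[bytes]:
    """Emitter side: send byte i on lane i % lanes (round-robin striping)."""
    return [data[lane::lanes] for lane in range(lanes)]


def unstripe(per_lane: list[bytes]) -> bytes:
    """Receiver side: interleave the lanes back into the original byte order."""
    lanes = len(per_lane)
    out = bytearray(sum(len(chunk) for chunk in per_lane))
    for lane, chunk in enumerate(per_lane):
        out[lane::lanes] = chunk
    return bytes(out)


if __name__ == "__main__":
    lanes = stripe(b"PCIe-x4!", 4)
    print(lanes)             # [b'P-', b'Cx', b'I4', b'e!']
    print(unstripe(lanes))   # b'PCIe-x4!'
```

<p>With four lanes, each lane carries only a quarter of the bytes, which is why a wider link moves the same packet faster at the same per-lane rate.</p>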



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="618" height="442" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-03.png" alt="" class="wp-image-18767" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03.png 618w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-03-300x215.png 300w" sizes="auto, (max-width: 618px) 100vw, 618px" /><figcaption>x16 means 16 lanes of parallel communication on generation 3 of PCIe&nbsp;protocol</figcaption></figure></div>



<h3 class="wp-block-heading">You may have the bytes in order but do you have the data integrity at the physical layer&nbsp;?</h3>



<p>To ensure <strong>integrity</strong>, PCIe devices use <strong>8b/10b encoding for PCIe generations 1 and 2</strong> and the <strong>128b/130b encoding scheme for generations 3</strong> <strong>and 4</strong> (generation 5 keeps 128b/130b).</p>



<p>These encodings guarantee frequent transitions between 0s and 1s, so the receiver does not lose its timing references, even when the data contains long runs of identical bits. This process is called “<strong><em>Clock Recovery</em></strong>”.</p>



<p>With 128b/130b, each block of 128 bits of payload data is sent preceded by a 2-bit sync header, so 130 bits travel on the wire.</p>



<h4 class="wp-block-heading">Quick examples</h4>



<p><em>Let’s simplify it with an 8b/10b example:</em> here is the 8b/10b encoding table from the Ethernet specification (IEEE 802.3 clause 36, table 36–1a):</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="600" height="546" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/PCIe-04.png" alt="IEEE 802.3 clause 36, table 36–1a - 8b/10b encoding table" class="wp-image-18770" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-04.png 600w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/PCIe-04-300x273.png 300w" sizes="auto, (max-width: 600px) 100vw, 600px" /><figcaption>IEEE 802.3 clause 36, table 36–1a &#8211; 8b/10b encoding table</figcaption></figure></div>



<p>So how can the receiver tell apart all those repeated 0s (code group name D0.0)?</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-1024x819.png" alt="Repeating bits everywhere" class="wp-image-18777" width="512" height="410" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-1024x819.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-300x240.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4-768x615.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/2B41AC73-59D2-4230-B8F4-73327F3991E4.png 1381w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>8b/10b encoding is composed of 5b/6b + 3b/4b encodings.</p>



<p>Therefore <strong>00000 000</strong> will be encoded into <strong>100111 0100</strong>: the first 5 bits of the original data, <strong>00000</strong>, are encoded to <strong>100111</strong> using 5b/6b encoding (<strong>rd-</strong>). Since <strong>100111</strong> contains more 1s than 0s, the running disparity becomes positive, so the second group of 3 bits of original data, <strong>000</strong>, is encoded into <strong>0100</strong> using 3b/4b encoding (<strong>rd+</strong>).</p>



<p>Depending on the running disparity at that point of the stream, it could also have been <strong>5b/6b encoding rd+ </strong>and<strong> 3b/4b encoding rd-</strong>, turning <strong>00000 000</strong> into <strong>011000 1011</strong>.</p>



<p><strong>Therefore the original data, which was 8 bits, is now 10 bits because of the control bits (1 control bit for 5b/6b and 1 for 3b/4b).</strong></p>
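<p>The effect of the encoding on long runs of identical bits can be checked with a small Python sketch. It uses only the two table entries needed for D0.0 (from IEEE 802.3 table 36–1a) and a simplified running-disparity rule (flip when a sub-block is unbalanced) that is correct for these entries but not a full 8b/10b encoder; the names are hypothetical.</p>

```python
from __future__ import annotations

from itertools import groupby

# Only the entries needed for D0.0, indexed by the current running disparity.
SIX_B = {"-": "100111", "+": "011000"}   # 5b/6b code for EDCBA = 00000
FOUR_B = {"-": "1011", "+": "0100"}      # 3b/4b code for HGF = 000


def disparity(bits: str) -> int:
    """Number of 1s minus number of 0s."""
    return bits.count("1") - bits.count("0")


def flip(rd: str) -> str:
    return "+" if rd == "-" else "-"


def encode_d00(rd: str) -> tuple[str, str]:
    """Encode D0.0 from running disparity rd; return (10-bit code group, new rd)."""
    six = SIX_B[rd]
    if disparity(six) != 0:      # an unbalanced sub-block flips the running disparity
        rd = flip(rd)
    four = FOUR_B[rd]
    if disparity(four) != 0:
        rd = flip(rd)
    return six + four, rd


def longest_run(bits: str) -> int:
    """Length of the longest run of identical consecutive bits."""
    return max(len(list(group)) for _, group in groupby(bits))


if __name__ == "__main__":
    rd, stream = "-", ""
    for _ in range(4):           # four consecutive 0x00 bytes
        code, rd = encode_d00(rd)
        stream += code
    print("raw longest run:", longest_run("0" * 32))
    print("encoded stream:", stream)
    print("encoded longest run:", longest_run(stream))
```

<p>Four zero bytes sent raw form a run of 32 identical bits, while the encoded stream never has more than 3 identical bits in a row and stays balanced between 1s and 0s, which is what lets the receiver keep its clock aligned.</p>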



<p>But don&#8217;t worry, I will write a blog post dedicated to encoding later.</p>



<p><strong>PCIe generations 1 and 2 were designed with 8b/10b encoding</strong>, meaning that the <strong>actual data transmitted was only 80% of the total load</strong> (20% of the bits, i.e. 2 out of every 10, are used for clock synchronization).</p>



<p><strong>PCIe generations 3 and 4 were designed with 128b/130b encoding</strong>, meaning that the <strong>control bits now represent only 1.56% of the payload</strong> (2 bits for every 128). Quite good, isn’t it?</p>
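<p>The two efficiency figures can be reproduced with exact fractions. This sketch also separates the overhead measured against all the bits on the wire (2/130, about 1.54%) from the overhead measured against the payload (2/128, about 1.56%, the figure quoted above); the function name is hypothetical.</p>

```python
from __future__ import annotations

from fractions import Fraction


def encoding_stats(data_bits: int, line_bits: int) -> dict[str, Fraction]:
    """Efficiency and overhead of an NbMb line code (e.g. 8b/10b)."""
    control_bits = line_bits - data_bits
    return {
        "efficiency": Fraction(data_bits, line_bits),              # useful share of line bits
        "overhead_vs_line": Fraction(control_bits, line_bits),     # control share of line bits
        "overhead_vs_payload": Fraction(control_bits, data_bits),  # control bits per payload bit
    }


if __name__ == "__main__":
    for name, (data, line) in {"8b/10b": (8, 10), "128b/130b": (128, 130)}.items():
        s = encoding_stats(data, line)
        print(f"{name}: {float(s['efficiency']):.2%} useful, "
              f"{float(s['overhead_vs_line']):.2%} of line bits, "
              f"{float(s['overhead_vs_payload']):.2%} relative to payload")
```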



<h3 class="wp-block-heading">Let’s calculate the PCIe bandwidth together</h3>



<p>Here is the table of PCIe specifications by version:</p>



<figure class="wp-block-table"><table><thead><tr><th>Number of Lanes</th><th>PCIe 1.0 (2003)</th><th>PCIe 2.0 (2007)</th><th><strong>PCIe 3.0 (2010)</strong></th><th><strong>PCIe 4.0 (2017)</strong></th><th>PCIe 5.0 (2019)</th><th>PCIe 6.0 (2021)</th></tr></thead><tbody><tr><td>x1</td><td>250 MB/s</td><td>500 MB/s</td><td>1 GB/s</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td></tr><tr><td>x2</td><td>500 MB/s</td><td>1 GB/s</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td><td>16 GB/s</td></tr><tr><td>x4</td><td>1 GB/s</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td><td>16 GB/s</td><td>32 GB/s</td></tr><tr><td>x8</td><td>2 GB/s</td><td>4 GB/s</td><td>8 GB/s</td><td>16 GB/s</td><td>32 GB/s</td><td>64 GB/s</td></tr><tr><td><strong>x16</strong></td><td>4 GB/s</td><td>8 GB/s</td><td><strong>16 GB/s</strong></td><td>32 GB/s</td><td>64 GB/s</td><td>128 GB/s</td></tr></tbody></table><figcaption>PCI-SIG consortium specifications: theoretical PCIe bandwidth per direction, by number of lanes (rounded)</figcaption></figure>



<figure class="wp-block-table"><table><thead><tr><th></th><th>PCIe 1.0 (2003)</th><th>PCIe 2.0 (2007)</th><th>PCIe 3.0 (2010)</th><th>PCIe 4.0 (2017)</th><th>PCIe 5.0 (2019)</th><th>PCIe 6.0 (2021)</th></tr></thead><tbody><tr><td><strong>Transfer rate (per lane)</strong></td><td>2.5 GT/s</td><td>5.0 GT/s</td><td>8.0 GT/s</td><td>16 GT/s</td><td>32 GT/s</td><td>64 GT/s</td></tr></tbody></table><figcaption>PCI-SIG consortium specifications: theoretical raw transfer rate per lane</figcaption></figure>



<p>To obtain such numbers let&#8217;s look at the general Bandwidth formula:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-1024x155.jpeg" alt="" class="wp-image-18793" width="512" height="78" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-1024x155.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-300x46.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB-768x117.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/B529B3E3-419B-49DE-9544-8B7BF190D3BB.jpeg 1298w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<ul class="wp-block-list"><li>BW stands for Bandwidth</li><li>MT/s: Mega Transfers per second</li><li>Encoding: the ratio of data bits to transmitted bits, e.g. 4b/5b, 8b/10b, 128b/130b,&nbsp;…</li></ul>



<h4 class="wp-block-heading">For PCIe v1.0:</h4>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-1024x170.jpeg" alt="BW/lane\ (MB/s) = \ 2\ 500\ (MT/s)\ *\ \frac{8\ bits}{10\ bits} * \frac{1\ Byte}{8\ bits}" class="wp-image-18785" width="512" height="85" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-1024x170.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-300x50.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227-768x127.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/A99597E2-4117-43B1-9048-1CE24EFAE227.jpeg 1231w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB.jpeg" alt="BW/lane\ (MB/s) = \ 250\ (MB/s)" class="wp-image-18788" width="347" height="79" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB.jpeg 806w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB-300x67.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/BC5F6C70-2FCF-4CD4-9040-848C8EB654CB-768x172.jpeg 768w" sizes="auto, (max-width: 347px) 100vw, 347px" /></figure></div>



<h4 class="wp-block-heading">For PCIe v3.0 (the one that interests us for the NVIDIA V100):</h4>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-1024x154.jpeg" alt="BW/lane\ (MB/s) = \ 8\ 000\ (MT/s)\ *\ \frac{128\ bits}{130\ bits} * \frac{1\ Byte}{8\ bits}" class="wp-image-18795" width="512" height="77" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-1024x154.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-300x45.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A-768x115.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/6EFDAF22-C7FC-44FC-B5BE-D8C4D291B71A.jpeg 1292w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F.jpeg" alt="BW/lane\ (MB/s) = \ 984.6\ (MB/s)" class="wp-image-18796" width="355" height="63" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F.jpeg 802w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F-300x53.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/07/3B7E1754-67C8-4EF1-88BE-3A5D8985803F-768x136.jpeg 768w" sizes="auto, (max-width: 355px) 100vw, 355px" /></figure></div>



<p>Therefore, with <strong>16 lanes for an NVIDIA V100 connected in PCIe v3.0</strong>, we have an effective data transfer rate (data bandwidth)<strong> of nearly 16 GB/s per direction </strong>(<strong>more precisely, 15.75 GB/s per direction</strong>).</p>



<p>Be careful not to get confused: total bandwidth is sometimes quoted for both directions combined, in which case the total bandwidth of an x16 link is around 32 GB/s.</p>
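<p>The bandwidth formula can be turned into a small calculator that reproduces the per-lane figures for generations 1 to 5, per direction and for both directions. It assumes 1 GB = 1000 MB and leaves out PCIe 6.0, which switched to PAM4 signalling with FLIT-based encoding, so this simple formula does not apply to it directly. The names are hypothetical.</p>

```python
# PCIe generation -> (transfer rate in MT/s, data bits, line bits)
GENERATIONS = {
    "1.0": (2_500, 8, 10),
    "2.0": (5_000, 8, 10),
    "3.0": (8_000, 128, 130),
    "4.0": (16_000, 128, 130),
    "5.0": (32_000, 128, 130),
}


def lane_bandwidth_mbps(gen: str) -> float:
    """BW/lane (MB/s) = rate (MT/s) * encoding ratio * (1 byte / 8 bits), one direction."""
    rate, data_bits, line_bits = GENERATIONS[gen]
    return rate * data_bits / line_bits / 8


def pcie_bandwidth_gbps(gen: str, lanes: int, both_directions: bool = False) -> float:
    """Link bandwidth in GB/s (1 GB = 1000 MB); doubled when both directions are counted."""
    per_direction = lane_bandwidth_mbps(gen) * lanes / 1000
    return 2 * per_direction if both_directions else per_direction


if __name__ == "__main__":
    for gen in GENERATIONS:
        print(f"PCIe {gen}: {lane_bandwidth_mbps(gen):7.1f} MB/s per lane, "
              f"x16 = {pcie_bandwidth_gbps(gen, 16):5.2f} GB/s per direction, "
              f"{pcie_bandwidth_gbps(gen, 16, both_directions=True):5.2f} GB/s both ways")
```

<p>For PCIe 3.0 x16 this gives about 984.6 MB/s per lane, 15.75 GB/s per direction and about 31.5 GB/s for both directions combined.</p>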



<p><em><strong>Note:</strong></em> Another element that we haven&#8217;t considered is that the maximum theoretical bandwidth needs to be reduced by around 1 GB/s for the error detection fields (<strong><em>ECRC</em></strong> and <strong><em>LCRC</em></strong>) as well as the <strong><em>Headers</em></strong> (<strong><em>Start tag, Sequence tag, Header</em></strong>) and <strong><em>Footer</em></strong> (<em><strong>End</strong></em> tag) overheads explained earlier in this blog post.</p>
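<p>A rough estimate of this packet overhead is sketched below. It assumes the framing described in this post (1-byte start, 2-byte sequence number, 12 or 16-byte header, optional 4-byte ECRC, 4-byte LCRC, 1-byte end), even though PCIe 3.0 uses slightly different framing tokens, and it ignores DLLPs such as ACKs and flow-control updates. The payload sizes are illustrative, and the names are hypothetical.</p>

```python
# Per-TLP overhead in bytes, using the framing described in this post.
START, SEQUENCE, LCRC, END, ECRC = 1, 2, 4, 1, 4

# One-direction PCIe 3.0 x16 data rate after 128b/130b encoding, in GB/s (~15.75).
X16_GEN3_GBPS = 8_000 * 128 / 130 / 8 * 16 / 1000


def tlp_efficiency(payload: int, header: int = 16, with_ecrc: bool = False) -> float:
    """Share of link bytes that carry payload for one TLP (header: 12 or 16 bytes)."""
    overhead = START + SEQUENCE + header + LCRC + END + (ECRC if with_ecrc else 0)
    return payload / (payload + overhead)


if __name__ == "__main__":
    for payload in (64, 128, 256, 512):
        eff = tlp_efficiency(payload)
        usable = X16_GEN3_GBPS * eff
        print(f"{payload:4d} B payload: {eff:.1%} efficient, ~{usable:.2f} GB/s usable, "
              f"~{X16_GEN3_GBPS - usable:.2f} GB/s lost to packet overhead")
```

<p>The loss depends heavily on the payload size: the larger the payload carried by each packet, the smaller the share of bandwidth spent on headers, sequence numbers and CRCs.</p>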



<h3 class="wp-block-heading">In conclusion</h3>



<p>We have seen that PCI Express has evolved a lot and that it&#8217;s based on the same concepts as networking. To get the best out of PCIe devices, it is necessary to understand the fundamentals of the underlying infrastructure.</p>



<p>Failing to choose the right underlying motherboard, CPU or bus can lead to major performance bottlenecks and GPU underperformance.</p>



<p>To sum up:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>Friends don&#8217;t let friends build their own GPUs hosts 😉</p><cite>Jean-Louis Quéguiner July 1<sup>st</sup>, 2020</cite></blockquote>



<p>If you liked this post and want to drill down a bit into the Deep Learning and AI aspects of things, don&#8217;t hesitate to check out my other blog posts:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/deep-learning-explained-to-my-8-year-old-daughter/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png" alt="" class="wp-image-18099" width="515" height="376" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028.png 748w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/BC0E1AC1-6593-4395-9844-A7D2CB457028-300x219.png 300w" sizes="auto, (max-width: 515px) 100vw, 515px" /></a></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/what-does-training-neural-networks-mean/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png" alt="What does training neural networks mean?" class="wp-image-17932" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-1024x538.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/04/81921ABA-7642-4CA2-87BF-9B2D92278BF1.png 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><a href="https://www.ovh.com/blog/distributed-training-in-a-deep-learning-context/" data-wpel-link="exclude"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png" alt="Distributed Learning in a Deep Learning context" class="wp-image-18106" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/05/20C35ECE-4738-4967-951E-6BC863342D5D.png 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></a></figure></div>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
