<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>IPMI Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/ipmi/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/ipmi/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Fri, 27 Sep 2019 09:00:28 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>IPMI Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/ipmi/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>The ins and outs of IPMI</title>
		<link>https://blog.ovhcloud.com/the-ins-and-outs-of-ipmi/</link>
		
		<dc:creator><![CDATA[Phil Perfetti]]></dc:creator>
		<pubDate>Mon, 16 Sep 2019 14:39:10 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Bare Metal servers]]></category>
		<category><![CDATA[IPMI]]></category>
		<category><![CDATA[United States]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=15993</guid>

					<description><![CDATA[What is IPMI? What&#8217;s the purpose of IPMI? Why should I care about IPMI? These are all fair questions. In the hosting provider world, IPMI (Intelligent Platform Management Interface) is thrown around almost as much as &#8220;SDDC (Software Defined Data Center)&#8221; or &#8220;IaaS (Infrastructure as a Service)&#8221;, but what does it mean, and why [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>What is IPMI? What&#8217;s the purpose of IPMI? Why should I care about IPMI? These are all fair questions. In the hosting provider world, IPMI (Intelligent Platform Management Interface) is thrown around almost as much as &#8220;SDDC (Software Defined Data Center)&#8221; or &#8220;IaaS (Infrastructure as a Service)&#8221;, but what does it mean, and why should you care?</p>



<p>IPMI was created through a partnership between Intel, Dell, Hewlett Packard, and NEC. Since then, it has become an industry standard: a hardware-level solution that lets server administrators monitor hardware status, log server data, and access a server without being physically present. Accessing a server through IPMI gives you access to the system’s BIOS, which lets you install or reinstall your own operating system. Through its KVM (Keyboard, Video, Mouse) console, you can also fix network misconfigurations or re-enable SSH or RDP access.</p>



<figure class="wp-block-image"><img fetchpriority="high" decoding="async" width="1024" height="539" src="https://www.ovh.com/blog/wp-content/uploads/2019/08/IMG_0368-1024x539.jpg" alt="" class="wp-image-16033" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/08/IMG_0368-1024x539.jpg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/08/IMG_0368-300x158.jpg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/08/IMG_0368-768x404.jpg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/08/IMG_0368.jpg 1202w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>By utilizing OVHcloud® infrastructure you will be able to use IPMI and have access to your server&#8217;s BIOS. This enables you to be an effective server administrator and troubleshoot any issues you may have with your server as well as install any operating system compatible with your server’s components.</p>



<p>At OVHcloud it&#8217;s important to us that our customers have the freedom and flexibility to innovate solutions to any challenge or problem they see before them; utilizing IPMI is one way we can give our customers such freedom.</p>



<p>To learn more about how to access IPMI from your OVHcloud Manager and how to install an operating system utilizing IPMI please consult the following guides that take you step by step through the process: <a href="https://support.us.ovhcloud.com/hc/en-us/articles/360007816120-Getting-Started-with-IPMI" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Getting Started with IPMI</a>, <a href="https://support.us.ovhcloud.com/hc/en-us/articles/360000108630-How-to-Install-an-OS-with-IPMI" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">How to Install an OS with IPMI</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Alerting based on IPMI data collection</title>
		<link>https://blog.ovhcloud.com/alerting-based-on-ipmi-data-collection/</link>
		
		<dc:creator><![CDATA[Morvan Le Goff]]></dc:creator>
		<pubDate>Fri, 10 May 2019 13:56:55 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Alerting]]></category>
		<category><![CDATA[Data Collection]]></category>
		<category><![CDATA[IPMI]]></category>
		<category><![CDATA[Observability]]></category>
		<guid isPermaLink="false">https://blog.ovh.com/fr/blog/?p=14974</guid>

					<description><![CDATA[The problem to solve&#8230; How to continuously monitor the health of all OVH servers, without any impact on their performance, and no intrusion on the operating systems running on them&#160;– this was the issue to address. The end goal of this data collection is to allow us to detect and forecast potential hardware failure, in [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">The problem to solve&#8230;</h2>



<p>How to continuously monitor the health of all OVH servers, without any impact on their performance, and no intrusion on the operating systems running on them&nbsp;– this was the issue to address. The end goal of this data collection is to allow us to detect and forecast potential hardware failure, in order to improve the quality of service delivered to our customers.</p>



<p>We began by splitting the problem into four general steps:</p>



<ul class="wp-block-list"><li style="list-style-type: none;">
<ul>
<li>Data collection</li>
<li>Data storage</li>
<li>Data analytics</li>
<li>Visualisation/actions</li>
</ul>
</li></ul>



<h2 class="wp-block-heading">Data collection</h2>



<p>How did we collect massive amounts of server health data, in a non-intrusive way, within short time intervals?</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/05/CBD51216-1458-45ED-B575-69229AD64E2D-1024x667.jpeg" alt="" class="wp-image-15455" width="768" height="500" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/05/CBD51216-1458-45ED-B575-69229AD64E2D-1024x667.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/CBD51216-1458-45ED-B575-69229AD64E2D-300x195.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/CBD51216-1458-45ED-B575-69229AD64E2D-768x500.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/CBD51216-1458-45ED-B575-69229AD64E2D-1200x782.jpeg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/CBD51216-1458-45ED-B575-69229AD64E2D.jpeg 1725w" sizes="(max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading">Which data to collect?</h3>



<p>On modern servers, a BMC (Baseboard Management Controller) allows us to control firmware updates, reboots, etc. This controller is independent of the system running on the server. In addition, the BMC gives us access to sensors for all the motherboard components through an I2C bus. The protocol used to communicate with the BMC is IPMI, which is accessible via LAN (RMCP).</p>



<h4 class="wp-block-heading">What is IPMI?</h4>



<ul class="wp-block-list"><li>Intelligent Platform Management Interface.</li><li>Management and monitoring capabilities independent of the host’s OS.</li><li>Led by Intel, first published in 1998.</li><li>Supported by more than 200 computer system vendors such as Cisco, Dell, HP, Intel, Supermicro…</li></ul>



<h4 class="wp-block-heading">Why use IPMI?</h4>



<ul class="wp-block-list"><li>Access to hardware sensors (CPU temperature, memory temperature, chassis status, power, etc.).</li><li>No dependency on the OS (i.e. an agentless solution).</li><li>IPMI functions accessible after OS/system failure.</li><li>Restricted access to IPMI functionalities via user privileges.</li></ul>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/05/34BC9464-E831-4E9A-83E2-5CD96B6A0869-1024x533.jpeg" alt="IPMI-poller node" class="wp-image-15456" width="768" height="400" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/05/34BC9464-E831-4E9A-83E2-5CD96B6A0869-1024x533.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/34BC9464-E831-4E9A-83E2-5CD96B6A0869-300x156.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/34BC9464-E831-4E9A-83E2-5CD96B6A0869-768x400.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/34BC9464-E831-4E9A-83E2-5CD96B6A0869-1200x625.jpeg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/34BC9464-E831-4E9A-83E2-5CD96B6A0869.jpeg 2000w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<h3 class="wp-block-heading">Multi-source data collection</h3>



<p>We needed a scalable and responsive multi-source data collection tool to grab the IPMI data of about 400k servers at fixed intervals.</p>



<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/05/60E15FD2-69E4-471A-908B-8A06172973B4.png" alt="Akka" class="wp-image-15467" width="200" height="71" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/05/60E15FD2-69E4-471A-908B-8A06172973B4.png 352w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/60E15FD2-69E4-471A-908B-8A06172973B4-300x106.png 300w" sizes="auto, (max-width: 200px) 100vw, 200px" /></figure></div>



<p>We decided to build our IPMI data collector on the&nbsp;<a href="https://github.com/akka/akka" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Akka</a> framework.&nbsp;Akka&nbsp;is an open-source toolkit and runtime that simplifies the construction of concurrent and distributed applications on the JVM.</p>



<p>The Akka framework defines an abstraction built on top of threads, called an &#8216;actor&#8217;. An actor is an entity that handles messages. This abstraction eases the creation of multi-threaded applications, so there&#8217;s no need to fight against deadlocks. By selecting the dispatcher policy for a group of actors, you can fine-tune your application to be fully reactive and adaptable to the load. This way, we were able to design an efficient data collector that could adapt to the load, as we intended to grab each sensor value every minute.</p>



<p>In addition, the cluster architecture provided by the framework allowed us to handle all the servers in a datacentre with a single cluster. The cluster architecture also helped us to design a resilient system, so if a node of the cluster crashes or becomes too slow, it will automatically restart. The servers monitored by the failing node are then handled by the remaining, valid nodes of the cluster.</p>



<p>With the cluster architecture, we implemented a quorum feature, which takes down the whole cluster if the minimum number of started nodes is not reached. This feature easily solves the split-brain problem: if the connection between nodes is broken, the cluster splits into two entities, and the one that does not reach the quorum is automatically shut down.</p>
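
<p>The quorum rule can be illustrated with a minimal Python sketch. It is not the collector&#8217;s actual Akka implementation: the function names are hypothetical, and it assumes the quorum is a strict majority of the configured cluster size, whereas the real minimum is a configuration choice.</p>

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True when a partition holds a strict majority of the cluster.

    Assumption: quorum is a strict majority; a real deployment may
    configure a different minimum number of nodes.
    """
    return reachable_nodes >= cluster_size // 2 + 1


def partition_action(reachable_nodes: int, cluster_size: int) -> str:
    """Decide what a partition should do after a network split."""
    if has_quorum(reachable_nodes, cluster_size):
        return "keep-running"
    return "shut-down"


if __name__ == "__main__":
    # A 5-node cluster split into partitions of 3 and 2 nodes.
    for size in (3, 2):
        print(size, partition_action(size, 5))
```

<p>Under this assumption, at most one side of a split can hold a majority, so two halves never keep polling the same servers at once. An even split, such as 2 and 2 nodes out of 4, shuts down both sides.</p>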



<p>A REST API is defined to communicate with the data collector in two ways:</p>



<ul class="wp-block-list"><li>To send the configurations</li><li>To get information on the monitored servers </li></ul>



<p>Each cluster node runs on one JVM, and we can launch one or more nodes on a dedicated server. Each dedicated server used in the cluster is placed in an OVH vRack. An IPMI gateway pool is used to access the BMC of each server, with the communication between the gateway and the IPMI data collector secured by IPsec connections.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2019/05/2F3F033B-5D0D-4A3B-8F52-829087BF1349-1024x872.jpeg" alt="IPMI-poller clustering" class="wp-image-15457" width="512" height="436" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/05/2F3F033B-5D0D-4A3B-8F52-829087BF1349-1024x872.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/2F3F033B-5D0D-4A3B-8F52-829087BF1349-300x256.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/2F3F033B-5D0D-4A3B-8F52-829087BF1349-768x654.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/2F3F033B-5D0D-4A3B-8F52-829087BF1349-1200x1022.jpeg 1200w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/2F3F033B-5D0D-4A3B-8F52-829087BF1349.jpeg 1491w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<h2 class="wp-block-heading">Data storage</h2>



<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" decoding="async" src="/blog/wp-content/uploads/2019/05/1B79F173-0885-44F0-A356-D03D64DB7631.png" alt="OVH Metrics" class="wp-image-15470" width="199" height="179" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/05/1B79F173-0885-44F0-A356-D03D64DB7631.png 409w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/1B79F173-0885-44F0-A356-D03D64DB7631-300x268.png 300w" sizes="auto, (max-width: 199px) 100vw, 199px" /></figure></div>



<p>Of course, we use the OVH Metrics service for data storage! Before storing the data, the IPMI data collector unifies the metrics by qualifying each sensor. The final metric name is defined by the entity the sensor belongs to and the base unit of the value. This eases post-processing and data visualisation/comparison.</p>
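
<p>As an illustration of this naming rule, here is a short Python sketch. The sensor labels, the catalogue, and the <code>ipmi.entity.unit</code> name layout are all hypothetical; the real collector&#8217;s naming scheme is not described in this post.</p>

```python
# Hypothetical catalogue mapping vendor-specific sensor labels to
# (entity, base unit). Real labels differ from one vendor to another.
SENSOR_CATALOG = {
    "CPU1 Temp": ("cpu", "celsius"),
    "CPU Temp": ("cpu", "celsius"),
    "PS1 Input Power": ("power_supply", "watt"),
}


def unified_metric_name(raw_sensor: str) -> str:
    """Build a metric name from the sensor's entity and base unit."""
    entity, base_unit = SENSOR_CATALOG[raw_sensor]
    return f"ipmi.{entity}.{base_unit}"


if __name__ == "__main__":
    for label in SENSOR_CATALOG:
        print(label, unified_metric_name(label))
```

<p>In this sketch, two differently labelled CPU temperature sensors end up under the same metric name, which is what makes values comparable across server models.</p>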



<p>Each datacentre IPMI collector pushes its data to a Metrics live cache server with a limited persistence time. All important information is persisted in the OVH Metrics server.</p>



<h2 class="wp-block-heading">Data analytics</h2>



<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" decoding="async" src="/blog/wp-content/uploads/2019/05/4BF68819-3158-43B2-A4DF-51123521806D-300x127.png" alt="Warp 10" class="wp-image-15468" width="201" height="85" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/05/4BF68819-3158-43B2-A4DF-51123521806D-300x127.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/05/4BF68819-3158-43B2-A4DF-51123521806D.png 450w" sizes="auto, (max-width: 201px) 100vw, 201px" /></figure></div>



<p>We store our metrics in <a href="https://github.com/senx/warp10-platform" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Warp 10</a>. Warp 10 comes with a time series scripting language, WarpScript, which makes it easy to manipulate and post-process our collected data on the server side.</p>



<p>We have defined three levels of analysis to monitor the health of the servers:</p>



<ul class="wp-block-list"><li style="list-style-type: none;">
<ul>
<li>A simple threshold-per-server metric.</li>
<li>By using the OVH metric loops service, we aggregate data per rack and per room and calculate a mean. We set a threshold on this mean, which lets us detect failures common to a rack or room, such as in the cooling or power supply system.</li>
<li>The OVH MLS service performs anomaly detection on the racks and rooms by forecasting the possible evolution of metrics based on past values. If a metric value falls outside this forecast, an anomaly is raised.</li>
</ul>
</li></ul>
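
<p>The second level, a threshold on a per-rack mean, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rack names, readings, and threshold are made up, and in production this aggregation runs as WarpScript on the Metrics platform rather than in Python.</p>

```python
from collections import defaultdict
from statistics import mean


def rack_means(readings):
    """Group (rack, value) readings by rack and average each group."""
    by_rack = defaultdict(list)
    for rack, value in readings:
        by_rack[rack].append(value)
    return {rack: mean(values) for rack, values in by_rack.items()}


def racks_over_threshold(readings, threshold):
    """Return the sorted names of racks whose mean exceeds the threshold."""
    means = rack_means(readings)
    return sorted(rack for rack, avg in means.items() if avg > threshold)


if __name__ == "__main__":
    # Made-up temperature readings, one per server, tagged with a rack.
    sample = [
        ("rack-a", 40.0), ("rack-a", 44.0),
        ("rack-b", 61.0), ("rack-b", 65.0),
    ]
    print(racks_over_threshold(sample, 55.0))
```

<p>Averaging per rack smooths out a single hot server, so an alert at this level points to a shared cause such as cooling or power rather than to one machine.</p>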



<h2 class="wp-block-heading">Visualisation/actions</h2>



<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" decoding="async" src="/blog/wp-content/uploads/2019/05/F8551D2C-5386-4754-912B-2A0C0F278684-150x150.png" alt="TAT" class="wp-image-15472" width="100" height="100"/></figure></div>



<p>All the alerts generated by the data analysis are pushed to <a href="https://github.com/ovh/tat" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">TAT</a>, which is an OVH tool we use to handle the alerting flow.</p>



<div style="height:20px" aria-hidden="true" class="wp-block-spacer"></div>



<div class="wp-block-image"><figure class="alignright is-resized"><img loading="lazy" decoding="async" src="/blog/wp-content/uploads/2019/05/47315BB4-8989-46AB-8882-25E804AFBFC1.png" alt="Grafana" class="wp-image-15473" width="150" height="124"/></figure></div>



<p>Grafana is used to monitor the metrics. We have dashboards to visualise the metrics and the aggregations for each rack and room, the detected anomalies, and the evolution of the open alerts.</p>



<div style="height:20px" aria-hidden="true" class="wp-block-spacer"></div>



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="300" height="163" src="/blog/wp-content/uploads/2019/03/Capture-d’écran-2019-03-05-à-10.17.16-1-300x163.png" alt="" class="wp-image-14985" srcset="https://blog.ovhcloud.com/wp-content/uploads/2019/03/Capture-d’écran-2019-03-05-à-10.17.16-1-300x163.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Capture-d’écran-2019-03-05-à-10.17.16-1-768x417.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Capture-d’écran-2019-03-05-à-10.17.16-1-1024x556.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2019/03/Capture-d’écran-2019-03-05-à-10.17.16-1-1200x651.png 1200w" sizes="auto, (max-width: 300px) 100vw, 300px" /></figure></div>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
