<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ceph Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/ceph/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/ceph/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Mon, 10 Jun 2024 14:26:03 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>ceph Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/ceph/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Proxmox VE Ceph cluster and DRP using OVHcloud dedicated servers</title>
		<link>https://blog.ovhcloud.com/ovh-proxmox-drp-servers/</link>
		
	<dc:creator><![CDATA[Carles Munoz and Cristina Ortiz]]></dc:creator>
		<pubDate>Mon, 10 Jun 2024 13:59:17 +0000</pubDate>
				<category><![CDATA[OVHcloud Partner Program]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[Cluster]]></category>
		<category><![CDATA[DRP]]></category>
		<category><![CDATA[Proxmox]]></category>
		<category><![CDATA[VE]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=26885</guid>

					<description><![CDATA[OVHcloud’s IaaS (Infrastructure as a Service) services allow us to rent dedicated servers (bare metal) that are at our disposal in a matter of minutes. There is a wide range of options (Rise, Advance, Storage, Scale, High Grade, etc.) from which we can choose according to our needs. In this article [&#8230;]]]></description>
										<content:encoded><![CDATA[



<p><a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud’s</a> <strong>IaaS</strong> (Infraestructure as a Service) services allow us to hire <strong>dedicated servers</strong> (bare metal) for rent that we can have at our disposal in a matter of minutes. There is a wide range of options (rise, advance, storage, scale, high quality, etc.) from which we can choose according to our needs.</p>



<p>In this article we will see how we can use these dedicated servers to create <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Clusters</a> with the same functionality we would have if we used our own servers on premises. In addition, we will see how <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> services allow us to create a very complete <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> (Disaster Recovery Plan) to achieve maximum resilience for our data.</p>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="850" height="476" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers.png" alt="" class="wp-image-26894" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers-300x168.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers-768x430.png 768w" sizes="(max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://www.ovhcloud.com/en-gb/bare-metal/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>OVHCloud &#8211; Dedicated Servers</p>



<p><strong>Proxmox VE Ceph Cluster</strong></p>



<p>A cluster of Proxmox VE servers combined with a Ceph distributed storage system allows you to create a highly available, load-balanced, horizontally scalable, hyperconverged virtualization infrastructure with ease.</p>



<p>Let’s first look at some concepts to fully understand what a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> is.</p>



<p><strong>What is a cluster?</strong></p>



<p>A cluster in computing refers to a group of interconnected computers or nodes that function together as if they were a single entity. Clusters are used to improve the availability, performance, and scalability of applications and services. There are different types of clusters, but in general they share the objective of providing greater processing capacity and redundancy.</p>



<p><strong>What is Ceph?</strong></p>



<p>Ceph is a distributed storage system designed to provide object, block, and file storage in a single unified cluster. Proxmox can use Ceph as virtual machine storage.</p>



<p><strong>What is a Proxmox VE Ceph cluster?</strong></p>



<p>It is three or more servers forming part of a Proxmox cluster and using Ceph as a distributed storage system, all managed from the Proxmox web interface, thanks to which we achieve a hyperconverged virtualization infrastructure.</p>
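<p>As a rough illustration of what this looks like in practice, here is a minimal sketch of how such a cluster is typically bootstrapped with the standard Proxmox command-line tools (<em>pvecm</em> and <em>pveceph</em>), driven from Python. The cluster name, Ceph network, and disk device below are assumptions for the example, not values from this article, and each command must be run on the appropriate node:</p>



<pre class="wp-block-code"><code class="">import subprocess

def run(cmd):
    # Run a Proxmox CLI command and raise if it fails
    subprocess.run(cmd, check=True)

# On the first node: create the Proxmox cluster (name is an assumption)
run(['pvecm', 'create', 'demo-cluster'])
# On each additional node, join it instead:
# run(['pvecm', 'add', '10.0.0.1'])  # IP of the first node (assumed)

# On every node: install Ceph and initialise it on the private network
run(['pveceph', 'install'])
run(['pveceph', 'init', '--network', '10.0.0.0/24'])
run(['pveceph', 'mon', 'create'])

# One OSD per data disk, then a replicated pool for VM disks
run(['pveceph', 'osd', 'create', '/dev/nvme1n1'])
run(['pveceph', 'pool', 'create', 'vm-pool'])</code></pre>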



<p><strong>What is hyperconvergence?</strong></p>



<p>A hyperconverged virtualization infrastructure is an integrated system that combines compute, storage, and networking in a single environment. This simplifies management, improves efficiency, and enables easy scalability, making it easy to create and manage virtual machines in a single cluster.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="850" height="444" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph.png" alt="" class="wp-image-26895" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph-768x401.png 768w" sizes="(max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>Proxmox VE Ceph Cluster</p>



<p>By means of the OVHcloud dedicated server service we can create a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph cluster</a> with three or more nodes, just as we would if we acquired our own servers and built the cluster in our own facilities, but with the versatility and advantages that come with using rented servers instead of owned hardware, among which we can mention:</p>



<ul class="wp-block-list">
<li><strong>Abstraction</strong> of the hardware layer since any breakdown will be solved by <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> and, if necessary, they will replace the damaged parts or even the entire server.</li>



<li>We <strong>forget</strong> about <strong>hardware obsolescence</strong>. After a few years we will be able to add new servers with the latest technologies to our cluster and eliminate the old ones in a completely transparent way for the user of the virtualization environment, without any interruption of service.</li>



<li>Easily add <strong>more nodes</strong> to our virtualization cluster to increase its computing power. In a matter of minutes we can have new servers ready to add to our cluster.</li>
</ul>



<p>The large number of dedicated server options that <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> makes available to its clients allows us to create <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Clusters</a> for practically any client. We can use different CPUs depending on our needs, large amounts of RAM, large NVMe disks, dedicated 25Gbps networks for Ceph communication, the vRack service to connect our servers, dedicated IP ranges, etc. All this allows us to cover the needs of practically any client.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="850" height="450" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers.png" alt="" class="wp-image-26896" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers-300x159.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers-768x407.png 768w" sizes="(max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://www.ovhcloud.com/en-gb/bare-metal/scale/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>OVHCloud &#8211; A2-Scale Servers</p>



<p><strong>DRP (Disaster Recovery Plan)</strong></p>



<p>A <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> (Disaster Recovery Plan) is crucial to maintaining business operations in the event of disasters, guaranteeing the continuity and protection of essential data. It is very important to have a good <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> to ensure the resilience of the data.</p>



<p><strong>What is a Disaster Recovery Plan?</strong></p>



<p><a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Disaster Recovery Plan</a> in IT refers to a set of strategies, policies and procedures that an organization implements to restore its critical systems and data after a catastrophic event or disaster that causes significant disruptions to operations. normal.que cause interrupciones significativas en las operaciones normales.</p>



<p>These events may include:</p>



<ul class="wp-block-list">
<li>Natural disasters: Such as earthquakes, floods, storms, etc.</li>



<li>Man-made disasters: Such as cyber attacks, infrastructure failures, acts of vandalism, etc.</li>
</ul>



<p>The goal of the <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Disaster Recovery Plan</a> is to minimize downtime and ensure business continuity, allowing the organization to recover quickly after a disaster.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="850" height="534" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP.png" alt="" class="wp-image-26897" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP-300x188.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP-768x482.png 768w" sizes="auto, (max-width: 850px) 100vw, 850px" /></figure>



<h3 class="wp-block-heading"><a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a><strong>DRP (Disaster Recovery Plan)</strong></h3>



<p>Below we will list several options, taking into account that our <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> may include some of these, or even a combination of all of them, depending on the level of data resilience desired.</p>



<p><strong>Option 1: Proxmox VE Ceph Cluster distributed in an OVHcloud 3-AZ region</strong> Our partner <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> has a service called <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a>, thanks to which we can distribute a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> made up of dedicated servers (bare metal) across three different data centers separated by a few tens of kilometers. The data centers that make up the <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a> are interconnected through redundant fibers with minimal latency, which lets us set up a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> distributed within the region. This gives our data great resilience against incidents localized in one of the data centers, since our virtualization service will not be affected by them.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="850" height="246" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ.png" alt="" class="wp-image-26898" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ-300x87.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ-768x222.png 768w" sizes="auto, (max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>OVHcloud &#8211; 3-AZ Region</p>



<p><strong>Option 2: Proxmox Backup Server with frequent replication</strong> Use a <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox Backup Server</a> (PBS), or our <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS Online service</a>, in a data center (even in a different country or continent) that holds a backup copy of all virtual machines, maintaining a history of versions over time, depending on the space available for copies. For the most critical virtual machines you can even make more frequent copies (for example, every hour), so that if the <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> has to be activated, the data loss is as small as possible. This option can be implemented using the <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS Online</a> service, the <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> IaaS service to rent a dedicated server on which to install the <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS</a>, any other cloud service, or even your own facilities in which to host the <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS</a>.</p>
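<p>As a rough illustration of the hourly-backup idea, a critical virtual machine could be backed up with a scheduled <em>vzdump</em> job; the Python sketch below simply shells out to it. The VM ID (101) and storage name (pbs-remote) are assumptions for the example, not values from this article:</p>



<pre class="wp-block-code"><code class="">import subprocess

# Hypothetical example: snapshot-mode backup of VM 101 to a
# PBS-backed storage named 'pbs-remote'; run this hourly from a
# scheduler (cron, or a Proxmox VE backup job) for critical VMs
subprocess.run(
    ['vzdump', '101', '--mode', 'snapshot', '--storage', 'pbs-remote'],
    check=True,
)</code></pre>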



<p><strong>Option 3: Ceph replication</strong> Create a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> identical to the production one in a data center geographically separated from the main cluster, and activate <strong>Ceph replication</strong> between both clusters. This is the option with the least data loss compared to option 2, but it is much more expensive, since we have to maintain a cluster equal to the main one that we will only activate in case of disaster. This option can be implemented using the <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> IaaS service, given that they have data centers distributed across several countries and continents; it is therefore viable to host the main cluster in one data center and the replica cluster in another country. It can also be implemented using different cloud providers, or on the customer’s premises if they have geographically separated data centers.</p>
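<p>For reference, block-level replication between two Ceph clusters is typically driven by RBD mirroring. Below is a minimal sketch, assuming a pool named <em>vm-pool</em>, cluster configurations named <em>primary</em> and <em>backup</em>, and the <em>rbd-mirror</em> daemon running on the standby site; all of these names are assumptions for the example, not part of this article:</p>



<pre class="wp-block-code"><code class="">import subprocess

def rbd(args, cluster):
    # Run an rbd command against the named cluster configuration
    # (expects the matching /etc/ceph/ config file to exist)
    subprocess.run(['rbd', '--cluster', cluster] + args, check=True)

# Enable pool-level mirroring on both clusters
rbd(['mirror', 'pool', 'enable', 'vm-pool', 'pool'], cluster='primary')
rbd(['mirror', 'pool', 'enable', 'vm-pool', 'pool'], cluster='backup')

# Register the primary cluster as a peer of the backup cluster; the
# rbd-mirror daemon on the backup site then replays changes continuously
rbd(['mirror', 'pool', 'peer', 'add', 'vm-pool', 'client.admin@primary'],
    cluster='backup')</code></pre>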



<p><strong>Conclusion</strong></p>



<p>As we have seen throughout this article, we can create a hyperconverged virtualization infrastructure with great data resilience using the <a href="https://soltecsis.com/en/proxmox-ve/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE</a> hypervisor and dedicated <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> servers. A <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> with three or more nodes located in a <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a> of <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a>, in combination with a <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS</a> (Proxmox Backup Server) hosted in a different data center and country, is a highly available, highly scalable solution with great data resilience. If we also add an identical cluster in another data center outside the <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a> with real-time Ceph replication, we can have a <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> (Disaster Recovery Plan) that allows rapid disaster recovery.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="850" height="596" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2.png" alt="" class="wp-image-26899" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2-300x210.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2-768x539.png 768w" sizes="auto, (max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://soltecsis.com/en/proxmox-ve/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>&nbsp;</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fovh-proxmox-drp-servers%2F&amp;action_name=Proxmox%20VE%20Ceph%20cluster%20and%20DRP%20using%20OVHcloud%20dedicated%20servers&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Journey to next-gen Ceph storage at OVHcloud with LXD</title>
		<link>https://blog.ovhcloud.com/journey-to-next-gen-ceph-storage-at-ovhcloud-with-lxd/</link>
		
		<dc:creator><![CDATA[Filip Dorosz]]></dc:creator>
		<pubDate>Mon, 15 Jun 2020 14:35:12 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Storage]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=18385</guid>

					<description><![CDATA[Introduction My name is Filip Dorosz. I&#8217;ve been working at OVHcloud since 2017 as a DevOps Engineer. Today I want to tell you a story of how we deployed next-gen Ceph at OVHcloud. But first, a few words about Ceph: Ceph is a software defined storage solution that powers OVHcloud’s additional Public Cloud volumes as [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading" id="Journeytonext-genCephstorageatOVHcloudwithLXD-1.Introduction">Introduction</h2>



<p>My name is Filip Dorosz. I&#8217;ve been working at OVHcloud since 2017 as a DevOps Engineer. Today I want to tell you a story of how we deployed next-gen <a href="https://ceph.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ceph </a>at OVHcloud. But first, a few words about Ceph: Ceph is a software defined storage solution that powers OVHcloud’s additional Public Cloud volumes as well as our product Cloud Disk Array. But I won’t bore you with the marketing stuff &#8211; let the story begin!</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="537" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-1024x537.jpeg" alt="Journey to next-gen Ceph storage at OVHcloud with LXD" class="wp-image-18543" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-1024x537.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51.jpeg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<h2 class="wp-block-heading">This looks like an interesting task&#8230;</h2>



<p>One and a half years ago we started a very familiar sprint. Aside from the usual stuff that we have to deal with, there was one task that looked a little more interesting. The title read: “<em>Evaluate whether we can run newer versions of Ceph on our current software</em>”. We needed newer versions of Ceph and BlueStore to create a next-gen Ceph solution with all-flash storage.</p>



<p>Our software solution (which we call the legacy solution) is based on Docker. It sounds really cool, but we run Docker a bit differently from its intended purpose. Our containers are <em>very</em> <em>stateful</em>. We run a full-blown init system inside the container as the Docker entry point. That init system then starts all the software we need inside the container, including Puppet, which we use to manage the <em>“things”</em>. It sounds like we&#8217;re using Docker containers similarly to LXC containers, doesn’t it?&#8230;</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="536" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-1024x536.jpeg" alt="Our legacy Ceph infrastructure (allegory)" class="wp-image-18535" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-1024x536.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-768x402.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576.jpeg 1199w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption>Our legacy Ceph infrastructure (allegory)</figcaption></figure></div>



<p>It quickly turned out that it was not possible to run newer Ceph releases in our in-house solution, because newer versions of Ceph make use of systemd, and in our current solution we don’t run systemd at all &#8211; not inside the containers, and not on the hosts that host them.</p>



<p>The hunt for solutions began. One possibility was to package Ceph ourselves and get rid of systemd, but that&#8217;s a lot of work with little added value. The Ceph community provides tested packages which ought to be taken advantage of, so that option was off the table.</p>



<p>The second option was to run Ceph with supervisord inside the Docker container. While it sounds like a plan, even the supervisord docs clearly state that supervisord <em>“is not meant to be run as a substitute for init as ‘process id 1’”</em>. So that was clearly not an option either.</p>



<h2 class="wp-block-heading">We needed systemd!</h2>



<p>At this point, it was clear that we needed a solution that enabled us to run systemd inside the container as well as on the host. It sounded like the perfect time to switch to a brand new solution &#8211; one that was designed to run a full OS inside the container. As our Docker setup used the LXC backend, it was a natural choice to evaluate LXC. It had all the features we needed, but with LXC we would have to code all the container-related automation ourselves. Could all this additional work be avoided? It turns out it could&#8230;</p>



<p>As I had used LXD in a previous project, I knew it was capable of managing images, networks, block devices, and all the nice features that are needed to set up a fully functional Ceph cluster.</p>



<p>So I reinstalled my developer servers with an <a href="https://ubuntu.com/server" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Ubuntu Server LTS</a> release and installed <a href="https://linuxcontainers.org/lxd/introduction/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">LXD</a> on them.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="156" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-1024x156.jpeg" alt="Ubuntu &amp; LXD" class="wp-image-18538" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-1024x156.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-300x46.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-768x117.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922.jpeg 1297w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>LXD has everything that was needed to create fully functional Ceph clusters:</p>



<ul class="wp-block-list"><li>it supports &#8216;fat&#8217; stateful containers,</li><li>it supports systemd inside the container,</li><li>it supports container images so we can prepare customized images and use them without hassle,</li><li>passing whole block devices to containers,</li><li>passing ordinary directories to containers,</li><li>support for easy container start, stop, restart,</li><li>REST API that will be covered in later parts of the article,</li><li>support for multiple network interfaces within containers using macvlan.</li></ul>



<p>After just a few hours of manual work, I had a Ceph cluster running the Mimic release inside LXD containers. I typed <em>ceph health</em> and got ‘HEALTH_OK’. Nice! It worked great.</p>



<h2 class="wp-block-heading">How do we industrialize that?</h2>



<p>To industrialize it and plug it into our Control Plane, we needed a Puppet module for LXD, so Puppet could manage all the container-related elements on the host. No existing module provided the functionality we needed, so we had to code it ourselves.</p>



<p>The LXD daemon exposes a handy <a href="https://github.com/lxc/lxd/blob/master/doc/rest-api.md" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">REST API</a> that we utilized to create our Puppet module. You can talk to the API locally over a unix socket, or through the network if you configure LXD to expose it. For usage within the module it was really convenient to use the&nbsp;<em>lxc query</em> command, which works by sending raw queries to LXD over the unix socket. The module is now <a rel="noreferrer noopener nofollow external" href="https://github.com/ovh/lxd-puppet-module" target="_blank" data-wpel-link="external">open source</a> on GitHub, so you can download it and play with it. It enables you to configure basic LXD settings, as well as create containers, profiles, storage pools, etc.</p>
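<p>To give an idea of how easy the API is to consume, here is a minimal Python sketch (standard library only) that queries the local LXD daemon over its unix socket &#8211; roughly the equivalent of <em>lxc query /1.0/containers</em>. The socket path assumes a default LXD installation:</p>



<pre class="wp-block-code"><code class="">import http.client
import json
import socket

class LXDConnection(http.client.HTTPConnection):
    # HTTP connection that talks to LXD over its local unix socket
    def __init__(self, path='/var/lib/lxd/unix.socket'):
        super().__init__('localhost')
        self.unix_path = path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.unix_path)

conn = LXDConnection()
conn.request('GET', '/1.0/containers')
response = json.loads(conn.getresponse().read())
print(response['metadata'])  # container URLs, e.g. /1.0/containers/container01</code></pre>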



<p>The module allows you to create resources, as well as manage their state. Just change your manifests, run the Puppet agent, and it will do the rest.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="243" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg" alt="Open source LXD Puppet Module, available on GitHub" class="wp-image-18540" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-300x71.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-768x182.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528.jpeg 1375w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>As of writing, the LXD Puppet module provides the following defines:</p>



<ul class="wp-block-list"><li>lxd::profile</li><li>lxd::image</li><li>lxd::storage</li><li>lxd::container</li></ul>



<p>For full reference please check out its <a href="https://github.com/ovh/lxd-puppet-module" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub page</a>.</p>



<h3 class="wp-block-heading" id="Journeytonext-genCephstorageatOVHcloudwithLXD-ManualsetupVSAutomaticsetupwithPuppet">Manual setup VS Automatic setup with Puppet</h3>



<p>I will show you a simple example of how to create the exact same setup, first manually, and then again automatically with Puppet. For the purpose of this article I created a new Public Cloud instance with Ubuntu 18.04, one additional disk, and an already configured bridge device br0. Let&#8217;s assume there is also a DHCP server listening on the br0 interface.</p>



<p>It&#8217;s worth noting that you generally don&#8217;t need to create your own image; you can just use the upstream ones with built-in commands. But for the purpose of this article, let&#8217;s create a custom image that will be exactly like upstream. To create such an image, you just have to run a few commands to repack the upstream image into a Unified Tarball.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64-root.tar.xz<br>root@ubuntu:~# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64-lxd.tar.xz<br>root@ubuntu:~# mkdir -p ubuntu1804/rootfs<br>root@ubuntu:~# tar -xf bionic-server-cloudimg-amd64-lxd.tar.xz -C ubuntu1804/<br>root@ubuntu:~# tar -xf bionic-server-cloudimg-amd64-root.tar.xz -C ubuntu1804/rootfs/<br>root@ubuntu:~# cd ubuntu1804/<br>root@ubuntu:~/ubuntu1804# tar -czf ../ubuntu1804.tar.gz *</pre>



<p>You will end up with a ubuntu1804.tar.gz image that can be used with LXD. For the purpose of this article, I&#8217;ve put this image in a directory reachable through HTTP, for example: http://example.net/lxd-images/</p>



<h3 class="wp-block-heading">Manual setup</h3>



<p>First things first, let&#8217;s install LXD.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# apt install lxd</pre>



<p>During package install you will be greeted with the message: <em>&#8220;To go through the initial LXD configuration, run: lxd init&#8221;</em>, but we will just do the steps manually.</p>



<p>Next step is to add the new storage pool.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc storage create default dir source=/var/lib/lxd/storage-pools/default<br>Storage pool default create</pre>



<p>Next, create a custom profile that will have: the environment variable http_proxy set to an empty string, a 2GB memory limit, the root filesystem on the default storage pool, and eth0 attached to bridge br0.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc profile create customprofile<br>Profile customprofile created<br>root@ubuntu:~# lxc profile device add customprofile root disk path=/ pool=default<br>Device root added to customprofile<br>root@ubuntu:~# lxc profile device add customprofile eth0 nic nictype=bridged parent=br0<br>Device eth0 added to customprofile<br>root@ubuntu:~# lxc profile set customprofile limits.memory 2GB</pre>



<p>Let&#8217;s print out the whole profile to check that it&#8217;s OK:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc profile show customprofile
config:
  environment.http_proxy: ""
  limits.memory: 2GB
description: ""
devices:
  eth0:
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: customprofile
used_by: []</pre>



<p>Then let&#8217;s fetch the LXD image in the Unified Tarball format:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# wget -O /tmp/ubuntu1804.tar.gz http://example.net/lxd-images/ubuntu1804.tar.gz</pre>



<p>And import it:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc image import /tmp/ubuntu1804.tar.gz --alias ubuntu1804
Image imported with fingerprint: dc6f4c678e68cfd4d166afbaddf5287b65d2327659a6d51264ee05774c819e70</pre>



<p>Once we have everything in place, let&#8217;s create our first container:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc init ubuntu1804 container01 --profile customprofile
Creating container01</pre>



<p>Now let&#8217;s add some host directories to the container.<br>Please note that you have to set the proper owner of the directory on the host!</p>



<pre class="wp-block-preformatted">root@ubuntu:~# mkdir /srv/log01<br>root@ubuntu:~# lxc config device add container01 log disk source=/srv/log01 path=/var/log/</pre>



<p>And as a final touch, add one of the host&#8217;s partitions to the container:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc config device add container01 bluestore unix-block source=/dev/sdb1 path=/dev/bluestore</pre>



<p>/dev/sdb1 will be available inside the container as /dev/bluestore. We use this mechanism to pass devices for Ceph&#8217;s BlueStore to the containers.</p>



<p>The container is ready to be started.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc start container01</pre>



<p>Voila! The container is up and running. We set up our containers very similarly to the above.</p>



<p>Although it was quite easy to set up the above by hand, for a massive deployment you need to automate things. So now let&#8217;s do the same using our LXD Puppet module.</p>



<h3 class="wp-block-heading">Automatic setup with Puppet</h3>



<p>To make use of the module, download it to your Puppet server and place it in the module path.</p>



<p>Then, create a new class or add it to one of the existing classes; whatever suits you.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="967" height="314" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB.jpeg" alt="Automatic LXD setup with Puppet" class="wp-image-18545" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB.jpeg 967w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB-300x97.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB-768x249.jpeg 768w" sizes="auto, (max-width: 967px) 100vw, 967px" /></figure></div>



<p>I plugged it into my Puppet server. Please note that I am using the bridge device <em>br0</em>, which was prepared earlier by other modules, and that LXD images are hosted on a web server <a href="http://example.net/lxd-images/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">http://example.net/lxd-images/</a> as Unified Tarballs.</p>



<p>Full example module that makes use of LXD Puppet module:</p>



<pre class="wp-block-preformatted">class mymodule {
 
    class {'::lxd': }
 
    lxd::storage { 'default':
        driver =&gt; 'dir',
        config =&gt; {
            'source' =&gt; '/var/lib/lxd/storage-pools/default'
        }
    }
 
    lxd::profile { 'exampleprofile':
        ensure  =&gt; 'present',
        config  =&gt; {
            'environment.http_proxy' =&gt; '',
            'limits.memory' =&gt; '2GB',
        },
        devices =&gt; {
            'root' =&gt; {
                'path' =&gt; '/',
                'pool' =&gt; 'default',
                'type' =&gt; 'disk',
            },
            'eth0' =&gt; {
                'nictype' =&gt; 'bridged',
                'parent'  =&gt; 'br0',
                'type'    =&gt; 'nic',
            }
        }
    }
 
    lxd::image { 'ubuntu1804':
        ensure      =&gt; 'present',
        repo_url    =&gt; 'http://example.net/lxd-images/',
        image_file  =&gt; 'ubuntu1804.tar.gz',
        image_alias =&gt; 'ubuntu1804',
    }
 
    lxd::container { 'container01':
        state   =&gt; 'started',
        config  =&gt; {
            'user.somecustomconfig' =&gt; 'My awesome custom env variable',
        },
        profiles =&gt; ['exampleprofile'],
        image   =&gt; 'ubuntu1804',
        devices =&gt; {
            'log'  =&gt; {
                'path'   =&gt; '/var/log/',
                'source' =&gt; '/srv/log01',
                'type'   =&gt; 'disk',
            },
            'bluestore' =&gt; {
                'path'   =&gt; '/dev/bluestore',
                'source' =&gt; '/dev/sdb1',
                'type'   =&gt; 'unix-block',
            }
        }
    }
}</pre>



<p>Now the only thing left to do is to run the Puppet agent on the machine. It will apply the desired state:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for ubuntu.openstacklocal
Info: Applying configuration version '1588767214'
Notice: /Stage[main]/Lxd::Install/Package[lxd]/ensure: created
Notice: /Stage[main]/Lxd::Config/Lxd_config[global_images.auto_update_interval]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Storage[default]/Lxd_storage[default]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Profile[exampleprofile]/Lxd_profile[exampleprofile]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Image[ubuntu1804]/Exec[lxd image present http://example.net/lxd-images//ubuntu1804.tar.gz]/returns: executed successfully
Notice: /Stage[main]/Mymodule/Lxd::Container[container01]/Lxd_container[container01]/ensure: created
Notice: Applied catalog in 37.56 seconds
</pre>



<p>In the end you will have a new container up and running:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc ls
+-------------+---------+--------------------+------+------------+-----------+
|    NAME     |  STATE  |        IPV4        | IPV6 |    TYPE    | SNAPSHOTS |
+-------------+---------+--------------------+------+------------+-----------+
| container01 | RUNNING | 192.168.0.5 (eth0) |      | PERSISTENT | 0         |
+-------------+---------+--------------------+------+------------+-----------+</pre>



<p>Because you can expose custom environment variables in the container, it opens a lot of possibilities to configure new containers.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="243" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg" alt="Open source LXD Puppet Module, available on GitHub" class="wp-image-18540" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-300x71.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-768x182.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528.jpeg 1375w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>How good is that!?</p>



<p>I encourage everyone to contribute to the <a href="https://github.com/ovh/lxd-puppet-module" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">module</a> or give it a star on GitHub if you find it useful.</p>



<h2 class="wp-block-heading">Plans for the future</h2>



<p>After extensive testing, we were sure that everything worked as intended, and confident that we could go to production with the new solution: Ceph on all-flash storage, without HDDs.</p>



<p>In the future, we plan to migrate all our legacy infrastructure to the new LXD-based solution. It will be a mammoth migration project, with over 50PB sitting on over 2,000 dedicated servers, but that&#8217;s a story for another time.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fjourney-to-next-gen-ceph-storage-at-ovhcloud-with-lxd%2F&amp;action_name=Journey%20to%20next-gen%20Ceph%20storage%20at%20OVHcloud%20with%20LXD&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Doing BIG automation with Celery</title>
		<link>https://blog.ovhcloud.com/doing-big-automation-with-celery/</link>
		
		<dc:creator><![CDATA[Bartosz Rabiega]]></dc:creator>
		<pubDate>Fri, 06 Mar 2020 16:14:18 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[workflows]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17100</guid>

					<description><![CDATA[Intro TL;DR: You might want to skip the intro and jump right into “Celery &#8211; Distributed Task Queue”. Hello! I’m Bartosz Rabiega, and I’m part of the R&#38;D/DevOps teams at OVHcloud. As part of our daily work, we’re developing and maintaining the Ceph-as-a-Service project, in order to provide highly available, solid, distributed storage for various [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Intro</h2>



<p><strong>TL;DR</strong>: You might want to skip the intro and jump right into “Celery &#8211; Distributed Task Queue”.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-1024x537.png" alt="" class="wp-image-17420" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE.png 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Hello! I’m Bartosz Rabiega, and I’m part of the R&amp;D/DevOps teams at OVHcloud. As part of our daily work, we’re developing and maintaining the Ceph-as-a-Service project, in order to provide highly available, solid, distributed storage for various applications. We’re dealing with 60PB+ of data, across 10 regions, so as you might imagine, we’ve got quite a lot of work ahead in terms of replacing broken hardware, handling natural growth, provisioning new regions and datacentres, evaluating new hardware, optimising software and hardware configurations, researching new storage solutions, and much more!</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2.png" alt="" class="wp-image-17382" width="705" height="471" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2.png 940w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2-768x513.png 768w" sizes="auto, (max-width: 705px) 100vw, 705px" /></figure></div>



<p>Because of the wide scope of our work, we need to offload as many repetitive tasks as possible. And we do that through automation.</p>



<h2 class="wp-block-heading">Automating your work</h2>



<p>To some extent, every manual process can be described as a set of actions and conditions. If we somehow managed to force something to automatically perform the actions and check the conditions, we would be able to automate the process, resulting in an automated workflow. Take a look at the example below, which shows some generic steps for manually replacing hardware in our project.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="291" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1024x291.png" alt="" class="wp-image-17389" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1024x291.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-300x85.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-768x218.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1536x436.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295.png 1677w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>Hmm… What could help us do this automatically? Doesn’t a computer sound like a perfect fit? 🙂 There are many ways to force computers to process automated workflows, but first we need to define some building blocks (let’s call them tasks) and get them to run sequentially or in parallel (i.e. a workflow). Fortunately, there are software solutions that can help with that, among which is Celery.</p>



<h2 class="wp-block-heading">Celery &#8211; Distributed Task Queue</h2>



<p>Celery is a well-known and widely adopted piece of software that allows us to process tasks asynchronously. The description of the project on its main page (<a href="http://www.celeryproject.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">http://www.celeryproject.org/</a>) may sound a little bit enigmatic, but we can narrow down its basic functionality to something like this:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-1024x539.png" alt="" class="wp-image-17414" width="768" height="404" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-1024x539.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447.png 1294w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>Such machinery is perfectly suited to tasks like sending emails asynchronously (i.e. &#8216;fire and forget&#8217;), but it can also be used for different purposes. So what other tasks could it handle? Basically, any tasks you can implement in Python (the main Celery language)! I won’t go too much into the details, as they are available in the Celery documentation. What matters is that since we can implement any task we want, we can use that to create the building blocks for our automation.</p>



<p>There is one more important thing&#8230; Celery natively supports combining such tasks into workflows (Celery primitives: chains, groups, chords, etc.). So let&#8217;s go through some examples&#8230;</p>



<p>We’ll use the following task definition &#8211; a single task, printing <em>args</em> and <em>kwargs</em>:</p>



<pre class="wp-block-code"><code class="">@celery_app.task
def noop(*args, **kwargs):
    # Task accepts any arguments and does nothing
    print(args, kwargs)
    return True</code></pre>



<p>Now we can execute the task asynchronously, using the following code:</p>



<pre class="wp-block-code"><code class="">task = noop.s(777)
task.apply_async()</code></pre>



<p>The elementary tasks can be parametrised and combined into a complex workflow using Celery primitives, i.e. “chain”, “group”, and “chord”. See the examples below. In each of them, the left side shows a visual representation of a workflow, while the right side shows the code snippet that generates it. The green box is the starting point, after which the workflow execution progresses vertically.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<h4 class="wp-block-heading">Chain &#8211; a set of tasks processed sequentially</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/705AD975-048B-4E6A-8BFF-F68775C9C5C7.png" alt="" class="wp-image-17394" width="92" height="320"/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = (
    chain([noop.s(i) for i in range(3)])
)</code></pre>
</div>
</div>



<h4 class="wp-block-heading">Group &#8211; a set of tasks processed in parallel</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110.png" alt="" class="wp-image-17396" width="317" height="169" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110.png 633w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110-300x160.png 300w" sizes="auto, (max-width: 317px) 100vw, 317px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = (
    group([noop.s(i) for i in range(5)])
)</code></pre>
</div>
</div>



<h4 class="wp-block-heading">Chord &#8211; a group of tasks chained to the following task</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4.png" alt="" class="wp-image-17397" width="311" height="223" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4.png 621w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4-300x215.png 300w" sizes="auto, (max-width: 311px) 100vw, 311px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = chord(
        [noop.s(i) for i in range(5)],
        noop.s(i)
)

# Equivalent:
workflow = chain([
        group([noop.s(i) for i in range(5)]),
        noop.s(i)
])</code></pre>
</div>
</div>
</div></div>



<p>An important point: the execution of a workflow will always stop in the event of a failed task. As a result, a chain won’t be continued if some task fails in the middle of it. This gives us quite a powerful framework for implementing some neat automation, and that’s exactly what we’re using for Ceph-as-a-Service at OVHcloud! We’ve implemented lots of small, flexible, parameterisable tasks, which we combine together to reach a common goal. Here are some real-life examples of elementary tasks, used for the automatic removal of old hardware:</p>



<ul class="wp-block-list"><li>Change weight of Ceph node (used to increase/decrease the amount of data on node. Triggers data rebalance)</li><li>Set service downtime (data rebalance triggers monitoring probes, but this is expected, so set downtime for this particular monitoring entry)</li><li>Wait until Ceph is healthy (wait until the data rebalance is complete &#8211; repeating task)</li><li>Remove Ceph node from a cluster (node is empty so it can simply be uninstalled)</li><li>Send info to technicians in DC (hardware is ready to be replaced)</li><li>Add new Ceph node to a cluster (install new empty node)</li></ul>



<p>We parametrise these tasks and tie them together, using Celery chains, groups and chords to create the desired workflow. Celery then does the rest by asynchronously executing the workflow.</p>
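<p>To make this concrete, here is a simplified sketch of how such a removal workflow could be wired together. The task bodies and names below are hypothetical stand-ins for our real elementary tasks, and the broker URL is an assumption for the example:</p>



<pre class="wp-block-code"><code class="">from celery import Celery, chain

app = Celery('automation', broker='redis://localhost:6379/0')  # assumed broker

def make_task(name):
    # Build a named no-op stand-in for a real elementary task
    @app.task(name=name)
    def task(*args, **kwargs):
        print(name, args, kwargs)
        return True
    return task

set_downtime = make_task('set_downtime')
change_node_weight = make_task('change_node_weight')
wait_until_healthy = make_task('wait_until_healthy')
remove_node = make_task('remove_node')
notify_technicians = make_task('notify_technicians')
add_node = make_task('add_node')

# Old-node removal expressed as a chain: each task runs only if the
# previous one succeeded, so a failed rebalance stops the whole flow
workflow = chain(
    set_downtime.s('ceph-rebalance-probe'),
    change_node_weight.s('old-node', 0.0),
    wait_until_healthy.s(),
    remove_node.s('old-node'),
    notify_technicians.s('old-node'),
    add_node.s('new-node'),
    wait_until_healthy.s(),
)
workflow.apply_async()</code></pre>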



<h2 class="wp-block-heading">Big workflows and Celery</h2>



<p>As our infrastructure grows, so do our automated workflows, with more tasks per workflow and higher workflow complexity&#8230; What do we mean by a big workflow? A workflow consisting of 1,000-10,000 tasks. Just to visualize it, take a look at the following examples:</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<h4 class="wp-block-heading">A few chords chained together (57 tasks in total)</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh4.googleusercontent.com/XZWOfqmSMu68u7GcbvceB0mc8_HA_v8higDeoG08dlO5oTlRd9R98QBSlf4sMLPuiFB2RPVgM-6i7vG86jtAxMCrKSLTkt0nK4z5JSbYE4QkXF96qkXh3uSJYj1X82UUm-agBMxu" alt=""/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = chain([
    noop.s(0),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    noop.s()
])</code></pre>
</div>
</div>



<h4 class="wp-block-heading">More complex graph structure built from chains and groups (23 tasks in total)</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh5.googleusercontent.com/gUQlIa5Nmb4a5oNDbojhBtukEn--6dSxlKrn-enggXk9eCtuBvgVBTxecwAczOMghEoZ0zOtKuz0nohZTsj01QqVBxkbX8bxqyVVvYjC6B1sfrpXN8pferDSgg-RE6TB6v5SOBdL" alt=""/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class=""># | is ‘chain’ operator in celery
workflow = (
    group(
        group(
            group([noop.s() for i in range(5)]),
            chain([noop.s() for i in range(5)])
        ) |
        noop.s() |
        group([noop.s() for i in range(5)]) |
        noop.s(),
        chain([noop.s() for i in range(5)])
    ) |
    noop.s()
)</code></pre>
</div>
</div>
</div></div>



<p>As you can probably imagine, visualisations get quite big and messy when 1,000 tasks are involved! Celery is a powerful tool, and has lots of features that are well-suited for automation, but it still struggles when it comes to processing big, complex, long-running workflows. Orchestrating the execution of 10,000 tasks, with a variety of dependencies, is no trivial thing. There are several issues we encountered when our automation grew too big:</p>



<ul class="wp-block-list"><li>Memory issues during workflow building (client side)</li><li>Serialisation issues (client -&gt; Celery backend transfer)</li><li>Nondeterministic, broken execution of workflows</li><li>Memory issues in Celery workers (Celery backend)</li><li>Disappearing tasks</li><li>And more&#8230;</li></ul>



<p>Take a look at some GitHub tickets:</p>



<ul class="wp-block-list"><li><a href="https://github.com/celery/celery/issues/5000" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5000</a></li><li><a href="https://github.com/celery/celery/issues/5286" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5286</a></li><li><a href="https://github.com/celery/celery/issues/5327" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5327</a></li><li><a href="https://github.com/celery/celery/issues/3723" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/3723</a></li></ul>



<p>Using Celery for our particular use case became difficult and unreliable. Celery’s native support for workflows doesn’t seem to be the right choice for handling 100/1,000/10,000 tasks. In its current state, it’s just not enough. So here we stand, in front of a solid, concrete wall… Either we somehow fix Celery, or we rewrite our automation using a different framework.</p>



<h2 class="wp-block-heading">Celery &#8211; to fix&#8230; or to fix?</h2>



<p>Rewriting all of our automation would be possible, although relatively painful. Since I’m a rather lazy person, perhaps attempting to fix Celery wasn’t an entirely bad idea? So I took some time to dig through Celery’s code, and managed to find the parts responsible for building workflows and executing chains and chords. It was still a little difficult for me to understand all the different code paths handling the wide range of use cases, but I realised it would be possible to implement a clean, straightforward orchestration that would handle all the tasks and their combinations in the same way. What’s more, I could see that it wouldn&#8217;t take too much effort to integrate it into our automation (let’s not forget the main goal!).</p>



<p>Unfortunately, introducing new orchestration into the Celery project would probably be quite hard, and would most likely break some backwards compatibility. So I decided to take a different approach &#8211; writing an extension or a plugin that wouldn’t require changes in Celery. Something pluggable, and as non-invasive as possible. That’s how Celery Dyrygent emerged&#8230;</p>



<h2 class="wp-block-heading">Celery Dyrygent</h2>



<p><a href="https://github.com/ovh/celery-dyrygent" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/celery-dyrygent</a></p>



<h3 class="wp-block-heading">How to represent a workflow</h3>



<p>You can think of a workflow as a directed acyclic graph (DAG), where each task is a separate graph node. With acyclic graphs, it is relatively easy to store and resolve dependencies between nodes, which leads to straightforward orchestration. Celery Dyrygent was implemented based on these properties. Each task in the workflow has a unique identifier (Celery already assigns task IDs when a task is pushed for execution), and each one is wrapped into a workflow node. Each workflow node consists of a task signature (a plain Celery signature) and a list of IDs for the tasks it depends on. See the example below:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-1024x533.png" alt="" class="wp-image-17400" width="512" height="267" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-1024x533.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-300x156.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-768x400.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918.png 1172w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure>



<h3 class="wp-block-heading">How to process a workflow</h3>



<p>So we know how to store a workflow in a clean and easy way. Now we just need to execute it. How about using&#8230; Celery? Why not? For this, Celery Dyrygent introduces a <strong>workflow processor</strong> task (an ordinary Celery task). This task wraps a whole workflow and schedules the execution of primitive tasks according to their dependencies. Once the scheduling part is over, the task repeats itself (it &#8216;ticks&#8217; with some delay).</p>



<p>Throughout the whole processing cycle, the workflow processor retains the state of the entire workflow internally, updating it with each repetition. You can see an orchestration example below:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-1024x553.png" alt="" class="wp-image-17416" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-1024x553.png" alt="" class="wp-image-17417" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-1024x553.png" alt="" class="wp-image-17418" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Most notably, the workflow processor stops its execution in two cases (sketched in code after this list):</p>



<ul class="wp-block-list"><li>Once the whole workflow finishes, with all tasks successfully completed</li><li>When it can’t proceed any further, due to a failed task</li></ul>



<h3 class="wp-block-heading">How to integrate</h3>



<p>So how do we use this? Fortunately, Celery Dyrygent is quite easy to use. First of all, you need to inject the workflow processor task definition into your Celery application:</p>



<pre class="wp-block-code"><code class="">from celery_dyrygent.tasks import register_workflow_processor
app = Celery() #  your celery application instance
workflow_processor = register_workflow_processor(app)</code></pre>



<p>Next, you need to convert your Celery defined workflow into a Celery Dyrygent workflow:</p>



<pre class="wp-block-code"><code class="">from celery_dyrygent.workflows import Workflow

celery_workflow = chain([
    noop.s(0),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    noop.s()
])

workflow = Workflow()
workflow.add_celery_canvas(celery_workflow)</code></pre>



<p>Finally, simply execute the workflow, just as you would an ordinary Celery task:</p>



<pre class="wp-block-code"><code class="">workflow.apply_async()</code></pre>



<p>That’s it! You can always go back if you wish, as the changes are small and easy to revert.</p>



<h3 class="wp-block-heading">Give it a try!</h3>



<p>Celery Dyrygent is free to use, and its source code is available on GitHub (<a href="https://github.com/ovh/celery-dyrygent" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/celery-dyrygent</a>). Feel free to use it, improve it, request features, and report any bugs! It has a few additional features not described here, so I&#8217;d encourage you to take a look at the project’s readme file. For our automation requirements, it&#8217;s already a solid, battle-tested solution. We’ve been using it since the end of 2018, and it has processed thousands of workflows, consisting of hundreds of thousands of tasks. Here are some production stats, from June 2019 to February 2020:</p>



<ul class="wp-block-list"><li>936,248 elementary tasks executed</li><li>11,170 workflows processed</li><li>4,098 tasks in the biggest workflow so far</li><li>~84 tasks per workflow, on average</li></ul>



<p>Automation is always a good idea!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdoing-big-automation-with-celery%2F&amp;action_name=Doing%20BIG%20automation%20with%20Celery&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
