<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>celery Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/celery/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/celery/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Fri, 06 Mar 2020 23:00:14 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>celery Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/celery/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Doing BIG automation with Celery</title>
		<link>https://blog.ovhcloud.com/doing-big-automation-with-celery/</link>
		
		<dc:creator><![CDATA[Bartosz Rabiega]]></dc:creator>
		<pubDate>Fri, 06 Mar 2020 16:14:18 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[workflows]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17100</guid>

					<description><![CDATA[Intro TL;DR: You might want to skip the intro and jump right into “Celery &#8211; Distributed Task Queue”. Hello! I’m Bartosz Rabiega, and I’m part of the R&#38;D/DevOps teams at OVHcloud. As part of our daily work, we’re developing and maintaining the Ceph-as-a-Service project, in order to provide highly available, solid, distributed storage for various [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Intro</h2>



<p><strong>TL;DR</strong>: You might want to skip the intro and jump right into “Celery &#8211; Distributed Task Queue”.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img fetchpriority="high" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-1024x537.png" alt="" class="wp-image-17420" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE.png 1200w" sizes="(max-width: 512px) 100vw, 512px" /></figure></div>



<p>Hello! I’m Bartosz Rabiega, and I’m part of the R&amp;D/DevOps teams at OVHcloud. As part of our daily work, we’re developing and maintaining the Ceph-as-a-Service project, in order to provide highly available, solid, distributed storage for various applications. We’re dealing with 60PB+ of data, across 10 regions, so as you might imagine, we’ve got quite a lot of work ahead in terms of replacing broken hardware, handling natural growth, provisioning new regions and datacentres, evaluating new hardware, optimising software and hardware configurations, researching new storage solutions, and much more!</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2.png" alt="" class="wp-image-17382" width="705" height="471" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2.png 940w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2-768x513.png 768w" sizes="(max-width: 705px) 100vw, 705px" /></figure></div>



<p>Because of the wide scope of our work, we need to offload as many repetitive tasks as possible. And we do that through automation.</p>



<h2 class="wp-block-heading">Automating your work</h2>



<p>To some extent, every manual process can be described as a set of actions and conditions. If we could get something to perform the actions and check the conditions automatically, we would be able to automate the process, resulting in an automated workflow. Take a look at the example below, which shows some generic steps for manually replacing hardware in our project.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="1024" height="291" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1024x291.png" alt="" class="wp-image-17389" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1024x291.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-300x85.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-768x218.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1536x436.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295.png 1677w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>Hmm… What could help us do this automatically? Doesn’t a computer sound like a perfect fit? 🙂 There are many ways to force computers to process automated workflows, but first we need to define some building blocks (let’s call them tasks) and get them to run sequentially or in parallel (i.e. a workflow). Fortunately, there are software solutions that can help with that, among which is Celery.</p>



<h2 class="wp-block-heading">Celery &#8211; Distributed Task Queue</h2>



<p>Celery is a well-known and widely adopted piece of software that allows us to process tasks asynchronously. The description of the project on its main page (<a href="http://www.celeryproject.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">http://www.celeryproject.org/</a>) may sound a little bit enigmatic, but we can narrow down its basic functionality to something like this:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-1024x539.png" alt="" class="wp-image-17414" width="768" height="404" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-1024x539.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447.png 1294w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>Such machinery is perfectly suited to tasks like sending emails asynchronously (i.e. &#8216;fire and forget&#8217;), but it can also be used for different purposes. So what other tasks could it handle? Basically, any tasks you can implement in Python (the main Celery language)! I won’t go too much into the details, as they are available in the Celery documentation. What matters is that since we can implement any task we want, we can use that to create the building blocks for our automation.</p>



<p>There is one more important thing&#8230; Celery natively supports combining such tasks into workflows (Celery primitives: chains, groups, chords, etc.). So let’s get through some examples&#8230;</p>



<p>We’ll use the following task definition &#8211; a single task that prints its <em>args</em> and <em>kwargs</em>:</p>



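<pre class="wp-block-code"><code class="">from celery import Celery

# Placeholder application name; configure the broker and backend for your setup
celery_app = Celery("automation")

@celery_app.task
def noop(*args, **kwargs):
    # Task accepts any arguments, prints them and returns True
    print(args, kwargs)
    return True</code></pre>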



<p>Now we can execute the task asynchronously, using the following code:</p>



<pre class="wp-block-code"><code class="">task = noop.s(777)
task.apply_async()</code></pre>



<p>The elementary tasks can be parametrised and combined into a complex workflow using Celery primitives such as “chain”, “group” and “chord”. See the examples below. In each of them, the left side shows a visual representation of a workflow, while the right side shows the code snippet that generates it. The green box is the starting point, after which the workflow execution progresses vertically.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<h4 class="wp-block-heading">Chain &#8211; a set of tasks processed sequentially</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/705AD975-048B-4E6A-8BFF-F68775C9C5C7.png" alt="" class="wp-image-17394" width="92" height="320"/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = (
    chain([noop.s(i) for i in range(3)])
)</code></pre>
</div>
</div>



<h4 class="wp-block-heading">Group &#8211; a set of tasks processed in parallel</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110.png" alt="" class="wp-image-17396" width="317" height="169" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110.png 633w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110-300x160.png 300w" sizes="auto, (max-width: 317px) 100vw, 317px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = (
    group([noop.s(i) for i in range(5)])
)</code></pre>
</div>
</div>



<h4 class="wp-block-heading">Chord &#8211; a group of tasks chained to the following task</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4.png" alt="" class="wp-image-17397" width="311" height="223" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4.png 621w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4-300x215.png 300w" sizes="auto, (max-width: 311px) 100vw, 311px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = chord(
        [noop.s(i) for i in range(5)],
        noop.s()
)

# Equivalent:
workflow = chain([
        group([noop.s(i) for i in range(5)]),
        noop.s()
])</code></pre>
</div>
</div>
</div></div>



<p>An important point: the execution of a workflow will always stop in the event of a failed task. As a result, a chain won’t be continued if some task fails in the middle of it. This gives us quite a powerful framework for implementing some neat automation, and that’s exactly what we’re using for Ceph-as-a-Service at OVHcloud! We’ve implemented lots of small, flexible, parameterisable tasks, which we combine together to reach a common goal. Here are some real-life examples of elementary tasks, used for the automatic removal of old hardware:</p>



<ul class="wp-block-list"><li>Change weight of Ceph node (used to increase/decrease the amount of data on node. Triggers data rebalance)</li><li>Set service downtime (data rebalance triggers monitoring probes, but this is expected, so set downtime for this particular monitoring entry)</li><li>Wait until Ceph is healthy (wait until the data rebalance is complete &#8211; repeating task)</li><li>Remove Ceph node from a cluster (node is empty so it can simply be uninstalled)</li><li>Send info to technicians in DC (hardware is ready to be replaced)</li><li>Add new Ceph node to a cluster (install new empty node)</li></ul>



<p>We parametrise these tasks and tie them together, using Celery chains, groups and chords to create the desired workflow. Celery then does the rest by asynchronously executing the workflow.</p>
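


<p>To make the “stop on failure” behaviour concrete, here is a minimal plain-Python sketch. It does not use Celery at all: <code>run_chain</code> and the step names are hypothetical stand-ins that loosely follow the list above, and the health check is simply made to fail so that the halt is visible.</p>



<pre class="wp-block-code"><code class="">def run_chain(steps):
    """Run steps one after another and stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # A failed step halts the chain; later steps never start.
            return completed, (name, exc)
        completed.append(name)
    return completed, None


def ok():
    return True


def unhealthy():
    raise RuntimeError("Ceph is not healthy")


# Hypothetical step names, for illustration only.
steps = [
    ("change_weight", ok),
    ("set_downtime", ok),
    ("wait_until_healthy", unhealthy),
    ("remove_node", ok),
    ("notify_technicians", ok),
]

completed, failure = run_chain(steps)
print(completed)   # ['change_weight', 'set_downtime']
print(failure[0])  # wait_until_healthy</code></pre>



<p>The node is never removed and the technicians are never notified, which is exactly what we want when the cluster is not healthy.</p>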



<h2 class="wp-block-heading">Big workflows and Celery</h2>



<p>As our infrastructure grows, so do our automated workflows, with more tasks per workflow and more complex workflows&#8230; What do we understand as a big workflow? A workflow consisting of 1,000-10,000 tasks. Just to visualise it, take a look at the following examples:</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<h4 class="wp-block-heading">A few chords chained together (57 tasks in total)</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh4.googleusercontent.com/XZWOfqmSMu68u7GcbvceB0mc8_HA_v8higDeoG08dlO5oTlRd9R98QBSlf4sMLPuiFB2RPVgM-6i7vG86jtAxMCrKSLTkt0nK4z5JSbYE4QkXF96qkXh3uSJYj1X82UUm-agBMxu" alt=""/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = chain([
    noop.s(0),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    noop.s()
])</code></pre>
</div>
</div>



<h4 class="wp-block-heading">More complex graph structure built from chains and groups (23 tasks in total)</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh5.googleusercontent.com/gUQlIa5Nmb4a5oNDbojhBtukEn--6dSxlKrn-enggXk9eCtuBvgVBTxecwAczOMghEoZ0zOtKuz0nohZTsj01QqVBxkbX8bxqyVVvYjC6B1sfrpXN8pferDSgg-RE6TB6v5SOBdL" alt=""/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class=""># | is ‘chain’ operator in celery
workflow = (
    group(
        group(
            group([noop.s() for i in range(5)]),
            chain([noop.s() for i in range(5)])
        ) |
        noop.s() |
        group([noop.s() for i in range(5)]) |
        noop.s(),
        chain([noop.s() for i in range(5)])
    ) |
    noop.s()
)</code></pre>
</div>
</div>
</div></div>
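


<p>The task totals in the two headings can be checked with a small plain-Python model. The <code>toy_*</code> helpers below are assumptions made for this sketch, not Celery’s primitives; they only mirror the nesting of the two workflows so that the leaves can be counted.</p>



<pre class="wp-block-code"><code class="">def toy_task():
    return ("task", [])


def toy_chain(*parts):
    return ("chain", list(parts))


def toy_group(*parts):
    return ("group", list(parts))


def toy_chord(header, callback):
    # A chord is a group (the header) followed by one callback task.
    return toy_chain(toy_group(*header), callback)


def count_tasks(node):
    kind, children = node
    if kind == "task":
        return 1
    return sum(count_tasks(child) for child in children)


# First example: five chords of ten tasks each, between two single tasks.
chords = toy_chain(
    toy_task(),
    *[toy_chord([toy_task() for _ in range(10)], toy_task()) for _ in range(5)],
    toy_task(),
)

# Second example: the nested structure built from chains and groups.
graph = toy_chain(
    toy_group(
        toy_chain(
            toy_group(
                toy_group(*[toy_task() for _ in range(5)]),
                toy_chain(*[toy_task() for _ in range(5)]),
            ),
            toy_task(),
            toy_group(*[toy_task() for _ in range(5)]),
            toy_task(),
        ),
        toy_chain(*[toy_task() for _ in range(5)]),
    ),
    toy_task(),
)

print(count_tasks(chords))  # 57
print(count_tasks(graph))   # 23</code></pre>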



<p>As you can probably imagine, visualisations get quite big and messy when 1,000 tasks are involved! Celery is a powerful tool, and has lots of features that are well-suited for automation, but it still struggles when it comes to processing big, complex, long-running workflows. Orchestrating the execution of 10,000 tasks, with a variety of dependencies, is no trivial thing. There are several issues we encountered when our automation grew too big:</p>



<ul class="wp-block-list"><li>Memory issues during workflow building (client side)</li><li>Serialisation issues (client -&gt; Celery backend transfer)</li><li>Nondeterministic, broken execution of workflows</li><li>Memory issues in Celery workers (Celery backend)</li><li>Disappearing tasks</li><li>And more&#8230;</li></ul>



<p>Take a look at some GitHub tickets:</p>



<ul class="wp-block-list"><li><a href="https://github.com/celery/celery/issues/5000" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5000</a></li><li><a href="https://github.com/celery/celery/issues/5286" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5286</a></li><li><a href="https://github.com/celery/celery/issues/5327" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5327</a></li><li><a href="https://github.com/celery/celery/issues/3723" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/3723</a></li></ul>



<p>Using Celery for our particular use case became difficult and unreliable. Celery’s native support for workflows doesn’t seem to be the right choice for handling 100/1,000/10,000 tasks. In its current state, it’s just not enough. So here we stand, in front of a solid, concrete wall… Either we somehow fix Celery, or we rewrite our automation using a different framework.</p>



<h2 class="wp-block-heading">Celery &#8211; to fix&#8230; or to fix?</h2>



<p>Rewriting all of our automation would be possible, although relatively painful. Since I’m a rather lazy person, perhaps attempting to fix Celery wasn’t an entirely bad idea? So I took some time to dig through Celery’s code, and managed to find the parts responsible for building workflows, and executing chains and chords. It was still a little bit difficult for me to understand all the different code paths handling the wide range of use cases, but I realised it would be possible to implement a clean, straightforward orchestration that would handle all the tasks and their combinations in the same way. What’s more, I had a hunch that it wouldn&#8217;t take too much effort to integrate it into our automation (let’s not forget the main goal!).</p>



<p>Unfortunately, introducing new orchestration into the Celery project would probably be quite hard, and would most likely break some backwards compatibility. So I decided to take a different approach &#8211; writing an extension or a plugin that wouldn’t require changes in Celery. Something pluggable, and as non-invasive as possible. That’s how Celery Dyrygent emerged&#8230;</p>



<h2 class="wp-block-heading">Celery Dyrygent</h2>



<p><a href="https://github.com/ovh/celery-dyrygent" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/celery-dyrygent</a></p>



<h3 class="wp-block-heading">How to represent a workflow</h3>



<p>You can think of a workflow as a directed acyclic graph (DAG), where each task is a separate graph node. When it comes to acyclic graphs, it is relatively easy to store and resolve dependencies between nodes, which leads to straightforward orchestration. Celery Dyrygent was implemented based on these features. Each task in the workflow has a unique identifier (Celery already assigns task IDs when a task is pushed for execution) and each one of them is wrapped into a workflow node. Each workflow node consists of a task signature (a plain Celery signature) and a list of IDs for the tasks it depends on. See the example below:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-1024x533.png" alt="" class="wp-image-17400" width="512" height="267" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-1024x533.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-300x156.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-768x400.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918.png 1172w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure>
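


<p>The same idea can be sketched in a few lines of plain Python. This is only an illustration of the node structure described above, not the actual Celery Dyrygent classes: <code>WorkflowNode</code>, <code>chord_to_nodes</code> and the string “signatures” are assumptions made for the example.</p>



<pre class="wp-block-code"><code class="">from dataclasses import dataclass, field


@dataclass
class WorkflowNode:
    task_id: str
    signature: str  # stand-in for a real Celery signature
    dependencies: list = field(default_factory=list)


def chord_to_nodes(header, callback):
    """Header tasks have no dependencies; the callback depends on all of them."""
    nodes = {}
    for index, signature in enumerate(header):
        task_id = f"header-{index}"
        nodes[task_id] = WorkflowNode(task_id, signature)
    header_ids = list(nodes)
    nodes["callback"] = WorkflowNode("callback", callback, header_ids)
    return nodes


nodes = chord_to_nodes(["noop(0)", "noop(1)", "noop(2)"], "noop()")
for node in nodes.values():
    print(node.task_id, node.dependencies)</code></pre>



<p>Because every node only stores the IDs it depends on, the whole workflow stays a flat dictionary, however deeply the original chains and groups were nested.</p>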



<h3 class="wp-block-heading">How to process a workflow</h3>



<p>So we know how to store a workflow in a clean and easy way. Now we just need to execute it. How about using&#8230; Celery? Why not? For this, Celery Dyrygent introduces a <strong>workflow processor</strong> task (an ordinary Celery task). This task wraps a whole workflow and schedules an execution of primitive tasks, according to their dependencies. Once the scheduling part is over, the task repeats itself (it &#8216;ticks&#8217; with some delay). </p>



<p>Throughout the whole processing cycle, the workflow processor keeps the state of the entire workflow internally, and updates it with each repetition. You can see an orchestration example below:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-1024x553.png" alt="" class="wp-image-17416" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-1024x553.png" alt="" class="wp-image-17417" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-1024x553.png" alt="" class="wp-image-17418" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Most notably, the workflow processor stops its execution in two cases:</p>



<ul class="wp-block-list"><li>Once the whole workflow finishes, with all tasks successfully completed</li><li>When it can’t proceed any further, due to a failed task</li></ul>
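


<p>The “tick” loop and its two stop conditions can be sketched as follows. This is a simplified, synchronous model written for this post, assuming an acyclic dependency map; the real workflow processor schedules Celery tasks asynchronously and checks their results on later ticks. The names <code>tick</code>, <code>process</code> and <code>run_task</code> are hypothetical.</p>



<pre class="wp-block-code"><code class="">def tick(dependencies, state, run_task):
    """One pass: run every pending task whose dependencies all succeeded."""
    ready = [
        task_id
        for task_id, deps in dependencies.items()
        if state[task_id] == "PENDING"
        and all(state[dep] == "SUCCESS" for dep in deps)
    ]
    for task_id in ready:
        state[task_id] = "SUCCESS" if run_task(task_id) else "FAILURE"
    if "FAILURE" in state.values():
        return "FAILURE"  # cannot proceed any further
    if all(status == "SUCCESS" for status in state.values()):
        return "SUCCESS"  # whole workflow finished
    if not ready:
        raise ValueError("no runnable task: is there a dependency cycle?")
    return "RUNNING"


def process(dependencies, run_task):
    state = {task_id: "PENDING" for task_id in dependencies}
    ticks = 0
    outcome = "RUNNING"
    while outcome == "RUNNING":
        outcome = tick(dependencies, state, run_task)
        ticks += 1
    return outcome, state, ticks


# "a" runs first, then "b" and "c" in parallel, then "d".
dependencies = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

outcome, state, ticks = process(dependencies, lambda task_id: True)
print(outcome, ticks)  # SUCCESS 3

failed, failed_state, _ = process(dependencies, lambda task_id: task_id != "b")
print(failed, failed_state["d"])  # FAILURE PENDING</code></pre>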



<h3 class="wp-block-heading">How to integrate</h3>



<p>So how do we use this? Fortunately, I was able to find a way to use Celery Dyrygent quite easily. First of all, you need to inject the workflow processor task definition into your Celery application:</p>



<pre class="wp-block-code"><code class="">from celery import Celery
from celery_dyrygent.tasks import register_workflow_processor

app = Celery()  # your Celery application instance
workflow_processor = register_workflow_processor(app)</code></pre>



<p>Next, you need to convert your Celery-defined workflow into a Celery Dyrygent workflow:</p>



<pre class="wp-block-code"><code class="">from celery import chain, chord
from celery_dyrygent.workflows import Workflow

# noop is the task defined earlier in this post
celery_workflow = chain([
    noop.s(0),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    noop.s()
])

workflow = Workflow()
workflow.add_celery_canvas(celery_workflow)</code></pre>



<p>Finally, simply execute the workflow, just as you would an ordinary Celery task:</p>



<pre class="wp-block-code"><code class="">workflow.apply_async()</code></pre>



<p>That’s it! You can always go back if you wish, as the small changes are very easy to undo.</p>



<h3 class="wp-block-heading">Give it a try!</h3>



<p>Celery Dyrygent is free to use, and its source code is available on GitHub (<a href="https://github.com/ovh/celery-dyrygent" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/celery-dyrygent</a>). Feel free to use it, improve it, request features, and report any bugs! It has a few additional features not described here, so I&#8217;d encourage you to take a look at the project’s readme file. For our automation requirements, it&#8217;s already a solid, battle-tested solution. We’ve been using it since the end of 2018, and it has processed thousands of workflows, consisting of hundreds of thousands of tasks. Here are some production stats, from June 2019 to February 2020:</p>



<ul class="wp-block-list"><li>936,248 elementary tasks executed</li><li>11,170 workflows processed</li><li>4,098 tasks in the biggest workflow so far</li><li>~84 tasks per workflow, on average</li></ul>



<p>Automation is always a good idea!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Introducing Director – a tool to build your Celery workflows</title>
		<link>https://blog.ovhcloud.com/introducing-director-a-tool-to-build-your-celery-workflows/</link>
		
		<dc:creator><![CDATA[Nicolas Crocfer]]></dc:creator>
		<pubDate>Wed, 26 Feb 2020 12:38:57 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17064</guid>

					<description><![CDATA[As developers, we often need to execute tasks in the background. Fortunately, some tools already exist for this. In the Python ecosystem, for instance, the most well-known library is Celery. If you have already used it, you know how great it is! But you will also have probably discovered how complicated it can be to [&#8230;]]]></description>
										<content:encoded><![CDATA[
<p>As developers, we often need to execute tasks in the background. Fortunately, some tools already exist for this. In the Python ecosystem, for instance, the most well-known library is <a href="http://www.celeryproject.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Celery</a>. If you have already used it, you know how great it is! But you will also have probably discovered how complicated it can be to follow the state of a complex workflow.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="537" src="https://www.ovh.com/blog/wp-content/uploads/2020/02/7E201458-960D-44E8-8DF8-816CE1DE766E-1024x537.jpeg" alt="" class="wp-image-17224" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/02/7E201458-960D-44E8-8DF8-816CE1DE766E-1024x537.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/7E201458-960D-44E8-8DF8-816CE1DE766E-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/7E201458-960D-44E8-8DF8-816CE1DE766E-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/7E201458-960D-44E8-8DF8-816CE1DE766E.jpeg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p><strong>Celery Director</strong> is a tool we created at OVHcloud to fix this problem. The code is now open-sourced and is available on <a href="https://github.com/ovh/celery-director" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub</a>.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="525" src="https://www.ovh.com/blog/wp-content/uploads/2020/02/director-1024x525.png" alt="" class="wp-image-17098" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/02/director-1024x525.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/director-300x154.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/director-768x394.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/director.png 1440w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Following the talk we did during <a href="https://fosdem.org/2020/schedule/event/python2020_celery/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">FOSDEM 2020</a>, this post aims to present the tool. We&#8217;ll take a close look at what Celery is, why we created Director, and how to use it.</p>



<h2 class="wp-block-heading">What is Celery?</h2>



<p>Here is the official description of Celery:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>Celery is an asynchronous <strong>task queue</strong>/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.</p></blockquote>



<p>The important words here are &#8220;task queue&#8221;. This is a mechanism used to distribute work across a pool of machines or threads.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="572" src="https://www.ovh.com/blog/wp-content/uploads/2020/02/51EA37AB-E3E5-453F-9EFD-92414C84523F-1024x572.jpeg" alt="" class="wp-image-17220" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/02/51EA37AB-E3E5-453F-9EFD-92414C84523F-1024x572.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/51EA37AB-E3E5-453F-9EFD-92414C84523F-300x168.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/51EA37AB-E3E5-453F-9EFD-92414C84523F-768x429.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/51EA37AB-E3E5-453F-9EFD-92414C84523F.jpeg 1156w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The queue, in the middle of the above diagram, stores messages sent by the producers (APIs, for instance). On the other side, consumers constantly read the queue to fetch new messages and execute the corresponding tasks.</p>



<p>In Celery, a message sent by the producer is the signature of a Python function: <code>send_email("john.doe")</code>, for example.</p>



<p>The queue (named <em>broker</em> in Celery) stores this signature until a worker reads it and <strong>really</strong> executes the function with the given parameter.</p>
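
<p>To make the idea concrete, here is a minimal in-process sketch of a task queue. It is not how Celery is implemented: the <code>TASKS</code> registry, the <code>broker</code> queue and the body of <code>send_email</code> are assumptions made for illustration, with a standard-library queue standing in for a real broker.</p>

```python
import queue
import threading


def send_email(username):
    # Placeholder body: a real task would talk to a mail server.
    return f"email sent to {username}"


# Hypothetical task registry: maps a task name to the function to run.
TASKS = {"send_email": send_email}

broker = queue.Queue()  # stands in for the broker (Redis, for instance)
results = []            # filled by the worker, for demonstration only


def producer():
    # The producer does not call send_email(); it only queues its signature.
    broker.put(("send_email", ("john.doe",)))
    broker.put(None)  # sentinel telling the worker to stop


def worker():
    # The worker reads each message and really executes the matching function.
    while True:
        message = broker.get()
        if message is None:
            break
        name, args = message
        results.append(TASKS[name](*args))


producer()
thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results)
```

<p>The producer returns as soon as the message is queued, which is the whole point: the slow work happens in the worker.</p>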



<p>But why execute a Python function <em>somewhere else</em>? The main reason is to quickly return a response in cases of long-running functions. Indeed, it&#8217;s not an option to keep users waiting for a response for several seconds or minutes. </p>



<p>We can also imagine a producer that lacks the resources to run a CPU-bound task itself. In that case, a more powerful worker can handle its execution instead.</p>



<h2 class="wp-block-heading">How to use Celery</h2>



<p>So Celery is a library used to execute Python code <em>somewhere else</em>, but how does it do that? In fact, it&#8217;s really simple! To illustrate this, we&#8217;ll use some of the available methods to send tasks to the broker, then we&#8217;ll start a worker to consume them.</p>



<p>Here is the code to create a Celery task:</p>



<pre class="wp-block-code"><code class=""># tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://127.0.0.1:6379/0")

@app.task
def add(x, y):
    return x + y</code></pre>



<p>As you can see, a Celery task is just a Python function decorated so that it can be sent to a broker. Note that we passed the Redis connection URL to the Celery application (named <code>app</code>) to tell it which broker will store the messages.</p>



<p>This means it&#8217;s now possible to send a task to the broker:</p>



<pre class="wp-block-code"><code class="">>>> from tasks import add
>>> add.delay(2, 3)</code></pre>



<p>That&#8217;s all! We used the <code>.delay()</code> method, so our producer didn&#8217;t execute the Python code but instead sent the task signature to the broker.</p>



<p>Now it&#8217;s time to consume it with a Celery worker:</p>



<pre class="wp-block-code"><code class="">$ celery -A tasks worker --loglevel=INFO
[...]
[2020-02-14 17:13:38,947: INFO/MainProcess] Received task: tasks.add[0e9b6ff2-7aec-46c3-b810-b62a32188000]
[2020-02-14 17:13:38,954: INFO/ForkPoolWorker-2] Task tasks.add[0e9b6ff2-7aec-46c3-b810-b62a32188000] succeeded in 0.0024250600254163146s: 5</code></pre>



<p>It&#8217;s even possible to combine the Celery tasks with some primitives (the full list is <a href="https://docs.celeryproject.org/en/stable/userguide/canvas.html#the-primitives" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>):</p>



<ul class="wp-block-list"><li>Chain: will execute tasks one after the other.</li><li>Group: will execute tasks in parallel by routing them to multiple workers.</li></ul>



<p>For example, the following code will make two additions in parallel, then sum the results:</p>



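<pre class="wp-block-code"><code class="">from celery import chain, group

# Assumes sum_numbers is a task defined next to add that sums a list of numbers
from tasks import add, sum_numbers

# Create the canvas (a group followed by a task needs a result backend)
canvas = chain(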
    group(
        add.si(1, 2),
        add.si(3, 4)
    ),
    sum_numbers.s()
)

# Execute it
canvas.delay()</code></pre>



<p>You probably noticed that we didn&#8217;t call the <code>.delay()</code> method on each task here. Instead we created a <strong>canvas</strong>, used to combine a selection of tasks, and called <code>.delay()</code> on the canvas itself.</p>



<p>The <code>.si()</code> method is used to create an immutable signature (i.e. one that does not receive data from a previous task), while <code>.s()</code> relies on the data returned by the two previous tasks.</p>
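
<p>The difference between the two kinds of signature can be sketched in plain Python. The <code>Signature</code> class and the <code>s</code>, <code>si</code> and <code>run_group</code> helpers below are a toy model written for this post, not Celery&#8217;s real classes, and <code>sum_numbers</code> is assumed to sum a list of numbers. The toy group also runs its members one after the other, whereas Celery would route them to workers in parallel.</p>

```python
_NOTHING = object()  # marks "no previous result"


class Signature:
    """Toy stand-in for a task signature (not Celery's real class)."""

    def __init__(self, func, args, immutable):
        self.func = func
        self.args = args
        self.immutable = immutable

    def run(self, previous=_NOTHING):
        # An immutable signature ignores the previous result; a regular one
        # receives it as its first argument.
        if self.immutable or previous is _NOTHING:
            return self.func(*self.args)
        return self.func(previous, *self.args)


def s(func, *args):
    return Signature(func, args, immutable=False)


def si(func, *args):
    return Signature(func, args, immutable=True)


def add(x, y):
    return x + y


def sum_numbers(numbers):
    return sum(numbers)


def run_group(signatures, previous=_NOTHING):
    # Sequential here; the results are collected into a list.
    return [sig.run(previous) for sig in signatures]


group_results = run_group([si(add, 1, 2), si(add, 3, 4)])
total = s(sum_numbers).run(group_results)
print(group_results, total)
```

<p>In this model, the two immutable additions run with exactly the arguments they were created with, and the final signature receives the list of their results as its only argument.</p>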



<p>This introduction to Celery has just covered its very basic usage. If you&#8217;re keen to find out more, I invite you to read the documentation, where you&#8217;ll discover all the powerful features, including <strong>rate limits</strong>, <strong>task retries</strong>, and <strong>periodic tasks</strong>.</p>



<h2 class="wp-block-heading">As a developer, I want&#8230;</h2>



<p>I&#8217;m part of a team whose goal is to deploy and monitor internal infrastructures. As part of this, we needed to launch some background tasks, and as Python developers our natural choice was to use Celery. But, out of the box, Celery didn&#8217;t support certain specific requirements for our projects:</p>



<ul class="wp-block-list"><li>Tracking the tasks&#8217; progress and their dependencies in a WebUI.</li><li>Executing the workflows using API calls, or simply with a CLI.</li><li>Combining tasks to create workflows in YAML format.</li><li>Periodically executing a whole workflow.</li></ul>



<p>Some other cool tools exist for this, like Flower, but it only allows us to track each task individually, not a whole workflow and its component tasks.</p>



<p>And as we really needed these features, we decided to create <a href="https://github.com/ovh/celery-director" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Celery Director</a>.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="377" height="377" src="https://www.ovh.com/blog/wp-content/uploads/2020/02/2E75457D-256F-4CB9-942B-B1B8C00CF79B.png" alt="" class="wp-image-17222" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/02/2E75457D-256F-4CB9-942B-B1B8C00CF79B.png 377w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/2E75457D-256F-4CB9-942B-B1B8C00CF79B-300x300.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/2E75457D-256F-4CB9-942B-B1B8C00CF79B-150x150.png 150w" sizes="auto, (max-width: 377px) 100vw, 377px" /></figure></div>



<h2 class="wp-block-heading">How to use Director</h2>



<p>The installation can be done using the <code>pip</code> command:</p>



<pre class="wp-block-code"><code class="">$ pip install celery-director</code></pre>



<p>Director provides a simple command to create a new workspace folder:</p>



<pre class="wp-block-code"><code class="">$ director init workflows
[*] Project created in /home/ncrocfer/workflows
[*] Do not forget to initialize the database
You can now export the DIRECTOR_HOME environment variable</code></pre>



<p>A new tasks folder and a workflow example have been created for you:</p>



<pre class="wp-block-code"><code class="">$ tree -a workflows/
├── .env
├── tasks
│   └── etl.py
└── workflows.yml</code></pre>



<p>The <code>tasks/*.py</code> files will contain your Celery tasks, while the <code>workflows.yml</code> file will combine them:</p>



<pre class="wp-block-code"><code class="">$ cat workflows.yml
---
ovh.SIMPLE_ETL:
  tasks:
    - EXTRACT
    - TRANSFORM
    - LOAD</code></pre>



<p>This example, named <strong>ovh.SIMPLE_ETL</strong>, will execute three tasks, one after the other. You can find more examples in the <a href="https://ovh.github.io/celery-director/guides/build-workflows/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>
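
<p>To picture how such a definition maps to execution, here is a rough sketch in plain Python. It is not Director&#8217;s implementation: the <code>WORKFLOWS</code> dictionary mimics what the YAML file would look like once parsed, while the <code>task</code> decorator, the <code>REGISTRY</code>, the task bodies and the way each result is handed to the next task are all assumptions made for illustration.</p>

```python
# What workflows.yml could look like once parsed (written by hand here).
WORKFLOWS = {"ovh.SIMPLE_ETL": {"tasks": ["EXTRACT", "TRANSFORM", "LOAD"]}}

REGISTRY = {}


def task(name):
    """Hypothetical decorator registering a function under a task name."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@task(name="EXTRACT")
def extract(data):
    # Placeholder: pretend we fetched three rows from somewhere.
    return [1, 2, 3]


@task(name="TRANSFORM")
def transform(data):
    return [n * 10 for n in data]


@task(name="LOAD")
def load(data):
    return f"loaded {len(data)} rows"


def run_workflow(name):
    # Run the tasks one after the other, passing each result to the next.
    result = None
    executed = []
    for task_name in WORKFLOWS[name]["tasks"]:
        result = REGISTRY[task_name](result)
        executed.append(task_name)
    return executed, result


executed, result = run_workflow("ovh.SIMPLE_ETL")
print(executed, result)
```

<p>The YAML file only lists task names; the Python functions registered under those names do the actual work.</p>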



<p>After exporting the <code>DIRECTOR_HOME</code> variable and initialising the database with <code>director db upgrade</code>, you can execute this workflow:</p>



<pre class="wp-block-code"><code class="">$ director workflow list
+----------------+----------+-----------+
| Workflows (1)  | Periodic | Tasks     |
+----------------+----------+-----------+
| ovh.SIMPLE_ETL |    --    | EXTRACT   |
|                |          | TRANSFORM |
|                |          | LOAD      |
+----------------+----------+-----------+
$ director workflow run ovh.SIMPLE_ETL</code></pre>



<p>The broker has received the tasks, so now you can launch the Celery worker to execute them:</p>



<pre class="wp-block-code"><code class="">$ director celery worker --loglevel=INFO</code></pre>



<p>And then display the results using the webserver command (<code>director webserver</code>):</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="530" src="https://www.ovh.com/blog/wp-content/uploads/2020/02/director_etl-1024x530.png" alt="" class="wp-image-17094" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/02/director_etl-1024x530.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/director_etl-300x155.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/director_etl-768x397.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/02/director_etl.png 1440w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This is just the beginning, as Director provides other features, allowing you to parametrise a workflow or periodically execute it, for example. You will find more details on these features in the <a href="https://ovh.github.io/celery-director/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">documentation</a>.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Our teams use Director regularly to launch our workflows. No more boilerplate, and no more need for advanced Celery knowledge&#8230; A new colleague can easily create their tasks in Python and combine them in YAML, without using the Celery primitives discussed earlier.</p>



<p>Sometimes we need to execute a workflow periodically (to populate a cache, for instance), and sometimes we need to manually call it from another web service (note that a workflow can also be executed through an <a href="https://ovh.github.io/celery-director/guides/run-workflows/#using-the-api" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">API call</a>). This is now possible using our single Director instance.</p>
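
<p>As a rough sketch, another service could build such a call with the Python standard library. The <code>/api/workflows</code> path, the <code>project</code>, <code>name</code> and <code>payload</code> fields and the local address are assumptions based on our reading of the API guide linked above, so please check them against the documentation of your Director version. <code>build_run_request</code> is a hypothetical helper, not part of Director.</p>

```python
import json
import urllib.request


def build_run_request(base_url, workflow, payload=None):
    """Build (but do not send) a POST request asking to run a workflow."""
    # "ovh.SIMPLE_ETL" is split into a project ("ovh") and a name ("SIMPLE_ETL").
    project, name = workflow.split(".", 1)
    body = {"project": project, "name": name, "payload": payload or {}}
    return urllib.request.Request(
        url=f"{base_url}/api/workflows",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The address is an assumption: use the one your Director webserver listens on.
request = build_run_request("http://127.0.0.1:8000", "ovh.SIMPLE_ETL")
print(request.get_method(), request.full_url)

# To actually send it (requires a running Director webserver):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

<p>Keeping the request construction separate from the network call makes it easy to inspect what will be sent before pointing it at a real instance.</p>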



<p>We invite you to try Director for yourself, and give us your feedback via GitHub, so we can continue to enhance it. The source code can be found on <a href="https://github.com/ovh/celery-director" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">GitHub</a>, and the FOSDEM 2020 presentation is available <a href="https://fosdem.org/2020/schedule/event/python2020_celery/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">here</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
