<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ceph Archives - OVHcloud Blog</title>
	<atom:link href="https://blog.ovhcloud.com/tag/ceph/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.ovhcloud.com/tag/ceph/</link>
	<description>Innovation for Freedom</description>
	<lastBuildDate>Mon, 10 Jun 2024 14:26:03 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://blog.ovhcloud.com/wp-content/uploads/2019/07/cropped-cropped-nouveau-logo-ovh-rebranding-32x32.gif</url>
	<title>ceph Archives - OVHcloud Blog</title>
	<link>https://blog.ovhcloud.com/tag/ceph/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Proxmox VE Ceph cluster and DRP using OVHcloud dedicated servers</title>
		<link>https://blog.ovhcloud.com/ovh-proxmox-drp-servers/</link>
		
	<dc:creator><![CDATA[Carles Munoz and Cristina Ortiz]]></dc:creator>
		<pubDate>Mon, 10 Jun 2024 13:59:17 +0000</pubDate>
				<category><![CDATA[OVHcloud Partner Program]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[Cluster]]></category>
		<category><![CDATA[DRP]]></category>
		<category><![CDATA[Proxmox]]></category>
		<category><![CDATA[VE]]></category>
		<guid isPermaLink="false">https://blog.ovhcloud.com/?p=26885</guid>

					<description><![CDATA[OVHcloud’s IaaS (Infrastructure as a Service) services allow us to rent dedicated servers (bare metal) that are at our disposal in a matter of minutes. There is a wide range of options (Rise, Advance, Storage, Scale, High Grade, etc.) from which we can choose according to our needs. In this article [&#8230;]]]></description>
										<content:encoded><![CDATA[



<p><a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud’s</a> <strong>IaaS</strong> (Infraestructure as a Service) services allow us to hire <strong>dedicated servers</strong> (bare metal) for rent that we can have at our disposal in a matter of minutes. There is a wide range of options (rise, advance, storage, scale, high quality, etc.) from which we can choose according to our needs.</p>



<p>In this article we will see how we can use these dedicated servers to create <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Clusters</a> with the same functionality we would have if we used our own servers on premises. In addition, we will see how <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> services allow us to create a very complete <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> (Disaster Recovery Plan) to achieve maximum resilience for our data.</p>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="850" height="476" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers.png" alt="" class="wp-image-26894" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers-300x168.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DedicatedServers-768x430.png 768w" sizes="(max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://www.ovhcloud.com/en-gb/bare-metal/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>OVHCloud &#8211; Dedicated Servers</p>



<p><strong>Proxmox VE Ceph Cluster</strong></p>



<p>A cluster of Proxmox VE servers combined with a Ceph distributed storage system allows you to create a highly available, load-balanced, horizontally scalable, hyperconverged virtualization infrastructure with ease.</p>



<p>Let’s first look at some concepts to fully understand what a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> is.</p>



<p><strong>What is a cluster?</strong></p>



<p>A cluster in computing refers to a group of interconnected computers or nodes that function together as if they were a single entity. Clusters are used to improve the availability, performance, and scalability of applications and services. There are different types of clusters, but in general they share the objective of providing greater processing capacity and redundancy.</p>



<p><strong>What is Ceph?</strong></p>



<p>Ceph is a distributed storage system designed to provide object, block, and file storage in a single unified cluster. Proxmox can use Ceph as virtual machine storage.</p>



<p><strong>What is a Proxmox VE Ceph cluster?</strong></p>



<p>It is three or more servers forming part of a Proxmox cluster and using Ceph as a distributed storage system, all managed from the Proxmox web interface, thanks to which we achieve a hyperconverged virtualization infrastructure.</p>
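<p>As a rough illustration of what this looks like in practice, here is a minimal sketch of how such a cluster is typically bootstrapped with the standard Proxmox command-line tools (<em>pvecm</em> and <em>pveceph</em>), driven from Python. The cluster name, Ceph network, and disk device below are assumptions for the example, not values from this article, and each command must be run on the appropriate node:</p>



<pre class="wp-block-code"><code class="">import subprocess

def run(cmd):
    # Run a Proxmox CLI command and raise if it fails
    subprocess.run(cmd, check=True)

# On the first node: create the Proxmox cluster (name is an assumption)
run(['pvecm', 'create', 'demo-cluster'])
# On each additional node, join it instead:
# run(['pvecm', 'add', '10.0.0.1'])  # IP of the first node (assumed)

# On every node: install Ceph and initialise it on the private network
run(['pveceph', 'install'])
run(['pveceph', 'init', '--network', '10.0.0.0/24'])
run(['pveceph', 'mon', 'create'])

# One OSD per data disk, then a replicated pool for VM disks
run(['pveceph', 'osd', 'create', '/dev/nvme1n1'])
run(['pveceph', 'pool', 'create', 'vm-pool'])</code></pre>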



<p><strong>What is hyperconvergence?</strong></p>



<p>A hyperconverged virtualization infrastructure is an integrated system that combines compute, storage, and networking in a single environment. This simplifies management, improves efficiency, and enables easy scalability, making it easy to create and manage virtual machines in a single cluster.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="850" height="444" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph.png" alt="" class="wp-image-26895" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/ProxmoxVECeph-768x401.png 768w" sizes="(max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>Proxmox VE Ceph Cluster</p>



<p>By means of the OVHcloud dedicated server service we can create a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph cluster</a> with three or more nodes, just as we would if we acquired our own servers and built the cluster in our own facilities, but with the versatility and advantages that come with using rented servers instead of owned hardware, among which we can mention:</p>



<ul class="wp-block-list">
<li><strong>Abstraction</strong> of the hardware layer since any breakdown will be solved by <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> and, if necessary, they will replace the damaged parts or even the entire server.</li>



<li>We <strong>forget</strong> about <strong>hardware obsolescence</strong>. After a few years we will be able to add new servers with the latest technologies to our cluster and eliminate the old ones in a completely transparent way for the user of the virtualization environment, without any interruption of service.</li>



<li>Easily add <strong>more nodes</strong> to our virtualization cluster to increase its computing power. In a matter of minutes we can have new servers ready to add to our cluster.</li>
</ul>



<p>The large number of dedicated server options that <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> makes available to its clients allows us to create <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Clusters</a> for practically any client. We can use different CPUs depending on our needs, large amounts of RAM, large NVMe disks, dedicated 25Gbps networks for Ceph communication, the vRack service to connect our servers, dedicated IP ranges, etc. All this allows us to cover the needs of practically any client.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="850" height="450" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers.png" alt="" class="wp-image-26896" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers-300x159.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/A2ScaleServers-768x407.png 768w" sizes="(max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://www.ovhcloud.com/en-gb/bare-metal/scale/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>OVHCloud &#8211; A2-Scale Servers</p>



<p><strong>DRP (Disaster Recovery Plan)</strong></p>



<p>A <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> (Disaster Recovery Plan) is crucial to maintaining business operations in the event of disasters, guaranteeing the continuity and protection of essential data. It is very important to have a good <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> to ensure the resilience of the data.</p>



<p><strong>What is a Disaster Recovery Plan?</strong></p>



<p><a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Disaster Recovery Plan</a> in IT refers to a set of strategies, policies and procedures that an organization implements to restore its critical systems and data after a catastrophic event or disaster that causes significant disruptions to operations. normal.que cause interrupciones significativas en las operaciones normales.</p>



<p>These events may include:</p>



<ul class="wp-block-list">
<li>Natural disasters: Such as earthquakes, floods, storms, etc.</li>



<li>Man-made disasters: Such as cyber attacks, infrastructure failures, acts of vandalism, etc.</li>
</ul>



<p>The goal of the <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Disaster Recovery Plan</a> is to minimize downtime and ensure business continuity, allowing the organization to recover quickly after a disaster.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="850" height="534" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP.png" alt="" class="wp-image-26897" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP-300x188.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/DRP-768x482.png 768w" sizes="auto, (max-width: 850px) 100vw, 850px" /></figure>



<h3 class="wp-block-heading"><a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a><strong>DRP (Disaster Recovery Plan)</strong></h3>



<p>Below we will list several options, taking into account that our <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> may include some of these, or even a combination of all of them, depending on the level of data resilience desired.</p>



<p><strong>Option 1: Proxmox VE Ceph Cluster distributed in an OVHcloud 3-AZ region</strong> Our partner <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> has a service called <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a>, thanks to which we can distribute a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> made up of dedicated servers (bare metal) across three different data centers separated by a few tens of kilometers. The data centers that make up the <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a> are interconnected through redundant fibers with minimal latency, which lets us set up a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> distributed within the region. This gives our data great resilience against incidents localized in one of the data centers, since our virtualization service will not be affected by them.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="850" height="246" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ.png" alt="" class="wp-image-26898" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ-300x87.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/3AZ-768x222.png 768w" sizes="auto, (max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>OVHcloud &#8211; 3-AZ Region</p>



<p><strong>Option 2: Proxmox Backup Server with frequent replication</strong> Use a <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox Backup Server</a> (PBS), or our <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS Online service</a>, in a data center (even in a different country or continent) that holds a backup copy of all virtual machines, maintaining a history of versions over time, depending on the space available for copies. For the most critical virtual machines you can even make more frequent copies (for example, every hour), so that if the <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> has to be activated, the data loss is as small as possible. This option can be implemented using the <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS Online</a> service, the <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> IaaS service to rent a dedicated server on which to install the <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS</a>, any other cloud service, or even your own facilities in which to host the <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS</a>.</p>
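<p>As a rough illustration of the hourly-backup idea, a critical virtual machine could be backed up with a scheduled <em>vzdump</em> job; the Python sketch below simply shells out to it. The VM ID (101) and storage name (pbs-remote) are assumptions for the example, not values from this article:</p>



<pre class="wp-block-code"><code class="">import subprocess

# Hypothetical example: snapshot-mode backup of VM 101 to a
# PBS-backed storage named 'pbs-remote'; run this hourly from a
# scheduler (cron, or a Proxmox VE backup job) for critical VMs
subprocess.run(
    ['vzdump', '101', '--mode', 'snapshot', '--storage', 'pbs-remote'],
    check=True,
)</code></pre>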



<p><strong>Option 3: Ceph replication</strong> Create a <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> identical to the production one in a data center geographically separated from the main cluster, and activate <strong>Ceph replication</strong> between both clusters. This is the option with the least data loss compared to option 2, but it is much more expensive, since we have to maintain a cluster equal to the main one that we will only activate in case of disaster. This option can be implemented using the <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> IaaS service, given that they have data centers distributed across several countries and continents; it is therefore viable to host the main cluster in one data center and the replica cluster in another country. It can also be implemented using different cloud providers, or on the customer’s premises if they have geographically separated data centers.</p>
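<p>For reference, block-level replication between two Ceph clusters is typically driven by RBD mirroring. Below is a minimal sketch, assuming a pool named <em>vm-pool</em>, cluster configurations named <em>primary</em> and <em>backup</em>, and the <em>rbd-mirror</em> daemon running on the standby site; all of these names are assumptions for the example, not part of this article:</p>



<pre class="wp-block-code"><code class="">import subprocess

def rbd(args, cluster):
    # Run an rbd command against the named cluster configuration
    # (expects the matching /etc/ceph/ config file to exist)
    subprocess.run(['rbd', '--cluster', cluster] + args, check=True)

# Enable pool-level mirroring on both clusters
rbd(['mirror', 'pool', 'enable', 'vm-pool', 'pool'], cluster='primary')
rbd(['mirror', 'pool', 'enable', 'vm-pool', 'pool'], cluster='backup')

# Register the primary cluster as a peer of the backup cluster; the
# rbd-mirror daemon on the backup site then replays changes continuously
rbd(['mirror', 'pool', 'peer', 'add', 'vm-pool', 'client.admin@primary'],
    cluster='backup')</code></pre>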



<p><strong>Conclusion</strong></p>



<p>As we have seen throughout this article, we can create a hyperconverged virtualization infrastructure with great data resilience using the <a href="https://soltecsis.com/en/proxmox-ve/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE</a> hypervisor and dedicated <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a> servers. A <a href="https://soltecsis.com/en/proxmox-ve-ceph-cluster/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Proxmox VE Ceph Cluster</a> with three or more nodes located in a <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a> of <a href="https://www.ovhcloud.com/en-gb/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">OVHcloud</a>, in combination with a <a href="https://soltecsis.com/en/proxmox-backup-server/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">PBS</a> (Proxmox Backup Server) hosted in a different data center and country, is a highly available, highly scalable solution with great data resilience. If we also add an identical cluster in another data center outside the <a href="https://www.ovhcloud.com/en-gb/bare-metal/uc-3-az-resilience/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">3-AZ region</a> with real-time Ceph replication, we can have a <a href="https://soltecsis.com/en/disaster-recovery-plan-drp/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">DRP</a> (Disaster Recovery Plan) that allows rapid disaster recovery.</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="850" height="596" src="https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2.png" alt="" class="wp-image-26899" srcset="https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2.png 850w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2-300x210.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2024/06/Proxmox2-768x539.png 768w" sizes="auto, (max-width: 850px) 100vw, 850px" /></figure>



<p><a href="https://soltecsis.com/en/proxmox-ve/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer"></a>&nbsp;</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fovh-proxmox-drp-servers%2F&amp;action_name=Proxmox%20VE%20Ceph%20cluster%20and%20DRP%20using%20OVHcloud%20dedicated%20servers&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Journey to next-gen Ceph storage at OVHcloud with LXD</title>
		<link>https://blog.ovhcloud.com/journey-to-next-gen-ceph-storage-at-ovhcloud-with-lxd/</link>
		
		<dc:creator><![CDATA[Filip Dorosz]]></dc:creator>
		<pubDate>Mon, 15 Jun 2020 14:35:12 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[Infrastructure]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Storage]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=18385</guid>

					<description><![CDATA[Introduction My name is Filip Dorosz. I&#8217;ve been working at OVHcloud since 2017 as a DevOps Engineer. Today I want to tell you a story of how we deployed next-gen Ceph at OVHcloud. But first, a few words about Ceph: Ceph is a software defined storage solution that powers OVHcloud’s additional Public Cloud volumes as [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading" id="Journeytonext-genCephstorageatOVHcloudwithLXD-1.Introduction">Introduction</h2>



<p>My name is Filip Dorosz. I&#8217;ve been working at OVHcloud since 2017 as a DevOps Engineer. Today I want to tell you a story of how we deployed next-gen <a href="https://ceph.io/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">Ceph </a>at OVHcloud. But first, a few words about Ceph: Ceph is a software defined storage solution that powers OVHcloud’s additional Public Cloud volumes as well as our product Cloud Disk Array. But I won’t bore you with the marketing stuff &#8211; let the story begin!</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="537" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-1024x537.jpeg" alt="Journey to next-gen Ceph storage at OVHcloud with LXD" class="wp-image-18543" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-1024x537.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51-768x403.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/46F6FD2B-15FE-4A84-9AF5-C1CDEB4DBC51.jpeg 1200w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<h2 class="wp-block-heading">This looks like an interesting task&#8230;</h2>



<p>One and a half years ago we started a very familiar sprint. Aside from the usual stuff that we have to deal with, there was one task that looked a little more interesting. The title read: “<em>Evaluate whether we can run newer versions of Ceph on our current software</em>”. We needed newer versions of Ceph and BlueStore to create a next-gen Ceph solution with all-flash storage.</p>



<p>Our software solution (which we call the legacy solution) is based on Docker. It sounds really cool, but we run Docker a bit differently from its intended purpose. Our containers are <em>very</em> <em>stateful</em>. We run a full-blown init system inside the container as the Docker entry point. That init system then starts all the software we need inside the container, including Puppet, which we use to manage the <em>“things”</em>. It sounds like we&#8217;re using Docker containers similarly to LXC containers, doesn’t it?&#8230;</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="536" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-1024x536.jpeg" alt="Our legacy Ceph infrastructure (allegory)" class="wp-image-18535" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-1024x536.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-300x157.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576-768x402.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/9CFADB21-44BC-4526-B42C-E7757BA28576.jpeg 1199w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption>Our legacy Ceph infrastructure (allegory)</figcaption></figure></div>



<p>It quickly turned out that it was not possible to run newer Ceph releases in our in-house solution, because newer versions of Ceph make use of systemd, and in our current solution we don’t run systemd at all &#8211; not inside the containers, and not on the hosts that host them.</p>



<p>The hunt for solutions began. One possibility was to package Ceph ourselves and get rid of systemd, but that&#8217;s a lot of work with little added value. The Ceph community provides tested packages which ought to be taken advantage of, so that option was off the table.</p>



<p>The second option was to run Ceph with supervisord inside the Docker container. While it sounds like a plan, even the supervisord docs clearly state that supervisord <em>“is not meant to be run as a substitute for init as ‘process id 1’”</em>. So that was clearly not an option either.</p>



<h2 class="wp-block-heading">We needed systemd!</h2>



<p>At this point, it was clear that we needed a solution that enabled us to run systemd inside the container as well as on the host. It sounded like the perfect time to switch to a brand new solution &#8211; one that was designed to run a full OS inside the container. As our Docker setup used the LXC backend, it was a natural choice to evaluate LXC. It had all the features we needed, but with LXC we would have to code all the container-related automation ourselves. Could all this additional work be avoided? It turns out it could&#8230;</p>



<p>As I had used LXD in a previous project, I knew it was capable of managing images, networks, block devices, and all the nice features that are needed to set up a fully functional Ceph cluster.</p>



<p>So I reinstalled my developer servers with an <a href="https://ubuntu.com/server" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">Ubuntu Server LTS</a> release and installed <a href="https://linuxcontainers.org/lxd/introduction/" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">LXD</a> on them.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="156" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-1024x156.jpeg" alt="Ubuntu &amp; LXD" class="wp-image-18538" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-1024x156.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-300x46.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922-768x117.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/C495086D-4364-407D-9B61-799616864922.jpeg 1297w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>LXD has everything that was needed to create fully functional Ceph clusters:</p>



<ul class="wp-block-list"><li>it supports &#8216;fat&#8217; stateful containers,</li><li>it supports systemd inside the container,</li><li>it supports container images so we can prepare customized images and use them without hassle,</li><li>passing whole block devices to containers,</li><li>passing ordinary directories to containers,</li><li>support for easy container start, stop, restart,</li><li>REST API that will be covered in later parts of the article,</li><li>support for multiple network interfaces within containers using macvlan.</li></ul>



<p>After just a few hours of manual work, I had a Ceph cluster running the Mimic release inside LXD containers. I typed <em>ceph health</em> and got ‘HEALTH_OK’. Nice! It worked great.</p>



<h2 class="wp-block-heading">How do we industrialize that?</h2>



<p>To industrialize it and plug it into our Control Plane, we needed a Puppet module for LXD, so Puppet could manage all the container-related elements on the host. No existing module provided the functionality we needed, so we had to code it ourselves.</p>



<p>The LXD daemon exposes a handy <a href="https://github.com/lxc/lxd/blob/master/doc/rest-api.md" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">REST API</a> that we utilized to create our Puppet module. You can talk to the API locally over a unix socket, or through the network if you configure LXD to expose it. For usage within the module it was really convenient to use the&nbsp;<em>lxc query</em> command, which works by sending raw queries to LXD over the unix socket. The module is now <a rel="noreferrer noopener nofollow external" href="https://github.com/ovh/lxd-puppet-module" target="_blank" data-wpel-link="external">open source</a> on GitHub, so you can download it and play with it. It enables you to configure basic LXD settings, as well as create containers, profiles, storage pools, etc.</p>
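<p>To give an idea of how easy the API is to consume, here is a minimal Python sketch (standard library only) that queries the local LXD daemon over its unix socket &#8211; roughly the equivalent of <em>lxc query /1.0/containers</em>. The socket path assumes a default LXD installation:</p>



<pre class="wp-block-code"><code class="">import http.client
import json
import socket

class LXDConnection(http.client.HTTPConnection):
    # HTTP connection that talks to LXD over its local unix socket
    def __init__(self, path='/var/lib/lxd/unix.socket'):
        super().__init__('localhost')
        self.unix_path = path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.unix_path)

conn = LXDConnection()
conn.request('GET', '/1.0/containers')
response = json.loads(conn.getresponse().read())
print(response['metadata'])  # container URLs, e.g. /1.0/containers/container01</code></pre>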



<p>The module allows you to create resources, as well as manage their state. Just change your manifests, run the Puppet agent, and it will do the rest.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="243" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg" alt="Open source LXD Puppet Module, available on GitHub" class="wp-image-18540" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-300x71.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-768x182.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528.jpeg 1375w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>As of writing, the LXD Puppet module provides the following defines:</p>



<ul class="wp-block-list"><li>lxd::profile</li><li>lxd::image</li><li>lxd::storage</li><li>lxd::container</li></ul>



<p>For full reference please check out its <a href="https://github.com/ovh/lxd-puppet-module" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">GitHub page</a>.</p>



<h3 class="wp-block-heading" id="Journeytonext-genCephstorageatOVHcloudwithLXD-ManualsetupVSAutomaticsetupwithPuppet">Manual setup VS Automatic setup with Puppet</h3>



<p>I will show you a simple example of how to create the exact same setup, first manually, and then again automatically with Puppet. For the purpose of this article I created a new Public Cloud instance with Ubuntu 18.04, one additional disk, and an already configured bridge device br0. Let&#8217;s assume there is also a DHCP server listening on the br0 interface.</p>



<p>It&#8217;s worth noting that you generally don&#8217;t need to create your own image; you can just use the upstream ones with built-in commands. But for the purpose of this article, let&#8217;s create a custom image that will be exactly like upstream. To create such an image, you just have to run a few commands to repack the upstream image into a Unified Tarball.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64-root.tar.xz<br>root@ubuntu:~# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64-lxd.tar.xz<br>root@ubuntu:~# mkdir -p ubuntu1804/rootfs<br>root@ubuntu:~# tar -xf bionic-server-cloudimg-amd64-lxd.tar.xz -C ubuntu1804/<br>root@ubuntu:~# tar -xf bionic-server-cloudimg-amd64-root.tar.xz -C ubuntu1804/rootfs/<br>root@ubuntu:~# cd ubuntu1804/<br>root@ubuntu:~/ubuntu1804# tar -czf ../ubuntu1804.tar.gz *</pre>



<p>You will end up with a ubuntu1804.tar.gz image that can be used with LXD. For the purpose of this article, I&#8217;ve put this image in a directory reachable through HTTP, for example: http://example.net/lxd-images/</p>



<h3 class="wp-block-heading">Manual setup</h3>



<p>First things first, let&#8217;s install LXD.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# apt install lxd</pre>



<p>During package install you will be greeted with the message: <em>&#8220;To go through the initial LXD configuration, run: lxd init&#8221;</em>, but we will just do the steps manually.</p>



<p>Next step is to add the new storage pool.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc storage create default dir source=/var/lib/lxd/storage-pools/default<br>Storage pool default create</pre>



<p>Next, create a custom profile that will have: the environment variable http_proxy set to an empty string, a 2GB memory limit, the root filesystem on the default storage pool, and eth0 attached to bridge br0.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc profile create customprofile<br>Profile customprofile created<br>root@ubuntu:~# lxc profile device add customprofile root disk path=/ pool=default<br>Device root added to customprofile<br>root@ubuntu:~# lxc profile device add customprofile eth0 nic nictype=bridged parent=br0<br>Device eth0 added to customprofile<br>root@ubuntu:~# lxc profile set customprofile limits.memory 2GB</pre>



<p>Let&#8217;s print out the whole profile to check that it&#8217;s OK:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc profile show customprofile
config:
  environment.http_proxy: ""
  limits.memory: 2GB
description: ""
devices:
  eth0:
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: customprofile
used_by: []</pre>



<p>Then let&#8217;s fetch the LXD image in the Unified Tarball format:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# wget -O /tmp/ubuntu1804.tar.gz http://example.net/lxd-images/ubuntu1804.tar.gz</pre>



<p>And import it:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc image import /tmp/ubuntu1804.tar.gz --alias ubuntu1804
Image imported with fingerprint: dc6f4c678e68cfd4d166afbaddf5287b65d2327659a6d51264ee05774c819e70</pre>



<p>Once we have everything in place, let&#8217;s create our first container:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc init ubuntu1804 container01 --profile customprofile
Creating container01</pre>



<p>Now let&#8217;s add some host directories to the container.<br>Please note that you have to set the proper owner of the directory on the host!</p>



<pre class="wp-block-preformatted">root@ubuntu:~# mkdir /srv/log01<br>root@ubuntu:~# lxc config device add container01 log disk source=/srv/log01 path=/var/log/</pre>



<p>And as a final touch, add one of the host&#8217;s partitions to the container:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc config device add container01 bluestore unix-block source=/dev/sdb1 path=/dev/bluestore</pre>



<p>/dev/sdb1 will be available inside the container as /dev/bluestore. We use this mechanism to pass devices for Ceph&#8217;s BlueStore to the containers.</p>



<p>The container is ready to be started.</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc start container01</pre>



<p>Voila! The container is up and running. We set up our containers very similarly to the above.</p>



<p>Although it was quite easy to set up the above by hand, for a massive deployment you need to automate things. So now let&#8217;s do the same using our LXD Puppet module.</p>



<h3 class="wp-block-heading">Automatic setup with Puppet</h3>



<p>To make use of the module, download it to your Puppet server and place it in the module path.</p>



<p>Then, create a new class or add it to one of the existing classes; whatever suits you.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="967" height="314" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB.jpeg" alt="Automatic LXD setup with Puppet" class="wp-image-18545" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB.jpeg 967w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB-300x97.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/0306553F-5F05-425C-924E-4035BDE8BAEB-768x249.jpeg 768w" sizes="auto, (max-width: 967px) 100vw, 967px" /></figure></div>



<p>I plugged it into my Puppet server. Please note that I am using the bridge device <em>br0</em>, which was prepared earlier by other modules, and that LXD images are hosted on a web server <a href="http://example.net/lxd-images/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">http://example.net/lxd-images/</a> as Unified Tarballs.</p>



<p>Full example module that makes use of LXD Puppet module:</p>



<pre class="wp-block-preformatted">class mymodule {
 
    class {'::lxd': }
 
    lxd::storage { 'default':
        driver =&gt; 'dir',
        config =&gt; {
            'source' =&gt; '/var/lib/lxd/storage-pools/default'
        }
    }
 
    lxd::profile { 'exampleprofile':
        ensure  =&gt; 'present',
        config  =&gt; {
            'environment.http_proxy' =&gt; '',
            'limits.memory' =&gt; '2GB',
        },
        devices =&gt; {
            'root' =&gt; {
                'path' =&gt; '/',
                'pool' =&gt; 'default',
                'type' =&gt; 'disk',
            },
            'eth0' =&gt; {
                'nictype' =&gt; 'bridged',
                'parent'  =&gt; 'br0',
                'type'    =&gt; 'nic',
            }
        }
    }
 
    lxd::image { 'ubuntu1804':
        ensure      =&gt; 'present',
        repo_url    =&gt; 'http://example.net/lxd-images/',
        image_file  =&gt; 'ubuntu1804.tar.gz',
        image_alias =&gt; 'ubuntu1804',
    }
 
    lxd::container { 'container01':
        state   =&gt; 'started',
        config  =&gt; {
            'user.somecustomconfig' =&gt; 'My awesome custom env variable',
        },
        profiles =&gt; ['exampleprofile'],
        image   =&gt; 'ubuntu1804',
        devices =&gt; {
            'log'  =&gt; {
                'path'   =&gt; '/var/log/',
                'source' =&gt; '/srv/log01',
                'type'   =&gt; 'disk',
            },
            'bluestore' =&gt; {
                'path'   =&gt; '/dev/bluestore',
                'source' =&gt; '/dev/sdb1',
                'type'   =&gt; 'unix-block',
            }
        }
    }
}</pre>



<p>Now the only thing left to do is to run the Puppet agent on the machine. It will apply the desired state:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for ubuntu.openstacklocal
Info: Applying configuration version '1588767214'
Notice: /Stage[main]/Lxd::Install/Package[lxd]/ensure: created
Notice: /Stage[main]/Lxd::Config/Lxd_config[global_images.auto_update_interval]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Storage[default]/Lxd_storage[default]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Profile[exampleprofile]/Lxd_profile[exampleprofile]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Image[ubuntu1804]/Exec[lxd image present http://example.net/lxd-images//ubuntu1804.tar.gz]/returns: executed successfully
Notice: /Stage[main]/Mymodule/Lxd::Container[container01]/Lxd_container[container01]/ensure: created
Notice: Applied catalog in 37.56 seconds
</pre>



<p>In the end you will have a new container up and running:</p>



<pre class="wp-block-preformatted">root@ubuntu:~# lxc ls
+-------------+---------+--------------------+------+------------+-----------+
|    NAME     |  STATE  |        IPV4        | IPV6 |    TYPE    | SNAPSHOTS |
+-------------+---------+--------------------+------+------------+-----------+
| container01 | RUNNING | 192.168.0.5 (eth0) |      | PERSISTENT | 0         |
+-------------+---------+--------------------+------+------------+-----------+</pre>



<p>Because you can expose custom environment variables in the container, it opens a lot of possibilities to configure new containers.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="243" src="https://www.ovh.com/blog/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg" alt="Open source LXD Puppet Module, available on GitHub" class="wp-image-18540" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-1024x243.jpeg 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-300x71.jpeg 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528-768x182.jpeg 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/06/DA9D31B2-A35E-41A8-8DCF-67056D76F528.jpeg 1375w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>How good is that!?</p>



<p>I encourage everyone to contribute to the <a href="https://github.com/ovh/lxd-puppet-module" target="_blank" rel="noreferrer noopener nofollow external" data-wpel-link="external">module</a> or give it a star on GitHub if you find it useful.</p>



<h2 class="wp-block-heading">Plans for the future</h2>



<p>After extensive testing, we were sure that everything worked as intended, and confident that we could go to production with the new solution: Ceph on all-flash storage, without HDDs.</p>



<p>In the future, we plan to migrate all our legacy infrastructure to the new LXD-based solution. It will be a mammoth migration project, with over 50PB sitting on over 2,000 dedicated servers, but that&#8217;s a story for another time.</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fjourney-to-next-gen-ceph-storage-at-ovhcloud-with-lxd%2F&amp;action_name=Journey%20to%20next-gen%20Ceph%20storage%20at%20OVHcloud%20with%20LXD&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Doing BIG automation with Celery</title>
		<link>https://blog.ovhcloud.com/doing-big-automation-with-celery/</link>
		
		<dc:creator><![CDATA[Bartosz Rabiega]]></dc:creator>
		<pubDate>Fri, 06 Mar 2020 16:14:18 +0000</pubDate>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[celery]]></category>
		<category><![CDATA[ceph]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[workflows]]></category>
		<guid isPermaLink="false">https://www.ovh.com/blog/?p=17100</guid>

					<description><![CDATA[Intro TL;DR: You might want to skip the intro and jump right into “Celery &#8211; Distributed Task Queue”. Hello! I’m Bartosz Rabiega, and I’m part of the R&#38;D/DevOps teams at OVHcloud. As part of our daily work, we’re developing and maintaining the Ceph-as-a-Service project, in order to provide highly available, solid, distributed storage for various [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Intro</h2>



<p><strong>TL;DR</strong>: You might want to skip the intro and jump right into “Celery &#8211; Distributed Task Queue”.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-1024x537.png" alt="" class="wp-image-17420" width="512" height="269" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-1024x537.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-300x157.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE-768x403.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/2A010EF2-2666-42D4-91C1-F1FAE33148FE.png 1200w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Hello! I’m Bartosz Rabiega, and I’m part of the R&amp;D/DevOps teams at OVHcloud. As part of our daily work, we’re developing and maintaining the Ceph-as-a-Service project, in order to provide highly available, solid, distributed storage for various applications. We’re dealing with 60PB+ of data, across 10 regions, so as you might imagine, we’ve got quite a lot of work ahead in terms of replacing broken hardware, handling natural growth, provisioning new regions and datacentres, evaluating new hardware, optimising software and hardware configurations, researching new storage solutions, and much more!</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2.png" alt="" class="wp-image-17382" width="705" height="471" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2.png 940w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2-300x200.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B87CD670-7779-4325-92D9-F30A1C8C71A2-768x513.png 768w" sizes="auto, (max-width: 705px) 100vw, 705px" /></figure></div>



<p>Because of the wide scope of our work, we need to offload as many repetitive tasks as possible. And we do that through automation.</p>



<h2 class="wp-block-heading">Automating your work</h2>



<p>To some extent, every manual process can be described as a set of actions and conditions. If we somehow managed to force something to automatically perform the actions and check the conditions, we would be able to automate the process, resulting in an automated workflow. Take a look at the example below, which shows some generic steps for manually replacing hardware in our project.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="291" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1024x291.png" alt="" class="wp-image-17389" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1024x291.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-300x85.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-768x218.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295-1536x436.png 1536w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/E9662233-9498-4F2F-9A7E-B640F85EE295.png 1677w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>Hmm… What could help us do this automatically? Doesn’t a computer sound like a perfect fit? 🙂 There are many ways to force computers to process automated workflows, but first we need to define some building blocks (let’s call them tasks) and get them to run sequentially or in parallel (i.e. a workflow). Fortunately, there are software solutions that can help with that, among which is Celery.</p>



<h2 class="wp-block-heading">Celery &#8211; Distributed Task Queue</h2>



<p>Celery is a well-known and widely adopted piece of software that allows us to process tasks asynchronously. The description of the project on its main page (<a href="http://www.celeryproject.org/" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">http://www.celeryproject.org/</a>) may sound a little bit enigmatic, but we can narrow down its basic functionality to something like this:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-1024x539.png" alt="" class="wp-image-17414" width="768" height="404" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-1024x539.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-300x158.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447-768x404.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4749B2AA-AA5B-4BEF-BA3A-FC0B67FCD447.png 1294w" sizes="auto, (max-width: 768px) 100vw, 768px" /></figure></div>



<p>Such machinery is perfectly suited to tasks like sending emails asynchronously (i.e. &#8216;fire and forget&#8217;), but it can also be used for different purposes. So what other tasks could it handle? Basically, any tasks you can implement in Python (the main Celery language)! I won’t go too much into the details, as they are available in the Celery documentation. What matters is that since we can implement any task we want, we can use that to create the building blocks for our automation.</p>



<p>There is one more important thing&#8230; Celery natively supports combining such tasks into workflows (Celery primitives: chains, groups, chords, etc.). So let&#8217;s go through some examples&#8230;</p>



<p>We’ll use the following task definition &#8211; a single task, printing <em>args</em> and <em>kwargs</em>:</p>



<pre class="wp-block-code"><code class="">@celery_app.task
def noop(*args, **kwargs):
    # Task accepts any arguments and does nothing
    print(args, kwargs)
    return True</code></pre>



<p>Now we can execute the task asynchronously, using the following code:</p>



<pre class="wp-block-code"><code class="">task = noop.s(777)
task.apply_async()</code></pre>



<p>The elementary tasks can be parametrised and combined into a complex workflow using Celery primitives, i.e. “chain”, “group”, and “chord”. See the examples below. In each of them, the left side shows a visual representation of a workflow, while the right side shows the code snippet that generates it. The green box is the starting point, after which the workflow execution progresses vertically.</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<h4 class="wp-block-heading">Chain &#8211; a set of tasks processed sequentially</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/705AD975-048B-4E6A-8BFF-F68775C9C5C7.png" alt="" class="wp-image-17394" width="92" height="320"/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = (
    chain([noop.s(i) for i in range(3)])
)</code></pre>
</div>
</div>



<h4 class="wp-block-heading">Group &#8211; a set of tasks processed in parallel</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110.png" alt="" class="wp-image-17396" width="317" height="169" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110.png 633w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/B112B87E-2813-46DD-9105-4B528BB3C110-300x160.png 300w" sizes="auto, (max-width: 317px) 100vw, 317px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = (
    group([noop.s(i) for i in range(5)])
)</code></pre>
</div>
</div>



<h4 class="wp-block-heading">Chord &#8211; a group of tasks chained to the following task</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4.png" alt="" class="wp-image-17397" width="311" height="223" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4.png 621w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/4E75C373-2CE1-4A68-8599-245E768167A4-300x215.png 300w" sizes="auto, (max-width: 311px) 100vw, 311px" /></figure>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = chord(
        [noop.s(i) for i in range(5)],
        noop.s(i)
)

# Equivalent:
workflow = chain([
        group([noop.s(i) for i in range(5)]),
        noop.s(i)
])</code></pre>
</div>
</div>
</div></div>



<p>An important point: the execution of a workflow will always stop in the event of a failed task. As a result, a chain won’t be continued if some task fails in the middle of it. This gives us quite a powerful framework for implementing some neat automation, and that’s exactly what we’re using for Ceph-as-a-Service at OVHcloud! We’ve implemented lots of small, flexible, parameterisable tasks, which we combine together to reach a common goal. Here are some real-life examples of elementary tasks, used for the automatic removal of old hardware:</p>



<ul class="wp-block-list"><li>Change weight of Ceph node (used to increase/decrease the amount of data on node. Triggers data rebalance)</li><li>Set service downtime (data rebalance triggers monitoring probes, but this is expected, so set downtime for this particular monitoring entry)</li><li>Wait until Ceph is healthy (wait until the data rebalance is complete &#8211; repeating task)</li><li>Remove Ceph node from a cluster (node is empty so it can simply be uninstalled)</li><li>Send info to technicians in DC (hardware is ready to be replaced)</li><li>Add new Ceph node to a cluster (install new empty node)</li></ul>



<p>We parametrise these tasks and tie them together, using Celery chains, groups and chords to create the desired workflow. Celery then does the rest by asynchronously executing the workflow.</p>
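<p>To make this concrete, here is a simplified sketch of how such a removal workflow could be wired together. The task bodies and names below are hypothetical stand-ins for our real elementary tasks, and the broker URL is an assumption for the example:</p>



<pre class="wp-block-code"><code class="">from celery import Celery, chain

app = Celery('automation', broker='redis://localhost:6379/0')  # assumed broker

def make_task(name):
    # Build a named no-op stand-in for a real elementary task
    @app.task(name=name)
    def task(*args, **kwargs):
        print(name, args, kwargs)
        return True
    return task

set_downtime = make_task('set_downtime')
change_node_weight = make_task('change_node_weight')
wait_until_healthy = make_task('wait_until_healthy')
remove_node = make_task('remove_node')
notify_technicians = make_task('notify_technicians')
add_node = make_task('add_node')

# Old-node removal expressed as a chain: each task runs only if the
# previous one succeeded, so a failed rebalance stops the whole flow
workflow = chain(
    set_downtime.s('ceph-rebalance-probe'),
    change_node_weight.s('old-node', 0.0),
    wait_until_healthy.s(),
    remove_node.s('old-node'),
    notify_technicians.s('old-node'),
    add_node.s('new-node'),
    wait_until_healthy.s(),
)
workflow.apply_async()</code></pre>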



<h2 class="wp-block-heading">Big workflows and Celery</h2>



<p>As our infrastructure grows, so do our automated workflows, with more tasks per workflow and higher workflow complexity&#8230; What do we mean by a big workflow? A workflow consisting of 1,000-10,000 tasks. Just to visualize it, take a look at the following examples:</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-flow wp-block-group-is-layout-flow">
<h4 class="wp-block-heading">A few chords chained together (57 tasks in total)</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh4.googleusercontent.com/XZWOfqmSMu68u7GcbvceB0mc8_HA_v8higDeoG08dlO5oTlRd9R98QBSlf4sMLPuiFB2RPVgM-6i7vG86jtAxMCrKSLTkt0nK4z5JSbYE4QkXF96qkXh3uSJYj1X82UUm-agBMxu" alt=""/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class="">workflow = chain([
    noop.s(0),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    noop.s()
])</code></pre>
</div>
</div>



<h4 class="wp-block-heading">More complex graph structure built from chains and groups (23 tasks in total)</h4>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<div class="wp-block-image"><figure class="aligncenter"><img decoding="async" src="https://lh5.googleusercontent.com/gUQlIa5Nmb4a5oNDbojhBtukEn--6dSxlKrn-enggXk9eCtuBvgVBTxecwAczOMghEoZ0zOtKuz0nohZTsj01QqVBxkbX8bxqyVVvYjC6B1sfrpXN8pferDSgg-RE6TB6v5SOBdL" alt=""/></figure></div>
</div>



<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow">
<pre class="wp-block-code"><code class=""># | is ‘chain’ operator in celery
workflow = (
    group(
        group(
            group([noop.s() for i in range(5)]),
            chain([noop.s() for i in range(5)])
        ) |
        noop.s() |
        group([noop.s() for i in range(5)]) |
        noop.s(),
        chain([noop.s() for i in range(5)])
    ) |
    noop.s()
)</code></pre>
</div>
</div>
</div></div>



<p>As you can probably imagine, visualisations get quite big and messy when 1,000 tasks are involved! Celery is a powerful tool, and has lots of features that are well-suited for automation, but it still struggles when it comes to processing big, complex, long-running workflows. Orchestrating the execution of 10,000 tasks, with a variety of dependencies, is no trivial thing. There are several issues we encountered when our automation grew too big:</p>



<ul class="wp-block-list"><li>Memory issues during workflow building (client side)</li><li>Serialisation issues (client -&gt; Celery backend transfer)</li><li>Nondeterministic, broken execution of workflows</li><li>Memory issues in Celery workers (Celery backend)</li><li>Disappearing tasks</li><li>And more&#8230;</li></ul>



<p>Take a look at some GitHub tickets:</p>



<ul class="wp-block-list"><li><a href="https://github.com/celery/celery/issues/5000" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5000</a></li><li><a href="https://github.com/celery/celery/issues/5286" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5286</a></li><li><a href="https://github.com/celery/celery/issues/5327" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/5327</a></li><li><a href="https://github.com/celery/celery/issues/3723" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/celery/celery/issues/3723</a></li></ul>



<p>Using Celery for our particular use case became difficult and unreliable. Celery’s native support for workflows doesn’t seem to be the right choice for handling 100/1,000/10,000 tasks. In its current state, it’s just not enough. So here we stand, in front of a solid, concrete wall… Either we somehow fix Celery, or we rewrite our automation using a different framework.</p>



<h2 class="wp-block-heading">Celery &#8211; to fix&#8230; or to fix?</h2>



<p>Rewriting all of our automation would be possible, although relatively painful. Since I’m a rather lazy person, perhaps attempting to fix Celery wasn’t an entirely bad idea? So I took some time to dig through Celery’s code, and managed to find the parts responsible for building workflows and executing chains and chords. It was still a little difficult for me to understand all the different code paths handling the wide range of use cases, but I realised it would be possible to implement a clean, straightforward orchestration that would handle all the tasks and their combinations in the same way. What’s more, I could see that it wouldn&#8217;t take too much effort to integrate it into our automation (let’s not forget the main goal!).</p>



<p>Unfortunately, introducing new orchestration into the Celery project would probably be quite hard, and would most likely break some backwards compatibility. So I decided to take a different approach &#8211; writing an extension or a plugin that wouldn’t require changes in Celery. Something pluggable, and as non-invasive as possible. That’s how Celery Dyrygent emerged&#8230;</p>



<h2 class="wp-block-heading">Celery Dyrygent</h2>



<p><a href="https://github.com/ovh/celery-dyrygent" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/celery-dyrygent</a></p>



<h3 class="wp-block-heading">How to represent a workflow</h3>



<p>You can think of a workflow as a directed acyclic graph (DAG), where each task is a separate graph node. With acyclic graphs, it is relatively easy to store and resolve dependencies between nodes, which leads to straightforward orchestration. Celery Dyrygent was implemented based on these properties. Each task in the workflow has a unique identifier (Celery already assigns task IDs when a task is pushed for execution), and each one is wrapped into a workflow node. Each workflow node consists of a task signature (a plain Celery signature) and a list of IDs for the tasks it depends on. See the example below:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-1024x533.png" alt="" class="wp-image-17400" width="512" height="267" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-1024x533.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-300x156.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918-768x400.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F4601B45-EB13-4710-9325-B9684BF77918.png 1172w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure>



<h3 class="wp-block-heading">How to process a workflow</h3>



<p>So we know how to store a workflow in a clean and easy way. Now we just need to execute it. How about using&#8230; Celery? Why not? For this, Celery Dyrygent introduces a <strong>workflow processor</strong> task (an ordinary Celery task). This task wraps a whole workflow and schedules the execution of primitive tasks according to their dependencies. Once the scheduling part is over, the task repeats itself (it &#8216;ticks&#8217; with some delay).</p>



<p>Throughout the whole processing cycle, the workflow processor retains the state of the entire workflow internally, updating it with each repetition. You can see an orchestration example below:</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-1024x553.png" alt="" class="wp-image-17416" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/CE6EE688-92F2-4BA5-9A6B-147BD956A0F0.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-1024x553.png" alt="" class="wp-image-17417" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/7764C3D5-1EF9-44A9-A588-4C37A275570B.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.ovh.com/blog/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-1024x553.png" alt="" class="wp-image-17418" width="512" height="277" srcset="https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-1024x553.png 1024w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-300x162.png 300w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19-768x415.png 768w, https://blog.ovhcloud.com/wp-content/uploads/2020/03/F2E6717E-B355-46AB-AD73-6C98B6CE4B19.png 1470w" sizes="auto, (max-width: 512px) 100vw, 512px" /></figure></div>



<p>Most notably, the workflow processor stops its execution in two cases (sketched in code after this list):</p>



<ul class="wp-block-list"><li>Once the whole workflow finishes, with all tasks successfully completed</li><li>When it can’t proceed any further, due to a failed task</li></ul>



<h3 class="wp-block-heading">How to integrate</h3>



<p>So how do we use this? Fortunately, Celery Dyrygent is quite easy to use. First of all, you need to inject the workflow processor task definition into your Celery application:</p>



<pre class="wp-block-code"><code class="">from celery_dyrygent.tasks import register_workflow_processor
app = Celery() #  your celery application instance
workflow_processor = register_workflow_processor(app)</code></pre>



<p>Next, you need to convert your Celery defined workflow into a Celery Dyrygent workflow:</p>



<pre class="wp-block-code"><code class="">from celery_dyrygent.workflows import Workflow

celery_workflow = chain([
    noop.s(0),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    chord([noop.s(i) for i in range(10)], noop.s()),
    noop.s()
])

workflow = Workflow()
workflow.add_celery_canvas(celery_workflow)</code></pre>



<p>Finally, simply execute the workflow, just as you would an ordinary Celery task:</p>



<pre class="wp-block-code"><code class="">workflow.apply_async()</code></pre>



<p>That’s it! You can always go back if you wish, as the changes are small and easy to revert.</p>



<h3 class="wp-block-heading">Give it a try!</h3>



<p>Celery Dyrygent is free to use, and its source code is available on GitHub (<a href="https://github.com/ovh/celery-dyrygent" data-wpel-link="external" target="_blank" rel="nofollow external noopener noreferrer">https://github.com/ovh/celery-dyrygent</a>). Feel free to use it, improve it, request features, and report any bugs! It has a few additional features not described here, so I&#8217;d encourage you to take a look at the project’s readme file. For our automation requirements, it&#8217;s already a solid, battle-tested solution. We’ve been using it since the end of 2018, and it has processed thousands of workflows, consisting of hundreds of thousands of tasks. Here are some production stats, from June 2019 to February 2020:</p>



<ul class="wp-block-list"><li>936,248 elementary tasks executed</li><li>11,170 workflows processed</li><li>4,098 tasks in the biggest workflow so far</li><li>~84 tasks per workflow, on average</li></ul>



<p>Automation is always a good idea!</p>
<img loading="lazy" decoding="async" src="//blog.ovhcloud.com/wp-content/plugins/matomo/app/matomo.php?idsite=1&amp;rec=1&amp;url=https%3A%2F%2Fblog.ovhcloud.com%2Fdoing-big-automation-with-celery%2F&amp;action_name=Doing%20BIG%20automation%20with%20Celery&amp;urlref=https%3A%2F%2Fblog.ovhcloud.com%2Ffeed%2F" style="border:0;width:0;height:0" width="0" height="0" alt="" />]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
