Today, most applications rely, directly or indirectly, on databases. I would even take a bet and say that a large portion of those are relational databases. At OVHcloud, we rely on several dozen clusters hosting hundreds of databases to power thousands of applications. Most of those databases power our API and host billing information and customer details.
As part of the team responsible for this infrastructure, I can say it is a huge responsibility to maintain such a critical part of the beating heart of OVHcloud.
In this new series of blog posts, we will take a closer look at OVHcloud’s internal relational database infrastructure. This first post is about the infrastructure hosting our internal databases. At OVHcloud, we use three major DBMS (database management systems): PostgreSQL, MariaDB and MySQL, all of them relying on the same cluster architecture.
But first, what exactly is a cluster? A cluster is a group of nodes (physical or virtual) working together to provide a SQL service.
At OVHcloud we have an open source and “do it yourself” culture. It allows us to control our costs and, more importantly, to master the technologies we rely on.
That’s why, over the last two years, we have designed, deployed, improved and run failure-proof cluster topologies, then industrialised them. To satisfy our reliability, performance and functional requirements, we decided on a common topology for all these clusters. Let’s find out what it looks like!
Each cluster is composed of three nodes, each with its own role: primary, replica or backup.
The primary node handles read-write workloads, while the replica(s) only handle read-only queries. When the primary node fails, we promote a replica node to become the new primary. Because, in the vast majority of cases, databases handle far more read-only than read-write queries, replica nodes can be added to scale the cluster’s read-only capabilities. This is called horizontal scaling. Our last node is dedicated to backup operations. Backups are incredibly important; we will talk a bit more about them later.
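As an illustration, here is a minimal sketch, in Python, of how an application could route its queries between the read-write primary and the read-only replicas. The hostnames and connection-string format are assumptions made for the example, not OVHcloud’s actual endpoints or tooling.

```python
from itertools import cycle

# Hypothetical endpoints for the example -- not real OVHcloud hosts.
PRIMARY_DSN = "host=db-primary.internal port=5432 dbname=app"
REPLICA_DSNS = cycle([
    "host=db-replica1.internal port=5432 dbname=app",
    "host=db-replica2.internal port=5432 dbname=app",
])


def pick_dsn(readonly: bool) -> str:
    """Return the connection string to use for a query.

    Read-write queries always go to the primary; read-only queries are
    spread round-robin across the replicas (horizontal scaling).
    """
    return next(REPLICA_DSNS) if readonly else PRIMARY_DSN


# Usage: a report goes to a replica, a billing update goes to the primary.
print(pick_dsn(readonly=True))   # -> a replica DSN
print(pick_dsn(readonly=False))  # -> the primary DSN
```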
Because every node in the cluster can be promoted to primary, they all need to be able to handle the same workload. Thus, they must have exactly the same resources (CPU, RAM, disk, network…). This is particularly important when we promote a replica, because it will then have to take over the primary’s workload; a replica sized smaller than the primary can be disastrous under production load.

With our clusters up and running, we can start querying them. Each cluster can host one or more databases depending on several factors, such as infrastructure cost and workload type (business-critical or not, transactional or analytical…).
Thus, a single cluster can host anything from a single big database to tens of smaller ones. In this context, small and big are defined not only by the quantity of data but also by the expected query frequency. For this reason, we carefully size each cluster and provision it accordingly. When a database grows and its cluster is no longer appropriately sized, we migrate the database to a new cluster.
Aside from production, we have a second, smaller environment that fulfils two needs: it is our development environment, and we use it both to test our backups and to provide our developers with a testing environment. We will get back to this in just a few lines.
Now let us talk about backups. As I mentioned earlier, backups are a critical part of enterprise-grade databases. To avoid having to maintain a different process for each DBMS flavor, we designed a generic backup process that we apply to all of them. This allows us to automate it more efficiently and to abstract away the complexity of the different software.
As you have probably guessed by now, backups are performed by the backup node. This node is part of the cluster and data is synchronously replicated to it, but it does not receive any queries. When a backup is performed, the DBMS process is stopped, a snapshot of the filesystem is taken, and the snapshot is sent to a storage server outside of the cluster for archival and resiliency. We use ZFS for this purpose because of its robustness, and because its incremental sends reduce the bandwidth and storage costs associated with snapshot archival.
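To make the idea concrete, a simplified backup run could be sketched as follows. The service name, ZFS dataset, snapshot naming scheme, storage host and exact sequencing (such as when the DBMS is restarted) are all assumptions made for this example; our real tooling is more elaborate, with error handling, retention and monitoring.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional

# Illustrative names only -- service, dataset and storage host are assumptions.
DBMS_SERVICE = "postgresql"
DATASET = "tank/pgdata"
STORAGE_HOST = "backup-storage.internal"
ARCHIVE_DATASET = "tank/archives/pgdata"


def run_backup(previous_snapshot: Optional[str] = None) -> str:
    """Stop the DBMS, snapshot its filesystem, then ship the snapshot off-cluster."""
    snapshot = f"{DATASET}@{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}"

    # 1. Stop the DBMS on the backup node so the on-disk data is consistent.
    subprocess.run(["systemctl", "stop", DBMS_SERVICE], check=True)
    try:
        # 2. Take an atomic ZFS snapshot of the data filesystem.
        subprocess.run(["zfs", "snapshot", snapshot], check=True)
    finally:
        # 3. Restart the DBMS once the snapshot exists (sequencing assumed here);
        #    the transfer below can run while the node is back in the cluster.
        subprocess.run(["systemctl", "start", DBMS_SERVICE], check=True)

    # 4. Ship the snapshot to the storage server. An incremental send (-i)
    #    only transfers the blocks that changed since the previous snapshot.
    incremental = f"-i {previous_snapshot} " if previous_snapshot else ""
    subprocess.run(
        f"zfs send {incremental}{snapshot} "
        f"| ssh {STORAGE_HOST} zfs receive -F {ARCHIVE_DATASET}",
        shell=True,
        check=True,
    )
    return snapshot
```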
But the main reason for having a separate backup node is the following: the rest of the cluster is not affected in any way by the backup. Indeed, backing up a full database can have a very visible impact on production (locks, CPU and RAM consumption, etc.), and we don’t want that on production nodes.
But backups are useless if they can’t be restored. Therefore, every day, we restore the latest snapshot of each cluster on a separate, dedicated node. This allows us to kill two birds with one stone: we make sure we are actually able to restore our backups, and our development team gets an almost up-to-date development environment.
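A daily restore check along those lines could be sketched like this, again with hypothetical host, dataset and service names: receive the latest archived snapshot on the dedicated node, start the DBMS on top of it, and make sure it actually answers queries.

```python
import subprocess

# Illustrative names only -- same assumptions as the backup sketch above.
STORAGE_HOST = "backup-storage.internal"
ARCHIVE_DATASET = "tank/archives/pgdata"
LOCAL_DATASET = "tank/pgdata"
DBMS_SERVICE = "postgresql"


def restore_and_check(snapshot_name: str) -> None:
    """Restore an archived snapshot on the dev node and check the DBMS answers."""
    # 1. Pull the archived snapshot back from the storage server; -F rolls the
    #    local dataset back so it matches the received stream exactly.
    subprocess.run(
        f"ssh {STORAGE_HOST} zfs send {ARCHIVE_DATASET}@{snapshot_name} "
        f"| zfs receive -F {LOCAL_DATASET}",
        shell=True,
        check=True,
    )

    # 2. Start the DBMS on top of the restored filesystem.
    subprocess.run(["systemctl", "start", DBMS_SERVICE], check=True)

    # 3. Minimal sanity check: the server must accept connections and answer.
    subprocess.run(
        ["psql", "--no-align", "--tuples-only", "-c", "SELECT 1"], check=True
    )
```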
To summarise: our database clusters are modular but follow a common topology. Clusters can host a variable number of databases depending on their expected workloads. Each of these databases scales horizontally for read-only operations by exposing separate connections for read-only and read-write operations. Furthermore, backup nodes are used to take regular backups without impacting the production databases. Internally, these backups are then restored on separate nodes to provide a fresh development environment.
This completes our tour of OVHcloud’s internal database infrastructure, and you are now all set for the next post, which will be about replication. Stay tuned!
After 10 years as a sysadmin in High Performance Computing, Wilfried Roset is now part of OVHcloud as Engineering Manager for their Databases Product Unit. He focuses on industrialisation, reliability and performance for both internal and public cluster offerings.