OVHcloud Predictor, part 1

In our previous article concerning the CVE-2017-9841 vulnerability, we presented our web application firewall (WAF) implemented with NAXSI.

Usually, a WAF is run directly on the web server. At OVHcloud, we chose to run our web application firewall upstream, on a very powerful software layer that is specific to our web hosting infrastructures. These are the ‘Predictors’.


If you would like to learn more about them, this article explores them in detail.

They are a crucial part of our infrastructure, like the heroes you'd read about in fiction, but absolutely real!

Before we start describing the role of Predictors and how they work, we need to understand how the infrastructure that powers 6 million websites operates.

To work on the internet, a website needs a computer that serves queries from web browsers: this is the role a server plays.

But just one server is not enough to ensure that a website is available all the time. You need several servers, in case one of them experiences an outage, as well as load balancers to redirect traffic to the right machine.

Websites use and produce data that needs to be preserved, even if the server that makes the website available on the internet goes down. To keep this data safe, we externalise storage within two distinct technical building blocks: file servers and database servers.

These specialised servers have dedicated hardware and software, offering durable data storage and simplified backup strategies.

Here is a diagram representing the organisation of our web hosting clusters:

Organisation of our web hosting clusters

A vast architecture like this requires heavy financial investment. However, websites do not use their allocated resources constantly. By hosting several websites on the same infrastructure, those resources can be shared, and the costs divided. This is the principle behind shared hosting.

At OVHcloud, we also offer hosting plans with guaranteed resources — Performance hosting plans. This means that instead of sharing a server’s resources with other customers, your website is placed on a separate server, and its resources are fully dedicated to you. Our Predictors use their talents here, too!

As you can see, the Predictors belong to the big family of load balancers.

So what sets them apart? They are specific to our web hosting plans, and they use data from our IT system to recognise the domains hosted, as well as the resources allocated to them.

They have four main roles:

  • Distributing queries across the different clusters and web servers, depending on each customer's offer.
  • Ensuring that web server resources are shared fairly between customers, and redistributing them automatically if required.
  • Protecting the infrastructure.
  • Regulating traffic during incidents, and blocking access to certain servers while the technical teams carry out interventions.

Since the subject is so vast, we will detail the first two points in this article, and we’ll cover the other two points in a second article.

To assign the right server to each HTTP request, Predictors use several different criteria.

A fair balance of traffic

To work as load balancers, Predictors analyse every incoming request in order to determine which web server is the best target, based on the request's domain name.

This process adds only a few milliseconds to the response time, but it lets the Predictors adapt their answers to server statuses and website traffic in real time. That is how we guarantee optimal service availability.

In nominal operation, websites are reallocated periodically to balance the load across servers and ensure optimal performance for customers.
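
As a minimal sketch of this kind of host-based, status-aware routing (the `Server` and `Router` types, the `Healthy` flag and the `Load` counter are hypothetical names, not OVHcloud's actual implementation), the per-request decision might look like this in Go:

```go
package predictor

import (
	"errors"
	"net/http"
	"sync"
)

// Server is a hypothetical view of one web server in a cluster.
type Server struct {
	Addr    string
	Healthy bool // updated in real time by monitoring probes
	Load    int  // e.g. current number of in-flight requests
}

// Router picks a target for each incoming HTTP request, based on
// its Host header and on the live status of candidate servers.
type Router struct {
	mu      sync.RWMutex
	domains map[string][]*Server // domain name -> candidate servers
}

// Route returns the least-loaded healthy server for the request's domain.
func (r *Router) Route(req *http.Request) (*Server, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()

	var best *Server
	for _, s := range r.domains[req.Host] {
		if !s.Healthy {
			continue // skip servers flagged down by monitoring
		}
		if best == nil || s.Load < best.Load {
			best = s
		}
	}
	if best == nil {
		return nil, errors.New("no healthy server for " + req.Host)
	}
	return best, nil
}
```

Choosing the least-loaded healthy candidate is only one plausible policy; what matters is that the decision is recomputed for every request, against live state.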


Reallocation depending on the hosting plan

Predictors determine which hosting plan is linked to each incoming request. This enables them to redirect traffic to the corresponding web farm.

For example, customers who have opted for Performance offers with guaranteed resources are redirected to a web farm where the resources are dedicated to each website.

Some hosting offers like Cloud Web are grouped on dedicated clusters, which makes them easier to maintain.

With this system, we can also manage situations where the service is being misused. Although they may not do so knowingly, some customers use a high volume of resources on our infrastructure, and this can negatively impact performance for other websites hosted on the same server. To avoid this, a hosting plan can be temporarily redirected to a server cluster that is dedicated to managing these situations on a ‘best effort’ basis. When this happens, the customer is contacted accordingly.
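
As a rough illustration of this plan-based redirection, the routing decision could be modelled as a lookup table with a temporary override for plans moved to the best-effort cluster. The plan types and cluster names below are invented for the example, not OVHcloud's real topology:

```go
package predictor

// Hypothetical cluster identifiers; the real topology is internal.
const (
	SharedCluster     = "shared"
	PerformanceFarm   = "performance"
	CloudWebCluster   = "cloudweb"
	BestEffortCluster = "best-effort"
)

// clusterForPlan maps a hosting plan type to its default web farm.
var clusterForPlan = map[string]string{
	"shared":      SharedCluster,
	"performance": PerformanceFarm,
	"cloudweb":    CloudWebCluster,
}

// overrides holds plans temporarily redirected to the best-effort
// cluster, e.g. while a resource-misuse situation is handled.
var overrides = map[string]string{}

// TargetCluster returns the web farm that should serve a given plan.
func TargetCluster(planID, planType string) string {
	if c, ok := overrides[planID]; ok {
		return c // temporary redirection takes precedence
	}
	if c, ok := clusterForPlan[planType]; ok {
		return c
	}
	return SharedCluster // sensible default
}
```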

We will discuss this in more detail in the next article.

Cache optimisation

On our shared infrastructure, the data stored on hosting plans is grouped onto file servers that support multiple customers.

The web servers connect to these file servers using the NFS (Network File System) protocol.

This protocol is known for being robust and flexible, and enables us to share files across the network. Each time a read or write action is performed on the storage space dedicated to a website, the data passes through the network before being read or written on the remote storage hardware. The use of this protocol, combined with remote storage, is a key factor in making both the Predictors and the infrastructure resilient to web server outages: since a website's data is instantly available on the other web servers, its traffic can be redirected seamlessly from one server to another.

The simplest way of distributing queries across a web server cluster is to send the same number of queries to each server. But there are smarter ways to optimise resource usage even further!

To reduce queries to the storage server and speed up websites, we use a cache layer at the file-system level, directly on the web servers.

This means that when a web user visits a website for the first time (whether it uses a CMS like WordPress or Joomla!, or is coded from scratch), the web server reads the website's data from the file server and stores it in memory via its VFS (Virtual File System) cache. This cache offers an abstraction of the underlying storage system, enabling the web server to use remote storage via the NFS protocol in the same way as a local hard drive. For subsequent queries, the HTTP server no longer needs to fetch this file over the network.
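
This caching is performed transparently by the kernel, with no application code involved, but the hit-and-miss behaviour can be made explicit with a toy userland sketch; the `fileCache` type below is purely illustrative, standing in for the VFS page cache:

```go
package predictor

import (
	"os"
	"sync"
)

// fileCache is a toy stand-in for the kernel's VFS page cache:
// the first read of a path goes over NFS to the file server,
// subsequent reads are served from local memory.
type fileCache struct {
	mu    sync.RWMutex
	files map[string][]byte
}

func (c *fileCache) Read(path string) ([]byte, error) {
	c.mu.RLock()
	data, ok := c.files[path]
	c.mu.RUnlock()
	if ok {
		return data, nil // cache hit: no network traffic
	}

	// Cache miss: this read crosses the network to the NFS server.
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	if c.files == nil {
		c.files = make(map[string][]byte)
	}
	c.files[path] = data
	c.mu.Unlock()
	return data, nil
}
```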

But as Phil Karlton said, there are only two hard things in computer science: cache invalidation and naming things (https://quotesondesign.com/phil-karlton/). And indeed, with each write operation the cache needs to be updated, which generates traffic on the network.

To overcome this, the Predictors keep a website's visitors on the same server, which means the cache doesn't have to be refreshed on the other servers; those only come into play in the event of an incident or a rebalancing operation.

By assigning a single server to each hosting plan, we significantly reduce the volume of network queries, which goes a long way towards getting good value for money out of the infrastructure.
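
The article doesn't specify how this assignment is computed; one classic way to obtain that kind of stickiness is to hash the domain name deterministically over the list of available servers, as in this hypothetical sketch:

```go
package predictor

import "hash/fnv"

// AssignServer deterministically pins a domain to one server in the
// cluster, so repeated requests land where the VFS cache is already
// warm. Illustrative technique, not OVHcloud's actual algorithm.
func AssignServer(domain string, servers []string) (string, bool) {
	if len(servers) == 0 {
		return "", false
	}
	h := fnv.New32a()
	h.Write([]byte(domain))
	return servers[h.Sum32()%uint32(len(servers))], true
}
```

A plain modulo like this reshuffles most domains whenever the server list changes; a consistent-hashing ring would limit that churn during rebalancing. Either way, the property that matters is determinism: the same domain always lands on the server holding its warm cache.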

Predictors can also split incoming traffic according to certain criteria (e.g. the solution, or the source IP address), and balance the load for a single website across several web servers while retaining the benefits of this cache optimisation. But that is another story, which we'll focus on in another article.

Monitoring

The Predictors also monitor the web servers constantly by retrieving their statuses. And they don't just check the machine's availability (Does it ping? Is Apache responding?): they also run many other probes that are more specific to our hosting platform, and help us guarantee that websites are working properly.

Predictors: Monitoring

If a web server becomes unavailable, the Predictors no longer redirect traffic to it, and the requests that would have been sent there are redirected to a working server. This significantly reduces the impact on the customer's side.

This is when the self-healing mechanism described in another of our articles comes in: Selfheal at Webhosting – The external part

It takes over to repair the web server, before the Predictors reintegrate it into the cluster and HTTP requests can be sent to it again.
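
Putting the monitoring and routing sides together, a probe loop along these lines would mark servers down and put them back in rotation once repaired. It reuses the hypothetical `Router` and `Server` types from the earlier sketch, and the `/health` endpoint is invented; the real probes are specific to the OVHcloud platform:

```go
package predictor

import (
	"net/http"
	"time"
)

// probe checks one web server. Beyond a simple ping, real probes also
// validate platform-specific behaviour; this HTTP check is a stand-in.
func probe(addr string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://" + addr + "/health")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// Monitor periodically re-evaluates every server: unhealthy servers
// stop receiving traffic, and repaired ones return to the rotation.
func (r *Router) Monitor(interval time.Duration) {
	for range time.Tick(interval) {
		// Snapshot the pool, probe without holding the lock,
		// then publish the new statuses.
		r.mu.RLock()
		var all []*Server
		for _, servers := range r.domains {
			all = append(all, servers...)
		}
		r.mu.RUnlock()

		for _, s := range all {
			healthy := probe(s.Addr)
			r.mu.Lock()
			s.Healthy = healthy
			r.mu.Unlock()
		}
	}
}
```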

And our final challenge? The issuing of SSL certificates.

Since 2016, we have delivered all of our web hosting plans with SSL certificates generated by our partner, Let’s Encrypt.

Before delivering a certificate, Let's Encrypt needs to verify that the requester is legitimate. To do this, Let's Encrypt provides ‘challenges’ that can only be solved by an infrastructure responding behind the domain name of the certificate requester. This verification is carried out via an HTTP request to a URL that depends on the domain, generated when our infrastructure launches the certificate request following a customer's order.

On our infrastructure, the Predictors play a vital role in resolving these challenges! Placed on the critical path upstream from web servers, they can receive the Let’s Encrypt request, and process it without sending queries to the web servers.
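
Concretely, this means answering the ACME HTTP-01 challenge directly at the load-balancing layer. Here is a minimal sketch, assuming a hypothetical in-memory token store; the `/.well-known/acme-challenge/` path itself is defined by the ACME standard:

```go
package predictor

import (
	"net/http"
	"strings"
)

// acmeTokens maps a challenge token to its key authorization, filled
// in when our infrastructure requests a certificate. (Hypothetical
// storage; a real system would share this state across Predictors.)
var acmeTokens = map[string]string{}

const challengePrefix = "/.well-known/acme-challenge/"

// ServeChallenge intercepts Let's Encrypt validation requests so they
// never reach the customer's web server. It reports whether the
// request was handled.
func ServeChallenge(w http.ResponseWriter, r *http.Request) bool {
	if !strings.HasPrefix(r.URL.Path, challengePrefix) {
		return false // not a challenge: forward to a web server
	}
	token := strings.TrimPrefix(r.URL.Path, challengePrefix)
	auth, ok := acmeTokens[token]
	if !ok {
		http.NotFound(w, r)
		return true
	}
	w.Header().Set("Content-Type", "text/plain")
	w.Write([]byte(auth))
	return true
}
```

Because the Predictors sit in front of every web server, they are guaranteed to see the validation request for any hosted domain, whichever server that domain is normally routed to.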

This means we can generate SSL certificates with total transparency for our users, and simplify HTTPS access to websites!

And is that everything?

As you will have gathered from this article, Predictors are essential components of the OVHcloud shared hosting infrastructure.

They help us efficiently provide the features we offer, so that we can deliver web hosting solutions at the best price.

Like superheroes, they have more than one trick up their sleeve. And above all, they’re here to protect us!

In our next article, we will show you the benefits of Predictors in terms of security and stability.

Devops at OVHcloud

Devops in the Webhosting Deploy team