Transforming networks into business enablers with hyper-resilience

Why trust in networks is essential for all organisations

Businesspeople often lose sleep over network outages. Unplanned downtime hits the bottom line hard: according to Gartner, it costs about $5,600 per minute on average, or roughly $336,000 per hour.

Network resilience is becoming increasingly important, both for mission-critical operations and because businesses of all types are making greater use of cloud services. Data availability is now of key importance for everyone, from those running corporate IT departments in the cloud, to websites providing government services, and even healthcare organisations performing surgery using extended reality (XR).

Because of the growing dependence on network performance, we have instituted a five-year plan to deliver a hyper-resilient network and reduce the risk of network threats that can bring business operations to a screeching halt. These threats come from all angles, including human error, weather conditions, fibre cuts, software bugs and cyberattacks, so we believe hyper-resilience is the path all network providers need to follow.

We are now three years into our plan and have been working hard to completely re-engineer every part of our network, from the optical and IP layers to the routes into our global data centres. The thinking behind this is to increase the trust in the performance of our network by increasing both reach and speed.

Using continuous improvement to build hyper-resilience

Our route to hyper-resilience is based on a five-pillar approach to improving the performance of all aspects of the network, including event management, incident management, change management and problem management. These pillars support both the overlay, which changes the way customers consume network services, and the underlay, which allows us to automatically configure each of the components that our customers use.

This is all underpinned by a philosophy of continuous improvement, so we can accurately assess the risks each change introduces and find effective ways to manage them.

Reducing risk means not introducing problems — something that often happens with human intervention. To facilitate centralised management of the network and to make it more flexible, we have introduced a software-defined networking approach. This allows for enhanced control and uses automation across both physical and virtual network environments.

To back this up, multiple fibre routes lead into each of our data centres, further reducing the risk of downtime. When fibre optic cables were intentionally sabotaged earlier this year, several cities in France experienced internet slowdowns and outages. Because our network has multiple paths for data traffic, we were able to re-route it, so our customers didn’t notice.
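To illustrate why path diversity matters, here is a minimal sketch of re-routing around a cut link. The topology, the node names and the `reroute` helper are hypothetical and chosen only for illustration; a production network relies on routing protocols rather than a script like this.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        # Sorted so that the result is deterministic when paths tie in length.
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical topology: two independent fibre routes from a data centre
# to a customer, one via pop_a and one via pop_b.
LINKS = {
    ("dc", "pop_a"),
    ("dc", "pop_b"),
    ("pop_a", "customer"),
    ("pop_b", "customer"),
}


def reroute(links, cut, src, dst):
    """Drop the cut link (in either direction) and search for a surviving path."""
    remaining = {link for link in links if link != cut and link != cut[::-1]}
    return shortest_path(remaining, src, dst)
```

With both routes intact, traffic takes the path through `pop_a`; if the `dc` to `pop_a` fibre is cut, `reroute` finds the path through `pop_b` instead. With only a single route, the same cut would leave no path at all.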

A proactive and predictive approach to managing risk

Our risk management methodology is both proactive and predictive, and is designed to grow and evolve, because nothing ever remains static in the evolution of networks. For problem management, we now take a data-driven approach that lets us solve issues early on. The data shows us recurring patterns that indicate possible points of failure, so we can fix an issue immediately rather than let it grow. We have also increased the amount of network filtering to reduce the risk of malicious activity. These preventative measures mean there is less risk of a bigger incident later on down the line, and that any impact on the customer is kept as small as possible.
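As a rough illustration of spotting recurring patterns in event data, the sketch below counts how often each component appears in a list of events and flags the repeat offenders. The event format, the component names and the threshold are assumptions made for this example, not a description of our actual tooling.

```python
from collections import Counter


def recurring_components(events, min_count=3):
    """Return components seen in at least min_count events, most frequent first.

    Each event is a (component, symptom) tuple; this shape is hypothetical.
    """
    counts = Counter(component for component, _symptom in events)
    return [name for name, seen in counts.most_common() if seen >= min_count]


# Hypothetical event log: one link keeps reporting the same symptom.
EVENTS = [
    ("link-1", "crc-errors"),
    ("link-2", "flap"),
    ("link-1", "crc-errors"),
    ("link-1", "crc-errors"),
]
```

Here `recurring_components(EVENTS)` flags only `link-1`, which is the kind of early signal that lets a team investigate before a fault becomes an outage.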

One of our principal goals is to reach 100 per cent automation, not just to overcome human error but also to ensure that any failures can be detected and fixed as they occur. We’ve introduced automation into code production for the software we build to manage the network. This greatly reduces errors and, together with robust CI/CD processes, lets us iron out bugs before any code goes into production.

Zero-touch provisioning is another way of reducing and eliminating risk. Our next-generation network can detect new devices, determine their purpose in the network, and then configure, install and deploy them automatically. This level of automation also removes the headache of updating device firmware. Before, this was a time-consuming and disruptive process. Now we can update devices more easily without any network interruption. We’re also supporting this further with software that’s designed to be self-healing, so it can detect, react to and correct issues on the fly.
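The detect, classify and configure flow of zero-touch provisioning can be sketched as follows. The model names, the role mapping and the configuration templates are all hypothetical placeholders; real provisioning systems typically also involve steps such as image download and authentication, which are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    serial: str
    model: str


# Hypothetical lookup tables: which role each hardware model plays,
# and a configuration template for each role.
ROLE_BY_MODEL = {"edge-x": "edge-router", "leaf-y": "leaf-switch"}
CONFIG_TEMPLATES = {
    "edge-router": "hostname {name}\nrole edge\n",
    "leaf-switch": "hostname {name}\nrole leaf\n",
}


def provision(device):
    """Work out a newly detected device's role and render its configuration."""
    role = ROLE_BY_MODEL.get(device.model)
    if role is None:
        # Unknown hardware is rejected rather than guessed at.
        raise ValueError(f"unknown model {device.model!r}; needs manual review")
    name = f"{role}-{device.serial.lower()}"
    return role, CONFIG_TEMPLATES[role].format(name=name)
```

Calling `provision(Device("AB12", "edge-x"))` returns the role `edge-router` and a rendered configuration with no manual input. Refusing unknown models, rather than falling back to a default, is a deliberate choice in this sketch to avoid introducing risk through automation.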

Leveraging new services for customers

By building a hyper-resilient network, we’re now able to focus on delivering a better service to customers and helping them with their specific business needs. It also means we can offer more value-added services that customers can make use of in the cloud, such as private networks, and support any business looking to leverage infrastructure as code (IaC). Not only can we deliver services much faster and more easily, we can also ensure they have resilience built in by default.

While we recognise that threats can never be fully eliminated, we aim to make our own network as resilient as possible and see this journey as an ever-continuing one. Ultimately, it’s all about ensuring that customers can rely on their networks, generate value and increase their agility. With networks moving beyond being a commodity to being a valuable business enabler, it’s vital that hyper-resilient networks become the mantra for all.

Yaniv Fdida

Chief Product Officer at OVHcloud

Romain Guillaume

Head of Network Unit at OVHcloud

Xavier Martins Rivas

Director of Cloud Networks at OVHcloud