Big Data

From 4 days to 15 minutes, a Domain Big Data story

In our Domain names 101 series, we explained that one of ICANN’s missions is regulating and setting norms for gTLD domain names (.com, .net, .info …). One of these norms is the Registrar Data Escrow program, also known as RDE. To put it simply, registrars have to regularly gather and export some data to a third-party […]

Why are you still managing your data processing clusters?

Cluster computing shares a computation load among a group of computers, which achieves a higher level of performance and scalability. Apache Spark is an open-source, distributed cluster-computing framework that is much faster than its predecessor, Hadoop MapReduce, thanks to features like in-memory processing and lazy evaluation. Apache Spark

Do you need to process your data? Try the new OVHcloud Data Processing cloud service!

One of the data services of OVHcloud is called OVHcloud Data Processing (ODP). It is a service that allows you to submit a processing job without caring about the cluster behind it. You just have to specify the resources you want to use for your job, and the service will create the cluster for you and destroy it as soon as your job is finished. In other words, you don’t have to think about clusters any more. Decide how many resources you need to process your data in the most efficient way for you and let OVHcloud Data Processing do the rest.

Contributing to Apache HBase: custom data balancing

In today’s blog post, we’re going to take a look at our upstream contribution to Apache HBase’s stochastic load balancer, based on our experience of running HBase clusters to support OVHcloud’s monitoring. The context: Have you ever wondered how: we generate the graphs for your OVHcloud server or web hosting package? our internal teams monitor their

Apache Spark & OVH Analytics Data Compute

How to run massive data operations faster than ever, powered by Apache Spark and OVH Analytics Data Compute

If you’re reading this blog for the first time, welcome to the ongoing data revolution! Just after the industrial revolution came what we call the digital revolution, with millions of people and objects accessing a worldwide network – the internet – all of them creating new content, new data. Let’s think about ourselves… We

CDS

Understanding CI/CD for Big Data and Machine Learning

This week, the OVH Integration and Continuous Deployment team was invited to the DataBuzzWord podcast. Together, we explored the topic of continuous deployment in the context of machine learning and big data. We also discussed continuous deployment for environments like Kubernetes, Docker, OpenStack and VMware vSphere. If you missed it, or would like to review everything that was discussed, you
