Last month, OVH and the Binder team partnered to support the growth of the BinderHub ecosystem around the world.
With approximately 100,000 weekly users of the mybinder.org public deployment and 3,000 unique Git repositories hosting Binder badges, the project needed more resources and computing time.
Today, we are thrilled to announce that OVH is now part of the world-wide federation of BinderHubs powering mybinder.org. All traffic to mybinder.org is now split between two BinderHubs – one run by the Binder team, and another run on OVH infrastructure.
For those who don’t already know mybinder.org, here’s a summary.
What is Jupyter?
Jupyter is an awesome open-source project that allows users to create, visualise and edit interactive notebooks. It supports many popular programming languages, such as Python, R and Scala, as well as presentation formats such as Markdown, code snippets and chart visualisations.
Example of a local Jupyter Notebook reading a notebook inside the OVH GitHub repository prescience client.
The main use case is the ability to share your work with tons of people, who can try, use and edit the work directly from their web browser.
Many researchers and professors are now able to work remotely on the same projects, without any infrastructure or environment issues. It’s a major improvement for communities.
Here, for example, is a notebook (GitHub project) that lets you use Machine Learning, from dataset ingestion to classification:
Example of a Machine Learning Jupyter Notebook
What is JupyterHub?
JupyterHub is an even more awesome open-source project that brings multi-user support to Jupyter notebooks. With several pluggable authentication mechanisms (e.g. PAM, OAuth), it allows Jupyter notebooks to be spawned on the fly from a centralised infrastructure. Users can then easily share their notebooks and access rights with each other. That makes JupyterHub perfect for companies, classrooms and research labs.
What is BinderHub?
Finally, BinderHub is the cherry on the cake: it allows users to turn any Git repository (for example, one hosted on GitHub) into a collection of interactive Jupyter notebooks with only one click.
Landing page of the binder project
The Binder instance deployed by OVH can be accessed here.
- Just choose a publicly accessible git repository (better if it already contains some Jupyter notebooks).
- Copy the URL of the chosen repository into the corresponding Binder field.
- Click the launch button.
- If it is the first time that Binder sees the repository you provide, you will see build logs appear. Your repository is being analysed and prepared so that a matching Jupyter notebook server can be started.
- Once the compilation is complete you will be automatically redirected to your dedicated instance.
- You can then start interacting and hacking inside the notebook.
- On the initial Binder page you will see a link to share your repository with others.
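The shareable link mentioned in the last step follows a predictable pattern for GitHub repositories: the hub address, then `/v2/gh/`, then the owner, repository and Git reference. As a small illustration, here is a sketch that builds such a link. The function name `binder_launch_url` is our own invention, and the example repository is only used for demonstration.

```python
from urllib.parse import quote


def binder_launch_url(owner: str, repo: str, ref: str,
                      hub: str = "https://ovh.mybinder.org") -> str:
    """Build a launch link for a public GitHub repository.

    `binder_launch_url` is a hypothetical helper, not part of BinderHub.
    `ref` is a branch name, tag or commit hash.
    """
    # quote() percent-encodes characters that are not safe in a URL path.
    return f"{hub.rstrip('/')}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref)}"


url = binder_launch_url("binder-examples", "requirements", "master")
print(url)
```

Pasting the printed link into a browser has the same effect as filling in the form and clicking the launch button.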
How does it work?
Tools used by BinderHub
BinderHub connects several services together to provide on-the-fly creation and registry of Docker images. It uses the following tools:
- A cloud provider such as OVH.
- Kubernetes to manage resources on the cloud.
- Helm to configure and control Kubernetes.
- Docker to use containers that standardise computing environments.
- A BinderHub UI that users can access to specify Git repos they want built.
- BinderHub to generate Docker images using the URL of a Git repository.
- A Docker registry that hosts container images.
- JupyterHub to deploy temporary containers for users.
What happens when a user clicks a Binder link?
After a user clicks a Binder link, the following chain of events happens:
- BinderHub resolves the link to the repository.
- BinderHub determines whether a Docker image already exists for the repository at the latest reference (git commit hash, branch, or tag).
- If the image doesn’t exist, BinderHub creates a build pod that uses repo2docker to:
- Fetch the repository associated with the link.
- Build a Docker container image containing the environment specified in configuration files in the repository.
- Push that image to a Docker registry, and send the registry information to the BinderHub for future reference.
- BinderHub sends the Docker image registry to JupyterHub.
- JupyterHub creates a Kubernetes pod for the user that serves the built Docker image for the repository.
- JupyterHub monitors the user’s pod for activity, and destroys it after a short period of inactivity.
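To make the chain of events above more concrete, here is a toy, in-memory model of it. It is only a sketch: the class `ToyBinderHub`, the `refs` lookup table, the image naming scheme and the commit hashes are all made up for illustration. The real system talks to Git, runs repo2docker in a build pod and pushes to an actual Docker registry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBinderHub:
    """In-memory stand-in for BinderHub, the registry and JupyterHub."""

    registry: set = field(default_factory=set)  # images already pushed
    builds: int = 0  # how many times a build was needed

    def launch(self, repo_url: str, ref: str, refs: dict) -> dict:
        # 1. Resolve the link: map the branch or tag to a commit hash.
        #    `refs` is a hypothetical lookup table replacing a real Git query.
        commit = refs[(repo_url, ref)]

        # 2. Derive an image name that is unique per repository and commit
        #    (an invented naming scheme, for illustration only).
        slug = repo_url.rstrip("/").split("/")[-1].lower()
        image = f"{slug}:{commit}"

        # 3. Build and push only if the registry does not have the image yet
        #    (the real system runs repo2docker in a build pod at this point).
        if image not in self.registry:
            self.builds += 1
            self.registry.add(image)

        # 4. Hand the image over so that a user pod can be started from it.
        return {"image": image, "pod": f"jupyter-{slug}-{commit[:7]}"}


hub = ToyBinderHub()
repo = "https://github.com/binder-examples/requirements"
refs = {(repo, "master"): "a1b2c3d4e5"}    # made-up commit hash

first = hub.launch(repo, "master", refs)   # image missing: triggers a build
second = hub.launch(repo, "master", refs)  # same commit: image is reused
refs[(repo, "master")] = "f6e7d8c9b0"      # the branch moves to a new commit
third = hub.launch(repo, "master", refs)   # new commit: triggers a new build

print(hub.builds, first["image"], third["image"])
```

The point of the model is the caching behaviour: launching the same commit twice costs a single build, while a new commit on the same branch leads to a fresh image.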
A diagram of the BinderHub architecture
How did we deploy it?
Powered by OVH Kubernetes
One great thing about the Binder project is that it is completely cloud agnostic: you just need a Kubernetes cluster to deploy on.
Kubernetes is one of the best choices to make when it comes to scalability on a micro-services architecture stack. The managed Kubernetes solution is powered by OVH’s Public Cloud instances. With OVH Load Balancers and integrated additional disks, you can host all types of workloads, with total reversibility.
To this end, we used 2 services in the OVH Public Cloud:
- A Kubernetes cluster, currently consuming 6 nodes of C2-15 VM instances (it will grow in the future).
- A Docker registry.
We also ordered a specific domain name so that our binder stack could be publicly accessible from anywhere.
Installation of Helm on our new cluster
Once the automatic installation of our Kubernetes cluster was complete, we downloaded the administration YAML file that allows us to manage our cluster and run kubectl commands on it.
kubectl is the official and most popular tool used to administer Kubernetes clusters. More information about how to install it can be found here.
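As an illustration of how a downloaded administration file can be used, the sketch below assembles a kubectl invocation that points at it through kubectl's `--kubeconfig` option. The file name `kubeconfig.yml` and the helper `kubectl_command` are assumptions made for this example, not the names we actually used.

```python
import shlex


def kubectl_command(kubeconfig: str, *args: str) -> list:
    """Return the argument list for a kubectl call using a given config file.

    `kubectl_command` is a hypothetical helper; `kubeconfig` is the path to
    the administration YAML file downloaded for the cluster.
    """
    return ["kubectl", "--kubeconfig", kubeconfig, *args]


cmd = kubectl_command("kubeconfig.yml", "get", "nodes")
print(shlex.join(cmd))  # the equivalent shell command line
```

On a machine where kubectl is installed, the list can be passed to `subprocess.run(cmd, check=True)` to list the nodes of the cluster.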
The automatic deployment of the full Binder stack is already prepared in the form of a Helm package. Helm is a package manager for Kubernetes, and it needs a client part (helm) and a server part (tiller) to work.
All information about installing helm and tiller can be found here.
Configuration of our Helm deployment
With tiller installed on our cluster, everything was ready to automate the deployment of Binder in our OVH infrastructure.
The configuration of the helm deployment is pretty straightforward, and all the steps have been described by the Binder team here.
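To give an idea of what such a configuration contains, here is a sketch that generates the two value sets typically passed to the BinderHub Helm chart: one with secrets, one with settings. The key names follow the layout we recall from the Binder team's setup guide at the time of writing, so treat them as an assumption and check them against the chart version you deploy. The registry prefix is a placeholder, and `binderhub_values` is a hypothetical helper.

```python
import json
import secrets


def binderhub_values(image_prefix: str):
    """Return (secret_values, config_values) for a BinderHub Helm release.

    Key names are assumed from the setup guide and may differ between
    chart versions.
    """
    secret_values = {
        "jupyterhub": {
            # Two independent random tokens, 32 bytes each, hex-encoded.
            "hub": {"services": {"binder": {"apiToken": secrets.token_hex(32)}}},
            "proxy": {"secretToken": secrets.token_hex(32)},
        }
    }
    config_values = {
        "config": {
            "BinderHub": {
                "use_registry": True,          # push built images to a registry
                "image_prefix": image_prefix,  # prepended to every image name
            }
        }
    }
    return secret_values, config_values


# Placeholder prefix: replace with your own registry and naming scheme.
secret_values, config_values = binderhub_values("registry.example.org/binder-")

# JSON output is used here to avoid a YAML dependency in the example.
print(json.dumps(config_values, indent=2))
```

Keeping the secrets in a separate value set makes it possible to publish the rest of the configuration, which is how the Binder team shares its deployment settings.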
Integration into the BinderHub CI/CD process
The Binder team already had a Travis workflow for automating their test and deployment processes. Everything is transparent and they expose all their configurations (except secrets) on their GitHub project. We just had to integrate with their current workflow and push our specific configuration to their repository.
We then waited for the next run of their Travis workflow, and it worked.
From this moment onward, the OVH stack for Binder was running and accessible by anyone, from anywhere, at this address: https://ovh.mybinder.org/.
What comes next?
OVH will continue engaging with the open-source data community, and keep building a strong relationship with the Jupyter foundation and, more generally, the Python community.
This first collaborative experience with such a data-driven open-source organisation helped us to establish the best team organisation and management to ensure that both OVH and the community achieve their goals in the best way possible.
Working with open source is very different from working in industry, as it requires a different mindset: very human-centric, where everyone has different objectives, priorities, timelines and points of view that should all be considered.
Special Thanks
We are grateful to the Binder, Jupyter, and QuantStack teams for their help, the OVH K8s team for the OVH Managed Kubernetes and OVH Managed Private Registry, and the OVH MLS team for the support. You rock, people!