OVHcloud develops a reversible and open Cloud where interoperability is key, thanks to open source. This approach contributes to a transparent Cloud with no vendor lock-in, in line with our belief that we are stronger together and can therefore go further. As a member of the open-source community, with several software releases published under open-source licenses, we publish code on GitHub and listen to feedback on our various channels.
Throughout the year, we also help the community in other ways, such as grants to access our infrastructure for testing or evaluation purposes. But that’s not all. Because we deploy a number of open-source solutions at an unprecedented scale, we sometimes find ourselves in a unique position to help the community through development and patching efforts. This recently happened within our storage team, and we thought sharing our experience would be worth a read.
OpenZFS and FreeBSD at OVHcloud
Within the Storage Product Unit, our mission is to deliver innovative file storage services based on different hardware and software stacks. We take the time to test and validate new technology stacks to deliver high-performance, highly available storage with the utmost care for data protection, all the while keeping costs reasonable. With a complete portfolio of storage solutions, we use different sets of software foundations for different storage access modes, including OpenZFS for file storage.
Based on our comprehensive testing, we chose FreeBSD for some of our offers like NAS-HA or Datastore NFS. One of the reasons is that FreeBSD is managed as a complete operating system with OpenZFS being a first-class citizen and natively integrated. It benefits from many years of experience across many teams ensuring quality and security.
FreeBSD’s release management goes through multiple steps from idea inception to public releases:
- Technical reviews by peers,
- Testing in the current branch,
- Testing by a wider user base in the stable branch,
- Release candidate testing,
- Normal release.
At the same time, patches for software are released to fix vulnerabilities and bugs.
The ports collection is well designed and simple. While FreeBSD has binary packages, which are handled by the pkg package manager, it can also compile software from source, allowing users to select the compile-time options they want.
FreeBSD also provides tools like poudriere. Poudriere is a utility for creating and testing FreeBSD packages. It makes it easy for users to build and set up their own binary package repository in which packages are built with their own options.
FreeBSD has over five hundred system variables that can be read and set using the sysctl utility. These system variables are used to apply some changes to a running FreeBSD system. This includes many advanced options of the TCP/IP stack and virtual memory system that can improve performance.
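As a minimal sketch of how these variables can be inspected from a script, the Python snippet below parses the `name: value` lines printed by sysctl into a dictionary. The helper names are hypothetical, the sample values are invented for illustration, and `read_sysctl` assumes a host where the `sysctl` binary is available.

```python
import subprocess


def parse_sysctl_output(text: str) -> dict[str, str]:
    """Parse `name: value` lines, as printed by sysctl, into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that do not look like `name: value`
            values[name.strip()] = value.strip()
    return values


def read_sysctl(name: str) -> str:
    """Read one variable on a live system (assumes `sysctl` is on the PATH)."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Illustrative sample only: the values are invented, not measurements.
SAMPLE = """kern.maxvnodes: 1000000
vfs.numvnodes: 250000
kern.ostype: FreeBSD
"""

if __name__ == "__main__":
    for name, value in parse_sysctl_output(SAMPLE).items():
        print(f"{name} = {value}")
```

The parser is deliberately simple: values that span several lines are not handled, and everything is kept as a string so the caller decides how to convert it.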
Our goal today is not to come up with an exhaustive list of the technical advantages of FreeBSD. That would probably require one or more full blog posts. Keep in mind jails, pf, Linux binary compatibility and so on…
As FreeBSD users, we are convinced that ZFS is a high-performance file system offering replication, compression, encryption, and snapshots. If you want more details, our very own Frédéric Zind said it all during his tech talk at Very Tech Trip 2023: 🇬🇧 / 🇫🇷
To illustrate what we are doing with OpenZFS and FreeBSD, let’s take NAS-HA as an example. This product is a file storage service (an active/passive cluster, illustrated below) built upon two nodes and a ZFS-based filesystem shared over NFS and/or CIFS. NAS-HA is a good example of how we integrated both FreeBSD and OpenZFS to build an open and secure storage service dedicated to versatile workloads (centralized storage for private or public cloud instances, bare metal servers…). With NAS-HA we are talking about several thousand servers holding petabytes of data.
To demonstrate how OVHcloud participates in open-source communities, we are sharing an issue we encountered in 2022 and how we fixed it.
Through our constant monitoring, our operations team observed a condition where some filers could no longer allocate memory because of a lack of free vnodes. Since only a reboot could recover the hung filers, the operations and engineering teams investigated the root cause. They found several leads but unfortunately couldn’t reproduce the incident. To improve the situation, the following was done:
- Double the base memory reservation for the OS (vfs.zfs.arc_max). Since our FreeBSD servers boot into an in-memory OS without swap, we have to make sure that everything has sufficient resources and does not fight ZFS for memory,
- Create a buffer memory zone (vfs.zfs.arc.sys_free),
- Change the speed at which memory is released, which had no effect,
- Disable periodic checks, because the default checks were consuming vnodes (a vnode is a structure representing a filesystem entity such as a file, directory, or device node) and were not needed for our use cases.
Even after applying these changes, some crashes were still happening. Then, we noticed that the incident seemed to be related to a snapshot that was not held during a restoration. A snapshot rotation took place while the snapshot was being used for the restoration, and it caused a deadlock. At that point, we still had no way to reproduce the incident.
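To make the idea of a held snapshot concrete, here is a small Python sketch of a rotation that skips snapshots on which a hold is in place. In ZFS, `zfs hold` places a hold on a snapshot and destroying it fails until the hold is released. This is only an illustration of the concept, not the fix that was eventually applied; the `Snapshot` type, its fields, and the retention policy are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str
    created: int        # creation time as a Unix timestamp
    held: bool = False  # True while something (e.g. a restore) depends on it


def select_for_rotation(snapshots: list[Snapshot], keep: int) -> list[Snapshot]:
    """Return the snapshots a rotation may destroy: everything older than
    the `keep` most recent ones, except snapshots that are currently held."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(snapshots, key=lambda s: s.created, reverse=True)
    candidates = newest_first[keep:]
    return [s for s in candidates if not s.held]


if __name__ == "__main__":
    snaps = [
        Snapshot("pool/share@day1", created=100, held=True),  # restore running
        Snapshot("pool/share@day2", created=200),
        Snapshot("pool/share@day3", created=300),
    ]
    for snap in select_for_rotation(snaps, keep=1):
        print("would destroy", snap.name)
```

In this made-up scenario the oldest snapshot is left alone because a restoration still depends on it, and only the unheld older snapshot is selected.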
What we found:
- Huge spikes in vnode usage,
- Spikes that seemed to be correlated with the rotation and restoration of our backups (available through the .zfs directory on our NFS shares).
What we did:
- Limit interaction with the .zfs directory to our essential operations,
- Manually set the vnode limit to 10 million (via sysctl).
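As a rough sketch of how vnode headroom against such a limit could be watched, the Python helpers below compute the fraction of the limit in use and flag when it crosses a threshold. It assumes the two counters are read from the FreeBSD sysctl variables vfs.numvnodes and kern.maxvnodes; the function names and the 90% threshold are hypothetical and not taken from our monitoring.

```python
def vnode_usage(num_vnodes: int, max_vnodes: int) -> float:
    """Return the fraction of the vnode limit currently in use."""
    if max_vnodes <= 0:
        raise ValueError("max_vnodes must be positive")
    return num_vnodes / max_vnodes


def needs_attention(num_vnodes: int, max_vnodes: int, threshold: float = 0.9) -> bool:
    """True when usage reaches the (hypothetical) alert threshold."""
    return vnode_usage(num_vnodes, max_vnodes) >= threshold


if __name__ == "__main__":
    limit = 10_000_000  # the limit mentioned above
    for in_use in (1_000_000, 9_500_000):  # invented sample readings
        print(in_use, f"{vnode_usage(in_use, limit):.0%}", needs_attention(in_use, limit))
```

A check like this only reports pressure on the limit; it says nothing about why vnodes are being consumed.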
As a result, parallel rsync copies finished successfully. No more problems were observed during our quality assurance period (5 days), so we reached out to the relevant communities.
However, random servers were still experiencing incidents, and the community started to report similar behavior. In parallel, we noticed that if an NFS client browsed the .zfs directory exposed by our servers, all subsequent ZFS operations on snapshots ended up in a deadlock. This happened only on OpenZFS 2.x (FreeBSD 13.x), and only a reboot would unfreeze the server. Problem reports were escalated to OpenZFS (here) and FreeBSD (here). FreeBSD users confirmed our testing with a procedure to reproduce the incident.
Thanks to OVHcloud’s FreeBSD developers, we were quickly pointed to Mark, a FreeBSD developer familiar with ZFS matters, and we let him know what we had done, found, and tested.
Leveraging the community’s feedback, Mark suggested patches and our engineering team tested each one of them. Communication was smooth and the root cause was finally identified: a Linux-specific race condition. Technically speaking, the issue was described the following way: “zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the snapshot directory is mounted. On FreeBSD it fails, making snapshot dirs inaccessible via NFS.” (See commit on OpenZFS’s GitHub) The patch was committed and merged in the FreeBSD and OpenZFS repositories.
Why didn’t we catch this bug during our preprod tests? Because of mathematics. With fewer servers, there is a lower probability of concurrent access to the .zfs directory. This test has since been added to our test book.
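The arithmetic behind this can be sketched in a few lines of Python. Assuming, purely for illustration, that each server independently has the same small probability p of hitting the problematic access pattern during a test window, the chance that at least one server in a fleet of n hits it is 1 - (1 - p)^n. The value of p and the fleet sizes below are made up.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent servers hits an event
    that each server hits with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-server probability, not a measured value
    for n in (10, 100, 10_000):
        print(f"{n:>6} servers: {prob_at_least_one(p, n):.4f}")
```

With these invented numbers, a ten-server preproduction fleet would see the event in roughly 1% of test windows, while a fleet of ten thousand servers would see it almost every time, which is why rare conditions tend to surface in production first.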
As a cloud actor, we are committed to open source, and many people at OVHcloud spend an incredible amount of time reproducing bugs, reading code, testing, and ultimately submitting patches. Some of us even develop open-source projects, at work (CDS and bastion are shining examples) and/or during our free time.
At the Storage Product Unit, we are first and foremost geeks who love FreeBSD and OpenZFS. And, well, managing more than 10k servers daily is quite the experience.
Most importantly, we want to give back to open-source communities such as the Linux Foundation through the Ceph Foundation, the Cloud Native Computing Foundation, the Python community, the PostgreSQL community, FreeBSD, OpenZFS and more. If you want to learn more, you can check our open innovation program as well as the OVHcloud GitHub repo.