OVHcloud: Microcode management at scale

Microphotograph of the Motorola 68000 microprocessor die. Image courtesy of Wikimedia Commons, licensed under CC BY 3.0.
No changes were made.

What is microcode?

In 1965, Gordon Moore predicted that the number of transistors in an integrated circuit would keep doubling at a regular pace, a rate he later revised to every two years; this is now known as Moore’s Law. This exponential growth has driven the design of smaller, faster, and more efficient transistors, pushing CPUs to unprecedented levels of complexity. As a result, CPUs needed a more flexible way of converting machine code into circuit-level operations. Although MIT pioneered software-controlled circuitry in the 1950s, the widespread use of microprogramming in CPUs was delayed for decades by the limited capacity and high cost of storage.

Microcode is an abstraction layer that sits between the CPU’s hardware and machine code (the binary representation of a compiled program). It translates machine instructions (the basic steps a computer performs), state machine data, or other inputs into sequences of detailed circuit-level operations. By separating machine instructions from the control signals (i.e., electrical impulses), it gives designers greater flexibility in creating and modifying instructions. Unlike hardwired instruction decoding, which requires hardware changes to fix bugs, microcode can be patched through software updates, which is simpler and more efficient.
Microcode is written exclusively by the hardware manufacturer and is closely tied to the particular hardware it runs on. This means microcode is typically proprietary software (Intel x86, AMD x86), with the exception of a few open-source hardware CPUs.

What is it used for?

  • Instruction decoding
  • CPU bug fixes
  • Exception handling
  • Power management
  • Complex CPU features

Benefits and drawbacks

Benefits

  • ✅ Easy design, machine code scalability
  • ✅ CPU manufacturing optimisation: identical hardware across products, with different features (features are enabled or disabled via microcode)
  • ✅ Easy debugging and testing during production phases
  • ✅ Microcode updates can be used to correct hardware design flaws, thus reducing post-production and distribution costs

Drawbacks

  • ❌ New layer: new attack vector and higher risk of microcode bugs
  • ❌ Higher instruction latency (a microcoded instruction needs more clock cycles than a hardwired equivalent)
  • ❌ With very few exceptions, microcode is obscure and lacks documentation

Microcode architecture

Instructions specific to a CPU architecture (or macro-instructions) are complex and must be broken down into sequences of simple instructions, called micro-operations (micro-ops).

  • Decoder: translates CPU architecture-specific instructions into micro-ops
    • Short Decoder + operation packing: widely used hardware-based decoder; several short decoders can be combined
    • Long Decoder: hardware-based decoder that operates similarly, but on a more intricate set of instructions
    • Vector Decoder (microcode engine): software-based decoder for rare and very complex instructions
  • Scheduler
    • Micro-ops are reordered and then fed into the pipeline
  • Processing
    • Micro-ops are processed by the corresponding execution unit

Microcode update integration

Microcode uses two types of storage:

  • Microcode ROM: for storing the program
  • Microcode RAM: for storing the microcode update

Match registers provide breakpoints in the microcode ROM: when execution reaches a matched address, it is rerouted to the microcode RAM containing the update.

Since microcode patches are stored in low-latency, volatile on-chip RAM, they are limited in size and not persistent. Each time the system boots, the update must be applied again, which requires the loader to:

  • be in kernel mode (or supervisor mode)
  • load the microcode update into RAM
  • write the update’s virtual address to the dedicated model-specific register (MSR)

By design, microcode patches add extra condition checks, which can slow the CPU down. Additionally, applying a patch isn’t always successful, and in some cases a patch provides only a partial fix.
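
The rerouting described above can be pictured with a toy model. The following Python sketch is purely illustrative: the class, the addresses, the micro-op names, and the RAM capacity are all made up, and real match registers and patch RAM are implemented in hardware with vendor-specific, largely undocumented details.

```python
from dataclasses import dataclass, field


@dataclass
class MicrocodeEngine:
    """Toy model: a fixed ROM, a small volatile patch RAM, and match registers."""
    rom: dict[int, str]                       # ROM address -> micro-op (fixed at manufacture)
    ram_capacity: int = 4                     # hypothetical limit: patch RAM is small
    patch_ram: dict[int, str] = field(default_factory=dict)
    match_registers: dict[int, int] = field(default_factory=dict)  # ROM address -> RAM address

    def load_update(self, patches: dict[int, str]) -> None:
        """Copy each replacement micro-op into RAM and arm a match register for it."""
        for rom_addr, micro_op in patches.items():
            if len(self.patch_ram) >= self.ram_capacity:
                raise MemoryError("patch RAM is full")
            ram_addr = len(self.patch_ram)
            self.patch_ram[ram_addr] = micro_op
            self.match_registers[rom_addr] = ram_addr

    def fetch(self, rom_addr: int) -> str:
        """Return the micro-op for an address, rerouting to RAM when a register matches."""
        if rom_addr in self.match_registers:
            return self.patch_ram[self.match_registers[rom_addr]]
        return self.rom[rom_addr]

    def reset(self) -> None:
        """Simulate a reboot: volatile patch state is lost."""
        self.patch_ram.clear()
        self.match_registers.clear()


engine = MicrocodeEngine(rom={0x10: "add", 0x11: "buggy_div", 0x12: "store"})
engine.load_update({0x11: "fixed_div"})
print(engine.fetch(0x11))  # served from patch RAM
engine.reset()
print(engine.fetch(0x11))  # back to the ROM micro-op after the simulated reboot
```

The extra lookup in fetch mirrors the added condition check, and reset shows why the update has to be reapplied at every boot.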

Microcode update format

CPU manufacturers distribute microcode updates as microcode bundles. A microcode bundle is a concatenation of individual microcode files; each individual file is a single patch for a unique CPUID (the signature of a CPU hardware model within a product). Practically speaking, a microcode bundle contains all the microcode updates for every CPUID from a specific CPU manufacturer, as of a particular date. Each microcode file includes a header (see the Intel structure table below) and a payload containing the patch.

This payload consists of match registers (breakpoints) and triads (containing the microcode patch).

Triggers represent conditions under which control is transferred from microcode ROM to patch RAM. 

Each triad contains:

  • Three micro-ops: the microcode instructions to execute
  • Sequence word: for redirecting control flow
Byte offset   Bits 0-31             Bits 32-63
0             Header type           Update revision
8             Update release date   CPUID
16            Checksum              Loader version
24            Platform ID           Data size
32            Total size            Metadata size
40            Minimum version       Reserved

Bit-field diagram: Intel individual microcode update file
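
As a rough illustration of this layout, the sketch below unpacks the 48-byte header with Python’s struct module and walks a bundle by hopping from one update to the next using the total size field. It assumes little-endian 32-bit fields in the order shown in the diagram, a BCD-encoded date (0xMMDDYYYY), and the Intel convention that a data size of 0 denotes a legacy 2,048-byte update. The helper names and the sample CPUIDs and revisions are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

HEADER = struct.Struct("<12I")  # twelve little-endian 32-bit fields = 48 bytes
LEGACY_TOTAL_SIZE = 2048        # assumed total size when the data size field is 0


class MicrocodeHeader(NamedTuple):
    header_type: int
    update_revision: int
    release_date: int   # BCD-encoded as 0xMMDDYYYY
    cpuid: int
    checksum: int
    loader_version: int
    platform_id: int
    data_size: int
    total_size: int
    metadata_size: int
    minimum_version: int
    reserved: int

    @property
    def release_date_iso(self) -> str:
        """Render the BCD date as YYYY-MM-DD."""
        d = self.release_date
        return f"{d & 0xFFFF:04x}-{d >> 24:02x}-{(d >> 16) & 0xFF:02x}"


def iter_bundle(bundle: bytes) -> Iterator[tuple[MicrocodeHeader, bytes]]:
    """Yield (header, raw update bytes) for each file concatenated in a bundle."""
    offset = 0
    while offset + HEADER.size <= len(bundle):
        header = MicrocodeHeader(*HEADER.unpack_from(bundle, offset))
        size = header.total_size if header.data_size else LEGACY_TOTAL_SIZE
        if size < HEADER.size or offset + size > len(bundle):
            raise ValueError(f"malformed update at offset {offset}")
        yield header, bundle[offset:offset + size]
        offset += size


def make_update(cpuid: int, revision: int, payload: bytes = b"\0" * 16) -> bytes:
    """Build a synthetic update for demonstration only (checksum left at 0)."""
    total = HEADER.size + len(payload)
    header = HEADER.pack(1, revision, 0x03152023, cpuid, 0, 1, 0,
                         len(payload), total, 0, 0, 0)
    return header + payload


# Made-up CPUIDs and revisions, concatenated the way a bundle would be.
bundle = make_update(0x00012345, 0x2A) + make_update(0x00067890, 0x07)
for header, raw in iter_bundle(bundle):
    print(f"{header.cpuid:#010x} rev {header.update_revision:#x} "
          f"{header.release_date_iso} ({len(raw)} bytes)")
```

The sketch does not verify the checksum or interpret the payload, which is encrypted and vendor-specific.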

Microcode update methods

Microcode updates can be applied at one of the following layers:

  • Firmware layer
  • Kernel layer
  • Operating system layer

Regardless of the method used, a given microcode update has the same impact on CPU performance.

Application
Operating system (APIs, system calls, file management)
Kernel (hardware abstraction, drivers)
Firmware (BIOS/UEFI)
Hardware (CPU)

Microcode update methods comparison

There are many ways to update microcode. Here’s an extensive list of update options:

  • Microcode update available via regular BIOS/UEFI updates
    • Examples: OpenBMC, HP BIOS Update Utility, ASRock Instant Flash, ASUS EZ Flash, Supermicro Update Manager
    • Layer: Firmware
    • Source: Motherboard manufacturer
    • ✅ Cross-OS compatibility
    • ❌ Update process varies depending on BIOS/UEFI vendors and versions
    • ❌ Difficult to automate
    • ❌ Microcode delivered through firmware updates has a much slower release cycle than OS updates, causing delays in delivery
    • ❌ Risk of a failed update damaging the motherboard
    • ❌ Requires a system reboot
  • Microcode updates can be integrated directly into a custom-built kernel
    • Examples: Linux kernel, TinyOS, Minix
    • Layer: Kernel
    • Source: CPU manufacturer
    • ✅ Automation is feasible across different hardware and OSs¹
    • ✅ Highly customisable
    • ✅ Shortens the time between release and update
    • ❌ Requires a high level of technical expertise
    • ❌ Requires a significant investment (time/money)
    • ❌ Requires a system reboot
  • Microcode update available through regular OS updates
    • Examples: package manager, like APT on Debian-based Linux OSs (or ‘early loading’); Windows Update
    • Layer: Operating system
    • Source: OS vendor
    • ✅ Easy
    • ✅ Safe, fewer chances of unpredictable behaviour
    • ❌ Update process is OS-specific and limited to the OS in use
    • ❌ Delayed availability of new microcode packages
    • ❌ Requires a system reboot
  • Manual microcode loading
    • Examples: iucode-tool command on Linux (or ‘late loading’); chain command on iPXE
    • Layer: Operating system
    • Source: Manual download from CPU manufacturer
    • ✅ Reduces update delays after public release announcements (Intel “public disclosure”)
    • ✅ No reboot required²
    • ✅ Automation-ready
    • ❌ Risk of unexpected errors for certain patches when the CPU is running
    • ❌ Requires advanced expertise, prone to human error (misapplied microcode, inconsistent download)
    • ❌ OS-specific update process, not persistent across reboots

____

¹ Assuming the custom kernel is compatible with hardware

² Assuming the running OS is the one applying the microcode update

Microcode update challenges at OVHcloud

While CPU manufacturers strongly recommend installing the latest microcode updates, no contractual obligations compel end customers to install the most recent microcode versions. However, it might be required by internal policy and/or industry certifications and standards.

Microcode updates are usually distributed through system firmware (BIOS/UEFI) or OS patches, in partnership with hardware manufacturers and software vendors.

As a server manufacturer, OVHcloud enables its customers to access these microcode updates.

Automating updates across a diverse range of BIOS/UEFI vendors and versions is a challenge for the company, given its broad hardware range, strategic factors, and geographically diverse supply chain. As a result, OVHcloud doesn’t update microcode at the firmware layer.

Moreover, OVHcloud maintenance is limited to the hardware and firmware layers, and doesn’t include accessing the operating system on the customer’s bare-metal server. Updating the microcode through an OS-dependent method is therefore ruled out.

OVHcloud boot process

As previously mentioned, the wide range of motherboards makes it impossible to automate the boot mode switch from BIOS/UEFI. This is why OVHcloud uses iPXE, a temporary, minimalist, in-memory environment, for the boot process. One major advantage of this open-source boot firmware is its powerful scripting, which extends the capabilities of the traditional Preboot Execution Environment (PXE) without the need for BIOS/UEFI reflashing. The boot process unfolds in the following steps:

Depending on whether the customer chooses to boot from disk, a rescue environment, or a custom iPXE script, the corresponding iPXE script will run.

If a custom iPXE script isn’t chosen, the system downloads the latest validated AMD or Intel (CPU-specific) microcode bundle and its signature from OVHcloud. Once the signature is validated, the microcode patches are applied to all of the CPU’s cores. The system then boots from a local disk or a remote image.

If the customer decides to boot using a custom iPXE script, the API generates the iPXE script as per the customer’s specifications, with no changes made. This means that customers who boot with a custom iPXE script won’t receive microcode updates unless their script loads them explicitly.
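
The branching above can be summarised in a short sketch. It is a hypothetical model, not OVHcloud’s implementation: the step names are invented, a SHA-256 digest comparison stands in for the real signature check (whose scheme isn’t described here), mapping the rescue environment to a remote image is a guess, and aborting on a failed check is an assumption.

```python
import hashlib
from enum import Enum


class BootMode(Enum):
    DISK = "disk"
    RESCUE = "rescue"
    CUSTOM_IPXE = "custom_ipxe"


def verify_bundle(bundle: bytes, expected_sha256: str) -> bool:
    """Stand-in integrity check; a real deployment would verify a signature."""
    return hashlib.sha256(bundle).hexdigest() == expected_sha256


def boot_plan(mode: BootMode, bundle: bytes, expected_sha256: str) -> list[str]:
    """Return the ordered boot steps for the chosen boot mode."""
    if mode is BootMode.CUSTOM_IPXE:
        # The customer's script runs unmodified, so no microcode is loaded.
        return ["run customer iPXE script"]

    steps = ["download microcode bundle and signature"]
    if not verify_bundle(bundle, expected_sha256):
        # Assumption: an unverified bundle is never applied.
        raise ValueError("microcode bundle failed validation")
    steps.append("apply microcode patches to all CPU cores")
    # Assumption: rescue boots a remote image, otherwise the local disk is used.
    steps.append("boot remote image" if mode is BootMode.RESCUE
                 else "boot from local disk")
    return steps


bundle = b"example microcode bundle"  # placeholder bytes, not real microcode
digest = hashlib.sha256(bundle).hexdigest()
for mode in BootMode:
    print(mode.value, "->", boot_plan(mode, bundle, digest))
```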

OVHcloud microcode validation

Whenever a CPU manufacturer releases new microcode, OVHcloud repackages the microcode bundles (separate AMD and Intel versions). Repackaging lets the cloud provider skip microcode patches for CPUs that are not in its infrastructure, as well as patches that have caused problems on some of its CPUs. OVHcloud also signs the bundles to ensure data integrity. Once the microcode bundle is published to the file server, automated testing begins on dedicated servers, one for each affected CPU and platform.
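
The filtering part of repackaging might look like the following sketch. It assumes an Intel-style bundle with a 48-byte little-endian header (update revision at byte offset 4, CPUID at 12, data size at 28, total size at 32); the function name, the fleet set, and the block list of (CPUID, revision) pairs are hypothetical, and the optional extended signature table is ignored for brevity.

```python
import struct

HEADER_SIZE = 48
LEGACY_TOTAL_SIZE = 2048  # assumed total size when the data size field is 0


def repackage(bundle: bytes, fleet_cpuids: set[int],
              blocked: set[tuple[int, int]]) -> bytes:
    """Keep only the updates for CPUs in the fleet that are not block-listed."""
    kept = []
    offset = 0
    while offset + HEADER_SIZE <= len(bundle):
        (revision,) = struct.unpack_from("<I", bundle, offset + 4)
        (cpuid,) = struct.unpack_from("<I", bundle, offset + 12)
        data_size, total_size = struct.unpack_from("<2I", bundle, offset + 28)
        size = total_size if data_size else LEGACY_TOTAL_SIZE
        if size < HEADER_SIZE or offset + size > len(bundle):
            raise ValueError(f"malformed update at offset {offset}")
        if cpuid in fleet_cpuids and (cpuid, revision) not in blocked:
            kept.append(bundle[offset:offset + size])
        offset += size
    return b"".join(kept)


def fake_update(cpuid: int, revision: int) -> bytes:
    """Synthetic 64-byte update (48-byte header + 16-byte payload) for the demo."""
    header = struct.pack("<12I", 1, revision, 0, cpuid, 0, 1, 0, 16, 64, 0, 0, 0)
    return header + b"\0" * 16


# Made-up values: CPUID 0x3 is not in the fleet, and (0x2, rev 0x7) is block-listed.
upstream = fake_update(0x1, 0x5) + fake_update(0x2, 0x7) + fake_update(0x3, 0x9)
trimmed = repackage(upstream, fleet_cpuids={0x1, 0x2}, blocked={(0x2, 0x7)})
print(len(upstream), "->", len(trimmed), "bytes")
```

The trimmed bundle would then be signed and published for the automated tests.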

The following actions are carried out on each testing server:

  • server reboot from rescue with the microcode patch(es) to validate
  • CPU microcode version check to validate iPXE script from the rescue boot
  • disk(s) erasure to remove artefacts from previous OS re-installs and install a minimalist Linux distribution
  • server reboot from disk and proper boot check
  • CPU microcode version check to validate microcode update for boot from disk
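
The two version checks in the list above could be automated along these lines. This sketch assumes an x86 Linux host, where each logical CPU’s entry in /proc/cpuinfo carries a "microcode" line with a hexadecimal revision; the function names and the sample text are illustrative, not taken from OVHcloud’s tooling.

```python
import re

MICROCODE_LINE = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def microcode_revisions(cpuinfo_text: str) -> set[int]:
    """Return the distinct microcode revisions reported by all logical CPUs."""
    return {int(rev, 16) for rev in MICROCODE_LINE.findall(cpuinfo_text)}


def update_applied(cpuinfo_text: str, expected_revision: int) -> bool:
    """True only if every logical CPU reports exactly the expected revision."""
    return microcode_revisions(cpuinfo_text) == {expected_revision}


# On a real host the text would come from reading /proc/cpuinfo;
# this trimmed sample with a made-up revision keeps the example portable.
sample = (
    "processor\t: 0\n"
    "microcode\t: 0xf4\n"
    "processor\t: 1\n"
    "microcode\t: 0xf4\n"
)
print(update_applied(sample, 0xF4))
```

Comparing the whole set of revisions, rather than only the first CPU, catches the case where the patch was applied to some cores but not all.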

Once validated, the URL of the microcode bundle is updated in all the rescue and iPXE boot scripts; this change is made manually. Depending on the severity, a targeted email can be sent to the affected customers. The new microcode patch only takes effect after the next hardware reboot of the server³. OVHcloud never reboots a dedicated server without the customer’s consent.

____

³ A (hard) reboot from the OVHcloud Control Panel or API is needed. A soft reboot (the reboot command from the OS) doesn’t apply the microcode patch.

Jean-Baptiste Delon
Joining OVHcloud in 2020, he serves as a Full Stack Developer within the baremetal-system team. He has made significant contributions to enhancing the OS installation process for baremetal servers, ensuring seamless integration with various interfaces, including OVHcloud / So you Start / Kimsufi Control Panels and API. He specializes in Linux and open-source projects.