Criteo

OSFC, a unique conference to support open-source firmware initiatives

2024-9-20

Criteo R&D

The 7th edition of the Open Source Firmware Conference took place in Bochum (Germany), gathering 160 engineers for 3 days.

Authors: Vincent Minet, Anisse Astier & Erwan Velu

Open source is a major lever for improving the auditability, security, and debuggability of firmware running on servers operated by infrastructure owners. It’s no surprise to see major companies like Google, Meta, and AWS, and hardware vendors like AMD and HPE, supporting this conference.

Criteo shares these actors’ view on the importance of opening the source code and sharing the associated procedures (build, deploy, validate). That philosophy is part of our hardware manifesto, released one year ago, which sets out our key principles for operating bare-metal servers efficiently.

Naturally, three members of Criteo’s hardware team attended this conference to share their vision on these topics and meet members of this large community.

This article highlights the presentations that resonated most with us.

The following pictures are courtesy of Anisse Astier(c).

AMD’s Long-Term Strategy for Open Source Firmware: From Concept to Implementation.

Paul Grimes (Principal Member of Technical Staff — Firmware Engineer at AMD)

Paul Grimes presented a detailed roadmap of the transition from the current AGESA to openSIL. AGESA, the early-stage silicon initialization code, has always been closed source by design. In 2023, AMD released the first version of openSIL, an open-source silicon initialization C library meant to be included in host firmware like coreboot.

It started with a proof of concept (PoC) on Genoa (AMD Zen 4 EPYC), with only a few AGESA components, minimal testing, and support for booting only Linux and Windows. Now, with Turin (AMD’s next-generation CPU based on the Zen 5 architecture), openSIL is much closer to feature parity with AGESA-based firmware. It should be released a quarter after the Turin launch, by the end of 2024.

openSIL for 6th Gen EPYC is already being developed with partners under NDA and should likewise be released as open source a quarter after launch at the latest, although it is not yet production-ready.

According to Paul, the goal of delaying the release until after launch is to keep a competitive advantage.

The Genoa PoC was a success. It was initially meant to stay private but was released after discussions with the community. It is available on GitHub at openSIL/openSIL.

On Genoa, openSIL is not yet considered production-worthy; it is available for evaluation only. Paul insisted on this specific point: “PoC projects are not reflective of long-term plans for release, maintenance, community engagement, etc.”

AMD openSIL — Development Stages

AMD is also working on open-source edk2 (UEFI firmware) support for Turin. Paul and his colleagues are now maintainers of the AMD parts of the TianoCore project. By the end of the year, production-quality edk2 code should be upstreamed for Turin and Genoa (SP5 socket).

Paul continued by detailing the long-term plan to transition from AGESA to openSIL. Rather than replacing AGESA with openSIL all at once, the plan is to progressively replace AGESA components with openSIL until the replacement is complete. Paul explained that this process will take time but that AMD is committed to it: openSIL will keep growing and absorb the remaining AMD IP.

For 6th Gen AMD EPYC and Ryzen, AGESA v10 will still be needed, but it will reach its end of life. In future generations, all UEFI-based AMD code will move into amd-edk2* and be upstreamed to TianoCore/edk2*.

The last part of the talk confirmed AMD’s open-source strategy.

AMD wants to finalize its community engagement model and encourage contributions from the community and customers. Paul confirmed that AMD will strengthen its open-source presence by contributing to TianoCore, and concluded that coreboot support is only a matter of time.

AMD is transitioning from a closed-source model to an open one, which is pretty impressive considering the legal and technical difficulties involved. It’s also important to note that AMD is embracing the open-source model the right way: it takes part in the edk2 project (maintaining the AMD tree), releases the full commit history of its projects, and is open to contributions. AMD understands that building a thriving open-source community takes more work than a simple code dump.

During this talk, we publicly stated that this is the model Criteo expects from its suppliers and that we are eager to test it on our platforms. This is a great move that sets a high bar for the industry on what doing open-source firmware means. A clear and committed open-source strategy for its products puts AMD in a unique position when infrastructure owners consider bringing new products into their infrastructure.

What CSP Servers Need from Open Source Firmware Solutions.

Jiming Sun (Sr. Software Manager at Amazon Web Services)

After introducing his career path, Jiming explained that, as a Cloud Service Provider (CSP), AWS has been pushing vendors toward firmware modularization.

His motto for hardware vendors was “Stay in your sandbox, please”. For example, silicon initialization code should only be modified in the silicon initialization blob, not in the BIOS itself. Hardware vendors should only modify their own UEFI DXE drivers and leave other vendors’ code untouched.

“Stay In Your Own Sandbox, Please”

Jiming stated that hardware vendors should refrain from building complex software suites and integrations that users later have to dismantle.

“Why hasn’t AWS adopted coreboot yet?” Jiming asked. First, AWS needs to move fast and is customer-obsessed, which limits the amount of risk it can take. Since AWS depends on silicon vendors, it mostly relies on edk2 as the silicon enablement platform. Second, coreboot is generally one or two generations behind on mainstream platform support, and it lacks features AWS needs: RAS, PCI segments (now upstream), long mode (still a work in progress), and CSP-specific boot device selection.

According to Jiming, the payload and LinuxBoot concepts are great but could also be improved.

Even so, Jiming is hopeful, since CSP servers are becoming closer to embedded systems than to “traditional servers”. He explained that these servers have a fixed configuration, so none of the existing complex hot-plug features are useful in production.

Jiming mentioned that coreboot’s smaller footprint is seen as an advantage for analysis and attack surface reduction, as well as for the debugging experience (the code is static and small) and fast build times.

Jiming asked silicon vendors to encapsulate all silicon init code (and nothing more) in their blobs (Intel FSP, AMD AGESA, and openSIL) and to properly test them with both coreboot and edk2. All generations should also be supported in the same branch to ease integration.

Silicon vendors should not hide locked-down firmware but give it out to sell more chips, Jiming said, to much applause in the room.

This talk resonated strongly with us. We also feel that hardware vendors tend to invent solutions to problems they don’t understand the way (hyper)scalers do. Their solutions usually don’t scale well, require user interaction, and offer old-fashioned interfaces (GUIs, unparseable CLIs, etc.). To manage and orchestrate large infrastructure, all we need is a library or an API so we can build our own tools and integrate them into our existing stacks. Because such tools are not developed with an SRE approach, we end up in the situation Jiming described so well: we spend more time dismantling or working around vendor tooling than building robust building blocks that scale to tens or hundreds of thousands of machines.

Building flashless firmware infrastructure

Jean-Marie Verdun is known for pushing open hardware at HPE. In 2022, he presented HPE’s open-source firmware strategy. This talk can be seen as a follow-up, presenting how servers can be booted using a flashless approach.

Jean-Marie started with a bit of history, reminding us that firmware is still static and built on aging concepts. UEFI is 25 years old, and even though BMCs (Baseboard Management Controllers, used to remotely control a server) have improved thanks to OpenBMC, they remain monolithic.

He wondered whether we could treat firmware like a regular OS instead. ROM has limited space, and relying on it does not necessarily make sense. This led to a new approach to the classic BMC boot flow, in which the BMC boots from the network instead of from its own flash.

To explore this direction, Jean-Marie and his team built a proof of concept in a rack, initially with a file server hosting the BMC images. HPE BMCs run on HPE’s in-house SoC, named “GXP”, and have been able to run OpenBMC in production since Gen11.

In his approach, the BMC netboots over Ethernet, then connects to an iSCSI target to get a block device, which finally provides a writable root filesystem.

The BMC Linux OS has a package manager (DNF and RPM) and can be treated as a regular operating system. This allows more flexibility in managing upgrades and configuration changes.

Jean-Marie said the network switch image build scripts will be published soon™, as he got the go-ahead from HPE.

The GXP development board demonstrated at OSFC’24

This presentation was very interesting, as it shows how hardware vendors that implement open firmware on their machines can enable new use cases and innovation. Netbooting the BMC is a great improvement in terms of security and velocity: since there is no need to flash a slow storage device, delivering regular updates in minutes becomes a reality. We are very fond of this new open-source approach to the BMC. What’s more, we participated in the initial bootstrapping of the OpenBMC PoC on ProLiant servers and contributed the trick used to perform the first BMC netboot on this hardware. This is a cooperative model we like and support: infrastructure owners must take part in defining these technologies so we can share our vision and expertise and get the most out of our servers.

Prem’Day: feedback on the first infra on-prem conference and creation of a user group.

Erwan Velu (Hardware Engineer at Criteo)

Prem’Day is a one-day conference organized by infrastructure users and sponsored by hardware vendors; its first edition took place in May 2024. The idea is to share infrastructure stories from an engineering perspective, i.e. “what it means to operate on-premise servers on a daily basis”.

Criteo, Qarnot, Moji, Scaleway, and i3d.net covered various topics such as automatic server validation, reusing heat in an HPC context, how to measure performance, immersion cooling performance and caveats, and feedback on operating a large and diverse on-prem server fleet. The day ended with a round table on the evolution of server densification over the next two years. You can watch all these talks here.

Although these companies run very different businesses, one major topic emerged when discussing operations: we all need the firmware ecosystem to improve. Since this talk in 2022, the situation hasn’t evolved much, and infrastructure owners remain isolated, which limits their impact on the firmware ecosystem.

In addition to the conference, a user group is being created and will serve as a platform with three main objectives:

  • easing communication between infrastructure owners
  • providing a single point of contact for hardware vendors
  • offering an outreach space for open-source communities

If this topic concerns you, reach out to the project in the #premday channel on the OSFW Slack.

If you have any interest in firmware, this conference has to be on your schedule; it certainly is on ours. It’s a great place to meet the ecosystem, including developers and vendors. This article covers only a small subset of the great talks given there: we picked the few that best matched our vision of what an open infrastructure should be. All videos and slides should soon be available on the conference website.

Hats off to the organizers for making this 7th edition such a great one: we learned plenty of new things and met great people. It’s definitely on our schedule for 2025.


OSFC, a unique conference to support open-source firmware initiatives was originally published in Criteo Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.