Reproducible Builds Summit, 5th edition

For several years now, the Reproducible Builds Summit has been this pleasant and fruitful retreat where we Guix hackers like to go and share, brainstorm, and hack with people from free software projects and companies who share this interest in reproducible builds and related issues. This year, several of us had the chance to be in Marrakesh for the fifth Reproducible Builds Summit, which was attended by about thirty people.


This blog post summarizes takeaways from the different sessions we attended, and introduces some of the cool hacks that came to life on the rooftop of the lovely riad that was home to the summit.

Java

Java is a notoriously difficult topic, as far as bootstrapping and reproducibility go. For instance, Gradle is now the most common tool for building Java code, and in particular Android apps. However, the current way of building Gradle is to use Gradle and a build script written in Kotlin. The Kotlin project, in turn, is also built using Gradle and a build script written in Kotlin. So we end up with the most cyclic graph one can imagine with two packages: a circle between the two, and two additional loops from the packages to themselves. However, the Kotlin dependency of Gradle was introduced less than two years ago, so there is some hope we can disentangle the bootstrapping mess...

Andreas took part in a session on bootstrapping the Android toolchain, with a very vague hope of getting more than adb, fastboot and a few more utilities into Guix. The task looks daunting, since the source code is spread over a large number of Git repositories with gigabytes of data, and the idea of modular builds apparently has not influenced the design decisions. But all is not lost: Sylvain from Android Rebuilds has done a lot of work to disentangle the sources, and we could also look to the Replicant project for inspiration. Interestingly, the Android NDK, which provides a foreign function interface to C libraries, appears to be an easier target.

Another working group, in which none of us took part, revolved around Maven; Hans wrote a short summary of the outcome.

Some discussions also revolved around F-Droid, the free app store for Android, and the topic of building the apps reproducibly and adding relevant information to the competition.

Verifying and sharing build results

Speaking of which, the website retracing reproducibility feats and issues was also the subject of a cross-distribution discussion round between Debian, Arch, Nix, Guix and OpenWRT. Currently the page is tightly connected to a continuous integration instance rebuilding distributions such as Debian and Arch. We have discussed a file format (probably based on JSON) that would help to separate the process of creating the reproducibility information from collecting, evaluating and displaying it. From a Guix point of view, the idea would be to have the website communicate with an instance of the Guix Data Service.
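
The file format was still under discussion at the summit, so what follows is only a hypothetical sketch of what a JSON-based exchange could look like. All field names (`distribution`, `package`, `version`, `status`) and the sample records are assumptions made for illustration, not an agreed-upon schema. The point is the separation of roles: a producer (a build farm) only serializes results, and a consumer (a website) only reads and aggregates them.

```python
import json
from collections import Counter

# Hypothetical records, as a distribution's build farm might export them.
# Field names and values are illustrative assumptions, not an agreed schema.
records = [
    {"distribution": "guix", "package": "hello", "version": "2.10", "status": "reproducible"},
    {"distribution": "guix", "package": "mes", "version": "0.21", "status": "unreproducible"},
    {"distribution": "debian", "package": "hello", "version": "2.10", "status": "reproducible"},
]


def export_records(records):
    """Producer side: serialize build results to a JSON document."""
    return json.dumps(records, indent=2, sort_keys=True)


def summarize(document):
    """Consumer side: count statuses per distribution from the JSON document."""
    summary = {}
    for record in json.loads(document):
        counts = summary.setdefault(record["distribution"], Counter())
        counts[record["status"]] += 1
    return summary


document = export_records(records)
summary = summarize(document)
for distribution, counts in sorted(summary.items()):
    print(distribution, dict(counts))
```

With such a split, the website would not need to know how each distribution rebuilds its packages; it would only need to understand the shared document.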

Additionally, Bernhard started a discussion about a possible new site that would make it easy to see whether a given package builds reproducibly in different distributions; this is mentioned in this post about the summit. Such a site would probably also consume data about the reproducibility of Guix packages from the Guix Data Service.

Guix Data Service

This nifty project can serve to collect data from a number of independent Guix build farms (of which we currently have two: the farm behind ci.guix.gnu.org, and the farmlet of one or two machines behind bayfront.guix.gnu.org). Meeting in person was the occasion to update the bayfront configuration to mimic more closely that of ci; in particular, the build farm results are now exported to the web frontend.

We had quite some discussion (so far without conclusion) about the exact boundaries between Cuirass and the Guix Data Service: should the former only be a thin layer on top of the Guix daemon with the latter processing all the data towards a web frontend, or should Cuirass continue to handle its own web page?

While the Guix Data Service is not currently running at data.guix.gnu.org as the server is down for maintenance, lots of progress was made with the code. Information about normalized archives (nars), such as package binaries, that are provided by substitute servers can now be imported and stored in the database, and the ability to fetch and store builds from Cuirass has been improved. This is building towards being able to automatically and continuously track the reproducibility of Guix packages.
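
To illustrate how stored nar information could feed reproducibility tracking, here is a minimal sketch in Python. It is not the Guix Data Service's actual code or data model: the record layout, the status labels, the store file names, and the hashes are all made up for the example. The idea is simply that an output is a reproducibility candidate when independent substitute servers report the same nar hash for it.

```python
from collections import defaultdict

# Hypothetical (store output, substitute server, nar hash) records.
# Store file names and hashes are placeholders, not real values.
nar_records = [
    ("/gnu/store/aaaa-hello-2.10", "ci.guix.gnu.org", "hash-1"),
    ("/gnu/store/aaaa-hello-2.10", "bayfront.guix.gnu.org", "hash-1"),
    ("/gnu/store/bbbb-mes-0.21", "ci.guix.gnu.org", "hash-2"),
    ("/gnu/store/bbbb-mes-0.21", "bayfront.guix.gnu.org", "hash-3"),
    ("/gnu/store/cccc-gash-0.1", "ci.guix.gnu.org", "hash-4"),
]


def classify_outputs(nar_records):
    """Label each output by comparing the nar hashes its servers report."""
    hashes_by_output = defaultdict(dict)
    for output, server, nar_hash in nar_records:
        hashes_by_output[output][server] = nar_hash

    statuses = {}
    for output, by_server in hashes_by_output.items():
        if len(by_server) < 2:
            # A single build cannot tell us anything about reproducibility.
            statuses[output] = "unknown"
        elif len(set(by_server.values())) == 1:
            statuses[output] = "matching"
        else:
            statuses[output] = "not-matching"
    return statuses


statuses = classify_outputs(nar_records)
for output, status in sorted(statuses.items()):
    print(output, status)
```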

Bootstrapping

This year the summit had an official extended format, encouraging participants to attend for a full week by adding coding time around the usual three more structured core days, which were facilitated in a lovely, productive and high-energy fashion by Gunner and Evelyn of Aspiration Tech.

Even before the core days started, David had packaged GNU Mes for Nix with the aim of creating a Reduced Binary Seed bootstrap for NixOS. As Vagrant managed to get Mes into Debian unstable before the summit, he suggested that we should do something with it. We decided to attempt a cross-distribution Diverse Double Compilation of Mes. Initially, David (Nix), Vagrant (Debian) and janneke (GNU Guix) took up the challenge, soon to be joined by Jelle (Arch). David was the first to do a diffoscope comparison, and found that Mes v0.21 actually embeds a store file name. Always nice to see Reproducible meet Bootstrappable ;-) Upstream was easily convinced to write a patch. More news on this real soon!

Ludovic and janneke took the opportunity to take the Guix Scheme-only bootstrap a couple of steps further. In a joint effort the last functional bug was fixed and Ludovic came up with a way to avoid actually adding Gash and Gash Core Utils to the bootstrap binary seeds. The idea of bootstrapping from the current %bootstrap-mes (v0.19) instead of updating to v0.21 presented itself and was implemented by janneke right after the summit.

Andreas was wondering about the use of GCC 2.95.3 in the Guix bootstrap and then worked to create a patch to compile GMP, MPFR, and MPC using TinyCC. That work is helping the effort to remove the intermediate GCC 2.95.3 from the Guix bootstrap and instead target GCC 4.6.4 directly.

All in all a very productive and especially inspiring summit for bootstrapping with more people and projects on board, giving new perspectives to work on... and dream about.

In the last extreme bootstrapping work session, Hannes from MirageOS was inspired to start an initial port of Mes to FreeBSD and gave rise to...

Extreme bootstrapping!

As part of the discussions about bootstrapping, people noted that Guix’ build daemon is usually left out of bootstrapping considerations, and wondered whether it should be taken into account. In effect, the build daemon emulates builds from scratch, as if one had booted into an empty machine. It does that by creating isolated build environments that contain nothing but the explicitly declared inputs. However, the build daemon is part of the Trusted Computing Base (TCB): like compilers in the “trusting trust” attack, it could inject backdoors into build results. Thus, the question becomes: how can we reduce the TCB by removing guix-daemon from it?

Vagrant came up with this crazy-looking idea: what if we started building things straight from the initrd? That way, our TCB would be stripped of guix-daemon, the Shepherd, and other services running on a normal system. Since Guix has all the build information available in the form of derivations, which are normally interpreted by the daemon, we found that it shouldn’t be that hard to convert them to a minimal Guile script that would be executed during startup, from the initrd. Some hack hours later, we had a proof-of-concept branch, adding a (gnu system bootstrap) module with all the necessary machinery:

  1. a function that converts an arbitrary derivation to a linear build script that builds the complete dependency graph in topological order;
  2. the declaration of an operating system that boots into such a script from the initrd;
  3. a function to run a pure-Scheme SHA256 implementation to compute and display the hash of the build result.
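
The first and third items can be illustrated with a small sketch. This is not the code from the proof-of-concept branch, which is written in Guile Scheme; it is a Python analogue under stated assumptions. The derivation names and the dependency graph are invented, the standard library's `graphlib` (Python 3.9 or later) stands in for the graph traversal, and `hashlib` stands in for the pure-Scheme SHA256 implementation.

```python
import hashlib
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each derivation maps to the set of
# derivations it needs as inputs. Names are invented for the example.
graph = {
    "hello.drv": {"gcc.drv", "glibc.drv"},
    "glibc.drv": {"gcc.drv"},
    "gcc.drv": {"bootstrap-seed.drv"},
    "bootstrap-seed.drv": set(),
}


def linear_build_order(graph):
    """Return the derivations so that every input precedes its users."""
    return list(TopologicalSorter(graph).static_order())


def sha256_hex(data):
    """Return the hexadecimal SHA256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


order = linear_build_order(graph)
for step, derivation in enumerate(order, start=1):
    print(step, derivation)

# Stand-in for hashing a build result; real results would be store items.
print(sha256_hex(b""))
```

A linear order is all the initrd script needs: with no daemon to schedule builds, it can simply run the steps one after the other and print the final hash for comparison with a regular, daemon-driven build.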

More on that in a future post! Interestingly, we learned that NBS is taking a similar approach — building from the initrd — though with different binary seeds and specific build and packaging tooling.

We went on exploring the space of what we called “extreme bootstrapping” some more. How could we further reduce the TCB? The kernel is an obvious target: as long as we use the Linux kernel, we could disable many optional features, perhaps even networking and storage drivers. Fabrice Bellard’s impressive 2004 tcc-boot experiment reminds us that we could even aim for a bootloader that builds the OS kernel before it boots it; this removes Linux entirely from the TCB, in exchange for TinyCC. As part of the “Bootstrappable Debian” project, asmc takes a similar approach: providing a very small OS kernel that’s enough to compile simple things. This is like going “from inorganic matter to organic molecules”, as Giovanni Mascellani nicely puts it.

When a Mirage developer and hackers familiar with GNU/Hurd talk about bootstrapping, it is no surprise that they end up looking at library OSes and microkernels. Indeed, one could imagine booting into a dedicated Mirage unikernel (though it would lack a POSIX personality), or booting into GNU Mach with few or no Hurd services initially running. That would be a way to strip the TCB to a bare minimum… It will be some time before we get there, but it could well be our horizon!

More cool hacks

During the summit, support for system provenance tracking in guix system landed in Guix. This allows a deployed system to embed the information needed to rebuild it: its channels and its configuration file. In other words, the result is what we could call a source-carrying system, which could also be thought of as a sort of Quine. For users it’s a convenient way to map a running system or virtual machine image back to its source, or to verify that its binaries are genuine by rebuilding it.

The guix challenge command started its life shortly before the first summit. During this year’s hacking sessions, it gained a --diff option that automates the steps of downloading, decompressing, and diffing non-reproducible archives, possibly with Diffoscope. The idea came up some time ago, and it’s good that we can cross that item off our to-do list.
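
To give a feel for the diffing step, here is a rough Python sketch. It is not how guix challenge is implemented (Guix is written in Scheme) and it is far cruder than Diffoscope: it assumes both archives have already been downloaded and unpacked, and only reports which files differ. The directory contents below are made up for the demonstration.

```python
import hashlib
import os
import tempfile


def file_hashes(root):
    """Map each file's path relative to root to the SHA256 of its contents."""
    hashes = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(directory, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            hashes[os.path.relpath(path, root)] = digest
    return hashes


def differing_files(root_a, root_b):
    """List files that are missing on one side or whose contents differ."""
    a, b = file_hashes(root_a), file_hashes(root_b)
    return sorted(path for path in a.keys() | b.keys() if a.get(path) != b.get(path))


# Two stand-ins for unpacked archives of the same package from two builds.
with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    for root, stamp in ((local, "build-1"), (remote, "build-2")):
        with open(os.path.join(root, "README"), "w") as handle:
            handle.write("same content\n")
        with open(os.path.join(root, "version.txt"), "w") as handle:
            handle.write(stamp + "\n")
    changed = differing_files(local, remote)

print(changed)
```

Knowing which files differ is only the starting point; a tool like Diffoscope then digs into each file format to show where the differences come from.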

Thanks!

We are grateful to everyone who made this summit possible: Gunner and Evelyn of Aspiration, Hannes, Holger, Lamby, Mattia, and Vagrant, as well as our kind hosts at Priscilla. And of course, thanks to all fellow participants whose open-mindedness and focus made this both a productive and a pleasant experience!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).