Guix Further Reduces Bootstrap Seed to 25%
We are delighted to announce that the second reduction by 50% of the Guix bootstrap binaries has now been officially released!
The initial set of binaries from which packages are built now weighs in at approximately 60 MiB, a quarter of what it used to be.
In a previous blog post we elaborated on why this reduction, and bootstrappability in general, is so important. One reason is to eliminate, or at least greatly reduce the attack surface of, a “trusting trust” attack. Last summer at the Breaking Bitcoin conference, Carl Dong gave a fun and remarkably gentle introduction to the topic, and at FOSDEM 2020 I also gave a short talk about this. If you choose to believe that building from source is the proper way to do computing, then it follows that the “trusting trust” attack is only a symptom of an incomplete or missing bootstrap story.
Further Reduced Binary Seed bootstrap
Last year, the first reduction removed the GCC, glibc and Binutils
binary seeds. The new Further Reduced Binary Seed bootstrap, merged
in Guix master last month, removes the “static-binaries tarball”
containing GNU Awk, Bash, Bzip2, the GNU Core Utilities, Grep, Gzip,
GNU Make, Patch, sed, Tar, and Xz. It replaces them with Gash and Gash
Core Utils. Gash is a minimalist POSIX shell written in Guile Scheme,
while Gash Core Utils is a Scheme implementation of most of the tools
found in GNU Coreutils, as well as the most essential bits of Awk,
grep and sed.
After three new GNU Mes releases with numerous Mes C Library updates and fixes, a major update of Gash and the first official Gash Utils release, and the delicate balancing of 17 new bootstrap source packages and versions, the bottom of the package graph now looks like this (woohoo!):
                        gcc-mesboot (4.9.4)
                                 ^
                                 |
                               (...)
                                 ^
                                 |
          binutils-mesboot (2.14), glibc-mesboot (2.2.5),
                     gcc-core-mesboot (2.95.3)
                                 ^
                                 |
     bash-mesboot (2.05), bzip2-mesboot, gawk-mesboot (3.0.0)
diffutils-mesboot (2.7), patch-mesboot (2.5.9), sed-mesboot (1.18)
                                 ^
                                 |
                      gnu-make-mesboot (3.82)
                                 ^
                                 |
                       gzip-mesboot (1.2.4)
                                 ^
                                 |
                             tcc-boot
                                 ^
                                 |
                             mes-boot
                                 ^
                                 |
                    gash-boot, gash-utils-boot
                                 ^
                                 |
                                 *
          bootstrap-mescc-tools, bootstrap-mes (~12 MiB)
                     bootstrap-guile (~48 MiB)
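To make the shape of this graph a little more concrete, here is a minimal Python sketch (Python 3.9 or later, for `graphlib`) that models the layers listed above and derives one valid build order from them. It is only an illustration, not how Guix itself represents packages: the names `LAYERS`, `SEED_SIZES_MIB`, `build_graph` and `build_order` are made up for this example, and it assumes, as a simplification, that every package in a layer depends on every package in the layer directly below it. The steps elided as "(...)" in the graph are left out.

```python
from graphlib import TopologicalSorter

# Layers of the package graph, bottom (binary seeds) to top, as listed above.
# The "(...)" steps between gcc-core-mesboot and gcc-mesboot are not modelled.
LAYERS = [
    ["bootstrap-mescc-tools", "bootstrap-mes", "bootstrap-guile"],
    ["gash-boot", "gash-utils-boot"],
    ["mes-boot"],
    ["tcc-boot"],
    ["gzip-mesboot"],
    ["gnu-make-mesboot"],
    ["bash-mesboot", "bzip2-mesboot", "gawk-mesboot",
     "diffutils-mesboot", "patch-mesboot", "sed-mesboot"],
    ["binutils-mesboot", "glibc-mesboot", "gcc-core-mesboot"],
    ["gcc-mesboot"],
]

# Approximate seed sizes in MiB, taken from the bottom of the graph.
# bootstrap-mescc-tools and bootstrap-mes are given there as one combined figure.
SEED_SIZES_MIB = {
    "bootstrap-mescc-tools + bootstrap-mes": 12,
    "bootstrap-guile": 48,
}


def build_graph(layers):
    """Map each package to the set of packages it is assumed to depend on."""
    graph = {pkg: set() for pkg in layers[0]}  # seeds have no dependencies
    for below, above in zip(layers, layers[1:]):
        for pkg in above:
            # Simplifying assumption: depend on the whole layer below.
            graph[pkg] = set(below)
    return graph


def build_order(layers):
    """Return one valid build order, dependencies first."""
    return list(TopologicalSorter(build_graph(layers)).static_order())


if __name__ == "__main__":
    for step, pkg in enumerate(build_order(LAYERS), start=1):
        print(f"{step:2d}. {pkg}")
    print("total binary seed:", sum(SEED_SIZES_MIB.values()), "MiB")
```

Summing the two seed figures gives the 60 MiB quoted at the top of this post, and any order the sorter produces starts from the three seeds and ends at gcc-mesboot.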
We are excited that the NLnet Foundation has sponsored this work!
However, we aren't done yet; far from it.
Lost Paths
The idea of reproducible builds and bootstrappable software is not very new. Much of it was implemented for the GNU tools in the early 1990s. Working to recreate it today shows us that much of that practice has been forgotten.
Readers who are familiar with the GNU toolchain may have noticed the
version numbers of the *-mesboot source packages in this great new
bootstrap: They are ancient! That's a problem.
Typically, newer versions of the tool chain fix all kinds of bugs, make the software easier to build and add support for new CPU architectures, which is great. More often than not, however, they also introduce new features or add dependencies that are not necessary for bootstrapping and may raise the bootstrap hurdle. Sometimes newer tools are stricter, or old configure scripts do not recognise newer tool versions.
A trivial example is GNU sed. In the current bootstrap we are using
version 1.18, which was released in 1993. Until recently the latest
version of sed we could hope to bootstrap was sed-4.2.2 (2012). Newer
releases ship as xz-compressed tarballs only, and xz is notoriously
difficult to bootstrap (it needs a fairly recent GCC, and try building
that without sed).
Luckily, the sed maintainers (Jim Meyering) were happy to correct
this mistake: starting from release sed-4.8 (2020), gzip-compressed
tarballs will be shipped as well. The situation is similar for the GNU
Core Utils: releases made between 2011 and 2019 will probably be
useless for bootstrapping. Confronted with this information, the
coreutils maintainers (Pádraig Brady) were likewise happy to release
coreutils-8.32 with gzip compression too, and to keep doing so from
now on.
Even these simple cases show that solving bootstrap problems can only be done together: For GNU it really is a project-wide responsibility that needs to be addressed.
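The tarball constraint behind the sed and coreutils examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `usable_tarballs` and the constant `EARLY_DECOMPRESSORS` are hypothetical, the sketch assumes that gzip is the only decompressor available at that point in the bootstrap, and `sed-4.x.tar.xz` is a placeholder standing in for any xz-only release rather than a real file name.

```python
# Assumption: early in the bootstrap only gzip can be used to unpack sources.
EARLY_DECOMPRESSORS = (".gz",)


def usable_tarballs(filenames, available=EARLY_DECOMPRESSORS):
    """Keep only the tarballs whose compression format can be unpacked."""
    suffixes = tuple(".tar" + ext for ext in available)
    return [name for name in filenames if name.endswith(suffixes)]


if __name__ == "__main__":
    candidates = [
        "sed-4.x.tar.xz",         # placeholder for an xz-only release
        "sed-4.8.tar.xz",
        "sed-4.8.tar.gz",
        "coreutils-8.32.tar.xz",
        "coreutils-8.32.tar.gz",
    ]
    for name in usable_tarballs(candidates):
        print(name)
```

With gzip as the only decompressor, just the two `.tar.gz` files survive the filter; a release that ships as xz only drops out of the bootstrap entirely, which is exactly the problem described above.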
Most bootstrap problems or loops are not so easy to solve and sometimes there are no obvious answers, for example:
In 2013, the year that Reproducible Builds started to gain some traction, the GNU Compiler Collection released gcc-4.8.0, making C++ a build requirement, and
even more recently (2018), the GNU C Library glibc-2.28 added Python as a build requirement.
While these examples make for a delightful puzzle from a bootstrappability perspective, we would love to see the maintainers of GNU software consider bootstrappability and start taking more responsibility for the bootstrap story of their packages.
Towards a Universal, Full Source Bootstrap
Our next target will be a third reduction by ~50%: the Full-Source bootstrap will replace the MesCC-Tools and GNU Mes binaries with Stage0 and M2-Planet.
The Stage0 project by Jeremiah Orians starts everything from ~512 bytes; virtually nothing. Have a look at this incredible project if you haven’t already done so.
We are most grateful and excited that the NLnet Foundation has again decided to sponsor this work!
While the reduced bootstrap currently only applies to the i686-linux and x86_64-linux architectures, we are thrilled that ARM will be joining soon. The Trusted ARM bootstrapping work is progressing nicely: GNU Mes now passes its entire mescc test suite on native ARMv7, and nearly its entire gcc test suite as well. Work is underway to compile tcc using that GNU Mes. Adding this second architecture is a very important step towards the creation of a universal bootstrap!
Upcoming releases of Gash and Gash-Utils will allow us to clean up the
bottom of the package graph and remove many of the “vintage” packages.
In particular, the next version of Gash-Utils will be sophisticated
enough to build everything up to gcc-mesboot using only old versions
of GNU Make and Gzip. This is largely thanks to improvements to the
implementation of Awk, which now includes nearly all of the standard
features.
Looking even further into the future, we will likely have to remove the “vintage” GCC-2.95.3 that was such a helpful stepping stone and reach straight for GCC-4.6.4. Interesting times ahead!
About Bootstrappable Builds and GNU Mes
Software is bootstrappable when it does not depend on a binary seed that cannot be built from source. Software that is not bootstrappable, even if it is free software, is a serious security risk for a variety of reasons. The Bootstrappable Builds project aims to reduce the number and size of binary seeds to a bare minimum.
GNU Mes is closely related to the Bootstrappable Builds project. Mes aims to create an entirely source-based bootstrapping path for the Guix System and other interested GNU/Linux distributions. The goal is to start from a minimal, easily inspectable binary (which should be readable as source) and bootstrap into something close to R6RS Scheme.
Currently, Mes consists of a mutually self-hosting Scheme interpreter and C compiler. It also implements a C library. Mes, the Scheme interpreter, is written in about 5,000 lines of simple C. MesCC, the C compiler, is written in Scheme. Together, Mes and MesCC can compile a lightly patched TinyCC that is self-hosting. Using this TinyCC and the Mes C library, it is possible to bootstrap the entire Guix System for i686-linux and x86_64-linux.
About GNU Guix
GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.
In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.
Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).