The Full-Source Bootstrap: Building from source all the way down
We are delighted and somewhat relieved to announce that the third
reduction of the Guix bootstrap binaries has now been merged in the
main branch of Guix! If you run guix pull
today, you get a package
graph of more than 22,000 nodes rooted in a 357-byte program—something
that had never been achieved, to our knowledge, since the birth of Unix.
We refer to this as the Full-Source Bootstrap. In this post, we explain what this means concretely. This is a major milestone—if not the major milestone—in our quest for building everything from source, all the way down.
How did we get there, and why? In two previous blog posts, we elaborated on why this reduction and bootstrappability in general is so important.
One reason is to properly address supply chain security concerns. The Bitcoin community was one of the first to recognize its importance well enough to put the idea into practice. At the Breaking Bitcoin conference 2020, Carl Dong gave a fun and remarkably gentle introduction. At the end of the talk, Carl states:
The holy grail for bootstrappability will be connecting
hex0
tomes
.
Two years ago, at FOSDEM 2021, I (Janneke) gave a short talk about how we were planning to continue this quest.
If you think one should always be able to build software from source, then it follows that the “trusting trust” attack is only a symptom of an incomplete or missing bootstrap story.
The Road to Full-Source Bootstrap
Three years ago, the bootstrap binaries were reduced to just GNU Mes and MesCC-Tools (and the driver to build Guix packages: a static build of GNU Guile 2.0.9).
The new Full-Source Bootstrap, merged in Guix master
yesterday,
removes the binaries for Mes and MesCC-Tools and replaces them by bootstrap-seeds. For x86-linux (which is also used by the x86_64-linux build), this means this program
hex0-seed, with ASCII-equivalent
hex0_x86.hex0. Hex0 is self-hosting and its source looks like this:
; Where the ELF Header is going to hit
; Simply jump to _start
; Our main function
# :_start ; (0x8048054)
58 # POP_EAX ; Get the number of arguments
you can spot two types of line-comment: hex0 (;
) and assembly (#
).
The only program-code in this snippet is 58
: two hexidecimal digits
that are taken as two nibbles and compiled into the corresponding byte
with binary value 58
.
Starting from this 357-byte hex0-seed binary provided by the
bootstrap-seeds
, the stage0-posix
package created by Jeremiah
Orians first builds hex0 and then all the way up: hex1, catm, hex2,
M0, cc_x86, M1, M2, get_machine (that's all of MesCC-Tools), and
finally M2-Planet.
The new GNU Mes v0.24 release can be built with M2-Planet. This time with only a remarkably small change, the bottom of the package graph now looks like this (woohoo!):
gcc-mesboot (4.9.4)
^
|
(...)
^
|
binutils-mesboot (2.20.1a), glibc-mesboot (2.2.5),
gcc-core-mesboot (2.95.3)
^
|
patch-mesboot (2.5.9)
^
|
bootstrappable-tcc (0.9.26+31 patches)
^
|
gnu-make-mesboot0 (3.80)
^
|
gzip-mesboot (1.2.4)
^
|
tcc-boot (0.9.27)
^
|
mes-boot (0.24.2)
^
|
stage0-posix (hex0..M2-Planet)
^
|
gash-boot, gash-utils-boot
^
|
*
bootstrap-seeds (357-bytes for x86)
~~~
[bootstrap-guile-2.0.9 driver (~25 MiB)]
We are excited that the NLnet Foundation has been sponsoring this work!
However, we aren't done yet; far from it.
Lost Paths
The idea of reproducible builds and bootstrappable software is not very new. Much of that was implemented for the GNU tools in the early 1990s. Working to recreate it in present time shows us much of that practice was forgotten.
Most bootstrap problems or loops are not so easy to solve and sometimes there are no obvious answers, for example:
In 2013, the year that Reproducible Builds started to gain some traction, the GNU Compiler Collection released version 4.8.0, making C++ a build requirement, and
Even more recently (2018), the GNU C Library glibc-2.28 adds Python as a build requirement,
While these examples make for a delightful puzzle from a bootstrappability perspective, we would love to see the maintainers of GNU packages consider bootstrappability and start taking more responsibility for the bootstrap story of their packages.
Next Steps
Despite this major achievement, there is still work ahead.
First, while the package graph is rooted in a 357-byte program, the set
of binaries from which packages are built includes a 25 MiB
statically-linked Guile, guile-bootstrap
, that Guix uses as its driver
to build the initial packages. 25 MiB is a tenth of what the initial
bootstrap binaries use to weigh, but it is a lot compared to those 357
bytes. Can we get rid of this driver, and how?
A development effort with Timothy Sample addresses the dependency on
guile-bootstrap
of Gash and
Gash-Utils, the
pure-Scheme POSIX shell implementation central to our second
milestone.
On the one hand, Mes is gaining a higher level of Guile compatibility:
hash table interface, record interface, variables and variable-lookup,
and Guile (source) module loading support. On the other hand, Gash
and Gash-Utils are getting Mes compatibility for features that Mes is
lacking (notably syntax-case
macros). If we pull this off,
guile-bootstrap
will only be used as a dependency of bootar and as
the driver for Guix.
Second, the full-source bootstrap that just landed in Guix master
is
limited to x86_64-linux and i686-linux, but ARM and RISC-V will be
joining soon. We are most grateful and excited that the NLnet
Foundation has decided to continue sponsoring this
work!
Some time ago, Wladimir van der Laan contributed initial RISC-V support for Mes but a major obstacle for the RISC-V bootstrap is that the “vintage” GCC-2.95.3 that was such a helpful stepping stone does not support RISC-V. Worse, the RISC-V port of GCC was introduced only in GCC 7.5.0—a version that requires C++ and cannot be bootstrapped! To this end, we have been improving MesCC, the C compiler that comes with Mes, so it is able to build GCC 4.6.5; meanwhile, Ekaitz Zarraga backported RISC-V support to GCC 4.6.5, and backported RISC-V support from the latest tcc to our bootstrappable-tcc.
Outlook
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start.
There are still some daunting tasks ahead. For example, what about the
Linux kernel? The good news is that the bootstrappable community has
grown a lot, from two people six years ago there are now around 100
people in the #bootstrappable
IRC channel. Interesting times ahead!
About Bootstrappable Builds and GNU Mes
Software is bootstrappable when it does not depend on a binary seed that cannot be built from source. Software that is not bootstrappable---even if it is free software---is a serious security risk (supply chain security) for a variety of reasons. The Bootstrappable Builds project aims to reduce the number and size of binary seeds to a bare minimum.
GNU Mes is closely related to the Bootstrappable Builds project. Mes is used in the full-source bootstrap path for the Guix System.
Currently, Mes consists of a mutual self-hosting scheme interpreter and C compiler. It also implements a C library. Mes, the scheme interpreter, is written in about 5,000 lines of code of simple C and can be built with M2-Planet. MesCC, the C compiler, is written in scheme. Together, Mes and MesCC can compile bootstrappable TinyCC that is self-hosting. Using this TinyCC and the Mes C library, the entire Guix System for i686-linux and x86_64-linux is bootstrapped.
About GNU Guix
GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.
In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.
Pokud není uvedeno jinak, příspěvky na blogu na těchto stránkách jsou chráněny autorskými právy příslušných autorů a zveřejněny za podmínek licence CC-BY-SA 4.0 a podmínek licence GNU Free Documentation License (verze 1.3 nebo novější, bez invariantních částí, bez textů na přední straně obálky a bez textů na zadní straně obálky).