The Full-Source Bootstrap: Building from source all the way down

We are delighted and somewhat relieved to announce that the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program—something that had never been achieved, to our knowledge, since the birth of Unix.

We refer to this as the Full-Source Bootstrap. In this post, we explain what this means concretely. This is a major milestone—if not the major milestone—in our quest for building everything from source, all the way down.

How did we get there, and why? In two previous blog posts, we elaborated on why this reduction and bootstrappability in general is so important.

One reason is to properly address supply chain security concerns. The Bitcoin community was one of the first to recognize its importance well enough to put the idea into practice. At the Breaking Bitcoin conference 2020, Carl Dong gave a fun and remarkably gentle introduction. At the end of the talk, Carl states:

The holy grail for bootstrappability will be connecting hex0 to mes.

Two years ago, at FOSDEM 2021, I (Janneke) gave a short talk about how we were planning to continue this quest.

If you think one should always be able to build software from source, then it follows that the “trusting trust” attack is only a symptom of an incomplete or missing bootstrap story.

The Road to Full-Source Bootstrap

Three years ago, the bootstrap binaries were reduced to just GNU Mes and MesCC-Tools (and the driver to build Guix packages: a static build of GNU Guile 2.0.9).

The new Full-Source Bootstrap, merged in Guix master yesterday, removes the binaries for Mes and MesCC-Tools and replaces them by bootstrap-seeds. For x86-linux (which is also used by the x86_64-linux build), this means this program hex0-seed, with ASCII-equivalent hex0_x86.hex0. Hex0 is self-hosting and its source looks like this:

    ; Where the ELF Header is going to hit  
    ; Simply jump to _start  
    ; Our main function  
    # :_start ; (0x8048054)  
    58 # POP_EAX ; Get the number of arguments  

you can spot two types of line-comment: hex0 (;) and assembly (#). The only program-code in this snippet is 58: two hexidecimal digits that are taken as two nibbles and compiled into the corresponding byte with binary value 58.

Starting from this 357-byte hex0-seed binary provided by the bootstrap-seeds, the stage0-posix package created by Jeremiah Orians first builds hex0 and then all the way up: hex1, catm, hex2, M0, cc_x86, M1, M2, get_machine (that's all of MesCC-Tools), and finally M2-Planet.

The new GNU Mes v0.24 release can be built with M2-Planet. This time with only a remarkably small change, the bottom of the package graph now looks like this (woohoo!):

                              gcc-mesboot (4.9.4)
                                      ^
                                      |
                                    (...)
                                      ^
                                      |
               binutils-mesboot (2.20.1a), glibc-mesboot (2.2.5),
                          gcc-core-mesboot (2.95.3)
                                      ^
                                      |
                             patch-mesboot (2.5.9)
                                      ^
                                      |
                       bootstrappable-tcc (0.9.26+31 patches)
                                      ^
                                      |
                            gnu-make-mesboot0 (3.80)
                                      ^
                                      |
                              gzip-mesboot (1.2.4)
                                      ^
                                      |
                               tcc-boot (0.9.27)
                                      ^
                                      |
                                 mes-boot (0.24.2)
                                      ^
                                      |
                         stage0-posix (hex0..M2-Planet)
                                      ^
                                      |
                          gash-boot, gash-utils-boot
                                      ^
                                      |
                                      *
                     bootstrap-seeds (357-bytes for x86)
                                     ~~~
                   [bootstrap-guile-2.0.9 driver (~25 MiB)]

full graph

We are excited that the NLnet Foundation has been sponsoring this work!

However, we aren't done yet; far from it.

Lost Paths

The idea of reproducible builds and bootstrappable software is not very new. Much of that was implemented for the GNU tools in the early 1990s. Working to recreate it in present time shows us much of that practice was forgotten.

Most bootstrap problems or loops are not so easy to solve and sometimes there are no obvious answers, for example:

While these examples make for a delightful puzzle from a bootstrappability perspective, we would love to see the maintainers of GNU packages consider bootstrappability and start taking more responsibility for the bootstrap story of their packages.

Next Steps

Despite this major achievement, there is still work ahead.

First, while the package graph is rooted in a 357-byte program, the set of binaries from which packages are built includes a 25 MiB statically-linked Guile, guile-bootstrap, that Guix uses as its driver to build the initial packages. 25 MiB is a tenth of what the initial bootstrap binaries use to weigh, but it is a lot compared to those 357 bytes. Can we get rid of this driver, and how?

A development effort with Timothy Sample addresses the dependency on guile-bootstrap of Gash and Gash-Utils, the pure-Scheme POSIX shell implementation central to our second milestone. On the one hand, Mes is gaining a higher level of Guile compatibility: hash table interface, record interface, variables and variable-lookup, and Guile (source) module loading support. On the other hand, Gash and Gash-Utils are getting Mes compatibility for features that Mes is lacking (notably syntax-case macros). If we pull this off, guile-bootstrap will only be used as a dependency of bootar and as the driver for Guix.

Second, the full-source bootstrap that just landed in Guix master is limited to x86_64-linux and i686-linux, but ARM and RISC-V will be joining soon. We are most grateful and excited that the NLnet Foundation has decided to continue sponsoring this work!

Some time ago, Wladimir van der Laan contributed initial RISC-V support for Mes but a major obstacle for the RISC-V bootstrap is that the “vintage” GCC-2.95.3 that was such a helpful stepping stone does not support RISC-V. Worse, the RISC-V port of GCC was introduced only in GCC 7.5.0—a version that requires C++ and cannot be bootstrapped! To this end, we have been improving MesCC, the C compiler that comes with Mes, so it is able to build GCC 4.6.5; meanwhile, Ekaitz Zarraga backported RISC-V support to GCC 4.6.5, and backported RISC-V support from the latest tcc to our bootstrappable-tcc.

Outlook

The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start.

There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel. Interesting times ahead!

About Bootstrappable Builds and GNU Mes

Software is bootstrappable when it does not depend on a binary seed that cannot be built from source. Software that is not bootstrappable---even if it is free software---is a serious security risk (supply chain security) for a variety of reasons. The Bootstrappable Builds project aims to reduce the number and size of binary seeds to a bare minimum.

GNU Mes is closely related to the Bootstrappable Builds project. Mes is used in the full-source bootstrap path for the Guix System.

Currently, Mes consists of a mutual self-hosting scheme interpreter and C compiler. It also implements a C library. Mes, the scheme interpreter, is written in about 5,000 lines of code of simple C and can be built with M2-Planet. MesCC, the C compiler, is written in scheme. Together, Mes and MesCC can compile bootstrappable TinyCC that is self-hosting. Using this TinyCC and the Mes C library, the entire Guix System for i686-linux and x86_64-linux is bootstrapped.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

موضوع‌های مرتبط:

Bootstrapping Reproducible builds Security

Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).