The Big Change
Making cross-cutting changes over a large code base is difficult, but it's occasionally necessary if we are to keep the code base tidy and malleable. With almost 800K source lines of code, Guix can reasonably be called a large code base. One might argue that almost 80% of this code is package definitions, which “doesn't count”. The reality is that it does count, not only because those package definitions are regular Scheme code, but also because they are directly affected by the big change Guix has just undergone.
This post looks at what’s probably the biggest change Guix has seen since it started nine years ago and that anyone writing packages will immediately notice: simplified package inputs. Yes, we just changed how each of the 20K packages plus those in third-party channels can declare their dependencies. Before describing the change, how we implemented it, and how packagers can adapt, let’s first take a look at the previous situation and earlier improvements that made this big change possible.
Packages and inputs
Packages in Guix are defined using a domain-specific language embedded in the Scheme programming language—an EDSL, for the programming language geeks among us. This is a departure from other designs such as that of Nix, which comes with a dedicated language, and it gives packagers and users access to a rich programming interface while retaining the purely declarative style of package definitions.
This package EDSL is one of the oldest bits of Guix and was described in a 2013 paper. Although embedded in Scheme, the package “language” was designed to be understandable by people who’ve never practiced any kind of Lisp before—you could think of it as a parenthesized syntax for JSON or XML. It’s reasonable to say that it’s been successful at that: of the 600+ people who contributed to Guix over the years, most had never written Scheme or Lisp before. The example given in that paper remains a valid package definition:
(define hello
  (package
    (name "hello")
    (version "2.8")
    (source (origin
              (method url-fetch)
              (uri (string-append "mirror://gnu/hello/hello-"
                                  version ".tar.gz"))
              (sha256 (base32 "0wqd8..."))))
    (build-system gnu-build-system)
    (arguments
     '(#:configure-flags
       `("--disable-color"
         ,(string-append "--with-gawk="
                         (assoc-ref %build-inputs "gawk")))))
    (inputs `(("gawk" ,gawk)))
    (synopsis "GNU Hello")
    (description "An illustration of GNU's engineering practices.")
    (home-page "http://www.gnu.org/software/hello/")
    (license gpl3+)))
Of particular interest here is the inputs field, which lists build-time dependencies. Here there’s only one, GNU Awk; it has an associated label, "gawk". The ,gawk bit lets us insert the value of the gawk variable, which is another package. We can list more dependencies like so:
(inputs `(("gawk" ,gawk)
          ("gettext" ,gnu-gettext)
          ("pkg-config" ,pkg-config)))
Quite a bit of boilerplate. Unless you’re into Lisp, this probably looks weird to you: manageable, but weird. What’s the deal with this backquote and those commas? The backquote is shorthand for quasiquote and commas are shorthand for unquote; it’s a facility that Lisp and Scheme provide to construct lists.
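To make the shorthand concrete, here is a small sketch in plain Guile Scheme (no Guix required). It shows that the backquote/comma notation, the spelled-out quasiquote/unquote forms, and explicit list calls all build the same list of label/value pairs. The symbols gawk-package and pkg-config-package are placeholders that stand in for real package objects.

```scheme
;; Placeholder values standing in for package objects (assumption:
;; real Guix packages are records, not symbols).
(define gawk 'gawk-package)
(define pkg-config 'pkg-config-package)

;; Backquote/comma shorthand, as in package definitions.
(define inputs-shorthand
  `(("gawk" ,gawk)
    ("pkg-config" ,pkg-config)))

;; The same thing with quasiquote and unquote spelled out.
(define inputs-spelled-out
  (quasiquote (("gawk" (unquote gawk))
               ("pkg-config" (unquote pkg-config)))))

;; And with plain list calls, no quasiquote at all.
(define inputs-with-list
  (list (list "gawk" gawk)
        (list "pkg-config" pkg-config)))

(write (equal? inputs-shorthand inputs-spelled-out)) ; prints #t
(newline)
(write (equal? inputs-shorthand inputs-with-list))   ; prints #t
(newline)
(write inputs-shorthand)
(newline)
;; prints (("gawk" gawk-package) ("pkg-config" pkg-config-package))
```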
Lispers couldn’t live without quasiquote; it’s wonderful. Still, exposing newcomers to this syntax has always been uncomfortable; in tutorials you’d end up saying “yeah don’t worry, just write it this way”. Our goal, though, is to empower users by giving them abstractions they can comprehend, hopefully providing a smooth path towards programming without noticing. This seemingly opaque backquote-string-unquote construct works against that goal.
Then, you ask, why did Guix adopt this unpleasant syntax for inputs in the first place? Input syntax had to satisfy one requirement: that it’d be possible for “build-side code” to refer to a specific input. And what’s build-side code? It’s code that appears in the package definition that is staged for later execution, that is, code that’s only evaluated when the package is actually built. The bit that follows #:configure-flags in the example above is build-side code: it’s an expression evaluated if and when the package gets built. That #:configure-flags expression refers to the gawk package to construct a flag like "--with-gawk=/gnu/store/…-gawk-5.0.1"; it does so by referring to the special %build-inputs variable, which happens to contain an association list that maps input labels to file names. The "gawk" label in inputs is here to allow build-side code to get at an input’s file name.
Still here? The paragraphs above are a demonstration of the shortcoming of this approach. That we have to explain so much suggests we’re lacking an abstraction that would make the whole pattern clearer.
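To see that pattern in isolation, here is a plain-Guile sketch of the build-side lookup. It uses a hypothetical %build-inputs value with made-up store file names (real ones contain a hash where the "..." appears) and rebuilds the configure flags from the example.

```scheme
;; Hypothetical %build-inputs: input labels mapped to store file names.
;; The "..." stands for the hash that real store file names contain.
(define %build-inputs
  '(("gawk" . "/gnu/store/...-gawk-5.0.1")
    ("gettext" . "/gnu/store/...-gettext")))

;; assoc-ref looks up the label and returns the associated file name.
(define configure-flags
  (list "--disable-color"
        (string-append "--with-gawk="
                       (assoc-ref %build-inputs "gawk"))))

(write configure-flags)
(newline)
;; prints ("--disable-color" "--with-gawk=/gnu/store/...-gawk-5.0.1")
```

The label is the only link between the inputs field and this code. If the label is changed in one place but not the other, assoc-ref returns #f, and string-append then rejects that value with a type error.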
G-expressions and self-referential records
The missing abstraction came to Guix a couple of years later: G-expressions. Without going into the details, which were covered elsewhere, notably in a research article, g-expressions, or “gexps”, are traditional Lisp s-expressions (“sexps”) on steroids. A gexp can contain a package record or any other “file-like object” and, when that gexp is serialized for eventual execution, the package is replaced by its /gnu/store/… file name.
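The substitution step can be illustrated with a deliberately simplified model in plain Guile. This is not how Guix implements gexps, which track their inputs and are lowered to derivations. It only shows the idea that objects embedded in code are replaced by file names when the code is serialized. The toy-package record type and the lower-toy-gexp procedure are hypothetical.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for a package record: a name plus the store
;; file name it would be built to.
(define-record-type <toy-package>
  (toy-package name store-file-name)
  toy-package?
  (name toy-package-name)
  (store-file-name toy-package-store-file-name))

(define gawk (toy-package "gawk" "/gnu/store/...-gawk-5.0.1"))

;; Toy "serialization": walk the expression and replace each embedded
;; package record with its store file name.
(define (lower-toy-gexp exp)
  (cond ((toy-package? exp) (toy-package-store-file-name exp))
        ((pair? exp) (cons (lower-toy-gexp (car exp))
                           (lower-toy-gexp (cdr exp))))
        (else exp)))

;; An expression with a package object embedded directly in it.
(define toy-gexp (list 'string-append "--with-gawk=" gawk))

(write (lower-toy-gexp toy-gexp))
(newline)
;; prints (string-append "--with-gawk=" "/gnu/store/...-gawk-5.0.1")
```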
Gexps have been used since 2014–2015 in all of Guix System and they’ve been great to work with, but package definitions were stuck with old-style sexps. One reason is that a big chunk of the code that deals with packages and build systems had to be ported to the gexp-related programming interfaces; a first attempt had been made long ago but performance was not on par with that of the old code, so postponing until that was addressed seemed wiser. The second reason was that using gexps in package definitions could be so convenient that packagers might unwittingly find themselves creating inconsistent packages.
We can now rewrite our hello package such that configure flags are expressed using a gexp:
(define hello
  (package
    (name "hello")
    ;; …
    (arguments
     (list #:configure-flags
           #~`("--disable-color"
               ,(string-append "--with-gawk=" #$gawk))))
    (inputs `(("gawk" ,gawk)))))
The reference inserted here with #$gawk (#$ is synonymous with ungexp, the gexp equivalent of traditional unquote) refers to the global gawk variable. This is more concise and semantically clearer than the (assoc-ref %build-inputs "gawk") snippet we had before.
Now suppose you define a package variant using this common idiom:
(define hello-variant
  (package
    (inherit hello)
    (name "hello-variant")
    (inputs `(("gawk" ,gawk-4.0)))))
Here the intent is to create a package that depends on a different version of GNU Awk: the hypothetical gawk-4.0 instead of gawk.
However, the #:configure-flags gexp of this variant still refers to the gawk variable, contrary to what the inputs field prescribes; in effect, this variant depends on both Awk versions.
To address this discrepancy, we needed a new linguistic device, to put it in a fancy way. It arrived in 2019 in the form of self-referential records. Within a field such as the arguments field above, it’s now possible to refer to this-package to get at the value of the package being defined. (If you’ve done object-oriented programming before, you might be thinking that we just rediscovered the this or self pointer, and there’s some truth in it. :-)) With a bit of syntactic sugar, we can rewrite the example above so that it refers to the Awk package that appears in its own inputs field:
(define hello
  (package
    (name "hello")
    ;; …
    (arguments
     (list #:configure-flags
           #~(list (string-append "--with-gawk="
                                  #$(this-package-input "gawk")))))
    (inputs `(("gawk" ,gawk)))))
With this in place, we can take advantage of gexps in package definitions while still supporting the common idiom to define package variants, wheee!
That was a long digression from our input label theme but, as you can see, all this interacts fairly tightly.
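The following plain-Guile model roughly illustrates why self-reference fixes the variant problem. If the arguments field is computed from the package it belongs to, a variant that only replaces the inputs automatically gets matching flags. Everything in it (make-toy-package, toy-package-arguments, toy-this-package-input) is hypothetical. In Guix, this-package is provided by the record machinery rather than by an explicit lambda parameter as here.

```scheme
(use-modules (srfi srfi-9))

;; Toy package: ARGUMENTS-PROC is a procedure that receives the
;; package it belongs to, playing the role of this-package.
(define-record-type <toy-package>
  (make-toy-package name inputs arguments-proc)
  toy-package?
  (name toy-package-name)
  (inputs toy-package-inputs)                 ;list of (label value)
  (arguments-proc toy-package-arguments-proc))

;; Compute the arguments with the package itself as "this package".
(define (toy-package-arguments pkg)
  ((toy-package-arguments-proc pkg) pkg))

;; Look up an input of PKG by label; #f if absent.
(define (toy-this-package-input pkg label)
  (let ((entry (assoc label (toy-package-inputs pkg))))
    (and entry (cadr entry))))

(define hello
  (make-toy-package
   "hello"
   '(("gawk" "gawk-5.0.1"))
   (lambda (this-package)
     (list (string-append "--with-gawk="
                          (toy-this-package-input this-package "gawk"))))))

;; A variant that reuses hello's arguments but swaps the inputs,
;; loosely mimicking (inherit hello) plus a new inputs field.
(define hello-variant
  (make-toy-package "hello-variant"
                    '(("gawk" "gawk-4.0"))
                    (toy-package-arguments-proc hello)))

(write (toy-package-arguments hello))          ; prints ("--with-gawk=gawk-5.0.1")
(newline)
(write (toy-package-arguments hello-variant))  ; prints ("--with-gawk=gawk-4.0")
(newline)
```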
Getting rid of input labels
Now that we have gexps and self-referential records, it looks like we can finally get rid of input labels: they’re no longer strictly necessary because we can insert packages in gexps instead of looking them up by label in build-side code. We “just” need to devise a backward compatibility plan…
Input labels are pervasive; they’re visible in three contexts:
1. in the inputs fields of package definitions;
2. on the build side, with the inputs keyword parameter of build phases;
3. in the Scheme programming interface, since package-inputs and related functions are expected to return a list of labeled inputs.
We’re brave but not completely crazy, so we chose to focus on #1 for now (it’s also the most visible of all three), with a plan to incrementally address #2, leaving #3 for later.
To allow for label-less inputs, we augmented the record interface with field sanitizers. This feature allows us to define a procedure that inspects and transforms the value specified for the inputs, native-inputs, and propagated-inputs fields. Currently that procedure reintroduces input labels when they’re missing. In a sense, we’re just changing the surface syntax, but under the hood everything is unchanged. With this change, our example above can now be written like this:
(define hello
  (package
    (name "hello")
    ;; …
    (inputs (list gawk))))
Much nicer, no? The effect is even more pleasant for packages with a number of inputs:
(define-public guile-sqlite3
  (package
    (name "guile-sqlite3")
    ;; …
    (build-system gnu-build-system)
    (native-inputs (list autoconf automake guile-3.0 pkg-config))
    (inputs (list guile-3.0 sqlite))))
That’s enough to spark joy in anyone who’s taught the previous syntax. Currently this is transformed into something like:
(define-public guile-sqlite3
  (package
    ;; …
    (native-inputs
     (map add-input-label
          (list autoconf automake guile-3.0 pkg-config)))))
… where the add-input-label function turns a package into a label/package pair. It does add a little bit of run-time overhead, but nothing really measurable.
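As a rough plain-Guile illustration of that transformation, here is a hypothetical toy-add-input-label. It labels each package with its name and leaves already-labeled entries alone. The real add-input-label in Guix works on actual package records and may handle more cases than shown here; this sketch only conveys the idea.

```scheme
(use-modules (srfi srfi-9))

;; Minimal stand-in for a package record (hypothetical).
(define-record-type <toy-package>
  (toy-package name)
  toy-package?
  (name toy-package-name))

;; Turn a bare package into a (label package) pair, using the package
;; name as the label; anything else (e.g. an already-labeled entry)
;; is returned unchanged.
(define (toy-add-input-label input)
  (if (toy-package? input)
      (list (toy-package-name input) input)
      input))

(define autoconf (toy-package "autoconf"))
(define pkg-config (toy-package "pkg-config"))
(define sqlite (toy-package "sqlite"))

;; New-style entries, plus one old-style labeled entry for comparison.
(define labeled
  (map toy-add-input-label
       (list autoconf pkg-config (list "sqlite" sqlite))))

(write (map car labeled))
(newline)
;; prints ("autoconf" "pkg-config" "sqlite")
```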
There are also cases where package definitions, in particular for package variants, would directly manipulate input lists as returned by package-inputs and related procedures. In those cases, packagers had to be aware of input labels, and they would typically use association list (or “alist”) manipulation procedures and similar constructs; this is context #3 above. To replace those idioms, we defined a higher-level construct that does not assume input labels. For example, a common idiom when defining a package variant with additional dependencies goes like this:
(define hello-with-additional-dependencies
  (package
    (inherit hello)
    (name "hello-with-bells-and-whistles")
    (inputs `(("guile" ,guile-3.0)
              ("libtextstyle" ,libtextstyle)
              ,@(package-inputs hello)))))
The variant defined above adds two inputs to those of hello. We introduced a macro, modify-inputs, which allows packagers to express that in a higher-level (and less cryptic) fashion, in a way that does not refer to input labels. Using this other linguistic device (ha ha!), the snippet above becomes:
(define hello-with-additional-dependencies
  (package
    (inherit hello)
    (name "hello-with-bells-and-whistles")
    (inputs (modify-inputs (package-inputs hello)
              (prepend guile-3.0 libtextstyle)))))
Similarly, modify-inputs advantageously subsumes alist-delete and whatnot when you want to replace or remove inputs, like so:
(modify-inputs (package-inputs hello)
  (replace "gawk" my-special-gawk)
  (delete "guile"))
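Here is a hypothetical toy-modify-inputs in plain Guile that gives an idea of what such a macro does with the underlying labeled lists. It supports the three clauses shown above and works on lists of (label value) entries, where strings stand in for packages and each value doubles as its own label. The real modify-inputs operates on actual package inputs, and its implementation may well differ.

```scheme
(use-modules (srfi srfi-1))

;; Prepend new entries, labeling each value with itself.
(define (toy-prepend inputs . new)
  (append (map (lambda (v) (list v v)) new) inputs))

;; Remove the entries whose label is among LABELS.
(define (toy-delete inputs . labels)
  (remove (lambda (entry) (member (car entry) labels)) inputs))

;; Swap the value of the entry labeled LABEL for VALUE.
(define (toy-replace inputs label value)
  (map (lambda (entry)
         (if (string=? (car entry) label)
             (list label value)
             entry))
       inputs))

;; Apply the clauses left to right, each to the previous result.
(define-syntax toy-modify-inputs
  (syntax-rules (prepend delete replace)
    ((_ inputs) inputs)
    ((_ inputs (prepend v ...) clause ...)
     (toy-modify-inputs (toy-prepend inputs v ...) clause ...))
    ((_ inputs (delete l ...) clause ...)
     (toy-modify-inputs (toy-delete inputs l ...) clause ...))
    ((_ inputs (replace l v) clause ...)
     (toy-modify-inputs (toy-replace inputs l v) clause ...))))

(define hello-inputs '(("gawk" "gawk") ("guile" "guile")))

(write (toy-modify-inputs hello-inputs
         (prepend "libtextstyle")
         (replace "gawk" "my-special-gawk")
         (delete "guile")))
(newline)
;; prints (("libtextstyle" "libtextstyle") ("gawk" "my-special-gawk"))
```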
On the build side (context #2 above), we also provide new procedures that allow packagers to avoid relying on input labels: search-input-file and search-input-directory.
Instead of having build phases that run code like:
(lambda* (#:key inputs #:allow-other-keys)
  ;; Replace references to “/sbin/ip” by references to
  ;; the actual “ip” command in /gnu/store.
  (substitute* "client/scripts/linux"
    (("/sbin/ip")
     ;; Look up the input labeled “iproute”.
     (string-append (assoc-ref inputs "iproute")
                    "/sbin/ip"))))
… you’d now write:
(lambda* (#:key inputs #:allow-other-keys)
  ;; Replace references to “/sbin/ip” by references to
  ;; the actual “ip” command in /gnu/store.
  (substitute* "client/scripts/linux"
    (("/sbin/ip")
     ;; Search “/sbin/ip” among all the inputs.
     (search-input-file inputs "/sbin/ip"))))
Nothing revolutionary, but there are a couple of advantages: code is no longer tied to input labels or package names, and search-input-file raises an exception when the file is not found, which is better than building an incorrect file name.
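For intuition, here is a hypothetical toy-search-input-file in plain Guile that mimics the behavior described above. It tries the requested file under every input directory, regardless of labels, and raises an error when none matches. It is not the Guix implementation. The existence check is a parameter so that the example can run without a store, and the store file names are made up.

```scheme
(use-modules (srfi srfi-1))

;; INPUTS is an alist of (label . directory).  Labels are ignored: FILE
;; is appended to every directory and the first existing candidate wins.
;; EXISTS? defaults to Guile's file-exists? and is swappable for testing.
(define* (toy-search-input-file inputs file
                                #:optional (exists? file-exists?))
  (or (find exists?
            (map (lambda (entry)
                   (string-append (cdr entry) file))
                 inputs))
      (error "file not found among inputs:" file)))

;; Made-up store file names, for illustration only.
(define inputs
  '(("coreutils" . "/gnu/store/...-coreutils")
    ("iproute" . "/gnu/store/...-iproute2")))

;; Pretend that only the iproute2 directory contains sbin/ip.
(define (fake-exists? file)
  (string=? file "/gnu/store/...-iproute2/sbin/ip"))

(write (toy-search-input-file inputs "/sbin/ip" fake-exists?))
(newline)
;; prints "/gnu/store/...-iproute2/sbin/ip"
```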
That was a deep dive into packaging! If you’re already packaging software for Guix, you hopefully see how to do things “the new way”. Either way, it’s interesting to see the wide-ranging implications of what initially looks like a simple change. Things get complex when you have to consider all the idioms that 20,000 packages make use of.
Adapting to the new style
It’s nice to have a plan to change the style of package definitions, but how do you make it happen concretely? The last thing we want is, for the next five years, to have to explain two package styles to newcomers instead of one.
First, guix import now returns packages in the new style. But what about existing package definitions?
Fortunately, most of the changes above can be automated: that’s the job of the guix style command that we added for this purpose, but which may eventually be extended to make package definitions prettier in all sorts of ways. From a checkout, one can run:
./pre-inst-env guix style
Whenever it can be done safely, package inputs in every package definition are rewritten to the new style: removing input labels, and using modify-inputs where appropriate. If you maintain your own channel, you can also run it for your packages:
guix style -L /path/to/channel my-package1 my-package2 …
We recommend waiting until the next Guix release is out, though, which could be a month from now, so that your channel remains usable by those who pinned an older revision of the guix channel.
We ran guix style a couple of days ago on the whole repository, leading to the biggest commit in Guix history:
460 files changed, 37699 insertions(+), 49782 deletions(-)
Woohoo! Less than 15% of the packages (not counting Rust packages, which are a bit special) have yet to be adjusted. In most cases, package inputs that were left unchanged are either those where we cannot automatically determine whether the change would break the package, for instance because build-side code expects certain labels, or those that are “complex”, e.g., inputs that include conditionals.
The key challenge here was making sure guix style transformations are correct; by default, we even want to be sure that changes introduced by guix style do not trigger a rebuild, that is, that package derivations are unchanged.
To achieve that, guix style correlates the source code of each package definition with the corresponding live package record. That allows it to answer questions such as “Is this label the name of the input package?” That way, it can tell that, say:
(inputs `(("libx11" ,libx11)
          ("libxcb" ,libxcb)))
can be rewritten without introducing a rebuild, because labels match actual package names, whereas something like this cannot, due to label mismatches:
(inputs `(("libX11" ,libx11)
          ("xcb" ,libxcb)))
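The label check at the heart of this can be sketched in plain Guile: read the inputs form as data, then compare each label with the name of the package its variable refers to. In this hypothetical toy, %package-names stands in for the live package records that guix style consults. Only the simple two-element entry shape is recognized, and anything else is conservatively rejected.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Hypothetical stand-in for live package records:
;; variable name -> package name.
(define %package-names
  '((libx11 . "libx11")
    (libxcb . "libxcb")))

;; #t when every label in an old-style `(("label" ,var) ...) form
;; equals the name of the package bound to VAR; #f otherwise,
;; including for shapes this toy does not understand.
(define (labels-match-names? inputs-form)
  (match inputs-form
    (('quasiquote ((labels ('unquote vars)) ...))
     (every (lambda (label var)
              (equal? label (assq-ref %package-names var)))
            labels vars))
    (_ #f)))

;; Read source text as data, the way a tool would see it.
(define (read-form str)
  (call-with-input-string str read))

;; Labels match package names: prints #t
(write (labels-match-names?
        (read-form "`((\"libx11\" ,libx11) (\"libxcb\" ,libxcb))")))
(newline)

;; Label mismatches: prints #f
(write (labels-match-names?
        (read-form "`((\"libX11\" ,libx11) (\"xcb\" ,libxcb))")))
(newline)
```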
guix style can also identify situations where changes would trigger a rebuild but would still be “safe”, without any observable effect. You can force it to make such changes by running:
guix style --input-simplification=safe
Because we’re all nitpicky when it comes to code formatting, guix style had to produce nicely formatted code, and to make local changes as opposed to rewriting complete package definitions. Lisps are famous for being homoiconic, which comes in handy in such a situation.
But the tools at our disposal are not capable enough for this application. First, Scheme’s standard read procedure, which reads an sexp (or an abstract syntax tree, if you will) from a byte stream and returns it, does not preserve comments. Obviously we’d rather not throw away comments, so we came up with our own read variant that preserves comments. Similarly, we have a custom pretty printer that can write comments, allowing it to achieve changes like this:
@@ -2749,18 +2707,17 @@ (define-public debops
                  "debops-debops-defaults-fall-back-to-less.patch"))))
     (build-system python-build-system)
     (native-inputs
-     `(("git" ,git)))
+     (list git))
     (inputs
-     `(("ansible" ,ansible)
-       ("encfs" ,encfs)
-       ("fuse" ,fuse)
-       ("util-linux" ,util-linux) ;for umount
-       ("findutils" ,findutils)
-       ("gnupg" ,gnupg)
-       ("which" ,which)))
+     (list ansible
+           encfs
+           fuse
+           util-linux ;for umount
+           findutils
+           gnupg
+           which))
     (propagated-inputs
-     `(("python-future" ,python-future)
-       ("python-distro" ,python-distro)))
+     (list python-future python-distro))
     (arguments
      `(#:tests? #f
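The comment problem is easy to reproduce in plain Guile. The standard read procedure correctly parses the new-style inputs list from the diff above, but the ;for umount comment is missing from the resulting data. That is why a comment-preserving reader was needed.

```scheme
;; Source text with a trailing comment, as in the debops inputs.
(define source
  "(list ansible\n      util-linux ;for umount\n      which)")

;; The standard reader returns the data but silently drops the comment.
(define form (call-with-input-string source read))

(write form)
(newline)
;; prints (list ansible util-linux which)
```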
The pretty printer also has special rules for input lists. For instance, lists of five inputs or fewer go on a single line, if possible, whereas longer lists end up with one input per line, which is often more convenient, especially when visualizing diffs. It also has rules to format modify-inputs in the same way we’d do it in Emacs:
@@ -171,9 +170,9 @@ (define-public arcan-sdl
     (inherit arcan)
     (name "arcan-sdl")
     (inputs
-     `(("sdl" ,sdl)
-       ,@(fold alist-delete (package-inputs arcan)
-               '("libdrm"))))
+     (modify-inputs (package-inputs arcan)
+       (delete "libdrm")
+       (prepend sdl)))
     (arguments
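The single-line versus one-per-line rule for input lists can be sketched in plain Guile. The hypothetical toy-format-inputs below takes input names as strings. The 79-column limit and the alignment under the first input are assumptions for illustration, not necessarily what guix style uses.

```scheme
;; Up to five inputs go on one line when that line fits within
;; MAX-WIDTH (counting INDENT); otherwise, one input per line,
;; aligned under the first input.
(define* (toy-format-inputs names #:key (indent 0) (max-width 79))
  (let ((one-line (string-append "(list " (string-join names " ") ")")))
    (if (and (<= (length names) 5)
             (<= (+ indent (string-length one-line)) max-width))
        one-line
        (string-append
         "(list "
         (string-join names
                      ;; Newline plus enough spaces to line up with
                      ;; the first name, which follows "(list ".
                      (string-append "\n"
                                     (make-string (+ indent 6) #\space)))
         ")"))))

(display (toy-format-inputs '("python-future" "python-distro")))
(newline)
;; prints (list python-future python-distro)

(display (toy-format-inputs
          '("ansible" "encfs" "fuse" "util-linux" "findutils" "gnupg" "which")))
(newline)
;; prints:
;; (list ansible
;;       encfs
;;       fuse
;;       util-linux
;;       findutils
;;       gnupg
;;       which)
```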
Overall that makes guix style a pretty fun meta-project!
“Worse is better” or “the Right Thing”?
There are several lessons here. One is that having an embedded domain-specific language is what makes such changes possible: yes, package definitions have a declarative feel, but we do not hide the fact that their meaning is determined by the broader Guix framework, starting with the (guix packages) module, which defines the package record type and associated procedures. Having a single repository containing both package definitions and “the package manager” is also a key enabler; we can change the framework, add new linguistic tools, and adjust package definitions at the same time. This is in contrast, for instance, with the approach taken by Nix, where the language implementation lives separately from the package collection.
Another one is that a strong, consistent community leads to consistent changes—not surprisingly in fact. It’s a pleasure to see that we, collectively, can undertake such overarching changes and all look in the same direction.
The last lesson is in how we approach design issues in a project that is now a little more than nine years old. Over these nine years it’s clear that we have usually favored “the right thing” in our design, but not at any cost. This whole change is an illustration of this. It was clear from the early days of Guix that input labels and the associated machinery were suboptimal. But it took us a few years to design and implement the missing pieces: G-expressions, self-referential records, and the various idioms that allow package definitions to benefit from these. In the meantime, we built a complete system and a community, and we gained experience. We cannot claim we attained the right thing, if such a thing exists, but certainly package definitions today are closer to the declarative ideal and easier to teach.
It’s here today!
This big change, along with countless other improvements and package upgrades, is just one guix pull away! After months of development, we have just merged the “core updates” branch, bringing so many new things: from GNOME 41, to GCC 10 by default, to hardened Python packages and improved executable startup times.
This paves the way for the upcoming release, most likely labeled “1.4”, unless a closer review of the changes that have landed leads us to think “2.0” would be more appropriate… Stay tuned!
About GNU Guix
GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.
In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.
Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).