Substitutes are now available as lzip
For a long time, our build farm at ci.guix.gnu.org has been delivering substitutes (pre-built binaries) compressed with gzip. Gzip was never the best choice in terms of compression ratio, but it was a reasonable and convenient choice: it’s rock-solid, and zlib made it easy for us to have Guile bindings to perform in-process compression in our multi-threaded guix publish server.
With the exception of building software from source, downloads are the most time-consuming part of Guix package upgrades. If users can download less, upgrades become faster, and happiness ensues. The time has come to improve on this: starting from early June, Guix can publish and fetch lzip-compressed substitutes, in addition to gzip.
Lzip
Lzip is a relatively little-known compression format, initially developed by Antonio Diaz Diaz ca. 2013. It has several C and C++ implementations with surprisingly few lines of code, which is always reassuring. One of its distinguishing features is a very good compression ratio with reasonable CPU and memory requirements, according to benchmarks published by the authors.
Lzlib provides a well-documented C interface and Pierre Neidhardt set out to write bindings for that library, which eventually landed as the (guix lzlib) module.
With this in place we were ready to start migrating our tools, and then our build farm, to lzip compression, so we can all enjoy smaller downloads. Well, easier said than done!
Migrating
The compression format used for substitutes is not a core component like it can be in “traditional” binary package formats such as .deb, since Guix is conceptually a “source-based” distro. However, deployed Guix installations did not support lzip, so we couldn’t just switch our build farm to lzip overnight; we needed to devise a transition strategy.
Guix asks for the availability of substitutes over HTTP. For example, a question such as:
“Dear server, do you happen to have a binary of /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2 that I could download?”
translates to an HTTP GET of https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo, which returns something like:
StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
Compression: gzip
NarHash: sha256:0h2ibqpqyi3z0h16pf7ii6l4v7i2wmvbrxj4ilig0v9m469f6pm9
NarSize: 134407424
References: 2dk55i5wdhcbh2z8hhn3r55x4873iyp1-libxext-1.3.3 …
FileSize: 48501141
System: x86_64-linux
Deriver: 6xqibvc4v8cfppa28pgxh0acw9j8xzhz-emacs-26.2.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…
(This narinfo format is inherited from Nix and implemented here and here.)
This tells us we can download the actual binary from /nar/gzip/…-emacs-26.2, and that it will be about 46 MiB (the FileSize field). This is what guix publish serves.
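To make this exchange concrete, here is a minimal Guile sketch of the client side. It is not Guix’s actual implementation: the procedure names store-path->narinfo-url, parse-narinfo, and bytes->mib are made up for illustration, and the parser simply assumes one “Key: value” pair per line.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper: "/gnu/store/HASH-NAME" -> "BASE-URL/HASH.narinfo".
;; The hash part of a store file name contains no dash, so the first dash
;; marks where it ends.
(define (store-path->narinfo-url base-url store-path)
  (let* ((base (basename store-path))
         (dash (string-index base #\-)))
    (string-append base-url "/" (substring base 0 dash) ".narinfo")))

;; Hypothetical helper: turn narinfo text into an association list of
;; (KEY . VALUE) string pairs, in order of appearance.  Lines are split at
;; their first colon; lines without a colon are skipped.
(define (parse-narinfo text)
  (filter-map (lambda (line)
                (let ((colon (string-index line #\:)))
                  (and colon
                       (cons (substring line 0 colon)
                             (string-trim-both
                              (substring line (+ colon 1)))))))
              (string-split text #\newline)))

;; Convert a byte count to mebibytes, as an inexact number.
(define (bytes->mib n)
  (/ n (expt 2 20) 1.))

;; A few of the fields from the Emacs narinfo shown above.
(define sample
  "StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
Compression: gzip
NarSize: 134407424
FileSize: 48501141")

(define fields (parse-narinfo sample))
(define file-size (string->number (assoc-ref fields "FileSize")))

;; (bytes->mib file-size) is roughly 46.25, the "about 46 MiB" quoted above.
```

Passing "https://ci.guix.gnu.org" and the Emacs store path to store-path->narinfo-url yields the narinfo URL mentioned earlier.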
The trick we came up with was to allow guix publish to advertise several URLs, one per compression format. Thus, for recently-built substitutes, we get something like this:
StorePath: /gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
URL: nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: gzip
FileSize: 30872887
URL: nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: lzip
FileSize: 18829088
NarHash: sha256:10n3nv3clxr00c9cnpv6x7y2c66034y45c788syjl8m6ga0hbkwy
NarSize: 94372664
References: 05zlxc7ckwflz56i6hmlngr86pmccam2-pcre-8.42 …
System: x86_64-linux
Deriver: vi2jkpm9fd043hm0839ibbb42qrv5xyr-gimp-2.10.12.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…
Notice that there are two occurrences of the URL, Compression, and FileSize fields: one for gzip, and one for lzip. Old Guix instances will just pick the first one, gzip; newer Guix will pick whichever supported method provides the smallest FileSize, usually lzip. This will make migration trivial in the future, should we add support for other compression methods.
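The selection rule just described can be sketched in a few lines of Guile. This is only an illustration under stated assumptions, not the code Guix runs: narinfo-downloads and select-download are hypothetical names, the narinfo is assumed to be already parsed into an ordered association list, and each URL field is assumed to be followed by its own Compression and FileSize fields.

```scheme
(use-modules (srfi srfi-1)
             (ice-9 match))

;; Some of the fields of the GIMP narinfo above, as an ordered alist.
(define gimp-fields
  '(("StorePath" . "/gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12")
    ("URL" . "nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12")
    ("Compression" . "gzip")
    ("FileSize" . "30872887")
    ("URL" . "nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12")
    ("Compression" . "lzip")
    ("FileSize" . "18829088")
    ("NarSize" . "94372664")))

;; Hypothetical helper: group FIELDS into (URL COMPRESSION SIZE) triples.
;; Each "URL" field opens a new triple; the "Compression" and "FileSize"
;; fields that follow fill it in.  Other fields are ignored.
(define (narinfo-downloads fields)
  (reverse
   (fold (lambda (field result)
           (match (cons field result)
             ((("URL" . url) . _)
              (cons (list url #f #f) result))
             ((("Compression" . compression) (url _ size) . rest)
              (cons (list url compression size) rest))
             ((("FileSize" . size) (url compression _) . rest)
              (cons (list url compression (string->number size)) rest))
             (_ result)))
         '()
         fields)))

;; Hypothetical helper: among DOWNLOADS, return the triple with the smallest
;; size whose compression method is listed in SUPPORTED, or #f if none is.
;; Assumes every triple carries a FileSize.
(define (select-download downloads supported)
  (let ((candidates (filter (match-lambda
                              ((_ compression _)
                               (member compression supported)))
                            downloads)))
    (and (pair? candidates)
         (reduce (lambda (a b)
                   (if (< (third a) (third b)) a b))
                 #f
                 candidates))))
```

With this sketch, a client that supports both "gzip" and "lzip" ends up with the lzip triple, a gzip-only client gets the gzip one, and a client that supports neither gets #f. Taking the first triple unconditionally mimics what older clients do.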
Users need to upgrade their Guix daemon to benefit from lzip. On a “foreign distro”, simply run guix pull as root. On standalone Guix systems, run guix pull && sudo guix system reconfigure /etc/config.scm. In both cases, the daemon has to be restarted, be it with systemctl restart guix-daemon.service or with herd restart guix-daemon.
First impressions
This new gzip+lzip scheme has been deployed on ci.guix.gnu.org for a week. Specifically, we run guix publish -C gzip:9 -C lzip:9, meaning that we use the highest compression level for both compression methods. Currently, only a small subset of the package substitutes are available as both lzip and gzip; those that were already available as gzip have not been recompressed. The following Guile program that taps into the API of guix weather allows us to get some insight:
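(use-modules (gnu) (guix)
             (guix monads)
             (guix scripts substitute)
             (srfi srfi-1)
             (ice-9 match))

;; Borrow two private procedures from the implementation of 'guix weather'.
(define all-packages
  (@@ (guix scripts weather) all-packages))

(define package-outputs
  (@@ (guix scripts weather) package-outputs))

;; Return the narinfos of all package outputs that the build farm
;; advertises with lzip compression.
(define (fetch-lzip-narinfos)
  (mlet %store-monad ((items (package-outputs (all-packages))))
    (return
     (filter (lambda (narinfo)
               (member "lzip" (narinfo-compressions narinfo)))
             (lookup-narinfos "https://ci.guix.gnu.org" items)))))

;; Ratio of the lzip file size to the gzip file size; this expects exactly
;; two file sizes, gzip first and lzip second, as in the narinfo above.
(define (lzip/gzip-ratio narinfo)
  (match (narinfo-file-sizes narinfo)
    ((gzip lzip)
     (/ lzip gzip))))

;; Arithmetic mean of LST, as an inexact number.
(define (average lst)
  (/ (reduce + 0 lst)
     (length lst) 1.))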
Let’s explore this at the REPL:
scheme@(guile-user)> (define lst
                       (with-store s
                         (run-with-store s (fetch-lzip-narinfos))))
computing 9,897 package derivations for x86_64-linux...
updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
scheme@(guile-user)> (length lst)
$4 = 2275
scheme@(guile-user)> (average (map lzip/gzip-ratio lst))
$5 = 0.7398994395478715
As of this writing, around 20% of the package substitutes are available as lzip, so take the following stats with a grain of salt. Among those, the lzip-compressed substitute is on average 26% smaller than the gzip-compressed one. What if we consider only packages bigger than 5 MiB uncompressed?
scheme@(guile-user)> (define biggest
                       (filter (lambda (narinfo)
                                 (> (narinfo-size narinfo)
                                    (* 5 (expt 2 20))))
                               lst))
scheme@(guile-user)> (average (map lzip/gzip-ratio biggest))
$6 = 0.5974238562384483
scheme@(guile-user)> (length biggest)
$7 = 440
For those packages, lzip yields substitutes that are 40% smaller on average. Pretty nice! Lzip decompression is slightly more CPU-intensive than gzip decompression, but downloads are bandwidth-bound, so the benefits clearly outweigh the costs.
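For readers who want to check the percentages, the conversion from an lzip/gzip size ratio to “N% smaller” is a one-liner. The sketch below is plain Guile with no Guix dependency; the name savings is made up for illustration, and the GIMP figures are the two FileSize values from the narinfo shown earlier.

```scheme
;; Hypothetical helper: percentage saved, given RATIO = lzip size / gzip size.
(define (savings ratio)
  (* 100 (- 1 ratio)))

;; The two averages obtained at the REPL above.
(define all-savings (savings 0.7398994395478715))      ; roughly 26
(define biggest-savings (savings 0.5974238562384483))  ; roughly 40

;; The GIMP substitute alone: 18829088 bytes as lzip versus 30872887 as gzip.
(define gimp-savings (savings (/ 18829088 30872887.))) ; roughly 39
```

The GIMP substitute, at about 90 MiB uncompressed, lands close to the average for the bigger packages.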
Going forward
The switch from gzip to lzip has the potential to make upgrades “feel” faster, and that is great in itself.
Fundamentally though, in this project we’ve always looked at peer-to-peer solutions with envy. Of course, the main motivation is to have a community-supported and resilient infrastructure, rather than a centralized one, and that vision goes hand-in-hand with reproducible builds.
We started working on an extension to publish and fetch substitutes over IPFS. Thanks to its content-addressed nature, IPFS has the potential to further reduce the amount of data that needs to be downloaded on an upgrade.
The good news is that IPFS developers are also interested in working with package manager developers, and I bet there’ll be interesting discussions at IPFS Camp in just a few days. We’re eager to pursue our IPFS integration work, and if you’d like to join us and hack the good hack, let’s get in touch!
About GNU Guix
GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.
In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.
Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).