Substitutes are now available as lzip

For a long time, our build farm at ci.guix.gnu.org has been delivering substitutes (pre-built binaries) compressed with gzip. Gzip was never the best choice in terms of compression ratio, but it was a reasonable and convenient choice: it’s rock-solid, and zlib made it easy for us to have Guile bindings to perform in-process compression in our multi-threaded guix publish server.

With the exception of building software from source, downloads take the most time of Guix package upgrades. If users can download less, upgrades become faster, and happiness ensues. Time has come to improve on this, and starting from early June, Guix can publish and fetch lzip-compressed substitutes, in addition to gzip.

Lzip

Lzip is a relatively little-known compression format, initially developed by Antonio Diaz Diaz ca. 2013. It has several C and C++ implementations with surprisingly few lines of code, which is always reassuring. One of its distinguishing features is a very good compression ratio with reasonable CPU and memory requirements, according to benchmarks published by the authors.

Lzlib provides a well-documented C interface and Pierre Neidhardt set out to write bindings for that library, which eventually landed as the (guix lzlib) module.

With this in place we were ready to start migrating our tools, and then our build farm, to lzip compression, so we can all enjoy smaller downloads. Well, easier said than done!

Migrating

The compression format used for substitutes is not a core component like it can be in “traditional” binary package formats such as .deb since Guix is conceptually a “source-based” distro. However, deployed Guix installations did not support lzip, so we couldn’t just switch our build farm to lzip overnight; we needed to devise a transition strategy.

Guix asks for the availability of substitutes over HTTP. For example, a question such as:

“Dear server, do you happen to have a binary of /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2 that I could download?”

translates into prose to an HTTP GET of https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo, which returns something like:

StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
Compression: gzip
NarHash: sha256:0h2ibqpqyi3z0h16pf7ii6l4v7i2wmvbrxj4ilig0v9m469f6pm9
NarSize: 134407424
References: 2dk55i5wdhcbh2z8hhn3r55x4873iyp1-libxext-1.3.3 …
FileSize: 48501141
System: x86_64-linux
Deriver: 6xqibvc4v8cfppa28pgxh0acw9j8xzhz-emacs-26.2.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…

(This narinfo format is inherited from Nix and implemented here and here.) This tells us we can download the actual binary from /nar/gzip/…-emacs-26.2, and that it will be about 46 MiB (the FileSize field.) This is what guix publish serves.

The trick we came up with was to allow guix publish to advertise several URLs, one per compression format. Thus, for recently-built substitutes, we get something like this:

StorePath: /gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
URL: nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: gzip
FileSize: 30872887
URL: nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: lzip
FileSize: 18829088
NarHash: sha256:10n3nv3clxr00c9cnpv6x7y2c66034y45c788syjl8m6ga0hbkwy
NarSize: 94372664
References: 05zlxc7ckwflz56i6hmlngr86pmccam2-pcre-8.42 …
System: x86_64-linux
Deriver: vi2jkpm9fd043hm0839ibbb42qrv5xyr-gimp-2.10.12.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…

Notice that there are two occurrences of the URL, Compression, and FileSize fields: one for gzip, and one for lzip. Old Guix instances will just pick the first one, gzip; newer Guix will pick whichever supported method provides the smallest FileSize, usually lzip. This will make migration trivial in the future, should we add support for other compression methods.

Users need to upgrade their Guix daemon to benefit from lzip. On a “foreign distro”, simply run guix pull as root. On standalone Guix systems, run guix pull && sudo guix system reconfigure /etc/config.scm. In both cases, the daemon has to be restarted, be it with systemctl restart guix-daemon.service or with herd restart guix-daemon.

First impressions

This new gzip+lzip scheme has been deployed on ci.guix.gnu.org for a week. Specifically, we run guix publish -C gzip:9 -C lzip:9, meaning that we use the highest compression ratio for both compression methods.

Currently, only a small subset of the package substitutes are available as both lzip and gzip; those that were already available as gzip have not been recompressed. The following Guile program that taps into the API of guix weather allows us to get some insight:

(use-modules (gnu) (guix)
             (guix monads)
             (guix scripts substitute)
             (srfi srfi-1)
             (ice-9 match))

(define all-packages
  (@@ (guix scripts weather) all-packages))

(define package-outputs
  (@@ (guix scripts weather) package-outputs))

(define (fetch-lzip-narinfos)
  (mlet %store-monad ((items (package-outputs (all-packages))))
    (return
     (filter (lambda (narinfo)
               (member "lzip" (narinfo-compressions narinfo)))
             (lookup-narinfos "https://ci.guix.gnu.org" items)))))

(define (lzip/gzip-ratio narinfo)
  (match (narinfo-file-sizes narinfo)
    ((gzip lzip)
     (/ lzip gzip))))

(define (average lst)
  (/ (reduce + 0 lst)
     (length lst) 1.))

Let’s explore this at the REPL:

scheme@(guile-user)> (define lst
                       (with-store s
                         (run-with-store s (fetch-lzip-narinfos))))
computing 9,897 package derivations for x86_64-linux...
updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
scheme@(guile-user)> (length lst)
$4 = 2275
scheme@(guile-user)> (average (map lzip/gzip-ratio lst))
$5 = 0.7398994395478715

As of this writing, around 20% of the package substitutes are available as lzip, so take the following stats with a grain of salt. Among those, the lzip-compressed substitute is on average 26% smaller than the gzip-compressed one. What if we consider only packages bigger than 5 MiB uncompressed?

scheme@(guile-user)> (define biggest
                       (filter (lambda (narinfo)
                                 (> (narinfo-size narinfo)
                                    (* 5 (expt 2 20))))
                               lst))
scheme@(guile-user)> (average (map lzip/gzip-ratio biggest))
$6 = 0.5974238562384483
scheme@(guile-user)> (length biggest)
$7 = 440

For those packages, lzip yields substitutes that are 40% smaller on average. Pretty nice! Lzip decompression is slightly more CPU-intensive than gzip decompression, but downloads are bandwidth-bound, so the benefits clearly outweigh the costs.

Going forward

The switch from gzip to lzip has the potential to make upgrades “feel” faster, and that is great in itself.

Fundamentally though, we’ve always been looking in this project at peer-to-peer solutions with envy. Of course, the main motivation is to have a community-supported and resilient infrastructure, rather than a centralized one, and that vision goes hand-in-hand with reproducible builds.

We started working on an extension to publish and fetch substitutes over IPFS. Thanks to its content-addressed nature, IPFS has the potential to further reduce the amount of data that needs to be downloaded on an upgrade.

The good news is that IPFS developers are also interested in working with package manager developers, and I bet there’ll be interesting discussions at IPFS Camp in just a few days. We’re eager to pursue our IPFS integration work, and if you’d like to join us and hack the good hack, let’s get in touch!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

Sujets liés :

Performance Scheme API

Sauf indication contraire, les billets de blog de ce site sont la propriété de leurs auteurs respectifs et publiés sous les termes de la licence CC-BY-SA 4.0 et ceux de la GNU Free Documentation License (version 1.3 ou supérieur, sans section invariante, sans texte de préface ni de postface).