Source code archiving in Guix: new publication

We are glad to announce the publication of a new research paper entitled Source Code Archiving to the Rescue of Reproducible Deployment for the ACM Conference on Reproducibility and Replicability. The paper presents work that has been done since we started connecting Guix with the Software Heritage (SWH) archive five years ago:

The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.

We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.

The ability to retrieve package source code is important for researchers who need to be able to replay scientific workflows, but it’s just as important for engineers and developers alike, who may also have good reasons to redeploy or to audit past package sets.

Support for source code archiving and recovery in Guix has improved a lot over the past five years, in particular with:

Diagram taken from the paper showing Disarchive tarball “disassembly” and “assembly”.

Graph taken from the paper showing package source code archival coverage over time.

94% of the packages in a January 2024 snapshot of Guix are known to have their source code archived!

Check out the paper to learn more about the machinery at play and the current status.

Related topics:

Research

Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).