Source code archiving in Guix: new publication
We are glad to announce the publication of a new research paper entitled Source Code Archiving to the Rescue of Reproducible Deployment for the ACM Conference on Reproducibility and Replicability. The paper presents work that has been done since we started connecting Guix with the Software Heritage (SWH) archive five years ago:
The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.
We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.
The ability to retrieve package source code is important for researchers who need to be able to replay scientific workflows, but it’s just as important for engineers and developers alike, who may also have good reasons to redeploy or to audit past package sets.
Support for source code archiving and recovery in Guix has improved a lot over the past five years, in particular with:
- Support for recovering source code tarballs (
tar.gz
and similar files): this is made possible by Disarchive, written by Timothy Sample.
- The ability to look up data by nar hash in the SWH archive (“nar” is the normalized archive format used by Nix and Guix), thanks to fellow SWH hackers. This, in turn, allows Guix to look up any version control checkout by content hash—Git, Subversion, Mercurial, you name it!
- The monitoring of archival coverage with Timothy’s Preservation of Guix reports has allowed us to identify discrepancies in Guix, Disarchive, and/or SWH and to increase archival coverage.
94% of the packages in a January 2024 snapshot of Guix are known to have their source code archived!
Check out the paper to learn more about the machinery at play and the current status.
Wenn nicht anders angegeben, sind Blogeinträge auf diesem Webauftritt urheberrechtlich geschützt zugunsten ihrer jeweiligen Verfasser und veröffentlicht zu den Bedingungen der Lizenz CC-BY-SA 4.0 und der GNU Free Documentation License (Version 1.3 der Lizenz oder einer späteren Version, ohne unveränderliche Abschnitte, ohne vorderen Umschlagtext und ohne hinteren Umschlagtext).