Source code archiving in Guix: new publication

We are glad to announce the publication of a new research paper entitled Source Code Archiving to the Rescue of Reproducible Deployment for the ACM Conference on Reproducibility and Replicability. The paper presents work that has been done since we started connecting Guix with the Software Heritage (SWH) archive five years ago:

The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.

We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.

The ability to retrieve package source code is important for researchers who need to be able to replay scientific workflows, but it’s just as important for engineers and developers alike, who may also have good reasons to redeploy or to audit past package sets.

Support for source code archiving and recovery in Guix has improved a lot over the past five years, in particular with:

Diagram taken from the paper showing Disarchive tarball “disassembly” and “assembly”.

Graph taken from the paper showing package source code archival coverage over time.

94% of the packages in a January 2024 snapshot of Guix are known to have their source code archived!

Check out the paper to learn more about the machinery at play and the current status.

연관된 주제:

Research

달리 명시되지 않는 한, 이와 같은 사이트의 블로그 게시물은 해당 작성자에게 저작권이 있고 CC-BY-SA 4.0 라이선스 및 GNU 자유 문서 라이선스 (버전 1.3 이상, 고정 부분 없음, 앞-표지 텍스트 없음, 뒤-표지 텍스트 없음)의 조건에 따라 게시됩니다.