https://guix.gnu.org/feeds/blog/guix-data-service.atomGNU Guix — Blog — Guix Data Servicefeed author nameGNU Guixhttps://guix.gnu.org/themes/initial/img/icon.png2024-03-20T10:57:53Zhttps://guix.gnu.org/blog/2020/introduction-to-the-guix-data-service-the-missing-blog-post//Introduction to the Guix Data Service, the missing blog postChristopher Baines2020-11-08T20:30:00Z2020-11-08T20:30:00Z<p>The <a href="https://git.savannah.gnu.org/cgit/guix/data-service.git/">Guix Data Service</a> processes, stores and
provides data about Guix over time, at least that is what the
<a href="https://data.guix.gnu.org/README">README</a> says. It's been around since the <a href="https://lists.gnu.org/archive/html/guix-devel/2019-02/msg00089.html">start of
2019</a>, and while there have been plenty of long emails
to the guix-devel mailing list about it and a <a href="https://guix.gnu.org/en/blog/2020/improve-internationalization-support-for-the-guix-data-service/">blog post about a
related Outreachy project</a>, this is the first
blog post covering what it is and why it exists.</p><h1>Why?</h1><p>The initial motivation came from trying to automate aspects of
reviewing patches for Guix. If you have some patches for Guix, one
aspect of review might be to apply the patches and then build the
affected packages. How do you know what packages are affected though?</p><p>You could try and guess based on the content of the patches, and this
could work some of the time, but because Guix packages relate to one
another, changing one package may cause dependent packages to change.
Additionally, there are places in Guix where small changes could
affect a large number of packages, build systems for example. The
<code>guix refresh -l</code> command is really helpful when testing packages
locally, but in some cases it can miss packages that are affected
by changes, as it only explores the package graph.</p><p>The approach taken to work out which packages are affected by a set
of patches was to record information about all the packages in the
"base" revision of Guix, prior to applying the patches, and also
record information about the "target" revision generated from applying
the patches. With all that information about the two revisions, you
can then compare the data to determine what's changed. This goes
beyond finding out what packages are affected, and includes things
like looking at changes to lint warnings, channel news entries, and
more.</p><p><a href="https://data.guix.gnu.org/compare?base_commit=f503cfc9c51ea4ddd6cc9c027f1897e7866e411e&target_commit=f161bd2cd7af6a0a7027a2e4ed97912027d5033d"><img src="https://guix.gnu.org/static/blog/img/data-guix-gnu-org-compare.png" alt="Screenshot of the comparison between two commits" /></a></p><p>This approach of storing information about revisions has applications
beyond reviewing patches, which is another reason why this approach
was taken. While the Guix Data Service doesn't bring new knowledge to
the world, it can make information that is out there more accessible,
and that improved accessibility is a feature.</p><h1>Applications</h1><p>Say you want to know when the previous version of a package was
available, and what that version was. You could look through the Git
repository history, or inspect previous revisions to find out, but
because the Guix Data Service can store the available package names
and versions in a range of revisions, it can provide this information
more quickly and with less effort.</p><p><a href="https://data.guix.gnu.org/repository/1/branch/master/package/emacs"><img src="https://guix.gnu.org/static/blog/img/data-guix-gnu-org-emacs-versions.png" alt="Screenshot of a Guix Data Service package versions page for emacs" /></a></p><p>Now, questions about package versions are something a user of Guix
might have. However, so far I haven't seen the Guix Data Service as
something that users of Guix should necessarily use or be aware of.
Instead, I think it has a place to provide information to enable
things that users of Guix would directly use.</p><p>There are a few applications of data from the Guix Data Service in
varying states of development. I've been attempting to automate parts
of a <a href="https://git.cbaines.net/guix/weekly-news/">weekly news publication about Guix</a> using
the Guix Data Service. I've also been writing a <a href="https://git.cbaines.net/guix/build-coordinator/">service for building
derivations</a>, which I've been using in
conjunction with the Guix Data Service to provide substitutes. As
part of an Outreachy internship on improving internationalisation
support in the Guix Data Service, Danjela worked on creating a package
search page for the Guix website, which wrapped the package search
functionality in the Guix Data Service.</p><p><a href="https://prototype-guix-weekly-news.cbaines.net/"><img src="https://guix.gnu.org/static/blog/img/prototype-guix-weekly-news.png" alt="Screenshot of the prototype weekly news site" /></a></p><p>While I'm cautious about having the Guix Data Service attempt to
address individual user needs, there are some applications where it
alone is sufficient. I've been using the Guix Data Service to gather
up data about which packages in Guix don't build reproducibly.
Hopefully the Guix Data Service is well positioned to help with
technical questions like this.</p><p><a href="https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-reproducibility"><img src="https://guix.gnu.org/static/blog/img/data-guix-patches-package-reproducibility.png" alt="Screenshot of the Guix Data Service package reproducibility page" /></a></p><h1>Architecture</h1><p>The Guix Data Service is written in Guile, and uses PostgreSQL for the
database. There are plenty of SQL queries in the code, including some
quite long ones.</p><p>There are several scripts which act as entry points to different parts
of the Guile codebase:</p><ul><li><p><code>guix-data-service</code></p><ul><li>Provides the web interface</li></ul></li><li><p><code>guix-data-service-process-jobs</code></p><ul><li>Polls the database for new jobs, and forks <code>guix-data-service-process-job</code></li></ul></li><li><p><code>guix-data-service-process-job</code></p><ul><li>Processes an individual job, loading data for a single revision</li></ul></li><li><p><code>guix-data-service-process-branch-updated-email</code></p><ul><li><p>Processes an email to find out about new revisions</p></li></ul></li></ul><p>There are also other scripts which perform a range of functions, like
backing up the database, generating a minimal database which is
hopefully small in size, and querying build/substitute servers for
information.</p><p>When running on a Guix system, there's a <a href="https://guix.gnu.org/manual/devel/en/html_node/Guix-Services.html#Guix-Data-Service">service to help with
deployment</a>.</p><h1>Getting information in</h1><p>Rather than polling the Git repository to find out about new
revisions, the methodology used so far has been to receive emails. In
the case of the main Guix Git repository, this can work as follows:</p><ul><li>New commits are pushed to the Guix Git repository on Savannah</li><li>A post-receive hook sends an email about the branch that's been
updated, plus emails about each commit</li><li>A dedicated email account is used to subscribe to guix-commits, and
this receives the emails</li><li><code>getmail</code> runs as a service on the machine running the Guix
Data Service; it receives the emails and calls
<code>guix-data-service-process-branch-updated-email</code>, passing the
contents in on stdin</li><li>The script reads the email and inserts the relevant data about
the branch that was updated, the time it was updated, and also
inserts a new job, representing the new revision to be processed</li></ul><p>Compared to polling the Git repository, this approach has a few
advantages:</p><ul><li>The time the email was sent is a good proxy for when the branch was
updated</li><li>Receiving emails promptly, for example by having <code>getmail</code> use IMAP IDLE,
helps with learning of changes quickly</li><li>The email account provides some reliability, so messages aren't
missed if the Guix Data Service is down; they'll just be processed
later</li><li>The mbox files for the guix-commits mailing list can be processed
to provide data for past revisions, in years when the Guix Data
Service wasn't running for example</li></ul><p>When the <code>guix-data-service-process-job</code> script runs, it goes through
a long process to extract information about that revision of Guix, and
store it in the database.</p><p>The first part of this is to actually fetch and build the relevant
revision. The Guix Data Service uses <a href="https://guix.gnu.org/manual/devel/en/html_node/Channels.html">channels</a>
and <a href="https://guix.gnu.org/manual/devel/en/html_node/Inferiors.html">inferiors</a>, the same code used by <code>guix pull</code> and <code>guix time-machine</code> for communication with another revision
of Guix. It's through the inferior REPL that information from the
target revision is extracted.</p><p>In addition to receiving information about new revisions, the Guix
Data Service can accept POST requests to receive information about
builds. There's some support in Cuirass and the Guix Build
Coordinator to send these requests.</p><h1>Storing all that information</h1><p>Following on from one of the initial motivations for the Guix Data
Service, comparing two revisions to determine which packages have
changed, the schema for the database is organised to facilitate fast
comparisons between two arbitrary revisions. The compromise here is
the storage space taken up.</p><p>Similar to version control systems, an alternative schema would have
been to store the differences between revisions in a linked list or
tree. This would avoid storing lots of information that generally
doesn't often change between subsequent revisions, but at the expense
of making both determining the entire state of individual revisions
and comparing arbitrary revisions more complex and costly.</p><p>Even though all the information about each revision is associated with
that revision, there is some indirection and deduplication involved.
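</p><p>As a rough illustration of this kind of indirection, here is a minimal sketch in Python, with in-memory dictionaries standing in for database tables. It is not the Guix Data Service schema or code: the <code>Store</code> class, its methods, the row shape and the store paths are all hypothetical, chosen only to show how identical rows can be stored once and shared between revisions, and how two revisions can then be compared by looking at the sets of row ids they reference.</p>

```python
from typing import Dict, Set, Tuple

# Hypothetical row shape: (package name, version, derivation file name, system, target).
Row = Tuple[str, str, str, str, str]


class Store:
    """In-memory stand-in for a deduplicated table plus a revision link table."""

    def __init__(self) -> None:
        self.row_ids: Dict[Row, int] = {}             # unique row -> id
        self.revision_rows: Dict[str, Set[int]] = {}  # commit -> ids of its rows

    def intern_row(self, row: Row) -> int:
        """Return the id for row, storing the row only if it is new."""
        if row not in self.row_ids:
            self.row_ids[row] = len(self.row_ids) + 1
        return self.row_ids[row]

    def load_revision(self, commit: str, rows: Set[Row]) -> None:
        """Associate a revision with the ids of all of its rows."""
        self.revision_rows[commit] = {self.intern_row(r) for r in rows}

    def compare(self, base: str, target: str) -> Tuple[Set[Row], Set[Row]]:
        """Return (rows only in base, rows only in target)."""
        by_id = {i: r for r, i in self.row_ids.items()}
        base_ids = self.revision_rows[base]
        target_ids = self.revision_rows[target]
        removed = {by_id[i] for i in base_ids - target_ids}
        added = {by_id[i] for i in target_ids - base_ids}
        return removed, added


# Made-up example data; the store paths are placeholders, not real derivations.
store = Store()
hello_old = ("hello", "2.10", "/gnu/store/aaa-hello-2.10.drv", "x86_64-linux", "")
hello_new = ("hello", "2.12", "/gnu/store/bbb-hello-2.12.drv", "x86_64-linux", "")
sed = ("sed", "4.8", "/gnu/store/ccc-sed-4.8.drv", "x86_64-linux", "")

store.load_revision("base", {hello_old, sed})
store.load_revision("target", {hello_new, sed})

removed, added = store.compare("base", "target")
print(len(store.row_ids))  # 3: the unchanged sed row is stored once and shared
print(removed == {hello_old}, added == {hello_new})  # True True
```

<p>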
For example, each revision is associated with entries in the
<code>package_derivations</code> table, which represents a package plus
derivation for a specific system and target. If this information
doesn't differ between two revisions, they'll just reference the same
entries in this table.</p><h1>Making information available</h1><p>Currently, the Guix Data Service provides a web interface. The HTML
pages and forms are designed to help potential users of the Guix Data
Service find and explore the available data. When using the Guix Data
Service, you'd probably want a more machine readable form for the
data, rather than HTML, and at the moment that's JSON. You should be
able to request JSON either through the HTTP Accept header, or by
using the <code>.json</code> extension on the URL path.</p><h1>Deployments</h1><p>There's not just one deployment of the Guix Data Service; currently I
know of two. There's <a href="https://data.guix.gnu.org/">data.guix.gnu.org</a>
which just tracks the master branch of Guix, and has data going back
to roughly the start of 2019 (with some gaps). There's also
<a href="https://data.guix-patches.cbaines.net/">data.guix-patches.cbaines.net</a>
which isn't limited to just the master branch, and has additional
branches constructed from patches that are submitted, but doesn't have
much historical data.</p><h1>Looking forward</h1><p>There are still lots of areas where the Guix Data Service can be
improved.</p><p>It would be convenient if getting data into the Guix Data Service was
faster. Currently there's quite a delay between the Guix Data Service
finding out about a new revision and finishing processing it.</p><p>The processing could also be improved. There are some notable current
omissions like the package graph (inputs, propagated-inputs and
native-inputs) as well as package replacements (grafts). It would also
be interesting to see if the Guix Data Service could be generalised to
process other channels, instead of or in addition to the main Guix
channel.</p><p>In the future, I'd like to make the data available in formats other
than JSON, like RDF. I'd also like for it to be possible to
watch/subscribe to particular things that the Guix Data Service knows
about; the Guix Data Service would then notify you via some means that
there's been a change. This could enable all sorts of applications to
respond to changes connected to Guix.</p><h2>Additional reading</h2><p>To provide some information, and to help get my own thoughts in order,
I sent out semi-regular emails about the Guix Data Service over the
last two years, I've linked to most of these below:</p><ul><li><a href="https://lists.gnu.org/archive/html/guix-devel/2020-06/msg00034.html">2020/06/03 - Build reproducibility metrics</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2020-05/msg00153.html">2020/05/07 - April update on data.guix.gnu.org (Guix Data Service)</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2020-03/msg00476.html">2020/03/30 - Patchwork + the Guix Data Service for assisting with patch review</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2020-03/msg00454.html">2020/03/29 - March update on data.guix.gnu.org (Guix Data Service)</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2020-02/msg00268.html">2020/02/17 - February update on data.guix.gnu.org and the Guix Data Service</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00073.html">2020/01/05 - Another update on the Guix Data Service</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2019-09/msg00277.html">2019/09/30 - Anyone interested in getting involved with the Guix Data Service?</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2019-09/msg00104.html">2019/09/08 - Guix Data Service - September update</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2019-05/msg00332.html">2019/05/17 - More progress with the Guix Data Service</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2019-05/msg00127.html">2019/05/06 - Linting, and how to get the information in to the Guix Data Serivce</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2019-04/msg00094.html">2019/04/04 - Progress with the Guix Data Service</a></li><li><a href="https://lists.gnu.org/archive/html/guix-devel/2019-02/msg00089.html">2019/02/08 - Tracking and inspecting how Guix changes over time</a></li></ul><h4>About GNU Guix</h4><p><a href="https://www.gnu.org/software/guix">GNU Guix</a> is a 
transactional package
manager and an advanced distribution of the GNU system that <a href="https://www.gnu.org/distros/free-system-distribution-guidelines.html">respects
user
freedom</a>.
Guix can be used on top of any system running the Hurd or the Linux
kernel, or it can be used as a standalone operating system distribution
for i686, x86_64, ARMv7, and AArch64 machines.</p><p>In addition to standard package management features, Guix supports
transactional upgrades and roll-backs, unprivileged package management,
per-user profiles, and garbage collection. When used as a standalone
GNU/Linux distribution, Guix offers a declarative, stateless approach to
operating system configuration management. Guix is highly customizable
and hackable through <a href="https://www.gnu.org/software/guile">Guile</a>
programming interfaces and extensions to the
<a href="http://schemers.org">Scheme</a> language.</p>https://guix.gnu.org/blog/2020/improve-internationalization-support-for-the-guix-data-service//Improve Internationalization Support for the Guix Data ServiceDanjela Lura2020-07-23T12:00:00Z2020-07-23T12:00:00Z<p>The first half of my <a href="https://www.outreachy.org/">Outreachy</a>
internship is already over and I am really excited to share my
experience. Over the past weeks I’ve had the opportunity to work on
the <a href="https://data.guix.gnu.org/">Guix Data Service</a>, watch myself
change, and accomplish way more than I thought I would.</p><p>The Guix Data Service processes, stores and provides data about Guix
over time. It provides a complementary interface to Guix itself by
having a web interface and API to browse and access the data.</p><p>The work I have done so far revolves around storing translated <a href="https://guix.gnu.org/manual/en/html_node/Invoking-guix-lint.html">lint
checker descriptions</a> as well as package synopsis and
descriptions in the Guix Data Service PostgreSQL database and making
them available through the Guix Data Service web interface.</p><p>Initially the Guix Data Service database had translated versions of
lint warning messages available, but they were not accessible through
the web interface, so I made that possible during the <a href="https://www.outreachy.org/docs/applicant/#make-contributions">contribution
period</a>.</p><p>Working on making lint warning messages available on the web interface
made it easier for me to understand how translations for lint checker
descriptions and package synopsis and descriptions would be stored in
the database and later on be made available through the Guix Data
Service web interface. At this point, the Guix Data Service supports
package synopsis and descriptions as well as lint checker descriptions
in various locales.</p><p><img src="/static/blog/img/guix-data-service-audacity.png" alt="Guix Data Service page for the audacity package, in the Spanish locale" /></p><p>Hopefully these changes will provide the Guix Data Service users with
a more feasible way to interact with Guix data.</p><p>I have to note that this is my first internship and I was initially
reluctant to believe that I would be able to tackle or successfully
accomplish the tasks I was assigned, but with my mentor’s help and
guidance I managed to. So far it has been a rewarding experience
because it has helped me make progress in so many aspects, whilst
contributing to a project that will potentially increase inclusion.</p><p>While working on this project, I’ve significantly improved my Guile,
SQL, and Git skills and I am now more aware of how software
localization is achieved. In addition to getting more technically
skilled, this internship has taught me how to manage time and emotions
when dealing with more than one activity at a time.</p><p>Now that a good share of what was initially planned to be done is
accomplished, my mentor suggested working on something using the Guix
Data Service data and I will be engaged in that during the remaining
half.</p><p>These first 7 weeks of my internship have gone by really fast, but I
have enjoyed everything and I am so eager to experience what's to
come.</p>