Dec 17, 2022 · Julian K. Arni

Nix, Caching, and CIs

Caching in a Nix CI is remarkably easy, safe, and effective!

I recently made Garnix run checks against PRs from forks — previously it just skipped them. Thinking of the security issues usually associated with running CI checks on untrusted patches, I had to feel grateful to Nix's way of doing things — it saved me a lot of trouble.

There are two primary security issues: access to secrets, and poisoning artifacts. I won't talk about the first — Garnix doesn't currently support secrets. The common example, in the second category, is uploading malicious artifacts to the build cache.

Take the example workflow for GitHub's cache action:

name: Caching Primes
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Cache Primes
        id: cache-primes
        uses: actions/cache@v3
        with:
          path: prime-numbers
          key: ${{ runner.os }}-primes
      - name: Generate Prime Numbers
        if: steps.cache-primes.outputs.cache-hit != 'true'
        run: /generate-primes.sh -d prime-numbers
      - name: Use Prime Numbers
        run: /primes.sh -d prime-numbers

If uploading to the cache had no restrictions on untrusted input (i.e., external PRs), a malicious user could disable the cache-hit check, write a bad list of primes to prime-numbers (8 is now prime! 11 isn't!), and from then on all checks would use that bad list instead of the right one. Fake news, math edition.

The GitHub action prevents this by scoping the cache by branch. This means PRs (and branches) don't get to use one another's cache, only their own (and main/master's), even if they're in the same repo. (Even a "scope by originating repo" policy would not be sufficient, since it's easy enough with this type of setup to accidentally poison the cache.)
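To make the scoping idea concrete, here is a minimal Python sketch of a branch-scoped cache. It is only a toy model of the behavior described above, not GitHub's actual implementation: the class name BranchScopedCache, its methods, and the branch names are all made up for illustration.

```python
from typing import Optional


class BranchScopedCache:
    """Toy model: entries are stored per (branch, key) pair."""

    def __init__(self, default_branch: str = "main") -> None:
        self.default_branch = default_branch
        self._entries = {}  # maps (branch, key) -> cached bytes

    def save(self, branch: str, key: str, value: bytes) -> None:
        # A branch can only ever write into its own scope.
        self._entries[(branch, key)] = value

    def restore(self, branch: str, key: str) -> Optional[bytes]:
        # Look in the branch's own scope first, then fall back to the
        # default branch. Other branches' scopes are never consulted.
        for scope in (branch, self.default_branch):
            hit = self._entries.get((scope, key))
            if hit is not None:
                return hit
        return None


cache = BranchScopedCache()
cache.save("main", "Linux-primes", b"2 3 5 7 11")
cache.save("evil-pr", "Linux-primes", b"2 3 5 7 8")  # poisoned, but scoped

from_main = cache.restore("main", "Linux-primes")
from_evil = cache.restore("evil-pr", "Linux-primes")
from_other = cache.restore("other-pr", "Linux-primes")
```

The poisoned entry is only ever visible to evil-pr itself. The price is that the key says nothing about what was actually built, so branches cannot safely share anything beyond what main/master has already cached.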

Compare this to the situation with Garnix. Not only can you share the cache between branches, and between the target and origin repos in a PR, but even between forks that never interact again after being forked. Or between fully independent repos that happen to have the same package.

This means, in turn, that if you have repo foo that depends on repo bar, and both have Garnix enabled, you don't have to do anything special to get the artifacts/binaries from bar in foo!

(A "problem" we had was that in the tutorial repo, hello-garnix, which has a bug users are asked to fix in their own fork after enabling Garnix, people would push their changes to their fork and see the build succeed almost immediately, when the build should have taken a few seconds. This is quite confusing! What was happening was that a different user had already made the exact same change in their fork, so the result was already built and in cache. Moreover, unlike with other CIs, you don't have to do anything special to get the cache running — it just works — so people weren't even necessarily aware that caching existed.)

How does this work? The essence of the idea is that the cache key in Nix (which is per package, rather than per repo) is the hash of the definition of the package: the source code and the hashes of the dependencies. In the primes example above, main/master would get the version of the primes cached in branch my-potentially-evil-pr if and only if both branches had the same source code for generate-primes.sh, which is exactly the right behavior!
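The idea can be sketched in a few lines of Python. This is a simplified illustration, not Nix's real hashing scheme (Nix hashes a full derivation and encodes the result differently); the function cache_key, the package names, and the script contents are assumptions made up for the example.

```python
import hashlib
from typing import Iterable


def cache_key(name: str, source: bytes, dep_keys: Iterable[str]) -> str:
    """Derive a key from a package's definition: its name, its source,
    and the keys of its dependencies (hypothetical scheme)."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(b"\0")
    h.update(hashlib.sha256(source).digest())
    for dep in sorted(dep_keys):  # sorted so dependency order is irrelevant
        h.update(b"\0")
        h.update(dep.encode())
    return h.hexdigest()


good_script = b"#!/bin/sh\n# prints the real primes\n"
evil_script = b"#!/bin/sh\n# claims 8 is prime\n"

key_main = cache_key("primes", good_script, [])
key_pr_same = cache_key("primes", good_script, [])  # PR did not touch the script
key_pr_evil = cache_key("primes", evil_script, [])  # PR changed the script

# A package that depends on the primes inherits the difference.
consumer_main = cache_key("use-primes", b"cat prime-numbers\n", [key_main])
consumer_evil = cache_key("use-primes", b"cat prime-numbers\n", [key_pr_evil])

# One cache shared by every branch, fork, and repo.
shared_cache = {key_main: "2 3 5 7 11"}
hit_for_same_pr = shared_cache.get(key_pr_same)
hit_for_evil_pr = shared_cache.get(key_pr_evil)
```

A PR that leaves the script untouched computes the same key as main/master and gets a cache hit. A PR that changes the script computes a different key, so whatever it builds lands under that new key and can never be served to a build of the original definition. Because dependency keys feed into the key, the same holds for everything downstream.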
