Nix, Caching, and CIs
Caching in a Nix CI is remarkably easy, safe, and effective!
I recently made Garnix run checks against PRs from forks; previously it just skipped them. Thinking about the security issues usually associated with running CI checks on untrusted patches, I felt grateful for Nix's way of doing things, which saved me a lot of trouble.
There are two primary security issues: access to secrets, and poisoning artifacts. I won't talk about the first — Garnix doesn't currently support secrets. The common example, in the second category, is uploading malicious artifacts to the build cache.
Take the example workflow for GitHub's cache action:
name: Caching Primes

on: push

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Cache Primes
      id: cache-primes
      uses: actions/cache@v3
      with:
        path: prime-numbers
        key: ${{ runner.os }}-primes

    - name: Generate Prime Numbers
      if: steps.cache-primes.outputs.cache-hit != 'true'
      run: /generate-primes.sh -d prime-numbers

    - name: Use Prime Numbers
      run: /primes.sh -d prime-numbers
If uploading to the cache had no restrictions on untrusted input (i.e., external PRs), a malicious user could disable the cache-hit check, write a bad list of primes to prime-numbers (8 is now prime! 11 isn't!), and from then on all checks would use that bad list instead of the right one. Fake news, math edition.
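To make the attack concrete, here is a toy Python model of a cache with a fixed key and no restriction on who may write to it. Everything in it is a hypothetical stand-in for illustration (the `UnscopedCache` class, the two job functions, the literal key `"Linux-primes"`), and the "first writer wins" behavior is an assumption of the sketch rather than a description of GitHub's implementation.

```python
class UnscopedCache:
    """Hypothetical cache: one global namespace, and anyone may write to it."""

    def __init__(self):
        self._entries = {}

    def restore(self, key):
        return self._entries.get(key)

    def save(self, key, value):
        # Assumption: the first writer wins, so a later honest run
        # cannot replace an entry that already exists.
        self._entries.setdefault(key, value)


def generate_primes(limit):
    """Honest generator: trial division for every n below `limit`."""
    return [n for n in range(2, limit) if all(n % d for d in range(2, n))]


def honest_job(cache, key="Linux-primes"):
    primes = cache.restore(key)
    if primes is None:  # the cache-hit check from the workflow
        primes = generate_primes(12)
        cache.save(key, primes)
    return primes


def malicious_job(cache, key="Linux-primes"):
    # The untrusted PR skips the cache-hit check and stores a bad list.
    cache.save(key, [2, 3, 5, 7, 8])


cache = UnscopedCache()
malicious_job(cache)          # the untrusted PR runs first
poisoned = honest_job(cache)  # main now trusts whatever is in the cache
print(poisoned)
```

The honest job never runs its generator: the key matches, so it takes the poisoned entry at face value.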
The GitHub action prevents this by scoping the cache by branch. This means PRs (and branches) don't get to use one another's cache, only their own (and main/master's), even if they're in the same repo. (Even a "scope by originating repo" policy would not be sufficient, since it's easy enough with this type of setup to accidentally poison the cache.)
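Branch scoping can be pictured with a similar toy model. This is only a sketch of the policy as described above (a branch sees its own entries plus those of the default branch); the `BranchScopedCache` class and the branch names are hypothetical, and GitHub's real lookup rules have more detail than this.

```python
class BranchScopedCache:
    """Hypothetical cache whose entries are keyed by (branch, key)."""

    def __init__(self, default_branch="main"):
        self._default = default_branch
        self._entries = {}

    def save(self, branch, key, value):
        # A branch can only ever write into its own scope.
        self._entries.setdefault((branch, key), value)

    def restore(self, branch, key):
        # Look in the branch's own scope first, then in the default branch.
        for scope in (branch, self._default):
            if (scope, key) in self._entries:
                return self._entries[(scope, key)]
        return None


cache = BranchScopedCache()
cache.save("evil-pr", "Linux-primes", [2, 3, 5, 7, 8])
cache.save("main", "Linux-primes", [2, 3, 5, 7, 11])

from_main = cache.restore("main", "Linux-primes")
from_other_pr = cache.restore("feature", "Linux-primes")
from_evil_pr = cache.restore("evil-pr", "Linux-primes")
```

The bad list stays confined to the branch that wrote it. The price is that two PR branches doing identical work can never reuse each other's results.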
Compare this to the situation with Garnix. Not only can you share the cache between branches, and between the target and origin repos in a PR, but even between forks that never interact again after being forked. Or between fully independent repos that happen to have the same package.
This means, in turn, that if you have repo foo that depends on repo bar, and both have Garnix enabled, you don't have to do anything special to get the artifacts/binaries from bar in foo!
(A "problem" we had was that in the tutorial repo, hello-garnix, which has a bug users are asked to fix in their own fork after enabling Garnix, people would push their changes to their fork and see the build succeed almost immediately, when the build should have taken a few seconds. This is quite confusing! What was happening was that a different user had already made the exact same change in their fork, so the result was already built and in cache. Moreover, unlike with other CIs, you don't have to do anything special to get the cache running — it just works — so people weren't even necessarily aware that caching existed.)
How does this work? The essence of the idea is that the cache key in Nix (which is per package, rather than per repo) is a hash of the package's definition: its source code together with the hashes of its dependencies. In the primes example above, main/master would get the version of the primes cached in branch my-potentially-evil-pr if and only if both branches had the same source code for generate-primes.sh, which is exactly the right behavior!
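The idea can be sketched in a few lines of Python. This is a deliberately simplified model, not Nix's actual hashing scheme: the `cache_key` function, the script contents, and the single `bash` dependency are all made up for illustration.

```python
import hashlib


def cache_key(source: bytes, dependency_keys=()):
    """Hash a package definition: its source plus the keys of its dependencies."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(source).digest())
    for dep in sorted(dependency_keys):
        h.update(dep.encode())
    return h.hexdigest()


# Hypothetical contents of generate-primes.sh on different branches.
honest_script = b"#!/bin/sh\n# prints the primes below the given limit\n"
evil_script = b"#!/bin/sh\n# prints a list that claims 8 is prime\n"

# A dependency shared by every branch (illustrative only).
bash_key = cache_key(b"bash source")

main_key = cache_key(honest_script, [bash_key])
honest_pr_key = cache_key(honest_script, [bash_key])
evil_pr_key = cache_key(evil_script, [bash_key])

print(main_key == honest_pr_key)  # same definition, same cache entry
print(main_key == evil_pr_key)    # different definition, different entry
```

Because the key is derived from what goes into the build rather than from a name chosen by the workflow, a branch that changes the script (or any dependency) lands on a different key and cannot overwrite the entry that main/master looks up.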