Nov 16, 2024, Julian K. Arni

Incremental builds in Nix and garnix

We've added incremental compilation to garnix. In this post, we discuss prior art on incremental compilation in Nix, and describe our own design.

The problem of incremental builds

Nix does a great job of caching the work it does, and never doing it twice. If someone you trust built software before (your CI, your coworkers, or the NixOS organization, for instance), you end up for the most part just downloading it instead of building it anew. All of this is very clean, UX-wise: the same commands that would have built things instead first check their caches.

That story, however, only works at the granularity of packages. At the more granular level of modules (or compilation units), Nix usually doesn't do as well. If a single module changes within a package, everything is still built from scratch. For larger projects, such as company monorepos, compilers, or browsers, this can be the difference between compiling in minutes or hours.

This issue has been known for a while, and the space of possible solutions has been largely mapped out. At garnix we've come up with tweaks to existing ideas that we think work well, and for the first time make it easy for you to make your builds incremental, effectively. Before seeing how that works, it pays to look at prior art, and understand the various trade-offs each solution entails.

Broadly speaking, the existing approaches fall into three categories:

  1. Every module is a package
  2. Pierce the sandbox
  3. Pick a prior commit

Every module is a package

The most natural (and principled) approach to this problem is to split the package into multiple ones, reflecting the level of granularity you want for your project (modules, usually).

There was a lot of excitement for this idea a while back, and quite a lot of effort has been put into making it work. In theory one could manually maintain module-level derivations (and this is in fact how Bazel mostly works), but in practice most people have understandably shunned this approach and hoped to automate the generation of subpackages corresponding to modules. The first challenge, then, was to write tools that would generate the correct derivations for the relevant toolchains. Haskell (GHC), which has a lot of user overlap with Nix and a somewhat slow compiler, got a lot of early attention, as did C/C++ with Make.

In order to generate a single derivation for each module automatically, there are two options: generate the entire graph of modules ahead of time as a first step before actually compiling anything, or interleave compilation and derivation-generation. If you can get away with the first, great. But many languages nowadays have dynamic dependency trees: one must actually do some amount of compilation before figuring out what depends on what. (The most familiar example of this is macros that, when evaluated, result in imports.)
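The first option, computing the module graph ahead of time, can be sketched in plain Nix. This is only a minimal illustration, not a real toolchain integration: the module names, the graph attrset and the compileModule helper are hypothetical, the "compiler" is just echo, and the system is assumed to be x86_64-linux.

```nix
# Sketch: one derivation per module, from a dependency graph known up front.
let
  # Hypothetical module graph: module name -> names of the modules it imports.
  graph = {
    util = [ ];
    parser = [ "util" ];
    main = [ "parser" "util" ];
  };

  # Stand-in for a real compiler invocation: records which dependency
  # outputs this module was "compiled" against.
  compileModule = name: deps:
    let
      # Referring to the dependency derivations makes them build inputs.
      depPaths = toString (map (dep: modules.${dep}) deps);
    in
    derivation {
      name = "module-${name}";
      system = "x86_64-linux"; # assumption
      builder = "/bin/sh";
      args = [ "-c" ''echo "${name} compiled against: ${depPaths}" > $out'' ];
    };

  # One derivation per module; laziness lets modules refer to each other.
  modules = builtins.mapAttrs compileModule graph;
in
modules.main
```

In a real setup each derivation would also take that module's source file as an input, so editing one module would rebuild only that module and the modules that depend on it.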

Making that fit into Nix required developing or using fancier features like recursive Nix, dynamic derivations and import-from-derivation. Compilers moreover usually had to be forked to call Nix in the right places as they proceeded through a build.

These problems were substantial, but a lot of work went into addressing them. And after everything was more or less ready… it seems to have largely fizzled! In the cases I'm familiar with, the overhead involved in each Nix build (spinning up the sandbox) was too high, negating the benefits of incrementalism. This is true whether you write the derivations by hand, calculate the plan ahead of time, or do it dynamically.

It could be that we can reduce that overhead, and return to this course of action. John Ericson expressed this opinion to me. On the one hand, the overhead of Nix evaluation and sandboxing must become really low for it to be a workable option for projects with thousands of modules; on the other, projects like Bazel seem to manage well enough. Overall, I agree this deserves serious consideration since it's the most elegant long-term solution. For now, though, the wind seems to have gone from this particular sail.

Pierce the sandbox

Another, quite unprincipled, approach is to save all the intermediate build outputs, and then make them available to the builds in an impure way, without affecting the hash. Nix makes the second part easy enough with extra-sandbox-paths, though there's still a lot of work in making the cache save and restore the right things, without e.g. different branches overwriting one another.
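For illustration, on a NixOS builder the sandbox can be opened up with the extra-sandbox-paths setting. The fragment below is a sketch under assumptions: the directory is a hypothetical example, and it would have to exist and be writable by the Nix build users.

```nix
# configuration.nix fragment (NixOS).
{
  # Expose a directory outside the store to every sandboxed build.
  # "/var/cache/build-intermediates" is a hypothetical location.
  nix.settings.extra-sandbox-paths = [ "/var/cache/build-intermediates" ];
}
```

Builds can then read and write intermediate files under that path, and nothing about its contents is reflected in the derivation hash.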

A big downside of this is that it basically gives up on the principles of Nix. Builds are no longer reproducible, and when something fails it's very hard to figure out why. You also can no longer share a cache among different trust groups. Moreover, there aren't many advantages to it over the next approach, besides being, in some cases, a little easier to implement, so we won't say much about it.

Previous commit as input

A third approach is to make the derivations you want cached to output their cache (for instance in a separate output), and then to import a previous version of that derivation, and use the cached output from that version in the new one.

This approach is substantially simpler than the first; and, unlike the second, it is pure (at least usually). But there are a few problems that need to be considered:

  1. How to conveniently pick the previous version and keep it up to date
  2. How to avoid the performance issues that might arise from depending on a previous version that depends on a previous version that …

For 1, you can manually set it to an earlier build, and manually upgrade. But this gets annoying quickly, especially since keeping the cache relatively recent is important to it being effective. Gabriella Gonzalez came up with a nice technique for removing this step, based on a modified fetchGit, though it arguably introduces some impurity again.

For 2, the more established technique is to have the previous version be non-incremental, thus breaking any recursion. You would then need to have two types of package: incremental and non-incremental.
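To make the two problems concrete, here is a sketch of the manual variant of this approach. It is not taken from any of the projects mentioned in this post: the repository URL, the all-zero revision (a placeholder to be replaced by hand), the make invocation and the file names are all hypothetical.

```nix
# default.nix: manual "previous commit as input", as a sketch.
{ pkgs ? import <nixpkgs> { }
, incremental ? true
}:
let
  # Problem 1: the earlier commit is pinned by hand and must be bumped by hand.
  previousSrc = builtins.fetchGit {
    url = "https://example.com/my-project.git"; # hypothetical
    rev = "0000000000000000000000000000000000000000"; # placeholder
  };

  # Problem 2: the earlier version is built non-incrementally, so it does not
  # itself depend on an even earlier version.
  previous = import previousSrc { inherit pkgs; incremental = false; };

  # Copy the earlier intermediates into the build directory. Store paths are
  # read-only, so the copy has to be made writable.
  restoreCache =
    if incremental then ''
      cp -r ${previous.intermediates}/. build/
      chmod -R u+w build
    '' else "";
in
pkgs.stdenv.mkDerivation {
  name = "my-project";
  src = ./.;
  outputs = [ "out" "intermediates" ];
  buildPhase = ''
    mkdir -p build
    ${restoreCache}
    make BUILD_DIR=build
  '';
  installPhase = ''
    mkdir -p $out/bin
    cp build/my-project $out/bin/
    cp -r build $intermediates
  '';
}
```

Because Nix is lazy, the non-incremental variant never evaluates previous, which is what stops the recursion. Whether the build tool actually reuses the copied files depends on how it detects changes (timestamps versus content hashes), which this sketch glosses over.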

Our approach is within this family, but has some new ideas that solve these two problems differently. I'll explain how one uses it first, and then how it is implemented.

{
    # (1) Import 'incrementalize'
    inputs.garnix-incrementalize.url = "github:garnix-io/incrementalize";
    outputs = { nixpkgs, garnix-incrementalize, ... }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in
    # (2) Use 'withCaches'
    garnix-incrementalize.lib.withCaches {
       # (3) Parametrized cached derivations
       packages.x86_64-linux.default = cache: pkgs.stdenv.mkDerivation {
         name = "inc";
         # This toy example has no sources to unpack
         dontUnpack = true;
         # (4) Create a new 'intermediates' output
         outputs = [ "out" "intermediates" ];
         # (5) Use 'cache' and produce 'intermediates'
         buildPhase = ''
           mkdir $intermediates
           # Carry over the log from the cache, if there is one, then append to it
           if [ -f ${cache}/run-logs ]; then
             cat ${cache}/run-logs > "$intermediates"/run-logs
           fi
           echo "I ran again" >> "$intermediates"/run-logs

           echo normal-build-stuff > $out
         '';
       };
    };
}

Above, we see a simple, commented flake file that uses incremental builds. A first step is adding a new flake input that points to our incrementalize repo, and naming it garnix-incrementalize.

This repo provides a function, withCaches, that you apply to the entire flake attrset. The function recurses into all the packages, checks, etc. For all those that are derivations, it leaves them as is (thus, if you apply it to an existing flake that's otherwise unchanged, it does nothing). But if any of them are functions, it applies them to the cache, which is just the intermediates output of that same package (if we have a cache), or an empty directory (if we don't).
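The behaviour just described can be sketched in a few lines of plain Nix. This is an illustration of the idea rather than the actual code in the incrementalize repo: the lookupCache argument (a function from an attribute path to the cache for that attribute) and the stand-in attrsets used in place of real derivations are assumptions made for the example.

```nix
# Sketch of a 'withCaches'-like function; not the real implementation.
let
  isDerivation = x: builtins.isAttrs x && (x.type or null) == "derivation";

  # 'lookupCache' is hypothetical: it maps an attribute path (a list of
  # names) to the cache that should be passed to that attribute.
  withCaches = lookupCache:
    let
      go = path: value:
        if builtins.isFunction value then
          # Parametrized package: hand it its own cache.
          value (lookupCache path)
        else if isDerivation value then
          # Plain derivation: leave untouched.
          value
        else if builtins.isAttrs value then
          # Nested attrset (packages, checks, ...): recurse.
          builtins.mapAttrs (name: go (path ++ [ name ])) value
        else
          value;
    in
    go [ ];
in
withCaches (path: "/cache/" + builtins.concatStringsSep "." path) {
  packages.x86_64-linux = {
    # Stand-ins for real derivations, to keep the example self-contained.
    plain = { type = "derivation"; name = "plain"; };
    inc = cache: { type = "derivation"; name = "inc"; usesCache = cache; };
  };
}
```

Here plain is returned unchanged, while inc is called with a cache value derived from its own attribute path, packages.x86_64-linux.inc.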

The cache/intermediates pairing is the thread that we pass through our builds to give them incrementality.

To enable this feature in garnix, we also need to say we want it in the garnix.yaml. We can make all builds incremental, or none, or all those that don't match a branch. We think a very good default is all but main:

incrementalizeBuilds:
  excludeBranches:
    - main

That way, you get fast builds on your feature branches, but are always building with a clean slate on main.

Now what garnix does when building is this:

  1. Check whether a build should be incrementalized (based on the garnix.yaml). If not, just do a non-incremental nix build;
  2. If so, look at the previous 5 commits to see if any have built successfully. If not, do a non-incremental build;
  3. If so, pick the most recent such build.
  4. Generate a flake file that is morally like the flake of that commit, but is “normalized”, in the sense that instead of having imports and functions etc., it is just a list of attributes pointing directly to the built nix store paths (via fetchClosure). This avoids refetching inputs such as nixpkgs and makes sure we never recurse in evaluation. Add to that flake file a function lib.withCaches that applies each flake attribute that is a function to its own cache. Let's call that previous-flake.nix;
  5. Do a nix build --override-input garnix-incrementalize previous-flake.nix
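Steps 1 to 3 amount to a small decision procedure. As a rough model only (this is not garnix's actual implementation), it can be written as a pure Nix function. The history list, its rev and succeeded fields, and the pickCache name are all hypothetical.

```nix
# Model of steps 1-3: which earlier commit, if any, supplies the cache?
let
  # First n elements of a list (builtins has no 'take').
  take = n: xs:
    builtins.genList (builtins.elemAt xs)
      (if n < builtins.length xs then n else builtins.length xs);

  # 'history' lists earlier commits, newest first.
  pickCache = { incrementalize, history }:
    let
      # Only the previous 5 commits are considered.
      candidates = builtins.filter (c: c.succeeded) (take 5 history);
    in
    if !incrementalize || candidates == [ ] then
      null # fall back to a non-incremental build
    else
      (builtins.head candidates).rev;
in
pickCache {
  incrementalize = true;
  history = [
    { rev = "c3"; succeeded = false; }
    { rev = "c2"; succeeded = true; }
    { rev = "c1"; succeeded = true; }
  ];
}
```

This expression evaluates to "c2": the newest commit failed, so the most recent successful one is used. With incrementalize set to false, or with no successful build among the five most recent commits, the result is null.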

In other words, we solve problem 1 mentioned earlier by automatically picking the most recent successful build. And we solve problem 2 by reducing the previous version to normal form.

With this technique, you could in fact never do a full rebuild, and still always have a very recent cache. This isn't true of earlier, similar approaches, which usually required two different packages and periodic full rebuilds, and only used those checkpoints as the cache. Here, the cache you use is always a recent parent of the current commit, not a global thing that might get clobbered or outdated.

There are a couple of points to note:

  1. Though the build is reproducible and pure, figuring out what exact derivation you would need to build locally to match the cache is not easy.
  2. There may be more rebuilding at the Nix layer, because packages may change only insofar as the cache did.

Both can be solved, and we intend to do that in the future. But they are for the most part not major problems, and the cache is already an enormous improvement for a number of use-cases.

Conclusion

As of this week, garnix supports incremental builds in a way that offers a good compromise between principles and practicality. It's still pure, but is efficient, and easy to get started with.

As I mentioned earlier, there's a lot of prior work and discussion of these ideas:

  • Eelco Dolstra wrote an example of the "every module is a package" approach for Make a long time ago.
  • Jade Lovelace, Harry Garrood and Felix Springer (with input from Jonas Chevalier) developed versions of the "pick a commit" approach, and did work to improve GHC support for it. See for example here and here.
  • Nicolas Mattia developed snack, which I believe pioneered the "every module is a package" approach with GHC.
  • Ollie Charles and Matthew Bauer worked on a similar effort called ghc-nix.

There may be things missing from this list, which is not meant to be exhaustive. Still, if you know of anything that should be added, let me know!
