Open-sourcing yensid
Solving the issues with remote building in Nix

As Mickey learned, managing your minions requires some subtlety!
Remote building is a powerful feature of Nix. Yet, as currently implemented, it has its share of shortcomings:
- The job scheduler (when there are multiple builders) isn't great. It picks a builder based only on the jobs the client knows about. And it's not aware of actual CPU or memory usage.
- It's not possible to have affinity in building, so that clients get preferentially scheduled on builders they've used before (and which are, presumably, more likely to have the relevant dependencies already in the Nix store).
- There are no good metrics or usage information for keeping track of who built what and when (which is important in larger companies, or in SaaS setups).
- Adding or removing builders is complicated. Every client has to update their list of remote builders, including getting the SSH keys of the server.
- Adding or removing clients is also complicated. Every builder has to update their list of allowed keys.
Because of these issues, people have looked to new systems (Evan Laforge at Groq, and nixbuild.net). But we think we came up with a design that solves these problems, is very easy to implement, and requires no changes to Nix.
The idea is this:
- We introduce an SSH proxy. This is the server all clients use as a remote builder; it will proxy connections to the actual builders. How it load balances is up to you: for example, it can periodically query the builders to figure out their load, and select builders accordingly. There are also some simple built-in strategies such as least connections. The proxy can itself be load-balanced via DNS if needed.
- Builders get SSH certificates signed by a CA. The certificates are valid for a short period of time (e.g., a day), and renewed regularly. If a builder gets deprovisioned or is compromised, its certificate should no longer be renewed. Clients only need to trust the CA instead of every key, so the clients don't need to be updated every time there is a change to the builder pool.
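To make the "least connections" strategy concrete, here is a minimal sketch of the selection rule in Python. It is only an illustration of the idea, not how the proxy in yensid is implemented; the `Builder` class, its fields, and the builder names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Builder:
    name: str                # hypothetical builder hostname
    active_connections: int  # SSH sessions currently proxied to this builder
    healthy: bool = True     # result of the most recent health check


def pick_least_connections(builders):
    """Return the healthy builder with the fewest active connections."""
    candidates = [b for b in builders if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy builders available")
    # Ties are broken by name so the choice is deterministic.
    return min(candidates, key=lambda b: (b.active_connections, b.name))


pool = [
    Builder("builder-a", active_connections=4),
    Builder("builder-b", active_connections=1),
    Builder("builder-c", active_connections=0, healthy=False),
]
chosen = pick_least_connections(pool)
print(chosen.name)  # builder-b: builder-c is idle but unhealthy
```

A load-aware strategy would have the same shape, with the sort key replaced by whatever load figure the proxy collects from the builders.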
That's it. Easy authentication and authorization, better job scheduling, and much of the work towards autoscaling.
We open-sourced our implementation of it here. We're calling it yensid after the wizard in the Sorcerer's Apprentice segment of Fantasia. It has NixOS modules you can use to spin this up in your infrastructure. It also comes with a script that spins up VMs for the builders, proxy and CA, so you can test your setup locally. You can also deploy the proxy/CA with garnix.
If you want to add metrics, you can have SSH config entries for each user on the builders that set the NIX_CONFIG environment variable (and only allow Nix builds). This way, it's possible to identify the originating user, and thus keep track of who built what via, for example, a post-build-hook. (We actually have a patch nearly ready for Nix that allows much more sophistication and detail than the post-build-hook does. More soon!)
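As a rough sketch of the bookkeeping side, the function below turns the environment of a post-build hook into a log record. Nix passes `DRV_PATH` and a space-separated `OUT_PATHS` to the hook; how the user name reaches the hook is an assumption here (for instance, a per-user hook configured through that user's `NIX_CONFIG`), and the store paths are shortened placeholders.

```python
import json
import time


def build_record(user, env, now=None):
    """Turn the environment of a post-build hook into a log record.

    `user` is assumed to be supplied by a per-user hook setup; Nix itself
    only provides DRV_PATH and OUT_PATHS.
    """
    return {
        "user": user,
        "drv_path": env["DRV_PATH"],
        "out_paths": env.get("OUT_PATHS", "").split(),
        "timestamp": int(now if now is not None else time.time()),
    }


# Placeholder values; real store paths carry a full hash prefix.
example_env = {
    "DRV_PATH": "/nix/store/aaaa-hello.drv",
    "OUT_PATHS": "/nix/store/bbbb-hello /nix/store/cccc-hello-doc",
}
record = build_record("alice", example_env, now=1700000000)
print(json.dumps(record, sort_keys=True))
```

In a real hook you would pass `os.environ` instead of `example_env` and append the record to a log or metrics store of your choice.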
You can also quite easily use the IP address of the client in making routing decisions, which makes for a simple, but often effective, form of affinity. You can even route to the geographically closest builders (and, with anycast, to the geographically closest proxy as well). This way, you can optimize the network traffic between the client and builder.
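One way to picture IP-based affinity is as a stable mapping from client address to builder. The sketch below uses rendezvous (highest-random-weight) hashing, which is our choice for illustration rather than something the proxy necessarily does; the builder names and the example address are hypothetical.

```python
import hashlib


def affinity_builder(client_ip, builders):
    """Pick a builder for client_ip using rendezvous hashing."""
    if not builders:
        raise ValueError("builder pool is empty")

    def weight(builder):
        # Each (client, builder) pair gets a pseudo-random but stable score.
        digest = hashlib.sha256(f"{client_ip}|{builder}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The builder with the highest score wins for this client.
    return max(builders, key=weight)


pool = ["builder-a", "builder-b", "builder-c"]
first = affinity_builder("203.0.113.7", pool)
second = affinity_builder("203.0.113.7", pool)
print(first == second)  # True: the same client always lands on the same builder
```

The useful property is that removing a builder only remaps the clients that were assigned to it; everyone else keeps their builder, and with it a warm Nix store.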
For autoscaling, you can use the custom load balancing strategy for the proxy, and have a HAProxy Lua task that decides when to scale up and down, and adds and removes backends from the load balancing pool accordingly.
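The decision logic of such a task can be quite small. Here is a sketch in Python of one possible rule, based on the average number of connections per builder; the thresholds and limits are made-up defaults, and a real deployment would express this in the Lua task and tie it to whatever provisions and removes the machines.

```python
def scaling_decision(connections_per_builder, min_builders=1, max_builders=10,
                     scale_up_at=4.0, scale_down_at=1.0):
    """Return +1 (add a builder), -1 (remove one), or 0 (no change)."""
    count = len(connections_per_builder)
    # Always keep at least one builder, so the average below is defined.
    if count < max(min_builders, 1):
        return 1
    average = sum(connections_per_builder) / count
    if average >= scale_up_at and count < max_builders:
        return 1
    if average <= scale_down_at and count > min_builders:
        return -1
    return 0


print(scaling_decision([5, 6, 4]))  # 1: the pool is busy, add a builder
print(scaling_decision([0, 1, 0]))  # -1: the pool is mostly idle, remove one
print(scaling_decision([2, 3]))     # 0: load is in the comfortable range
```

Keeping a gap between the two thresholds avoids flapping, where the pool grows and shrinks on every small change in load.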
If you are using garnix in an enterprise deployment and want to try this with our builders, ask us about it! For the hosted version, we don't currently allow you to use our servers as remote builders, but might soon.
The most remarkable thing about this design is how simple it is to implement, and how incremental. Just the load-balancing SSH proxy is about 15 lines of code. The whole implementation took us only a couple of days!