Aug 27, 2024Julian K. Arni

Function Types in Nix

A short note about custom typing for functions in Nix

Introductionshare

Nix is a dynamically typed language. It does, however, have a module system, traditionally used for NixOS configuration but increasingly also elsewhere, which validates configurations in much the same way as Zod and similar schema libraries in Javascript. That validation system is helpful in catching and pinpointing configuration errors, but it has its limitations. In particular, functions are only checked for being functions, not for going to and from the right types. And surely that makes sense, since it can't be possible to do typechecking — even dynamically — in a dynamic language.?

Not so, not always so. This short post is to show that, in more limited but very useful cases, it is in fact possible to do just that — to typecheck functions fully.

The motivation for my looking into this originally came from working on hosting for garnix. Our hosting is very much guided by the Infrastructure-as-Code philosophy, so we have strived to make facts about persistence, too, part of the repository. Currently, you can give deployments a persistence name; this means that when they are deployed, they will be deployed to the machine with such a persistence name if one already exists, keeping the hard disk state (e.g. database state). Otherwise, one will be created.

But there are more complex determinations of how persistence works than just “create one called X if no X exists, otherwise use X”. For example, in pull-request previews, you might want to copy (rather than use) the disk from a staging branch, which may already have data to both make QA easier, and to make sure migrations work.

In other words, we would like to make it possible to say things such as:

In branch "main" and "staging", keep the existing "disk1" if it exists,
otherwise create it from scratch.

In any other branch create a persistent disk "disk1", copied from "disk1" from
branch "staging" if that exists, otherwise create it from scratch.

This naturally fits the shape of a function, taking the branch and the previous persistence facts as arguments. Notice, however, that no matter what the case, the end result (or return value) is a disk1 persistence existing. And it's important that this be true! If what persistences exist depends on what branch we are in, or, worse, what the history of deployments was, not only is it hard to reason about the state of the system, but also either some code is “wrong” sometimes (because it refers to a persistence that doesn't exist), or there's an unused persistence. And what's more, stability is affected: it could be that deploying a small typo fix somewhere causes the persistences to change.

In other words, we want to make it possible to have various forms of persistence based on the branch you commit to, but the list of persistences should be fixed on every branch. That's an invariant we would like to maintain. We're going to encode this as a function provided by a user; and the static check we need is actually quite sophisticated!? Indeed, many typed languages would find it hard to express it.

The ideashare

The idea is to create an API we know would maintain our invariant:?

copyFromOrCreate(from, name)    :: Text → (a : Text) → Persistences<{a}>
continueOrCreate(name)          :: (a : Text) → Persistences<{a}>
merge(left, right)              :: Persistences<a> → Persistences<b> → Persistences<a ∪ b>
ifBranchIs(branch, left, right) :: Branch → Persistences<a> → Persistences<a> → Persistences<a>

(The index after Persistences is the set of persistence names. {a} indicates a singleton set containing a, stands for set union.)

We know that persistences returned by copyFromOrCreate maintain our invariant: they create a single persistence with the given name no matter what. Similarly for continueOrCreate. And given two persistences with the relevant property, merge creates another set of persistences with the property.

In Nix, usage will look like this:

previous:
  merge
   previous
   (ifBranchIs "main"
    (continueOrCreate "disk1")
    (copyFromOrCreate "staging.disk1" "disk1"))

(Where the dot in the name of persistences copied indicates that we are referring to a different branch than the current one; all persistences are namespaced by branch.)

If we can enforce the types we gave above — that is, reject programs that are ill-typed according that schema — we can achieve our goal. But what could the implementations of these functions be, to get us there? The most obvious thing to do is have them return a record (“attribute set” in Nix parlance) with keys describing persistence names and values describing how they are created:

copyFromOrCreate = from: name { name = [ "copyFromOrCreate" from ]; }

And merge would be just //, the record-merging operator in Nix.

But there's a problem with this. It means we can access previous in an unrestricted way:

previous:
  if previous ? "disk1"
  then { "disk2": ...}
  else { "disk3": ...}

Breaking the guarantee we needed. We'll have no way of knowing whether this is happening for sure, short of applying the function to every possible previous and seeing whether the resulting keys differ. That is, we can't enforce or verify our invariant. (We can't even enforce the invariant that the value returned is a record.)

The solutionshare

What we can do is limit the possible values of previous to a single one, and tag every operation. In a sense, we'll be building up the syntax tree, which we can then analyze for improprieties.

prev = { tag = "previous"; };
merge = left: right: { tag = "merge"; inherit left right; };
ifBranchIs = branch: left: right: { tag = "ifBranchIs", inherit branch left right };
continueOrCreate = name: { tag = "continueOrCreate"; inherit name; };
copyFromOrCreate = from: name: { tag = "copyFromOrCreate"; inherit from name; }

Now, given the function, we apply it to prev, and get back a concrete representation of the function, with usages of the previous variable tagged (with the { tag = "previous"; } object). We can then typecheck it by recursively analyzing it:

let

inferPersistenceType = expr:
  if expr.tag == "previous" then []
  else if expr.tag == "merge" then inferPersistenceType (lib.unique (lib.sort expr.left ++ expr.right))
  else if expr.tag == "ifBranchIs" then
    let l = inferPersistenceType expr.left;
        r = inferPersistenceType expr.right;
      in if l == r then l else throw "type error: ifBranchIs with differing left and right types"
  else if expr.tag == "continueOrCreate" then
    if isString expr.name then [expr.name]
    else throw "typeError: continueOrCreate expects a string"
  else if expr.tag == "copyFromOrCreate" then
    if isString expr.name && isString expr.from then [expr.name]
    else throw "typeError: copyFromOrCreate expects two strings"
  else throw "Error";


isPersistenceType' = expr:
  expr.tag == "previous"
  || (expr.tag == "merge" && isPersistenceType' expr.left && isPersistenceType' expr.right)
  || (expr.tag == "ifBranchIs" && isString expr.branch && inferPersistenceType expr.left == inferPersistenceType expr.right)
  || (expr.tag == "continueOrCreate" && inferPersistenceType expr == inferPersistenceType expr)
  || (expr.tag == "copyFromOrCreate" && inferPersistenceType expr == inferPersistenceType expr);

isPersistenceType = fn: isPersistenceType' (fn { tag = "previous"; });

And we can try it out:

isPersistenceType
  (previous:
     (merge previous (continueOrCreate "foo")))
// ==> true

isPersistenceType
  (previous:
    (ifBranchIs "dev"
      (continueOrCreate "db")
      (copyFromOrCreate "staging.db" "db")))
// ==> true

isPersistenceType
  (previous:
   (ifBranchIs "dev"
    (continueOrCreate "foo")
    (continueOrCreate "bar")))
// ==> false

isPersistenceType
  (previous:
   (ifBranchIs "dev"
    (continueOrCreate "bar")
    (continueOrCreate "bar")))
// ==> true

isPersistenceType
  (previous:
   (continueOrCreate "bar"))
// ==> ... typeError: continueOrCreate expects a string

Notice that we needed the “internal” types (e.g., the branch string) to be typecheckable as well. If we have higher-order functions, this won't always be the case. But we can potentially apply the same technique again!

We can also interpret the resulting AST (Abstract Syntax Tree) as a function from branch and previous persistences to persistences:

interpret = fn: branch: prev:
  let expr = fn { tag = "previous"; }
  in if expr.tag == "previous" then prev
     else if expr.tag == "merge" then
          (interpret (expr.left) branch prev)
       // (interpret (expr.right branch prev)
     else if expr.tag == "ifBranchIs" then
          if branch == expr.branch then expr.left else expr.right
  ...

Which brings us back to the original idea of the records:

interpret
  (previous:
    (ifBranchIs "dev"
      (continueOrCreate "db")
      (copyFromOrCreate "staging.db" "db")))
  "dev"
  { }
// ==> { "db" : ["create"] }

Packaging itshare

A lot of this work could be abstracted away with a mkRestrictedType, which we then would call in our case like this:

mkRestrictedType ( self: {
  constructors = {
    prev.args = [];
    prev.returns = self;
    merge.args = [ self self ];
    merge.returns = self;
    ifBranchIs.args = [ type.str ];
    ifBranchIs.returns = self;
    ...
  };
  typecheck = ...;
})

And usage could pattern match on the same record, but receiving the appropriate functions:

{ prev, merge, ifBranchIs, ... } : ...

What's curious about this is that there's no clear boundary between the constructors and the function arguments!

Note that the arguments of the function can't be of arbitrary types. They have to be analyzable, which is what this machinery ensures.

Conclusionshare

This idea isn't by any means new. It's closely related to defunctionalization and HOAS (higher-order abstract syntax). If we wanted to generalize this method to multiple arguments, of potentially different types, we would probably have arg1, arg2, …, which might well remind you of de Bruijn indices.

This is of limited usefulness to most NixOS modules which already use functions, since such uses are largely limited to providing a list of packages or plugins to be used, given an argument record containing them. However, this technique also makes transpilation of the expressions possible. NixOS modules are largely about transpiling data in Nix to whatever configuration language the particular services use; we can with this go further and include Nix functions too!

The technique, as we mentioned, doesn't work for arbitrary functions. You need to have control over the eliminators of the type of the arguments, so as to be able to keep track of how they are being used. A function with an arguemnt of type Int, for example, would not work. However, a function from a type Int', isomorphic to Int, and even constructed from it, can work. As an exercise, you can try writing a typechecker for a function that takes a Port type (constructed from integers) and returns a boolean.?

Continue Reading

Aug 22, 2024The garnix team

A guide to deploying NixOS servers - without even installing Nix!

May 14, 2024Sönke Hahn, Alex David, Julian Arni

A simpler, more composable Haskell process library

Mar 14, 2024Julian K. Arni

What happens if we make URLs immutable? A somewhat unusual idea that can substantially improve and simplify deployments.

View Archive
black globe

Say hi, ask questions, give feedback