about summary refs log tree commit diff
diff options
context:
space:
mode:
authorVincent Ambo <tazjin@tvl.su>2024-05-03T08·54+0300
committerclbot <clbot@tvl.fyi>2024-05-03T22·25+0000
commitf2f12d15568b068b7b38473c03b74275a7f43cee (patch)
treec356b343456c397c139effd23569dc4edc019fae
parent1c7d319164c6122f9f21f1b54ac4f6cddb53f1bb (diff)
docs(tvix/eval): add document about how binding construction works r/8072
Change-Id: I216be1b75eb9f18a58ab2164a77b3c51de8bf784
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11583
Autosubmit: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Reviewed-by: Adam Joseph <adam@westernsemico.com>
-rw-r--r--tvix/eval/docs/bindings.md133
1 files changed, 133 insertions, 0 deletions
diff --git a/tvix/eval/docs/bindings.md b/tvix/eval/docs/bindings.md
new file mode 100644
index 0000000000..2b062cb13d
--- /dev/null
+++ b/tvix/eval/docs/bindings.md
@@ -0,0 +1,133 @@
+Compilation of bindings
+=======================
+
+Compilation of Nix bindings is one of the most mind-bending parts of Nix
+evaluation. The implementation of just the compilation is currently almost 1000
+lines of code, excluding the various insane test cases we dreamt up for it.
+
+## What is a binding?
+
+In short, any attribute set or `let`-expression. Tvix currently does not treat
+formals in function parameters (e.g. `{ name ? "fred" }: ...`) the same as these
+bindings.
+
+They have two very difficult features:
+
+1. Keys can mutually refer to each other in `rec` sets or `let`-bindings,
+   including out of definition order.
+2. Attribute sets can be nested, and parts of one attribute set can be defined
+   in multiple separate bindings.
+
+Tvix resolves as much of this logic statically (i.e. at compile-time) as
+possible, but the procedure is quite complicated.
+
+## High-level concept
+
+The idea behind the way we compile bindings is to fully resolve nesting
+statically, and use the usual mechanisms (i.e. recursion/thunking/value
+capturing) for resolving dynamic values.
+
+This is done by compiling bindings in several phases:
+
+1. An initial compilation phase *only* for plain inherit statements (i.e.
+   `inherit name;`), *not* for namespaced inherits (i.e. `inherit (from)
+   name;`).
+
+2. A declaration-only phase, in which we use the compiler's scope tracking logic
+   to calculate the physical runtime stack indices (further referred to as
+   "stack slots" or just "slots") that all values will end up in.
+
+   In this phase, whenever we encounter a nested attribute set, it is merged
+   into a custom data structure that acts like a synthetic AST node.
+
+   This can be imagined similar to a rewrite like this:
+
+   ```nix
+   # initial code:
+   {
+       a.b = 1;
+       a.c = 2;
+   }
+
+   # rewritten form:
+   {
+       a = {
+           b = 1;
+           c = 2;
+       };
+   }
+   ```
+
+   The rewrite applies to attribute sets and `let`-bindings alike.
+
+   At the end of this phase, we know the stack slots of all namespaces for
+   inheriting from, all values inherited from them, and all values (and
+   optionall keys) of bindings at the current level.
+
+   Only statically known keys are actually merged, so any dynamic keys that
+   conflict will lead to a "key already defined" error at runtime.
+
+3. A compilation phase, in which all values (and, when necessary, keys) are
+   actually compiled. In this phase the custom data structure used for merging
+   is encountered when compiling values.
+
+   As this data structure acts like an AST node, the process begins recursively
+   for each nested attribute set.
+
+At the end of this process we have bytecode that leaves the required values (and
+optionally keys) on the stack. In the case of attribute sets, a final operation
+is emitted that constructs the actual attribute set structure at runtime. For
+`let`-bindings a final operation is emitted that removes these locals from the
+stack when the scope ends.
+
+## Moving parts
+
+WARNING: This documents the *current* implementation. If you only care about the
+conceptual aspects, see above.
+
+There's a few types involved:
+
+* `PeekableAttrs`: peekable iterator over an attribute path (e.g. `a.b.c`)
+* `BindingsKind`: enum defining the kind of bindings (attrs/recattrs/let)
+* `AttributeSet`: struct holding the bindings kind, the AST nodes with inherits
+  (both namespaced and not), and an internal representation of bindings
+  (essentially a vector of tuples of the peekable attrs and the expression to
+  compile for the value).
+* `Binding`: enum describing the kind of binding (namespaced inherit, attribute
+  set, plain binding of *any other value type*)
+* `KeySlot`: enum describing the location in which a key slot is placed at
+  runtime (nowhere, statically known value in a slot, dynamic value in a slot)
+* `TrackedBinding`: struct representing statically known information about a
+  single binding (its key slot, value slot and `Binding`)
+* `TrackedBindings`: vector of tracked bindings, which implements logic for
+  merging attribute sets together
+
+And quite a few methods on `Compiler`:
+
+* `compile_bindings`: entry point for compiling anything that looks like a
+  binding, this calls out to the functions below.
+* `compile_plain_inherits`: takes all inherits of a bindings node and compiles
+  the ones that are trivial to compile (i.e. just plain inherits without a
+  namespace). The `rnix` parser does not represent namespaced/plain inherits in
+  different nodes, so this function also aggregates the namespaced inherits and
+  returns them for further use
+* `declare_namespaced_inherits`: passes over all namespaced inherits and
+  declares them on the locals stack, as well as inserts them into the provided
+  `TrackedBindings`
+* `declare_bindings`: declares all regular key/value bindings in a bindings
+  scope, but without actually compiling their keys or values.
+
+  There's a lot of heavy lifting going on here:
+
+  1. It invokes the various pieces of logic responsible for merging nested
+     attribute sets together, creating intermediate data structures in the value
+     slots of bindings that can be recursively processed the same way.
+  2. It decides on the key slots of expressions based on the kind of bindings,
+     and the type of expression providing the key.
+* `bind_values`: runs the actual compilation of values. Notably this function is
+  responsible for recursively compiling merged attribute sets when it encounters
+  a `Binding::Set` (on which it invokes `compile_bindings` itself).
+
+In addition to these several methods (such as `compile_attr_set`,
+`compile_let_in`, ...) invoke the binding-kind specific logic and then call out
+to the functions above.