When representing syntax we have at least two options for variables:

*variable names*, which are convenient to read and write but a nightmare to implement, or*De Bruijn indices*, which avoid most of the issues of variable names, but really aren’t human-friendly.

Case in point:

“de Bruijn syntax is a Cylon detector” – @pigworker at BCTCS pic.twitter.com/17d9BJAoZe

— Edwin Brady (@edwinbrady) April 27, 2017

Can we come up with an approach that combines the advantages of both?

I think so, with a strong enough type system and sprinkling of proof search.

This blog post is literate Idris.

>moduleLambda >importData.List

To keep it simple, we will focus only on untyped -calculus. Our idea is to define an inductive type family `Term xs`

indexed by a list of free variables `xs`

:

>dataTerm : List String -> Typewhere> App : Term xs -> Term xs -> Term xs > Lam : (x : String) -> Term (x :: xs) -> Term xs > Var : (x : String) -> {autop : Elem x xs} -> Term xs

The constructors are application, lambda, and variables. For lambdas, we introduce a new variable which can then be free in the body of the lambda. For variables, we can use the variable name `x`

as long as `x`

is in the list of free variables `xs`

. The `{auto p ...}`

syntax tells Idris to perform a proof search, but you can always supply a proof manually.

The `Elem x xs`

type is defined in `Data.List`

as follows:

dataElem : a -> List a -> TypewhereHere : Elem x (x :: xs) There : Elem x xs -> Elem x (y :: xs)

A value of type `Elem x xs`

tells us not only that `x`

is in `xs`

, but it also gives us the position of `x`

in `xs`

. Thus the `Elem x xs`

argument represents the De Bruijn index of the variable. This serves two purposes:

- It ensures that the variable name
`x`

is in the list of free variables`xs`

. - In the case where a name is repeated in
`xs`

, it also tells us*which*variable is meant. For example,`Var "x" {p=Here}`

and`Var "x" {p=There Here}`

are both valid terms in`Term ["x","x"]`

, and they refer to two different variables, even though the variables have the same name.

So it’s nice to have an `Elem x xs`

around. But we definitely don’t want to write values of `Elem`

, so we ask the compiler to do it for us. We can be the humans, and let the compiler be the machine.

For example, here are the SKI combinators, with no indices in sight:

> termI : Term xs > termI = Lam "x" (Var "x") > > termK : Term xs > termK = Lam "x" (Lam "y" (Var "x")) > > termS : Term xs > termS = Lam "x" (Lam "y" (Lam "z" > (App (App (Var "x") (Var "z")) > (App (Var "y") (Var "z")))))

It looks nice. That’s the idea.

The rest of the post will be implementation details.

**Shifting.** Relaxing the type of a term by adding more free variables is akin to shifting in De Bruijn notation. These definitions rely on the library function

dropElem : (xs : List t) -> Elem x xs -> List t

which drops one element of a list. Shifting is defined as follows:

> undropElem : (q : Elem y ys) -> (p : Elem x (dropElem ys q)) -> Elem x ys > undropElem Here p = There p > undropElem (There q') Here = Here > undropElem (There q') (There p') = There (undropElem q' p') > > shift : (q : Elem y ys) -> Term (dropElem ys q) -> Term ys > shift q (App a b) = App (shift q a) (shift q b) > shift q (Lam x a) = Lam x (shift (There q) a) > shift q (Var x {p}) = Var x {p=undropElem q p} > > shift1 : Term xs -> Term (x :: xs) > shift1 = shift Here

**Substitution.** To define substitution, we first need a version of `dropElem`

that also drops all the elements before the given element. Thus:

> dropManyElem : (ys : List t) -> (q : Elem y ys) -> List t > dropManyElem [] Hereimpossible> dropManyElem [] (There _)impossible> dropManyElem (x :: xs) Here = xs > dropManyElem (x :: xs) (There q') = dropManyElem xs q'

Replacing a free variable with a term is always tricky in De Bruijn notation (and it’s even harder without them). Fortunately our reliance on `Elem`

forces us to do the right thing; the types prevent us from having index-manipulation bugs. The tricky case is Var, which we handle separately.

> substVar : (q : Elem y ys) -> Term (dropManyElem ys q) -> (x: String) -> (p : Elem x ys) -> Term (dropElem ys q) > substVar Here t x Here = t > substVar Here t x (There p') = Var x {p=p'} > substVar (There q') t x Here = Var x {p=Here} > substVar (There q') t x (There p') = shift1 (substVar q' t x p') > > subst : (q : Elem y ys) -> Term (dropManyElem ys q) -> Term ys -> Term (dropElem ys q) > subst q t (App a b) = App (subst q t a) (subst q t b) > subst q t (Lam x a) = Lam x (subst (There q) t a) > subst q t (Var x {p}) = substVar q t x p > > subst1 : (x : String) -> Term xs -> Term (x :: xs) -> Term xs > subst1 _ = subst Here

**Renaming Variables.** Renaming variables is a bit different from substitution, because we aren’t dropping a free variable, just replacing it. We need a helper function to replace elements from a list.

> replaceElem : (ys : List t) -> (q : Elem y ys) -> (x : t) -> List t > replaceElem (y :: xs) Here x = x :: xs > replaceElem (y :: xs) (There later) x = y :: replaceElem xs later x > > renameVar : (q : Elem y xs) -> (x : String) -> (z : String) -> (p:Elem z xs) -> Term (replaceElem xs q x) > renameVar Here x z Here = Var x > renameVar Here x z (There p') = Var z {p=There p'} > renameVar (There q') x z Here = Var z {p=Here} > renameVar (There q') x z (There p') = shift1 (renameVar q' x z p') > > rename : (q : Elem y xs) -> (x : String) -> Term xs -> Term (replaceElem xs q x) > rename q x (App a b) = App (rename q x a) (rename q x b) > rename q x (Lam z a) = Lam z (rename (There q) x a) > rename q x (Var z {p}) = renameVar q x z p > > rename1 : (x : String) -> Term (y :: xs) -> Term (x :: xs) > rename1 = rename Here

**Alpha Equivalence.** The easiest way to compute alpha equivalence is to ignore names and compare indices instead. We need a helper function to convert `Elem`

proofs into indices.

> elemToIndex : Elem x xs -> Nat > elemToIndex Here = 0 > elemToIndex (There h) = S (elemToIndex h) > > alphaEq : Term xs -> Term ys -> Bool > alphaEq (App a1 b1) (App a2 b2) = alphaEq a1 a2 && alphaEq b1 b2 > alphaEq (Lam x1 a1) (Lam x2 a2) = alphaEq a1 a2 > alphaEq (Var x1 {p=p1}) (Var x2 {p=p2}) = elemToIndex p1 == elemToIndex p2 > alphaEq _ _ = False

**Showing A Term.** Turning a term into a string is not as easy as it may seem. We could let Idris derive a `Show`

instance for us, but in the process we lose the ability to distinguish between `Lam "x" (Lam "x" (Var "x" {p=Here}))`

and `Lam "x" (Lam "x" (Var "x" {p=There Here}))`

. To get around this, we roll our own show function that generates fresh variable names as need be. Of course we first need a function to generate fresh names if necessary. Here is one possible implementation:

> freshen : String -> List String -> String > freshen x xs =ifelem x xsthenaux [0 .. length xs]elsex >whereaux : List Nat -> String > aux [] = x -- should never happen > aux (i :: is) = >ifelem (x ++ show i) xs >thenaux is >elsex ++ show i

Then we define `showRenamed`

which shows the structure of a term while (possibly) renaming free variables. Then `show`

is just

> showRenamed : List String -> Term xs -> String > showRenamed ys (App a b) = concat ["(App", showRenamed ys a, " ", showRenamed ys b, ")"] > showRenamed ys (Var x {p}) = concat ["(Var ", show x', ")"] >wherex' : String > x' = fromMaybe x (index' (elemToIndex p) ys) > > showRenamed {xs} ys (Lam x a) = concat ["(Lam ", show x', " ", showRenamed ys' a, ")"] >wherex' : String > x' = freshen x (ys ++ xs) > > ys' : List String > ys' = x' :: ys > > Show (Term xs)where> show = showRenamed []

**In Conclusion.** We developed a representation of terms in the untyped -calculus that combines variable names with De Bruijn indices. These terms are friendly to read and write, and are *relatively* easy to manipulate while avoiding capture and subtle index bugs.

**Exercise.** Develop a similar representation for the simply-typed -calculus (STLC), with *typed* free variables. The `Term`

type family should have type `Ctx -> Typ -> Type`

where `Ctx`

is a list of free variables with types, and `Typ`

is the type of types in STLC.

[**EDIT**: Refactored `substVar`

and `renameVar`

out of `subst`

and `rename`

to please the totality checker and improve the code slightly. Fixed a bug in `renameVar`

.]

Follow-up: ABTs (Abstract Binding Trees) in Idris, using this technique. https://gist.github.com/fmota/3e888df3bce4089cdec74a152c71fea4

LikeLike

Incredibly helpful, thanks for sharing! I modified it for the STLC and now trying to add types to ABTs as well. Eventually I’d like to implement auto-complete IDE features for my language. I think this kind of setup will help a lot.

LikeLike