Let Idris Take Care Of You: Variable Binding

When representing syntax we have at least two options for variables:

  • variable names, which are convenient to read and write but a nightmare to implement, or
  • De Bruijn indices, which avoid most of the issues of variable names, but really aren’t human-friendly.

Case in point:

Can we come up with an approach that combines the advantages of both?

I think so, with a strong enough type system and sprinkling of proof search.

This blog post is literate Idris.

> module Lambda
> import Data.List

To keep it simple, we will focus only on untyped \lambda-calculus. Our idea is to define an inductive type family Term xs indexed by a list of free variables xs:

> data Term : List String -> Type where
>     App : Term xs -> Term xs -> Term xs
>     Lam : (x : String) -> Term (x :: xs) -> Term xs
>     Var : (x : String) -> {auto p : Elem x xs} -> Term xs

The constructors are application, lambda, and variables. For lambdas, we introduce a new variable which can then be free in the body of the lambda. For variables, we can use the variable name x as long as x is in the list of free variables xs. The {auto p ...} syntax tells Idris to perform a proof search, but you can always supply a proof manually.

The Elem x xs type is defined in Data.List as follows:

data Elem : a -> List a -> Type where
    Here  : Elem x (x :: xs)
    There : Elem x xs -> Elem x (y :: xs)

A value of type Elem x xs tells us not only that x is in xs, but it also gives us the position of x in xs. Thus the Elem x xs argument represents the De Bruijn index of the variable. This serves two purposes:

  1. It ensures that the variable name x is in the list of free variables xs.
  2. In the case where a name is repeated in xs, it also tells us which variable is meant. For example, Var "x" {p=Here} and Var "x" {p=There Here} are both valid terms in Term ["x","x"], and they refer to two different variables, even though the variables have the same name.

So it’s nice to have an Elem x xs around. But we definitely don’t want to write values of Elem, so we ask the compiler to do it for us. We can be the humans, and let the compiler be the machine.

For example, here are the SKI combinators, with no indices in sight:

> termI : Term xs
> termI = Lam "x" (Var "x")
> termK : Term xs
> termK = Lam "x" (Lam "y" (Var "x"))
> termS : Term xs
> termS = Lam "x" (Lam "y" (Lam "z"
>          (App (App (Var "x") (Var "z"))
>               (App (Var "y") (Var "z")))))

It looks nice. That’s the idea.

The rest of the post will be implementation details.

Shifting. Relaxing the type of a term by adding more free variables is akin to shifting in De Bruijn notation. These definitions rely on the library function

dropElem : (xs : List t) -> Elem x xs -> List t

which drops one element of a list. Shifting is defined as follows:

> undropElem : (q : Elem y ys) -> (p : Elem x (dropElem ys q)) -> Elem x ys
> undropElem Here p = There p
> undropElem (There q') Here = Here
> undropElem (There q') (There p') = There (undropElem q' p')
> shift : (q : Elem y ys) -> Term (dropElem ys q) -> Term ys
> shift q (App a b) = App (shift q a) (shift q b)
> shift q (Lam x a) = Lam x (shift (There q) a)
> shift q (Var x {p}) = Var x {p=undropElem q p}
> shift1 : Term xs -> Term (x :: xs)
> shift1 = shift Here

Substitution. To define substitution, we first need a version of dropElem that also drops all the elements before the given element. Thus:

> dropManyElem : (ys : List t) -> (q : Elem y ys) -> List t
> dropManyElem [] Here impossible
> dropManyElem [] (There _) impossible
> dropManyElem (x :: xs) Here = xs
> dropManyElem (x :: xs) (There q') = dropManyElem xs q'

Replacing a free variable with a term is always tricky in De Bruijn notation (and it’s even harder without them). Fortunately our reliance on Elem forces us to do the right thing; the types prevent us from having index-manipulation bugs. The tricky case is Var, which we handle separately.

> substVar : (q : Elem y ys) -> Term (dropManyElem ys q) -> (x: String) -> (p : Elem x ys) -> Term (dropElem ys q)
> substVar Here t x Here = t
> substVar Here t x (There p') = Var x {p=p'}
> substVar (There q') t x Here = Var x {p=Here}
> substVar (There q') t x (There p') = shift1 (substVar q' t x p')
> subst : (q : Elem y ys) -> Term (dropManyElem ys q) -> Term ys -> Term (dropElem ys q)
> subst q t (App a b) = App (subst q t a) (subst q t b)
> subst q t (Lam x a) = Lam x (subst (There q) t a)
> subst q t (Var x {p}) = substVar q t x p
> subst1 : (x : String) -> Term xs -> Term (x :: xs) -> Term xs
> subst1 _ = subst Here

Renaming Variables. Renaming variables is a bit different from substitution, because we aren’t dropping a free variable, just replacing it. We need a helper function to replace elements from a list.

> replaceElem : (ys : List t) -> (q : Elem y ys) -> (x : t) -> List t
> replaceElem (y :: xs) Here x = x :: xs
> replaceElem (y :: xs) (There later) x = y :: replaceElem xs later x
> renameVar : (q : Elem y xs) -> (x : String) -> (z : String) -> (p:Elem z xs) -> Term (replaceElem xs q x)
> renameVar Here x z Here = Var x
> renameVar Here x z (There p') = Var z {p=There p'}
> renameVar (There q') x z Here = Var z {p=Here}
> renameVar (There q') x z (There p') = shift1 (renameVar q' x z p')
> rename : (q : Elem y xs) -> (x : String) -> Term xs -> Term (replaceElem xs q x)
> rename q x (App a b) = App (rename q x a) (rename q x b)
> rename q x (Lam z a) = Lam z (rename (There q) x a)
> rename q x (Var z {p}) = renameVar q x z p
> rename1 : (x : String) -> Term (y :: xs) -> Term (x :: xs)
> rename1 = rename Here

Alpha Equivalence. The easiest way to compute alpha equivalence is to ignore names and compare indices instead. We need a helper function to convert Elem proofs into indices.

> elemToIndex : Elem x xs -> Nat
> elemToIndex Here = 0
> elemToIndex (There h) = S (elemToIndex h)
> alphaEq : Term xs -> Term ys -> Bool
> alphaEq (App a1 b1) (App a2 b2) = alphaEq a1 a2 && alphaEq b1 b2
> alphaEq (Lam x1 a1) (Lam x2 a2) = alphaEq a1 a2
> alphaEq (Var x1 {p=p1}) (Var x2 {p=p2}) = elemToIndex p1 == elemToIndex p2
> alphaEq _ _ = False

Showing A Term. Turning a term into a string is not as easy as it may seem. We could let Idris derive a Show instance for us, but in the process we lose the ability to distinguish between Lam "x" (Lam "x" (Var "x" {p=Here})) and Lam "x" (Lam "x" (Var "x" {p=There Here})). To get around this, we roll our own show function that generates fresh variable names as need be. Of course we first need a function to generate fresh names if necessary. Here is one possible implementation:

> freshen : String -> List String -> String
> freshen x xs = if elem x xs then aux [0 .. length xs] else x
>   where aux : List Nat -> String
>         aux [] = x -- should never happen
>         aux (i :: is) =
>             if elem (x ++ show i) xs
>                 then aux is
>                 else x ++ show i

Then we define showRenamed which shows the structure of a term while (possibly) renaming free variables. Then show is just

> showRenamed : List String -> Term xs -> String
> showRenamed ys (App a b) = concat ["(App", showRenamed ys a, " ", showRenamed ys b, ")"]
> showRenamed ys (Var x {p}) = concat ["(Var ", show x', ")"]
>   where x' : String
>         x' = fromMaybe x (index' (elemToIndex p) ys)
> showRenamed {xs} ys (Lam x a) = concat ["(Lam ", show x', " ", showRenamed ys' a, ")"]
>   where x'  : String
>         x'  = freshen x (ys ++ xs)
>         ys' : List String
>         ys' = x' :: ys
> Show (Term xs) where
>      show = showRenamed []

In Conclusion. We developed a representation of terms in the untyped \lambda-calculus that combines variable names with De Bruijn indices. These terms are friendly to read and write, and are relatively easy to manipulate while avoiding capture and subtle index bugs.

Exercise. Develop a similar representation for the simply-typed \lambda-calculus (STLC), with typed free variables. The Term type family should have type Ctx -> Typ -> Type where Ctx is a list of free variables with types, and Typ is the type of types in STLC.

[EDIT: Refactored substVar and renameVar out of subst and rename to please the totality checker and improve the code slightly. Fixed a bug in renameVar.]

Quotient Inductive Types & Their Dual

A Quotient Inductive Type (QIT) is an inductive type that includes some additional equality rules. For example, the type Int of integers can be defined as a QIT:

data Int
    Zero : Int
    Succ : Int -> Int
    Pred : Int -> Int
    SuccPred : (z : Int) -> Succ (Pred z) = z
    PredSucc : (z : Int) -> Pred (Succ z) = z

This is a quotient inductive definition of Int. There are three “point” constructors (Zero, Succ, and Pred) which construct all the Int values, and two “path” constructors (SuccPred and PredSucc) which construct additional equalities between the values of Int. These capture the idea that incrementing then decrementing, or decrementing then incrementing, are the same as doing nothing.

To define a recursive function over Int, one must not only supply cases for the point constructors, but also for the path constructors. For example, suppose we want to write a function to double an Int. Then we need the usual cases for the constructed values:

Double : Int -> Int
Double Zero = Zero
Double (Succ z) = Succ (Succ (Double z))
Double (Pred z) = Pred (Pred (Double z))

But we also need cases for the constructed equalities:

Double (SuccPred z) = Succ (SuccPred (Pred (Double z))) @ SuccPred (Double z) 
Double (PredSucc z) = Pred (PredSucc (Succ (Double z))) @ PredSucc (Double z)

In this case, Double (SuccPred z) must have type Double (Succ (Pred z)) = Double z, and Double (PredSucc z) must have type Double (Pred (Succ z)) = Double z.

There’s a bit of overloading going on: Succ,Pred, and Double are used to transform both values and equalities. When f is a function with domain A and p : x = y is an equality in type A, then f p : f x = f y is the congruence generated by applying f to p. I call this Congruence Overloading and it makes equalities easier to define. The @ operator above is just transitivity.

[Some context: The “point constructor” and “path constructor” terminology comes from Homotopy Type Theory (HoTT). In HoTT, there is a similar thing called Higher Inductive Types (HITs). QITs are a version of HITs that is suitable when the type theory doesn’t have higher paths, only plain equalities. They have been used to construct a model of type theory by Altenkirch and Kaposi.]

The natural question for a category theorist is “what’s the dual”? Well the dual of induction is coinduction, and the dual of quotients is equalizers, so maybe we can look at Equaliser Coinductive Types (ECTs).

These are coinductive types, defined by a set of “destructors”, together with a set of “restrictions”, i.e. equalities out of the type. If we think of coinductive data (codata) as behaviour, an ECT is used to place restrictions on behaviour.

Here’s an example. A tape is an infinite sequence of values to the left and to the right. With a tape, you can peek at the current value, move the tape left, or move the tape right. Furthermore, going left and then right, or going right and then left, are the same as doing nothing at all. Thus we define an ECT:

codata Tape t
    Peek  : Tape t -> t
    Left  : Tape t -> Tape t
    Right : Tape t -> Tape t
    LeftRight : (x : Tape t) -> Left (Right x) = x
    RightLeft : (x : Tape t) -> Right (Left x) = x

How do you construct a Tape t element? By copattern matching its destructors and restrictions. For instance, UniformTape returns a tape that is identical everywhere:

UniformTape : t -> Tape t
Peek (UniformTape x) = x
Left (UniformTape x) = UniformTape x
Right (UniformTape x) = UniformTape x
LeftRight (UniformTape x) = Refl (UniformTape x)
RightLeft (UniformTape x) = Refl (UniformTape x)

And PositionTape returns a Tape Int that has value i at position i:

PositionTape : Int -> Tape Int
Peek (PositionTape z) = z
Left (PositionTape z) = PositionTape (Pred z)
Right (PositionTape z) = PositionTape (Succ z)
LeftRight (PositionTape z) = PositionTape (PredSucc z)
RightLeft (PositionTape z) = PositionTape (SuccPred z)

We used the quotient inductive structure of ints to introduce a tape. Likewise, we can use the equalizer coinductive structure of tapes to eliminate an int:

At : Int -> Tape t -> t
At Zero x = Peek x
At (Succ z) x = At z (Right x) 
At (Pred z) x = At z (Left x)
At (SuccPred z) x = At z (RightLeft x)
At (PredSucc z) x = At z (LeftRight x)

I hope these examples show how natural ECTs are, and how they capture some symmetries in relation to QITs. Indeed, the type Tape t is isomorphic to Int -> t, so if you have Int you may as well have Tape.

Exercise. Show that Tape t is isomorphic to Int -> t.

Making a Difference When You’re Lazy

I want to share a neat way to perform automatic differentiation in Haskell. It generalizes dual numbers to handle arbitrarily high derivatives, via the power of laziness. I call them Taylor numbers* because they’re related to Taylor series.

Here’s the data type for Taylor numbers:

data Taylor t = Taylor t (Taylor t)

These numbers have two parts, the standard part and the differential part:

standardPart :: Taylor t -> t
standardPart (Taylor x dx) = x

differentialPart :: Taylor t -> Taylor t
differentialPart (Taylor x dx) = dx

The important thing is that we can extend pre-existing functions on numbers to Taylor numbers, and these extensions encode the rules of differentiation. To facilitate this, we define the extend and extend2 combinators:

extend  f df (Taylor x dx)               = Taylor (f x)   (df dx)
extend2 f df (Taylor x dx) (Taylor y dy) = Taylor (f x y) (df dx dy)

Now we can extend Num, Fractional, and Floating operations:

instance Num t => Num (Taylor t) where
  fromInteger z = Taylor (fromInteger z) 0
  negate = extend negate negate
  (+) = extend2 (+) (+)
  (-) = extend2 (-) (-)
  x * y = extend2 (*) (\ dx dy -> dx * y + x * dy) x y

  -- not differentiable at x=0
  abs x = extend abs (* signum x) x
  signum = extend signum (const 0)

Notice how the rule for (*) follows the product rule.

instance Fractional t => Fractional (Taylor t) where
  fromRational q = Taylor (fromRational q) 0
  x/y = z where z = extend2 (/) (\dx dy -> (dx - z*dy)/y) x y

instance Floating t => Floating (Taylor t) where
  pi = Taylor pi 0
  exp x = y where y = extend exp (* y) x
  log x = extend log (/ x) x
  sqrt x = extend sqrt (/ (2 * sqrt x)) x
  sin x = extend sin (* cos x) x
  cos x = extend cos (* (- sin x)) x
  tan x = extend tan (/ (cos x * cos x)) x

(You can look up the rest of the derivatives here.) Notice how (/) and exp refer to the result in their own derivative. I thought that was cute (but it’s not necessary).

We represent differentiable functions as functions Taylor t -> Taylor t. To calculate the derivatives of a differentiable function, we define variable and derivative:

variable :: Num t => t -> Taylor t
variable x = Taylor x 1

derivative :: Num t => (Taylor t -> Taylor t) -> t -> t
derivative f = standardPart . differentialPart . f . variable

For instance, we have:

derivative (\x -> x*x) x == x*2
derivative (\x -> x*x*x) x == x*x*3
derivative exp x == exp x

More generally, we can extract the f(x), f'(x), f”(x), etc coefficients of the Taylor series at a point and use that to calculate nth derivatives:

series :: Taylor t -> [t]
series (Taylor x dx) = x : series dx

nthDerivative :: Num t => Int -> (Taylor t -> Taylor t) -> t -> t
nthDerivative n f = (!! n) . series . f . variable

Why does it work? For the same reason dual numbers work. Taylor numbers are dual numbers where the ε-coefficient is itself a dual number, where the ε-coefficient is itself a dual number, where the ε-coefficient is itself a dual number … to infinity. Due to laziness, we only calculate as many derivatives as we need, without needing to know ahead of time how many derivatives that is.

Next post I’ll write about calculating gradients (i.e multi-variate derivatives) and doing optimisation via gradient descent, based on Taylor numbers.

Exercise. When we’re working with Taylor numbers, the derivative is very often zero. To handle this efficiently, we can redefine Taylor numbers as follows:

data Taylor t = Taylor t (Maybe (Taylor t))

where a differential part of Nothing means the derivative is zero. Redefine the rest of the module to take this into account. What simplifying assumptions can you make for extend and extend2?

*- But note that “Taylor number” is already taken in a different context.