Isn’t it about time we were able to define our own functions? I think so. There’s just one problem: some of the plumbing we’ve built so far doesn’t suit what we’re making in this post. Our environments, for example, won’t support recursive functions — so we’ll need to fix the implementation. Also, fair warning… this is the longest post yet!
Say we’re defining a factorial function. It might look like this:
(define fact (x) (if (< x 2) 1 (* x (fact (- x 1)))))
So in order to evaluate the definition and bind fact
, we’d have to do the
following:
define
in a closure (don’t worry about the exact
mechanics just yet)fact
in an environment, just like we do when evaluating
val
That’s all fine and dandy until we try and execute fact
, at which point the
interpreter will halt with a NotFound
exception. After all, how can fact
possibly know what it itself is? It can’t, because our environments store
environments as Pair
s of Symbol
x value
, so we’d have to infinitely
recursively store the contents of the function in nested environments all the
way down. And that takes up too much space.
Or would we have to? We could try fetching the formal parameters from the closure environment and the recursive function calls from the call-time environment… but that won’t work, because that function might have been redefined in the meantime, like so:
; Assume for a moment we've defined <, *, - primitives
(define fact (x) (if (< x 2) 1 (* x (fact (- x 1)))))
(val x fact)
(define fact (x) 5)
(x 3) ; == (* 3 (fact (- 3 1)))
; == (* 3 5)
; == 15
So we’ve got to fix them.
In C, we solve problems of recursive data storage with pointers. If, for example, you need to represent a binary search tree, you might do so like this:
struct TreeNode {
void *data;
struct TreeNode *left;
struct TreeNode *right;
};
Even if we wanted to, we couldn’t store actual TreeNode
s for the left and
right subtrees of any given node, because (as above) this would take infinite
space.
In OCaml, we’ll solve this problem with ref
s, which are sort of like OCaml’s
well-typechecked version of pointers. 1
I propose that we change our environment to have the following type:
type 'a env = (string * 'a option ref) list
The change to the built-in list
type from our use of Pair
s is largely due
to lobject
’s inability to hold option
or ref
. It comes with benefits,
most notably the decrease in code complexity throughout the interpreter.
That’s a lot of gymnastics for a lousy environment, but here’s the rationale, in brief: we need to fix the steps from above (reproduced here with modifications in bold):
define
in a closure (don’t worry about the exact
mechanics just yet), binding fact
to a ref
of unspecified valuefact
to the closure from above, just like we do when
evaluating val
ref
from step 1 to point to the resulting closureThese updates allow us to “close the loop” and make fact
self-aware. In our
case, “unspecified value” means using the option
type
2, and we’ll have a ref
pointing to that.
Look for these modifications and additions to the environment functions below.
Earlier I said not to worry about the exact mechanics of closures just yet. Well, time to worry about the exact mechanics. First, the how.
A lambda
is an anonymous function, defined like so:
(lambda (x) (+ x 1))
; ^ the expression body
; ^ the formal parameters
; ^ the word lambda
This anonymous function takes in a value (called x
internally) and adds one
(1) to it. Evaluating a lambda-expression results in a closure. A closure is
composed of three things:
x
)(+ x 1)
)The captured environment is key, and is what allows us to do things like
recursive functions, curried functions, and a bunch of other fancy stuff. In
order to make a closure, we can simply capture the environment at the point
where the lambda
is being evaluated.
A closure is a value, and therefore self-evaluating. They are only called into action when applied to values, at which point:
And that’s closures in a nutshell. So let’s hop to it.
We’ll need to have lambda expressions and closure value
s, I suppose, so let’s
add those:
type lobject =
| Fixnum of int
| Boolean of bool
| Symbol of string
| Nil
| Pair of lobject * lobject
| Primitive of string * (lobject list -> lobject)
| Quote of value
| Closure of name list * exp * value env (* NEW! *)
and value = lobject
and name = string
and exp =
| Literal of value
| Var of name
| If of exp * exp * exp
| And of exp * exp
| Or of exp * exp
| Apply of exp * exp
| Call of exp * exp list
| Lambda of name list * exp (* NEW! *)
| Defexp of def
Note how the only difference between Lambda
and Closure
is the attached
value env
.
But so far we don’t actually have a means to construct Lambda
expressions, so
let’s go ahead and add that into build_ast
:
let rec build_ast sexp =
match sexp with
| Primitive _ | Closure _ -> raise ThisCan'tHappenError
| Fixnum _ | Boolean _ | Nil | Quote _ -> Literal sexp
| Symbol s -> Var s
| Pair _ when is_list sexp ->
(match pair_to_list sexp with
[...]
| [Symbol "val"; Symbol n; e] -> Defexp (Val (n, build_ast e))
| [Symbol "lambda"; ns; e] when is_list ns ->
let err () = raise (TypeError "(lambda (formals) body)") in
let names = List.map (function Symbol s -> s | _ -> err ())
(pair_to_list ns)
in Lambda (names, build_ast e)
[...]
We’ve got to make sure of a couple things:
Symbol
names for formal parametersI’ve included a handy-dandy error function, tersely dubbed err
, that will
raise an exception if any of the supposed “formals” are not, in fact, formals.
And that’s all well and good if we can build it, but right now evalexp
doesn’t know how to handle it. So let’s make some Closure
s!
let rec evalexp exp env =
[...]
let rec ev = function
[...]
| Call (Var "env", []) -> env_to_val env
| Call (e, es) -> evalapply (ev e) (List.map ev es)
| Lambda (ns, e) -> Closure (ns, e, env)
| Defexp d -> raise ThisCan'tHappenError
in ev exp
Creating closures is surprisingly easy to do! Now before we try and evaluate them, let’s get cracking with some environment code.
bind
remains largely the same:
let bind (n, v, e) = (n, ref (Some v))::e
Only now we’re sprinkling in ref
s and option
s. We’ll also introduce two
other functions, mkloc
and bindloc
, which we’ll use later on:
let mkloc () = ref None
let bindloc (n, vor, e) = (n, vor)::e
(* Signature of bindloc: *)
val bindloc : name * 'a option ref * a env -> 'a env
Next comes lookup
, which is also modified to deal with ref
s and option
s:
exception UnspecifiedValue of string
let rec lookup = function
| (n, []) -> raise (NotFound n)
| (n, (n', v)::_) when n=n' ->
begin
match !v with
| Some v' -> v'
| None -> raise (UnspecifiedValue n)
end
| (n, (n', _)::bs) -> lookup (n, bs)
We’ve got one more case to handle now — when the value at a location is
None
. In that case, we should raise an UnspecifiedValue
exception.
Otherwise, we’ll just do as we did before, albeit now with some more
complexity.
Last, we’ve got a bindlist
function whose sole purpose is to bind a bunch of
name
s to a bunch of value
s (putting them in option ref
s along the way).
let bindlist ns vs env =
List.fold_left2 (fun acc n v -> bind (n, v, acc)) env ns vs
That’s all for environments… for now.
Okay but we still don’t know how to evaluate Closure
s… currently Call
and
Apply
both use evalapply
, but that only has a case for Primitive
s. It’ll
need to be able to handle Closure
s as well:
let rec evalexp exp env =
let evalapply f vs =
match f with
| Primitive (_, f) -> f vs
| Closure (ns, e, clenv) -> evalexp e (bindlist ns vs clenv)
| _ -> raise (TypeError "(apply prim '(args)) or (prim args)")
in
[...]
Ah, yes. This is where bindlist
comes in handy. Bind the names of the formals
to the values of the actuals, add them to the captured Closure
environment…
and then just call evalexp
. That’s it! How crazy is that?? Now we can build
functions quite easily:
$ ocaml 10_closures.ml
> (val addone (lambda (x) (+ x 1)))
#<closure>
> (addone 4)
5
> Exception: End_of_file.
$
I’ve glossed over the printing of closures, since (as you can see) all it
requires is printing #<closure>
. You could make it fancier and actually print
out the lambda and captured environment, but I think you’ll find most of the
time that’s just too much output.
Astute readers will at this point notice that we haven’t yet supported recursion, or really even done anything notably different with the new implementation of environments. Yes, well. That’s in the next section.
define
We have no way to make the values bound with val
self-aware, so we’re going
to do that now with define
. Let’s add a new type of def
:
[...]
and def =
| Val of name * exp
| Def of name * name list * exp (* NEW! *)
| Exp of exp
Note that Def
that very much mirrors the syntax we used before:
(define f (x) (+ x 1))
; Def ("f", ["x"], Call(Literal (Primitive "+"), [Var "x"; Literal (Fixnum 1)))
But we don’t yet have a way to build Def
s, so let’s go ahead and do that.
We’re going to need to add yet another case in build_ast
that looks very
similar for the case that builds Lambda
s:
let rec build_ast sexp =
[...]
(match pair_to_list sexp with
[...]
| [Symbol "lambda"; ns; e] when is_list ns ->
let err () = raise (TypeError "(lambda (formals) body)") in
let names = List.map (function Symbol s -> s | _ -> err ())
(pair_to_list ns)
in Lambda (names, build_ast e)
| [Symbol "define"; Symbol n; ns; e] ->
let err () = raise (TypeError "(define name (formals) body)") in
let names = List.map (function Symbol s -> s | _ -> err ())
(pair_to_list ns)
in Defexp (Def (n, names, build_ast e))
[...]
We’ve got nearly the same structure — the only differences now are that we
have to capture the name of the function we’re defining inside the Def
and
then wrap the Def
in a Defexp
.
Now that we can build Def
s we should also be able to evaluate them. So let’s
head on over to evaldef
, where we’re add a new case. I’m going to reproduce
the English steps I outlined above and then give the code so you can see how
well they translate over.
define
in a closure binding the function name to
a ref
of unspecified valueval
ref
from step 1 to point to the resulting closurelet evaldef def env =
match def with
| Val (n, e) -> let v = evalexp e env in (v, bind (n, v, env))
| Def (n, ns, e) ->
let (formals, body, cl_env) =
(match evalexp (Lambda (ns, e)) env with
| Closure (fs, bod, env) -> (fs, bod, env)
| _ -> raise (TypeError "Expecting closure."))
in
let loc = mkloc () in
let clo = Closure (formals, body, bindloc (n, loc, cl_env)) in
let () = loc := Some clo in
(clo, bindloc (n, loc, env))
| Exp e -> (evalexp e env, env)
The only tricky thing is the match
, which should be unnecessary — we know
that there’s only one way a Lambda
can evaluate: to a Closure
! But OCaml’s
type system doesn’t know that. In order to make it understand, we’d need to
convert our types to polymorphic variants. 3 Only then
could we write something like:
[...]
let `Closure (formals, body, cl_env) = evalexp (`Lambda (ns, e)) env in
[...]
And avoid that whole headache. Perhaps we’ll convert later.
In any case, that’s it. That’s lambdas, closures, define — all taken care of.
If you’re left wondering how we’re supposed to have the env
special form now
that we’ve completely changed how our environments work, swell. All the more
reason to download the code here and mess with it.
Next up, primitives III.
ref
s are probably not implemented as you’d expect. Since fields of OCaml
records are the only things in the language that can be declared mutable with
the mut
keyword, a ref
is a record with just one field: contents
.
Declared mutable, of course. ↩
Here is the API for OCaml’s
option
type, and also some reading on
Haskell’s Maybe
monad. ↩
Polymorphic variants are fun because they solve a bunch of nasty little type problems we have. They’re also not fun because they’re probably unfamiliar to OCaml newcomers, and require typing a bunch of backticks everywhere. Here’s an interesting StackOverflow question about them if you want to read some more. ↩