Welcome back to the “Compiling a Lisp” series. Last time, we added some primitive unary instructions like add1 and integer->char. This time, we’re going to add some primitive binary functions like + and <. After this post, we’ll be able to compile programs like:
(< (+ 1 2) (- 4 3))
Note that these expressions may look like function calls but, as in the last chapter, they do not open new stack frames (which I’ll explain more about later). Instead, the compiler will recognize that the programmer is directly applying the symbol + and generate special code. You can think of this kind of like an inlined function call.
It’s important to remember that the compiler has a certain internal contract: the result of any given compiled expression is stored in rax. This isn’t some intrinsic property of all compilers, but it’s one we’ve kept so far in this series.
This is similar to, but not the same as, the calling convention that I mentioned earlier, where function results are stored in rax. That calling convention is for interacting with other people’s code. Within your own generated code, there are no rules, so we could really pick any other register for storing intermediate results.
Now that we’re building primitive functions that can take two arguments, you might notice a problem: our strategy of storing the result in rax won’t work on its own. If we naïvely wrote something like the following to implement +, then rax would get overwritten by the code generated for operand1(args):
int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "+")) {
      _(Compile_expr(buf, operand2(args)));
      // The result of this is stored in rax ^
      _(Compile_expr(buf, operand1(args)));
      // Oops, we just overwrote rax ^
      Emit_add_something(buf, /*dst=*/kRax);
      return 0;
    }
    // ...
  }
  // ...
}
We could try to work around this by adding some kind of register allocation algorithm and taking advantage of rcx, rdx, and so on. Or, simpler, we could decide to allocate all intermediate values on the stack and move on with our lives. I prefer the latter: it’s simpler.
Since we can’t yet save our compiled programs to disk, there’s some amount of setup that has to happen before they’re run. Right now, the C programs I’m providing along with this series compile to binaries that just run the test suites for the compiler. They don’t actually run full programs. For this reason, there are already some call frames on the stack by the time our generated code is run.
Let’s take a look at the stack at the moment we enter a compiled Lisp program:
| | High addresses
+------------------+
| main |
+------------------+ |
| ~ some data ~ | |
| ~ some data ~ | |
+------------------+ |
| compile_test | |
+------------------+ |
| ~ some data ~ | |
| ~ some data ~ | v
+------------------+
| Testing_exe... | rsp (stack pointer)
+------------------+
| | <-- Our frame!
| | Low addresses
In this diagram, we have the C program’s main function, which has its own local variables and so on. The main function calls the compile_test unit suite, which in turn calls the Testing_execute_expr function (abbreviated in the diagram), which is responsible for calling into our generated code. Every call stores the return address (some place to find the next instruction to execute) on the stack and adjusts rsp down.
Refresher: the call stack grows down. Why? Check out this StackOverflow answer that quotes an architect on the Intel 4004 and 8080 architectures. It’s stayed the same ever since.
In this diagram, rsp points at a return address somewhere inside the function Testing_execute_expr, since that’s what called our Lisp entrypoint. We have some data “above” (at higher addresses than) rsp that we’re not allowed to poke at, and we have this empty space “below” (at lower addresses than) rsp that is in our current stack frame. I say “empty” because we haven’t yet stored anything there, not because it’s necessarily zeroed out; I don’t think there are any guarantees about the values in this stack frame.
We can use our stack frame to write and read values for our current Lisp program. With every recursive subexpression, we can allocate a little more stack space to keep track of the values. When I say “allocate”, I mean “subtract from the stack pointer”, because the stack is already a contiguous space in memory allocated for us. For example, here is how we can write to the stack:
mov [rsp-8], 0x4
This puts the integer 4 at displacement -8 from rsp. On the stack diagram above, it would be at the slot labeled “Our frame”. It’s also possible to read and write with a positive or zero displacement, but those point into previous stack frames and at the return address, respectively, so let’s avoid manipulating those.
Note that I used a multiple of 8. Not every store has to be to an address that is a multiple of 8, but it is natural, and I think also faster, to store 8-byte-sized things at aligned addresses.
Let’s walk through a real example to get more hands-on experience with this stack storage idea. We’ll use the program (+ 1 2). The compiled version of that program should:
1. compile(2) to rax
2. move rax into [rsp-8]
3. compile(1) to rax
4. add [rsp-8] to rax
So after compiling that, the stack will look like this:
| | High addresses
+------------------+
| Testing_exe... | RSP
+------------------+
| 0x8 | RSP-8 (result of compile(2))
| | Low addresses
And the result will be in rax, per our internal compiler contract.
This is all well and good, but at some point our compiled programs will need to use the push instruction or make function calls of their own. Both of these modify the stack pointer: push writes to the stack and decrements rsp, and call is roughly equivalent to push followed by jmp.
For that reason, x86-64 comes with another register called rbp, and it’s designed to hold the Base Pointer. While the stack pointer is supposed to track the “top” (low address) of the stack, the base pointer is meant to keep a pointer around to the “bottom” (high address) of our current stack frame.
This is why in a lot of compiled code you see the following instructions repeated:
myfunction:
  push rbp
  mov rbp, rsp
  sub rsp, N     ; optional; allocate stack space for locals
  ; ... function body ...
  mov rsp, rbp   ; required if you subtracted from rsp above
  pop rbp
  ret
The first three instructions, called the prologue, save rbp to the stack and then set rbp to the current stack pointer. This makes it possible to maintain stable references to variable locations on the stack even as rsp changes. Yes, the compiler could adjust its internal table of references every time it emits code that modifies rsp, but that sounds much harder.
The last three instructions, called the epilogue, fetch the old rbp that we saved to the stack, write it back into rbp, then return from the call.
To confirm this for yourself, check out this sample compiled C code. Look at the disassembly following the label square. Prologue, code, epilogue.
Until now, we haven’t needed to keep track of much as we recursively traverse expression trees. Now, in order to keep track of how much stack space any given compiled code will need, we have to add more state to our compiler. We’ll call this state the stack_index — Ghuloum calls it si — and we’ll pass it around as a parameter. Whatever it’s called, it points to the first writable (unused) index in the stack at any given point.
In compiled functions, the first writable index is -kWordSize (-8), since index 0 is already taken: after the prologue, [rbp] holds the saved rbp, with the return address just above it.
int Compile_function(Buffer *buf, ASTNode *node) {
  Buffer_write_arr(buf, kFunctionPrologue, sizeof kFunctionPrologue);
  _(Compile_expr(buf, node, -kWordSize));
  Buffer_write_arr(buf, kFunctionEpilogue, sizeof kFunctionEpilogue);
  return 0;
}
I’ve also gone ahead and added the prologue and epilogue. They’re stored in static arrays. This makes them easier to modify, and also makes them accessible to testing helpers. The testing helpers can use these arrays to make testing easier for us — we can check if our expected code is book-ended by this code.
static const byte kFunctionPrologue[] = {
    // push rbp
    0x55,
    // mov rbp, rsp
    kRexPrefix, 0x89, 0xe5,
};

static const byte kFunctionEpilogue[] = {
    // pop rbp
    0x5d,
    // ret
    0xc3,
};
For Compile_expr, we just pass this new stack index through.
int Compile_expr(Buffer *buf, ASTNode *node, word stack_index) {
  // ...
  if (AST_is_pair(node)) {
    return Compile_call(buf, AST_pair_car(node), AST_pair_cdr(node),
                        stack_index);
  }
  // ...
}
And in Compile_call, we actually get to use it. Let’s look back at our stack storage strategy for compiling (+ 1 2) (now replacing rsp with rbp):
1. compile(2) to rax
2. move rax into [rbp-8]
3. compile(1) to rax
4. add [rbp-8] to rax
For binary functions, this can be generalized to:

1. compile arg2 (result stored in rax)
2. move rax to [rbp+stack_index]
3. compile arg1 (result stored in rax)
4. apply the binary operation to [rbp+stack_index] and rax (result stored in rax)

The key is this: for the first recursive call to Compile_expr, the compiler is allowed to emit code that can use the current stack_index and anything below that on the stack. For the second recursive call to Compile_expr, the compiler has to bump stack_index, since we’ve stored the result of the first compiled call at stack_index.
Take a look at our implementation of binary add:
int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "+")) {
      _(Compile_expr(buf, operand2(args), stack_index));
      Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                              /*src=*/kRax);
      _(Compile_expr(buf, operand1(args), stack_index - kWordSize));
      Emit_add_reg_indirect(buf, /*dst=*/kRax, /*src=*/Ind(kRbp, stack_index));
      return 0;
    }
    // ...
  }
  // ...
}
In this snippet, Ind stands for “indirect”, and is a constructor for a struct. This is an easy and readable way to represent (register, displacement) pairs for use in reading from and writing to memory. We’ll cover this in more detail when we get to instruction encoding. To prove to ourselves that this approach works, we’ll add some tests later.
Subtraction, multiplication, and division are much the same as addition. We’re also going to completely ignore overflow, underflow, etc.
Equality is different in that it does some comparisons after the fact (see Primitive unary functions). To check if two values are equal, we compare their pointers:
if (AST_symbol_matches(callable, "=")) {
  _(Compile_expr(buf, operand2(args), stack_index));
  Emit_store_reg_indirect(buf, /*dst=*/Ind(kRbp, stack_index),
                          /*src=*/kRax);
  _(Compile_expr(buf, operand1(args), stack_index - kWordSize));
  Emit_cmp_reg_indirect(buf, kRax, Ind(kRbp, stack_index));
  Emit_mov_reg_imm32(buf, kRax, 0);
  Emit_setcc_imm8(buf, kEqual, kAl);
  Emit_shl_reg_imm8(buf, kRax, kBoolShift);
  Emit_or_reg_imm8(buf, kRax, kBoolTag);
  return 0;
}
It uses a new comparison opcode that compares a register with some memory. This is why we can’t use the Compile_compare_imm32 helper function.
The less-than operator (<) is very similar to equality, except that we use setcc with the kLess flag instead of the kEqual flag.
We used some new opcodes today, so let’s take a look at the implementations. First, here is the indirection implementation I mentioned earlier:
typedef struct Indirect {
  Register reg;
  int8_t disp;
} Indirect;

Indirect Ind(Register reg, int8_t disp) {
  return (Indirect){.reg = reg, .disp = disp};
}
I would have used the same name in the struct and the constructor but unfortunately that’s not allowed.
Here’s an implementation of an opcode that uses this Indirect type. This emits code for instructions of the form mov [reg+disp], src.
uint8_t disp8(int8_t disp) { return disp >= 0 ? disp : 0x100 + disp; }

void Emit_store_reg_indirect(Buffer *buf, Indirect dst, Register src) {
  Buffer_write8(buf, kRexPrefix);
  Buffer_write8(buf, 0x89);
  Buffer_write8(buf, 0x40 + src * 8 + dst.reg);
  Buffer_write8(buf, disp8(dst.disp));
}
The disp8 function is a helper that encodes negative displacements as unsigned bytes using two’s complement.
The opcodes for add, sub, and cmp are similar enough to this one, except src and dst are swapped. mul is a little funky because it doesn’t take two parameters; it assumes that one of the operands is always in rax.
As usual, we’ll close with some snippets of tests. Here’s a test for +. I’m trying to see if inlining the assembly text with the hex makes it more readable. Thanks to Kartik for the suggestion.
TEST compile_binary_plus(Buffer *buf) {
  ASTNode *node = new_binary_call("+", AST_new_integer(5), AST_new_integer(8));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // 0:  48 c7 c0 20 00 00 00    mov rax,0x20
      0x48, 0xc7, 0xc0, 0x20, 0x00, 0x00, 0x00,
      // 7:  48 89 45 f8             mov QWORD PTR [rbp-0x8],rax
      0x48, 0x89, 0x45, 0xf8,
      // b:  48 c7 c0 14 00 00 00    mov rax,0x14
      0x48, 0xc7, 0xc0, 0x14, 0x00, 0x00, 0x00,
      // 12: 48 03 45 f8             add rax,QWORD PTR [rbp-0x8]
      0x48, 0x03, 0x45, 0xf8};
  EXPECT_FUNCTION_CONTAINS_CODE(buf, expected);
  Buffer_make_executable(buf);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ(result, Object_encode_integer(13));
  AST_heap_free(node);
  PASS();
}
Here’s a test for <.
TEST compile_binary_lt_with_left_greater_than_right_returns_false(Buffer *buf) {
  ASTNode *node = new_binary_call("<", AST_new_integer(6), AST_new_integer(5));
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  Buffer_make_executable(buf);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_false(), result, "0x%lx");
  AST_heap_free(node);
  PASS();
}
There are more tests in the implementation, as usual. Take a look if you like.
This has been a more complicated post than the previous ones, I think. The stack allocation may not make sense immediately. It might take some time to sink in. Try writing some of the code yourself and see if that helps.
Next time we’ll add the ability to bind variables using let, and perhaps a parser so we can input expressions more easily. See you then!