Welcome back to the “Compiling a Lisp” series. Last time we added support for
if
expressions. This time we’re going add support for basic heap allocation.
Heap allocation comes in a couple of forms, but the one we care about right now
is the cons
primitive. Much like AST_new_pair
in the compiler, cons
should:
car
and cdr
, andOnce we have that pair, we’ll want to poke at its data. This means we should
probably also implement car
and cdr
primitive functions today.
In order to generate code for packing and pulling apart pairs, we should probably know how they are laid out in memory.
Pairs contain two elements, side by side — kind of like a two-element array.
The first element (pair[0]
) is the car
and the second one (pair[1]
) is
the cdr
.
+-----+-----+
...| car | cdr |...
+-----+-----+
^
pointer
The untagged pointer points to the address of the first element, and the tagged
pointer has some extra information (kPairTag
== 1
) that we need to get rid
of to inspect the elements. If we don’t, we’ll try and read from one byte after
the pointer, somewhere in the middle of the car
. This will give us bad data.
To make things more concrete, imagine our pair is allocated at 0x10000
. Our
car
lives at *(0x10000)
(using C notation) and our cdr
lives at
*(0x10000 + kWordSize)
. The tagged pointer in this case would be 0x10001
and kWordSize
is 8.
We could make a call to malloc
whenever we need a new object. This has a
couple of downfalls, notably that malloc
does a lot of internal bookkeeping
that we really don’t need, and that there’s no good way to keep track of what
memory we have allocated and needs garbage collecting (which we’ll handle
later). It also has the unfortunate property of requiring C functional call
infrastructure, which we don’t have yet.
What we’re going to do instead is allocate a big slab of memory at the beginning of our process. That will be our heap. Then, to keep track of what memory we have used so far, we’re going to bump the pointer every time we allocate. So here’s what the heap looks like before we allocate a pair:
+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | | | |...
+-----+-----+-----+-----+-----+-----+-----+-----+
^
heap
The empty cells aren’t necessarily empty, but they are unused and they are garbage data.
Here is what it looks like after we allocate a pair:
+-----+-----+-----+-----+-----+-----+-----+-----+
| car | cdr | | | | | | |...
+-----+-----+-----+-----+-----+-----+-----+-----+
^ ^
pair heap
Notice how the heap
pointer has been moved over 2 words, and the pair
pointer is the returned cons cell. Although we’ll tag the pair
pointer, I am
pointing it at the beginning of the car
for clarity in the diagram.
In order to get this big slab of memory in the first place, we’ll have the
outside C code (right now, that’s our test handler) call malloc
.
You’re probably wondering what we’re going to do when we run out of memory. At some point in this series we’ll have a garbage collector that can reclaim some space for us. Right now, though, we’re just going to do … nothing. That’s right, we won’t even raise some kind of “out of memory” error. Remember, we don’t yet have error reporting facilities! Instead, we’ll use tools like Valgrind and AddressSanitizer to make sure we’re not overrunning our allocated buffer.
In order to make allocation from that big buffer fast and easy, we’re going to
keep the heap pointer in a register. Our compiler emits instructions that use
rbp
, rsp
, and rax
, so we’ll have to pick another one. Ghuloum uses rsi
,
so we’ll use that as well.
In order to get the heap pointer in rsi
in the first place, we’ll have to
capture it from the outside C code. To do this, we’ll add a parameter to our
entrypoint by modifying the function prologue.
Remember JitFunction
? This is what the C code uses to understand how to call
our mmap
-ed function. We’re going to need to modify this first.
// Before:
typedef uword (*JitFunction)();
// After:
typedef uword (*JitFunction)(uword *heap);
That’s going to need to take a new parameter now — a pointer to the heap.
This means that our kFunctionPrologue
will need to expect that in the first
parameter register in the calling convention, and store it somewhere safe. This
register is rdi
, so we can emit a mov rsi, rdi
to store our heap pointer
away.
Now, for the lifetime of the Lisp entrypoint, we can refer to the heap by the
name rsi
and modify it accordingly. We’ll keep an internal convention that
rsi
always points to the next available chunk of memory.
Want to allocate memory? Copy the current heap pointer into rax
and update
the heap pointer with add rsi, AllocationSize
. We’ll need to add a new
instruction for moving data between registers. Honestly, I am kind of surprised
we haven’t needed that yet.
Want to store your car
and cdr
in your new pair? Write to offset 0
and
kWordSize
of rax
, respectively. We’ll reuse our indirect store instruction.
Want to tag your pointer? add rax, Tag
or or rax, Tag
. These two
instructions are equivalent because all the three taggable bits in a heap
object will be zero.
This word-alignment is easy to maintain now because all pairs will be size 16, which is a multiple of 8. Later on, when we add symbols and strings and other data types that have non-object data in them, we’ll have to insert padding between allocations to keep the alignment invariant.
Once we have pairs allocated, it’s kind of useless unless we can also poke at their elements.
To implement car
, we’ll remove the tag from the pointer and read from the
memory pointed to by the register: mov rax, [Ptr+Car-Tag]
. You can also do
this with a sub rax, Tag
and then a mov
.
Implementing cdr
is very similar, except we’ll be doing mov rax,
[Ptr+Cdr-Tag]
.
Now that we’ve gotten our minds around the abstract solution to the problem, we should write some code.
First off, here is the addition to the prologue I mentioned earlier:
const byte kEntryPrologue[] = {
// Save the heap in rsi, our global heap pointer
// mov rsi, rdi
kRexPrefix, 0x89, 0xfe,
};
Let’s once more add an entry to Compile_call
.
int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
word stack_index, Env *varenv) {
if (AST_is_symbol(callable)) {
// ...
if (AST_symbol_matches(callable, "cons")) {
return Compile_cons(buf, /*car=*/operand1(args), /*cdr=*/operand2(args),
stack_index, varenv);
}
// ...
}
}
We don’t really need to add a whole new function for cons
since we’re not
doing structural recursion on the parameters or anything, but Compile_call
just keeps getting bigger and this helps keep it smaller.
Compile_cons
is pretty much exactly what I described above. I pulled out
rsi
into kHeapPointer
so that we can change it later if we need to.
const Register kHeapPointer = kRsi;
int Compile_cons(Buffer *buf, ASTNode *car, ASTNode *cdr,
int stack_index, Env *varenv) {
// Compile and store car on the stack
_(Compile_expr(buf, car, stack_index, varenv));
Emit_store_reg_indirect(buf,
/*dst=*/Ind(kRbp, stack_index),
/*src=*/kRax);
// Compile and store cdr
_(Compile_expr(buf, cdr, stack_index - kWordSize, varenv));
Emit_store_reg_indirect(buf, /*dst=*/Ind(kHeapPointer, kCdrOffset),
/*src=*/kRax);
// Fetch car and store in the heap
Emit_load_reg_indirect(buf, /*dst=*/kRax, /*src=*/Ind(kRbp, stack_index));
Emit_store_reg_indirect(buf, /*dst=*/Ind(kHeapPointer, kCarOffset),
/*src=*/kRax);
// Store tagged pointer in rax
Emit_mov_reg_reg(buf, /*dst=*/kRax, /*src=*/kHeapPointer);
Emit_or_reg_imm8(buf, /*dst=*/kRax, kPairTag);
// Bump the heap pointer
Emit_add_reg_imm32(buf, /*dst=*/kHeapPointer, kPairSize);
return 0;
}
Note that even though we’re compiling two expressions one right after another,
we don’t need to bump .stack_index
or anything. This is because we’re storing
the results not on the stack but in the pair
As it turns out, we do need to store one of the intermediates on the stack
because otherwise we risk overwriting random data in the heap. As Leonard
Schütz pointed out to me, the previous
version of this code would fail if either the car
or cdr
expressions
modified the heap pointer. Thank you for the correction!
As promised, here is the new instruction to move data between registers:
void Emit_mov_reg_reg(Buffer *buf, Register dst, Register src) {
Buffer_write8(buf, kRexPrefix);
Buffer_write8(buf, 0x89);
Buffer_write8(buf, 0xc0 + src * 8 + dst);
}
Alright, that’s cons
. Let’s implement car
and cdr
. These are
extraordinarily short implementations:
int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
word stack_index, Env *varenv) {
if (AST_is_symbol(callable)) {
// ...
if (AST_symbol_matches(callable, "car")) {
_(Compile_expr(buf, operand1(args), stack_index, varenv));
Emit_load_reg_indirect(buf, /*dst=*/kRax,
/*src=*/Ind(kRax, kCarOffset - kPairTag));
return 0;
}
if (AST_symbol_matches(callable, "cdr")) {
_(Compile_expr(buf, operand1(args), stack_index, varenv));
Emit_load_reg_indirect(buf, /*dst=*/kRax,
/*src=*/Ind(kRax, kCdrOffset - kPairTag));
return 0;
}
// ...
}
}
Both car
and cdr
compile their argument and then load from the resulting
address.
That’s it. That’s the whole implementation! It’s kind of nice that now we have these building blocks, adding new features is not so hard.
I’ve written a couple of tests for this implementation. In order to make this
testing painless, I’ve also added a new type of test harness that passes the
tests a buffer and a heap. I call it — wait for it — RUN_HEAP_TEST
.
Anyway, here’s a test that we can allocate pairs. To fully test it, I’ve added
some helpers for poking at object internals: Object_pair_car
and
Object_pair_cdr
. Note that these may be the same as but are not necessarily
the same as the corresponding AST functions. The C compiler could
hypothetically re-order struct elements, I think. Joker_vD on Hacker News
points out that C compilers are not permitted to re-order elements, but may
insert padding for alignment.
TEST compile_cons(Buffer *buf, uword *heap) {
ASTNode *node = Reader_read("(cons 1 2)");
int compile_result = Compile_entry(buf, node);
ASSERT_EQ(compile_result, 0);
// clang-format off
byte expected[] = {
// mov rax, 0x2
0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
// mov [rbp-8], rax
0x48, 0x89, 0x45, 0xf8,
// mov rax, 0x4
0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00,
// mov [rsi+Cdr], rax
0x48, 0x89, 0x46, 0x08,
// mov rax, [rbp-8]
0x48, 0x8b, 0x45, 0xf8,
// mov [rsi+Car], rax
0x48, 0x89, 0x46, 0x00,
// mov rax, rsi
0x48, 0x89, 0xf0,
// or rax, kPairTag
0x48, 0x83, 0xc8, 0x01,
// add rsi, 2*kWordSize
0x48, 0x81, 0xc6, 0x10, 0x00, 0x00, 0x00,
};
// clang-format on
EXPECT_ENTRY_CONTAINS_CODE(buf, expected);
Buffer_make_executable(buf);
uword result = Testing_execute_entry(buf, heap);
ASSERT(Object_is_pair(result));
ASSERT_EQ_FMT(Object_encode_integer(1), Object_pair_car(result), "0x%lx");
ASSERT_EQ_FMT(Object_encode_integer(2), Object_pair_cdr(result), "0x%lx");
AST_heap_free(node);
PASS();
}
Here is a test for that tricky nested cons
case that messed me up originally:
TEST compile_nested_cons(Buffer *buf, uword *heap) {
ASTNode *node = Reader_read("(cons (cons 1 2) (cons 3 4))");
int compile_result = Compile_entry(buf, node);
ASSERT_EQ(compile_result, 0);
Buffer_make_executable(buf);
uword result = Testing_execute_entry(buf, heap);
ASSERT(Object_is_pair(result));
ASSERT(Object_is_pair(Object_pair_car(result)));
ASSERT_EQ_FMT(Object_encode_integer(1),
Object_pair_car(Object_pair_car(result)), "0x%lx");
ASSERT_EQ_FMT(Object_encode_integer(2),
Object_pair_cdr(Object_pair_car(result)), "0x%lx");
ASSERT(Object_is_pair(Object_pair_cdr(result)));
ASSERT_EQ_FMT(Object_encode_integer(3),
Object_pair_car(Object_pair_cdr(result)), "0x%lx");
ASSERT_EQ_FMT(Object_encode_integer(4),
Object_pair_cdr(Object_pair_cdr(result)), "0x%lx");
AST_heap_free(node);
PASS();
}
Here’s a test for reading the car
of a pair. The test for cdr
is so similar
I will not include it here.
TEST compile_car(Buffer *buf, uword *heap) {
ASTNode *node = Reader_read("(car (cons 1 2))");
int compile_result = Compile_entry(buf, node);
ASSERT_EQ(compile_result, 0);
// clang-format off
byte expected[] = {
// mov rax, 0x2
0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
// mov [rbp-8], rax
0x48, 0x89, 0x45, 0xf8,
// mov rax, 0x4
0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00,
// mov [rsi+Cdr], rax
0x48, 0x89, 0x46, 0x08,
// mov rax, [rbp-8]
0x48, 0x8b, 0x45, 0xf8,
// mov [rsi+Car], rax
0x48, 0x89, 0x46, 0x00,
// mov rax, rsi
0x48, 0x89, 0xf0,
// or rax, kPairTag
0x48, 0x83, 0xc8, 0x01,
// add rsi, 2*kWordSize
0x48, 0x81, 0xc6, 0x10, 0x00, 0x00, 0x00,
// mov rax, [rax-1]
0x48, 0x8b, 0x40, 0xff,
};
// clang-format on
EXPECT_ENTRY_CONTAINS_CODE(buf, expected);
Buffer_make_executable(buf);
uword result = Testing_execute_entry(buf, heap);
ASSERT_EQ_FMT(Object_encode_integer(1), result, "0x%lx");
AST_heap_free(node);
PASS();
}
I didn’t cover variable-length objects in this post because I wanted to focus
on the basics of allocating and poking at allocated data structures. Next time,
we’ll add symbols and strings we’ll learn about instruction encoding.