Compiling a Lisp: If

October 7, 2020

Welcome back to the “Compiling a Lisp” series. Last time we added support for let expressions. This time we’re going to compile if expressions.

Compiling if expressions will allow us to write code that performs decision making. For example, we can write code that does something based on the result of some imaginary function coin-flip:

(if (= (coin-flip) heads)
    123
    456)

If the call to coin-flip returns heads, then this whole if-expression will evaluate to 123. Otherwise, it will evaluate to 456. To determine if an expression is truthy, we’ll check if it is not equal to #f.

Note that the iftrue and iffalse expressions (consequent and alternate, respectively) are only evaluated if their branch is reached.

Implementation strategy

People normally compile if expressions by taking the following structure in Lisp:

(if condition
    consequent
    alternate)

and rewriting it to the following pseudo-assembly (where ...compile(X) is replaced with compiled code from the expressions):

  ...compile(condition)
  compare result, #f
  jump-if-equal alternate
  ...compile(consequent)
  jump end
alternate:
  ...compile(alternate)
end:

This will evaluate the condition expression. If it’s falsey, jump to the code for the alternate expression. Otherwise, continue on to the code for the consequent expression. So that the program does not also execute the code for the alternate, jump over it.

This transformation requires a couple of new pieces of infrastructure.

Implementation infrastructure

First, we’ll need two types of jump instructions! We have a conditional jump (jcc in x86-64) and an unconditional jump (jmp in x86-64). These are relatively straightforward to emit.

Somewhat more complicated are the targets of those jump instructions. We’ll need to supply each of the instructions with some sort of destination code address.

When emitting text assembly, this is not so hard: make up names for your labels (as with alternate and end above), and keep the names consistent between the jump instruction and the jump target. Sure, you need to generate unique labels, but the assembler will at least do address resolution for you. This address resolution transparently handles backward jumps (where the label is already known) and forward jumps (where the label comes after the jump instruction).

Since we’re not emitting text assembly, we’ll need to calculate both forward and backward jump offsets by hand. This ends up not being so bad in practice once we come up with an ergonomic way to do it. Let’s take a look at some production-grade assemblers for inspiration.

How Big Kid compilers do this

I read some source code for assemblers like the Dart assembler. Dart is a language runtime developed by Google and part of their infrastructure includes a Just-In-Time compiler, sort of like what we’re making here. Part of their assembler is some slick C++-y RAII infrastucture for emitting code and doing cleanup. Their implementation of compiling if expressions might look something like:

// Made-up APIs to make the Dart code look like our code
int Compile_if(Buffer *buf, ASTNode *cond, ASTNode *consequent,
               ASTNode *alternate) {
   Label alternate;
   Label end;
   compile(buf, cond);
   buf->cmp(kRax, Object::false());
   buf->jcc(kEqual, &alternate);
   compile(buf, consequent);
   buf->jmp(&end);
   buf->bind(&alternate);
   compile(buf, alternate);
   buf->bind(&end);
}

Their Label objects store information about where in the emitted machine code they are bound with bind. If they are bound before they are used by jcc or jmp or something, then the emitter will just emit the destination address. If they are not yet bound, however, then the Label will keep track of where it has to go back and patch the machine code once the label is bound to a location.

When the labels are destructed — meaning they can no longer be referenced by C++ code — their destructors have code to go back and patch all the instructions that referenced the label before it was bound.

While x86-64 has multiple jump widths available (for speed, I guess), it is a little tricky to use them for forward jumps. Because we don’t know in advance how long the intermediate code will be, we’ll just stick to generating 32-bit relative jumps always.

Virtual Machines like ART, OpenJDK Hotspot, SpiderMonkey, V8, HHVM, and Strongtalk also use this approach. So do the VM-agnostic AsmJit and GNU lightning assemblers. If I didn’t link an implementation, it’s either because I found the it too complicated to reproduce or couldn’t quickly track it down. Or maybe I don’t know about it!

Basically what I am trying to tell you is that this bind-and-backpatch approach is tried and tested and that we’re going to implement it in C. I hope you enjoyed the whirlwind tour of assemblers in various other runtimes along the way.

Compiling `if`-expressions, finally

Alright, so we finally get the big idea about how to do this transformation. Let’s put it into practice.

First, as with let, we’re going to need to handle the if case in Compile_call.

int Compile_call(Buffer *buf, ASTNode *callable, ASTNode *args,
                 word stack_index, Env *varenv) {
  if (AST_is_symbol(callable)) {
    // ...
    if (AST_symbol_matches(callable, "if")) {
      return Compile_if(buf, /*condition=*/operand1(args),
                        /*consequent=*/operand2(args),
                        /*alternate=*/operand3(args), stack_index, varenv);
    }
  }
  // ...
}

As usual, we’ll pull apart the expression so Compile_if has less work to do. Since we now have more than two operands (!), I’ve added operand3. It works just like you would think it does.

For Compile_if, we’re going to largely replicate the pseudocode C++ from above. I think you’ll find that if you squint it looks similar enough.

int Compile_if(Buffer *buf, ASTNode *cond, ASTNode *consequent,
               ASTNode *alternate, word stack_index, Env *varenv) {
  _(Compile_expr(buf, cond, stack_index, varenv));
  Emit_cmp_reg_imm32(buf, kRax, Object_false());
  word alternate_pos = Emit_jcc(buf, kEqual, kLabelPlaceholder); // je alternate
  _(Compile_expr(buf, consequent, stack_index, varenv));
  word end_pos = Emit_jmp(buf, kLabelPlaceholder); // jmp end
  Emit_backpatch_imm32(buf, alternate_pos);        // alternate:
  _(Compile_expr(buf, alternate, stack_index, varenv));
  Emit_backpatch_imm32(buf, end_pos); // end:
  return 0;
}

Instead of having a Label struct, though, I opted to just have a function to backpatch forward jumps explicitly. If you prefer to port Label to C, be my guest. I found it very finicky¹.

Also, instead of bind, I opted for a more explicit backpatch. This makes it clearer what is happening, I think.

This explicit backpatch approach requires manually tracking the offsets (like alternate_pos and end_pos) inside the jump instructions. We’ll need those offsets to backpatch them later. This means functions like Emit_jcc and Emit_jmp should return the offsets inside buf where they write placeholder offsets.

Let’s take a look inside these helper functions’ internals.

`jcc` and `jmp` implementations

The implementations for jcc and jmp are pretty similar, so I will only reproduce jcc here.

word Emit_jcc(Buffer *buf, Condition cond, int32_t offset) {
  Buffer_write8(buf, 0x0f);
  Buffer_write8(buf, 0x80 + cond);
  word pos = Buffer_len(buf);
  Buffer_write32(buf, disp32(offset));
  return pos;
}

This function is like many other Emit functions except for its return value. It returns the start location of the 32-bit offset for use in patching forward jumps. In the case of backward jumps, we can ignore this, since there’s no need to patch it after-the-fact.

Backpatching implementation

Here is the implementation of Emit_backpatch_imm32. I’ll walk through it and explain.

void Emit_backpatch_imm32(Buffer *buf, int32_t target_pos) {
  word current_pos = Buffer_len(buf);
  word relative_pos = current_pos - target_pos - sizeof(int32_t);
  Buffer_at_put32(buf, target_pos, disp32(relative_pos));
}

The input target_pos is the location inside the jmp (or similar) instruction that needs to be patched. Since we need to patch it with a relative offset, we compute the distance between the current position and the target position. We also need to subtract 4 bytes (sizeof(int32_t)) because the jump offset is relative to the end of the jmp instruction (the beginning of the next instruction).

Then, we write that value in. Buffer_at_put32 and disp32 are similar to their 8-bit equivalents.

Congratulations! You have implemented if.

A fun diagram

Radare2 has a tool called Cutter for reverse engineering and binary analysis. I decided to use it on the compiled output of a function containing an if expression. It produced this pretty graph!

Fig. 1 - Call graph as produced by Cutter.

It’s prettier in the tool, trust me.

Tests

I added two trivial tests for the condition being true and the condition being false. I also added a nested if case as a smoke test but I did not foresee that being troublesome with our handy recursive approach.

TEST compile_if_with_true_cond(Buffer *buf) {
  ASTNode *node = Reader_read("(if #t 1 2)");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // mov rax, 0x9f
      0x48, 0xc7, 0xc0, 0x9f, 0x00, 0x00, 0x00,
      // cmp rax, 0x1f
      0x48, 0x3d, 0x1f, 0x00, 0x00, 0x00,
      // je alternate
      0x0f, 0x84, 0x0c, 0x00, 0x00, 0x00,
      // mov rax, compile(1)
      0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
      // jmp end
      0xe9, 0x07, 0x00, 0x00, 0x00,
      // alternate:
      // mov rax, compile(2)
      0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00
      // end:
  };
  EXPECT_FUNCTION_CONTAINS_CODE(buf, expected);
  Buffer_make_executable(buf);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(1), result, "0x%lx");
  AST_heap_free(node);
  PASS();
}

TEST compile_if_with_false_cond(Buffer *buf) {
  ASTNode *node = Reader_read("(if #f 1 2)");
  int compile_result = Compile_function(buf, node);
  ASSERT_EQ(compile_result, 0);
  byte expected[] = {
      // mov rax, 0x1f
      0x48, 0xc7, 0xc0, 0x1f, 0x00, 0x00, 0x00,
      // cmp rax, 0x1f
      0x48, 0x3d, 0x1f, 0x00, 0x00, 0x00,
      // je alternate
      0x0f, 0x84, 0x0c, 0x00, 0x00, 0x00,
      // mov rax, compile(1)
      0x48, 0xc7, 0xc0, 0x04, 0x00, 0x00, 0x00,
      // jmp end
      0xe9, 0x07, 0x00, 0x00, 0x00,
      // alternate:
      // mov rax, compile(2)
      0x48, 0xc7, 0xc0, 0x08, 0x00, 0x00, 0x00
      // end:
  };
  EXPECT_FUNCTION_CONTAINS_CODE(buf, expected);
  Buffer_make_executable(buf);
  uword result = Testing_execute_expr(buf);
  ASSERT_EQ_FMT(Object_encode_integer(2), result, "0x%lx");
  AST_heap_free(node);
  PASS();
}

I made sure to test the generated code because we added some new instructions and also because I had trouble getting the offset computations perfectly right initially.

Anyway, that’s all for today. This post was made possible by contributions² to my blog from Viewers Like You. Thank you.

Next time on PBS, heap allocation.

Mini Table of Contents

Maybe it would be less finicky with __attribute__((cleanup)), but that is non-standard. This StackOverflow question and associated answers have some good information.

↩
By “contributions” I mean thoughtful comments, questions, and appreciation. Feel free to chime in on Twitter, HN, Reddit, lobste.rs, the mailing list… ↩