Compilers

Eagle

golen eagle
Eagle

Due on Tuesday, March 18th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the course forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, we will extend the Dove compiler to a new language called Eagle. Eagle includes all of the functionality of Dove as well as heap-allocated tuples.

Migrating to Eagle

To start working on your Eagle compiler, you’ll run the same sort of commands as usual:

git fetch
git switch eagle
git merge dove

There are no new files in this lab. The files in src/language as well as the resources/printer.c file have been adjusted to support tuples. Once you have merged successfully, you should be able to make clean, make tests, and run ./tests to confirm that everything is working. As usual, you’ll get warnings about unmatched Eagle cases in the ASTs, but your Dove unit tests should run fine.

The Eagle Language

As with previous languages, we will begin with the syntax and semantics for Eagle. The only difference is the addition of tuples and operations performed on them, although these new features have broad implications for our compiler which are discussed below.

Concrete Syntax

Rather than show the full Eagle syntax, we summarize its additions from Dove to highlight the new expressions for tuples:

<expression> ::=
  | ...
  | (<expression>, <expression_list>)
  | <expression>[<expression>]
  | istuple(<expression>)

<expression_list> ::=
  | <expression> , <expression_list>
  | <expression>

That is, our new grammar allows:

The remainder of the concrete syntax is as in Dove. Operational precedence is the same as in Dove but favors tuple indexing; that is, x + y[1] parses as x + (y[1]) and not as (x + y)[1] (since the latter of which is never what we want).

Abstract Syntax

As usual, you can see the definitions of expressions for Eagle in src/language/asts.ml. Make sure to read over those definitions.

Semantics

Tuple expressions in Eagle construct new tuples. Tuple indexing expressions retrieve a value from the tuple where the leftmost element has index 0. For instance,

let t = (1,3,6,10) in
t[2]

evaluates to 6 because the tuple t has a 6 at index 2 (the third position). The istuple operator determines whether a value is a tuple or not, similar to isbool and isint. That is,

let t = (1,false,4) in
let n = 5 in
(istuple(t), istuple(n))

evaluates to the tuple (true, false). Note that tuples can contain any other kind of value (including other tuples!). Sub-expressions in a tuple evaluate from left to right; for instance,

let t = (print(1),print(2)) in
t[0]

must print

1
2
1

where the last 1 is produced by the driver showing the result of the program.

Runtime Errors

There are two new forms of run-time error which can occur in Eagle:

Your handling of the errors in tuple indexing expressions should be in the following order:

These runtime errors should be reported in addition to the runtime errors from the Dove lab. Note that tuples are a new kind of data type which is neither an integer nor a boolean, so every operator that expects an integer or a boolean should produce the correct error if passed a tuple. For instance, (1,2) + 3 should produce the same error as true + 3.

The Heap

With the introduction of tuples, we are no longer able to store every value in 64 bits. We will store all tuples in Eagle on the heap rather than the stack; we’ll then store a 64-bit pointer to those heap values in normal stack variables. To do this, of course, we’ll need to have some heap memory.

Creating the Heap

As with printing and runtime errors, memory allocation will require us to interact with the system. Rather than getting into system-specific memory allocation details, we will simply allocate a large block of memory in our C driver and pass it as an argument to bird_main.

To allocate the memory, use the C function malloc from the stdlib.h library. Make sure that you allocate enough memory for at least a million 64-bit values (e.g. sizeof(int64_t)*1000000). In your driver’s main function, create an int64_t* with malloc and pass it to bird_main as the only argument. You’ll need to update the declaration of bird_main in the same file.

Global Variables

We need to have access to this pointer throughout the execution of our bird_main function, so we need a global variable to store it. A global variable may be defined in the .data section of an x86 assembly file like so:

section .data
align 8
heap_cursor:
  dq 0

Line by line, the above code states:

Your code will require a single global variable to store the current heap pointer. You can access the content of this global variable by using the same indirection notation as with register addresses:

mov rax, [heap_cursor]

To support this, you may wish to add the following to your assembly.ml file:

Initializing the Heap Pointer

At the start of bird_main, the heap pointer is provided to you by the driver as a function argument. You should retrieve that argument and store it in heap_cursor. This should be enough to get started.

Heap Layout

Every value in an Eagle program’s heap memory is a tuple. Each tuple of \(n\) elements requires \(n+1\) words of memory to store; the first word of memory is a machine integer (not a bird integer!) describing the number of elements in the tuple. For instance, the 3-tuple

(7,11,13)

will allocate 32 bytes on the heap which will be laid out as follows:

(64 bits) (64 bits) (64 bits) (64 bits)
0x0000000000000003 0x00000000000000e 0x0000000000000016 0x000000000000001a

The first value, 0x0000000000000003, indicates the number of elements in the tuple. The remaining three values are the Eagle values stored in each location using the same representation of values that we use in local variables and on the stack (e.g. the first element is 0x000000000000000e because this is twice the integer value 7).

Representing Tuples

Although the above gives us a plan for using the heap memory, we also need to figure out how to refer to it. To do this, we will have to modify the binary representation we used in Eagle. Previously, we only had integers (ending in 0) and booleans (ending in 1). We’ll now use the following representations:

For instance, 0xbe0bad0beef0cab9 (which is 0xbe0bad0beef0cab[1001]) is interpreted as a pointer to the memory location 0xbe0bad0beef0cab8 (which is 0xbe0bad0beef0cab[1000]). This allows us to point to any memory location on a 8-byte boundary, which is acceptable since all of our values are 8 bytes in size.

Note that this representation gives us the same guarantees about integers but not about booleans. It is no longer sufficient to check the last bit of a word to determine if it is a boolean value; we must now check that the last two bits are 11 to decide if a value is a boolean. You may use a jump to do this; you might also choose to use xor, which sets the zero flag (used by jz and jnz) if the result of the xor is zero. In either case, this means you need to revisit things like isbool and your boolean runtime checks.

To create a tuple, you need only allocate the appropriate amount of heap memory, set the first word to the size of the tuple, and copy the tuple’s contents to the rest of the allocated memory. The variable heap_cursor should contain a pointer to the next byte of free heap memory at all times. Don’t forget to set the last three bits of your Eagle pointer to 001 so it can be recognized as a bird heap pointer. Also, don’t worry about running out of memory; Eagle programs that consume too much memory will simply crash, which is why we’re starting with such a large heap.

Accessing Tuple Contents

To access the contents of the tuple, we must perform several operations.

  1. First, check to ensure that the item in question is actually a tuple. As described above, expressions like 4[5] should terminate with an appropriate error.
  2. Verify that the index is an integer. Expressions like (1,2)[true] should terminate with an exit code of 1, since we expected but did not get an integer.
  3. Mask off the last three bits of the pointer. The 001 tag bits will be used to indicate that the value is a pointer, but we don’t actually want to point to that location in memory!
  4. Next, verify that the index is in bounds. Be careful here: the tuple size on the heap is in machine form but the index is in the Eagle representation. If the index is negative or is too large, the program should terminate with the appropriate error described above.
  5. Calculate the location of the element you want and copy it into rax.

The ordering of the above is important because it controls which error code you generate in situations where multiple errors exist. For instance, 8[true] should return exit code 3 (because 8 is not a tuple), not exit code 1 (even though true is not an integer).

Registers and Calculating Addresses

Some of these calculations are quite tedious. Remember that you can use several registers to perform your calculations. When you use new registers, however, be careful to check that you are conforming to the C and bird calling conventions (as appropriate) for the registers you are using!

x86 assembly supports a form of infix address computation like the following:

mov rax, [rax + r10 * 8]

This can be quite handy for e.g. retrieving values from tuples. You could add a constructor to your address data type (named e.g. AddressByRegisterProductOffset) if you want to use this addressing form in your generated code.

Implementing the Eagle Compiler

There are a few different tasks at hand to implement Eagle, but many of them are not very interdependent. For this reason, the organization of the lab work is left to you. See below for a summary of the work you’ll need to do.

It will be especially helpful to use tools like gdb to debug your compiler. Refer to the Bluebird lab for suggestions.

Summary

To complete this assignment, you’ll need to

Submitting

It is not enough to push your work. In your repository, you will find a Python script called submit.py. In order to submit your lab assignment, it should be sufficient to run that script by typing python3 submit.py -a eagle in your terminal. For this script to work correctly, you must have committed all uncommitted files in your repository and pushed your work so that your repository is in sync with your GitHub remote repository. Note that, because your compiler repository will be used for multiple different assignments, you must specify which assignment you are submitting when you run the submit script.

The aforementioned Python script is designed to create a Git tag: a name for a specific commit in your repository. (The script also performs a series of checks to help ensure that the tag you create contains what you likely think it contains.) The tag will be named using the words “submission”, the name of your assignment, and a version number (such as submission_eagle_01). Your instructor will know that you have submitted your work because this tag will appear in your repository. If you need to resubmit your work to correct changes after receiving a grade, you can simply create new commits and then create another tag (preferrably with submit.py). At any given time, only your most recent outstanding submission will be graded.

In addition to submitting your work, you should fill out a brief questionnaire found here. In most cases, the questionnaire should take less than a minute.

If You Have Trouble…

…then please contact your instructor! The course forum is the preferred method, but you can reach out via e-mail as well. Good luck!