Eagle
Due on Thursday, March 17th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the course forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, we will extend the Dove compiler to a new language called Eagle. Eagle includes all of the functionality of Dove as well as heap-allocated tuples.
Transitioning to Eagle
To start working on your Eagle compiler, you’ll run the same sort of commands as usual:
git fetch
git checkout eagle
git merge dove
There are no new files in this lab. The files in src/language
as well as the resources/printer.c
file have been adjusted to support tuples. Once you have merged successfully, you should be able to make clean
, make tests
, and run ./tests
to confirm that everything is working. As usual, you’ll get warnings about unmatched Eagle cases in the ASTs, but your Dove unit tests should run fine.
The Eagle Language
As with previous languages, we will begin with the syntax and semantics for Eagle. The only difference is the addition of tuples and operations performed on them, although these new features have broad implications for our compiler which are discussed below.
Concrete Syntax
Rather than show the full Eagle syntax, we summarize its additions from Dove to highlight the new expressions for tuples:
<expression> ::=
| ...
| (<expression>, <expression_list>)
| <expression>[<expression>]
| istuple(<expression>)
<expression_list> ::=
| <expression> , <expression_list>
| <expression>
That is, our new grammar allows:
- A comma-separated list of at least two expressions surrounded by parentheses (e.g.
(4,5,6)
) - A bracketed indexing notation similar to Python (e.g.
my_tuple[1]
) - A built-in unary operator to determine if a value is a tuple at runtime
The remainder of the concrete syntax is as in Dove. Operational precedence is the same as in Dove but favors tuple indexing; that is, x + y[1]
parses as x + (y[1])
and not as (x + y)[1]
(since the latter of which is never what we want).
Abstract Syntax
As usual, you can see the definitions of expressions for Eagle in src/language/asts.ml
. Make sure to read over those definitions.
Semantics
Tuple expressions in Eagle construct new tuples. Tuple indexing expressions retrieve a value from the tuple where the leftmost element has index 0
. For instance,
let t = (1,3,6,10) in
t[2]
evaluates to 6
because the tuple t
has a 6
at index 2
(the third position). The istuple
operator determines whether a value is a tuple or not, similar to isbool
and isint
. That is,
let t = (1,false,4) in
let n = 5 in
(istuple(t), istuple(n))
evaluates to the tuple (true, false)
. Note that tuples can contain any other kind of value (including other tuples!). Sub-expressions in a tuple evaluate from left to right; for instance,
let t = (print(1),print(2)) in
t[0]
must print
1
2
1
where the last 1
is produced by the driver showing the result of the program.
Runtime Errors
There are two new forms of run-time error which can occur in Eagle:
- The user could attempt to use access a non-tuple value by index (e.g.
(1+2)[1]
). We will use exit code3
in this case. - The user could attempt to access a tuple using an invalid index (either negative or too large, e.g.
(1,2,3)[8]
). We will use exit code4
in this case.
Your handling of the errors in tuple indexing expressions should be in the following order:
- If the first expression is not a tuple, fail with exit code
3
. - Otherwise, if the second expression is not an integer, fail with exit code
1
. - Otherwise, if the second expression is negative or too large, fail with exit code
4
.
These runtime errors should be reported in addition to the runtime errors from the Dove lab. Note that tuples are a new kind of data type which is neither an integer nor a boolean, so every operator that expects an integer or a boolean should produce the correct error if passed a tuple. For instance, (1,2) + 3
should produce the same error as true + 3
.
The Heap
With the introduction of tuples, we are no longer able to store every value in 64 bits. We will store all tuples in Eagle on the heap rather than the stack; we’ll then store a 64-bit pointer to those heap values in normal stack variables. To do this, of course, we’ll need to have some heap memory.
Creating the Heap
As with printing and runtime errors, memory allocation will require us to interact with the system. Rather than getting into system-specific memory allocation details, we will simply allocate a large block of memory in our C driver and pass it as an argument to bird_main
.
To allocate the memory, use the C function malloc
from the stdlib.h
library. Make sure that you allocate enough memory for at least a million 64-bit values (e.g. sizeof(int64_t)*1000000
). In your driver’s main
function, create an int64_t*
with malloc
and pass it to bird_main
as the only argument. You’ll need to update the declaration of bird_main
in the same file.
Global Variables
We need to have access to this pointer throughout the execution of our bird_main
function, so we need a global variable to store it. A global variable may be defined in the .data
section of an x86 assembly file like so:
section .data
align 8
heap_cursor:
dq 0
Line by line, the above code states:
section .data
: The following lines are part of the data section of your program. We didn’t use a data section before, but we are introducing one now to allow the creation of global variables for this and future labs.align 8
: Everything in this section should be aligned to a 8-byte boundary.heap_cursor:
: A label identifying this location in memory. This works just like a label in the.text
section of your code.dq 0
: Allocates a single 4-byte unit of memory containing the value0
.
Your code will require a single global variable to store the current heap pointer. You can access the content of this global variable by using the same indirection notation as with register addresses:
mov rax, [heap_cursor]
To support this, you may wish to add the following to your assemblyLanguage.ml
file:
- An
instruction
constructorAsmSection
. This is a pseudo-instruction: it doesn’t wind up in your object file directly, but it gives the assembler some information about how to write the object file’s contents. - An
instruction
constructorAsmAlign
, which is a pseudo-instruction likeAsmSection
. - An
instruction
constructorAsmDq
. You can have any type of argument you like: astring list
suffices, or you may want a list of your own memory constant types (e.g.raw_data list
). - An
address
constructorAddressByLabel
. This is the constructor which would produce the[heap_cursor]
text above.
Initializing the Heap Pointer
At the start of bird_main
, the heap pointer is provided to you by the driver as a function argument. You should retrieve that argument and store it in heap_cursor
. This should be enough to get started.
Heap Layout
Every value in an Eagle program’s heap memory is a tuple. Each tuple of \(n\) elements requires \(n+1\) words of memory to store; the first word of memory is a machine integer (not a bird integer!) describing the number of elements in the tuple. For instance, the 3-tuple
(7,11,13)
will allocate 32 bytes on the heap which will be laid out as follows:
(64 bits) | (64 bits) | (64 bits) | (64 bits) |
---|---|---|---|
0x0000000000000003 |
0x00000000000000e |
0x0000000000000016 |
0x000000000000001a |
The first value, 0x0000000000000003
, indicates the number of elements in the tuple. The remaining three values are the Eagle values stored in each location using the same representation of values that we use in local variables and on the stack (e.g. the first element is 0x000000000000000e
because this is twice the integer value 7
).
Representing Tuples
Although the above gives us a plan for using the heap memory, we also need to figure out how to refer to it. To do this, we will have to modify the binary representation we used in Eagle. Previously, we only had integers (ending in 0
) and booleans (ending in 1
). We’ll now use the following representations:
0xZZZZZZZZZZZZZZZ[zzz0]
: the integer0xZZZZZZZZZZZZZZZ[zzz]
, as in previous labs.0xFFFFFFFFFFFFFFF[1111]
: the booleantrue
, as in previous labs.0x7FFFFFFFFFFFFFF[1111]
: the booleanfalse
, as in previous labs.0xZZZZZZZZZZZZZZZ[zz01]
: a pointer to the heap memory0xZZZZZZZZZZZZZZZ[zz00]
.
For instance, 0xa0badfad0beefbed
(which is 0xa0badfad0beefbe[1101]
) is interpreted as a pointer to the memory location 0xa0badfad0beefbec
(which is 0xa0badfad0beefbe[1100]
). This allows us to point to any memory location on a 8-byte boundary, which is acceptable since all of our values are 8 bytes in size.
Note that this representation gives us the same guarantees about integers but not about booleans. It is no longer sufficient to check the last bit of a word to determine if it is a boolean value; we must now check that the last two bits are 11
to decide if a value is a boolean. You may use a jump to do this; you might also choose to use xor
, which sets the zero flag (used by jz
and jnz
) if the result of the xor
is zero. In either case, this means you need to revisit things like isbool
and your boolean runtime checks.
To create a tuple, you need only allocate the appropriate amount of heap memory, set the first word to the size of the tuple, and copy the tuple’s contents to the rest of the allocated memory. The variable heap_cursor
should contain a pointer to the next byte of free heap memory at all times. Don’t forget to set the last two bits of your Eagle pointer to 01
so it can be recognized as a bird heap pointer. Also, don’t worry about running out of memory; Eagle programs that consume too much memory will simply crash, which is why we’re starting with such a large heap.
Accessing Tuple Contents
To access the contents of the tuple, we must perform several operations.
- First, check to ensure that the item in question is actually a tuple. As described above, expressions like
4[5]
should terminate with an appropriate error. - Verify that the index is an integer. Expressions like
(1,2)[true]
should terminate with an exit code of1
, since we expected but did not get an integer. - Mask off the last two bits of the pointer. The
01
tag bits will be used to indicate that the value is a pointer, but we don’t actually want to point to that location in memory! - Next, verify that the index is in bounds. Be careful here: the tuple size on the heap is in machine form but the index is in the Eagle representation. If the index is negative or is too large, the program should terminate with the appropriate error described above.
- Calculate the location of the element you want and copy it into
rax
.
The ordering of the above is important because it controls which error code you generate in situations where multiple errors exist. For instance, 8[true]
should return exit code 3
(because 8
is not a tuple), not exit code 1
(even though true
is not an integer).
Registers and Calculating Addresses
Some of these calculations are quite tedious. Remember that you can use several registers to perform your calculations. When you use new registers, however, be careful to check that you are conforming to the C and bird calling conventions (as appropriate) for the registers you are using!
x86 assembly supports a form of infix address computation like the following:
mov rax, [rax + r10 * 4]
This can be quite handy for e.g. retrieving values from tuples. You could add a constructor to your address
data type (named e.g. AddressByRegisterProductOffset
) if you want to use this addressing form in your generated code.
Implementing the Eagle Compiler
There are a few different tasks at hand to implement Eagle, but many of them are not very interdependent. For this reason, the organization of the lab work is left to you. See below for a summary of the work you’ll need to do.
It will be especially helpful to use tools like gdb
to debug your compiler. Refer to the Bluebird lab for suggestions.
Summary
To complete this assignment, you’ll need to
- Update your driver and
bird_main
to track free heap memory in the variableheap_cursor
- Add assembly code generation for tuples and tuple operations
- Update your existing code that interacts with booleans to account for the change in tag bits
- Add runtime error checking for when existing operations receive tuples
Submitting
It is not enough just to push your code. Due dates are flexible, so it is necessary for you to take action to make it clear which commit you would like to have graded. We will use Git tags for this. A tag is a way of giving a human-readable name to a particular commit. Unlike a branch, however, tags are expected not to change once they are created.
When you are finished working on this compiler assignment, commit and push your work. Then, once you are sure that there are no additional changes you need to make, run the following commands:
$ git tag eagle-submission
$ git push --tags
This will create a tag named eagle-submission
and push that tag to the Swarthmore GitHub Enterprise server. Your work on that tag will be graded.
In addition to pushing and tagging your work, you will need to fill out a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.
If You Have Trouble…
…then please contact your instructor! The course forum is the preferred method, but you can reach out via e-mail as well. Good luck!