Egg-Eater
Due on Monday, March 20th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, we will extend the Diamondback compiler to a new language called Egg-eater. Egg-eater includes all of the functionality of Diamondback as well as heap-allocated tuples.
Transitioning to Egg-Eater
When you git pull
, you’ll discover two new items:
src/languages/eggEater/
, the directory containing our lexer, parser, and ASTs.resources/printer.c
, which has been adjusted to accommodate tuples.
As usual, we will begin by changing the LANGUAGE
variable in language.mk
from diamondback
to eggEater
. Immediately after doing so, you should run make clean
, make tests
, and ./tests.byte
. Your tests from Diamondback should run correctly using the Egg-Eater parser, although you’ll get a lot of warnings about not handling the cases for tuples.
The Egg-Eater Language
As with previous languages, we will give the syntax and semantics for Egg-Eater. The only difference is the addition of tuples and operations performed on them, although these new features have broad implications for our compiler which are discussed below.
Concrete Syntax
Rather than show the full Egg-Eater syntax, it is clearer to highlight the new expressions for tuples:
<expression> ::=
| ...
| (<expression>, <expression_list>)
| <expression>[<expression>]
| istuple(<expression>)
<expression_list> ::=
| <expression> , <expression_list>
| <expression>
That is, our new grammar allows:
- A comma-separated list of at least two expressions surrounded by parentheses (e.g.
(4,5,6)
) - A bracketed indexing notation similar to Python (e.g.
my_tuple[1]
) - A built-in function to determine if a value is a tuple at runtime
The remainder of the concrete syntax is as in Diamondback. Operational precedence is the same as in Diamondback but favors tuple indexing; that is, x + y[1]
parses as x + (y[1])
and not as (x + y)[1]
(the latter of which is never what we want).
Abstract Syntax
As usual, you can see the definitions of expressions for Egg-Eater in src/language/eggEater/expressions.ml
. Make sure to read over those definitions.
Semantics
Tuple expressions in Egg-Eater construct new tuples. Tuple indexing expressions retrieve a value from the tuple where the leftmost element has index 0
. For instance,
let t = (1,3,6,10) in
t[2]
evaluates to 6
because the tuple t
has a 6
at index 2
(the third position). The istuple
operator determines whether a value is a tuple or not, similar to isbool
and isint
. That is,
let t = (1,false,4) in
let n = 5 in
(istuple(t), istuple(n))
evaluates to the tuple (true, false)
. Note that tuples can contain any other kind of value (including other tuples!). Sub-expressions in a tuple evaluate from left to right; for instance,
let t = (print(1),print(2)) in
t[0]
must print
1
2
1
where the last 1
is produced by the driver showing the result of the program.
Runtime Errors
There are two new forms of run-time error which can occur in Egg-Eater:
- The user could attempt to use access a non-tuple value by index (e.g.
(1+2)[1]
). We will use exit code4
in this case. - The user could attempt to access a tuple using an invalid index (either negative or too large, e.g.
(1,2,3)[8]
). We will use exit code5
in this case.
Your handling of the errors should be in the following order:
- If the first expression is not a tuple, fail with exit code
4
. - Otherwise, if the second expression is not an integer, fail with exit code
1
. - Otherwise, if the second expression is negative or too large, fail with exit code
5
.
These runtime errors should be reported in addition to the runtime errors from the Diamondback lab. Note that tuples are a new kind of data type which is neither an integer nor a boolean, so every operator that expects an integer or a boolean should produce the correct error if passed a tuple. For instance, (1,2) + 3
should produce the same error as true + 3
.
The Heap
With the introduction of tuples, we are no longer able to store every value in 32 bits: tuples like (500, 1067)
will require a different approach. We will store all tuples in Egg-Eater on the heap rather than the stack; we’ll then store a 32-bit pointer to those heap values in normal stack variables. To do this, of course, we’ll need to have some heap memory.
Creating the Heap
As with printing and runtime errors, memory allocation will require us to interact with the system. Rather than getting into system-specific memory allocation details, we will simply allocate a large block of memory in our C driver and pass it as an argument to snake_main
.
To allocate the memory, use the C function malloc
from the stdlib.h
library. Make sure that you allocate enough memory for at least a million 32-bit values (e.g. sizeof(int)*1000000
). In your driver’s main
function, create an int*
with malloc
and pass it to snake_main
as the only argument. You’ll need to update the declaration of snake_main
in the same file.
Global Variables
We need to have access to this pointer throughout the execution of our snake_main
function, so we need a global variable to store it. A global variable may be defined in the .data
section of an x86 assembly file like so:
section .data
align 4
heap_cursor:
dd 0
Line by line, the above code states:
section .data
: The following lines are part of the data section of your program. We didn’t use a data section before, but we are introducing one now to allow the creation of global variables for this and future labs.align 4
: Everything in this section should be aligned to a 4-byte boundary.heap_cursor:
: A label identifying this location in memory. This works just like a label in the.text
section of your code.dd 0
: Allocates a single 4-byte unit of memory containing the value0
.
Your code will require a single global variable to store the current heap pointer. You can access the content of this global variable by using the same indirection notation as with register addresses:
mov eax, [heap_cursor]
To support this, you may wish to add the following to your assemblyLanguage.ml
file:
- An instruction constructor
XSection
. This is a pseudo-instruction: it doesn’t wind up in your object file directly, but it gives the assembler some information about how to write the object file’s contents. - An instruction constructor
XAlign
, which is a pseudo-instruction likeXSection
. - An instruction constructor
XDd
. You can have any type of argument you like: astring list
suffices, or you may want a list of your own memory constant types (e.g.raw_data list
). - An address constructor
XAddressByLabel
. This is the constructor which would produce the[heap_cursor]
text above.
Initializing the Heap Pointer
At the start of snake_main
, the heap pointer is provided to you by the driver as a function argument. You should retrieve that argument and store it in heap_cursor
. This should be enough to get started.
Heap Layout
Every value in an Egg-Eater program’s heap memory is a tuple. Each tuple of \(n\) elements requires \(n+1\) words of memory to store; the first word of memory is a machine integer (not an Egg-Eater integer) describing the number of elements in the tuple. For instance, the 3-tuple
(7,11,13)
will allocate 16 bytes on the heap which will be laid out as follows:
(32 bits) | (32 bits) | (32 bits) | (32 bits) |
---|---|---|---|
0x00000003 |
0x000000e |
0x00000016 |
0x0000001a |
The first value, 0x00000003
, indicates the number of elements in the tuple. The remaining three values are the Egg-Eater values stored in each location (using the same representation of values that we use in local variables and on the stack).
Representing Tuples
Although the above gives us a plan for using the heap memory, we also need to figure out how to refer to it. To do this, we will have to modify the binary representation we used in Diamondback. Previously, we only had integers (ending in 0
) and booleans (ending in 1
). We’ll now use the following representations:
0xZZZZZZZ[zzz0]
: the integer0xZZZZZZZzzz
, as in previous labs.0xFFFFFFF[1111]
: the booleantrue
, as in previous labs.0x7FFFFFF[1111]
: the booleanfalse
, as in previous labs.0xZZZZZZZ[zz01]
: a pointer to the heap memory0xZZZZZZZzz00
.
For instance, 0xabeefbed
(which is 0xabeefbe[1101]
) is a pointer to the memory location 0xabeefbec
(which is 0xabeefbe[1100]
). This allows us to point to any memory location on a 4-byte boundary, which is acceptable since all of our values are 4 bytes in size.
Note that this representation gives us the same guarantees about integers but not about booleans. It is no longer sufficient to check the last bit of a word to determine if it is a boolean value; we must now check that the last two bits are 11
to decide if a value is a boolean. You may use a jump to do this; you might also choose to use xor
, which sets the zero flag (used by jz
and jnz
) if the result of the xor
is zero. In either case, this means you need to reconsider thigns like isbool
and your boolean runtime checks.
To create a tuple, you need only allocate the appropriate amount of heap memory, set the first word to the size of the tuple, and copy the tuple’s contents to the rest of the allocated memory. The variable heap_cursor
should contain a pointer to the next byte of free heap memory at all times. Don’t forget to set the last two bits of your pointer to 01
so it can be recognized as a heap pointer. Also, don’t worry about running out of memory; Egg-Eater programs that consume too much memory will simply crash, which is why we’re starting with such a large heap.
Accessing Tuple Contents
To access the contents of the tuple, we must perform several operations.
- First, check to ensure that the item in question is actually a tuple. As described above, expressions like
4[5]
should terminate with an appropriate error. - Verify that the index is an integer. Expressions like
(1,2)[true]
should terminate with an exit code of1
, since we expected but did not get an integer. - Mask off the last two bits of the pointer. The
01
tag bits will be used to indicate that the value is a pointer, but we don’t actually want to point to that location in memory! - Next, verify that the index is in bounds. Be careful here: the tuple size is in machine form but the index is in the Egg-Eater representation. If the index is negative or is too large, the program should terminate with the appropriate error described above.
- Calculate the location of the element you want and copy it into
eax
.
The ordering of the above is important because it controls which error code you generate in situations where multiple errors exist. For instance, 8[true]
should return exit code 4
(because 8
is not a tuple), not exit code 1
(even though true
is not an integer).
The ecx
Register
It’s quite tedious to perform the above calculation with only one register! For this lab, you should add the ecx
register to your register
data type. This way, you can store the address of a tuple in one register and the index of the tuple in the other.
x86 assembly supports a form of infix address computation like the following:
mov eax, [eax + ecx * 4]
This can be quite handy for e.g. retrieving values from tuples. If you like, you can use this syntax by adding a new constructor to your address
data type (named e.g. XAddressByRegisterProductOffset
) so that you can use this addressing form in your generated code.
Implementing the Egg-Eater Compiler
There are a few different tasks at hand to implement Egg-Eater, but many of them are not very interdependent. For this reason, the organization of the lab work is left to you. See below for a summary of the work you’ll need to do.
It will be especially helpful to use tools like gdb
to debug your compiler. Refer to the Diamondback lab for suggestions.
Summary
To complete this assignment, you’ll need to
- Update your driver and
snake_main
to track free heap memory in the variableheap_cursor
- Add assembly code generation for tuples and tuple operations
- Update your existing code that interacts with booleans to account for the change in tag bits
- Add runtime error checking for when existing operations receive tuples
Submitting
To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.
If You Have Trouble…
…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!