Foxsnake
Due on Monday, April 3rd at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, we will extend the Egg-eater compiler to a new language called Foxsnake. Foxsnake introduces first-class functions, partial application, and anonymous functions.
Transitioning to Foxsnake
When you git pull
, you’ll discover the following updates:
src/languages/foxsnake/
, the directory containing our lexer, parser, and ASTs.src/closureConversion.ml
, the file where we will put our closure conversion code.resources/printer.c
, which has been adjusted to accommodate closures.
As usual, we will begin by changing the LANGUAGE
variable in language.mk
from eggEater
to foxsnake
. Once you have done so, your code will not compile! We will explain this below.
The Foxsnake Language
The syntax of Foxsnake function declarations and function calls is incompatible with the Diamondback syntax and the Foxsnake abstract syntax reflects this. We first present the syntax and semantics of Foxsnake and then discuss how to update the Egg-eater compiler incrementally to accommodate these changes.
Concrete Syntax
Foxsnake’s concrete syntax includes all of the features of Egg-eater, but function declarations and calls are written differently. Foxsnake also includes anonymous function syntax. For clarity, we present the full Foxsnake syntax here:
<program> ::=
| <declaration_list> <expression>
| <expression>
<declaration_list> ::=
| <declaration> <declaration_list>
| <declaration>
<declaration> ::=
| def <identifier> <parameter_list> = <expression> end
<parameter_list> ::=
| <identifier> <parameter_list>
| <identifier>
<expression> ::=
| <integer>
| true
| false
| <identifier>
| let <identifier> = <expression> in <expression>
| if <expression> then <expression> else <expression>
| inc(<expression>)
| dec(<expression>)
| print(<expression>)
| isbool(<expression>)
| isint(<expression>)
| istuple(<expression>)
| <expression> + <expression>
| <expression> - <expression>
| <expression> * <expression>
| <expression> < <expression>
| <expression> > <expression>
| <expression> = <expression>
| <expression> && <expression>
| <expression> || <expression>
| <expression>[<expression>]
| <expression> <expression>
| (<expression>)
| (<expression>, <expression_list>)
| fun <identifier> -> <expression>
<expression_list> ::=
| <expression> , <expression_list>
| <expression>
The notable changes from Egg-eater are:
- Function declarations require at least one parameter. Syntactically, the function declaration omits parentheses and commas from its parameter lists; instead, the
=
is used to separate the parameter list from the function body. - Function calls are by juxtaposition as in OCaml:
f x
calls functionf
with argumentx
. - Anonymous functions are written in a fashion similar to OCaml:
fun x -> x + 1
describes the anonymous increment function. As in OCaml, parentheses are needed around the function in many cases to separate the function body from the rest of the expression.
Abstract Syntax
As usual, you can see the definitions of expressions for Foxsnake in src/language/foxsnake/expressions.ml
. Make sure to read over those definitions. In particular, note that EAppl
now only carries two expressions; multiple arguments are passed by chaining EAppl
constructors (e.g. EAppl(EAppl(EVar f, EVar x), EVar y)
).
Semantics
Unlike our previous languages, function declarations in Foxsnake create new variables named after those functions. These variables are bound to closures which define the function and the arguments which have been passed to that function so far. For instance,
def f x y =
x + y
end
let g = f in
g 2 4
evaluates to 6
. The variable f
is bound to the closure of function f
with no arguments. These closures will be values in Foxsnake and, as such, can be assigned to other variables (such as g
). Arguments can then be applied to the closure to call the function.
Foxsnake supports partial application, so the code
def f x y =
x + y
end
let increment = f 1 in
(increment 3, increment 7)
evaluates to the tuple (4, 8)
; the inc
closure carries the argument 1
for parameter x
and this argument is used each time inc
is called in the tuple.
Foxsnake’s anonymous functions work much as in OCaml. The above code could be rewritten
let f = (fun x -> fun y -> x + y) in
let increment = f 1 in
(increment 3, increment 7)
which would produce the same result.
While we will represent closures in a fashion similar to tuples, closures are not tuples; they are a new kind of value. That is,
let identity = fun x -> x in
istuple(identity)
will evaluate to false
.
Errors
The introduction of higher-order functions causes some errors which are easily recognized by Egg-eater to become extremely difficult in Foxsnake. You are no longer required to recognize the following compile-time errors:
Function f is not defined.
: Because functions will be defined as variables bound to closures, this error becomesUnbound variable f.
Function f is called with an incorrect number of arguments.
: Because functions are now data, it becomes impossible to determine if a function has been applied to all of its arguments or not. A type system would help to mitigate this by weakening the kinds of application we could perform. For Foxsnake, you are simply not required to report this kind of error.
You are also now responsible for an additional type of runtime error:
- If an operation expects a closure but receives something else, exit with error code
6
.
For instance, exit code 6
would result from the code (1 + 2) (3 + 4)
because the evaluation of the first expression, 3
, is not a function. This runtime error should be reported in addition to the runtime errors from previous labs.
Implementing the Foxsnake Compiler
To implement higher-order functions in Foxsnake, we plan to modify the compiler to process the user’s program as follows:
- Parse the source file (as we have been doing)
- Check the well-formedness of the AST (with functions as simple variables now)
- Closure-convert the surface AST (to eliminate anonymous functions)
- A-normalize the resulting AST (as we have been doing)
- Compile the A-normalized AST
- binding each function name to a global closure value
- creating new closures as necessary
We will implement these stages in a different order in order to ensure that we can evaluate our progress as we go.
Getting Started
To begin your work on Foxsnake, you should get your code to a point where it compiles, even if this means losing some functionality. The following is recommended:
- Remove the arity check from well-formedness. The new code for
EAppl
is likely to be quite simple. - The A-normalization of
EAppl
toCAppl
is likewise fairly simple. - The A-normalization for
ELambda
should fail (e.g. withfailwith "Cannot A-normalize ELambda!"
). Eventually, we will erase allELambda
nodes from the AST before A-normalizing. - For now, change the compilation of
CAppl
tofailwith "TODO"
.
Once you have done this, all of your unit tests which don’t use function calls should be able to pass. You will eventually need to update the syntax of your test files to match the new syntax of Foxsnake.
Step 1: Closures
Once you’ve gotten your code to compile, you’re ready to add closures to your compiler. A closure must be able to store:
- The current number of arguments the closure contains.
- The number of arguments the function needs to be called.
- The memory address of the assembly code representing the function.
- The arguments currently contained by the closure.
We will lay closures out in memory in that order. As with tuples, we will always store closures in the heap (since they require several bytes of memory). To distinguish closures from tuples, we will set the first bit of the first word of the closure to 1
. Consider the following example:
(32 bits) | (32 bits) | (32 bits) | (32 bits) | (32 bits) |
---|---|---|---|---|
0x80000002 |
0x00000004 |
0x080484f8 |
0x7FFFFFFF |
0x0000001e |
This sequence of words represents a closure in the Foxsnake heap. They are:
- A word indicating that this is a closure which carries two arguments. We know that this is a closure because the highest bit is set; we know that there are two arguments because, after setting the highest bit to zero, the value is
0x00000002
. - A word indicating the number of arguments the function requires. In this case, the function requires four arguments before it can be called.
- A word indicating the memory address of the function to call. This is a raw memory address (and not a Foxsnake heap pointer).
- A word for the first of the two stored arguments. In this case, that argument is
false
(in Foxsnake’s memory representation). - A word for the second of the two stored arguments. In this case, that argument is the integer
15
(in Foxsnake’s memory representation).
Creating closures for declared functions
The first step toward implementing closures in Foxsnake is to create a set of global variables containing the zero-argument closures for the functions declared by the program. We are creating global variables here for two reasons:
CAppl
’s arguments are of typei_expr
and we don’t want to have to change our compilation ofi_expr
to do the work of creating a closure. (It would be against the spirit of immediate expressions.)- Even if we made that change, it would be inefficient to create zero-argument closures again and again when we can have a single immutable closure for this purpose instead.
In the Egg-eater lab, you created a single global label heap_cursor
to track heap memory. You should now create an additional label for the closure of each declared function in the program. For instance, for the program
def f x =
x + 1
end
def g x y =
x * y
end
f (g 2 3)
the data section of your assembly might look something like this:
section .data
align 4
heap_cursor:
dd 0
align 4
closure_of_f:
dd 0x80000000, 1, _f
align 4
closure_of_g:
dd 0x80000000, 2, _g
Note that these initial closures always carry zero arguments. The second word indicates the number of parameters the function declares while the third word is the label given to the user-declared function. The label, named by itself, refers to the memory address of the labeled code.
Checkpoint
Once you’ve completed the above steps, you should be able to write a program which declares a function but does not use it. Your compiler should work correctly on this program and produce a data section similar to the above.
Step 2: Binding Variables to Closures
Once you’ve declared the existence of these closures, your next step should be to have some means to refer to them. When a Foxsnake program refers to a function by its name (e.g. f
), you should be able to retrieve that closure. Our compiler’s environment currently only has the ability to refer to stack offsets, so we will have to make it a bit more sophisticated.
Flexible environments
First, change your environment so that it contains a mapping from strings to arguments (instead of a mapping from strings to integers). You will need to update environment.ml
and compiler.ml
to do this. When you’ve finished, the code should behave in exactly the same way as it did before; you can use your unit tests to confirm this. The objective here is to ensure that the environment continues to have the same functionality but its data structure is now more flexible.
Our next step will be to add bindings for function closures. You’ll likely want to add a function to environment.ml
to add an arbitrary string
-to-argument
binding to your environment to make this easier, but you don’t need to do it yet.
Bindings for closures
Next, we must ensure that each function declaration results in a single variable binding. So far, we’ve written our compiler to start with an empty environment in the main expression and an environment containing parameters for each function body expression. We will add to both of these environments a binding for each declared function. For instance, in the program
def f x =
x + 1
end
f 0
the main expression should be compiled with an environment which binds f
to its closure. For instance, the assembly code for evaluating f
in the above program might be
mov eax, closure_of_f + 1
Note the use of + 1
. The label closure_of_f
should be aligned to a 4 byte boundary, so the last two bits of its address will be 00
. If we add 1
to this, we get a value which conforms to the Foxsnake binary representation for a memory pointer!
To handle the above, you probably want to add a new assembly argument
to assemblyLanguage.ml
. The name XLabelLocationOffset
is recommended as it is consistent with the previous names we have chosen.
In addition to updating compiler.ml
with these new bindings, you will need to update wellformedness.ml
so that it knows that these variable names are legal. Make sure to update both files before proceeding.
Checkpoint
Although you cannot yet call these functions, it should now be possible for you to produce the closure as a value. Write a program which declares a function f
; then, allow the main expression for that program to be just f
. Using the new version of printer.c
you received as part of the starter code for Foxsnake, you should get a result that looks something like this:
<closure@080484f8>[0/1](?)
Step 3: Growing Closures
Once we have established closures in the compiler’s environment, we should work toward growing them: creating new closures which carry additional arguments. The compilation of CAppl
should generate assembly code that follows approximately this algorithm:
- Check to make sure the first argument is a closure. Fail if it is not.
- Compare the number of arguments in the closure \(a\) to the number of parameters for the function \(p\).
- If \(a + 1 < p\), then we’re not yet ready to call the function.
- Create a copy of the closure using new heap memory.
- Copy the new argument (the result of compiling the second
i_expr
given toCAppl
) to the end of the copy you just created. - In that copy, increase the number of arguments by one to account for the new argument.
- Make sure to update
heap_cursor
to reflect the fact that this new memory has been allocated. - Produce in
eax
a Foxsnake pointer to the new closure.
- If \(a + 1 = p\), then this argument is the last argument to the function and we are ready to call it.
- Allocate enough memory on the stack for all of the arguments. Do this by modifying
esp
directly since we’re pushing a dynamically-determined number of elements. - Copy the arguments from the closure into the stack.
- Copy the last argument (the result of the second
i_expr
given toCAppl
) onto the stack. - Call the function. This is most easily accomplished by moving the function address into
eax
and then using the instructionXCall("eax")
; it’s not terribly good form, but it saves us the effort of creating a newinstruction
for calling registers. - Adjust
esp
to remove the arguments from the stack.
- Allocate enough memory on the stack for all of the arguments. Do this by modifying
The above algorithm handles the case of growing closures as well as the case of calling functions (which is step 4 below). For now, focus on the first part. For the case in which you would call the function, you can simply generate an appropriate label and no instructions.
The edx
register
Implementing the closure-growing algorithm is a bit tricky with only two registers. Feel free to add edx
to your registers
data type and use it for this lab.
Copying memory with rep movsd
Writing a loop to copy memory in x86 assembly could be a bit tricky and rather error-prone. Thankfully, the processor comes with an instruction which makes it easy to copy several words of memory from one location to another.
To use this instruction:
- Set the
esi
register to the source memory address containing the data - Set the
edi
register to the destination memory address to write the data - Set the
ecx
register to the count of the number of 4-byte words to copy - Issue the instruction
rep movsd
The rep movsd
instruction will proceed by copying one word from esi
to edi
, moving esi
and edi
one word forward in memory, decrementing ecx
, and continuing until ecx
is 0
. For instance, consider the following code:
mov esi, ebp
add esi, 8
mov edi, [heap_cursor]
mov ecx, 3
rep movsd
At the start of rep movsd
:
- The
esi
register contains the addressebp + 8
: the first parameter of the current function. - The
edi
register contains the address of the heap cursor.
After rep movsd
is complete:
- 12 bytes of memory (3 words) starting at
ebp + 8
will be copied into the 12 bytes of memory starting atheap_cursor
- The
esi
register will contain the addressebp + 20
: 12 bytes from where it started. - The
edi
register will contain the addressheap_cursor + 12
: 12 bytes from where it started. - The
ecx
register will contain the value0
.
To use this instruction, you can simply add an appropriate XRepMovsd
constructor to your instruction
type in assemblyLanguage.ml
. You are not required to use rep movsd
, but it will likely make your code much cleaner and easier to debug. It’s also considerably more efficient for the CPU.
Checkpoint
Once you’ve finished the code which grows closures, you should be able to see the closures growing for functions with several parameters. Try writing a program which declares a function f
of six parameters. Then, call f
with an argument and see if the result is an apporpriate closure. Call that closure with another argument to see if it gets copied in correctly as well.
Step 4: Calling Functions
Once you’ve managed to make your closures grow appropriately, move on to implementing the function-calling part of the algorithm above. Probably the trickiest part of this is to copy the arguments from the closure onto the stack in the right order, although this is easier than it seems.
Remember: arguments are pushed onto the stack in reverse order only because stack memory grows downwards; the first argument to a function is lowest in memory. The closures also keep their arguments in ascending memory order, so it is enough simply to copy those arguments from the closure onto the stack and then copy the last argument after them.
Checkpoint
At this point, your unit tests from Egg-eater should all pass! You’ll need to update the syntax of function declarations and function calls if you haven’t already done so. You’ll also need to change any tests that relied on zero-parameter functions. In those cases, just add a parameter that you don’t use and then call the function with a dummy value, like 0
.
What’s more: you should now be able to pass functions as arguments to other functions! Try writing some programs which make use of your new compiler features. For instance, you could write an ntimes
function that takes parameters f
, n
, and x
and calls f
on x
a total of n
times.
Step 5: Closure Conversion
The only remaining feature we must address is the anonymous function. We will convert anonymous functions like fun x -> x
into named functions (of the form EFunction
) by searching the AST for them, picking a fresh name to use, creating a declaration with that name, and replacing the anonymous function with a reference to that new name as a variable. For instance, we might transform
def twice f x =
f (f x)
end
twice (fun a -> a * 2) 5
into
def twice f x =
f (f x)
end
def $lambda$1 a =
a * 2
end
twice $lambda$1 5
Non-Local Variables
When we closure-convert anonymous functions, we must be careful to handle non-local variables correctly. A non-local variable is one which is used within the function but not defined by it. For instance, in the program
let x = 2 in
(fun y -> x + y) 4
the variable x
is non-local to the anonymous function. When we closure-convert this code, we will add the non-locals as parameters to the resulting declaration:
def $lambda$1 x y =
x + y
end
let x = 2 in
$lambda$1 x 4
To do this correctly, you will want to implement a function for calculating the free variables in an expression. The non-local variables of an anonymous function are the free variables of its body minus the parameter it declares.
Some of the work of closure-converting Foxsnake programs is provided in the file closureConversion.ml
. Your job is to address the other cases and then call upon this code in the appropriate place in your compiler.
Miscellaneous details
Make sure to update the code for istuple
as well as the code that validates that an argument is a tuple for e.g. tuple indexing. The Egg-eater version of this code can simply check the tag bits of the argument, but Foxsnake uses the same tag bits for tuple pointers and closure pointers. You’ll need to check both the tag bits and the first bit of the heap value to determine if an argument is a tuple or not.
Checkpoint
At this point, we have added all three features of Foxsnake: first-class functions, partial application, and anonymous functions. Write some unit tests to be confident of your implementation and then congratulate yourself!
Summary
To complete this assignment, you’ll need to
- Modify your compiler to include closures as a possible heap value
- Add empty closures for each declared function
- Implement closure growth and function calls using closures
- Complete the closure conversion routine provided in the starter code and use it to handle anonymous functions
- Update tuple-handling code to handle closures correctly
Submitting
To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.
If You Have Trouble…
…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!