Compilers

Foxsnake

Source: Wikimedia Commons: by Don Becker (own work) [CC BY-SA 3.0]

Due on Monday, April 3rd at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In this lab, we will extend the Egg-eater compiler to a new language called Foxsnake. Foxsnake introduces first-class functions, partial application, and anonymous functions.

Transitioning to Foxsnake

When you git pull, you’ll discover the following updates:

src/languages/foxsnake/, the directory containing our lexer, parser, and ASTs.
src/closureConversion.ml, the file where we will put our closure conversion code.
resources/printer.c, which has been adjusted to accommodate closures.

As usual, we will begin by changing the LANGUAGE variable in language.mk from eggEater to foxsnake. Once you have done so, your code will not compile! We will explain this below.

The Foxsnake Language

The syntax of Foxsnake function declarations and function calls is incompatible with the Diamondback syntax and the Foxsnake abstract syntax reflects this. We first present the syntax and semantics of Foxsnake and then discuss how to update the Egg-eater compiler incrementally to accommodate these changes.

Concrete Syntax

Foxsnake’s concrete syntax includes all of the features of Egg-eater, but function declarations and calls are written differently. Foxsnake also includes anonymous function syntax. For clarity, we present the full Foxsnake syntax here:

<program> ::=
  | <declaration_list> <expression>
  | <expression>

<declaration_list> ::=
  | <declaration> <declaration_list>
  | <declaration>

<declaration> ::=
  | def <identifier> <parameter_list> = <expression> end

<parameter_list> ::=
  | <identifier> <parameter_list>
  | <identifier>

<expression> ::=
  | <integer>
  | true
  | false
  | <identifier>
  | let <identifier> = <expression> in <expression>
  | if <expression> then <expression> else <expression>
  | inc(<expression>)
  | dec(<expression>)
  | print(<expression>)
  | isbool(<expression>)
  | isint(<expression>)
  | istuple(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | <expression> < <expression>
  | <expression> > <expression>
  | <expression> = <expression>
  | <expression> && <expression>
  | <expression> || <expression>
  | <expression>[<expression>]
  | <expression> <expression>
  | (<expression>)
  | (<expression>, <expression_list>)
  | fun <identifier> -> <expression>

<expression_list> ::=
  | <expression> , <expression_list>
  | <expression>

The notable changes from Egg-eater are:

Function declarations require at least one parameter. Syntactically, the function declaration omits parentheses and commas from its parameter lists; instead, the = is used to separate the parameter list from the function body.
Function calls are by juxtaposition as in OCaml: f x calls function f with argument x.
Anonymous functions are written in a fashion similar to OCaml: fun x -> x + 1 describes the anonymous increment function. As in OCaml, parentheses are needed around the function in many cases to separate the function body from the rest of the expression.

Abstract Syntax

As usual, you can see the definitions of expressions for Foxsnake in src/language/foxsnake/expressions.ml. Make sure to read over those definitions. In particular, note that EAppl now only carries two expressions; multiple arguments are passed by chaining EAppl constructors (e.g. EAppl(EAppl(EVar f, EVar x), EVar y)).

Semantics

Unlike our previous languages, function declarations in Foxsnake create new variables named after those functions. These variables are bound to closures which define the function and the arguments which have been passed to that function so far. For instance,

def f x y =
  x + y
end

let g = f in
g 2 4

evaluates to 6. The variable f is bound to the closure of function f with no arguments. These closures will be values in Foxsnake and, as such, can be assigned to other variables (such as g). Arguments can then be applied to the closure to call the function.

Foxsnake supports partial application, so the code

def f x y =
  x + y
end

let increment = f 1 in
(increment 3, increment 7)

evaluates to the tuple (4, 8); the inc closure carries the argument 1 for parameter x and this argument is used each time inc is called in the tuple.

Foxsnake’s anonymous functions work much as in OCaml. The above code could be rewritten

let f = (fun x -> fun y -> x + y) in
let increment = f 1 in
(increment 3, increment 7)

which would produce the same result.

While we will represent closures in a fashion similar to tuples, closures are not tuples; they are a new kind of value. That is,

let identity = fun x -> x in
istuple(identity)

will evaluate to false.

Errors

The introduction of higher-order functions causes some errors which are easily recognized by Egg-eater to become extremely difficult in Foxsnake. You are no longer required to recognize the following compile-time errors:

Function f is not defined.: Because functions will be defined as variables bound to closures, this error becomes Unbound variable f.
Function f is called with an incorrect number of arguments.: Because functions are now data, it becomes impossible to determine if a function has been applied to all of its arguments or not. A type system would help to mitigate this by weakening the kinds of application we could perform. For Foxsnake, you are simply not required to report this kind of error.

You are also now responsible for an additional type of runtime error:

If an operation expects a closure but receives something else, exit with error code 6.

For instance, exit code 6 would result from the code (1 + 2) (3 + 4) because the evaluation of the first expression, 3, is not a function. This runtime error should be reported in addition to the runtime errors from previous labs.

Implementing the Foxsnake Compiler

To implement higher-order functions in Foxsnake, we plan to modify the compiler to process the user’s program as follows:

Parse the source file (as we have been doing)
Check the well-formedness of the AST (with functions as simple variables now)
Closure-convert the surface AST (to eliminate anonymous functions)
A-normalize the resulting AST (as we have been doing)
Compile the A-normalized AST
- binding each function name to a global closure value
- creating new closures as necessary

We will implement these stages in a different order in order to ensure that we can evaluate our progress as we go.

Getting Started

To begin your work on Foxsnake, you should get your code to a point where it compiles, even if this means losing some functionality. The following is recommended:

Remove the arity check from well-formedness. The new code for EAppl is likely to be quite simple.
The A-normalization of EAppl to CAppl is likewise fairly simple.
The A-normalization for ELambda should fail (e.g. with failwith "Cannot A-normalize ELambda!"). Eventually, we will erase all ELambda nodes from the AST before A-normalizing.
For now, change the compilation of CAppl to failwith "TODO".

Once you have done this, all of your unit tests which don’t use function calls should be able to pass. You will eventually need to update the syntax of your test files to match the new syntax of Foxsnake.

Step 1: Closures

Once you’ve gotten your code to compile, you’re ready to add closures to your compiler. A closure must be able to store:

The current number of arguments the closure contains.
The number of arguments the function needs to be called.
The memory address of the assembly code representing the function.
The arguments currently contained by the closure.

We will lay closures out in memory in that order. As with tuples, we will always store closures in the heap (since they require several bytes of memory). To distinguish closures from tuples, we will set the first bit of the first word of the closure to 1. Consider the following example:

(32 bits)	(32 bits)	(32 bits)	(32 bits)	(32 bits)
`0x80000002`	`0x00000004`	`0x080484f8`	`0x7FFFFFFF`	`0x0000001e`

This sequence of words represents a closure in the Foxsnake heap. They are:

A word indicating that this is a closure which carries two arguments. We know that this is a closure because the highest bit is set; we know that there are two arguments because, after setting the highest bit to zero, the value is 0x00000002.
A word indicating the number of arguments the function requires. In this case, the function requires four arguments before it can be called.
A word indicating the memory address of the function to call. This is a raw memory address (and not a Foxsnake heap pointer).
A word for the first of the two stored arguments. In this case, that argument is false (in Foxsnake’s memory representation).
A word for the second of the two stored arguments. In this case, that argument is the integer 15 (in Foxsnake’s memory representation).

Creating closures for declared functions

The first step toward implementing closures in Foxsnake is to create a set of global variables containing the zero-argument closures for the functions declared by the program. We are creating global variables here for two reasons:

CAppl’s arguments are of type i_expr and we don’t want to have to change our compilation of i_expr to do the work of creating a closure. (It would be against the spirit of immediate expressions.)
Even if we made that change, it would be inefficient to create zero-argument closures again and again when we can have a single immutable closure for this purpose instead.

In the Egg-eater lab, you created a single global label heap_cursor to track heap memory. You should now create an additional label for the closure of each declared function in the program. For instance, for the program

def f x =
  x + 1
end

def g x y =
  x * y
end

f (g 2 3)

the data section of your assembly might look something like this:

section .data
align 4
heap_cursor:
  dd 0
align 4
closure_of_f:
  dd 0x80000000, 1, _f
align 4
closure_of_g:
  dd 0x80000000, 2, _g

Note that these initial closures always carry zero arguments. The second word indicates the number of parameters the function declares while the third word is the label given to the user-declared function. The label, named by itself, refers to the memory address of the labeled code.

Checkpoint

Once you’ve completed the above steps, you should be able to write a program which declares a function but does not use it. Your compiler should work correctly on this program and produce a data section similar to the above.

Step 2: Binding Variables to Closures

Once you’ve declared the existence of these closures, your next step should be to have some means to refer to them. When a Foxsnake program refers to a function by its name (e.g. f), you should be able to retrieve that closure. Our compiler’s environment currently only has the ability to refer to stack offsets, so we will have to make it a bit more sophisticated.

Flexible environments

First, change your environment so that it contains a mapping from strings to arguments (instead of a mapping from strings to integers). You will need to update environment.ml and compiler.ml to do this. When you’ve finished, the code should behave in exactly the same way as it did before; you can use your unit tests to confirm this. The objective here is to ensure that the environment continues to have the same functionality but its data structure is now more flexible.

Our next step will be to add bindings for function closures. You’ll likely want to add a function to environment.ml to add an arbitrary string-to-argument binding to your environment to make this easier, but you don’t need to do it yet.

Bindings for closures

Next, we must ensure that each function declaration results in a single variable binding. So far, we’ve written our compiler to start with an empty environment in the main expression and an environment containing parameters for each function body expression. We will add to both of these environments a binding for each declared function. For instance, in the program

def f x =
  x + 1
end

f 0

the main expression should be compiled with an environment which binds f to its closure. For instance, the assembly code for evaluating f in the above program might be

mov eax, closure_of_f + 1

Note the use of + 1. The label closure_of_f should be aligned to a 4 byte boundary, so the last two bits of its address will be 00. If we add 1 to this, we get a value which conforms to the Foxsnake binary representation for a memory pointer!

To handle the above, you probably want to add a new assembly argument to assemblyLanguage.ml. The name XLabelLocationOffset is recommended as it is consistent with the previous names we have chosen.

In addition to updating compiler.ml with these new bindings, you will need to update wellformedness.ml so that it knows that these variable names are legal. Make sure to update both files before proceeding.

Checkpoint

Although you cannot yet call these functions, it should now be possible for you to produce the closure as a value. Write a program which declares a function f; then, allow the main expression for that program to be just f. Using the new version of printer.c you received as part of the starter code for Foxsnake, you should get a result that looks something like this:

<closure@080484f8>[0/1](?)

Step 3: Growing Closures

Once we have established closures in the compiler’s environment, we should work toward growing them: creating new closures which carry additional arguments. The compilation of CAppl should generate assembly code that follows approximately this algorithm:

Check to make sure the first argument is a closure. Fail if it is not.
Compare the number of arguments in the closure \(a\) to the number of parameters for the function \(p\).
If \(a + 1 < p\), then we’re not yet ready to call the function.
1. Create a copy of the closure using new heap memory.
2. Copy the new argument (the result of compiling the second i_expr given to CAppl) to the end of the copy you just created.
3. In that copy, increase the number of arguments by one to account for the new argument.
4. Make sure to update heap_cursor to reflect the fact that this new memory has been allocated.
5. Produce in eax a Foxsnake pointer to the new closure.
If \(a + 1 = p\), then this argument is the last argument to the function and we are ready to call it.
1. Allocate enough memory on the stack for all of the arguments. Do this by modifying esp directly since we’re pushing a dynamically-determined number of elements.
2. Copy the arguments from the closure into the stack.
3. Copy the last argument (the result of the second i_expr given to CAppl) onto the stack.
4. Call the function. This is most easily accomplished by moving the function address into eax and then using the instruction XCall("eax"); it’s not terribly good form, but it saves us the effort of creating a new instruction for calling registers.
5. Adjust esp to remove the arguments from the stack.

The above algorithm handles the case of growing closures as well as the case of calling functions (which is step 4 below). For now, focus on the first part. For the case in which you would call the function, you can simply generate an appropriate label and no instructions.

The `edx` register

Implementing the closure-growing algorithm is a bit tricky with only two registers. Feel free to add edx to your registers data type and use it for this lab.

Copying memory with `rep movsd`

Writing a loop to copy memory in x86 assembly could be a bit tricky and rather error-prone. Thankfully, the processor comes with an instruction which makes it easy to copy several words of memory from one location to another.

To use this instruction:

Set the esi register to the source memory address containing the data
Set the edi register to the destination memory address to write the data
Set the ecx register to the count of the number of 4-byte words to copy
Issue the instruction rep movsd

The rep movsd instruction will proceed by copying one word from esi to edi, moving esi and edi one word forward in memory, decrementing ecx, and continuing until ecx is 0. For instance, consider the following code:

mov esi, ebp
add esi, 8
mov edi, [heap_cursor]
mov ecx, 3
rep movsd

At the start of rep movsd:

The esi register contains the address ebp + 8: the first parameter of the current function.
The edi register contains the address of the heap cursor.

After rep movsd is complete:

12 bytes of memory (3 words) starting at ebp + 8 will be copied into the 12 bytes of memory starting at heap_cursor
The esi register will contain the address ebp + 20: 12 bytes from where it started.
The edi register will contain the address heap_cursor + 12: 12 bytes from where it started.
The ecx register will contain the value 0.

To use this instruction, you can simply add an appropriate XRepMovsd constructor to your instruction type in assemblyLanguage.ml. You are not required to use rep movsd, but it will likely make your code much cleaner and easier to debug. It’s also considerably more efficient for the CPU.

Checkpoint

Once you’ve finished the code which grows closures, you should be able to see the closures growing for functions with several parameters. Try writing a program which declares a function f of six parameters. Then, call f with an argument and see if the result is an apporpriate closure. Call that closure with another argument to see if it gets copied in correctly as well.

Step 4: Calling Functions

Once you’ve managed to make your closures grow appropriately, move on to implementing the function-calling part of the algorithm above. Probably the trickiest part of this is to copy the arguments from the closure onto the stack in the right order, although this is easier than it seems.

Remember: arguments are pushed onto the stack in reverse order only because stack memory grows downwards; the first argument to a function is lowest in memory. The closures also keep their arguments in ascending memory order, so it is enough simply to copy those arguments from the closure onto the stack and then copy the last argument after them.

Checkpoint

At this point, your unit tests from Egg-eater should all pass! You’ll need to update the syntax of function declarations and function calls if you haven’t already done so. You’ll also need to change any tests that relied on zero-parameter functions. In those cases, just add a parameter that you don’t use and then call the function with a dummy value, like 0.

What’s more: you should now be able to pass functions as arguments to other functions! Try writing some programs which make use of your new compiler features. For instance, you could write an ntimes function that takes parameters f, n, and x and calls f on x a total of n times.

Step 5: Closure Conversion

The only remaining feature we must address is the anonymous function. We will convert anonymous functions like fun x -> x into named functions (of the form EFunction) by searching the AST for them, picking a fresh name to use, creating a declaration with that name, and replacing the anonymous function with a reference to that new name as a variable. For instance, we might transform

def twice f x =
  f (f x)
end

twice (fun a -> a * 2) 5

into

def twice f x =
  f (f x)
end

def $lambda$1 a =
  a * 2
end

twice $lambda$1 5

Non-Local Variables

When we closure-convert anonymous functions, we must be careful to handle non-local variables correctly. A non-local variable is one which is used within the function but not defined by it. For instance, in the program

let x = 2 in
(fun y -> x + y) 4

the variable x is non-local to the anonymous function. When we closure-convert this code, we will add the non-locals as parameters to the resulting declaration:

def $lambda$1 x y =
  x + y
end

let x = 2 in
$lambda$1 x 4

To do this correctly, you will want to implement a function for calculating the free variables in an expression. The non-local variables of an anonymous function are the free variables of its body minus the parameter it declares.

Some of the work of closure-converting Foxsnake programs is provided in the file closureConversion.ml. Your job is to address the other cases and then call upon this code in the appropriate place in your compiler.

Miscellaneous details

Make sure to update the code for istuple as well as the code that validates that an argument is a tuple for e.g. tuple indexing. The Egg-eater version of this code can simply check the tag bits of the argument, but Foxsnake uses the same tag bits for tuple pointers and closure pointers. You’ll need to check both the tag bits and the first bit of the heap value to determine if an argument is a tuple or not.

Checkpoint

At this point, we have added all three features of Foxsnake: first-class functions, partial application, and anonymous functions. Write some unit tests to be confident of your implementation and then congratulate yourself!

Summary

To complete this assignment, you’ll need to

Modify your compiler to include closures as a possible heap value
Add empty closures for each declared function
Implement closure growth and function calls using closures
Complete the closure conversion routine provided in the starter code and use it to handle anonymous functions
Update tuple-handling code to handle closures correctly

Submitting

To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.

If You Have Trouble…

…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!

Foxsnake

Overview

Transitioning to Foxsnake

The Foxsnake Language

Concrete Syntax

Abstract Syntax

Semantics

Errors

Implementing the Foxsnake Compiler

Getting Started

Step 1: Closures

Creating closures for declared functions

Checkpoint

Step 2: Binding Variables to Closures

Flexible environments

Bindings for closures

Checkpoint

Step 3: Growing Closures

The edx register

Copying memory with rep movsd

Checkpoint

Step 4: Calling Functions

Checkpoint

Step 5: Closure Conversion

Non-Local Variables

Miscellaneous details

Checkpoint

Summary

Submitting

If You Have Trouble…

The `edx` register

Copying memory with `rep movsd`