Dove
data:image/s3,"s3://crabby-images/3f4f7/3f4f7b2f82e36b1e50a8505538641e45ba570456" alt="a ring-necked dove"
Due on Friday, March 7th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the course forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
This lab will extend the Cardinal compiler to a new language called Dove. Dove includes first-order functions (similar to C functions) and function calls. It also adds some compile-time error checking related to functions.
Migrating to Dove
As usual, you will first change to the dove
branch, where your Dove starter files are located:
git fetch
git switch dove
You will then merge your work on the Cardinal compiler into that branch:
git merge cardinal
The changes for Dove are contained in two locations:
src/languages/
, in which the lexer and parser for Cardinal have been updated for the Dove syntax discussed below. Theasts.ml
file now contains new types which will be used in handling function declarations; make sure to take a look.src/compiler/wellformedness.ml
, a new file where you will write compile-time error checks for Dove.
Note that, this time, you will not be able to build your compiler after the merge
. This is because the Dove compiler’s parser produces a program
instead of an expr
. This is expected.
Your first task will be to update compile_program
and compile_to_assembly_code
to take program
as an argument. For now, just make your code ignore the list of declarations in the program
value and compile the expression it contains. Afterwards, you can make clean
, make
, and run ./tests
to ensure that everything is working again.
The Dove Language
In the Dove language, programs are a list of declarations followed by an expression. The expression is essentially the “main” part of the program; the declarations allow you to create functions that expressions can call. Below, we discuss the syntax and semantics for these declarations.
Concrete Syntax
Here, we use a common convention that our language starts at the first non-terminal – program
– below. The concrete syntax of Dove is:
<program> ::=
| <declaration_list> <expression>
| <expression>
<declaration_list> ::=
| <declaration> <declaration_list>
| <declaration>
<declaration> ::=
| def <identifier>(<parameter_list>) <expression> end
| def <identifier>() <expression> end
<parameter_list> ::=
| <identifier> , <parameter_list>
| <identifier>
<expression> ::=
| true
| false
| <integer>
| after(<expression>)
| before(<expression>)
| print(<expression>)
| isbool(<expression>)
| isint(<expression>)
| <expression> + <expression>
| <expression> - <expression>
| <expression> * <expression>
| <expression> < <expression>
| <expression> > <expression>
| <expression> = <expression>
| <expression> && <expression>
| <expression> || <expression>
| <identifier>
| let <identifier> = <expression> in <expression>
| if <expression> then <expression> else <expression>
| <identifier>(<argument_list>)
| <identifier>()
| (<expression>)
<argument_list> ::=
| <expression> , <argument_list>
| <expression>
Syntactic precedence is the same as in Cardinal.
Abstract Syntax
The abstract syntax of Dove has additional types: declaration
and program
. Make sure to examine src/language/asts.ml
to see their definitions. Each of these types correspond to their like-named non-terminals in the grammar above.
Semantics
Function declarations in Dove have semantics much like those of Python or C. When a function is called, each argument is evaluated from left to right. The values of those arguments are stored in the parameters of the function and the body of the function is executed. The result of the function call is the value returned by the expression in the function body. You must generate code that conforms to Bird calling conventions we discussed in class (and which are reviewed below).
Here are some examples of Dove code:
-
The code
def f(x) x + 1 end f(4)
returns the value
5
: the variablex
is set to the argument (4
) and the body of the function returns one more than that value. -
The code
def f(x) x * 2 end def g(y) f(y) + f(4) end g(5)
produces the value
18
. This is becauseg(5)
is equivalent tof(5) + f(4)
, which is equivalent to5*2+4*2
. -
We allow functions in Dove to be recursive with no declarations or additional syntax. The code
def summate(x) if x > 1 then x + summate(x-1) else x end summate(5)
is legal and will return the value
15
(that is,5+4+3+2+1
). -
We allow functions in Dove to be mutually recursive. The code
def f(x) if x > 1 then g(x) else x end def g(x) f(x - 1) end f(4)
is legal and will return
1
. -
Note that functions of multiple arguments and functions of no arguments are permitted. Further, function calls are expressions and so may appear anywhere that an expression may appear. The code
def f() 5 end def g(x,y) x > y end g(f(),4)
will return
true
.
Compile-Time Errors
The introduction of function definitions and function calls also introduces several new forms of mistake that a programmer can make which we can detect at compile time. Your Dove compiler will be required to detect and handle several classes of errors:
-
The programmer might call a function that has not been declared. If this occurs for a function
fn
, your compiler must emit an error message containing the strings “function fn
” and “not defined
”. For instance, the programdef f() 5 end g()
might generate the error “
Function g is not defined.
”. -
The programmer might call a function with the wrong number of arguments. If this occurs for a function
fn
, your compiler must emit an error message containing the strings “function fn
” and “number of arguments
”. For instance, the programdef f() 5 end f(true)
might generate the error “
Function f is called with an incorrect number of arguments.
”. -
The programmer might declare a function with duplicate parameter names. If this occurs for a function
fn
and parameterp
, your compiler must emit an error message containing the strings “duplicate
”, “parameter p
”, and “function fn
”. For instance, the programdef f(x,y,x) 0 end 4
might generate the error “
Function f declares a duplicate parameter x.
”. -
The programmer might declare multiple functions with the same name. If this occurs for a function
fn
, your compiler must emit an error message containing the strings “duplicate definition
” and “function fn
”. For instance, the programdef f() 0 end def f(x) x end f()
might generate the error “
Duplicate definition of function f.
”. -
As before, the programmer might try to use an undefined variable. If this occurs for variable
x
, your compiler must emit an error message containing the strings “unbound
” and “variable x
”. For instance, the programdef f(x) y end 4
might generate the error “
Unbound variable y.
”
Report all the errors!
Unlike our previous compilers, the Dove compiler must report all errors for a given file. For instance, the code
def f(x)
g(y,z)
end
f()
contains four errors: an unbound variable y
, an unbound variable z
, an undefined function g
, and an argument count error where f
is called. The compiler must report all of these errors, like so:
$ ./hatch above_code.bird
Unbound variable y.
Unbound variable z.
Function g is not defined.
Function f is called with an incorrect number of arguments.
$
The order of the errors is not relevant, but each error must appear on its own line.
To do this, you will need to write compile-time error checks for your AST and use them before proceeding with compilation. Note that you will not be able to rely upon your previous code for detecting unbound variables; that code throws an exception when it encounters a single variable and does not take any other errors into account. Instead, you will need to recurse over the AST and collect all of the errors it contains.
A small stub is provided in wellformedness.ml
to give you an approach to use. If you use this approach, you will need to modify the compile_to_assembly_code
function in builder.ml
(not your own version in compiler.ml
) so that it catches the IllFormed
exception and throws a BuildFailure
with the appropriate text. The command-line-handling code in hatch.ml
will catch the BuildFailure
exception and print the message it contains.
Runtime errors
You should report the same runtime errors that you did in the Cardinal lab, using the same exit codes:
- If an operation expects an integer but gets something else, exit with error code
1
. - If an operation expects a boolean but gets something else, exit with error code
2
.
Bird Calling Conventions
As in Cardinal, we will write the bird_main
function to be a C callee and we will use C caller conventions when invoking external C functions like stopWithError
and printValue
; that is, we will use the x86-64 POSIX C calling conventions. For function calls within our own language, though, we will use a different set of rules which we will call bird calling conventions. These conventions are similar to those of C but are more suitable to our language’s runtime. They are described below:
- At start of call, the caller must:
- Push caller-saved registers onto the stack (or else expect their values to change). The following registers are caller-saved:
rax
,rcx
,rdx
,rsi
,rdi
,r8
,r9
,r10
, andr11
. - Push arguments onto the stack in reverse order. (The first argument is on the top of the stack in the lowest memory address.) Note that, unlike C, we will not use any registers for arguments.
- Use
call
instruction to start the function.
- Push caller-saved registers onto the stack (or else expect their values to change). The following registers are caller-saved:
- When the call begins, the callee must
- Push
rbp
onto the stack (to save it). - Copy
rsp
intorbp
. - Adjust
rsp
to make room for local variables. (This means that environment offsets should be computed fromrbp
now.) - Push callee-saved registers (or else do not change their values). The following registers are callee-saved:
rbx
,r12
,r13
,r14
, andr15
. The registersrsp
andrbp
are also callee-saved but are preserved separately in the previous steps.
- Push
- When call is finished, callee must
- Leave return value in
rax
. - Restore callee-saved registers by popping them.
- Restore old top of stack:
mov rsp, rbp
- Restore old base pointer:
pop rbp
- Execute the
ret
instruction (to pop return address into instruction register)
- Leave return value in
- At end of call, caller must
- Remove the arguments from the stack
- Restore caller-saved registers
The only real difference in the two different calling conventions is that the bird calling conventions place all arguments on the stack while the C calling conventions put some arguments in registers. Putting all of our arguments on the stack simplifies the process of calling functions. The C approach of using registers offers opportunities for performance optimizations, but implementing those optimizations is outside the scope of this course.
Implementing the Dove Compiler
Here’s a rough outline of how to approach this lab:
-
Update your code so that your Cardinal unit tests pass. Initially, ignore the declarations that appear in the program so you can get your unit tests working again.
-
Add function declarations and function calls. Don’t worry about error handling for now; only test good programs. Make sure to use the Bird calling conventions above for calls from Dove code to Dove code. Here are a few things to think about:
-
You’ll need new code to add parameters to the environment when compiling a function body. If a function has two parameters, for instance, it shouldn’t start with an empty environment; it should start with an environment with both of those parameters mapped to positive offsets. You may wish to write some unit tests for that environment function (without using
testUtils.ml
) to make sure your new code does what you think it should. -
You’ll need code to compile a function declaration into assembly code. This should probably be a completely different function from any of those which exist already; don’t hesitate to write a
compile_function_declaration
routine or something similar. -
Make sure to test a variety of cases. In particular, you should test functions with varying parameter counts. It’s also easy to get things backwards when passing arguments on the stack, so make sure to write a function where the parameters are involved in an order-dependent operation, like subtraction or printing.
-
-
Add compile-time error checking. The new file
src/compiler/wellformedness.ml
contains some stubs for you to use; you should call these functions as one of your first steps incompile_program
. Write unit tests for your compile-time errors. As usual, be sure to be thorough.
Callee-saved registers and bird_main
The bird compilers we have developed so far have been breaking the C calling convetions! In our driver.c
file, the bird_main
function is called as a C function, but it does not preserve the state of callee-saved registers. Fortunately, our driver is so simple that the compiled driver.o
generated by clang
does not rely upon those registers. For the Dove compiler, however, you should correct this problem. Note that, while each of your Dove functions will save the callee-saved registers from the Bird calling conventions, your bird_main
must be a C callee and so will save the callee-saved registers from the C calling conventions.
OCaml Data Structures
You may find that, in addition to the dictionary keyed by strings defined in Map.String
, you may want a set of strings. A similar module, Set.String
, is defined by Batteries Included and is documented here. Just like dictionaries, these sets are immutable and all modifying functions derive a new set. While it’s not necessary to use this type, it’s likely to be helpful when identifying duplicate function or parameter names.
Summary
To complete this assignment, you’ll need to
- Update your compiler to compile
program
ASTs - Add assembly code generation for function declarations and function calls
- Observe bird calling conventions when Dove code calls other Dove code
- Fix the fact that
bird_main
has not obeyed the C callee conventions
- Add compile-time error checking to report all of the errors described above
Submitting
It is not enough to push your work. In your repository, you will find a Python script called submit.py
. In order to submit your lab assignment, it should be sufficient to run that script by typing python3 submit.py -a dove
in your terminal. For this script to work correctly, you must have committed all uncommitted files in your repository and pushed your work so that your repository is in sync with your GitHub remote repository. Note that, because your compiler repository will be used for multiple different assignments, you must specify which assignment you are submitting when you run the submit script.
The aforementioned Python script is designed to create a Git tag: a name for a specific commit in your repository. (The script also performs a series of checks to help ensure that the tag you create contains what you likely think it contains.) The tag will be named using the words “submission”, the name of your assignment, and a version number (such as submission_dove_01
). Your instructor will know that you have submitted your work because this tag will appear in your repository. If you need to resubmit your work to correct changes after receiving a grade, you can simply create new commits and then create another tag (preferrably with submit.py
). At any given time, only your most recent outstanding submission will be graded.
In addition to submitting your work, you should fill out a brief questionnaire found here. In most cases, the questionnaire should take less than a minute.
If You Have Trouble…
…then please contact your instructor! The course forum is the preferred method, but you can reach out via e-mail as well. Good luck!