Diamondback
Due on Monday, March 6th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
This lab will extend the Cobra compiler to a new language called Diamondback. Diamondback includes first-order functions (similar to C functions) and function calls. It also adds some compile-time error checking related to functions.
Transitioning to Diamondback
As usual, you will find some new files in your repository when you git pull
:
src/languages/diamondback/
, a directory containing the lexer, parser, and AST of the Diamondback language. Make sure to readexpressions.ml
carefully, as there are new types in both the surface language and the A-normal form which address the definition of functions.src/wellformedness.ml
, a file where you will write compile-time error checks for Diamondback.
As usual, we will begin by changing the LANGUAGE
variable in language.mk
from cobra
to diamondback
. When you do this, you will not be able to build your program as the parser is now returning an AST of type program
rather than an AST of type expr
. This is expected.
Your first task will be to update compile_program
and compile_to_assembly_code
to take program
as an argument. For now, just make your code ignore the list of declarations in the program
value and get everything working again. Your unit tests will help you confirm that you have done so. As part of this process, you’ll have to write a new function or two to A-normalize program
values to a_program
values.
The Diamondback Language
In the Diamondback language, programs are a list of declarations followed by an expression. The expression is essentially the “main” part of the program; the declarations allow you to create functions that expressions can call. Below, we discuss the syntax and semantics for these declarations.
Concrete Syntax
Here, we use a common convention that our language starts at the first non-terminal – program
– below. The concrete syntax of Diamondback is:
<program> ::=
| <declaration_list> <expression>
| <expression>
<declaration_list> ::=
| <declaration> <declaration_list>
| <declaration>
<declaration> ::=
| def <identifier>(<parameter_list>) <expression> end
| def <identifier>() <expression> end
<parameter_list> ::=
| <identifier> , <parameter_list>
| <identifier>
<expression> ::=
| <integer>
| true
| false
| <identifier>
| let <identifier> = <expression> in <expression>
| if <expression> then <expression> else <expression>
| inc(<expression>)
| dec(<expression>)
| print(<expression>)
| isbool(<expression>)
| isint(<expression>)
| <expression> + <expression>
| <expression> - <expression>
| <expression> * <expression>
| <expression> < <expression>
| <expression> > <expression>
| <expression> = <expression>
| <expression> && <expression>
| <expression> || <expression>
| <identifier>(<argument_list>)
| <identifier>()
| (<expression>)
<argument_list> ::=
| <expression> , <argument_list>
| <expression>
Syntactic precedence is the same as in Cobra.
Abstract Syntax
The abstract syntax of Diamondback has additional types: declaration
and program
. Correspondingly, the A-normal form has types a_declaration
and a_program
. Make sure to examine src/language/diamondback/expressions.ml
to see how the abstract grammar has changed.
Semantics
Function declarations in Diamondback have semantics much like those of Python or OCaml. When a function is called, each argument is evaluated from left to right. The values of those arguments are stored in the parameter variables of the function and the body of the function is executed. The result of the function call is the value returned by the expression in the function body. You must generate code that conforms to the C calling conventions.
Here are some examples of Diamondback code:
-
The code
def f(x) x + 1 end f(4)
returns the value
5
: the variablex
is set to the argument (4
) and the body of the function returns one more than that value. -
The code
def f(x) x * 2 end def g(y) f(y) + f(4) end g(5)
produces the value
18
. This is becauseg(5)
is equivalent tof(5) + f(4)
, which is equivalent to5*2+4*2
. -
We allow functions in Diamondback to be recursive with no declarations or syntax. The code
def f(x) if x > 1 then x + f(x-1) else x end f(5)
is legal and will return the value
15
(that is,5+4+3+2+1
). -
We allow functions in Diamondback to be mutually recursive. The code
def f(x) if x > 1 then g(x) else x end def g(x) f(x - 1) end f(4)
is legal and will return
1
. -
Note that functions of multiple arguments and functions of no arguments are permitted. Further, function calls are expressions and so may appear anywhere that an expression may appear. The code
def f() 5 end def g(x,y) x > y end g(f(),4)
will return
true
.
Compile-Time Errors
The introduction of function definitions and function calls also introduces several new forms of mistake that a programmer can make which we can detect at compile time. Your Diamondback compiler will be required to detect and handle several classes of errors:
-
The programmer might call a function that has not been declared. If this occurs for a function
fn
, your compiler must emit an error message containing the strings “function fn
” and “not defined
”. For instance, the programdef f() 5 end g()
might generate the error “
Function g is not defined.
”. -
The programmer might call a function with the wrong number of arguments. If this occurs for a function
fn
, your compiler must emit an error message containing the strings “function fn
” and “number of arguments
”. For instance, the programdef f() 5 end f(true)
might generate the error “
Function f is called with an incorrect number of arguments.
”. -
The programmer might declare a function with duplicate parameter names. If this occurs for a function
fn
and parameterp
, your compiler must emit an error message containing the strings “duplicate
”, “parameter p
”, and “function fn
”. For instance, the programdef f(x,y,x) 0 end 4
might generate the error “
Function f declares a duplicate parameter x.
”. -
The programmer might declare multiple functions with the same name. If this occurs for a function
fn
, your compiler must emit an error message containing the strings “duplicate definition
” and “function fn
”. For instance, the programdef f() 0 end def f(x) x end f()
might generate the error “
Duplicate definition of function f.
”. -
As before, the programmer might try to use an undefined variable. If this occurs for variable
x
, your compiler must emit an error message containing the strings “unbound
” and “variable x
”. For instance, the programdef f(x) y end 4
might generate the error “
Unbound variable x.
”
Report all the errors!
Unlike our previous compilers, the Diamondback compiler must report all errors for a given file. For instance, the code
def f(x)
g(y,z)
end
f()
contains four errors: an unbound variable y
, an unbound variable z
, an undefined function g
, and an argument count error where f
is called. The compiler must report all of these errors, like so:
$ ./hatch.byte above_code.snake
Unbound variable y.
Unbound variable z.
Function g is not defined.
Function f is called with an incorrect number of arguments.
$
The order of the errors is not relevant, but each error must appear on its own line.
To do this, you will need to write compile-time error checks for your AST and use them before proceeding with compilation. Note that you will not be able to rely upon your previous code for detecting unbound variables; that code throws an exception when it encounters a single variable and does not take any other errors into account. Instead, you will need to recurse over the AST and collect all of the errors it contains.
A small stub is provided in wellformedness.ml
to give you an approach to use. If you use this approach, you will need to modify the compile_to_assembly_code
function in builder.ml
(not your own version in compiler.ml
) so that it catches the IllFormed
exception and throws a BuildFailure
with the appropriate text. The command-line-handling code in hatch.ml
will catch the BuildFailure
exception and print the message it contains.
Runtime errors
You should report the same runtime errors that you did in the Cobra lab, using the same exit codes:
- If an operation expects an integer but gets something else, exit with error code
1
. - If an operation expects a boolean but gets something else, exit with error code
2
. - If an arithmetic operation overflows, exit with error code
3
.
Implementing the Diamondback Compiler
Here’s a rough outline of how to approach this lab:
-
Change
language.mk
to set theLANGUAGE
variable todiamondback
rather thancobra
. -
Update your code so that your Cobra unit tests pass. Initially, ignore the declarations that appear in the program so you can get your unit tests working again.
-
Add function declarations and function calls. Don’t worry about error handling for now; only test good programs. As you write your code, add thorough unit tests for a variety of cases. For instance, make sure to test calling functions with zero, one, or more parameters, especially in cases where the order of the arguments matters (e.g. subtraction). Use what you learned about C calling conventions in the Cobra lab to implement your function calls and returns correctly.
-
Add error checking. Write unit tests for your compile-time errors. Be sure to be thorough.
OCaml Data Structures
In writing your compile-time error checking functions, you may wish to use StringMap
from the Environment
module. If you do so, it would be best to make a new module (e.g. utils.ml
) and put the declaration of StringMap
there. After all, well-formedness has nothing to do with the environment you use to compile your code.
You may also find that you need a set of strings as well as a dictionary keyed by strings. You can create such a module in a similar fashion:
module StringSet = Set.Make(
struct
type t = string
let compare = Pervasives.compare
end
);;
The StringSet
module will then contain functions for using sets in OCaml (e.g. StringSet.union
). Just like OCaml’s dictionaries, these sets are immutable and all operations create a new value.
You can also avoid creating this module by using StringMap
with a value type of unit
. You’re free to do so if you like, though use of StringSet
is better practice. You’re also welcome to complete the error checking without using either StringMap
or StringSet
; it’s up to you.
Debugging with gdb
You can use gdb
to step-debug your compiled programs! You can start by running gdb
from the command line just as you would for a C or C++ program:
$ gdb output/program.run
When you are greeted with the gdb
prompt, you might want to enter the following commands:
set disassembly-flavor intel
: This tellsgdb
to display assembly language in the Intel syntax we’ve been using in class rather than the AT&T syntax (which is the default).layout asm
: Displays the disassembly of your code.layout regs
: Displays the processor’s registers as you debug the program.break snake_main
: Stops execution upon enteringsnake_main
. After you’ve issued this command, you can typerun
to run your program up to that point.
To step through your program, use si
or stepi
(not s
or step
); this will step one instruction at a time. You can likewise use ni
or nexti
to step past function calls and finish
will allow you to leave a function call you are currently in.
Summary
To complete this assignment, you’ll need to
- Update your compiler to receive, A-normalize, and compile
program
ASTs - Add assembly code generation for function declarations and function calls
- Add compile-time error checking to report all of the errors described above
Submitting
To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.
If You Have Trouble…
…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!