Adder
Due on Monday, February 6th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
This is the first graded lab in which you will produce a compiler. If you are working alone, your “team name” is your Swarthmore username. If you are working with a partner, your team name was sent to you in an e-mail. Your compiler work will be done in a GitHub repository at the following URL:
git@github.swarthmore.edu:cs75-s17/compiler-<teamname>
You will be maintaining this repository throughout the course of the semester; all of your compiler lab assignments will be submitted by modifying this code and pushing your updates to the GitHub server. You are permitted to manage this repository in most any way you like so long as the command-line interface for using the compiler remains the same (though you might want to check with your instructor before making sweeping changes!). You are permitted (and encouraged!) to create new modules for helper functions, split the existing modules into multiple parts, or even change how your compiler is built.
Getting To Know Your Code
Your repository starts with several files to get you started. Since you’ll be in charge of this code, we’ll begin by introducing you to it. Some of the files are critical to the compiler and you’ll likely be changing them every week; other files contain boilerplate for e.g. running clang
to link object files and you probably won’t need to change it at all. Here’s a summary of what the starter code does for you:
-
A
Makefile
has been provided which can build both your compiler and a unit test program for it. ThisMakefile
is a front end forocamlbuild
, a tool specifically designed to sort out the dependencies in OCaml programs to build them efficiently. The_tags
file is a data file forocamlbuild
to control how it calls the OCaml compiler. -
The
src/
directory contains all of your compiler’s source code. Thesrc/hatch.ml
file acts as the entry point for the compiler. It just gets the command line argument naming the source code file and sends it tosrc/builder.ml
, which goes through a building process very similar to that of our recent Neonate exercise: parse the source file, transform it into assembly, assemble the result into a program object, and link that with the driver to get the end result. -
As we move through the course, we will compile more complex languages. The
src/language/
directory will contain a subdirectory for each language we will be compiling; right now,src/language/adder/
is the only directory there. This file contains the code for the lexer and parser of the Adder language. It also containssrc/language/adder/expressions.ml
, which declares the AST type for this language. -
The runtime environment – the code used by your compiled programs when they execute – appears in the
resources/
directory. Currently, there is just adriver.c
file, which should look mostly like the driver from the Neonate lab. -
For this assignment, you probably only need to update
src/assemblyLanguage.ml
,src/compiler.ml
,src/environment.ml
, andsrc/tests.ml
. The discussion of your actual assignment appears below. -
Though you’re welcome to read the other files as well, the only support file you’ll really need to be familiar with is
testUtils.ml
: a module containing helper functions for building tests. -
Some example programs for you to experiment with are in the
test_code
directory. For this course, our programs will be saved in text files with the extension.snake
.
Building Your Code
The starter code is ready to be compiled. You can build your compiler just by typing
make
This will create a file hatch.byte
that contains your compiler. You can use this compiler to compile some code by typing e.g.:
./hatch.byte test_code/4.snake
This will fail, since we haven’t written any of the important compiler functions yet.
In order to compile your tests, you can use the similar
make tests
which makes a test program called tests.byte
. You can run this program just with
./tests.byte
which will report two failing tests because, again, we haven’t done any of the work yet.
The Adder Language
Each compiler lab will introduce you to the language you will be compiling; most of these languages build upon the previous language. Our first language, Adder, only has a few features: variables, integers, and very basic operations on them.
For each language, we must define
- The concrete syntax: the text the programmer writes
- The abstract syntax: the data structure representing these programs in our compiler
- The semantics: the behavior of the language, so our compiler knows what the code it generates should do
Concrete Syntax
The concrete syntax of Adder is:
<expression> ::=
| <integer>
| <identifier>
| let <identifier> = <expression> in <expression>
| inc(<expression>)
| dec(<expression>)
| (<expression>)
Abstract Syntax
The abstract syntax of Adder can be found in the file src/language/adder/expressions.ml
: it is the expr
data type. Note that the abstract syntax does not have the parenthetical expression (<expression>)
above. This is because parentheticals like this only affect the shape of the AST, not its contents, and so can be handled by the parser.
Semantics
An Adder program always evaluates to a single number, which the driver will print. The inc
and dec
built-in expressions increase and decrease (respectively) their argument by one. let
expressions bind a new variable to the result of an expression within another expression; the variable bound by the let is only in scope in the second expression. Here are some examples of Adder programs:
Concrete syntax | Abstract syntax | Evaluated result |
---|---|---|
4 |
EInt(4) |
4 |
dec(inc(inc(7))) |
EUnaryOp(OpDec,EUnaryOp(OpInc,EUnaryOp(OpInc,EInt(7)))) |
8 |
let x = 5 in dec(x) |
ELet("x",EInt(5),EUnaryOp(OpDec,EVar("x"))) |
4 |
let x = (let y = 4 in inc(y)) in inc(x) |
ELet("x",ELet("y",EInt(4),EUnaryOp(OpInc,EVar("y"))),EUnaryOp(OpInc,EVar("x"))) |
6 |
let x = inc(4) in y |
ELet("x",EUnaryOp(OpInc,EInt(4)),EVar("y")) |
compile error |
In the last example, the variable y
is unbound. In this case, your compiler must throw an exception of type UnboundVariable
, which is an exception type defined in compiler.ml
. See below for more information.
Implementing the Adder Compiler
In the starter code,
- the parser has been provided,
- the assembling and linking code has been provided (which just call out to
nasm
andclang
), - but none of the provided code compiles the AST into assembly language
Your primary task, then, is to add this functionality. You may do so in any way you see fit, but you are recommended to take the following path:
-
Start in
assemblyLanguage.ml
for a warm-up. This file contains some data types describing the AST for a small subset of Intel x86 assembly language. There are five functions here which can turn the assembly language data types into strings of assembly code; they are used by the compiler to generate the file given tonasm
. Start by implementing these functions. You can write tests if you like. -
Next, move to
compiler.ml
. Thecompile_with_environment
function is used to transform an Adder AST into a list of assembly instructions. Complete the generation of those instructions for the integer,inc
, anddec
expressions. For now, skip thelet
and variable expressions. Your compiler should now work on programs with just integers,inc
, anddec
. Try it out! -
Next, move to
environment.ml
. This file will contain tools you will use to manipulate your compiler’s variable environment. Right now (to make the previous step easier), theenvironment
type is declared to beunit
. Use the following replacement definitions instead:type environment = int StringMap.t * int;; let empty_environment = (StringMap.empty, 0);;
The
0
here indicates the number of bytes of stack memory used for the environment; that number will change as you allocate variables. TheStringMap.t
is used to map the names of variables to theESP
byte offset where that variable is stored at runtime.Then, write functions to help you manipulate the environment. One function,
environment_add_var
, should take anenvironment
and avariable_name
and give you back a newenvironment
in which that variable has been assigned new memory. (Make sure to assign memory four bytes at a time!) Another function,environment_get_var_offset
, retrieves the offset for a particular variable in the environment.You can rename these functions, add other functions, or not use any of the above; this is simply a recommendation which is known to work.
-
Now that you have defined a data structure to keep track of your environment, go back to
compiler.ml
and implement code for thelet
and variable expressions. Start by ignoring unbound variables and getting the compiler to work on good code. After that works, update your code to handle unbound variables. When you need to throw an exception, you can use the syntaxraise (UnboundVariable(x, env))
Testing the Adder Compiler
You can test your compiler directly after you make
the program by running hatch.byte
on a .snake
file (e.g. hatch.byte test_code/4.snake
). All code that the compiler generates gets stored in the output
directory. If something goes wrong in assembling or linking your code, you can see this in the files stored in the logs
directory (which capture the output and error messages of the nasm
and clang
calls we make).
You can also unit test your compiler. The src/testUtils.ml
file contains two functions that will help you build OUnit tests. The first, test_success
, takes the name of a source file and the output that you expect that source to produce when it is compiled and run. The second, test_compile_error
, takes the name of a source file and the output you expect the compiler to produce when the code fails to compile. (If the code compiles without complaint, the test_compile_error
test fails.)
You can see some examples of how to use these functions in the provided tests.ml
file. For this assignment only, you are required to write some tests that exercise the Adder language. Specifically, you must write tests to verify the behavior of each of the expressions (integer, variable, let
, inc
, and dec
) as well as an additional test to make sure you are handling unbound variables correctly.
You are encouraged to use the test_success
and test_compile_error
functions to create tests. Don’t let this stop you from writing other kinds of tests, though! For instance, you can write tests to determine if a particular AST becomes a particular x86 instruction list without parsing or linking anything: just write the same kind of unit test code as in your previous labs.
Future assignments will not require you to write unit tests, but you definitely should. As the semester progresses, the unit tests that you write now will save you time in each future assignment. You can do your future self a favor by getting into the habit of writing unit tests!
Debugging the Adder Compiler
There are a variety of techniques you can use to debug your compiler:
- You can use print debugging. See the second OCaml lab for a description of how that works in OCaml.
- You can use
ocamldebug
. This is OCaml’s equivalent ofgdb
and works much the same. In some situations, it even lets you step backwards through your code. - You can take a look at the files in the
logs/
directory. If something goes wrong with assembling or linking your code, the files in that directory will contain the messages that the compilers would provide to you. You can also look at the.s
files that your compiler generated to look for syntax errors or other problems.
Summary
To complete this assignment, you’ll need to
- Write code to turn assembly instructions into text to put into
.s
files - Write code to turn the Adder AST into assembly instructions
- Make sure your compiler throws an exception when unbound variables are used
- Write the unit tests described above
Submitting
To submit your lab, just commit and push your work. The most recent pushed commit as of the due date will be graded. For more information on when assignments are due, see the Policies page.
If You Have Trouble…
…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!