Auklet
Due on Thursday, February 10th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the course forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
This is the first graded lab in which you will produce a compiler. Your team name was sent to you prior to the start of this lab. Your compiler work will be done in a GitHub repository at the following URL:
git@github.swarthmore.edu:cs75-s22/compiler-<teamname>
You will be maintaining this repository throughout the course of the semester; all of your compiler lab assignments will be submitted by modifying this code and pushing your updates to the GitHub server. You are permitted to manage this repository in most any way you like so long as the command-line interface for using the compiler remains the same (though you might want to check with your instructor before making sweeping changes!). You are permitted (and encouraged!) to create new modules for helper functions, split the existing modules into multiple parts, or even change how your compiler is built.
Getting To Know Your Code
Your repository starts with several files to get you started. Since you’ll be in charge of this code, we’ll begin by introducing you to it. Some of the files are critical to the compiler and you’ll likely be changing them every week; other files contain boilerplate for e.g. running clang
to link object files and you probably won’t need to change it at all. Here’s a summary of what the starter code does for you:
-
A
Makefile
has been provided which can build both your compiler and a unit test program for it. As with your introductory labs, thisMakefile
just runs the build tooldune
for us. Thedune
anddune-project
files describe the dependencies we need and other build configuration details. -
The
src/
directory contains subdirectories which hold all of your compiler’s source code. Thesrc/main/hatch.ml
file acts as the entry point for the compiler. It just gets the command line argument naming the source code file and sends it tosrc/compiler/builder.ml
, which goes through a building process very similar to that of our recent Hatchling exercise: parse the source file, transform it into assembly, assemble the result into a program object, and link that with the driver to get the end result. -
The definition for the language we are compiling appears in the
src/language/
directory. This directory contains a fileasts.ml
which contains the OCaml AST definitions of our Auklet language; later labs will expand on this definition. There are also definitions for the tools used to transform strings into ASTs, but we don’t need to look at them for now. -
The runtime environment – the code used by your compiled programs when they execute – appears in the
resources/
directory. Currently, there is just adriver.c
file, which should look mostly like the driver from the Hatchling lab. -
For this assignment, you probably only need to update
src/compiler/assembly.ml
,src/compiler/compiler.ml
,src/compiler/environment.ml
, andsrc/tests/tests.ml
. The discussion of your actual assignment appears below. -
Though you’re welcome to read the other files as well, the only support file you’ll really need to be familiar with is
testUtils.ml
: a module containing helper functions for building tests. -
Some example programs for you to experiment with are in the
test_code
directory. For this course, our programs will be saved in text files with the extension.bird
.
Building Your Code
The starter code is ready to be compiled. You can build your compiler just by typing
make
This will create a file hatch
that contains your compiler. (It’s actually a link to a file in the _build
directory containing your compiler. The link is for convenience.) You can use this compiler to compile some code by typing e.g.:
./hatch test_code/4.bird
This will fail, since we haven’t written any of the important compiler functions yet.
The make
command will also compile your unit tests into a program called tests
. You can run this program as usual:
./tests
This will report two failing tests because, again, we haven’t done any of the work yet.
The Auklet Language
Each compiler lab will introduce you to the language you will be compiling; most of these languages build upon the previous language. Our first language, Auklet, only has a few features: variables, integers, and very basic operations on them.
For each language, we must define
- The concrete syntax: the text the programmer writes
- The abstract syntax: the data structure representing these programs in our compiler
- The semantics: the behavior of the language, so our compiler knows what the code it generates should do
Concrete Syntax
The concrete syntax of Auklet is:
<expression> ::=
| <integer>
| after(<expression>)
| before(<expression>)
| <expression> + <expression>
| <expression> - <expression>
| <expression> * <expression>
| (<expression>)
Abstract Syntax
The abstract syntax of Auklet can be found in the file src/language/asts.ml
: it is the expr
data type. Note that the abstract syntax does not have the parenthetical expression (<expression>)
above. This is because parentheticals like this only affect the shape of the AST, not its contents, and so can be handled by the parser.
show
Functions
In examining the ASTs, you will likely notice an OCaml syntax we did not discuss. The unary_operator
type, for instance, looks like this:
type unary_operator =
| OpAfter
| OpBefore
[@@deriving eq, ord, show]
;;
The [@@deriving ...]
part of the declaration is called a PPX extension and works sort of like a template: it represents code that will be generated for you by the compiler. This PPX extension doesn’t affect the meaning of the type and you can ignore it if you wish.
The important part for you is that this PPX extension generates a show
function for the type. For instance, the above declaration creates a function named show_unary_operator
with type unary_operator -> string
, which means you can write e.g.
print_endline (show_unary_operator OpAfter)
and the string “OpAfter
” will be printed. This will be especially helpful in your unit tests. We didn’t use this extension for assembly code because the output is always in an OCaml-like form, which doesn’t match the concrete syntax for x86 assembly. It is useful for debugging and testing purposes, though!
Semantics
An Auklet program always evaluates to a single number which the driver will print. The after
and before
built-in expressions increase and decrease (respectively) their argument by one. let
expressions bind a new variable to the result of an expression within another expression; the variable bound by the let
is only in scope in the second expression. Binary arithmetic operators behave just as they do in mathematics. Here are some examples of Auklet programs:
Concrete syntax | Abstract syntax | Evaluated result |
---|---|---|
4 |
EInt(4) |
4 |
before(after(after(7))) |
EUnaryOp(OpBefore,EUnaryOp(OpAfter,EUnaryOp(OpAfter,EInt(7)))) |
8 |
4 + 2 |
EBinaryOp(OpPlus,EInt(4),EInt(2)) |
6 |
before(5) - 4 |
EBinaryOp(OpMinus, EUnaryOp(OpBefore,EInt(5)), EInt(4)) |
0 |
let x = 1 in after(x) |
ELet("x",EInt(1),EUnaryOp(OpAfter,EVar("x"))) |
2 |
let x = let y = 4 in y + 1 in x + x |
ELet("x",ELet("y",EInt(4),EBinaryOp(OpPlus,EVar("y"),EInt(1))),EBinaryOp(OpPlus,EVar("x"),EVar("x"))) |
10 |
4 + a |
EBinaryOp(OpPlus, EInt(4), EVar("a")) |
compile error |
Note that the last example shows a case in which a variable is used without being defined. This doesn’t make any sense: it has no meaning and so we can’t sensibly translate it to assembly. If your compiler encounters an undefined variable use in the program it is compiling, it should raise the UnboundVariable
exception found in the environment.ml
file; the starter code will catch that exception and take care of translating it into a readable error message.
Implementing the Auklet Compiler
In the starter code,
- the parser has been provided,
- the assembling and linking code has been provided (which just call out to
nasm
andclang
), - but none of the provided code compiles the AST into assembly language
Your primary task, then, is to add this functionality. You may do so in any way you see fit, but you are recommended to take the following path:
-
Start in
assembly.ml
for a warm-up. This file contains some data types describing the AST for a small subset of Intel x86 assembly language. There are five functions here which can turn the assembly language data types into strings of assembly code; they are used by the compiler to generate the file given tonasm
. Start by implementing these functions. You can write tests if you like.Note that a Guide to x86 Assembly from the University of Virginia may be helpful here. In particular, the binary operator instructions will be
add
,sub
, andimul
. Themul
instruction does not behave the way thatadd
andsub
do; theimul
instruction will be much more familiar. -
Next, move to
compiler.ml
. Thecompile_expression
function is used to transform an Auklet AST into a list of assembly instructions. Complete the generation of those instructions for the integer,after
, andbefore
expressions. For now, skip the other expressions. Your compiler should now work on programs with just integers,after
, andbefore
. Try it out! -
Once you have unary operators working, you should consider
let
expressions. As we discussed in class,let
expressions compute the result of an expression, store that result in a variable, and then execute a second expression. In the second expression, the variable name can be used to refer to the result of the first expression. We accomplish this in our compiler by reserving memory on the stack to store the result of the first expression. To keep track of how we are using stack memory, we will use the tools inenvironment.ml
.In this setting, the “environment” is a description of where information will be stored when the compiled program is running. For instance, in the expression
let x = after(4) in before(x)
we are expected to store the value
4
somewhere so we can then retrieve it and use it in thebefore
expression. This requires us to set aside a location in the stack where we will put this value at runtime. If the above expression is our whole program, then we can put thex
value at the position[esp-8]
. If the above expression is part of a larger program, however, that location might already be taken by something else. We will use theenvironment
data type to remember what stack memory we have already used and what stack memory is available to be allocated.The
environment
data type is defined inenvironment.ml
as a pair between an integer (the next free stack address) and a dictionary mapping strings (variable names) to assembly arguments like[rsp-8]
. The dictionary has the somewhat obtuse typeargument Map.String.t
, but you can manipulate it using functions in theMap.String
module. The documentation on that module is written generically, using the type variablekey
rather thanstring
and the type variable'a
rather thanargument
. For instance, the functionMap.String.find_opt
takes astring
key and anargument Map.String.t
dictionary and gives back anargument option
(which has valueNone
if the key isn’t in the dictionary).In
environment.ml
, you will find a stub for a functionallocate_named_variable
. This function takes a variable name (as astring
) and an existing environment. It gives back anaddress
where the named variable can be stored when the program is running and a new environment. (Remember: we’re approaching this in an immutable style, so the old environment doesn’t change; we just make a new one.) You should also implementfind_named_variable
, which takes a variable name and an environment and looks up the location where that variable was previously stored.Remember that you’re free to make any changes to this file you’d like, add helper functions, or even change the type of
environment
as long as your compiler works correctly. -
Now that you have an environment that can allocate named variables, you can return to
compiler.ml
to add support forlet
expressions via theELet
andEVar
constructors. Note that thecompile_program
function has already been written to pass an empty environment to the first call tocompile_expression
. You just need to use the environment passed tocompile_expression
(and to pass the new environment to recursive calls as appropriate) to allocate stack memory correctly. -
Finally, you’ll need to repeat steps three and four for binary operators. The user does not name the subexpressions for a binary operator; that is, in
1 * 2 + 3 * 4
, the subexpression1 * 2
does not have a name. Nonetheless, you can allocate a stack address for it! The stuballocate_temp_variable
inenvironment.ml
is meant for just this purpose. Implement that function and then return tocompile_expression
to add code to handle binary expressions appropriately. Remember to be careful about evaluation order: you must evaluate the left subexpression first!
Testing the Auklet Compiler
You can test your compiler directly after you make
the program by running hatch
on a .bird
file (e.g. ./hatch test_code/4.bird
). All code that the compiler generates gets stored in the output
directory. If something goes wrong in assembling or linking your code, you can see this in the files stored in the logs
directory (which capture the output and error messages of the nasm
and clang
calls we make).
You can also unit test your compiler. The src/testUtils.ml
file contains functions that will help you build OUnit tests. For this lab, test_success
is the most useful. This function takes the name of a source file and the output that you expect that source to produce when it is compiled and run.
You can see some examples of how to use these functions in the provided tests.ml
file. For this assignment only, you are required to write some tests that exercise the Auklet language. Specifically, you must write tests to verify the behavior of each of the expressions (integers, unary operators, let
expressions, and binary operators). You should make sure you nest these expressions so you can see if your environment is working correctly for deeper ASTs (e.g. 1 * 2 + 3 * 4
).
You are encouraged to use the test_success
function to create tests. Don’t let this stop you from writing other kinds of tests, though! For instance, you can write tests to determine if your environment
type is behaving the way it should or if your assembly instructions convert correctly to a string. To do this, just write the same kind of unit test code as in your previous labs.
Future assignments will not require you to write unit tests, but you definitely should. As the semester progresses, the unit tests that you write now will save you time in each future assignment. You can do your future self a favor by getting into the habit of writing unit tests!
Debugging the Auklet Compiler
There are a variety of techniques you can use to debug your compiler:
- You can use print debugging. See the second OCaml lab for a description of how that works in OCaml.
- You can use
ocamldebug
. This is OCaml’s equivalent ofgdb
and works much the same. In some situations, it even lets you step backwards through your code. - You can take a look at the files in the
logs/
directory. If something goes wrong with assembling or linking your code, the files in that directory will contain the messages that the compilers would provide to you. You can also look at the.s
files that your compiler generated to look for syntax errors or other problems.
Summary
To complete this assignment, you’ll need to
- Write code to turn assembly instructions into text to put into
.s
files - Write code to turn the Auklet AST into assembly instructions, including
- integer and unary operations
let
-bindings and variable expressions- raising
UnboundVariable
errors as appropriate
- raising
- binary operations
- Write the unit tests described above
Submitting
It is not enough just to push your code. Due dates are flexible, so it is necessary for you to take action to make it clear which commit you would like to have graded. We will use Git tags for this. A tag is a way of giving a human-readable name to a particular commit. Unlike a branch, however, tags are expected not to change once they are created.
When you are finished working on this compiler assignment, commit and push your work. Then, once you are sure that there are no additional changes you need to make, run the following commands:
$ git tag auklet-submission
$ git push --tags
This will create a tag named auklet-submission
and push that tag to the Swarthmore GitHub Enterprise server. Your work on that tag will be graded.
In addition to pushing and tagging your work, you will need to fill out a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.
If You Have Trouble…
…then please contact your instructor! The course forum is the preferred method, but you can reach out via e-mail as well. Good luck!