Bluebird
Due on Wednesday, February 13th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the Piazza forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, we will extend the Auklet compiler to a new language called Bluebird. The Bluebird language includes the features of the Auklet language as well as let
bindings and conditional expressions. Since we are extending your previous compiler, you will be using the same Git repository as you did for the previous assignment.
Note that you need to have finished Auklet before working on this assignment!
Transitioning to Bluebird
To transition between assignments, we will use Git branches. A branch names a point in time in your repository; your auklet
branch, for instance, points to the last commit that you made when working on the Auklet compiler. Branches are often used to keep track of multiple lines of development in a repository; a project may have a “stable” branch and an “experimental” branch, for example, where most development is done in the latter and distributions of software are done in the former.
We will use branches in this course to keep track of the different versions of your compiler. Your Bluebird starter code is in a branch called bluebird
which has been stored in your GitHub repository. First, you need to get the information about that branch by running
git fetch
You can then switch to the bluebird
branch by running
git checkout bluebird
in your compiler directory. This causes Git to change all of your files to the most recent commit on the bluebird
branch; this branch exists because your instructor created it for you. Of course, bluebird
currently contains only the starter files; it doesn’t contain any of the work you did to complete the Auklet compiler. To bring that work into the bluebird
branch, we can now run
git merge auklet
This will combine all of the changes in auklet
(the work you did on the last assignment) with the starter code in bluebird
. Once you’re finished, you’ll be on the bluebird
branch and you’ll have all of your Auklet code. At this point, you should be able to make clean
, make tests
, and run ./tests.byte
without seeing any difference in how Auklet worked. (You will, however, see some compiler warnings.)
This merge won’t change a lot. Most notably, it updates src/language/asts.ml
file now contains the Bluebird AST constructors and the corresponding parser reads Bluebird syntax.
The Bluebird Language
As in the previous lab, we introduce the language you will compile in three parts: the concrete syntax, the abstract syntax, and the semantics. The Bluebird language extends Auklet to include let
bindings and ifnz
expressions with similar syntax to OCaml.
Concrete Syntax
The concrete syntax of Bluebird is:
<expression> ::=
| <integer>
| after(<expression>)
| before(<expression>)
| <expression> + <expression>
| <expression> - <expression>
| <expression> * <expression>
| <identifier>
| let <identifier> = <expression> in <expression>
| ifnz <expression> then <expression> else <expression>
| (<expression>)
Syntactic precedence is as in OCaml. To clarify: multiplication has higher precedence than addition and subtraction, which have higher precedence than let
. For instance, the expression ifnz x then 4 else 2 + 6
parses as ifnz x then 4 else (2 + 6)
; the expression (ifnz x then 4 else 2) + 6
must be written with parentheses to have that meaning.
Abstract Syntax
The abstract syntax of Bluebird is still defined in src/language/asts.ml
. It now includes a few more constructors: one for variables and the other for let
expressions. Consider the following examples:
Concrete syntax | Abstract syntax | Evaluated result |
---|---|---|
let x = 5 in before(x) |
ELet("x",EInt(5),EUnaryOp(OpBefore,EVar("x"))) |
4 |
let x = (let y = 4 in after(y)) in after(x) |
ELet("x",ELet("y",EInt(4),EUnaryOp(OpAfter,EVar("y"))),EUnaryOp(OpAfter,EVar("x"))) |
6 |
let x = 1 in ifnz x then 4 else 8 |
ELet("x",EInt(1),EIfNonZero(EVar("x"),EInt(4),EInt(8))) |
4 |
let x = after(4) in y |
ELet("x",EUnaryOp(OpAfter,EInt(4)),EVar("y")) |
compile error |
In the last example, the variable y
is unbound. In this case, your compiler must throw an exception of type UnboundVariable
, which is an exception type defined in compiler.ml
. See below for more information.
Semantics
Bindings
As in Auklet, every Bluebird expression evaluates to a single number and we will use the driver to print that number. The unary, binary, and constant expressions have the same meaning as before. Let bindings have the same meaning as in OCaml: the first expression is evaluated and bound to a variable; the second expression is then evaluated with that binding.
Bindings are only visible in the latter part of the let
expression in which they are defined. For instance, consider:
let x =
let y = 5 in
y
in
x + y
While this Bluebird code will parse correctly to produce an AST, it is meaningless; we can’t translate it to assembly language. In x + y
, the variable y
is undefined; this is because the let y
expression only binds y
in its own in
expression; it does not bind it elsewhere in the code. As stated above, an UnboundVariable
exception is required here.
Conditionals
Bluebird supports conditions similar to C: for the purpose of an ifnz
expression, any non-zero value is true while zero is false. For instance, the Bluebird expression
ifnz 0 then 1 else 2
evaluates to 2
because the condition was 0
. The expression
ifnz 5 then 4 else 8
evaluates to 4
because 5
is non-zero.
Implementing the Bluebird Compiler
Bindings
In order to implement let
bindings, we’ll need a place to store the contents of variables. We already have the environment
type which stores the locations of temporary data; we can use it to store variables as well. For Auklet, our environment
type was simply an int
giving the address of the next free stack memory location. For variables, we’ll need something a bit more complex (since we need to be able to look variables up again in the future).
The file utils.ml
contains a definition of a module StringMap
. This module defines a dictionary type in which the keys are strings; since variable names are strings, the StringMap
should be appropriate for tracking where each variable wound up on the stack. We can change environment.ml
to something like this:
type environment = int StringMap.t * int;;
let empty_environment = (StringMap.empty, 4);;
The int
still contains the next available stack address like before. The int StringMap.t
type is a dictionary mapping string
keys to int
values; it will map the name of each variable to the location in stack memory at which we have placed it.
You’ll also want to create a couple of functions: one which binds a new variable to the next free location (named e.g. environment_add_var
) and another to get the location where a variable is stored (named e.g. environment_get_var
). You can use these functions in the compile.ml
file to implement the two new forms of expression.
Note again that you are not required to use this environment
type. This type is simply recommended because it carries all of the information you need.
Conditionals
Implementing ifnz
expressions will require that you extend your compiler’s model of assembly language. So far, we have relied upon the add
, sub
, and imul
instructions; we’ll now need some more instructions to support conditionals. Again, the Guide to x86 Assembly from the University of Virginia may be helpful here. You’ll probably want to the following constructors to your instruction
type:
ALabel of string
(to indicate a label such asjump_here:
in the code)ACmp of arg * arg
(for thecmp
instruction to compare two values)AJmp of string
(to jump to a label)AJe of string
(to jump to a label only if the lastcmp
values were equal)AJne of string
(to jump to a label only if the lastcmp
values were not equal)
When you add these instructions, you’ll have to add them to your code_of_instruction
function as well.
Recall that, in x86 assembly, we use cmp
to compare two arguments and then decide what to do about it as a separate instruction. For instance, the code ifnz 6 then 4 else 8
might be written in x86 assembly like so:
mov eax, 6
cmp eax, 0
je label_else
mov eax 4
jmp label_endifnz
label_else:
mov eax 8
label_endifnz:
Here, the je
will jump when the comparison succeeds (meaning that the conditional was 0
), leading 8
to be stored in eax
. If the comparison succeeded (and the conditional was non-zero), the je
instruction would have no effect and the code would keep running, storing 4
in eax
. Then, it would jump to the end of the instructions shown here, which prevents the 8
from overwriting the 4
.
Label Names
Each time you compile a conditional expression, you’ll need to pick unique label names. While this is possible in a pure functional style, it’s quite tedious. Instead, your support code for the Bluebird compiler includes a file freshening.ml
with a function fresh_name
that can provide you with unique names. For instance, you might write
open Freshening;;
...
let name = fresh_name "else" in
...
which might assign the fresh_name
variable the value "else_4"
or "else_200"
or something similar. This name is guaranteed to be distinct from any name you’ve gotten from a previous call to fresh_name
; the string you provide is simply for convenience, as it might help you debug the output.
Testing the Bluebird Compiler
You are not required to write tests for this lab, but you definitely should! You still have tests from your first lab, so you can use those to test the Auklet language features. You’ll want to write some more tests for Bluebird to make sure that things are working correctly. A good way to deal with this is as follows: every time you feel like you’ve finished a new feature, write a test for it. Every time you find a bug, write a test that fails because of the bug and then fix the bug afterward. If you do this, you’ll (1) have a good way of running your code as you fix your bugs and (2) quickly build up a library of tests so you don’t have to worry about breaking things in the future.
Since there’s a new kind of behavior – compilation failure due to unbound variables – you’ll want to write some tests for that. The test_utils.ml
file includes a function test_compile_failure
which takes a filename and expected output and verifies that the compiler fails with the specified output. For instance, suppose the file unbound_x.bird
contained
x
Then the following test should pass because the compiler will fail appropriately:
test_compile_failure "test_code/unbound_x.bird" "Unbound variable x with environment {}"
When you run your unit tests, you’ll also be testing your Auklet features too. This is intentional; it helps to be sure you haven’t broken anything!
Summary
To complete this assignment, you’ll need to
- Update your
environment
type and functionality - Extend your model of x86 assembly language
- Modify your
compile_expression
function to allow variable bindings and conditionals - Make sure your compiler throws an exception when unbound variables are used
Unit tests are not listed on the bullet points above, but you should think of them as a natural part of the process.
Submitting
To submit your lab, just commit and push your work. The most recent pushed commit on the appropriate branch as of the due date will be graded. For more information on when assignments are due, see the Policies page.
After each lab, you will complete a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.
If You Have Trouble…
…then please contact your instructor! Piazza is the preferred method, but you can reach out via e-mail as well. Good luck!