Cardinal
Due on Thursday, February 24th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.
If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.
If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.
In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the course forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
This lab will extend the Bluebird compiler to new language called Cardinal. Cardinal includes a printing primitive as well as runtime type checks to identify when a user has misused a value. Unlike Bluebird (whose behavior is unspecified when the user writes e.g. true + true
), Cardinal will halt the program with a specified runtime error if the user’s program uses a value incorrectly. Just as with your previous assignment, this assignment uses the same compiler repository; we will continue this throughout the rest of the course.
This page will:
- Discuss the syntax and semantics of Cardinal
- Describe the obligations your code will have in calling and being called by C functions
- Give a reference for some relevant assembly instructions
- Suggest a strategy for approaching the assignment
Migrating to Cardinal
As with the previous lab, the starter files for Cardinal have been stored in a branch in your Git repository. To start work on Cardinal, you must first move to the Cardinal branch with
git fetch
git checkout cardinal
This will move you to a version of the compiler which contains Cardinal starter files and nothing else. Next, run
git merge bluebird
to merge your previous work – the Bluebird compiler – into this starter code.
Once you’ve done this, you’ll find some changes to your repository. The src/language
directory has changed again; the ASTs and parser here have been updated to include new Cardinal language features. There is also a new file resources/error.h
and resources/error.c
which we will use for error handling in Cardinal programs.
Make sure to run make clean
and make
to verify that your merge was successful. Upon doing this, you’ll probably get a lot of OCaml compiler warning messages – these are important and we’ll deal with them later – but your compiler should still be able to compile and run Bluebird programs. Cardinal is an extension of the Bluebird language, so everything that worked in Bluebird should still work now. You can verify this by running your unit tests.
The Cardinal Language
As usual, we begin with the concrete syntax, the abstract syntax, and the semantics of the Cardinal language. There is only one syntactic difference between Bluebird and Cardinal: the print
unary operator, which will display a value.
Concrete Syntax
For completeness, here is the concrete syntax of Cardinal:
<expression> ::=
| true
| false
| <integer>
| after(<expression>)
| before(<expression>)
| print(<expression>)
| isbool(<expression>)
| isint(<expression>)
| <expression> + <expression>
| <expression> - <expression>
| <expression> * <expression>
| <expression> < <expression>
| <expression> > <expression>
| <expression> = <expression>
| <expression> && <expression>
| <expression> || <expression>
| <identifier>
| let <identifier> = <expression> in <expression>
| if <expression> then <expression> else <expression>
| (<expression>)
Abstract Syntax
As with previous languages, the file src/language/asts.ml
contains an abstract syntax for the concrete syntax given above. The only difference corresponds to the concrete syntax, of course: the unary_operator
type has a new constructor OpPrint
.
Semantics
There are two kinds of semantic difference that we will observe in Cardinal. First: we have a new unary operator print
. It is somewhat different from other unary operators because it has the side-effect of printing, which we must define precisely. Second: we will implement runtime error checking to prevent Cardinal programs from executing nonsensical code. The Bluebird program true + true
has no meaning, but our compilation strategy causes its compiled form to produce -1
. The Cardinal program true + true
will stop with an error message.
Printing
The new print
operator in Cardinal’s syntax prints out a human-readable text representation of its argument and then returns that same value. We will use the printValue
C function in printer.c
to display the value so we can compile the program without worrying about system-specific details of interacting with the user.
Note that this printing occurs independently from the printing of the final value once the program is finished. For instance, the code
let x = 4 in
let y = print(x) in
print(y > 6)
will print
4
false
false
The 4
is printed by the expression print(x)
; this expression also evaluates to 4
, so y
is bound to 4
. This means that print(y > 6)
will print false
and also evaluate to false
. Since the overall expression evaluates to false
, this is printed when the program terminates.
Errors
The semantics of Cardinal also include error conditions, when we will terminate the program at runtime without producing a value. When an error arises, we will print a message and terminate the program with a non-zero exit code. (The implementation of this is discussed below.) The following conditions will produce an error:
- The arithmetic unary operators
after
andbefore
should require that their arguments are integers. If this is not the case, we terminate with exit code1
. - The arithmetic binary operators
+
,-
,*
,<
, and>
should require that both of their arguments are integers. If either argument is not, we terminate with exit code1
. - The logical binary operators
&&
and||
should require that both of their arguments are booleans. If either argument is not, we terminate with exit code2
. - The condition in an
if
-then
-else
must be a boolean. If it is not, we terminate with exit code2
.
In general, we assign meanings to the exit codes of Cardinal programs:
0
indicates that the program exited normally.1
indicates that an integer was expected but some other type was found.2
indicates that a boolean was expected but some other type was found.
When error checking a binary operator, you may either (1) check each operand after you compute it or (2) check both operands after computing both operands. Either is acceptable behavior.
In a fashion similar to the above, we will handle errors using the error.c
resource file. This file defines a C function which will print a message and immediately terminate the process with the provided error code.
As an example, the code
let x = 4 in
if x - 4 then 5 else 0
will terminate with exit code 2
because x - 4
evaluates to an integer and not a boolean.
Interacting with the World
The features of Cardinal allow the programmer to interact with the rest of the system in a more meaningful way. To do this, you’ll need to know the C function calling conventions so your code can call (and be called by) C correctly. You’ll also need to use some new assembly instructions.
C Calling Conventions
Code that either calls or is called by a C function must conform to a set of conventions in order to be linked properly. Each call has a caller (the code making the function call) and a callee (the code of the function being called). The particular conventions depend upon the operating system and architecture of the system. For x86-64 POSIX systems like Linux and MacOS, the following conventions apply:
- At start of call, the caller must:
- Push caller-saved registers onto the stack. (In Cardinal, we don’t necessarily need to save our registers.)
- Assign arguments to registers and memory.
- The first argument is stored in
rdi
- The second argument is stored in
rsi
- The third argument is stored in
rdx
- The fourth argument is stored in
rcx
- The fifth argument is stored in
r8
- The sixth argument is stored in
r9
- Any additional arguments are pushed onto the stack in reverse (with the seventh argument on top of the stack at the lowest memory address)
- The first argument is stored in
- Ensure the stack is 16-byte aligned (by padding if necessary). (†)
- Use
call
instruction to start the function.
- When the call begins, the callee must
- Push
rbp
onto the stack (to save it). - Copy
rsp
intorbp
. (‡) - Adjust
rsp
to make room for local variables. (This means that environment offsets should be computed fromrbp
now.) - Push any callee-saved registers that we might change. (In Cardinal, we aren’t changing any callee-saved registers.)
- Push
- When call is finished, callee must
- Leave return value in
rax
. - Restore callee-saved registers by popping them. (In Cardinal, we do nothing here.)
- Restore old top of stack:
mov rsp, rbp
(‡) - Restore old base pointer:
pop rbp
- Execute the
ret
instruction (to pop return address into instruction register)
- Leave return value in
- At end of call, caller must
- Remove padding and arguments from the stack
- Restore caller-saved registers
†Although the x86-64 POSIX conventions require 16-byte stack alignment, you can feel free to skip this step in your compiler. This alignment requirement is only necessary for vector instructions and similar operations that we are not using and, since the current implementation of printf
doesn’t appear to use it either, we can be successfully non-conformant for now.
‡Technically, the x86-64 POSIX conventions allow us to skip using rbp
entirely so long as we preserve it. But that’s a tricky, error-prone optimization, so we’re going to use rbp
as described above.
Allocating Enough Stack Space
So far, we’ve been intentionally sloppy about how we access stack memory: we’ve just been using memory beyond the location stored in rsp
. In the C calling convention, however, we are expected to move rsp
to make room for our local variables. In particular, all of our local variables should live between rbp
and rsp
.
In observing this calling convention, you can just change all of your rsp
offsets to use rbp
instead. Then, to make sure that functions that you call do not use this memory, you must adjust rsp
to account for all of your local variables (including temporary variables). Probably the easiest way to do this is to examine your assembly instructions for the expression after you have produced them. For instance, consider this hypothetical compilation of the expression 4 - let x = 2 in after(x)
:
mov rax, 8 ; a Cardinal 4 for the left subtraction operand
mov [rbp-8], rax ; store that 1 in a temp location
mov rax, 4 ; a Cardinal 2 for the value of x
mov [rbp-16], rax ; store the value of x
mov rax, [rbp-16] ; retrieve the value of x
sub rax, 2 ; perform the after operation
mov [rbp-16], rax ; store the result of after in a temp location
; note that x is out of scope now, so we reuse its memory
mov rax, [rbp-8] ; retrieve the first temp value
sub rax, [rbp-16] ; subtract the second temp value
For such a short code block, this unoptimized assembly accesses the stack in a lot of subtle ways. A quick skim, however, reveals that the lowest stack memory address that was accessed is [rbp-16]
. In order to determine the correct amount by which to move rsp
, it is enough to skim through this assembly code and determine the smallest offset to rbp
used throughout. This is only true because we always offset rbp
with constants! A more complex compiler might need to use a different strategy to compute the size of a stack frame but, for our purposes, this will suffice.
So for our purposes, you could write a function (e.g. stack_memory_of_instruction_list
) that would, given an instruction list
, produce an int
describing how much stack memory it used. You’ll almost certainly want some helper functions like e.g. stack_memory_of_instruction
, stack_memory_of_argument
, and so on; it’s up to you how to organize this algorithm. Once you have written a function to determine how much memory your assembly code uses, call that function from compile_program
and use the number it returns to set up an appropriate stack frame.
New Assembly Constructors
Here are some constructors you’ll probably want to include in your assemblyLanguage.ml
file.
-
AsmPush of arg
andAsmPop of arg
The
push
andpop
assembly instructions will copy values to and from stack memory and adjustrsp
accordingly. For instance,push rax
pushes the value ofrax
onto the stack and then movesrsp
by eight bytes. -
AsmCall of string
The
call
instruction is used to invoke a subroutine. It pushes the address of the next instruction onto the stack and then jumps to the provided label (as incall printValue
. Theret
instruction does the opposite of this, popping the instruction address and jumping to it.
Implementing the Cardinal Compiler
Here’s a strategy you can use to transition from Bluebird to Cardinal incrementally, which will hopefully make the process easier.
-
As stated above, perform your
git checkout
andgit merge
operations. Then, immediatelymake clean
,make tests
, and run./tests
. You’ll get some compiler warnings, but your tests should run just fine (as long as they did when you finished Bluebird!). -
Add the new registers to
assemblyLanguage.ml
:rdi
,rsi
,rdx
,rcx
,r8
,r9
, andrbp
. Change your compiler so that memory locations are offset fromrbp
instead ofrsp
. Updatecompile_expression
to include the number of bytes of stack memory used by the instructions it’s returning. Then, write code to make yourbird_main
routine observe the C calling conventions. Currently,compile_program
usescompile_expression
to compile your code and then just adds aret
instruction to the end. You’ll need to changecompile_program
so that it adds the appropriate callee behavior instead of just oneret
instruction. Again: run your unit tests after this step to make sure everything works. -
Write code for
print
expressions. This requires your code to observe the caller behavior of the C calling conventions. You’ll need to add the lineextern printValue
to the preamble of assembly generated incompile_to_assembly_code
so that you can use theprintValue
label even though it’s defined outside of your assembly code (byprinter.c
). Make sure to test with programs that print before, within, and after other expressions to look for stack memory bugs. -
Now add error checking to your compiler. For each operation that requires a certain type of operand (like
after
or+
), add code to check the operands before performing the operation. If the program detects an incorrect operand, it should callstopWithError
. As above, you’ll need to addextern stopWithError
to your assembly preamble to be able to call this external function; you’ll also need to adderror.c
to the list of linked files inbuilder.ml
.You’re likely going to need to add error checking to many places in your compiler: each operand of each binary operator, for instance, should be checked. Rather than writing every checking instruction in each place, you are encouraged to generalize with an OCaml helper function. Remember that your compiler is a program that is producing lists of instructions. You could easily write a function like
check_rax_int
that generates instructions to (1) check the tag bit ofrax
, (2) callstopWithError
ifrax
does not contain anint
, and (3) ensure thatrax
still contains the same value if it does contain anint
. You could then call this helper function everywhere you need some instructions to checkrax
.It’s likely that you’ll need another helper register to do this checking; after all, you’ll need somewhere to hold onto a real copy of the value you are checking while you use another register to manipulate it (and we still need
r10
to hold 64-bit values). Feel free to user11
as a second temporary register for this purpose.Make sure to write tests to confirm the behavior of runtime errors as well. Note that not all of the error messages and return codes have been written into
error.c
; you’ll need to add a couple.
That should cover all of the behaviors of Cardinal. Make sure you test each of the steps before moving on to the next one. Success in writing software often depends upon breaking the task into small, approachable pieces and adding bit by bit to your application.
Testing the Cardinal Compiler: Failing On Purpose
The testing function test_runtime_failure
in testUtils.ml
makes it easy to create an OUnit test that compiles and runs a given source file and expects it to fail with a particular exit code. You can use this function to test intentionally incorrect code to make sure your compiled Cardinal programs handle these errors correctly. Your compiler must compile programs that behave correctly in good situations and which stop with an error in bad situations.
Summary
To complete this assignment, you’ll need to
- Update your compiler to use C calling conventions
- Extend your compiler to allow printing
- Add error-checking code to your compiled programs to gracefully handle runtime type errors
Submitting
It is not enough just to push your code. Due dates are flexible, so it is necessary for you to take action to make it clear which commit you would like to have graded. We will use Git tags for this. A tag is a way of giving a human-readable name to a particular commit. Unlike a branch, however, tags are expected not to change once they are created.
When you are finished working on this compiler assignment, commit and push your work. Then, once you are sure that there are no additional changes you need to make, run the following commands:
$ git tag cardinal-submission
$ git push --tags
This will create a tag named cardinal-submission
and push that tag to the Swarthmore GitHub Enterprise server. Your work on that tag will be graded.
In addition to pushing and tagging your work, you will need to fill out a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.
If You Have Trouble…
…then please contact your instructor! The course forum is the preferred method, but you can reach out via e-mail as well. Good luck!