Part 3: Code Generation of basic functionality (10% of your grade)
Due: Thursday April 14, BEFORE 2:00 am (very late Wednesday night)
For the checkpoint you will implement the "Basic Functionality" part
of code generation (described below).
Part 4: Complete C-- Compiler, Code, Project Report, and Demo (90%)
Due: Thursday April 28, BEFORE 2:00 am
You may use only 1 late day on this assignment; I will not
accept assignments after 2:00 am Tuesday May 3.
Demo: sign-up for a 20 minute demo slot after you submit your code and project report
Your compiler should take two command line arguments, the first is the C-- code source file and the second is the name of the stack machine output file:
% ./mycc foo.c-- foo.stk
Once you successfully complete the basic C-- compiler, you can try adding additional features for extra credit points. Do not add additional features before your initial solution is complete and correct; an incomplete implementation of the required functionality plus an added feature will result in a lower grade than a complete implementation of the required functionality without any additional feature added.
Here are some additions you might want to consider (please talk to me about any additional ideas you have for extra features before attempting them):
int foo(int *x) { // x declared as a pointer to int *x = 6; // dereference the pointer to modify the arg's value return (*x + 10); } int main() { int x; write foo(&x); // pass x by reference to foo }
a = b = c = d; // b, c, and d are both l-values and r-values x = (y = 6) + 8; // (y = 6) assigns 6 to l-value y, then use // the r-value of y to add to 8
If you choose this option, you should submit two versions of your solution, one that generates smach code, the other that generates MIPS assembly code (and don't attempt this until after you get your smach solution working).
~newhall/public/cs75/smach/
I recommend that you use some type of revision control software (CVS would be the best choice if you are working with a partner) to keep versions of your working compiler as you add functionality. Information about using CVS and RCS is available off the cs75 homepage.
You should create a test suite of C-- programs to test your compiler (when you write a new one, first try parsing it with your parser to make sure it is valid C--). Think carefully about programs you add to this suite to make sure they completely test all aspects of your code generator. As you add features to your compiler, re-test your compiler on your test suite to make sure you didn't break something that you previously had working.
Now you are ready to actually generate some code.
INC 0 3
and your last line will always be
RTN 0 0
.
int main() { write 3+4*7; // should be 31 writeln; }
int main() { write (4 == 5) || (5 < 6); // should be 1 for true write (4 == 5) || (5 < 4); // should be 0 for false write (4 == 5) && (5 < 6); // should be 0 for false write (7 >= 7) && (8 <= 8); // should be 1 for true writeln; }
int main() { if (!1) // should output -1 write 1; else write -1; writeln; if (1 == 1) // should output 1 write 1; else write -1; writeln; }
int main() { while (1) // this will create an infinite loop write 1; } int main() { write 1; writeln; while (0) { // this loop should never get executed write 2; writeln; } write 3; writeln; }
As your compiler finds parameters or local or temporary variable declarations inside a function or a block, it should add a new entry to the symbol table for these variables. When it finishes compiling a block or function, entries defined within the block or function should be removed. Your code must detect the error condition of duplicate definitions of a name within the same scope.
You will likely need to add parameters to your parser functions to pass extra information necessary for code generation between the functions. In addition, you may discover that you need to re-write some of your grammar rules to better facilitate code generation.
The stack machine's stack is implemented as an array. As you push
a value on the stack, the top-of-stack value is a higher array index than the
previous top-of-stack value. You will access most stack contents as offsets
from BASEPTR. The locations of local variables will be a positive offset
from the current stack frame baseptr. The location of
parameters and the return value is at a negative offset from the current
stack frame baseptr. The locations of global variables will be below main's
stack frame. The locations of functions will be a line number in the
code table. For example, if main
calls function int blah()
and blah
calls
int foo(int x, int y)
, then the stack will look like this after
the call sequence instructions have executed:
----------------------- PC: addr of a foo instr. frame 0 | blah's PC value | <---- TOP | blah's baseptr | (foo's stack frame) | static link | <---- BASEPTR ----------------------- frame 1 | param y | | param x | (blah's stack frame) | foo's return value | | ... | | main's PC value | | main's baseptr | | static link | <---- blah's baseptr value pts here ----------------------- frame 2 | blah's return value | | ... | (main's stack frame) | 0 | | 0 | | 0 | <---- main's baseptr value pts here ======================= level -1 | global vars | ----------------------- NOTE: when function foo returns, blah's frame becomes frame 0, and main's frame is frame 1; frame 0 is always the top-most stack frame
To get and store the values of global variables do the following using absolute addressing:
PSH -1 posoffset STO -1 posoffsetAnd to get and store the values of local variables in the current stack frame do the following using relative addressing:
PSH 0 posoffset STO 0 posoffset
Whenever you are uncertain of the effects of a stack operation, go to the smach.c code and find out what it does.
Your compiler needs to be able to determine when to use an lvalue (the storage location) as opposed to an rvalue (the actual value). Obviously, it depends on whether or not we have an assignment statement.
One problem you need to solve is that in the LL(1) grammar, by the time you find the assignment token you have lost the information about the preceding identifier. One way to solve this problem is to pass to your expression parsing functions a value that will be updated by the part that parses an identifier and checked by the function that handles assignment statements. It is important that this information be stored locally, because the following kind of expression is allowed in C--:
a = b = c = 0;This is very tricky to handle properly, because here, each variable is being used as BOTH and lvalue AND and rvalue. Don't worry if you aren't able to implement this kind of assignment expression for the final version. However, if you don't handle it, your compiler should flag it as an error.
You should be able to detect invalid lvalues in assignment statements and correctly make a distinction between identifiers used as lvalues or rvalues. You must handle expressions like:
x = x + z + 6; array[i] = array[j] + foo(array, x, (y+7)); 6 + x = 10; // your compiler should detect this error: invalid lvalueIn addition, any time a name is used in an expression, you must ensure that it corresponds to a function or identifier that has been declared prior to its use.
Eventually the code you generate should begin as follows:
1 GLO 0 num ; where num is the amount of global memory needed at stack base 2 BRU 0 line# ; where line# is the line number of the start of main ... line# INC 0 3 ; the beginning of main ...
In 1 to 2 pages give an overview of the design of your compiler based on the approach to compiler construction we've adopted in this course. You should describe any special features of your compiler. If there are any aspects that have not been fully implemented or are buggy, you should discuss them here. And, include a copy of your LL(1) grammar.