Checkpoint (10%): Due Thursday March 17 at the beginning of class
(you may not use a late day on the checkpoint)
Complete Project (90%): Due Tuesday March 29, BEFORE 9:30 am
Your project report may be submitted on-line as part of the tar file in
ASCII, pdf, or postscript format, or may be submitted in class on the same
day as you submit your tar file.
/home/newhall/public/cs75/grammar_latex/
a[i]; // a[i] is not an lvalue here a[i] = x; // a[i] is an lvalue here x = 8 + (y = 6); // y is an lvalue here and (y = 6) is an expression with a value of 6but these are invalid C-- expressions:
a[i] + b[i] = x; // (a[i] + b[i]) is not a valid lvalue x = 8 + y = 6; // (8 + y) is not a valid lvalue // ( + has higher precedence than = )Correctly handing these cases requires more than one token look-ahead or one or more token look-behind, neither which is allowed in a predictive parser. For now, the solution is to go ahead and translate the CFG to an almost equivalent LL(1) grammar. The resulting LL(1) grammar will be equivalent to the CFG for C-- with the exception that it will incorrectly accept strings like
a[i] + b[j] = expr;
.
Later, when you implement the code generator part of the project, you
will take care of ensuring that an expression appearing on the left-hand-side
of an assignment statement is a valid lvalue; at the code generation
phase, your compiler will reject strings like "a[i] + b[j] = expr;
"
that your parser accepts.
The precedence of C-- operators from lowest to highest is (your LL(1) grammar must handle precedence):
Converting the CFG for C-- to an LL(1) grammar will be the most difficult part of the assignment. I strongly suggest first testing out your LL(1) grammar by hand to verify that its rules can correctly parse some very small C-- programs. Trying some very small examples with with zero, one, and two global variable declarations and one, and two very small functions. For example:
int x; char y; int main(int x, char y) { x = y; }In addition, I encourage you to compute FIRST and FOLLOW sets for all or part of your LL(1) grammar to prove that it is LL(1).
Once you have verified that your LL(1) grammar is in fact LL(1), and you have convinced yourself that it still parses C-- (you didn't modify the productions in such a way as to change the language it recognizes), then implement a recursive decent parser from your LL(1) grammar.
You should make sure to test that your parser can correctly parse all valid C-- language constructs and combinations of C-- language constructs before you start adding error recovery code.
Once your parser works, add some simple error recovery to your parser. You should implement simple panic and phrase-level recovery schemes (for a single missing or incorrect token); you do not have to implement complex error recovery, instead find simple cases of recovering from a single missing token during parsing. In addition, you should add error recovery to your lexer in cases where a missing or incorrect character(s) in the input can be easily handled (e.g. a missing '&' in the AND operator). When your compiler detects an error it will print out an error message, then either recover from the error and continue parsing or exit if it cannot recover. For example, after detecting the following error, the parser will print out an error message and continue parsing as if the missing RPAREN was there:
ID LPAREN ID SEMI ^^^^ missing RPAREN in function call
You are welcome to try more complicated error recovery, but first try implementing some relatively easy cases.
int main() { int x; x = 5 - 8 / 4; write x; }The parser should output something like the following:
# # Note: I've removed parts of my output that correspond to non-terminals in my LL(1) # that are not from the original CFG. I'm doing this because I want you to # convert the CFG to LL(1) without trying to see if your LL(1) is the same as # mine based on my sample output. # # In your solution, you should print out EVERY non-terminal in your LL(1) # grammar as it is parsed by your parser. # program MATCH: INT MATCH: ID.main MATCH: LPAREN ParamDeclList # when your parser matches a token you should MATCH: RPAREN # print out: "MATCH: TOKEN.attribute" MATCH: LBRACE # in my grammar I actually have terminals for Block # each keyword (ex. IF, WHILE, ...), however VarDecList # I'm printing them out here as KEYWORD.attr MATCH: INT # you are welcome to do either MATCH: ID.x MATCH: SEMI VarDecList StmtList Stmt Expr MATCH: ID.x MATCH: ASSIGN Expr MATCH: NUM.5 MATCH: MINUS MATCH: NUM.8 MATCH: DIV MATCH: NUM.4 MATCH: SEMI StmtList Stmt MATCH: KEYWORD.write Expr MATCH: ID.x MATCH: SEMI MATCH: RBRACE FuncDecList MATCH: DONE
You do not need to include with your checkpoint a copy of the steps you took to convert G to LL(1). However, if you do, it will be more likely that I will be able to find any errors in your conversion for you (and actually, you will be less likely to make them). I suggest that you compute FIRST and FOLLOW sets for parts of the grammar (like arithmetic expressions, and the rules with lists to see if your grammar is LL(1)).