Compilers

Pyrrhuloxia

pyrrhuloxia
Pyrrhuloxia

Due on Friday, April 28th at 11:59 PM. This is a compiler lab. If you have a partner, the two of you will complete this lab as a team. If you do not have a partner, you are effectively in a team by yourself.

If you are working with a partner, please be familiar with the Partner Etiquette guidelines. You and your partner share a single repository and, barring unusual circumstances, will receive the same grade. If you experience any difficulties in your partnership, please alert your instructor as soon as possible.

If you are working alone, please don’t hesitate to seek help if you get stuck. Since there are no ninjas for this course and you don’t have a partner, your instructor is the only interactive resource available to you and is happy to help you through any problems you might encounter.

In either case, be familiar with the academic integrity policy! You are permitted to discuss high-level details of your compiler project with other students, but you should never see or share anything that will be turned in for a grade. If you need help, please post on the course forum or, if necessary, contact the instructor directly. Make sure to post privately if you are sharing code. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

Overview

In each of the previous labs, the parser – the tool which translates text files to ASTs – has been provided for you. In this lab, you will develop a compiler called Pyrrhuloxia. The Pyrrhuloxia compiler is functionally identical to a Cardinal compiler: it has the same behavior and builds the same programs. The only difference is that Pyrrhuloxia will use a parser you have implemented rather than the one provided with the starter code.

Migrating to Pyrrhuloxia

As usual, you will be switching to a branch named after the compiler and then merging your previous work on your compiler project. However, the compiler you merge into this branch is your choice. Every compiler following Cardinal supports the behavior of Cardinal; that is, every program that the Cardinal compiler can compile can also be compiled in any of our later compilers.

So you may pick any of those branches – cardinal, hoopoe, or anything in between – and merge it into pyrrhuloxia. You’ll likely want to pick a branch which you expect you can test efficiently. You should pick the latest such branch; only pick cardinal if you don’t believe that testing a later lab will work well. Here’s an example of a transition from Hoopoe to Pyrrhuloxia:

git fetch
git switch pyrrhuloxia
git merge hoopoe

This merge will introduce some new files in src/language/:

By default, these files do not have any effect on your compiler; they are not referenced anywhere. After the merge, you should be able to make your compiler and run your unit tests. These tests are running using the old parser.

Using the New Parser Code

To develop and test your own parser, you’ll need to modify the compiler to use it. The relevant code is in src/builder.ml in the build function definition. Find the lines

let ast =
  try
    ParserTool.parse filename contents
  with
  | ParserTool.ParseFailure msg -> raise (BuildFailure msg)
in

and change them to

let ast =
  try
    LLParserTool.parse filename contents
  with
  | LLParserTool.ParseFailure msg -> raise (BuildFailure msg)
in

replacing the occurrences of ParserTool with LLParserTool.

If you merged cardinal into your pyrrhuloxia branch (as opposed to a later lab), you’ll also need to change LLParser.ml so that the parse function at the bottom returns an expr rather than a program. You may also need to delete the original parser code (the lexer.mll, parser.mly, and parserTool.ml files) from your source tree.

Once you have finished, your compiler uses the new parser modules… which don’t work yet. As a result, most of your tests will all fail and every file you attempt to compile raises a parse error. This is good! You’re ready to begin developing your Pyrrhuloxia parser.

The Starter Code

The starter code for this lab provides you with a working parser for the simple grammar described below:

<expression> ::=
  | <integer>
  | (<expression>)
  | <expression> + <expression>

Note, however, that this grammar is left-recursive: the production <expression> + <expression> has, as its leftmost term, the same rule as the rule it is defining. The starter code defines a top-down parser, which cannot handle left-recursive grammars, so it instead parses this equivalent grammar:

<expression> ::=
  | <additiveExpression>

<additiveExpression> ::=
  | <primaryExpression> + <primaryExpression> ...
  | <primaryExpression>

<primaryExpression> ::=
  | <integer>
  | (<expression>)

The latter grammar is produced by removing left recursion from the former. This is always possible with context-free grammars, which are the sort of grammars we have been using in this class. Note that the ellipses (...) in the above grammar indicate a sequence of similar operations; 4 + 5 + (6) is a valid <additiveExpression>.

Understanding the Starter Code

At this point, you should read over the code in LLLexer.ml, LLParser.ml, and LLParserTool.ml. (You will, after all, have to modify this code pretty heavily to complete the lab.) Let’s begin by considering LLLexer.ml.

LLLexer.ml

The lex function in LLLexer.ml is the entry point for the lexer: it takes a string and transforms it into a token list. If lexing fails, a LexerError is raised instead. (The LexerError exception and the token type are both defined at the top of the file.) Other than these definitions, the file contains:

Note that this lexer, just like most of the rest of our code, still does not use mutable variables! The input is represented as a char list and we can use List library functions to operate on it. Implementation tips are provided later in this write-up.

LLParser.ml

The parse function at the end of LLParser.ml is the entry point for our parser. Overall, the parser works much like the lexer does: it defines a series of mini-parsers (such as parse_int) which know how to recognize exactly one kind of AST and then combines them together using a function named parse_first_of. Each function takes a token list and either returns an expr (if it can recognize the expression in the input tokens) or throws a ParserError (if it doesn’t know how to handle the input). The parse_first_of function will try each of a series of smaller parsers in turn, keeping the result of the first one which succeeds.

One key difference in the parser is that all of these parsing functions are defined mutually recursively. Examining the grammar above, we see that the <expression> grammar is defined in terms of <additiveExpression>, which is defined in terms of <primaryExpression>, which is defined in terms of <expression>. Since the grammar is recursive in this way, our top-down parser will be as well. We use OCaml’s let rec ... and ... syntax to handle this. For instance, consider the following:

let rec f n =
  if n = 0 then [] else n :: g (n-1)

and g n =
  if n = 0 then [] else f (n - 1)

;;

If we call f 8 using the definitions above, we will get the list [8;6;4;2]. The f function calls the g function with odd numbers while g calls f with even numbers; f records the numbers it sees in its output while g does not. The body of f refers to g which, without the and keyword, would be unacceptable (since g appears after f). The and keyword allows both f and g to be defined simultaneously while the rec keyword allows them to refer to each other (and themselves).

The code in parse_additive_expr deserves some special attention. This code parses a single <primaryExpression> and then attempts to process as many sequences of + <primaryExpression> as it can. It then uses that sequence to build up a tree of binary additions (via fold_left). This sort of repetition is likely the most complex work you’ll do in this assignment, so make sure to understand this pattern before making significant changes to the code.

The Pyrrhuloxia Parser

In this assignment, we are writing a top-down parser to be compatible with Cardinal. Let’s take a moment to recall the Cardinal grammar. In comparison to more recent labs, Cardinal has no functions, declarations, tuples, or calls. The full Cardinal grammar appears below:

<expression> ::=
  | true
  | false
  | <integer>
  | after(<expression>)
  | before(<expression>)
  | print(<expression>)
  | isbool(<expression>)
  | isint(<expression>)
  | <expression> + <expression>
  | <expression> - <expression>
  | <expression> * <expression>
  | <expression> < <expression>
  | <expression> > <expression>
  | <expression> = <expression>
  | <expression> && <expression>
  | <expression> || <expression>
  | <identifier>
  | let <identifier> = <expression> in <expression>
  | if <expression> then <expression> else <expression>
  | (<expression>)

As mentioned above, we will not be able to work with a left recursive grammar using a top-down approach. Instead, we will write a parser for the following (almost equivalent) grammar:

<expression> ::=
  | let <identifier> = <expression> in <expression>
  | if <expression> then <expression> else <expression>
  | <orExpression>

<orExpression> ::=
  | <andExpression> || <andExpression> ...
  | <andExpression>

<andExpression> ::=
  | <comparisonExpression> && <comparisonExpression> ...
  | <comparisonExpression>

<comparisonExpression> ::=
  | <additiveExpression> <comparisonOperator> <additiveExpression> ...
  | <additiveExpression>

<comparisonOperator> ::=
  | <
  | >
  | =

<additiveExpression> ::=
  | <multiplicativeExpression> <additiveOperator> <multiplicativeExpression> ...
  | <multiplicativeExpression>

<additiveOperator> ::=
  | +
  | -

<multiplicativeExpression> ::=
  | <primaryExpression> * <primaryExpression> ...
  | <primaryExpression>

<primaryExpression> ::=
  | true
  | false
  | <integer>
  | - <integer>
  | <identifier>
  | after(<expression>)
  | before(<expression>)
  | print(<expression>)
  | isint(<expression>)
  | isbool(<expression>)
  | (<expression>)

The above grammar accepts the same programs but has been adjusted to avoid left recursion. Again: the ellipses (...) in the above grammar indicate a sequence of similar operations.

Pyrrhuloxia Tokens

In Pyrrhuloxia, we will adopt the same tokenization standards as Cardinal.

Implementing Your Parser

The provided starter code is yours to modify as you see fit; the layout and organization of your parser is up to you. You can use whatever approach or design you like as long as your submission is your own code. You are not permitted to use a parser generator to complete this assignment (although you are permitted to write your own parser combinators if you like).

Much of your parser code is likely to be quite similar to the starter code. It might take a little work for you to develop a comfortable pattern while writing your parser, but it’s also important to use your time effectively. To avoid time-costly trial and error or reinvention of the metaphorical wheel, here are a few tips:

Testing Your Parser

Because Pyrrhuloxia is syntax compatible with Cardinal and Cardinal is a sublanguage of every more recent lab, you can use your Cardinal-compatible unit test code to test your new parser. To write parser tests, however, you don’t need to run the programs through the entire compiler pipeline via the helper functions in testUtils.ml (such as test_success). Instead, we can just test the parser portion of the lab.

You can write a function in testUtils.ml which expects the parser to produce a particular result for a particular file. To do so, you’ll want to make use of a few OCaml routines:

Summary

To complete this lab, you will

Once you’ve done this, you’ve written a compiler from beginning to end! Now that you’re more familiar with OCaml, you might be interested to glance over builder.ml and the other parts of the compiler you haven’t changed much. You’ll see that they contain code to open files and run subprograms (like nasm). These commands are basically identical to those from the Hatchling lab; there’s nothing magical going on there.

Congratulations on developing a complete compiler!

Submitting

It is not enough just to push your code. Due dates are flexible, so it is necessary for you to take action to make it clear which commit you would like to have graded. We will use Git tags for this. A tag is a way of giving a human-readable name to a particular commit. Unlike a branch, however, tags are expected not to change once they are created.

When you are finished working on this compiler assignment, commit and push your work. Then, once you are sure that there are no additional changes you need to make, run the following commands:

$ git tag pyrrhuloxia-submission
$ git push --tags

This will create a tag named pyrrhuloxia-submission and push that tag to the Swarthmore GitHub Enterprise server. Your work on that tag will be graded.

In addition to pushing and tagging your work, you will need to fill out a brief questionnaire found here. In most cases, the questionnaire should take less than a minute. This questionnaire is required and will be used as part of your participation grade.

If You Have Trouble…

…then please contact your instructor! The course forum is the preferred method, but you can reach out via e-mail as well. Good luck!