Lab 06

Due 11:59pm Wednesday, 28 March 2018

The goals of this lab are to:

This assignment consists of a written homework portion and a programming portion. Write your written solution using $\LaTeX$. A python sketch of the programming portion has been provided, but you are free to write a solution in C, C++, Java, or Python if desired. Submit both portions using github. This is a partnered assignment. You may work with one partner and submit a joint assignment. It is OK to discuss approaches at a high level with other students that are not your partner, but you must do your own write-up. Do not share your write up with other groups, and do not read other group's solutions. Please refer to the syllabus or ask me if you have questions about the policy.

Cloning the starting files
cd ~/cs46
git clone git@github.swarthmore.edu:CS46-S18/lab06-<group>.git ./lab06
where <group> is your group name. It is probably best to just log into github and find your repo name there.
  • At this point, you should have the lab starter code and be ready to start the lab.
  • I recommend that you read the "Commands for using a git repo" that lists the commands that you will need to submit your lab assignments. In particular, you should understand how to use "git add", "git commit" and "git push". Additionally, if your are working with a partner, it becomes more important to push more frequently to share changes with your partner. Your partner can then git pull your changes. In the event of merge conflicts, consult the merge conflict guide, ask on Piazza, or stop by my office. The best way to avoid merge conflicts is to work on the writeup together and add,commit,push changes when you take a break.

    Written questions appear in hw06.tex which you can edit to write your solution. You may also review the original pdf template.
    Parsing Context Free Languages
    In the programming portion of this lab, you will implement an algorithm to determine if a given string $w$ is generated by context free grammar $G$ in Chomsky Normal Form. The end of section 2.1 defines Chomsky Normal Form (CNF) and proves that any grammar can be converted into an equivalent CNF grammar. You are not expected to know how this proof works or how to execute it, but it may provide some insight.

    Given a grammar $G$ in Chomsky Normal Form, and an input string $w=w_1 w_2 \cdots w_n$, we can design a polynomial time algorithm to determine if $w \in L(G)$. The basic idea is to compute for all substrings $x=w_i \cdots w_j, j \geq i$ of $w$ if there is some rule in $G$ that generates $x$. Define $V_{i,j}$ to be the set of all variables in $V \in G$ that that can derive substring $x=w_i \cdots w_j$. $G$ can derive $w$ if $V_{1,n}$ contains the start symbol $S \in V$

    Our algorithm for parsing strings computes the sets $V_{i,j}$ for $1 \leq i \leq n, j \geq i$. The algorithm proceeds in a series of rounds, where in each round, it considers substrings of length $s, 1 \leq s \leq n$. In the first round, the algorithm sets $V_{i,i}$ to $\{ A \in V, A \rightarrow w_i$ for some rule $A$ in the grammar $\}$.

    For substrings longer than $1$, the algorithm checks if the substring $x=x_i \cdots x_{i+s-1}$ can be broken into two smaller pieces at some index $k$, $y=x_i \cdots x_k$ and $z=x_{k+1} \cdots x_{i+s-1}$ such that there is a rule $A \rightarrow BC$ in the grammar where $B \in V_{i,k}$ and $C \in V_{k+1,i+s-1}$. If such a rule $A$ exists, we add $A$ to $V_{i,i+s-1}$.

    After finishing all rounds, we just need to check if $S \in V_{1,n}$. A pseudocode summary is below

    for i in 1 to n:
      Add A to V[i,i] if A yields w[i] is a rule
    for s in 1 to n-1:
      for i in 1 to n-s:
        for k in i to i+s-1:
           if A yields BC is a rule
           where B is in V[i,k] and C is in V[k+1,i+s]
           then add A to V[i,i+s]
    return True if S is in V[1,n]
    
    Some example output for a CNF grammar for $a^nb^n$.
    $ cat grammar1.txt
    a b
    
    S: E
    S: A1 B
    S: A B
    A1: A S
    A: a
    B: b
    
    $ python cnf.py grammar1.txt string1.txt
    
    abab: False
    a: False
    aaaabbbb: True
    ab: True
    
    For the balanced parens
    $ cat grammar2.txt
    
    ( )
    
    S: E
    S: A1 B
    S: A B
    S: S S
    A1: A S
    A: (
    B: )
    
    
    $ python cnf.py grammar2.txt string2.txt
    
    (): True
    E: True
    ()(): True
    (()()): True
    )(: False
    (((()): False
    
    

    The grammar format starts with a line containing the symbols of the alphabet separated by spaces, a blank line, and then the grammar rules. The first rule lists the start symbol on the left hand side. The rule format is V: A B to indicate the rule $V \rightarrow AB$. Note A1 is a single variable not two variables. Variables on the right hand side are separated by spaces.

    Python tips

    I would strongly encourage you to use the python template provided to you in cnf.py. The functions main, readGrammar and testGrammar provide the basic framework for parsing the input files and processing the data using the partially implemented Grammar class. You should only need to complete the accepts method to complete the programming portion of this lab.

    self.

    If you are new to python, there may be some weird features to observe. Python uses the variable self to refer to member variables and methods within a class, so you will see it everywhere within a class. If you are familiar with the this keyword in Java or C++, it is a similar idea. If you want to modify or refer to a member variable, just remember to use the self. prefix.
    self.start = "S" #set start variable
    
    result = self.matchedSym("a") # call the matchedSym method with argument 'a'
    # result is saved in local variable result.
    

    0 indexing

    Remember that Python (and C, and C++, and Java) start indexing arrays at 0, so the first character of string w is w[0]. You can use len() to find the length of any sequence (list, string, etc.). Note that in math notation and the psuedocode above, we use 1 indexing, so the first character of string $w$ is $w_1$. Adjust accordingly.

    dictionaries

    I would recommend storing the table element denoted V[i,j] in a python dictionary. Below are some useful snippets that may help you with syntax, semantics
    
    d = {} # create empty dictionary
    
    i=1
    j=2
    
    #store the value 1 in the dictionary with the key (1,2)
    d[(i,j)] = 1
    
    #print the value associated with a key (i,j)
    #if the key does not already exists, it will throw an error
    print(d[(i,j)])
    
    """
    you can store any object as a value. A list may be handy.
    Python dictionary keys must be immutable, so a tuple (i,j) is fine,
    but a list [i,j] is not a valid key.
    """
    

    in

    You can use the keyword in to check if an item is in a list or a key is in a dictionary.
    if "a" in "apple":
      print("Yes!")
    

    range

    You can use the range(i,j) function to generate a list of values between i and j-1 inclusive. This is helpful for python for loops.
    #print 1 through 9 inclusive
    for i in range(1,10):
      print(i)
    
    If i is ommited from the range() function, it is assumed to be 0.

    If you have questions about the existing code structure in cnf.py, please ask in lab or on Piazza. I don't want you to be spending hours deciphering python syntax.

    Readme survey
    Before submitting the lab. Please complete the short survey in the file Readme.md. Your answers to the questions in this document will not impact your grade, but provide valuable feedback on the difficulty of the assignment and common sources of confusion or roadblocks. Your responses will help make the course better.
    Submit

    Once you have edited the files, you should publish your changes using the following steps:

    $ git add hw06.tex Readme.md cnf.py
    $ git commit -m "completed lab06"
    $ git push
    

    If you make changes to files after your push and want to share these changes, repeat the add, commit, push loop again to update the github server.

    To recap, the git commit cycle is git add, git commit, git push. Don't forget to git push when you have completed an assignment.

    You can review the basic git commands on the github help pages. Eventually, you should get in the habit of using git status to see if everything has been published, but we will talk about this more throughout the semester.

    Remember to add, commit, push. You may commit and push as often as you like, and only the most recent push will be graded.