CS46 homework 6

This homework is due 10:00pm Sunday, March 29.

The goals of this homework are to:

gain familiarity with Turing machine terminology and properties.
take an idea from theory to implementation in a high-level language
design an interesting grammar of your own

There is considerable leeway for creativity in the final section of this homework, which asks you to implement a Twitter bot for your grammar.

1. Turing machines

This part is worth 4 points.

The written questions appear in hw6.tex, which you should edit to include your solutions.

2. Parsing grammars

This part is worth 4 points.

We've talked a lot about about automata and computability, without actually using computers to solve any problems. Now that we're talking about computability and the Church-Turing thesis, it seems reasonable to ask how these concepts align with our actual usage of computers as they exist in the real world.

In lecture on Monday, March 23 we discussed (at a high level) how to write a program/Turing machine which checks if a particular DFA (given as input) accepts a particular string (given as input). Can we also do this for context-free grammars?

In this programming portion, you will implement an algorithm to determine if a given string w is generated by context-free grammar G in Chomsky normal form. The end of section 2.1 defines Chomsky normal form (CNF) and proves that any grammar can be converted into an equivalent grammar in CNF. You are not expected to know how this proof works or how to execute it, but rereading Sipser pages 108-111 may provide some insight.

(a) the algorithm

Given a grammar G in Chomsky Normal Form, and an input string w = w₁w₂⋯w_n, we can design a polynomial time algorithm to determine if w ∈ L(G). The basic idea is to compute for all substrings x = w_i⋯w_j, j ≥ i of w if there is some rule in G that generates x. Define V_{i, i + s} to be the set of all variables V ∈ G that that can generate x. G can generate w if V_1, n contains the start symbol S ∈ V.

Our algorithm for parsing strings computes the sets V_i, j for 1 ≤ i ≤ j ≤ n. The algorithm proceeds in a series of rounds, where in each round, it considers substrings of length s, where 1 ≤ s ≤ n.

In the first round, the algorithm sets V_i, i to {A ∈ V | A → w_i is a rule in grammar G}.
For substrings longer than 1, the algorithm checks if the substring x = x_i⋯x_{i + s − 1} can be broken into two smaller pieces at some index k, so we have y = x_i⋯x_k and z = x_k + 1⋯x_{i + s − 1} such that there is a rule A → BC in the grammar where B ∈ V_i, k and C ∈ V_{k + 1, i + s − 1}. If such a rule exists, we add A to V_{i, i + s − 1}.
After finishing all rounds, we just need to check if S ∈ V_1, n.

A pseudocode summary is below. (For those who are interested, this is a rephrasing of the dynamic programming algorithm given on Sipser page 291.)

for i from 1 to n:
    add A to V[i,i] if "A yields w[i]" is a rule
for s from 1 to n-1:
    for i from 1 to n-s:
        for k from i to i+s-1:
            if "A yields BC" is a rule where B is in V[i,k] and C is
            in V[k+1,i+s]
               then add A to V[i,i+s]
if S is in V[1,n]:
   return True
else:
    return False

(b) the task

Write a program that determines if a string w is accepted by a grammar G given in Chomsky normal form. Your program should be written in python in the file cnf.py; I strongly encourage you to use the starter template provided in that file. The functions main, readGrammar, and testGrammar provide the basic framework for parsing the input files and processing the data using the partially-implemented Grammar class. You should only need to complete the accepts method to complete the program.

If you have questions about the existing code structure in cnf.py, please ask on Piazza/Slack/email. Here are some tips for python. I don't want you to be spending hours deciphering python syntax. You are welcome to edit the files locally or to ssh to cs.swarthmore.edu and work on a lab computer remotely.

(c) testing

Two example grammars and string testing files are included in your repository.

The first, grammar1.txt, is a Chomsky normal form grammar generating our favorite {aⁿbⁿ}. It goes with the test string file string1.txt. Sample output below:

$ more grammar1.txt
a b

S: E
S: A1 B
S: A B
A1: A S
A: a
B: b

$ python cnf.py grammar1.txt string1.txt
aabb: True
abab: False
a: False
aaaabbbb: True
ab: True

The second, grammar2.txt is a Chomsky normal form grammar generating expressions of balanced parentheses. Its test string file is string2.txt.

$ more grammar2.txt
( )

S: E
S: A1 B
S: A B
S: S S
A1: A S
A: (
B: )

$ python cnf.py grammar2.txt string2.txt
(): True
E: True
()(): True
(()()): True
)(: False
(((()): False

You should write your own tests; do not rely exclusively on the example grammars.

The grammar format starts with a line containing the symbols of the alphabet separated by spaces, a blank line, and then the grammar rules. The first rule lists the start symbol on the left-hand side. The rule format is V : A B to indicate V → AB. Note that variables are separated by spaces (so A1 is a single variable, not two variables). The empty string ε is represented as E.

3. Context-free Twitter bots

This part is worth 2 points, with another possible 5 points for creativity and flair in execution. (Yes, those are 5 additional real-valued points. The maximum score on this homework is 15/10.)

Devise an interesting context-free language, and write a grammar for that language. You are free to choose any alphabet you like, and your grammar can be as complicated or elegant as you want. You should read through the instructions, then check out the examples. This year, no Sharples menu grammars (we had plenty last year).

In the file my-awesome-language.txt, give a high-level description of the language generated by your grammar, as well as the rules to fully describe your grammar.

Next, you will set up a Twitter bot that periodically generates a word using your grammar and tweets it to the world.

Create a new account on Twitter using [your username]@cs.swarthmore.edu. (This email account forwards to your @swarthmore.edu account).
Follow the @SwarthmoreCS46 bot.
Log in to Cheap Bots, Done Quick!. Set your source code to be visible to the public, and set your bot to tweet at least once a day.
Write your grammar rules using Tracery syntax. Note that the page will let you test your grammar by generating a random string.

Tracery syntax is as follows:

The start variable is always "origin".
Variables of the grammar appear inside double-quotes on the left-hand side of rules, and inside hashtags on the right-hand side of rules.
The → symbol is :.
The right-hand side of a rule is a list, in square brackets, of all possible rules for that variable. Items are separated by commas. Each rule must be a string in double quotes. After the list is over, include a comma. (For example, the rule A → aA|b becomes "A": ["a#A#","b"],.)

If you have questions about the syntax for Tracery, please ask on Piazza/email/in person. I don't want you to be spending hours fiddling with Tracery syntax.

Submission

I recommend frequent add-commit-push cycles as you develop your code.

Once you have finished, please fill out the post-homework survey.