CS 22
Clab 18: Grammars
Our goal will be to have some fun and to develop a data-directed
sentence generating system. Let's say a sentence is an article
followed by a noun phrase followed by a verb phrase. You might
imagine defining a system to generate random sentences as follows:
(define (nth n x)
(if (= n 0)
(car x)
(nth (- n 1) (cdr x))))
(define nounlis '(dog dean professor student class))
(define verblis '(ran ate slept drank talked))
(define adjlis '(red slow dead nice quietly))
(define advlis '(quickly happily well))
(define (sentence)
(append (append '(the) (nounphrase)) (verbphrase)))
(define (pickrand lst)
(nth (random (length lst)) lst))
(define (noun)
(pickrand nounlis))
(define (verb)
(pickrand verblis))
(define (adverb)
(pickrand advlis))
(define (adjective)
(pickrand adjlis))
(define (either a b)
(if (= (random 2) 0)
(a)
(b)))
(define (verbphrase)
(either (lambda () (list (verb)))
(lambda () (list (verb) (adverb)))
)
)
(define (nounphrase)
(either (lambda () (list (noun)))
(lambda () (cons (adjective) (nounphrase)))
)
)
Paste these definitions into a scheme window, save as sentences.scm
and generate a few sentences.
> (sentence)
(the slow professor talked quickly)
> (sentence)
(the red dean drank)
> (sentence)
(the dog drank)
A few of the interesting ones I got
were:
(the dog ran)
(the class drank)
(the nice class drank quickly)
(the professor ran well)
(the slow dog ate happily)
Of course I got some strange ones, too:
(the red dead student ran quietly)
(the slow nice dead dead class ran quietly)
We took a description of a sentence (in English) and wrote procedures
to generate sentences. If the description of sentence changed, we
would have to redefine our procedures. Today we would like to
define procedures that can take a formal description of sentences (a
grammar) as input data and from that description (=data) generate
sentences. Our key goal is this: If the definition of sentence
changes, we will change the description of sentences, BUT we will
NOT have to change our procedures.
We can formally describe the sentences like those above using
the following "context-free grammar":
sentence -> article nounphrase verbphrase
nounphrase -> adj* noun
adj* -> () | adj adj*
verbphrase -> verb adverb
noun -> dog | dean | professor | student | class
adj -> red | slow | dead | nice
verb -> ran | ate | slept | drank | talked
adverb -> quickly | happily | well | quietly
article -> the
Each line is a "rewrite rule". For example, any time verbphrase
appears, a verb followed by an adverb can be substituted. The names
on the left of -> are called nonterminal symbols. The other words (dog
dean professor student class red slow dead nice ran ate slept drank talked quickly
happily well quietly the) are called terminal symbols. Nonterminal symbols can
be rewritten. Terminal symbols cannot be rewritten. This is a special
case of a phrase structure grammar. To generate a phrase: start with
the nonterminal for the phrase and keep rewriting it according to the
rules of the grammar until you get to something with all terminal
symbols. The vertical bar '|' indicates a choice. () indicates the
empty string. For example to generate a nounphrase, we may choose the
following ('=>' indicates 'is rewritten as')
Derivation rule used
nounphrase => adj* noun nounphrase -> adj* noun
=> adj adj* noun adj* -> adj adj*
=> adj adj adj* noun adj* -> adj adj*
=> slow adj adj* noun adj -> slow
=> slow red adj* noun adj -> red
=> slow red noun adj* -> ()
=> slow red student noun -> student
We stop generating here because we have a string of all terminal
symbols. Derivations in grammars are inherently nondeterministic.
There are other ways that the same nounphrase could have been derived.
There are an infinite number of different nounphrases that can be
derived from this grammar. There is a lot more to formal grammars than
we are discussing here. They are very useful in both linguistics and
computer science. They are discussed further in CS courses on compiler
design, programming languages, and theory. They are also discussed in
a number of Linguistics courses.
We can represent the grammar shown above as a list as
follows:
(define simpgrammar
'((sentence -> (article nounphrase verbphrase))
(nounphrase -> (adj* noun))
(adj* -> () (adj adj*))
(verbphrase -> (verb adverb))
(noun -> dog dean professor student class)
(adj -> red slow dead nice quietly)
(verb -> ran ate slept drank talked)
(adverb -> quickly happily well)
(article -> the)
)
)
The representation of the grammar is a list of rules. Each rule
consists of a sublist. If the right side of a rule is just a choice of
simple objects, each choice is listed with a space in between (like the
right side for adj). If a choice is a compound object, then that
object is put in a sublist (like the right side of verbphrase ).
Our task is to write a procedure 'generate' that given a phrase, ph,
and a grammar, G, will generate a random phrase of type ph from grammar
G. In order to do this in a reasonable way, we will introduce some
abstraction barriers. Define simpgrammar as above. Then complete the
definition of each of the following:
;;; given a grammar rule of the form (ls -> rs) return ls
(define (ruleleft rule) ...
;;; given a grammar rule of the form (ls -> rs) return (rs)
(define (ruleright rule) ...
;;; given a grammar that consists of a list of rules of the
;;; form (ls -> rs) return a list of the possible rewrites
;;; for the nonterminal nonterm in the grammar specified
(define (rewrites nonterm grammar) ...
Here are some sample applications:
: (ruleleft '(sentence -> (article nounphrase verbphrase)) )
sentence
: (ruleright '(sentence -> (article nounphrase verbphrase)) )
((article nounphrase verbphrase))
: (rewrites 'sentence simpgrammar)
((article nounphrase verbphrase))
: (rewrites 'adj* simpgrammar)
(() (adj adj*))
Before we define generate, we need one more abstraction barrier:
;;;nonterm? returns a value that will be considered true
;;; (i.e. not #f) if cand is a nonterminal in the grammar
;;;specified by the second parameter. returns #f otherwise.
(define (nonterm? cand grammar)
(assoc cand grammar))
We also need a couple of functions that we used before:
(define (nth n x) ;;;returns item (n-1) from list x
(if (= n 0)
(car x)
(nth (- n 1) (cdr x))))
(define (pickrand lst) ;;;returns a random element of list lst
(nth (random (length lst)) lst))
We are now ready to try to define the procedure generate. generate
should return a list that consists of a random phrase of the type
specified by the first parameter derived using the grammar specified by
the second parameter. Here are three examples.
> (generate 'nounphrase simpgrammar)
(nice class)
> (generate 'sentence simpgrammar)
(the quiet nice dog ate quietly)
> (generate 'sentence simpgrammar)
(the student drank quietly)
Here is my first attempt:
;;; generate returns a list that consists of a random
;;; phrase of the type specified by the first parameter
;;; derived using the grammar specified by the second parameter.
(define (generate phrase grammar) ;;not quite right
(define (gen phrase) ;;; helping function which lets me
(generate phrase grammar)) ;;; skip the second parameter
(cond
((list? phrase) ;;; if the phrase is a list, apply gen
(map gen phrase)) ;;;recursively to each element
((nonterm? phrase grammar) ;;; if the phrase is a nonterminal
;;; pick one of the possible substitutions
;;; for phrase and apply generate recursively
(gen (pickrand (rewrites phrase grammar))))
(else (list phrase)) ;;;phrase must be a terminal, return it
)
)
Here are the results of two invocations:
> (generate 'sentence simpgrammar)
((the) (((slow) ((quiet) ())) (dog)) ((ran) (happily)))
> (generate 'sentence simpgrammar)
((the) (((dead) ((slow) ((nice) ((red) ())))) (professor)) ((talked) (quickly)))
Unfortunately, I got more parentheses than I wanted. What did I do
wrong? Please fix it for me and generate some funny sentences. Hint:
recall flatmap (not a standard scheme function but one discussed by
your authors on page 123).
Now for the payoff. The new improved generate should work for any
context-free grammar represented as specified. So try it on the
grammar below.
(define limpgrammar
'((sentence -> (nounphrase verbphrase))
(nounphrase -> (article adj* noun pp*) (name) (pronoun))
(adj* -> () (adj adj*))
(verbphrase -> (verb adverb) (verb nounphrase pp*))
(noun -> dog class professor student candidate ball dean table)
(adj -> red slow dead big little boring nice quiet long-winded)
(verb -> ran ate slept drank hit took saw talked)
(adverb -> quickly happily well slowly quietly)
(article -> the a)
(pp* -> () (pp pp*))
(pp -> (prep nounphrase))
(prep -> to in by with on)
(name -> george al kim lee terry robin)
(pronoun -> he she it these those that)
)
)
Here are a couple of the less silly sentences generated:
(robin slept well)
(al saw well)
(a professor by lee slept happily)
Here are a couple of the really bad sentences generated:
(a dead slow little slow little long-winded dean talked well)
(a table with a big dean to kim slept that)
(a student by the class in george took a class to lee to a quiet nice
student on robin in a dean with the dean on the boring professor by these
with these on a dean to these)
You can get some pretty stupid sentences from this grammar. We have
not done anything about case (e.g. 'on he') or whether a verb should
take a direct object. But you can see that with some refinement, it
might be possible to build a sentence generator. You may want to try
to make up better grammars. Take a linguistics course to learn more.
Now try to generate some arithmetic expressions using the following
expression grammar. Save anything that is valuable BEFORE you try
this.
(define exprgram
'((expr -> (ident)
( "(" expr ")" )
(numb)
(expr + expr)
(ident) ;;;repeats to make sure pickrand can stop
(expr * expr)
(numb)
(expr - expr)
(ident)
(expr / expr))
(ident -> x y z cat force mass)
(numb -> 1 2 3 40 50 60)))
We have to make the parentheses strings because parentheses have
special meaning in scheme. We have to repeat (numb) and (ident) on the
right side of expr in order to lessen the probability of very long
sequences of recursive calls to expr.
: (generate 'expr exprgram)
(x + cat)
: (generate 'expr exprgram)
("(" cat - 3 * 1 - y - "(" y ")" ")")
You can get rid of the quotes around the parentheses by invoking
display, e. g.
: (display (generate 'expr exprgram))
(( 2 ) - z)
The procedure generate is another example of data directed programming.
The data (in this case the grammar) tells the program what to do.
This data consists of a set of rules so this is also an example of a
"rule-based production system".