CS21 Lab 3: Loops and Conditionals

Due Saturday, September 24, by 11:59pm

Programming Tips

As you write programs, use good programming practices:

Use a comment at the top of the file to describe the purpose of the program (see example).
All programs should have a main() function (see example).
Use variable names that describe the contents of the variables.
Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.
Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.
Avoid writing any lines of code that exceed 80 columns.
- Always work in a terminal window that is 80 characters wide (resize it to be this wide)
- In vscode, at the bottom right in the window, there is an indication of both the line and the column of the cursor.

Are your files in the correct place?

Make sure all programs are saved to your cs21/labs/03 directory! Files outside that directory will not be graded.

$ update21
$ cd ~/cs21/labs/03
$ pwd
/home/username/cs21/labs/03
$ ls
Questions-03.txt
(should see your program files here)

Goals

The goals for this lab assignment are:

practice using if-else statements
continue working with for loops and accumulators
use formatted printing
learn how to import modules that extend python’s capabilities

1. Leet Speak

Leet speak is a method of modifying text by replacing some letters with other symbols (such as numbers). We will change every upper- and lower-case A to 4, every upper- and lower-case E to 3, every upper- and lower-case L to 1, every upper- and lower-case O to 0, and every upper- and lower-case T to 7.

In the file leet.py, write a program that takes a string as input and prints out the Leet speak version of the string.

You must use an accumulator to solve this problem.

$ python3 leet.py
Type in some text to have it made 1337!
Input: Leet speak
Leet version: 1337 sp34k

$ python3 leet.py
Type in some text to have it made 1337!
Input: Taylor Swift's new album comes out next month
Leet version: 74y10r Swif7's n3w 41bum c0m3s 0u7 n3x7 m0n7h

$ python3 leet.py
Type in some text to have it made 1337!
Input: tattletale tattoo
Leet version: 7477137413 747700

$ python3 leet.py
Type in some text to have it made 1337!
Input: unbudging skunks
Leet version: unbudging skunks

Take a look at the last two examples. The string tattletale tattoo has every letter converted to a number, so the output is a bit strange looking. And the string unbudging skunks has no letters converted, so the output looks the same as the input. That’s to be expected!
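As a reminder of the string accumulator pattern, here is a minimal sketch that solves a different problem: replacing every vowel with a star. The function name mask_vowels and the sample text are made up for illustration and are not part of the lab. The Leet speak program needs its own replacement rules.

```python
"""
Illustration of the string accumulator pattern (not the leet.py solution).
"""

def mask_vowels(text):
    """Return a copy of text with every vowel replaced by '*'."""
    result = ""                      # the accumulator starts out empty
    for ch in text:                  # look at one character at a time
        if ch in "aeiouAEIOU":
            result = result + "*"    # add the replacement symbol
        else:
            result = result + ch     # keep the character unchanged
    return result


def main():
    print(mask_vowels("Loops and Conditionals"))


main()
```

Notice that the accumulator grows by exactly one character each time through the loop, whether or not that character was replaced.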

2. Digitally inspecting texts

We will gradually build up a program that allows users to investigate texts (e.g. novels) that are available through a python library that is installed on the CS lab machines.

You will start by writing a program that searches a text for a word that the user enters. We will show you how to read in all of the words from Jane Austen’s "Emma" as a list of strings. Your program will ask the user what word they want to search for. You will search through the words of the novel and report back how many times the word was found.

To get the list of words from the novel, you’ll use the nltk library. Put the following line at the top of your program, which you will save in the file find.py.

import nltk

Once the nltk library has been imported, you can get the words from any number of files that are included with nltk. We will start with Jane Austen’s "Emma", which is accessible through the nltk library using the file name 'austen-emma.txt'. The nltk library will automatically read that file for you and store all of the words in a list of strings:

file_name = 'austen-emma.txt'
words = nltk.corpus.gutenberg.words(file_name)

In case you are wondering what this looks like, let’s just type those lines into python3:

$ python3
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> file_name = 'austen-emma.txt'
>>> words = nltk.corpus.gutenberg.words(file_name)
>>> words
['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']', ...]
>>> words[0]
'['
>>> words[3]
'Jane'

Notice that words is just a list of strings. The first word (if you can call it that) is the string '['. The second word is 'Emma'. The third word is 'by'. The fourth word (at index 3) is 'Jane'. You do not need to make this list. This list of words is automatically created for you when you run these lines:

file_name = 'austen-emma.txt'
words = nltk.corpus.gutenberg.words(file_name)

Now that you have a list of all the words in the text, you can try to count how many times a specific word appears. Ask the user to type in a word and count how many times that word appears in the list of words you read in. You are only looking for EXACT matches. For example, if the user types the, you will match only against the word the, but not words like there (which starts with the) or other (that has the in the middle) or even The (with a capital letter).

Here are four examples of the running program. The text after each prompt is what the user typed.

$ python3 find.py
Searching austen-emma.txt
What word do you want to search for? the
Found 4844 times.

$ python3 find.py
Searching austen-emma.txt
What word do you want to search for? horse
Found 9 times.

$ python3 find.py
Searching austen-emma.txt
What word do you want to search for? dog
Found 1 times.

$ python3 find.py
Searching austen-emma.txt
What word do you want to search for? giraffe
Found 0 times.
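The exact-match counting described above is a numeric accumulator over a list. The sketch below uses a short hand-made list instead of the nltk word list so that it runs anywhere. The function name count_matches and the sample words are assumptions made for illustration only.

```python
"""
Illustration of counting exact matches in a list of strings.
"""

def count_matches(words, target):
    """Return how many items in words are exactly equal to target."""
    count = 0                        # the accumulator starts at zero
    for word in words:
        if word == target:           # == is case-sensitive and matches whole strings
            count = count + 1
    return count


def main():
    sample = ["The", "cat", "and", "the", "other", "cat",
              "sat", "there", "by", "the", "door"]
    print("Found %d times." % count_matches(sample, "the"))


main()
```

Here The, other, and there are not counted, because == only succeeds when the two strings are identical.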

2.1. Reporting the length of the text

You would expect that words would appear more often in longer texts than in shorter texts. Therefore, when investigating a text, it’s helpful to know how long the text is. Add information about the length of the text to your output. You can get the number of words using the len function on the list of words.

words = nltk.corpus.gutenberg.words('austen-emma.txt')
length = len(words)

Once you know the length of the text, report it as part of your output.

$ python3 find.py
Searching austen-emma.txt
Enter a word to find: Emma
Found 865 times.
There are 192427 words in the text.
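One possible way to build the length line is shown below, using len together with the % formatting operator. The function name length_report and the short sample list are hypothetical. In your program, the list comes from nltk instead.

```python
"""
Illustration of len() combined with formatted printing.
"""

def length_report(words):
    """Return a sentence reporting how many items are in words."""
    length = len(words)              # number of items in the list
    return "There are %d words in the text." % length


def main():
    sample = ["It", "was", "a", "dark", "and", "stormy", "night", "."]
    print(length_report(sample))


main()
```

Note that punctuation marks such as '.' are separate items in the list, so they are included in the count.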

2.2. Choosing the novel to read from

There are a handful of novels available to you in the nltk library, including Jane Austen’s "Emma" that you used above. Here are some of the options available:

Novel	File name
"Emma" by Jane Austen	`austen-emma.txt`
"Persuasion" by Jane Austen	`austen-persuasion.txt`
"Sense and Sensibility" by Jane Austen	`austen-sense.txt`
"Alice’s Adventures in Wonderland" by Lewis Carroll	`carroll-alice.txt`
"Leaves of Grass" by Walt Whitman	`whitman-leaves.txt`
"Julius Caesar" by William Shakespeare	`shakespeare-caesar.txt`
"Hamlet" by William Shakespeare	`shakespeare-hamlet.txt`
"Macbeth" by William Shakespeare	`shakespeare-macbeth.txt`
"Stories to Tell to Children" by Sarah Cone Bryant	`bryant-stories.txt`
"The Parent’s Assistant" by Maria Edgeworth	`edgeworth-parents.txt`
"Moby Dick" by Herman Melville	`melville-moby_dick.txt`

Add a menu to your program that allows the user to choose which text they’d like to search in. Make Jane Austen’s "Emma" the first option, then choose at least three other novels you’d like to add to the menu. Here is an example of the program running:

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: a
Searching austen-emma.txt
Enter a word to find: dog
Found 1 times.
There are 192427 words in the text.

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: b
Searching melville-moby_dick.txt
Enter a word to find: dog
Found 17 times.
There are 260819 words in the text.

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: c
Searching bryant-stories.txt
Enter a word to find: dog
Found 14 times.
There are 55563 words in the text.

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: d
Searching whitman-leaves.txt
Enter a word to find: dog
Found 3 times.
There are 154883 words in the text.

If the user enters an invalid choice of text, let the user know that it was an invalid choice, then search "Emma" even though they didn’t select it. For example:

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: f
Invalid selection.
Searching austen-emma.txt
Enter a word to find: dog
Found 1 times.
There are 192427 words in the text.
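The menu logic, including the fallback to "Emma", can be sketched with an if/elif/else chain. The function name pick_file is hypothetical, and the four choices simply mirror the sample menu above. Your menu may list different novels.

```python
"""
Illustration of mapping a menu selection to a file name.
"""

def pick_file(selection):
    """Return the file name for a menu letter, defaulting to Emma."""
    if selection == "a":
        file_name = "austen-emma.txt"
    elif selection == "b":
        file_name = "melville-moby_dick.txt"
    elif selection == "c":
        file_name = "bryant-stories.txt"
    elif selection == "d":
        file_name = "whitman-leaves.txt"
    else:
        # anything else is invalid: warn the user, then fall back to Emma
        print("Invalid selection.")
        file_name = "austen-emma.txt"
    return file_name


def main():
    print("Searching %s" % pick_file("b"))
    print("Searching %s" % pick_file("f"))


main()
```

Because the else branch still assigns file_name, the rest of the program can run the same way whether or not the selection was valid.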

2.3. Concordance (OPTIONAL)

When doing searches like these, it’s often helpful to see some context of where the word occurred in the text. A common way to do this is to use the Key Word in Context (KWIC) method.

Create a KWIC display of the words you found in your searches by displaying the three words before and three words after the word you were searching for. See if you can figure out how to keep the word you are searching for (the "key word") centered in the output. For example, here is the KWIC output when searching for dog in "Stories to Tell Children":

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: c
Searching bryant-stories.txt
Enter a word to find: dog
              new puppy - dog to take home
              the puppy - dog was dead .
                A puppy - dog , Mammy ,"
                a puppy - dog ! The way
                a puppy - dog is to take
              the puppy - dog ' s neck
              the puppy - dog on the ground
           palace his pet dog ran to meet
           But the little dog was so used
              kick my own dog , if I
      pretty little white dog . The keeper
     the beautiful little dog to the court
          keep the little dog from growing ,
                   Am I a dog , that thou
Found 14 times.
There are 55563 words in the text.

Be careful about words that occur near the start or end of the novel since they may not have 3 words before or after them. For example, Fancy is the last word of "Leaves of Grass", so the final line of the concordance cannot contain 3 words after Fancy:

$ python3 find.py
Choose the text to search from the following choices:
a. "Emma" by Jane Austen
b. "Moby Dick" by Herman Melville
c. "Stories to Tell Children" by Sarah Cone Bryant
d. "Leaves of Grass" by Walt Whitman
Your selection: d
Searching whitman-leaves.txt
Enter a word to find: Fancy
                - Bye My Fancy Good - bye
                - Bye My Fancy ! Good -
                - bye my Fancy ! Farewell dear
                - bye my Fancy . Now for
                - bye my Fancy . Yet let
               hail ! my Fancy .
Found 6 times.
There are 154883 words in the text.
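One way to handle the start and end of the list is with list slicing, since a slice that runs past the end of a list just stops at the end. The sketch below is only one possible approach. The function name kwic_line, its parameters, and the column width of 25 are assumptions chosen for illustration.

```python
"""
Illustration of building one Key Word in Context (KWIC) line.
"""

def kwic_line(words, i, width=3, left_width=25):
    """Return words[i] with up to width words of context on each side."""
    start = max(0, i - width)        # a negative start would wrap around
    before = " ".join(words[start:i])
    after = " ".join(words[i + 1:i + 1 + width])   # may be short or empty
    # right-justify the left context so the key word lines up in a column
    line = before.rjust(left_width) + " " + words[i] + " " + after
    return line.rstrip()             # drop the trailing space if after is empty


def main():
    sample = ["one", "two", "three", "four", "five"]
    for i in range(len(sample)):
        print(kwic_line(sample, i))


main()
```

Right-justifying the words before the key word is what keeps the key word in the same column on every line.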

3. Answer the Questionnaire

Each lab will have a short questionnaire at the end. Please edit the Questions-03.txt file in your cs21/labs/03 directory and answer the questions in that file.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

First quit any applications you are running, like the browser and the terminal. Then click on the logout icon and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!