CS91R Lab 01: NY Times games assistants

Due Wednesday, February 5, before midnight

Goals

The goals for this lab assignment are:

  • Learning the basics of regular expressions

  • Learning the basics of awk to prototype solutions on the command-line

  • Learning how to use regular expressions in awk and python

  • Applying these tools to build word game assistants

Cloning your repository

Log into the CS91R-S25 github organization for our class and find your git repository for Lab 01, which will be of the format lab01-user1-user2, where user1 and user2 are your and your partner’s usernames.

You can clone your repository using the following steps while connected to the CS lab machines:

# create your cs91r/labs directories if you have not done so
$ cd
$ mkdir cs91r
$ cd cs91r
$ mkdir labs

# cd into your cs91r/labs sub-directory and clone your lab01 repo
$ cd ~/cs91r/labs
$ git clone git@github.swarthmore.edu:CS91R-S25/lab01-user1-user2.git

# change directory to list its contents
$ cd ~/cs91r/labs/lab01-user1-user2

# ls should list the following contents
$ ls
bee.py  collab.md  ex1.py  README.md  sample.cast  sample.gif  wordle.py  words.txt

Python Regular Expressions warmup

CLAB 2B contains information on how to write regular expressions in python. It has two questions, and for this warmup, you should complete the first question (#18). Work with your lab partner on this — you don’t need to work with your in-class group.

Regular expressions refresher

If you’d like a refresher on regular expressions, we’ve created a regular expression tutorial. We have provided an example of how to code the tutorial examples in python. Be sure you understand how the examples work before moving on to the next section. This file (ex1.py) is also in your Lab 01 repository.

1. Spelling Bee assistant

The New York Times hosts a number of word games. One of these games is Spelling Bee. Like the other games on the NY Times website, these puzzles are updated only once a day. As with any successful product, there are knock-off versions such as spellbee.org.

The spellbee.org version allows you to link to a specific puzzle, which is useful for briefly explaining the game. For example, let’s look at this puzzle, which has the letter L in the center in yellow and the letters E, Q, T, I, U, R around the outside in white. Your goal is to make as many words as possible with the letters provided. You can use each letter as many times as you want (0 or more times), but you must use the yellow center tile at least once. Words must be at least 4 letters long.

The dictionary that the NY Times uses and the dictionary that the spellbee.org website uses are both unpublished, but we can use our own dictionary: a dictionary that contains words that are legal to play in the board game Scrabble. Our dictionary list should contain all of the words in the spelling bee solution, and it will almost certainly contain extra words that aren’t in the solution. That’s because the Scrabble word list is much more expansive (and contains words that some users might find offensive). However, using this list should ensure that we find all of the answers the game wants us to find and then a bunch more that the game considers invalid.

You can find the dictionary here: /data/cs91r-s25/scrabble/scrabble.txt

1.1 Using awk to prototype a solution

In a single command line, use awk to produce all of the valid words for the spellbee.org puzzle described above.

For the solution we are looking for, you can pipe the output of one awk command into another, but you shouldn’t use any awk syntax that is more complicated than the examples provided above.

If you’d like to use DataCamp’s Regular Expressions Cheat Sheet to figure out how to solve it with more complex regular expressions, you’re welcome to do so, but be sure to show us how you would do it using only what we’ve taught you, too.

Questions

Put the answers to these questions, and any additional questions, in your README.md file.

  1. Use asciinema to record interactions in the shell using awk demonstrating your spelling bee assistant.

1.2 Using python to get answers

To begin, write a python program called bee.py that calculates all of the answers to the spelling bee game. You will need to read in all of the words from the dictionary file into python. Then, for each word in the file, use the same regular expressions you just used in awk to determine the solutions. Use the python regex example shown earlier for reference.
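As a rough illustration of the filtering step, here is a minimal sketch in which each regular expression plays the role of one stage in an awk pipeline. The function name `bee_words` and the convention that the first letter is the center tile are assumptions made for this example, and the short word list stands in for the dictionary file.

```python
import re

def bee_words(words, letters):
    """Return the words playable with `letters` (assumes letters[0] is the center)."""
    only_puzzle_letters = re.compile(f"^[{letters}]+$")  # nothing outside the puzzle
    has_center = re.compile(letters[0])                  # center tile used at least once
    long_enough = re.compile("....")                     # at least 4 letters
    return [
        w for w in words
        if only_puzzle_letters.search(w)
        and has_center.search(w)
        and long_enough.search(w)
    ]

# A tiny stand-in word list; a real run would read the Scrabble dictionary instead.
sample = ["quilt", "little", "tire", "lit", "quite", "table"]
print(bee_words(sample, "leiqrtu"))  # ['quilt', 'little']
```

Keeping the three checks as separate patterns mirrors piping one awk command into another, which may make it easier to compare the two versions.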

Once you have that working, add two other features:

  • Your python program should read the letters of the puzzle from the command line.

  • Optional: Your program should produce the maximum achievable score.

To read the letters of the puzzle from the command line, you will need to access a special list, sys.argv. To access that, you will need to import sys at the top of your program. In python, argv stores the name of your python program in argv[0] and then any other values you typed starting in argv[1]. See the provided python argv example to see this in use.
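The argv behavior described above can be sketched as follows. The helper name `parse_letters` and the wording of the usage message are hypothetical; only `sys.argv` itself comes from python.

```python
import sys

def parse_letters(argv):
    """Return the puzzle letters from an argv-style list, or None if missing."""
    # argv[0] is the program name; the letters, if given, are in argv[1].
    if len(argv) < 2:
        return None
    return argv[1].lower()

letters = parse_letters(sys.argv)
if letters is None:
    # Hypothetical message; any clear error text would do.
    print("usage: python3 bee.py <letters>")
```

Passing the list into a function, rather than reading `sys.argv` everywhere, makes it easy to try the parsing logic with hand-written lists such as `["bee.py", "leiqrtu"]`.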

Output

When you are done, your program should work like this:

$ python3 bee.py
(Provide an error message to let the user know they should provide letters)
$ python3 bee.py leiqrtu
Words: 105
The words are:
eelier
...

You can also run your program like this if you’d like:

$ python3 bee.py < /data/cs91r-s25/scrabble/scrabble.txt
(Provide an error message to let the user know they should provide letters)
$ python3 bee.py leiqrtu < /data/cs91r-s25/scrabble/scrabble.txt
Words: 105
The words are:
eelier
...

Notice that eelier is not a valid word in the spellbee.org puzzle, and we report far more words than the NY Times expects because our dictionary contains more words than the NY Times dictionary. For this lab, that’s fine!

Questions

Put the answers to these questions, and any additional questions, in your README.md file.

  1. Use asciinema to record interactions with your bee.py program to demonstrate that it is working.

Optional: Calculating points

In the spelling bee game, you get points for each valid word that you make. Four-letter words are worth 1 point. All longer words have points equal to the number of letters in the word. However, if you make a word that uses each letter at least one time (called a "pangram"), you get 7 bonus points. Given the scoring rules, calculate the score for each word and report only the total score of all the words you can make. HINT: The python set data structure is very useful for figuring out if a word is a pangram.
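The scoring rules above could be sketched like this. The function names `word_score` and `total_score` are made up for illustration, and the code assumes every word passed in has already been checked against the puzzle letters.

```python
def word_score(word, letters):
    """Score one valid word under the rules described above."""
    # Four-letter words are worth 1 point; longer words are worth their length.
    points = 1 if len(word) == 4 else len(word)
    # A pangram uses every puzzle letter at least once: 7 bonus points.
    if set(letters) <= set(word):
        points += 7
    return points

def total_score(words, letters):
    """Sum the scores of all the valid words."""
    return sum(word_score(w, letters) for w in words)

print(word_score("tire", "leiqrtu"))    # 1
print(word_score("little", "leiqrtu"))  # 6
```

The subset test `set(letters) <= set(word)` ignores how often each letter repeats, which is exactly what the pangram rule needs.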

If you implement this, your new output would look something like this:

$ python3 bee.py
(Provide an error message to let the user know they should provide letters)
$ python3 bee.py leiqrtu
Words: 105	Points: 485
The words are:
eelier
...

Optional: Generating hints

If you’d like some more python programming practice or enjoy the spelling bee puzzle, here’s an optional extension.

The NY Times provides hints. Can you generate similar hints for the user’s letters?

See the NY Times glossary of spelling bee terms if you’re unfamiliar with this grid.

Here’s what the hints looked like from the puzzle on January 1, 2025:

Center letter is in bold.

T A B G L N O

WORDS: 47, POINTS: 195, PANGRAMS: 1, BINGO

	4	5	6	7	8	Σ
A:	1	3	1	-	-	5
B:	5	2	2	1	-	10
G:	2	2	1	2	-	7
L:	1	1	-	1	1	4
N:	-	1	1	1	-	3
O:	1	-	-	-	-	1
T:	8	6	1	-	2	17
Σ:	18	15	6	5	3	47

Two letter list:

AB-1 AL-2 AT-2
BA-2 BL-5 BO-3
GA-2 GL-2 GN-1 GO-2
LA-1 LO-3
NA-3
ON-1
TA-8 TO-9
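One possible way to compute the numbers behind such hints is to count words by first letter and length, and by two-letter prefix. This is only a sketch: the function names are hypothetical, and the five sample words are not the NY Times solution list.

```python
from collections import Counter

def hint_counts(words):
    """Count words by (first letter, length) and by two-letter prefix."""
    grid = Counter((w[0].upper(), len(w)) for w in words)
    pairs = Counter(w[:2].upper() for w in words)
    return grid, pairs

def format_two_letter_list(pairs):
    """Put prefixes that share a first letter on the same line."""
    lines = {}
    for pair in sorted(pairs):
        lines.setdefault(pair[0], []).append(f"{pair}-{pairs[pair]}")
    return "\n".join(" ".join(items) for items in lines.values())

# Stand-in words; a real run would use the words your assistant found.
grid, pairs = hint_counts(["tang", "tango", "goat", "gloat", "bloat"])
print(grid[("T", 4)])  # 1
print(format_two_letter_list(pairs))
```

Each cell of the grid is then `grid[(letter, length)]`, and the Σ row and column are sums over those cells.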

2. Wordle assistant

Another popular word puzzle on the NY Times website is Wordle. Like Spelling Bee, the puzzle is updated every day, and there are lots of knock-off versions, such as hellowordl.net. As with spellbee.org, we can link directly to a specific puzzle. Play this wordle puzzle to remind yourself how to play if you have never played or haven’t played in a while and need a refresher.

We will write an assistant for wordle so that, at any point in the game, we can get a list of all of the valid guesses we have remaining. For example, using the example puzzle above, if we’d guessed "STEAL" (follow along in another tab), we would know the following information:

  • There is at least one S in the puzzle, but it is not in the first position.

  • There is at least one E in the puzzle, but it is not in the third position.

  • There is no T, A, or L in the puzzle.

Given a dictionary containing five-letter words, your assistant should provide a list of all words that match those criteria.

If we then guessed "HOUSE" (follow along in another tab), we would know:

  • There is at least one S in the puzzle, but it is not in the first or fourth position.

  • There is at least one E in the puzzle. One of them is in the fifth position and, if there are more E’s, they are not in the third position.

  • There is at least one U in the puzzle, but not in the third position.

  • There is no T, A, L, H or O in the puzzle.

You will use the same Scrabble dictionary you used in the Spelling Bee assistant to build our Wordle assistant.
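To make the facts learned after guessing STEAL and then HOUSE concrete, here is one way they might be written as regular expressions, one pattern per fact. The name `wordle_candidates` and the short word list are assumptions for this sketch; other, equally valid sets of patterns exist.

```python
import re

# One pattern per fact known after guessing STEAL and then HOUSE.
PATTERNS = [
    r"^.....$",        # the word has five letters
    r"s",              # at least one S ...
    r"^[^s]..[^s].$",  # ... but not in position 1 or 4
    r"^....e$",        # an E in position 5
    r"^..[^eu]..$",    # neither E nor U in position 3
    r"u",              # at least one U
    r"^[^talho]+$",    # no T, A, L, H, or O anywhere
]

def wordle_candidates(words, patterns=PATTERNS):
    """Keep only the words that match every pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [w for w in words if all(p.search(w) for p in compiled)]

# Stand-in word list; a real run would read the Scrabble dictionary.
print(wordle_candidates(["issue", "house", "nurse", "suite", "steal"]))  # ['issue']
```

Requiring every pattern to match is the python counterpart of piping the word list through a chain of awk commands.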

2.1 Using awk to prototype a solution

In a single command line, use awk to produce all of the valid remaining words after making at least one guess.

For the solution we are looking for, you can pipe the output of one awk command into another, but you shouldn’t use any awk syntax that is more complicated than the examples provided above.

If you’d like to use DataCamp’s Regular Expressions Cheat Sheet to figure out how to solve it with more complex regular expressions, you’re welcome to do so, but be sure to show us how you would do it using only what we’ve taught you, too.

You do not have to worry about what happens when the word you guess has repeated letters and they aren’t all in the answer. For example, if you guessed ISSUE in the puzzle above, you would get ISSUE. This tells you that there is only one S in the answer. However, you don’t have to explicitly handle that case; you just need to make sure there is no S in the second position.

Questions

Answer each of these questions as they relate to this puzzle. For each of the first 4 questions:

  • provide text output in a markdown code block and,

  • demonstrate that they all work in a single asciinema recording (see Section 3).

  1. To start, you guess STEAL and your output is STEAL. Given this information, what awk command (or series of awk commands) would you use to show all the remaining valid words?

  2. Your next guess is VIDEO and the output is VIDEO. Given this information and the information from your first guess, what awk command (or series of awk commands) would you use to show all the remaining valid words?

  3. Your third guess is PRICE and the output is PRICE. Given this information and the information from your previous guesses, what awk command (or series of awk commands) would you use to show all the remaining valid words?

  4. Your fourth guess is BEING and the output is BEING. Given this information and the information from your previous guesses, what awk command (or series of awk commands) would you use to show all the remaining valid words? If your awk commands were correct, you should only get two words as your output here.

For this last question, answer in your README.md file as plain text:

  1. Although you might not have explicitly written it down, what algorithm are you using (in your head) to construct the awk commands?

2.2 Using python to improve the assistant

Write a solution to the Wordle assistant in python. Start by hard-coding the examples you used from the command-line in awk. The regular expressions will be the same. As with the spelling bee assistant, you will have to read in a dictionary of words into python as a first step.

Once you are sure that the hard-coded examples are providing the same output as the awk examples, modify your program so that it accepts multiple command-line arguments in the following format:

$ python3 wordle.py <guessed_letters> <col1> <col2> <col3> <col4> <col5>

The <guessed_letters> argument will include all of the letters you’ve guessed so far. You can duplicate letters here and your program should handle that without a problem.

Each of the 5 <col> arguments will be either:

  • The . symbol if no yellow or green letters have appeared at this position yet, or

  • An upper-case letter for each green letter at this position AND a lower-case letter for each yellow letter at this position.

Here is an example of how you would run the program after making your first guess of "STEAL" in the puzzle above:

$ python3 wordle.py steal . . e . .

Be sure you understand why that is how you would run the program given the instructions above.

After making your second guess, "VIDEO", you could run your program like this:

$ python3 wordle.py stealvido . i e e .

After making your third guess, "PRICE", you could run your program like this:

$ python3 wordle.py pricestealvido . i Ie e e

Just like with the awk commands, your program should print all remaining valid words.
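The argument format above could be turned into a filter roughly as follows. This is a sketch under stated assumptions, not the expected solution: the function names are hypothetical, the word list is a stand-in for the dictionary, and repeated letters are not handled specially.

```python
import re

def build_filter(guessed, cols):
    """Build a position-by-position pattern plus the set of required letters."""
    # Any letter that appears in a column is green or yellow, so it is in the answer.
    present = {ch.lower() for col in cols for ch in col if ch.isalpha()}
    # Guessed letters that never showed up in a column are not in the answer.
    absent = set(guessed.lower()) - present
    parts = []
    for col in cols:
        greens = [ch for ch in col if ch.isupper()]
        if greens:
            parts.append(greens[0].lower())  # green: this position is known
        else:
            # Yellow letters are in the answer, but not at this position.
            banned = absent | {ch for ch in col if ch.islower()}
            parts.append("[^" + "".join(sorted(banned)) + "]" if banned else ".")
    return re.compile("^" + "".join(parts) + "$"), present

def remaining_words(words, guessed, cols):
    """Return the words consistent with everything learned so far."""
    pattern, present = build_filter(guessed, cols)
    return [w for w in words
            if pattern.search(w) and all(ch in w for ch in present)]

sample = ["being", "feign", "price", "video", "eight"]
print(remaining_words(sample, "pricestealvido", [".", "i", "Ie", "e", "e"]))
# ['being', 'feign']
```

In wordle.py itself, `guessed` would come from `sys.argv[1]` and `cols` from `sys.argv[2:7]`.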

The ex1.py file may provide some guidance on how to code this up.

Questions

  1. Use asciinema to record your interactions with your wordle.py program to demonstrate that it is working.

  2. What improvements would you like to make to your current program (aside from handling repeated letters properly)?

  3. Would you make any changes to the program’s interface? If no, why not? If yes, what would the new interface look like? Show some sample interactions with the modified interface in a markdown code block.

3. How to turn in your solutions

Edit the README.md file that we provided to discuss how you solved each problem. For each part (spelling bee with awk, spelling bee with python, wordle with awk, wordle with python), use asciinema to record a terminal session and include it in your README.md.

For example, here is an asciinema recording of the awk examples shown above:

To record your session in asciinema, use the following command:

$ asciinema rec -i 2 awk_regex.cast

When your session is over, convert it to a .gif file:

$ agg awk_regex.cast awk_regex.gif

Add any .cast and .gif files to your repository.

You can name your .cast files anything you’d like, but you will need at least four of them to include in your writeup. Do not worry if you make typos as you are working in a recording: we are not evaluating your ability to type!