Lab 10: Scrabble Assistant
Due on Sunday, April 17th at 11:59 PM. This is a team lab. You and your assigned lab partner(s) will complete this lab together. Make sure that you are familiar with the Partner Etiquette guidelines. You may discuss the concepts of this lab with other classmates, but you may not share your code with anyone other than course staff and your lab partner(s). Do not look at solutions written by students other than your team. If your team needs help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.
Overview
In this lab, you will write a hash table implementation for a dictionary data structure. You will then use that implementation to create a tool which assists in finding solutions to tiled word games (such as Scrabble). In the process, you will learn about additional parts of the C++ standard libraries. You will also provide a written analysis of the performance of your hash table under different circumstances.
Starting Code
Your starting code consists of the following files. Files that you will need to change are bolded below.
- Files to define a hash table and use it as a dictionary.
dictionary.h
: TheDictionary
interface.hashes.h
/hashes.cpp
: Some hash implementations for the types you will be using.hash_table.h
/hash_table__definitions.h
: TheHashTable
class. Only the public methods have been provided for you; you will need to write your own helper method declarations.
- Files for implementing the Scrabble Assistant.
scrabble_assistant.h
/scrabble_assistant.cpp
: TheScrabbleAssistant
class, which bundles up the work of identifying anagrams and similar tasks.scrabble_assistant_main.h
/scrabble_assistant_main.cpp
: Thescrabble_assistant_main
declaration and related helper functions to bring everything together.program.cpp
: The file in which the “real”main
function is stored (which we keep separate for testing).
- A program for your tests.
test_utils.h
/test_utils.cpp
tests.cpp
- A program for analyzing the performance of the
HashTable
.stat_collection.cpp
More C++ Standard Library Tools
This section points out some C++ standard library tools which will be helpful in completing your lab.
pair
This lab does not provide you with a Pair
class; in fact, one already exists in the C++ standard library (spelled in lower case: pair
). You can create and use a pair as follows:
#include <utility>
pair<int,char> foo(int x) {
pair<int,char> p(5,'z'); // statically allocated pair
p.first += x; // update first field of the pair
if (p.first > 10) { // read first field
p.second = 'q'; // update second field
}
return p; // return the pair; note that this is a statically-allocated object, so the return is *copied*
}
sort
The C++ standard library also provides a sorting algorithm through a template function called sort
, although the interface is somewhat peculiar. It uses an object of type RandomAccessIterator
which, in this case, is used to mark locations within a data structure. This allows you to sort parts of e.g. a list. To sort a vector
, you might write:
#include <algorithm>
#include <vector>
void sort_a_vector(vector<int>* vec) {
sort(vec->begin(), vec->end());
}
Here, the begin
and end
method produce iterators. These iterators can be used for far more than just marking an index and are an important abstraction, but we needn’t worry about that just to use the sort
template function.
Note that vec
above is a pointer. If we passed in a vector (instead of a pointer to one), we would be sorting a copy of the vector (since it would be copied when it was passed in).
erase
The vector
class provides a means to remove an element from the end of a list (pop_back
) but doesn’t give a clear interface for removing elements e.g. from the front of a list. In fact, vectors provide a general function called erase
which can delete entire contiguous regions:
void remove_range_from_vector(vector<int>* vec, int start_index, int end_index) {
vec->erase(vec.begin() + start_index, vec.begin() + end_index);
}
The above function works much like a Python slice: the start index is included but the end index is not. Thus, remove_range_from_vector(1,3)
would remove the second and third elements (indices 1 and 2) from the vector.
Scrabble Assistant
Once you have a working (i.e. tested) HashTable
, you can write the Scrabble Assistant. We will design the application in two parts: a ScrabbleAssistant
class (which does all of the interesting work) and a scrabble_assistant_main
function (which provides the interface that allows the end user to interact with that object).
The ScrabbleAssistant
class has four methods. One is used to tell the ScrabbleAssistant
about a new word; when you first create the object, you will call this method for each word that appears in the dictionary and the ScrabbleAssistant
will use that call to update its data strutures and remember which words exist. Once it has been informed of these words, the ScrabbleAssistant
has three primary functions.
Spellchecking
The spellcheck
method determines whether a given word is in the dictionary. This would be trivial except that we have a performance requirement: spellcheck
must run in \(O(1)\) time!
In order to satisfy this constant time requirement of spellchecking, you’ll need to maintain a mapping in which the keys are dictionary words and the values are anything at all (e.g. true
, 0
, or the word itself). Since get
is constant time on HashTable
objects, you can determine whether a word is in the dictionary just by trying to get
it as a key.
Anagrams
The anagrams_of
method determines the dictionary words which can be created by rearranging a given string of characters. For instance, the word “bat” contains exactly the same letters as the word “tab” and no other English words contain exactly those letters.
In order to find anagrams, you should first write a simple function which sorts the letters in a string. For instance, the string "bat"
would sort to "abt"
. (Note: "tab"
would sort to the same string.) Then, when a new word is added to the ScrabbleAssistant
, you will sort it and store it in a Dictionary
that maps from sorted letter strings to lists of words that can be spelled by them.
In this example, for instance, that Dictionary
would map the string "abt"
to a list containing the strings "bat"
and "tab"
. Later, when a user wants to know the anagrams of "bat"
, you can sort the string and use the Dictionary
to find this list.
Word-Finding
The ultimate goal of the ScrabbleAssistant
is to help a player determine which words they could play in a word tile game like Scrabble. This goal is met by the find_words
method which, given a string of letters, will give each of the words that the player can form from that string. This includes words that are made of some subset of those letters.
A simple algorithm to address this would be to write an additional function which, given a string, produces all of the sorted combinations of letters than can be made from that string. For instance, the string "abt"
would produce the following list of strings:
a b t ab at bt abt
Note that, for a string of length \(n\), this vector is of size \(O(2^n)\). That’s okay.
To find words for the game, you can simply compute this vector of possible strings and then determine the legal anagrams for each one. Given an input (such as "bat"
), you can sort it ("abt"
), produce the above vector, and then determine the anagrams for each such string.
Writing the Main Program
Once you’ve written and tested your ScrabbleAssistant
class, the only thing you have left to do is to write a main program that allows the user to interact with it. In order to make this program useful, we will allow it to be used in one of two ways: batch mode (in which we operate entirely with files) and interactive mode (in which we interact with the user directly). This is a lot easier than it might sound.
To launch your program in interactive mode, we run it as follows:
scrabble_assistant ./test_data/dictionary.txt
Here, the first command-line argument is the dictionary to use. The user may then give any of the following three commands:
spellcheck
word: Call the spellchecking function. Print"yes"
if the word is in the dictionary and"no"
if it is not.anagrams
word: Call the anagram-finding function. Print a space-separated line containing the possible sorts. These words should be sorted; use thesort
function above to accomplish this. If no words are found, print"*none*"
.plays
word: Find words that can be played using the letters in the provided word. Print a sorted, space-separated line of result as above.
If the user gives you an incorrect command, just reply with a sensible error message.
To launch your program in batch mode, we run it as follows:
scrabble_assistant ./test_data/dictionary.txt ./test_data/some_input.txt ./test_data/some_output.txt
In this case, it reads commands from the second file and writes the result lines to the third file.
This is very easy to implement. You will find a function, answer_scrabble_assistant_questions
, in scrabble_assistant_main.cpp
. This function takes an istream
and an ostream
. istream
, for instance, is a superclass of both the type of cout
as well as of ifstream
. That means that, in scrabble_assistant_main
, you can call answer_scrabble_assistant_questions
either with cin
and cout
(in interactive mode) or with an ifstream
and an ofstream
that you created (in batch mode). From within answer_scrabble_assistant_questions
, you won’t have to worry about which mode you’re in!
Written Portion
The HashTable
class relies upon external guarantees to achieve its performance: many operations have \(O(1)\) time complexity as long as the hash function is “good”, the load factor is “low enough”, and so on. In order to get a more precise understanding of how HashTable
works, you will devise and conduct a small experiment.
Your starter code includes a simple program called stat_collection
; further, your HashTable
class contains a method called print_stats
. You should implement print_stats
to provide some information about the internal state of the hash table. Here are some examples:
- What percentage of buckets are empty?
- What is the average size of non-empty buckets?
- What is the size of the largest bucket?
For your experiment, you should pick an independent variable that you will control (e.g. size of table, numerical distance between keys added to table, load factor, etc.) and a dependent variable that you will observe (e.g. one of the above). Then, write code that prints the statistics for HashTable
s at varying values of your independent variable. Create one or two graphs explaining this. Then, create a write-up in which you discuss how you collected your data (at least one paragraph) and discussing the results (at least one paragraph).
You may use any software you like to create the graphs as long as the final result is accessible in a common image format (PNG or JPG preferred) or as a PDF or LaTeX source. Your write-up should be either a text document, a PDF, or LaTeX source. You can, for instance, use gnuplot to generate your graph if you are so inclined; there’s a brief tutorial here. You can also use a tool such as Microsoft Excel or OpenOffice Calc if you like, as long as your hand-in meets the requirements above.
Testing
As usual, you have been provided RUN_MAIN
and CHECK_FILES_EQUAL
. You should write tests in tests.cpp
for your HashTable
data structure, your ScrabbleAssistant
class, and your scrabble_assistant_main
function.
Memory
Your program is required to run without any memory leaks or errors; you must free up all of the memory that you allocate and you must only use memory that you have allocated and initialized. We will use valgrind
to determine if you have done this correctly, so you should as well. You should also consider using valgrind
to find bugs in your program as you proceed; it’s one of two very powerful C++ debugging tools that we’ve discussed.
Coding Style Requirements
You are required to observe some good coding practices:
-
You should pick meaningful variable names.
// Good int* pixels = new int[size]; // Bad int* p = new int[size];
-
You should use correct and consistent indentation. Lines of code within a block (that is, surrounded by
{
and}
) should be indented four spaces further than the lines surrounding them.// Good if (condition) { cout << "Test" << endl; } // Bad if (condition) { cout << "Test" << endl; }
-
You should use a block whenever possible, even if it’s not necessary. This helps you avoid subtle or messy bugs in the future.
// Good if (condition) { cout << "Something" << endl; } // Bad if (condition) cout << "Something" << endl;
-
Any new methods or fields in your header files should have comments explaining their purpose and behavior. You are permitted to omit documentation for methods that are inherited from other classes; that is, if your class has a
foo
method because its superclass has afoo
method, you don’t need to document that method.// Good public: /** * Saves this image object to a file. * @param file The already-open stream into which this image should be written. * @return The number of pixels in the saved image. * @throws runtime_error If the image could not be saved due to an I/O error. */ int save(ofstream& file); // Bad public: int save(ofstream& file);
Your method/field documentation does not have to be in the format above, but you must describe the method’s behavior, its parameters, its return value, and any exceptions that it may throw. (If you’re indifferent, the above syntax is a good one to know; it’s a de facto standard used by Javadoc, Doxygen, and other tools that automatically process source code comments into other formats like searchable webpages.)
Peer Review
As this is a team assignment, you are required to complete a peer review of your lab partner or partners. You must do this even if everything is fine. You should complete this peer review after you have turned in your assignment. The peer review comes in the form of a simple Google Form and can be accessed here; in most cases, it will likely take less than a minute to complete. If you have trouble accessing this peer review, please make sure that you are logged into Google using your Swarthmore credentials.
Your peer review will be kept private; only course staff will have access to this information. You can also update your peer review after you have submitted it. However, if you do not submit a peer review, you will lose points on your lab. Please don’t forget!
Summary of Requirements
When you are finished, you should have
- A completed implementation of
HashTable
- A
ScrabbleAssistant
class which implements the behavior described above - Appropriate tests
- No memory errors!
- Code which conforms to the style requirements above
- A write-up of your experiments with
HashTable
- A completed peer review