Lab 9: Hash Tables, Scrabble Assistant
Due on Wednesday, April 22 at 11:59 PM.
- Content
- Starting Code
- Part I: Linear Dictionary Implementation
- Part II: Hash Table Implementation
- Part III: Scrabble Assistant
- Memory Management
- Coding Style Requirements
- Summary of Requirements
Content
The main goals for this lab are to gain experience working with hash tables and to evaluate real-world data structure performance by running experiments. To that end, you will:
- Implement two versions of
Dictionary
:- A
LinearDictionary
for a simpler dictionary implementation for small sets of elements. - A
HashTable
implementation that uses separate chaining (each chain will itself be aLinearDictionary
).
- A
- Apply your hash table implementation to create a simple program to assist Scrabble players identify all anagrams of a string of letters that are valid words.
Starting Code
Your starting code can be found in lab09-[username] repo. The following is a description of the repository’s initial contents; bolded files are those which you must change in completing the lab.
adts/
: As with your last lab, the ADT interface declarations are stored here.Makefile
: the build instructions for this lab.linearDictionary.h/-inl.h
: TheLinearDictionary
class you will be implementing to store collisions within yourHashTable
.hashTable.h/-inl.h
: theHashTable
class implementation files.hashFunctions.h/cpp
: Simple hash functions for integer and string keys, similar to hashes we saw in class.tests.cpp
: Unit test wrapper that will call provided tests for all three classes you will define. The tests for the classes are intestLinearDictionary.cpp
,testHashTable.cpp
, andtestScrabbleAssistant.cpp
.manualTests.cpp
: A “sandbox” test program that you can use while you develop your program.main.cpp
: a simple Scrabble Assistant application.scrabbleAssistant.h/cpp
: a C++ class encapsulating the functionality of the Scrabble assistant.
Part I: Linear Dictionary Implementation
We recommend that you implement a LinearDictionary
as a vector of key-value pairs. All of the operations should run in \( O(n) \) time.
You may be thinking: “Why would I want such an inefficient implementation of the Dictionary
ADT?” The answer is that, for very small dictionaries, simpler implementations perform much better as they have a smaller constant overhead.
We will use the LinearDictionary
to represent the chain of collisions in our HashTable
. Because we expect each chain to be relatively short, the LinearDictionary
provides a nice abstraction that will be efficient enough for our purposes.
Incrementally develop and test your linear dictionary as you implement using manualTests.cpp
. We recommend starting with your constructor, insert
, and either get
or contains
. Once you are satisfied with your dictionary’s ability to add and find items, finish implementing the remaining methods one at a time.
Unit tests have been provided for this lab; once you have completed your LinearDictionary
implementation, the appropriate unit tests should pass.
$ make tests
$ ./tests linearDictionary
Success: 11 tests passed.
Test time: 0.06 seconds.
Removing From a vector
Implementing the various methods will require you to
manipulate the underlying vector you are using to store key-value pairs.
For the most part, this is straightforward (e.g., using vector::size()
to
get the size of the LinearDictionary
, or vector::push_back()
to insert). The
exception to this is implementing LinearDictionary::remove()
, which requires removing
an element in the middle of the vector. The vector
class provides a means to remove an element from the end of
a list (vector::pop_back()
) but doesn’t give a simple interface for removing
elements e.g. from an arbitrary index. In linearDictionary.h/inl.h
we
have fully implemented a helper method,
removeFromVector()
to make this task simpler by specifying a vector. The
method takes the vector
you wish to delete from and the index for locating
the element to remove. For example:
vector<string> myVec = {"a","b","c","d"};
//myVec is now ["a","b","c","d"]
removeFromVector(myVec,2); //removes the element at index 2 -> "c"
//myVec is now ["a","b","d"]
Part II: Hash Table Implementation
Once your LinearDictionary
implementation is complete, you can begin implementing a hash table using separate chaining as we discussed in class.
Your implementation will be in the files hashTable.h
and hashTable-inl.h
. Unit tests have been provided; once your
hash table implementation is complete, these unit tests should pass.
Below are some implementation details and suggestions:
- Your hash table implementation must use separate chaining.
- You will need to represent a chain of key-value pairs to handle
collisions. There are many data structures you could use to represent a bucket, including
vector
, aList
or even a balanced BST. However, the difference in performance should be minimal because you should aim to have \( O(1) \) elements in each bucket on average. Thus, represent each chain as aLinearDictionary
(this will make your implementation much easier). - Furthermore, we recommend that you use a dynamically allocated array of
statically-allocated
LinearDictionary
objects to represent the whole hash table. Referring to the picture at right,table
is the dynamic array - which may need to expand capacity. Each chain is aLinearDictionary
object storing key-value pairs. - The
HashTable
class has two constructors. One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). In either case, select a reasonable default table capacity, (e.g.,capacity=5
). Recall that the capacity is the number of buckets, not the number of items in the dictionary. - All operations on your
HashTable
(other thangetKeys
andgetItems
) must run in \( O(1) \) average time. - The HashTable must maintain a reasonable load factor in order to operate efficiently. You will likely want to create some private helper methods to assist with increasing the table’s capacity when the load factor becomes too high. See below for details. (Don’t forget to document those methods according to the Coding Style Requirements below!)
Incrementally develop and test your hash table in a similar manner as LinearDictionary
e.g., write tests along the way using manualTests.cpp
Start with your constructor, insert
, and get
to test that you can insert and find items. Then complete your implementation by implementing and testing one method at a time.
Test your hash table as you implement using manualTests.cpp
and our
automated (but not necessarily complete) auto tests:
$ make tests
$ ./tests hashTable
Success: 14 tests passed.
Test time: 0.50 seconds.
Expanding Capacity
If the load factor ever exceeds the maximum load factor of your hash table, you will need to expand the table. You only need to worry about this when an item is inserted into the table; you should check to see if the added element causes the load factor to exceed the maximum threshold that was set in the constructor.
If the load factor threshold has been exceeded, you should double the capacity of the table. Doubling the capacity of the table should be implemented in the
helper method expandCapacity
.
This is conceptually similar
to expanding capacity for ArrayList
. There are a
couple of differences. One is that when to expand differs here – it
depends on the load factor, not on whether the array is full. A more
subtle difference is that it’s not enough to just copy table entries
over. Each key-value pair in the hash table needs to be rehashed,
because which bucket it goes into depends on the capacity of the table
itself! Below we itemize what tasks need to be done to properly expand
capacity.
- First, decide what variables you need. You will need to keep track of the old table (and its capacity) as well as the new table (and its capacity).
- Create a new table with double the capacity of the old table.
- Next, go through each bucket in the old table:
- For each bucket, look at each entry in that bucket’s chain.
- Find that entry’s bucket in the new table by hashing the new key (using the new table’s capacity!).
- Insert the entry into the corresponding bucket in the new table.
- Finally, delete the old table and be sure update all member variables to be consistent with the new table.
Remember, your hash table should never have load factor higher than the maximum load factor. As soon as this happens, you should expand capacity.
Part III: Scrabble Assistant
Once you have a working hash table, you can start writing your Scrabble
Assistant. We’ll design the application in two parts: a
ScrabbleAssistant
class and a separate main
program. The
main program (which we’ve written for you) provides a simple menu
interface and makes calls to the ScrabbleAssistant
, which encapsulates
all of the Scrabble Assistant features. The ScrabbleAssistant
class
should load an English-language dictionary (the main program allows
the user to specify a dictionary on the command line, but uses /usr/share/dict/words
as a default) into a hash
table and support the following functions:
spellCheck
: given a word, this function should return true if and only if the word is in the dictionary. Implementing this should be straightforward and should run in \( O(1) \) time.anagramsOf
: given a string of letters, this function returns a vector of all anagrams (i.e., permutations of these letters) which represent valid words. For example the letters “tca” will return “cat” and “act” as they contain exactly the same letters. This function should also take constant time in terms of the size of the dictionary (it is allowed to depend on the length of the string).findWords
: given a string of letters, this function returns a vector of all legal words the player could spell with the given letters. Note: This function is an optional extra challenge for this lab.
Implementing these in the given order, with thorough testing in between each
to ensure you are understanding your data structures. The latter two methods, anagramsOf
and findWords
, will require some algorithm
design (see below for tips). You’re welcome to include any data members you need for the
ScrabbleAssistant
, as well as any private helper functions you’d like.
As you develop, test your ScrabbleAssistant
using manualTests.cpp
and our
automated (but not necessarily complete) auto tests:
$ make tests
$ ./tests scrabbleAssistant
Success: 4 tests passed.
Test time: 0.88 seconds.
You should also run your main program and test to see if it matches our sample output.
Constructing Anagrams
In order to make anagramsOf
run in constant time, it will be helpful to
preprocess the word list in the ScrabbleAssistant
constructor and
maintain a table of anagrams. First, write a helper function to sort the
letters in a string. (e.g. sortString("cat")
should return “act”).
You don’t have to do this yourself; instead, use the std::sort function
provided in the algorithm library.
#include <algorithm>
string sortLetters(string s) {
sort(s.begin(), s.end());
return s;
}
Then, create a dictionary that maps each sorted string to a vector of all words that sort to that string. To do so:
- For each word in the dictionary:
- sort the letters of the word to form the key
- if the sorted letters are not already a key in the dictionary, add it with a vector containing just current word.
- if the key exists, then update the vector to include the word
For example, the search “tca”
will sort to “act”. Our dictionary should return a vector containing
all legal words with the letters “act” (i.e., anagramDictionary.get("act")
should return ["act", "cat"]
). Be
sure to write your own tests to verify that anagrams are being
constructed properly. Note: if no anagrams exist, your function
should return an empty vector (not throw a runtime_error
).
Finding All Legal Plays
If you're looking for an extra challenge in this lab, try to implement a function that, given a string of letters, returns a vector of all legal words that can be spelled from these letters. This feature is optional; you do not need to write this function to complete this lab.
This anagram dictionary will quickly provide you all possible legal words
that use the whole set of letters in the search. In Scrabble, however,
you are not required to play all letters in your hand. To account for
this, you’ll also need to evaluate all legal plays using fewer
letters as well in the function findWords
. Implementing findWords
will require some clever design.
One way to implement findWords
is to break down the task into the
following subtasks:
- Find the power set of the given string
- Get all anagrams of each string in the power set (using
anagramsOf
) - Accumulate all the anagrams in a single vector
- Remove Duplicates
-
Compute the power set: given a string, calculate the set of all subsets of the given characters. For example, “abcd” should return all (set) combinations of size 4 (“abcd”), size 3 (“abc”, “abd”, “acd”, “bcd”), size 2 (“ab”, “ac”, “ad”, “bc”, “bd”,” cd”), and size 1 (“a”, “b”, “c”, “d”). We have provided this for you in the function
stringPowerSet
. -
Eliminating Duplicate Strings: If you use the
powerSet
function, you will likely find at some point that you have a vector of words that contain several duplicates. It might be helpful to create aremoveDuplicates
function that eliminates any extra copies of a word. There are several methods of accomplishing this:- Use a double-for loop that compares all pairs of words in the vector, looking for pairs that are the same. This will run in \( O(n^2) \) time.
- First, sort the vector of words. Then, make a linear scan over the list, eliminating adjacent duplicates. This will take \( O(n \log n) \) time using an efficient sorting algorithm.
- Create a hash table with words as keys. Then, insert each word in the hash table assuming it isn’t already there. This should take \( O(1) \) time per word, or \( O(n) \) overall time.
Once the above code is complete, your ScrabbleAssistant
will
be complete!
Memory Management
For this lab, your program is required to run without memory errors or leaks. You should use valgrind
as you proceed in your testing to track memory errors. When you have a complete first draft of your implementation:
- Commit and push what you have (in case something goes wrong).
- Run
valgrind ./tests all
to make sure that the test program does not cause memory errors. - Once memory errors are cleared, run
valgrind --leak-check=full ./tests
to identify and correct memory leaks.
Coding Style Requirements
You are required to observe good coding practices. You have seen these requirements many times, and should know them by now. If you’re unsure, check previous labs where the style requirements were stated explicitly.
Summary of Requirements
When you are finished, you should have
- A completed implementation of
LinearDictionary
and the helper methods - A completed implementation of
HashTable
and the helper methods - A completed implementation of
ScrabbleAssistant
- Appropriate tests for these classes
- No memory errors
- No memory leaks
- Code which conforms to the style requirements