Lab 9: Hash Tables, Scrabble Assistant

Due on Wednesday, April 22 at 11:59 PM.

Content
Starting Code
Part I: Linear Dictionary Implementation
- Removing From a vector
Part II: Hash Table Implementation
- Expanding Capacity
Part III: Scrabble Assistant
- Constructing Anagrams
- Finding All Legal Plays
Memory Management
Coding Style Requirements
Summary of Requirements

Content

The main goals for this lab are to gain experience working with hash tables and to evaluate real-world data structure performance by running experiments. To that end, you will:

Implement two versions of Dictionary:
- A LinearDictionary for a simpler dictionary implementation for small sets of elements.
- A HashTable implementation that uses separate chaining (each chain will itself be a LinearDictionary).
Apply your hash table implementation to create a simple program to assist Scrabble players identify all anagrams of a string of letters that are valid words.

Optional Extra Challenge: Implement an additional function that takes a string of letters and returns all legal words the player could spell with these letters.

Starting Code

Your starting code can be found in lab09-[username] repo. The following is a description of the repository’s initial contents; bolded files are those which you must change in completing the lab.

adts/: As with your last lab, the ADT interface declarations are stored here.
Makefile: the build instructions for this lab.
linearDictionary.h/-inl.h: The LinearDictionary class you will be implementing to store collisions within your HashTable.
hashTable.h/-inl.h: the HashTable class implementation files.
hashFunctions.h/cpp: Simple hash functions for integer and string keys, similar to hashes we saw in class.
tests.cpp: Unit test wrapper that will call provided tests for all three classes you will define. The tests for the classes are in testLinearDictionary.cpp, testHashTable.cpp, and testScrabbleAssistant.cpp.
manualTests.cpp: A “sandbox” test program that you can use while you develop your program.
main.cpp: a simple Scrabble Assistant application.
scrabbleAssistant.h/cpp: a C++ class encapsulating the functionality of the Scrabble assistant.

Part I: Linear Dictionary Implementation

We recommend that you implement a LinearDictionary as a vector of key-value pairs. All of the operations should run in \( O(n) \) time. You may be thinking: “Why would I want such an inefficient implementation of the Dictionary ADT?” The answer is that, for very small dictionaries, simpler implementations perform much better as they have a smaller constant overhead.

We will use the LinearDictionary to represent the chain of collisions in our HashTable. Because we expect each chain to be relatively short, the LinearDictionary provides a nice abstraction that will be efficient enough for our purposes.

Incrementally develop and test your linear dictionary as you implement using manualTests.cpp. We recommend starting with your constructor, insert, and either get or contains. Once you are satisfied with your dictionary’s ability to add and find items, finish implementing the remaining methods one at a time.

Unit tests have been provided for this lab; once you have completed your LinearDictionary implementation, the appropriate unit tests should pass.

$ make tests
$ ./tests linearDictionary
Success: 11 tests passed.
Test time: 0.06 seconds.

Removing From a vector

Implementing the various methods will require you to manipulate the underlying vector you are using to store key-value pairs. For the most part, this is straightforward (e.g., using vector::size() to get the size of the LinearDictionary, or vector::push_back() to insert). The exception to this is implementing LinearDictionary::remove(), which requires removing an element in the middle of the vector. The vector class provides a means to remove an element from the end of a list (vector::pop_back()) but doesn’t give a simple interface for removing elements e.g. from an arbitrary index. In linearDictionary.h/inl.h we have fully implemented a helper method, removeFromVector() to make this task simpler by specifying a vector. The method takes the vector you wish to delete from and the index for locating the element to remove. For example:

vector<string> myVec = {"a","b","c","d"};
//myVec is now ["a","b","c","d"]
removeFromVector(myVec,2); //removes the element at index 2 -> "c"
//myVec is now ["a","b","d"]

Part II: Hash Table Implementation

Once your LinearDictionary implementation is complete, you can begin implementing a hash table using separate chaining as we discussed in class. Your implementation will be in the files hashTable.h and hashTable-inl.h. Unit tests have been provided; once your hash table implementation is complete, these unit tests should pass. Below are some implementation details and suggestions:

Your hash table implementation must use separate chaining.
You will need to represent a chain of key-value pairs to handle collisions. There are many data structures you could use to represent a bucket, including vector, a List or even a balanced BST. However, the difference in performance should be minimal because you should aim to have \( O(1) \) elements in each bucket on average. Thus, represent each chain as a LinearDictionary (this will make your implementation much easier).
Furthermore, we recommend that you use a dynamically allocated array of statically-allocated LinearDictionary objects to represent the whole hash table. Referring to the picture at right, table is the dynamic array - which may need to expand capacity. Each chain is a LinearDictionary object storing key-value pairs.
The HashTable class has two constructors. One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). In either case, select a reasonable default table capacity, (e.g., capacity=5). Recall that the capacity is the number of buckets, not the number of items in the dictionary.
All operations on your HashTable (other than getKeys and getItems) must run in \( O(1) \) average time.
The HashTable must maintain a reasonable load factor in order to operate efficiently. You will likely want to create some private helper methods to assist with increasing the table’s capacity when the load factor becomes too high. See below for details. (Don’t forget to document those methods according to the Coding Style Requirements below!)

Incrementally develop and test your hash table in a similar manner as LinearDictionary e.g., write tests along the way using manualTests.cpp Start with your constructor, insert, and get to test that you can insert and find items. Then complete your implementation by implementing and testing one method at a time.

Test your hash table as you implement using manualTests.cpp and our automated (but not necessarily complete) auto tests:

$ make tests
$ ./tests hashTable
Success: 14 tests passed.
Test time: 0.50 seconds.

Expanding Capacity

If the load factor ever exceeds the maximum load factor of your hash table, you will need to expand the table. You only need to worry about this when an item is inserted into the table; you should check to see if the added element causes the load factor to exceed the maximum threshold that was set in the constructor.

If the load factor threshold has been exceeded, you should double the capacity of the table. Doubling the capacity of the table should be implemented in the helper method expandCapacity. This is conceptually similar to expanding capacity for ArrayList. There are a couple of differences. One is that when to expand differs here – it depends on the load factor, not on whether the array is full. A more subtle difference is that it’s not enough to just copy table entries over. Each key-value pair in the hash table needs to be rehashed, because which bucket it goes into depends on the capacity of the table itself! Below we itemize what tasks need to be done to properly expand capacity.

First, decide what variables you need. You will need to keep track of the old table (and its capacity) as well as the new table (and its capacity).
Create a new table with double the capacity of the old table.
Next, go through each bucket in the old table:
- For each bucket, look at each entry in that bucket’s chain.
- Find that entry’s bucket in the new table by hashing the new key (using the new table’s capacity!).
- Insert the entry into the corresponding bucket in the new table.
Finally, delete the old table and be sure update all member variables to be consistent with the new table.

Remember, your hash table should never have load factor higher than the maximum load factor. As soon as this happens, you should expand capacity.

Part III: Scrabble Assistant

Once you have a working hash table, you can start writing your Scrabble Assistant. We’ll design the application in two parts: a ScrabbleAssistant class and a separate main program. The main program (which we’ve written for you) provides a simple menu interface and makes calls to the ScrabbleAssistant, which encapsulates all of the Scrabble Assistant features. The ScrabbleAssistant class should load an English-language dictionary (the main program allows the user to specify a dictionary on the command line, but uses /usr/share/dict/words as a default) into a hash table and support the following functions:

spellCheck: given a word, this function should return true if and only if the word is in the dictionary. Implementing this should be straightforward and should run in \( O(1) \) time.
anagramsOf: given a string of letters, this function returns a vector of all anagrams (i.e., permutations of these letters) which represent valid words. For example the letters “tca” will return “cat” and “act” as they contain exactly the same letters. This function should also take constant time in terms of the size of the dictionary (it is allowed to depend on the length of the string).
findWords: given a string of letters, this function returns a vector of all legal words the player could spell with the given letters. Note: This function is an optional extra challenge for this lab.

Implementing these in the given order, with thorough testing in between each to ensure you are understanding your data structures. The latter two methods, anagramsOf and findWords, will require some algorithm design (see below for tips). You’re welcome to include any data members you need for the ScrabbleAssistant, as well as any private helper functions you’d like.

As you develop, test your ScrabbleAssistant using manualTests.cpp and our automated (but not necessarily complete) auto tests:

$ make tests
$ ./tests scrabbleAssistant
Success: 4 tests passed.
Test time: 0.88 seconds.

You should also run your main program and test to see if it matches our sample output.

Constructing Anagrams

In order to make anagramsOf run in constant time, it will be helpful to preprocess the word list in the ScrabbleAssistant constructor and maintain a table of anagrams. First, write a helper function to sort the letters in a string. (e.g. sortString("cat") should return “act”). You don’t have to do this yourself; instead, use the std::sort function provided in the algorithm library.

#include <algorithm>
string sortLetters(string s) {
  sort(s.begin(), s.end());
  return s;
}

Then, create a dictionary that maps each sorted string to a vector of all words that sort to that string. To do so:

For each word in the dictionary:
- sort the letters of the word to form the key
- if the sorted letters are not already a key in the dictionary, add it with a vector containing just current word.
- if the key exists, then update the vector to include the word

For example, the search “tca” will sort to “act”. Our dictionary should return a vector containing all legal words with the letters “act” (i.e., anagramDictionary.get("act") should return ["act", "cat"]). Be sure to write your own tests to verify that anagrams are being constructed properly. Note: if no anagrams exist, your function should return an empty vector (not throw a runtime_error).

Finding All Legal Plays

If you're looking for an extra challenge in this lab, try to implement a function that, given a string of letters, returns a vector of all legal words that can be spelled from these letters. This feature is optional; you do not need to write this function to complete this lab.

This anagram dictionary will quickly provide you all possible legal words that use the whole set of letters in the search. In Scrabble, however, you are not required to play all letters in your hand. To account for this, you’ll also need to evaluate all legal plays using fewer letters as well in the function findWords. Implementing findWords will require some clever design. One way to implement findWords is to break down the task into the following subtasks:

Find the power set of the given string
Get all anagrams of each string in the power set (using anagramsOf)
Accumulate all the anagrams in a single vector
Remove Duplicates

Compute the power set: given a string, calculate the set of all subsets of the given characters. For example, “abcd” should return all (set) combinations of size 4 (“abcd”), size 3 (“abc”, “abd”, “acd”, “bcd”), size 2 (“ab”, “ac”, “ad”, “bc”, “bd”,” cd”), and size 1 (“a”, “b”, “c”, “d”). We have provided this for you in the function stringPowerSet.
Eliminating Duplicate Strings: If you use the powerSet function, you will likely find at some point that you have a vector of words that contain several duplicates. It might be helpful to create a removeDuplicates function that eliminates any extra copies of a word. There are several methods of accomplishing this:
1. Use a double-for loop that compares all pairs of words in the vector, looking for pairs that are the same. This will run in \( O(n^2) \) time.
2. First, sort the vector of words. Then, make a linear scan over the list, eliminating adjacent duplicates. This will take \( O(n \log n) \) time using an efficient sorting algorithm.
3. Create a hash table with words as keys. Then, insert each word in the hash table assuming it isn’t already there. This should take \( O(1) \) time per word, or \( O(n) \) overall time.

Once the above code is complete, your ScrabbleAssistant will be complete!

Memory Management

For this lab, your program is required to run without memory errors or leaks. You should use valgrind as you proceed in your testing to track memory errors. When you have a complete first draft of your implementation:

Commit and push what you have (in case something goes wrong).
Run valgrind ./tests all to make sure that the test program does not cause memory errors.
Once memory errors are cleared, run valgrind --leak-check=full ./tests to identify and correct memory leaks.

Coding Style Requirements

You are required to observe good coding practices. You have seen these requirements many times, and should know them by now. If you’re unsure, check previous labs where the style requirements were stated explicitly.

Summary of Requirements

When you are finished, you should have

A completed implementation of LinearDictionary and the helper methods
A completed implementation of HashTable and the helper methods
A completed implementation of ScrabbleAssistant
Appropriate tests for these classes
No memory errors
No memory leaks
Code which conforms to the style requirements