CS35: Data Structures and Algorithms

Lab 9: Hash Tables, Scrabble Assistant

Due on Wednesday, April 22 at 11:59 PM.

Content

The main goals for this lab are to gain experience working with hash tables and to evaluate real-world data structure performance by running experiments. To that end, you will:

  1. Implement two versions of Dictionary:
    • A LinearDictionary for a simpler dictionary implementation for small sets of elements.
    • A HashTable implementation that uses separate chaining (each chain will itself be a LinearDictionary).
  2. Apply your hash table implementation to create a simple program to assist Scrabble players identify all anagrams of a string of letters that are valid words.
Optional Extra Challenge: Implement an additional function that takes a string of letters and returns all legal words the player could spell with these letters.

Starting Code

Your starting code can be found in lab09-[username] repo. The following is a description of the repository’s initial contents; bolded files are those which you must change in completing the lab.

Part I: Linear Dictionary Implementation

We recommend that you implement a LinearDictionary as a vector of key-value pairs. All of the operations should run in \( O(n) \) time. You may be thinking: “Why would I want such an inefficient implementation of the Dictionary ADT?” The answer is that, for very small dictionaries, simpler implementations perform much better as they have a smaller constant overhead.

We will use the LinearDictionary to represent the chain of collisions in our HashTable. Because we expect each chain to be relatively short, the LinearDictionary provides a nice abstraction that will be efficient enough for our purposes.

Incrementally develop and test your linear dictionary as you implement using manualTests.cpp. We recommend starting with your constructor, insert, and either get or contains. Once you are satisfied with your dictionary’s ability to add and find items, finish implementing the remaining methods one at a time.

Unit tests have been provided for this lab; once you have completed your LinearDictionary implementation, the appropriate unit tests should pass.

$ make tests
$ ./tests linearDictionary
Success: 11 tests passed.
Test time: 0.06 seconds.

Removing From a vector

Implementing the various methods will require you to manipulate the underlying vector you are using to store key-value pairs. For the most part, this is straightforward (e.g., using vector::size() to get the size of the LinearDictionary, or vector::push_back() to insert). The exception to this is implementing LinearDictionary::remove(), which requires removing an element in the middle of the vector. The vector class provides a means to remove an element from the end of a list (vector::pop_back()) but doesn’t give a simple interface for removing elements e.g. from an arbitrary index. In linearDictionary.h/inl.h we have fully implemented a helper method, removeFromVector() to make this task simpler by specifying a vector. The method takes the vector you wish to delete from and the index for locating the element to remove. For example:

vector<string> myVec = {"a","b","c","d"};
//myVec is now ["a","b","c","d"]
removeFromVector(myVec,2); //removes the element at index 2 -> "c"
//myVec is now ["a","b","d"]

Part II: Hash Table Implementation

Hash Table

Once your LinearDictionary implementation is complete, you can begin implementing a hash table using separate chaining as we discussed in class. Your implementation will be in the files hashTable.h and hashTable-inl.h. Unit tests have been provided; once your hash table implementation is complete, these unit tests should pass. Below are some implementation details and suggestions:

Incrementally develop and test your hash table in a similar manner as LinearDictionary e.g., write tests along the way using manualTests.cpp Start with your constructor, insert, and get to test that you can insert and find items. Then complete your implementation by implementing and testing one method at a time.

Test your hash table as you implement using manualTests.cpp and our automated (but not necessarily complete) auto tests:

$ make tests
$ ./tests hashTable
Success: 14 tests passed.
Test time: 0.50 seconds.

Expanding Capacity

If the load factor ever exceeds the maximum load factor of your hash table, you will need to expand the table. You only need to worry about this when an item is inserted into the table; you should check to see if the added element causes the load factor to exceed the maximum threshold that was set in the constructor.

If the load factor threshold has been exceeded, you should double the capacity of the table. Doubling the capacity of the table should be implemented in the helper method expandCapacity. This is conceptually similar to expanding capacity for ArrayList. There are a couple of differences. One is that when to expand differs here – it depends on the load factor, not on whether the array is full. A more subtle difference is that it’s not enough to just copy table entries over. Each key-value pair in the hash table needs to be rehashed, because which bucket it goes into depends on the capacity of the table itself! Below we itemize what tasks need to be done to properly expand capacity.

Remember, your hash table should never have load factor higher than the maximum load factor. As soon as this happens, you should expand capacity.

Part III: Scrabble Assistant

Once you have a working hash table, you can start writing your Scrabble Assistant. We’ll design the application in two parts: a ScrabbleAssistant class and a separate main program. The main program (which we’ve written for you) provides a simple menu interface and makes calls to the ScrabbleAssistant, which encapsulates all of the Scrabble Assistant features. The ScrabbleAssistant class should load an English-language dictionary (the main program allows the user to specify a dictionary on the command line, but uses /usr/share/dict/words as a default) into a hash table and support the following functions:

Implementing these in the given order, with thorough testing in between each to ensure you are understanding your data structures. The latter two methods, anagramsOf and findWords, will require some algorithm design (see below for tips). You’re welcome to include any data members you need for the ScrabbleAssistant, as well as any private helper functions you’d like.

As you develop, test your ScrabbleAssistant using manualTests.cpp and our automated (but not necessarily complete) auto tests:

$ make tests
$ ./tests scrabbleAssistant
Success: 4 tests passed.
Test time: 0.88 seconds.

You should also run your main program and test to see if it matches our sample output.

Constructing Anagrams

In order to make anagramsOf run in constant time, it will be helpful to preprocess the word list in the ScrabbleAssistant constructor and maintain a table of anagrams. First, write a helper function to sort the letters in a string. (e.g. sortString("cat") should return “act”). You don’t have to do this yourself; instead, use the std::sort function provided in the algorithm library.

#include <algorithm>
string sortLetters(string s) {
  sort(s.begin(), s.end());
  return s;
}

Then, create a dictionary that maps each sorted string to a vector of all words that sort to that string. To do so:

For example, the search “tca” will sort to “act”. Our dictionary should return a vector containing all legal words with the letters “act” (i.e., anagramDictionary.get("act") should return ["act", "cat"]). Be sure to write your own tests to verify that anagrams are being constructed properly. Note: if no anagrams exist, your function should return an empty vector (not throw a runtime_error).

If you're looking for an extra challenge in this lab, try to implement a function that, given a string of letters, returns a vector of all legal words that can be spelled from these letters. This feature is optional; you do not need to write this function to complete this lab.

This anagram dictionary will quickly provide you all possible legal words that use the whole set of letters in the search. In Scrabble, however, you are not required to play all letters in your hand. To account for this, you’ll also need to evaluate all legal plays using fewer letters as well in the function findWords. Implementing findWords will require some clever design. One way to implement findWords is to break down the task into the following subtasks:

  1. Find the power set of the given string
  2. Get all anagrams of each string in the power set (using anagramsOf)
  3. Accumulate all the anagrams in a single vector
  4. Remove Duplicates

Once the above code is complete, your ScrabbleAssistant will be complete!

Memory Management

For this lab, your program is required to run without memory errors or leaks. You should use valgrind as you proceed in your testing to track memory errors. When you have a complete first draft of your implementation:

Coding Style Requirements

You are required to observe good coding practices. You have seen these requirements many times, and should know them by now. If you’re unsure, check previous labs where the style requirements were stated explicitly.

Summary of Requirements

When you are finished, you should have