The main goals for this lab are to gain experience working with hash tables and to evaluate real-world data structure performance by running experiments. To that end, you will:
As with most assignments for the rest of the semester, you will be working with a partner. Both partners should be present and working on the code together. You will both be responsible for understanding all concepts, so dividing and conquering is not an option. The academic integrity policy applies to the entire pair; you cannot work or share code with anyone outside your partner.
You and your lab partner will share the same git repository,
which is named lab08-<user1>-<user2>.git
.
Please access your lab by cloning the appropriate git
repository, e.g.:
$ git clone git@github.swarthmore.edu:CS35-S18/lab08-lmeeden1-mgagne1.gitYou can also get the ssh git url from CS35 github org.
Below is an overview of the files required for submitting this lab.
Those highlighted in blue
will require implementation
on your part. Those highlighted in black
are
complete and should not be modified except for comments.
Dictionary.h
.
Makefile
: the build instructions for this
lab.hashTable.h, hashTable-inl.h
: the hash table
implementation files.hashFunctions.*
: Simple hash functions for
integer and string keys, similar to hashes we saw in class. If
you decide to introduce new hash functions
, add
them here.tests.cpp
: Unit tests for your hash table
implementation.main.cpp
: a simple Scrabble Assistant
application.scrabbleAssistant.*
: a C++ class encapsulating
the functionality of the Scrabble assistant.WrittenLab.tex
The $\LaTeX$ source for your
experimental analysis portion of the lab.experiment.cpp
A program that runs
experiments comparing performance of different dictionary
implementations (hash table, binary search tree). We've coded the
experiment framework for you. You just need to implement a single
function in:experimentUtil.*
This file contains a single
function called getExperimentKeys(int num_keys). The
experiment framework calls this function to get the keys to put
into the dictionaries during testing.
experimentClasses.*, experimentHashes.*
:
special C++ classes (and hash functions for those classes) you'll
use to run your experiments.manualTests.cpp
: A "sandbox" test program that
you can use while you develop your program.hashTable.h
and hashTable-inl.h
.
Unit tests
have been provided; once your hash
table implementation is complete, these unit tests should pass.
Below are some implementation details and suggestions:
pop_back
) but doesn't give a clear interface for
removing elements from the middle. We suggest two possible approaches.
In the first approach, you can use pop_back
, by swapping the item at the end of the vector with the item at index i
that you want to remove. Then simply call pop_back
to remove the item that is now at the end.
The second approach uses some advanced features of the STL vector class. Vectors to provide a general method called erase which can delete entire contiguous regions:
vectorThe above function works much like a Python slice: the start index is included, but the end index is not, e.g., erasing with start/end indices (1,3) would remove themyVector = ...; myVector.erase(myVector.begin()+start_index, myVector.begin()+end_index);
vectorFor more information, consult the vector documentation in cplusplus.com.myVector = ...; // whatever defines/initializes the vector int index = ...; // the index of the element you want to remove myVector.erase(myVector.begin() + index, myVector.begin() + index+1);
Once you have a working hash table, you can start writing your
Scrabble Assistant. We'll design the application in two parts:
a ScrabbleAssistant
class and a
separate main
program. The main program (which
we've written for you) provides a simple menu interface and makes
calls to the ScrabbleAssistant, which encapsulates all of the
Scrabble Assistant features. The ScrabbleAssistant class should
load an English-language dictionary (allow the user to specify a
dictionary, but use /usr/share/dict/words as a default) into
a hash table and support the following functions:
When we say that anagramsOf should take constant time, we mean that its running time should not depend on the size of the English-language dictionary you are using. The running time is however allowed to depend on the length of the string.
Implementing anagramsOf and findWords will require some algorithm design. You're welcome to include any data members you need for the ScrabbleAssistant, as well as any private helper functions you'd like.
In order to make anagramsOf run in constant time, it will be helpful to preprocess the word list in the ScrabbleAssistant constructor and maintain a table of anagrams. First, write a helper function to sort the letters in a string. (e.g. sortString("cat") should return "act"). You don't have to do this yourself; instead, use the std::sort function provided in the algorithm library.
#include <algorithm> string sortLetters(string s) { sort(s.begin(), s.end()); return s; }Then, create a dictionary that maps each sorted string to a vector of
anagramsOf should provide all anagrams of a given set of letters, but won't be enough to provide all legal plays given a set of letters. To do this, you'll also need to evaluate all legal plays using fewer letters as well. Implementing findWords will require some clever design. One way to implement findWords is to break down the task into the following subtasks:
vector<string> stringPowerSet(word): if word length is 1: /* return a vector containing the empty string "" and the word */ return ["", word] firstChar = word[0] /* substring is original word excluding the first char */ substring = word.substr(1) smallerSet = stringPowerSet(substring) result = [] // an empty vector of strings for each word in smallerSet: add word to result add (firstChar+word) to result return result
ScrabbleAssistant
will be complete!
insert
and get
operation.
There are two classes used in this experiment, both defined
in experimentClasses
.
The StringIntGoodHash and StringIntBadHash classes
both contain a string and an int data member. These classes are
identical except for the hash functions, given
in experimentHashes
. It is your responsability
to understand how these hash functions differ and exploit these
differences in the experiment.
experiment.cpp
. Your only implementation task
is to implement getExperimentKeys
function
in experimentUtil.cpp
. This function is used by the
experiment to determine which keys to add to the dictionary. Your
task is to create a set of keys tailor-made to cause several
collisions for StringIntBadHash
. Note that every
key returned by getExperimentKeys
must be distinct;
the experiment will not run if you create duplicate keys.
Once you have written your getExperimentKeys
function, use the experiment program to produce a graph
much as you did for the mystery functions program
from Lab03. This program does not require
any arguments, so you can run it as e.g.:
$ ./experiment | gnuplotThis will both save a file called experiment.png as well as open a new window showing you the results of your experiment. If you successfully exploited the bad hash function, the runtime of the bad hash experiment should be comparitively high.
After completing your experiment, create a
Your program is required to run without memory errors; run valgrind to eliminate them. For this lab, memory leaks are not a priority. Small memory leaks are acceptable and will not significantly affect your grade. While "lost" memory should not occur in large amounts (e.g. millions of bytes), we encourage you to not spend hours and hours tracking down every last memory leak.
When you have completed your lab, you should have
Once you are satisfied with your code, hand it in using git. Remember the add, commit, push development cycle. You can push as many times as you like, but only the most recent submission will be graded. You may want to run git status to confirm all modifications have been pushed.