git@github.swarthmore.edu:cs35-s22/lab08-<your-teamname>
Due on Wednesday, April 13th (by the end of the day, U.S. Eastern Time).
The main goal for this lab is to gain experience working with hash tables. To that end, you will:
Implement two new versions of the Dictionary
ADT:
A LinearDictionary
for a simpler dictionary implementation for small sets of elements.
A HashTable
implementation that uses separate chaining (each chain
will itself be a LinearDictionary
).
Apply your hash table implementation to create a simple program that counts words.
As usual, you can select your partner using Teammaker.
The URL for your repository will be:
git@github.swarthmore.edu:cs35-s22/lab08-<your-teamname>
Your starting code can be found in the appropriate repository for your team. The following is a description of the repository’s initial contents; bolded files are those which you must change in completing the lab.
adts/
: As with your last lab, the ADT interface declarations are stored here.
Makefile
: the build instructions for this lab.
linearDictionary.h/-inl.h
: The LinearDictionary
class you will be implementing to store collisions within your HashTable
.
hashTable.h/-inl.h
: the HashTable
class implementation files.
hashFunctions.h/cpp
: Simple hash functions for integer and string
keys, similar to hashes we saw in class.
tests.cpp
: Unit test wrapper that will call provided tests for
all three classes you will define. The tests for the classes are in testLinearDictionary.cpp
and testHashTable.cpp
.
manualTests.cpp
: A "sandbox" test program that you can
use while you develop your program.
main.cpp
: a simple program that counts words
We recommend that you implement a LinearDictionary
as a vector of key-value pairs. All of the operations should run in \(O(n)\) time. You may be thinking: “Why would I want such an inefficient implementation of the Dictionary
ADT?” The answer is that, for very small dictionaries, simpler implementations perform much better as they have a smaller constant overhead.
We will use the LinearDictionary
to represent the chain of collisions in our HashTable
. Because we expect each chain to be relatively short, the LinearDictionary
provides a nice abstraction that will be efficient enough for our purposes.
Incrementally develop and test your linear dictionary as you implement using manualTests.cpp
. We recommend starting with your constructor, insert
, and either get
or contains
. Once you are satisfied with your dictionary’s ability to add and find items, finish implementing the remaining methods one at a time.
Unit tests have been provided for this lab; once you have completed your LinearDictionary
implementation, the appropriate unit tests should pass.
$ make tests
$ ./tests linearDictionary
Success: 11 tests passed.
Test time: 0.06 seconds.
Implementing the various methods will require you to manipulate the underlying vector you are using to store key-value pairs.
For the most part, this is straightforward (e.g., using vector::size()
to
get the size of the LinearDictionary
, or vector::push_back()
to insert). The
exception to this is implementing LinearDictionary::remove()
, which requires removing
an element in the middle of the vector. The vector
class provides a means to remove an element from the end of
a list (vector::pop_back()
) but doesn’t give a simple interface for removing
elements e.g. from an arbitrary index. In linearDictionary.h/inl.h
we
have fully implemented a helper method,
removeFromVector()
to make this task simpler by specifying a vector. The
method takes the vector
you wish to delete from and the index for locating
the element to remove. For example:
vector<string> myVec = {"a","b","c","d"};
//myVec is now ["a","b","c","d"]
removeFromVector(myVec,2); //removes the element at index 2 -> "c"
//myVec is now ["a","b","d"]
It may be useful to you in this lab to concatenate one vector to another.
Here’s an example of how to do this. The vector a
contains the integers 10 and 20.
The vector b
contains the integers 30 and 40. You can insert all the elements of b
to the end of vector a
as shown below:
vector<int> a = {10, 20};
vector<int> b = {30, 40};
// concatenate b onto the end of a
a.insert(a.end(), b.begin(), b.end());
Once your LinearDictionary
implementation is complete, you can begin
implementing a chaining hash table as we discussed in class. The
figure at right shows an example of a chaining hash table with size 3,
capacity 4, and storing integer keys with string values.
Your implementation will be in the files hashTable.h
and
hashTable-inl.h
. Here are some important details about this class:
We recommend that you use a dynamically allocated array of statically-allocated LinearDictionary
objects to represent the hash table.
The HashTable
class has two constructors. One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). In either case, select a reasonable default table capacity, (e.g., capacity=5
).
All operations on your HashTable (other than getKeys
and getItems
) must run in \(O(1)\) average time.
The HashTable must maintain a reasonable load factor in order to operate efficiently. You will create a private helper method called expandCapacity
to assist with increasing the table’s capacity when the load factor becomes too high.
Note that hashing functions have been provided for you in hashFunctions.h
and hashFunctions.cpp
for keys of type int
and string
.
Incrementally develop and test your hash table in a similar manner as LinearDictionary
e.g., write tests along the way using manualTests.cpp
Start with your constructor, insert
, and get
to test that you can insert and find items. Then complete your implementation by implementing and testing one method at a time.
Once you are able to pass all of your own tests in manualTests.cpp
then try our unit tests:
$ make tests
$ ./tests hashTable
Success: 14 tests passed.
Test time: 0.50 seconds.
Once you have a working hash table, you can start writing your Main Program. This is meant to be a simple program that will use the HashTable data structure that you just implemented.
For this program, you will fill in the main method of the main.cpp
file.
Within the main method, you will create a HashTable with keys of type string
and values of type int
.
You will then read all of the words within the file wordlist.txt
and use your HashTable to count how many times each word appears.
Note that only one word appears per line within the wordlist.txt
.
Finally, you will print every item within your HashTable. In other words, you will print each word with its associated count.
For example, if wordlist.txt
contained the following:
the
more
you
learn
the
more
you
know
Then, your main program would print:
the,2
more,2
you,2
learn,1
know,1
Once you finish your main program, you can compile and run it using the following commands:
$ make
$ ./main
For this lab, your program is required to run without memory errors or leaks. Because we are using large data structures in this lab, valgrind will run very slowly. We recommend that you run valgrind on each section of the lab separately as shown below. However, each test may still run quite slowly, so be patient.
Test the LinearDictionary
: valgrind ./tests linearDictionary
Test the HashTable
: valgrind ./tests hashTable
Test the Main Program
: valgrind ./main
You are required to observe good coding practices. You have seen these requirements many times, and should know them by now. If you’re unsure, check previous labs where the style requirements were stated explicitly.
When you are finished, you should have
A completed implementation of LinearDictionary
and the helper methods
A completed implementation of HashTable
and the helper methods
A completed implementation of Main Program
No memory errors
No memory leaks
Code which conforms to the style requirements