CPSC 035: Lab 8: HashTables

Due on Wednesday, April 13th (by the end of the day, U.S. Eastern Time).

Overview

The main goal for this lab is to gain experience working with hash tables. To that end, you will:

  1. Implement two new versions of the Dictionary ADT:

    • A LinearDictionary for a simpler dictionary implementation for small sets of elements.

    • A HashTable implementation that uses separate chaining (each chain will itself be a LinearDictionary).

  2. Apply your hash table implementation to create a simple program that counts words.

As usual, you can select your partner using Teammaker.

The URL for your repository will be:

git@github.swarthmore.edu:cs35-s22/lab08-<your-teamname>

Starting Code

Your starting code can be found in the appropriate repository for your team. The following is a description of the repository’s initial contents; bolded files are those which you must change in completing the lab.

  • adts/: As with your last lab, the ADT interface declarations are stored here.

  • Makefile: the build instructions for this lab.

  • linearDictionary.h/-inl.h: The LinearDictionary class you will be implementing to store collisions within your HashTable.

  • hashTable.h/-inl.h: the HashTable class implementation files.

  • hashFunctions.h/cpp: Simple hash functions for integer and string keys, similar to hashes we saw in class.

  • tests.cpp: Unit test wrapper that will call provided tests for all three classes you will define. The tests for the classes are in testLinearDictionary.cpp and testHashTable.cpp.

  • manualTests.cpp: A "sandbox" test program that you can use while you develop your program.

  • main.cpp: a simple program that counts words

Part I: Linear Dictionary Implementation

We recommend that you implement a LinearDictionary as a vector of key-value pairs. All of the operations should run in \(O(n)\) time. You may be thinking: “Why would I want such an inefficient implementation of the Dictionary ADT?” The answer is that, for very small dictionaries, simpler implementations perform much better as they have a smaller constant overhead.

We will use the LinearDictionary to represent the chain of collisions in our HashTable. Because we expect each chain to be relatively short, the LinearDictionary provides a nice abstraction that will be efficient enough for our purposes.

Incrementally develop and test your linear dictionary as you implement using manualTests.cpp. We recommend starting with your constructor, insert, and either get or contains. Once you are satisfied with your dictionary’s ability to add and find items, finish implementing the remaining methods one at a time.

Unit tests have been provided for this lab; once you have completed your LinearDictionary implementation, the appropriate unit tests should pass.

$ make tests
$ ./tests linearDictionary
Success: 11 tests passed.
Test time: 0.06 seconds.

Removing From a vector

Implementing the various methods will require you to manipulate the underlying vector you are using to store key-value pairs.

For the most part, this is straightforward (e.g., using vector::size() to get the size of the LinearDictionary, or vector::push_back() to insert). The exception to this is implementing LinearDictionary::remove(), which requires removing an element in the middle of the vector. The vector class provides a means to remove an element from the end of a list (vector::pop_back()) but doesn’t give a simple interface for removing elements e.g. from an arbitrary index. In linearDictionary.h/inl.h we have fully implemented a helper method, removeFromVector() to make this task simpler by specifying a vector. The method takes the vector you wish to delete from and the index for locating the element to remove. For example:

vector<string> myVec = {"a","b","c","d"};
//myVec is now ["a","b","c","d"]
removeFromVector(myVec,2); //removes the element at index 2 -> "c"
//myVec is now ["a","b","d"]

Concatenating vectors

It may be useful to you in this lab to concatenate one vector to another. Here’s an example of how to do this. The vector a contains the integers 10 and 20. The vector b contains the integers 30 and 40. You can insert all the elements of b to the end of vector a as shown below:

vector<int> a = {10, 20};
vector<int> b = {30, 40};
// concatenate b onto the end of a
a.insert(a.end(), b.begin(), b.end());

Part II: Hash Table Implementation

Hash Table

Once your LinearDictionary implementation is complete, you can begin implementing a chaining hash table as we discussed in class. The figure at right shows an example of a chaining hash table with size 3, capacity 4, and storing integer keys with string values.

Your implementation will be in the files hashTable.h and hashTable-inl.h. Here are some important details about this class:

  • We recommend that you use a dynamically allocated array of statically-allocated LinearDictionary objects to represent the hash table.

  • The HashTable class has two constructors. One of the constructors allows the user to specify a maximum load factor; the other should use a sensible default value (e.g. 0.75). In either case, select a reasonable default table capacity, (e.g., capacity=5).

  • All operations on your HashTable (other than getKeys and getItems) must run in \(O(1)\) average time.

  • The HashTable must maintain a reasonable load factor in order to operate efficiently. You will create a private helper method called expandCapacity to assist with increasing the table’s capacity when the load factor becomes too high.

  • Note that hashing functions have been provided for you in hashFunctions.h and hashFunctions.cpp for keys of type int and string.

Incrementally develop and test your hash table in a similar manner as LinearDictionary e.g., write tests along the way using manualTests.cpp Start with your constructor, insert, and get to test that you can insert and find items. Then complete your implementation by implementing and testing one method at a time.

Once you are able to pass all of your own tests in manualTests.cpp then try our unit tests:

$ make tests
$ ./tests hashTable
Success: 14 tests passed.
Test time: 0.50 seconds.

Part III: Main Program

Once you have a working hash table, you can start writing your Main Program. This is meant to be a simple program that will use the HashTable data structure that you just implemented.

For this program, you will fill in the main method of the main.cpp file. Within the main method, you will create a HashTable with keys of type string and values of type int. You will then read all of the words within the file wordlist.txt and use your HashTable to count how many times each word appears. Note that only one word appears per line within the wordlist.txt. Finally, you will print every item within your HashTable. In other words, you will print each word with its associated count.

For example, if wordlist.txt contained the following:

the
more
you
learn
the
more
you
know

Then, your main program would print:

the,2
more,2
you,2
learn,1
know,1

Once you finish your main program, you can compile and run it using the following commands:

$ make
$ ./main

Memory Management

For this lab, your program is required to run without memory errors or leaks. Because we are using large data structures in this lab, valgrind will run very slowly. We recommend that you run valgrind on each section of the lab separately as shown below. However, each test may still run quite slowly, so be patient.

  • Test the LinearDictionary: valgrind ./tests linearDictionary

  • Test the HashTable: valgrind ./tests hashTable

  • Test the Main Program: valgrind ./main

Coding Style Requirements

You are required to observe good coding practices. You have seen these requirements many times, and should know them by now. If you’re unsure, check previous labs where the style requirements were stated explicitly.

Summary of Requirements

When you are finished, you should have

  • A completed implementation of LinearDictionary and the helper methods

  • A completed implementation of HashTable and the helper methods

  • A completed implementation of Main Program

  • No memory errors

  • No memory leaks

  • Code which conforms to the style requirements