Data Structures and Algorithms

Final Exam Study Guide

The final exam for this course is comprehensive: it will contain material from throughout the semester, albeit with a slight emphasis on more recent material. This study guide contains only material from after Test 3. To prepare for the final exam, please consult the following problems as well as the older study guides (listed below).

Hash Tables

What is a hash? What is a hash collision?

A hash is a function which transforms data (e.g. a dictionary key) into uniform data like an integer. A collision occurs when two different inputs map to the same hash value.
We discussed two techniques for resolving hash collisions in class: linear probing and chaining. The technique used by a HashTable affects its structure.
- Briefly explain the difference between these hash collision resolution techniques.
In linear probing, each hash value is assigned a bucket which holds at most one value and collisions are resolved by moving on to the next bucket. In chaining, each bucket may hold many values.
- Give a C++ data type which could be used as the contents of each type of hash table. (You may use pair and other data types we discussed in class. You may also use a pair within a pair or assume the existence of a type triple for three elements.)
A linear probing hash table might store all of its buckets as a pair<pair<K,V>,bool>*, where the boolean indicates whether a bucket is in use or not. A chaining hash table might store its buckets as a vector<pair<K,V>>* where each bucket is a vector containing all keys of the same hash.
What is load factor? How is it used in a hash table?

The load factor of a hash table is its size divided by its capacity. Load factor is used to determine when more buckets are allocated in order to keep mappings spread out between buckets.
What is the worst case time for inserting an element into a hash table? How does this worst case time relate to the hashing function?

In the worst case, an element insertion takes \(O(n)\) time; this occurs if all of the mappings have the same hash. The likelihood of this occurring is based on the quality of the hash function used with the table.
Suppose you have an empty chaining hash table with three buckets and a maximum load factor of 0.75. Assume that keys and values are integers and that the hash function is hash(x, capacity) is x % capacity. We will insert the mappings \(4 \mapsto 8\), \(1 \mapsto 5\), \(0 \mapsto 6\), and \(6 \mapsto 2\). Draw the hash table after each addition. Assume that ensure_capacity doubles the number of buckets before rehashing.

Empty hash table:

After \(4 \mapsto 8\):

After \(1 \mapsto 5\):

After \(0 \mapsto 6\), which causes a growth:

After \(6 \mapsto 2\):

Graphs

Graph for Question 3

What is a graph?

A graph is a pair of sets. The first set, \(V\), is a set of vertices of any type. The second set, \(E\), is a set of edges. An edge is a grouping between a source vertex, a target vertex, a weight, and a label.
Define the following terms in relation to graphs:
- Weighted
A graph is weighted if its edges have weights / have meaningful weights / have non-uniform weights. A social network, for instance, is unlikely to be weighted, while a map probably is (where weight is distance, difficulty of traveling, etc.).
- Directed
A graph is undirected if, for each edge heading in one direction, there is an identical edge heading in the opposite direction. A graph is directed if it is not undirected.
- Connected
A graph is connected if, for each vertex in the graph, there is a path to every other vertex in the graph. A graph is weakly connected if its undirected equivalent is connected.
Consider the graph on the right.
- Which path would a breadth first search find from vertex A to vertex G?
A breadth-first search would find the path [A, D, G].
- Which path would Dijkstra’s algorithm find from vertex A to vertex G? Why is this path different from the BFS path above?
Dijkstra’s algorithm would find the path [A, C, D, E, G]. This is different from breadth-first search because the sum of the weights of this path is less than the sum of the weights of the path with the fewest edges.
- Ignoring the weights on this graph, perform a topological sort of it. Give the resulting topological sort as a list of vertices.
There are many valid sequences that may result from the topological sort. Here are a two possible outcomes:
- [A, B, F, C, D, E, G]
- [A, C, B, F, D, E, G]
The graph data structure you used in class uses a dictionary to store the edges of the graph. Edges may be stored in a variety of different ways, so let us consider the adjacency matrix representation. Instead of a dictionary, we will store a two-dimensional array of weight values of size \(|V|^2\): that is, there is a row and a column for each vertex. If an edge exists from vertex A to vertex B, then the cell in row A and column B will contain its weight; otherwise, that cell contains a zero (for simplicity). Answer the following questions, using \(|V|\) for the number of vertices in the graph and \(|E|\) for the number of edges.
- What is the worst-case time complexity of adding an edge to this graph?
The worst-case time complexity for adding an edge to the graph is \(O(1)\); the only action necessary is indexing into an array.
- What is the worst-case time complexity of producing a list of edges in this graph?
The worst-case time complexity to find all edges in the graph is \(O(|V|^2)\), since we have to look at every cell in the matrix to find all of the edges.
- What is the worst-case time complexity of determining the out-degree of a vertex?
The worst-case time complexity of determining a vertex’s out degree is \(O(|V|)\), since we must examine each vertex individually to determine if an edge exists to it.
- Suppose that we copy our edge data into a new matrix of size \((|V|+1)^2\) each time a new vertex is added. What is the time complexity of addVertex in this implementation?
The complexity of this copy is \(O(|V|^2)\), proportional to the number of elements copied from the old matrix.
- Consider the strategy that ArrayList uses to amortize the cost of copying data; a similar approach can be used here. Describe how the addVertex algorithm can be improved by allocating more space than is immediately necessary. The amortized worst-case time complexity of adding a new vertex should be \(O(n)\). Explain why.
The addVertex routine can instead copy into a matrix of size \(O((2|V|)^2)\) similar to how an ArrayList might double its own capacity. This operation would involve copying \(O(|V|^2)\) elements but it would be less common to do so. In particular, assuming a starting capacity of 2 (and thus a 2x2 matrix), this doubling strategy would yield the following numbers of copies for each call to addVertex: \(0, 0, 2^2, 0, 4^2, 0, 0, 0, 8^2, 0, 0, 0, 0, 0, 0, 0, 16^2, \ldots\). This is less than \(\sum\limits_{i=1}^{\log_2 n} (2^i)^2 = \dfrac{4}{3}(n^2-1) \in O(n^2)\) copies for \(n\) operations, or amortized \(O(n)\) worst-case time. Intuitively, each matrix copy takes more time than all of the previous copies combined, so \(i\) calls to addVertex will result in less than \(2 \cdot (2^{\lceil\log_2 i\rceil})^2 \leq 4 \cdot i^2\) copies.