The tests in this course may contain any material covered in lecture up until this point, although much stronger emphasis will be placed on material for which you have received graded feedback. This page contains a collection of exercises that you may wish to complete to prepare yourself. We also encourage you to review the lecture notes from topics presented since the previous test.
Sorting
You should make sure to review the sorting algorithms we covered in class and be ready to discuss them.
What is the worst-case time complexity of bubble sort? QuickSort? MergeSort?
Although QuickSort’s worst-case time complexity is worse than MergeSort’s, we use QuickSort as a default in practice. Why?
What is an in-place sorting algorithm?
What is wrong with the following QuickSort pseudocode? What would happen if we used this pseudocode to sort an array? How could we fix it?
Write pseudocode for a partition function which can be used in the QuickSort algorithm.
Write pseudocode for the merge function of the MergeSort algorithm.
Bubble sort and QuickSort both have worst-case complexity of \(O(n^2)\). MergeSort has a worst-case complexity of \(O(n \log n)\).
QuickSort’s expected worst-case complexity is \(O(n \log n)\). Also, QuickSort is an in-place sort, meaning that it performs less copying than MergeSort.
An in-place sorting algorithm is one in which the contents of the array are sorted while remaining within the array.
There is no base case in the provided recursive pseudocode. If this code were executed, it would break the list down into paritions of size 0 and keep trying to split those size 0 partitions indefinitely. We need to add a condition that does not try to sort if the array is of size less than one.
Any partition function which correctly organizes the array into a “low section” and a “high section”, returning the index of the pivot between them, is acceptable. Here’s the Lomuto partitioning algorithm which was discussed in Lecture:
partition(array, left, right):
pivot_val = array[right]
s = left
l = left
while l <= right:
if array[l] <= pivot_val:
swap(array, s, l)
s++
l++
swap(array, s, right)
return s
An alternative approach is the Hoare partitioning algorithm:
partition(array, left, right):
pivot_val = array[right]
l = left
r = right-1
while l <= r
if array[l] <= pivot_val:
l += 1
elif array[r] >= pivot_val:
r -= 1
else:
swap(array, l, r)
swap(array, l, right)
return l
Function merge(a1, a2, am)
i <- 0
j <- 0
k <- 0
While i < length(a1) And j < length(a2)
If a1[i] < a2[j]
am[k] <- a1[i]
i <- i + 1
k <- k + 1
Else
am[k] <- a2[j]
j <- j + 1
k <- k + 1
EndIf
EndWhile
While i < length(a1)
am[k] <- a1[i]
i <- i + 1
k <- k + 1
EndWhile
While j < length(a2)
am[k] <- a2[j]
j <- j + 1
k <- k + 1
EndWhile
EndFunction
Big-O Proofs
Recall the definition of big-\(O\) notation we used in class: \(f(x)\) is \(O(g(x))\) if \(\exists c>0, k\geq 1.\ \forall n \geq k.\ f(n) \leq cg(n)\). You should be prepared, given this definition, to prove complexity bounds on functions.
Prove that \(f(n) = \frac{n^2+1}{n}\) is \(O(n)\).
Prove that \(f(n) = 3n^3 + 5n^2 + 8\) is \(O(n^3)\).
Prove that \(f(n) = (n+1)(n-1)\) is \(O(n^2)\).
(This is a “reverse” proof.)
We want to show that \(f(n)\) is \(O(n)\). By definition, this is to claim that \(\exists c>0, k\geq 1.\ \forall n \geq k.\ \frac{n^2+1}{n} \leq cn\).
Let \(k=1\) and \(c=2\). Then it suffices to show that \(\forall n \geq 1.\ \frac{n^2+1}{n} \leq 2n\). We can redistribute the let side to read \(\forall n \geq 1.\ n + \frac{1}{n} \leq 2n\). We have from algebra that, if \(a \leq c\) and \(b \leq d\), then \(a + b \leq c + d\); therefore, it suffices to show both that \(\forall n \geq 1. n \leq n\) and also that \(\forall n \geq 1. \frac{1}{n} \leq n\). The prior is true by definition of equality. For the latter, we multiply both sides by \(n\) (a positive number) to get \(\forall n \geq 1. 1 \leq n^2\); we take the square root of both sides to get \(\forall n \geq 1. 1 \leq n\), which is immediately true. As these two points are sufficient to show the original statement, we conclude that \(f(n)\) is \(O(n)\).
(This is a “forward” proof.)
\begin{align}
&\text{1.}& \forall n \geq 1. 1 &\leq n & \text{by definition} \\
&\text{2.}& \forall n \geq 1. 1 &\leq n^2 & \text{(1), square both (positive) sides} \\
&\text{3.}& \forall n \geq 1. 1 &\leq n^3 & \text{(1), cube both (positive) sides} \\
&\text{4.}& \forall n \geq 1. 8 &\leq 8n^3 & \text{(3), multiply by 8} \\
&\text{5.}& \forall n \geq 1. 5n^2 &\leq 5n^3 & \text{(1), multiply by \(5n^2\) (positive by (2))} \\
&\text{6.}& \forall n \geq 1. 3n^3 &\leq 3n^3 & \text{by definition of equality} \\
&\text{7.}& \forall n \geq 1. 3n^3 + 5n^2 &\leq 3n^3 + 5n^3 & \text{by (5), (6) and transitivity of \(\leq\)} \\
&\text{8.}& \forall n \geq 1. 3n^3 + 5n^2 + 8 &\leq 3n^3 + 5n^3 + 8n^3 & \text{by (4), (7), and transitivity of \(\leq\)} \\
&\text{9.}& \forall n \geq 1. 3n^3 + 5n^2 + 8 &\leq 16n^3 & \text{by (8), simplification} \\
&\text{10.}& \exists c>0. \forall n \geq 1. 3n^3 + 5n^2 + 8 &\leq cn^3 & \text{by (9), introduction of existential} \\
&\text{11.}& \exists c>0, k \geq 1. \forall n \geq k. 3n^3 + 5n^2 + 8 &\leq cn^3 & \text{by (10), introduction of existential} \\
&\text{12.}& 3n^3 + 5n^2 + 8 &\text{ is } O(n^3) & \text{by (11), definition of big-O} \\
\end{align}
(This is a limit-based “reverse” proof. This sort of proof uses an alternative, but equivalent definition of Big-O.)
We aim to prove that \(f(n) = (n+1)(n-1)\) is \(O(n^2)\). By definition, this is to say that \(\lim\limits_{n\to\infty} \dfrac{(n+1)(n-1)}{n^2} = c\) for some constant c. By distribution, this is to say that \(\lim\limits_{n\to\infty} \dfrac{n^2-1}{n^2} = c\).
By the sum law for limits, this is equivalent to the statement \(\left(\lim\limits_{n\to\infty} 1\right) - \left(\lim\limits_{n\to\infty} \dfrac{1}{n^2}\right) = c\). The limit of a constant is the constant itself. The second term in this equation has \(n\) only in the denominator, so this limit is zero. It therefore suffices to show \(1 - 0 = c\) for some constant \(c\), which is true.
Data Structures and Invariants
You should be prepared to discuss the data structures we have created in class, the fields in their implementations, and the invariants we rely upon in order to write algorithms for those data structures.
What is an ADT? What is the difference between an ADT and a data structure? Give two examples of each.
What is the difference between a List and a Stack? When might we use a Stack instead of a List?
What are the big-\(O\) complexities of the following operations? Answer for each of (a) LinkedList with a tail pointer, (b) LinkedListwithout a tail pointer, and (c) ArrayList. Briefly explain each answer.
Insert to the front of the list.
Insert to the back of the list.
Retrieve the first element of the list.
Retrieve the middle element of the list.
Remove an element from the back of the list.
What is amortized complexity?
Draw a stack diagram of a stack frame in which the following code has been run:
An ADT is a description of how data can be accessed and manipulated. In other words, an ADT is an interface allowing the caller to store and retrieve data. A data structure is a concrete realization of an ADT; data structures give specific algorithms to implement the interface that the ADT describes.
Two examples of ADTs are lists and stacks. Lists and stacks are both defined by a collection of routines that allow a user to store and manipulate data (e.g. insertFirst, getLast, etc. for list; pop, push, etc. for stack). Two examples of data structures are linked lists and array lists. Both linked lists and array lists are concrete implementations of the list ADT; each of them give an implementation for the interface that the list ADT describes.
List and Stack are two different ADTs. The List ADT supports a variety of operations (such as get by index) that a Stack does not. A Stack may be implemented using a List but is not a kind of List.
We might use a Stack instead of a List if the operations provided by Stack are enough for our algorithm. By using a “weaker” data structure, the algorithm that uses it is easier to verify as correct since it’s easier to understand how the data structure behaves.
LinkedList with tail
LinkedList without tail
ArrayList
insertFirst
\(O(1)\) A new head is created and linked to the old head.
\(O(1)\) Identical to LinkedList with tail.
\(O(n)\) Every element must shift down one space.
insertLast
\(O(1)\) A new tail is created and linked to the old tail.
\(O(n)\) Without a tail pointer, we must walk to the end of the list to find the tail.
amortized \(O(1)\) We add to the end of the array; this may require us to grow the array.
getFirst
\(O(1)\) The first node of the list is immediately available.
\(O(1)\) Identical to LinkedList with tail.
\(O(1)\) Accessing an array is constant time.
get
\(O(n)\) We have to find the appropriate node.
\(O(n)\) Identical to LinkedList with tail.
\(O(1)\) Accessing an array is constant time.
removeLast
\(O(n)\) The tail pointer lets us find the last element, but we need to find the penultimate element (second to last) so we can make it the new last element. This requires us to walk the list.
\(O(n)\) We must walk to the end of the list to find the last two nodes.
\(O(1)\) We can simply decrease the size of the ArrayList by one; the array doesn’t need to change.
Amortized complexity is the average complexity of an algorithm over a sequence of calls to it. For instance, the amortized worst-case complexity of ArrayList’s insertLast is \(O(1)\) even though the worst case complexity is \(O(n)\): on any given call to insertLast, \(O(n)\) is the worst complexity that can occur. However, a sequence of \(n\) calls to insertLast on the same list has time complexity \(O(n)\) so, on average once the sequence is finished, each call can be assigned \(O(1)\) time.
Note that this stack diagram does not include a tail pointer, although it wouldn’t be wrong to do so.