Design and Analysis of Algorithms

CS41: Final Exam Study Guide

This study guide includes topics covered over the entire semester. The final will be cumulative, although weighted towards the second half of the semester. This study guide is intended to be comprehensive, but not guaranteed to be complete. You are responsible for all material discussed in lecture or lab, and all material on homework assignments. You are also expected to be familiar with data structures introduced in CS35.


In addition to using this study guide, you should:
  • review your lecture notes
  • review your homework assignments
  • review lab questions
You can also look over the assigned readings from the textbook, although they are not required.

You should be able to define and explain the following terms:

  • stable matching
    • matching instability
    • Gale-Shapley algorithm
  • algorithm analysis
    • asymptotic notation (big-Oh, big-Omega, big-Theta)
    • direct proof
    • induction
    • proof by contradiction
  • data structures
    • list
    • array
    • adjacency list and adjacency matrix
    • any data structure from CS35 that you want to use (but make sure you are doing the correct worst-case analysis!)
  • graph algorithms
    • breadth-first search (BFS)
    • depth-first search (DFS)
    • topological ordering
    • spanning tree
    • minimum spanning tree (MST)
    • Kruskal's Algorithm
    • Prim's Algorithm
    • directed, undirected graphs
    • bipartite graph
    • cycle
    • connected component
  • greedy algorithm
    • stays-ahead method
    • exchange argument
  • divide and conquer
    • recurrence relations
    • recursion tree
    • substitution method (and partial substitution method)
    • mergesort
    • Karatsuba multiplication
  • dynamic programming
    • memoization
    • tabulation
  • network flow
    • augmenting path
    • Ford-Fulkerson algorithm
    • maximum flow / minimum cut
  • computational complexity
    • intractability
    • decision problem
    • optimization problem
    • complexity classes (P, NP, NP-Complete, NP-hard)
    • The Cook-Levin Theorem
    • NP-Complete problems (SAT, VertexCover, IndependentSet, SetCover, 3-Coloring, 3SAT, ...)
    • polynomial-time reduction
    • polynomial-time verifier
  • approximation
    • approximation algorithm
    • approximation ratio
    • minimization problem, maximization problem
    • travelling salesman problem (TSP)
    • maximum cut
  • randomization
    • randomized algorithms
    • discrete probability
    • random variable
    • expected value
    • linearity of expectation
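    As a quick worked example of the last few terms: linearity of expectation says $E[X_1 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]$ even when the $X_i$ are dependent, so the expected number of fixed points of a uniformly random permutation of $n$ elements is $n \cdot \frac{1}{n} = 1$, obtained by summing the indicator random variables for each position.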

Practice problems:

  1. Divide and Conquer. An investment company has designed several new trading strategies and wants to compare them. Unfortunately, the strategies were tested on different commodities over different time periods, so the total profit earned is not a fair comparison. Instead, the company wants to compare each strategy's profit to the maximum possible profit that could have been made (with perfect hindsight) during the same time span. Each strategy was allowed to buy a fixed amount of the commodity once and then sell what it bought at some later date.

    During each testing period, the investment company recorded the minimum and maximum prices that were reached each day. Your algorithm receives a list of $n$ pairs, where the $i^{th}$ pair has the minimum price of the commodity and the maximum price of the commodity on day $i$. Your algorithm should output the largest price difference that could have been achieved by buying at the minimum price on day $i$ and selling at the maximum price on day $j > i$.

    For example, suppose $n=3$ and the (min, max) prices were $[ (8, 14), (1, 2), (4, 6) ]$. Then you should return a per-unit profit of 5, corresponding to buying for 1 on day 2 and selling for 6 on day 3 (recall that the trading strategies must buy first and can't sell on the same day).

    Clearly, there is a simple algorithm that takes time $O(n^2)$: try all possible pairs of buy/sell days and see which makes the most money. Design a divide and conquer algorithm to determine the best profit in hindsight more efficiently. Set up a recurrence that describes the running time of your algorithm, and solve it to show that your algorithm is asymptotically faster than the $O(n^2)$ brute-force approach.
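
    For reference, a direct Python transcription of the $O(n^2)$ brute-force approach described above (the function and variable names here are illustrative, not part of the problem statement):

      # Try every buy day i and every later sell day j, buying at day i's
      # minimum price and selling at day j's maximum price.
      def best_hindsight_profit_bruteforce(prices):
          """prices: list of (min_price, max_price) pairs, one per day."""
          best = float("-inf")
          n = len(prices)
          for i in range(n):
              buy = prices[i][0]            # buy at day i's minimum
              for j in range(i + 1, n):
                  sell = prices[j][1]       # sell at day j's maximum
                  best = max(best, sell - buy)
          return best

      # Example from the problem statement (returns 5):
      # best_hindsight_profit_bruteforce([(8, 14), (1, 2), (4, 6)])

    Your divide and conquer algorithm should produce the same answers, only faster.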

  2. Dynamic Programming. In the Subset-Sum problem, you are given $n$ items $\{1, ..., n\}$. Each item has a nonnegative integer weight $w_i$. You are also given an integer weight threshold $W$. For a subset of elements $S$, define $w(S)$ to be the sum of $w_i$, taken over all $i$ in $S$. Your goal is to output a subset $S$ that maximizes $w(S)$, subject to $w(S) \leq W$.
    The Subset-Sum problem can be solved using a dynamic programming approach where the subproblems use a subset of the elements and a smaller threshold. The dynamic programming table stores in entry $(i, j)$ the solution to the subset sum problem for items $\{1, ..., i\}$ with threshold $j$.
    1. Explain how to calculate SubsetSum$(i, j)$, assuming that the table already contains correct answers to all smaller subproblems $(i', j')$, namely those such that either ($i' \leq i$ and $j' < j$) or ($i' < i$ and $j' \leq j$).
    2. What is the running time of the dynamic programming algorithm that fills this table for successively larger subsets and thresholds and then returns entry $(n, W)$?
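
    If you want to check your answers, here is one possible bottom-up tabulation in Python, using a standard take-or-skip formulation; it is only a sketch, and the names (subset_sum, T) are illustrative:

      # T[i][j] = the largest achievable weight using only items 1..i with
      # threshold j; the answer to the full problem is T[n][W].
      def subset_sum(weights, W):
          """weights: list of nonnegative integer weights w_1..w_n."""
          n = len(weights)
          T = [[0] * (W + 1) for _ in range(n + 1)]
          for i in range(1, n + 1):
              w_i = weights[i - 1]
              for j in range(W + 1):
                  T[i][j] = T[i - 1][j]                       # skip item i
                  if w_i <= j:                                # take item i, if it fits
                      T[i][j] = max(T[i][j], w_i + T[i - 1][j - w_i])
          return T[n][W]   # the set S itself can be recovered by tracing back
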
  3. Intractability. Consider the following decision version of Subset-Sum, which we'll call SSum. In this version, you're given $n$ items $\{1,..., n\}$ with item weights $w_i$, and a weight threshold $W$, and you must output YES iff there exists a subset whose item weights total exactly $W$; i.e., if there is $S$ such that $w(S) = W$.
    1. Show that SSum is NP-Complete.
    2. Show how to solve SSum using your solution for Subset-Sum from the previous problem.
    3. Why does your answer to part (2) not contradict your dynamic programming algorithm for Subset-Sum (assuming $P \neq NP$)?
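
    To make the connection in part (2) concrete, one possible sketch on top of the subset_sum function sketched under Problem 2 (names are illustrative):

      # The largest achievable weight that does not exceed W equals W exactly
      # when some subset sums to W, so the decision version reduces to one
      # call to the optimization version.
      def ssum_decision(weights, W):
          return subset_sum(weights, W) == W
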
  4. Network flow. (Kleinberg and Tardos 7.12) You are given a flow network with unit-capacity edges: it consists of a directed graph $G=(V,E)$, a source $s \in V$, a sink $t \in V$, and $c_e = 1$ for all $e \in E$. You are also given an integer $k$.

    Your goal is to delete $k$ edges in order to reduce the maximum $s \rightsquigarrow t$ flow in $G$ by as much as possible. In other words, you should find a set of edges $F \subseteq E$ so that $|F| = k$ and the maximum $s \rightsquigarrow t$ flow in $G' = (V, E \setminus F)$ is as small as possible.

    Give a polynomial-time algorithm to solve this problem.
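
    A maximum-flow subroutine will likely be useful here. As a reference, here is one compact Edmonds-Karp style (BFS-based Ford-Fulkerson) sketch in Python; it is generic, not the solution to the edge-deletion problem, and the representation (a dict mapping each node to a dict of its out-neighbors' capacities) and names are illustrative:

      from collections import deque

      def max_flow(graph, s, t):
          # Build a residual-capacity map, adding reverse edges with capacity 0.
          residual = {u: dict(nbrs) for u, nbrs in graph.items()}
          for u, nbrs in graph.items():
              for v in nbrs:
                  residual.setdefault(v, {}).setdefault(u, 0)
          flow = 0
          while True:
              # BFS for a shortest augmenting s -> t path in the residual graph.
              parent = {s: None}
              queue = deque([s])
              while queue and t not in parent:
                  u = queue.popleft()
                  for v, cap in residual.get(u, {}).items():
                      if cap > 0 and v not in parent:
                          parent[v] = u
                          queue.append(v)
              if t not in parent:
                  return flow                      # no augmenting path remains
              # Find the bottleneck along the path, then push that much flow.
              bottleneck, v = float("inf"), t
              while parent[v] is not None:
                  u = parent[v]
                  bottleneck = min(bottleneck, residual[u][v])
                  v = u
              v = t
              while parent[v] is not None:
                  u = parent[v]
                  residual[u][v] -= bottleneck
                  residual[v][u] += bottleneck
                  v = u
              flow += bottleneck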

  5. Approximation Algorithms. (Kleinberg and Tardos 11.10) Suppose you are given an $n \times n$ grid graph $G$. Associated with each node $v$ is a weight $w_v$, which is a nonnegative integer. You may assume that the weights of all nodes are distinct. Your goal is to choose an independent set $S$ of nodes of the grid so that the sum of the weights of the nodes in $S$ is as large as possible. Consider the following greedy algorithm for this problem:
    1. Start with $S$ equal to the empty set
    2. While some node remains in $G$
      • Pick the heaviest remaining node $v$.
      • Add $v$ to $S$
      • Delete $v$ and all its neighbors from $G$
    3. Return $S$
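
    In Python, a direct transcription of this greedy might look like the following sketch (the grid is represented implicitly by coordinates; the function name and the weight map w are illustrative):

      def greedy_grid_independent_set(n, w):
          """w: dict mapping each grid node (r, c), 0 <= r, c < n, to its weight."""
          remaining = {(r, c) for r in range(n) for c in range(n)}
          S = set()
          while remaining:
              v = max(remaining, key=lambda u: w[u])    # heaviest remaining node
              S.add(v)
              r, c = v
              neighbors = {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}
              remaining -= neighbors | {v}              # delete v and its neighbors
          return S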

    First, let $S$ be the independent set returned by this algorithm, and let $T$ be some other independent set in $G$. Show that for each node $v$ in $T$, either $v$ is in $S$, or $v$ is adjacent to some $v'$ in $S$ with $w_{v'} \geq w_v$.

    Show that this algorithm returns an independent set with total weight at least 1/4 of the maximum total weight of an independent set.

  6. Randomized Algorithms. (Kleinberg and Tardos 13.8) Let $G =(V,E)$ be an undirected graph with $n$ nodes and $m$ edges. For a subset of vertices $X\subseteq V$, let $G[X]$ denote the subgraph induced on $X$; i.e., the graph whose vertices are $X$ and whose edge set consists of all edges in $G$ with both endpoints in $X$.
    We are given a natural number $k \leq n$ and are interested in finding a set of $k$ nodes that induces a "dense" subgraph of $G$; we'll phrase this concretely as follows. Give a randomized algorithm that produces, for a given graph $G$ and natural number $k$, a set $X$ of $k$ vertices with the property that the induced subgraph $G[X]$ has at least $\frac{mk(k-1)}{n(n-1)}$ edges.

    Your algorithm should run in expected polynomial time, but it should always output a correct answer. What is the expected running time of your algorithm?
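
    One natural approach, which you would still need to analyze to establish the expected running time, is to sample a uniformly random $k$-subset and re-sample until the edge bound is met. A Python sketch of that loop, with illustrative names (vertices is a list of nodes, edges a list of node pairs):

      import random

      def dense_subset(vertices, edges, k):
          n, m = len(vertices), len(edges)
          target = m * k * (k - 1) / (n * (n - 1))
          while True:
              X = set(random.sample(vertices, k))       # uniformly random k-subset
              induced = sum(1 for (u, v) in edges if u in X and v in X)
              if induced >= target:                     # Las Vegas: only stop when correct
                  return X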