Week 9: Search Algorithms
Week 9 Goals
- Search Algorithms (linear and binary)
- Complexity Analysis of algorithms
Week 9 Files
- guessing.py: a number guessing game
- search.py: start of a program to implement some searching algorithms
- search_worksheet.pdf: a binary search worksheet on the webpage. At the top of the file are instructions on how to print this to a CS lab printer.
Search Motivation
Computer Science as a discipline is focused on two primary questions:
- What types of problems can be solved computationally?
- How efficiently can these problems be solved?
One of the core problems computers are asked to solve is the search problem. Broadly speaking, the search problem searches for a query item in a (potentially very large) set of potential matches. For two large internet companies, search is one part of their core business model.
Searching efficiently can help you solve larger problems, help more customers, or make larger profits. So how do we organize data and write code/algorithms to search efficiently? And what do we even mean by efficient?
As concrete examples, Google once found that a half-second delay caused a 20% drop in traffic, and 100 extra milliseconds of page load time reduced Amazon’s revenue substantially.
Consider the two game modes in the number guessing game. The modes differ slightly in how they give you feedback when your guess is wrong. Try playing both games. Do you have a different strategy for one game than the other? Why?
Motivating example: Number Guessing Game
To help us with our upcoming analysis of searching, let’s consider a number guessing game, guessing.py. Here are the rules:
- At the start, the program asks the user which mode to play the game in: easy or hard.
- The game then chooses a random target number between 0 and 200.
- The game then repeatedly prompts the user for a guess of the number. It checks the user’s number against the target. If they match, the user has won and the game ends. Otherwise:
  - In easy mode, it tells the user whether the target is lower or higher than the value they entered.
  - In hard mode, it just tells the user whether the guess was right or wrong, but doesn’t give any additional hints or feedback.
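The feedback rules above can be sketched as a small function. This is only an illustration and not the code in guessing.py: the names give_feedback and play, the mode strings "easy" and "hard", the returned messages, and the assumption that the range 0 to 200 includes both endpoints are all made up for this sketch.

```python
import random


def give_feedback(guess, target, mode):
    """Return the message the game would show for one guess (hypothetical helper)."""
    if guess == target:
        return "correct"
    if mode == "easy":
        # Easy mode: say in which direction the target lies.
        return "higher" if target > guess else "lower"
    # Hard mode: no hint beyond right or wrong.
    return "wrong"


def play(mode, guesses, target=None):
    """Run one game over a prepared sequence of guesses.

    Returns the number of guesses used to win, or None if the guesses
    run out first. A real game would read each guess with input().
    """
    if target is None:
        target = random.randint(0, 200)  # assumes both endpoints are allowed
    for count, guess in enumerate(guesses, start=1):
        if give_feedback(guess, target, mode) == "correct":
            return count
    return None
```

Passing the guesses in as a list keeps the sketch easy to test; for example, play("hard", [1, 2, 100], target=100) wins on the third guess.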
Extra practice on your own
An implementation of this game has been provided to you, but for extra practice outside of class, you can try doing the design and implementation of this game yourself or with a partner.
After you have completed the design, implement the program and test it.
Extra challenge
Think about how you might add another game mode: either extra easy or extra hard, and implement one such mode if time permits.
Algorithmic Complexity
In addition to learning and coding a few searching and sorting algorithms over the next two weeks, we’ll also start analyzing algorithms. We can analyze and compare algorithms by classifying them into broad complexity categories so we can compare one type of algorithm to another without worrying about details like implementation language used or speed of the physical machine.
To analyze the complexity of an algorithm, we need to consider several questions. For now, we’ll think about these in the context of searching, but these ideas apply much more generally too:
- What resources are we trying to optimize for? (e.g., minimize time to win the game, minimize CPU time, minimize memory usage)
- How do we analyze how long it takes for an algorithm to finish?
- If we want to count the "steps" needed to complete a task, what counts as a step? (e.g., number of guesses, number of comparisons made in a search)
- Do we care about the best case scenario, what happens on average, or the worst case?
Let’s draw a rough analysis of our guessing game on the board.
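As a rough version of that analysis, the sketch below counts guesses for one plausible strategy in each mode: guessing 0, 1, 2, and so on in hard mode, and repeatedly guessing the midpoint of the remaining range in easy mode. The function names, and the assumption that the range is 0 through 200 inclusive (201 possible targets), are illustrative and not taken from guessing.py.

```python
def guesses_sequential(target, low=0, high=200):
    """Hard-mode strategy: try low, low + 1, ... until the target is hit."""
    count = 0
    for guess in range(low, high + 1):
        count += 1
        if guess == target:
            break
    return count


def guesses_halving(target, low=0, high=200):
    """Easy-mode strategy: guess the midpoint, then discard the wrong half."""
    count = 0
    while low <= high:
        guess = (low + high) // 2
        count += 1
        if guess == target:
            break
        elif target < guess:
            high = guess - 1
        else:
            low = guess + 1
    return count


# Worst case over every possible target (assumes 0..200 inclusive).
worst_sequential = max(guesses_sequential(t) for t in range(201))  # 201
worst_halving = max(guesses_halving(t) for t in range(201))        # 8
```

Here a "step" is one guess. The best case is a single guess for both strategies; the worst cases are what differ so sharply.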
Linear Search
To explore the complexity of search, we will narrow the problem to searching for an item x in a Python list of items. Python already has two ways of doing this. The first is the Boolean in operator, x in ls, which returns True if x is in the list ls and False otherwise. Python also supports the index() method, which will tell you the position in the list of the first occurrence of x, provided x appears in the list. So ls.index(x) will return an integer position if x is in ls. If x is not in the list, Python will generate an error called an exception that will likely crash your program, since we have not talked much about how to handle exceptions in this class.
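A short illustration of both built-ins on a made-up list. The exception Python raises for a missing item is a ValueError; the try/except at the end is only there to show that without crashing, and is not something these notes rely on elsewhere.

```python
ls = [8, 3, 5, 3]

print(5 in ls)      # True: 5 appears in the list
print(7 in ls)      # False: 7 does not appear
print(ls.index(3))  # 1: position of the first 3, not the second

# Asking for the position of a missing item raises ValueError.
try:
    ls.index(7)
except ValueError:
    print("7 is not in the list")
```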
But how do these methods actually work? At some point, a computer scientist and Python programmer designed and wrote the code for these built-in features. We will discuss the algorithm for searching a collection of items. Together, we’ll write a function that does something familiar: search through a list for an item without using the in operator. Our functions will be called contains and position_of.
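One possible shape for these two functions is sketched below. The parameter order (ls, x) and the choice to return -1 when the item is missing are assumptions made for this sketch; the versions in search.py may differ.

```python
def contains(ls, x):
    """Return True if x appears in ls, False otherwise (linear search)."""
    for item in ls:
        if item == x:
            return True   # found it, no need to look further
    return False          # checked every item without a match


def position_of(ls, x):
    """Return the index of the first occurrence of x in ls, or -1 if absent."""
    for i in range(len(ls)):
        if ls[i] == x:
            return i
    return -1
```

Both functions look at items one at a time from the front, so in the worst case (item missing, or in the last position) they examine every item in the list.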
Example program
- Complete the program search.py
- This program reads a list of numbers from a file
- It then prompts the user for a number, and then searches through the numbers to see if the user’s selection is present
Binary Search
A key algorithmic question is: can we do better? In some cases, we can’t, and we can prove this! In the case of linear search, we cannot do better in the general case (i.e. for any type of problem we might want to use search for). However, if all items in the collection are in sorted order, we can perform a faster algorithm known as binary search.
Binary search works using a divide-and-conquer approach. Each step of the algorithm divides the number of items to search in half until it finds the value or has no items left to search. Here is some pseudocode for the algorithm:
    set low = lowest possible index
    set high = highest possible index
    LOOP:
        calculate middle index = (low + high) // 2
        if item is at middle index, we're done (found it! return matching index)
        elif item is < middle item, set high to middle - 1
        elif item is > middle item, set low to middle + 1
    if low is ever greater than high, item not here (done, return -1)
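The pseudocode translates fairly directly into Python. The function name binary_search and its parameters are assumptions for this sketch, and it assumes ls is already sorted in increasing order.

```python
def binary_search(ls, x):
    """Return an index of x in the sorted list ls, or -1 if x is absent."""
    low = 0
    high = len(ls) - 1
    while low <= high:             # still at least one item left to check
        mid = (low + high) // 2
        if ls[mid] == x:
            return mid             # found it
        elif x < ls[mid]:
            high = mid - 1         # x can only be in the lower half
        else:
            low = mid + 1          # x can only be in the upper half
    return -1                      # low passed high: x is not here
```

For example, binary_search([1, 3, 5, 7, 9, 11], 7) checks index 2, then index 4, then index 3, and returns 3.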
- How and why does this algorithm work?
- Why does it require the list to be in order before we begin?
- Why is this faster than linear search?
- How much faster will it be?
Binary search worksheet
To help illustrate the behavior of binary search, print out the binary search worksheet which will let you trace the algorithm through four examples.
Comparing Linear and Binary Search
Our intuition tells us that binary search is faster than linear search, but how much faster, and how can we talk about the relative speed of two algorithms? Consider a list of size \(n\). How many steps does each algorithm take to find an item in the list in the worst case? How does this change as the size of the list grows? In the case of linear search, we have to look at potentially every item in the list once to determine that our search query is not in the list.
In the case of binary search, each time we look at one item in the list, we gain information about where our query might be located in the list. In one step of binary search, we can eliminate half of the remaining items from consideration. This savings adds up quickly as the size of the list grows.
| List size | Max steps (linear) | Max steps (binary) |
| --- | --- | --- |
| \(1=2^0\) | 1 | 1 |
| \(2=2^1\) | 2 | 2 |
| \(4=2^2\) | 4 | 3 |
| \(8=2^3\) | 8 | 4 |
| \(16=2^4\) | 16 | 5 |
| \(32=2^5\) | 32 | 6 |
| \(64=2^6\) | 64 | 7 |
| \(128=2^7\) | 128 | 8 |
| \(1024=2^{10}\) | 1024 | 11 |
| \(2^{20}\) | about 1 million | 21 |
| \(2^{30}\) | about 1 billion | 31 |
| \(n\) | \(n\) | \(\log_2(n)+1\) |
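The binary column of the table can be checked by counting how many items a binary search examines. In the sketch below a "step" is one look at a middle item, the list is taken to be the sorted numbers 0 through n-1 (so the item at index i is just i), and the function names are made up for this illustration. The linear column needs no simulation: in the worst case, linear search examines all n items.

```python
def binary_steps(n, target):
    """Count items examined when binary-searching the sorted list 0..n-1."""
    low, high = 0, n - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2   # in this list, the item at index mid is mid
        steps += 1
        if mid == target:
            break
        elif target < mid:
            high = mid - 1
        else:
            low = mid + 1
    return steps


def binary_steps_worst(n):
    """Largest step count over every target in a list of n items."""
    return max(binary_steps(n, t) for t in range(n))


for n in [1, 2, 4, 8, 16, 32, 64, 128, 1024]:
    print(n, n, binary_steps_worst(n))   # size, linear worst case, binary worst case
```

Trying every target is too slow for the last rows of the table. For a power of two, the worst case equals Python's n.bit_length(), so (2**30).bit_length() gives 31 without running the search.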
We say that linear search scales linearly with the size of the list. If we double the size of the list, we expect the total run time for a search to roughly double.
Binary search scales logarithmically with the size of the list. If we double the size of the list, we expect the total run time for a search to increase by just a small amount. Even for very large lists containing over 1 billion elements, binary search is very fast.
For binary search to work correctly, the list must be sorted. This is a key difference between linear and binary search. Linear search works on any list, but binary search only works on lists that are sorted by the same value we are searching for.
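To see what goes wrong on an unsorted list, the sketch below uses bisect_left from Python's standard bisect module, which performs a binary search for the leftmost position where an item could be inserted. The helper name sorted_contains and the sample list are made up for this illustration, and bisect is not otherwise used in these notes.

```python
from bisect import bisect_left


def sorted_contains(ls, x):
    """Binary-search ls for x; only reliable when ls is sorted."""
    i = bisect_left(ls, x)   # leftmost position where x could be inserted
    return i < len(ls) and ls[i] == x


unsorted = [9, 1, 7, 3, 5]
in_order = sorted(unsorted)          # [1, 3, 5, 7, 9]

print(9 in unsorted)                 # True: linear search finds it
print(sorted_contains(unsorted, 9))  # False: binary search discarded the half holding 9
print(sorted_contains(in_order, 9))  # True
```

On the unsorted list, the first middle item is 7, so the search moves right and never looks at index 0, where the 9 actually is.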
How long does it take to sort a list? Is this faster or slower than searching? Do we have to sort the list every time we want to search it? What kind of search (linear or binary) do you think Python uses when you call ls.index(x)?