WEEK08: searching, analysis of algorithms
---------------------------------------------------------------
F: timing, real-world example...anagrams
PYLAB:
- copy over pylabLinePlots.py and try it out
- http://matplotlib.sourceforge.net (good data plotting from python)
- note how logn is better than linear for large N (we'll look
at the nlogn and quad lines next week)
SHUFFLE and SORT:
>>> from random import *
>>> L = range(10)
>>> shuffle(L)
>>> print L
[4, 2, 1, 5, 8, 7, 0, 3, 6, 9]
>>>
>>> L.sort()
>>> print L
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sort() is a list method, shuffle is a function from the
random library. Both do their work "in place", meaning the
list (L) is changed.
TIMING SEARCHES:
- write some code to show how the time for a linear search
doubles when we double the size of the data (N)
- also, how does the "in" operator work? Does it do a linear
search or a binary search??
from searches import *
from random import *
from time import time
##############################################
def main():
"""ask user for size of data, run searches"""
N = input("size of data, N: ")
T = input(" num trials, T: ")
L = []
for i in range(N):
L.append(randrange(-N,N))
L.sort()
lintotal = 0
bintotal = 0
intotal = 0
for t in range(T):
x = randrange(-N,N)
t1 = time()
linearSearch(x, L)
t2 = time()
lintotal = lintotal + (t2 - t1)
t1 = time()
binarySearch(x, L)
t2 = time()
bintotal = bintotal + (t2 - t1)
t1 = time()
x in L
t2 = time()
intotal = intotal + (t2 - t1)
print "ave linear time: %0.7f" % (lintotal/float(T))
print "ave binary time: %0.7f" % (bintotal/float(T))
print "ave in time: %0.7f" % (intotal/float(T))
##############################################
main()
$ python timesearches.py
size of data, N: 200000
num trials, T: 20
ave linear time: 0.0186138
ave binary time: 0.0000101
ave in time: 0.0085083
$ python timesearches.py
size of data, N: 400000
num trials, T: 20
ave linear time: 0.0469730
ave binary time: 0.0000128 <--- does not double!
ave in time: 0.0235038
ANAGRAMS:
- copy /home/jk/inclass/anagrams.py and run the code:
$ python anagrams.py
-- Welcome to A N A G R A M --
Enter a word and I'll find the anagrams (QUIT to quit): miles
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
miles
limes
smile
slime
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
- look at the code to see how it works. You should be able to understand
all but the Anagram function, which uses recursion (we'll cover that
next week)
- the Anagram function just creates a list of all possible permutations
of the given word. If you type in cat, Anagram creates the following
list: ['cat', 'act', 'atc', 'cta', 'tca', 'tac']
- this list of all permutations is sent to displayAnagram which only
prints out the real English words.
- what happens when I type in an 8-letter word???
- why does it run so slow???
- for a 3-letter word there are 6 permutations. How many are there
for a 4-letter word? How about a 5-letter word??
YOUR TURN:
- change the displayAnagram function to use a binary search, instead
of this linear search:
if w in english and w not in uniqalist:
make that something like this:
if binarySearch(w,english) and w not in uniqalist:
so your binarySearch function needs to return a True if the word is
found, and a False if not found.
RESULTS:
- how many anagrams does the word "education" have?
- how long does it take to figure that out with
binarysearchanagrams.py vs. anagrams.py??