CS21 Lab 9: Searching
Due Saturday, April 9, by 11:59pm
Please read through the entire lab before starting!
This is a one week lab. For the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. Since the design process should be similar to the previous lab, you will be doing the design and implementation for this lab in a single week. If you found the open-ended nature of the last lab challenging, you will want to start this lab early.
Additionally, all the "standard" requirements from previous labs such as programming tips, function comments, etc. still apply; feel free to refer to previous lab pages as a reminder.
Goals
-
Write a program that uses linear search
-
Write a program that uses binary search
-
Continue practicing Top-Down Design (TDD)
-
Connect CS topics with real-world data
1. Searching the dam data
1.1. Hydroelectric Dam Data Set
For this this assignment, we will be working with a real-world dataset from CORGIS (which is an acronym for: The Collection of Really Great, Interesting, Situated Datasets). We will be using a slightly modified version of this dataset that contains records on a number of hydroelectric dams across the United States. This includes information on things like when each dam was built, where it is located, and structural properties like the height and length of the crest (i.e. the top of the dam). For this lab, we’ll be focusing on just a few of these features.
This data is stored in CSV (Comma Separated Value) format and is located at:
/data/cs21/hydropower/hydropower.csv
there are actually two versions of the data file; the primary one is:
/data/cs21/hydropower/hydropower.csv
but there’s also a copy that’s had most of the lines removed, which may be
easier to use for testing your program, particularly in the early stages:
/data/cs21/hydropower/hydropower_small.csv
|
Here are the first three lines and last three lines from this file:
Agate Dam,Dry Creek,Jackson,Oregon,1520,3800,1965 Agency Valley Dam,North Fork of the Malheur River,Malheur,Oregon,3348,1850,1934 Alcova Dam,North Platte River,Natrona,Wyoming,5510,76300,1935 ... Yellowstone River Diversion Dam,Yellowstone River,Yellowstone,Montana,3004,325,1934 Yellowtail Afterbay Dam,Bighorn River,Big Horn,Montana,3197.5,1360,1964 Yellowtail Dam,Bighorn River,Big Horn,Montana,3660,1480,1963
Notice that the file is sorted alphabetically by dam name.
Each line contains 7 features. The following table provides information about these features.
Feature index |
Type |
Description |
0 |
string |
Dam name |
1 |
string |
Watercourse name |
2 |
string |
County |
3 |
string |
State |
4 |
float |
Crest Elevation |
5 |
float |
Crest Length |
6 |
int |
Year |
In this lab we will focus on searching for features 0 and 3 (plus 4 and 5 if you do the optional challenge).
You will implement a program called search-dams.py
, which will allow the user
to search through this dataset and learn particular information.
1.2. Program Requirements
-
Reads in stored data from a file, storing a list of records. The data is stored in the file:
/data/cs21/hydropower/hydropower.csv
Note: don’t copy this file to your
labs/09
directory, just use"/data/cs21/hydropower/hydropower.csv"
as the file name in your python program.As you read in the data, each line of the file corresponds to one record, which your program should represent using a HydroDam object and store in a list.
-
Repeatedly presents a menu to the user with three choices: to search by dam name, to search by state, or to quit. For example:
Please select one of the following choices: (1) Find by Name (2) Find by State (3) Quit Choice?
Depending on the user’s selection, prompt them for additional information and then do one of:
-
Search the dataset by name, using binary search.
-
Search the dataset by state, using linear search.
-
Exit the program.
-
-
When searching, compares data in lowercase format (so that user input is case insensitive) but displays results with their original case.
-
Ensures that the user provides valid input (menu choices and state names).
-
Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.
-
Uses linear search to find matches by state.
-
Uses binary search to find matches by name.
1.3. Example Output
-
Here are some examples of searching:
-
Here are some examples of handling valid/invalid input:
1.4. Tips
1.4.1. Continue to use TDD
We strongly recommend that you create a top-down design with a fully fleshed
out main
and stubs for all other functions. However, you may not work with a
partner on this lab. Also, you don’t need to get approval from an instuctor for
your design, though we’d be glad to review it with you in lab, office hours, or
ninja sessions. You should only begin implementing the other functions after
you have a clear plan in place for the structure of main
.
1.4.2. Create modular, reusable functions
Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of searchs (by name and by state), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of search results and displays them. Then you could call this function for either type of search.
1.4.3. Displaying matches
We want the results of our searches to be nicely displayed in the terminal window. However, some game names in the data set are quite long. We can use slicing to truncate these strings to a fixed maximum length, allowing us to create an easy to read table format. This section of the online textbook explains how to slice a string. For instance, the following example shows how to slice out the first 10 characters from a string:
>>> alphaString = "abcdefghijklmnopqrstuvwxyz"
>>> alphaString[0:10]
'abcdefghij'
Notice how this nicely works, even if we provide an upper bound that is longer than the string we’re trying to slice (unlike when dealing with lists):
>>> shortString = "xyz"
>>> shortString[0:10]
'xyz'
Recall also that you can specify the width of a field when printing, e.g.
print("%10s" % ("hi"))
will print "hi" with 8 spaces in front of it. You can
also use the -
flag to get left-justified printing (the default is
right-justified), e.g. print("%-10s %10s" % ("hi", "there"))
will print "hi"
followed by 8 spaces, and then "there" with 5 spaces in front of it:
>>> print("|%-10s|=|%10s|" % ("hi", "there"))
|hi |=| there|
By combining these methods, you can make your output look like a nice table where everything is aligned (as demonstrated in the example output)
1.4.4. Adapting linear search
The version of linear search that we discussed in class stops as soon as it finds the item it is looking for, and returns the index of that item or -1 if not found. For this problem, your linear search will need to accumulate a list of matching results and return that.
In addition, the basic version of linear search assumes that you are searching a flat list. We will be searching a list of objects, so will need to adapt linear search to use method syntax to get at the data within each object.
1.4.5. Adapting binary search
Because the data is sorted by dam name, we can use binary search to very quickly locate a particular dam. However, names can be somewhat long. Additionally, the data may contain multiple dams with similar name prefixes.
For example, there are several dam names that start with the word "Glen":
...
Gibson Dam <- this has a different prefix
Glen Anne Dam
Glen Canyon Dam <- suppose binary search finds this one
Glen Elder Dam
Glendo Dam
Granby Dam <- this has a different prefix
...
Finding the first match
First write a binary search function that searches for a partial match to a
dam name and returns the index of the first match that it finds (or -1
otherwise). Make sure your binary search function uses startswith
(see
here)
rather than ==
so it doesn’t miss any partial matches (e.g. using "Glen" to
match on "Glendo Dam").
However, there might be more matches that are just before or just after that initial index you locate. So next you’ll need to search in the vicinity of that location to find all of the other potential matches.
Fortunately, you can use the provided library function for this.
2. Provided Library
The lab comes with a hydrodam_lab
library that gives you several helpful
tools intended to make this lab easier:
2.1. HydroDam Record Class
The library provides a HydroDam
class, which encapsulates all the features
needed to store one "record." You can create an object of this class by
calling the HydroDam()
constructor with each field of information passed in
as separate parameters. The types of the fields are specified in the table below.
Once you have a HydroDam
object, you can use methods to retrieve information about it:
Parameter |
Type |
Description |
Getter Method |
0 |
string |
Dam name |
|
1 |
string |
Watercourse name |
|
2 |
string |
County |
|
3 |
string |
State |
|
4 |
float |
Crest Elevation |
|
5 |
float |
Crest Length |
|
6 |
int |
Year |
|
You’ll need to get the various fields from the CSV file, convert them to the
correct types, and then pass them to the constructor. You can use the string
method .split()
in order to turn one long string into a list of shorter
strings. This method takes an argument to let you specify what to use as the
"separator;" in this case, you’ll want to split based on commas, using a
command like:
s.split(",")
The constructor creates an object from some parameters:
myDam = HydroDam("My Dam", "Some River", "Some County", "A State", 100.0, 500.0, 2022)
2.2. Finding nearby matches
We have conveniently provided you with a function to help with finding nearby
matches based on an initial seed. This is called lookForwardBackward()
and
you can import it from the hydrodam_lab
library. This function takes four
parameters and returns a list of matching items that it found looking either
forward or backwards. Those parameters are:
-
A list with the data to search over
-
An integer where the search should start from (this is the index that your "binary search" function will return)
-
The partial match (string) that the function should be searching for (e.g. "LEGO" in the above example)
-
A search direction. This is an integer which should have the value 1 (to search forward) or -1 (to search backwards)
You will need to import the function at the top of the program (from hydrodam_lab import *
)
and call it twice for each binary search (once to get the forward matches and
again to get the backward matches).
A call to the lookForwardBackward()
function might look something like this:
backwardMatches = lookForwardBackward(currData, currIndex, name, -1)
3. Extra Challenges
Here are TWO optional extensions! Neither is required, but either or both might be fun and rewarding.
3.1. Optional: Finding the biggest dams
After completing a search by state, inform the user of which is the "biggest" dam according to both crest elevation and crest length.
Here is an example run showing how this extra challenge should work:
3.2. Optional: Extending your own binary search
Rather than importing the provided lookForwardBackward
function, try writing
your own. This should work the same way as the provided version: it will look
linearly forwards or linearly backwards based on an initial seed index that it
takes as a parameter. The function will also need to take the search string as
an input parameter (as well as potentially other information depending on your
design). This function should return a list of all the partial matches based on
that initial seed index.
The program can then add that list together with the entry at the initial seed to compute the total list of partial matches which are then displayed to the user.
Answer the Questionnaire
Each lab will have a short questionnaire at the end. Please edit
the Questions-09.txt
file in your cs21/labs/09
directory
and answer the questions in that file.
Once you’re done with that, you should run handin21
again.
Submitting lab assignments
Remember to run handin21
to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21
after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using.
When Remotely logged in
When you are ssh’ed into the CS labs, first quit any applications you are
running, like vim
, then simply type exit
at the prompt in your terminal
window to disconnect.
When Physically logged in
When you are in a CS lab logged into a CS machine. First quit any applications you are running, like the browser and the terminal. Then click on the logout icon ( or ) and choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!