CS21 Lab 7: Searching
Due Saturday, November 7th, before midnight
Read through this entire lab before you begin coding!
Content warning: This lab deals with data about hospitals and the costs associated with certain medical issues (heart procedure, pneumonia, and knee/hip replacement). If dealing with this subject matter would make it difficult for you to work on the assignment, please contact an instructor as soon as possible to discuss alternatives.
Goals
-
Continue to gain proficiency at top-down design and bottom-up implementation.
-
Practice reading data from a file.
-
Practice using a list of lists to store data.
-
Understand how to adapt linear search and binary search to real-world data.
Hospital dataset
For this assignment, we will be working with a real-world dataset from CORGIS (which is an acronym for: The Collection of Really Great, Interesting, Situated Datasets). We will be using a dataset that contains information about hospitals throughout the US with the goal of helping consumers make informed choices. This dataset contains many features, but we will be focusing on a small subset of these features.
The hospital data is stored in CSV format and is located at:
/data/cs21/hospital/hospitals.csv
Here are the first five lines and last five lines from this file:
Abbeville Area Medical Center,Abbeville,SC,4,0,14548,24265 Abbeville General Hospital,Abbeville,LA,4,0,16666,23508 Abbott Northwestern Hospital,Minneapolis,MN,4,21776,17981,19259 Abilene Regional Medical Center,Abilene,TX,2,23627,18312,25205 Abington Memorial Hospital,Abington,PA,2,23358,16744,20167 ... Yukon Kuskokwim Delta Reg Hospital,Bethel,AK,3,0,11547,0 Yuma District Hospital,Yuma,CO,3,0,16974,0 Yuma Regional Medical Center,Yuma,AZ,3,24631,18413,22354 Zuckerberg San Francisco General Hosp & Trauma Ctr,San Francisco,CA,1,20738,15537,24723 Zuni Comprehensive Community Health Center,Zuni,NM,-1,0,14541,0
Notice that the file is sorted by hospital name.
Each line contains seven features. The following table provides more detailed information about each of these features.
Feature |
Type |
Description |
Notes |
0 |
string |
Hospital name |
|
1 |
string |
City |
|
2 |
string |
State |
two-letter mail code |
3 |
integer |
Overall rating 1-5 |
-1 means not rated |
4 |
integer |
Avg cost of heart procedure |
0 means not reported |
5 |
integer |
Avg cost of pneumonia |
0 means not reported |
6 |
integer |
Avg cost of hip/knee replacement |
0 means not reported |
In this lab we will focus on features 0-3. In the next lab, which adds sorting, we will incorporate more functionality for features 4-6.
Using search to explore hospital data
You will implement a program called search-hospitals.py
, which will
allow the user to search through this dataset and learn interesting
information.
Users may search by location:
Or users may search by hospital name:
Program Requirements
-
All numeric data in the file should be type cast to the correct type when stored in the program.
-
Repeatedly presents a menu to the user with three choices: to search by city and state, to search by name, or to quit.
-
Ends when the user chooses to quit.
-
Ensures that the user provides a valid menu choice.
-
Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.
-
Summarizes the results of each search with a count and average overall rating (excluding ratings that were not reported).
-
Uses linear search to find exact matches by city and state.
-
Uses binary search to find matches by the prefix of a hospital name.
Tips
1. Continue to use TDD
We strongly recommend that you create a top-down design with a fully
fleshed out main
and stubs for all other functions. However, you
may not work with a partner on this lab. Also you don’t need to get
approval from an instuctor for your design, though we’d be glad to
review it with you in lab, office hours, or ninja sessions. You
should only begin implementing the other functions after you have a
clear plan in place for the structure of main
.
2. Create modular, reusable functions
Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of searchs (by location and by name), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of search results and displays them. Then you could call this function for either type of search.
3. Displaying matches
We want the results of our searches to be nicely displayed in the terminal window. However, some hospital names (and city names) are quite long. We can use slicing to truncate these strings to a fixed maximum length, allowing us to create an easy to read table format. This section of the online textbook explains how to slice a string.
4. Adapting linear search
The version of linear search that we discussed in class stops as soon as it finds the item it is looking for, and returns the index of that item or -1 if not found. For this problem, your linear search will need to accumulate a list of matching results and return that.
In addition, the basic version of linear search assumes that you are searching a flat list. We will be searching a list of lists, so will need to adapt linear search to use double indexing to get at the data within each sublist.
5. Adapting binary search
Because the hospital data is sorted by hospital name we can use binary search to very quickly locate a particular name. However, hospital names are quite long, and users may not know the exact name. So instead of searching for the full name, we will allow users to search using a prefix.
To see if a string begins with a particular prefix, the syntax is:
<string>.startswith(<prefix>)
which will return True
or False
.
For example, there are quite a few hospitals that start with
"Mary"
as shown below:
... Martin Medical Center <- this has a different prefix Mary Black Health System Gaffney Mary Black Health System Spartanburg Mary Breckinridge Arh Hospital Mary Bridge Children's Hospital Mary Greeley Medical Center Mary Hitchcock Memorial Hospital Mary Hurley Hospital Mary Immaculate Hospital Mary Imogene Bassett Hospital Mary Lanning Healthcare <- suppose binary search finds this Mary Rutan Hospital Mary Washington Hospital Inc Marymount Hospital Mason District Hospital <- this has a different prefix ...
You should modify binary search so that it uses startswith
rather
than ==
to find name matches. However, things get interesting once
you find a match. Binary search will inform you of the index of the
first match it finds, but there might be more matches just before that
index and just after that index. So you’ll need to search in the
vicinity of that location to find all of the other potential matches.
For example, suppose that the first match it finds for "Mary"
is the
one indicated above. You’ll need to search backward from there until
you find a name that does not begin with that prefix and you’ll need
to search forward from there until you find a name that doesn’t begin
with that prefix. You should accumulate all of those hospitals and
display their information.
Extra Challenge
This is an optional extension and is not required.
After completing a search by city and state, inform the user of which hospitals at that location offer the minimum cost for each type of procedure. Be sure to ignore hospitals with a cost of 0 as this simply means that they didn’t report any cost. Also there is no need to find the minimum costs if there is only a single match.
Here is an example run showing how this extra challenge should work:
Answer the Questionnaire
Please edit
the Questions-07.txt
file in your cs21/labs/07
directory
and answer the questions in that file.
Once you’re done with that, run handin21
again.
Turning in your labs….
Remember to run handin21
to turn in your lab files!
You may run handin21
as many times as you want. Each time it will
turn in any new work. We recommend running handin21
after
you complete each program or after you complete significant
work on any one program.