CS21 Lab 8: Searching a Dataset
Due Saturday, November 13, before midnight
Please read through the entire lab before starting!
This is a one week lab. For the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. Since the design process should be similar to the previous lab, you will be doing the design and implementation for lab 8 in a single week. If you found the open-ended nature of the last lab challenging, you will want to start this lab early.
Goals
-
Write a program that uses linear search
-
Write a program that uses binary search
-
Continue practicing top-down design (TDD)
-
Connect CS topics with real data
Earthquake Data
For this lab, we’ll be using a month’s worth of earthquake data collected by the United States Geological Survey (USGS) in 2016. The dataset is a modified version from the CORGIS educational dataset archive, and it contains 8394 entries. We have defined an earthquake class to represent one earthquake record.
Introduction: Searching the Dataset
Your task is to write a program that:
-
Reads in the dataset from a file, storing a list of records. The data is stored in the file:
/usr/local/doc/earthquakes.txt
Note: don’t copy this file to your 08 directory, just use
"/usr/local/doc/earthquakes.txt"
as the file name in your python program. Each line of the file corresponds to one record and contains the following fields (separated by semicolons):-
A unique ID (string)
-
The magnitude of the earthquake (float)
-
The full location of the earthquake (string)
-
The latitude of the earthquake (float)
-
The longitude of the earthquake (float)
-
A shortened location of the earthquake (string)
-
The date and time of the earthquake (string)
Here’s one example:
uw61189786;2.41;10km W of Spirit Lake, Idaho;47.968;-117.0135;ID;2016-08-02 18:50:57
Each line of data ends with a "newline" character, which causes the following text to start on the next line of the file. When you’re reading in data from the file, you can call the
.strip
method on each line (prior to calling.split
) to remove any or newlines or other unnecessary spaces.As you read in earthquake data, each line of the file corresponds to one record, which your program should represent using an earthquake object.
-
-
Prompts the user with a menu of four choices:
Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice?
Depending on the user’s selection, you will prompt them for additional information and then do one of:
-
Search the dataset by ID, using binary search.
-
Search the dataset by (full) location, using linear search.
-
Produce a graphical state map to display the locations of earthquakes by state.
-
Exit the program.
-
You program should continue prompting the user until they enter a valid choice and then perform the following queries listed below for each menu option. The program should exit if the user selects option 4 (Quit).
Find by ID
For option 1 (find by ID), you must:
-
Prompt the user to enter an ID.
-
Use binary search to find the ID in your list of records. The data file provided lists each earthquake in increasing order by ID.
-
Print the information for the record matching the user-entered ID, if there is one. See the note about printing records in the tips section.
-
If the ID was not in the list, let the user know you couldn’t find it.
Here are some examples (input bolded):
$ python3 earthquakes.py Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 1 ID to search for? se60029103 se60029103: magnitude 1.5 @ 8km SSW of Loudon, Tennessee (35.663, -84.3598333) on 2016-08-06 22:55:59 Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 1 ID to search for? R2D2 Could not find any records matching ID: R2D2 Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 4
Find by Location
For option 2 (find by location), you must:
-
Prompt the user for a location string.
-
Use linear search to find all the records for which the user-entered location string appears in the full location. For many locations, you are likely to find more than one matching record, so your search function should produce a list of matching records.
Recall the
in
operator for strings from week 4. Thein
operator will help you to determine whether or not the user-entered string appears in a record’s full location string. -
Print the information for all matching records, one per line, if there were any matches. See the note about printing records in the tips section.
-
If no records match the entered location, let the user know that you couldn’t find any.
Here are some examples:
$ python3 earthquakes.py Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 2 Location? Kansas ismpkansas70199223: magnitude 2.19 @ 16km S of Conway Springs, Kansas (37.2426667, -97.6583333) on 2016-07-29 00:46:35 ismpkansas70199888: magnitude 2.32 @ 17km W of Caldwell, Kansas (37.0196667, -97.8001667) on 2016-07-31 10:39:05 ismpkansas70200663: magnitude 1.96 @ 16km S of Anthony, Kansas (37.0031667, -98.0351667) on 2016-08-07 17:16:54 ismpkansas70200668: magnitude 1.88 @ 6km ENE of Caldwell, Kansas (37.0555, -97.5393333) on 2016-08-07 19:49:34 ismpkansas70200768: magnitude 2.03 @ 17km ESE of Anthony, Kansas (37.1183333, -97.8401667) on 2016-08-09 03:08:48 ismpkansas70200778: magnitude 1.97 @ 6km ENE of Caldwell, Kansas (37.0548333, -97.5411667) on 2016-08-09 03:16:22 ismpkansas70200808: magnitude 1.81 @ 17km ESE of Anthony, Kansas (37.1185, -97.8408333) on 2016-08-09 10:14:37 ismpkansas70201053: magnitude 1.85 @ 11km SSE of Conway Springs, Kansas (37.2896667, -97.6086667) on 2016-08-11 12:02:05 ismpkansas70201078: magnitude 2.14 @ 20km ENE of Anthony, Kansas (37.2113333, -97.8156667) on 2016-08-11 23:56:19 ismpkansas70201133: magnitude 2.13 @ 20km ENE of Anthony, Kansas (37.2106667, -97.8151667) on 2016-08-12 07:19:55 ismpkansas70201148: magnitude 2.31 @ 11km SSE of Conway Springs, Kansas (37.2898333, -97.6091667) on 2016-08-12 09:56:01 ismpkansas70201168: magnitude 2.28 @ 11km SSE of Conway Springs, Kansas (37.2901667, -97.6085) on 2016-08-12 16:39:04 ismpkansas70201583: magnitude 1.81 @ 16km S of Conway Springs, Kansas (37.242, -97.6586667) on 2016-08-15 10:12:10 ismpkansas70203423: magnitude 1.91 @ 13km ESE of Harper, Kansas (37.2561667, -97.8805) on 2016-08-23 10:50:38 us10006c9i: magnitude 2.7 @ 6km ENE of Caldwell, Kansas (37.0547, -97.5375) on 2016-08-09 03:10:23 us10006ckk: magnitude 2.7 @ 7km NNE of Rose Hill, Kansas (37.6196, -97.1033) on 2016-08-10 06:49:29 us10006ezf: magnitude 3.5 @ 6km SW of Cheney, Kansas (37.5848, -97.8274) on 2016-08-19 06:40:43 us10006fkc: magnitude 2.8 @ 20km SE of Anthony, Kansas (37.0396, -97.8554) on 2016-08-21 06:49:49 us10006fkw: magnitude 2.2 @ 2km NNW of Derby, Kansas (37.5718, -97.2771) on 2016-08-21 10:10:16 us10006fq3: magnitude 2.9 @ 7km S of Anthony, Kansas (37.0838, -98.0433) on 2016-08-22 04:22:00 us20006u1y: magnitude 2.1 @ 3km S of Anthony, Kansas (37.1238, -98.0251) on 2016-08-25 17:04:19 Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 2 Location? Mars Could not find any records matching that location. Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 4
Map State
For option 3 (map state), you must:
-
Prompt the user for a state abbreviation (e.g.,
CA
). -
Prompt the user for a minimum magnitude. You need to ensure that the user enters a valid float. If not, continue to prompt until they do. We have provided an
is_float
function to help you. -
If the state abbreviation is valid, you should draw the boundary of the given state (with help from a provided library) and draw a
Circle
at the location of each earthquake recorded in that state whose magnitude is larger than the minimum.When looking for matching states, use the
get_location
method of an earthquake object to retrieve the short location. For locations in the United States, the short location will contain the two-letter state abbreviation (e.g.,CA
). -
The radius of the circle should increase based on the earthquake’s magnitude. Use a radius of:
2 ** magnitude * 0.01
. (You’re welcome to experiment with different sizes, but please use this size when submitting your final version, to make the lab easier to grade.) -
If the state abbreviation is not valid, print a suitable error message.
-
After drawing the map, wait until the user clicks inside the graphics window before closing the window and returning to the main menu.
Here are some examples:
$ python3 earthquakes.py Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 3 Which state? ZZ Minimum magnitude? big Minimum magnitude? 1.3.4 Minimum magnitude? 1.4 Unknown state: ZZ Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Quit Choice? 3 Which state? CA Minimum magnitude? 3.1
Produces:
Provided Library
The earthquakes_lab
library has some helpful tools for managing earthquake
records in your program.
Earthquake Class
The library provides an Earthquake
class, which encapsulates all the
information you’ll need to keep track of one record. You can create an
instance object of this class by calling the Earthquake()
constructor with
each field of information from one row of the earthquakes.txt
file passed in
as separate parameters. The number fields (magnitude, latitude, and longitude)
are floats, and all other fields are strings.
You can call the following methods on an Earthquake
object to retrieve information about it:
-
get_id()
: Get the ID field. (string) -
get_magnitude()
: Get the magnitude field. (float) -
get_location_full()
: Get the full location field. (string) -
get_latitude()
: Get the latitude field. (float) -
get_longitude()
: Get the longitude field. (float) -
get_location()
: Get the short location field. (string) -
get_date()
: Get the date/time field. (string)
For example:
>>> from earthquakes_lab import * >>> record = Earthquake("id12345", 3.5, "Parrish Hall, Swarthmore, Pennsylvania", 39.90523, -75.35425, "PA", "2021-11-13 23:59:59") >>> print(record.get_magnitude()) 3.5 >>> print(record.get_location_full()) Parrish Hall, Swarthmore, Pennsylvania >>> print(record.get_location()) PA >>> print(record) id12345: magnitude 3.5 @ Parrish Hall, Swarthmore, Pennsylvania (39.90523, -75.35425) on 2021-11-13 23:59:59
Validating Floats
For option 3, you’ll prompt the user to enter a minimum magnitude, which needs
to be a float. The library provides an is_float
function that you can call on
a string that will return a Boolean, where True
means the string can safely
be converted with float
:
>>> from earthquakes_lab import * >>> is_float("3.1") True >>> is_float("3.1.5") False >>> is_float("big") False
Mapping a State
The library also provides a getStateGraphWin(state)
function. If you pass
this function a valid two-letter state abbreviation, the function will create
and return a GraphWin
object with the boundary of the state shown in black
and the coordinate of the window set to be the minimum and maximum longitudes
and latitudes of the state using setCoords
. If the provided state string is
not a valid abbreviation, a special None
object is returned. The None
type
is not a GraphWin
object and you cannot do anything with it besides check if
it is None
.
Here’s an example of this function:
state = "CA"
win = getStateGraphWin(state)
if win == None:
print("Unknown state!")
return
# Window is OK. You can proceed with drawing circles.
# Wait for mouse click before closing
win.getMouse()
win.close()
Requirements, Hints, and Tips
-
You should practice good top-down design, incrementally implement and test your solution, and document your code with comments. Start with the main menu and get a skeleton of your code working first. Then add each option from the menu one at a time (start with quit as it is easy and will allow you to exit your program when you are testing it). While much of the design is up to you, the requirements below are designed to avoid some headaches in the initial design.
-
When reading the input data file, process the file once and store all the records as a list of
Earthquake
objects. Reading the file once will make it faster to process later. Creating a list ofEarthquake
objects will avoid some list of list headaches. -
To print an instance object of the
Earthquake
class, you can callprint()
directly on the object:record = Earthquake("id12345", 3.5, "Parrish Hall, Swarthmore, Pennsylvania", 39.90523, -75.35425, "PA", "2021-11-13 23:59:59") print(record)
The class knows how to format itself into a string, so you can avoid writing nasty formatting strings.
-
Validate menu selections. If the user doesn’t enter a valid number (1-4), let the user know it wasn’t a valid selection and prompt them again for a new menu item.
-
When reading file input,
split()
only generates a list of strings. Some of the strings will need to be converted to floats. -
Latitude is a y-like coordinate, and Longitude is x-like. Mixing these up will likely cause you to not see any circles on your map if everything else is working correctly:
from earthquakes_lab import * win = get_state_graph_window("PA") #Only one of these points will display on the PA map pt1 = Point(-76, 40) pt2 = Point(40, -76) pt1.setFill("black") pt2.setFill("gold") pt1.draw(win) pt2.draw(win) win.getMouse() win.close()
Optional Extension
This is an optional extra challenge. This part does not affect your grade so
please only attempt this after completing the rest of your lab. If you work on
this extra challenge, please put your extensions in a file named
extended_earthquakes.py
to keep it separate from your graded submission.
When working with a real dataset, natural questions arise. For example, what were the earthquakes with the largest magnitudes? For this optional extension, add a fifth option to find the N largest-magnitude earthquakes, where N is a user-specified number. For example, here the user asks for the eight largest earthquakes (by magnitude):
$ python3 extended_earthquakes.py Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Find the N largest earthquakes (by magnitude) (5) Quit Choice? 4 How many (N)? 8 Largest 8 earthquakes by magnitude: us100068jg: magnitude 7.7 @ 28km SSW of Agrihan, Northern Mariana Islands (18.5439, 145.541) on 2016-07-29 17:18:26 us10006exl: magnitude 7.4 @ South Georgia Island region (-55.2793, -31.874) on 2016-08-19 03:32:22 us10006d5h: magnitude 7.2 @ 109km E of Ile Hunter, New Caledonia (-22.4953, 173.1114) on 2016-08-11 21:26:35 us10006gbf: magnitude 6.8 @ 25km W of Chauk, Burma (20.9192, 94.5789) on 2016-08-24 06:34:55 us10006fik: magnitude 6.4 @ South Georgia Island region (-55.2775, -31.7546) on 2016-08-20 23:45:23 us10006a2k: magnitude 6.3 @ 70km ENE of Iwo Jima, Japan (24.9477, 142.0074) on 2016-08-04 12:24:33 us10006a1d: magnitude 6.2 @ 53km NW of Abra Pampa, Argentina (-22.3942, -66.0814) on 2016-08-04 10:15:12 us10006d6d: magnitude 6.2 @ South of the Fiji Islands (-25.1394, -177.3386) on 2016-08-11 23:29:33 Please select one of the following choices: (1) Find by ID (2) Find by location (3) Map state (4) Find the N largest earthquakes (by magnitude) (5) Quit Choice? 5
Submitting
Answer the Questionnaire
Please edit
the Questions-08.txt
file in your cs21/labs/08
directory
and answer the questions in that file.
Once you’re done with that, run handin21
again.
Turning in your labs…
Remember to run handin21
to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21
after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using. First quit any applications you are running, like the browser and the terminal. Then click on the logout icon ( or ) and choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!