CS21 Lab 8: Searching a Dataset

Due Saturday, November 13, before midnight


Please read through the entire lab before starting!

This is a one week lab. For the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. Since the design process should be similar to the previous lab, you will be doing the design and implementation for lab 8 in a single week. If you found the open-ended nature of the last lab challenging, you will want to start this lab early.

Goals

  • Write a program that uses linear search

  • Write a program that uses binary search

  • Continue practicing top-down design (TDD)

  • Connect CS topics with real data

Earthquake Data

For this lab, we’ll be using a month’s worth of earthquake data collected by the United States Geological Survey (USGS) in 2016. The dataset is a modified version from the CORGIS educational dataset archive, and it contains 8394 entries. We have defined an earthquake class to represent one earthquake record.

Introduction: Searching the Dataset

Your task is to write a program that:

  • Reads in the dataset from a file, storing a list of records. The data is stored in the file:

    /usr/local/doc/earthquakes.txt

    Note: don’t copy this file to your 08 directory, just use "/usr/local/doc/earthquakes.txt" as the file name in your python program. Each line of the file corresponds to one record and contains the following fields (separated by semicolons):

    • A unique ID (string)

    • The magnitude of the earthquake (float)

    • The full location of the earthquake (string)

    • The latitude of the earthquake (float)

    • The longitude of the earthquake (float)

    • A shortened location of the earthquake (string)

    • The date and time of the earthquake (string)

      Here’s one example:

      uw61189786;2.41;10km W of Spirit Lake, Idaho;47.968;-117.0135;ID;2016-08-02 18:50:57

      Each line of data ends with a "newline" character, which causes the following text to start on the next line of the file. When you’re reading in data from the file, you can call the .strip method on each line (prior to calling .split) to remove any or newlines or other unnecessary spaces.

      As you read in earthquake data, each line of the file corresponds to one record, which your program should represent using an earthquake object.

  • Prompts the user with a menu of four choices:

    Please select one of the following choices:
    (1) Find by ID
    (2) Find by location
    (3) Map state
    (4) Quit
    
    Choice?

    Depending on the user’s selection, you will prompt them for additional information and then do one of:

    1. Search the dataset by ID, using binary search.

    2. Search the dataset by (full) location, using linear search.

    3. Produce a graphical state map to display the locations of earthquakes by state.

    4. Exit the program.

You program should continue prompting the user until they enter a valid choice and then perform the following queries listed below for each menu option. The program should exit if the user selects option 4 (Quit).

Find by ID

For option 1 (find by ID), you must:

  • Prompt the user to enter an ID.

  • Use binary search to find the ID in your list of records. The data file provided lists each earthquake in increasing order by ID.

  • Print the information for the record matching the user-entered ID, if there is one. See the note about printing records in the tips section.

  • If the ID was not in the list, let the user know you couldn’t find it.

Here are some examples (input bolded):

$ python3 earthquakes.py

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 1
ID to search for? se60029103

se60029103: magnitude 1.5 @ 8km SSW of Loudon, Tennessee (35.663, -84.3598333) on 2016-08-06 22:55:59

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 1
ID to search for? R2D2

Could not find any records matching ID: R2D2

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 4

Find by Location

For option 2 (find by location), you must:

  • Prompt the user for a location string.

  • Use linear search to find all the records for which the user-entered location string appears in the full location. For many locations, you are likely to find more than one matching record, so your search function should produce a list of matching records.

    Recall the in operator for strings from week 4. The in operator will help you to determine whether or not the user-entered string appears in a record’s full location string.

  • Print the information for all matching records, one per line, if there were any matches. See the note about printing records in the tips section.

  • If no records match the entered location, let the user know that you couldn’t find any.

Here are some examples:

$ python3 earthquakes.py

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 2
Location? Kansas

ismpkansas70199223: magnitude 2.19 @ 16km S of Conway Springs, Kansas (37.2426667, -97.6583333) on 2016-07-29 00:46:35
ismpkansas70199888: magnitude 2.32 @ 17km W of Caldwell, Kansas (37.0196667, -97.8001667) on 2016-07-31 10:39:05
ismpkansas70200663: magnitude 1.96 @ 16km S of Anthony, Kansas (37.0031667, -98.0351667) on 2016-08-07 17:16:54
ismpkansas70200668: magnitude 1.88 @ 6km ENE of Caldwell, Kansas (37.0555, -97.5393333) on 2016-08-07 19:49:34
ismpkansas70200768: magnitude 2.03 @ 17km ESE of Anthony, Kansas (37.1183333, -97.8401667) on 2016-08-09 03:08:48
ismpkansas70200778: magnitude 1.97 @ 6km ENE of Caldwell, Kansas (37.0548333, -97.5411667) on 2016-08-09 03:16:22
ismpkansas70200808: magnitude 1.81 @ 17km ESE of Anthony, Kansas (37.1185, -97.8408333) on 2016-08-09 10:14:37
ismpkansas70201053: magnitude 1.85 @ 11km SSE of Conway Springs, Kansas (37.2896667, -97.6086667) on 2016-08-11 12:02:05
ismpkansas70201078: magnitude 2.14 @ 20km ENE of Anthony, Kansas (37.2113333, -97.8156667) on 2016-08-11 23:56:19
ismpkansas70201133: magnitude 2.13 @ 20km ENE of Anthony, Kansas (37.2106667, -97.8151667) on 2016-08-12 07:19:55
ismpkansas70201148: magnitude 2.31 @ 11km SSE of Conway Springs, Kansas (37.2898333, -97.6091667) on 2016-08-12 09:56:01
ismpkansas70201168: magnitude 2.28 @ 11km SSE of Conway Springs, Kansas (37.2901667, -97.6085) on 2016-08-12 16:39:04
ismpkansas70201583: magnitude 1.81 @ 16km S of Conway Springs, Kansas (37.242, -97.6586667) on 2016-08-15 10:12:10
ismpkansas70203423: magnitude 1.91 @ 13km ESE of Harper, Kansas (37.2561667, -97.8805) on 2016-08-23 10:50:38
us10006c9i: magnitude 2.7 @ 6km ENE of Caldwell, Kansas (37.0547, -97.5375) on 2016-08-09 03:10:23
us10006ckk: magnitude 2.7 @ 7km NNE of Rose Hill, Kansas (37.6196, -97.1033) on 2016-08-10 06:49:29
us10006ezf: magnitude 3.5 @ 6km SW of Cheney, Kansas (37.5848, -97.8274) on 2016-08-19 06:40:43
us10006fkc: magnitude 2.8 @ 20km SE of Anthony, Kansas (37.0396, -97.8554) on 2016-08-21 06:49:49
us10006fkw: magnitude 2.2 @ 2km NNW of Derby, Kansas (37.5718, -97.2771) on 2016-08-21 10:10:16
us10006fq3: magnitude 2.9 @ 7km S of Anthony, Kansas (37.0838, -98.0433) on 2016-08-22 04:22:00
us20006u1y: magnitude 2.1 @ 3km S of Anthony, Kansas (37.1238, -98.0251) on 2016-08-25 17:04:19

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 2
Location? Mars

Could not find any records matching that location.

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 4

Map State

For option 3 (map state), you must:

  • Prompt the user for a state abbreviation (e.g., CA).

  • Prompt the user for a minimum magnitude. You need to ensure that the user enters a valid float. If not, continue to prompt until they do. We have provided a is_float function to help you.

  • If the state abbreviation is valid, you should draw the boundary of the given state (with help from a provided library) and draw a Circle at the location of each earthquake recorded in that state whose magnitude is larger than the minimum.

    When looking for matching states, use the get_location method of an earthquake object to retrieve the short location. For locations in the United States, the short location will contain the two-letter state abbreviation (e.g., CA).

  • The radius of the circle should increase based on the earthquake’s magnitude. Use a radius of: 2 ** magnitude * 0.01. (You’re welcome to experiment with different sizes, but please use this size when submitting your final version, to make the lab easier to grade.)

  • If the state abbreviation is not valid, print a suitable error message.

  • After drawing the map, wait until the user clicks inside the graphics window before closing the window and returning to the main menu.

Here are some examples:

$ python3 earthquakes.py

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 3
Which state? ZZ
Minimum magnitude? big
Minimum magnitude? 1.3.4
Minimum magnitude? 1.4

Unknown state: ZZ

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Quit

Choice? 3
Which state? CA
Minimum magnitude? 3.1

Produces:

Map of California with circles to indicate earthquake locations.

Provided Library

The earthquakes_lab library has some helpful tools for managing earthquake records in your program.

Earthquake Class

The library provides an Earthquake class, which encapsulates all the information you’ll need to keep track of one record. You can create an instance object of this class by calling the Earthquake() constructor with each field of information from one row of the earthquakes.txt file passed in as separate parameters. The number fields (magnitude, latitude, and longitude) are floats, and all other fields are strings.

You can call the following methods on a Earthquake object to retrieve information about it:

  • get_id(): Get the ID field. (string)

  • get_magnitude(): Get the magnitude field. (float)

  • get_location_full(): Get the full location field. (string)

  • get_latitude(): Get the latitude field. (float)

  • get_longitude(): Get the longitude field. (float)

  • get_location(): Get the short location field. (string)

  • get_date(): Get the date/time field. (string)

For example:

>>> from earthquakes_lab import *
>>> record = Earthquake("id12345", 3.5, "Parrish Hall, Swarthmore, Pennsylvania", 39.90523, -75.35425, "PA", "2021-11-13 23:59:59")
>>> print(record.get_magnitude())
3.5
>>> print(record.get_location_full())
Parrish Hall, Swarthmore, Pennsylvania
>>> print(record.get_location())
PA
>>> print(record)
id12345: magnitude 3.5 @ Parrish Hall, Swarthmore, Pennsylvania (39.90523, -75.35425) on 2021-11-13 23:59:59

Validating Floats

For option 3, you’ll prompt the user to enter a minimum magnitude, which needs to be a float. The library provides a is_float function that you can call on a string that will return a Boolean, where True means the string can safely be converted with float:

>>> from earthquakes_lab import *
>>> is_float("3.1")
True
>>> is_float("3.1.5")
False
>>> is_float("big")
False

Mapping a State

The library also provides a getStateGraphWin(state) function. If you pass this function a valid two-letter state abbreviation, the function will create and return a GraphWin object with the boundary of the state shown in black and the coordinate of the window set to be the minimum and maximum longitudes and latitudes of the state using setCoords. If the provided state string is not a valid abbreviation, a special None object is returned. The None type is not a GraphWin object and you cannot do anything with it besides check if it is None.

Here’s an example of this function:

state = "CA"
win = getStateGraphWin(state)
if win == None:
    print("Unknown state!")
    return

# Window is OK. You can proceed with drawing circles.

# Wait for mouse click before closing
win.getMouse()
win.close()

Requirements, Hints, and Tips

  • You should practice good top-down design, incrementally implement and test your solution, and document your code with comments. Start with the main menu and get a skeleton of your code working first. Then add each option from the menu one at a time (start with quit as it is easy and will allow you to exit your program when you are testing it). While much of the design is up to you, the requirements below are designed to avoid some headaches in the initial design.

  • When reading the input data file, process the file once and store all the records as a list of Earthquake objects. Reading the file once will make it faster to process later. Creating a list of Earthquake objects will avoid some list of list headaches.

  • To print an instance object of the Earthquake class, you can call print() directly on the object:

    record = Earthquake("id12345", 3.5, "Parrish Hall, Swarthmore, Pennsylvania", 39.90523, -75.35425, "PA", "2021-11-13 23:59:59")
    print(record)

    The class knows how to format itself into a string, so you can avoid writing nasty formatting strings.

  • Validate menu selections. If the user doesn’t enter a valid number (1-4), let the user know it wasn’t a valid selection and prompt them again for a new menu item.

  • When reading file input, split() only generates a list of strings. Some of the strings will need to be converted to floats.

  • Latitude is a y-like coordinate, and Longitude is x-like. Mixing these up will likely cause you to not see any circles on your map if everything else is working correctly:

    from earthquakes_lab import *
    
    win = get_state_graph_window("PA")
    
    #Only one of these points will display on the PA map
    pt1 = Point(-76, 40)
    pt2 = Point(40, -76)
    
    pt1.setFill("black")
    pt2.setFill("gold")
    
    pt1.draw(win)
    pt2.draw(win)
    
    win.getMouse()
    win.close()

Optional Extension

This is an optional extra challenge. This part does not affect your grade so please only attempt this after completing the rest of your lab. If you work on this extra challenge, please put your extensions in a file named extended_earthquakes.py to keep it separate from your graded submission.

When working with a real dataset, natural questions arise. For example, what were the earthquakes with the largest magnitudes? For this optional extension, add a fifth option to find the N largest-magnitude earthquakes, where N is a user-specified number. For example, here the asks for the eight largest earthquakes (by magnitude):

$ python3 extended_earthquakes.py

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Find the N largest earthquakes (by magnitude)
(5) Quit

Choice? 4
How many (N)? 8

Largest 8 earthquakes by magnitude:
us100068jg: magnitude 7.7 @ 28km SSW of Agrihan, Northern Mariana Islands (18.5439, 145.541) on 2016-07-29 17:18:26
us10006exl: magnitude 7.4 @ South Georgia Island region (-55.2793, -31.874) on 2016-08-19 03:32:22
us10006d5h: magnitude 7.2 @ 109km E of Ile Hunter, New Caledonia (-22.4953, 173.1114) on 2016-08-11 21:26:35
us10006gbf: magnitude 6.8 @ 25km W of Chauk, Burma (20.9192, 94.5789) on 2016-08-24 06:34:55
us10006fik: magnitude 6.4 @ South Georgia Island region (-55.2775, -31.7546) on 2016-08-20 23:45:23
us10006a2k: magnitude 6.3 @ 70km ENE of Iwo Jima, Japan (24.9477, 142.0074) on 2016-08-04 12:24:33
us10006a1d: magnitude 6.2 @ 53km NW of Abra Pampa, Argentina (-22.3942, -66.0814) on 2016-08-04 10:15:12
us10006d6d: magnitude 6.2 @ South of the Fiji Islands (-25.1394, -177.3386) on 2016-08-11 23:29:33

Please select one of the following choices:
(1) Find by ID
(2) Find by location
(3) Map state
(4) Find the N largest earthquakes (by magnitude)
(5) Quit

Choice? 5

Submitting

Answer the Questionnaire

Please edit the Questions-08.txt file in your cs21/labs/08 directory and answer the questions in that file.

Once you’re done with that, run handin21 again.

Turning in your labs…​

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using. First quit any applications you are running, like the browser and the terminal. Then click on the logout icon (logout icon or other logout icon) and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock xlock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!