CS21 Lab 8: Searching a Dataset
Due Saturday, November 16, before midnight
Lab Content Warning - READ THIS FIRST
This lab asks students to work with sensitive data on the topic of gun violence in the U.S. The dataset we’ll be searching contains records of fatal police shootings as recorded by the Washington Post. This dataset is disturbing, so we encourage you to prepare yourself before proceeding. If you believe that working on this lab will be personally traumatizing, please contact your CS 21 instructor to discuss the lab as soon as possible (by Monday, November 11). For context, we have chosen to use this dataset because it contains real data surrounding an important societal issue. As you progress in computer science, we believe that it’s important to make connections between the technical content we’re discussing in CS classes and the ways in which it might be used to affect the world. This lab is an attempt to do that. Please be respectful when talking to others about the lab’s content. We want CS 21 to be an open space that invites a civil exchange of ideas. |
Goals
-
Write a program that uses linear search
-
Write a program that uses binary search
-
Continue practicing top-down design (TDD)
-
Connect CS topics with social issues
Notes
Please read through the entire lab before starting!
This is a one week lab. For the previous lab, you used a full week to practice the TDD component and a second week to implement your solution. Since the design process should be similar to the previous lab, you will be doing the design and implementation for lab 8 in a single week. If you found the open-ended nature of the last lab challenging, you will want to start this lab early.
Police Shooting Data
Since January 1, 2015, the Washington Post has been compiling a database of every fatal shooting in the United States by a police officer in the line of duty. The Post is documenting only those shootings in which a police officer, in the line of duty, shot and killed a civilian — the circumstances that most closely parallel the 2014 killing of Michael Brown in Ferguson, Mo., which began the protest movement culminating in Black Lives Matter and an increased focus on police accountability nationwide. For this lab, we’ll be using a simplified version of this real dataset that comes from the CORGIS educational dataset archive.
While you may find this topic to be morbid, we hope that this lab illustrates how the skills you’re learning in computer science can be applied to other disciplines, including relevant societal issues.
Searching the Dataset
Your task is to write a program that:
-
Reads in the dataset from a file, storing a list of records. Each record contains details about the incident, including the name and demographic information of the deceased, the date of the incident, and the location where it occurred.
The data is stored in the file
/usr/local/doc/shooting-data.txt
(don’t copy this file to your 08 directory, just use"/usr/local/doc/shooting-data.txt"
as the file name in your python program). Each line of the file contains the following fields separated by commas:Name,Age,Sex,Race,Year,Month,Day,City,State
Here’s one example:
Andrew Jospeh Todd,20,Male,White,2015,12,12,East Stroudsburg,PA
-
Prompts the user with a menu of four choices:
Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice?
Depending on the user’s selection, you will prompt them for additional information and then do one of:
-
Search the database by name, using binary search.
-
Search the database by date, using linear search.
-
Produce a graphical state map to display the locations of police shootings by state.
-
Exit the program.
-
You program should continue prompting the user until they enter a valid choice and then perform the following queries listed below for each menu option. The program should exit if the user selects option 4 (Quit).
Find by name
For option 1 (find by name), you must:
-
Prompt the user to enter a name.
-
Use binary search to find the name in your list of records. The data file provided lists each incident in increasing order by name.
-
Print the information for the record matching the user-entered name, if there is one. See the note about printing records in the tips section below.
-
If the name was not in the list, let the user know you couldn’t find it.
You may assume that the names in the dataset are unique, and when grading, we will only query for names that are unique. In reality, there are a few duplicate names (e.g., Robert Martinez) and several entries for which the name is missing (appearing as "TK TK"). These issues are some of the challenges of working with real data. |
Here are some examples:
$ python3 police_shootings.py Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 1 Name to search for? Adam Smith Adam Smith, 33, Male, White, 2016, 7, 7, Clovis, CA Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 1 Name to search for? Jane Fakename Could not find any records matching name: Jane Fakename Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 4
Find by date
For option 2 (find by date), you must:
-
Prompt the user for a year, then prompt for a month, then prompt for a day. You may assume that the user will type in strings that can be converted to an integer with
int()
. -
Use linear search to find all the records that match the entered date. You should expect that, for most dates, there will be more than one matching record. Therefore, the search function you write should return a list of matching records.
-
Print the information for all matching records, one per line, if there were any matches. See the note about printing records in the tips section below.
-
If no records match the entered date, let the user know that you couldn’t find any.
Here are some examples:
$ python3 police_shootings.py Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 2 Year? 2016 Month? 8 Day? 19 Jorge Ceniceros, 22, Male, Hispanic, 2016, 8, 19, Phoenix, AZ Kole B. Knight, 31, Male, White, 2016, 8, 19, New London, WI Kyle Zimbelman, 29, Male, White, 2016, 8, 19, Reno, NV Marcelo Luna, 47, Male, Hispanic, 2016, 8, 19, East Hollywood, CA Todd P. Browning, 54, Male, White, 2016, 8, 19, East Ridge, TN Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 2 Year? 2025 Month? 10 Day? 22 Could not find any records matching that date. Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 4
Map State
For option 3 (map state), you must:
-
Prompt the use for a state abbreviation (e.g.,
PA
). -
If the state abbreviation is valid, you should draw the boundary of the given state (with help from a provided library) and draw a
Point
at the location of each shooting recorded in that state. -
If the state abbreviation is not valid, print a suitable error message.
Here are some examples:
$ python3 police_shootings.py Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 3 Which state? PB No state found for PB Unknown state: PB Please select one of the following choices: 1. Find by name 2. Find by date 3. Map state 4. Quit Choice? 3 Which state? PA
Provided Library
The police_shootings_lab
library has some helpful tools for managing shooting
records in your program.
Shooting Class
The library provides a Shooting
class, which encapsulates all the information you’ll
need to keep track of one record. You can create an instance object of this
class by calling the Shooting()
constructor with each piece of information
from one row of the shooting-data.txt
file passed in as a separate parameter.
Each number (e.g., age, year) is an integer, and all other fields are strings.
You can call the following methods on a Shooting
object to retrieve information about it:
-
get_name()
: Retrieve the name string. -
get_age()
: Retrieve the age integer. -
get_sex()
: Retrieve the sex string. -
get_race()
: Retrieve the race string. -
get_year()
: Retrieve the year portion of the incident date. -
get_month()
: Retrieve the month portion of the incident date. -
get_day()
: Retrieve the day portion of the incident date. -
get_city()
: Retrieve the city string. -
get_state()
: Retrieve the state string (abbreviated, e.g.,PA
).
For example:
>>> from police_shootings_lab import * >>> record = Shooting("John Doe", 40, "Male", "White", 2019, 10, 22, "Nowhere", "PA") >>> print(record.get_year()) 2019 >>> print(record.get_state()) PA >>> print(record) John Doe, 40, Male, White, 2019, 10, 22, Nowhere, PA
To create a list of all the shooting records from a file, you could use a sample function like the one below:
def read_input(filename):
shootings = []
f = open(filename, 'r')
for line in f:
line = line.strip().split(",")
shooting = Shooting(line[0], int(line[1]), line[2], line[3],
int(line[4]), int(line[5]), int(line[6]),
line[7], line[8])
shootings.append(shooting)
f.close()
return shootings
Mapping a State
The library also provides a get_state_graph_window(state)
function. If you pass
this function a valid two-letter state abbreviation, the function will create
and return a GraphWin
object with the boundary of the state shown in black and
the coordinate of the window set to be the minimum and maximum longitudes and
latitudes of the state using setCoords
. If the provided state string is not a
valid abbreviation, a special None
object is returned. The None
type is not a
GraphWin
object and you cannot do anything with it besides check if it is None
.
Here’s an example of this function:
state = "PA"
win = get_state_graph_window(state)
if win == None:
print("Unknown state!")
return
# Window is OK. You can do drawing here
# Wait for mouse click before closing
win.getMouse()
win.close()
City Coordinates
Finally, the library provides a get_city_coordinates(city, state)
function.
If you pass this function a valid city name and two-letter state abbreviation,
it will return a list containing the [latitude, longitude]
coordinates of the
city. Note: this function only knows about coordinates for the cities listed
in the shooting-data.txt
dataset. If you pass in a city/state pair that isn’t
in the dataset, it will return None
:
>>> from police_shootings_lab import * >>> print(get_city_coordinates("Philadelphia", "PA")) [39.9527237, -75.1635262] >>> print(get_city_coordinates("Swarthmore", "PA")) None
Requirements, Hints, and Tips
-
You should practice good top-down design, incrementally implement and test your solution, and document your code with comments. Start with the main menu and get a skeleton of your code working first. Then add each option from the menu one at a time (start with quit as it is easy and will allow you to exit your program when you are testing it). While much of the design is up to you, the requirements below are designed to avoid some headaches in the initial design.
-
When reading the input data file, process the file once and store all the records as a list of
Shooting
objects. Reading the file once will make it faster to process later. Using theShooting
list will avoid some list of list headaches. -
To print an instance object of the
Shooting
class, you can callprint()
directly on the object:record = Shooting("John Doe", 40, "Male", "White", 2019, 10, 22, "Nowhere", "PA") print(record)
The class knows how to format itself into a string, so you can avoid writing a nasty formatting string.
-
Validate menu selections. If the user doesn’t enter a valid number (1-4), let the user know it wasn’t a valid selection and prompt them again for a new menu item.
-
When reading file input, split() only generates a list of strings. Some of the strings may need to be converted to integers.
-
Latitude is a y-like coordinate, and Longitude is x-like. Mixing these up will likely cause you to not see any cities on your map if everything else is working correctly:
from police_shootings_lab import * win = get_state_graph_window("PA") #Only one of these points will display on the PA map pt1 = Point(-76, 40) pt2 = Point(40, -76) pt1.setFill("black") pt2.setFill("gold") pt1.draw(win) pt2.draw(win) win.getMouse() win.close()
Extra Challenge
This will not affect your grade or gain extra points. It is included just as an extra challenge, if you are interested (and have already finished the entire assignment as described above).
The mapping portion of the assignment asks you to draw Point
objects for each
city with a recorded shooting, but it doesn’t account for multiple incidents in
the same city — the repeated cities just draw additional points on top of each
other. Instead, modify the lab to draw circles whose radius increases as a
city appears more frequently.
For example, here’s what a map of PA might look like using Circle
objects,
centered at the latitude and longitude of the city, whose radius is .01 *
num_city_appearances
:
Answer the Questionnaire
Please edit
the Questions-08.txt
file in your cs21/labs/08
directory
and answer the questions in that file.
Once you’re done with that, run handin21
again.
Turning in your labs….
Remember to run handin21
to turn in your lab files!
You may run handin21
as many times as you want. Each time it will
turn in any new work. We recommend running handin21
after
you complete each program or after you complete significant
work on any one program.