CS21 Lab 8: Sorting
Due Saturday, April 24, before midnight (US/Eastern local to Swarthmore)
Read through this entire lab before you begin coding!
Goals
-
Continue to gain proficiency at top-down design and bottom-up implementation.
-
Continue practicing using a list of lists to store data.
-
Understand how to adapt selection sort to real-world data.
-
Understand how to adapt selection sort to sort from highest to lowest instead of lowest to highest.
Video game dataset
For this assignment, we will continue to work with the video game data we used in Lab 7. The video game data is stored in CSV format and is located at: /data/cs21/videogames/video_games.csv
As a reminder, here are some details about the data set, beginning with the first five lines and last five lines from this file:
Title,MaxPlayers,Genre,ReviewScore,Sales,UsedPrice,Console,Rating,Year,PlayLength AC/DC Live: Rock Band - Track Pack,4,Action / Simulation,63,0.19,14.95,X360,M,2008,3.933333333 AC/DC Live: Rock Band - Track Pack,1,Action / Simulation,60,0.19,14.95,PS3,E,2008,3.933333333 Ace Combat 6: Fires of Liberation,1,Action / Simulation,80,0.58,19.95,X360,E,2007,13.75 Ace Combat X: Skies of Deception,1,Action / Simulation,75,0.32,22.95,PSP,E,2006,4.116666667 ... Zendoku,1,Role-Playing (RPG) / Strategy,68,0.04,16.95,DS,E,2007,4 Zoids Assault,1,Role-Playing (RPG) / Strategy,46,0.07,17.95,X360,M,2007,12.05 Zoo Keeper,1,Action,74,0.1,14.95,DS,T,2004,5 Zoo Tycoon DS,1,Simulation / Strategy,44,0.92,12.95,DS,T,2005,-1 Zubo,1,Adventure / Role-Playing (RPG) / Strategy,75,0.05,14.95,DS,E,2008,15
Notice that the file is sorted by game title.
Each line contains 10 features. The following table provides more detailed information about each of these features.
Feature |
Type |
Description |
Notes |
0 |
string |
Game Title |
|
1 |
integer |
Max number of players |
|
2 |
string |
Genre |
Listing multiple, seperated by slashes |
3 |
integer |
Review Score |
|
4 |
float |
Sales |
Measured in millions of dollars |
5 |
float |
Price |
Average "used" price (in dollars) |
6 |
string |
Console |
|
7 |
string |
Rating |
|
8 |
integer |
Release Year |
|
9 |
float |
Play Length |
Average time in hours to play through the game (-1 if play through time is not available) |
Using sorting to explore video game data
You will implement a program called sort-games.py
, which will allow the user to search through this dataset and learn particular information. To start, copy your Lab 07 search-games.py
file into a new sort-games.py
that you will use for Lab 08:
cp ~/cs21/labs/07/search-games.py ~/cs21/labs/08/sort-games.py
Users may get a list of games filtered by Console and Genre and sorted by review score from highest to lowest. Users can also get a list of Genres in the data (sorted alphabetically) if they need help crafting a search. Genre searches should include partial matches (so the results for the query "Racing" should include "Alone in the Dark" because its listed Genre is "Action / Adventure / Racing / Driving")
Users may get a list of games ranked by sales figures by year:
Program Requirements
-
All numeric data in the file should be type cast to the correct type when stored in the program.
-
Compares data in lowercase format (so that user input is case insensitive) but displays results with their original case.
-
Repeatedly presents a menu to the user with six choices: to search by console/year, to search by game title, to sort best reviewed games, to sort best selling games, to list the possible genres, or to quit.
-
Ends when the user chooses to quit.
-
Ensures that the user provides a valid menu choice.
-
Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.
-
Uses linear search to find matches by console/year/genre.
-
Uses binary search to find matches by game title.
-
Searches are limited to returning a maximum of 25 results.
-
Uses selection sort to sort results (you must write this yourself and cannot use the built-in
.sort()
method) -
Finds a list of genres present in the data and presents them to the user in alphabetical order (you cannot hardcode the genres)
-
You do not need to verify the user typed in a valid console or title. If the user types in a console or title that doesn’t exist in the data set, your program will report that no results were found.
Tips
Extracting the Genre Data
To implement menu item #5, you’ll need to store a list of possible genres somewhere. You can accomplish this by iterating through all the game entries, and dividing up the 'genre' field into its component parts using the <str>.split()
function (notice that multiple genres are seperated by '/'), clearing away any whitespace with <str>.strip()
, and accumulating the results. (Be sure not to add any repeat genres to your list!)
Sorting results
In order to display the games with the highest review score or the highest selling games, you should first use a search method to narrow down the amount of data you need to sort. You already have search functions written from Lab 7, so reuse your code if you can. For example, to display the highest selling games by year, first find all the games for that particular year (you already did that as part of Lab 7) and then sort the resulting list by the Sales value.
If two (or more) games have the same review score or sales numbers, it does not matter which order you display them. You don’t need to make sure the games remain in alphabetical order.
Sorting from largest to smallest
The selection sort algorithm that we covered in class finds the smallest item and puts it first. Then it finds the second smallest item and puts it second, etc. This gives you a list that is sorted from smallest to largest. If you want to build a list that is sorted from largest to smallest, you have two options:
-
You can change the selection sort algorithm so that it finds the largest element and puts it first, then finds the second largest element and puts it second, etc.
-
You can use your existing selection sort algorithm, then use the
reverse()
method on lists to reverse the order of the elements:
lst = [1, 2, 3]
lst.reverse()
print(lst) # prints [3, 2, 1]
Extra Challenge
This is an optional extension and is not required.
Which console has games with the highest average review scores across the most categories? To answer this first find the list of genres and list of consoles. Then compute the mean review score within each genre for each of the five console. Next, use that to sort the five consoles (in rank order from 1-5) for each genre. Finally, sum up the number 1st place finishes for each console. Which one has the highest?
For another bonus, try to come up with a different way to rank consoles. Describe and implement your alternative system and see if it leads to a different result than the above.
Answer the Questionnaire
Please edit
the Questions-08.txt
file in your cs21/labs/08
directory
and answer the questions in that file.
Once you’re done with that, run handin21
again.
Turning in your labs….
Remember to run handin21
to turn in your lab files!
You may run handin21
as many times as you want. Each time it will
turn in any new work. We recommend running handin21
after
you complete each program or after you complete significant
work on any one program.