CS21 Lab 10: Sorting
Due Saturday, April 16, by 11:59pm
Please read through the entire lab before starting!
In this lab you will continue using the Hyrdoelectric Dam data set from Lab 9, but support filtering and sorting operations instead of searching.
Programming Tips
As you write programs, use good programming practices:
-
Use a comment at the top of the file to describe the purpose of the program (see example).
-
All programs should have a
main()
function (see example). -
Use variable names that describe the contents of the variables.
-
Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.
-
Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.
-
Avoid writing any lines of code that exceed 80 columns.
-
Always work in a terminal window that is 80 characters wide (resize it to be this wide)
-
In
vim
, at the bottom left in the window, there is an indication of both the line and the column of the cursor.
-
Are your files in the correct place?
Make sure all programs are saved to your cs21/labs/10
directory! Files
outside that directory will not be graded.
$ update21 $ cd ~/cs21/labs/10 $ pwd /home/username/cs21/labs/10 $ ls Questions-10.txt (should see your program files here)
Goals
-
Write a program that filters data by different criteria
-
Write a program that sorts data by different fields
-
Continue practicing Top-Down Design (TDD)
-
Connect CS topics with real-world data
1. Filtering the Hydroelectric Dam Data
1.1. Hydroelectric Dam Data Set
Refer to Lab 9 for a overview of the Hydroelectric Dam Dataset.
Recall that each line contains 7 features. The following table provides information about these features.
Feature index |
Type |
Description |
0 |
string |
Dam name |
1 |
string |
Watercourse name |
2 |
string |
County |
3 |
string |
State |
4 |
float |
Crest Elevation |
5 |
float |
Crest Length |
6 |
int |
Year |
In this lab we will focus on using features 3 and 6.
You will implement a program called sort-dams.py
, which will allow the user
to filter this data set and sort by different fields.
1.2. Program Requirements
-
Reads in stored data from a file, storing a list of records. The data is stored in the file:
/data/cs21/hydropower/hydropower.csv
Note: don’t copy this file to your
labs/10
directory, just use"/data/cs21/hydropower/hydropower.csv"
as the file name in your python program.As you read in the data, each line of the file corresponds to one record, which your program should represent using a HydroDam object and store in a list.
This is likely the same as your solution to lab09 and you may copy over the code to read this data over to your lab10 solution.
-
Repeatedly presents a menu to the user with six choices shown below:
Please select one of the following choices: 1. Filter dams by state name 2. Filter dams by year 3. Sort by state name 4. Sort by year 5. Reset list to all dams 6. Quit Choice?
Depending on the user’s selection, prompt them for additional information and then do one of:
-
Filter the data set by state name.
-
Filter the data set by min and max year
-
Sort the current set of records by state name
-
Sort the current set of records by year
-
Reset the current set of records to the original data set.
-
Exit the program.
-
-
When filtering, your program should prompt for the filter criteria (state name or min/max year) and return a new list of records that match only the filter criteria.
-
Ensures that the user provides valid input (menu choices and numerical years).
-
After a filter or sort step, print all current records in the appropriate order
-
Uses formatted printing to show search results in a table format that fits in an 80-character wide terminal window.
1.3. Example Output
-
Here are some examples of filtering:
-
Here are some examples of handling valid/invalid input:
1.4. Tips
1.4.1. Continue to use TDD
We strongly recommend that you create a top-down design with a fully fleshed
out main
and stubs for all other functions.
1.4.2. Create modular, reusable functions
Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of filters (by state and by year), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of results and displays them. Then you could call this function for either type of filter or sort.
Likewise, sorting by year and sorting by state can re-use the same swap function. You may need to copy/modify the inclass examples to sort objects by different fields (year/state name). Make sure when doing a comparison to determine which item comes first, you only compare the appropriate field/subpart of each object.
1.4.3. Displaying matches
You can re-use the same functions to display matches as you did in lab 09. Refer to Lab 09 section 1.4.3 for string slicing and display tips.
1.4.4. Applying filters
The initial set of records should be all dams. However, anytime you apply a filter operation, the filter operation replaces the current set of records with the records matching the filter. Thus if you filter by the state name "Montana" and then sort by year, you should only display the dams in Montana sorted by year. The reset option restores the current set of records to be all records.
A sample non trivial workflow might be to filter by a state (e.g., "Utah"), sort by year, then filter by year (e.g. 1970-2000). The current list of records after this sequence would be dams in Utah built between 1970 and 2000.
When searching for matching records and applying the filter, you do not know how the records are sorted (they could change after a sort operation). For this reason, you can use a linear search strategy for all your filters.
When filtering by year, you just need to check if the year entered by the user is a non-negative integer. Years of 500, or 2045 are fine inputs, even if there are no dams in the data set from these years.
1.4.5. Adapting sort
You should use one of the quadratic sorts discussed in class. However, you will need to adapt your algorithms to sort objects instead of just a list of ints. You will likely need to have two different but similar sort functions to sort by year and sort by state. The basic structure of both algorithms will be the same. Only the way you do the comparisons would be different.
2. Provided Library
The lab comes with a hydrodam_lab
library that gives you several helpful
tools intended to make this lab easier:
2.1. HydroDam Record Class
The library provides a HydroDam
class, which encapsulates all the features
needed to store one "record." You can create an object of this class by
calling the HydroDam()
constructor with each field of information passed in
as separate parameters. The types of the fields are specified in the table below.
Once you have a HydroDam
object, you can use methods to retrieve information about it:
Parameter |
Type |
Description |
Getter Method |
0 |
string |
Dam name |
|
1 |
string |
Watercourse name |
|
2 |
string |
County |
|
3 |
string |
State |
|
4 |
float |
Crest Elevation |
|
5 |
float |
Crest Length |
|
6 |
int |
Year |
|
You’ll need to get the various fields from the CSV file, convert them to the
correct types, and then pass them to the constructor. You can use the string
method .split()
in order to turn one long string into a list of shorter
strings. This method takes an argument to let you specify what to use as the
"separator;" in this case, you’ll want to split based on commas, using a
command like:
s.split(",")
The constructor creates an object from some parameters:
myDam = HydroDam("My Dam", "Some River", "Some County", "A State", 100.0, 500.0, 2022)
3. Extra Challenges
Here are TWO optional extensions! Neither is required, but either or both might be fun and rewarding.
3.1. Optional: Add more filters
As the lab shows, you can filter on different criteria. Consider adding additional filters of your choosing, or extending your records by adding an e.g., extend by state option that queries all records and expands the current set of records by the given state. For example, a filter by "Montana" followed by an extend by "Pennsylvania" would result in a current list containing dams in either "Montana" or "Pennsylvania".
3.2. Optional: Find the first dam in every state
Print a list of the earliest dam in each state. You will likely want to sort the entire data set by year first. Then when filtering, only add the next record if the state of the corresponding record has not been added already. You may want to make a temporary list of previously seen state name to help determine if you already have the earliest dam for a given state
Answer the Questionnaire
Each lab will have a short questionnaire at the end. Please edit
the Questions-10.txt
file in your cs21/labs/10
directory
and answer the questions in that file.
Once you’re done with that, you should run handin21
again.
Submitting lab assignments
Remember to run handin21
to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21
after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using.
When Remotely logged in
When you are ssh’ed into the CS labs, first quit any applications you are
running, like vim
, then simply type exit
at the prompt in your terminal
window to disconnect.
When Physically logged in
When you are in a CS lab logged into a CS machine. First quit any applications you are running, like the browser and the terminal. Then click on the logout icon ( or ) and choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!