CS21 Lab 09: Sorting

Due Saturday, November 23, by 11:59pm


Please read through the entire lab before starting!

In this lab you will continue using the Novel data set from Lab 08. In Lab 08, you implemented three ways to search the data. In this lab, you will implement two ways to filter and sort the data instead of just searching.

All of the standard style guidelines from previous labs still apply. Be sure to follow good top-down-design practices.

As you write programs, use good programming practices:

  • Use a comment at the top of the file to describe the purpose of the program (see example).

  • All programs should have a main() function (see example).

  • Use variable names that describe the contents of the variables.

  • Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.

  • Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.

  • Avoid writing any lines of code that exceed 80 columns.

    • Always work in a terminal window that is 80 characters wide (resize it to be this wide)

    • In vscode, at the bottom right in the window, there is an indication of both the line and the column of the cursor.

Function Comments

All functions should have a top-level comment! Please see our function example page if you are confused about writing function comments.

Goals

  • Write a program that filters data by different criteria

  • Write a program that sorts data by different fields

  • Continue practicing Top-Down Design (TDD)

  • Connect CS topics with real-world data

1. Filtering and sorting the novel dataset

1.1. Relationship to Lab 08

In this lab, you will implement a program called sort_novels.py, which will allow the user to filter the novels data set and sort by different fields. This is a new lab, and you will write a new program to solve this lab. It is related to last week’s lab and there is some code from your previous lab that you can reuse if you’d like. You don’t need to re-implement functions you wrote for Lab 08 if they work as expected.

Two major pieces from Lab 08 that you’ll need to reuse are the function that reads the data from the .dsv file into a list of Novel objects, and the function that displays the table of results. If your versions of these functions aren’t working, we’ve provided working implementations for you. There are other functions you wrote (such as the functions for displaying the menu and making sure that the menu only accepts valid inputs) that you can likely use in this lab with very little modification and you are encouraged to copy over the relevant portions of your previous lab if they are working well.

1.2. Expectations

  • Your program should read in stored data from a file, storing a list of records. This is the same as your solution to Lab 08 and you may copy over the code to read this data over to your Lab 09 solution.

    As before, the data is stored in the file:

    /data/cs21/novels/novels.dsv

    Please don’t copy this file to your labs/09 directory, just use "/data/cs21/novels/novels.dsv" as the file name in your python program.

    As you read in the data, each line of the file corresponds to one novel, which your program should represent using a Novel object and accumulate into a list.

    Recall that there are two versions of the data file; the complete one is: /data/cs21/novels/novels.dsv but there’s also a small copy that’s had most of the lines removed, which may be easier to use for testing your program, particularly in the early stages: /data/cs21/novels/novels_small.dsv

    If your version of this function from Lab 08 is broken, you can use a version we have provided. At the top of your file, include the line from novels_lab9 import read_novels. The function definition and block comment for read_novels are shown below:

    def read_novels(filename):
        """
        Read the contents of a .dsv file containing information about the
        top 500 novels. Returns a list of Novel objects where each object
        represents one line of the file.
    
        Args:
           filename (str): the name of the file to read
    
        Returns:
           list: a list of Novel objects representing the data in the file
        """
  • Repeatedly presents a menu to the user with six choices shown below:

    Please select one of the following choices:
    1. Filter by title
    2. Filter by gender
    3. Sort by rank (worst to best)
    4. Sort by number of ratings (least to most)
    5. Reset list to all novels
    0. Quit
    
    Enter selection:

    Depending on the user’s selection, prompt them for additional information and then do one of:

    1. Filter the current set of novels by title (using linear search)

    2. Filter the current set of novels by gender (using linear search)

    3. Sort the current set of novels by rank (from worst rank to best rank)

    4. Sort the current set of novels by number of ratings (from fewest ratings to most ratings)

    5. Reset the current set of novels to the original data set

    6. Exit the program

  • When sorting, your program should perform the sort in place, meaning it should mutate the list passed in to the sorting function. It should not create a new list.

  • You must use different sorting algorithms for the two sort operations (e.g. Selection Sort for one and Bubble Sort for the other)

  • Your filtering functions should return a new list of records that match only the filter criteria.

  • Ensure that the user provides valid input for the menu choices.

  • After a filter or sort step, print all current records.

  • Use formatted printing to show search results in a table format that fits in an 80-character wide terminal window exactly like you did for lab 08. If your code from Lab 08 to display the records isn’t working, you can use a version we have provided. At the top of your file, include the line from novels_lab9 import display_novel_data. The function definition and block comment for display_novel_data are shown below:

    def display_novel_data(novel_list):
        """
        Given a list of Novel objects, display the data in a nice format
        as shown in the Lab 8 and Lab 9 writeups. If the novel_list is
        empty, prints "No records found".
    
        Args:
           novel_list (list): a list of Novel objects to print
    
        Returns:
           No return value.
        """

1.4. Novel Dataset

Refer to Lab 08 for an overview of the Novel data set.

Recall that each line of the file contains several features, which can be easily accessed using the provided library.

1.5. Tips

1.5.1. Continue to use TDD and incremental development

We strongly recommend that you create a top-down design with a fully fleshed out main and stubs for all other functions. Likewise, you should continue to implement and test one function at a time, making sure it works correctly before moving on to the next one.

1.5.2. Re-use code from Lab 08

Since this lab works with the same data as we used for Lab 08 and performs a number of similar operations, you should try to re-use functions you wrote for Lab 08 when completing this lab, rather than re-implementing them from scratch. Simply copy/paste these functions into your new python file, and then modify them if necessary.

It’s fine to copy your own code; however, be sure that you’re only re-using code that you personally wrote. Using code from other students, the internet, etc. is still the same violation of academic misconduct that it has been all semester.

1.5.3. Create modular, reusable functions

Avoid duplicating similar code in multiple places throughout the program. Think about how you could re-use the same function. For example, we are doing two types of filters (by state and by population), but the results of those searches are displayed in exactly the same way. Consider having a function that takes in a list of results and displays them. Then you could call this function for either type of filter or sort.

Note also that you may be able to re-use functions you wrote for the perevious lab; you should always re-use code that you wrote in the past to make your current life easier!

Likewise, sorting by state and sorting by population can re-use the same swap function. You may need to copy/modify the inclass examples to sort objects by different fields (population/state name). Make sure when doing a comparison to determine which item comes first, you only compare the appropriate field/subpart of each object.

1.5.4. Applying filters

The initial set of records should be the every record in the data file. However, anytime you apply a filter operation, the filter operation replaces the current set of records with the records matching the filter. Thus if you filter by the title using "house" and then sort by rank, you should only display the records from novels that contain the substring "house", and those records should be sorted by rank. The reset option restores the current set of records to be all records. For this reason, you will therefore need separate variables to keep track of the "original" data set and the "filtered" data set.

Note that filtering is algorithmically the same as the linear searches you did in Lab 08. When you filter, you will use linear search to find all matching Novel records and you will return them in a list. Because the data might be sorted by fields other than the year of publication, you can’t use binary search in this lab. That is, when searching for matching titles in order to apply a filter, you do not know how the records are sorted (they could change after a sort operation). For this reason, you should use linear search for all your filters.

A sample non-trivial workflow might be to filter by a title (e.g., "house"), sort by ranking, then filter by gender (e.g. "female"). The list of records displayed after this sequence would be novels that contain the word "house", were written by "female"-identified authors, and sorted by rank.

1.5.5. Copying a list

Normally, assigning two variables to refer to the same list is what you want to happen, since copying a list may be computationally expensive. However, in this program you’ll need to modify a list while still maintaining a copy of the original list (so you can implement the "reset" functionality).

In python, the easiest way to do this is to use the .copy() method of the list class. To see how it works, try running code like the following:

$ python3
>>> a = [3, 7, 2, 4]
>>> b = a
>>> c = a.copy()
>>> a[1] = 5
>>> a
[3, 5, 2, 4]
>>> b
[3, 5, 2, 4]
>>> c
[3, 7, 2, 4]

Notice that when we change list a, list b also changes. That’s because on the stack, both a and b are pointing to the same list. Changing a therefore changes b. But c is an exact copy of a from before we modified list a. Therefore, changing a doesn’t impact the value of c. If we wanted to restore a back to its original value, we could say a = c.copy().

1.5.6. Adapting sort

You should use two of the quadratic sorts discussed in class. Just like you did in Lab 08, you can start by using the sorting algorithms we provided you. Then, you will need to modify them slightly so that they sort by rank and number of ratings, respectively.

When sorting by rank, the worst-ranked novel should be at the beginning of the list and the best-ranked novel should be at the end of the list. The worst-ranked novel in the dataset has rank 500 (Deception Point by Dan Brown) and the best-ranked novel in the dataset has rank 1 (Don Quixote by Miguel de Cervantes)

When sorting by number of ratings, the novel with the fewest ratings should be at the beginning of the list and the novel with the most ratings should be at the end of the list.

2. Provided Library

Remember that this lab comes with a novels_lab library that gives you several helpful tools intended to make this lab easier. See the appropriate section of Lab 08 for full details.

3. Requirements

The code you submit for labs is expected to follow good style practices, and to meet one of the course standards, you’ll need to demonstrate good style on six or more of the lab assignments across the semester. To meet the good style expectations, you should:

  • Write a comment that describes the program as a whole using a comment at the top of the file.

  • All programs should have a main() function.

  • Use descriptive variable names.

  • Write code with readable line length, avoiding writing any lines of code that exceed 80 columns.

  • Add comments to explain complex code segments (as applicable).

  • Write a comment for each function (except main) that describes its purpose, parameters and return value.

In addition, you’ll need to demonstrate good top-down design practices on two or more lab assignments across the semester. To meet the top-down design expectations, you should:

  • Have a main() function that serves as an outline for your program.

  • Your main() function should call other functions that implement subtasks.

  • The functions that are called by main() should be well-named and have good comments explaining their purpose, parameters (including their types), and return value (including its type).

Your program should meet the following requirements:

  1. Your program must (a) read in the data from the .dsv file, using your code from last week or our provided code, (b) display well-formatted results after every filter or sort, using your code from last week or our provided code, and (c) return the user to the main menu after every filter, sort or reset operation until the user quits.

  2. Your program must save a copy of the original data set. When a filter or sort operation is performed, you should filter or sort the copy, not the original data. This will allow the original data set to be restored by when the user selects the reset option.

  3. When the user selects filter by title, your program must search the copy of the novels data set by title using linear search.

  4. When the user selects filter by gender, your program must search the copy of the novels data set by gender using linear search.

  5. When the user selects sort by rank, your program must sort the copy of the novels data set by rank using a quadratic sort algorithm. The sort must be done in place, meaning that the list passed in to the sorting function should be mutated. The worst-ranked novel should be at the beginning of the list and the best-ranked novel should be at the end of the list. Remember that the worst-ranked novel is #500 and the best-ranked novel is #1, so you’ll need to make sure that the lowest numbered rank is at the end of the list.

  6. When the user selects sort by number of ratings, your program must sort the copy of the novels data set by number of ratings using a different quadratic sort algorithm than you used to sort by rank. The sort must be done in place, meaning that the list passed in to the sorting function should be mutated. The novel with the fewest ratings should be at the beginning of the list and the novel with the most ratings should be at the end of the list.

  7. When the user selects reset list to all novels, your program must restore the original novels data set by copying the original data set back into the copy. At no point should you modify your original data set, and you should not read the data from the file more than once during the entire running of the program.

4. Extra Challenges

Here are some optional extensions! These are not required, but they might be fun and rewarding.

If you implement extensions, please make sure the original menu operations remain the same. That is, you are welcome to add a menu item 6, but don’t change menu item 3 to be menu item 4.

4.1. Optional: Add to the working set

As the lab shows, you can filter on different criteria. Consider adding additional filters of your choosing, or extending your records by adding an e.g., 'extend by title' option that queries all records and expands the current set of records by the new title. For example, a title filter by "house" followed by an extend by title "home" would result in a list containing novels with either "house" or "home" in the title.

4.2. Optional: Sort or filter on other fields

The lab asks you to sort and filter on only a two fields. However, you can apply the same process to sort any of the fields accessible through the object. Choose one (or more) additional fields to add to the set of things the program can deal with (sorting, filtering, extending).

4.3. Optional: Add features based on your own interests!

Do whatever you’d like that we haven’t told you to do! Go crazy!

Answer the Questionnaire

After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 09) from the dropdown menu on the first question.

Once you’re done with that, you should run handin21 again.

Submitting lab assignments

Remember to run handin21 to turn in your lab files! You may run handin21 as many times as you want. Each time it will turn in any new work. We recommend running handin21 after you complete each program or after you complete significant work on any one program.

Logging out

When you’re done working in the lab, you should log out of the computer you’re using.

First quit any applications you are running, including your vscode editor, the browser and the terminal. Then click on the logout icon (logout icon or other logout icon) and choose "log out".

If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock xlock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!