CS21 Lab 8: Climate Database: Searching
Due Saturday, April 6, by 11:59pm
Goals
The goals for this lab assignment are:
-
Practice programming with file I/O
-
Practice using real-world data
-
Practice using binary and linear search
-
Practice writing a menu-driven program
-
Get more practice with top down design and writing functions
-
Get more practice with incremental implementation and testing
Getting and Working on Lab08 Code
Make sure you are working in the correct directory when you edit and run lab 08 files.
Run update21
to get the starting point code of the next lab assignment.
This will create a new ~/cs21/labs/08
directory for your lab work.
Edit and run Lab 08 programs from within your cs21/labs/08
directory.
Make sure all programs are saved to your cs21/labs/08
directory! Files outside that directory will not be graded.
$ update21
$ cd ~/cs21/labs/08
$ pwd
/home/username/cs21/labs/08
$ ls
(should see your program files here)
Then edit and run program files in your ~/cs21/labs/08
directory:
$ code filename.py
$ python3 filename.py
Programming Tips
As you write programs, use good programming practices:
-
Use a comment at the top of the file to describe the purpose of the program (see example).
-
All programs should have a
main()
function (see example). -
Use variable names that describe the contents of the variables.
-
Write your programs incrementally and test them as you go. This is really crucial to success: don’t write lots of code and then test it all at once! Write a little code, make sure it works, then add some more and test it again.
-
Don’t assume that if your program passes the sample tests we provide that it is completely correct. Come up with your own test cases and verify that the program is producing the right output on them.
-
Avoid writing any lines of code that exceed 80 columns.
-
Always work in a terminal window that is 80 characters wide (resize it to be this wide)
-
In
vscode
, at the bottom right in the window, there is an indication of both the line and the column of the cursor.
-
Function Comments
All functions should have a top-level comment! Please see our function example page if you are confused about writing function comments.
Lab08 Overview
Climate scientists often use computer programs to simulate the interactions of the earth’s atmosphere, oceans, land surface, and ice, and also to analyze past data in order to understand the effects of human activity on the climate and to make predictions of future climate changes.
In this assignment you will write a program climate.py
that reads in
climate data from a file, and then allows the user to select a specific query
about the data from a menu of options and display the result.
The data we are using was obtained from Climate Watch.
The professors cleaned up the raw data from this source for you, and created a file consisting of some historical annual carbon dioxide ("CO2") total emission amounts (in Megatons), gross domestic product (or GDP, in billions of US dollars), and population (in millions of people) data for most countries in the world.
However, like most real world data, there are some missing values in this data set, and your program will need to appropriately ignore missing values when computing the result of a user’s query on the data.
We will use this data set in the next lab assignment, too, so you will reuse some of the initial processing of the data and some helper functions you write for this lab when you complete the next one.
General Requirements
-
Your program should have many functions, including: one for reading in the data from the file, one for performing each operation, one for the main data processing loop, and some helper functions for reading in and checking input values in different ways. We leave much of the function definition and design up to you, but some functions you should define the way we specify.
-
Your program will use binary search on different data fields; note that the data in the file is sorted by countries' names.
-
Your program should be well designed: use good modular design, have complete function comments, use descriptive variable names, have no line wrapping, and be robust to input errors; the requirements for each part of the assignment describe what types of bad input your program does and does not need to handle.
General Hints/Tips
-
Use what you know about good Top-down design and incremental implementation and testing to implement this large program. Do not try to complete it all in one sitting! Rather, implement and test each piece a little bit at a time.
-
Refer to in-class code for examples. You likely need to refer to code from many different weeks depending on the example you are looking for, as we’ve covered functions, while loops, formatted output, file I/O, searching, strings, lists, etc. over many weeks.
-
Use the
print()
function to add debug statements to help you see what your program is doing as you try to find and fix bugs (but be sure to remove these statements after you fix the bugs!).
Example Run
The link below is output from an example run of a working program that chooses each menu option, some more than one time. Additionally, with the details of each Menu Option in the sections below, we also show example output of just that menu option.
The following are the details of each of the main parts of your program:
1. Create a List of Country
Objects
We will provide the definition of a class called Country
that is further
described below.
The first thing your program will do is read in data from the file
and create a list of Country
objects. Each country’s information is
on a single line of the file. Individual values are comma separated.
The file is in sorted order by country name, and your resulting list should maintain that order (entries in sorted order by country name).
We suggest that you write a get_data
function to perform this action.
get_data
should take the name of the climate data file as its argument and
return a list of Country
objects, one per line in the file. Your
code can assume that the file exists.
There are two files you can use to run your program:
-
/usr/local/doc/climatewatchdata.csv
: is the full set of climate data. Your submitted solution should open and work correctly with this file. -
small.csv
: is a smaller file with information about just 20 countries. It may be useful to use this file when you are first debugging and testing some of your program’s functionality.
For each line of the file you read in, you should:
-
process the line from the file to extract the 10 values for the country (one string and 9 floats). See the details about the input file’s format below.
-
create a
Country
object with these data. See information about the Country
class below. -
add the
Country
object to the list to be returned by the function, making sure that the resulting list will be in sorted order by country name (matching the order of the set of country information in the file).
1.1. Input File Format
Each line in the input file contains 10 comma-separated values for a country in the following order:
Name,1960,1980,2000,2020,2022,pop_1960,pop_2020,gdp_1960,gdp_2020
For example, here is the information for two countries from the file (note
that Namibia has some missing data, for 1960 and 1980 CO2 levels and for
1960 GDP, that are represented with -1
values):
Namibia,-1,-1,1.6048,3.6818,3.953,0.634138,2.540916,-1,10.56263738
Nepal,0.080608,0.5413,3.0374,14.9024,15.5,10.10506,29.136808,0.508334414,33.43367051
The values for each country on each line are as follows:
-
The first value is the name of the Country, which may include white space characters and other non-alphabetic characters. For example,
European Union (27)
is the name of one "country" in this list. Your program will use this value as a str
. -
The next five values are yearly total CO2 emission values for the years 1960, 1980, 2000, 2020, and 2022 (the most recent year in the data set). The amounts are in units of Mt (Megatons of CO2 emissions). Your program will use these as
float
values. -
The next two values are the country’s populations for the years 1960 and 2020. These are in units of millions of people. Your program will use these as
float
values. -
The final two values are the country’s total GDP for the years 1960 and 2020. These are in units of billions of US dollar equivalents. Your program will use these as
float
values.
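The field layout above can be sketched as a small parsing helper. This is only an illustration, not the required design: the name parse_line is hypothetical, and the sketch assumes that no country name contains a comma.

```python
def parse_line(line):
    """Split one comma-separated line into a name and nine floats.

    parse_line is a hypothetical helper name, not one the lab requires.
    Assumes the country name itself contains no commas.
    """
    fields = line.strip().split(",")
    name = fields[0]
    numbers = []
    for text in fields[1:]:
        numbers.append(float(text))   # missing values stay as -1.0
    co2_vals = numbers[0:5]           # 1960, 1980, 2000, 2020, 2022
    pop_1960 = numbers[5]
    pop_2020 = numbers[6]
    gdp_1960 = numbers[7]
    gdp_2020 = numbers[8]
    return name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020
```

Returning the six pieces in this order matches the order in which the values are later grouped (name, CO2 list, two populations, two GDPs).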
A note about missing values
There are missing data for some fields of some countries. Missing
values are encoded as -1. Your program needs to handle any missing values correctly: do
not do arithmetic using -1 values. Any field that stores a numeric value could have a missing value
(i.e., any CO2 value, population value, or GDP value could be -1). A country’s name field will never be missing: no country has
a name of -1.
1.2. The Country
Class
The starter code imports the Country
class that you should use: you
will create a list of Country
objects, one for each line in the file.
The Country class constructor is invoked passing in the following information:
next_country = Country(name, co2_vals, pop_1960, pop_2020, gdp_1960, gdp_2020)
This creates a new Country
object with the following values:
-
name
: the name of the country (str) -
co2_vals
: a list containing the five CO2 emission values for 1960, 1980, 2000, 2020, 2022 (a list of float
) -
pop_1960
: 1960 population in Millions (float
) -
pop_2020
: 2020 population in Millions (float
) -
gdp_1960
: 1960 GDP in Billions of US dollars (float
) -
gdp_2020
: 2020 GDP in Billions of US dollars (float
)
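To show how the file reading and the constructor call fit together, here is a minimal sketch of a get_data function. The Country class defined in it is a stand-in written only so the example runs by itself; it copies the constructor signature shown above, but its attributes are assumptions and say nothing about the methods of the real class that the starter code imports.

```python
class Country:
    """Stand-in for illustration only; the lab provides the real class."""

    def __init__(self, name, co2_vals, pop_1960, pop_2020,
                 gdp_1960, gdp_2020):
        self.name = name
        self.co2_vals = co2_vals
        self.pop_1960 = pop_1960
        self.pop_2020 = pop_2020
        self.gdp_1960 = gdp_1960
        self.gdp_2020 = gdp_2020


def get_data(filename):
    """Return a list of Country objects, one per line of the file."""
    countries = []
    infile = open(filename, "r")
    for line in infile:
        fields = line.strip().split(",")
        if len(fields) == 10:        # skip blank or malformed lines
            values = []
            for text in fields[1:]:
                values.append(float(text))
            next_country = Country(fields[0], values[0:5], values[5],
                                   values[6], values[7], values[8])
            countries.append(next_country)
    infile.close()
    return countries
```

Because each object is appended in the order its line is read, the returned list keeps the file's sorted-by-name order without any extra sorting step.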
1.3. All Country
Class Methods
Here is complete information about the Country
class and its
method functions: Country Class Documentation
1.4. Example output (reading file)
Here is program output from the first step: reading in the file and creating
the list of objects. Your program should print out the number of countries
in the file, i.e. the number of elements in the list that is returned from
the call to your get_data
function:
$ python3 climate.py
There are data for 193 countries in this file
2. Main Loop and the Menu
After your program reads in the file data and creates a list of
Country
objects, it should call a function that, in a loop:
-
prints out a menu of options for the user to choose from
-
performs the operation on the data, and displays the results (some output needs to be in tabular form, details below)
Your program should repeat these steps until the user chooses the menu
option to quit
.
The six menu options for getting information about the climate data set include:
-
Print the name and population value of the least populated country and the most populated country in the year given by the user (either 1960 or 2020).
-
List all countries with a yearly CO2 emissions level above a lower bound level, for a specified year. The user enters values for the lower bound level and for the year (one of 1960, 1980, 2000, 2020, 2022).
-
Print the name and CO2 level of the country with the largest CO2 per GDP for a given year. The user enters the year (either 1960 or 2020).
-
Print the name and population of all countries with population larger than a given value for a given year. The user enters the value and the year (either 1960 or 2020).
-
Print out all information about a country given its name entered by the user.
-
Quit
Be sure to read the "Required Features" and "Hints/Tips" before you
start implementing this part.
One requirement is that you implement
a specific helper function, get_value_between
, that you will use
in several places in your program.
2.1. Example Output: Menu
Here is example output from a working program that reads in the
data, prints out the menu, reads in an option from the user with 6
as the
quit
option, and performs the action (note how it handles
bad input values):
$ python3 climate.py
There are data for 193 countries in this file
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 9
9 is not a valid choice, try again
Enter a value between 1 and 6: 7
7 is not a valid choice, try again
Enter a value between 1 and 6: -3
-3 is not a valid choice, try again
Enter a value between 1 and 6: 6
bye bye
2.2. Required Features
-
Your program should list the 6 menu options in the exact order as the example shown above. Do not choose a different ordering of operations on the data (e.g., option
3
needs to show the country with the largest CO2 emissions per GDP for a given year). -
Your program should gracefully handle, and re-prompt for, invalid menu options entered by the user.
You should implement a function, get_value_between, that takes a string with instructions (like "Select a menu option") and two int values, low and high, and returns a value between low and high inclusive. If the value of the low parameter is larger than the high parameter, the function should just return the value of low and not prompt for any input.
In the example output above, we called our function like this:
option = get_value_between("Select a menu option", 1, 6)
Your program does not need to handle non-integer input, like the user entering
hello there
at the menu options prompt.
This function will be useful for implementing some other parts of your program.
-
Menu option
6
(quit) should be implemented with this step. Print out a good bye message and return from your main menu function back to main
. -
Your main menu looping should work at this point. For menu options not yet implemented, just print out the menu option selected by the user, and then your function should repeat its main actions: print out the menu; get the next selection from the user; repeat.
-
Feel free to add any other helper functions you’d like.
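One possible shape for get_value_between is sketched below. It is a sketch under stated assumptions, not the required solution: the prompt and error wording are copied from the example output, and the user is assumed to type an integer (the lab says non-integer input need not be handled).

```python
def get_value_between(instructions, low, high):
    """Prompt until the user enters an int between low and high inclusive.

    If low is larger than high, return low without prompting.
    Assumes the user always types something int() can convert.
    """
    if low > high:
        return low
    print(instructions)
    prompt = "Enter a value between %d and %d: " % (low, high)
    value = int(input(prompt))
    while value < low or value > high:
        print("%d is not a valid choice, try again" % (value))
        value = int(input(prompt))
    return value
```

The instructions string is printed once, while the "Enter a value between" prompt repeats on every attempt, which is the behavior visible in the example run.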
2.3. Hints/Tips
-
Implement and test your
get_value_between
function independently by adding some calls to it from main
to test different values. Then add the call to this function to your main loop function. Try for values other than just 1 and 6, and for different instruction strings as well. -
Refer to the in-class programs and to previous lab assignments that use while-loops and functions.
-
Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.
3. Menu Option 1
Implement a function to perform menu option 1: print out information about the country with the lowest and the country with the highest population in a given year.
Be sure to read the "Required Features" and "Hints/Tips" before you
start implementing this option.
One requirement is that you implement
a specific helper function, get_value_in_set
, that you will use
in this, and in other, menu options.
3.1. Example Output
Here is some example output from this option:
$ python3 climate.py
There are data for 193 countries in this file
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 1
Enter one of 1960 or 2020 for the year: 1981
1981 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 1956
1956 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 2020
---- Population in 2020 ----
lowest: 0.01083 M in Nauru
highest: 1411.10000 M in China
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 1
Enter one of 1960 or 2020 for the year: 1
1 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 1960
---- Population in 1960 ----
lowest: 0.00438 M in Nauru
highest: 667.07000 M in China
======== Menu Options: ========
...
3.2. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 1.
-
Implement a helper function,
get_value_in_set
that takes a list of values and a prompt string and returns the user’s entered choice from one of the values in the set. You will use this function in many other menu options. For this one, you will use it to have the user enter a value for one of the two years for which there are population data (either 1960 or 2020). -
You should handle missing values from the data set and ignore them in your output. Missing values are represented with
-1
in the data file. Note: any field in a Country object that stores a numeric value could be missing (i.e., a CO2 value, a population value, or a GDP value). A country’s name will not be a missing value. -
You should print out each country’s name and the population, and a header line with the year in a format similar to our output.
-
Use formatted
print
to ensure that the population and name values of the two countries with the lowest and highest populations for the given year line up vertically in the output. -
Population values should be printed with 4 places beyond the decimal point. For example,
%12.4f
is a placeholder for a float value, printed in a field width of 12
with 4
places beyond the decimal point. -
Print out a heading for the data returned that includes the year.
-
-
Use methods of the
Country
class to access appropriate values from each country’s data: Section 1.3
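A minimal sketch of get_value_in_set follows. It assumes the values in the list are ints (such as years) and that the user types an integer; the error wording is taken from the example output, and everything else is an illustrative choice rather than the required implementation.

```python
def get_value_in_set(values, prompt):
    """Prompt until the user enters one of the ints in the list values.

    Assumes values holds ints and the user types something int() accepts.
    """
    choice = int(input(prompt))
    while choice not in values:
        print("%d is not a valid value, try again" % (choice))
        choice = int(input(prompt))
    return choice
```

A call for this menu option might look like get_value_in_set([1960, 2020], "Enter one of 1960 or 2020 for the year: ").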
3.3. Hints/Tips
-
Test your
get_value_in_set
function independently by adding some calls to it from main
. Try passing different values to ensure that it works correctly. -
Look at example in-class code for lists, searching, objects, and output formatting.
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.
-
You may add other helper functions to implement this feature if you’d like.
-
Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.
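The search for the lowest and highest population, skipping missing values, can be sketched as a single linear pass. To avoid guessing at Country method names, this sketch works on plain (name, population) pairs; in your program you would get those values from each Country object using the documented methods. The helper names are hypothetical.

```python
MISSING = -1   # marker used for missing values in the data file


def lowest_and_highest(pairs):
    """Return the (name, population) pairs with the lowest and highest
    population, ignoring missing values. Returns (None, None) if every
    population is missing.
    """
    lowest = None
    highest = None
    for pair in pairs:
        value = pair[1]
        if value != MISSING:
            if lowest is None or value < lowest[1]:
                lowest = pair
            if highest is None or value > highest[1]:
                highest = pair
    return lowest, highest


def format_line(label, pair):
    """Build one aligned output line using a %12.4f placeholder."""
    return "%-8s %12.4f M in %s" % (label, pair[1], pair[0])
```

Using the same placeholder widths for both lines is what makes the two population values line up vertically.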
4. Menu Option 2
Implement a function to perform menu option 2: list all countries in a given year that have a CO2 level above an amount entered by the user.
Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option.
4.1. Example Output
Here is some example output from this option:
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 2
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2
2 is not a valid value, try again
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 1980
Enter a lower CO2 limit value
Enter a value between 0 and 20000: 1000
Country 1980 CO2 emissions > 1000.0000 Mt
--------------------------------------------------------------
China 1494.4959
European Union (27) 4077.5007
Germany 1100.0660
Russia 2129.1103
United States of America 4808.5564
======== Menu Options: ========
...
Select a menu option
Enter a value between 1 and 6: 2
Enter a year (one of 1960, 1980, 2000, 2020, 2022): 2022
Enter a lower CO2 limit value
Enter a value between 0 and 20000: 1000
Country 2022 CO2 emissions > 1000.0000 Mt
--------------------------------------------------------------
China 11396.7774
European Union (27) 2761.9071
India 2829.6442
Japan 1053.7979
Russia 1652.1773
United States of America 5057.3038
======== Menu Options: ========
...
4.2. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 2.
-
Use your helper function,
get_value_in_set
to get a value for the year from the user (one of 1960, 1980, 2000, 2020 or 2022, which are the five years with CO2 emissions values). -
Use your helper function,
get_value_between
to get a value for the lower limit (a value between 0 and 20000). -
You should handle missing values (ignore them in your output). Missing values are represented with
-1
. -
You should print out each country’s name and CO2 emissions in tabular format, and a header line with the year in a format similar to our output.
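As a rough sketch of the filtering and tabular output for this option, the code below again uses plain (name, CO2) pairs instead of Country objects, so no Country method names are assumed. The function names and the column widths are illustrative choices, not requirements.

```python
MISSING = -1   # marker used for missing values in the data file


def above_limit(pairs, limit):
    """Return the (name, value) pairs whose value is known and > limit."""
    matches = []
    for name, value in pairs:
        if value != MISSING and value > limit:
            matches.append((name, value))
    return matches


def print_table(matches, year, limit):
    """Print a header line, a separator, and one row per match."""
    print("%-40s %d CO2 emissions > %.4f Mt" % ("Country", year, limit))
    print("-" * 62)
    for name, value in matches:
        print("%-40s %12.4f" % (name, value))
```

Because the input list is in sorted order by name and the loop visits it front to back, the rows come out alphabetically, as in the example output.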
4.3. Hints/Tips
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.
-
You may add other helper functions to implement this feature if you’d like.
-
Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.
5. Menu Option 3
Implement a function to perform menu option 3: print out information about the country with the highest CO2 emissions per GDP for a given year, i.e. the one for which emissions divided by GDP is largest in that year.
Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option.
5.1. Example Output
Here is some example output from this option:
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 3
Enter one of 1960 or 2020 for the year: 2000
2000 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 2020
---------- largest per GDP in 2020
Iran 2.9325/B
======== Menu Options: ========
...
5.2. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 3.
-
Use your helper function,
get_value_in_set
to get a value for the year from the user (one of 1960 or 2020, the two years with GDP info). -
You should handle missing values (ignore them in your output). Missing values are represented with
-1
. -
You should print out the country’s name and its CO2 per GDP value, and a header line with the year, in a format similar to our output.
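The point about missing values matters most here, because dividing with a -1 produces a number that looks valid. The sketch below works on (name, co2, gdp) rows for one year; the helper name is hypothetical, and the extra check that GDP is positive is an added assumption to avoid dividing by zero.

```python
MISSING = -1   # marker used for missing values in the data file


def largest_co2_per_gdp(rows):
    """Return (name, ratio) for the row with the largest co2 / gdp,
    skipping rows where either value is missing. Returns (None, None)
    if no row has both values.
    """
    best_name = None
    best_ratio = None
    for name, co2, gdp in rows:
        # gdp > 0 is an extra guard (an assumption) against dividing by 0
        if co2 != MISSING and gdp != MISSING and gdp > 0:
            ratio = co2 / gdp
            if best_ratio is None or ratio > best_ratio:
                best_name = name
                best_ratio = ratio
    return best_name, best_ratio
```

For example, a row with both values missing would compute -1 / -1 = 1.0 if it were not skipped, which could incorrectly beat every country with real data.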
5.3. Hints/Tips
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.
-
You may add other helper functions to implement this feature if you’d like.
-
Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.
6. Menu Option 4
Implement a function to perform menu option 4: print out information about all countries with a population larger than a given value for the given year.
Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option.
6.1. Example Output
Here is some example output from this option:
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 4
Enter one of 1960 or 2020 for the year: 2322
2322 is not a valid value, try again
Enter one of 1960 or 2020 for the year: -3
-3 is not a valid value, try again
Enter one of 1960 or 2020 for the year: 2020
Enter population lower bound in millions of people
Enter a value between 0 and 8000: -3
-3 is not a valid choice, try again
Enter a value between 0 and 8000: 66666666666666
66666666666666 is not a valid choice, try again
Enter a value between 0 and 8000: 300
Country 2020 Population > 300.000000 M
----------------------------------------------------------
China 1411.100000
European Union (27) 447.479493
India 1380.004385
United States of America 331.501080
======== Menu Options: ========
..
6.2. Required Features
-
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 4.
-
Use your helper function,
get_value_in_set
to get a value for the year from the user (one of 1960 or 2020, the two years with population info). -
Use your helper function,
get_value_between
to get a value for the lower bound for the population (a value between 0 and 8000). -
You should handle missing values (ignore them in your output). Missing values are represented with
-1
. -
You should print out each country’s name and the population in tabular format, and with a header line with the year. Your output should be similar to ours.
6.3. Hints/Tips
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.
-
You may add other helper functions to implement this feature if you’d like.
-
Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.
7. Menu Option 5
Implement a function to perform menu option 5: print out a country’s information.
Be sure to read the "Required Features" and "Hints/Tips" before you start implementing this option. One requirement is that you use binary search.
7.1. Example Output
Here is some example output from this option:
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 5
Enter the name of a country: Greece
Greece
1960 pop: 8.331725 2020 pop: 10.700556
1960 GDP: 4.335186 2020 GDP: 188.835202
1960 1980 2000 2020 2022
9.391500 50.888700 102.973200 55.619800 59.662800
======== Menu Options: ========
1. country w/lowest and country w/highest population in given year
2. list all countries with CO2 above some value for a given year
3. country with largest CO2 per GDP for a given year
4. countries with population larger than given value for given year
5. print a country's info
6. quit
Select a menu option
Enter a value between 1 and 6: 5
Enter the name of a country: hello
Sorry, hello is not in the database
======== Menu Options: ========
...
7.2. Required Features
-
You must use binary search to find the country with a matching name in your list of
Country
objects. -
This option must be implemented in a separate function: don’t include its code in your function that gets the menu option. Instead, that function should make a call to your function that implements this feature when the user selects menu option 5.
-
You may print out the country’s information by calling the
print
function, passing in the Country
object. (See the Country Class Documentation) -
If the country is not in the database, print out a message saying it is not present.
-
Your program does not need to handle users entering a country’s name in the wrong case. For example, if the user enters
greece
instead of Greece
, it is fine if your program prints out that greece
is not in the database.
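Binary search by name can be sketched as follows. For simplicity, this version searches a sorted list of name strings; in your program the comparison would use the name obtained from each Country object. It assumes the file's sort order agrees with Python's string comparison, and the function name is hypothetical.

```python
def binary_search_names(names, target):
    """Return the index of target in the sorted list names, or -1.

    Assumes names is sorted in the order Python's < gives for strings.
    """
    low = 0
    high = len(names) - 1
    while low <= high:
        mid = (low + high) // 2
        if names[mid] == target:
            return mid
        elif names[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return -1
```

A return value of -1 is the signal to print the "not in the database" message; since the comparison is case-sensitive, greece is not found even when Greece is present.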
7.3. Hints/Tips
-
Try running your program on the smaller file to help you debug, then comment out this call from main and try on the bigger file.
-
You may add other helper functions to implement this feature if you’d like.
-
Refer to the Country Class Documentation for methods that might be helpful for implementing this menu option.
Answer the Questionnaire
After each lab, please complete the short Google Forms questionnaire. Please select the right lab number (Lab 08) from the dropdown menu on the first question.
Once you’re done with that, you should run handin21
again.
Submitting lab assignments
Remember to run handin21
to turn in your lab files! You may run handin21
as many times as you want. Each time it will turn in any new work. We
recommend running handin21
after you complete each program or after you
complete significant work on any one program.
Logging out
When you’re done working in the lab, you should log out of the computer you’re using.
First quit any applications you are running, including your vscode editor, the browser and the terminal. Then click on the logout icon and choose "log out".
If you plan to leave the lab for just a few minutes, you do not need to log out. It is, however, a good idea to lock your machine while you are gone. You can lock your screen by clicking on the lock icon. PLEASE do not leave a session locked for a long period of time. Power may go out, someone might reboot the machine, etc. You don’t want to lose any work!