The goal of this lab is to gain more comfort with C++ functions
and arrays. We will practice using arrays to process a list of numbers
in various ways. An empty version of the program will appear in your
cs35/labs/01 directory when you run update35. The
program handin35 will only submit files in this directory. In
later labs you can work with a partner, but for this lab you should
work on your own.
For this lab you will prompt the user to enter a series of integers in a given range. Then you will perform some statistical analyses on the data and report the results. The most common statistical measure is the mean, which is simply the average. Another useful measure is the standard deviation, which provides an indication of how much the individual values in the data differ from the mean. A small standard deviation indicates that the data is tightly clustered. A large standard deviation indicates that the data is widely spread out. Finally, a histogram is a graphical way of summarizing the data by dividing it into separate ranges and then indicating how many data values fall into each range. Here is a sample run of a program that performs these three statistical analyses:
This program calculates statistics on a set of given data.
It calculates the mean, standard deviation, and prints a
histogram of the values.

Enter integers in the range 0-100 to be stored, -1 to end.
> 90
> 89
> 97
> 95
> 123
Invalid value, try again.
> 84
> 71
> 78
> 88
> 82
> 100
> 55
> -1

Read in 11 valid values.
Mean: 84.455
Standard deviation: 12.324
Histogram:
0 - 9:
10 - 19:
20 - 29:
30 - 39:
40 - 49:
50 - 59: *
60 - 69:
70 - 79: **
80 - 89: ****
90 - 99: ***
100 - 100: *
int getIntegerArray(int data[], int capacity, int min, int max, int sentinel);

This function reads integers into the data array, which can hold at most capacity values. The values must be between min and max. The user enters the sentinel value to quit. The function returns the number of valid values read in. Note that the return value should not exceed the capacity. In your main function, you can define your array to have a large capacity, say 500 or 1000. Test your function to make sure it works with different sentinel parameters, e.g., -1 or 999.
float mean(int data[], int n);

This function returns the average of the values stored in the data array whose effective size is n. The effective size should be the number of valid integers read by getIntegerArray. Recall that this should not exceed capacity. Thus, even if your array could hold 500 numbers, if the user only types in 10 numbers, the effective size is 10.
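A minimal sketch of such a function is shown here. The assert on n is an added assumption (the lab does not say what should happen for an empty array), and the sum is kept in a long so that many large values do not overflow.

```cpp
#include <cassert>

// Sketch: average of the first n entries of data.
float mean(int data[], int n) {
    assert(n > 0);  // assumption: callers never pass an empty data set
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    // Convert before dividing to avoid integer division.
    return static_cast<float>(sum) / n;
}
```

For the 11 valid values in the sample run the sum is 929, so the result is 929 / 11, which rounds to the 84.455 shown there.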
float standardDeviation(int data[], int n);

This function returns the standard deviation of the values stored in the data array whose effective size is n. In order to calculate the standard deviation, perform the following steps:
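A common sequence of steps is: compute the mean, sum the squared differences between each value and the mean, divide that sum by n, and take the square root. The sketch below assumes this population definition (dividing by n rather than n - 1), which reproduces the 12.324 in the sample run. The mean is recomputed inline only to keep the example self-contained.

```cpp
#include <cassert>
#include <cmath>

// Sketch: population standard deviation of the first n entries of data.
float standardDeviation(int data[], int n) {
    assert(n > 0);  // assumption: callers never pass an empty data set

    // Step 1: the mean.
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    double avg = sum / n;

    // Step 2: sum of squared differences from the mean.
    double squares = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = data[i] - avg;
        squares += diff * diff;
    }

    // Steps 3 and 4: divide by n, then take the square root.
    return static_cast<float>(std::sqrt(squares / n));
}
```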
void histogram(int data[], int n, int min, int max, int binSize);

Notice that the return type of this function is void. This indicates that the function does not return any value. In this case the function prints information on the screen rather than returning a value. The parameters of this function include the data array, its effective size n, the smallest and largest possible values stored in the array, and the size of each bin. The number of asterisks in a particular row indicates the number of values in the data that fall into that row's range. You will need to use printf rather than cout to format the output appropriately. In the sample run shown above, the bin size was 10. However, you should not assume that the bin size will always be 10. Shown below is the same set of data used in the sample run, but with a binSize of 7. Test your code with various min, max and binSize parameters.
Histogram:
0 - 6:
7 - 13:
14 - 20:
21 - 27:
28 - 34:
35 - 41:
42 - 48:
49 - 55: *
56 - 62:
63 - 69:
70 - 76: *
77 - 83: **
84 - 90: ****
91 - 97: **
98 - 100: *

Notice that the last bin will often be a different size than all the other bins.
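The bin boundaries in the two histograms follow a simple pattern, sketched below. The helper names numBins, binLow, binHigh and printRow are hypothetical and not part of the lab's required interface, and the field width of 3 in the printf format is an assumption about the desired alignment.

```cpp
#include <cassert>
#include <cstdio>

// Number of bins needed to cover [min, max], rounding up so that a
// shorter final bin is included.
int numBins(int min, int max, int binSize) {
    assert(binSize > 0 && min <= max);
    return (max - min + binSize) / binSize;
}

// Smallest value that falls in bin i.
int binLow(int i, int min, int binSize) {
    return min + i * binSize;
}

// Largest value that falls in bin i; the last bin is clipped at max.
int binHigh(int i, int min, int max, int binSize) {
    int high = min + (i + 1) * binSize - 1;
    return high > max ? max : high;
}

// Print one histogram row: a right-aligned range label, then one
// asterisk per counted value.
void printRow(int low, int high, int count) {
    std::printf("%3d - %3d: ", low, high);
    for (int s = 0; s < count; ++s) {
        std::printf("*");
    }
    std::printf("\n");
}
```

With min 0, max 100 and binSize 7 this gives 15 bins, the last of which covers 98 to 100, matching the output above.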
One possible way of computing the histogram is to scan the entire array of input values once per bin and count the number of values that fall in that bin. This is rather slow. Another option is to create an array of bucket counts (one for each bin in the histogram), and for each value v, scan the array of buckets to determine whether v should go in bucket i. This is also slow (Is it equally slow?). A final option is, for each value in the data array, to compute the ID of the bucket containing that value and update the appropriate count. After processing all data values, you can then scan the list of bucket counts and print out the histogram. I encourage you to aim for this approach.
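The last approach can be sketched as follows. The function name binCounts is hypothetical, and using std::vector for the counts is an assumption made for brevity; a plain array of counts works the same way. The sketch also assumes every value already lies in [min, max], as getIntegerArray guarantees.

```cpp
#include <cassert>
#include <vector>

// Sketch: count how many of the first n values fall in each bin.
// Each value is examined exactly once.
std::vector<int> binCounts(int data[], int n, int min, int max, int binSize) {
    assert(binSize > 0 && min <= max);
    int bins = (max - min) / binSize + 1;  // includes a short last bin
    std::vector<int> counts(bins, 0);
    for (int i = 0; i < n; ++i) {
        int id = (data[i] - min) / binSize;  // bucket ID of this value
        counts[id]++;
    }
    return counts;
}
```

Integer division does the work here: with min 0 and binSize 10, the value 84 maps to bucket 8 and the value 100 maps to bucket 10.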
Instead of typing in a bunch of numbers each time to test your program, you can save some sample data in a file, e.g., test1.txt containing only the input values:
90 89 97 95 123 84 71 78 88 82 100 55 -1

Then you can use input redirection to have your program read input from a file, e.g., ./stats < test1.txt. Try it. I have included test1.txt as one sample test. You may want to add others. This may be how your instructor tests your submission, so it is a good idea to try it before I do. (Hint: I do not use small data sets.)
An alternate way to compute the standard deviation is to compute the average of the squares of the entries, subtract the square of the average of the entries (parse that sentence carefully), and take the square root of the result. Try this method and try to reuse your mean function twice to avoid code duplication.
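In symbols, this is the square root of mean(x²) − mean(x)². A hedged sketch that calls mean twice is shown below. The name standardDeviationAlt is hypothetical, the temporary std::vector of squares is an assumption about how to reuse mean, and a small mean function is repeated here only so the example stands alone.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Average of the first n entries of data (repeated for self-containment).
float mean(int data[], int n) {
    assert(n > 0);
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    return static_cast<float>(sum) / n;
}

// Sketch: sqrt(mean of squares - square of mean), reusing mean twice.
float standardDeviationAlt(int data[], int n) {
    assert(n > 0);
    std::vector<int> squares(n);
    for (int i = 0; i < n; ++i) {
        squares[i] = data[i] * data[i];
    }
    float meanOfSquares = mean(squares.data(), n);
    float avg = mean(data, n);
    float variance = meanOfSquares - avg * avg;
    if (variance < 0.0f) {
        variance = 0.0f;  // guard against tiny negative rounding error
    }
    return std::sqrt(variance);
}
```

Subtracting two nearly equal floating-point numbers loses precision, so this version can be less accurate than summing squared differences when the values are large and tightly clustered.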
The median of a set of numbers is the number that would appear in the middle of a sorted list of numbers. Write a function to compute the median of your values. Is it necessary to sort? Can you reuse some of the ideas used to compute the histogram?
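One way to avoid sorting is to count how many times each possible value occurs (a histogram with a bin size of 1) and then walk the counts until the middle position is reached. The sketch below takes that approach. The signature, with min and max passed in, is an assumption, as is the choice to average the two middle values when n is even.

```cpp
#include <cassert>
#include <vector>

// Sketch: median of the first n values, all assumed to lie in [min, max],
// found by counting occurrences instead of sorting.
float median(int data[], int n, int min, int max) {
    assert(n > 0 && min <= max);
    std::vector<int> counts(max - min + 1, 0);
    for (int i = 0; i < n; ++i) {
        counts[data[i] - min]++;
    }

    // Zero-based positions of the middle element(s) in sorted order.
    int lowerRank = (n - 1) / 2;
    int upperRank = n / 2;

    int lower = min;
    int upper = min;
    bool foundLower = false;
    int seen = 0;  // how many values are <= the current candidate
    for (int v = 0; v < static_cast<int>(counts.size()); ++v) {
        seen += counts[v];
        if (!foundLower && seen > lowerRank) {
            lower = min + v;
            foundLower = true;
        }
        if (seen > upperRank) {
            upper = min + v;
            break;
        }
    }
    return (lower + upper) / 2.0f;
}
```

For the sample data the sorted order is 55, 71, 78, 82, 84, 88, 89, 90, 95, 97, 100, so the median is 88. The running time depends on the size of the range as well as on n, which is a good trade when the range is small, as it is here.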