CS35: Lab 1

The goal of this lab is to gain more comfort with C++ functions and arrays. We will practice using arrays to process a list of numbers in various ways. An empty version of the program will appear in your cs35/labs/01 directory when you run update35. The program handin35 will only submit files in this directory. In later labs you can work with a partner, but for this lab you should work on your own.

Introduction

For this lab you will prompt the user to enter a series of integers in a given range. Then you will perform some statistical analyses on the data and report the results. The most common statistical measure is the mean, which is simply the average. Another useful measure is the standard deviation, which provides an indication of how much the individual values in the data differ from the mean. A small standard deviation indicates that the data is tightly clustered. A large standard deviation indicates that the data is widespread. Finally, a histogram is a graphical way of summarizing the data by dividing it into separate ranges and then indicating how many data values fall into each range. Here is a sample run of a program that performs these three statistical analyses:

```
This program calculates statistics on a set of given data.
It calculates the mean, standard deviation, and prints a 
histogram of the values.

Enter integers in the range 0-100 to be stored, -1 to end.
> 90
> 89
> 97
> 95
> 123
Invalid value, try again.
> 84
> 71
> 78
> 88
> 82
> 100
> 55
> -1
Read in 11 valid values.

Mean: 84.455

Standard deviation: 12.324

Histogram:
   0 -    9: 
  10 -   19: 
  20 -   29: 
  30 -   39: 
  40 -   49: 
  50 -   59: *
  60 -   69: 
  70 -   79: **
  80 -   89: ****
  90 -   99: ***
 100 -  100: *
```

Program Requirements

Develop your program incrementally. Create a main function first. Declare the array that will hold the data here and pass it to the functions for processing. Add the required functions one at a time, testing each one by calling it from main. Remember that you will need to declare the functions above main and then define them below main. Once you are convinced that a function is correct, move on to the next required function.

To read in the data, you should use a function with the prototype:
```
int getIntegerArray(int data[], int capacity, int min, int max, int sentinel);
```
This function will read integers into the data array, which can hold at most capacity values. The values must be between min and max, inclusive. The user will enter the sentinel value to quit. This function will return the number of valid values read in. Note that the return value should be at most capacity. In your main function, you can define your array to have a large capacity, say 500 or 1000. Test your function to make sure it works with different sentinel parameters, e.g., -1 or 999.
To calculate the mean, you should use a function with the prototype:
```
float mean(int data[], int n);
```
that returns the average of the values stored in the data array whose effective size is n. The effective size should be the number of valid integers read by getIntegerArray. Recall that this is at most capacity. Thus, even if your array could hold 500 numbers, if the user only types in 10 numbers, the effective size is 10.
To calculate the standard deviation, you should use a function with the prototype:
```
float standardDeviation(int data[], int n);
```
that returns the standard deviation of the values stored in the data array whose effective size is n. In order to calculate the standard deviation, perform the following steps:
1. Calculate the mean of the values in the array.
2. Go through the individual values in the array and calculate the square of the difference between each value and the mean. Add all of these squared differences to a running total. There is a pow(base, exponent) function that you can use to calculate the square, which is part of the cmath library.
3. Take the total from the previous step and divide it by the number of values in the array.
4. Calculate the square root of the resulting quantity, which represents the standard deviation. There is a sqrt function, which is also part of the cmath library.
5. Be sure to include the cmath library at the top of your program.
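The steps above can be sketched as follows. The block repeats a small mean function so that it compiles on its own, and returning 0 for an empty array is an assumption rather than something the lab specifies.

```cpp
#include <cmath>

// Sketch only: averages the first n entries of data.
float mean(int data[], int n) {
  if (n <= 0) {
    return 0.0f;  // assumption: empty data has mean 0
  }
  float total = 0.0f;
  for (int i = 0; i < n; i++) {
    total += data[i];
  }
  return total / n;
}

// Sketch only: follows the numbered steps for the standard deviation.
float standardDeviation(int data[], int n) {
  if (n <= 0) {
    return 0.0f;  // assumption: empty data has standard deviation 0
  }
  float average = mean(data, n);               // step 1
  float total = 0.0f;
  for (int i = 0; i < n; i++) {
    total += std::pow(data[i] - average, 2);   // step 2: squared difference
  }
  float variance = total / n;                  // step 3
  return std::sqrt(variance);                  // step 4
}
```

On the data from the sample run this gives roughly 12.324, matching the output shown there.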
To print the histogram, you should use a function with the prototype:
```
void histogram(int data[], int n, int min, int max, int binSize);
```
Notice that the return type of this function is void, which indicates that the function does not return any value. In this case the function prints information on the screen rather than returning a value. The parameters of this function include the data array, its effective size n, the smallest and largest possible values stored in the array, and the size of each bin. The number of asterisks in a particular row indicates the number of data values that fall into that row's range. You will need to use printf rather than cout to format the output appropriately. In the sample run shown above, the bin size was 10. However, you should not assume that the bin size will always be 10. Shown below is the same set of data used in the sample run, but with a binSize of 7. Test your code with various min, max and binSize parameters.
```
Histogram:
   0 -    6: 
   7 -   13: 
  14 -   20: 
  21 -   27: 
  28 -   34: 
  35 -   41: 
  42 -   48: 
  49 -   55: *
  56 -   62: 
  63 -   69: 
  70 -   76: *
  77 -   83: **
  84 -   90: ****
  91 -   97: **
  98 -  100: *
```
Notice that the last bin will often be a different size than the other bins.

Tips

Use printf to format your histogram nicely. You can read more about printf online.

One possible way of computing the histogram is to scan the entire array of input values once for each bin and count the number of values that fall in that bin. This is rather slow. Another option is to create an array of bucket counts (one for each bin in the histogram) and, for each value v, scan the array of buckets to determine which bucket i should hold v. This is also slow (is it equally slow?). A final option is to compute, for each value in the data array, the ID of the bucket containing that value and update the appropriate count. After processing all data values, you can then scan the list of bucket counts and print out the histogram. I encourage you to aim for this approach.
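Here is a minimal sketch of the last approach, where the bucket ID is computed directly with integer division. The helper names numberOfBins and countBins are hypothetical, and the sketch assumes every value lies between min and max and that binSize is positive.

```cpp
// Hypothetical helper: how many bins are needed to cover min..max.
// The last bin may be narrower than binSize.
int numberOfBins(int min, int max, int binSize) {
  return (max - min) / binSize + 1;
}

// Hypothetical helper: fills counts[0..numBins-1] with the number of data
// values in each bin. Assumes min <= data[i] <= max for every i and that
// counts has room for numBins entries.
void countBins(int data[], int n, int min, int binSize,
               int counts[], int numBins) {
  for (int b = 0; b < numBins; b++) {
    counts[b] = 0;
  }
  for (int i = 0; i < n; i++) {
    int bin = (data[i] - min) / binSize;  // bucket ID, no searching needed
    counts[bin]++;
  }
}
```

Each data value is handled with one subtraction and one division, so the whole array is processed in a single pass regardless of how many bins there are.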

Instead of typing in a bunch of numbers each time to test your program, you can save some sample data in a file, e.g., test1.txt containing only the input values:

Then you can use input redirection to have your program read input from a file, e.g., ./stats < test1.txt. Try it. I have included test1.txt as one sample test. You may want to add others. This may be how your instructor tests your submission, so it is a good idea to try it before I do. (Hint: I do not use small data sets.)

Optional Extensions

There are some optional extensions you could add to this lab. Below I list a few exercises you may wish to try. These exercises are entirely optional and will neither raise nor lower your grade. Try these exercises only after you have completed the required portion of the lab.

An alternate way to compute the standard deviation is to compute the average of the squares of the entries, subtract the square of the average of the entries (parse that sentence carefully), and take the square root of the result. Try this method and try to reuse your mean function twice to avoid code duplication.
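One possible sketch of this extension is shown below. The function name standardDeviationAlt is hypothetical. The block repeats a small mean function so that it compiles on its own, and it uses std::vector to hold the squares, which is a convenience chosen here rather than something the lab requires. The mean of the squares minus the square of the mean gives the variance, so a square root is taken at the end.

```cpp
#include <cmath>
#include <vector>

// Sketch only: averages the first n entries of data.
float mean(int data[], int n) {
  if (n <= 0) {
    return 0.0f;  // assumption: empty data has mean 0
  }
  float total = 0.0f;
  for (int i = 0; i < n; i++) {
    total += data[i];
  }
  return total / n;
}

// Hypothetical name: standard deviation from two calls to mean.
float standardDeviationAlt(int data[], int n) {
  if (n <= 0) {
    return 0.0f;  // assumption: empty data has standard deviation 0
  }
  std::vector<int> squares(n);
  for (int i = 0; i < n; i++) {
    squares[i] = data[i] * data[i];
  }
  float meanOfSquares = mean(squares.data(), n);  // first use of mean
  float average = mean(data, n);                  // second use of mean
  float variance = meanOfSquares - average * average;
  if (variance < 0.0f) {
    variance = 0.0f;  // guard against tiny negative values from rounding
  }
  return std::sqrt(variance);
}
```

On the sample data this agrees with the step-by-step method to about three decimal places. Small differences can appear because the two methods round differently.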

The median of a set of numbers is the number that would appear in the middle of a sorted list of numbers. Write a function to compute the median of your values. Is it necessary to sort? Can you reuse some of the ideas used to compute the histogram?

Submit

Once you are satisfied with your code, hand it in by typing handin35. This will copy the code from your cs35/labs/01 directory to my grading directory. You may run handin35 as many times as you like, and only the most recent submission will be recorded.

CS35 Lab1: Introduction to C++