This assignment is to be done
individually. Most subsequent lab assignments will be done with a partner.
The task for this week's lab is fairly manageable:
given employee data from two separate files, combine the contents in a
sorted list and output the results to file.
You will implement a linked-list data structure to maintain your sorted
list of employee data.
This will simulate, at a low-level, a common
operation for storing raw data in a DBMS.
While this lab serves primarily as a warm-up exercise and reminder of
C++ programming, we will also introduce new concepts.
The learning objectives for this
assignment:
- reacquaint students with programming in C/C++
- introduce binary file I/O
- utilize low-level memory management to manage data,
including using tools such as valgrind to debug memory errors
- introduce the const keyword and C-style strings
Lab 0 Starting point
First create a cs44 directory in your home directory, and add
a labs subdirectory to it:
mkdir cs44
cd cs44
mkdir labs
cd labs
pwd
We will be using git repos hosted on the college's GitHub server for
labs in this class. If you have not used git or the college's GitHub
server before, here are some instructions:
Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).
Next find your git repo for this lab assignment off the GitHub server for
our class: CS44-f16
Clone your git repo with the lab 0 starting point files into
your labs directory:
cd ~/cs44/labs
git clone [the ssh url to your repo]
Then cd into your Lab0-you subdirectory. If all was successful, you should
see the following files
(files highlighted in blue require modification):
- Makefile - pre-defined. You may edit this file to add
extra source files or execution commands.
- sortEmployees.cpp - your main program
for reading and sorting employee data.
- employee.[cpp/h] - the implementation
of an EmployeeList class that will represent a list of sorted
employees in a linked list.
- input/ - directory of sample input files in either text format
(.txt extension) or binary (.dat).
- createbin - an executable that will convert any ascii file
to binary format. This should prove helpful when attempting to debug your
file i/o.
- README.md - a few wrap-up questions
for you to answer about the lab assignment, plus information about any
late days you used on this assignment and how many you have used so far.
(Don't use your late days on this lab).
If this didn't work, or for more detailed instructions on git, see the
Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).
Run the following to create a symlink to the createbin program that you can
use to create binary files from ascii files (more
info below about how to run it):
make setup
Implementation Details
EmployeeList class
In employee.cpp/h,
you will implement an EmployeeList and related
EmployeeNode class to manage a sorted linked list of
employee data. The data will be sorted based on the name field.
Each employee (stored as an EmployeeNode)
will be described with a name (c-string) and salary (int).
A list of more specific requirements and details:
- EmployeeList has a very limited linked-list interface.
For
example, there is only one type of method for inserting and no methods
for removing. While this may seem like poor design, it is common to
implement limited data structures to accomplish specific tasks efficiently
and predictably. Do not modify the public interface, or add additional
data members. You may add private methods, if needed.
- Furthermore, you cannot make any changes to the interface of
EmployeeNode.
- The void EmployeeList::insert method must add
a new employee to
the list such that the resulting list is in sorted order. In other
words, you are performing insertion sort.
- int EmployeeList::writeToFile (char const *filename) const
writes the entire employee list, in sorted order, to the binary file
specified by the parameter filename. The file is written
in the same format used by readFromFile() in the
main program (detailed below). writeToFile returns 0 on
success and a non-zero error value on failure (be sure to specify
the meaning of error values in the function comments).
- You may not
be able to detect some output errors as fstream methods throw
exceptions when they fail, and handling C++ exceptions is not the focus
of this lab. Detect the errors you can (like being passed a null filename)
and ignore the rest. You certainly are welcome to use C++ exceptions here,
but you do not need to.
- You will need to allocate (and de-allocate)
memory for EmployeeNode objects
and for names. The name field is a C-style string and it should be
manipulated
using C-string library functions. You should, however, allocate it as a
C++ dynamic array of chars
(i.e. use the new operator).
See C style strings for information on the C string library functions that you
can use.
Do not forget to implement, and call, destructors, and always use
new and delete to allocate and free memory in C++
(do not use malloc and free). Also, remember that c-strings
need to store a null-terminating character ('\0').
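The sorted-insert and memory-management requirements above can be sketched as follows. This is a minimal illustration, not the lab's interface: DemoNode and DemoList are hypothetical names, and the real EmployeeNode and EmployeeList declarations in employee.h take precedence.

```cpp
#include <cstring>

// Hypothetical node type for illustration; the lab's EmployeeNode may differ.
struct DemoNode {
    char *name;      // heap-allocated, null-terminated copy of the name
    int salary;
    DemoNode *next;

    DemoNode(char const *n, int s) : name(nullptr), salary(s), next(nullptr) {
        name = new char[std::strlen(n) + 1];  // +1 for the terminating '\0'
        std::strcpy(name, n);                 // copies the characters and the '\0'
    }
    ~DemoNode() { delete[] name; }            // new[] is paired with delete[]

    // Copying would lead to a double delete of name, so it is disabled here.
    DemoNode(DemoNode const &) = delete;
    DemoNode &operator=(DemoNode const &) = delete;
};

// Hypothetical list type holding only a head pointer.
struct DemoList {
    DemoNode *head;

    DemoList() : head(nullptr) {}
    ~DemoList() {
        // Free every node; each node's destructor frees its own name.
        while (head != nullptr) {
            DemoNode *doomed = head;
            head = head->next;
            delete doomed;
        }
    }
    DemoList(DemoList const &) = delete;
    DemoList &operator=(DemoList const &) = delete;

    // Insert so that names stay in ascending strcmp order.
    void insert(char const *name, int salary) {
        DemoNode *node = new DemoNode(name, salary);
        // Advance a pointer-to-link until it reaches the first node that
        // sorts after the new name (or the end of the list).
        DemoNode **link = &head;
        while (*link != nullptr && std::strcmp((*link)->name, name) <= 0) {
            link = &(*link)->next;
        }
        node->next = *link;
        *link = node;
    }
};
```

Walking a pointer to the link, rather than a pointer to the node, avoids a special case for inserting at the head. It is one design choice among several; a version that tracks previous and current node pointers works equally well.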
Main program
In
sortEmployees.cpp, you will then
write a main program that
reads in employee data from two unsorted
binary files (the format of these files is described below),
merges the data from the two files together in sorted order
on employee name, and
outputs the resulting sorted list of employee
data to a binary file. You may use any algorithm for sorting, but pick
one that works well for this application. Your program's output should
match the sample output regardless of how you sort.
Your program will also output information to standard output as it makes
progress. Specifically, your program should use formatted output
to display the employee data as seen in the input files and finish
with a display of the sorted list of employee data with the average salary
(see Sample Output for an example). The three files used by your program
(two input and one output) will be passed to your program via
command line arguments. See Tips and Hints for example usage.
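One possible way to validate the command line arguments is sketched below. FileArgs and parseArgs are hypothetical names introduced only for this illustration; the starting point code does not require them.

```cpp
// Hypothetical holder for the three paths; the names are illustrative only.
struct FileArgs {
    char const *input1;
    char const *input2;
    char const *output;
};

// Returns true and fills *args when exactly three file arguments were given.
// argv[0] is the program name, so a correct invocation has argc == 4.
bool parseArgs(int argc, char const *const argv[], FileArgs *args) {
    if (argc != 4 || args == nullptr) {
        return false;
    }
    args->input1 = argv[1];
    args->input2 = argv[2];
    args->output = argv[3];
    return true;
}
```

In main, you would call parseArgs(argc, argv, &args) and, if it returns false, print the usage message and exit with a non-zero status.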
Some more specifics to consider for your main program:
- In sortEmployees.cpp, you must add a file reading function,
int readFromFile (char const *filename, EmployeeList *list),
that reads in
employee records from the binary file specified by the parameter
filename. The function should read in employee records from
the file, and add them to the sorted EmployeeList.
- The above function returns 0 on success and any non-zero error value on
failure (there may be multiple reasons for failure; in the function
comments, be sure to specify the error associated with each value
you return).
- All of your code should demonstrate good design - your main should
be simple, and written in a top-down, modular fashion.
- Each function
should have a single, clear purpose.
- Each file must have a top-level comment describing the file's purpose
and authors.
- Each function should have a comment explaining its purpose as
well as parameters, return values, and error conditions.
- Your program must be free of memory-access errors. This includes both
memory leaks and out-of-bounds memory accesses. Recall that C++ does not
report this to you; you need to test your code and use tools such as
valgrind and gdb. See
Compiling, Debugging and Linking Tips for a reminder on how to use
these tools.
Tips and Hints
- C-style strings should be treated as dynamic arrays; you will need
to clean up memory when it is no longer being used. Also,
C-string variables are pointer values; treat them as such.
- '\t' is the tab character that can be used
to get nice tabular output. An alternative, and better, solution is
to use printf. See here.
- const is a modifier that guarantees that an entity will
not be modified while in scope. So for example, passing filename
as type char const * instead of just char * ensures
that the method readFromFile cannot modify the string. This
is enforced by the compiler. Placing a const after a method
name ensures that the function will not modify any class variables (e.g.,
print should only access the EmployeeList/Node
values not modify them).
Note that nothing changes on the caller's end of function use. This
simply provides a contract guarantee to a user that their data cannot
be modified by a called function.
- One way to verify that your writeToFile function works is to try
using the output file as one of the input files in a subsequent run.
Another way is to run wc or ls -l to get the number of
bytes in the output and input files (the output file should have
exactly the sum of the number of bytes in the two input files).
Also, you can get a hex dump of a binary file using xxd.
- When you hand in your code, I will also test it with other input files,
so you should try other cases as well. To create new binary input files,
use the createbin program provided with the starting point code:
./createbin asciiinputfile binaryoutfile
- Don't forget to close files after finishing any reading or writing!
- Above, I mention that you will return error statuses via an int for
your read and write functions. This is common practice, particularly in
C programs where exceptions are not used. To make this aspect of your
code even more readable, consider using meaningful
constant values. For example:
//Global static constant
static const int ERROR_NULL_FILENAME = -1;
/* Function purpose
 * @params ...
 * @return ...
 * @error returns ERROR_NULL_FILENAME if the filename pointer is NULL
 */
int readFromFile(char const *filename, EmployeeList *list){
  //...
  //some code
  if (filename == NULL){
    return ERROR_NULL_FILENAME; //returns -1, but someone reading your
                                //code cares more about the reason for error
  }
  //...rest of program
}
Instead of returning the literal -1, the function returns a named constant (this
constant's value happens to be -1). The key is that in main, you can
check for this by comparing the return value to the error values you
expect to find. For example, if you store the function return value in
main using status, you can write:
if (status == ERROR_NULL_FILENAME){
//handle this error
}
This can be easier to understand (and less prone to bugs) than the statement:
if(status == -1){
//handle this error
}
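Returning to the const and printf tips earlier in this list, the sketch below shows both in one place. Record, formatRow, and nameLength are hypothetical names, and the column widths (20 and 10) are arbitrary choices for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical record type for illustration; not the lab's EmployeeNode.
struct Record {
    char const *name;
    int salary;

    // The const after the parameter list promises that this method only
    // reads the members; assigning to salary here would not compile.
    int formatRow(char *buf, std::size_t bufSize) const {
        // %-20s left-justifies the name in a 20-column field;
        // %10d right-justifies the salary in a 10-column field.
        return std::snprintf(buf, bufSize, "%-20s%10d", name, salary);
    }
};

// A char const * parameter: the function may read the characters but the
// compiler rejects any attempt to write through the pointer.
std::size_t nameLength(char const *name) {
    // name[0] = 'x';   // error: assignment of read-only location
    return std::strlen(name);
}
```

The same format string works directly with printf when you want the row on standard output instead of in a buffer.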
Format of Employee Files
Each employee file is a binary file that stores a sequence of
variable length employee records. Each employee record has two fields:
name (variable length) and salary (4 byte integer). The format of an
employee record in the binary file is as follows:
----------------------------------------------------------------------------
| 4 byte integer | N character string | 4 byte integer |
| Number of characters | Employee name | Employee salary |
| in employee name (N) | (not null-terminated) | |
----------------------------------------------------------------------------
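As a worked example of this layout, the sketch below packs one record into a byte buffer in the order shown in the table. encodeRecord is a hypothetical helper; it assumes sizeof(int) is 4 and uses the machine's native byte order, which matches reading and writing raw ints on the same machine.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: lays out one record in buf and returns the number of
// bytes used. buf must have room for 2 * sizeof(int) + strlen(name) bytes.
std::size_t encodeRecord(char const *name, int salary, char *buf) {
    int nameLen = static_cast<int>(std::strlen(name));
    std::size_t offset = 0;

    // Field 1: number of characters in the name.
    std::memcpy(buf + offset, &nameLen, sizeof(int));
    offset += sizeof(int);

    // Field 2: the name characters, without a terminating '\0'.
    std::memcpy(buf + offset, name, static_cast<std::size_t>(nameLen));
    offset += static_cast<std::size_t>(nameLen);

    // Field 3: the salary.
    std::memcpy(buf + offset, &salary, sizeof(int));
    offset += sizeof(int);

    return offset;
}
```

For a name of N characters the record occupies 4 + N + 4 bytes, so a three-character name such as "Ada" takes 11 bytes on a system with 4-byte ints.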
Reading and Writing binary files
To read integers and character strings from a binary file you can use:
//open file specified in filename. Second parameter specifies that
// the file is for input and is in binary format
fstream infile(filename, ios::in | ios::binary);
int nameLen;
char *name;
// read 4 bytes into memory location of nameLen
// here we typecast the address of nameLen (the destination) as a char *
infile.read((char*)&nameLen, sizeof(int));
// Allocate space and read characters for string
name = new char[nameLen+1];
infile.read(name, nameLen);
// remember to null terminate strings with '\0'
...
infile.close();
To write integers and character strings to a binary file you can use:
fstream outfile(filename, ios::out | ios::binary);
int nameLen = ...;
char *name = ...;
...
outfile.write((char*)&nameLen, sizeof(int));
outfile.write(name, nameLen);
outfile.close();
When you are done implementing the employee list classes, you should
be able to type make and have your code compile to give an
executable called sortEmployees. Run this executable to test your code.
Sample Output
# a run with the wrong number of command line args, should exit with message
$ ./sortEmployees
usage: sortEmployees 'file1' 'file2' 'resultfile'
# a run with correct number of command line args:
$ ./sortEmployees input/infile1.dat input/infile2.dat result.dat
See
here for the expected output.
The input data files are given to you with the starting point code. These
files, however, do not test all corner cases for the linked list so be
sure to design further tests.
You should design a strategy
for verifying your output files are correct, as well. For example, you can use
createbin to create a binary file (given your expected result in text format)
and then diff your actual result with the createbin result.
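If you would rather check this from inside a test program than with diff, a byte-for-byte comparison can be sketched as below. sameBytes is a hypothetical helper name, and treating an unopenable file as a mismatch is a choice made for this sketch.

```cpp
#include <fstream>
#include <iterator>

// Hypothetical helper: returns true only if both files can be opened and
// contain exactly the same sequence of bytes.
bool sameBytes(char const *pathA, char const *pathB) {
    std::ifstream a(pathA, std::ios::in | std::ios::binary);
    std::ifstream b(pathB, std::ios::in | std::ios::binary);
    if (!a || !b) {
        return false;  // treat a file that cannot be opened as a mismatch
    }
    // istreambuf_iterator reads raw bytes without skipping whitespace.
    std::istreambuf_iterator<char> itA(a), itB(b), end;
    while (itA != end && itB != end) {
        if (*itA != *itB) {
            return false;
        }
        ++itA;
        ++itB;
    }
    // Equal only if both files ended at the same point.
    return itA == end && itB == end;
}
```

For example, sameBytes("result.dat", "expected.dat") would compare your program's output against a file produced by createbin; expected.dat is a placeholder name here.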
Submitting your lab
Before the Due Date, push your solution from your
local repo to the GitHub remote repo.
From your local repo (in your ~you/cs44/labs/Lab0-you subdirectory):
git add *
git commit -m "my correct and well commented solution for grading"
git push
If that doesn't work, take a look at the "Troubleshooting" section of the
Using git
page.
Also, be sure to complete the README.md file.