Checkpoint for Weekly Lab 2 (Sept 16): By the time you come to lab,
your EmployeeList
and
EmployeeNode
classes should be implemented
and passing all provided tests, including checking for memory access
errors and memory leaks.
1. Overview
Due by 11:59 p.m., Saturday, September 19, 2020.
The task for this week’s lab: given employee data from two separate files, combine the contents into a sorted list and output the results to file. You will implement a linked-list data structure to maintain your sorted list of employee data.
While this lab serves primarily as a warm-up exercise and reminder of C++ programming, we will also introduce new concepts. The learning objectives for this assignment:
-
implement a data structure (linked lists) using low-level memory management
-
reacquaint students to programming in C/C++
-
introduce binary file I/O
-
manage heap memory, including using
valgrind
to debug memory errors -
introduce the
const
keyword and review C-style strings (char *
)
1.1. Getting Started
If you have not already done so, first create a course directory for this course, and add a lab subdirectory for your lab repos:
mkdir -p cs44
mkdir -p cs44/labs
cd cs44/labs
We will be using git
repos hosted on the college’s
GitHub server for labs in this class.
If you have not used git
or the college’s GitHub server before, here
are some detailed instructions on
using
git for CS44 labs.
Next find your git repo for this lab assignment off the GitHub server for our class: cs44-f20
Clone your git repo (Lab1-userID1-userID2
) containing starting point files into your
labs directory:
cd ~/cs44/labs
git clone [the ssh url to your your repo]
cd Lab1-userID1-userID2
If all was successful, you should see the following files (highlighted files require modification):
-
Makefile
- pre-defined. You may edit this file to add extra source files or execution commands. -
sortEmployees.cpp
- your main program for reading and sorting employee data. -
employee.[cpp/h]
- the implementation of anEmployeeList
class that will represent a list of sorted employees in a linked list. -
testEmployeeList.cpp
- a test file for yourEmployeeList
class. You will need to add more tests in here to ensure your program is correct. -
input/
- directory of sample input files in either text format (.txt
extension) or binary (.dat
).
1.2. Deliverables
The following will be evaluated for your lab grade:
-
The
EmployeeNode
class inemployee.cpp
. -
The
EmployeeList
class inemployee.cpp
. -
The main program in
sortEmployees.cpp
-
Your Lab 1 Questionnaire to be completed individually (This will open on the due date and close after 3 days)
2. Linked List Implementation
In employee.cpp/h
, you will implement an
EmployeeList
and related
EmployeeNode
class to manage a sorted linked list of
employee data. The data will be sorted based on the name field of
EmployeeNode
. For both
classes, you may add private methods, but you may not add public methods or any
additional data members (private or public).
2.1. EmployeeNode class
The data members and public methods have been provided. Begin by implementing
the EmployeeNode
class in employee.cpp
. This class
represents one employee as a node in the linked list:
-
name
is a C-style string containing the employees name. -
salary
is an integer. -
next
is a pointer to the nextEmployeeNode
in the linked list.
The constructor should initialize all data members (e.g., set pointers to
nullptr
) and the destructor should deallocate name
(but only if it is not
null).
Each of the above data members has a corresponding setter and getter method. Note that c-style strings are arrays of characters, so you will need to allocate and deallocate memory. See the section on c-strings below for details.
2.2. EmployeeList class
EmployeeList
is a
linked list with a
limited interface — there is only one type of
method for inserting and no methods for removing. Additionally, there is only
one data member for the linked list: the head
of the list.
You will need to implement the class methods:
-
A constructor that initializes
head
tonullptr
. -
A destructor that deletes each
EmployeeNode
in the list. -
insert()
adds a new employee to the list such that the resulting list is in sorted order by the name field. In other words, you are performing insertion sort. Your method should create anEmployeeNode
with the given information and traverse the list to find the correct spot to insert the newEmployeeNode
. -
getNumEmployees()
andgetAverageSalary()
must traverse the linked list to accumulate the respective statistics. -
writeToFile()
writes the entireEmployeeList
, in sorted order, to the binary file specified by the parameterfilename
. See details on the structure of the binary files and writing to binary files. The file output should mirror the structure of the file that is read in byreadFromFile()
in the main program. Return 0 on success and a non-zero error value on failure (e.g., illegal file name). Be sure to specify the meaning of error values in the function comments. -
print()
is useful for debugging; it output the contents of the list in a formatted table.
2.3. Developing and Testing
We have provided a few tests in testEmployeeList.cpp
. To begin your lab, you
should first look at these tests to understand the usage for
EmployeeNode
and EmployeeList
.
Next, in employee.cpp
, finish creating function stubs for all methods not
already stubbed out for you. Be sure to include (and understand) the function
comments that explain the usage of each method. When this is complete, you
should be able to make and run the tests.
$ make testEmployeeList
$ ./testEmployeeList
While your program will compile, it will immediately fail since you have not
implemented any methods. Use incremental development to implement one method
at a time, starting with the EmployeeNode
class. Note that the provided tests
are not unit tests - they test the full functionality of the
EmployeeNode
and EmployeeList
.
You will need to add your own tests to help verify, and debug,
your implementation.
Also note that the provided tests are not complete; while
passing them is a good indication, most of the tests require you to manually
verify the printed output is correct. In addition, it does not test all
corner cases for a linked list and nor does it test your writeToFile
method.
Do not move on to the main program until you are fully satisfied with your test
suite and confident in the correctness of your solution. Also be sure that
valgrind
on your tests does not find any memory errors or leaks.
3. Main program
In sortEmployees.cpp
, you will write a main program that, at a high level:
-
reads in employee data from two unsorted binary files
-
combines the data from two files together in a sorted list of employees
-
write the resulting sorted list of employee data to a binary file.
-
prints formatted information to standard output as it makes progress (see example runs).
3.1. General Requirements
-
All of your code should demonstrate good design. Your main program should written in a top-down, modular fashion.
-
Each function should have a single, clear purpose.
-
Each function should have a comment explaining its purpose as well as parameters, return values, and error conditions.
-
Your program must be free of memory-access errors. This includes both memory leaks and out of bounds memory access. Recall that C++ does not report this to you; you need to test your code and use tools such as valgrind and gdb. See Compiling, Debugging and Linking Tips for a reminder on how to use these tools.
3.2. Usage and Sample Runs
$ ./sortEmployees input/infile1.dat input/infile2.dat result.dat
Your program takes in three arguments: two binary files to read and the name of the file to write the resulting list. Incorrect usage should result in an error message e.g.,
$ ./sortEmployees
usage: sortEmployees file1 file2 resultfile
See here for the expected output.
3.3. Reading the input files
In sortEmployees.cpp
, you must implement a function to read from binary
files:
int readFromFile (char *filename, EmployeeList *list)
This function reads in employee records from the binary file specified by the
parameter filename
and adds each employee record list
. Return 0 on success
and any non-zero error value on failure (there may be multiple reasons for
failure; in the function comments, be sure to specify the error associated with
each value you return). See Reading Binary Files
for tips on how to complete this function
3.4. Formatted Output
Your program should use formatted output (e.g.,
printf
) to display the
employee data as it is read from the input files (i.e, in readFromFile()
).
Your program should finish by displaying the sorted list of employee records.
See these example runs.
4. Binary File Format
You are used to reading files that are in text format - all data is stored as ASCII characters regardless of whether it is actually a number or character. Alternatively, we could store the data in a format more native to computing - as binary. You cannot open and read these files in a text editor (because they aren’t text). But they can be much more predicable in terms of how data is formatted.
4.1. Employee Data Files
Each employee file is a binary file that stores a sequence of variable length employee records. Each employee record has three fields: the length of the name (4 byte integer), the actual name (variable length), and a salary (4 byte integer). The format of an employee record in the binary file is as follows:
---------------------------------------------------------------------------- | 4 byte integer | N character string | 4 byte integer | | Number of characters | Employee name | Employee salary | | in employee name (N) | (not null-terminated) | | ----------------------------------------------------------------------------
4.2. Reading binary files
To read integers and character strings from a binary file you can use the following code as an example:
//open file specified in filename. Second parameter specifies that
// the file is for input and is in binary format
std::fstream infile(filename, ios::in | ios::binary);
int nameLen;
char *name;
// read 4 bytes into memory location of nameLen
// read assumes everything is char, so we will here we typecast
// the address of nameLen (the destination) as a char *
infile.read((char*)&nameLen, sizeof(int));
//Allocate space and read characters for string
name = new char[nameLen+1];
infile.read(name, nameLen);
// TODO: remember to null terminate strings with '\0'
// TODO: add code to finish reading file
infile.close();
You should reference the
fstream library
for details on how to read files. Also, you will need to
loop until you read all of the records in a file. You should use the eof()
(end of file) function to do so. Here is an illustrative
example with tips on how
to use it properly.
4.3. Writing binary files
To write integers and character strings to a binary file you can use the following example code:
std::fstream outfile(filename, ios::out | ios::binary);
//You need to figure out how to set these variables and loop through the list
int nameLen = ...;
char *name = ...;
...
outfile.write((char*)&nameLen, sizeof(int));//write the length of the name
outfile.write(name, nameLen); //write the name using the specific length
outfile.close(); //close the file when done writing
Remember to close the file when done writing, otherwise your results may not be saved to disk.
5. Tips and Hints
5.1. c-strings
See Prof. Newhall’s help page on
Strings in C
for a refresher on using c-strings. In particular, you will need to use
strcpy
, strlen
, printf
, among others.
Also, recall that you cannot use comparison operators (<
,>
,==
, etc.) with
c-strings. Instead you must strcmp
.
Since your code is compiled in C++, you can use new
and delete
instead of
malloc
and free
e.g.,
char *staticVar = "this is a string"; //a static c-string
char *dynamicVar = nullptr; // we well allocate/deallocate space for this c-string
printf("%s\n", staticVar); //Prints out "this is a string"
len = strlen(staticVar);
dynamicVar = new char[len+1]; //add one for the null termination '\0'
strcpy(dynamicVar, staticVar); //copies "this is a string"
printf("%s\n", dynamicVar); //should print "this is a string"
delete [] dynamicVar //clean up when done
dynamicVar = nullptr; // set pointer to null to detect memory access errors
C-strings are arrays; allocating them dynamically requires that you also clean up memory when it is no longer being used. This also means that, c-strings variables are pointers - treat them as such.
5.2. const modifier
const
is a modifier that guarantees that an entity will not be modified while
in scope. You will see it used is a modifier to a variable’s type.
const
signifies that a modified after it is declared. When used for a
function parameter as in EmployeeList::insert
and EmployeeNode::setName
,
the keyword states that the value of the parameter cannot be modified in that
function. EmployeeList::insert()
can only read the values that have been
sent it, it is not allowed to modify them.
Note that nothing changes on the caller’s end of function use. This simply provides a contract guarantee to a user that their data cannot be modified by a called function. It is enforced by the compiler (attempting to modify a constant will lead to a compile-time error message).
Practically, this means that setName
cannot simply store the pointer. This
is illegal:
void EmployeeNode::setName(char const *nameIn){
this->name = nameIn; //type mismatch
}
For one, this is bad coding; by sharing the pointer, you create potential bugs
since the value of this→name
is now directly accessible outside the scope of
the EmployeeNode
class. Also, this will not compile as the type of
this→name
(char *
) is different than the type of nameIn
(char const *nameIn
). You will, instead, need to allocate new space for
this→name
and copy in the contents of nameIn
.
5.3. Error messages
Above, we mentioned that you will return error statuses via an int
for your
read and write functions. This is common practice, particularly in C programs
where exceptions are not used. To make this aspect of your code more readable,
define and use meaningful constants. For example:
//static makes this accessible throughout this file
//const ensures the value cannot be changed
static const int ERROR_NULL_FILENAME = -1;
/* Function purpose
* @params ...
* @return ...
* @error returns ERROR_NULL_FILENAME if the filename pointer is nullptr
*/
int readFromFile(char *filename, EmployeeList *list){
//...
//some code
if (filename == nullptr){
return ERROR_NULL_FILENAME; //returns a -1, but someone reading your
//code cares more about the reason for error
}
//...rest of program
Instead of returning -1, the function returns a variable (this variable’s value happens to be -1 but this is abstracted). The key is that in main, you can check for this by comparing the return value to the error values you expect to find. For example, if you store the function return value in main using status, you can write:
if (status == ERROR_NULL_FILENAME){
//handle this error
}
This can be easier to understand (and less prone to bugs) than the equivalent statement:
if(status == -1){
//handle this error
}
In future labs, we will instead use
C++ exceptions
to throw
exceptions when an error occurs (instead of returning an error code)
and try/catch
blocks around code that could evoke an exception. If you
prefer, you can use exceptions instead of error statements for this lab, but
that is not required or expected.
5.4. Testing your code
It is difficult to verify the output since it is not text. Here are some strategies:
-
To verify that your
writeToFile
function works, try using the output file of one run as one of the input files in a subsequent run. Read errors mean that the write phase was not correct. This will allow you to see the contents of the file as the employee records get printed to screen when it is read. -
Run
wc
(word count) orls -l
to get the number of bytes in the output and input files. The output file should have exactly the same number of bytes as the sum of the size of the two inputs. -
Generate a hex dump of a binary file using
xxd
.$ xxd infile1.dat 00000000: 0c00 0000 5475 7269 6e67 2c20 416c 616e ....Turing, Alan 00000010: c8af 0000 0d00 0000 486f 7070 6572 2c20 ........Hopper, 00000020: 4772 6163 6550 c300 000d 0000 004c 6f76 GraceP.......Lov 00000030: 656c 6163 652c 2041 6461 3aaa 0000 1000 elace, Ada:..... 00000040: 0000 4261 6262 6167 652c 2043 6861 726c ..Babbage, Charl 00000050: 6573 ed11 0100 es....
-
We will also test your code with held-aside input files, so you should try create other input files for both debugging purposes early on (i.e., simple tests) as well as more stressful tests. To create new binary input files, use the
createbin
program provided in~newhall/public/cs44/
. It will take in a text file that you create and output its binary version, which you can then feed into yoursortEmployees
program:
~newhall/public/cs44/createbin yourtextfile binaryoutfile
In input/
, we have given you infile1.dat
and infile2.dat
as sample
inputs. You will also find the corresponding text files so you can see how to
format your input to createbin
.
6. Submitting your lab
Review the lab deliverables to ensure you have completed all of your work. Before the due date, push your solution to github from one of your local repos to the GitHub remote repo.
From your local repo (in your ~/cs44/labs/Lab1-userID1-userID2
subdirectory)
make clean
git add employee.h employee.cpp sortEmployees.cpp
git commit -m "my correct and well commented solution for grading"
git push
Verify that the results appear (e.g., by viewing the the repository on cs44-f20). You will receive deductions for submitting code that does not run or repos with merge conflicts. Also, note that the time stamp of your final submission is used to record late days, so please do not update your repo until after the late period has ended.
If that doesn’t work, take a look at the "Troubleshooting" section of the Using git for CS44 labs and the Using git pages. At this point, you should submit the required Lab 1 Questionnaire (each teammate must do this).
7. Handy References
-
Review in lab exercises from Week 1
-
Some C++ Programming Resources and Links including the C++ Style Guide
-
C++ programming tools compiling, linking, debugging C++
-
C references in Dive into Systems (some useful for C++ programming too) Chapter 2: C pointers, command line arguments, C debugging tools (valgrind, gdb for C)
-
Using Git more complete Git guide