Lab 7: Binary Search Trees

Due on Wednesday, November 14 at 11:59 PM. This is a team lab. You and your assigned lab partner(s) will complete this lab together. Make sure that you are familiar with the Partner Etiquette guidelines. You may discuss the concepts of this lab with other classmates, but you may not share your code with anyone other than course staff and your lab partner(s). Do not look at solutions written by students other than your team. If your team needs help, please post on the Piazza forum or contact the instructor or ninjas. If you have any doubts about what is okay and what is not, it’s much safer to ask than to risk violating the Academic Integrity Policy.

We will be using Teammaker to form teams. You can log in to that site to indicate your preferred partner. Once you and your partner have specified each other, a GitHub repository will be created for your team. If you have any trouble using Teammaker, contact your instructor.

Overview

For this lab, you will implement the binary search tree (BST) data structure. You will then write a series of unit tests to confirm that your data structure works correctly. (You may additionally use the provided word-counting program to evaluate your BST.) Finally you will complete some written work to test your understanding of balanced BSTs and heaps.

Starting Code

Your starting code can be found in the appropriate repository for your team. The following is a description of the repository’s initial comments; bolded files are those which you must change in completing the lab.

adts/: As with your last lab, the ADT interface declarations are stored here.
linkedBST.h/linkedBST-private-inl.h/linkedBST-inl.h: The BST class you will be implementing.
linkedBSTNode.h/linkedBSTNode-inl.h: A simple node class to use in your BST implementation.
main.cpp: A small example of the use of a BST. This program counts the number of occurrences of each word in a text file. It then prints the results using a traversal of the tree.
tests.cpp: The unit testing program you will use to confirm that your BST is correct.
manualTests.cpp: A sandbox test program for you to use while you develop your program.

Part I: Implementing Binary Search Trees

Most of the work of this lab is implementing the binary search tree. Your implementation is spread across two files: linkedBST-inl.h (which contains the implementation of the LinkedBST class’s methods) and linkedBSTNode-private-inl.h (which contains private helper functions for those methods to use). The LinkedBST class is a template class with two templated type names: K (the type of the keys in the BST) and V (the type of the values). The LinkedBST class itself tracks only two pieces of data:

size, which contains the number of nodes in the BST
root, which contains the LinkedBSTNode representing the root node of the BST

A single node stores information about its key, its value, and its left and right children. If a node does not have a child, then the corresponding pointer variable will be set to nullptr.

Many of the methods in the binary search tree are natural to implement recursively. For this reason, you will write a collection of recursive helper functions for your LinkedBST methods to call. For instance, the height of a node is equal to one more than the largest of the heights of its two children. Many of the public methods of LinkedBST will be single-line calls to the private recursive functions to begin the operation at the BST’s root node.

Traversals

The LinkedBST class must provide implementations of traversal methods: methods which allow us to see every piece of data in the tree. A traversal is given by a vector of key-value pairs; we will represent such a pair as a pair object. A C++ vector is described in the next section.

Example BST

The BST interface supports four different kinds of traversal:

A pre-order traversal puts the current node first, followed by its left subtree and then its right subtree. For the example here, the pre-order traversal would be the sequence F, B, A, D, C, E, H, G, I.
A post-order traversal instead puts the current node after the left and right subtrees. The post-order traversal of this example is the sequence A, C, E, D, B, G, I, H, F.
An in-order traversal puts the current node between the left and right subtrees. The in-order traversal of this example is the sequence A, B, C, D, E, F, G, H, I. (The fact that this is sorted is not a coincidence!)
A level-order traversal reads the nodes from left to right and top to bottom, the same way we read English test. For this tree, the level-order traversal is F, B, H, A, D, G, I, C, E.

The first three forms of traversal are quite similar and are natural to define recursively. The level-order traversal is somewhat different and requires a breadth-first search approach similar to that which you used in the previous lab.

Wikipedia has some nice pictures of these traversals for reference. Note that your program is adding the elements to a vector rather than “displaying” them as described on that Wikipedia page.

The `vector` STL Class

The vector<T> class, found in the vector library, is the C++ STL implementation of an array list. It is used somewhat differently from our List<T> interface. Here’s a handy translation guide:

Operation	`List` code	`vector` code
Insert at front	`myList.insertAtHead(value)`	no simple equivalent
Insert at back	`myList.insertAtTail(value)`	`myVector.push_back(value)`
Determine size	`myList.getSize()`	`myVector.size()`
Get element by index	`myList.get(index)`	`myVector[index]`
Set element by index	no equivalent	`myVector[index] = value`
Remove from back	`myList.removeTail()`	`myVector.pop_back()`

One other difference is that the pop_back method is void; you must retrieve the last element yourself if you want to see that element before it is removed.

The primary reason that we introduce vector for this lab is because vectors can also be copied just like pairs. Consider the following code:

List<int>* listPointer1 = new STLList<int>(); // create a pointer to a new List
List<int>* listPointer2 = listPointer1;  // copy the pointer
listPointer1->insertAtTail(4);           // add an element
cout << listPointer2->getSize() << endl; // prints 1; they point to the same List
List<int> list1;                         // create a statically-allocated list
List<int> list2 = list1;                 // illegal! Lists doesn't know how to copy themselves!

vector<int> vector1;                     // create a statically-allocated
vector1.push_back(4);
vector1.push_back(6);
vector<int> vector2 = vector1;           // vectors do know how to copy themselves
cout << vector2.size() << endl;          // prints 2 (since both elements were copied)
vector1[0] = 2;                          // assigns 2 to the first position of vector 1
cout << vector2[0] << endl;              // prints 4; vector 2 is a different vector

Some methods of the BST ADT interface use vector to prevent you from having to worry about memory management (e.g. delete) for the values they return.

Testing the Binary Search Tree

In addition to implementing the LinkedBST class and the helper methods, you must also write unit tests. To get you started, some of the tests in tests.cpp have been completed for you. For each remaining test, a comment containing the string “TODO” has been provided. You must write one test for each of these comments that tests the appropriate method or function in a non-trivial way. For the recursive functions, you must provide at least one test that utilizes recursion; for BST methods, you must provide a test that operates on a tree of height greater than one. For example, consider the following test:

TEST(negativeHeight) {
    LinkedBST<int,string>* tree = new LinkedBST<int,string>();
    CHECK_EQUAL(-1, tree->getHeight());
}

If you were required to write a test for getHeight, this test would not suffice; it does not trigger recursion within the getHeightInSubtree helper function. You are welcome to include the above test (or tests like it) in your lab submission, but make sure that each of the functions and methods has at least one non-trivial test.

Please remember that these tests are a development tool as well as a lab requirement. You should write these tests as soon as possible and use them to help you find bugs and verify the correctness of your code. You may even wish to write the tests before you write your implementation just so you can make certain that you know what you expect your code to do. Also note that we have provided you with an implemented checkInvariants method. Use this liberally throughout your tests to verfiy that the BST is the proper size and satisfies the BST property.

If you use manualTests to figure out what’s happening in your code, you might also consider copying that code into tests.cpp once you figure out what you expect from it.

Implementation Strategy

Source: OpenClipArt.org

This lab involves writing more code than previous labs, so it is useful to plan an approach to your work. Below is a strategy which should organize your efforts; you are not required to proceed in this order, but it might help.

In LinkedBST-inl.h, implement the non-recursive methods: the constructor, getSize, and isEmpty. Compile and test your code. You should now pass the emptyTree test in the tests.cpp file.
Next work on implementing the recursive methods. Note that most methods within linkedBST-inl.h will be quite brief, making a call to a recursive helper method within linkendBST-private-inl.h to do most of the work. Start with insert, which should call insertInSubtree. Once you have implemented these two methods successfully, you will pass more of the initial tests provided in the tests.cpp file.
Now you should be able to get into a nice rhythm:
- In tests.cpp, go to the next TODO and add some tests for that method.
- In linkedBST-inl.h, implement the public method.
- In linkedBST-private-inl.h, implement the private helper method associated with the public one.
- Compile, test, and debug until you’ve successfully passed those tests.
- Rinse and repeat!
Debug any memory leaks as described below.

Once you are finished, you should be able to compile and run the word counting program. For your convenience, a few text documents have been provided in the test_data directory.

Memory

For this lab, your program is required to run without memory errors or leaks. You should use valgrind as you proceed in your testing to track memory errors. When you have a complete first draft of your implementation:

Commit and push what you have (in case something goes wrong).
Run valgrind ./tests to make sure that the test program does not cause memory errors.
Once memory errors are cleared, run valgrind --leak-check=full ./tests to identify and correct memory leaks.

Coding Style Requirements

You are required to observe some good coding practices:

Source: OpenClipArt.org

You should pick meaningful variable names.

// Good
int* pixels = new int[size];
// Bad
int* p = new int[size];

You should use correct and consistent indentation. Lines of code within a block (that is, surrounded by { and }) should be indented four spaces further than the lines surrounding them.
```
// Good
if (condition) {
    cout << "Test" << endl;
}
// Bad
if (condition) {
cout << "Test" << endl;
    }
```

You should use a block whenever possible, even if it’s not necessary. This helps you avoid subtle or messy bugs in the future.

// Good
if (condition) {
    cout << "Something" << endl;
}
// Bad
if (condition)
    cout << "Something" << endl;

Any new methods or fields in your header files should have comments explaining their purpose and behavior. You are permitted to omit documentation for methods that are inherited from other classes; that is, if your class has a foo method because its superclass has a foo method, you don’t need to document that method.
```
// Good
public:
    /**
     * Saves the image represented by this object to a file.
     * @param filename The name of the file to save.
     * @return The number of pixels in the saved image.
     * @throws runtime_error If the image could not be saved due to an I/O error.
     */
    int save(std::string filename);

// Bad
public:
    int save(std::string filename);
```
Your method/field documentation does not have to be in the format above, but you must describe the method’s behavior, its parameters, its return value, and any exceptions that it may throw. (If you’re indifferent, the above syntax is a good one to know; it’s a de facto standard used by Javadoc, Doxygen, and other tools that automatically process source code comments into other formats like searchable webpages.)

Part II: Written Lab

For this part of your assignment, you will give written answers much in the same way as you did in previous labs. Your submission must be in a typeset PDF format.

`textree.py`

This part of the lab involves drawing trees. This is difficult in LaTeX (and in most diagram software), so we’ve given you a Makefile rule that will make things much easier. This rule calls the script textree.py, which has been provided for you. Between this and the provided WrittenLab.tex, all you need to do is draw some ASCII art trees in text files and your PDF will be built for you.

In the written-trees directory, there are a series of files named e.g. problem1.1.tree. The textree.py script turns these tree files into .tex code which will draw those trees in the PDF. To complete the written part of the assignment, you just need edit the .tree files to contain the right trees. The following rules apply in these .tree files:

Blank lines and lines starting with a # are a comment (like in Python) and are ignored.
Lines starting with a : are treated as LaTeX text. You can write your explanation of what’s happening on those lines.
Lines containing only numbers, letters, and spaces are taken to be a binary tree. Each line is a level of the tree.
Lines containing only symbols are ignored. This allows you to include / and \ symbols if you want to draw your tree in the text file.

For instance, this example diagram is produced by the following .tree file.

Example `.tree` File

: This is a sample BST.

 2
1 3

# This line is a comment.

: I can put slashes into my tree if I want; they don't do anything.

  2
 / \
1   3

: Each node decides its parent by the text that is closest to it.  For instance,
: the 3 below is the left child of 4 (and not the right child of 1) because the
: 4 is closer to the three.  The blank lines are ignored just like lines with
: symbols are.

  2

1   4

   3

AVL Tree Questions

Show the left rotation of the following tree. Be sure to specify the X, Y, and Z subtrees used in the rotation.
Show the right rotation of the subtree rooted at 27. Be sure to specify the X, Y, and Z subtrees used in the rotation.
Using the AVL tree algorithm, insert the value 15 into the following tree. Show the tree before and after rebalancing.
Using the AVL tree algorithm, remove the value 15 from the following tree. Show the tree before and after each rebalancing.

Heap Questions

Note that these questions rely upon the discussion of heaps in lecture.

Show the addition of element 9 to max-heap below. First, show the addition of 9 to the tree; then, show each bubbling step.
Show the removal of the top element of this max-heap. First, show the swap of the root node; then, show each bubbling step.
Consider the sequence of elements[7,4,2,6,9,1,3]. Using the representation discussed in class, show the tree to which this sequence corresponds. Then, show the heapification of this tree; that is, show how this tree is transformed into a heap. Demonstrate each bubbling step.

Summary of Requirements

When you are finished, you should have

A completed implementation of LinkedBST and the helper methods
Appropriate tests for these classes
No memory errors
No memory leaks
Code which conforms to the style requirements
A typeset PDF file containing your answers to the written questions