UPDATE:: you should have all received an email detailing failures
on your LinkedBST implementation. Be sure to read the description
of tests you failed here.
Please ensure you make those
fixes in both LinkedBST and AVLTree to ensure
correct data structures.
Your work for this lab consists of two parts:
- Complete and test a templated implementation of an AVL tree,
AVLTree.
- Extend your plagiarism detector for Part A to use your
AVLTree to build an index of essays and detect plagiarised documents.
While implementing
AVLTree sounds lengthy, 95% was already done in last
week's lab. All non-insert and remove methods work the same when
comparing a
LinkedBST to
AVLTree. Below, I will detail
the modifications needed to maintain the balanced AVL property. At a high
level, you will need to accomplish the following:
- Initial setup - get and create files to for your AVLTree
implementation
- Modify insert/remove - modify insertInSubtree
and removeFromSubtree to balance
- Implement balancing and rotations - store node heights,
detect imbalances, and implement rotations
- Test your AVL Tree - add tests for balancing to testBST
- Extend cheatDetector - add AVLTree option to
your main program; run benchmark tests.
Introduction
The AVLTree is very similar to the LinkedBST
from Lab 07, except it maintains the AVL balance property to guarantee
that the tree is balanced. (The AVL balance property is that, for each node
in the tree, the height of that node's
left and right subtrees differs by at most one.)
To efficiently maintain the AVL property, each AVLTreeNode
stores its current height in addition to the other data
stored by a LinkedBSTNode. In sum, each AVLTreeNode contains:
- key: The key stored in this node.
- value: The value stored in this node.
- height: The current height of this node.
- left: A pointer to the left subtree of this node.
- right: A pointer to the right subtree of this node.
Our algorithms define an empty child node to have a height of -1 for
simplicity, but an empty child node is not an actual node. It is the
pointer NULL. You will need
to be careful to check that children nodes are not NULL before
attempting to access information such as a child node's height or children.
To maintain a roughly O(lg n) tree height, you will extend your
LinkedBST implementation to detect imbalances after insertions and
removes and then use one of four tree rotation methods to fix the imbalance.
Getting started
TODO:
- Run update35 to obtain AVLTree declaration
- Re-use your linkedBST-private-inl.h work to start your
AVLTree-private-inl.h implementations
- Modify your Makefile
First, run update35. This should add the following files:
- AVLTree.h - Declaration for AVLTree class, which
implements the BST. This is similar to LinkedBST,
withr 6 new private methods to maintain balanced height.
- AVLTree-inl.h - Implementation of all public methods. Note
that this is no different than LinkedBST public method implementations.
Neither of these files needs modifications.
Instead, all of you work will be
done in
AVLTree-private-inl.h. Since the vast majority of methods
will be the same as your
LinkedBST private methods, you'll use
your
linkedBST-private-inl.h file as a starting point. To do so,
do the following:
- Copy your linkedBST-private-inl.h file:
$ cp linkedBST-private-inl.h AVLTree-private-inl.h
- Replace all instances of BSTNode with AVLTreeNode and
LinkedBST with AVLTree. This is done using
search and replace. Each editor has a built in way of doing this, but we can
do it easily with a linux program called sed:
$ sed -i 's/BSTNode/AVLTreeNode/g' AVLTree-private-inl.h
$ sed -i 's/LinkedBST/AVLTree/g' AVLTree-private-inl.h
Since you are using your LinkedBST as your starting point, you
are responsible for ensuring you correct all errors from Lab 7. Any
lingering errors from Lab 7 will lead to further deductions in Lab 8.
Lastly, open up your Makefile and remove the comment #
on line 7 so that the AVLTree files will be included in compilation.
Inserts/Removes in AVLTree
TODO:
- Add balancing to insert
- Add balancing to remove
To insert in an element an AVL Tree, we follow the normal protocol. That is,
insertInSubtree should handle the 4 cases (i.e., base case,
duplicate key, smaller key,
larger key) for recursion the same as LinkedBST. Once that work
is complete, current is the root of a sub-tree that may now
be unbalanced and require a rotation. This check must be done
at every call since the imbalance could be detected at any step
back up the tree.
Your overall method should follow this protocol:
- Handle the 4 cases the same as LinkedBST
- Before returning:
- Calculate height of node
- Check for imbalances
- Fix imbalances with rotations
- Step 2 may have resulted in a new root for the sub-tree. Return the
root of the sub-tree to the calling function
Implementing this should take 1 to 2 lines of code at most. You completed
#1 already from Lab 7. #2 should be handled by another function (
balance()) specified below. #3 should take the value returned by
balance
and return it.
removeFromSubtree follows the same outline. Before every return statement, you should call balance to handle all of the work
of updating heights and checking for imbalances.
Balancing and rotations
TODO:
- Implement balance function
- Implement computeHeightFromChildren
- Implement four rotations
To complete the AVLTree, you should complete each of the
rotation functions discussed in class. Their prototypes can be found
in AVLTree.h file.
There is one rotation function for each of the four cases discussed in class,
and each returns the new root of the sub-tree to the calling function:
- rightRotate: rebalances the tree when an unbalanced
node's left-outer grandchild is too tall.
- leftRightRotate: rebalances the tree when an unbalanced
node's left-inner grandchild is too tall.
- leftRotate: rebalances the tree when an unbalanced
node's right-outer grandchild is too tall.
- rightLeftRotate: rebalances the tree when an unbalanced
node's right-inner grandchild is too tall.
Each rotation should do the following:
- Change the location of any child nodes that are moved
- Update the heights of any shifted nodes
- Return the root of the sub-tree
These functions are called by the balance function (which is in
turn called by insertInSubtree and removeFromSubtree).
balance does much of the book-keeping to maintain the AVL property.
To complete this method, you should take the following steps:
- If current is NULL, return current. There
is no work to do.
- Update current's height. You should implement this routine
in computeHeightFromChildren. Remember to account for potential
empty children (NULL).
- Check for a left imbalance (left child is 2 taller than right)
- If so, check which of left's children is taller. Call
either rightRotate (left grandchild is bigger) or
leftRightRotate (right grandchild is bigger)
- Check for a right imbalance (right child is 2 taller than left)
- If so, check which of right's children is taller. Call
either leftRotate (right grandchild is bigger) or
rightLeftRotate (left grandchild is bigger)
- Return the root of the sub-tree. Note that it may have changed.
Unit-testing the AVLTree
You should test the AVLTree implementation using the
testBST program from last week.
This should stress test the AVLTree implementation
for a large number of random insertions and deletions.
In addition, your tests should confirm that
each of your rotation functions behaves as expected for known examples
of the tree. While you cannot access private methods, you can come up with
specific examples that guarantee certain rotations will occur and then use
public methods such getHeight() or getPreOrder() to
inspect the contents of the tree.
Extending cheatDetector
Return to the write-up for Part A and ensure that your cheatDetector
can work with either LinkedBSTs or AVLTrees. The details
are found in that write-up. Once that is complete, run your program with
both and pay attention to the performance. How do the heights of trees change
by using balancing? You will need to answer a few questions for the
README.
Submitting your work
Submit your solution with Part A using handin35