Introduction to the Problem
Getting Started
Implementation Details
Code Design, Testing, and Extra Credit
What to Submit
I'll post answers to questions I get about project 4 here:
(also, by reading through the test code to see how B+Tree
functions are called, you can often figure out what the
functions are supposed to do).
If you have a HFPage object, then SortedPage, BTLeafPage, etc. are types that map onto this page-sized chunk of bytes (i.e. their methods are manipulating bytes on this HFPage object but using their field names to interpret the bytes differently than in HFPage).
If you are inside a function of one of these classes (BTLeafPage, BTIndexPage), and you want to call a function from SortedPage or HFPage, one way to do it is to re-cast the this pointer to the approriate class type, then invoke the function you want from that class:
((SortedPage *)this)->name_of_some_method_func_in_SortedPage(args...);
Status insertRec(const void *key, AttrType key_type, RID dataRid, RID& rid);the parameters key and dataRid make up the K* entry you are inserting on the leaf page ( K* entry is (key, dataRid)), the parameter rid, is the rid of the K* entry this function inserted (i.e. it is the page number, slot number of the K* entry on this leaf page). The insertRec function needs to set the value of rid after inserting the K* entry on to the leaf page (this function determines the slot number into which the (key, dataRid) is being inserted)
In this assignment, you will implement a B+ tree in which leaf level pages contain entries of the form [key, rid] (Alternative 2 for data entries in terms of the textbook). You must implement the full search and insert algorithms as discussed in class. Your insert routine must be capable of dealing with overflows (at any level of the tree) by splitting pages. Deletes will be handled by simply marking the corresponding leaf entry as 'deleted'. You do not need to implement merging of nodes on deletes, nor should you do sibling redistribution on deletes and inserts; inserting into a full page always causes a split.
You will be given HFPage and SortedPage. SortedPage is derived from HFPage, and it augments the insertRecord method of HFPage by storing records on the HFPage in sorted order by a specified key value. The key value will be included as the initial part of each inserted record to enable easy comparison of the key value of a new record with the key values of existing records on a page. Read the documentation in the header files to understand what class functions do.
You need to implement two pagelevel classes, BTIndexPage and BTLeafPage, both of which are derived from SortedPage. These page classes are used to build the B+ tree index. You also will write code to create, destroy, open and close a B+ tree index, and code that will open a scan on the B+ tree, allowing its caller to iterate through all of the data entries (from the leaf pages) that satisfy some search criterion.
2 Getting Started
Copy the starting point directories and files:
Contents:
3 Design Overview You
should begin by reviewing the chapter on Tree Structured Indexing in
the textbook. Make sure you understand insert, and search operations on
a B+ Tree. There is also
information about the B+ tree layer in the Minibase HTML documentation.
3.1 A Note on Keys for this
Assignment
You should note that key values are passed to functions using void *
pointers (pointing to the key values). The contents of a key should be
interpreted using the AttrType variable. The key can be either a
string(attrString) or an integer(attrInteger), as per the definition
of AttrType in minirel.h. If the key is a string, it has a fixed maximum
length, MAX_KEY_SIZE1, defined in bt.h. There are C-style functions
for creating,accessing files of, and comparing keys in Key.C.
Although the specifications for some methods (e.g., the constructor of
BTreeFile) suggest that keys can be of (the more general enumerated)
type AttrType, you can return an error message if the keys are not of
type attrString or attrInteger.
The SortedPage class implements the insertRecord method
by storing records on a page in sorted order according to a specified
key value. It assumes that the key value is included as the initial part
of each record, to enable easy comparison of the key value of a new
record with the key values of existing records on a page.
3.2 B+ Tree Page-Level
Classes There are four separate pages classes, of which
you will implement two. HFPage is the base class (given), and from it
is derived SortedPage. You will derive BTIndexPage and BTLeafPage from
SortedPage. Note that, as in the HFPage assignment, you must not
add any private data members to BTIndexPage or BTLeafPage
as they have to map onto the structure of a Page (i.e. they must all
be the same number of bytes).
The relationship between the the page-level classes is shown in the
figure below.
Important Note: These classes do not have constructors. The
information they store must be mapped onto a Page object. To create
a new page of the desired type, do the following:
Lastly, you will need to create a structure to represent the header
page of the B+ tree. Despite its name, the data structure used to
represent the header page need not be derived from a Page object. It
can be implemented simply as a C++ struct, with a field for each piece
of information that must be stored in the header page. Just remember
to cast pointers to this struct as (Page *) pointers when making calls
to functions such as pinPage().
3.3 Other B+ Tree Classes
A BTreeFile contains a header page and a number of BTIndexPages
and BTLeafPages. The header page is used to hold information about the
tree as a whole, such as the page id of the root page, the type of the
search key, the length of the key field(s) (which has a fixed maximum
size in this assignment), etc. When a B+ tree index is opened, you
should read the header page first, and keep it pinned until the file
is closed. Given the name of the B+ tree index file, you can locate
its header page if its file name has been registered with the DB object.
The DB class has a method that lets you register this information when a
B+tree index file fname is created:
The header page contains the page id of the root of the tree, and every
other page in the tree is accessed through the root page.
The following two figures show examples of how a valid B+ Tree might
look.
3.3.1 IndexFile and
IndexFileScan A BTree is one particular type of
index. There are other types, for example a Hash index. However, all
index types have some basic functionality in common. The
basic index functionality is defined in a virtual base class called
IndexFile. You won't write any code for IndexFile. However, any class
derived from an IndexFile should support IndexFile(), Delete(), and
insert(). (IndexFile and IndexFileScan are defined in include/index.h).
Likewise, an IndexFileScan is a virtual base class that contains the basic
functionality that all index file scans should support.
3.3.2 BTreeFile
The main class to implement for this assignment is
BTreeFile. BTreeFile is a derived class of the IndexFile class, which
means a BTreeFile is a kind of IndexFile. However, since IndexFile is
a virtual base class all of the methods associated with IndexFile must
be implemented for BTreeFile.
The methods to implement include:
BTreeFile::BTreeFile There are two constructors for BTreeFile
(as defined in btfile.h): one that only opens an existing index,
and another that creates a new index on disk, with a given type
and key size. Observe that the key type is passed as a value of type
AttrType. For this assignment, you only need to handle keys of
type attrString and attrInteger. If there is a call with a key whose
type is not one of the two, return an error. The constructor should
pin the header page into the bufferpool.
BTreeFile::insert The BTreeFile::insert method takes two
arguments: (a pointer to) a key and the rid of a data
record. The data entry to be inserted (i.e. the `record' in the leaf
pages of the BTreeFile which consists of a pair [key, rid of data
record]).
If a leaf page overflows (i.e., no space for the new entry), you should
split the page. Do not implement re-distribution with a sibling,
instead always split if a leaf page is full.
You may have to insert additional entries of the form
[key, id of child page] into the higher level index pages as part of a
split. Note that this could recursively go all the way up to the root,
possibly resulting in a split of the root node of the B+ tree. Keep in
mind the difference between a "Copy up" of an index entry from a leaf to
a parent node and a "Push up" of an index entry from an internal node
to a parent index node.
BTreeFile::Delete The BTreeFile::Delete routine simply removes
the entry from the appropriate BTLeafPage. You are not required to
implement redistribution or page merging when the number of entries
falls below a threshold.
In addition, you will likely want to add some private method functions
to implement a solution that uses good modular design.
Remember that this is a tree data structure; you should be thinking
recursive algorithms.
3.3.3 BTreeFileScan
Finally, you will implement scans that return data entries from the
leaf pages of the tree. You will create the scan through a member
function of BTreeFile (BtreeFile::new_scan(...) as defined in
btfile.h).
The parameters passed to
new_scan() specify the range of keys that will be scanned in
the B+ tree. They are explained in detail in btfile.h.
3.4 Errors
In the previous Minibase assignments, you learned how to use the Minibase
error protocol. You should use it again in this assignment. In
previous assignments, all the errors you returned belonged to one of the
categories in new error.h, for example BUFMGR. In this assignment, you will
need to use BTREE, BTLEAFPAGE, and BTINDEXPAGE.
4 Code Design, Testing, and Extra Credit
Make sure that your solution uses good OO design, is robust, uses
Minibase error protocol, is free of memory access errors and memory leaks,
and is well commented. Re-read the
C++ Code
Style Guide if you lost points on previous assignments for bad
code design or comments. A correct program only counts for part of your
grade. To receive an A, your code must also be well designed, robust,
and be well commented.
Implement and test methods INDEPENDENTLY and INCREMENTALLY as much
as possible. For example, first try to create a B+tree with a single
leaf page. Next, try to insert some data entries into this single
page B+Tree index. Next, try to handle inserting into this full
single leaf page B+Tree. Next, ...
To help you with incremental testing and debugging, you may want to
add a debugging method that prints out the btree's contents.
Also, you will want to modify the test code, perhaps commenting out
almost all of it at first and adding back more and more as you add
more functionality.
I will test your solutions with more than the tests in btree_driver.C, so
you should also come up with your own test code after passing these tests.
To better see the test output, I suggest re-directing your btree program's
output to a file and then open the file in vi/emacs. To re-direct stdout
and stderr to a file named 'output_file':
Extra credit of up to 15 percentage points is available. Do not start
on this until you have completed the basic assignment!
An incomplete basic assignment with an extra credit feature will receive
a lower grade than a complete basic assignment with no extra credit
feature.
If you work on an extra-credit solution, you should do so in a directory
separate from your solution to the basic assignment. Copy your basic
solution files over to your extra credit directory as a starting point.
You will then submit two versions of your code, each in a separate
subdirectory:
% mkdir cs44/proj4
% cp -r /home/newhall/public/cs44/proj4 cs44/proj4/.
You can find other useful include files in the include subdirectory
:bt.h, hfpage.h,
sorted_page.h, index.h,test_driver.h, btree
driver.h,
minirel.h and new_error.h.
BTLeafPage *newLeafPage;
PageId newId;
// allocate a new page from the BufMgr (it will be pinned in the BP)
Status status = MINIBASE_BM->newPage( newId, ((Page *&)newLeafPage) );
if(status != OK) { return ( MINIBASE_CHAIN_ERROR(BTREE, status) ); }
// initialize this new leaf page
newLeafPage->init(newId);
...
// now use it and remember to unpin it when you are done with it
Look at the header files for each class for
further details about the individual methods.
Status add_file_entry(const char* fname, PageId header_page_num);
There are similar methods for deleting and reading these
`file entries' ([file name, header page] pairs) as well, which can be
used when the file is destroyed or opened (See db.h)
% btree >& output_file
Extra Credit
The tasks are listed below with the extra percentage points they carry
in parentheses: