In this assignment, you will implement the page structure for the Heap File layer. You will be given a library containing the lower layers (Buffer Manager and Disk Space Manager), and some driver routines to test the code.
Begin by reviewing HeapFile organization. In particular, you should review the description of Page and record formats for Heap Files in chapter 9 of the text. A HeapFile is seen as a collection of records. Internally, records are stored on a collection of HFPage objects. For this assignment, you will implement just the HFPage class, and not all of the HeapFile code.
Your HFPages will use the page format for storing variable sized records that uses a slot array, similar to the one shown in figure 9.7 of the text. However, in the description in the text, the slot directory is located at the end of the page, and grows toward the beginning. In your implementation, the slot array will be at the beginning of the page (immediately after the page header) and will grow towards the end of the page. Variable-sized records will be stored starting at the end of the page and moving towards the top of the page. All records should be compacted on the page, and all available free space should be between the end of the slot array and the end of the compacted records (remember they grow from the bottom up). You will need to write the code so the records themselves are placed starting at the end of the page and growing towards the beginning of the page; be very careful with your pointer arithmetic.
Keep in mind that in order to add a record to a page, there has to be room for the record in the data area and also room for a new slot in the slot area (unless there happens to be a pre-allocated slot that's empty). The HFPage is illustrated below:
Create a working directory for this project, then copy the
starting point file:
In src, do 'make depend' to build dependencies. If you make the
project, it will create an executable named hfpage . Right now,
it does not work; you will need to fill in the bodies of the HFPage class
methods.
Sample output of a correct implementation is available in sample_output.
Design Overview and Implementation Details
void HFPage::init(PageId pageNo): This member function is used to initialize a new heap file page with page number pageNo. It should set the following data members to reasonable defaults: nextPage, PrevPage, slotCnt, curPage, usedPtr, freeSpace. You will find the definitions of these data members in hfpage.h. The nextPage and prevPage data members are used for keeping track of pages in a HeapFile.
A good default unknown value for a PageId is INVALID_PAGE, as defined
in page.h. Note that usedPtr is an offset into the data array, not a pointer.
PageId HFPage::getNextPage(): This member function should return the page id stored in the nextPage data member.
PageId HFPage::getPrevPage(): This member function should return the page id stored in the prevPage data member.
void HFPage::setNextPage(PageId pageNo): This member function sets the nextPage data member.
void HFPage::setPrevPage(PageId pageNo): This member function sets the prevPage data member.
Status HFPage::insertRecord(char* recPtr, int reclen, RID& rid):
This member function should add a new record to the page. It returns OK if
everything went OK, and DONE if sufficient space does not exist on the page
for the new record. If it returns OK, it should set rid to be the RID of the
new record (otherwise it can leave rid untouched.) Please note in the
parameter list recPtr is a char pointer and RID&
denotes passed by reference. The Status enumerated type is defined in
new_error.h if you're curious about it. Use memcpy to copy
the record to its correct position in the page. Your implementation should
use unused slots in the slot array, before creating a new one.
DONE is a special code for non-errors that are nonetheless not "OK": it generally means "finished" or "not found." FAIL is for errors that happen outside the bounds of a subsystem.
The RID struct is defined in include/minirel.h:
The operators ==, !=, and << are overloaded for RID types, so that you can
compare two RID object by doing somehthing like the following:
Status HFPage::deleteRecord(const RID& rid): This member function
deletes the record with RID rid from the page, compacting the hole created.
Compacting the hole, in turn, requires that all the offsets (in the slot
array) of all records after the hole be adjusted by the size of the hole,
because you are moving these records to "fill" the hole. You should leave
a "hole" in the slot array for the slot which pointed to the deleted record,
if necessary, to make sure that the rids of the remaining records do
not change. The slot array can be compacted only if the record corresponding
to the last slot is being deleted. It returns OK if everything goes
OK, or FAIL otherwise.
Status HFPage::firstRecord(RID& firstRid): This routine should set firstRid to be the rid of the "first" record on the page. The order in which you return records from a page is entirely up to you. If you find a first record, return OK, else return DONE.
Status HFPage::nextRecord(RID curRid, RID& nextRid): Given a valid current RID, curRid, this member function stores the next RID on the page in the nextRid variable. Again, the order of your return records is up to you, but do make sure you return each record exactly once if someone continually calls nextRecord! Don't worry about changes to the page between successive calls (e.g. records inserted to or deleted from the page). If you find a next RID, return OK, else return DONE. In case of an error, return FAIL.
Status HFPage::getRecord(RID rid, char * recPtr, int& recLen): Given a rid, this routine copies the associated record into the memory address *recPtr. You may assume that the memory pointed by *recPtr has been allocated by the caller. RecLen is set to the number of bytes that the record occupies. If all goes well, return OK, else return FAIL.
Status HFPage::returnRecord(RID rid, char*& recPtr, int& recLen): This routine is very similar to HFPage::getRecord, except in this case you do not copy the record into a caller-provided pointer, but instead you set the caller's recPtr to point directly to the record on the page. Again, return either OK or FAIL.
int HFPage::available_space(void): This routine should return the amount of space available for a new record that is left on the page. For instance, if all slots are full and there are 100 bytes of free space on the page, this method should return (100 - sizeof(slot_t)) bytes. This accounts for the fact that sizeof(slot_t) bytes must be reserved for a new slot and cannot be used by a new record.
bool HFPage::empty(void): Returns true if the page has no records in it, and false otherwise.
Please follow the Minibase Error Protocol.
An example of how to use this protocol is given in the file ErrProc.sample. Basically, the first detector of an error returns
MINIBASE_FIRST_ERROR(status_value, error_number). The caller will either handle
the error, or more typically pass it on to higher layers, by calling
MINIBASE_CHAIN_ERROR(status_value, errornumber). status_value is one of
the Status enumerated types defined in include/new_error.h. These correspond
to layers in the DBMS or data structures in the DBMS. Do not add new values.
For this assignment, use HEAPFILE as the status_value.
If our code first detects an error, we would do something like this:
When you submit your solution, test that your solution's output on
the tests I give you is identical to my sample output (with
the exception of page memory addresses which may vary); you should not
submit a solution with extra debugging output.
Create a submit subdirectory of your working directory, and copy into it all
your source files that I need to compile, run and test your buffer manger
solution (.C, .h, and Makefile). In this directory add a README file with the
following information:
Then create a tar file of your submit subdirectory and submit it via
cs44handin before the due date.
% cd cs44
% mkdir proj2
% cp -r ~newhall/public/cs44/proj2/* .
src contains staring point code (.h and .C files that you will implement)
include contains .h files for other layers of Minibase (don't change these)
lib contains the library of lower Minibase layers used by this layer.
struct RID {
PageID pageNo;
int slotNo;
int operator == (const RID rid) const
{ return (pageNo == rid.pageNo) && (slotNo == rid.slotNo); };
int operator != (const RID rid) const
{ return (pageNo != rid.pageNo) || (slotNo != rid.slotNo); };
};
The pageNo identifies a physical page number (something that
the buffer manager and the DB layers understand) in the file.
The slotNo specifies an entry in the slot array on the page (something
your code will set when a record is inserted on an HFPage).
struct RID rid1, rid2;
...
if(rid1 == rid2) { ... }
if(rid1 != rid2) { ... }
cout << rid1 << endl;
return MINIBASE_FIRST_ERROR(HEAPFILE, INVALID_SLOTNO);
If an call to a lower layer returns an error, then we typically pass it on
to a higher layer (it is also possible, but unlikely, that we handle it
and don't pass it on):
status = MINIBASE_DB->write_page(pid, &bufPool[i]);
if(status != OK) {
return MINIBASE_CHAIN_ERROR(HEAPFILE, status);
}
For this assignment, I don't think code you write will need to invoke
MINIBASE_CHAIN_ERROR, but there are cases where you will need to call
MINIBASE_FIRST_ERROR. Because you are not implementing the full HeapFile
layer, the Heapfile error codes are pre-defined for you and you cannot
change them (they are in include/heapfile.h). For the next assignment,
you will define your own error codes.
Test code
In hfp_driver.C are a number of functions
that look like HfpDriver::testX(). sample_output
contains correct output for exactly these tests (your correct submitted
program should produce the same output). As you implement and test
your code incrementally, you likely do not want to start by running all
of these tests. What you will want to do is either comment out
calls to them in TestDriver::runAllTests in test_driver.C, and add them
back in one at a time as you add more functionality and/or comment out
all or part of the body of the HfpDriver::testX functions, and as you
add and test more functionality, uncomment more and more parts. You
should additionally add test code to test conditions/cases that may not
be fully tested in the test code I'm giving you.
What to Turn In