Due Date
Complete lab: Due by 11:59 p.m., Friday, March 28, 2025.
Checkpoint: Due by 11:59 p.m., Wednesday, March 19, 2025.
Note: we are giving you some extra time on both of these due to the midterm and spring break timing. We encourage you to complete the checkpoint before weekly lab.
Your lab partner for Lab 5 is listed here: Lab 5 lab partners
Our guidelines for working with partners: working with partners, etiquette and expectations
Overview
In this lab you will implement the SwatDB HeapFile
class for storing
pages of variable-sized records. This is a part of the File Management
layer of SwatDB that implements the abstraction of a heap file (an unordered
collection of records). In the previous lab, you implemented a
HeapPage
for storing one page of records in a heap file. For this
lab you will implement an entire heap file. A heap file is an unsorted
file of pages of records. This implementation organizes its pages into
two linked list of HeapPage
pages to make record insertion faster.
The full
list contains pages that are "full" (do not have space
to insert any more records); and the free
list contains pages that
have so free space (one, or more, of them may have enough free space to
handle the next record insertion request).
The primary goal of the SwatDB lab assignments is to gain an understanding of the details of how a relational DBMS works by implementing and testing parts of a relational DBMS. The SwatDB code base is quite extensive and will require close reading of its documentation (see Useful SwatDB Files).
Lab Goals
The main goals of the SwatDB Heap File Lab are:
-
Understanding the structure of a DBMS heap file, and details of implementing its interface.
-
Understand how the file layer interacts with the buffer manager and disk manager layers of a DBMS.
-
Further practice with manipulating low-level data structures in C++, and mapping types onto raw bytes of memory.
-
Developing a thorough testing methodology for a large system.
-
Understanding the role of abstraction in large systems.
-
Practice working with part of a large code base, much of which you have access to only through its interface definition (i.e.,
.h
files and generated interface documentation).
Starting Point Code
Find your git repo for this lab assignment off the GitHub server for our class: CS44-s25
Here are some detailed instructions on using git for CS44 labs.
Clone your git repo (Lab5-userID1-userID2
) containing starting point files into your
labs directory:
cd ~/cs44/labs
git clone [the ssh url to your your repo]
cd Lab5-userID1-userID2
If all was successful, you should see the following files (highlighted files require modification):
Lab 5 Files
-
Makefile
- pre-defined. You may edit this file to add extra targets. Here are a few of the most common commands that you will use:make make clean # rm all built files and any DB files created in test code make runtests # runs all test programs make gcov # build a gcov version of untittests (gcovunit)
-
README.adoc
- some directions about how to compile and run test programs for theHeapFile
. -
heapfile.h
- the SwatDBHeapFile
class, and related struct definitions.Do not add any new data members or public methods. You may add private helper methods for good modular code design.
-
heapfile.cpp
- the SwatDBHeapFile
class implementation. Most of the code you implement will be in this file. All methods defined inheapfile.h
are implemented in this file. Note that some are already implemented for you.Be sure to include good function comments in the this file too (function comments in
.h
files alone is not sufficient, the function definitions in the.cpp
files should also be well commented). For any missing functions, start by copying and pasting function comments from the .h file into here. Then, modify the comments to provide information to a reader of the C++ implementation). Also, include inline comments for any complex code sequences.You will likely want to add and use some private helper method functions for good modular desgin, and to implement some common functionality that may be used by several public method functions.
-
heapfilescanner.h
- the SwatDBHeapFileScanner
class. Defines an object for scanning (or iterating) over all records in aHeapFile
. Its main method isgetNext
.Do not add any new data members or public methods. You may add private helper methods for good modular code design.
-
heapfilescanner.cpp
- the SwatDBHeapFileScanner
class implementation. You will implement the main functions in this class to scan all the records in aHeapFile
. The records on pages in the full list should be scanned first, followed by the records on the pages on the free list.Again, add complete comments. You may add and use private helper method functions for good modular design.
-
checkpt.cpp
- unit testing code for passing the checkpoint. -
checksaved.cpp
- test code to test the persistence of heap files. This is written in an application program style vs. using the unit-test infrastructure (like the sandbox programs from other labs). Feel free to use this add your own testing code. In addition to testing persistence, it is meant to complement the unit tests and to help with implementation, debugging, or designing new tests. It uses both theHeapFile
and theHeapFileScanner
classes, so it may not be as useful for initial testing. -
fileunittests.cpp
- unit testing code forHeapFile
. This is fairly extensive unit tests for HeapFile and HeapFileScanner classes. However, you should add at least four new tests to this file. See theTODO
comments to see the four tests you are required to add (function stubs are provided for these), and be sure to comment your tests saying what you are testing. Specifically, you need to add tests for:-
insertRecord: add tests for exceptions.
-
insertRecord: add tests for pages moving from free to full list.
-
deleteRecord: add tests for exceptions.
-
deleteRecord: add tests for pages moving from full to free list.
We encourage you to add additional tests beyond the four required to help test functionality more extensively or more iteratively as you implement your code. Note that these unit tests explicitly tests the sizes of the free and full lists (which is different from the checkpt that just checks total number of pages).
-
-
filescannerunittests.cpp
- unit testing code forHeapFileScanner
. This is fairly extensive unit tests for HeapFile and HeapFileScanner classes. However, you should add at least one new test to this file. See theTODO
comments to see the test you are required to add (a function stub is provided for this), and be sure to comment your tests saying what you are testing. Specifically, you need to add tests for scanning after inserts, deletes, and updates. You can add multiple test functions for this if this is easier. We encourage you to add additional tests beyond the one required.
Deliverables
The following will be evaluated for your lab grade:
-
A complete and robust implementation of the
HeapFile
class inheapfile.cpp
file. The class and struct definitions to implement are defined in theheapfile.h
file. Note that there are two methods you do not need to implement (atomicAppend
andanonAppend
are not part of this lab assignment), and that some methods have already been implemented for you. You are encouraged to add private helper methods for good modular code design. -
The class definition in
heapfile.h
. Only add private helper methods to class definitions to support good modular design of your solution. Do not add public methods or data members to any classes or structs defined in this file. -
A complete and robust implementation of the
HeapFileScanner
class inheapfilescanner.cpp
file. The class and struct definitions to implement are defined in theheapfilescanner.h
file. -
The class definition in
heapfilescanner.h
. Only add private helper methods to class definitions to support good modular design of your solution. Do not add public methods or data members to any classes or structs defined in this file. -
Passes all test program tests
-
Complete Test Suite. You must add at least two tests to
fileunittests.cpp
and one test tofilescannerunittests.cpp
. Test code added should be well commented, clearly explaining the specificHeapFile
orHeapFileScanner
functionality it tests. -
Your Lab 5 Questionnaire to be completed individually (This will open on the due date and close after 3 days)
Running Test Code
The unit test programs and the checksaved test programs create SwatDB heap files that are stored on disk. The test code is designed to clean-up these files before it exits. However, if your code crashes these files may stick around, and they will cause subsequent runs to crash.
With your starting point code, is a clean-up script that you can run to clean-up these files:
./checkpt # run the checkpoint unittests
./cleanup.sh # remove .db and .rel files not cleaned up from running checkpt
If the .sh
scripts do not run, make sure they are executable, and
run chmod
to add executable permissions if not:
ls -l *.sh
-rwx------ cleanup.sh # should see x permission (read,write,execute)
chmod 700 cheanup.sh # set to rwx if not (7, 111 in binary, sets rwx)
ls -l *.sh
You can run each test program individually from the command line:
./checkpt
./fileunittests
./filescannerunittests
./checksaved
You can run all the test programs using a makefile command:
make runtests
Or can run some or all of them using one of the run script that runs each and clean-ups any left over files after each one (this is the easiest way to run multiple test programs if some are buggy):
./runchkpt.sh # runs just the checkpoint test programs
./runtests.sh # runs all of them
You can also create your own run scripts to run a subset of tests, using ours as a starting point:
cp runtests.sh runmine.sh
chmod 700 runmine.sh # set this file to executable
vim runmine.sh # edit the contents to do what you want
-
Here are some bash shell programming links
-
Appendix 2 of Dive into Systems contains more information about using unix, including chmod.
Checkpoint
Before the checkpoint due date, you should complete the HeapFile
functionality necessary to pass all the unit tests in the checkpt
program.
We recommend that you run and
test these using the run scripts that cleanup state after crashed
runs:
./runchkpt.sh
You can also the checkpt program directly, and run the cleanup.sh
script to clean up any files left over on incomplete runs
(these files end in .rel
or .db
):
./checkpt # run the checkpoint unittests
./cleanup.sh # clean-up any state not cleaned up from this program
While we recommend dealing with exceptions as you implement the methods, we do not require that all exceptions are implemented for the checkpoint.
The checkpoint functionality includes most HeapFile
methods fully
implemented, except for deleteRecord, and updateRecord.
This means that you need to handle adding pages to the file, and moving
pages from the free to the full list, but do not need to handle
moving pages from the full to the free list or removing pages from the
file.
You may want to add some of your own unit tests as you work on the checkpoint. We encourage you to add them into the mytests
SUITE of the
fileunittests.cpp
file, but you are welcome to add them into checkpt.cpp
if you’d like.
Useful SwatDB Files
In this assignment you will implement the the HeapFile
and HeapFileScanner
part of the File Management layer of SwatDB. Implementing a relation file
requires support for creating and deleting a file, and for inserting, deleting,
and updating records in the file. These operations may result in requests to
the Buffer Manager layer to get, release, allocate or deallocate a Page for the
file. The implementation also makes use of the HeapPage
class' interface to
perform operations on individual pages of records.
For information about SwatDB, including a link to its on-line code documentation, see this page:
SwatDB documentation that will be particularly helpful for this lab includes:
-
The Buffer Manager class defines the interface to the buffer manager layer of SwatDB. Note: when calling buffer manager method functions, be sure to appropriately handle or pass through any exceptions it throws.
-
The File Class defines that base class for all File objects in the system.
HeapFile
is derived from the class. Look at its documentation to see methods and data members that theHeapFile
inherits. -
The Page Class defines that base class for all pages in the system. Its
getData
method provides access to the raw page data: thePAGE_SIZE
bytes of memory space for a page. No derived class ofPage
can add additional data methods, they can only map structure on top of the raw page data. -
The Record class is a structure for storing record information. It has a data member of type Data class is a simple structure that stores the actual record data.
Many
HeapFile
methods are passed pointers toRecords
as parameters. Looking at the test code examples and at the documentation to help understand how to access and set information for records inserted or retrieved from the page. -
The HeapPage Class is used by
HeapFile
. You should call methods of this class to perform operations on individual pages of the file. -
The HeapPageScanner Class is used by
HeapFileScanner
to scan all the records on a singleHeapPage
. You can call methods of this class to perform operations on individual pages of the file. -
Type and constant definitions in swatdb_types.h.
PageNum
,PageId
,RecordId
,SlotId
, and other types are defined here. -
The SwatdDB Exceptions classes defined at the HeapFile, BufferManager and DiskManager layers in swatdb_exceptions.h that
HeapFile
methods may need to catch or throw. Check the method function comments inheapfile.h
andheapfilescanner.h
to see if a particular method needs to throw an exception(s). Exceptions withBufMgr
orDiskMgr
suffixes are thrown by those layers and are passed through to callers ofHeapFile
methods.Look at the SwatDB documentation for the exception classes, and look at the SwatDB info page for examples of how to throw and catch exceptions.
Lab Details
The HeapFile
class is defined in the
heapfile.h
header file included with the lab starting point code.
You should not need to modify this file with the exception of
adding private helper methods for good modular design.
Do not make any other modifications to the definitions in this file.
Open this file in an editor to read its contents:
vim heapfile.h
The HeapFileScanner
class is defined in the heapfilescanner.h
header file is included with the lab starting point code.
You should not need to modify this file with the exception that
you may add private helper methods for good modular design.
Do not make any other modifications to the definitions in this file.
Open this file in an editor to read its contents:
vim heapfilescanner.h
Implementation Overview
You will implement a heap file that stores all of its pages of records in two linked-lists of HeapPage pages:
-
one list is a list of full pages (new records cannot be inserted into them)
-
the other is a list of pages with free space (new records may be able to be inserted into one of them).
The records in the heap file are stored on HeapPage
type pages.
The next_page
and prev_page
fields in the HeapPageHeader
of
each page link pages together into a doubly-linked list of the
file’s pages.
A HeapPage
on one list may be moved to another list when a record is
inserted, removed, or updated:
-
Inserting a record may result in a new
HeapPage
being allocated and added to one of these lists (when there is no existing page with enough space to store the new record), or could cause a page to move from thefree
list to thefull
list if the insertion of the record makes the page full. -
Deleting a record may result in the file shrinking by one page in size when the last record on a page is deleted, or could result in a page on the
full
list moving to thefree
list if it was full and now has more space. -
Updating a record may result in a new page being added to the file (if the record increases in size and no current page can store it), and can result in pages moving between the
free
tofull
lists based on changes in the size of the updated record.
You will complete the implementations of the HeapFile
and
HeapFileScanner
classes,
and implement more tests in unittests.cpp
to
test both correctness and robust error handling. The
testing part is described in
more detail in the Testing your code section.
A heap file is organized as a single header page of meta-data about
the heapfile and two linked list of HeapPage
pages that store records.
The heap file’s single header page of metadata includes:
-
free
: thePageNum
of the first page on the file’s linked list ofHeapPages
that have some space in them. -
full
: thePageNum
of the first page on the file’s linked list ofHeapPages
that are full (do not have any space to insert records into). -
free_size
: the number of pages in the free list -
full_size
: the number of pages in the full list -
num_records
: the number of records in the file
as shown in this figure:

HeapFile is derived from the File base class that defines a generic interface to all file types in the system.
Note that the HeapFile
object only exists when SwatDB is running. However,
its underlying set of pages persists between runs (i.e. its header page and
linked list of pages of HeapPage
structured page data are stored as a file on
disk, managed by the DiskManager
layer).
HeapFile objects are created in SwatDB in response to one of two actions:
-
When SwatDB start with an initial .db file, then any DB relation files that are stored on disk are opened and a new
HeapFile
object is created and associated with each relation file. -
During SwatDB runtime, if new relation file is created, a
HeapFile
object is created and associated with the new file.
In the test code with this lab, we provide all the code needed to
create HeapFile
object(s) in response to both of these scenarios.
HeapFile Class
HeapFile
(in heapfile.[h,cpp]
) inherits the following data fields from
the File
class:
-
header_id
: this is thePageId
of the heap file header page. You will allocate a newPage
for a HeapFile’s header page, initialize it, and set this field to the new page’sPageId
value. -
buf_mgr
: a pointer to the SwatDBBufferManager
object that manages the buffer pool. HeapFile methods will need to make calls to allocate/deallocate/get/release file pages usingBufferManager
methods. -
file_id
: the file’s unique identifier in the system. This is set by the File Manager when theHeapFile
object is created. -
catalog
: a pointer to the SwatDB Catalog object that contains information about all files in the system. (you do not need to use this object in your implementation, but it is used in the checksaved test codeprint_fids
function). -
schema
: a pointer to the Schema object describing the relation stored in this file (you do not need to use this in your implementation beyond some error checking).
Heap File Organization
The heap file (shown in [HeapFileFig]) is organized so that:
-
Page 0 is the file’s header page.
-
Remaining Pages are stored in one of two doubly linked list of
HeapPage
pages that store record data. Records can be inserted on any page in the file, and individualHeapPages
can be in any order in the linked lists (i.e. it is not sorted byPageNum
or record field values). -
Full pages must be stored in the full list, and pages that have some space should be stored in the free linked list.
The header page and the record data pages are stored on disk as a file. The DiskManager manages how they are stored on disk.
Heap File Header Page and type re-casting
In heapfile.h
is a struct definition that defines the header page
fields. The header page stores heap file-specific metadata about
the file and its structure. It is mapped onto the underlying Page
data bytes.
The header page of the HeapFile should be allocated on a separate Page
that
stores only the heap file meta data. A struct HeapFileHeader
is mapped onto
the first bytes of the raw data of the Page
. You must not change the
struct definition for the header (its bytes need to map into the exact
locations into the Page
class’s data
array as they are given in the
definition).
Do not store any records in the header page of the heap file. The first
record stored in the file should be stored on a newly allocated HeapPage
which will be added to either the full or free list depending on if it has
space left on it or not.
HeapFile Methods
HeapFile
methods will call
BufferManager
method functions to allocate, deallocate, get, release, and flush pages
from the buffer pool to handle operations on the file. NOTE the
buffer manager flushPage method should not be called on regular heapfile
operations like inserting or removing a record from the file. This method
forces a page to be written to disk, which is useful for rare DB operations,
but for most operations the higher levels should just let the buffer
manager manage the buffer pool, and let its replacement policy decide
when a page in the buffer pool gets written out to disk.
HeapFile
methods will call
HeapPage
to insert/delete/update records on individual pages of the file,
and to set page’s prev_page
and next_page
fields to link pages
together in the two linked lists of pages of record data.
Type recasting
You will need to use type-recasting in a few ways in this assignment.
-
The
BufferManager
methods returnPage *
, but your code needs to manipulate the pages as specific types. You can re-cast return values as a pointer to the specific type to do this. For example:HeapPage *pg; ... // recast the return type of getPage to (HeapPage *) pg = (HeapPage *)buf_mgr->getPage(pg_id);
-
We do not give you methods that recast the raw page data of the header page to a
HeapFileHeader *
in a similar way that we did with theHeapPage
lab. However, we suggest that you add one as a private helper method function:private: /** * @return address of header page recast as * a (HeapFileHeader *) * TODO: what are the pre and post conditions? * TODO: does this throw any exceptions (or pass any through)? */ HeapFileHeader *_getHeaderPage();
We recommend that you define and implement this as the first method, and use it throughout your code. Look at the
HeapPage
lab starting point code for examples of similar methods.
HeapFile interface methods
These methods are implemented for you (look at their implementation for some hints at how to to call Buffer Manager layer methods):
-
constructor: invokes File base class constructor and sets header page to
INVALID_PAGE_ID
. -
createHeader
: allocates and initializes a new header page for the heap file. This method is only invoked when a new heap file is created in the system vs. when an existing heap file is loaded from disk. You do not need to create a header page in your HeapFile code: the SwatDB FileManager creates aHeapFile
object, and it sets the HeapFile’s header page or calls this method to create one. -
flushHeader
: flushes the header page to disk.
The following are HeapFile interface methods you need to implement (you may also want to add some private helper method functions too). NOTE: most of these methods call BufferManager and HeapPage interface method functions to accomplish some of the listed subtasks. Be sure you review the interfaces of those two classes before beginning implementation.
-
insertRecord: inserts a record into the file and returns the
RecordId
of the inserted record.HeapFile
meta data is updated to reflect that the record has been inserted, or an error exception is thrown if the insert fails. The passed record is inserted into the first page on the doubly linked list offree
pages that has enough free space to store it (callgetFreeSpace
method to get the amount of free space on the page). If there are no existing pages that have enough space to store the record, then a new page is allocated for the file, and added to the head of thefree
list. Inserting a record into a page may result in the page filling up (callisFull
on the page to test), in which case the page should be moved from thefree
list to the head of thefull
list.Each Page that needs to be accessed to handle an
insertRecord
(including the header page and allHeapPages
) need to utilize theBufferManager
to first bring the page into the buffer pool. To access a Page, it needs to be pinned (either through a call togetPage
orallocatePage
BufferManager methods), and should be unpinned (via a call toreleasePage
BufferManager method) when the method is done using the page. The dirty bit should be set appropriately for all pages that are modified on an insert record. Any page pinned by this method must be unpinned before the method returns (or before it throws an exception). You will need to keep careful track of when a page is still pinned and when it is no longer needed. Any exceptions in between those two points need to unpin the page before throwing/rethrowing the exception.This method throws exceptions on errors, some of which may come from calls to HeapPage methods or to BufferManager methods that are thrown by the buffer manager or disk manager layer. Look at the function comments for more specific information about these. Any time you access either the HeapPage or BufferManager methods, be sure to check the documentation to make sure you are aware of potential exceptions you may need to handle.
Here are the main steps of this method (note: calls to BufferManager layer to
getPage
,allocatePage
andreleasePage
need to be made in implementing some of these steps):-
Start the implementation of
insertRecord
by checking for some error conditions and throwing exceptions when the checks fail.-
If the passed record is too big, throw an InsufficientSpaceHeapPage exception. HINT:
MAX_RECORD_SIZE
is the largest record aHeapPage
can store. -
The passed record schema must match this files’s schema. This just requires comparing the two Schema pointer values for equality. Throw an InvalidSchemaHeapFile exception if they do not. Examine the class data members and the Record interface to access the schemas.
-
-
Search for a page in the doubly linked
free
list for one that has space to insert the record. There are two main cases:-
If such a page is found, insert the record on the page (call
insertRecord
on the HeapPage). Think carefully about pinning/ unpinning and handling exceptions. -
If the list is exhausted with no suitable candidate, insertRecord will allocate a new
HeapPage
that is inserted at the beginning of the linked list offree
pages.Remember that a call to the
BufferManager
allocatePage
method, just returns a pointer to a Page of the buffer pool (a pointer to a Page-size chunk of memory data). The buffer manager has no idea what type of page the caller wants to map on top of the allocated Page of memory space (it could be an index page, it could be a HeapPage, it could be a header page, it could be …). If you want to use the Page of buffer pool space to store particular state (HeapPage data), you need to initialize that Page of memory space to the appropriate values before you start accessing its contents and interpreting their values as having heap page meaning (the Page returned by the buffer manager is just a page of garbage values in memory). See theHeapPage
interface for any useful methods for this.Again, you will need to carefully think about pinning/unpinning/handling exceptions. Also examine [HeapFileFig] to identify all of the various updates you need to make to insert a new
HeapPage
at the beginning of a doubly-linked list.Note: A
HeapFile
can only store up toMAX_PAGE_NUM
pages of data; throw InsufficientSpaceFilePage exception if adding a new page exceeds this amount.
-
-
Insert the record into the page with space that was found or added to the
free
list.If the page is now full (no more free space after this record is inserted). Remove this page from the free list and add it to the beginning of the linked list of
full
pages. TheHeapPage
isFull
method is useful here. -
Update HeapFile header information with results of the insert.
-
a note about isFull
The SwatDB As a result, this means that sometimes inserting a record into a page
may result in it being considered full even when Your HeapPage implementation should use the result of a call to If you are interested, the exact threshold value is defined in
|
-
getRecord: given a
RecordId
value and a passedRecord
to fill, this method constructs aPageId
from the passedRecordId
and from the files'FileId
value, requests the page from the BufferManger (via a call to thegetPage
method), and calls theHeapPage
getRecord
method to copy the record data from the heap page into the passedRecord *
. The page unpinned from the buffer pool before this function returns or throws an exception (via calls toreleasePage
BufferManager method).This method throws exceptions on errors, some of which may come from calls to
HeapPage
methods or toBufferManager
methods that are thrown by the buffer manager or disk manager layer. Look at the function comments for more specific information about them. -
updateRecord: updates an existing record to its new passed value. Note: an updated record must retain its
RecordId
value. This means that an update cannot move a record from one page of the heap file to another. The HeapPagegetFreeSpace
will be helpful here.Like other HeapFile methods, any pages of the file that are accessed by the method needs to be pinned before accessed (via getPage/allocatePage) and should be unpinned (via releasePage) when the method is done using them. All pages pinned by this method must be unpinned before the method returns (or before it throws an exception). This method throws exceptions, some of which may come from calls to HeapPage or BufferManager methods.
A successful updateRecord may cause a page to move from the
free
list to the beginning of thefull
list (if the new record is larger than its previous value and now fills the page), or may cause a page to move from thefull
list to the beginning of thefree
list (if the new record is smaller than its previous value, and is now considered to have enough space to no longer be full (see the NOTE ininsertRecord
above aboutisFull
). TheHeapPage
isFull
method will be helpful here. -
deleteRecord deletes a record, given its
RecordId
, from the file. This method could shrink the heap file by one page, or it could result in a page on thefull
list moving to the beginning of thefree
list.Here are some of the big steps
deleteRecord
must take (you need to refine these):-
Get the page corresponding to this record from the buffer manager.
-
Determine if the
RecordId
is valid. -
Delete the record from the page (using the correct
HeapPage
method) -
Determine if the page containing the deleted record can be removed from the file and remove it from the linked list of pages (the
HeapPage
isEmpty
method will be helpful here). -
If the page is not empty after remove, and if this page is on the full list and now has free space, check to see if the page is no longer full (call
isFull
to check), and if so, move it to the beginning of thefree
list. If you correctly move pages from the full to free list on insert and update, then the HeapPageisFull
method is sufficient test if the page is on thefull
list (before removing the record)); you do not need to traverse thefull
list to see if the page is on it. (see the NOTE ininsertRecord
above aboutisFull
). -
Update
HeapFile
metadata in the header to reflect a successful record delete.
Like other methods, all pages accessed need to be pinned in the buffer pool and all pages pinned by this method need to be unpinned when the method returns (or throws an exception). It may throw exceptions, which may result from its calls to
BufferManager
orHeapPage
method functions. -
HeapFile debugging methods
There are several methods already implemented that you can use
for debugging purposes only. See the checksaved.cpp
test program for
some examples of their use, you can also call these from gdb
or
use them in debugging printout in unittest.cpp
.
Also, search for THIS METHOD IS FOR DEBUGGING ONLY
in the comments
of heapfile.[h,cpp]
and heapfilescanner.[h,cpp]
files to find them.
Exceptions
For information about how to throw and catch different SwatDB exception
objects, look at the Exceptions Section of
the SwatDB information page. You will need to think carefully about the
cause of these exceptions in order to know how to handle them. Use the
method function comments @throw
as a guide for which exceptions methods
may need to throw or to pass through. These comments will also help you
think about some of the error conditions your code may need to check for
and handle appropriately.
For this lab Buffer Manager and Disk Manager layers may throw exceptions in response to calls from HeapFile method functions. Your code needs to determine if it needs to catch the exception or let it pass through.
Remember that if a method throws (or passes through) an exception, it should clean-up any internal state it has partially modified. For this lab, this may include unpinning pages that the method has pinned prior to throwing (or re-throwing) an exception.
HeapFileScanner Class
The HeapFileScanner
class implements a scanner over all records of a
HeapFile
. Your solutions should first scan over all records in the
full
list, followed by all records in the free
list. To scan the
records on each page, you will use a HeapPageScanner
object on the page.
You will implement two methods:
-
HeapFileScanner
constructor: This method sets up state for starting to scan records on the first page in the file (scanning all pages on the full list first). It should leaved the first page of the file that is being scanned pinned in the buffer pool (in the case when the file is not empty), but should not leave the file’s header page pinned. It sets up the following state for the heapfile scanning:-
buf_mgr
: a pointer to theBufferManager
object. Needed to get and release pages being scanned. -
file
: a pointer to theHeapFile
being scanned. -
cur_page
: a pointer to the currentHeapPage
of the file being scanned (either the head page of thefull
list or the head page of thefree
list, unless if the file is empty). -
cur_pid
: the PageId of the current page (the first page to scan, orINVALID_PAGE_ID
if the file is empty). -
scanner
: aHeapPageScanner
object associated with the firstHeapPage
of records to be scanned. -
end_of_full
: set to true if the full list is empty, false otherwise. This is a flag that is used to indicate that all the pages in the full list have been scanned. -
end_of_free
: set to true if the full list and the free list are both empty, likely set to false otherwise. This is a flag that is used to indicate that all the pages in the free list have been scanned (indicating that all records in the file have been scanned since the pages in the full list are scanned before the pages in the free list). For a file where the full list has pages, and the free list is empty, you likely want to initialize this to false even though there are no free list pages. However, you should initialize this field appropriately depending on how you use this flag in your implementation ofgetNext
.
-
-
getNext
: returns theRecordId
of the next record in the file. It keeps the current page pinned in the buffer pool. It may need to occasionally pin the header page to find the next record, but it should never leave the header page pinned in the buffer pool. If there are no more records in the file, it returnsINVALID_RECORD_ID
, and it should leave no pinned pages in the buffer pool.In the easiest case, this method just involves calling and returning the
RecordId
the is returned from calling togetNext
on itsscanner
field. If, however, this call returnsINVALID_SLOT_ID
, then all the records on the current page have been scanned and this method needs to find the next page in the file, and return the first record on that page. Here are some of the main steps for getting the record on the next page:-
release the old current page from the buffer pool. You may want to get some state from this page before you release it.
-
get the next page in the file. If this page’s next page is
INVALID_PAGE_NUM
(it is the last page in the list), and if you are currently scanning the full list, then get the first page on the free list as the next page. Be sure to set theend_of_full
field to true to indicate that all the pages on the full list have been scanned. If the free list is empty, or if this is the last page on the free list, then the scan is complete (set theend_of_free
flag to true). Be sure to release the header page before this method returns if you need the header page to get the next page in the file. -
update
cur_page
andcur_pid
to the new page or to signify there are no more pages in the file. -
reset
thescanner
on this newcur_page
. -
call
getNext
on theHeapPage
scanner to get the next Record, and return itsRecordId
.
-
Lab Requirements
In addition to completing the implementation of the HeapFile
class, and adding more tests to fileunittest.cpp
and
filescannerunittests.cpp
, you should also:
-
Declare and use variables of the types defined in
swatdb_types.h
as opposed to their underlying type definition. Use constants and enum types defined in this file - they help make the code more readable. For example, if a method returns aFrameId
, declare a variable of typeFrameId
rather thanstd:uint32_t
orint
to store its return value:FrameId frame_num; PageId pg_id; //get returns a FrameId, storing it as an int compiles but loses valuable // information about the purpose of frame_num frame_num = buf_map->get(pg_id);
-
Write good C++ code design, and good modular design in your solution. This includes using defined constants and types.
-
Ensure you code is robust to errors, in particular, be sure to test for error handling for exceptions that could be thrown by the buffer manager or disk manager layers, and determine if they need to be caught or passed through.
-
Ensure your code is free of valgrind errors.
-
Make sure your code is well-commented, and there is no line wrapping. (See our C++ Style guide link from the Handy Links section).
-
Your submitted code should have all of our TODO comments removed…as you implement a TODO, remove it. These are also helpful to find parts of the given code that you need to implement.
Testing your code
There are four test files in the starting point code:
-
checkpt.cpp
: unit tests for the checkpoint# run all of the checkpt test suites ./checkpt # or you can run individual test suites alone using -s testSuiteName ./checkpt -s X # run just the X test suite # to list the test suites names run with -h ./checkpt -h
You can also use the run scripts to run (and cleanup) checkpt and checksaved tests:
./runchkpt.sh
See Running Test Code for more information about running test code and scripts.
-
checksaved.cpp
: checks the persistence of your heapfile operations. Any modifications to theHeapFile
, like inserting records, should persist between runs of SwatDB. This test checks that your solution is implemented so that changes to theHeapFile
in memory translated to changes on disk. This is accomplished by your solution callingBufferManager
methods correctly. -
fileunittests.cpp
: the start of a set of unit tests for the heap page. You are required as part of this lab to develop more unit tests that you must add in this file. -
filescannerunittests.cpp
: the unit tests for the heap file scanner class. You are required as part of this lab to develop more unit tests that you must add in this file.
You may use any or all of these to start your HeapFile
implementation,
but you will need to use all of them to verify your program is working
correctly. The checkpt.cpp
, and checksaved.cpp
unit tests are
useful for testing early on.
Required Unit Tests
-
The
fileunittest.cpp
implements some Heap File test code using the unit tests framework from CS35 (as do the other test programs likecheckpt.cpp
). You must add at least 4 additional tests tofileunittest.cpp
by following the code examples in this file to help you structure your code.fileunittest.cpp
contains an incomplete set of test functions. Use this file to add tests beyond those tested by thecheckpt.cpp
. There are function stubs for the 4 types of tests to add. Specifically, you need to add tests for:-
insertRecord: add tests for exceptions.
-
insertRecord: add tests for pages moving from free to full list.
-
deleteRecord: add tests for exceptions.
-
deleteRecord: add tests for pages moving from full to free list.
Here are some things to consider as you develop these and others:
-
tests for exceptions
-
tests that consider boundary testing or stress testing each main operations: insert, delete, update
-
tests that stress test different cases of combinations of operations.
-
tests that produce file sizes that are larger than the size of memory (the file has more total pages the the buffer pool).
-
-
filescannerunittests.cpp
: the start of a set of unit tests for the heap page scanner. You are required to add at least one more test to test the scanner after deletes and updates. Feel free to break these up into multiple tests. There is a function stub for this in the starting point.
You are welcome to add additional test suites beyond the required ones. Follow the same structure as these and the other test SUITES in this file to do so.
Cleaning up Corrupted Files
SwatDB maintains several files to allow for persistent storage of data. While that is not a central feature of this lab, a consequence of a program crash is that temporary files do not get properly cleaned up since the DBMS did not shutdown cleanly. This can cause problems when you try to rerun the program, so you will need to clean this up. Use the two options below:
./cleanup.sh
make clean # or make clean also runs cleanup.sh then just re-build
make
Also see Running Test Code for running using the run scripts that include calls to the cleanup script for you.
Tips and Hints
The following are some tips to help you implement the HeapFile:
-
Run the test programs to see which tests you are passing and which fail. This will help you find missing functionality in your code, and some missing cases. These tests are incomplete and part of this assignment is designing and writing more unittests to test your implementation more thoroughly.
-
Implement and test incrementally. Use the checkpt.cpp tests as a guide for what functionality to implement and test first and fileunittests.cpp and filescannerunittests.cpp for more detailed tests (note the the tests in
checkpt.cpp
do not check that the pages are on the correct list (free
orfull
) like the other tests programs do. We suggest this implementation order (as you add post-checkpoint functionality, you should also add more test to the other unit test programs to completely test functionality):-
insertRecord
: just inserting some records to not completely fill one Heap page of records.-
need to allocate a new page for the first insert and add it to the beginning of the
free
list. -
call
insertRecord
on theHeapPage
to insert the record on the page. -
update the header page information to reflect a successful insert.
-
make sure all file pages are unpinned after the call to insertRecord
-
-
getRecord
: implement all functionality, at least all without possibly complete exception/error handling. -
insertRecord
: add support for inserting records that span more than one heap page of records.-
This requires traversing the linked list of pages to find a page with enough free space to insert the record.
-
This may require moving a page from the free to the full list.
-
This may require adding in a new page the free or full linked list of pages (a new page should be added to the front of the lists), and updating appropriate next and prev fields in pages.
-
ensure that header page information is updated correctly.
-
ensure that all pages modified are unpinned before this method returns.
-
-
updateRecord
: implement all functionality, or at least all perhaps missing some error handling.-
call
updateRecord
onHeapPage
to update. -
make sure all file pages are unpinned after the call to insertRecord.
-
-
deleteRecord
:-
first try a delete that does not result in removing a page from the file
-
check file header information is correct after a delete and that succeeds
-
check that none of the pages accessed in deleteRecord, are pinned in buffer pool after the call.
-
next, try delete that moves a page from the full to the free list.
-
next, try delete that deletes the last record on a page. The page should be removed from the free or full linked list of pages that it is currently in, and
deallocatePage
should be called to remove it from the underlying file.
-
-
HeapFileScanner
:-
constructor: find the first page in the file to scan. Scan pages on the full list before any on the free list.
-
getNext
: make sure that you handle all cases-
for a file with only one page
-
handling the last record in the file
-
for a file with with multiple pages
-
for a file with only pages on the full list
-
for a file with only pages on the free pages
-
for a file with both full and free list pages
-
-
-
Run the
checksaved
program to ensure that file changes are persistent. This program uses your HeapFileScanner implementation.
-
-
Refer to Wednesday in-lab code examples for C++, gdb, gcov, and valgrind reminders.
-
Remember the
&
operator returns the address of its argument (this is C-style operator, not&
used to specify reference parameter types in C++). Its argument must be an lvalue (a storage location, such as the name of a variable or an array bucket). For example, to get the address of the 3rd bucket in and array of ints:int array[20];
you would use the
&
operator like this (in this example I’m assigning its value to an int * variable):int *ptr; ptr = &(array[3]);
-
Refer to the recasting part of Lab Details for some example type recasting code. Also look at the private method functions in the starting point for the
HeapPage
lab as another example.
Submitting your lab
Review the lab deliverables to ensure you have completed all of your work. Before the due date, push your solution to github from one of your local repos to the GitHub remote repo.
From your local repo (in your ~/cs44/labs/Lab5-userID1-userID2
subdirectory)
make clean
git add *.h *.cpp # only add .h and .cpp file: DO NOT DO git add *
git commit -m "my correct and well commented solution for grading"
git push
Verify that the results appear (e.g., by viewing the the repository on CS44-s25). You will receive deductions for submitting code that does not run or repos with merge conflicts. Also note that the time stamp of your final submission is used to verify late days, so please do not update your repo until after the late period has ended.
If that doesn’t work, take a look at the "Troubleshooting" section of the Using git for CS44 labs and the Using git pages. At this point, you should submit the required Lab 5 Questionnaire (each lab partner must do this).
Handy References
Examples
-
Review in lab exercises from
SwatDB
-
Information about SwatDB
C++ programming
-
Some C++ Programming Resources and Links including the C++ Style Guide
-
C++ programming tools compiling, linking, debugging C++
-
gdb and valgrind from Dive into Systems, and my (very similar) gdb and valgrind guides.
-
C references in Dive into Systems (some useful for C++ programming too) Chapter 2: C pointers, command line arguments, C debugging tools (valgrind, gdb for C)
Unix
-
Appendix 2: Using Unix from Dive into Systems
-
my CS help pages (Unix tools, programming links)
Git
-
Using Git detailed Git guide