Database Systems

Due Date

Complete lab: Due by 11:59 p.m., Friday, March 28, 2025.

Checkpoint: Due by 11:59 p.m., Wednesday, March 19, 2025.

Note: we are giving you some extra time on both of these due to the midterm and spring break timing. We encourage you to complete the checkpoint before weekly lab.

Your lab partner for Lab 5 is listed here: Lab 5 lab partners

Our guidelines for working with partners: working with partners, etiquette and expectations

Overview

In this lab you will implement the SwatDB HeapFile class for storing pages of variable-sized records. This is a part of the File Management layer of SwatDB that implements the abstraction of a heap file (an unordered collection of records). In the previous lab, you implemented a HeapPage for storing one page of records in a heap file. For this lab you will implement an entire heap file. A heap file is an unsorted file of pages of records. This implementation organizes its pages into two linked list of HeapPage pages to make record insertion faster. The full list contains pages that are "full" (do not have space to insert any more records); and the free list contains pages that have so free space (one, or more, of them may have enough free space to handle the next record insertion request).

The primary goal of the SwatDB lab assignments is to gain an understanding of the details of how a relational DBMS works by implementing and testing parts of a relational DBMS. The SwatDB code base is quite extensive and will require close reading of its documentation (see Useful SwatDB Files).

Lab Goals

The main goals of the SwatDB Heap File Lab are:

Understanding the structure of a DBMS heap file, and details of implementing its interface.
Understand how the file layer interacts with the buffer manager and disk manager layers of a DBMS.
Further practice with manipulating low-level data structures in C++, and mapping types onto raw bytes of memory.
Developing a thorough testing methodology for a large system.
Understanding the role of abstraction in large systems.
Practice working with part of a large code base, much of which you have access to only through its interface definition (i.e., .h files and generated interface documentation).

Starting Point Code

Find your git repo for this lab assignment off the GitHub server for our class: CS44-s25

Here are some detailed instructions on using git for CS44 labs.

Clone your git repo (Lab5-userID1-userID2) containing starting point files into your labs directory:

cd ~/cs44/labs
git clone [the ssh url to your your repo]
cd Lab5-userID1-userID2

If all was successful, you should see the following files (highlighted files require modification):

Lab 5 Files

Makefile - pre-defined. You may edit this file to add extra targets. Here are a few of the most common commands that you will use:

make
make clean      # rm all built files and any DB files created in test code
make runtests   # runs all test programs
make gcov       # build a gcov version of untittests (gcovunit)

README.adoc - some directions about how to compile and run test programs for the HeapFile.
heapfile.h - the SwatDB HeapFile class, and related struct definitions.

Do not add any new data members or public methods. You may add private helper methods for good modular code design.
heapfile.cpp - the SwatDB HeapFile class implementation. Most of the code you implement will be in this file. All methods defined in heapfile.h are implemented in this file. Note that some are already implemented for you.

Be sure to include good function comments in the this file too (function comments in .h files alone is not sufficient, the function definitions in the .cpp files should also be well commented). For any missing functions, start by copying and pasting function comments from the .h file into here. Then, modify the comments to provide information to a reader of the C++ implementation). Also, include inline comments for any complex code sequences.

You will likely want to add and use some private helper method functions for good modular desgin, and to implement some common functionality that may be used by several public method functions.
heapfilescanner.h - the SwatDB HeapFileScanner class. Defines an object for scanning (or iterating) over all records in a HeapFile. Its main method is getNext.

Do not add any new data members or public methods. You may add private helper methods for good modular code design.
heapfilescanner.cpp - the SwatDB HeapFileScanner class implementation. You will implement the main functions in this class to scan all the records in a HeapFile. The records on pages in the full list should be scanned first, followed by the records on the pages on the free list.

Again, add complete comments. You may add and use private helper method functions for good modular design.
checkpt.cpp - unit testing code for passing the checkpoint.
checksaved.cpp - test code to test the persistence of heap files. This is written in an application program style vs. using the unit-test infrastructure (like the sandbox programs from other labs). Feel free to use this add your own testing code. In addition to testing persistence, it is meant to complement the unit tests and to help with implementation, debugging, or designing new tests. It uses both the HeapFile and the HeapFileScanner classes, so it may not be as useful for initial testing.
fileunittests.cpp - unit testing code for HeapFile. This is fairly extensive unit tests for HeapFile and HeapFileScanner classes. However, you should add at least four new tests to this file. See the TODO comments to see the four tests you are required to add (function stubs are provided for these), and be sure to comment your tests saying what you are testing. Specifically, you need to add tests for:
- insertRecord: add tests for exceptions.
- insertRecord: add tests for pages moving from free to full list.
- deleteRecord: add tests for exceptions.
- deleteRecord: add tests for pages moving from full to free list.
We encourage you to add additional tests beyond the four required to help test functionality more extensively or more iteratively as you implement your code. Note that these unit tests explicitly tests the sizes of the free and full lists (which is different from the checkpt that just checks total number of pages).
filescannerunittests.cpp - unit testing code for HeapFileScanner. This is fairly extensive unit tests for HeapFile and HeapFileScanner classes. However, you should add at least one new test to this file. See the TODO comments to see the test you are required to add (a function stub is provided for this), and be sure to comment your tests saying what you are testing. Specifically, you need to add tests for scanning after inserts, deletes, and updates. You can add multiple test functions for this if this is easier. We encourage you to add additional tests beyond the one required.

Deliverables

The following will be evaluated for your lab grade:

A complete and robust implementation of the HeapFile class in heapfile.cpp file. The class and struct definitions to implement are defined in the heapfile.h file. Note that there are two methods you do not need to implement (atomicAppend and anonAppend are not part of this lab assignment), and that some methods have already been implemented for you. You are encouraged to add private helper methods for good modular code design.
The class definition in heapfile.h. Only add private helper methods to class definitions to support good modular design of your solution. Do not add public methods or data members to any classes or structs defined in this file.
A complete and robust implementation of the HeapFileScanner class in heapfilescanner.cpp file. The class and struct definitions to implement are defined in the heapfilescanner.h file.
The class definition in heapfilescanner.h. Only add private helper methods to class definitions to support good modular design of your solution. Do not add public methods or data members to any classes or structs defined in this file.
Passes all test program tests
Complete Test Suite. You must add at least two tests to fileunittests.cpp and one test to filescannerunittests.cpp. Test code added should be well commented, clearly explaining the specific HeapFile or HeapFileScanner functionality it tests.
Your Lab 5 Questionnaire to be completed individually (This will open on the due date and close after 3 days)

Running Test Code

The unit test programs and the checksaved test programs create SwatDB heap files that are stored on disk. The test code is designed to clean-up these files before it exits. However, if your code crashes these files may stick around, and they will cause subsequent runs to crash.

With your starting point code, is a clean-up script that you can run to clean-up these files:

./checkpt        # run the checkpoint unittests
./cleanup.sh     # remove .db and .rel files not cleaned up from running checkpt

If the .sh scripts do not run, make sure they are executable, and run chmod to add executable permissions if not:

ls -l *.sh
-rwx------   cleanup.sh   # should see x permission (read,write,execute)
chmod 700 cheanup.sh      # set to rwx if not (7, 111 in binary, sets rwx)
ls -l *.sh

You can run each test program individually from the command line:

./checkpt
./fileunittests
./filescannerunittests
./checksaved

You can run all the test programs using a makefile command:

make runtests

Or can run some or all of them using one of the run script that runs each and clean-ups any left over files after each one (this is the easiest way to run multiple test programs if some are buggy):

./runchkpt.sh           # runs just the checkpoint test programs
./runtests.sh           # runs all of them

You can also create your own run scripts to run a subset of tests, using ours as a starting point:

cp runtests.sh runmine.sh
chmod 700 runmine.sh         # set this file to executable
vim runmine.sh               # edit the contents to do what you want

Here are some bash shell programming links
Appendix 2 of Dive into Systems contains more information about using unix, including chmod.

Checkpoint

Before the checkpoint due date, you should complete the HeapFile functionality necessary to pass all the unit tests in the checkpt program. We recommend that you run and test these using the run scripts that cleanup state after crashed runs:

./runchkpt.sh

You can also the checkpt program directly, and run the cleanup.sh script to clean up any files left over on incomplete runs (these files end in .rel or .db):

./checkpt        # run the checkpoint unittests
./cleanup.sh     # clean-up any state not cleaned up from this program

While we recommend dealing with exceptions as you implement the methods, we do not require that all exceptions are implemented for the checkpoint.

The checkpoint functionality includes most HeapFile methods fully implemented, except for deleteRecord, and updateRecord. This means that you need to handle adding pages to the file, and moving pages from the free to the full list, but do not need to handle moving pages from the full to the free list or removing pages from the file.

You may want to add some of your own unit tests as you work on the checkpoint. We encourage you to add them into the mytests SUITE of the fileunittests.cpp file, but you are welcome to add them into checkpt.cpp if you’d like.

Useful SwatDB Files

In this assignment you will implement the the HeapFile and HeapFileScanner part of the File Management layer of SwatDB. Implementing a relation file requires support for creating and deleting a file, and for inserting, deleting, and updating records in the file. These operations may result in requests to the Buffer Manager layer to get, release, allocate or deallocate a Page for the file. The implementation also makes use of the HeapPage class' interface to perform operations on individual pages of records.

For information about SwatDB, including a link to its on-line code documentation, see this page:

About SwatDB

SwatDB documentation that will be particularly helpful for this lab includes:

The Buffer Manager class defines the interface to the buffer manager layer of SwatDB. Note: when calling buffer manager method functions, be sure to appropriately handle or pass through any exceptions it throws.
The File Class defines that base class for all File objects in the system. HeapFile is derived from the class. Look at its documentation to see methods and data members that the HeapFile inherits.
The Page Class defines that base class for all pages in the system. Its getData method provides access to the raw page data: the PAGE_SIZE bytes of memory space for a page. No derived class of Page can add additional data methods, they can only map structure on top of the raw page data.
The Record class is a structure for storing record information. It has a data member of type Data class is a simple structure that stores the actual record data.

Many HeapFile methods are passed pointers to Records as parameters. Looking at the test code examples and at the documentation to help understand how to access and set information for records inserted or retrieved from the page.
The HeapPage Class is used by HeapFile. You should call methods of this class to perform operations on individual pages of the file.
The HeapPageScanner Class is used by HeapFileScanner to scan all the records on a single HeapPage. You can call methods of this class to perform operations on individual pages of the file.
Type and constant definitions in swatdb_types.h. PageNum, PageId, RecordId, SlotId, and other types are defined here.
The SwatdDB Exceptions classes defined at the HeapFile, BufferManager and DiskManager layers in swatdb_exceptions.h that HeapFile methods may need to catch or throw. Check the method function comments in heapfile.h and heapfilescanner.h to see if a particular method needs to throw an exception(s). Exceptions with BufMgr or DiskMgr suffixes are thrown by those layers and are passed through to callers of HeapFile methods.

Look at the SwatDB documentation for the exception classes, and look at the SwatDB info page for examples of how to throw and catch exceptions.

Lab Details

The HeapFile class is defined in the heapfile.h header file included with the lab starting point code. You should not need to modify this file with the exception of adding private helper methods for good modular design. Do not make any other modifications to the definitions in this file.

Open this file in an editor to read its contents:

vim heapfile.h

The HeapFileScanner class is defined in the heapfilescanner.h header file is included with the lab starting point code. You should not need to modify this file with the exception that you may add private helper methods for good modular design. Do not make any other modifications to the definitions in this file.

Open this file in an editor to read its contents:

vim heapfilescanner.h

Implementation Overview

You will implement a heap file that stores all of its pages of records in two linked-lists of HeapPage pages:

one list is a list of full pages (new records cannot be inserted into them)
the other is a list of pages with free space (new records may be able to be inserted into one of them).

The records in the heap file are stored on HeapPage type pages. The next_page and prev_page fields in the HeapPageHeader of each page link pages together into a doubly-linked list of the file’s pages.

A HeapPage on one list may be moved to another list when a record is inserted, removed, or updated:

Inserting a record may result in a new HeapPage being allocated and added to one of these lists (when there is no existing page with enough space to store the new record), or could cause a page to move from the free list to the full list if the insertion of the record makes the page full.
Deleting a record may result in the file shrinking by one page in size when the last record on a page is deleted, or could result in a page on the full list moving to the free list if it was full and now has more space.
Updating a record may result in a new page being added to the file (if the record increases in size and no current page can store it), and can result in pages moving between the free to full lists based on changes in the size of the updated record.

You will complete the implementations of the HeapFile and HeapFileScanner classes, and implement more tests in unittests.cpp to test both correctness and robust error handling. The testing part is described in more detail in the Testing your code section.

A heap file is organized as a single header page of meta-data about the heapfile and two linked list of HeapPage pages that store records. The heap file’s single header page of metadata includes:

free: the PageNum of the first page on the file’s linked list of HeapPages that have some space in them.
full: the PageNum of the first page on the file’s linked list of HeapPages that are full (do not have any space to insert records into).
free_size: the number of pages in the free list
full_size: the number of pages in the full list
num_records: the number of records in the file

as shown in this figure:

The Heap File has a header page with meta data about the file including the head page of each of its linked lists of pages of records, the number of pages in each list, and the number of records in the file. Each page is a HeapPage storing variable-sized record data.

HeapFile is derived from the File base class that defines a generic interface to all file types in the system.

Note that the HeapFile object only exists when SwatDB is running. However, its underlying set of pages persists between runs (i.e. its header page and linked list of pages of HeapPage structured page data are stored as a file on disk, managed by the DiskManager layer).

HeapFile objects are created in SwatDB in response to one of two actions:

When SwatDB start with an initial .db file, then any DB relation files that are stored on disk are opened and a new HeapFile object is created and associated with each relation file.
During SwatDB runtime, if new relation file is created, a HeapFile object is created and associated with the new file.

In the test code with this lab, we provide all the code needed to create HeapFile object(s) in response to both of these scenarios.

HeapFile Class

HeapFile (in heapfile.[h,cpp]) inherits the following data fields from the File class:

header_id: this is the PageId of the heap file header page. You will allocate a new Page for a HeapFile’s header page, initialize it, and set this field to the new page’s PageId value.
buf_mgr: a pointer to the SwatDB BufferManager object that manages the buffer pool. HeapFile methods will need to make calls to allocate/deallocate/get/release file pages using BufferManager methods.
file_id: the file’s unique identifier in the system. This is set by the File Manager when the HeapFile object is created.
catalog: a pointer to the SwatDB Catalog object that contains information about all files in the system. (you do not need to use this object in your implementation, but it is used in the checksaved test code print_fids function).
schema: a pointer to the Schema object describing the relation stored in this file (you do not need to use this in your implementation beyond some error checking).

Heap File Organization

The heap file (shown in [HeapFileFig]) is organized so that:

Page 0 is the file’s header page.
Remaining Pages are stored in one of two doubly linked list of HeapPage pages that store record data. Records can be inserted on any page in the file, and individual HeapPages can be in any order in the linked lists (i.e. it is not sorted by PageNum or record field values).
Full pages must be stored in the full list, and pages that have some space should be stored in the free linked list.

The header page and the record data pages are stored on disk as a file. The DiskManager manages how they are stored on disk.

Heap File Header Page and type re-casting

In heapfile.h is a struct definition that defines the header page fields. The header page stores heap file-specific metadata about the file and its structure. It is mapped onto the underlying Page data bytes.

The header page of the HeapFile should be allocated on a separate Page that stores only the heap file meta data. A struct HeapFileHeader is mapped onto the first bytes of the raw data of the Page. You must not change the struct definition for the header (its bytes need to map into the exact locations into the Page class’s data array as they are given in the definition).

Do not store any records in the header page of the heap file. The first record stored in the file should be stored on a newly allocated HeapPage which will be added to either the full or free list depending on if it has space left on it or not.

HeapFile Methods

HeapFile methods will call BufferManager method functions to allocate, deallocate, get, release, and flush pages from the buffer pool to handle operations on the file. NOTE the buffer manager flushPage method should not be called on regular heapfile operations like inserting or removing a record from the file. This method forces a page to be written to disk, which is useful for rare DB operations, but for most operations the higher levels should just let the buffer manager manage the buffer pool, and let its replacement policy decide when a page in the buffer pool gets written out to disk.

HeapFile methods will call HeapPage to insert/delete/update records on individual pages of the file, and to set page’s prev_page and next_page fields to link pages together in the two linked lists of pages of record data.

Type recasting

You will need to use type-recasting in a few ways in this assignment.

The BufferManager methods return Page *, but your code needs to manipulate the pages as specific types. You can re-cast return values as a pointer to the specific type to do this. For example:
```
HeapPage *pg;
...
// recast the return type of getPage to (HeapPage *)
pg = (HeapPage *)buf_mgr->getPage(pg_id);
```
We do not give you methods that recast the raw page data of the header page to a HeapFileHeader * in a similar way that we did with the HeapPage lab. However, we suggest that you add one as a private helper method function:
```
private:
  /**
   * @return address of  header page recast as
   *         a (HeapFileHeader *)
   * TODO: what are the pre and post conditions?
   * TODO: does this throw any exceptions (or pass any through)?
   */
  HeapFileHeader *_getHeaderPage();
```
We recommend that you define and implement this as the first method, and use it throughout your code. Look at the HeapPage lab starting point code for examples of similar methods.

HeapFile interface methods

These methods are implemented for you (look at their implementation for some hints at how to to call Buffer Manager layer methods):

constructor: invokes File base class constructor and sets header page to INVALID_PAGE_ID.
createHeader: allocates and initializes a new header page for the heap file. This method is only invoked when a new heap file is created in the system vs. when an existing heap file is loaded from disk. You do not need to create a header page in your HeapFile code: the SwatDB FileManager creates a HeapFile object, and it sets the HeapFile’s header page or calls this method to create one.
flushHeader: flushes the header page to disk.

The following are HeapFile interface methods you need to implement (you may also want to add some private helper method functions too). NOTE: most of these methods call BufferManager and HeapPage interface method functions to accomplish some of the listed subtasks. Be sure you review the interfaces of those two classes before beginning implementation.

insertRecord: inserts a record into the file and returns the RecordId of the inserted record. HeapFile meta data is updated to reflect that the record has been inserted, or an error exception is thrown if the insert fails. The passed record is inserted into the first page on the doubly linked list of free pages that has enough free space to store it (call getFreeSpace method to get the amount of free space on the page). If there are no existing pages that have enough space to store the record, then a new page is allocated for the file, and added to the head of the free list. Inserting a record into a page may result in the page filling up (call isFull on the page to test), in which case the page should be moved from the free list to the head of the full list.

Each Page that needs to be accessed to handle an insertRecord (including the header page and all HeapPages) need to utilize the BufferManager to first bring the page into the buffer pool. To access a Page, it needs to be pinned (either through a call to getPage or allocatePage BufferManager methods), and should be unpinned (via a call to releasePage BufferManager method) when the method is done using the page. The dirty bit should be set appropriately for all pages that are modified on an insert record. Any page pinned by this method must be unpinned before the method returns (or before it throws an exception). You will need to keep careful track of when a page is still pinned and when it is no longer needed. Any exceptions in between those two points need to unpin the page before throwing/rethrowing the exception.

This method throws exceptions on errors, some of which may come from calls to HeapPage methods or to BufferManager methods that are thrown by the buffer manager or disk manager layer. Look at the function comments for more specific information about these. Any time you access either the HeapPage or BufferManager methods, be sure to check the documentation to make sure you are aware of potential exceptions you may need to handle.

Here are the main steps of this method (note: calls to BufferManager layer to getPage, allocatePage and releasePage need to be made in implementing some of these steps):
1. Start the implementation of insertRecord by checking for some error conditions and throwing exceptions when the checks fail.
  1. If the passed record is too big, throw an InsufficientSpaceHeapPage exception. HINT: MAX_RECORD_SIZE is the largest record a HeapPage can store.
  2. The passed record schema must match this files’s schema. This just requires comparing the two Schema pointer values for equality. Throw an InvalidSchemaHeapFile exception if they do not. Examine the class data members and the Record interface to access the schemas.
2. Search for a page in the doubly linked free list for one that has space to insert the record. There are two main cases:
  1. If such a page is found, insert the record on the page (call insertRecord on the HeapPage). Think carefully about pinning/ unpinning and handling exceptions.
  2. If the list is exhausted with no suitable candidate, insertRecord will allocate a new HeapPage that is inserted at the beginning of the linked list of free pages.
    
    Remember that a call to the BufferManager allocatePage method, just returns a pointer to a Page of the buffer pool (a pointer to a Page-size chunk of memory data). The buffer manager has no idea what type of page the caller wants to map on top of the allocated Page of memory space (it could be an index page, it could be a HeapPage, it could be a header page, it could be …). If you want to use the Page of buffer pool space to store particular state (HeapPage data), you need to initialize that Page of memory space to the appropriate values before you start accessing its contents and interpreting their values as having heap page meaning (the Page returned by the buffer manager is just a page of garbage values in memory). See the HeapPage interface for any useful methods for this.
    
    Again, you will need to carefully think about pinning/unpinning/handling exceptions. Also examine [HeapFileFig] to identify all of the various updates you need to make to insert a new HeapPage at the beginning of a doubly-linked list.
    
    Note: A HeapFile can only store up to MAX_PAGE_NUM pages of data; throw InsufficientSpaceFilePage exception if adding a new page exceeds this amount.
3. Insert the record into the page with space that was found or added to the free list.
  
  If the page is now full (no more free space after this record is inserted). Remove this page from the free list and add it to the beginning of the linked list of full pages. The HeapPage isFull method is useful here.
4. Update HeapFile header information with results of the insert.

a note about isFull

The SwatDB isFull method of the HeapPage class is implemented slightly differently than you implemented it in the HeapPage lab assignment. It considers a page to be full if it is above some threshold of fullness. In other words a call to this method returns true when the page is close to full or completely full. This is a common space/time tradeoff in a real system to avoid having a lot of pages on the free list that are almost full with little or no real free space for inserting new records.

As a result, this means that sometimes inserting a record into a page may result in it being considered full even when getFreeSpace returns a postive value, and that sometimes deleting a very small record from a "full" page results in the page still being considered full by SwatDB (isFull still returns true after the delete).

Your HeapPage implementation should use the result of a call to isFull after an insert, or delete, or update, to decide if the page should be moved between the free and the full list of the HeapFile; don’t use getFreeSpace to make these decisions.

If you are interested, the exact threshold value is defined in swatdb_types.h by the MAX_HEAP_PAGE_LOAD constant. You should not use this value directly, instead call HeapPage isFull (or isEmpty) method to determine if a page is full (or empty) after you perform an operation on a page.

getRecord: given a RecordId value and a passed Record to fill, this method constructs a PageId from the passed RecordId and from the files' FileId value, requests the page from the BufferManger (via a call to the getPage method), and calls the HeapPage getRecord method to copy the record data from the heap page into the passed Record *. The page unpinned from the buffer pool before this function returns or throws an exception (via calls to releasePage BufferManager method).

This method throws exceptions on errors, some of which may come from calls to HeapPage methods or to BufferManager methods that are thrown by the buffer manager or disk manager layer. Look at the function comments for more specific information about them.
updateRecord: updates an existing record to its new passed value. Note: an updated record must retain its RecordId value. This means that an update cannot move a record from one page of the heap file to another. The HeapPage getFreeSpace will be helpful here.

Like other HeapFile methods, any pages of the file that are accessed by the method needs to be pinned before accessed (via getPage/allocatePage) and should be unpinned (via releasePage) when the method is done using them. All pages pinned by this method must be unpinned before the method returns (or before it throws an exception). This method throws exceptions, some of which may come from calls to HeapPage or BufferManager methods.

A successful updateRecord may cause a page to move from the free list to the beginning of the full list (if the new record is larger than its previous value and now fills the page), or may cause a page to move from the full list to the beginning of the free list (if the new record is smaller than its previous value, and is now considered to have enough space to no longer be full (see the NOTE in insertRecord above about isFull). The HeapPage isFull method will be helpful here.
deleteRecord deletes a record, given its RecordId, from the file. This method could shrink the heap file by one page, or it could result in a page on the full list moving to the beginning of the free list.
Here are some of the big steps deleteRecord must take (you need to refine these):
1. Get the page corresponding to this record from the buffer manager.
2. Determine if the RecordId is valid.
3. Delete the record from the page (using the correct HeapPage method)
4. Determine if the page containing the deleted record can be removed from the file and remove it from the linked list of pages (the HeapPage isEmpty method will be helpful here).
5. If the page is not empty after remove, and if this page is on the full list and now has free space, check to see if the page is no longer full (call isFull to check), and if so, move it to the beginning of the free list. If you correctly move pages from the full to free list on insert and update, then the HeapPage isFull method is sufficient test if the page is on the full list (before removing the record)); you do not need to traverse the full list to see if the page is on it. (see the NOTE in insertRecord above about isFull).
6. Update HeapFile metadata in the header to reflect a successful record delete.
Like other methods, all pages accessed need to be pinned in the buffer pool and all pages pinned by this method need to be unpinned when the method returns (or throws an exception). It may throw exceptions, which may result from its calls to BufferManager or HeapPage method functions.

HeapFile debugging methods

There are several methods already implemented that you can use for debugging purposes only. See the checksaved.cpp test program for some examples of their use, you can also call these from gdb or use them in debugging printout in unittest.cpp.

Also, search for THIS METHOD IS FOR DEBUGGING ONLY in the comments of heapfile.[h,cpp] and heapfilescanner.[h,cpp] files to find them.

Exceptions

For information about how to throw and catch different SwatDB exception objects, look at the Exceptions Section of the SwatDB information page. You will need to think carefully about the cause of these exceptions in order to know how to handle them. Use the method function comments @throw as a guide for which exceptions methods may need to throw or to pass through. These comments will also help you think about some of the error conditions your code may need to check for and handle appropriately.

For this lab Buffer Manager and Disk Manager layers may throw exceptions in response to calls from HeapFile method functions. Your code needs to determine if it needs to catch the exception or let it pass through.

Remember that if a method throws (or passes through) an exception, it should clean-up any internal state it has partially modified. For this lab, this may include unpinning pages that the method has pinned prior to throwing (or re-throwing) an exception.

HeapFileScanner Class

The HeapFileScanner class implements a scanner over all records of a HeapFile. Your solutions should first scan over all records in the full list, followed by all records in the free list. To scan the records on each page, you will use a HeapPageScanner object on the page.

You will implement two methods:

HeapFileScanner constructor: This method sets up state for starting to scan records on the first page in the file (scanning all pages on the full list first). It should leaved the first page of the file that is being scanned pinned in the buffer pool (in the case when the file is not empty), but should not leave the file’s header page pinned. It sets up the following state for the heapfile scanning:
- buf_mgr: a pointer to the BufferManager object. Needed to get and release pages being scanned.
- file: a pointer to the HeapFile being scanned.
- cur_page: a pointer to the current HeapPage of the file being scanned (either the head page of the full list or the head page of the free list, unless if the file is empty).
- cur_pid: the PageId of the current page (the first page to scan, or INVALID_PAGE_ID if the file is empty).
- scanner: a HeapPageScanner object associated with the first HeapPage of records to be scanned.
- end_of_full: set to true if the full list is empty, false otherwise. This is a flag that is used to indicate that all the pages in the full list have been scanned.
- end_of_free: set to true if the full list and the free list are both empty, likely set to false otherwise. This is a flag that is used to indicate that all the pages in the free list have been scanned (indicating that all records in the file have been scanned since the pages in the full list are scanned before the pages in the free list). For a file where the full list has pages, and the free list is empty, you likely want to initialize this to false even though there are no free list pages. However, you should initialize this field appropriately depending on how you use this flag in your implementation of getNext.
getNext: returns the RecordId of the next record in the file. It keeps the current page pinned in the buffer pool. It may need to occasionally pin the header page to find the next record, but it should never leave the header page pinned in the buffer pool. If there are no more records in the file, it returns INVALID_RECORD_ID, and it should leave no pinned pages in the buffer pool.

In the easiest case, this method just involves calling and returning the RecordId the is returned from calling to getNext on its scanner field. If, however, this call returns INVALID_SLOT_ID, then all the records on the current page have been scanned and this method needs to find the next page in the file, and return the first record on that page. Here are some of the main steps for getting the record on the next page:
- release the old current page from the buffer pool. You may want to get some state from this page before you release it.
- get the next page in the file. If this page’s next page is INVALID_PAGE_NUM (it is the last page in the list), and if you are currently scanning the full list, then get the first page on the free list as the next page. Be sure to set the end_of_full field to true to indicate that all the pages on the full list have been scanned. If the free list is empty, or if this is the last page on the free list, then the scan is complete (set the end_of_free flag to true). Be sure to release the header page before this method returns if you need the header page to get the next page in the file.
- update cur_page and cur_pid to the new page or to signify there are no more pages in the file.
- reset the scanner on this new cur_page.
- call getNext on the HeapPage scanner to get the next Record, and return its RecordId.

Lab Requirements

In addition to completing the implementation of the HeapFile class, and adding more tests to fileunittest.cpp and filescannerunittests.cpp, you should also:

Declare and use variables of the types defined in swatdb_types.h as opposed to their underlying type definition. Use constants and enum types defined in this file - they help make the code more readable. For example, if a method returns a FrameId, declare a variable of type FrameId rather than std:uint32_t or int to store its return value:
```
FrameId frame_num;
PageId  pg_id;

//get returns a FrameId, storing it as an int compiles but loses valuable
// information about the purpose of frame_num
frame_num = buf_map->get(pg_id);
```
Write good C++ code design, and good modular design in your solution. This includes using defined constants and types.
Ensure you code is robust to errors, in particular, be sure to test for error handling for exceptions that could be thrown by the buffer manager or disk manager layers, and determine if they need to be caught or passed through.
Ensure your code is free of valgrind errors.
Make sure your code is well-commented, and there is no line wrapping. (See our C++ Style guide link from the Handy Links section).
Your submitted code should have all of our TODO comments removed…as you implement a TODO, remove it. These are also helpful to find parts of the given code that you need to implement.

Testing your code

There are four test files in the starting point code:

checkpt.cpp: unit tests for the checkpoint

# run all of the checkpt test suites
./checkpt
# or you can run individual test suites alone using -s testSuiteName
./checkpt -s X    # run just the X test suite
# to list the test suites names run with -h
./checkpt -h

You can also use the run scripts to run (and cleanup) checkpt and checksaved tests:

./runchkpt.sh

See Running Test Code for more information about running test code and scripts.

checksaved.cpp: checks the persistence of your heapfile operations. Any modifications to the HeapFile, like inserting records, should persist between runs of SwatDB. This test checks that your solution is implemented so that changes to the HeapFile in memory translated to changes on disk. This is accomplished by your solution calling BufferManager methods correctly.
fileunittests.cpp: the start of a set of unit tests for the heap page. You are required as part of this lab to develop more unit tests that you must add in this file.
filescannerunittests.cpp: the unit tests for the heap file scanner class. You are required as part of this lab to develop more unit tests that you must add in this file.

You may use any or all of these to start your HeapFile implementation, but you will need to use all of them to verify your program is working correctly. The checkpt.cpp, and checksaved.cpp unit tests are useful for testing early on.

Required Unit Tests

The fileunittest.cpp implements some Heap File test code using the unit tests framework from CS35 (as do the other test programs like checkpt.cpp). You must add at least 4 additional tests to fileunittest.cpp by following the code examples in this file to help you structure your code.

fileunittest.cpp contains an incomplete set of test functions. Use this file to add tests beyond those tested by the checkpt.cpp. There are function stubs for the 4 types of tests to add. Specifically, you need to add tests for:
- insertRecord: add tests for exceptions.
- insertRecord: add tests for pages moving from free to full list.
- deleteRecord: add tests for exceptions.
- deleteRecord: add tests for pages moving from full to free list.
Here are some things to consider as you develop these and others:
- tests for exceptions
- tests that consider boundary testing or stress testing each main operations: insert, delete, update
- tests that stress test different cases of combinations of operations.
- tests that produce file sizes that are larger than the size of memory (the file has more total pages the the buffer pool).
filescannerunittests.cpp: the start of a set of unit tests for the heap page scanner. You are required to add at least one more test to test the scanner after deletes and updates. Feel free to break these up into multiple tests. There is a function stub for this in the starting point.

You are welcome to add additional test suites beyond the required ones. Follow the same structure as these and the other test SUITES in this file to do so.

Cleaning up Corrupted Files

SwatDB maintains several files to allow for persistent storage of data. While that is not a central feature of this lab, a consequence of a program crash is that temporary files do not get properly cleaned up since the DBMS did not shutdown cleanly. This can cause problems when you try to rerun the program, so you will need to clean this up. Use the two options below:

./cleanup.sh

make clean     # or make clean also runs cleanup.sh then just re-build
make

Also see Running Test Code for running using the run scripts that include calls to the cleanup script for you.

Tips and Hints

The following are some tips to help you implement the HeapFile:

Run the test programs to see which tests you are passing and which fail. This will help you find missing functionality in your code, and some missing cases. These tests are incomplete and part of this assignment is designing and writing more unittests to test your implementation more thoroughly.
Implement and test incrementally. Use the checkpt.cpp tests as a guide for what functionality to implement and test first and fileunittests.cpp and filescannerunittests.cpp for more detailed tests (note the the tests in checkpt.cpp do not check that the pages are on the correct list (free or full) like the other tests programs do. We suggest this implementation order (as you add post-checkpoint functionality, you should also add more test to the other unit test programs to completely test functionality):
1. insertRecord: just inserting some records to not completely fill one Heap page of records.
  - need to allocate a new page for the first insert and add it to the beginning of the free list.
  - call insertRecord on the HeapPage to insert the record on the page.
  - update the header page information to reflect a successful insert.
  - make sure all file pages are unpinned after the call to insertRecord
2. getRecord: implement all functionality, at least all without possibly complete exception/error handling.
3. insertRecord: add support for inserting records that span more than one heap page of records.
  - This requires traversing the linked list of pages to find a page with enough free space to insert the record.
  - This may require moving a page from the free to the full list.
  - This may require adding in a new page the free or full linked list of pages (a new page should be added to the front of the lists), and updating appropriate next and prev fields in pages.
  - ensure that header page information is updated correctly.
  - ensure that all pages modified are unpinned before this method returns.
4. updateRecord: implement all functionality, or at least all perhaps missing some error handling.
  - call updateRecord on HeapPage to update.
  - make sure all file pages are unpinned after the call to insertRecord.
5. deleteRecord:
  - first try a delete that does not result in removing a page from the file
  - check file header information is correct after a delete and that succeeds
  - check that none of the pages accessed in deleteRecord, are pinned in buffer pool after the call.
  - next, try delete that moves a page from the full to the free list.
  - next, try delete that deletes the last record on a page. The page should be removed from the free or full linked list of pages that it is currently in, and deallocatePage should be called to remove it from the underlying file.
6. HeapFileScanner:
  - constructor: find the first page in the file to scan. Scan pages on the full list before any on the free list.
  - getNext: make sure that you handle all cases
    
    for a file with only one page
    
    handling the last record in the file
    
    for a file with with multiple pages
    
    for a file with only pages on the full list
    
    for a file with only pages on the free pages
    
    for a file with both full and free list pages
7. Run the checksaved program to ensure that file changes are persistent. This program uses your HeapFileScanner implementation.
Refer to Wednesday in-lab code examples for C++, gdb, gcov, and valgrind reminders.
Remember the & operator returns the address of its argument (this is C-style operator, not & used to specify reference parameter types in C++). Its argument must be an lvalue (a storage location, such as the name of a variable or an array bucket). For example, to get the address of the 3rd bucket in and array of ints:
```
int array[20];
```
you would use the & operator like this (in this example I’m assigning its value to an int * variable):
```
int *ptr;
ptr = &(array[3]);
```
Refer to the recasting part of Lab Details for some example type recasting code. Also look at the private method functions in the starting point for the HeapPage lab as another example.

Submitting your lab

Review the lab deliverables to ensure you have completed all of your work. Before the due date, push your solution to github from one of your local repos to the GitHub remote repo.

From your local repo (in your ~/cs44/labs/Lab5-userID1-userID2 subdirectory)

make clean
git add *.h *.cpp   # only add .h and .cpp file:  DO NOT DO git add *
git commit -m "my correct and well commented solution for grading"
git push

Verify that the results appear (e.g., by viewing the the repository on CS44-s25). You will receive deductions for submitting code that does not run or repos with merge conflicts. Also note that the time stamp of your final submission is used to verify late days, so please do not update your repo until after the late period has ended.

If that doesn’t work, take a look at the "Troubleshooting" section of the Using git for CS44 labs and the Using git pages. At this point, you should submit the required Lab 5 Questionnaire (each lab partner must do this).

Handy References

Examples

Review in lab exercises from
- C programming and debugging: Week 1, Week 2, Week 3,
- Pointer arithmetic and type recasting: Week 5

SwatDB

Information about SwatDB
SwatDB Code Documentation

C++ programming

Some C++ Programming Resources and Links including the C++ Style Guide
C++ programming tools compiling, linking, debugging C++
gdb and valgrind from Dive into Systems, and my (very similar) gdb and valgrind guides.
C references in Dive into Systems (some useful for C++ programming too) Chapter 2: C pointers, command line arguments, C debugging tools (valgrind, gdb for C)

Unix

Appendix 2: Using Unix from Dive into Systems
my CS help pages (Unix tools, programming links)
CS Department help pages

Git

Using git for CS44 labs
Using Git detailed Git guide

CS44 Lab 5: SwatDB Heap File