This assignment is to be done with your assigned Lab 1 partner. You may not work with other groups, and the workload must be shared evenly between both partners. Failing to do either is a violation of the department Academic Integrity policy. Please read over Expectations for Working with Partners on CS Lab work.
Lab 1 Partner List for Lab Section A
Lab 1 Partner List for Lab Section B
The code base is quite extensive and will require much reading of API documentation and thorough testing. You are responsible for making significant progress early on; waiting until the last few days will not be manageable. The goals of this assignment include:
Next, clone your Lab 1 git repo into your cs44/labs subdirectory, cd into your repo, and run make setup to build symlinks to library code needed to build the executable:
    cd
    cd cs44/labs
    git clone [the ssh url to your repo]
    cd Lab1-partner1-partner2
    make setup
Instructions are available on the Using Git page (follow the instructions for repos on Swarthmore's GitHub Enterprise server).
If all was successful, you should see the following files and symlinks (files highlighted in blue require modification):
The lowest layer of the WiscDB database system is the I/O layer. This layer allows the upper level of the system to:
Before reading further, you should first read the documentation that describes the I/O layer of WiscDB so that you understand its capabilities. In a nutshell, the I/O layer provides an object-oriented interface to the Unix file system with methods to open and close files and to read/write pages of a file. You will utilize these methods to move pages into the buffer pool and higher-level methods (e.g., main.cpp) to process a query.
A database buffer pool is an array of fixed-sized memory buffers called frames that are used to hold database pages (also called blocks) that have been read from disk into memory. A page is the unit of transfer between the disk and the buffer pool residing in main memory. Most modern database systems use a page size of at least 8,192 bytes. Another important thing to note is that a database page in memory is an exact copy of the corresponding page on disk when it is first read in. Once a page has been read from disk to the buffer pool, the DBMS software can update information stored on the page, causing the copy in the buffer pool to be different from the copy on disk. Such pages are termed "dirty".
Since the database on disk is often larger than the amount of main memory available for the buffer pool, only a subset of the database pages fit in memory at any given time. The buffer manager controls which pages are memory resident. Whenever the buffer manager receives a request for a data page, it checks whether the requested page is already in one of the frames that constitute the buffer pool. If so, the buffer manager simply returns a pointer to the page. If not, the buffer manager frees a frame (writing the page it holds back to disk first if that page is dirty) and then loads the requested page from disk into the newly available frame.
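The hit/miss logic described above can be sketched with a toy pool. This is only an illustration under simplifying assumptions, not the WiscDB interface: ToyPool, ToyPage, request, markDirty, and the in-memory "disk" vector are all hypothetical names, pinning is ignored, and the victim is chosen round-robin as a placeholder for a real replacement policy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from WiscDB.
using PageNo = int;
struct ToyPage { int value = 0; };

struct ToyPool {
    static constexpr std::size_t kFrames = 2;
    std::vector<ToyPage>& disk;                     // simulated disk, indexed by page number
    std::array<ToyPage, kFrames> frames{};          // the buffer pool itself
    std::array<PageNo, kFrames> resident{};         // page held by each frame, -1 if empty
    std::array<bool, kFrames> dirty{};              // has the in-memory copy been modified?
    std::unordered_map<PageNo, std::size_t> where;  // directory: page number -> frame index
    std::size_t nextVictim = 0;                     // placeholder policy: round robin
    int diskReads = 0;
    int diskWrites = 0;

    explicit ToyPool(std::vector<ToyPage>& d) : disk(d) { resident.fill(-1); }

    ToyPage* request(PageNo p) {
        assert(p >= 0 && static_cast<std::size_t>(p) < disk.size());
        auto hit = where.find(p);
        if (hit != where.end()) return &frames[hit->second];  // already resident

        // Miss: free a frame, writing its page back first if it is dirty.
        std::size_t f = nextVictim;
        nextVictim = (nextVictim + 1) % kFrames;
        if (resident[f] != -1) {
            if (dirty[f]) { disk[resident[f]] = frames[f]; ++diskWrites; }
            where.erase(resident[f]);
        }
        // Load the requested page into the freed frame.
        frames[f] = disk[p];
        ++diskReads;
        resident[f] = p;
        dirty[f] = false;
        where[p] = f;
        return &frames[f];
    }

    // Record that the resident copy of page p now differs from the disk copy.
    void markDirty(PageNo p) { dirty[where.at(p)] = true; }
};
```

A second request for a resident page returns the same pointer without touching the simulated disk, while evicting a modified page costs one extra write.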
There are many ways of deciding which page to replace when a free frame is needed. Commonly used policies in operating systems are FIFO (first in first out), MRU (most recently used), and LRU (least recently used). LRU, arguably the most useful policy, suffers from high overhead costs due to the need for a priority queue. An alternative approach, the circular array buffer, or clock algorithm, approximates LRU behavior with much better run time performance.
The image to the left shows the conceptual idea of the clock array. Each square box corresponds to a frame in the buffer pool. Assume that the buffer pool contains numFrames frames, numbered 0 to numFrames-1. Conceptually, all the frames in the buffer pool are arranged in a circular list. Associated with each frame is a bit termed the refbit.
The algorithm is depicted in the flow chart to the right. At each step, the clock hand (an integer whose value is between 0 and numFrames-1) is advanced in a clockwise fashion. If the frame under the hand is not valid (i.e., unoccupied), it is an obvious candidate for replacement. Otherwise, we check whether the page is still pinned, since we do not want to remove a page from the pool that is still being used. If the page is not in use, we resort to the refbit to approximate our LRU algorithm. If the refbit is true, the page has been recently unpinned and gets a "free pass" (i.e., set the bit to false and move on). Otherwise, we have found a replacement. If the selected buffer frame is dirty (i.e., it has been modified), the page currently occupying the frame is written back to disk. Regardless, the frame is cleared and the requested page is read from disk into the freed frame. Further details are available below.
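The victim-selection loop can be sketched as follows. This is a simplified illustration, not the lab's required implementation: ClockFrame and pickVictim are hypothetical names, and returning std::nullopt when every frame is pinned is an assumption made here (the lab may expect a different way of reporting that error). Writing a dirty victim back to disk is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-frame state, mirroring the bits described in the text.
struct ClockFrame {
    bool valid = false;   // does the frame hold a page?
    bool refbit = false;  // set when the page was recently unpinned
    bool dirty = false;   // caller must write the page back before reuse
    int pinCnt = 0;       // number of outstanding pins
};

// Advances clockHand and returns the index of the frame to reuse,
// or std::nullopt if every frame is pinned (an assumed convention).
std::optional<std::size_t> pickVictim(std::vector<ClockFrame>& frames,
                                      std::size_t& clockHand) {
    assert(!frames.empty() && clockHand < frames.size());
    const std::size_t n = frames.size();
    // Two full sweeps are enough: the first may only clear refbits,
    // the second then finds any unpinned frame that exists.
    for (std::size_t step = 0; step < 2 * n; ++step) {
        clockHand = (clockHand + 1) % n;
        ClockFrame& f = frames[clockHand];
        if (!f.valid) return clockHand;      // unoccupied: use it directly
        if (f.pinCnt > 0) continue;          // still in use: skip
        if (f.refbit) {                      // recently unpinned: free pass
            f.refbit = false;
            continue;
        }
        return clockHand;                    // unpinned and refbit clear
    }
    return std::nullopt;
}
```

Bounding the loop at two sweeps keeps the search from spinning forever when all frames are pinned, which an unbounded "advance until found" loop would do.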
The WiscDB buffer manager uses three C++ classes: BufferManager, Frame and BufferHashTable. There is only one instance of the BufferManager class. A key component of this class is the actual buffer pool which consists of an array of numFrames frames, each the size of a database page. In addition to this array, the BufferManager instance also contains an array of numFrames instances of the Frame class that is used to describe the state of each frame in the buffer pool. A hash table is used to keep track of the pages that are currently resident in the buffer pool. This hash table is implemented by an instance of the BufferHashTable class. This instance is a private data member of the BufferManager class. These classes are described in detail below.
The BufferHashTable class is used to map file and page numbers to buffer pool frames and is implemented using chained bucket hashing (separate chaining). You must complete this implementation using the provided definition. The key structure is a HashItem, which stores one item in the hash table (similar to a node in a linked list). Each HashItem points to a file (File *) and stores the page number (PageId) in the file (these two combine to form the key). The HashItem also stores a frame number (FrameId) (the value associated with the key) to recover the page from the buffer pool. Lastly, the next pointer is used to implement separate chaining; the item points to the next HashItem in the bucket.
You have been provided a hash function as well as a constructor. Do not modify either, but instead read and understand how they work. You will need to complete the implementation for the destructor, insert, lookup, and remove methods as explained in the documentation (either the header file or the documentation web page). Pay attention to corner cases (e.g., empty buckets) as well as the exceptions you will need to throw for errors.
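For readers unfamiliar with separate chaining, here is a generic sketch of the technique. It is not the BufferHashTable interface: ToyItem and ToyHashTable are hypothetical, an int file id stands in for File *, the hash function is made up for the example, and the standard-library exceptions are placeholders for whatever exception classes the lab documentation specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical chain node: (fileId, pageNo) is the key, frameNo the value.
struct ToyItem {
    int fileId;
    int pageNo;
    int frameNo;
    ToyItem* next;  // next item in the same bucket
};

class ToyHashTable {
public:
    explicit ToyHashTable(std::size_t buckets) : table_(buckets, nullptr) {
        assert(buckets > 0);
    }
    ~ToyHashTable() {
        // Walk every chain and free each node.
        for (ToyItem* head : table_) {
            while (head != nullptr) {
                ToyItem* next = head->next;
                delete head;
                head = next;
            }
        }
    }
    ToyHashTable(const ToyHashTable&) = delete;
    ToyHashTable& operator=(const ToyHashTable&) = delete;

    void insert(int fileId, int pageNo, int frameNo) {
        std::size_t b = bucket(fileId, pageNo);
        for (ToyItem* it = table_[b]; it != nullptr; it = it->next)
            if (it->fileId == fileId && it->pageNo == pageNo)
                throw std::runtime_error("key already present");
        table_[b] = new ToyItem{fileId, pageNo, frameNo, table_[b]};  // push front
    }

    int lookup(int fileId, int pageNo) const {
        for (ToyItem* it = table_[bucket(fileId, pageNo)]; it != nullptr; it = it->next)
            if (it->fileId == fileId && it->pageNo == pageNo) return it->frameNo;
        throw std::out_of_range("key not found");
    }

    void remove(int fileId, int pageNo) {
        // Track the link that points at the current node so that removing
        // the head of a chain needs no special case.
        ToyItem** link = &table_[bucket(fileId, pageNo)];
        while (*link != nullptr) {
            if ((*link)->fileId == fileId && (*link)->pageNo == pageNo) {
                ToyItem* dead = *link;
                *link = dead->next;
                delete dead;
                return;
            }
            link = &(*link)->next;
        }
        throw std::out_of_range("key not found");
    }

private:
    // Made-up hash for the example only.
    std::size_t bucket(int fileId, int pageNo) const {
        return (static_cast<std::size_t>(fileId) * 31u +
                static_cast<std::size_t>(pageNo)) % table_.size();
    }
    std::vector<ToyItem*> table_;
};
```

Constructing the table with a single bucket forces every key onto one chain, which is a convenient way to exercise the head, middle, and tail removal cases.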
The Frame class is used to keep track of the state of each frame in the buffer pool. It is defined as follows:
First notice that all attributes of the Frame class are private and that the BufferManager class is defined to be a friend. While this may seem strange, this approach restricts access to Frame's private variables to only the BufferManager class. The alternative (making everything public) opens up access too far.
The purpose of most of the attributes of the Frame class should be fairly clear. The dirty bit, if true, indicates that the page is dirty (i.e., has been updated) and thus must be written to disk before the frame is used to hold another page. The pinCnt indicates how many times the page has been pinned. The refbit is used by the clock algorithm. The valid bit indicates whether the frame contains a valid page. You will need to implement the basic methods of the class (i.e., reset() and load()) to confirm that you understand their purpose. reset() is invoked when a Frame is emptied and should reset all class variables to a default state. Note that rather than inventing your own constants for invalid states, you should check the documentation to see if any constants have been defined already. For example, the Page class has defined a variable to indicate invalid page numbers (Page::INVALID_NUMBER). Also, the FrameId does not change for an individual frame; this should only be modified by the BufferManager.
load() is invoked after a page has been assigned to a frame; this method should set all member variables appropriately. This method is utilized for a new Page being loaded into the frame, so its pin count should be set to 1.
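The state transitions that reset() and load() perform can be illustrated with a stand-in type. This is a sketch under assumptions, not the Frame class: ToyFrame and kInvalidPage are hypothetical (the lab provides its own constants such as Page::INVALID_NUMBER), a const void* stands in for File *, and setting refbit to true on load is a choice made for this example that the text above does not specify.

```cpp
#include <cassert>

// Hypothetical stand-in for a frame descriptor.
struct ToyFrame {
    static constexpr int kInvalidPage = -1;  // placeholder for the lab's own constant

    int frameNo = 0;             // fixed for the lifetime of the frame
    const void* file = nullptr;  // stand-in for File*
    int pageNo = kInvalidPage;
    int pinCnt = 0;
    bool dirty = false;
    bool refbit = false;
    bool valid = false;

    // Return the frame to its empty state; frameNo is deliberately untouched.
    void reset() {
        file = nullptr;
        pageNo = kInvalidPage;
        pinCnt = 0;
        dirty = false;
        refbit = false;
        valid = false;
    }

    // Record that page p of file f now occupies this frame.
    void load(const void* f, int p) {
        assert(f != nullptr && p != kInvalidPage);
        file = f;
        pageNo = p;
        pinCnt = 1;     // the requester holds the first pin
        dirty = false;  // freshly read, identical to the disk copy
        refbit = true;  // assumption: treat a newly loaded page as recently used
        valid = true;
    }
};
```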
The BufferManager class is the heart of the buffer manager. This is where you write your code for this assignment. Note that, at a high level, the BufferManager manages a buffer pool of frames that contain pages. In the class definition, this is represented by two arrays: bufPool and frameTable. frameTable is the array of Frames that comprise the pool, holding meta information about each frame. The actual page being stored in a frame at a particular index (i.e., FrameId) is in the bufPool array. That is, if you would like to know the status of the frame at FrameId 1, you access frameTable[1]. If you would like the Page stored at FrameId 1, you access bufPool[1]. As mentioned previously, hashTable is a directory that helps find a particular page in the pool quickly. That is, when checking the status of a certain Page object, we obtain its FrameId by looking it up in the hash table.
This class is defined as follows:
Keep these style and testing guidelines in mind:
Much of the code base was provided and developed by the University of Wisconsin Database Group.
Before the Due Date, push your solution from one of your local repos to the GitHub remote repo. Only one partner needs to do this, but it doesn't hurt if you both do.
From your local repo (in your ~you/cs44/labs/Lab01-partner1-partner2 subdirectory):
    make clean
    git add *
    git commit -m "our correct, robust, and well commented solution for grading"
    git push
If that doesn't work, take a look at the "Troubleshooting" section of the Using Git page. Also, be sure to complete the README.md file, then add and push it.