1. Due Date
Complete lab: Due by 11:59 p.m., Thursday, April 24, 2025. Pass all tests in smalltests, projecttests, and selecttests, with results similar to ours.
Checkpoint: Tuesday, April 15. Pass all tests in smalltests.
Your lab partner for Lab 7 is listed here: Lab 7 lab partners
Our guidelines for working with partners: working with partners, etiquette and expectations
2. Overview
In this lab you will implement Select and Project operators in the RelOps layer of the SwatDB database management system.
The RelOps layer interacts with the FileManager layer of SwatDB. Your implementations of Select and Project will make use of Index and Heap files to access individual records to perform the requested operation, and will store the result in a new Heap. For Project the schema of the result file will be a subset of the source file’s schema.
You will implement two versions of Select. One version uses a file scan to find records that meet the select condition, and the other uses an index to find records matching the select condition. You will implement a file scan algorithm for performing the Project operation, and you will not remove duplicate records in the result (note: this is unlike what a project operation should do, but is a simplification for this lab assignment).
For this assignment you will not write large amounts of code. However, you will need to spend a significant amount of time reading the starting point code and the SwatDB documentation in order to figure out how to implement your solution. The code you write uses many SwatDB classes, some of which you have used and/or written in past assignments, but many of which are new to this assignment.
The primary goal of the SwatDB lab assignments is to gain an understanding of the details of how a relational DBMS works by implementing and testing parts of a relational DBMS.
The SwatDB code base is quite extensive and will require significant reading of its documentation (see Section 3).
2.1. Lab Goals
The main goals of the SwatDB RelOps Lab are:
- Understanding how the relational operators Select and Project can be implemented in a DBMS.
- Understanding different ways of implementing the same operation using different access methods to the underlying relational data.
- Gaining some practice evaluating different implementations of the same relational operation.
- Developing a testing strategy for a large, complex system; making use of a provided unit testing framework, and debugging tools.
- Practice working with part of a large code base, much of which you have access to only through its interface definition (i.e., .h files and generated documentation).
2.2. Starting Point Code
If you have not already done so, first create a course directory for this course, and add a lab subdirectory for your lab repos:
mkdir -p cs44
mkdir -p cs44/labs
cd cs44/labs
We will be using git repos hosted on the college’s GitHub server for labs in this class. If you have not used git or the college’s GitHub server before, here are some detailed instructions on using git for CS44 labs.
Next find your git repo for this lab assignment off the GitHub server for our class: CS44-s25
Clone your git repo (Lab7-userID1-userID2) containing starting point files into your labs directory:
cd ~/cs44/labs
git clone [the ssh url to your repo]
cd Lab7-userID1-userID2
If all was successful, you should see the following files (highlighted files require modification):
2.2.1. Lab 7 Files
- Makefile - pre-defined. You may edit this file to add extra source files or execution commands.
- README.adoc - some directions about how to compile and run test programs for the RelOps layer. See information in this file about running test programs and about cleaning up relation files if need be.
The test programs create relation files, and these can be corrupted if your programs exit unexpectedly. There is a cleanup.sh file you can run to clean these up in this case. Running make clean will also run the cleanup script.
2.2.2. RelOps Manager layer Files
Most of the RelOpsManager is implemented for you (most of these files are for your reference). However, you do need to implement one method in the RelOpsManager class.
You should not add any new data members or public methods to these classes. You may add private helper methods for good modular code design.
- relopsmgr.h: interface to the relops manager layer. These are interface functions to initiate different operations (specific versions of select, project, and join) on specific relation operand(s). You do not need to modify this file.
- relopsmgr.cpp: contains some of the main RelOpsManager methods, including the _createResultFile method that you will use to store SELECT operation results. You do not need to modify this file.
- relopsmgr_projects.cpp: you will implement the _createProjectRes method that creates a file to store PROJECT operation results. It also contains the project method that picks the project operation type. You do not need to modify the project method, but we suggest you read through it to understand what it is doing, and to understand the values it passes to the specific project operation.
- relopsmgr_selects.cpp: contains the select method that picks a select operation type to perform. You do not need to modify this file, but we suggest you read through the select method to understand what it is doing, and to understand the values it passes to the specific select operation.
2.2.3. Select and Project Operation Files
Specific operations are derived from the Operation base class.
- operation.[h,cpp]: the Operation base class. You do not need to modify this, but it is here for your reference.
- project.[h,cpp]: the Project class and implementation. You will implement the runOperation method that implements project.
- select.[h,cpp]: the Select class. FileScan and IndexScan are derived from this class. You will implement the constructor.
- filescan.[h,cpp]: the File Scan implementation of Select. This class is derived from the Select class. You will implement the runOperation method.
- indexscan.[h,cpp]: the Index Scan implementation of Select. This class is derived from the Select class. You will implement the runOperation method.
2.2.4. Select and Project Test Files
We are giving you a lot of unit tests for this assignment.
You are not required to add more tests as part of this assignment. However, you still may want to add some more for further stress testing your solution. If you add new tests, please add verbose comments describing what your test is testing.
- smalltests.cpp - unit testing code for the small DB. The checkpoint for this lab assignment.
- projecttests.cpp - unit testing code for Project.
- selecttests.cpp - unit testing code for the two Select operations.
2.2.5. Scripts (.sh files) and Test Databases
This lab creates and uses some SwatDB instances with some relation files. The instances and files are created in /scratch/<your user name>/cs44swatDBfiles/. After you run make, you can cat out the .db files to see the relations and their schema. For example, user tnas would do this:
cat /scratch/tnas/cs44swatDBfiles/small.db
- README.adoc: documentation about the scripts.
- Makefile: the make commands run these scripts for you (you can run them through a make command instead of running a .sh file at the command line).
- Make sure the .sh files are executable:
ls -l           # should list x permission with executable files
chmod 700 *.sh  # set permission to rwx for owner if not
- ./cleanup.sh: cleans up created DBs and files from incomplete runs (make clean will also do this).
- ./mktestconf.sh followed by ./getfiles.sh will build the example SwatDB databases for testing. However, just run make to build the .db files (easier than running the scripts by hand).
2.3. Deliverables
The following will be evaluated for your lab grade:
- A complete and robust implementation of the assigned select and project operations and the RelOps manager methods. This includes adding complete comments, and removing all my TODO comments to you.
- Passes all unit tests in: smalltests.cpp, selecttests.cpp, and projecttests.cpp. You are not required to add additional tests for this lab, but you may want to add some to test your solution, and are welcome to add more to these files (please add descriptive comments to any that you add).
- The class definitions in .h files. Only add private helper methods to class definitions to support good modular design of your solution. Do not add public methods or data members to any classes defined in these files. Also note where you can and cannot add private data members to classes (look for comments in .h files and in the lab write-up).
- Your Lab 7 Questionnaire (TBA), to be completed individually. (This will open on the due date and close after 3 days.)
2.4. Checkpoint
Before the checkpoint due date, you should complete the functionality to pass all the unit tests in the smalltests program.
The checkpoint functionality includes:
- Implementation of the file scan version of Select.
- Implementation of Project (and _createProjectRes).
While we recommend dealing with exceptions as you implement the methods, we will not require that exceptions are implemented for the checkpoint.
Also note that other unit test programs stress test these methods, so it is possible that after passing the checkpoint, you may still need to go back to these implementations and do some debugging.
3. SwatDB
This assignment implements parts of the RelOps Manager layer of SwatDB, including specific relational operators.
For information about SwatDB, including a link to its on-line code documentation, see the Information about SwatDB page listed in Section 9 (Handy References).
In addition to the .h files distributed with this lab, the SwatDB documentation that will be particularly helpful for this lab includes:
- schema.h: create a Schema object for relation file results.
- record.h: get Record data from file tuples, and create result Records to add to the result file.
- key.h: create and use search keys.
- searchkeyformat.h: you need to create a SearchKeyFormat object that you use to initialize a Key used for selects.
- filemgr.h: create a HeapFile relation file for the operation result.
- heapfilescanner.h: scan through records in a HeapFile.
- hashindexscanner.h: use this to scan through an index to find Rid values that match the selection criteria.
- Common SwatDB type definitions, defined in swatdb_types.h.
- The exception classes are defined in swatdb_exceptions.h. You may need to catch some, and throw others.
4. Lab Details
All relational operations are invoked through the RelOpsManager object (defined in relopsmgr.h, relopsmgr.cpp, relopsmgr_selects.cpp, and relopsmgr_projects.cpp) that implements the interface to the relational operations manager layer of SwatDB. It has interface methods for each of the main operations: select, project, and join. These methods take parameters that specify the relation(s) on which to perform the operation, and some take a parameter that specifies the specific type of algorithm to use to perform the operation.
Specific relational operations are implemented as classes derived from the Operation base class (defined in operation.[h,cpp]). Start by investigating the class hierarchy for different versions of relational operators to understand what the different project and select classes inherit. For example, FileScan is derived from Select, which is derived from Operation.
The on-line documentation is useful for seeing the class hierarchy. You can view .h files there, or use the related .h files we have included with the lab starting point code.
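The hierarchy described above can be pictured with a small sketch. The classes below are simplified stand-ins that only borrow the SwatDB class names: the real classes take constructor arguments, hold file state, and their runOperation signature is not shown in this write-up, so the string return type and the run helper here are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for the SwatDB classes of the same
// names; the real classes have constructors, state, and other signatures.
class Operation {
 public:
  virtual ~Operation() = default;
  // Each concrete operation supplies its own algorithm.
  virtual std::string runOperation() = 0;
};

// Select holds what is common to every select algorithm.
class Select : public Operation {};

class FileScan : public Select {
 public:
  std::string runOperation() override { return "select: file scan"; }
};

class IndexScan : public Select {
 public:
  std::string runOperation() override { return "select: index scan"; }
};

class Project : public Operation {
 public:
  std::string runOperation() override { return "project: file scan"; }
};

// A caller (illustrative helper) only needs the base-class interface.
std::string run(Operation *op) {
  assert(op != nullptr);
  return op->runOperation();
}
```

The point of the design is that code holding an Operation pointer does not need to know which algorithm it is running; the derived class decides.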
4.1. What to implement
For this lab, you will not implement a large amount of code, but you will need to spend a fair amount of time reading starting point code and SwatDB class documentation to determine how and where to implement the select and project operations.
4.2. RelOpsManager
RelOpsManager class: this class implements the interface to the relational operators level of SwatDB; it has public methods for performing select, project, and join operations. It also has a method, checkFilesEqual, that can be used to test the results of operations.
In addition to public methods, it has several private methods that create the correct result file for the operation. For this assignment, you will implement one of these methods, the one that creates the file for the project results.
The RelOpsManager methods are implemented in several .cpp files (relopsmgr.cpp, relopsmgr_selects.cpp, and relopsmgr_projects.cpp):
- In relopsmgr.cpp, several methods are implemented for you, including:
  - FileId _createResultFile(Schema *schema): creates a new HeapFile to store the result of an operation. RelOpsManager::select (in relopsmgr_selects.cpp) calls this method directly to create the result file (the result and source relations have the same schema). You do not need to implement this method, but we recommend that you read it to understand what it does.
4.2.1. relopsmgr_projects.cpp
- RelOpsManager::project: runs the project operation. You do not need to implement this method, but we recommend that you read it to understand what it does.
- FileId _createProjectRes(Schema *rel_schema, std::vector<FieldId> fields): creates a new HeapFile to store the result of a project operation. Note that the result file’s schema is a subset of the schema of the relation file on which the project operation is performed. You will implement this function.
  This function should:
  1. Create the Schema for the project result from the file’s rel_schema, given the vector of FieldIds in the project result.
     Note that the RelOpsManager class is a friend class of the Schema class, which means that it can directly access the field_list field of the passed rel_schema Schema object.
     Also note that FieldId values are the positions (the index values) of each field in the field_list vector of the rel_schema of the relation.
     The primary key for the result Schema can be empty (i.e., pass { }).
  2. Call the _createResultFile method, passing it the Schema you create in (1).
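As a rough illustration of step (1), the sketch below picks entries out of a source field list by position. It does not use the real Schema class: FieldEntry and projectFieldList are hypothetical names invented for this example, and the real field_list element type and Schema constructor should be taken from schema.h.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real types are in swatdb_types.h and schema.h.
using FieldId = std::uint32_t;

struct FieldEntry {
  std::string name;
  std::uint32_t size;  // size of the field in bytes
};

// Build the field list of a project result: one entry per requested
// FieldId, in the order requested. A FieldId is treated as an index into
// the source field list, and an out-of-bounds id raises an exception.
std::vector<FieldEntry> projectFieldList(
    const std::vector<FieldEntry> &field_list,
    const std::vector<FieldId> &fields) {
  std::vector<FieldEntry> result;
  result.reserve(fields.size());
  for (FieldId id : fields) {
    if (id >= field_list.size()) {
      throw std::out_of_range("FieldId is not in the source schema");
    }
    result.push_back(field_list[id]);
  }
  assert(result.size() == fields.size());
  return result;
}
```

In the lab itself, the resulting list would feed the construction of the result Schema, which is then passed to _createResultFile.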
4.2.2. relopsmgr_selects.cpp
- RelOpsManager::select: runs a select operation of a specified type. You do not need to implement this method, but we recommend that you read it to understand what it does.
4.3. Operation
The Operation class is the base class of all relational operations. There is nothing for you to implement, but you should look through the operation.[h,cpp] files to see the class definition and what some of its methods do. In particular, note:
- struct fileState (defined in operation.h). This structure stores state for the operand files and indices and for the result file of the particular operation. Some of the fields in this struct can be used to store and manipulate record data as part of the operation.
  Since every operation has a result file, the fileState for the result relation file is a field of the Operation base class (result_state), and it is initialized by the constructor of the Operation class.
  A fileState struct for the source relation (and index) files is initialized by the Select and Project constructors. Look at the starting point code that contains calls to _initState, which performs this initialization.
- The _initState method is called by the Operation constructor and by the constructors of the derived classes Select and Project. This method initializes the fileState structs associated with the result and the source files for the operation.
- The _delState method cleans up any fileState structs created for the operation.
4.4. Project
The Project class is derived from the Operation class.
- Project class: this is the base class for specific project operations. You do not need to modify this class, but look at its declaration (in project.h) to see the data members that are part of the base class, and which you will use to implement project operations.
You do not need to remove duplicates from the result relation for project. The default in SQL is to not remove duplicates, so since this is a relational operator that implements an SQL query, you should not remove duplicates from the result (strict relational algebra project does, SQL project does not).
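A minimal sketch of project as a scan that keeps duplicates is shown below. It models a record as a vector of strings and a relation as a vector of records; Row and projectScan are hypothetical names for this example, and the real implementation works with Record objects and heap files instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using FieldId = std::uint32_t;            // stand-in for the SwatDB typedef
using Row = std::vector<std::string>;     // hypothetical in-memory record

// Scan every source record and emit its projected version. Nothing is
// compared against earlier output, so duplicates are kept (SQL behavior).
std::vector<Row> projectScan(const std::vector<Row> &source,
                             const std::vector<FieldId> &fields) {
  std::vector<Row> result;
  for (const Row &rec : source) {
    Row projected;
    for (FieldId id : fields) {
      projected.push_back(rec.at(id));  // at() throws on a bad FieldId
    }
    result.push_back(projected);
  }
  assert(result.size() == source.size());  // one output per input record
  return result;
}
```

Projecting two records that differ only in a dropped field yields two identical result records, which is the expected outcome for this lab.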
4.5. Select
The Select class: this is the base class for specific select operations. It is derived from the Operation class.
- Select::Select: complete the implementation of the constructor by adding error detection and handling. Check that the field ids are within bounds, and check that the values, comparisons, and fields passed in are all the same size. Throw appropriate exceptions if not.
  Also, look at this class definition (in select.h) to see the data members that are part of the base class, and which you will use to implement select operations.
  If an exception is thrown in the constructor, the destructor of its base class, Operation, is invoked, but the destructor of the Select class is not (the object has not yet been successfully created). As a result, any state you allocated before the exception needs to be cleaned up in the constructor before the exception is thrown.
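The constructor behavior described above can be sketched in isolation. SelectSketch below is a hypothetical class, not the SwatDB Select: it allocates a buffer, validates its arguments, and frees the buffer itself before throwing, because its own destructor will not run for a partially constructed object. The standard exceptions stand in for the SwatDB exception classes, and the liveBuffers counter exists only so the example can check for leaks.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

using FieldId = std::uint32_t;  // stand-in for the SwatDB typedef
enum Comp { EQ, LT, GT };       // hypothetical comparison codes

class SelectSketch {
 public:
  SelectSketch(std::uint32_t num_fields, const std::vector<FieldId> &fields,
               const std::vector<Comp> &comps, const std::vector<int> &values)
      : state(new char[64]) {
    ++liveBuffers();
    // One comparison and one value are expected per selected field.
    if (fields.size() != comps.size() || fields.size() != values.size()) {
      _cleanup();  // ~SelectSketch will not run, so free state here
      throw std::invalid_argument("fields, comps, values differ in size");
    }
    for (FieldId id : fields) {
      if (id >= num_fields) {
        _cleanup();
        throw std::out_of_range("field id is out of bounds");
      }
    }
  }
  ~SelectSketch() { _cleanup(); }
  SelectSketch(const SelectSketch &) = delete;
  SelectSketch &operator=(const SelectSketch &) = delete;

  // Number of buffers currently allocated; used only to check for leaks.
  static int &liveBuffers() {
    static int count = 0;
    return count;
  }

 private:
  void _cleanup() {
    assert(state != nullptr);  // cleanup should run exactly once
    delete[] state;
    state = nullptr;
    --liveBuffers();
  }
  char *state;
};
```

Running a failing construction under valgrind with the _cleanup calls removed would report the buffer as lost, which is the kind of error this pattern avoids.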
4.5.1. FileScan
The FileScan class: derived from the Select class.
- runOperation: implements the file scan select operation. Your algorithm should scan over all records in the file, and add those matching the select conditions to the result file. You will need to use the HeapFileScanner class to scan over all records in the file.
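The shape of a file scan select can be sketched with in-memory stand-ins. The example below assumes that a record matches only when every (field, comparison, value) condition holds, and it models records as vectors of ints; Row, Comp, holds, and fileScanSelect are hypothetical names, and the real code would iterate with a HeapFileScanner and compare Record fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using FieldId = std::uint32_t;   // stand-in for the SwatDB typedef
enum Comp { EQ, NEQ, LT, GT };   // hypothetical comparison codes
using Row = std::vector<int>;    // hypothetical in-memory record

// Evaluate a single condition "lhs op rhs".
bool holds(int lhs, Comp op, int rhs) {
  switch (op) {
    case EQ:  return lhs == rhs;
    case NEQ: return lhs != rhs;
    case LT:  return lhs < rhs;
    case GT:  return lhs > rhs;
  }
  return false;
}

// Visit every record once; copy it to the result if all conditions hold.
std::vector<Row> fileScanSelect(const std::vector<Row> &source,
                                const std::vector<FieldId> &fields,
                                const std::vector<Comp> &comps,
                                const std::vector<int> &values) {
  assert(fields.size() == comps.size() && fields.size() == values.size());
  std::vector<Row> result;
  for (const Row &rec : source) {
    bool matches = true;
    for (std::size_t i = 0; i < fields.size() && matches; ++i) {
      matches = holds(rec.at(fields[i]), comps[i], values[i]);
    }
    if (matches) {
      result.push_back(rec);
    }
  }
  return result;
}
```

The cost is one pass over the whole relation regardless of how many records match, which is the baseline the index scan version is compared against.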
4.5.2. IndexScan
The IndexScan class: derived from the Select class.
- IndexScan: complete the implementation of the constructor. This should initialize the index_file data member by calling the getFile method of the catalog, and check that the index fields match the fields on which the select operation is being performed. This should check for errors and throw appropriate exceptions on different errors (see comments in the .h file). Note that the Select base class parts are already initialized in the part of the constructor we give you with the starting point.
  The index’s key fields and the select fields must be the same for the index scan version of select to proceed. Although an index on a subset of the select fields could be used if the select condition is in conjunctive normal form, you do not need to handle this case for this lab assignment.
  Also, see the NOTE about the Select constructor behavior when an exception is thrown.
- runOperation: implement the index scan select operation. Your algorithm will create a HashIndexScanner on the HashIndex to find RIDs of records in the file that match the selection criteria (remember that only equality select criteria can be used with a hash index). You can then call getRecord on the relation file to extract the full matching record data from the relation file to add to the result file.
  You will make use of the HashIndexScanner::HashIndexScanner(HashIndexFile* index, Key *key_val) constructor, which creates a scanner over all hash index entries with a matching key_val.
  You will also need to make use of the Key and SearchKeyFormat classes to create objects that can be passed to and compared with index file entries.
  When creating a Key for the index scan, you can just create a new Key object of MAX_RECORD_SIZE to allocate enough space for a key of any size.
  Also note that this Key constructor allocates space that the caller must explicitly free when done. See its function comment for more information.
  Be sure that any temporary objects created in this method are deleted before exit.
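The idea behind the index scan can be sketched with standard containers. Here a std::unordered_multimap plays the role of the hash index, mapping a key value to the record ids that hold it, and a Rid is simplified to a position in a vector. These are assumptions for illustration only: the real code uses HashIndexScanner, Key, and getRecord, whose exact interfaces are in the SwatDB documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Rid = std::uint32_t;     // hypothetical: a position in the relation
using Row = std::vector<int>;  // hypothetical in-memory record
using HashIndex = std::unordered_multimap<int, Rid>;  // key value -> Rid

// Equality select through the index: look up only the entries whose key
// equals key_val, then fetch each matching record by its Rid.
std::vector<Row> indexScanSelect(const std::vector<Row> &relation,
                                 const HashIndex &index, int key_val) {
  std::vector<Row> result;
  auto range = index.equal_range(key_val);
  for (auto it = range.first; it != range.second; ++it) {
    Rid rid = it->second;
    assert(rid < relation.size());  // index entries must point at records
    result.push_back(relation[rid]);
  }
  return result;
}
```

Unlike the file scan, only the matching records are touched, but the lookup supports equality conditions only, and the order of the results is not specified.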
5. Lab Requirements
In addition to correctly implementing parts of the Project and Select operations, and adding code to test your implementation, you should also:
- Declare and use variables of the types defined in swatdb_types.h as opposed to their underlying type definition. Also use constants and enum types defined in this file; they help make the code more readable. For example, if a method returns a FileId, declare a variable of type FileId rather than std::uint32_t or int to store its return value:
FileId result_fileid;
- Use good C++ code design and good modular design in your solution. This includes using defined constants and types.
- Ensure your code is robust to errors; in particular, be sure to test error handling for exceptions that should be thrown and caught by the buffer manager.
- Ensure your code is free of valgrind errors.
- Make sure your code is well-commented, and there is no line wrapping. (See our C++ Style Guide link from the Handy References section.)
- Your code should be free of all compiler warnings. The one exception is that there is a known deprecation warning with SHA1_ functions that SwatDB uses for hashing. If you see these, you may ignore them.
- Your submitted code should have all of our TODO comments removed; as you implement a TODO, remove it. These (as well as NOTE comments) are also helpful to find parts of the given code that you need to implement.
6. Testing your code
There are several test files in the starting point code. They use the same unittests framework you used in CS35, and test various relational operator functionality and exceptions:
6.1. unit tests
- smalltests.cpp: file scan select and project on a very small relation DB (one for which you can print out all relations and examine results)
- projecttests.cpp: extensive tests of project on a much larger DB
- selecttests.cpp: extensive tests of select on a much larger DB
You can add additional tests to any of these files by following the examples in the files (add them as a new test SUITE separate from the ones we give you).
6.2. Test DB relations
When you type make, along with building the unittest executables, the Makefile rule runs the getfiles.sh script, which creates .db and relation files in the /scratch/yourusername/cs44swatDBfiles/ directory. Two DBs are created, small.db and tables.db, and are used in the unit test programs.
When you type make clean, the Makefile is set up to run the cleanup.sh script to remove these DB files.
The .db files created are ASCII files and are readable in an editor program. The unit test code also has commented-out calls for printing out the Catalog and relation files (note: for large relation files, only the first 50 records are printed).
You should never need to run either script by hand, but you can. See the README.adoc file for more information.
6.3. To run unit test programs:
# run all of the unittest test suites
make runtests
# run individual tests
./smalltests
./projecttests
./selecttests
# or you can run individual test suites alone using -s testSuiteName
./projecttests -s ExceptionTests # run just ExceptionTests test suite
# to list the test suites names run with -h, for example:
./selecttests -h
6.4. Cleaning up corrupted files
Run make clean, or explicitly run the ./cleanup.sh script, to remove the DB files.
./cleanup.sh
7. Tips and Hints
7.1. General Tips
- Spend some time reading the starting point source code and looking at the SwatDB docs to get an idea of how the methods you need to implement are called, starting from the RelOpsManager. There is a fair amount of inheritance here, particularly with the select operations. In addition to some of the File and Index interface functions, take time to understand the Record and Schema class interfaces as well.
- Implement and test incrementally. Use smalltests.cpp to help guide the order and testing that you do.
- Make use of gdb and valgrind to help you as you go.
- Look at past weekly lab pages for help with C++, gdb, and valgrind.
- Make use of the cleanup.sh script to clean up state from incomplete previous runs. You can also run make clean; make to clean up and to regenerate the source test files.
  Look at the information in Section 2.2.5 about the script files and how to use them (or implicitly use them with the make command), and how to view the .db file contents for the test SwatDB database instances used by the test code.
- Read the README.adoc file about some of the scripts. Also, look at the Makefile to see what is being built (and cleaned up) and where. See Week 8 for more information about Makefiles to help you read it.
7.2. Suggested Order
Here is a suggestion for an order in which to implement the relational operators:
- Start with the file scan version of Select (in filescan.[h,cpp]), and get it to work with the smalltests.cpp unit tests. You will want to refer to the base class files select.[h,cpp] and operation.[h,cpp] as you implement. The main steps are:
  - Create a new HeapFile for the result (this step is done for you in the starting point).
  - Create a new HeapFileScanner object on the HeapFile source.
  - Scan each record in the source relation file, and if it matches the selection criteria, insert the record in the result file.
  - Test on the smalltests.cpp unit tests first.
- Next, implement the project operation, and get it to work with the smalltests.cpp unit tests.
  Like Select, project will do a file scan of the source relation, and for each record add its projected version to the result file. Unlike Select, the schema for the result relation is different from that of the source relation.
  - Start by implementing the _createProjectRes method of the RelOpsManager class in relopsmgr_projects.cpp. This creates a new HeapFile result file for the projection operation. You will need to create a new Schema for this file that consists of a subset of the fields of the source file. Once you create the new Schema, you can call the RelOpsManager _createResultFile method to create the result file.
  - Next, perform a scan of the source heap file, and for each record create a new record with just the projected fields and add it to the result file.
  - Test on the smalltests.cpp unit tests first.
8. Submitting your lab
Review the lab deliverables to ensure you have completed all of your work. Before the due date, push your solution from one of your local repos to the GitHub remote repo.
From your local repo (in your ~/cs44/labs/Lab7-userID1-userID2 subdirectory):
make clean
git add *.h *.cpp
git commit -m "my correct and well commented solution for grading"
git push
Be careful not to add binary files to your repo (executables or .o files that are compiled when you run make). To avoid adding these files to your repo, add only the files you want by explicitly listing them, as in the git add command above, rather than adding everything in the directory.
Verify that the results appear (e.g., by viewing the repository on CS44-s25). You will receive deductions for submitting code that does not run or repos with merge conflicts. Also note that the time stamp of your final submission is used to verify late days, so please do not update your repo until after the late period has ended.
If that doesn’t work, take a look at the "Troubleshooting" section of the Using git for CS44 labs and the Using git pages. At this point, you should submit the required Lab 7 Questionnaire (TBA); each lab partner must do this.
9. Handy References
- Information about SwatDB
- Review in-lab exercises from Week 9
- Some C++ Programming Resources and Links including the C++ Style Guide
- C++ programming tools: compiling, linking, debugging C++
- C references in Dive into Systems (some useful for C++ programming too): Chapter 2: C pointers, command line arguments; Chapter 3: debugging tools (valgrind, gdb for C)
- gdb Guide, also in Chapter 3 of Dive into Systems
- Valgrind Guide, also in Chapter 3 of Dive into Systems
- my CS help pages (Unix tools, programming links)
- Appendix 2: Using Unix from the Dive into Systems textbook
- Using Git: a more complete Git guide