SwatDB
ParallelHashJoin Class Reference

#include <parallelHashJoin.h>


Public Member Functions

 ParallelHashJoin (FileId outer_id, FileId inner_id, FileId result_id, std::vector< FieldId > outer_fields, std::vector< FieldId > inner_fields, std::uint32_t num_buckets, std::uint32_t num_threads, std::string temp_path, Catalog *catalog, BufferManager *buf_mgr, FileManager *file_mgr)
 Constructor for the join operator using a hash join algorithm. Sets up the state for a single join operation from the specified arguments.
 
 ~ParallelHashJoin ()
 Destructor for the ParallelHashJoin class.
 
void runOperation ()
 Performs the join operation using the hash join algorithm.
 
- Public Member Functions inherited from HashJoin
 HashJoin (FileId outer_id, FileId inner_id, FileId result_id, std::vector< FieldId > outer_fields, std::vector< FieldId > inner_fields, std::uint32_t num_buckets, std::string temp_path, Catalog *catalog, BufferManager *buf_mgr, FileManager *file_mgr)
 Constructor for the join operator using a hash join algorithm. Sets up the state for a single join operation from the specified arguments.
 
 ~HashJoin ()
 Destructor for the HashJoin class.
 
- Public Member Functions inherited from Join
 Join (FileId outer_id, FileId inner_id, FileId result_id, std::vector< FieldId > outer_fields, std::vector< FieldId > inner_fields, Catalog *catalog)
 Constructor for Join operation. Join subclasses use this constructor.
 
 ~Join ()
 Destructor for the Join Operation.
 
- Public Member Functions inherited from Operation
 Operation (FileId result_id, Catalog *catalog)
 Constructor for the Operation class. Because Operation is an abstract class, an object cannot be created, but this constructor is used by derived classes.
 
virtual ~Operation ()
 Destructor for the Operation class. Cleans up dynamic memory in result state.
 

Private Member Functions

void _firstHash (std::uint32_t tid, bool is_outer)
 Performs the initial hashing of each relation for the first step of hash join.
 
void _secondHash (std::uint32_t tid, std::uint32_t n_threads)
 Performs the second hashing of each relation for the second step of hash join.
 
RecordId _parallelPart1 (Record *record, bool is_outer, BlockHeapFileScanner *scanner)
 Performs the main loop of the first hashing step.
 
void _barrier (std::uint32_t n_threads)
 Helper function for _parallelRun which has threads wait until all threads have called this function.
 
void _parallelRun (std::uint32_t tid)
 Performs the parallel portion of the operation, to be called by all threads.
 

Private Attributes

std::uint32_t num_threads
 
std::uint32_t result_num
 
std::mutex * mtx_table
 
std::mutex mtx
 
std::condition_variable cv
 
std::uint32_t tcount
 
std::exception_ptr thd_exc_ptr
 
std::mutex exc_mtx
 
std::vector< std::thread > threads
 

Additional Inherited Members

- Protected Member Functions inherited from HashJoin
std::uint32_t hash1 (Record *rec, bool is_outer)
 Performs the first hash function on the given record.
 
void _createTempFiles ()
 Performs the set up phase of hash join, creating temporary hashed files to be used by hash join algorithm.
 
void _firstHash (bool is_outer)
 Performs the initial hashing of each relation for the first step of hash join.
 
void _secondHash ()
 Performs the second hashing of each relation for the second step of hash join.
 
RecordId _part1 (Record *record, bool is_outer, BlockHeapFileScanner *scanner)
 Performs the main loop of the first hashing step.
 
void cleanup ()
 Function that cleans up state and deletes all allocated memory. It will be called once by one thread when any function throws an error.
 
- Protected Member Functions inherited from Operation
void _initState (FileId file_id, std::vector< FieldId > fields, fileState *state)
 Performs the file and temporary record setup for relational operators.
 
void _delState (fileState *file_state)
 Deletes objects created in relop structs.
 
- Protected Attributes inherited from HashJoin
BufferManager * buf_mgr
 
FileManager * file_mgr
 
std::vector< HeapFile * > outer_partitions
 
std::vector< HeapFile * > inner_partitions
 
std::uint32_t num_buckets
 
std::vector< std::pair< HeapPage *, PageId > > hash_table
 
std::string temp_path
 Holds the path to which the temp files should be saved. "/local/" is recommended for performance.
 
std::uint32_t result_num
 
- Protected Attributes inherited from Join
fileState outer
 
fileState inner
 
std::vector< FieldId > outer_fields
 
std::vector< FieldId > inner_fields
 
- Protected Attributes inherited from Operation
fileState result_state
 
Catalog * catalog
 

Detailed Description

SwatDB ParallelHashJoin class. Part of the interface to the relational operators layer of the system, which manages relational operators.

Constructor & Destructor Documentation

◆ ParallelHashJoin()

ParallelHashJoin::ParallelHashJoin ( FileId  outer_id,
FileId  inner_id,
FileId  result_id,
std::vector< FieldId >  outer_fields,
std::vector< FieldId >  inner_fields,
std::uint32_t  num_buckets,
std::uint32_t  num_threads,
std::string  temp_path,
Catalog *  catalog,
BufferManager *  buf_mgr,
FileManager *  file_mgr 
)

Constructor for the join operator using a hash join algorithm. Sets up the state for a single join operation from the specified arguments.

Precondition
The fields being joined on have the same type.
Postcondition
The set up for the hash join is complete.
Parameters
outer_id. FileId of the outer relation file.
inner_id. FileId of the inner relation file.
result_id. FileId of the result relation file.
outer_fields. Vector of FieldIds corresponding to the join fields in the outer relation.
inner_fields. Vector of FieldIds corresponding to the join fields in the inner relation.
num_buckets. Number of partitions used for hashing.
num_threads. Number of threads used for the computation.
temp_path. Path under which temporary files are saved.
catalog. Pointer to the catalog of the SwatDB object.
buf_mgr. Pointer to the buffer manager of the DBMS.
file_mgr. Pointer to the file manager of the DBMS.

Member Function Documentation

◆ _barrier()

void ParallelHashJoin::_barrier ( std::uint32_t  n_threads)
private

Helper function for _parallelRun which has threads wait until all threads have called this function.

Postcondition
All threads have reached this function.
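
The following is a minimal sketch of barrier synchronization built from a mutex, a condition variable, and an arrival counter, which is the combination the mtx, cv, and tcount attributes suggest. It is not the SwatDB implementation: the class SimpleBarrier, its round counter, and the driver runTwoPhases are hypothetical names introduced here for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the mtx / cv / tcount members; not SwatDB code.
class SimpleBarrier {
 public:
  // Blocks until n_threads threads have called wait() in the current round.
  void wait(std::uint32_t n_threads) {
    assert(n_threads > 0);
    std::unique_lock<std::mutex> lock(mtx_);
    const std::uint64_t my_round = round_;
    if (++tcount_ == n_threads) {
      tcount_ = 0;  // reset so the barrier can be reused
      ++round_;     // release everyone waiting on this round
      cv_.notify_all();
    } else {
      // The predicate guards against spurious wakeups.
      cv_.wait(lock, [&] { return round_ != my_round; });
    }
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  std::uint32_t tcount_ = 0;  // threads that have arrived in this round
  std::uint64_t round_ = 0;   // assumption: added to make reuse safe
};

// Runs n workers through two phases separated by the barrier. Returns how
// many workers observed every phase-one write after crossing the barrier.
std::uint32_t runTwoPhases(std::uint32_t n) {
  SimpleBarrier barrier;
  std::vector<int> phase_one(n, 0);
  std::atomic<std::uint32_t> ok{0};
  std::vector<std::thread> threads;
  for (std::uint32_t tid = 0; tid < n; ++tid) {
    threads.emplace_back([&, tid] {
      phase_one[tid] = 1;  // phase one: each thread writes its own slot
      barrier.wait(n);     // no thread continues until all have written
      std::uint32_t sum = 0;
      for (int v : phase_one) sum += static_cast<std::uint32_t>(v);
      if (sum == n) ++ok;  // phase two: every slot must already be set
    });
  }
  for (auto& t : threads) t.join();
  return ok.load();
}
```

In a two-step hash join, a barrier of this kind would typically sit between the partitioning step and the second hashing step, so that no thread reads a partition that another thread is still writing.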

◆ _firstHash()

void ParallelHashJoin::_firstHash ( std::uint32_t  tid,
bool  is_outer 
)
private

Performs the initial hashing of each relation for the first step of hash join.

Parameters
tid. Thread id of the thread performing this specific round of computation.
is_outer. Boolean indicator of which relation is being hashed.

◆ _parallelPart1()

RecordId ParallelHashJoin::_parallelPart1 ( Record *  record,
bool  is_outer,
BlockHeapFileScanner *  scanner 
)
private

Performs the main loop of the first hashing step.

Parameters
record. Pointer to the record object which holds the current record.
is_outer. Boolean indicator of the current relation.
scanner. Scanner object which is iterating over the relation.
Returns
rid. Record id

◆ _parallelRun()

void ParallelHashJoin::_parallelRun ( std::uint32_t  tid)
private

Performs the parallel portion of the operation, to be called by all threads.

Parameters
tid. Thread id of the current thread.
Postcondition
The parallel portion of the operation is complete and the threads can be joined.

◆ runOperation()

void ParallelHashJoin::runOperation ( )
virtual

Performs the join operation using the hash join algorithm.

Precondition
The join state is correctly initialized.
Postcondition
The result HeapFile contains all matching results. The state of the object is no longer valid, so the object should be destroyed.

Reimplemented from HashJoin.
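
As a rough illustration of the two hashing steps this class refers to, the sketch below runs a single-threaded, in-memory partitioned hash join over plain vectors. It is an assumption-laden simplification, not SwatDB code: Row, partition, and hashJoin are hypothetical names, the modulo hash is a placeholder, and building the second-step table on the inner relation is a choice made here rather than something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::pair<int, int>;  // hypothetical record: (join key, payload)

// Step 1: split a relation into num_buckets partitions by hashing the key.
std::vector<std::vector<Row>> partition(const std::vector<Row>& rel,
                                        std::uint32_t num_buckets) {
  assert(num_buckets > 0);
  std::vector<std::vector<Row>> parts(num_buckets);
  for (const Row& r : rel) {
    parts[static_cast<std::uint32_t>(r.first) % num_buckets].push_back(r);
  }
  return parts;
}

// Step 2: for each pair of matching partitions, build a hash table on the
// inner partition and probe it with the outer partition.
// Returns (outer payload, inner payload) for every pair with equal keys.
std::vector<std::pair<int, int>> hashJoin(const std::vector<Row>& outer,
                                          const std::vector<Row>& inner,
                                          std::uint32_t num_buckets) {
  const auto outer_parts = partition(outer, num_buckets);
  const auto inner_parts = partition(inner, num_buckets);
  std::vector<std::pair<int, int>> result;
  for (std::uint32_t b = 0; b < num_buckets; ++b) {
    std::unordered_multimap<int, int> table;  // second hash, per partition
    for (const Row& r : inner_parts[b]) table.emplace(r.first, r.second);
    for (const Row& r : outer_parts[b]) {
      const auto range = table.equal_range(r.first);
      for (auto it = range.first; it != range.second; ++it) {
        result.emplace_back(r.second, it->second);
      }
    }
  }
  return result;
}
```

Because equal keys always land in the same partition, each partition pair can be joined independently, which is what makes the second step a natural unit of work to divide among threads.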

Member Data Documentation

◆ cv

std::condition_variable ParallelHashJoin::cv
private

Condition variable used to implement barrier synchronization.

◆ exc_mtx

std::mutex ParallelHashJoin::exc_mtx
private

Exception mutex. Protects the exception pointer, which records whether an exception has been raised in the threaded portion.

◆ mtx

std::mutex ParallelHashJoin::mtx
private

Hashjoin mutex

◆ mtx_table

std::mutex* ParallelHashJoin::mtx_table
private

Stores an array of mutexes which corresponds to the hash table.
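
One common way to use an array of mutexes alongside a hash table is to give each bucket its own lock, so threads only contend when they touch the same bucket. The sketch below shows that pattern under stated assumptions: fillBuckets and its in-memory buckets are hypothetical, and a std::unique_ptr owns the mutex array here for safety, whereas the attribute above is a raw std::mutex pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Inserts keys [0, n_keys) into num_buckets buckets using n_threads threads.
// Returns the total number of keys stored across all buckets.
std::size_t fillBuckets(std::uint32_t num_buckets, std::uint32_t n_threads,
                        std::uint32_t n_keys) {
  assert(num_buckets > 0 && n_threads > 0);
  // One mutex per bucket (assumed layout, mirroring mtx_table).
  std::unique_ptr<std::mutex[]> mtx_table(new std::mutex[num_buckets]);
  std::vector<std::vector<std::uint32_t>> buckets(num_buckets);
  std::vector<std::thread> threads;
  for (std::uint32_t tid = 0; tid < n_threads; ++tid) {
    threads.emplace_back([&, tid] {
      // Each thread handles every n_threads-th key, starting at its tid.
      for (std::uint32_t key = tid; key < n_keys; key += n_threads) {
        const std::uint32_t b = key % num_buckets;  // placeholder hash
        std::lock_guard<std::mutex> guard(mtx_table[b]);
        buckets[b].push_back(key);  // only this bucket is locked
      }
    });
  }
  for (auto& t : threads) t.join();
  std::size_t total = 0;
  for (const auto& bucket : buckets) total += bucket.size();
  return total;
}
```

Compared with a single lock over the whole table, per-bucket locking lets inserts into different buckets proceed concurrently, at the cost of one mutex per bucket.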

◆ num_threads

std::uint32_t ParallelHashJoin::num_threads
private

The number of threads used to hash the relations.

◆ tcount

std::uint32_t ParallelHashJoin::tcount
private

Counter used with cv to implement barrier synchronization.

◆ thd_exc_ptr

std::exception_ptr ParallelHashJoin::thd_exc_ptr
private

Exception pointer used to capture exceptions thrown in threads. After all threads are joined, check this value and rethrow the exception if it is set.
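
The capture-then-rethrow pattern described here can be sketched with std::current_exception and std::rethrow_exception. This is an illustrative example rather than the SwatDB code: runWorkers and failing_tid are hypothetical, and the local variables only mirror the thd_exc_ptr and exc_mtx attributes by name.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Starts n_threads workers; the worker whose id equals failing_tid throws.
// Returns the message of the first captured exception, or "" if none.
std::string runWorkers(std::uint32_t n_threads, std::uint32_t failing_tid) {
  assert(n_threads > 0);
  std::exception_ptr thd_exc_ptr;  // null until some worker fails
  std::mutex exc_mtx;              // guards thd_exc_ptr
  std::vector<std::thread> threads;
  for (std::uint32_t tid = 0; tid < n_threads; ++tid) {
    threads.emplace_back([&, tid] {
      try {
        if (tid == failing_tid) throw std::runtime_error("worker failed");
      } catch (...) {
        // An exception must not escape a thread, so record it instead.
        std::lock_guard<std::mutex> guard(exc_mtx);
        if (!thd_exc_ptr) thd_exc_ptr = std::current_exception();
      }
    });
  }
  for (auto& t : threads) t.join();
  // Only after every thread is joined is it safe to rethrow.
  try {
    if (thd_exc_ptr) std::rethrow_exception(thd_exc_ptr);
  } catch (const std::exception& e) {
    return e.what();
  }
  return "";
}
```

Keeping only the first exception is a design choice in this sketch; an uncaught exception inside a std::thread would otherwise call std::terminate.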

◆ threads

std::vector<std::thread> ParallelHashJoin::threads
private

Stores the vector of threads used for the operation.


The documentation for this class was generated from the following file: