The top level class is Compiler, which contains a main method that takes a single parameter of the interpreter config file.
The selection pushing is implemented in src/main/common/ComparisonEvaluator, which utilizes
UnionFind class to build a set of UnionFindElement, each UnionFindElement is a cluster of columns
that share the same lower bound and upper bound value.
The choice of logical selection operator in our project is the same as deciding whether to use a indexScan or a regular scan. Our project follows the instruction is calculating the estimate cost using the formula provided.
Cost of using index:
If p is the number of pages in the relation, t the number of tuples, r the reduction factor and l the
number of leaves in the index, the cost for a clustered index is 3 + p ∗ r while for an unclustered index it is
3 + l ∗ r + t ∗ r. The cost
Cost of not using index(regular scan):
Page number of stored data. Calculating of the number of page will be Ceil(numAttribute * numTuple * 4 / pageSize)
These code of choosing index can be found in /src/main/builder/LogicalPlanBuilder's chooseScanIndex method.
The choice of join order uses a dynamic programming approach described in the instruction.
The code can be found in /src/main/builder/JoinSequenceBuilder
-
We build a Object class to keep in track the state of current tree
-
We build a 2D table for Dynamic programming
- Base case - Leaf - The first row in the table
- Inductive case - Add root to the previous row, and update cost based on the previous row
- Result - find best option in the last row of the table by soring the cost
- Keep the same ouput format, we convert the list to a queue so that we don't need to modify the old join logic
-
V-Value
- We utilized union find to find all possible column combination
- And find the best(lowest) cost among all combinations
- Lower and upper bound are determined by the union find bounds and the bounds from the table stats
The choice of join operator is also following the example given in instruction:
"SMJ runs much faster than BNLJ, so I implement all joins as SMJ where possible. However, SMJ does not apply to joins that have other-than-equality comparisons or to pure cross-products, so those are implemented using BNLJ"
The code can be found in /src/main/builder/PhysicalPlanBuilder in public void visit(JoinOperatorNode node)
For previous phases, check README_Phase2.md.
This phase is focusing on index related functionalities.
Each node contains 4096 bytes flat, no need to worry about space exceeding.
- Triggered by
build()according to configuration. - If clustered index, it will sort the data and replace original relation file.
- Triggered by
IndexScanOperatoraccording to configuration. - Traverse from 'head' to leaf by default to prepare for tuple fetch.
- Two functions to handle each leaf / index accordinly.
- If clustered index, it will reset tuple reader at first time, then scan sequentially.
- If unclustered index, every fetch requires a set and get for the tuple reader
- the address of the root, stored at offset 0 on the header page
- the number of leaves in the tree, at offset 4
- the order of the tree, at offset 8.
- the integer 1 as a flag to indicate this is an index node (rather than a leaf node)
- the number of keys in the node
- the actual keys in the node, in order
- the addresses of all the children of the node, in order
- the integer 0 as a flag to indicate this is a leaf node
- the number of data entries in the node
- the serialized representation of each data entry in the node, in order.
- value of data entry
- number of rids
- [p, t] for rid.....
RelationName AttributeName isClusteredFlag TreeOrder
Sailors A 0 10
- first: input directory
- second: output directory
- third: temporary sort directory
- fourth: should build indexes (0 = no, 1 = yes)
- fifth: should evaluate the SQL queries (0 = no, 1 = yes).
-
first - Join algorithms:
0: TNLJ1 x: BNLJ with buffer size as x page2: SMJ
-
second - sort algorithms:
0: In-Memory Sort1 x: External Sort with buffer size as x page
-
third - use index:
0: use the full-scan implementation1: use indexes