-
Notifications
You must be signed in to change notification settings - Fork 290
Support gds var length label pruning #4868
base: master
Are you sure you want to change the base?
Conversation
| auto satisfyForwardPruning = forwardPruningFunc(srcTableID, dstTableID); | ||
| if (directionType == RelDirectionType::BOTH) { | ||
| if (satisfyForwardPruning || | ||
| (dstTableIDSet.contains(srcTableID) && srcTableIDSet.contains(dstTableID))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have a backwardPruningFunc in order to be consistent.
| if (rel.isRecursive()) { | ||
| auto nodeTableIDs = getNodeTableIDs(); | ||
| // there is no label on both sides | ||
| if (rel.getUpperBound() == 0 || (srcTableIDSet.size() == dstTableIDSet.size() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check srcTableIDSet.size() == dstTableIDSet.size() && dstTableIDSet.size() == nodeTableIDs.size() necessary? Seems your logic would still work in this case. So we would we return early?
| if (currentLength >= upperBound) { | ||
| return; | ||
| } | ||
| if (graph.contains(curTableID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be KU_ASSERT. Is there a case where graph not contains curTableID?
| recursiveInfo = std::move(recursiveInfo_); | ||
| } | ||
| const RecursiveInfo* getRecursiveInfo() const { return recursiveInfo.get(); } | ||
| RecursiveInfo* getRecursiveInfo() { return recursiveInfo.get(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename -> getRecursiveInfoUnsafe
src/function/gds/rec_joins.cpp
Outdated
| } | ||
|
|
||
| std::vector<table_id_set_t> RJBindData::getStepActiveRelTableIDs() const { | ||
| if (flipPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good time to rename flipPath to extendLeftToRight. Otherwise reading this part is a bit confusing.
| } | ||
| } | ||
| rel.setEntries(prunedEntries); | ||
| return stepActiveTableIDs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to summarize the logic here and pls correct me if I'm wrong.
Consider topology Person - knows -> Person and Person - livesIn -> City and query
MATCH (a:Person) - [*2..2] -> (b:City)
WLOG, we assume going from left to right.
The dfs part will first generate all size 2 path
<<knows>, <knows>>
<<knows>, <livesIn>>
and because <<knows>, <knows>> dst is not b:City it will be pruned.
So eventually we only compute <<knows>, <livesIn>>.
If my understanding is correct, can u also give a benchmark number in the PR description to make sure we actually get performance benefit.
|
|
||
| // There may be multiple rel type between src and dst | ||
| using PATH = std::vector<table_id_vector_t>; | ||
| std::vector<std::vector<PATH>> paths(upperBound + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to comment
std::vector<PATH> are paths of the same length.
And the outer-most std::vector are paths of different length.
In fact, it might be easier to read if just use std::vector<PATH> here.
| const table_id_set_t srcTableIDSet, const table_id_set_t dstTableIDSet, size_t lowerBound, | ||
| size_t upperBound, RelDirectionType relDirectionType) const { | ||
| // src-->[dst,[rels]] | ||
| std::unordered_map<table_id_t, std::unordered_map<table_id_t, table_id_vector_t>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to use but I would use std::unrdered_map<table_id_t, STRUCT>
and STRUCT contains
table_id_t dstTableID
std::vector<tabel_id_t> relTableIDs
because we always access by srcTableID instead of src&dstTableID
| // todo we need reset rel entries? | ||
| auto temp = mergeTableIDs(stepFromLeftTableIDs, stepFromRightTableIDs); | ||
| std::vector<TableCatalogEntry*> newRelEntries{temp.begin(), temp.end()}; | ||
| if (!isSameTableCatalogEntryVector(newRelEntries, rel.getEntries())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we do l284-l312 as a separate PR because
- it can be viewed as another optimization after pruning recursive rel labels; and
- we will also need to get a benchmark number to make sure there are performance improvements for node pruning.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4868 +/- ##
==========================================
+ Coverage 85.80% 85.83% +0.02%
==========================================
Files 1397 1397
Lines 60504 60614 +110
Branches 7451 7473 +22
==========================================
+ Hits 51917 52026 +109
- Misses 8406 8407 +1
Partials 181 181 ☔ View full report in Codecov by Sentry. |
|
I found gds join performance worse than before, there are so many memory minor fault. This opt does not provide any performance improvement in current code. Under recursive join ,this opt can get 900ms -->300ms in sf100,"match (a:person {id:24189255912498})-[f*2]-(b) return count(1)" (Previous test results); I will provide performance benchmark, when i resolve the current performance problem. |
c5130de to
78f3840
Compare
78f3840 to
2ae6a39
Compare
Description
Fixes #4396
Contributor agreement