-
Notifications
You must be signed in to change notification settings - Fork 180
feat: support with ties in FetchRel #797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once there's consensus on the approach among reviewers, please also include the documentation update in site/docs/relations/logical_relations.md as part of this PR.
I suspect this will be discussed in the Substrait Community meeting this week. There has been some discussion around adding first class support for ordering. If we had ordering ties would merely be a boolean option in FetchRel. Concepts for first class ordering include required ordering (such as after a sort) and optional ordering (perhaps set via a hint in a ReadRel or after some other operation). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on your point about
we could simply have a boolean field with_ties then requires immediate SortRel below FetchRel. However, such conditional inter-rel dependency is confusing and the consumers requiring extra validation or inference to deduce sort order as the operation is not self-contained.
I think it makes sense to include the ties in the FetchRel as you did to avoid inter-rel dependency.
I don't have a strong opinion on this but now that I enumerated the names, with_ties_sorts seems like a good one to me.
I think your idea for with_ties_sorts
, along with an explanatory comments, should suffice. Adding a wrapping message feels a little redundant.
…(OFFSET)? FETCH (WITH TIES)? semantic
Hi @vbarua,
Glad to hear that you agree! :)
Thank you! While I was revising the PR including the documentation, I realized that it is better to keep the desired sort separate from Additional benefit is that we can drop The change is still completely backward compatible -- everything still holds if nothing is set but producers have freedom to express the intent without explicit sort. PTAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor grammatical suggestions but this seems good otherwise.
What should happen if there are no sort fields and with_ties
is true?
- The plan returns an error
- The plan runs as if
with_ties
is false
I see that you state "At least one filed MUST be specified when with_ties
is true" so I think the answer is "The plan returns an error" but I want to confirm.
If that is true, can we add something like "If with_ties is true then there must be at least one sort field or else the plan is invalid" to the .md
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
It makes sense, in my opinion, to include the sort fields on the FetchRel because the WITH TIES concept is heavily interlinked with the sort. This does potentially make the SortRel redundant, because every SortRel can be expressed as a FetchRel, but I think that's okay.
An alternative approach to this would be to encode this as some combination of
FetchRel[...]
SortRel[...]
but that depends on systems understanding physical properties, specifically orderedness, and retaining/propagating them correctly to avoid losing the sort before the fetch during translation/optimization/execution. I don't think we're at a point were we can rely on this, and having the sort on the FetchRel makes it 100% unambiguous what needs to happen.
This does introduce a little bit of redundancy, because in theory all SortRels can be expressed as FetchRels now if we wanted, but I think that's okay.
// | ||
// Note: the output records are in the order of `sorts` if at least one sort field is specified. | ||
// Otherwise, the input orderedness is preserved. | ||
repeated SortField sorts = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During sync up, @jacques-n asked what is the minimal thing need to be done to support the scenario. The minimal is to specify list of column comparisons the same way as the backing Sort
. In the Fetch
requirement is weaker than Sort
because we do not care the order across columns as well as directions but we do need to align how we calculate the equality
of a column.
This (comparison, equality) happen two places in Substrait today: SortField
and ComparisonJoinKey
but they are separate.
Perhaps, we can do is
message EqualField {
Expression.FieldReference field = 1;
ComparisonJoinKey.ComparisonType comparison = 2;
}
message FetchRel {
..
repeated EqualField tie_breakers = 7;
bool with_ties = 8; // optional. maybe implicit based on non-empty `tie_breakers` field.
..
}
Then to implement with_ties
, the producer must put the appropriate Sort
below and setting up the tie breaking fields consistent to the Sort
. I see a value in this approach (keeping a naive implementation of fetch
simple) but not quite sure whether this is simpler than the proposed change, especially in the sense that a producer can concisely expression what needs to be done, and consumer has an option to implement or execute how.
feat: support with ties in FetchRel
WITH TIES
in SQL standard andFetchRel
is missing the support.Industry Adoption
WITH TIES
in non-standardTOP
.WITH TIES
in ANSI SQLOFFSET ... FETCH ...
clause.All implementations require
ORDER BY
clause as it does not make sense to evaluate ties without specific order.