Skip to content

Conversation

yongchul
Copy link
Contributor

feat: support with ties in FetchRel

WITH TIES in SQL standard and FetchRel is missing the support.

Industry Adoption

  • Tranasct SQL variants implement WITH TIES in non-standard TOP.
  • Oracle, PostgreSQL, MariaDB, implement WITH TIES in ANSI SQL OFFSET ... FETCH ... clause.

All implementations require ORDER BY clause as it does not make sense to evaluate ties without specific order.

jcamachor
jcamachor previously approved these changes Mar 21, 2025
Copy link
Contributor

@jcamachor jcamachor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once there's consensus on the approach among reviewers, please also include the documentation update in site/docs/relations/logical_relations.md as part of this PR.

@EpsilonPrime
Copy link
Member

I suspect this will be discussed in the Substrait Community meeting this week. There has been some discussion around adding first class support for ordering. If we had ordering ties would merely be a boolean option in FetchRel. Concepts for first class ordering include required ordering (such as after a sort) and optional ordering (perhaps set via a hint in a ReadRel or after some other operation).

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on your point about

we could simply have a boolean field with_ties then requires immediate SortRel below FetchRel. However, such conditional inter-rel dependency is confusing and the consumers requiring extra validation or inference to deduce sort order as the operation is not self-contained.

I think it makes sense to include the ties in the FetchRel as you did to avoid inter-rel dependency.

I don't have a strong opinion on this but now that I enumerated the names, with_ties_sorts seems like a good one to me.

I think your idea for with_ties_sorts, along with an explanatory comments, should suffice. Adding a wrapping message feels a little redundant.

@yongchul
Copy link
Contributor Author

Hi @vbarua,

Based on your point about

we could simply have a boolean field with_ties then requires immediate SortRel below FetchRel. However, such conditional inter-rel dependency is confusing and the consumers requiring extra validation or inference to deduce sort order as the operation is not self-contained.

I think it makes sense to include the ties in the FetchRel as you did to avoid inter-rel dependency.

Glad to hear that you agree! :)

I don't have a strong opinion on this but now that I enumerated the names, with_ties_sorts seems like a good one to me.

I think your idea for with_ties_sorts, along with an explanatory comments, should suffice. Adding a wrapping message feels a little redundant.

Thank you! While I was revising the PR including the documentation, I realized that it is better to keep the desired sort separate from with ties (i.e., sorts and with_ties separate). If we do so, we can solely capture the semantic of (ORDER BY)? OFFSET? FETCH (WITH TIES)? in a single logical operator.

Additional benefit is that we can drop typically with desired orderedness from input.

The change is still completely backward compatible -- everything still holds if nothing is set but producers have freedom to express the intent without explicit sort.

PTAL!

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor grammatical suggestions but this seems good otherwise.

What should happen if there are no sort fields and with_ties is true?

  • The plan returns an error
  • The plan runs as if with_ties is false

I see that you state "At least one filed MUST be specified when with_ties is true" so I think the answer is "The plan returns an error" but I want to confirm.

If that is true, can we add something like "If with_ties is true then there must be at least one sort field or else the plan is invalid" to the .md file?

Copy link
Contributor Author

@yongchul yongchul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments.

@vbarua vbarua requested review from vbarua, westonpace and jcamachor May 21, 2025 17:57
Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

It makes sense, in my opinion, to include the sort fields on the FetchRel because the WITH TIES concept is heavily interlinked with the sort. This does potentially make the SortRel redundant, because every SortRel can be expressed as a FetchRel, but I think that's okay.

An alternative approach to this would be to encode this as some combination of

FetchRel[...]
   SortRel[...]

but that depends on systems understanding physical properties, specifically orderedness, and retaining/propagating them correctly to avoid losing the sort before the fetch during translation/optimization/execution. I don't think we're at a point were we can rely on this, and having the sort on the FetchRel makes it 100% unambiguous what needs to happen.

This does introduce a little bit of redundancy, because in theory all SortRels can be expressed as FetchRels now if we wanted, but I think that's okay.

//
// Note: the output records are in the order of `sorts` if at least one sort field is specified.
// Otherwise, the input orderedness is preserved.
repeated SortField sorts = 7;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During sync up, @jacques-n asked what is the minimal thing need to be done to support the scenario. The minimal is to specify list of column comparisons the same way as the backing Sort. In the Fetch requirement is weaker than Sort because we do not care the order across columns as well as directions but we do need to align how we calculate the equality of a column.

This (comparison, equality) happen two places in Substrait today: SortField and ComparisonJoinKey but they are separate.

Perhaps, we can do is

message EqualField {
  Expression.FieldReference field = 1;
  ComparisonJoinKey.ComparisonType comparison = 2;
}

message FetchRel {
  ..
  repeated EqualField tie_breakers = 7;
  bool with_ties = 8; // optional. maybe implicit based on non-empty `tie_breakers` field.
  ..
}

Then to implement with_ties, the producer must put the appropriate Sort below and setting up the tie breaking fields consistent to the Sort. I see a value in this approach (keeping a naive implementation of fetch simple) but not quite sure whether this is simpler than the proposed change, especially in the sense that a producer can concisely expression what needs to be done, and consumer has an option to implement or execute how.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants