Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,21 @@ message FetchRel {
// Recommended type for count is int64.
Expression count_expr = 6;
}

// List of fields to sort by to retrieve the number of records specified by `count_mode`.
// At least one field MUST be specified when `with_ties` is true.
//
// Note: the output records are in the order of `sorts` if at least one sort field is specified.
// Otherwise, the input orderedness is preserved.
repeated SortField sorts = 7;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During sync up, @jacques-n asked what is the minimal thing need to be done to support the scenario. The minimal is to specify list of column comparisons the same way as the backing Sort. In the Fetch requirement is weaker than Sort because we do not care the order across columns as well as directions but we do need to align how we calculate the equality of a column.

This (comparison, equality) happen two places in Substrait today: SortField and ComparisonJoinKey but they are separate.

Perhaps, we can do is

message EqualField {
  Expression.FieldReference field = 1;
  ComparisonJoinKey.ComparisonType comparison = 2;
}

message FetchRel {
  ..
  repeated EqualField tie_breakers = 7;
  bool with_ties = 8; // optional. maybe implicit based on non-empty `tie_breakers` field.
  ..
}

Then to implement with_ties, the producer must put the appropriate Sort below and setting up the tie breaking fields consistent to the Sort. I see a value in this approach (keeping a naive implementation of fetch simple) but not quite sure whether this is simpler than the proposed change, especially in the sense that a producer can concisely expression what needs to be done, and consumer has an option to implement or execute how.


// Whether to yields 'tie' records of the last record specified by `count_mode`.
// Tie is determined by the `sorts` order (i.e., `sorts` field values are the same if records are 'tie').
// If this is true, `sorts` MUST specify at least one sort field or else the plan is invalid.
// If this is true and there are more input records specified by the limit `N`, defined by `count_mode`,
// then more than `N` records may be returned if there are ties of the `N`th record.
bool with_ties = 8;

substrait.extensions.AdvancedExtension advanced_extension = 10;
}

Expand Down
6 changes: 4 additions & 2 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -349,16 +349,18 @@ The fetch operation eliminates records outside a desired window. Typically corre
| -------------------- | --------------------------------------- |
| Inputs | 1 |
| Outputs | 1 |
| Property Maintenance | Maintains distribution and orderedness. |
| Property Maintenance | Maintains input distribution. Orderedness is `Sort Fields` if specified. Otherwise, input orderedness. |
| Direct Output Order | Unchanged from input. |

### Fetch Properties

| Property | Description | Required |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------- |
| Input | A relational input, typically with a desired orderedness property. | Required |
| Input | A relational input. | Required |
| Offset Expression | An expression which evaluates to a non-negative integer or null (recommended type is `i64`). Declares the offset for retrieval of records. An expression evaluating to null is treated as 0. | Optional, defaults to a 0 literal. |
| Count Expression | An expression which evaluates to a non-negative integer or null (recommended type is `i64`). Declares the number of records that should be returned. An expression evaluating to null indicates that all records should be returned. | Optional, defaults to a null literal. |
| Sort Fields | List of one or more sort fields to define a desired orderedness. If 'With Ties' is true then there must be at least one sort field or else the plan is invalid. | Optional, the default will preserve the input orderedness |
| With Ties | Whether to return "tied rows" which are rows equal to last row that would be returned by 'Count Expression'. If false, at most 'Count Expression' records are returned. If true, it may yield more records than 'Count Expression'. | Optional, defaults to false. |

=== "FetchRel Message"

Expand Down
Loading