docs: clarify advanced extensions #833
site/docs/extensions/index.md
Outdated
Advanced extensions come in several main forms, discussed below:

1. Embedded extensions: These use the `AdvancedExtension` message for adding custom data to existing Substrait elements
I'm not sure or don't remember whether `element` is commonly used across the doc. Now that I check again, advanced extensions are generously sprinkled over the proto file... :) That said, it would be clearer to mention the scope of the embedded extension, like `adding custom data to enclosing Substrait message (or element)`. The text above mentions protobuf and calls out `Any`, so I'm wondering whether we can stick to `Substrait message` rather than using `element`.
Makes sense, `message` makes more sense than `element` for consistency!
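
To make the scoping point concrete, here is a minimal Python sketch of an embedded extension attached to its enclosing message. The dataclasses are hypothetical stand-ins, not the generated Substrait protobuf classes. Only the names `advanced_extension`, `optimization`, and `enhancement` follow names that appear in this PR; `AnyPayload`, `table_name`, the type URL, and modelling `optimization` as a list are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnyPayload:
    """Hypothetical stand-in for protobuf Any: a type URL plus opaque bytes."""
    type_url: str
    value: bytes = b""


@dataclass
class AdvancedExtension:
    """Stand-in carrying the two kinds of embedded extension data."""
    # Assumption: optimizations are modelled as a list here.
    optimization: List[AnyPayload] = field(default_factory=list)
    enhancement: Optional[AnyPayload] = None


@dataclass
class ReadRel:
    """Stand-in for an enclosing message; the extension applies to it alone."""
    table_name: str  # simplified placeholder, not the real message layout
    advanced_extension: Optional[AdvancedExtension] = None


# The custom data rides on the enclosing message rather than standing alone.
read = ReadRel(
    table_name="orders",
    advanced_extension=AdvancedExtension(
        optimization=[AnyPayload("example.com/CacheHint")],  # hypothetical type URL
    ),
)
```

The point of the sketch is only that the extension has no meaning outside the message that carries it, which is why "message" reads more precisely than "element".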
site/docs/extensions/index.md
Outdated
| **`RelCommon`** | Extensions for any relational operator |
| **Relations** (e.g. `ProjectRel`) | Extensions for a specific relation type |
| **Hints** | Extensions within optimization hints |
| **`ReadRel.NamedTable`** | Add custom metadata to named table references |
nit: remove `Add` to be consistent with the rest of Usages? (i.e., either all start with verbs or all with nouns)
site/docs/extensions/index.md
Outdated
| **`RelCommon`** | Extensions for any relational operator |
| **Relations** (e.g. `ProjectRel`) | Extensions for a specific relation type |
| **Hints** | Extensions within optimization hints |
| **`ReadRel.NamedTable`** | Add custom metadata to named table references |
looks like `LocalFiles` also has an advanced extension.
site/docs/extensions/index.md
Outdated
| **Hints** | Extensions within optimization hints |
| **`ReadRel.NamedTable`** | Add custom metadata to named table references |
| **`WriteRel.NamedObjectWrite`** | Add custom metadata to write targets |
| **`DdlRel.NamedObjectWrite`** | Add custom metadata to DDL targets |
noticed some of the advanced extension tags are not consistent in these Ddl related messages. Just note to self.
- Provide hints to improve performance but don't change the meaning of operations
- Can be safely ignored by consumers that don't understand them
- Examples: memory usage hints, preferred algorithms, caching strategies
nit: example goes last?
site/docs/extensions/index.md
Outdated
- Modify the semantic behavior of operations
- Must be understood by consumers or the plan cannot be executed correctly
- Examples: custom aggregation logic, specialized join conditions, new relation types
relational operators? or just custom operators?
Yes, I can clarify a bit here!
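
The two bullet lists quoted in this thread imply different consumer behavior: unknown optimizations may be dropped, while an unknown enhancement means the plan cannot be executed correctly. A minimal Python sketch of that rule follows. The class and function names are hypothetical, extensions are reduced to type-URL strings for brevity, and none of this is the Substrait library API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class AdvancedExtension:
    # Type URLs only; payloads are omitted (hypothetical simplification).
    optimization: List[str] = field(default_factory=list)
    enhancement: Optional[str] = None


class UnsupportedEnhancement(Exception):
    """Raised when a plan carries an enhancement the consumer cannot interpret."""


def usable_optimizations(ext: Optional[AdvancedExtension], known: Set[str]) -> List[str]:
    """Return the optimizations this consumer understands.

    Unknown optimizations are silently skipped because they do not change
    the meaning of the operation. An unknown enhancement raises, because
    ignoring it would misinterpret the plan.
    """
    if ext is None:
        return []
    if ext.enhancement is not None and ext.enhancement not in known:
        raise UnsupportedEnhancement(ext.enhancement)
    return [url for url in ext.optimization if url in known]
```

For example, with `known = {"example.com/CacheHint"}`, an extension listing that hint plus an unrecognized one yields only the recognized hint, whereas an extension whose `enhancement` is unrecognized raises `UnsupportedEnhancement`.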
| Extension Type | Description |
| ------------------------------------ | ------------------------------------------------------------ |
| Relation Modification (semantic) | Extensions to an existing relation that will alter the semantics of that relation. These kinds of extensions require that any plan consumer understand the extension to be able to manipulate or execute that operator. Ignoring these extensions will result in an incorrect interpretation of the plan. An example extension might be creating a customized version of Aggregate that can optionally apply a filter before aggregating the data. <br /><br />Note: Semantic-changing extensions shouldn't change the core characteristics of the underlying relation. For example, they should *not* change the default direct output field ordering, change the number of fields output or change the behavior of physical property characteristics. If one needs to change one of these behaviors, one should define a new relation as described below. |
This old content looks good to me (at least the guideline parts). Perhaps we could reorganize it into each section and call the guidelines out as notes.
@yongchul - thanks for the review! Good comments - I've updated to address them. Can you give them a look? Thanks!
Thank you for taking suggestions! Looks good to me :)
The third form of advanced extensions allows you to define extension data sources and destinations:
This section can also have the `note`, like the sections above, about checking with the community before going down the extension path. If the scenario turns out to be common enough, it may go directly into the specification.
Thanks again! Incorporated. I also realized I had the |
The Advanced Extensions section of the docs seemed somewhat mismatched from what was in the protobufs, so I tried to clarify it, based on what was in the protobuf definitions.
Thoughts welcome! Also could use some sanity-checking on the details!
Naming
I also included custom relations (`ExtensionLeafRel`, …) and custom reads and writes (`ExtensionTable`, …) all under the heading "Advanced Extensions", even though there is an `AdvancedExtension` message that doesn't cover those. The name "Advanced Extension" seems a bit ambiguous here - does it cover all of the above, or only the enhancements and optimizations in the `AdvancedExtension` message? - but it seems to be what we have, so I went with it. Thoughts?

Guidance
This is a bit low on guidance on when to use which - e.g. `ExtensionLeafRel`, `ExtensionTable`, or `ReadRel.advanced_extension.enhancement` could potentially all be used for an unusual kind of read. I'm not sure what the guidance here should be, or if there should be any, so I left it out; I'm not sure any guidance here would be that helpful.