-
Notifications
You must be signed in to change notification settings - Fork 180
feat: add relations to ExtendedExpression #794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
What's a usecase were we would want to allow an ExtendedExpression message to reference a relation? My understanding is that ExtendedExpressions are primarily intended as a way to permit the encoding of expression only based messages. If there's an ExtendedExpression that references a full relation, I would prefer to push users towards using a standard Plan at that point. |
Moving my comment from #726 to here since it made more sense. I'm assuming the I'm also wondering:
|
My immediate use case might be a bit unusual. I'm using ExtendedExpression messages as necessary intermediary steps in substrait-python builders while incrementally building up plans. see an example. In order for some sort of a subquery expression to refer to a CTE, you would need to pass that information through, hence the need for a list of relations in ExtendedExpressions.
My understanding is that they encode computation (similar to plans) where the final form of the compute is an expression rather than a relation, so that they can be (re)used in different contexts. Take something like |
I'm not sure I fully follow the question tbh. The intention is to enable the use of CTEs in the context of
ReferenceRels by definition right now refer to a Rel defined in the |
I think I can follow that the
Naively, I think an Edit: I now see that on the substrait website |
@drin "resolver" functions are basically closures of type UnboundExtendedExpression that return ExtendedExpression objects. In this example they are meant to be used as indermediate step only, the actual goal is to set necessary |
Circling back to this as we talked about this a bit at the last Substrait Sync.
ExtendedExpressions were intended as sugar to that targets expression only evaluation. Once you start introducing CTEs into the mix, the guidance at the moment would be to use a full Plan relation instead. If we added relation handling to ExtendedExpressions, there wouldn't be much of a difference between a Plan and and ExtendedExpression really. In terms of your usecase
This is something that we would push back on. Specifically, making modifications to the serialization definitions to accommodate library internals. Most of the libraries end up implementing a native data layer, like the expressions and relations classes in substrait-java, and the interface and structs in substrait-go to make manipulations easier. |
I don't disagree actually 😄 although to be fair I never claimed that my immediate use case justified the change, it was simply what prompted me to open a PR. (it's also why I didn't mention it in the PR description)
This sounds very ambiguous to me, what does "expression-only" mean in this context? subqueries are expressions after all, aren't they? I just listened to the relevant part of the meeting recording and tbh it's still fuzzy to me where one would draw the line for the original intent of extended expressions. Are subqueries containing Rels okay? Or is it only subqueries containing ReferenceRels specifically that violate the intent? Whatever the answer it feels like an arbitrary line in the sand that's bound to be violated one way or another anyway. I still don't get what's the issue is with simply allowing all valid expressions.
I wholeheartedly disagree on this one. Of course I can't speak to the original intent, but the way I see it the primary distinction between a Plan and an ExtendedExpression is that the latter can hold the expressions w/o specifying the full context other than the base_schema.
|
Fair fair 😅
Thinking about it, I'm not super wedded the original intent. Use cases evolve, and I'm happy for Substrait to change with them.
I agree that's its ambiguous, and we don't have a good line for this The original intent may have been to not permit subqueries, but maybe that isn't a good restriction. I am a little strapped for time right now, so I can't fully digests your argument, but I am open changing my mind on this. |
This kind of makes ExtendedExpression feel like it is colliding with Plan. I wonder if there is a way instead to deprecate ExtendedExpression and introduce a way to just add Expression to Plan. One option would be PlanRel oneof adds Expression. Would love people's thoughts on that. |
I agree that it makes a lot of sense. I've previously tried to map out how it would look like. It's a bit more involved than simply adding an Expression to PlanRel, but definitely doable.
In short, I think we will end up with something like this:
There are probably better ways to structure a combined Plan (for example extracting out relations into a separate array and just leave a single oneof to refer to either |
Adds
relations
field toExtendedExpression
message. This is necessary because Expressions may contain ReferenceRels that currently can't point to anything. I chose to addrelations
with arepeated ExtendedExpressionRel
type rather than arepeated Rel
because I'm anticipating the merge of #726 which will require this message to also contain an anchor.