rkrishn7
Contributor

Which issue does this PR close?

Closes #17511

Rationale for this change

When building equal conditions in a data source node, we want to ignore any stale references to columns that may have been swapped out (e.g. from try_swapping_with_projection).

The current code reassigns predicate columns from the filter to refer to the corresponding ones in the updated schema. However, it only ignores non-projected columns. reassign_predicate_columns builds an invalid column expression (with index usize::MAX) if the column is not projected in the current schema. We don't want to refer to this in the equal conditions we build.

What changes are included in this PR?

Ignores any binary expressions that reference non-existent columns in the current schema (e.g. due to unnecessary projections being removed).
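
To illustrate the idea, here is a minimal, self-contained sketch using toy types rather than the real DataFusion API. `Column`, `Expr`, `reassign_columns`, and `equal_conditions` are hypothetical stand-ins written for this example. The sketch only assumes the behavior described above: a column missing from the current schema is rewritten with index `usize::MAX`, and any equality that references such a column is skipped.

```rust
/// Toy stand-in for a physical column reference (hypothetical, not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    index: usize,
}

/// Toy expression tree with just enough variants for the example.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(Column),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Convenience constructor for a column expression.
fn col(name: &str, index: usize) -> Expr {
    Expr::Column(Column { name: name.to_string(), index })
}

/// Re-point every column at its position in `schema`, matching by name.
/// A column that is not in the schema gets the sentinel index `usize::MAX`,
/// mirroring the behavior described in this PR.
fn reassign_columns(expr: &Expr, schema: &[&str]) -> Expr {
    match expr {
        Expr::Column(c) => {
            let index = schema
                .iter()
                .position(|n| *n == c.name.as_str())
                .unwrap_or(usize::MAX);
            Expr::Column(Column { name: c.name.clone(), index })
        }
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(reassign_columns(l, schema)),
            Box::new(reassign_columns(r, schema)),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(reassign_columns(l, schema)),
            Box::new(reassign_columns(r, schema)),
        ),
    }
}

/// True if every column in `expr` has an index inside the current schema.
fn all_columns_valid(expr: &Expr, schema_len: usize) -> bool {
    match expr {
        Expr::Column(c) => c.index < schema_len,
        Expr::Literal(_) => true,
        Expr::Eq(l, r) | Expr::And(l, r) => {
            all_columns_valid(l, schema_len) && all_columns_valid(r, schema_len)
        }
    }
}

/// Flatten nested `And` nodes into a list of conjuncts.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjunction(l, out);
            split_conjunction(r, out);
        }
        other => out.push(other.clone()),
    }
}

/// Collect equality conjuncts, skipping any that reference a stale column.
fn equal_conditions(predicate: &Expr, schema: &[&str]) -> Vec<Expr> {
    let reassigned = reassign_columns(predicate, schema);
    let mut conjuncts = Vec::new();
    split_conjunction(&reassigned, &mut conjuncts);
    conjuncts
        .into_iter()
        .filter(|e| matches!(e, Expr::Eq(_, _)) && all_columns_valid(e, schema.len()))
        .collect()
}
```

With a predicate `a = 1 AND b = 2` and a current schema of `["a", "c"]`, `equal_conditions` keeps only `a = 1`, because `b` is no longer in the schema and would otherwise carry the sentinel index.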

Are these changes tested?

Are there any user-facing changes?

N/A

github-actions bot added the datasource (Changes to the datasource crate) label on Sep 12, 2025
@rkrishn7
Contributor Author

The fact that reassign_predicate_columns can return invalid column expressions seems like a big footgun. I didn't change it in this PR because it is used elsewhere, but we might want to think about refactoring it.

@rkrishn7
Contributor Author

cc @adriangb I think this was inadvertently introduced in #17323

@rkrishn7
Contributor Author

Re-ran TPCH benchmark with the same configuration as the referenced issue and all the tests pass now.

Will add a regression test here in a bit!

Development

Successfully merging this pull request may close these issues.

TPCH(sf=1) Q10 and Q19 failures when running with datafusion.execution.parquet.pushdown_filters