Add ImpactRangeQuery for Impact-Based Document Range Prioritization #15023

atris · 2025-08-01T09:41:15Z

Implements a query wrapper that prioritizes document ranges based on their
scoring potential using Lucene's impact information. The implementation
divides the document space into ranges and evaluates each range's maximum
possible score using ImpactsEnum data, then processes ranges in descending
order of scoring potential.
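
The ordering step described above can be sketched without any Lucene dependency. This is a minimal illustration under stated assumptions, not the PR's code: the `maxScores` array stands in for the per-range upper bounds that the real implementation would derive from ImpactsEnum, and `RangeOrderSketch`, `DocRange`, and `prioritize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RangeOrderSketch {
  /** A half-open doc ID range [minDoc, maxDoc) with an upper bound on its score. */
  static final class DocRange {
    final int minDoc;
    final int maxDoc;
    final float maxScore;

    DocRange(int minDoc, int maxDoc, float maxScore) {
      this.minDoc = minDoc;
      this.maxDoc = maxDoc;
      this.maxScore = maxScore;
    }
  }

  /**
   * Splits [0, maxDoc) into fixed-size ranges and orders them by descending upper bound.
   * Assumption: maxScores[i] is the score upper bound of the i-th range.
   */
  static List<DocRange> prioritize(int maxDoc, int rangeSize, float[] maxScores) {
    List<DocRange> ranges = new ArrayList<>();
    for (int start = 0, i = 0; start < maxDoc; start += rangeSize, i++) {
      int end = Math.min(start + rangeSize, maxDoc);
      ranges.add(new DocRange(start, end, maxScores[i]));
    }
    // Highest scoring potential first; ties keep doc ID order because List.sort is stable.
    ranges.sort(Comparator.comparingDouble((DocRange r) -> r.maxScore).reversed());
    return ranges;
  }
}
```

With `maxDoc = 10`, `rangeSize = 4`, and bounds `{1.0f, 3.0f, 2.0f}`, the ranges come back as [4, 8), [8, 10), [0, 4).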

Key features:

Supports configurable range sizes and minimum and maximum document bounds.
Uses ImpactsEnum when ScoreMode.TOP_SCORES indicates impacts are available.
Falls back to standard scoring when impacts are unavailable.
Supports early termination based on competitive scoring thresholds.
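
The early-termination feature can be illustrated with a small standalone sketch. It assumes the ranges are already sorted by descending upper bound and models scoring with plain arrays; `EarlyTerminationSketch`, `scoreRanges`, and the `docScores` input are hypothetical and do not reflect how the PR wires this into a collector.

```java
import java.util.PriorityQueue;

final class EarlyTerminationSketch {
  /**
   * Returns how many ranges were scored before stopping.
   * Assumptions: rangeMaxScores is sorted in descending order, and docScores[i]
   * holds the scores of the documents in range i (a stand-in for real scoring).
   */
  static int scoreRanges(float[] rangeMaxScores, float[][] docScores, int topK) {
    if (topK <= 0) {
      return 0;
    }
    // Min-heap holding the best topK scores seen so far; its head is the threshold.
    PriorityQueue<Float> topScores = new PriorityQueue<>();
    int scored = 0;
    for (int i = 0; i < rangeMaxScores.length; i++) {
      boolean full = topScores.size() == topK;
      // Strict comparison keeps ranges whose bound ties the threshold, since a tie
      // might still win on doc ID when ranges are visited out of order
      // (tie-breaking itself is not modeled here).
      if (full && rangeMaxScores[i] < topScores.peek()) {
        break; // every remaining range has an equal or lower bound
      }
      for (float s : docScores[i]) {
        if (topScores.size() < topK) {
          topScores.add(s);
        } else if (s > topScores.peek()) {
          topScores.poll();
          topScores.add(s);
        }
      }
      scored++;
    }
    return scored;
  }
}
```

Because the bounds are sorted, the first range that cannot beat the current threshold ends the loop; none of the later ranges need to be opened.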

This optimization is particularly beneficial for indices where document
clustering (BP-style ordering) groups similar documents with adjacent IDs,
allowing efficient skipping of low-scoring document ranges.

Signed-off-by: Atri Sharma <[email protected]>

github-actions · 2025-08-01T09:42:07Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.


atris · 2025-08-01T09:52:30Z

@jpountz Please review

github-actions · 2025-08-01T09:53:07Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

jpountz · 2025-08-01T13:36:37Z

Very cool. Have you been able to measure any speedup with this approach?

FYI, this breaks some API contracts, e.g. a BulkScorer is expected to score ranges of doc IDs in doc ID order. You would need to create a new BulkScorer every time that you need to go back.

Somewhat related, I think that implementing this via a helper function that evaluates a query against an entire index would be better than a query wrapper, since you'd be subject to fewer expectations. E.g. IndexSearcher sometimes splits the doc ID space into ranges of increasing size via TimeLimitingBulkScorer. This would likely not play well with this change, which really expects to score the whole index at once. A helper would also allow you to order ranges by priority across all segments and not only within a single segment.
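
One possible reading of the two suggestions above is a helper that ranks ranges across all segments and asks for a fresh forward-only scorer for every range, so no scorer is ever asked to move backwards. The sketch below is only an illustration of that shape: `CrossSegmentSketch`, `SegmentRange`, `ForwardScorer`, and the scorer factory are hypothetical, and the factory merely stands in for whatever creates a new BulkScorer for a segment in real Lucene code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

final class CrossSegmentSketch {
  /** A doc ID range [minDoc, maxDoc) inside one segment, with a score upper bound. */
  static final class SegmentRange {
    final int segment;
    final int minDoc;
    final int maxDoc;
    final float maxScore;

    SegmentRange(int segment, int minDoc, int maxDoc, float maxScore) {
      this.segment = segment;
      this.minDoc = minDoc;
      this.maxDoc = maxDoc;
      this.maxScore = maxScore;
    }
  }

  /** Hypothetical forward-only scorer: used for one range, visited in doc ID order. */
  interface ForwardScorer {
    void score(int minDoc, int maxDoc, List<String> visited);
  }

  /** Visits ranges from all segments in descending order of their upper bound. */
  static List<String> run(List<SegmentRange> ranges, IntFunction<ForwardScorer> scorerFactory) {
    List<SegmentRange> ordered = new ArrayList<>(ranges);
    ordered.sort(Comparator.comparingDouble((SegmentRange r) -> r.maxScore).reversed());
    List<String> visited = new ArrayList<>();
    for (SegmentRange r : ordered) {
      // A new scorer per range: going back to an earlier doc ID in the same
      // segment never reuses a scorer that has already advanced past it.
      ForwardScorer scorer = scorerFactory.apply(r.segment);
      scorer.score(r.minDoc, r.maxDoc, visited);
    }
    return visited;
  }
}
```

Creating a scorer per range has a cost, so in practice the range size would need to be large enough to amortize it; the sketch does not attempt to measure that trade-off.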

Initial commit

fc776bf

Signed-off-by: Atri Sharma <[email protected]>

github-actions bot added the module:misc label Aug 1, 2025

Update lint

90ae22d

Signed-off-by: Atri Sharma <[email protected]>
