feat(streaming): support locality enforcement and locality backfill #23275
Conversation
…input has a shuffle. It is used to ensure the locality provider is in its own fragment
Pull Request Overview
This PR introduces locality enforcement and locality backfill to RisingWave's streaming engine, extending the existing index selection functionality to improve performance during large historical data backfilling in memory-limited scenarios.
Key changes include:
- Implementation of LocalityProvider operators that buffer data with locality column ordering during backfill
- Extension of dependency ordering to ensure proper sequencing between scan backfill and locality backfill operations
- Addition of session configuration to enable/disable locality backfill functionality
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/stream/src/executor/locality_provider.rs | Core executor implementation for locality-aware backfilling with proper state management |
| src/stream/src/from_proto/locality_provider.rs | Protocol buffer conversion and executor builder for LocalityProvider |
| src/frontend/src/optimizer/plan_node/stream_locality_provider.rs | Stream plan node implementation with state and progress table catalog building |
| src/frontend/src/optimizer/plan_node/logical_locality_provider.rs | Logical plan node with column pruning and predicate pushdown support |
| src/meta/src/stream/stream_graph/fragment.rs | Fragment dependency analysis for LocalityProvider ordering |
| src/meta/src/model/stream.rs | Backfill upstream type classification for LocalityProvider |
| src/common/src/session_config/mod.rs | Session configuration parameter for enabling locality backfill |
| proto/stream_plan.proto | Protocol buffer definition for LocalityProviderNode |
Comments suppressed due to low confidence (1)
src/stream/src/executor/locality_provider.rs:1
- The magic number 1024 should be defined as a named constant or made configurable to improve maintainability and allow tuning.
}
}

// TODO: truncate the state table after backfill.
This TODO comment indicates incomplete functionality. State table truncation after backfill should be implemented or tracked in a proper issue management system.
Suggested change:
- // TODO: truncate the state table after backfill.
+ // Truncate the state table after backfill to free resources.
+ state_table.truncate().await?;
This stack of pull requests is managed by Graphite.
This PR is ready to review.
Very cool, I'll take a look tomorrow!
This pull request introduces a new "locality backfill" feature for streaming queries, which enables more efficient backfilling by grouping and buffering input data using specified locality columns. The changes span the SQL test suite, session configuration, planner, protobuf definitions, and catalog metadata to support this feature end-to-end. The most important changes are grouped below:
- Locality Backfill Feature Implementation
- Configuration and Test Coverage
- Catalog and Utility Updates
These changes collectively enable more granular and efficient backfill operations in streaming queries, controlled by a session-level configuration and fully integrated into the query planner and execution engine.
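To make "grouping and buffering input data using specified locality columns" concrete, here is a minimal, self-contained Rust sketch of the idea; it is illustrative only (the names are invented, and the actual LocalityProvider executor persists the buffer in a state table and handles barriers, epochs, and progress reporting):

use std::collections::BTreeMap;

/// Illustrative row type: a vector of integer columns.
type Row = Vec<i64>;

/// Buffers rows ordered by the chosen locality columns, then replays them in that order,
/// so downstream stateful operators see rows with the same locality key together.
struct LocalityBuffer {
    locality_cols: Vec<usize>,
    /// Key: values of the locality columns; value: buffered rows sharing that key.
    buffer: BTreeMap<Vec<i64>, Vec<Row>>,
}

impl LocalityBuffer {
    fn new(locality_cols: Vec<usize>) -> Self {
        Self { locality_cols, buffer: BTreeMap::new() }
    }

    fn push(&mut self, row: Row) {
        let key: Vec<i64> = self.locality_cols.iter().map(|&i| row[i]).collect();
        self.buffer.entry(key).or_default().push(row);
    }

    /// Drain the buffered rows grouped and ordered by locality key.
    fn drain_ordered(self) -> impl Iterator<Item = Row> {
        self.buffer.into_values().flatten()
    }
}

fn main() {
    let mut buf = LocalityBuffer::new(vec![0]); // locality column: column 0
    for row in [vec![2, 10], vec![1, 20], vec![2, 30], vec![1, 40]] {
        buf.push(row);
    }
    // Prints rows grouped by column 0: [1, 20], [1, 40], [2, 10], [2, 30]
    for row in buf.drain_ordered() {
        println!("{row:?}");
    }
}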
I'm wondering if it's possible to split this PR into smaller ones to make the review process easier?
Looks great to me! Reviewed all files except stream.
kill_cluster
}

test_locality_backfill() {
Theoretically, if we push this into a larger scope or even enable this by default, then
- we can run all e2e tests under this mode
- all existing planner tests need to be updated
May I know your plan for this?
pub fn fields_pretty<'a>(&self) -> Vec<(&'a str, pretty_xmlish::Pretty<'a>)> {
    let locality_columns_str = format!("{:?}", self.locality_columns);
    vec![("locality_columns", locality_columns_str.into())]
}
Shall we show the field name?
Some(
    LogicalLocalityProvider::new(
        self.clone_with_left_right(self.left(), self.right()).into(),
        columns.to_owned(),
    )
    .into(),
)
Just realized that we generate a LocalityProvider only until we encounter an upstream with different locality. If I'm understanding correctly, we will generate a plan like
Join -> LocalityProvider -> Filter -> Project -> Agg
instead of
Join -> Filter -> Project -> LocalityProvider -> Agg
Does it mean that we need to buffer unnecessary data that could have been filtered out by those stateless executors?
if let Some(better_plan) = self.try_better_locality_inner(columns) {
    Some(better_plan)
} else if self.ctx().session_ctx().config().enable_locality_backfill() {
    Some(LogicalLocalityProvider::new(self.clone().into(), columns.to_owned()).into())
This pattern seems to appear multiple times. Is there any way to extract it?
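One possible extraction, sketched against the names visible in this diff (PlanRef, LogicalLocalityProvider, and the enable_locality_backfill session config accessor); the helper name and exact receiver type are assumptions, so treat it as a shape suggestion rather than a drop-in patch:

// Hypothetical helper, not part of the PR.
fn better_locality_or_provider(
    this: PlanRef,
    columns: &[usize],
    better: Option<PlanRef>,
) -> Option<PlanRef> {
    if let Some(better_plan) = better {
        Some(better_plan)
    } else if this.ctx().session_ctx().config().enable_locality_backfill() {
        // Fall back to enforcing locality by buffering in a LocalityProvider.
        Some(LogicalLocalityProvider::new(this, columns.to_owned()).into())
    } else {
        None
    }
}

// A call site would then read roughly:
// better_locality_or_provider(self.clone().into(), columns, self.try_better_locality_inner(columns))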
// If the input is hash-distributed, we make it a UpstreamHashShard distribution
// just like a normal table scan. It is used to ensure locality provider is in its own fragment.
Does it mean that this is actually a workaround for the backfill order control to work correctly? I guess we can keep the HashShard and omit one shuffle if it's solely for correctness.
    catalog_builder.add_order_column(*locality_col_idx, OrderType::ascending());
}
// add streaming key of the input as the rest of the primary key
if let Some(stream_key) = input.stream_key() {
Suggested change:
- if let Some(stream_key) = input.stream_key() {
+ input.expect_stream_key()
// Set locality columns as primary key.
for locality_col_idx in self.locality_columns() {
    catalog_builder.add_order_column(*locality_col_idx, OrderType::ascending());
Just realized that ideally we should also have Order as a requirement for "better locality".
/// Schema: | vnode | pk(locality columns + input stream keys) | `backfill_finished` | `row_count` |
/// Key: | vnode | pk(locality columns + input stream keys) |
fn build_progress_catalog(&self, state: &mut BuildFragmentGraphState) -> TableCatalog {
Can we unify this with other backfill nodes?
}
}

// 2. Add dependencies: all backfill fragments should run before LocalityProvider fragments
Is it necessary, or just for the sake of simplicity? For the example below, it seems that we can backfill Locality and Fact simultaneously.
Dim -> Locality
                -> Join
Fact
BackfillUpstreamType::MView => mv_count += 1,
BackfillUpstreamType::Source => source_count += 1,
BackfillUpstreamType::Values => (),
BackfillUpstreamType::LocalityProvider => mv_count += 1, /* Count LocalityProvider as an MView for progress */
Wondering if there's (or we should add) a way to inspect the progress of all backfill (and locality provider) nodes separately? This seems to benefit the observability of backfill order control a lot, especially since we will now "generate" or "manipulate" the backfill order to also handle the locality provider nodes.
We currently have fragment-level backfill observability, in the form of rw_fragment_backfill_progress. Because LocalityProvider is also dedicated to a single fragment, we can reuse this mechanism to track the progress, provided LocalityProvider also has a progress state table. We just need to update the system catalog.
@@ -0,0 +1,32 @@
# This file is automatically generated. See `src/frontend/planner_test/README.md` for more information.
Can we add a test for NoShuffleExchange? We should make sure that the LocalityProvider lies in a different fragment if no-shuffle exchange is used.
// Force a no shuffle exchange to ensure locality provider is in its own fragment.
// This is important to ensure the backfill ordering can recognize and build
// the dependency graph among different backfill-needed fragments.
StreamExchange::new_no_shuffle(input).into()
We need to test this.
Executor part looks good to me.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Introduce the try_better_locality method to enforce locality for an operator if it can't provide it. The operator is called LocalityProvider (a rough sketch of the hook's shape follows below). Currently, we only generate this operator for LogicalJoin and LogicalScan. With this ability, users don't even need to create indexes by themselves; but if users know their workload well, they can still create indexes to share them across different jobs.
LocalityProvider operators could depend on each other. We extend the dependency ordering control based on feat(meta,frontend,streaming): support fixed backfill order control #20967.
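For readers without the full diff, a rough sketch of the hook's shape, inferred from call sites quoted earlier in this thread (e.g. try_better_locality_inner(columns)); the exact trait and signature are an assumption:

// Inferred sketch, not copied from the PR: given the columns a downstream operator wants
// locality on, return a plan that provides that locality (e.g. an index scan, or the input
// wrapped in a LocalityProvider), or None if nothing better can be offered.
fn try_better_locality(&self, columns: &[usize]) -> Option<PlanRef>;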
Performance
TPCH Q18 with scale = 1 GB.
We limit the compute node memory to 2 GB.
Backfill throughput
With locality backfill, the backfill finishes in about 7 minutes, while without locality backfill the job is too slow to finish.
Cache miss ratio
With locality backfill, our cache miss ratio is much lower than without it.
Checklist
Documentation
Release note