chenzl25 (Contributor) commented Sep 15, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Introduce a new type of explain: explain (advisor) + sql.
  • Recommend an index for backfill to users via a notice message.

Example

dev=>  create table t (a int, b int, c int);

dev=> explain (advisor) create materialized view v as select count(*) from t where c > 1 group by a;
NOTICE:  To speed up the backfilling, consider creating an index: CREATE INDEX "__recommended_idx_of_t_a" ON "t" ("a")
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [count, t.a(hidden)], stream_key: [t.a], pk_columns: [t.a], pk_conflict: NoCheck }
 └─StreamProject { exprs: [count, t.a] }
   └─StreamHashAgg { group_key: [t.a], aggs: [count] }
     └─StreamExchange { dist: HashShard(t.a) }
       └─StreamProject { exprs: [t.a, t._row_id] }
         └─StreamFilter { predicate: (t.c > 1:Int32) }
           └─StreamTableScan { table: t, columns: [a, _row_id, c] }
(7 rows)
dev=> CREATE INDEX "__recommended_idx_of_t_a" ON "t" ("a");
CREATE_INDEX
dev=> explain create materialized view v as select count(*) from t where c > 1 group by a;
                                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [count, __recommended_idx_of_t_a.a(hidden)], stream_key: [__recommended_idx_of_t_a.a], pk_columns: [__recommended_idx_of_t_a.a], pk_conflict: NoCheck }
 └─StreamProject { exprs: [count, __recommended_idx_of_t_a.a] }
   └─StreamHashAgg { group_key: [__recommended_idx_of_t_a.a], aggs: [count] }
     └─StreamProject { exprs: [__recommended_idx_of_t_a.a, __recommended_idx_of_t_a.t._row_id] }
       └─StreamFilter { predicate: (__recommended_idx_of_t_a.c > 1:Int32) }
         └─StreamTableScan { table: __recommended_idx_of_t_a, columns: [a, t._row_id, c] }
(6 rows)
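
The recommended statement in the notice above follows a visible pattern: the index name is `__recommended_idx_of_` plus the table and column names, and every identifier is double-quoted. A minimal sketch of how such a statement could be assembled is shown here. The function names `quote_ident` and `recommended_index_sql` are hypothetical, and joining several columns with `_` is an assumption, since the example only shows a single column; this is not the PR's actual implementation.

```rust
/// Quote a SQL identifier with double quotes, doubling any embedded quote.
fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Build a `CREATE INDEX` suggestion for `table` over `columns`.
/// Hypothetical helper: the name pattern mirrors the notice in the example;
/// joining multiple columns with `_` is an assumption of this sketch.
fn recommended_index_sql(table: &str, columns: &[&str]) -> Option<String> {
    if columns.is_empty() {
        // Nothing to index on, so there is nothing to recommend.
        return None;
    }
    let name = format!("__recommended_idx_of_{}_{}", table, columns.join("_"));
    let cols: Vec<String> = columns.iter().map(|c| quote_ident(c)).collect();
    Some(format!(
        "CREATE INDEX {} ON {} ({})",
        quote_ident(&name),
        quote_ident(table),
        cols.join(", ")
    ))
}
```

Calling `recommended_index_sql("t", &["a"])` yields the same statement text that appears in the notice of the example.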

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

github-actions bot added the type/feature label Sep 15, 2025

BugenZhao (Member) left a comment

Personally I'm not sure if it's a good idea to provide such a recommendation in the SQL interface. Using indexes to accelerate queries (no matter batch or streaming) seems to be an advanced practice: typically users need to understand the locality of their query and learn to inspect & comprehend query plans with EXPLAIN commands. Given that for batch queries this topic is only covered in documentation, maybe we should align with that for streaming queries as well. 🤔 cc @hzxa21 for discussion

Comment on lines 731 to 732
if self.table_indexes().is_empty() {
    self.notice_recommended_index(columns);

Member

What if the primary key of this table scan already has the best locality? Do we still recommend an index here?

Member

Also, do we still send a notice when the user is actually going to create an index?

Contributor Author

> What if the primary key of this table scan already has the best locality? Do we still recommend an index here?

Yes. We check it inside notice_recommended_index.

Contributor Author

> Also, do we still send a notice when the user is actually going to create an index?

What do you mean by "when the user is actually going to create an index"? If users have created the related indexes, we will select those indexes, so we won't recommend one. This PR only recommends an index when no existing index can be used.

Member

I mean, when the user follows the instruction and issues a CREATE INDEX command, will they still receive a notice for the TableScan of the index job? 😆

Contributor Author

No, because we only recommend an index when no existing index can be used. Once users create the index we recommend, that index is guaranteed to be usable.
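
The rule discussed in this thread (skip the recommendation when the primary key already gives good locality, and when an existing index can be used) can be sketched as a small predicate. This is an illustration only: `is_prefix` and `should_recommend` are hypothetical names, and treating "can be used" as "the wanted columns are a leading prefix of the key or index columns" is an assumption of this sketch, not the check performed inside notice_recommended_index.

```rust
/// True when `wanted` is a leading prefix of `ordered`.
/// Assumption: an ordering "provides locality" for `wanted` exactly in this case.
fn is_prefix(wanted: &[&str], ordered: &[&str]) -> bool {
    ordered.starts_with(wanted)
}

/// Decide whether to suggest a new index on `wanted` columns.
/// Hypothetical rule modelled on the review discussion:
///   1. nothing to recommend without columns,
///   2. no recommendation if the primary key already starts with them,
///   3. no recommendation if any existing index already starts with them.
fn should_recommend(wanted: &[&str], primary_key: &[&str], indexes: &[Vec<&str>]) -> bool {
    if wanted.is_empty() {
        return false;
    }
    if is_prefix(wanted, primary_key) {
        return false;
    }
    !indexes.iter().any(|idx| is_prefix(wanted, idx))
}
```

Under these assumptions, a table keyed by `_row_id` with no indexes gets a recommendation for `a`, while the same table with an index on `a` does not, which matches the answer given above.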

chenzl25 added the user-facing-changes label Sep 15, 2025

chenzl25 (Contributor Author) commented

> Personally I'm not sure if it's a good idea to provide such a recommendation in the SQL interface. Using indexes to accelerate queries (no matter batch or streaming) seems to be an advanced practice: typically users need to understand the locality of their query and learn to inspect & comprehend query plans with EXPLAIN commands. Given that for batch queries this topic is only covered in documentation, maybe we should align with that for streaming queries as well. 🤔 cc @hzxa21 for discussion

Considering that we are supporting index selection for backfilling, we should update the index documentation as well to tell users how to create an index that gives streaming queries better locality.

hzxa21 (Collaborator) commented Sep 17, 2025

> Personally I'm not sure if it's a good idea to provide such a recommendation in the SQL interface. Using indexes to accelerate queries (no matter batch or streaming) seems to be an advanced practice: typically users need to understand the locality of their query and learn to inspect & comprehend query plans with EXPLAIN commands. Given that for batch queries this topic is only covered in documentation, maybe we should align with that for streaming queries as well. 🤔 cc @hzxa21 for discussion

I think that regardless of whether we give index recommendations via SQL notice, we need to add documentation to the doc site to guide users on how to create an index to accelerate backfill, and why. I created an issue a couple of weeks ago: risingwavelabs/risingwave-docs#617. Please provide more context if needed to help the doc team draft the doc.

Regarding index recommendation via SQL, I think showing the notice only in EXPLAIN, but not in DDL, is better, because a user who is interested in running EXPLAIN is probably looking to optimize the query anyway. It also depends on how effective the recommendation is. As discussed in the meeting, it may only help for large tables.
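
The suggestion to surface the notice only under EXPLAIN, and not when the DDL itself runs, amounts to gating notice emission on the kind of statement being handled. A rough sketch follows, using a hypothetical `StatementKind` enum and `notices_for` function that are not part of the RisingWave codebase; the notice wording is copied from the example in the PR description.

```rust
/// How the statement reached the planner (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StatementKind {
    /// The user ran an EXPLAIN over the statement.
    Explain,
    /// The user ran the DDL directly, e.g. CREATE MATERIALIZED VIEW.
    Ddl,
}

/// Return the notices to send to the client.
/// A recommendation is surfaced only when the user asked for a plan.
fn notices_for(kind: StatementKind, recommendation: Option<&str>) -> Vec<String> {
    match (kind, recommendation) {
        (StatementKind::Explain, Some(sql)) => vec![format!(
            "To speed up the backfilling, consider creating an index: {}",
            sql
        )],
        // Plain DDL, or nothing to recommend: stay silent.
        _ => Vec::new(),
    }
}
```

With this shape, the planner can always compute a recommendation, and the decision about whether the user sees it stays in one place.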

graphite-app bot commented Sep 17, 2025

Looks like this PR extends new SQL syntax or updates existing ones. Make sure that:

  • Test cases about the new/updated syntax are added in src/sqlparser/tests/testdata. Especially, double check the formatted_sql is still a valid SQL #20713
  • The meaning of each enum variant is documented in PR description. Additionally, document what it means when each optional clause is omitted.

chenzl25 (Contributor Author) commented

> > Personally I'm not sure if it's a good idea to provide such a recommendation in the SQL interface. Using indexes to accelerate queries (no matter batch or streaming) seems to be an advanced practice: typically users need to understand the locality of their query and learn to inspect & comprehend query plans with EXPLAIN commands. Given that for batch queries this topic is only covered in documentation, maybe we should align with that for streaming queries as well. 🤔 cc @hzxa21 for discussion
>
> I think that regardless of whether we give index recommendations via SQL notice, we need to add documentation to the doc site to guide users on how to create an index to accelerate backfill, and why. I created an issue a couple of weeks ago: risingwavelabs/risingwave-docs#617. Please provide more context if needed to help the doc team draft the doc.
>
> Regarding index recommendation via SQL, I think showing the notice only in EXPLAIN, but not in DDL, is better, because a user who is interested in running EXPLAIN is probably looking to optimize the query anyway. It also depends on how effective the recommendation is. As discussed in the meeting, it may only help for large tables.

Adding an explain (advisor) + sql to recommend indexes.

ACTION,
ADAPTIVE,
ADD,
ADVISOR,
Contributor

FYI #23208

chenzl25 (Contributor Author) commented

Closing this PR, since we now have locality enforcement: #23275

chenzl25 closed this Oct 10, 2025

Labels

type/feature Type: New feature. user-facing-changes Contains changes that are visible to users

4 participants