Have the pointwise scheduler process the Block Quantization Op. #5322
Conversation
Review updated until commit 8d8c115

!test
Pull Request Overview
This PR enables the pointwise scheduler to handle Block Quantization operations by adding necessary checks and vectorization support. The changes allow block quantization ops to be processed through the pointwise scheduling path when appropriate.
- Added compile-time validation to ensure block scales output is a fusion output
- Enhanced vectorization support for block quantization outputs
- Added comprehensive tests for single and multi-op fusion scenarios with block quantization
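The compile-time validation described above (block scales output must itself be a fusion output) could look roughly like the following standalone sketch. Note that `Fusion`, `TensorView`, and `BlockQuantizationOp` here are minimal stand-ins for nvFuser's types, and `blockScalesAreFusionOutputs` is a hypothetical name, not the actual function added in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal stand-ins for nvFuser IR types (illustrative only).
struct TensorView {};

struct BlockQuantizationOp {
  TensorView* quantized_tensor;
  TensorView* block_scales;
};

struct Fusion {
  std::vector<TensorView*> outputs;
  std::vector<BlockQuantizationOp*> block_quantization_ops;
};

// Hypothetical check: the pointwise path is only eligible when every
// block-scales tensor produced by a block quantization op is also
// registered as a fusion output.
bool blockScalesAreFusionOutputs(const Fusion& fusion) {
  for (const BlockQuantizationOp* op : fusion.block_quantization_ops) {
    bool is_output =
        std::find(fusion.outputs.begin(), fusion.outputs.end(),
                  op->block_scales) != fusion.outputs.end();
    if (!is_output) {
      return false;
    }
  }
  return true;
}
```

A scheduler would call such a check during canScheduleCompileTime-style validation and reject the fusion (rather than fail later) when it returns false.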
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/cpp/test_low_precision_recipe.cpp | Added test cases for auto-scheduling block quantization ops, including swizzled output scenarios |
| csrc/scheduler/vectorize_helper.cpp | Enhanced contiguous dimension mapping to handle block scales output allocation domains |
| csrc/scheduler/utils.cpp | Modified output caching to exclude block scales outputs from caching |
| csrc/scheduler/registry_utils.h | Added declaration for checking non-terminal block quantization ops |
| csrc/scheduler/registry_utils.cpp | Implemented function to detect non-terminal block quantization operations |
| csrc/scheduler/pointwise.cpp | Added compile-time checks and vectorization support for block quantization ops |
    public ::testing::WithParamInterface<DataType> {};
    namespace {
    void createNVFP4QunatizationFusion(Fusion* fusion, DataType data_hp_dtype) {
    void createNVFP4QunatizationFusion(
Copilot AI (Oct 7, 2025):
Corrected spelling of 'Qunatization' to 'Quantization'.
Suggested change:
    - void createNVFP4QunatizationFusion(
    + void createNVFP4QuantizationFusion(
    quantization_results.block_scales->axis(4),
    quantization_results.block_scales->axis(5)};
Copilot AI (Oct 7, 2025):
axis(5) is accessed, but only 5 axes (indices 0-4) exist after the splits: the block_scales tensor has 3 dimensions initially, and after the splits the valid indices are 0 through 4.
Suggested change:
    - quantization_results.block_scales->axis(4),
    - quantization_results.block_scales->axis(5)};
    + quantization_results.block_scales->axis(4)};
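The axis-count arithmetic behind this review comment can be made explicit: each split replaces one iteration domain with two, so a tensor that starts with 3 dimensions ends up with 3 + 2 = 5 axes after two splits, making axis(4) the highest valid index. A trivial sketch of that arithmetic (`axisCountAfterSplits` is a hypothetical helper, not nvFuser API):

```cpp
#include <cassert>

// Each split turns one axis into two, adding exactly one axis per split.
int axisCountAfterSplits(int initial_dims, int num_splits) {
  return initial_dims + num_splits;
}
```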
    auto t_relu = relu(t_add);
    auto quantization_results = blockQuantize(t_relu);
    quantization_results.quantized_tensor->setMemoryType(MemoryType::Local);
    // auto t_out = set(quantization_results.quantized_tensor);
Copilot AI (Oct 7, 2025):
Remove commented-out code that is not being used.
Suggested change:
    - // auto t_out = set(quantization_results.quantized_tensor);
I think this is too early to review. Let's focus on the first PR.

I'll close this PR, as we changed our implementation of the block quantization op: we no longer use a reshape before the quantization. I have an updated PR for this: #5362
The PR: have the pointwise scheduler process the BlockQuantization op.

This is stacked on top of: #5266