Use pointwise scheduler to schedule Block Quantization #5362

protonu · 2025-10-09T20:00:56Z

Stacked on top of: #5266

This extends the pointwise scheduler to accept a fusion with a block quantization op.
The block scale output of the block quantization op must be a fusion segment output.
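
To illustrate the restriction above, here is a minimal, self-contained sketch of a "block scales must be a fusion output" check. It is a toy model, not nvFuser code: `Tensor`, `BlockQuantOp`, `ToyFusion`, `hasNonTerminalBlockScales`, and `canSchedulePointwise` are all hypothetical names invented for this example, and the real `hasNonTerminalBlockQuantizeOp` may be implemented differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for IR nodes; these are NOT nvFuser types.
struct Tensor {
  bool is_fusion_output = false;
};

struct BlockQuantOp {
  Tensor* quantized = nullptr;     // quantized values
  Tensor* block_scales = nullptr;  // per-block scaling factors
};

struct ToyFusion {
  std::vector<BlockQuantOp> block_quant_ops;
};

// Returns true if any block quantization op produces block scales that
// are not marked as a fusion output.
bool hasNonTerminalBlockScales(const ToyFusion& fusion) {
  for (const BlockQuantOp& op : fusion.block_quant_ops) {
    assert(op.block_scales != nullptr);
    if (!op.block_scales->is_fusion_output) {
      return true;
    }
  }
  return false;
}

// Assumed acceptance rule for this sketch: schedule only when every
// block scales tensor is a fusion output.
bool canSchedulePointwise(const ToyFusion& fusion) {
  return !hasNonTerminalBlockScales(fusion);
}
```

In this sketch a fusion with no block quantization ops is trivially accepted; only a block scales tensor that is not a fusion output causes rejection.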

github-actions · 2025-10-09T20:04:26Z

Description

Enable pointwise scheduler for block quantization ops
Ensure block scales output is fusion output
Add transitive check for block scales output
Add test for auto-scheduling with swizzle

Changes walkthrough 📝

Relevant files

Bug fix

logical_domain_map.cpp (csrc/logical_domain_map.cpp, +1/-1) `Refine domain mapping for block quantization`: Restrict non-mapping domain logic to only when the producer is an input and the consumer is block scales. Ensures correct handling of the last logical dimension in the block quantization context.

utils.cpp (csrc/scheduler/utils.cpp, +4/-1) `Avoid caching block scales outputs`: Exclude block scales outputs from caching/forking in `cacheAndForkOutputs`. Prevents incorrect memory optimization on block quantization outputs.
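
The exclusion described for `cacheAndForkOutputs` can be pictured with a small toy filter. This is only a sketch under stated assumptions: `DefKind`, `Output`, `shouldSkipCaching`, and `outputsToCache` are hypothetical names, and the model covers only the two skip conditions mentioned in this PR (scatter outputs and block scales outputs), not whatever else the real function checks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the op that defines an output.
enum class DefKind { Pointwise, Scatter, BlockQuantization };

// Hypothetical stand-in for a fusion output tensor.
struct Output {
  DefKind def = DefKind::Pointwise;
  bool is_block_scales = false;  // only meaningful for BlockQuantization
};

// Skip caching for scatter outputs and for the block scales output of a
// block quantization op; the quantized output is still eligible.
bool shouldSkipCaching(const Output& out) {
  return out.def == DefKind::Scatter ||
      (out.def == DefKind::BlockQuantization && out.is_block_scales);
}

// Collects pointers to the outputs that remain eligible for caching.
std::vector<const Output*> outputsToCache(const std::vector<Output>& outputs) {
  std::vector<const Output*> result;
  for (const Output& out : outputs) {
    if (!shouldSkipCaching(out)) {
      result.push_back(&out);
    }
  }
  return result;
}
```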

Enhancement

pointwise.cpp (csrc/scheduler/pointwise.cpp, +18/-0) `Support block quantization in pointwise scheduling`: Add check to reject scheduling if block scales is not a fusion output. Include block quantization outputs in vectorization list.

registry_utils.cpp (csrc/scheduler/registry_utils.cpp, +13/-0) `Add check for terminal block quantization output`: Implement `hasNonTerminalBlockQuantizeOp` to check if block scales is not a fusion output. Used to reject invalid scheduling configurations.

domain_map.cpp (csrc/scheduler/tools/domain_map.cpp, +53/-1) `Skip domain checks for block scales outputs`: Add `isTransitiveBlockScaleOuput` helper to detect block scales through the producer chain. Skip validity checks for such outputs in `isValidReference`.

registry_utils.h (csrc/scheduler/registry_utils.h, +4/-0) `Declare block quantization output check function`: Declare the `hasNonTerminalBlockQuantizeOp` function. Used in the pointwise scheduler to validate fusion structure.

Tests

test_low_precision_recipe.cpp (tests/cpp/test_low_precision_recipe.cpp, +69/-0) `Add auto-scheduling test for block quantization`: Add test `AutoScheduleSingleOpWithSwizzle` for pointwise scheduling of block quantization. Validates correctness against the baseline implementation.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Scheduling Restriction

The PR rejects scheduling if any Block Quantization Op's block scales are not fusion outputs, but does not validate whether this restriction is necessary or if alternative handling (e.g., intermediate usage) could be supported. This limitation should be justified with performance or correctness reasoning.

    if (registry_utils::hasNonTerminalBlockQuantizeOp(fusion)) {
      scheduler_debug_utils::canScheduleRejectReason(
          schedulerType(),
          "no support for block quantization where block scales is not a fusion "
          "output");
      return false;

Transitive Check Logic

The function isTransitiveBlockScaleOuput traverses only the first producer in case of multiple producers, which may miss valid block scale paths. This simplification could lead to incorrect domain mapping decisions if other producer branches contain block scale outputs.

      // Move to the first producer for continued traversal
      // If there are multiple producers, we check the first one for simplicity
      // This could be extended to check all paths if needed
      if (!producers.empty()) {
        current_tv = producers[0];
      } else {
        current_tv = nullptr; // No more producers to check
      }
    }

CacheFork Bypass

The cacheFork optimization skips outputs of BlockQuantizationOp that are block scales, but does not consider whether such outputs might still benefit from caching in certain fusion patterns. This could limit optimization opportunities for complex fusions involving block quantization.

    output->definition()->isA<ScatterOp>() ||
    (output->definition()->isA<BlockQuantizationOp>() &&
     output->definition()->as<BlockQuantizationOp>()->blockScales() == output)) {
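
The concern about following only the first producer can be made concrete with a toy graph search that visits every producer branch. This is a hedged sketch, not the PR's code: `Node` and `reachesBlockScales` are hypothetical names, and the real helper operates on nvFuser tensors and expressions rather than this simplified structure.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a tensor in the producer graph.
struct Node {
  bool is_block_scales = false;
  std::vector<const Node*> producers;
};

// Iterative depth-first search over ALL producer branches. Returns true if
// `start` itself, or anything reachable through its producers, is a block
// scales tensor. The visited set avoids re-walking shared producers.
bool reachesBlockScales(const Node* start) {
  std::vector<const Node*> stack{start};
  std::unordered_set<const Node*> visited;
  while (!stack.empty()) {
    const Node* current = stack.back();
    stack.pop_back();
    if (current == nullptr || !visited.insert(current).second) {
      continue;  // null or already visited
    }
    if (current->is_block_scales) {
      return true;
    }
    for (const Node* producer : current->producers) {
      stack.push_back(producer);
    }
  }
  return false;
}
```

With a node whose producers are `{other, scales}`, a first-producer-only walk would follow `other` and miss `scales`, whereas this search finds it. Whether the extra traversal is needed depends on which fusion patterns can actually occur, which the PR does not state.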

protonu added 3 commits October 9, 2025 12:25

test with pointwise scheduler

bed4f5d

clean up

c3a3ee6

clean up

b8996e1

protonu mentioned this pull request Oct 9, 2025

Have the pointwise scheduler process the Block Quantization Op. #5322

Closed

protonu requested review from jjsjann123 and naoyam October 9, 2025 20:06

protonu added 2 commits October 14, 2025 11:33

adding validation checks

887e36f

modifying vectorization heuristic

8ba8e62

This was referenced Oct 15, 2025

[Draft] Add support for block scaling factors with a swizzled output. #5401

Draft

Create a new node for Block Quantization to NVFP4 and plumb it to a device function. #5266

Open

nvMelissa mentioned this pull request Oct 21, 2025

Pointwise scheduler changes #5419

Open
