Conversation

@jjsjann123 (Collaborator) commented Oct 23, 2025

Fixes: #5391

The issue is root-caused by unconnected IDs in the allocation domain, which trigger the loop-promotion assert on an ID not covered by the loop domain. However, since the loop domain should only check coverage of logical sizes, we shouldn't have included the allocation domain in the loop graph in the first place.

Changes in this PR:

  1. Exclude IDs that lie only on the path to the allocation domain from the LOOP and IEL graphs. This is done by modifying the TensorDomain::allIds method to exclude the allocation domain from the pairwise path traversal.
  2. Minor changes to add inlining support for the layout op.

@jjsjann123 jjsjann123 requested a review from naoyam October 23, 2025 18:34
@jjsjann123 (Collaborator, Author) commented:
!test

@github-actions commented:
Description

  • Exclude allocation domain IDs in loop graph construction

  • Enhance intersection graph with permissive mapping support

  • Add test for loop promotion issue in inference

  • Improve ID discovery for tensor domains


Changes walkthrough 📝

Relevant files

Bug fix

csrc/id_model/id_model.cpp — Add ID filtering excluding allocation domains and enhance graph initialization (+130/-9)

  • Added discoverIdsExceptAllocation to traverse tensor domains excluding allocation paths
  • Introduced findAllIdsExceptAllocation to collect non-allocation IDs from TensorView
  • Implemented initializeIdGraphExcludeAllocation for building loop graphs without allocation domain IDs
  • Modified buildIntersection to support permissive mapping and allocation domain exclusion
  • Updated buildStatefulInliningInfo to use non-allocation IDs for producer/consumer mapping

Enhancement

csrc/id_model/loop_promotion.cpp — Use permissive intersection with allocation domain exclusion in loop promotion (+13/-2)

  • Updated buildIntersection call to use permissive mapping and exclude allocation domains
  • Modified terminal ID detection to check LOOP graph group membership before comparison

csrc/id_model/id_model.h — Update IdModel interface for allocation-excluded graph initialization (+9/-1)

  • Declared initializeIdGraphExcludeAllocation method for loop graph initialization
  • Extended buildIntersection with parameters for permissive mapping and allocation domain exclusion

csrc/logical_domain_map.h — Support PreprocessGroupedMatmulInputSf in logical domain mapping (+4/-0)

  • Added handle(PreprocessGroupedMatmulInputSf*) override to support the new op in logical domain mapping

Tests

tests/cpp/test_layout_op.cpp — Add test for loop promotion in inference scenario (+65/-1)

  • Added InferenceBenchmarkLoopPromotionIssue test to validate loop promotion behavior
  • Created fusion with grouped matmul preprocessing and block scaling factors
  • Verified output correctness using validateGroupedLayout

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The function discoverIdsExceptAllocation collects IDs from multiple domains, but it may not correctly handle the shortest-path logic when building out2in via getExprsBetween: it accumulates all inputs and outputs of the path expressions without ensuring directional consistency or avoiding redundant or invalid mappings.

    void discoverIdsExceptAllocation(
        const TensorView* tv,
        VectorOfUniqueEntries<IterDomain*>& discovered_ids) {
      std::vector<const std::vector<IterDomain*>*> all_domains = {
          &tv->getLoopDomain(),
          &tv->getLogicalDomain(),
          &tv->getInitialLoopDomain(),
          &tv->domain()->additionalIDs()};
      if (tv->hasRoot()) {
        all_domains.push_back(&tv->getRootDomain());
      }
      if (tv->getAlternateLoopDomain().has_value()) {
        all_domains.push_back(&tv->getAlternateLoopDomain().value());
      }
    
      for (auto domain : all_domains) {
        discovered_ids.pushBack(*domain);
      }
    
      // We only care about IDs on the shortest path between domains
      std::unordered_multimap<IterDomain*, IterDomain*> out2in;
      for (auto i : arange(all_domains.size() - 1)) {
        if (all_domains[i]->empty()) {
          continue;
        }
        for (auto j : arange(i + 1, all_domains.size())) {
          if (all_domains[j]->empty()) {
            continue;
          }
          auto path = getExprsBetween<IRBFS>(
                          {all_domains[i]->begin(), all_domains[i]->end()},
                          {all_domains[j]->begin(), all_domains[j]->end()},
                          false)
                          .first;
          for (auto [expr, _] : path) {
            discovered_ids.pushBack(
                ir_utils::filterByType<IterDomain>(expr->outputs()));
            discovered_ids.pushBack(
                ir_utils::filterByType<IterDomain>(expr->inputs()));
          }
        }
      }
    };
Performance Concern

The repeated calls to findAllIdsExceptAllocation in buildStatefulInliningInfo may lead to redundant computation, since each call recomputes the same set of IDs for the same TensorView; caching or sharing the result could improve performance.

    auto all_producer_ids = findAllIdsExceptAllocation(producer_tv);
    auto all_consumer_ids = findAllIdsExceptAllocation(consumer_tv);
Correctness Risk

The use of idGraph(IdMappingMode::LOOP) in LoopPromotionMapBuilder::build to check group membership assumes the graph contains all relevant IDs; since the LOOP graph excludes allocation domain IDs, this may lead to incorrect filtering if some uses involve such IDs.

    return idGraph(IdMappingMode::LOOP).hasGroup(out) &&
        group != idGraph(IdMappingMode::LOOP).toGroup(out);

@jjsjann123 (Collaborator, Author) commented Oct 27, 2025
To run @crcrpar's example with nvfp4, I needed a few small fixes.

So we would want to use the Thunder branch in Lightning-AI/lightning-thunder#2691.

I used the nvFuser code in this PR, as well as the small fix for nvfp4 in #5428 (already approved; will update this comment after merge).

To run the benchmark, use the following command (run it in Thunder's directory). The program ran to completion, but I haven't gotten around to verifying the model:

    NVFUSER_DISABLE=parallel_compile NVFUSER_ENABLE="id_model(all)" python thunder/benchmarks/benchmark_inference.py --mode thunder --enable-nv-linear --enable-nvfp4 --output-length 2
    

Note: we need to add --output-length 2, otherwise the run takes forever.

Tagging @naoyam @protonu.



Development

Successfully merging this pull request may close these issues:

  • IdModel assert during buildLoopGraph: `nvfuser::LoopPromotionMapBuilder::findPromotionOfLoopGroup`
