
Conversation

@Priya2698 Priya2698 commented Aug 29, 2025

Issue #5079.

For multidevice, inferring tensor shapes relies on the allocation domain. rFactor did not replay the allocation domain, which led to the wrong shape being inferred in the test case from the issue.
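A rough sketch of the failure mode in plain Python (not nvFuser's API; the extents here are made up): per-device buffer sizes come from the allocation domain, so if rFactor rebuilds only the logical domain and drops the sharded allocation domain, inference falls back to the global logical shape and over-allocates.

```python
# Conceptual sketch only: a tensor of logical shape [d, n] sharded on the
# first axis across `num_devices` devices has a per-device allocation of
# [1, n]. Shape inference must read the allocation domain, not the
# logical domain, to get the local size right.

num_devices = 4
n = 8
logical = [num_devices, n]  # global logical shape
allocation = [1, n]         # per-device allocation: sharded axis has local extent 1

def infer_local_size(domain):
    """Per-device buffer size is the product of the domain extents."""
    size = 1
    for extent in domain:
        size *= extent
    return size

# Correct: infer from the (sharded) allocation domain.
assert infer_local_size(allocation) == n

# The bug: inferring from the logical domain over-allocates by num_devices.
assert infer_local_size(logical) == num_devices * n
```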

@Priya2698

!test


github-actions bot commented Aug 29, 2025

Review updated until commit 8888217

Description

  • Fix shape inference in multidevice by replaying allocation domain

  • Replay rfactor transformations on allocation domain for correctness

  • Propagate static allocation IDs to preserve compute semantics

  • Add test for inner reduction in multidevice context


Changes walkthrough 📝

Relevant files

Bug fix

csrc/transform_rfactor.cpp — Extend rFactor replay to allocation domain

  • Added allocation_domain_ to track allocation domain during rFactor
  • Modified splitId and mergeId to handle allocation domain with static IDs
  • Updated ReplayRFactor constructor to compute static allocation IDs
  • Set allocation domain on producer and consumer in runReplay
  • +101/-55

Tests

tests/python/multidevice/test_multidevice.py — Add test for inner reduction on multidevice

  • Added new test test_inner_reduction for multidevice
  • Tests inner dimension reduction with device mesh
  • Verifies correct output using assert torch.allclose
  • +28/-0
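The semantics the new test exercises can be sketched in plain Python (hypothetical extents, not the actual test code): reducing over an inner dimension that is sharded across devices is an rFactor pattern, where each device reduces its local shard and the partial results are then combined across devices.

```python
# Conceptual sketch of a sharded inner reduction (not the real test).
num_devices = 2
row = list(range(8))            # one logical row, inner extent 8
shards = [row[:4], row[4:]]     # inner dim sharded across 2 devices

partial = [sum(s) for s in shards]  # per-device local reduction
result = sum(partial)               # cross-device reduction of partials

assert result == sum(row)  # matches the unsharded reduction
```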

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Possible Issue

The updateRFactorDomain function unconditionally processes both the logical and allocation domains for splits and merges, but the original code had conditional checks to ensure only static IDs were processed. This may lead to unintended modifications of non-static allocation domains.

    if (Split* split = dynamic_cast<Split*>(expr)) {
      splitId(logical_domain_, split, static_logical_ids_);
      splitId(allocation_domain_, split, static_allocation_ids_);
    }
    if (Merge* merge = dynamic_cast<Merge*>(expr)) {
      mergeId(logical_domain_, merge, static_logical_ids_);
      mergeId(allocation_domain_, merge, static_allocation_ids_);
    }
    Logic Error

    The mergeId function checks that both inner and outer IDs have the same static status, but only uses the outer ID's status to decide whether to proceed. This could result in inconsistent handling of merge operations when one ID is static and the other is not, despite the NVF_ERROR check.

    NVF_ERROR(
        static_ids.contains(merge->inner()) ==
            static_ids.contains(merge->outer()),
        "If one input to a merge is a static id, the other must be as well.");
    if (!static_ids.contains(merge->outer())) {
      return;
    }
    auto outer_it = domain.erase(merge->outer()).second;
    domain.insert(outer_it, merge->out(), std::monostate());
    domain.erase(merge->inner());
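For intuition, here is the same update in plain Python (a sketch, not the C++; the domain is modeled as an ordered list and the monostate payloads are dropped): the merged output takes the outer input's position, the inner input is removed, and non-static merges return early without touching the domain.

```python
# Sketch of the mergeId update on an ordered domain.
def merge_id(domain, outer, inner, out, static_ids):
    # Both inputs must agree on static-ness; mixed inputs are an error.
    assert (outer in static_ids) == (inner in static_ids), \
        "If one input to a merge is a static id, the other must be as well."
    if outer not in static_ids:
        return  # non-static merges leave the domain untouched
    i = domain.index(outer)
    domain[i] = out       # out takes outer's position
    domain.remove(inner)  # inner is dropped

domain = ["a", "b", "c"]
merge_id(domain, "a", "b", "ab", static_ids={"a", "b"})
assert domain == ["ab", "c"]

domain2 = ["x", "y"]
merge_id(domain2, "x", "y", "xy", static_ids=set())
assert domain2 == ["x", "y"]  # untouched, mirroring the early return
```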
    Missing Validation

    The allocation domain is now being replayed and set unconditionally when present, but there is no validation that the transformed allocation domain maintains necessary properties or aligns with the logical domain structure.

    if (original_td->hasAllocation()) {
      std::vector<IterDomain*> transformed_original_allocation =
          replay_rfactor.allocation();
      std::vector<IterDomain*> new_producer_allocation_domain = replayDomain(
          transformed_original_allocation,
          original_to_producer_id_map,
          /*ignore_ids=*/{},
          /*propagate_padding=*/false,
          /*propagate_parallelization=*/false);
      producer_domain->setAllocationDomain(
          new_producer_allocation_domain,
          TensorDomain::getContiguityFilledWith(
              new_producer_allocation_domain, true));
    }
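The replayDomain step in the snippet above is essentially an ID translation; a plain-Python stand-in (hypothetical IDs, not the real signature) makes the mapping explicit:

```python
# Stand-in for the id-mapping step: each transformed allocation ID from
# the original tensor is translated into the producer's ID space via
# original_to_producer_id_map; ids in ignore_ids are skipped.
original_to_producer_id_map = {"i0": "p0", "r1": "p1"}

def replay_domain(ids, id_map, ignore_ids=frozenset()):
    return [id_map[i] for i in ids if i not in ignore_ids]

assert replay_domain(["i0", "r1"], original_to_producer_id_map) == ["p0", "p1"]
```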

@Priya2698 commented:

!test

@Priya2698 commented:

!test

Priya2698 added a commit that referenced this pull request Sep 3, 2025:

    This approach should be faster and simplifies the update logic.

    For PR #5090

Priya2698 added a commit that referenced this pull request Sep 4, 2025:

    Separating out replaying domain to a function to allow reuse.
    For Issue #5079, PR #5090 will also replay allocation.

Review comment on the following hunk:

    // Axes in the original_td that are in the history of the rfactored domains.
    // These will mark which iter domains must be preserved as static
    // transformations to preserve compute semantics.
    auto all_deps_of_logical = DependencyCheck::getAllValsBetween(
@Priya2698 commented:
    code movement: moved to ReplayRFactor constructor. We compute vals between maybeRoot/logical/allocation to rfactor_axes.

@Priya2698 commented:

!test --diff

@Priya2698 commented:

!test --diff

@Priya2698 Priya2698 marked this pull request as ready for review September 5, 2025 17:31