Skip to content

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Oct 3, 2025

Fixes bugs like #5356

We should probably add some knobs so callers decide what to replay, in a separate PR. So far, this function tries to replay everything (namely, loop, allocation and contiguity), which seems to work fine.

@wujingyue
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Oct 3, 2025

Review updated until commit 23ee554

Description

  • Fix contiguity replay in TransformReplay::selfReplay

  • Refactor selfReplay to handle loop and allocation consistently

  • Improve error handling and validation in replay logic

  • Add test for contiguity replay with empty allocation


Changes walkthrough 📝

Relevant files
Bug fix
nodes.cpp
Improve contiguity validation                                                       

csrc/ir/nodes.cpp

  • Replace NVF_CHECK with NVF_CHECK_EQ for clearer error message
  • Validate contiguity vector size against allocation domain
  • +3/-6     
    transform_replay.cpp
    Refactor and fix selfReplay logic                                               

    csrc/transform_replay.cpp

  • Refactor selfReplay into a lambda to build IterDomainMap
  • Remove ignore_reductions flag and simplify reduction handling
  • Fix contiguity replay by properly mapping and setting contiguity
  • Add proper handling of symbolic and broadcast domains
  • +130/-85
    Tests
    test_alias.cpp
    Add test for accumulate slices aliasing                                   

    tests/cpp/test_alias.cpp

  • Add AccumulateSlices test for aliasing with symbolic domains
  • Test replay of contiguity from broadcast to symbolic ID
  • +55/-25 
    test_replay.cpp
    Add contiguity replay test                                                             

    tests/cpp/test_replay.cpp

  • Add ContiguityWithEmptyAllocation test
  • Verify contiguity replay when allocation domain is empty
  • +16/-0   

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review

    Contiguity Replay Logic

    The new code introduces a complex replay mechanism for contiguity, particularly when handling symbolic and broadcast dimensions. The logic for setting contiguity on symbolic IDs may need validation, especially the decision to default to true when contiguity is null. This could impact correctness in cases involving symbolic dimensions.

    } else {
      const std::vector<IterDomain*>& new_logical = new_self->logical();
      const auto new_rank = std::ssize(new_logical);
      std::vector<std::optional<bool>> new_contiguities(new_rank, std::nullopt);
    
      int new_pos = 0;
      for (auto [id, contiguity] : zip(self->logical(), self->contiguity())) {
        IterDomain* new_id = getOrDefault(replay, id);
        if (new_id == nullptr) {
          continue;
        }
    
        // Find the corresponding contiguity in new_logical. Mapped IterDomains
        // in self->logical() and new_logical follow the same order. So it's safe
        // to only increment `new_pos`.
        while (new_pos < new_rank && new_logical.at(new_pos) != new_id) {
          new_pos++;
        }
        NVF_ERROR_LT(
            new_pos,
            new_rank,
            "Failed to find ",
            new_id->toString(),
            " in ",
            new_logical);
        std::optional<bool>& new_contiguity = new_contiguities.at(new_pos);
    
        new_contiguity = contiguity;
        // When used during or before concretization, TransformReplay::selfReplay
        // can be applied to replay transformations from symbolic dimensions to
        // concrete dimensions, or in the reverse direction. Therefore,
        // `new_contiguity` is not always identical to `contiguity`.
        if (new_id->isBroadcast()) {
          new_contiguity = std::nullopt;
        } else if (new_id->isSymbolic()) {
          // See AliasTest.AccumulateSlices for an example. aliasOutputToInput is
          // called before concretization and tries to replay contiguity from a
          // broadcast IterDomain to a symbolic IterDomain. However, a symbolic
          // IterDomain can't have contiguity null.
          if (!new_contiguity.has_value()) {
            new_contiguity = true;
          }
        }
      }
    
      new_self->setContiguity(new_contiguities);
    Reduction Axis Handling

    The updated code removes the ignore_reductions flag and instead uses mapped_new_ids to filter reduction axes. This change alters the control flow significantly and may affect how reduction dimensions are replayed. The interaction between mapped_new_ids and reduction axes should be carefully reviewed for correctness.

    if (self->loop() != self->logical()) {
      std::vector<IterDomain*> new_loop;
      for (auto* new_id : new_self->logical()) {
        if (mapped_new_ids.count(new_id) == 0) {
          NVF_ERROR(
              new_id->isReduction(),
              new_id->toString(),
              " should be a reduction.");
          new_loop.push_back(new_id);
        }
    NVF_CHECK_EQ Usage

    The use of NVF_CHECK_EQ in validateContiguity has been introduced to replace the previous NVF_CHECK with manual size comparison. While this simplifies the code, it may obscure debugging information if the check fails. The trade-off between conciseness and debuggability should be considered.

    NVF_CHECK_EQ(
        contiguity.size(),
        allocation_domain.size(),
        "Invalid contiguity information provided, incorrect size.");

    @wujingyue wujingyue changed the title Replay loop unconditionally TransformReplay::selfReplay replays contiguity Oct 6, 2025
    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue wujingyue requested a review from Priya2698 October 7, 2025 04:19
    fusion->addInput(in);
    fusion->addInput(i);
    fusion->addOutput(acc_out);
    fusion->aliasOutputToInput(acc_out, acc_in, AllocationType::ReuseBuffer);
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    This calls TransformReplay::selfReplay to replay transformations from concrete dimensions to symbolic dimensions.

    auto out_tensors = executor_cache.runFusionWithInputs({in_tensor});
    ASSERT_EQ(out_tensors.size(), 1);
    at::Tensor out_tensor = out_tensors[0].as<at::Tensor>();
    auto out_tensor = out_tensors[0].as<at::Tensor>();
    Copy link
    Collaborator Author

    @wujingyue wujingyue Oct 7, 2025

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Use auto because as<at::Tensor> already specifies the type.

    @Priya2698 Priya2698 requested a review from naoyam October 7, 2025 17:52
    @wujingyue wujingyue requested a review from Priya2698 October 7, 2025 22:35
    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @naoyam
    Copy link
    Collaborator

    naoyam commented Oct 8, 2025

    It would be really helpful to have some quick PR introduction to remind the context. My mental capacity is not big enough to remember everything currently going on.

    @Priya2698
    Copy link
    Collaborator

    Priya2698 commented Oct 8, 2025

    Thanks @wujingyue for the clarifications.
    Changes LGTM, I'll defer approval to @naoyam since he had concerns about changes in this function.

    @wujingyue
    Copy link
    Collaborator Author

    It would be really helpful to have some quick PR introduction to remind the context. My mental capacity is not big enough to remember everything currently going on.

    Done

    @wujingyue
    Copy link
    Collaborator Author

    !test

    }

    new_self->setAllocationDomain(new_allocation, new_contiguities);
    } else {
    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    This feels a little unexpected to me because, even though there's nothing to replay for the allocation, the contiguity could be modified.

    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    What would happen if self doesn't have an allocation domain but new_self does? Would it work?

    @naoyam
    Copy link
    Collaborator

    naoyam commented Oct 17, 2025

    !test --diff

    @naoyam
    Copy link
    Collaborator

    naoyam commented Oct 17, 2025

    Fixes bugs like #5356

    We should probably add some knobs so callers decide what to replay, in a separate PR. So far, this function tries to replay everything (namely, loop, allocation and contiguity), which seems to work fine.

    Changes of contiguity may not cause test failures but could result in, e.g., different vectorizations. Started the diff check just in case.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    4 participants