Conversation

@zasdfgbnm (Collaborator)

No description provided.

@github-actions bot commented Oct 18, 2025

Review updated until commit f756717

Description

  • Introduce output evaluation via ExpressionEvaluator for specific outputs

  • Centralize output size inference using inferOutputSizes

  • Add debug check for contiguity flag consistency

  • Remove duplicated ExpressionEvaluator logic from runtime


Changes walkthrough 📝

Relevant files — Enhancement

allocations.cpp (csrc/runtime/allocations.cpp): Support direct output evaluation in allocation inference

  • Add early evaluation path for outputs marked with AllocationType::Evaluate
  • Use ExpressionEvaluator with meta tensors to infer output sizes
  • Bind input tensors with preserved strides and dtype on meta device
  • Return evaluated results directly without further allocation
  • +39/-0

fusion_kernel_runtime.cpp (csrc/runtime/fusion_kernel_runtime.cpp): Simplify input preparation with centralized output inference

  • Replace inline ExpressionEvaluator logic with inferOutputSizes call
  • Introduce consistency check between scheduler type and output alias
  • Use segmented fusion to generate runtime fusion for evaluation
  • Pass contiguity update flag based on scheduler type
  • +36/-59

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Possible Issue

The handling of outputs marked with AllocationType::Evaluate relies on a dynamic cast and on direct access to output alias information. Edge cases may be mishandled: for example, when an output is not a TensorView, or when alias information is missing, evaluation could be incorrect or skipped entirely.

    // If any output is marked to be evaluated by ExpressionEvaluator,
    // evaluate outputs directly following ATen's striding rules.
    bool has_evaluate_outputs = false;
    for (Val* out : fusion->outputs()) {
      if (auto* tv = dynamic_cast<TensorView*>(out)) {
        if (fusion->getOutputAlias(tv).type == AllocationType::Evaluate) {
          has_evaluate_outputs = true;
          break;
        }
      }
    }
    
    if (has_evaluate_outputs) {
      ExpressionEvaluator eval_fusion;
      // Bind inputs; for tensors, bind meta tensors preserving size/stride and dtype.
      for (auto i : arange(fusion->inputs().size())) {
        const auto& pv = args[i];
        if (pv.is<at::Tensor>()) {
          const auto t = pv.as<at::Tensor>();
          if (t.defined()) {
            const auto meta_t = at::empty_strided(
                t.sizes(),
                t.strides(),
                c10::TensorOptions().device(c10::Device(c10::DeviceType::Meta)).dtype(t.dtype()));
            eval_fusion.bind(fusion->inputs()[i], meta_t);
          } else {
            eval_fusion.bind(fusion->inputs()[i], t);
          }
        } else {
          eval_fusion.bind(fusion->inputs()[i], pv);
        }
      }
      for (auto v : fusion->outputs()) {
        auto result = eval_fusion.evaluate(v);
        output_tensor_proxies.push(result);
      }
      return output_tensor_proxies;
    }
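One way to harden the path against the edge case flagged above is an all-or-nothing guard: when any output is marked Evaluate, require that every TensorView output is, so no output is silently missed. The sketch below uses hypothetical stand-in types, not the real nvFuser Fusion/TensorView classes, to illustrate the shape of such a guard:

```cpp
#include <vector>

// Hypothetical stand-ins for nvFuser types; the real check would use
// dynamic_cast<TensorView*> and fusion->getOutputAlias(tv).type.
enum class AllocationType { New, Evaluate };

struct OutputInfo {
  bool is_tensor_view;    // stands in for the dynamic_cast succeeding
  AllocationType alias;   // stands in for getOutputAlias(tv).type
};

// Returns true when the evaluation path should run. Sets `mixed` when some
// but not all TensorView outputs are marked Evaluate, which is the case the
// review flags as potentially mishandled.
bool shouldEvaluateOutputs(const std::vector<OutputInfo>& outputs, bool& mixed) {
  bool any = false;
  bool all = true;
  for (const OutputInfo& out : outputs) {
    if (!out.is_tensor_view) {
      continue;  // non-TensorView outputs carry no Evaluate alias
    }
    const bool ev = out.alias == AllocationType::Evaluate;
    any = any || ev;
    all = all && ev;
  }
  mixed = any && !all;  // caller can turn this into an NVF_ERROR-style check
  return any && all;
}
```

A caller could assert `!mixed` before taking the early-return branch, turning a silent miss into a loud failure.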
Consistency Check

The debug check comparing need_update_contiguity and has_evaluate_outputs assumes the two conditions always match. If other parts of the system do not keep the scheduler type and the output alias type in sync, this assertion will fail, which would point to a deeper inconsistency in the design or state management.

    // Debug check: ensure alias-based expectation matches scheduler-based need_update_contiguity
    bool has_evaluate_outputs = false;
    for (Val* out : fusion_to_run->outputs()) {
      if (auto* tv = dynamic_cast<TensorView*>(out)) {
        if (fusion_to_run->getOutputAlias(tv).type == AllocationType::Evaluate) {
          has_evaluate_outputs = true;
          break;
        }
      }
    }
    NVF_ERROR(
        need_update_contiguity == has_evaluate_outputs,
        "need_update_contiguity mismatches has_evaluate_outputs in prepareInputs");
Redundant Logic

The same pattern of checking for Evaluate outputs and setting need_update_contiguity is duplicated across multiple functions. This redundancy increases maintenance burden and risks the copies diverging if the logic is updated in one place but not the others.

    // Debug check: ensure alias-based expectation matches scheduler-based need_update_contiguity
    bool has_evaluate_outputs = false;
    for (Val* out : fusion_to_run->outputs()) {
      if (auto* tv = dynamic_cast<TensorView*>(out)) {
        if (fusion_to_run->getOutputAlias(tv).type == AllocationType::Evaluate) {
          has_evaluate_outputs = true;
          break;
        }
      }
    }
    NVF_ERROR(
        need_update_contiguity == has_evaluate_outputs,
        "need_update_contiguity mismatches has_evaluate_outputs in prepareInputs");
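The duplication could be reduced by factoring the loop into a single helper shared by both the allocation path and the debug check. A minimal sketch with stub types follows; the real version would take the actual Fusion type and live near the other allocation utilities, and none of these names are confirmed by the PR:

```cpp
#include <algorithm>
#include <vector>

// Stub types standing in for nvFuser's Fusion/TensorView; only the shape of
// the refactor is illustrated here, not the real API.
enum class AllocationType { New, Evaluate };

struct Output {
  bool is_tensor_view;    // stands in for dynamic_cast<TensorView*>
  AllocationType alias;   // stands in for getOutputAlias(tv).type
};

struct Fusion {
  std::vector<Output> outputs;
};

// Single definition of the predicate currently duplicated across functions;
// both the evaluation path and the NVF_ERROR consistency check could call it.
bool hasEvaluateOutputs(const Fusion& fusion) {
  return std::any_of(
      fusion.outputs.begin(), fusion.outputs.end(), [](const Output& o) {
        return o.is_tensor_view && o.alias == AllocationType::Evaluate;
      });
}
```

With this in place, each call site shrinks to one call, and any future change to what counts as an "evaluate output" happens in exactly one spot.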
