
Conversation

@tbqh (Collaborator) commented Oct 7, 2025

Rewrite and rebase of #5121. Adds a new presegmentation pass, "fmin_fmax_promotion", which replaces min/max reductions with fmin/fmax reductions where possible. Original motivation in #319.

The new pass does dataflow analysis by attaching an enum to IterDomains. It flows these statuses downward and checks whether any corrupted "BAD" states reach a fusion output. The pass currently handles only four operator types:

  1. UnaryOp
  2. ReduceOp
  3. BroadcastOp
  4. BinaryOp

For any other operator type, or if at any point we fail to map an IterDomain through an operator, we treat the operator as a fusion output.
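To make the shape of the analysis concrete, here is a toy model of the downward pass described above. This is an illustrative sketch only: the types and names (NanStatus, OpKind, ToyExpr, promotionIsSafe) are stand-ins, not the PR's actual code, and the propagation rule is deliberately simplified.

    #include <set>
    #include <unordered_map>
    #include <vector>

    // Toy model of the downward dataflow pass (illustrative, not the PR's code).
    enum class NanStatus { None, Good, Bad };
    enum class OpKind { Unary, Reduce, Broadcast, Binary, Other };

    struct ToyExpr {
      OpKind kind;
      std::vector<int> inputs;   // ids of consumed IterDomains
      std::vector<int> outputs;  // ids of produced IterDomains
    };

    // `exprs` must be topologically sorted. `status` seeds the analysis,
    // e.g. with the target reduction's input domains marked Bad.
    bool promotionIsSafe(const std::vector<ToyExpr>& exprs,
                         const std::set<int>& fusion_output_ids,
                         std::unordered_map<int, NanStatus> status) {
      for (const ToyExpr& e : exprs) {
        bool bad_input = false;
        for (int in : e.inputs) {
          // operator[] default-constructs missing entries as None.
          bad_input = bad_input || status[in] == NanStatus::Bad;
        }
        if (e.kind == OpKind::Other) {
          // Unhandled operator type: treat it like a fusion output.
          if (bad_input) {
            return false;
          }
          continue;
        }
        // Simplified propagation rule: an output is Bad iff any input is Bad.
        for (int out : e.outputs) {
          status[out] = bad_input ? NanStatus::Bad : NanStatus::Good;
        }
      }
      // Finally, reject the promotion if any fusion output sees a Bad status.
      for (int out : fusion_output_ids) {
        if (status[out] == NanStatus::Bad) {
          return false;
        }
      }
      return true;
    }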

@github-actions bot commented Oct 7, 2025

Review updated until commit df9a676

Description

  • Promote min/max reductions to fmin/fmax when safe

  • Prevent NaN propagation issues in reduction outputs

  • Analyze dataflow using IdModel for accurate mapping

  • Add comprehensive tests for fmin/fmax promotion cases


Changes walkthrough 📝

Enhancement

csrc/preseg_passes/fmin_fmax_promotion.cpp: Implement fmin/fmax promotion with dataflow analysis (+251/-0)
  • Implement new presegmentation pass to promote min/max to fmin/fmax
  • Use NanStatus enum for tracking NaN propagation through reductions
  • Perform downward dataflow analysis to detect unsafe promotions
  • Utilize IdModel for accurate IterDomain mapping

csrc/preseg_passes/pre_segmenter.cpp: Register fmin/fmax promotion pass in pipeline (+2/-0)
  • Include new fmin_fmax_promotion.h header
  • Register FMinFMaxPromotionPass in pre-segmentation pipeline
  • Position pass after AddAxiomsPass and before MoveSplitCatPass

csrc/ir/internal_nodes.h: Add markUnsafe method for reduction ops (+9/-0)
  • Add markUnsafe() method to ReductionOp class
  • Convert BinaryOpType from Min/Max to FMin/FMax
  • Enable promotion of reduction operations in IR

csrc/preseg_passes/fmin_fmax_promotion.h: Declare fmin/fmax promotion pass interface (+41/-0)
  • Declare FMinFMaxPromotionPass class
  • Document NaN propagation behavior differences
  • Explain conditions under which promotion is safe
  • Define pass as OptimizationPass specialization

Tests

tests/cpp/test_math_opt.cpp: Add tests for fmin/fmax promotion pass (+121/-0)
  • Add FMinFMaxPromotionTest with 9 test cases
  • Test various reduction topologies and broadcast patterns
  • Verify fmax presence/absence in generated kernel code
  • Include NaN values in test tensors for validation

Configuration changes

CMakeLists.txt: Add fmin_fmax_promotion to build system (+1/-0)
  • Add fmin_fmax_promotion.cpp to NVFUSER_SRCS
  • Include new presegmentation pass in build

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Possible Issue

The function reductionDomainIsCovered assumes that expr->output(0) is a TensorView and uses it directly. However, in cases where the expression has multiple outputs or no TensorView outputs, this could lead to incorrect behavior or crashes. The check auto* out_tv = dynamic_cast<TensorView*>(expr->output(0)); should ensure that the output is valid, but there is no handling for expressions with more than one output, which may result in missed analysis or incorrect propagation of NanStatus.

    auto* out_tv = dynamic_cast<TensorView*>(expr->output(0));

    bool canBeAnalyzed = expr->isA<UnaryOp>() || expr->isA<ReductionOp>() ||
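One possible shape for the guard being suggested here, as a hypothetical sketch (handleAsOutput is a placeholder, not a function in the PR):

    // Only continue the analysis for single-output expressions whose
    // output is a TensorView; otherwise bail out conservatively.
    auto* out_tv = dynamic_cast<TensorView*>(expr->output(0));
    if (out_tv == nullptr || expr->outputs().size() != 1) {
      handleAsOutput(expr);  // conservatively treat it like a fusion output
      return;
    }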
Performance Concern

The current implementation runs a separate dataflow analysis for each reduction domain in minMaxOpIsCovered, which can be inefficient for fusions with many reduction axes. Although the comment acknowledges this, it would be beneficial to merge these analyses into a single traversal to improve performance, especially for large fusions.

    //
    // Note that this currently re-runs the traversal/dataflow analysis for every
    // single reduction domain. This could be merged into a single traversal,
    // however it would require per-domain tracking of the NanStatusMap, and it
    // would make the propagation code more complicated.
Possible Issue

The markUnsafe method modifies the reduction operation type in-place without validating that the operation is actually a Min/Max reduction, and there is no safeguard making the intended single-use contract explicit. A guard or assertion against redundant calls would make the method's contract clearer.

    void markUnsafe() {
      if (attribute<BinaryOpType>(1) == BinaryOpType::Max) {
        attribute<BinaryOpType>(1) = BinaryOpType::FMax;
      }
      if (attribute<BinaryOpType>(1) == BinaryOpType::Min) {
        attribute<BinaryOpType>(1) = BinaryOpType::FMin;
      }
    }

      return attribute<BinaryOpType>(1);
    }

    void markUnsafe() {
@tbqh (Collaborator Author):
    TODO: Jacob recommended we get rid of this function, and instead replace the entire Expr with a new one.


    // Full-size statuses
    DEFAULT,
    BAD_BROADCAST,
Collaborator:

    Still trying to understand the analysis, but wondering why we need a separate status for reduction and broadcast. Just having GOOD and BAD not enough?

Collaborator:
It is still unclear to me why there is both DEFAULT and GOOD. I also don't understand why we need a separate state for broadcasted BAD.
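For readers following this thread, the statuses under debate appear to be roughly the following. This is an illustrative reconstruction pieced together from names mentioned in the conversation; the actual declaration and semantics in the PR may differ.

    // Illustrative reconstruction of the status enum under discussion
    // (glosses are guesses from the thread, not the PR's documentation).
    enum class IterDomainStatus {
      NONE,           // default for IterDomains the analysis never mapped
      DEFAULT,        // mapped, but carrying no NaN information yet
      GOOD,           // a NaN could not have been masked on this path
      BAD,            // a min/max may have silently dropped a NaN here
      BAD_BROADCAST,  // a BAD value that has since been broadcast
    };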

    for (auto input : expr->inputs()) {
      if (auto* in_tv = dynamic_cast<TensorView*>(input)) {
        for (IterDomain* id : in_tv->getLogicalDomain()) {
          IterDomainStatus status = iterMap[id];
Collaborator:
Is iterMap guaranteed to have a mapping for id? If so, let's use at() so that we can mark iterMap as a const ref.

@tbqh (Collaborator Author):
    These have changed names but the question is still valid:

    Is the map (NanStatusMap) guaranteed to have a mapping?

    No, the mapping may not exist for every node. For example:

    TensorView* tv1 = max(in0, {0, 1});
    TensorView* tv2 = add(in0, in2);

The add node here has two inputs, but only the in0 TensorView will have a mapping during analysis. This is what the None state is for: it is the default state for unmapped TVs.

Comment on lines 177 to 178

    IterDomainStatus status = iterMap[in_id];
    auto out_id = p2c[in_id];
Collaborator:
Can you avoid using []? It's not very clear what is intended: are you assuming the index has a mapping, or are you relying on automatic addition of a new mapping?

@tbqh (Collaborator Author):
I am relying on automatic addition of a new mapping to handle unmapped expression inputs; their states will be "None".
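The language behavior being relied on here, shown in isolation with a stand-in enum (plain C++, independent of the PR): unordered_map's operator[] value-initializes the mapped value for a missing key, which for an enum class is the zero value.

    #include <cassert>
    #include <unordered_map>

    // Stand-in status enum; None must be the first enumerator so that it
    // is the zero value operator[] creates for missing keys.
    enum class NanStatus { None, Good, Bad };

    int main() {
      std::unordered_map<int, NanStatus> status_map;
      assert(status_map[42] == NanStatus::None);  // entry created on access
      assert(status_map.size() == 1);             // note: the map grew
      // By contrast, .at() on a missing key throws std::out_of_range, and
      // .contains() (C++20) checks membership without inserting.
      return 0;
    }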


    namespace nvfuser::preseg_passes {

    // IterDomainStatus are attached to IterDomains and propagated with a
Collaborator:
I'm actually not sure iter domains are the right granularity for the analysis. If one iter domain has a bad status, its tensor should be considered bad as well. Also, reductions remove iter domains, so "bad" iter domains would just disappear from the fusion. It seems to me tensors are the right level for this analysis. What do you think?

@jacobhinkle (Collaborator) left a comment:
    More specific comments below. I think you should focus on explaining the algorithm and really thinking about what state is needed. I agree with @naoyam that it seems like only "good" and "bad" states are needed. Also, why not have an initialization step where all IDs of fusion inputs are marked GOOD instead of NONE?

      expectFMax = true;
    }

    if (testIndex == 3) {
Collaborator:
Tip: in cases like this I typically create a new class for FMinFMaxPromotionTest instead of an alias. In it I would implement SetUp() and TearDown(), which would hold everything from the current test other than the if (testIndex == *) parts. That lets you give each test a descriptive name directly, even without parametrization (i.e., you can then use TEST_F instead of TEST_P, unless you have further parametrization to do).
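A minimal sketch of the suggested fixture structure (test names and the fusion member are illustrative; NVFuserTest is assumed to be the repo's usual gtest base class):

    // Shared setup lives in the fixture; each case becomes its own TEST_F.
    class FMinFMaxPromotionTest : public NVFuserTest {
     protected:
      void SetUp() override {
        NVFuserTest::SetUp();
        fusion_ = std::make_unique<Fusion>();
        // ... shared input construction from the current test body ...
      }
      std::unique_ptr<Fusion> fusion_;
    };

    // Each former `if (testIndex == *)` branch gets a descriptive name:
    TEST_F(FMinFMaxPromotionTest, ReductionThenBroadcastAdd) {
      // body previously guarded by one testIndex branch
    }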

Comment on lines 117 to 121

    // Once we identify a target reduction, we perform a downward pass starting from
    // the target's direct input. The pass propagates IterDomainStatus information.
    // At the end, we check all output TV's for bad statuses. If at any point we
    // encounter a node we don't know how to propagate information through, we treat
    // it like a graph output and fail if it has any incoming bad statuses.
Collaborator:
In this comment, it would be very instructive to add a couple of complete examples where you show a fusion and trace down through it, showing how the ID statuses propagate from a given max/min reduction.

    bool expectFMax = false;

    if (testIndex == 1) {
      TensorView* tv3 = add(max(tv0, {0, 1}), tv0);
Collaborator:
    It is often clearer to put one operation per line. This lets you mark the axes for each tensor in the fusion. For example, in this case you would have something like

        TensorView* tv3 = max(tv0, {0, 1}); // [ rS5{i0}, rS6{i1} ]
        // Note: the implicit broadcast tv4 here is not shown in your current code
        TensorView* tv4 = broadcast(tv3, {true, true});  // [ bS7{1}, bS8{1} ]
        TensorView* tv5 = add(tv4, tv0);  // [ iS9{i0}, iS10{i1} ]
        TensorView* tv6 = sum(tv5, {0, 1});  // [ rS11{i0}, rS12{i1} ]
        // NOTE: tv7 below is not shown currently either
        TensorView* tv7 = broadcast(tv6, {true, true});  // [ bS13{i0}, bS14{i1} ]
        TensorView* tv8 = add(tv5, tv7);  // [ iS15{i0}, iS16{i1} ]
        fusion->addOutput(tv8);

Collaborator:
Also, a short comment can help indicate what we're testing in each case.

@tbqh (Collaborator Author) commented Oct 19, 2025

Pushed a new algorithm which should handle a lot of the issues with reductions/broadcasts not being supported. The new algorithm focuses on a single source IterDomain at a time and propagates information along TensorViews. This solves the issues that arise when tracking IterDomains through reductions and broadcasts.

    The current code is messy and needs to be cleaned up.

There is one unsolved issue: handling sibling rewrites. If we promote one fmax somewhere in the fusion, right now it can break other fmaxes. I thought this could be solved by doing rewrites in reverse topological order; however, that does not cover the case of sibling expressions. This is exercised by test case #8, which is currently the only failing test case.

Comment on lines 129 to 130

    if (valMap[expr->input(0)->as<TensorView>()] == ValStatus::DEFAULT ||
        valMap[expr->input(0)->as<TensorView>()] == ValStatus::BAD_DEFAULT) {
Collaborator:
Nit, suggested change:

    auto* it = valMap.find(expr->input(0)->as<TensorView>());
    if (it == valMap.end() || it->second == ValStatus::DEFAULT ||
        it->second == ValStatus::BAD_DEFAULT) {

If we expect the input to always be found in valMap, then I'd do this instead:

    ValStatus in_status = valMap.at(expr->input(0)->as<TensorView>());
    if (in_status == ValStatus::DEFAULT || in_status == ValStatus::BAD_DEFAULT) {

@tbqh (Collaborator Author):
We don't expect there to always be a value in the map; we rely on the default value being "None".

This seems to be a very common comment on this PR; I guess we usually do not use the default value with unordered_map. Let me know if you want me to explicitly check whether a mapping exists (e.g. with .contains()). It's a lot more verbose to do so, though.

tbqh added 3 commits October 26, 2025 21:54

  • Function names start with lowercase letters
  • Use snake_case instead of camelCase
  • Add anonymous namespace to file-scoped things