Conversation

wujingyue (Collaborator) commented Sep 5, 2025

@wujingyue so I don't forget

@wujingyue wujingyue self-assigned this Sep 5, 2025

github-actions bot commented Sep 5, 2025

Description

  • Reproduce stride mismatch issue with DTensor and nvFuser

  • Add test case for non-contiguous DTensor with replication

  • Validate contiguity computation in multi-device scheduling

  • Demonstrate execution failure for stride-contiguity mismatch


Changes walkthrough 📝

Relevant files
Bug fix
repro.py — Add repro for DTensor-nvFuser stride issue

  • Added a repro script for the DTensor-nvFuser stride mismatch
  • Created a strided tensor and converted it to a replicated DTensor
  • Implemented a multidevice schedule with device-mesh setup
  • Execution fails with a stride-contiguity mismatch error
  • +107/-0
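One way a layout like the one in the error message below (sizes [2, 5, 6], strides [1, 12, 2]) can arise is by permuting a contiguous buffer. The sketch below computes strides for such a view in plain Python (no torch); the repro's actual tensor construction may differ, so treat this purely as an illustration.

```python
def permuted_strides(sizes, perm):
    """Sizes and strides of a row-major contiguous buffer of `sizes`,
    viewed through the dimension permutation `perm`.

    Illustrative only: a contiguous (5, 6, 2) buffer permuted to
    (2, 5, 6) yields strides (1, 12, 2) -- dense data, permuted order,
    matching the layout in the error message below.
    """
    # Row-major strides: innermost dimension has stride 1.
    strides = [0] * len(sizes)
    acc = 1
    for d in reversed(range(len(sizes))):
        strides[d] = acc
        acc *= sizes[d]
    # Apply the permutation to both sizes and strides.
    return [sizes[p] for p in perm], [strides[p] for p in perm]
```

For example, `permuted_strides([5, 6, 2], [2, 0, 1])` gives sizes `[2, 5, 6]` and strides `[1, 12, 2]`.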

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The PR introduces a repro for a stride mismatch error in nvFuser when executing with DTensor inputs. The error occurs during execution, indicating a potential issue in how contiguity and stride information are handled in the fusion definition or scheduling logic.

    # Currently this fails.
    # RuntimeError: Stride mismatch with contiguity info.  allocation domain: iS2{2}, iS0{5}, iS1{6}: sizes: [2, 5, 6]: strides: [1, 12, 2]; contiguity: f, f, t; dim: 2; expected stride: 1; actual stride: 2
    actual = nvfd.execute_with_dtensors(fd, [in_dtensor])
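The failing check can be illustrated with a small stride-consistency sketch in plain Python (this is not nvFuser's actual implementation): walking the allocation domain from innermost to outermost, every dimension flagged contiguous must have exactly the stride implied by the inner extents.

```python
def check_stride_contiguity(sizes, strides, contiguity):
    """Verify that strides agree with the declared contiguity flags.

    sizes/strides/contiguity are given in allocation order (outermost
    first), mirroring the error message above. Illustrative sketch only,
    not nvFuser's real validation code.
    """
    expected = 1
    # Walk dimensions from innermost (last) to outermost (first).
    for dim in reversed(range(len(sizes))):
        if contiguity[dim] and strides[dim] != expected:
            raise RuntimeError(
                f"Stride mismatch with contiguity info. dim: {dim}; "
                f"expected stride: {expected}; actual stride: {strides[dim]}"
            )
        # The next outer dimension is measured against the actual extent.
        expected = strides[dim] * sizes[dim]
```

With the values from the error message, sizes `[2, 5, 6]`, strides `[1, 12, 2]`, and contiguity `(f, f, t)`, this check fails at dim 2 with expected stride 1 versus actual stride 2, matching the reported error.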
    Missing Test Validation

    The PR does not include any new tests or updates to existing tests to validate the behavior or fix for the reported stride mismatch issue. This reduces confidence in the correctness and robustness of the solution.

    torch.testing.assert_close(actual[0], expected)
    Contiguity Handling

    The computation of contiguity and stride order using compute_contiguity may not correctly reflect the actual memory layout of the DTensor's local tensor, especially when the global tensor has non-contiguous strides. This could lead to incorrect fusion scheduling and execution errors.

    contiguity, stride_order = compute_contiguity(in_dtensor.shape, in_dtensor.stride())
    
    print(contiguity, "CONTIGUITY")
    print(stride_order, "STRIDE ORDER")
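For reference, contiguity and stride order can be derived from sizes and strides roughly as follows. This is a hedged pure-Python sketch of what a helper like `compute_contiguity` might do, not nvFuser's actual implementation; in particular, the `stride_order` convention used here (dimension indices sorted from slowest- to fastest-varying) is an assumption.

```python
def compute_contiguity_sketch(sizes, strides):
    """Illustrative sketch: derive per-dimension contiguity flags and a
    stride order from sizes and strides. Not nvFuser's real helper.

    A dimension counts as contiguous when its stride equals the next
    faster dimension's stride times that dimension's size (innermost: 1).
    """
    ndim = len(sizes)
    # Assumed convention: dims ordered from largest stride to smallest.
    stride_order = sorted(range(ndim), key=lambda d: strides[d], reverse=True)
    contiguity = [False] * ndim
    expected = 1
    for d in reversed(stride_order):  # fastest-varying dimension first
        contiguity[d] = strides[d] == expected
        expected = strides[d] * sizes[d]
    return contiguity, stride_order
```

Note that for the layout in the error above (sizes `[2, 5, 6]`, strides `[1, 12, 2]`), this sketch reports every dimension contiguous once dims are sorted by stride, while the fusion's allocation domain carries contiguity `(f, f, t)` -- which is exactly the kind of disagreement the repro surfaces.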
