Some motivation
Right now we are using OpInfo as our ground truth for testing. However, it has some pretty bogus inputs and outputs, especially when combined with our allclose-based testing harness: for random and fill ops the sample inputs are empty tensors or watermarked values, so the comparison is effectively meaningless. Some examples of this are below.
randint.default
[2025-08-22 15:05:16][INFO][eval.py] Looking at randint.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (10, torch.Size([0, 5, 0]))
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {'device': 'cuda'}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)
bernoulli.default
[2025-08-22 15:05:16][INFO][eval.py] Looking at bernoulli.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)
empty_like.default
[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)
new_empty_strided.default
[2025-08-22 15:05:16][INFO][eval.py] Looking at new_empty_strided.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16), (), ())
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] Error in allclose
[2025-08-22 15:05:16][INFO][eval.py]
Exception raised for None:
args: ((T([], bf16), T([], bf16),), {})
exc: Scalars are not close!
Expected 0.0 but got -6.71875.
Absolute difference: 6.71875 (up to 0.01 allowed)
Relative difference: inf (up to 0.01 allowed)
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: 0.0
[2025-08-22 15:05:16][INFO][eval.py] for new_empty_strided.default is_correct=False abs_error=6.71875 rel_error=1.0
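To make the failure mode concrete, here is a minimal standalone sketch (not the BackendBench harness itself, and it assumes a CUDA build of PyTorch) of why value comparison is meaningless for these samples: zero-element tensors trivially pass any allclose check, and empty_like / new_empty_strided return uninitialized memory, so their values cannot be compared against a reference run at all.

```python
# Minimal sketch, standalone; assumes a CUDA build of PyTorch.
import torch

# 1) Degenerate OpInfo samples: a zero-element tensor trivially "passes" allclose,
#    so randint / bernoulli are never actually exercised.
ref = torch.randint(10, (0, 5, 0), device="cuda")
impl = torch.randint(10, (0, 5, 0), device="cuda")
print(torch.allclose(ref.float(), impl.float()))  # True, vacuously: no elements to compare

# 2) Uninitialized outputs: empty_like / new_empty_strided return whatever is in memory,
#    so comparing values against a "reference" run fails (or passes) nondeterministically.
x = torch.tensor(-6.7188, device="cuda", dtype=torch.bfloat16)
a = torch.ops.aten.new_empty_strided.default(x, (), ())
b = torch.ops.aten.new_empty_strided.default(x, (), ())
print(a.item(), b.item())  # arbitrary garbage; may or may not match
```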
What to do about it
For PyTorch, the testing of distributions and random ops can be found in test_distributions.py and test_random.
For fill / tensor creation ops, test_tensor_creation_ops.py is where those tests live.
We need to add this kind of testing to BackendBench; a rough sketch of what it could look like is below.
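A rough sketch of what such checks could look like, assuming we branch on op type inside the existing eval flow. The function names here (check_random_op, check_creation_op) and the thresholds are purely illustrative, not an existing BackendBench API: random ops get statistical checks (in the spirit of test_distributions.py), and fill / creation ops get metadata-only checks (shape, stride, dtype, device), since their values are either random or uninitialized.

```python
# Illustrative sketch only: check_random_op / check_creation_op are hypothetical names,
# not part of BackendBench today. Tolerances are placeholders.
import torch


def check_random_op(impl_fn, p: float = 0.3, n: int = 100_000) -> bool:
    # Compare distribution statistics instead of raw values, e.g. bernoulli's sample mean.
    probs = torch.full((n,), p, device="cuda", dtype=torch.float32)
    out = impl_fn(probs)
    assert out.shape == probs.shape and out.device == probs.device
    assert ((out == 0) | (out == 1)).all()             # support check: only 0s and 1s
    return abs(out.float().mean().item() - p) < 0.01   # loose statistical tolerance


def check_creation_op(impl_fn, ref_fn, *args, **kwargs) -> bool:
    # For empty_like / new_empty_strided and friends, only metadata is well-defined.
    out, ref = impl_fn(*args, **kwargs), ref_fn(*args, **kwargs)
    return (
        out.shape == ref.shape
        and out.stride() == ref.stride()
        and out.dtype == ref.dtype
        and out.device == ref.device
    )


if __name__ == "__main__":
    print(check_random_op(torch.bernoulli))
    x = torch.randn(4, 5, device="cuda", dtype=torch.bfloat16)
    print(check_creation_op(torch.ops.aten.empty_like.default,
                            torch.ops.aten.empty_like.default, x))
```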