Add explicit tests for random ops and tensor creation ops in backend bench #112

@PaliC

Description

Some motivation

Right now we are using OpInfo as our ground truth for testing. However, it produces some pretty bogus inputs and outputs, especially in combination with our allclose-based testing harness: for random or fill ops we end up comparing empty tensors or uninitialized values, so the comparison tells us nothing. Some examples are below, followed by a quick standalone sketch of why allclose is the wrong check for these ops.

randint.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at randint.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (10, torch.Size([0, 5, 0]))
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {'device': 'cuda'}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)

bernoulli.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at bernoulli.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)

empty_like.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: -6.71875

[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)

new_empty_strided.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at new_empty_strided.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16), (), ())
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] Error in allclose
[2025-08-22 15:05:16][INFO][eval.py] 
Exception raised for None:
    args: ((T([], bf16), T([], bf16),), {})
    exc: Scalars are not close!

Expected 0.0 but got -6.71875.
Absolute difference: 6.71875 (up to 0.01 allowed)
Relative difference: inf (up to 0.01 allowed)

[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: 0.0
[2025-08-22 15:05:16][INFO][eval.py] for new_empty_strided.default is_correct=False abs_error=6.71875 rel_error=1.0
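
The underlying issue is not just the inputs: random ops have no deterministic reference output, and creation ops such as empty_like and new_empty_strided return uninitialized memory, so an elementwise allclose against an aten reference either passes vacuously (zero-sized tensors) or fails arbitrarily (uninitialized values). A minimal sketch of this outside the BackendBench harness, assuming a CUDA device is available:

```python
import torch

# randint with a zero-sized shape: reference and candidate are both empty
# tensors, so allclose passes vacuously and tells us nothing.
ref = torch.randint(10, (0, 5, 0), device="cuda")
out = torch.randint(10, (0, 5, 0), device="cuda")
print(torch.allclose(ref.float(), out.float()))  # True, vacuously

# empty_like returns uninitialized memory, so an elementwise comparison of two
# correct calls may pass or fail arbitrarily.
x = torch.randn(4, 4, device="cuda", dtype=torch.bfloat16)
print(torch.equal(torch.empty_like(x), torch.empty_like(x)))  # arbitrary

# bernoulli is random: two correct implementations will (and should) disagree
# elementwise, so allclose is the wrong check even on non-empty inputs.
p = torch.full((1000,), 0.5, device="cuda")
print(torch.allclose(torch.bernoulli(p), torch.bernoulli(p)))  # almost surely False
```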

What to do about it

In PyTorch, the tests for distributions and random ops live in test_distributions.py and test_random, and the tests for fill / tensor creation ops live in test_tensor_creation_ops.py.

We need to add this kind of testing to BackendBench; a rough sketch of what it could look like is below.
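
As a starting point, here is a minimal sketch of what explicit checks could look like. The helper names (check_creation_op, check_random_op) are hypothetical and not existing BackendBench APIs: for creation/fill ops we only compare metadata (shape, dtype, device, strides), and for random ops we compare summary statistics of a large sample against the target distribution instead of exact values.

```python
import torch

def check_creation_op(out: torch.Tensor, ref: torch.Tensor) -> bool:
    """Creation ops (empty_like, new_empty_strided, ...): values are
    uninitialized, so only metadata is meaningful."""
    return (
        out.shape == ref.shape
        and out.dtype == ref.dtype
        and out.device == ref.device
        and out.stride() == ref.stride()
    )

def check_random_op(out: torch.Tensor, expected_mean: float,
                    expected_std: float, tol: float = 0.1) -> bool:
    """Random ops (bernoulli, randint, ...): compare summary statistics of a
    large sample against the expected distribution, not exact values."""
    if out.numel() == 0:
        return True  # nothing to check statistically on empty tensors
    sample = out.float()
    return (
        abs(sample.mean().item() - expected_mean) < tol
        and abs(sample.std().item() - expected_std) < tol
    )

# Example usage against aten as the reference implementation:
x = torch.randn(16, 16, device="cuda", dtype=torch.bfloat16)
assert check_creation_op(torch.empty_like(x), torch.empty_like(x))

p = 0.3
draws = torch.bernoulli(torch.full((100_000,), p, device="cuda"))
assert check_random_op(draws, expected_mean=p,
                       expected_std=(p * (1 - p)) ** 0.5)
```

Stricter statistical tests (e.g. the distribution comparisons PyTorch's test_distributions.py already does) could be layered on top of this later.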
