Add explicit tests for random ops and tensor creation ops in backend bench #112

@PaliC

Description

Some motivation

Right now we are using OpInfo as our ground truth for testing. However, it produces some pretty bogus inputs and outputs, especially in combination with our allclose-based testing harness: for random or fill ops we end up comparing empty tensors or uninitialized values, so the comparison tells us nothing. Some examples are below, followed by a quick standalone sketch of why allclose is the wrong check for these ops.

randint.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at randint.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (10, torch.Size([0, 5, 0]))
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {'device': 'cuda'}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.int64)

bernoulli.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at bernoulli.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 3), dtype=torch.bfloat16)

empty_like.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: -6.71875

[2025-08-22 15:05:16][INFO][eval.py] Looking at empty_like.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16),)
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)
[2025-08-22 15:05:16][INFO][eval.py] aten output is: tensor([], device='cuda:0', size=(0, 5, 0), dtype=torch.bfloat16)

new_empty_strided.default

[2025-08-22 15:05:16][INFO][eval.py] Looking at new_empty_strided.default with
[2025-08-22 15:05:16][INFO][eval.py] args - (tensor(-6.7188, device='cuda:0', dtype=torch.bfloat16), (), ())
[2025-08-22 15:05:16][INFO][eval.py] kwargs - {}
[2025-08-22 15:05:16][INFO][eval.py] Error in allclose
[2025-08-22 15:05:16][INFO][eval.py] 
Exception raised for None:
    args: ((T([], bf16), T([], bf16),), {})
    exc: Scalars are not close!

Expected 0.0 but got -6.71875.
Absolute difference: 6.71875 (up to 0.01 allowed)
Relative difference: inf (up to 0.01 allowed)

[2025-08-22 15:05:16][INFO][eval.py] reference (which is aten) output is: -6.71875
[2025-08-22 15:05:16][INFO][eval.py] aten output is: 0.0
[2025-08-22 15:05:16][INFO][eval.py] for new_empty_strided.default is_correct=False abs_error=6.71875 rel_error=1.0
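
The underlying issue is not just the inputs: random ops have no deterministic reference output, and creation ops such as empty_like and new_empty_strided return uninitialized memory, so an elementwise allclose against an aten reference either passes vacuously (zero-sized tensors) or fails arbitrarily (uninitialized values). A minimal sketch of this outside the BackendBench harness, assuming a CUDA device is available:

```python
import torch

# randint with a zero-sized shape: reference and candidate are both empty
# tensors, so allclose passes vacuously and tells us nothing.
ref = torch.randint(10, (0, 5, 0), device="cuda")
out = torch.randint(10, (0, 5, 0), device="cuda")
print(torch.allclose(ref.float(), out.float()))  # True, vacuously

# empty_like returns uninitialized memory, so an elementwise comparison of two
# correct calls may pass or fail arbitrarily.
x = torch.randn(4, 4, device="cuda", dtype=torch.bfloat16)
print(torch.equal(torch.empty_like(x), torch.empty_like(x)))  # arbitrary

# bernoulli is random: two correct implementations will (and should) disagree
# elementwise, so allclose is the wrong check even on non-empty inputs.
p = torch.full((1000,), 0.5, device="cuda")
print(torch.allclose(torch.bernoulli(p), torch.bernoulli(p)))  # almost surely False
```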

What to do about it

In PyTorch, the tests for distributions and random ops live in test_distributions.py and test_random, and the tests for fill / tensor creation ops live in test_tensor_creation_ops.py.

We need to add this kind of testing to BackendBench; a rough sketch of what it could look like is below.
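
As a starting point, here is a minimal sketch of what explicit checks could look like. The helper names (check_creation_op, check_random_op) are hypothetical and not existing BackendBench APIs: for creation/fill ops we only compare metadata (shape, dtype, device, strides), and for random ops we compare summary statistics of a large sample against the target distribution instead of exact values.

```python
import torch

def check_creation_op(out: torch.Tensor, ref: torch.Tensor) -> bool:
    """Creation ops (empty_like, new_empty_strided, ...): values are
    uninitialized, so only metadata is meaningful."""
    return (
        out.shape == ref.shape
        and out.dtype == ref.dtype
        and out.device == ref.device
        and out.stride() == ref.stride()
    )

def check_random_op(out: torch.Tensor, expected_mean: float,
                    expected_std: float, tol: float = 0.1) -> bool:
    """Random ops (bernoulli, randint, ...): compare summary statistics of a
    large sample against the expected distribution, not exact values."""
    if out.numel() == 0:
        return True  # nothing to check statistically on empty tensors
    sample = out.float()
    return (
        abs(sample.mean().item() - expected_mean) < tol
        and abs(sample.std().item() - expected_std) < tol
    )

# Example usage against aten as the reference implementation:
x = torch.randn(16, 16, device="cuda", dtype=torch.bfloat16)
assert check_creation_op(torch.empty_like(x), torch.empty_like(x))

p = 0.3
draws = torch.bernoulli(torch.full((100_000,), p, device="cuda"))
assert check_random_op(draws, expected_mean=p,
                       expected_std=(p * (1 - p)) ** 0.5)
```

Stricter statistical tests (e.g. the distribution comparisons PyTorch's test_distributions.py already does) could be layered on top of this later.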
