
Conversation

@caaavik-msft (Contributor) commented Aug 20, 2025

This PR is part of a larger piece of work around running targeted performance tests, such as comparing multiple runtime versions side by side. Currently, the jobs and tests we run in performance tests are defined through command line arguments. However, the arguments can be difficult to construct, and for some configurations they cannot express certain combinations of jobs in a single run; those combinations can only be run in separate invocations.

With this PR, I have made it so that you can pass in a manifest.json argument that defines:

  • The list of test cases you want to run
  • The base job settings
  • The list of jobs to run
  • Run-setting overrides per test case

As an example, the following JSON can be used to validate dotnet/perf-autofiling-issues#60871:

{
    "benchmarkCases": [
        "System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 10, hostPortCount: 10)",
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"notfound\", hostPortCount: 10)",
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"name5\", hostPortCount: 10)",
        "System.Collections.TryGetValueFalse<Int32, Int32>.SortedDictionary(Size: 512)"
    ],
    "benchmarkCaseRunOverrides": {
        "System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 10, hostPortCount: 10)": {
            "operationCount": 5471872
        },
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"notfound\", hostPortCount: 10)": {
            "operationCount": 18616768
        },
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"name5\", hostPortCount: 10)": {
            "operationCount": 19320704
        },
        "System.Collections.TryGetValueFalse<Int32, Int32>.SortedDictionary(Size: 512)": {
            "operationCount": 40512
        }
    },
    "baseJob": {
        "run": {
            "warmupCount": 10,
            "launchCount": 5,
            "iterationCount": 15
        }
    },
    "jobs": {
        "Baseline-108fa785": {
            "infrastructure": {
                "toolchain": {
                    "type": "CoreRun",
                    "tfm": "net10.0",
                    "coreRunPath": "C:\\path\to\\baseline\\corerun.exe"
                }
            }
        },
        "Compare-edb570cd": {
            "infrastructure": {
                "toolchain": {
                    "type": "CoreRun",
                    "tfm": "net10.0",
                    "coreRunPath": "C:\\path\to\\compare\\corerun.exe"
                }
            }
        }
    }
}

With this, the operation counts for each test case are fixed, giving a much fairer comparison. This is not currently possible in BDN, as there is no way to apply per-test-case overrides.
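(As a usage sketch: the manifest would be passed when invoking the benchmark harness, e.g. something like dotnet run -c Release -f net10.0 -- --manifest manifest.json; the exact flag name here is hypothetical, since the PR description does not spell it out.)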

The schema for defining jobs in the manifest is identical to the schema BDN uses to define jobs internally. This means any set of jobs that BDN itself can express can be constructed in the manifest.
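To make the correspondence concrete, here is a minimal C# sketch (not code from this PR) of what the baseJob settings plus the Baseline-108fa785 entry above express in terms of BDN's public Job and CoreRunToolchain APIs:

using System.IO;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Toolchains.CoreRun;

// Sketch only: the manifest's "baseJob.run" settings and the
// "Baseline-108fa785" job, written directly against BDN's Job model.
Job baseline = Job.Default
    .WithWarmupCount(10)     // baseJob.run.warmupCount
    .WithLaunchCount(5)      // baseJob.run.launchCount
    .WithIterationCount(15)  // baseJob.run.iterationCount
    .WithToolchain(new CoreRunToolchain(
        new FileInfo(@"C:\path\to\baseline\corerun.exe")))
    .WithId("Baseline-108fa785");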

I also changed our argument parsing logic to use the same library that BenchmarkDotNet itself uses, so that running --help on our executable shows both our custom arguments and BDN's own arguments in the same output.
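For context, here is a minimal sketch of how that library (the CommandLineParser package, which BDN uses) is typically used; the option name and class below are hypothetical illustrations, not this PR's actual surface:

using CommandLine;

// Hypothetical options class illustrating the CommandLineParser package.
// The real option names and the wiring into BDN's parser live in this PR.
public class HarnessOptions
{
    [Option("manifest", HelpText = "Path to a manifest.json defining test cases and jobs.")]
    public string ManifestPath { get; set; }
}

public static class Program
{
    public static void Main(string[] args)
    {
        Parser.Default.ParseArguments<HarnessOptions>(args)
            .WithParsed(options =>
            {
                if (options.ManifestPath != null)
                {
                    // Load the manifest and build the job/test configuration from it.
                }
            });
    }
}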

I have a separate PR, #4911, which works well alongside this one for creating multiple core_root payloads from the build artifacts, so that one can script automatic validation of performance changes detected by the auto-filer.

This PR needs a bit more testing on other configurations, as I have so far focused on the CoreRun scenario, but I am creating it now for early feedback.

@LoopedBard3 (Member) left a comment

This looks good to me. I think if the other configurations are straightforward to add and don't bloat this PR, then they can be included either here or in a follow-up; otherwise I think it would be great to get this in so we can start using the working cases if we have immediate use cases in mind.
