
Conversation

@caaavik-msft (Contributor) commented Aug 20, 2025

This PR is part of a larger piece of work around running targeted performance tests, such as comparing multiple runtime versions side by side. Currently, the jobs and tests we run in performance tests are defined through command line arguments. However, the arguments can be difficult to construct, and for some configurations they cannot express certain combinations of jobs in a single run; those combinations can only be run in separate invocations.

With this PR, I have made it so that you can pass in a manifest.json argument that defines:

  • The list of test cases you want to run
  • The base job settings
  • The list of jobs to run
  • Run-setting overrides per test case

As an example, the following JSON can be used to validate dotnet/perf-autofiling-issues#60871:

{
    "benchmarkCases": [
        "System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 10, hostPortCount: 10)",
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"notfound\", hostPortCount: 10)",
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"name5\", hostPortCount: 10)",
        "System.Collections.TryGetValueFalse<Int32, Int32>.SortedDictionary(Size: 512)"
    ],
    "benchmarkCaseRunOverrides": {
        "System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 10, hostPortCount: 10)": {
            "operationCount": 5471872
        },
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"notfound\", hostPortCount: 10)": {
            "operationCount": 18616768
        },
        "System.Net.Primitives.Tests.CredentialCacheTests.GetCredential_HostPort(host: \"name5\", hostPortCount: 10)": {
            "operationCount": 19320704
        },
        "System.Collections.TryGetValueFalse<Int32, Int32>.SortedDictionary(Size: 512)": {
            "operationCount": 40512
        }
    },
    "baseJob": {
        "run": {
            "warmupCount": 10,
            "launchCount": 5,
            "iterationCount": 15
        }
    },
    "jobs": {
        "Baseline-108fa785": {
            "infrastructure": {
                "toolchain": {
                    "type": "CoreRun",
                    "tfm": "net10.0",
                    "coreRunPath": "C:\\path\to\\baseline\\corerun.exe"
                }
            }
        },
        "Compare-edb570cd": {
            "infrastructure": {
                "toolchain": {
                    "type": "CoreRun",
                    "tfm": "net10.0",
                    "coreRunPath": "C:\\path\to\\compare\\corerun.exe"
                }
            }
        }
    }
}

With this, the operation counts for each test case are fixed, giving a much fairer comparison. This is not currently possible in BDN, as there is no way to apply per-test-case overrides.
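(As a usage sketch: the manifest would be passed when invoking the benchmark harness, e.g. something like dotnet run -c Release -f net10.0 -- --manifest manifest.json; the exact flag name here is hypothetical, since the PR description does not spell it out.)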

The schema for defining jobs in the manifest is identical to the schema BDN uses to define jobs internally. This means any set of jobs that BDN itself can express can be constructed in the manifest.
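To make the correspondence concrete, here is a minimal C# sketch (not code from this PR) of what the baseJob settings plus the Baseline-108fa785 entry above express in terms of BDN's public Job and CoreRunToolchain APIs:

using System.IO;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Toolchains.CoreRun;

// Sketch only: the manifest's "baseJob.run" settings and the
// "Baseline-108fa785" job, written directly against BDN's Job model.
Job baseline = Job.Default
    .WithWarmupCount(10)     // baseJob.run.warmupCount
    .WithLaunchCount(5)      // baseJob.run.launchCount
    .WithIterationCount(15)  // baseJob.run.iterationCount
    .WithToolchain(new CoreRunToolchain(
        new FileInfo(@"C:\path\to\baseline\corerun.exe")))
    .WithId("Baseline-108fa785");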

I also changed our argument parsing logic to use the same library that BenchmarkDotNet itself uses, so that running --help on our executable shows both our custom arguments and BDN's own arguments in the same output.
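For context, here is a minimal sketch of how that library (the CommandLineParser package, which BDN uses) is typically used; the option name and class below are hypothetical illustrations, not this PR's actual surface:

using CommandLine;

// Hypothetical options class illustrating the CommandLineParser package.
// The real option names and the wiring into BDN's parser live in this PR.
public class HarnessOptions
{
    [Option("manifest", HelpText = "Path to a manifest.json defining test cases and jobs.")]
    public string ManifestPath { get; set; }
}

public static class Program
{
    public static void Main(string[] args)
    {
        Parser.Default.ParseArguments<HarnessOptions>(args)
            .WithParsed(options =>
            {
                if (options.ManifestPath != null)
                {
                    // Load the manifest and build the job/test configuration from it.
                }
            });
    }
}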

I have a separate PR, #4911, which works well alongside this one for creating multiple core_root payloads from the build artifacts, so that one can script automatic validation of performance changes detected by the auto-filer.

This PR needs a bit more testing on other configurations, as I have so far focused on the CoreRun scenario, but I am creating it now for early feedback.

@LoopedBard3 (Member) left a comment

This looks good to me. I think if the other configurations are straightforward to add and don't bloat this PR, then they can be included either here or in a follow-up; otherwise I think it would be great to get this in so we can start using the working cases if we have immediate use cases in mind.
