Conversation

@prokotg (Collaborator) commented Oct 6, 2025

This PR introduces multiple type definitions on both the benchmark and endpoint ends. With these changes, we can define a list of required types on the benchmark end, for example: [chat, vlm]. On the endpoint side, we define capabilities, which must be a superset of the required types. More information is provided in the documentation.
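
For illustration, the two sides could look like this (a minimal sketch; the benchmark-side key is the one discussed in review below, while the endpoint-side shape is assumed from the evaluation.target.api_endpoint.type field):

# benchmark side: required type combinations
config:
  supported_endpoint_types:
    - [chat, vlm]

# endpoint side (shape assumed): declared capabilities
target:
  api_endpoint:
    type: [chat, vlm, completions]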

@prokotg requested review from a team as code owners October 6, 2025 15:43
copy-pr-bot commented Oct 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@prokotg (Collaborator, Author) commented Oct 6, 2025

/ok to test cc7936d

@wprazuch (Contributor) left a comment


Added some comments :)

benchmark_type_combination = [benchmark_type_combination]

if model_types.issuperset(set(benchmark_type_combination)):
    is_target_compatible = True
@wprazuch (Contributor) commented

We only keep information about the last successful check in this variable - is this not a potential issue? To be completely honest with you, I have a hard time understanding this part; could you elaborate on it? :D

@prokotg (Collaborator, Author) commented Oct 8, 2025

Benchmark requirements are strict. From a benchmark perspective, you define a list of accepted sets. For example, let's imagine that your benchmark is a VLM benchmark but accepts both base and chat models. Then, on the benchmark side, you would define evaluation.config.supported_endpoint_types as a list of:

config:
  supported_endpoint_types:
    - [vlm, chat]
    - [vlm, completions]

This means that your model must have all capabilities in at least one of these sets (the sets are implemented as lists, but that does not matter here IMO).

So the capabilities we define on the model endpoint must be a superset of at least one element from the supported_endpoint_types list.

Let's have a look at some examples.

  1. Model [chat] - neither a superset of [vlm, chat] nor [vlm, completions]
  2. Model [chat, vlm] - superset of [vlm, chat] but not [vlm, completions]
  3. Model [chat, vlm, completions] - superset of both [vlm, chat] and [vlm, completions]
  4. Model [chat, vlm, completions, audio] - superset of both [vlm, chat] and [vlm, completions]

It does not matter which one is accepted - it only matters that at least one is compatible. So we keep the last one.
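
Here is a minimal sketch of that check in isolation (names follow the reviewed snippet; the surrounding plumbing is assumed):

model_types = {"chat", "vlm", "completions"}  # endpoint capabilities

supported_endpoint_types = [["vlm", "chat"], ["vlm", "completions"]]

is_target_compatible = False
for benchmark_type_combination in supported_endpoint_types:
    # a single type may come in as a bare string; normalize it to a list
    if not isinstance(benchmark_type_combination, list):
        benchmark_type_combination = [benchmark_type_combination]
    if model_types.issuperset(set(benchmark_type_combination)):
        is_target_compatible = True  # at least one combination matches

print(is_target_compatible)  # True: example 3 above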

@prokotg (Collaborator, Author) commented

But I do see a clash when multiple combinations are compatible - which one is passed down to the evaluation harness?

if model_types.issuperset(set(benchmark_type_combination)):
    is_target_compatible = True

if evaluation.target.api_endpoint.type is None:
@wprazuch (Contributor) commented Oct 8, 2025

If I got this right, this will never be true, since if it's None it will already have been wrapped in a list at line 359.
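
A quick illustration of why (assuming the line-359 wrap runs before this check):

endpoint_type = None                   # evaluation.target.api_endpoint.type
if not isinstance(endpoint_type, list):
    endpoint_type = [endpoint_type]    # None becomes [None]

print(endpoint_type is None)           # False, so the `is None` branch is dead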

@prokotg (Collaborator, Author) commented

you're right, thanks!
