feat(core): Multiple endpoint types on benchmark and endpoint end #277
Conversation
Signed-off-by: Tomasz Grzegorzek <[email protected]>
/ok to test cc7936d
Added some comments :)
```python
benchmark_type_combination = [benchmark_type_combination]

if model_types.issuperset(set(benchmark_type_combination)):
    is_target_compatible = True
```
We only keep information about the last successful check in this variable - is this not a potential issue? To be completely honest with you, I have a hard time understanding this part; could you elaborate on it? :D
Benchmark requirements are strict. From the benchmark's perspective, you define a list of accepted sets. For example, let's imagine that your benchmark is a VLM benchmark but accepts both base and chat models. Then, on the benchmark side, you would define `evaluation.config.supported_endpoint_types` as a list:
```yaml
config:
  supported_endpoint_types:
    - [vlm, chat]
    - [vlm, completions]
```
This means that your model must have all capabilities in at least one of these sets (the sets are implemented as lists, but that does not matter here IMO). So the capabilities we define on the model endpoint must be a superset of at least one element from the supported_endpoint_types list.
Let's have a look at some examples:
- Model `[chat]` - neither a superset of `[vlm, chat]` nor of `[vlm, completions]`
- Model `[chat, vlm]` - a superset of `[vlm, chat]` but not of `[vlm, completions]`
- Model `[chat, vlm, completions]` - a superset of both `[vlm, chat]` and `[vlm, completions]`
- Model `[chat, vlm, completions, audio]` - a superset of both `[vlm, chat]` and `[vlm, completions]`

It does not matter which combination is accepted - it only matters that at least one is compatible. So we keep the last one.
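For reference, a minimal standalone sketch of this check (a hypothetical helper; the names mirror the snippet under review, not an actual API):
```python
def is_compatible(model_types: set[str], supported_endpoint_types: list[list[str]]) -> bool:
    # Compatible if the model's capabilities cover at least one accepted
    # combination; which combination matches does not matter.
    return any(
        model_types.issuperset(set(combination))
        for combination in supported_endpoint_types
    )

# The four examples from the comment above:
supported = [["vlm", "chat"], ["vlm", "completions"]]
assert not is_compatible({"chat"}, supported)
assert is_compatible({"chat", "vlm"}, supported)
assert is_compatible({"chat", "vlm", "completions"}, supported)
assert is_compatible({"chat", "vlm", "completions", "audio"}, supported)
```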
But I do see a case where we have multiple compatible combinations - which one is passed down to the evaluation harness?
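To illustrate the concern (a hypothetical sketch; the diff itself only sets a boolean, and the downstream wiring is not shown here):
```python
model_types = {"chat", "vlm", "completions"}
supported_endpoint_types = [["vlm", "chat"], ["vlm", "completions"]]

is_target_compatible = False
matched_combination = None
for benchmark_type_combination in supported_endpoint_types:
    if model_types.issuperset(set(benchmark_type_combination)):
        is_target_compatible = True
        # Overwritten on every later match, so only the last compatible
        # combination would survive if it were passed downstream.
        matched_combination = benchmark_type_combination

print(matched_combination)  # ['vlm', 'completions'] - the last match wins
```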
```python
if model_types.issuperset(set(benchmark_type_combination)):
    is_target_compatible = True

if evaluation.target.api_endpoint.type is None:
```
If I got this right, this will never be true? Since it's going to be wrapped in a list if it's None, on line 359.
You're right, thanks!
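For context, a hypothetical sketch of the normalization the reviewer refers to; the actual code at line 359 may differ:
```python
# If the endpoint type is not already a list, it gets wrapped in one -
# so a None value becomes [None], and a later `type is None` check can
# never be true.
endpoint_type = None
if not isinstance(endpoint_type, list):
    endpoint_type = [endpoint_type]

print(endpoint_type)  # [None]
```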
…combination Signed-off-by: Tomasz Grzegorzek <[email protected]>
This MR introduces multiple type definitions on both the benchmark and endpoint ends. With these changes, we can define a list of required types on the benchmark end, for example `[chat, vlm]`. On the endpoint side, we define capabilities, which must be a superset of the required types. More information is provided in the documentation.
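A minimal configuration sketch, assuming the field paths mentioned in the review thread (`evaluation.config.supported_endpoint_types` and `evaluation.target.api_endpoint.type`); the exact schema may differ:
```yaml
evaluation:
  config:
    # Benchmark side: accepted capability combinations.
    supported_endpoint_types:
      - [chat, vlm]
  target:
    api_endpoint:
      # Endpoint side: declared capabilities; must cover at least one
      # accepted combination above.
      type: [chat, vlm, completions]
```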