Is your feature request related to a problem? Please describe.
Being able to run agent evaluation is super useful. Often, however, the test cases require multimodal input, most commonly images passed as part of the input query.
Describe the solution you'd like
Expected behaviour: the evalset .json should accept more complex input queries that include both a text part and image or other non-text parts.
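
To make the request concrete, here is a rough sketch of what a multimodal entry could look like. Everything in it is an assumption for illustration: the field names (`name`, `query`, `parts`, `inline_data`, `mime_type`, `reference`) and the helper functions are hypothetical and are not taken from the current evalset schema. The idea is only that a query becomes a list of parts, where each part is either text or base64-encoded binary data with a MIME type.

```python
import base64
import json


def text_part(text):
    """Hypothetical helper: wrap plain text as one query part."""
    return {"text": text}


def image_part(data, mime_type="image/png"):
    """Hypothetical helper: wrap raw bytes as a base64-encoded inline part."""
    return {
        "inline_data": {
            "mime_type": mime_type,
            "data": base64.b64encode(data).decode("ascii"),
        }
    }


def build_eval_case(name, parts, reference):
    """Assemble one eval case; the layout is an assumed shape, not a real schema."""
    return {"name": name, "query": {"parts": parts}, "reference": reference}


# Placeholder bytes standing in for a real image file read from disk.
fake_png = b"\x89PNG\r\n\x1a\n"

case = build_eval_case(
    name="describe_image",
    parts=[text_part("What is shown in this picture?"), image_part(fake_png)],
    reference="A short description of the picture.",
)

# Serialize a one-case evalset; JSON cannot hold raw bytes, hence the base64 step.
evalset_json = json.dumps([case], indent=2)
print(evalset_json)
```

Base64 inline data keeps the evalset file self-contained. Referencing images by file path or URI would be an alternative that keeps the .json small, and either would cover this use case.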