@nullfunc commented Nov 5, 2025

Description

This PR uses Genkit evaluation to add a testing framework. The framework checks whether changes to the LLM, or to tool names and descriptions, still produce the same tool calls for the same input. The current_evaluation.json file records the current test run and its pass rates. The Makefile (make genkit-help) has been updated to explain how the workflow is expected to run, and a README.md documents it as well. Any change to the LLM or to tools and their descriptions should achieve the same or a better pass rate.
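To illustrate the "same or better pass rate" rule, here is a minimal Go sketch of a regression check between a baseline run and a candidate run. It assumes a hypothetical JSON layout (a list of objects with `name` and `passRate` fields). The actual format of current_evaluation.json is not described in this PR, and `evalResult` and `regressions` are illustrative names, not code from this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evalResult is a HYPOTHETICAL shape for one evaluated test case; the real
// current_evaluation.json layout may differ.
type evalResult struct {
	Name     string  `json:"name"`
	PassRate float64 `json:"passRate"`
}

// regressions returns the names of baseline test cases whose pass rate
// dropped in the candidate run, or that are missing from it entirely.
func regressions(baseline, candidate []evalResult) []string {
	cand := make(map[string]float64, len(candidate))
	for _, r := range candidate {
		cand[r.Name] = r.PassRate
	}
	var out []string
	for _, b := range baseline {
		rate, ok := cand[b.Name]
		if !ok || rate < b.PassRate {
			out = append(out, b.Name)
		}
	}
	return out
}

func main() {
	// Illustrative data only: "logs" drops from 0.8 to 0.6 in the candidate.
	baselineJSON := `[{"name":"deploy","passRate":1.0},{"name":"logs","passRate":0.8}]`
	candidateJSON := `[{"name":"deploy","passRate":1.0},{"name":"logs","passRate":0.6}]`

	var baseline, candidate []evalResult
	if err := json.Unmarshal([]byte(baselineJSON), &baseline); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(candidateJSON), &candidate); err != nil {
		panic(err)
	}

	fmt.Println(regressions(baseline, candidate)) // prints [logs]
}
```

A per-test comparison like this flags a regression even when the overall average stays flat, which is one way to enforce "same or better" on every input rather than only in aggregate.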

Linked Issues

fixes #1486

Checklist

  • I have performed a self-review of my code
  • I have added appropriate tests
  • I have updated the Defang CLI docs and/or README to reflect my changes, if necessary
