genkit eval changes #1611

nullfunc · 2025-11-05T22:41:04Z

Description

This PR uses genkit evaluation to add a testing framework that checks whether changes to the LLM or to tool names/descriptions still yield the same tool calls for the same input. The current_evaluation.json file contains the current test run and its pass rates. The Makefile (make genkit-help) and a README.md now describe how the workflow is expected to run. Any change to the LLM or to tool names/descriptions should keep the pass rate the same or improve it.
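For reviewers, the pass-rate check described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `ToolCall`, `EvalCase`, `passRate`, and `noRegression` are hypothetical names, the argument keys are made up, and the baseline is passed in as a plain number because the layout of current_evaluation.json is not shown here.

```go
package main

import (
	"fmt"
	"reflect"
)

// ToolCall is a hypothetical record of one tool invocation requested by the model.
type ToolCall struct {
	Name string
	Args map[string]any
}

// EvalCase pairs one input with the tool calls we expect and the ones observed.
type EvalCase struct {
	Input    string
	Expected []ToolCall
	Actual   []ToolCall
}

// argsEqual treats nil and empty argument maps as equal, otherwise compares deeply.
func argsEqual(a, b map[string]any) bool {
	if len(a) == 0 && len(b) == 0 {
		return true
	}
	return reflect.DeepEqual(a, b)
}

// sameCalls reports whether both sequences match in order, tool name, and arguments.
func sameCalls(want, got []ToolCall) bool {
	if len(want) != len(got) {
		return false
	}
	for i := range want {
		if want[i].Name != got[i].Name || !argsEqual(want[i].Args, got[i].Args) {
			return false
		}
	}
	return true
}

// passRate returns the fraction of cases whose observed calls match the expected ones.
// An empty case list yields 0.
func passRate(cases []EvalCase) float64 {
	if len(cases) == 0 {
		return 0
	}
	passed := 0
	for _, c := range cases {
		if sameCalls(c.Expected, c.Actual) {
			passed++
		}
	}
	return float64(passed) / float64(len(cases))
}

// noRegression encodes the "same or better pass rate" rule against a baseline.
func noRegression(rate, baseline float64) bool {
	return rate >= baseline
}

func main() {
	cases := []EvalCase{
		{
			Input:    "deploy my project",
			Expected: []ToolCall{{Name: "deploy", Args: map[string]any{"working_directory": "."}}},
			Actual:   []ToolCall{{Name: "deploy", Args: map[string]any{"working_directory": "."}}},
		},
		{
			Input:    "show me the logs",
			Expected: []ToolCall{{Name: "logs"}},
			Actual:   []ToolCall{{Name: "services"}}, // wrong tool selected, so this case fails
		},
	}
	baseline := 0.5 // assumed value; in practice it would come from the stored evaluation run
	rate := passRate(cases)
	fmt.Printf("pass rate %.2f (baseline %.2f): ok=%v\n", rate, baseline, noRegression(rate, baseline))
}
```

Matching on exact call order and arguments is the strictest reading of "the same tool calls"; a looser comparison (for example, tool names only) would be a different design choice.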

Linked Issues

fixes #1486

Checklist

I have performed a self-review of my code
I have added appropriate tests
I have updated the Defang CLI docs and/or README to reflect my changes, if necessary


jordanstephens added 30 commits November 4, 2025 10:43

rough agent loop

b170be8

setup tool translation from genkit to mcp-go

40b5d40

clean up login tool placeholder

070eacf

skip in-process mcp tests for now

16a641d

move common tool interfaces to agent package

c976901

move mcp tool handlers to agent tools

2d18243

expose login tool to cli agent

4363cf3

expose services tool to cli agent

e2e8e09

expose deploy tool to cli agent

0966938

expose destroy tool to cli agent

0081fa0

expose logs tool to cli agent

1e8527c

expose estimate tool to cli agent

609290b

expose set_config tool to cli agent

d56420c

expose remove_config tool to cli agent

cc4dbed

expose set provider tools to cli agent

ff9198c

factor out input reader to handle sigterm

138f40c

avoid reprinting user messages

9cba73e

refactor estimate tool params to use simple strings

a6f08e7

configure oai_compat gateway plugin

f455a0b

require auth before using agent

c0ba91c

update vendor hash

fced94d

refactor tool call handling

d90b44b

pass access token as openai api key

a4da820

refactor ComposeUp params

b9a8a06

configure fabric genkit provider

d9ae7bb

allow googlegenai provider if GOOGLE_API_KEY is found in env

fix term

9488f1b

tweak system prompt

ef4f98b

print with term, use /exit, add welcome screen

3f558e8

style prompt

e933218

optional project_name, default working_directory

5cc9721

jordanstephens and others added 11 commits November 4, 2025 11:47

factor out timeutils

2b7471b

add current working directory to system instructions

79a7c53

add LoaderParams descriptions

ee7c3a4

add LogsParams descriptions

3c655f2

avoid returning an error when projects are not deployed

ab76fa4

remove cwd from output

0b13734

clean up "provider not configured" errors

0e79cc0

handle tool calls ourselves so we can log them

7e05459

cleanup

9b27951

update nix vendor hash

3cd0854

genkit eval changes

a143a8c

nullfunc requested a review from jordanstephens November 6, 2025 17:15

update vendor hash

eb66fe6

jordanstephens force-pushed the jordan/cli-agent branch from 3cd0854 to e88e117 Compare November 7, 2025 01:07

update to remove go.mod, use same one as in src/

30e9454

jordanstephens force-pushed the jordan/cli-agent branch from e88e117 to 0a08d68 Compare November 7, 2025 22:31
