I'm exploring the new Testing & Evaluation (T&E) feature in LiveKit Agents v1.2+ to simulate conversations for agent testing.
My goal is to allow users of my platform to create test cases (scripted messages) and simulate full user-agent conversations, verifying the LLM-generated responses at each step. This is meant for testing: to ensure agent logic and prompts behave as expected before the agent goes out to real users.
Use Case:
1- Users define test cases with:
a) A name
b) A sequence of user messages
c) Optionally, expected agent responses
2- The platform should process these test cases and evaluate how the agent responds
3- Each test should run the agent logic using the LiveKit Agents engine and return the generated responses for review (a rough sketch of the test-case shape I have in mind follows this list)
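For concreteness, here's the kind of test-case structure I have in mind. The class and field names are purely illustrative on my side, not anything from the LiveKit API:

```python
from dataclasses import dataclass, field


@dataclass
class TestCase:
    """A scripted conversation a user of my platform can define."""
    name: str
    user_messages: list[str]  # sent to the agent in order
    # Optional reference descriptions of what each agent reply should do
    expected_responses: list[str] = field(default_factory=list)


support_flow = TestCase(
    name="refund request",
    user_messages=[
        "Hi, I'd like a refund for my last order.",
        "Order #1234, it arrived damaged.",
    ],
    expected_responses=[
        "Acknowledges the request and asks for the order number.",
        "Confirms the refund will be processed.",
    ],
)
```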
What I Need Help With:
1- How do I properly invoke the T&E mode for this kind of simulation?
2- Is there an exposed function/class I can call with test case data?
3- Is there a documented structure for test case input?
4- Can this be done in a headless/scriptable way (e.g. a CLI or programmatic API)? I've included a sketch of the kind of call I'm hoping for after this list
5- Is there any example available showing how to use test_and_evaluate in a typical agent integration project (e.g. agent-starter)?
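To make question 4 concrete, here is roughly what I'd like to be able to do, reusing the `TestCase` sketch above. This is only my best guess from the docs: I'm assuming `session.run(user_input=...)`, the `result.expect...` helpers, and the LLM-as-judge call can be used headlessly outside of pytest, and `MyAgent` is just a placeholder for my own agent class:

```python
import asyncio

from livekit.agents import AgentSession
from livekit.plugins import openai

from my_agent import MyAgent  # placeholder for my own Agent subclass


async def run_test_case(test_case: TestCase) -> None:
    """Replay one scripted conversation against the agent, headlessly."""
    llm = openai.LLM(model="gpt-4o-mini")
    session = AgentSession(llm=llm)
    # Assuming no room is needed when the session runs in test mode
    await session.start(MyAgent())

    for i, message in enumerate(test_case.user_messages):
        # session.run(user_input=...) is what the T&E docs seem to use to
        # drive a single user turn and capture the resulting events
        result = await session.run(user_input=message)

        # The docs show chained assertion helpers on the result; I'm not
        # certain of the exact names/signatures, this is my best reading
        assertion = result.expect.next_event().is_message(role="assistant")
        if i < len(test_case.expected_responses):
            await assertion.judge(llm, intent=test_case.expected_responses[i])

    await session.aclose()


if __name__ == "__main__":
    # support_flow comes from the TestCase sketch above
    asyncio.run(run_test_case(support_flow))
```

If something along these lines is already supported, even a pointer to the right entry point would help a lot.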
Current Challenge:
The existing documentation doesn't clearly show how to use the T&E feature for this kind of use case. The implementation seems tightly coupled to the Python CLI and doesn't provide a reusable interface (or if it does, it isn't documented).
Why This Is Important:
This kind of simulation is essential for giving users confidence in their agent logic, enabling them to iterate quickly and deploy only when the agent performs as expected.
Would really appreciate a short guide, example, or clarification on how to best use the Testing & Evaluation feature in this context.
Thanks!