Skip to content

Need clearer support for Testing & Evaluation: simulate user-agent conversations through test cases #7

@PragramAllDay

Description

@PragramAllDay

I'm exploring the new Testing & Evaluation (T&E) feature in LiveKit Agent v1.2+ to simulate conversations for agent testing.

My goal is to allow users of my platform to create test cases (scripted messages) and simulate full user-agent conversations, verifying the LLM-generated responses at each step. This is meant for testing — to ensure agent logic and prompts behave as expected before going out to the real world.

Use Case:
1- Users define test cases with:
a) A name
b) A sequence of user messages
c) Optionally, expected agent responses
2- The platform should process these test cases and evaluate how the agent responds
3- Each test should run the agent logic using the LiveKit Agent engine and return generated responses for review

What I Need Help With:
1- How do I properly invoke the T&E mode for this kind of simulation?
2- Is there an exposed function/class I can call with test case data?
3- Is there a documented structure for test case input?
4- Can this be done in a headless/scriptable way (e.g. CLI or programmatic API)?
5- Is there any example available showing how to use test_and_evaluate in a typical agent integration project (e.g. agent-starter)?

Current Challenge:
The existing documentation doesn't clearly show how to leverage the T&E feature in a way that fits my use case. The implementation seems coupled tightly with the Python CLI and doesn't provide a reusable interface (or it's undocumented).

Why This Is Important:
This kind of simulation is essential for giving users confidence in their agent logic, enabling them to iterate quickly and deploy only when the agent performs as expected.
Would really appreciate a short guide, example, or clarification on how to best use the Testing & Evaluation feature in this context.

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions