Commit ec5f63d

update readme for alternate tool call eval
1 parent 796d32b commit ec5f63d

1 file changed: +29 -7 lines changed


README.md

Lines changed: 29 additions & 7 deletions
@@ -291,7 +291,7 @@ embedding:
 | `attachments` | list[string] | ❌ | Attachments | ❌ |
 | `expected_response` | string | 📋 | Expected response for comparison | ❌ |
 | `expected_intent` | string | 📋 | Expected intent for intent evaluation| ❌ |
-| `expected_tool_calls` | list[list[dict]] | 📋 | Expected tool call sequences | ❌ |
+| `expected_tool_calls` | list[list[list[dict]]] | 📋 | Expected tool call sequences (multiple alternative sets) | ❌ |
 | `tool_calls` | list[list[dict]] | ❌ | Actual tool calls from API | ✅ (if API enabled) |
 | `verify_script` | string | 📋 | Path to verification script | ❌ |
 | `turn_metrics` | list[string] | ❌ | Turn-specific metrics to evaluate | ❌ |
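
The type change in this hunk adds one level of nesting to `expected_tool_calls`. The commit does not say whether the previous `list[list[dict]]` shape is still accepted, so existing evaluation data may need to be wrapped. The following is a minimal sketch of such a conversion, not project code: `is_legacy_format` and `to_alternatives` are hypothetical helper names, and the detection rule (a dict found two levels deep means the old shape) is an assumption.

```python
from typing import Any


def is_legacy_format(expected_tool_calls: list[Any]) -> bool:
    """Return True if the value looks like the old list[list[dict]] shape.

    Assumption: in the old shape, the items of each inner list are dicts
    (tool calls); in the new shape they are lists (sequences).
    """
    return any(
        isinstance(call, dict)
        for sequence in expected_tool_calls
        if isinstance(sequence, list)
        for call in sequence
    )


def to_alternatives(expected_tool_calls: list[Any]) -> list[Any]:
    """Wrap an old-shape value in one extra list so it becomes a single alternative.

    Values that already look like the new shape are returned unchanged.
    An empty list is ambiguous and is also returned unchanged.
    """
    if is_legacy_format(expected_tool_calls):
        return [expected_tool_calls]
    return expected_tool_calls


legacy = [[{"tool_name": "oc_get", "arguments": {"kind": "pod"}}]]
migrated = to_alternatives(legacy)
```

Here `migrated` holds exactly one alternative containing the original sequences, and calling `to_alternatives` on it again leaves it unchanged.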
@@ -302,7 +302,7 @@ embedding:
 Examples
 > - `expected_response`: Required for `custom:answer_correctness`
 > - `expected_intent`: Required for `custom:intent_eval`
-> - `expected_tool_calls`: Required for `custom:tool_eval`
+> - `expected_tool_calls`: Required for `custom:tool_eval` (multiple alternative sets format)
 > - `verify_script`: Required for `script:action_eval` (used when API is enabled)
 > - `response`: Required for most metrics (auto-populated if API enabled)
 
@@ -314,15 +314,37 @@ Examples
 | `[]` (empty list) | Skip evaluation for this turn |
 | `["metric1", ...]` | Use specified metrics only |
 
+#### Tool Evaluation
+
+The `custom:tool_eval` metric supports flexible matching with multiple alternative patterns:
+
+- **Format**: `[[[tool_calls, ...]], [[tool_calls]], ...]` (list of list of list)
+- **Matching**: Tries each alternative until one matches
+- **Use Cases**: Optional tools, multiple approaches, default arguments, skip scenarios
+- **Empty Sets**: `[]` represents "no tools" and must come after primary alternatives
+
 #### Tool Call Structure
 
 ```yaml
+# Multiple alternative sets format: [[[tool_calls, ...]], [[tool_calls]], ...]
 expected_tool_calls:
-  -
-    - tool_name: oc_get  # Tool name
-      arguments:  # Tool arguments
-        kind: pod
-        name: openshift-light*  # Regex patterns supported for flexible matching
+  - # Alternative 1: Primary approach
+    - # Sequence 1
+      - tool_name: oc_get
+        arguments:
+          kind: pod
+          name: openshift-light*  # Regex patterns supported
+    - # Sequence 2 (if multiple parallel tool calls needed)
+      - tool_name: oc_describe
+        arguments:
+          kind: pod
+  - # Alternative 2: Different approach
+    - # Sequence 1
+      - tool_name: kubectl_get
+        arguments:
+          resource: pods
+  - # Alternative 3: Skip scenario (optional)
+    []  # When model has information from previous conversation
 ```
 
 #### Script-Based Evaluations
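
The "Tool Evaluation" text added in the last hunk says the metric tries each alternative until one matches and that argument values may be regex patterns. The sketch below illustrates that idea only; it is not the project's implementation. The function names are hypothetical, and three behaviours are assumptions: patterns are applied with `re.search`, every expected argument must be present in the actual call, and extra actual arguments are ignored.

```python
import re
from typing import Any, Optional

ToolCall = dict[str, Any]
ToolCallSet = list[list[ToolCall]]  # sequences of tool calls


def args_match(expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    """Treat each expected value as a regex searched for in the actual value (assumption)."""
    return all(
        key in actual and re.search(str(pattern), str(actual[key])) is not None
        for key, pattern in expected.items()
    )


def set_matches(expected: ToolCallSet, actual: ToolCallSet) -> bool:
    """Compare one alternative against the actual calls, position by position."""
    if len(expected) != len(actual):
        return False
    for exp_seq, act_seq in zip(expected, actual):
        if len(exp_seq) != len(act_seq):
            return False
        for exp_call, act_call in zip(exp_seq, act_seq):
            if exp_call["tool_name"] != act_call["tool_name"]:
                return False
            if not args_match(exp_call.get("arguments", {}), act_call.get("arguments", {})):
                return False
    return True


def match_alternatives(alternatives: list[ToolCallSet], actual: ToolCallSet) -> Optional[int]:
    """Return the index of the first alternative that matches, or None if none does."""
    for index, expected in enumerate(alternatives):
        if set_matches(expected, actual):
            return index
    return None


alternatives = [
    [[{"tool_name": "oc_get", "arguments": {"kind": "pod", "name": "openshift-light*"}}]],
    [[{"tool_name": "kubectl_get", "arguments": {"resource": "pods"}}]],
    [],  # skip scenario: no tool calls expected
]
actual = [[{"tool_name": "oc_get", "arguments": {"kind": "pod", "name": "openshift-lightspeed-7d9"}}]]
matched_index = match_alternatives(alternatives, actual)
```

Because alternatives are tried in order, placing `[]` last means the "no tools" case is only considered after the primary alternatives have been checked, which is consistent with the ordering rule stated in the README text.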
