fix: Notify agent of failed agent transfer in function response #1206
What happens now
If an error is raised when calling a sub-agent (for example, because one of the toolsets raises an error in `get_tools`, as is expected on an MCP toolset connection error), the error is silently caught and the transfer response is returned as is. This means two things:
_find_agent_to_run
What should happen
Given that resolving the failure in general is not possible, I think the only stable solution is to accept that the transfer failed, and to ensure the `transfer_to_agent` response does not imply otherwise.
Alternatives
Remove the transfer call
The call and response could be stripped. However, this seems likely to result in the agent simply trying again without any fix having been applied.
Try again
There is no general solution to a failed `run_async`. For example, a misconfigured `MCPToolset` or an MCP server outage will cause persistent failures until the underlying problem is resolved. Since that can easily take longer than the request timeout, it seems better to accept that a transfer may occasionally fail.
Proposed solution (in this PR)
If the sub-agent's `run_async` or `run_live` method emits at least one event, consider the sub-agent to be working and the transfer successful. If not, assume a failure and rewrite the function response.
This involves some reshuffling of the yields: the function response, which was previously yielded immediately, is held back until we know whether the sub-agent started. That adds a small amount of latency to the result, but until the sub-agent yields an event there is no way to know whether it was successful.
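The idea can be sketched independently of the framework. This is a minimal, hypothetical illustration, not the code in this PR: `transfer_with_check`, the dict-shaped function response, and the failure message are all assumptions standing in for the real event and `function_response` types. It holds back the transfer response until the sub-agent's async generator yields its first event, and replaces it with a failure response if the sub-agent raises or finishes without emitting anything.

```python
from typing import AsyncIterator, Callable

# Hypothetical failure text sent back to the calling agent.
FAILED_MESSAGE = "Transfer to agent failed; the sub-agent could not be started."


async def transfer_with_check(
    transfer_response: dict,
    run_subagent: Callable[[], AsyncIterator[object]],
) -> AsyncIterator[object]:
    """Hypothetical sketch: delay the transfer_to_agent function response
    until the sub-agent has produced at least one event."""
    events = run_subagent()
    try:
        # Wait for the first event before deciding whether the transfer worked.
        first = await events.__anext__()
    except StopAsyncIteration:
        # The sub-agent finished without emitting anything: treat as a failure.
        yield {"result": FAILED_MESSAGE}
        return
    except Exception as exc:
        # E.g. a toolset raising in get_tools: report the failure instead of
        # returning a response that implies the transfer succeeded.
        yield {"result": f"{FAILED_MESSAGE} Error: {exc!r}"}
        return

    # At least one event arrived, so the original response is accurate.
    yield transfer_response
    yield first
    async for event in events:
        yield event
```

The latency trade-off described above is visible here: nothing, not even the original function response, is yielded until `__anext__()` returns or raises.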
Happy to implement an alternative solution if this is not the direction the project wants to take.