Skip to content

ModelRetry bug with pydantic AI, AG-UI, and Open AI #3197

@bdhammel

Description

@bdhammel

Initial Checks

Description

First, thank you for your hard work and making this code available!

The attached example runs the AG-UI protocol with an OpenAI backend.
I'm seeing an issue where conversation history becomes corrupted when a tool raises ModelRetry. Because the messages are streamed to the front end, the initial tool_call message stays in the conversation history without a corresponding tool response, creating an invalid message sequence that OpenAI's API rejects on the next turn.

openai.BadRequestError: Error code: 400 - {'error': {'message': "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_90tHcBQV7gr3uoMtBv8VKYdl", 'type': 'invalid_request_error', 'param': 'messages.[3].role', 'code': None}}

The above exception was the direct cause of the following exception:

pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: gpt-4o-mini, body: {'message': "An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_90tHcBQV7gr3uoMtBv8VKYdl", 'type': 'invalid_request_error', 'param': 'messages.[3].role', 'code': None}

This doesn't happen when using TestModel(): there, UnexpectedModelBehavior is raised as expected and the frontend manages the conversation correctly.

Example Code

# test_minimal.py

"""Minimal Playwright test demonstrating ModelRetry conversation history bug.

Expected: Both messages complete successfully
Actual: Second message fails with OpenAI 400 error about incomplete tool_call_ids

To run: python test_minimal.py
"""

import asyncio
import threading

import uvicorn
from playwright.async_api import async_playwright

import minimal_chat_app
from minimal_chat_app import agent
from pydantic_ai.models.test import TestModel


async def test_modelretry_bug() -> None:
    """Send two messages - second one fails due to corrupted conversation history.

    Turn 1 ("password is 42") is expected to make the model call the
    `secret` tool, which raises ModelRetry; turn 2 ("Maybe 43?") then fails
    against OpenAI because the streamed history keeps a tool call without a
    matching tool response. With TestModel the second turn succeeds.
    """

    # Uncomment to test with TestModel (works correctly):
    # with agent.override(model=TestModel()):

    # Start server: run uvicorn in a daemon thread so it gets its own event
    # loop (via asyncio.run) and dies with the test process.
    config = uvicorn.Config(
        minimal_chat_app.app, host="127.0.0.1", port=8000, log_level="error"
    )
    server = uvicorn.Server(config)
    threading.Thread(target=lambda: asyncio.run(server.serve()), daemon=True).start()
    await asyncio.sleep(2)  # Wait for server to start (fixed delay, not a readiness probe)

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        try:
            # Navigate to chat app
            await page.goto("http://127.0.0.1:8000/")
            await page.wait_for_selector("form")
            await asyncio.sleep(1)  # Wait for agent init

            # First message - triggers ModelRetry, completes successfully
            print("📤 Sending first message: 'password is 42'")
            await page.fill("#input", "password is 42")
            await page.press("#input", "Enter")

            # Wait for response (give it time to render)
            await page.wait_for_function(
                "document.querySelectorAll('.assistant').length > 0",
                timeout=10000
            )
            await asyncio.sleep(0.5)  # Let content render
            response1 = await page.locator(".assistant").last.inner_text()
            print(f"✅ First response: {response1[:100] if response1 else '(empty)'}...")

            # Second message - should work but fails with OpenAI due to corrupted history
            print("\n📤 Sending second message: 'Maybe 43?'")
            await page.fill("#input", "Maybe 43?")
            await page.press("#input", "Enter")

            # Wait for second response
            try:
                await page.wait_for_function(
                    "document.querySelectorAll('.assistant').length > 1",
                    timeout=10000
                )
                await asyncio.sleep(0.5)  # Let content render
                response2 = await page.locator(".assistant").last.inner_text()

                # The frontend writes "Error: ..." into the assistant div on failure,
                # so its presence marks a failed second turn.
                if "Error" in response2:
                    print(f"❌ Second message failed: {response2}")
                    print("\n🐛 BUG REPRODUCED: ModelRetry corrupted conversation history")
                    print("   OpenAI rejected the second message due to incomplete tool_call_ids")
                else:
                    print(f"✅ Second response: {response2[:100] if response2 else '(empty)'}...")
                    print("\n✅ Test passed! (Using TestModel or bug is fixed)")
            except Exception as e:
                print(f"❌ Second message timed out or failed: {e}")
                # Check if there's an error message displayed
                conv_text = await page.locator("#conv").inner_text()
                print(f"Conversation content:\n{conv_text}")
                print("\n🐛 BUG REPRODUCED: Second message failed to complete")

        finally:
            await browser.close()


if __name__ == "__main__":
    # Entry point: run the full async reproduction (server + browser) once.
    print("🧪 Testing ModelRetry conversation history bug\n")
    asyncio.run(test_modelretry_bug())
# minimal_chat_app.py

from pathlib import Path

import uvicorn
import fastapi
from fastapi.requests import Request
from fastapi.responses import FileResponse, Response
from pydantic_ai import Agent
from pydantic_ai.exceptions import ModelRetry
from pydantic_ai.ag_ui import handle_ag_ui_request


THIS_DIR = Path(__file__).parent  # directory holding index.html / index.ts
app = fastapi.FastAPI()

# Switch between TestModel (works) and gpt-4o-mini (fails on second message)
# from pydantic_ai.models.test import TestModel
# model = TestModel()
model = 'gpt-4o-mini'

# Single shared agent instance; the `secret` tool is registered on it below.
agent = Agent(model, instructions='Be helpful!')


@agent.tool_plain()
def secret(password: int) -> int:
    """Check a password; unconditionally asks the model to retry.

    The `password` argument is deliberately ignored — this tool exists only
    to exercise the ModelRetry code path during a streamed AG-UI run.
    """
    raise ModelRetry("Try again")


@app.get('/')
async def index() -> FileResponse:
    """Serve the static chat page."""
    page = THIS_DIR / 'index.html'
    return FileResponse(page, media_type='text/html')


@app.get('/index.ts')
async def index_ts() -> FileResponse:
    """Serve the frontend TypeScript source (transpiled in the browser)."""
    source = THIS_DIR / 'index.ts'
    return FileResponse(source, media_type='text/plain')


@app.post('/chat/')
async def chat(request: Request) -> Response:
    """Bridge AG-UI protocol requests from the frontend to the agent."""
    settings = {'parallel_tool_calls': False}
    return await handle_ag_ui_request(agent, request, model_settings=settings)


if __name__ == '__main__':
    # Dev entry point: auto-reload the app on changes within this directory.
    uvicorn.run('minimal_chat_app:app', reload=True, reload_dirs=[str(THIS_DIR)])
# index.html

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>ModelRetry Test</title>
</head>
<body>
  <h1>ModelRetry Bug Test</h1>
  <!-- Conversation transcript: .user / .assistant divs are appended here by index.ts -->
  <div id="conv"></div>
  <form>
    <input id="input" placeholder="Type a message..." autofocus>
  </form>
</body>
<!-- Load the TypeScript compiler so index.ts can be transpiled client-side -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/typescript/5.6.3/typescript.min.js"></script>
<script type="module">
  // Fetch index.ts from the server, transpile to ES2015 in the browser,
  // and inject the result as a new module script.
  async function loadTs() {
    const response = await fetch('/index.ts');
    const tsCode = await response.text();
    const jsCode = window.ts.transpile(tsCode, { target: "es2015" });
    const script = document.createElement('script');
    script.type = 'module';
    script.text = jsCode;
    document.body.appendChild(script);
  }
  loadTs();
</script>
</html>
# index.ts


import { HttpAgent } from 'https://cdn.skypack.dev/@ag-ui/client'

// Handles to the transcript container, the text input, and the enclosing form.
const conv = document.getElementById('conv') as HTMLDivElement
const input = document.getElementById('input') as HTMLInputElement
const form = document.querySelector('form') as HTMLFormElement

// One AG-UI HTTP agent per page load; a timestamped threadId isolates each session.
const agent = new HttpAgent({
  url: '/chat/',
  threadId: `chat-${Date.now()}`,
})

// Submit handler: append the user's message to the transcript and agent
// history, stream the assistant reply into a new div, and show errors inline.
form.onsubmit = async (e) => {
  e.preventDefault()
  const prompt = input.value.trim()
  if (!prompt) return

  input.value = ''
  input.disabled = true

  // Show user message. Build the node via textContent rather than an
  // innerHTML template: this prevents HTML/script injection from user input
  // and avoids re-parsing (and recreating) earlier transcript nodes, which
  // `conv.innerHTML +=` would do.
  const userDiv = document.createElement('div')
  userDiv.className = 'user'
  userDiv.textContent = prompt
  conv.appendChild(userDiv)

  // Add to agent history
  agent.messages.push({
    id: crypto.randomUUID(),
    role: 'user',
    content: prompt,
  })

  // Show assistant response, streamed delta-by-delta into one div
  let response = ''
  const msgDiv = document.createElement('div')
  msgDiv.className = 'assistant'
  conv.appendChild(msgDiv)

  try {
    await agent.runAgent(
      { threadId: agent.threadId },
      {
        onTextMessageContentEvent({ event }: any) {
          response += event.delta
          msgDiv.textContent = response
        },
      }
    )
    input.focus()
  } catch (error) {
    msgDiv.textContent = `Error: ${error}`
  } finally {
    // Re-enable the input on both the success and failure paths.
    input.disabled = false
  }
}

Python, Pydantic AI & LLM client version

pydantic-ai==1.1.0
pydantic-ai-slim==1.1.0
ag-ui-protocol==0.1.9
openai==2.5.0
playwright==1.55.0
pytest==8.4.2
fastapi==0.119.0

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions