
Unrestricted caching keyed by generated types causes memory leak in multi-threaded regimes #2672

@rona-sh

Description

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

In OpenAI.responses.parse, as part of validating the formatted response against the required text_format (hereinafter MyClass), the code constructs the type ParsedResponseOutputMessage[MyClass], which is then passed to an unbounded lru_cache wrapping pydantic.TypeAdapter construction.

In a multi-threaded setting, pydantic regenerates this parameterized type on each call, so its hash differs every time and the cache grows without bound. Concretely, any standard webserver that uses responses.parse is affected.

The issue reproduces with any model, regardless of user input or target class.
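The failure mode can be sketched independently of the SDK internals. In the minimal example below (hypothetical helper names; it does not call into openai itself), an unbounded functools.lru_cache keyed by the type object wraps pydantic.TypeAdapter construction. Because each iteration hands it a freshly created class object, every call adds a new entry that is never evicted:

import functools

from pydantic import BaseModel, TypeAdapter, create_model


@functools.lru_cache(maxsize=None)  # unbounded; keyed by the type object itself
def cached_adapter(tp: type) -> TypeAdapter:
    return TypeAdapter(tp)


class Fact(BaseModel):
    fact: str


# Simulate a parameterized type that is regenerated on every call:
# each create_model call returns a brand-new class object with a new hash,
# so the cache never gets a hit and grows by one entry per call.
for _ in range(5):
    fresh_type = create_model("ParsedMessage", content=(Fact, ...))
    cached_adapter(fresh_type)
    print(cached_adapter.cache_info().currsize)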

To Reproduce

Consider the following snippet:

from openai import OpenAI
from pydantic import BaseModel
import psutil


class Fact(BaseModel):
    fact: str


client = OpenAI()
model = "gpt-4.1-nano"

def f():
    # One structured-output request; the parsed result itself is discarded.
    _ = client.responses.parse(model=model, input="Give a fun fact", text_format=Fact)
    # Print the process's resident set size, in MiB.
    print(psutil.Process().memory_info().rss / 2**20)

f invokes a responses.parse call and prints the process's resident memory usage (in MiB).

When f is called repeatedly on a single thread, memory usage changes only minimally:

for _ in range(10):
    f()

However, when each call runs on its own thread (as in a typical webserver), memory usage grows with every request:

import time
import threading

# Spawn a new thread per call, mimicking a webserver handling concurrent requests.
for _ in range(10):
    time.sleep(0.1)
    threading.Thread(target=f).start()
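To check that the growth comes from accumulated adapter/type objects rather than from response payloads, a gc-based probe can be run between calls (this is an assumption about where the memory goes, offered as a diagnostic sketch rather than a proven measurement):

import gc

from pydantic import TypeAdapter


def count_type_adapters() -> int:
    # Count live TypeAdapter instances tracked by the garbage collector.
    return sum(1 for obj in gc.get_objects() if isinstance(obj, TypeAdapter))

If the diagnosis is correct, calling count_type_adapters() after each batch of threads shows the count climbing in step with the RSS figures printed by f.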


OS

macOS

Python version

Python v3.12.11

Library version

openai v1.107.0
