Confirm this is an issue with the Python library and not an underlying OpenAI API
- This is an issue with the Python library
Describe the bug
In the OpenAI.responses.parse function, as part of validating the formatted response against the required text_format (denoted hereinafter MyClass), the code creates the type ParsedResponseOutputMessage[MyClass], which is later passed to an unbounded lru_cache of pydantic.TypeAdapter.
In a multi-threaded setting, pydantic generates the parametrized type anew, so its hash changes on every run and the cache grows without bound. Concretely, a standard webserver that uses responses.parse is affected.
The issue reproduces with any model, regardless of user input or target class.
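For illustration only, the growth pattern can be mimicked outside the library. The following is a minimal sketch, assuming pydantic v2: an unbounded functools.lru_cache keyed on a type object gains a new entry whenever it is handed a freshly created class, which is what happens when the parametrized generic is rebuilt per call. The cached_adapter name and the create_model stand-in are hypothetical and are not the library's actual internals.
from functools import lru_cache
from pydantic import TypeAdapter, create_model

@lru_cache(maxsize=None)  # unbounded cache: one entry per distinct type object
def cached_adapter(tp):  # hypothetical stand-in for the library's cached TypeAdapter lookup
    return TypeAdapter(tp)

for _ in range(3):
    # create_model returns a brand-new class object on each call, so every call
    # adds a fresh cache entry, mirroring the unbounded growth described above.
    tp = create_model("Wrapper", value=(str, ...))
    cached_adapter(tp)
    print(cached_adapter.cache_info().currsize)  # 1, 2, 3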
To Reproduce
Consider the following snippet:
from openai import OpenAI
from pydantic import BaseModel
import psutil

class Fact(BaseModel):
    fact: str

client = OpenAI()
model = "gpt-4.1-nano"

def f():
    # One parse call, then print the process's resident memory usage.
    _ = client.responses.parse(model=model, input="Give a fun fact", text_format=Fact)
    print(psutil.Process().memory_info().rss / 2**20)
f invokes a responses.parse call and prints the memory usage (in MB).
When running it on a single thread, the memory usage changes minimally:
for _ in range(10):
    f()
However, when running it on multiple threads (such as in a default webserver), more memory is used for every request:
import time
import threading
for _ in range(10):
    time.sleep(0.1)
    threading.Thread(target=f).start()
Code snippets
OS
macOS
Python version
Python v3.12.11
Library version
openai v1.107.0