
Possible invalid request formatting for max_completion_tokens #210


TL;DR: You can ignore that error. We set both max_completion_tokens and max_tokens to the same value.

The standard for setting a max-token limit is kind of messy. Some model servers accept max_tokens as the flag and some accept max_completion_tokens. In the official OpenAI documentation the legacy completions endpoint only supports max_tokens, while the chat completions endpoint supports max_tokens for older models and max_completion_tokens for everything. Ollama supports only max_completion_tokens.

vLLM started with max_completion_tokens but at some point switched to max_tokens and throws a harmless warning for the former. We kept both in GuideLLM for compatibility, but some model servers (cough, cough, Ollama) don't like the one they don't support.
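For illustration, here is a minimal sketch (not GuideLLM's actual implementation) of an OpenAI-compatible chat request that sets both fields to the same value; the endpoint URL and model name are placeholders:

```python
import requests

def chat_completion(prompt: str, max_output_tokens: int = 256) -> dict:
    """Send a chat completion request with both token-limit fields set.

    Servers that only understand one of the two fields still enforce the
    cap; servers that understand both may log a harmless deprecation or
    unknown-field warning.
    """
    payload = {
        "model": "my-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        # Older servers (and the legacy completions API) read max_tokens;
        # newer OpenAI-style chat endpoints read max_completion_tokens.
        # Sending both keeps the request portable across backends.
        "max_tokens": max_output_tokens,
        "max_completion_tokens": max_output_tokens,
    }
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",  # placeholder endpoint
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```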

Answer selected by sjmonson
This discussion was converted from issue #208 on June 27, 2025 19:03.