Conversation

@LaurentMazare

Some vLLM models, e.g. Gemma 3 12B, expect a strict ordering of message roles: system, user, assistant, user, assistant, and so on. I haven't been able to get this to work with any of the existing provider_fmt options; mistralai ensures that the last message is a user message but doesn't compact consecutive user or assistant messages.

This PR adds a new provider format option, vllm, which ensures that roles appear in the required order; consecutive user messages are compacted together. This was tested to work properly with vLLM 0.9.1 and various models (Gemma 3, Mistral).
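For illustration, here is a minimal sketch of the compaction idea; the function name compact_messages and the newline join are assumptions made for the example, not the PR's actual code:

def compact_messages(messages: list[dict]) -> list[dict]:
    # Merge consecutive messages that share a role so the sequence
    # alternates user/assistant after the system prompt.
    compacted: list[dict] = []
    for msg in messages:
        if compacted and compacted[-1]["role"] == msg["role"]:
            # Assumption: contents are joined with a newline when merging.
            compacted[-1]["content"] += "\n" + msg["content"]
        else:
            compacted.append(dict(msg))
    return compacted

# Two consecutive user messages collapse into a single user turn:
msgs = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi."},
    {"role": "user", "content": "What's the weather?"},
]
assert [m["role"] for m in compact_messages(msgs)] == ["system", "user"]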

I'm not sure if this is the right way to solve this issue, so let me know if there is a better alternative.

@CLAassistant

CLAassistant commented Oct 8, 2025

CLA assistant check
All committers have signed the CLA.

@davidzhao requested a review from @theomonnom on October 12, 2025 07:54
if len(messages) == 1 and messages[0]["role"] == "system":
    messages.append({"role": "user", "content": dummy_user_message})
if len(messages) > 1 and messages[0]["role"] == "system" and messages[1]["role"] == "assistant":
    messages.insert(1, {"role": "user", "content": "Hello"})
Member

Do we have to inject "Hello" here?

Member

I think it's a good solution. I know a lot of LLMs don't support a chat_ctx starting with an assistant role.

inject_dummy_user_message should already handle this
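For context, a minimal sketch of what inject_dummy_user_message might do, mirroring the two cases in the diff above; the exact signature and behavior of the real helper are assumptions:

def inject_dummy_user_message(messages: list[dict], dummy_user_message: str) -> None:
    # Guarantee a user turn exists before the first assistant turn.
    if len(messages) == 1 and messages[0]["role"] == "system":
        messages.append({"role": "user", "content": dummy_user_message})
    elif len(messages) > 1 and messages[0]["role"] == "system" and messages[1]["role"] == "assistant":
        messages.insert(1, {"role": "user", "content": dummy_user_message})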

Author

Ah, thanks for catching this. It was indeed meant to be the dummy_user_message parameter rather than the hardcoded "Hello".
