[Model][Frontend] Adding timeseries modality support and Qwen2.5-ChatTS model support #16852

chemeris · 2025-04-18T16:21:38Z

This pull request has two parts:

Adds generic infrastructure for handling time series as a modality, including an OpenAI API server.
Adds support for ChatTS model inference that relies on the above change for both offline inference and online serving using the OpenAI API server.

Please refer to the official ChatTS documentation for details about the model architecture: https://github.com/NetManAIOps/ChatTS/
This code is based on the original ChatTS code, but works with the latest vllm code, and adds support for V1 vLLM engine and OpenAI API serving.

To use the current version of ChatTS requires --trust-remote-code and --hf-overrides in order to load config and processing classes from the ChatTS HF repo, but use the vllm implementation of the model itself.

Example script to serve ChatTS via an OpenAI API server with vLLM:

vllm serve ../ChatTS-model \
    --served-model-name chatts \
    --trust-remote-code \
    --hf-overrides '{"model_type":"chatts"}' \
    --max-model-len 6000 \
    --gpu-memory-utilization 0.8 \
    --limit-mm-per-prompt timeseries=50 \
    --allowed-local-media-path $(pwd) \
    --host 0.0.0.0 \
    --port 8090

github-actions · 2025-04-18T16:21:50Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify · 2025-04-19T09:30:08Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chemeris.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

DarkLight1337

Thanks for adding this model to vLLM and expanding the multi-modality code! Some initial comments.

vllm/multimodal/time_series.py

vllm/transformers_utils/configs/chatts.py

vllm/transformers_utils/processors/chatts.py

vllm/model_executor/models/chatts.py

vllm/model_executor/models/registry.py

DarkLight1337 · 2025-04-20T03:16:11Z

Please verify your model by following the guide in https://docs.vllm.ai/en/latest/contributing/model/tests.html

Also make sure to add this model to the Supported Models page in the docs!

chemeris · 2025-04-20T20:58:29Z

@DarkLight1337 Thank you. I'll read the guide and see what's required to add the model.

mergify · 2025-04-30T15:15:45Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chemeris.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

chemeris · 2025-06-05T14:20:43Z

@DarkLight1337 @ywang96 A kind ping about this. Please, could we merge this PR?

DarkLight1337 · 2025-06-05T14:30:01Z

Sorry for the delay, @Isotr0py @jeejeelee are you two able to help? I am quite busy lately.

DarkLight1337 · 2025-06-05T14:34:38Z

Regarding the prefix caching issue, maybe @heheda12345 can help as well?

vllm/multimodal/time_series.py

vllm/model_executor/models/chatts.py

chemeris · 2025-06-05T19:33:10Z

@Isotr0py All changes made as you suggested, thank you. Especially for catching the hardcoded float16.

heheda12345 · 2025-06-06T02:41:32Z

(a) why is it not caching these 9 tokens

Because we only cache full blocks that won't be further modified. For example, with block_size 4, if we cache [E] of a request [ABCD,E], and there are two new requests [ABCD, EF] and [ABCD, EG] that reuse [E], they will modify block [E] with different values.

(b) why is the result different when it processes them the second time?

I'm not sure. Maybe you can print the block_ids and the kv_cache tensor of these blocks to see if there is any problem.

chemeris · 2025-06-06T08:25:06Z

(a) why is it not caching these 9 tokens

Because we only cache full blocks that won't be further modified. For example, with block_size 4, if we cache [E] of a request [ABCD,E], and there are two new requests [ABCD, EF] and [ABCD, EG] that reuse [E], they will modify block [E] with different values.

Thank you for the explanation. I'm not sure this matches my observations, though.

From my memory, when I was sending the exact same prompt four times, I saw:

Output X (full prompt is processed)
Output Y (only the tail of the prompt is re-processed)
Output Y (nothing is re-processed)
Output Y (nothing is re-processed)

So it looked like the tail had been cached, but only after the second try.

I did try printing token IDs and vectors, but couldn't see anything obviously wrong - without the full understanding of the underlying caching machinery at least.

I'm happy to look again if you could give me a hand with a bit more detailed insight into what exactly to debug.

chemeris · 2025-06-06T08:27:59Z

@Isotr0py @DarkLight1337 Looks like the tests are passing now, and comments by @Isotr0py have been implemented. Is it possible to merge the PR while we're looking at the caching issue, as it seems to be unrelated to this specific PR?

DarkLight1337

Sorry again for the delay! Overall the PR looks good to me!

Isotr0py · 2025-06-06T16:25:33Z

Please take a look to the failing basic model tests and multimodal tests.

And nearly forgotten, can you update the supported_models documentation to include this model?

Signed-off-by: Alexander Chemeris <[email protected]>

jeejeelee · 2025-06-13T03:19:40Z

vllm/model_executor/models/qwen3ts.py

+        valid_lengths = mask.sum(dim=1).long()  # Shape: (batch_size)
+
+        patch_cnt = (valid_lengths + self.patch_size -
+                     1) // self.patch_size  # 向上取整


Suggested change

1) // self.patch_size # 向上取整

1) // self.patch_size

jeejeelee · 2025-06-13T03:20:31Z

vllm/model_executor/models/qwen3ts.py

@@ -0,0 +1,442 @@
+# SPDX-License-Identifier: Apache-2.0


Suggested change

# SPDX-License-Identifier: Apache-2.0

# SPDX-License-Identifier: Apache-2.0

# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

jeejeelee · 2025-06-13T03:27:05Z

vllm/model_executor/models/qwen3ts.py

+    dummy_inputs=Qwen3TSDummyInputsBuilder,
+)
+class Qwen3TSForCausalLM(nn.Module, SupportsMultiModal, SupportsPP,
+                         SupportsLoRA):


If you want to add LoRA for multimodal models, you also need to implement get_mm_mapping. Please refer to get_mm_mapping, or you can remove SupportsLoRA

jeejeelee · 2025-06-13T03:29:29Z

vllm/model_executor/models/chatts.py

@@ -0,0 +1,442 @@
+# SPDX-License-Identifier: Apache-2.0


Suggested change

# SPDX-License-Identifier: Apache-2.0

# SPDX-License-Identifier: Apache-2.0

# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

chemeris requested review from DarkLight1337 and ywang96 as code owners April 18, 2025 16:21

mergify bot added frontend multi-modality Related to multi-modality (#4194) labels Apr 18, 2025

chemeris force-pushed the timeseries branch from 0e184be to e766838 Compare April 19, 2025 09:29

mergify bot added the needs-rebase label Apr 19, 2025

chemeris force-pushed the timeseries branch from e766838 to b19b44f Compare April 19, 2025 09:35

mergify bot removed the needs-rebase label Apr 19, 2025

chemeris force-pushed the timeseries branch 4 times, most recently from 868f817 to af856a8 Compare April 19, 2025 15:10

DarkLight1337 added this to Multi-modal Model Requests Apr 19, 2025

DarkLight1337 moved this to In Progress in Multi-modal Model Requests Apr 19, 2025

DarkLight1337 reviewed Apr 19, 2025

View reviewed changes

chemeris force-pushed the timeseries branch 3 times, most recently from 1572d8e to d27a6d6 Compare April 19, 2025 17:40

DarkLight1337 reviewed Apr 20, 2025

View reviewed changes

vllm/model_executor/models/chatts.py Outdated Show resolved Hide resolved

DarkLight1337 reviewed Apr 20, 2025

View reviewed changes

vllm/model_executor/models/registry.py Outdated Show resolved Hide resolved

chemeris force-pushed the timeseries branch from d27a6d6 to b8aa508 Compare April 20, 2025 20:54

mergify bot added the needs-rebase label Apr 30, 2025

chemeris force-pushed the timeseries branch from b8aa508 to 934599a Compare May 10, 2025 17:48

mergify bot removed the needs-rebase label May 10, 2025

chemeris force-pushed the timeseries branch from 934599a to c5b4e57 Compare May 10, 2025 17:51

Isotr0py reviewed Jun 5, 2025

View reviewed changes

chemeris force-pushed the timeseries branch from 89397fe to f144547 Compare June 5, 2025 19:31

mergify bot added the ci/build label Jun 5, 2025

chemeris force-pushed the timeseries branch 2 times, most recently from 93577f7 to 9dfe24e Compare June 5, 2025 20:34

DarkLight1337 approved these changes Jun 6, 2025

View reviewed changes

DarkLight1337 enabled auto-merge (squash) June 6, 2025 09:45

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 6, 2025

Alexander Chemeris added 2 commits June 7, 2025 19:30

Support time series modality for offline and OpenAI API inference.

6c7c417

Signed-off-by: Alexander Chemeris <[email protected]>

Add Qwen2-ChatTS model

c3e8f50

Signed-off-by: Alexander Chemeris <[email protected]>

auto-merge was automatically disabled June 7, 2025 23:33
Head branch was pushed to by a user without write access

chemeris force-pushed the timeseries branch from 9dfe24e to ae343b2 Compare June 7, 2025 23:33

chemeris requested a review from aarnphm as a code owner June 7, 2025 23:33

chemeris and others added 2 commits June 8, 2025 10:44

Add Qwen2-ChatTS model tests

49c70a2

Signed-off-by: Alexander Chemeris <[email protected]>

Add Qwen3TS model

75aa552

Signed-off-by: Alexander Chemeris <[email protected]>

chemeris force-pushed the timeseries branch from ae343b2 to 75aa552 Compare June 8, 2025 10:44

aarnphm approved these changes Jun 13, 2025

View reviewed changes

jeejeelee reviewed Jun 13, 2025

View reviewed changes

mergify bot added the qwen Related to Qwen models label Jun 19, 2025

	# SPDX-License-Identifier: Apache-2.0
	# SPDX-License-Identifier: Apache-2.0
	# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

Uh oh!

[Model][Frontend] Adding timeseries modality support and Qwen2.5-ChatTS model support #16852

Are you sure you want to change the base?

[Model][Frontend] Adding timeseries modality support and Qwen2.5-ChatTS model support #16852

Conversation

chemeris commented Apr 18, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 18, 2025

Uh oh!

mergify bot commented Apr 19, 2025

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

DarkLight1337 commented Apr 20, 2025

Uh oh!

chemeris commented Apr 20, 2025

Uh oh!

mergify bot commented Apr 30, 2025

Uh oh!

chemeris commented Jun 5, 2025

Uh oh!

DarkLight1337 commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DarkLight1337 commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chemeris commented Jun 5, 2025

Uh oh!

heheda12345 commented Jun 6, 2025

Uh oh!

chemeris commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chemeris commented Jun 6, 2025

Uh oh!

DarkLight1337 left a comment

Choose a reason for hiding this comment

Uh oh!

Isotr0py commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jeejeelee Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

jeejeelee Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

jeejeelee Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

jeejeelee Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chemeris commented Apr 18, 2025 •

edited by github-actions bot

Loading

DarkLight1337 commented Jun 5, 2025 •

edited

Loading

DarkLight1337 commented Jun 5, 2025 •

edited

Loading

chemeris commented Jun 6, 2025 •

edited

Loading

Isotr0py commented Jun 6, 2025 •

edited

Loading