[V1] [Spec decode] Llama4 type eagle support in v1 #18369
Conversation
Signed-off-by: Ronald Xu <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default. Instead, it would only run fastcheck CI, a small and essential subset of tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Ready for review now
@RonaldBXu Looks good to me overall. Could you please add a test? Also, is there any available EAGLE head we can test this on?
I found this one from NVIDIA: https://huggingface.co/nvidia/Llama-4-Maverick-17B-128E-Eagle3, but it seems to use the EAGLE3 architecture.
Hi @WoosukKwon, when you say add a test, do you mean an e2e test like the one in https://github.com/vllm-project/vllm/blob/main/tests/spec_decode/e2e/test_eagle_correctness.py, or an entry like https://github.com/vllm-project/vllm/blob/main/tests/models/registry.py#L407? I think I'd have to open-source a compatible EAGLE head first, right? Could you point me to other tests I could work on while I wait for approval to release a compatible EAGLE head? Thanks!
Signed-off-by: Ronald Xu <[email protected]>
Signed-off-by: Ronald Xu <[email protected]>
Signed-off-by: Ronald Xu <[email protected]>
Hi @RonaldBXu, the PR looks good to me overall, but we'd like to have a test, or at least a way to run the code. Please refer to https://github.com/vllm-project/vllm/blob/main/tests/v1/spec_decode/test_eagle.py and tests/v1/e2e/test_spec_decode.py (line 109 at commit ee1531b).
Yes, we need an EAGLE head for Llama 4. Could we use https://huggingface.co/nvidia/Llama-4-Maverick-17B-128E-Eagle3 (which @aarnphm mentioned)?
Thanks, I'll look at those tests. I don't think we can use that head since it is EAGLE3, but the good news is I got approval to release a compatible EAGLE head for my code. I should hopefully have it ready sometime next week!
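(For concreteness, once a compatible head is released, enabling it should follow vLLM's usual spec-decode path. A minimal sketch, assuming a hypothetical draft-head repo name — no compatible head was public at the time of this discussion, and exact `speculative_config` keys may differ by vLLM version:)

```python
from vllm import LLM, SamplingParams

# Minimal sketch: EAGLE speculative decoding with a Llama4 target model.
# "your-org/llama4-scout-eagle" is a hypothetical draft-head repo.
llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    tensor_parallel_size=8,
    speculative_config={
        "method": "eagle",
        "model": "your-org/llama4-scout-eagle",  # hypothetical head
        "num_speculative_tokens": 3,
    },
)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```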
Signed-off-by: Ronald Xu <[email protected]>
Hi @WoosukKwon, I added the tests. Just wanted to call out that for Llama4 Maverick, tp=1 was not sufficient (CUDA out-of-memory error), so I made my test initialize the LLM with tp=8. Alternatively, I could change it to Llama4 Scout. Please let me know what you think would be the best option here. Thanks!
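(A hedged sketch of how the test could parametrize the two options — the model names and tp sizes here are illustrative assumptions, not necessarily what the PR uses:)

```python
import pytest
from vllm import LLM

# Hypothetical parametrization: Maverick OOMs at tp=1 and needs tp=8,
# while the smaller Scout may fit at a lower tp. Sizes are assumptions.
@pytest.mark.parametrize(("model_name", "tp_size"), [
    ("meta-llama/Llama-4-Maverick-17B-128E-Instruct", 8),
    ("meta-llama/Llama-4-Scout-17B-16E-Instruct", 4),
])
def test_llama4_smoke(model_name: str, tp_size: int):
    llm = LLM(model=model_name, tensor_parallel_size=tp_size)
    out = llm.generate(["Hello"])
    assert out[0].outputs[0].text
```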
Signed-off-by: Ronald Xu <[email protected]>
Signed-off-by: Ronald Xu <[email protected]>
Signed-off-by: Ronald Xu <[email protected]>
Let's make a separate script for testing the Llama 4 head. I don't think we want to run Llama 4 on CI right now. Also, can you add instructions on how to run these tests locally for users that have a tp=8 setup? I can test this as well (I have access to an 8xH100 box atm).
Sure, I can put the tests in a separate file and add some instructions. Is there something I should edit to make CI skip my new Llama4-specific file, e.g. in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml? Edit: no, I don't need to edit that file, since the pipeline runs each test file explicitly. So if I make a new file, it won't be run in CI.
Yeah, just include the test file there; a note in the file with the instructions to run it with pytest should be good enough.
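(A sketch of how such a file header could document the local run and guard against accidental runs on small boxes — the file name and GPU-count guard are assumptions, not the PR's actual code:)

```python
# Hypothetical file: tests/v1/e2e/test_spec_decode_llama4.py
"""Llama4 EAGLE speculative-decoding test.

Not run in CI: the Buildkite pipeline only invokes test files listed in
.buildkite/test-pipeline.yaml. Run locally on a tp=8 machine with:

    pytest tests/v1/e2e/test_spec_decode_llama4.py
"""
import pytest
import torch

# Fail fast with a clear skip message instead of a CUDA OOM on smaller boxes.
if torch.cuda.device_count() < 8:
    pytest.skip("Llama4 EAGLE test requires 8 GPUs", allow_module_level=True)
```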
This PR adds the capability for Llama4-type EAGLE heads to be used for speculative decoding in vLLM v1. This is my first major PR in vLLM, so feedback is appreciated :)
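(For context on what an EAGLE head does: the draft head fuses the target model's last hidden state with the embedding of the most recently sampled token before predicting the next draft token. A toy illustration of that fusion step — not vLLM's actual classes or this PR's code:)

```python
import torch
import torch.nn as nn

class ToyEagleHead(nn.Module):
    """Toy sketch of the EAGLE drafting idea (illustrative only)."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # Fuse [target hidden state; token embedding] back to hidden_size.
        self.fc = nn.Linear(2 * hidden_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # A real EAGLE head runs a full decoder layer after the fusion;
        # this sketch keeps only the characteristic concatenation step.
        fused = self.fc(torch.cat([hidden, self.embed(token_ids)], dim=-1))
        return self.lm_head(fused)  # logits for the draft token
```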