
[Misc][Help]: Adding support for a Custom model with External MoE Routing #15214

Open
@XMaster96

Description


Anything you want to discuss about vllm.

I have a custom model that uses something very similar to MoE, but instead of the routing being determined by the model itself, I set the routing depending on which part of the sequence the tokens come from.
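To make the idea concrete, a minimal sketch of position-based routing might look like the following. The segment boundaries and the segment-to-expert assignment here are purely illustrative assumptions; the real mapping would be model-specific.

```python
def routing_map_from_segments(seq_len, boundaries, segment_experts):
    """Return a per-position expert id, given sorted segment boundaries
    and the expert assigned to each segment (hypothetical example)."""
    experts = []
    seg = 0
    for pos in range(seq_len):
        # advance to the segment that contains this position
        while seg < len(boundaries) and pos >= boundaries[seg]:
            seg += 1
        experts.append(segment_experts[seg])
    return experts

# e.g. positions 0-3 -> expert 0, 4-7 -> expert 2, the rest -> expert 1
print(routing_map_from_segments(10, [4, 8], [0, 2, 1]))
# → [0, 0, 0, 0, 2, 2, 2, 2, 1, 1]
```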

I would now like to run inference on my model with vLLM, and because I have a non-standard model, I assume I will need to modify some parts of vLLM to get it to work. It would be really nice if someone more familiar with the project could give me feedback on my implementation ideas and point me to the right places in the codebase.

  • I would really like batching support where each batch item can have a different routing map.
  • I will most likely need a custom sampler that can determine the routing pattern / set the next expert based on the sequence.
  • I assume I would need a separate CUDA graph capture for each routing pattern / combination, which sounds like a lot. How is this handled for other MoE models? I will most likely want to disable CUDA graphs at first.
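The first bullet could be sketched as an MoE layer that takes the expert assignment as an input tensor instead of computing it from a learned gate. This is not vLLM's actual `FusedMoE` API, just a hedged illustration of the interface I have in mind, operating on the flattened token batch as vLLM's continuous batching does:

```python
import torch
import torch.nn as nn


class ExternallyRoutedMoE(nn.Module):
    """MoE layer whose expert choice is supplied per token rather than
    learned by a router (illustrative sketch, not vLLM code)."""

    def __init__(self, num_experts: int, hidden_size: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, routing_map: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, hidden_size] flattened across all sequences in the
        # batch; routing_map: [num_tokens] expert index per token, derived
        # externally from each token's position in its own sequence.
        out = torch.empty_like(x)
        for expert_id, expert in enumerate(self.experts):
            mask = routing_map == expert_id
            if mask.any():
                out[mask] = expert(x[mask])
        return out


# usage: two sequences interleaved in one flattened batch, each with its
# own routing map concatenated the same way as the tokens
moe = ExternallyRoutedMoE(num_experts=2, hidden_size=4)
x = torch.randn(6, 4)
routing_map = torch.tensor([0, 0, 1, 1, 0, 1])
y = moe(x, routing_map)
```

Because the routing tensor is just another input, per-batch-item routing maps fall out naturally; the data-dependent masking is also the part that interacts badly with CUDA graph capture, which is why disabling graphs initially seems reasonable.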

Thank you in advance for all your help, and best regards,
XMaster96

