
Conversation

@Dhruvilbhatt

Signed-off-by: youkaichao <[email protected]>

This pattern has proven effective in production environments and scales from experimental prototypes to multi-model production deployments.

If you're interested in plugin-based architectures for inference systems or want to explore how to structure runtime patching in a clean way, feel free to reach out. Always happy to chat about scalable LLM deployment and design patterns.
Member

> feel free to reach out

how?

Author

Added links here

### Key takeaways:
- ✅ Use `VLLMPatch[TargetClass]` for surgical, class-level modifications
- ✅ Register via `vllm.general_plugins` entry point in `setup.py`
- ✅ Control patches with `VLLM_CUSTOM_PATCHES` environment variable
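
For illustration, a minimal sketch of how such a plugin could be wired up. Only the `vllm.general_plugins` entry-point group comes from vLLM; the package name, module path, and `register` callable are hypothetical, and the patch itself stands in for the blog's `VLLMPatch` helper:

```python
# setup.py -- hypothetical package layout; only the "vllm.general_plugins"
# entry-point group is defined by vLLM, everything else is illustrative.
from setuptools import find_packages, setup

setup(
    name="my-vllm-patches",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        # vLLM discovers this group and calls the referenced callable at startup.
        "vllm.general_plugins": [
            "my_patches = my_vllm_patches.plugin:register",
        ],
    },
)
```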
Member

Please clarify that `VLLM_CUSTOM_PATCHES` is not a standard vLLM environment variable; every user might need to choose their own env var name.
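
A sketch of how that gating could look with a user-chosen variable name; `MY_COMPANY_VLLM_PATCHES`, the module path, and the patch names are all assumptions for illustration, not anything vLLM reads itself:

```python
# my_vllm_patches/plugin.py -- illustrative only. The environment variable
# name is chosen by the user; vLLM does not read VLLM_CUSTOM_PATCHES (or any
# other patch-selection variable) on its own.
import os


def _patch_sampler() -> None:
    # Placeholder for an actual class-level patch.
    pass


# Map of patch names to the functions that apply them (all hypothetical).
_PATCHES = {"sampler": _patch_sampler}


def register() -> None:
    """Entry point invoked by vLLM; applies only the patches the user enabled."""
    enabled = os.environ.get("MY_COMPANY_VLLM_PATCHES", "")
    for name in filter(None, (p.strip() for p in enabled.split(","))):
        patch_fn = _PATCHES.get(name)
        if patch_fn is not None:
            patch_fn()
```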

Author

Sure, modified this in the blog

Member

@youkaichao left a comment

thanks for contributing! looks good in general, please fix the two comments.

- ❌ **Every vLLM upgrade breaks your patch** - Because you replaced full files, not just the individual lines of interest
- ❌ **Debugging becomes painful** - Is the bug in your patch? In unchanged vanilla code? Or because monkey patching rewired behavior unexpectedly?
- ❌ **Operational complexity grows over time** - Every vLLM release forces you to diff and re-sync your copied files - exactly the same problem as maintaining a fork, just disguised inside your Python package

Contributor

Another disadvantage we hit sometimes: monkey patching doesn't work for some modules. For example, monkey patching the scheduler module usually doesn't work: the scheduler is called from the EngineCore process, so a patch applied from another process behaves unexpectedly, and you usually have to monkey patch EngineCore itself.
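
A minimal sketch of why this happens, assuming a spawn-style worker process (not vLLM code): the child re-imports modules from scratch, so a patch applied at runtime in the parent never reaches it, whereas an entry-point plugin imported in every process would.

```python
# Demo: a monkey patch applied in the parent process is invisible to a
# spawned worker, because the worker re-imports this module unpatched.
import multiprocessing as mp


def original():
    return "original"


def patched():
    return "patched"


def worker(q):
    # Runs in the spawned child, which only sees the unpatched original().
    q.put(original())


if __name__ == "__main__":
    original = patched  # monkey patch applied in the parent process only

    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print("parent sees:", original())  # "patched"
    print("child sees:", q.get())      # "original" -- the patch did not propagate
    p.join()
```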

Author

Agreed, added in the blog. Thanks @wangxiyuan!


This approach keeps the operational overhead minimal while maintaining long-term flexibility - something both small teams and large platform groups will appreciate.

---
Contributor

Can you mention that there are 4 kinds of plugins supported by vLLM? They're loaded and used in different cases and processes.

Example: https://github.com/wangxiyuan/vllm/blob/1f5ba1fea6171e82126fea80c521a79522b7a30d/vllm/plugins/__init__.py#L14-L22

Since the arch.png is for the platform plugin while the case in the article is for the general plugin, I think it's good to explain more.
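
For context, a rough sketch of how entry-point-based plugin groups are generally discovered and activated; this is not vLLM's actual loader, and only the group names `vllm.general_plugins` and `vllm.platform_plugins` are taken from vLLM (the other groups are in the linked code), assuming Python 3.10+ for the `group=` keyword:

```python
# Illustrative only: generic entry-point discovery for a plugin group.
# Different vLLM plugin groups are loaded by different components/processes.
from importlib.metadata import entry_points


def load_plugins(group: str) -> None:
    """Discover all installed entry points in `group` and call them."""
    for ep in entry_points(group=group):
        register_fn = ep.load()   # import the plugin's registration callable
        register_fn()             # let the plugin register or patch itself


if __name__ == "__main__":
    # General plugins are loaded in every process; platform plugins are
    # resolved separately when the hardware platform is selected.
    load_plugins("vllm.general_plugins")
```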

Author

Added more details on this too

<picture>
<img src="/assets/figures/2025-11-17-vllm-plugin-system/vllm-plugin-system-arch.png" width="100%">
</picture><br>
<em>Source: <a href="https://cloud.tencent.com/developer/article/2513676" target="_blank">https://cloud.tencent.com/developer/article/2513676</a></em>
@Yikun

I believe this is originally from the vLLM Beijing Meetup, lol.

You can get a high-quality version from P10 of https://drive.google.com/file/d/1lGMTF2RooWz2G0XkqIaCMB3xQalPLsjU/view

Author

Awesome, thanks @Yikun! Updated the image with a reference to the vllm-ascend GitHub repo.
