Pass chunk size to moe op #2264

yiliu30 · 2025-08-15T05:24:49Z

Requires HabanaAI/vllm-hpu-extension#337

Signed-off-by: yiliu30 <[email protected]>

Copilot

Pull Request Overview

This PR modifies the FP8 quantization MoE (Mixture of Experts) forward pass so that chunk size information is passed to the underlying MoE operation. The chunk size can now be configured dynamically: forward_quant extracts tokens_num from hidden_states and asks the original module for any additional kwargs.

- Adds a helper method to extract extra kwargs from the original module
- Modifies forward_quant to pass chunk size information via extra_kwargs
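
For readers without the full diff, the delegation pattern summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `OriginalMoE`, `PatchedMoE`, `get_extra_kwargs`, `_get_extra_kwargs`, `max_chunk_size`, `fake_moe_op` and the `chunk_size` key are all hypothetical names, and hidden_states is modelled as a plain list of token rows instead of a tensor so the sketch runs without any dependencies.

```python
class OriginalMoE:
    """Hypothetical stand-in for the unquantized MoE module."""

    def __init__(self, max_chunk_size=64):
        self.max_chunk_size = max_chunk_size

    def get_extra_kwargs(self, tokens_num):
        # Assumed policy: cap the chunk size at a configured maximum.
        return {"chunk_size": min(tokens_num, self.max_chunk_size)}


def fake_moe_op(hidden_states, **kwargs):
    """Placeholder for the real MoE kernel; it only echoes its kwargs."""
    return kwargs


class PatchedMoE:
    """Hypothetical stand-in for the FP8-patched module."""

    def __init__(self, orig_mod):
        self.orig_mod = orig_mod

    def _get_extra_kwargs(self, tokens_num):
        # Delegate to the original module if it offers the hook,
        # otherwise pass no extra kwargs at all.
        getter = getattr(self.orig_mod, "get_extra_kwargs", None)
        return getter(tokens_num) if callable(getter) else {}

    def forward_quant(self, hidden_states):
        # hidden_states is modelled as a 2-D list: one row per token.
        tokens_num = len(hidden_states)
        extra_kwargs = self._get_extra_kwargs(tokens_num)
        return fake_moe_op(hidden_states, **extra_kwargs)


hidden_states = [[0.0] * 8 for _ in range(100)]  # 100 tokens, hidden_dim 8
result = PatchedMoE(OriginalMoE()).forward_quant(hidden_states)
fallback = PatchedMoE(object()).forward_quant(hidden_states)
```

The point of the indirection is that the patched module does not need to know how the chunk size is chosen. If the original module lacks the hook, the op is simply called without extra kwargs.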


Copilot · 2025-08-16T05:55:56Z

neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

    def forward_quant(self,
                      hidden_states,
                      expert_routing_table,
                      router_weights,
                      permuted_weights=True,
                      activation="silu"):
+        tokens_num, hidden_dim = hidden_states.shape


This assumes hidden_states has exactly 2 dimensions, but tensors could have more dimensions (e.g., batch_size, sequence_length, hidden_dim). Consider using hidden_states.shape[0] or hidden_states.shape[-2] depending on the expected tensor layout.

Suggested change

-        tokens_num, hidden_dim = hidden_states.shape
+        tokens_num = hidden_states.shape[-2]
+        hidden_dim = hidden_states.shape[-1]
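
To make the difference concrete, here is a small sketch that uses plain tuples in place of tensor shapes (a `torch.Size` behaves like a tuple for unpacking and indexing). The helper names `token_dims` and `flattened_token_count` are hypothetical. Note one caveat with negative indexing: for a 3-D input, `shape[-2]` is the sequence length, not the total number of tokens. Whether the MoE op expects that or the flattened count is an assumption this PR thread does not settle.

```python
import math


def token_dims(shape):
    """Return (tokens_num, hidden_dim) using negative indexing."""
    if len(shape) < 2:
        raise ValueError("expected at least 2 dimensions")
    return shape[-2], shape[-1]


def flattened_token_count(shape):
    """Total tokens if all leading dimensions are flattened together."""
    return math.prod(shape[:-1])


shape_2d = (16, 4096)    # (tokens, hidden_dim)
shape_3d = (2, 8, 4096)  # (batch, seq_len, hidden_dim)

# Tuple unpacking into two names only works for exactly two dimensions.
try:
    tokens_num, hidden_dim = shape_3d
    unpack_failed = False
except ValueError:
    unpack_failed = True

dims_2d = token_dims(shape_2d)
dims_3d = token_dims(shape_3d)
total_3d = flattened_token_count(shape_3d)
```

With these inputs, unpacking fails on the 3-D shape while `token_dims` accepts both. `token_dims(shape_3d)` yields 8 tokens, whereas flattening the leading dimensions yields 16.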

pass chunk size to moe op

5c6c55c

Signed-off-by: yiliu30 <[email protected]>

yiliu30 requested a review from Copilot August 16, 2025 05:55

Copilot AI reviewed Aug 16, 2025
