@samremes samremes commented Oct 23, 2025

Motivation

Llama4 Maverick uses a custom routing function that applies a sigmoid instead of a softmax: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L62-L71
During inference, running that custom routing function in plain torch adds significant overhead.
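
For reference, here is a minimal, dependency-free sketch of what a top-k sigmoid routing step computes per token. It assumes the routing picks the k largest router logits and then applies a sigmoid to the selected values, as in the linked vLLM function. Because the sigmoid is monotonic, selecting on logits or on sigmoid scores yields the same experts. The names `sigmoid` and `topk_sigmoid_reference` are hypothetical, and this is not the kernel added in this PR.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def topk_sigmoid_reference(gating_output, topk):
    """Return (scores, indices) for each token.

    gating_output: list of rows, one row of router logits per token.
    topk: number of experts to keep per token.
    """
    all_scores, all_indices = [], []
    for logits in gating_output:
        # Indices of the k largest logits, highest first.
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:topk]
        all_indices.append(order)
        # The sigmoid is applied only to the selected logits.
        all_scores.append([sigmoid(logits[i]) for i in order])
    return all_scores, all_indices


# One token, four experts, keep the top two.
scores, indices = topk_sigmoid_reference([[0.0, 2.0, -1.0, 1.0]], topk=2)
```

A fused kernel would produce the same scores and indices in a single launch, without materializing the intermediate tensors that a torch implementation creates.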

Technical Details

Depends on a PR in composable_kernel: ROCm/composable_kernel#3062
3rdparty/composable_kernel needs to be bumped once the CK PR is merged.

Test Plan

Added a simple test to op_tests that covers both the fp16 and bf16 cases.

Test Result

python3 op_tests/test_moe_topk_sigmoid.py 
[aiter] import [module_aiter_enum] under /workspaces/dev/aiter/aiter/jit/module_aiter_enum.so
[W1023 15:37:48.300574905 collection.cpp:1114] Warning: ROCTracer produced duplicate flow start: 1 (function operator())
[aiter] import [module_moe_asm] under /workspaces/dev/aiter/aiter/jit/module_moe_asm.so
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
Runtime (torch baseline):     29.784888888888865
Runtime (fused topk sigmoid): 4.163444444444443
Uplift:                       7.15x
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
Runtime (torch baseline):     31.291888888888884
Runtime (fused topk sigmoid): 4.296666666666662
Uplift:                       7.28x

Submission Checklist

@samremes samremes marked this pull request as ready for review October 29, 2025 08:43
"f'{AITER_CSRC_DIR}/py_itfs_cu/asm_topksoftmax.cu'"
"f'{AITER_CSRC_DIR}/py_itfs_cu/asm_topksoftmax.cu'",
"f'{AITER_CSRC_DIR}/py_itfs_ck/topk_sigmoid_kernels.cu'",
"f'{CK_DIR}/example/ck_tile/09_topk_softmax/topk_softmax_api.cpp'"
Collaborator

Can we move it out of module_moe_asm? That module is quite heavy now.

Author

@samremes samremes Oct 29, 2025

Created a new module for topk.

Collaborator

We should remove these, since they are now in the new module.

) -> None: ...


@compile_ops("module_moe_asm")
Collaborator

...



if __name__ == "__main__":
    test_topk_sigmoid(dtype=torch.float16)
Collaborator

Please add an argument parser so that users can test other cases; refer to this one:

default=[(32, 32), (40, 40), (64, 64), (111, 111), (128, 128), (160, 160)],
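
A minimal sketch of what such a parser could look like for the test script, using only `argparse` from the standard library. The flag names (`--dtype`, `--dim`, `--topk`), the helper names `parse_dim` and `build_parser`, and all default values are hypothetical placeholders, not the options used elsewhere in the repository.

```python
import argparse


def parse_dim(text):
    """Parse 'tokens,experts' (e.g. '64,128') into a tuple of two ints.

    A malformed value raises ValueError, which argparse reports as a usage error.
    """
    tokens, experts = text.split(",")
    return int(tokens), int(experts)


def build_parser():
    parser = argparse.ArgumentParser(description="Test the fused top-k sigmoid op")
    parser.add_argument(
        "-d", "--dtype",
        choices=["fp16", "bf16"],
        nargs="+",
        default=["fp16", "bf16"],
        help="data types to test (default: both)",
    )
    parser.add_argument(
        "--dim",
        type=parse_dim,
        nargs="+",
        default=[(64, 128)],  # placeholder default, not taken from the PR
        help="one or more 'tokens,experts' pairs, e.g. --dim 32,128 64,128",
    )
    parser.add_argument(
        "-k", "--topk",
        type=int,
        default=1,  # placeholder default, not taken from the PR
        help="number of experts selected per token",
    )
    return parser


# Example with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--dtype", "bf16", "--dim", "32,128", "--topk", "2"])
```

In the test script, the parsed `args.dtype` strings would then be mapped to the corresponding torch dtypes, and the test function called once per combination of dtype and dimension pair.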

@gyohuangxin gyohuangxin requested a review from valarLip October 30, 2025 02:06
@valarLip valarLip self-assigned this Oct 30, 2025