Mla splitkv enhance split alg inte #1233

valarLip · 2025-10-21T15:19:35Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull Request Overview

This PR enhances the MLA (Multi-head Latent Attention) split key-value algorithm with significant new functionality including persistent thread group support, sparse attention capabilities, and fp8 quantization support. The changes introduce metadata generation for optimized work distribution and a reduce kernel for merging partial results.

Adds persistent thread group implementation for variable query/output lengths
Implements sparse attention with top-k token selection
Integrates fp8 quantization support for both Q and KV tensors
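The reduce step mentioned in the overview (merging partial results from several KV splits) can be illustrated with the standard log-sum-exp merge for softmax attention. The sketch below is pure Python and only an illustration under stated assumptions: the names `attend` and `merge_partials` are hypothetical and are not taken from this PR, and the actual reduce kernel in `csrc/kernels/mla/reduce.cu` may be organized differently.

```python
import math


def attend(q, keys, values):
    """Softmax attention for one query over a slice of KV.

    Returns (output, lse), where lse is the log of the softmax denominator.
    Hypothetical helper for illustration only.
    """
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)


def merge_partials(partials):
    """Combine per-split (output, lse) pairs into the full-attention result.

    Each split's output is weighted by exp(lse_i), i.e. by its share of the
    global softmax denominator.
    """
    global_max = max(lse for _, lse in partials)
    scales = [math.exp(lse - global_max) for _, lse in partials]
    total = sum(scales)
    dim = len(partials[0][0])
    merged = [
        sum(s * out[d] for s, (out, _) in zip(scales, partials)) / total
        for d in range(dim)
    ]
    final_lse = global_max + math.log(total)
    return merged, final_lse
```

Calling `attend` on each KV slice and passing the resulting pairs to `merge_partials` should reproduce, up to floating-point rounding, what `attend` returns over the whole KV range. The merged `final_lse` is returned as well so that a further merge level can reuse it.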

Reviewed Changes

Copilot reviewed 52 out of 85 changed files in this pull request and generated 3 comments.

File	Description
op_tests/test_mla_sparse.py	New test file for sparse MLA attention with top-k token selection
op_tests/test_mla_persistent*.py	New test files for persistent thread group MLA implementation
csrc/py_itfs_cu/asm_mla.cu	Updated to support persistent mode, fp8 datatypes, and new metadata parameters
csrc/kernels/mla/reduce.cu	New reduce kernel for merging partial attention outputs
csrc/kernels/mla/metadata*.cuh	New metadata generation kernels for work distribution
aiter/mla.py	Updated decode forward pass to use new metadata and reduce operations
csrc/include/mla.h	New header defining MLA data structures and function signatures
Various copyright headers	Updated copyright format from (c) to (C)
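As a rough illustration of what "metadata for work distribution" can mean in a split-KV scheme, the sketch below divides each sequence's KV range into near-equal chunks and records a prefix-sum index over the chunks. This is an assumption-laden sketch, not the PR's metadata layout: `plan_kv_splits`, `max_splits`, and `min_tokens_per_split` are hypothetical names, and the prefix-sum list is only loosely modeled on the `num_kv_splits_indptr` name that appears in the commit history.

```python
def plan_kv_splits(kv_lens, max_splits, min_tokens_per_split):
    """Plan KV splits per sequence (hypothetical helper, illustration only).

    Returns (indptr, work):
      indptr[b] .. indptr[b + 1] indexes the work items of sequence b,
      work[i] is a (batch, kv_start, kv_end) half-open token range.
    """
    indptr = [0]
    work = []
    for batch, kv_len in enumerate(kv_lens):
        # At least one split, at most max_splits, and no more splits than
        # the sequence has chunks of min_tokens_per_split tokens.
        n = max(1, min(max_splits, kv_len // min_tokens_per_split))
        base, rem = divmod(kv_len, n)
        start = 0
        for i in range(n):
            # Spread the remainder over the first `rem` splits.
            size = base + (1 if i < rem else 0)
            work.append((batch, start, start + size))
            start += size
        indptr.append(indptr[-1] + n)
    return indptr, work
```

With a plan like this, each work item can be processed independently, and the partial outputs belonging to one sequence are located through `indptr` when they are merged afterwards.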

valarLip and others added 30 commits June 26, 2025 13:19

add num_kv_splits_indptr to mla for mtp<=4 case for now

fa2c2d2

update

15f6155

update new kernel

8dd5617

infrastructures

c871e8d

1st version of split kernel

3750b5f

Fix issues raised by Lingpeng and fix the issue on batch_size

7ca2598

update mla

7c5891c

update mla_stage2

12def78

Merge branch 'main' into mla_splitkv_enhance

5dc5a6d

Merge branch 'main' into mla_splitkv_enhance

eae14ae

Merge branch 'mla_splitkv_enhance' into jruan/mla_splitkv_enhance_split_alg

f244f11

1st draft of v1 split program

224f89f

add kv_offset

ef442fd

mla_splitkv_enhance_split_alg_inte

f10235e

splitkv debug

600b5dd

1st version of reduce kernel

5c58ae8

metadata & kernel finish

9700bc5

Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_enhance_split_alg_inte

4a86304

add reduce

d49c0cd

final_lse is optional now.

e4bf891

update kernel

7bf6aa4

bug fix

2411f1f

Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_enhance_split_alg_inte

e21600d

bug fix 1

ffcc113

modify reduce api

07e4ed1

Merge branch 'jruan/mla_splitkv_enhance_split_alg' into mla_splitkv_enhance_split_alg_inte

3f2bf25

update kernel

7c877c4

fix max splits

d10cdab

bug fix 3

bac5750

fix s80 early return

f59a3e6

ruanjm and others added 22 commits October 13, 2025 03:40

Merge branch 'main' into mla_splitkv_enhance_split_alg_inte

0b874cb

Fix issue after merge. aiter main branch is using torch.library.infer_schema which doesn't allow dict as parameter. Thus, change the API for metadata.

ce9abd8

Adjust metadata v1.1 and make this branch be ready to be merged to main branch.

64c3e29

Merge branch 'main' into mla_splitkv_enhance_split_alg_inte

57b9d57

remove invalid co kernel

b70d8d4

Fix issue brought from f794ae4 which disabled hipify by default.

f668d60

support qolen>1 for sparse mla

33ea0e8

make code become prettier

6e2c4ff

Fix issue in metadata v1.1

c3813fb

Merge branch 'main' into mla_splitkv_enhance_split_alg_inte

bcd219a

Fix issue in test_mla.py

33b0499

Fix lint fails

53f5826

Fix sub-test fails in op_test/test_mla.py

41576e1

Fix regression in test_mla.py where mtp>1

68ef089

Add head_dim=128 support to reduce

f7efe97

Merge branch 'main' into mla_splitkv_enhance_split_alg_inte

8440195

Add nhead=8 for pa and add assert to make sure the input tensors are in float32.

1c5b77b

fix issue in vllm benchmark for deepseek: remove metadata v0 because it's not compatible with hip graph

69d41a0

fix lint

0cf3db2

Revert all the change about mi350 gemm.

ae96787

add a8w8 and a16w8 kernel in mla mi350

be55ef5

add A8W8 Non-persistent mode kernel

600d993

Copilot AI review requested due to automatic review settings October 21, 2025 15:19

Copilot AI reviewed Oct 21, 2025

Review comments:

aiter/mla.py (resolved)

csrc/kernels/mla/metadata/v1_comm.cuh (outdated, resolved)

csrc/kernels/mla/metadata/v1_1_device.cuh (outdated, resolved)

ruanjm and others added 4 commits October 22, 2025 06:10

Fix issue reported by Copilot

6c7f795

add mla non-persistent test

573c3cd

script: update a16w8 kernel

0cfc1a3

rm test_mla_persistent_mi350.py and support mi350 in test_mla_persistent.py

0490f21

valarLip self-assigned this Oct 24, 2025

Merge branch 'main' into mla_splitkv_enhance_split_alg_inte

8ca7679


4 participants
