
Conversation

@valarLip (Collaborator)

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings (October 21, 2025 15:19)

Copilot AI left a comment

Pull Request Overview

This PR enhances the MLA (Multi-head Latent Attention) split key-value (split-KV) algorithm with significant new functionality: persistent thread group support, sparse attention, and fp8 quantization. The changes introduce metadata generation for optimized work distribution and a reduce kernel for merging partial results.

  • Adds persistent thread group implementation for variable query/output lengths
  • Implements sparse attention with top-k token selection (a sketch of the idea follows this list)
  • Integrates fp8 quantization support for both Q and KV tensors
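
For context on the top-k selection, here is a minimal NumPy sketch of the general idea behind sparse decode attention: score every KV token, keep only the top-k per head, and run softmax over that subset. The function name, the plain multi-head layout, and the QK^T scoring are illustrative assumptions, not the kernel's actual interface.

```python
import numpy as np

def topk_sparse_attention(q, k, v, topk):
    """Attend over only the top-k highest-scoring KV tokens per head.

    q: [num_heads, head_dim]          single decode-step query
    k: [seq_len, num_heads, head_dim] cached keys
    v: [seq_len, num_heads, head_dim] cached values
    topk: number of KV tokens kept per head
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("hd,shd->hs", q, k) * scale          # [num_heads, seq_len]

    # Threshold at the k-th largest score per head; mask everything else out.
    kth = np.partition(scores, -topk, axis=-1)[:, -topk:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)

    # Softmax over the surviving (top-k) tokens only.
    probs = np.exp(masked - masked.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.einsum("hs,shd->hd", probs, v)                # [num_heads, head_dim]
```

In the actual kernels the selection is presumably implemented by gathering the chosen token indices rather than masking a dense score matrix, but the arithmetic is the same.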

Reviewed Changes

Copilot reviewed 52 out of 85 changed files in this pull request and generated 3 comments.

Summary of changes per file:

| File | Description |
| --- | --- |
| op_tests/test_mla_sparse.py | New test file for sparse MLA attention with top-k token selection |
| op_tests/test_mla_persistent*.py | New test files for the persistent thread group MLA implementation |
| csrc/py_itfs_cu/asm_mla.cu | Updated to support persistent mode, fp8 datatypes, and new metadata parameters |
| csrc/kernels/mla/reduce.cu | New reduce kernel for merging partial attention outputs (see the sketch below) |
| csrc/kernels/mla/metadata*.cuh | New metadata generation kernels for work distribution |
| aiter/mla.py | Updated decode forward pass to use the new metadata and reduce operations |
| csrc/include/mla.h | New header defining MLA data structures and function signatures |
| Various copyright headers | Updated copyright format from (c) to (C) |
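
The reduce kernel listed above merges the partial outputs that different KV splits (or persistent work units) produce for the same query. Below is a minimal NumPy sketch of the standard log-sum-exp merge used for this kind of split-KV reduction; the tensor layouts and the function name are assumptions, and the real kernel's interface may differ.

```python
import numpy as np

def merge_splitkv_partials(partial_out, partial_lse):
    """Combine per-split attention outputs into the full-sequence result.

    partial_out: [num_splits, num_heads, head_dim], each split's output already
                 softmax-normalized over its own slice of the KV sequence.
    partial_lse: [num_splits, num_heads], log-sum-exp of the attention scores
                 within each split.
    """
    # Global log-sum-exp across splits gives each split's relative weight.
    global_lse = np.logaddexp.reduce(partial_lse, axis=0)   # [num_heads]
    weights = np.exp(partial_lse - global_lse)               # [num_splits, num_heads]
    # Weighted sum of the partial outputs recovers the full-sequence attention.
    return np.einsum("sh,shd->hd", weights, partial_out)     # [num_heads, head_dim]
```

This merge is numerically equivalent to running softmax over the whole KV sequence at once, which is what makes the split-KV decomposition exact rather than approximate.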


@valarLip self-assigned this on Oct 24, 2025