@kesavanramakrishnan
Contributor

Added support for GQA (Grouped Query Attention).

Copilot AI review requested due to automatic review settings October 20, 2025 22:10
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds support for Grouped Query Attention (GQA) and 64-bit indexing to the lean attention implementation. GQA allows different numbers of query heads (hq) and key/value heads (hk), which improves efficiency in transformer models by reducing memory bandwidth for key/value tensors.
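As a rough illustration of the hq/hk relationship described above, the sketch below maps query heads to key/value heads using the common GQA convention (consecutive query heads share one KV head). The function names are hypothetical and are not taken from this PR; the PR's actual kernel mapping may differ.

```python
def gqa_group_size(hq: int, hk: int) -> int:
    """Number of query heads that share one key/value head.

    Hypothetical helper: assumes hq is an exact multiple of hk,
    which is the usual GQA requirement.
    """
    if hk <= 0 or hq % hk != 0:
        raise ValueError(f"hq ({hq}) must be a positive multiple of hk ({hk})")
    return hq // hk


def kv_head_for_query_head(q_head: int, hq: int, hk: int) -> int:
    """Map a query head index to the KV head it reads from.

    Assumes consecutive query heads are grouped together
    (an assumption, not confirmed by the PR text).
    """
    if not 0 <= q_head < hq:
        raise IndexError(f"q_head {q_head} out of range for hq={hq}")
    return q_head // gqa_group_size(hq, hk)


def expand_kv_heads(kv_heads: list, hq: int) -> list:
    """Repeat each KV head so the result has hq entries.

    This mirrors what a reference implementation might do before
    calling plain multi-head attention.
    """
    group = gqa_group_size(hq, len(kv_heads))
    return [kv for kv in kv_heads for _ in range(group)]


if __name__ == "__main__":
    # With hq=32 and hk=8, every 4 query heads share one KV head.
    print(gqa_group_size(32, 8))
    print([kv_head_for_query_head(h, 32, 8) for h in range(8)])
    print(expand_kv_heads(["kv0", "kv1"], 4))
```

When hq equals hk the group size is 1 and the mapping reduces to ordinary multi-head attention.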

Key changes:

  • Refactored function signatures to separate query heads (hq) from key/value heads (hk)
  • Added GQA support with head expansion logic in reference implementation and kernel mapping
  • Implemented 64-bit indexing to handle large tensor offsets that exceed 32-bit limits
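To see why the 64-bit indexing change matters, the sketch below computes a flat element offset for a contiguous 4-D tensor and simulates what happens if that offset is held in a signed 32-bit integer. The shape and helper names are illustrative assumptions, not values or code from this PR.

```python
INT32_MAX = 2**31 - 1


def contiguous_strides(shape):
    """Element strides for a row-major (contiguous) tensor of the given shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def flat_offset(indices, strides):
    """Flat element offset; Python ints are unbounded, so this never overflows."""
    return sum(i * s for i, s in zip(indices, strides))


def wrap_int32(value: int) -> int:
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


if __name__ == "__main__":
    # Hypothetical shape: (batch, heads, seq_len, head_dim).
    shape = (8, 64, 65536, 128)
    strides = contiguous_strides(shape)
    last = tuple(d - 1 for d in shape)

    offset = flat_offset(last, strides)
    print(offset, offset > INT32_MAX)  # the true offset exceeds the 32-bit range
    print(wrap_int32(offset))          # what 32-bit arithmetic would produce
```

In a GPU kernel, the usual remedy is to widen index values to a 64-bit integer type before multiplying by strides, so the product is never computed in 32 bits.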

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| op_tests/triton_tests/test_la.py | Updated test parameters and helper functions to support separate hq and hk parameters; added GQA head expansion in reference implementation |
| op_tests/op_benchmarks/triton/bench_la.py | Updated benchmark configurations to include hq and hk parameters with new GQA test cases |
| aiter/ops/triton/lean_atten.py | Added GQA group size calculation, 64-bit indexing support, and enhanced runtime safety checks for buffer validation |
| aiter/ops/triton/_triton_kernels/lean_atten.py | Implemented GQA head mapping in kernel, added 64-bit pointer arithmetic, and padding mask support for irregular head dimensions |

@valechen valechen self-requested a review October 21, 2025 15:43
@valechen valechen requested a review from vgokhale October 21, 2025 20:08
@valechen valechen removed the request for review from vgokhale October 22, 2025 21:53
@valechen valechen requested review from xiaohuguo2023 and removed request for xiaohuguo2023 October 23, 2025 22:10
@valechen valechen merged commit 0e10941 into main Oct 25, 2025
18 of 19 checks passed
@valechen valechen deleted the final_gqa_update branch October 25, 2025 02:16
