Added in GQA and 64-bit indexing #1226

kesavanramakrishnan · 2025-10-20T22:10:47Z

Added support for GQA.

Copilot

Pull Request Overview

This PR adds support for Grouped Query Attention (GQA) and 64-bit indexing to the lean attention implementation. GQA lets several query heads share one key/value head, so the number of query heads (hq) can be larger than the number of key/value heads (hk). This improves efficiency in transformer models by reducing the memory footprint and bandwidth of the key/value tensors.

Key changes:

- Refactored function signatures to separate query heads (hq) from key/value heads (hk)
- Added GQA support with head expansion logic in reference implementation and kernel mapping
- Implemented 64-bit indexing to handle large tensor offsets that exceed 32-bit limits
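
For readers unfamiliar with GQA, the head mapping and head expansion mentioned above can be sketched in plain Python. This is an illustrative sketch only, not the code from this PR: the function names are hypothetical, and the contiguous grouping (`q_head // group_size`) is an assumption based on the common GQA convention.

```python
def gqa_group_size(hq: int, hk: int) -> int:
    """Number of query heads that share one key/value head."""
    if hk <= 0 or hq % hk != 0:
        raise ValueError(f"hq ({hq}) must be a positive multiple of hk ({hk})")
    return hq // hk


def kv_head_for_query_head(q_head: int, hq: int, hk: int) -> int:
    """Map a query head index to the key/value head it reads from.

    Assumes contiguous grouping: query heads [0, g) use K/V head 0,
    [g, 2g) use K/V head 1, and so on.
    """
    if not 0 <= q_head < hq:
        raise IndexError(f"q_head {q_head} out of range for hq={hq}")
    return q_head // gqa_group_size(hq, hk)


def expand_kv_heads(kv_heads: list, hq: int) -> list:
    """Repeat each K/V head so a reference implementation sees hq heads."""
    group = gqa_group_size(hq, len(kv_heads))
    return [kv for kv in kv_heads for _ in range(group)]
```

With `hq=32` and `hk=8`, each key/value head serves 4 query heads, so query head 5 reads key/value head 1. Expanding heads this way is convenient for a reference check; a kernel would more likely use the index mapping directly and avoid materializing the copies.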

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| op_tests/triton_tests/test_la.py | Updated test parameters and helper functions to support separate `hq` and `hk` parameters; added GQA head expansion in reference implementation |
| op_tests/op_benchmarks/triton/bench_la.py | Updated benchmark configurations to include `hq` and `hk` parameters with new GQA test cases |
| aiter/ops/triton/lean_atten.py | Added GQA group size calculation, 64-bit indexing support, and enhanced runtime safety checks for buffer validation |
| aiter/ops/triton/_triton_kernels/lean_atten.py | Implemented GQA head mapping in kernel, added 64-bit pointer arithmetic, and padding mask support for irregular head dimensions |
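
To illustrate why 64-bit indexing matters, the sketch below estimates the largest element offset for a tensor and shows what happens when that offset is stored in a signed 32-bit integer. It is a hypothetical, pure-Python illustration under assumed shapes; the helper names are not from this PR, and the actual kernel performs its pointer arithmetic in Triton.

```python
INT32_MAX = 2**31 - 1


def contiguous_strides(shape):
    """Element strides for a row-major (contiguous) tensor of this shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def max_element_offset(shape, strides):
    """Largest element offset reachable with the given shape and strides."""
    return sum((dim - 1) * stride for dim, stride in zip(shape, strides))


def needs_int64_indexing(shape, strides):
    """True when some offset no longer fits in a signed 32-bit integer."""
    return max_element_offset(shape, strides) > INT32_MAX


def wrap_int32(value):
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


# Assumed example shape: (batch, heads, sequence length, head dimension).
shape = (8, 64, 65536, 128)
strides = contiguous_strides(shape)
largest = max_element_offset(shape, strides)
print(needs_int64_indexing(shape, strides), largest, wrap_int32(largest))
```

For this assumed shape the tensor holds 2^32 elements, so the last offset wraps to a negative number under 32-bit arithmetic and would address the wrong memory. Checking on the host side like this is one way to decide when 64-bit offsets are required.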

kesavanramakrishnan and others added 2 commits October 17, 2025 17:47

Added GQA support for Lean Attention and 64 bit indexing

c18075a

reformatted files

a06a44e

Copilot AI review requested due to automatic review settings October 20, 2025 22:10

Merge branch 'main' into final_gqa_update

af5c053

Copilot AI reviewed Oct 20, 2025

aiter/ops/triton/_triton_kernels/lean_atten.py (resolved)

aiter/ops/triton/lean_atten.py (resolved)

op_tests/triton_tests/test_la.py (resolved)

valechen added 2 commits October 20, 2025 20:54

Merge branch 'main' into final_gqa_update

e432976

Merge branch 'main' into final_gqa_update

694572a

valechen self-requested a review October 21, 2025 15:43

valechen approved these changes Oct 21, 2025

valechen requested a review from vgokhale October 21, 2025 20:08

Merge branch 'main' into final_gqa_update

0f9c3aa

valechen removed the request for review from vgokhale October 22, 2025 21:53

valechen added 2 commits October 23, 2025 07:50

Merge branch 'main' into final_gqa_update

dd7bfa7

Merge branch 'main' into final_gqa_update

360fb25

valechen requested review from xiaohuguo2023 and removed request for xiaohuguo2023 October 23, 2025 22:10

gyohuangxin approved these changes Oct 24, 2025

Merge branch 'main' into final_gqa_update

4bcc725

valechen merged commit 0e10941 into main Oct 25, 2025
18 of 19 checks passed

valechen deleted the final_gqa_update branch October 25, 2025 02:16
