Skip to content

Conversation

@Jeff-Huang
Copy link
Contributor

@Jeff-Huang Jeff-Huang commented Oct 25, 2025

Proposed changes

[CK_TILE] fmha: Add query padding support to backward pass

Introduces support for query sequence padding (q_padding) in the FMHA backward pass kernels.
- Passing seqlen_q_ptr to the backward kernels to distinguish logical from physical sequence lengths.
- Updating OGradDotO, ConvertQGrad, and DQDKDV kernels to respect logical lengths and handle zero-length sequences.
- Aligning LSE indexing in the forward kernel with the padded layout for consistency.
- Adding a new GTest suite (test_fmha_bwd_kernel_padding.cpp) with comprehensive tests for various padding scenarios, Including zero-length sequences in fwd/bwd group mode

Unify sequence length and padding handling
- Replaced seqstart_padded_*_ptr with a more robust system that uses
seqstart_*_ptr for physical sequence lengths and introduces
seqlen_*_ptr and cu_seqlen_*_ptr for logical (unpadded) lengths.
- Established a clear order of precedence for determining sequence
length: cumulative lengths (cu_seqlen_*_ptr) take priority,
followed by per-sequence lengths (seqlen_*_ptr), and finally
physical lengths derived from seqstart_*_ptr.
- Clarified the distinction between "group mode" and "batch mode" and
how sequence lengths are handled in each case.
- Renamed cu_seqlen_kv_ptr to cu_seqlen_k_ptr for consistency.
- Updated comments and documentation to reflect the new argument
structure and usage.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

Jeff-Huang and others added 3 commits October 25, 2025 16:10
Introduces support for query sequence padding (q_padding) in the FMHA backward pass kernels.
- Passing `seqlen_q_ptr` to the backward kernels to distinguish logical from physical sequence lengths.
- Updating `OGradDotO`, `ConvertQGrad`, and `DQDKDV` kernels to respect logical lengths and handle zero-length sequences.
- Aligning LSE indexing in the forward kernel with the padded layout for consistency.
- Adding a new GTest suite (`test_fmha_bwd_kernel_padding.cpp`) with comprehensive tests for various padding scenarios, including zero-length
  sequences and deterministic mode.
Add backward q/kv sequence padding unit tests.
@Jeff-Huang Jeff-Huang changed the title Swdev 558893 fix [CK_TILE] fmha: Add query padding support to backward pass Oct 25, 2025
Refactor the handling of sequence lengths and padding in the
FMHA forward and backward kernels to provide a more unified and flexible
interface.

- Replaced `seqstart_padded_*_ptr` with a more robust system that uses
  `seqstart_*_ptr` for physical sequence lengths and introduces
  `seqlen_*_ptr` and `cu_seqlen_*_ptr` for logical (unpadded) lengths.
- Established a clear order of precedence for determining sequence
  length: cumulative lengths (`cu_seqlen_*_ptr`) take priority,
  followed by per-sequence lengths (`seqlen_*_ptr`), and finally
  physical lengths derived from `seqstart_*_ptr`.
- Clarified the distinction between "group mode" and "batch mode" and
  how sequence lengths are handled in each case.
- Renamed `cu_seqlen_kv_ptr` to `cu_seqlen_k_ptr` for consistency.
- Updated comments and documentation to reflect the new argument
  structure and usage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants