[DRAFT] Add Lds transpose load ds read tr16 #2043

stefankoncarevic · 2025-10-17T15:39:20Z

Motivation

close: https://github.com/ROCm/rocMLIR-internal/issues/1858

Technical Details

This change adds full support for LDS transpose load integration within both single-buffering and double-buffering pipelines.
The implementation enables transpose-aware LDS loading for operands A and B, provided that both matrices use compatible memory layouts.
Currently, the logic iterates only over the K dimension; iteration over the M and N dimensions is still under development and will be refined in the next update.
Future work will focus on performance evaluation and on optimizing LDS bank-conflict patterns.
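To make the enablement condition concrete, here is a minimal C++ sketch of the kind of per-operand check the pipeline might perform. Everything in it is hypothetical: the names OperandInfo, canUseLdsTransposeLoad and enableTransposeForGemm, the gfx950-only restriction (taken from the test notes in this PR), and the assumption that a "compatible layout" means the LDS tile keeps the non-K dimension contiguous.

```cpp
#include <string>

// Hypothetical element-type tag; the real code works on MLIR types.
enum class ElemType { F16, BF16, F32 };

// Hypothetical per-operand summary of the LDS tile layout.
struct OperandInfo {
  ElemType elemType;
  // Assumption: a transpose load pays off when the LDS tile stores the
  // non-K dimension contiguously, so an MFMA lane's K elements are strided.
  bool nonKContiguousInLds;
};

// Returns true if an LDS transpose load could serve this operand.
// Assumes ds_read_tr16 is available only on gfx950 and only for 16-bit types.
bool canUseLdsTransposeLoad(const std::string &arch, const OperandInfo &op) {
  bool is16Bit = op.elemType == ElemType::F16 || op.elemType == ElemType::BF16;
  return arch == "gfx950" && is16Bit && op.nonKContiguousInLds;
}

// One reading of "both matrices use compatible memory layouts": enable the
// transpose path for the GEMM only when A and B both qualify.
bool enableTransposeForGemm(const std::string &arch, const OperandInfo &a,
                            const OperandInfo &b) {
  return canUseLdsTransposeLoad(arch, a) && canUseLdsTransposeLoad(arch, b);
}
```

Requiring both operands to qualify is a conservative choice; a per-operand decision (transpose only A, or only B) is an equally plausible design.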

Test Plan

Basic functionality was verified using existing MFMA pipeline tests for both single and double buffering.
Next, I will extend the tests to cover various matrix layout configurations and measure execution performance.
A detailed performance table and LDS bank-conflict statistics will be added in a later comment to quantify the improvements.

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

- Implemented the rock.lds_transpose_load TableGen (TD) definition, supporting f16 and bf16.
- Added a verifier to ensure the source memref is in workgroup (LDS) memory and that the number of indices matches its rank.
- Implemented a lowering pattern to amdgpu.transpose_load.
- Created MLIR tests covering f16 and bf16 loads with FileCheck patterns.
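The verifier checks listed above can be sketched in plain C++ as follows. This is not the MLIR verifier API: MemRefDesc and verifyLdsTransposeLoad are hypothetical stand-ins, and workgroup memory is modeled as AMDGPU address space 3, whereas the real op would inspect the memref's memory-space attribute (for example #gpu.address_space<workgroup>). The element-type check is included for completeness even though the TD definition may already enforce it through a type constraint.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// AMDGPU maps workgroup (LDS) memory to address space 3.
constexpr unsigned kWorkgroupAddressSpace = 3;

enum class ElemType { F16, BF16, F32 };

// Hypothetical plain-C++ stand-in for a memref type.
struct MemRefDesc {
  std::vector<long> shape;  // rank == shape.size()
  ElemType elemType;
  unsigned addressSpace;
};

// Returns an error message, or std::nullopt if the op is well formed.
std::optional<std::string> verifyLdsTransposeLoad(const MemRefDesc &src,
                                                  std::size_t numIndices) {
  if (src.addressSpace != kWorkgroupAddressSpace)
    return "source memref must be in workgroup (LDS) memory";
  if (numIndices != src.shape.size())
    return "number of indices must match the source memref rank";
  // In the real op this may be a TableGen type constraint instead.
  if (src.elemType != ElemType::F16 && src.elemType != ElemType::BF16)
    return "element type must be f16 or bf16";
  return std::nullopt;
}
```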


This commit introduces the full implementation of LDS transpose load handling used in threadwise reads and in the single- and double-buffering pipelines. It adds logic for computing per-lane base offsets, generating LdsTransposeLoadOp instructions, and managing vectorized fragment loading for MFMA operations. The implementation supports multiple layout kinds (e.g., L16x16, L32x16, L32x8) and dynamically expands offsets for multi-K fused cases. This enables more flexible data movement between LDS and registers for MFMA input tiles.
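As an illustration of per-lane base offsets and multi-K expansion, the C++ sketch below computes, for one lane, the offset of the first element it owns and then replicates that offset across fused K tiles. The assumptions are substantial and should be read as such: LayoutKind only mirrors the PR's layout names, the first number is taken as the non-K extent and the second as the K extent, the lane-to-element mapping is the standard wave64 MFMA operand layout, and the cross-lane data exchange that ds_read_tr16 performs in hardware is not modeled, so these are not the actual addresses the implementation issues.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using std::int64_t;

// Hypothetical layout kinds named after L16x16 / L32x16 / L32x8.
// Assumption: first number = non-K extent, second = K extent.
enum class LayoutKind { L16x16, L32x16, L32x8 };

struct FragmentShape {
  int64_t nonK;
  int64_t k;
};

FragmentShape shapeOf(LayoutKind kind) {
  switch (kind) {
  case LayoutKind::L16x16: return {16, 16};
  case LayoutKind::L32x16: return {32, 16};
  case LayoutKind::L32x8:  return {32, 8};
  }
  return {0, 0};  // unreachable for valid enum values
}

constexpr int64_t kWaveSize = 64;

// Offset (in elements) of a lane's first element in an LDS tile stored with
// the non-K dimension contiguous: element (k, n) lives at k * ldsRowStride + n.
// Assumed mapping: lane i owns non-K index i % nonK and kPerLane consecutive
// K values starting at (i / nonK) * kPerLane.
int64_t laneBaseOffset(int64_t lane, LayoutKind kind, int64_t ldsRowStride) {
  assert(lane >= 0 && lane < kWaveSize);
  FragmentShape s = shapeOf(kind);
  int64_t kPerLane = s.nonK * s.k / kWaveSize;  // 4, 8 and 4 respectively
  int64_t nonKIdx = lane % s.nonK;
  int64_t kIdx = (lane / s.nonK) * kPerLane;
  return kIdx * ldsRowStride + nonKIdx;
}

// Multi-K fusion: replicate one lane's base offset across numKIters K tiles,
// each tile advancing k rows in the LDS buffer.
std::vector<int64_t> expandForMultiK(int64_t base, LayoutKind kind,
                                     int64_t ldsRowStride, int64_t numKIters) {
  std::vector<int64_t> offsets;
  offsets.reserve(static_cast<std::size_t>(numKIters));
  const int64_t kStep = shapeOf(kind).k * ldsRowStride;
  for (int64_t i = 0; i < numKIters; ++i)
    offsets.push_back(base + i * kStep);
  return offsets;
}
```

For example, with L16x16 and a row stride of 64 elements, lane 17 starts at row 4, column 1 (offset 257), and three fused K iterations give offsets 257, 1281 and 2305.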

… f16 and bf16 data types, with multiple K-dimension configurations and schedule versions. Adds a CFG file that restricts execution to the gfx950 architecture, so the tests run only on supported hardware. All test cases passed validation on gfx950.


… for double buffering. Added a global decision context to propagate hwtranspose::Decision from BlockwiseGemm to ThreadwiseReadIntoOp. Updated ThreadwiseReadIntoOp to attach hwtranspose attributes when a valid decision is available. Fixed double-buffering handling to ensure correct LDS access.
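The producer/consumer pattern behind the global decision context can be sketched as below. The namespace sketch, the Decision fields, ScopedDecision and shouldAttachTransposeAttr are all hypothetical and only stand in for hwtranspose::Decision and the real context; the sketch shows one way a decision made while lowering BlockwiseGemm could be published and later queried while lowering ThreadwiseReadIntoOp.

```cpp
#include <optional>

namespace sketch {

enum class Operand { A, B };

// Hypothetical stand-in for hwtranspose::Decision; fields are assumptions.
struct Decision {
  bool useTransposeA = false;
  bool useTransposeB = false;
};

// Process-wide slot written by the producer (BlockwiseGemm lowering) and
// read by the consumer (ThreadwiseReadIntoOp lowering).
inline std::optional<Decision> &decisionSlot() {
  static std::optional<Decision> slot;
  return slot;
}

// RAII guard: publishes a decision for one GEMM lowering and clears it on
// scope exit so a stale decision cannot leak into the next GEMM.
class ScopedDecision {
public:
  explicit ScopedDecision(Decision d) { decisionSlot() = d; }
  ~ScopedDecision() { decisionSlot().reset(); }
  ScopedDecision(const ScopedDecision &) = delete;
  ScopedDecision &operator=(const ScopedDecision &) = delete;
};

// Consumer side: attach transpose attributes only when a valid decision exists.
inline bool shouldAttachTransposeAttr(Operand op) {
  const std::optional<Decision> &d = decisionSlot();
  if (!d)
    return false;
  return op == Operand::A ? d->useTransposeA : d->useTransposeB;
}

} // namespace sketch
```

The scoped guard does not support nested GEMM lowerings, since an inner scope would overwrite and then clear the outer decision; threading the decision through op attributes avoids global state entirely at the cost of touching more op definitions.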


stefankoncarevic added 6 commits October 17, 2025 09:15

- f9b081c Add helper functions to query MFMA intrinsic K and non-K dimensions
- 02b8151 Updated tests.
- 4e4616e Add conditional for transpose setup for A and B in single buffering pipeline.

stefankoncarevic requested a review from causten as a code owner October 17, 2025 15:39

stefankoncarevic mentioned this pull request Oct 17, 2025: [DRAFT] Add Lds transpose load (ds.read_tr16) support on gfx950 for f16/bf16 #2029 (closed)

stefankoncarevic self-assigned this Oct 17, 2025

stefankoncarevic marked this pull request as draft October 17, 2025 15:42

stefankoncarevic added 5 commits October 20, 2025 05:53

- d48b3b5 Merge branch 'develop' into lds_transpose_load_ds_read_tr16
- 40bfd28 Fix getDecisionLdsTransposeContext for correct transpose decision handling.
- 99449f2 Merge branch 'develop' into lds_transpose_load_ds_read_tr16
- 3525c61 Fix attachAttributes function.
