Conversation

Contributor

@DDEle DDEle commented Oct 23, 2025

Proposed changes

Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please link them to the pull request.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.

DDEle and others added 3 commits October 23, 2025 07:06
commit 3e1a851
Author: Ding, Yi <[email protected]>
Date:   Thu Oct 23 06:10:54 2025 +0000

    Fix & clean after rebase

commit 1edf485
Author: Ding, Yi <[email protected]>
Date:   Wed Oct 22 10:46:13 2025 +0000

    Squashed commit of the following:

    commit 0b6b9db
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 22 02:04:27 2025 -0500

        fix bandwidth calculation

    commit 9aebf53
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 22 00:58:59 2025 -0500

        updates

    commit 62607de
    Author: mtgu0705 <[email protected]>
    Date:   Fri Sep 19 00:39:46 2025 -0500

        fix a bug, set the A DS_read preload size to 4 for MXFP4

    commit 92ad6fc
    Author: mtgu0705 <[email protected]>
    Date:   Thu Sep 18 01:19:03 2025 -0500

        fix a_wrap preload issue for large MPerBlock.

    commit f2db447
    Author: mtgu0705 <[email protected]>
    Date:   Wed Sep 17 21:34:03 2025 -0500

        optimized the VGPR repack issue for MXFP4

    commit 346a400
    Author: Gino Lu <[email protected]>
    Date:   Wed Sep 17 04:19:44 2025 -0500

        fix time error

    commit 80c1743
    Author: mtgu0705 <[email protected]>
    Date:   Wed Sep 17 03:58:00 2025 -0500

        updated, function passed.

    commit ce26d90
    Author: mtgu0705 <[email protected]>
    Date:   Tue Sep 16 22:21:39 2025 -0500

        fix, function partially passed

    commit 0a89ed1
    Author: mtgu0705 <[email protected]>
    Date:   Tue Sep 16 03:01:12 2025 -0500

        fix, reference function passed, next check kernel function

    commit ec9bcef
    Author: Gino Lu <[email protected]>
    Date:   Tue Sep 16 02:29:01 2025 -0500

        let pack/unpack return pk_fp4_t

    commit a333206
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 15 20:50:26 2025 -0500

        fix

    commit 3893c06
    Author: Gino Lu <[email protected]>
    Date:   Mon Sep 15 05:51:06 2025 -0500

        fix bug

    commit 8052bea
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 15 04:02:05 2025 -0500

        fix core dump issue, function is not correct.

    commit 9ceb3fd
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 15 03:03:02 2025 -0500

        updates, build pass

    commit cc94eb6
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 15 00:05:18 2025 -0500

        updates

    commit 22586c3
    Author: Gino Lu <[email protected]>
    Date:   Sun Sep 14 23:40:28 2025 -0500

        fix bug

    commit e92e67b
    Author: Gino Lu <[email protected]>
    Date:   Fri Sep 12 03:28:50 2025 -0500

        fix interface

    commit 8b1dd60
    Author: Gino Lu <[email protected]>
    Date:   Fri Sep 12 02:53:50 2025 -0500

        add interface in warp_gemm_impl

    commit c6135f6
    Author: mtgu0705 <[email protected]>
    Date:   Wed Sep 10 05:03:08 2025 -0500

        updates some fixes.

    commit b0d71b8
    Author: mtgu0705 <[email protected]>
    Date:   Tue Sep 9 04:37:42 2025 -0500

        fix after merge ginolu/add_wgmfma_dispatcher

    commit f119c30
    Merge: c5030e6 72c8ef8
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 8 22:09:15 2025 -0500

        Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev

    commit c5030e6
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 8 21:42:47 2025 -0500

        update mx flatmm tail pipeline

    commit 72c8ef8
    Merge: 9661bb4 e4a7728
    Author: Gino Lu <[email protected]>
    Date:   Mon Sep 8 19:10:23 2025 -0500

        Merge branch 'develop' into ginolu/add_wgmfma_dispatcher

    commit 9661bb4
    Author: Gino Lu <[email protected]>
    Date:   Mon Sep 8 19:09:55 2025 -0500

        fix type error

    commit 0509597
    Author: mtgu0705 <[email protected]>
    Date:   Mon Sep 8 04:01:40 2025 -0500

        update hotloop pipeline

    commit 754ae04
    Merge: 15d4440 83f607e
    Author: Gino Lu <[email protected]>
    Date:   Fri Sep 5 04:22:26 2025 -0500

        Merge branch 'develop' into ginolu/add_wgmfma_dispatcher

    commit 15d4440
    Author: Gino Lu <[email protected]>
    Date:   Fri Sep 5 04:21:26 2025 -0500

        fix clang format

    commit 146963d
    Author: mtgu0705 <[email protected]>
    Date:   Wed Sep 3 10:00:54 2025 -0500

        some updates

    commit 12526b6
    Merge: 47cee04 00fd72b
    Author: asleepzzz <[email protected]>
    Date:   Wed Sep 3 13:22:03 2025 +0800

        Merge branch 'develop' into ginolu/add_wgmfma_dispatcher

    commit 47cee04
    Author: Gino Lu <[email protected]>
    Date:   Mon Sep 1 02:11:02 2025 -0500

        fix vec size error

    commit d289292
    Author: Gino Lu <[email protected]>
    Date:   Mon Sep 1 01:23:39 2025 -0500

        fix format error

    commit 16993ac
    Author: mtgu0705 <[email protected]>
    Date:   Sat Aug 30 03:19:07 2025 -0500

        update codes

    commit 9c37e55
    Author: mtgu0705 <[email protected]>
    Date:   Fri Aug 29 11:27:33 2025 -0500

        init ck_tile mxfp4 flatmm

    commit 5c484a5
    Author: Feng Shijie <[email protected]>
    Date:   Thu Aug 28 08:02:50 2025 +0000

        Add bias for f16xf4 moe_flatmm

    commit dd6539f
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 27 13:39:47 2025 +0000

        update case construction

    commit 65b7024
    Author: Feng Shijie <[email protected]>
    Date:   Tue Aug 26 12:32:29 2025 +0000

        support swiglu activation and use rcpf to accelerate silu

    commit b422e41
    Author: Gino Lu <[email protected]>
    Date:   Tue Aug 26 02:33:55 2025 -0500

        first commit

    commit d05eed9
    Author: root <[email protected]>
    Date:   Fri Aug 22 04:01:59 2025 -0500

        add line to last

    commit d69cab7
    Author: root <[email protected]>
    Date:   Fri Aug 22 03:20:46 2025 -0500

        adjust A_LDS descriptor to avoid bank conflicts

    commit 65989e9
    Author: root <[email protected]>
    Date:   Thu Aug 21 09:46:52 2025 -0500

        enable hotloop

    commit c378e9b
    Author: Feng Shijie <[email protected]>
    Date:   Thu Aug 21 09:12:21 2025 +0000

        support atomic_pk_add_bf16 on gfx950

    commit 85976b0
    Author: Feng Shijie <[email protected]>
    Date:   Thu Aug 21 06:58:55 2025 +0000

        use int64_t as expert stride to avoid overflow

    commit 9fbcc8f
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 20 13:53:32 2025 +0000

        use v4i32 as the storage type for B to avoid repack operation

    commit 81899bd
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 20 06:40:03 2025 +0000

        add pk_fp4_t and e8m0_t support for amd_buffer_load_impl

    commit c27eb07
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 20 04:39:14 2025 +0000

        optimize cvt_pkf4_to_f16 implementation

    commit 3ca0bd5
    Author: Feng Shijie <[email protected]>
    Date:   Tue Aug 19 14:56:46 2025 +0000

        optimize A_LDS descriptor to avoid bank conflicts

    commit f7f0306
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 18 18:43:37 2025 +0000

        fix gate-up when GU_NRepeat > 1

    commit be55c0f
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 18 17:28:11 2025 +0000

        add fp16xf4 moe

    commit 599e1f5
    Author: Feng Shijie <[email protected]>
    Date:   Sun Aug 17 17:51:18 2025 +0000

        rename example

    commit 7899fb4
    Author: Feng Shijie <[email protected]>
    Date:   Fri Aug 15 06:20:46 2025 +0000

        remove additional check when e8m0->float

    commit 714b341
    Author: Feng Shijie <[email protected]>
    Date:   Thu Aug 14 09:34:12 2025 +0000

        eliminate repeat dequant

    commit 53e8c0c
    Merge: 5de6208 cc9c7b9
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 13 16:51:49 2025 +0000

        Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm

    commit 5de6208
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 13 16:16:48 2025 +0000

        update f16xMXF4

    commit 732ebde
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 13 10:48:53 2025 +0000

        update scale-preshuffle for MXF4

    commit edb58d0
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 11 11:24:34 2025 +0000

        update

    commit cc9c7b9
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 11 08:38:23 2025 +0000

        optimize gemm2 atomic_add pattern

    commit 200a11a
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 11 07:59:47 2025 +0000

        update scale for mxfp4

    commit 87aed56
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 11 07:56:14 2025 +0000

        update case construction

    commit 8b85fa6
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 11 06:03:06 2025 +0000

        update granularity control

    commit 1b8c709
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 11 03:42:46 2025 +0000

        fix TileConfig

    commit 8ba1c70
    Author: Gino Lu <[email protected]>
    Date:   Thu Aug 7 21:37:28 2025 +0800

        Add e8m0 scaled convert into CK_TILE (#2617)

        * first commit

        * remove redundant code

        * modify according to comments.

        * fix type_convert error with scaled_type_convert

    commit f788d3d
    Author: Feng Shijie <[email protected]>
    Date:   Fri Aug 8 20:19:16 2025 +0000

        add mixed_prec fp16xfp4

    commit 3dea10a
    Author: Feng Shijie <[email protected]>
    Date:   Thu Aug 7 09:22:04 2025 +0000

        debug mixed_prec flatmm

    commit 0ba513b
    Merge: 90e910f c0cb4d0
    Author: lalala-sh <[email protected]>
    Date:   Wed Aug 6 16:49:47 2025 +0800

        Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk

        fix split k

    commit 6d3cbc7
    Author: Feng Shijie <[email protected]>
    Date:   Wed Aug 6 08:33:33 2025 +0000

        add moe_flatmm

    commit c0cb4d0
    Author: coderfeli <[email protected]>
    Date:   Wed Aug 6 02:45:31 2025 +0000

        fix split k

    commit 90e910f
    Author: Feng Shijie <[email protected]>
    Date:   Mon Aug 4 07:16:36 2025 +0000

        fix flatmm with scaling when WarpTileM == 32

    commit aa5e008
    Author: Feng Shijie <[email protected]>
    Date:   Fri Aug 1 11:01:23 2025 +0000

        optimize scaling epilogue

    commit ac5908c
    Author: Feng Shijie <[email protected]>
    Date:   Fri Aug 1 07:28:38 2025 +0000

        fix wrong config for fp8 scaling

    commit 3f43b84
    Author: Feng Shijie <[email protected]>
    Date:   Wed Jul 30 06:20:30 2025 +0000

        prune debug message

    commit 2e5d4c7
    Author: Feng Shijie <[email protected]>
    Date:   Wed Jul 30 04:52:08 2025 +0000

        fix compile error

    commit c117a19
    Author: Feng Shijie <[email protected]>
    Date:   Tue Jul 29 15:42:58 2025 +0000

        Add persistent option on flatmm for tuning

    commit a587701
    Author: AMD-dteng <[email protected]>
    Date:   Tue Jul 29 22:48:00 2025 +0800

        update pipeline v1: add atomic IGLP schedule

    commit f9e4814
    Author: lalala-sh <[email protected]>
    Date:   Thu Jul 24 09:09:27 2025 +0000

        fix error log throwing

    commit 1b6d7cf
    Author: Feng Shijie <[email protected]>
    Date:   Mon Jul 28 08:24:51 2025 +0000

        crz idea

    commit 5473f06
    Author: Feng Shijie <[email protected]>
    Date:   Sun Jul 27 11:57:38 2025 +0000

        Add permuteN optimization when NRepeat % 2 == 0 on flatmm

    commit bfb9f40
    Author: sjfeng <[email protected]>
    Date:   Sun Jul 27 17:24:08 2025 +0800

        try to remove c_shuffle_lds

    commit 1264f4d
    Author: Feng Shijie <[email protected]>
    Date:   Fri Jul 25 07:41:48 2025 +0000

        fix loop-dim mismatch and improve c_shuffle alu parallelism

    commit 1239d8a
    Merge: 4066454 b908f5e
    Author: lalala-sh <[email protected]>
    Date:   Thu Jul 24 08:46:51 2025 +0000

        merge flatmm -scale

    commit 4066454
    Author: lalala-sh <[email protected]>
    Date:   Thu Jul 24 16:19:58 2025 +0800

        revert delete of inc file

    commit 6839098
    Author: solin <[email protected]>
    Date:   Thu Jul 24 04:38:16 2025 +0000

        reorg  flatmm code

    commit b908f5e
    Author: Feng Shijie <[email protected]>
    Date:   Wed Jul 23 19:12:31 2025 +0000

        fix flatmm syntax error on gfx950

    commit 5a1183e
    Author: Feng Shijie <[email protected]>
    Date:   Wed Jul 23 19:04:22 2025 +0000

        support flatmm scaling

    commit 89fa639
    Author: valarLip <[email protected]>
    Date:   Wed Jul 23 08:44:12 2025 +0000

        merge flatmm pipe v0 from dteng_flatmm_opt

    commit 3f7d848
    Author: lalala-sh <[email protected]>
    Date:   Wed Jul 23 15:38:12 2025 +0800

        build pass

    commit 6dacf83
    Author: lalala-sh <[email protected]>
    Date:   Wed Jul 23 07:20:26 2025 +0000

         fix bug

    commit 7e1bd4b
    Author: lalala-sh <[email protected]>
    Date:   Wed Jul 23 15:01:53 2025 +0800

        sync

    commit 46a538e
    Author: valarLip <[email protected]>
    Date:   Tue Jul 22 08:09:35 2025 +0000

        adaptive scheduler instead of Macro definition

    commit 9aa3396
    Author: lalala-sh <[email protected]>
    Date:   Thu Jul 17 08:40:35 2025 +0000

        fix tail handler bug

    commit fb76450
    Author: lalala-sh <[email protected]>
    Date:   Wed Jul 16 10:12:19 2025 +0000

        merge from dteng_flatmm_opt

---------

Co-authored-by: lalala-sh <[email protected]>
Co-authored-by: AMD-dteng <[email protected]>
Co-authored-by: solin <[email protected]>
Co-authored-by: sjfeng <[email protected]>
Co-authored-by: valarLip <[email protected]>
Co-authored-by: asleepzzz <[email protected]>
Co-authored-by: Feng Shijie <[email protected]>
Co-authored-by: coderfeli <[email protected]>
Co-authored-by: Gino Lu <[email protected]>
Co-authored-by: mtgu0705 <[email protected]>
