johnnynunez

What does this PR do?

Fixes #1320, #1308, #1323, and #1335. Includes flash-attention fixes for CUDA >= 12.9 and adds CUTLASS v4.2.1, which fixes some kernels for Blackwell.
Also adds support for Spark and Thor.
Adds Blackwell family support: https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
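
For readers unfamiliar with family-specific targets: as described in the linked post, they are written with an `f` suffix on the architecture (for example `sm_100f`). A minimal sketch of how the matching nvcc `-gencode` flags could be assembled follows. The helper name `family_gencode_flags` is hypothetical, and this is not necessarily how this PR or xformers' build scripts construct their flags.

```python
def family_gencode_flags(capabilities):
    """Build nvcc -gencode flags for family-specific targets.

    `capabilities` is a list of strings such as "10.0".
    Hypothetical helper for illustration only; not taken from this PR.
    """
    flags = []
    for cap in capabilities:
        major, minor = cap.split(".")
        # The "f" suffix marks a family-specific target (CUDA >= 12.9).
        arch = f"{major}{minor}f"
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags


if __name__ == "__main__":
    print(family_gencode_flags(["10.0"]))
```

Which capabilities to pass depends on the GPUs being targeted and the CUDA toolkit version in use.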

Thanks to #1285 and #1262, which are included here.

Fixes in flash-attention to support CUDA 13:

  1. Upgrade to CUTLASS v4.2.1 (Dao-AILab/flash-attention#1905)
  2. C++11 fix warnings (Dao-AILab/flash-attention#1904)
  3. [NVIDIA] Enable Blackwell Family Specific (Dao-AILab/flash-attention#1882)
  4. [BUILD] SBSA wheels + CUDA 13 Support (Dao-AILab/flash-attention#1865)
  5. [BUG] CUDA 13: make FA3 compatible with CUDA 13 Builds (Dao-AILab/flash-attention#1860)

PyTorch 2.9.0: https://dev-discuss.pytorch.org/t/pytorch-2-9-rc1-produced-for-pytorch-audio-vision/3234

cc @sgrigory

meta-cla bot added the CLA Signed label Oct 9, 2025
johnnynunez commented Oct 15, 2025

@jiawenliu64 @bottler @sgrigory could you run and merge?

@snakeeater4526

Just a short message to say that I believe this PR is badly needed for people on CUDA 13. PyTorch 2.9 is now the stable release, but the latest available xformers doesn't support CUDA 13.

So some software (like ComfyUI) that uses tensor-related stuff can't work properly.

PS: I'm not a dev at all, but I just spent an entire day trying to use ComfyUI with TensorRT acceleration, and it's basically impossible with CUDA 13 drivers... (sadly, I did not manage to compile this PR successfully).

johnnynunez commented Oct 21, 2025

You have to export the CCCL paths so they point to the new ones. It is working for me.
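
The comment does not say which variables were exported. One plausible reading, assuming the CUDA 13 toolkit layout in which the CCCL headers (Thrust, CUB, libcu++) sit under `include/cccl`, is to put that directory on the host compiler's include path before building. The sketch below does this through `CPATH`; the helper name `cccl_include_env` and the choice of `CPATH` are assumptions, not something confirmed in this thread.

```python
import os


def cccl_include_env(cuda_home, env=None):
    """Return a copy of `env` with <cuda_home>/include/cccl prepended to CPATH.

    Hypothetical helper: assumes the CCCL headers are located under
    include/cccl inside the CUDA toolkit directory.
    """
    env = dict(os.environ if env is None else env)
    cccl_dir = os.path.join(cuda_home, "include", "cccl")
    # Keep existing CPATH entries, dropping empty segments.
    parts = [p for p in env.get("CPATH", "").split(os.pathsep) if p]
    if cccl_dir not in parts:
        parts.insert(0, cccl_dir)
    env["CPATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    print(cccl_include_env(cuda_home)["CPATH"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the build, which leaves the current process environment unchanged.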

@johnnynunez

Could you try again? It should be fixed.

Development

Successfully merging this pull request may close these issues.

cu129 ERROR

2 participants