jiayus-nvidia
Contributor

  1. Add support for arbitrary masks.
  2. Add support for paged KV on Ampere.
  3. Add support for fp8 bwd with 5 quantization modes on Hopper.
  4. Add support for all fp16/bf16 masks in fp8 fwd and bwd.


netlify bot commented Oct 13, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit e01548d
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68ef3d48857cfa0008c49811
😎 Deploy Preview https://deploy-preview-4998--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Oct 13, 2025
Contributor

meta-codesync bot commented Oct 13, 2025

@q10 has imported this pull request. If you are a Meta employee, you can view this in D84501311.

Contributor

q10 commented Oct 13, 2025

Hi @jiayus-nvidia, the default values parsed in generate_kernels.py appear to be broken. When I run the script, I see:

raise ValueError("ARBITRARY_NFUNC must be odd")

Could you update the script so that reasonable defaults are applied when no environment variables are set?
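To make the request concrete, here is a minimal sketch of how an environment-driven setting could fall back to a valid default when the variable is unset. Only the variable name `ARBITRARY_NFUNC` and the error message come from the traceback above; the helper names and the default value of 1 are assumptions for illustration, not what generate_kernels.py actually does.

```python
import os


def read_int_env(name, default, environ=None):
    """Return the integer value of an environment variable, or `default` if unset or empty."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    return int(raw) if raw else default


def read_arbitrary_nfunc(environ=None, default=1):
    """Read ARBITRARY_NFUNC; the default of 1 is a placeholder, chosen only because it is odd."""
    value = read_int_env("ARBITRARY_NFUNC", default, environ)
    if value % 2 == 0:
        # Same check as in the reported traceback, applied after the default is filled in.
        raise ValueError("ARBITRARY_NFUNC must be odd")
    return value
```

With this shape, running with no environment variables set yields the default rather than an exception, while an explicitly even value still fails loudly.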

@jiayus-nvidia
Contributor Author

Hi @q10, sorry for not considering this situation. Fixed now.

Contributor

q10 commented Oct 14, 2025

Hi @jiayus-nvidia, it appears there are some undefined symbols:

E   OSError: /home/ec2-user/miniconda/envs/build_binary/lib/python3.13/site-packages/fbgemm_gpu/experimental/hstu/fbgemm_gpu_experimental_hstu.so: undefined symbol: _Z13run_hstu_bwd_ILi90EN7cutlass12float_e4m3_tELi128ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELi0EEvR15Hstu_bwd_paramsP11CUstream_st

Maybe the code generation step didn't generate all the template instantiations?
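To see which instantiation the loader is asking for, the template arguments can be read out of the mangled name. The sketch below is a deliberately small decoder that handles only the Itanium C++ ABI constructs appearing in this one symbol (length-prefixed names, `N...E` nested names, and `Li`/`Lb` literals); it is not a general demangler, and a tool such as `c++filt` is the better choice in practice. What each boolean flag means is not stated in this thread, so the decoder reports values only.

```python
import re


def read_source_name(s, pos):
    """Read a length-prefixed identifier such as '7cutlass'; return (name, next_pos)."""
    digits = re.match(r"\d+", s[pos:]).group()
    begin = pos + len(digits)
    end = begin + int(digits)
    return s[begin:end], end


def decode_template_args(mangled):
    """Return (function_name, template_args) for a simple mangled function template."""
    if not mangled.startswith("_Z"):
        raise ValueError("not an Itanium-mangled name")
    name, pos = read_source_name(mangled, 2)
    args = []
    if pos < len(mangled) and mangled[pos] == "I":  # start of template argument list
        pos += 1
        while mangled[pos] != "E":
            if mangled[pos] == "L":  # literal: L <type code> <value> E
                end = mangled.index("E", pos)
                kind, value = mangled[pos + 1], mangled[pos + 2:end]
                args.append(value == "1" if kind == "b" else int(value))
                pos = end + 1
            elif mangled[pos] == "N":  # nested name: N <source name>+ E
                pos += 1
                parts = []
                while mangled[pos] != "E":
                    part, pos = read_source_name(mangled, pos)
                    parts.append(part)
                args.append("::".join(parts))
                pos += 1
            else:
                raise ValueError(f"unsupported construct at offset {pos}")
    return name, args


symbol = (
    "_Z13run_hstu_bwd_ILi90EN7cutlass12float_e4m3_tELi128E"
    "Lb1ELb0ELb0ELb0ELb0ELb0ELb1ELi0EEvR15Hstu_bwd_paramsP11CUstream_st"
)
name, args = decode_template_args(symbol)
print(name, args)
```

For the symbol above this gives `run_hstu_bwd_` with arguments 90, `cutlass::float_e4m3_t`, 128, seven booleans, and a trailing 0, which points at an fp8 (e4m3) backward instantiation for arch 90.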

@jiayus-nvidia
Contributor Author

Hi @q10, you're right. That was caused by a mismatch between the instantiations generated and the kernels called in main. It now compiles with no environment variables set, and the tests run.
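A mismatch of this kind can be caught before link time by comparing the set of instantiations the generator emits against the set the dispatch code may call. The sketch below assumes each instantiation can be described by a hypothetical `(arch, dtype, hdim)` key; the real kernels take more template parameters, and none of these names come from the PR.

```python
from itertools import product


def generated_keys(archs, dtypes, hdims):
    """Enumerate every (arch, dtype, hdim) combination the generator would emit."""
    return set(product(archs, dtypes, hdims))


def missing_instantiations(dispatched, generated):
    """Return keys that the dispatch code can request but the generator never emits."""
    return sorted(set(dispatched) - set(generated))


# Hypothetical example: the generator emits only fp16/bf16, but dispatch also calls e4m3.
generated = generated_keys(archs=[90], dtypes=["bf16", "fp16"], hdims=[128])
dispatched = [(90, "bf16", 128), (90, "fp16", 128), (90, "e4m3", 128)]

missing = missing_instantiations(dispatched, generated)
if missing:
    print("missing instantiations:", missing)
```

Running such a check in the generation script would turn an undefined-symbol error at import time into an explicit message at build time.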
