Add several features. #4998

jiayus-nvidia · 2025-10-13T03:11:32Z

Add support for arbitrary masks.
Add support for paged KV on Ampere.
Add support for fp8 bwd with 5 quantization modes on Hopper.
Add support for all fp16/bf16 mask types in fp8 fwd and bwd.

netlify · 2025-10-13T03:11:37Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `e01548d` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68ef3d48857cfa0008c49811 |
| 😎 Deploy Preview | https://deploy-preview-4998--pytorch-fbgemm-docs.netlify.app |

meta-codesync · 2025-10-13T12:19:36Z

@q10 has imported this pull request. If you are a Meta employee, you can view this in D84501311.

q10 · 2025-10-13T21:20:16Z

Hi @jiayus-nvidia, the default values parsed in generate_kernels.py appear to be broken. I see

raise ValueError("ARBITRARY_NFUNC must be odd")

when I run the script. Could you update the script so that reasonable defaults are applied when no environment variables are set?
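For illustration, one way to apply a default when the variable is unset is sketched below. This is only a sketch, not the code in generate_kernels.py: the helper name `read_odd_int_env`, the `environ` parameter, and the default value of 1 are assumptions. Only the variable name `ARBITRARY_NFUNC` and the error message come from the traceback above.

```python
import os


def read_odd_int_env(name, default, environ=None):
    """Read an integer setting from the environment, falling back to a default.

    Hypothetical helper: the real script may parse its settings differently.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    # Treat an unset or empty variable as "use the default".
    if raw is None or raw.strip() == "":
        value = default
    else:
        value = int(raw)
    # Same check that produced the error reported above.
    if value % 2 == 0:
        raise ValueError(f"{name} must be odd")
    return value


# With an empty environment the (assumed) default of 1 is used.
nfunc = read_odd_int_env("ARBITRARY_NFUNC", default=1, environ={})
```

Passing the environment as a parameter keeps the helper easy to test. The important part is that the default itself satisfies the oddness check.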

jiayus-nvidia · 2025-10-14T02:28:55Z

Hi @q10, sorry for not considering this situation. Fixed now.

q10 · 2025-10-14T23:32:21Z

Hi @jiayus-nvidia, it appears there are some undefined symbols:

E   OSError: /home/ec2-user/miniconda/envs/build_binary/lib/python3.13/site-packages/fbgemm_gpu/experimental/hstu/fbgemm_gpu_experimental_hstu.so: undefined symbol: _Z13run_hstu_bwd_ILi90EN7cutlass12float_e4m3_tELi128ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELi0EEvR15Hstu_bwd_paramsP11CUstream_st

Maybe the code generation step didn't generate all the template instantiations?
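To see which instantiation is missing, it can help to decode the mangled name. Under the Itanium C++ ABI the symbol above should demangle to roughly `void run_hstu_bwd_<90, cutlass::float_e4m3_t, 128, true, false, false, false, false, false, true, 0>(Hstu_bwd_params&, CUstream_st*)`. The sketch below is a quick regex, not a full demangler, and pulls out only the integer and boolean literal arguments. The function name `literal_template_args` is hypothetical, and what each argument means depends on the template declaration, which is not shown here.

```python
import re

# The undefined symbol from the error message above.
SYMBOL = "_Z13run_hstu_bwd_ILi90EN7cutlass12float_e4m3_tELi128ELb1ELb0ELb0ELb0ELb0ELb0ELb1ELi0EEvR15Hstu_bwd_paramsP11CUstream_st"


def literal_template_args(symbol):
    """Return the int/bool literal template arguments found in a mangled name.

    Itanium mangling encodes an int literal as "Li<digits>E" and a bool
    literal as "Lb0E" / "Lb1E". Type arguments (such as the cutlass type
    here) are skipped, and unusual identifiers could cause false matches.
    """
    args = []
    for kind, digits in re.findall(r"L([ib])(n?\d+)E", symbol):
        if kind == "b":
            args.append(digits == "1")
        else:
            # A leading "n" marks a negative integer literal.
            args.append(int(digits.replace("n", "-")))
    return args


args = literal_template_args(SYMBOL)
```

For a complete demangling, a tool such as `c++filt` is the more reliable choice.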

jiayus-nvidia · 2025-10-15T06:24:53Z

Hi @q10, you're right. That was caused by a mismatch between the instantiations generated and the kernels called in main. It now compiles with no environment variables set, and the tests run.
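A mismatch of this kind can be caught before linking by comparing the set of generated instantiations with the set the dispatch code may call. The following is a rough sketch under assumed names: the axes `dtypes` and `head_dims`, their values, and the helper `missing_instantiations` are all hypothetical and do not reflect the actual parameters in generate_kernels.py.

```python
from itertools import product


def missing_instantiations(generated, required):
    """Return the configurations that are required but were never generated."""
    return sorted(set(required) - set(generated))


# Hypothetical configuration axes, for illustration only.
dtypes = ["bf16", "e4m3", "fp16"]
head_dims = [64, 128]

# Every combination the dispatch code might request.
required = list(product(dtypes, head_dims))

# Simulate a generator that skipped one combination.
generated = [cfg for cfg in required if cfg != ("e4m3", 128)]

missing = missing_instantiations(generated, required)
if missing:
    print("Missing template instantiations:", missing)
```

Running a check like this at the end of code generation turns an undefined-symbol error at import time into an explicit message at build time.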

Add several features.

262876b

meta-cla bot added the cla signed label Oct 13, 2025

Merge branch 'main' into main

0e65959

jiayus-nvidia added 2 commits October 13, 2025 19:19

Fix arbitrary with no environment variables

56ea697

Merge branch 'main' of https://github.com/jiayus-nvidia/FBGEMM

39727d0

Minor fixes

e01548d
