gchalump
Contributor

Summary:
Continued effort to improve build time.
Part 1: D83523333

This diff splits the files further so that each file contains exactly one template instantiation.

Update Structure

  • Template Headers:

    • blackwell_fmha_bwd_template.cuh: Template definition only
    • blackwell_fmha_fwd_template.cuh: Template definition only
  • Instantiation Files (ONE instantiation per file):

    • 74 files following naming convention: blackwell_fmha_{fwd|bwd}_hdim{64|128}_{fp16|bf16|fp8}_{varlen|novarlen}_{mask}_{det}_sm100.cu
    • Examples:
      • blackwell_fmha_fwd_hdim128_fp16_novarlen_nomask_sm100.cu
      • blackwell_fmha_bwd_hdim128_bf16_novarlen_nodet_causal_sm100.cu
      • blackwell_fmha_fwd_hdim64_fp8_varlen_residual_sm100.cu
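
The layout above relies on C++ explicit template instantiation. The following is a minimal single-file sketch of that pattern, not the actual FBGEMM code: `run_fmha_fwd`, `Half`, `BFloat16`, `TypeName`, and the template parameters are hypothetical stand-ins, and the sections that would live in separate files are marked with comments.

```cpp
#include <string>

// ---- Would live in a template header (a *_template.cuh style file) ----
// Hypothetical element-type tags; the real kernels use their own types.
struct Half {};
struct BFloat16 {};

// Hypothetical helper mapping a tag type to a short name.
template <typename T>
struct TypeName;
template <>
struct TypeName<Half> {
  static const char* name() { return "fp16"; }
};
template <>
struct TypeName<BFloat16> {
  static const char* name() { return "bf16"; }
};

// Template definition only: including the header instantiates nothing.
// The body is a placeholder that just describes the selected variant.
template <typename Element, int HeadDim, bool VarLen>
std::string run_fmha_fwd() {
  return std::string("fwd_hdim") + std::to_string(HeadDim) + "_" +
         TypeName<Element>::name() + (VarLen ? "_varlen" : "_novarlen");
}

// ---- Would live in one instantiation file (one .cu per variant) ----
// Explicit instantiation definition: forces code generation for exactly
// this combination of template arguments in this translation unit.
template std::string run_fmha_fwd<Half, 128, false>();

// ---- Would live in a second, separate instantiation file ----
template std::string run_fmha_fwd<BFloat16, 64, true>();
```

With each explicit instantiation in its own translation unit, a build system can generally compile the variants in parallel and rebuild only the ones that changed, which is the usual motivation for this kind of split.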

Differential Revision: D84100982


netlify bot commented Oct 13, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit ff28ca5
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68f004b8c87658000894c81e
😎 Deploy Preview https://deploy-preview-5000--pytorch-fbgemm-docs.netlify.app

Contributor

meta-codesync bot commented Oct 13, 2025

@gchalump has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84100982.

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Oct 14, 2025
Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2014

Pull Request resolved: pytorch#5000

Continued effort to improve build time.
Part 1: D83523333

This diff splits the files further so that each file contains exactly one template instantiation.

### Update Structure
- **Template Headers**:
  - `blackwell_fmha_bwd_template.cuh`: Template definition only
  - `blackwell_fmha_fwd_template.cuh`: Template definition only

- **Instantiation Files** (ONE instantiation per file):
  - 74 files following naming convention: `blackwell_fmha_{fwd|bwd}_hdim{64|128}_{fp16|bf16|fp8}_{varlen|novarlen}_{mask}_{det}_sm100.cu`
  - Examples:
    - `blackwell_fmha_fwd_hdim128_fp16_novarlen_nomask_sm100.cu`
    - `blackwell_fmha_bwd_hdim128_bf16_novarlen_nodet_causal_sm100.cu`
    - `blackwell_fmha_fwd_hdim64_fp8_varlen_residual_sm100.cu`

Differential Revision: D84100982
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Oct 15, 2025