
Conversation


@kandelak commented on May 16, 2025

There is a bug in the attention calculation when fused_attn is set to false.

To reproduce the bug, set fused_attn to false: reconstruction quality becomes very poor, whereas with the default setting of true it works. With this change it works again (the non-fused attention path is essentially reimplemented to match the fused one).

Possible reason: training was done with F.scaled_dot_product_attention, which computes attention differently internally from the "else branch", where attention is calculated manually in a non-efficient way.
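The exact module isn't shown in this PR, but as a minimal sketch (function name and tensor shapes are illustrative), the manual branch can be written so it mirrors F.scaled_dot_product_attention and both paths return the same result:

```python
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # Mirrors F.scaled_dot_product_attention with no mask and no dropout:
    # softmax(q @ k^T / sqrt(head_dim)) @ v, using the same scale as the
    # fused kernel so the two branches agree numerically.
    scale = q.shape[-1] ** -0.5
    attn = (q * scale) @ k.transpose(-2, -1)
    attn = attn.softmax(dim=-1)
    return attn @ v

# Quick check that the manual path matches the fused path.
q, k, v = (torch.randn(2, 8, 16, 64) for _ in range(3))  # (B, heads, N, head_dim)
fused = F.scaled_dot_product_attention(q, k, v)
manual = manual_attention(q, k, v)
print(torch.allclose(fused, manual, atol=1e-5))  # True
```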
