
Conversation


@nrailg nrailg commented Oct 13, 2025

Sometimes a particular attention or mask variant is hard to implement as a fused / optimized kernel in a short amount of time, but we still need to run experiments to verify its effectiveness. In such cases we need to fall back to eager mode, so I added a switch that falls back to the eager attention implementation:

--fallback-to-eager-attn

Additionally, since Megatron Core's eager attention does not support context parallelism, I also provided a distributed attention implementation similar to the one described in the Llama 3 paper.
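For readers unfamiliar with that approach, below is a minimal sketch (not the code in this PR) of all-gather based context parallelism for eager attention in the spirit of the Llama 3 paper: each CP rank holds a contiguous chunk of the sequence, K and V are all-gathered across the CP group, and plain attention is computed for the local Q chunk. The function name, tensor shapes, and mask layout are assumptions for illustration only.

```python
import math
import torch
import torch.distributed as dist


def eager_attention_with_cp(q, k, v, mask, cp_group):
    """Hypothetical sketch of all-gather based context-parallel eager attention.

    q, k, v: [batch, heads, local_seq, head_dim] -- this rank's sequence chunk.
    mask:    [batch, 1, local_seq, full_seq] additive mask for the local queries
             against the full (gathered) key sequence (shape is an assumption).
    """
    cp_size = dist.get_world_size(group=cp_group)

    # All-gather K and V along the sequence dimension across the CP group.
    k_chunks = [torch.empty_like(k) for _ in range(cp_size)]
    v_chunks = [torch.empty_like(v) for _ in range(cp_size)]
    dist.all_gather(k_chunks, k.contiguous(), group=cp_group)
    dist.all_gather(v_chunks, v.contiguous(), group=cp_group)
    k_full = torch.cat(k_chunks, dim=2)   # [b, h, full_seq, d]
    v_full = torch.cat(v_chunks, dim=2)

    # Plain (eager) scaled dot-product attention for the local query chunk;
    # arbitrary additive masks work here, which is the point of the fallback.
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.matmul(q, k_full.transpose(-2, -1)) * scale
    if mask is not None:
        scores = scores + mask
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, v_full)    # [b, h, local_seq, d]
```

Compared with ring-attention style CP, an all-gather of K/V trades extra activation memory for simplicity and for compatibility with arbitrary (non-causal) masks, which is presumably why it suits an eager fallback path; the actual chunking and mask handling in the PR may differ.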


copy-pr-bot bot commented Oct 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@nrailg nrailg changed the title from "Add context parallel support to eager attention implementation" to "Adding context parallel support to eager attention implementation" on Oct 13, 2025
@nrailg nrailg force-pushed the nrwu/eagercp branch 2 times, most recently from 10ddbd8 to 5c981b7 on October 14, 2025 08:48
@sbhavani sbhavani added the enhancement (New feature or request) label on Oct 21, 2025