Conversation

leoleoasd

copy-pr-bot bot commented Aug 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@sbhavani sbhavani added the bug Something isn't working label Sep 9, 2025
@leoleoasd
Author

@sbhavani Is there any update on this?

@sbhavani
Contributor

@leoleoasd thanks for reporting the issue and providing the fix!

I see that we recently added commit 1aef9f8ef, which provides protection against division by zero. I'll review internally and check which approach is appropriate for the loss calculation.

@sbhavani
Contributor

sbhavani commented Oct 2, 2025

@leoleoasd is your issue resolved with the recent commit?

@leoleoasd
Author

I haven't had a chance to try it yet; I'll find some time next week.

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[BUG] Context parallel nan loss

2 participants