Conversation

@olehtika
Contributor

Pull Request Template

Note to AMDers:
This is a public repository. Please do not upload any confidential or customer data. Make sure all such data has been anonymized or removed before making this PR. If you need to attach any private files or links, please insert an Internal OneDrive Link or a Jira Ticket Link instead.

@olehtika olehtika marked this pull request as draft October 16, 2025 09:00
@olehtika olehtika assigned olehtika and amital-amd and unassigned olehtika Oct 16, 2025

@jasainio jasainio left a comment

  • I'm not sure if JAXRcclAnalyser is the proper place for this code. Why not put this under NcclAnalyser?
  • Remove unnecessary files that are not meant to be included in the PR.
  • rccl_complete_analysis.py should be refactored and only the essential parts should be kept.
  • Time differences between ranks on different nodes might not be meaningful. This should be investigated further, but it is probably out of scope for this work.

Is this folder the proper place for Jupyter notebooks?

Is this folder the proper place for Jupyter notebooks?

Clean this out if not needed.

What is the purpose of this file?

What is the purpose of this file?

What is the purpose of this file?

Could you try to assimilate (rewrite/clean/refactor) the essential parts from this file into the current feature (multinode RCCL analysis) and place them in locations that make sense in TraceLens?

It's probably not correct to compare timestamps between different nodes / processes, as their clocks might not be synced. I think this requires further investigation. The safe option would be to remove any calculations / metrics that depend on comparing timestamps from different nodes / processes and add them in a future PR if they make sense.
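
To illustrate which metrics would stay safe, here is a rough sketch, not code from this PR. The event fields (`rank`, `clock_domain`, `collective_id`, `start_us`, `end_us`) and both function names are hypothetical, and it assumes we can tag each event with a `clock_domain` that identifies events known to share one clock. Durations only use a single event's own start and end, so they do not depend on cross-node synchronization. Start-time skew is only computed inside one clock domain.

```python
from collections import defaultdict


def per_rank_durations(events):
    """Sum collective durations per rank.

    Each duration uses only that event's own start/end timestamps,
    so no timestamps from different clocks are ever compared.
    """
    totals = defaultdict(float)
    for ev in events:
        totals[ev["rank"]] += ev["end_us"] - ev["start_us"]
    return dict(totals)


def skew_within_clock_domain(events):
    """Start-time spread (max - min) per (clock_domain, collective_id).

    Events from different clock domains are never mixed, so an
    unknown offset between domains cannot leak into the result.
    """
    starts = defaultdict(list)
    for ev in events:
        key = (ev["clock_domain"], ev["collective_id"])
        starts[key].append(ev["start_us"])
    return {key: max(ts) - min(ts) for key, ts in starts.items()}


# Hypothetical events: domain "n1" has a large, unknown clock offset.
events = [
    {"clock_domain": "n0", "rank": 0, "collective_id": 7, "start_us": 100.0, "end_us": 150.0},
    {"clock_domain": "n0", "rank": 1, "collective_id": 7, "start_us": 110.0, "end_us": 150.0},
    {"clock_domain": "n1", "rank": 2, "collective_id": 7, "start_us": 5000.0, "end_us": 5045.0},
    {"clock_domain": "n1", "rank": 3, "collective_id": 7, "start_us": 5005.0, "end_us": 5045.0},
]

durations = per_rank_durations(events)
skews = skew_within_clock_domain(events)
```

Whether ranks on the same node (but in different processes) can really be treated as one clock domain is exactly the part that needs investigating, so the grouping key is left as an input here rather than derived from the node name.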

@amital-amd amital-amd added the enhancement New feature or request label Oct 22, 2025

4 participants