@iassiour iassiour commented Dec 2, 2025

Motivation

This patch fixes a deadlock in ActiveSignal that occurs when profiling and graph packet batching are both enabled. The deadlock was introduced by ROCm/rocm-systems#1354.

Technical Details

Added a timestamp check to the signal insertion logic. Previously, a new profiling signal was created only when the existing signal was still active (signal > 0). A second condition, signal_list_[temp_id]->ts_ == ts, now also triggers creation, which handles the case where the slot is already associated with the current timestamp.
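
For readers without the diff at hand, the change in the condition can be sketched roughly as follows. This is a simplified illustration, not the actual runtime code: the Timestamp and ProfilingSignal types, the value field, and both function names are hypothetical stand-ins, and only the member name ts_ and the two conditions come from the description above. The sketch also assumes that ts is compared by pointer identity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's timestamp object; only its
// identity (address) matters in this sketch.
struct Timestamp {};

// Hypothetical, simplified slot. Only the member name ts_ appears in the PR
// description; value models the signal value loaded from the slot.
struct ProfilingSignal {
  std::int64_t value = 0;          // > 0 is read as "still active"
  const Timestamp* ts_ = nullptr;  // timestamp the slot is associated with
};

// Condition as described for the code before this patch:
// create a new signal only if the existing one is still active.
bool NeedsNewSignalBefore(const ProfilingSignal& slot) {
  return slot.value > 0;
}

// Condition as described for the patched code: also create a new signal
// when the slot already belongs to the current timestamp.
bool NeedsNewSignalAfter(const ProfilingSignal& slot, const Timestamp* ts) {
  assert(ts != nullptr);  // the sketch assumes a valid current timestamp
  return slot.value > 0 || slot.ts_ == ts;
}
```

Under these assumptions, the only case whose outcome changes is a slot whose signal is no longer active but whose ts_ equals the current ts: the old condition reuses the slot, while the new one creates a fresh signal.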

Test Plan

Tested with the PyTorch job from https://ontrack-internal.amd.com/browse/SWDEV-569101 and verified that the fix resolves the hang.

Test Result

Resolved hang in the reproducer job from https://ontrack-internal.amd.com/browse/SWDEV-569101

Submission Checklist

Member

@ScottTodd ScottTodd left a comment

Let's first try to pull this commit in with a regular submodule update. Patches are very disruptive.

@iassiour
Author

iassiour commented Dec 2, 2025

@ScottTodd, please clarify whether you need any changes from my side in this PR, or whether you plan to do the submodule update in a different PR and, if so, when that will happen. As mentioned in the chat, we need this fix in tonight's Rock build.

@jayhawk-commits
Contributor

I'll initiate a submodule bump for rocm-systems now.

@iassiour
Author

iassiour commented Dec 2, 2025

Thank you @jayhawk-commits can you post a link to that PR?

@jayhawk-commits
Contributor

> Thank you @jayhawk-commits can you post a link to that PR?

#2387

@jayhawk-commits
Contributor

The latest rocm-systems submodule bump from #2387 should include this change.
