[WIP] Multi-role warp specialization #5076
base: main
Description
I think the problem is that we are using the same "slot full" mbarrier for both async warps. If one warp issues its expect_tx and its load completes before the other warp has issued its own expect_tx, the mbarrier completes its phase early, which throws off the expected parity bit and leads to the deadlock. Instead, I think it's best to have one slot-full barrier and one slot-empty barrier per async warp. Consumers that use buffers from two async warps, as in this example, will just need to wait for both mbarriers before proceeding.
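To make the parity argument concrete, here is a small host-side C++ sketch. It is a toy model only, not the generated kernel and not real mbarrier code: `ToyMbarrier`, the two helper functions, the byte counts, and the arrival count of 1 are all assumptions made for illustration. The model completes a phase whenever both the pending arrival count and the pending transaction bytes reach zero, which is a simplification of the hardware behavior.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, for illustration): a phase completes when the
// pending arrival count and the pending transaction bytes are both zero.
struct ToyMbarrier {
  int expected_arrivals;
  int pending_arrivals;
  int64_t pending_tx = 0;
  int phases_completed = 0;

  explicit ToyMbarrier(int arrivals)
      : expected_arrivals(arrivals), pending_arrivals(arrivals) {}

  // Models an arrive that also registers expected transaction bytes.
  void arriveExpectTx(int64_t bytes) {
    pending_tx += bytes;
    --pending_arrivals;
    tryComplete();
  }

  // Models the async load reporting that its bytes have landed.
  void completeTx(int64_t bytes) {
    pending_tx -= bytes;
    tryComplete();
  }

  // Parity bit a consumer would wait on.
  int parity() const { return phases_completed & 1; }

 private:
  void tryComplete() {
    if (pending_arrivals == 0 && pending_tx == 0) {
      ++phases_completed;
      pending_arrivals = expected_arrivals;  // re-arm for the next phase
    }
  }
};

// One shared "slot full" barrier (assumed arrival count 1) used by both
// async warps. Warp A's load lands before warp B has arrived.
int sharedBarrierParityAfterOneIteration() {
  ToyMbarrier full(1);
  full.arriveExpectTx(128);  // warp A
  full.completeTx(128);      // A's load lands: the phase completes early
  full.arriveExpectTx(256);  // warp B arrives in the *next* phase
  full.completeTx(256);      // a second phase completes
  return full.parity();      // two flips in one iteration
}

struct Parities {
  int a;
  int b;
};

// One "slot full" barrier per async warp: each completes exactly one phase.
Parities perWarpBarrierParitiesAfterOneIteration() {
  ToyMbarrier full_a(1);
  ToyMbarrier full_b(1);
  full_a.arriveExpectTx(128);
  full_a.completeTx(128);
  full_b.arriveExpectTx(256);
  full_b.completeTx(256);
  return {full_a.parity(), full_b.parity()};
}
```

In this model the shared barrier completes two phases in a single iteration, so a consumer that checks the parity after both loads sees it unchanged from before the iteration. With one barrier per async warp, each parity flips exactly once, and the consumer simply waits on both.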
What would you like me to review in this PR? It doesn't seem ready to merge.
Status: generates ITE tree but kernel hangs.
`TwoAsyncWarps` test runs successfully with `NVFUSER_ENABLE=kernel_debug`, so there's a race that's mitigated by slowing down the kernel.
Generated kernel:
I fixed the obvious things, like where the compute warp is arriving twice and `b14` should not be in the predicates, and tried a few more things. Still investigating...
Outlook: