-
Notifications
You must be signed in to change notification settings - Fork 66
Fix predicate elimination with shared memory tensors #5107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
!test --diff |
Review updated until commit 5eb6ad9 Description
Changes walkthrough 📝
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
for (auto consumer : ir_utils::filterByType<TensorView>(expr->outputs())) { | ||
for (auto producer : ir_utils::filterByType<TensorView>(expr->inputs())) { | ||
if (isSharedMemory(producer) || isSharedMemory(consumer)) { | ||
if (needSharedMemPredicate(producer, consumer)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is not done when there's no producer.
return false; | ||
} | ||
|
||
bool needsPredicateSharedMemAccess(const Expr* expr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes here are just making sure all checks are performed even when there's no producer.
It turned out this resulted in a lot of code changes, which I don't feel like going through right now. For now, I tried to keep the current behavior as is, even though in some cases it doesn't seem to make sense, as long as the correctness is preserved.
!test --diff |
!test --diff |
!test --diff |
!test --diff |
// dimensions, need to predicate against out of bound | ||
// shared memory access by out of bound threads. | ||
// | ||
// TODO: This condition should not need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I thought we shouldn't need to look at producers, but that resulted in some unexpected changes that includes missed shared memory aliasing.
} | ||
} | ||
|
||
// TODO: This condition should not need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here, this condition is to keep the current behavior for now.
The predicate elimination check with shared memory is skipped when there's no producer tensor, which is wrong. This happens, for example, with factory ops. Please see the new test.
Here's the generated kernel with the repro using ToT:
Notice that the predicate of the zero assignment to
T2
is eliminated, which is wrong.TODO: It assumes the allocation of a shared memory tensor is always based on the loop domain, which may not be true.