[SYCL] fix for flaky ~event failure in Win unit tests #19762
Can you please elaborate more on how the race happens? Like rough thread interleavings?
@uditagarwal97 I had trouble pinning it down. It's flaky and also load-sensitive. What seems to be happening is that clearing the platform here somehow leads UR to drop the adapters before we tell it to. That conflicts with our host tasks, which are threaded: when their event destructors fire, they call into the now-dropped adapters. But this entire scenario is contrived. These are stunts we do in the unit tests to "pretend" we are shutting down when we aren't. If clearing backends and platforms between tests is imperative, then we should probably change course entirely and do that at the beginning of the UrMock, rather than simulating shutdown. But I didn't want to make such a substantial change while trying to address a couple of flaky CI failures. The scenario that is NOT contrived, though, is using SYCL when statically linked. We don't have enough testing of that use case, and I think we need more. Perhaps a lot more.
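As a rough illustration of the interleaving described in the comment above, here is a minimal sketch. It does not use the real SYCL or UR code: `MockAdapter`, `MockEvent`, and `simulate_teardown` are hypothetical names invented for this example, and the "bad" ordering is forced deterministically instead of depending on load. It only counts how many event destructors run after the adapter has been marked as dropped.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for an adapter; NOT the real Unified Runtime API.
struct MockAdapter {
  std::atomic<bool> alive{true};
  std::atomic<int> late_calls{0}; // calls that arrived after drop()
  void release_event() {
    if (!alive.load())
      late_calls.fetch_add(1);
  }
  void drop() { alive.store(false); }
};

// Hypothetical event whose destructor calls back into the adapter,
// mirroring the ~event path discussed in this thread.
struct MockEvent {
  MockAdapter &adapter;
  explicit MockEvent(MockAdapter &a) : adapter(a) {}
  ~MockEvent() { adapter.release_event(); }
};

// Starts n host-task threads, each owning one event, then tears down.
// Returns how many event destructors ran after the adapter was dropped.
int simulate_teardown(int n, bool join_before_drop) {
  assert(n >= 0);
  MockAdapter adapter;
  std::atomic<bool> go{false};
  std::vector<std::thread> host_tasks;
  for (int i = 0; i < n; ++i) {
    host_tasks.emplace_back([&adapter, &go] {
      auto ev = std::make_unique<MockEvent>(adapter);
      while (!go.load())
        std::this_thread::yield();
      // ev is destroyed here: ~MockEvent calls adapter.release_event().
    });
  }
  if (join_before_drop) {
    // Safe order: let host tasks finish, then drop the adapter.
    go.store(true);
    for (auto &t : host_tasks)
      t.join();
    adapter.drop();
  } else {
    // Problematic order: the adapter is dropped while host tasks
    // still hold events whose destructors have not run yet.
    adapter.drop();
    go.store(true);
    for (auto &t : host_tasks)
      t.join();
  }
  return adapter.late_calls.load();
}
```

Calling `simulate_teardown(4, true)` reports no late calls, while `simulate_teardown(4, false)` reports one per host task. In the real runtime a late call would hit an adapter that no longer exists, not a harmless counter, which is why the ordering matters.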
… Also fix the otherwise tautological DeviceRefCounter test. Signed-off-by: Chris Perkins <[email protected]>
This PR is limited to the unit tests. The test failures are in unrelated e2e tests. Not sure what's going on there.
LGTM
Windows shutdown/teardown doesn't have exactly the same timing as on Linux, even more so in the unit tests, which are statically compiled (so there is no DllMain() call, which means no gap between the calls to shutdown_early() and shutdown_late()). Here we remove a race between the platform deletion in the mock and the Windows shutdown. This fixes a flaky failure we are seeing in a couple of the unit tests.