For supporting GPU operations generally, we face multiple issues:
- `atomix_xchg`: Implement atomicswap for CUDA (#55)
- With a fence implementation we could emulate those, but `UnsafeAtomics.fence` does not work currently.
- The current extensions are somewhat dissatisfying, since they “simply” give up and leave it up to the user to figure out what is going wrong.
  - Which doesn't even work, since some backends fail silently (#54).
AMDGPU.jl makes the interesting choice to rely solely on UnsafeAtomics.jl and only specializes a single Atomix operation:
https://github.com/JuliaGPU/AMDGPU.jl/blob/3d07a5dbc4200ad36b98645d4f238d168fd6d09f/src/AMDGPU.jl#L130-L137
We should be able to support almost all element types as long as we have a compare-and-swap operation on a larger type, by masking (this is how CUDA C supports Int16 & co), but the question is “whose responsibility is the implementation thereof”?
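As a rough host-side illustration of the masking idea, here is a minimal C++ sketch that emulates a 16-bit atomic add using only a 32-bit compare-and-swap. The helper name `atomic_add_u16` and the lane convention (0 = low half, 1 = high half) are assumptions made for this example; this is not code from CUDA C, Atomix, or UnsafeAtomics.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not part of any library): atomically add `val` to one
// 16-bit lane of a 32-bit word, using only a 32-bit compare-and-swap.
// lane 0 is the low half, lane 1 the high half.
// Returns the previous 16-bit value of that lane.
inline std::uint16_t atomic_add_u16(std::atomic<std::uint32_t>& word,
                                    unsigned lane, std::uint16_t val) {
    const unsigned shift = lane * 16u;
    const std::uint32_t mask = std::uint32_t{0xFFFFu} << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        // Extract the lane, update it (wrapping at 16 bits), and splice it
        // back while leaving the neighbouring lane untouched.
        const std::uint16_t old_lane =
            static_cast<std::uint16_t>((old_word & mask) >> shift);
        const std::uint16_t new_lane =
            static_cast<std::uint16_t>(old_lane + val);
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{new_lane} << shift);
        // On failure, compare_exchange_weak stores the current value into
        // old_word, so the loop retries with fresh data.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
            return old_lane;
        }
    }
}
```

The same loop shape works for any read-modify-write operation on a sub-word type; only the line computing `new_lane` changes. The open question of which package should own such a fallback is unaffected by this sketch.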
We should also be able to support all the relevant orders as long as we have a fence.
@maleadt recently spent some time on this: JuliaGPU/GPUCompiler.jl#652
I believe we should make the following steps:
1. `modify!` for all `x`: we should first only intercept it for the known supported types. This is mostly to avoid issues like “Metal fails silently when Atomix is used with an unsupported operation” (#54).
2. `get` and `set!` and `xchg`
Thereafter, it is a bit up in the air. We could wait and generalize JuliaGPU/GPUCompiler.jl#652 and state that UnsafeAtomics is the interface and all the special behavior is up to GPUCompiler to implement.
Some of the complexity is in my old CUDA PRs that I never quite finished, but JuliaGPU/CUDA.jl#1644 shows how to implement order with atomic fences as fallback.
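To make the fence-as-fallback idea concrete, here is a small host-side C++ sketch that builds acquire, release, and acq_rel operations from relaxed atomics plus explicit fences. The helper names are hypothetical, and this only illustrates the general pattern under the C++ memory model; it is not taken from JuliaGPU/CUDA.jl#1644 and says nothing about how a GPU backend lowers these fences. Sequentially consistent operations would additionally need `seq_cst` fences and are not covered.

```cpp
#include <atomic>

// Hypothetical helpers (not from any library): ordered operations built
// from a relaxed atomic access plus an explicit fence.

// Release store: the fence orders all earlier writes before the store.
inline void store_release_via_fence(std::atomic<int>& x, int v) {
    std::atomic_thread_fence(std::memory_order_release);
    x.store(v, std::memory_order_relaxed);
}

// Acquire load: the fence orders the load before all later reads and writes.
inline int load_acquire_via_fence(const std::atomic<int>& x) {
    int v = x.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return v;
}

// acq_rel read-modify-write: a release fence before and an acquire fence
// after a relaxed fetch_add. Returns the previous value.
inline int fetch_add_acq_rel_via_fence(std::atomic<int>& x, int v) {
    std::atomic_thread_fence(std::memory_order_release);
    int old = x.fetch_add(v, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return old;
}
```

A fence-based fallback is at least as strong as the ordered operation it replaces, so it is safe but may cost more than a native ordered instruction where one exists.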
x-ref: JuliaGPU/CUDA.jl#1790