For supporting GPU operations generally, we face multiple issues:
- Limited support for certain element types
  - Metal.jl only supports UInt32/Int32 and Float32
  - CUDA.jl doesn't support types smaller than Int32
- Limited support for operations
  - We have forgotten to implement `atomix_xchg` (#55: Implement atomicswap for CUDA)
- Limited support for atomic orderings
  - With a fence implementation we could emulate those, but `UnsafeAtomics.fence` does not work currently.
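
To make the element-type limitation concrete, here is a small hypothetical example (CUDA.jl shown; Metal.jl is analogous):

```julia
using CUDA, Atomix

function inc_kernel!(xs)
    # CUDA.jl has no atomics for types narrower than Int32, so this
    # read-modify-write has no native operation to lower to.
    Atomix.@atomic xs[1] += Int16(1)
    return nothing
end

xs = CUDA.zeros(Int16, 1)
@cuda inc_kernel!(xs)  # expected to fail during GPU compilation
```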
The current extensions are somewhat dissatisfying, since they “simply” give up and leave it up to the user to figure out what is going wrong. This doesn't even work, since some backends fail silently (#54).
AMDGPU.jl makes the interesting choice to rely solely on UnsafeAtomics.jl and only specializes a single Atomix operation:
https://github.com/JuliaGPU/AMDGPU.jl/blob/3d07a5dbc4200ad36b98645d4f238d168fd6d09f/src/AMDGPU.jl#L130-L137
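
For context, the shape of that approach is roughly as follows. This is a paraphrased sketch, not the verbatim AMDGPU.jl code (see the permalink); `DeviceRef` is a made-up stand-in for the real device ref type:

```julia
using Atomix: Atomix
using UnsafeAtomics: UnsafeAtomics

# Stand-in ref type so the sketch is self-contained; in AMDGPU.jl the real
# ref wraps a ROCDeviceArray.
struct DeviceRef{T}
    ptr::Ptr{T}
end
Base.eltype(::DeviceRef{T}) where {T} = T

# The single specialization: delegate the read-modify-write to UnsafeAtomics,
# whose generic implementation lowers to LLVM atomic instructions that the
# GPU backend then compiles (or rejects).
@inline function Atomix.modify!(ref::DeviceRef, op::OP, x, order) where {OP}
    x = convert(eltype(ref), x)
    old, new = UnsafeAtomics.modify!(ref.ptr, op, x, order)
    return old => new
end
```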
We should be able to support almost all element types as long as we have a compare-and-swap operation on a wider type, through masking (this is how CUDA C supports Int16 & co), but the question is: whose responsibility is it to implement this?
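
Here is a sketch of the masking technique, assuming the backend provides a 32-bit CAS that returns the previously stored word (CUDA.jl's `atomic_cas!` has this shape); the `cas32!` stand-in below is sequential and only exists to make the sketch self-contained:

```julia
# Hypothetical stand-in for a backend-provided 32-bit CAS that returns the
# previously stored word. This sequential version is NOT atomic; it only
# makes the sketch runnable.
function cas32!(ptr::Ptr{UInt32}, expected::UInt32, desired::UInt32)
    old = unsafe_load(ptr)
    old == expected && unsafe_store!(ptr, desired)
    return old
end

# Emulate a 16-bit read-modify-write on top of the 32-bit CAS by splicing
# the halfword into its containing aligned 32-bit word (little endian).
function modify_u16!(ptr::Ptr{UInt16}, op, x::UInt16)
    addr = UInt(ptr)
    word_ptr = Ptr{UInt32}(addr & ~UInt(3))  # aligned word containing the halfword
    shift = Int(addr & 0x2) * 8              # 0 or 16: where the halfword sits
    mask = UInt32(0xffff) << shift
    expected = unsafe_load(word_ptr)
    while true
        old = UInt16((expected & mask) >> shift)
        desired = (expected & ~mask) | (UInt32(op(old, x)) << shift)
        seen = cas32!(word_ptr, expected, desired)
        seen == expected && return old       # our update went through
        expected = seen                      # lost the race; retry
    end
end

# Usage: a 16-bit atomic-style increment of the first array element.
xs = UInt16[0, 0]
GC.@preserve xs modify_u16!(pointer(xs), +, UInt16(1))
```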
We should also be able to support all the relevant orderings as long as we have a fence.
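
For illustration, assuming a working `UnsafeAtomics.fence(ordering)`, stronger orderings could be layered on a relaxed read-modify-write roughly like this:

```julia
using UnsafeAtomics: UnsafeAtomics

# Sketch: recover acquire-release semantics from a monotonic (relaxed)
# read-modify-write plus fences. This is exactly the part that is blocked
# on UnsafeAtomics.fence working.
function modify_acq_rel!(ptr, op, x)
    UnsafeAtomics.fence(UnsafeAtomics.release)  # publish prior writes
    ret = UnsafeAtomics.modify!(ptr, op, x, UnsafeAtomics.monotonic)
    UnsafeAtomics.fence(UnsafeAtomics.acquire)  # order later reads after the RMW
    return ret  # old => new, as returned by modify!
end
```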
@maleadt recently spent some time on JuliaGPU/GPUCompiler.jl#652.
I believe we should take the following steps:
- In the extensions, instead of intercepting `modify!` for all `x`, we should first only intercept it for the known supported types. This is mostly to avoid issues like “Metal fails silently when Atomix is used with an unsupported operation” (#54).
- Implement `get`, `set!`, and `xchg` (see the sketch after this list).
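
A sketch of what the first two steps could look like; all names here are illustrative, not the actual extension code:

```julia
using Atomix: Atomix
using UnsafeAtomics: UnsafeAtomics

# Stand-in ref type so the sketch is self-contained; the real extension
# would use its existing indexable ref over an MtlDeviceArray.
struct MtlRef{T}
    ptr::Ptr{T}
end

# Element types Metal is known to handle atomically (per the list above).
const MtlAtomicEltype = Union{Int32,UInt32,Float32}

# Step 1: intercept modify! *only* for supported element types, so an
# unsupported type now fails loudly with a MethodError instead of
# silently (cf. #54).
function Atomix.modify!(ref::MtlRef{T}, op::OP, x, order) where {T<:MtlAtomicEltype,OP}
    old, new = UnsafeAtomics.modify!(ref.ptr, op, convert(T, x), order)
    return old => new
end

# Step 2: the forgotten entry points, with the same restriction.
Atomix.get(ref::MtlRef{T}, order) where {T<:MtlAtomicEltype} =
    UnsafeAtomics.load(ref.ptr, order)
Atomix.set!(ref::MtlRef{T}, v, order) where {T<:MtlAtomicEltype} =
    UnsafeAtomics.store!(ref.ptr, convert(T, v), order)
```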
Thereafter, it is a bit up in the air. We could wait for JuliaGPU/GPUCompiler.jl#652, generalize it, and state that UnsafeAtomics is the interface, with all the special behavior left to GPUCompiler to implement.
Some of the complexity is in my old CUDA PRs that I never quite finished, but JuliaGPU/CUDA.jl#1644 shows how to implement orderings with atomic fences as a fallback.
x-ref: JuliaGPU/CUDA.jl#1790