Generalized GPU support #57

Open
vchuravy opened this issue May 13, 2025 · 0 comments

For supporting GPU operations generally, we face multiple issues:

  1. Limited support for certain element types
    • Metal.jl only supports UInt32/Int32 and Float32
    • CUDA.jl doesn't support element types smaller than Int32
  2. Limited support of operations.
  3. Limited support for atomic orderings
    • With a fence implementation we could emulate those, but UnsafeAtomics.fence does not work currently.

The current extensions are somewhat dissatisfying, since they “simply” give up and leave it up to the user to figure out what is going wrong. Even that doesn't work, since some backends fail silently (#54).

AMDGPU.jl makes the interesting choice to rely solely on UnsafeAtomics.jl and only specializes a single Atomix operation:
https://github.com/JuliaGPU/AMDGPU.jl/blob/3d07a5dbc4200ad36b98645d4f238d168fd6d09f/src/AMDGPU.jl#L130-L137

We should be able to support almost all element types through masking, as long as we have a compare-and-swap operation on a larger type (this is how CUDA C supports Int16 & co.), but the question is “whose responsibility is the implementation thereof”?
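As a rough illustration of the masking idea, here is a minimal CPU-side sketch that updates one byte lane of a 32-bit word using only a 32-bit compare-and-swap. It uses `Base.Threads.Atomic` as a stand-in for a device pointer; the function name `modify_byte!` and the lane layout are assumptions made for this example and are not part of Atomix, UnsafeAtomics, or any GPU backend.

```julia
using Base.Threads: Atomic, atomic_cas!

# Hypothetical helper: atomically apply `op` to byte lane `lane` (0 to 3)
# of a 32-bit word, using only a 32-bit compare-and-swap.
function modify_byte!(word::Atomic{UInt32}, lane::Integer, op, x::UInt8)
    shift = 8 * lane
    mask = UInt32(0xff) << shift
    old = word[]                                  # atomic load of the full word
    while true
        old_byte = ((old & mask) >> shift) % UInt8
        new_byte = op(old_byte, x)::UInt8         # update only the target lane
        new = (old & ~mask) | (UInt32(new_byte) << shift)
        seen = atomic_cas!(word, old, new)        # returns the value that was in memory
        seen == old && return old_byte => new_byte
        old = seen                                # another thread changed the word; retry
    end
end

w = Atomic{UInt32}(0x11223344)
modify_byte!(w, 1, +, 0x01)                       # touches bits 8 to 15 only
```

The retry loop also fires when a neighbouring lane changes, which is the price of emulating a narrow atomic with a wider one.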

We should also be able to support all the relevant orders as long as we have a fence.
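The usual construction, sketched here on the CPU with Julia's built-in atomics (Julia 1.7 or later), is to pair a relaxed (`:monotonic`) access with a fence. The names `Cell`, `load_acquire`, and `store_release!` are hypothetical and only illustrate the pattern; a GPU version would need a working device-side fence, which is exactly what is missing today.

```julia
# Hypothetical container with one atomic field.
mutable struct Cell
    @atomic value::Int
end

# Acquire load emulated as: relaxed load, then fence.
function load_acquire(c::Cell)
    v = @atomic :monotonic c.value
    Threads.atomic_fence()        # sequentially consistent fence
    return v
end

# Release store emulated as: fence, then relaxed store.
function store_release!(c::Cell, v::Int)
    Threads.atomic_fence()
    @atomic :monotonic c.value = v
    return nothing
end

c = Cell(0)
store_release!(c, 42)
load_acquire(c)
```

A full fence is stronger than strictly required for acquire or release, so this trades some performance for only needing a single fence primitive.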

@maleadt recently spent some time on this in JuliaGPU/GPUCompiler.jl#652

I believe we should take the following steps:

  1. In the extensions, instead of intercepting modify! for all x, we should first only intercept it for the known supported types. This is mostly to avoid issues like #54 (Metal fails silently when Atomix is used with an unsupported operation).
  2. Implement get and set! and xchg
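Step 1 could look roughly like the dispatch sketch below. Everything here is a stand-in: `backend_modify!` is a hypothetical function rather than the real `Atomix.modify!` signature, the supported-type list is taken from the Metal.jl bullet above, and the "supported" method performs a plain, non-atomic update where a real extension would call the backend intrinsic.

```julia
# Assumption: the element types this backend can handle natively.
const SupportedEltype = Union{Int32, UInt32, Float32}

# Generic path (stand-in): fail loudly instead of silently for unsupported types.
function backend_modify!(buf::AbstractVector{T}, i::Integer, op, x) where {T}
    throw(ArgumentError("atomic $(op) on eltype $(T) is not supported by this backend"))
end

# Specialized path: only intercepted for the known supported types.
function backend_modify!(buf::AbstractVector{T}, i::Integer, op, x) where {T<:SupportedEltype}
    old = buf[i]
    new = convert(T, op(old, convert(T, x)))
    buf[i] = new                  # NOT atomic; placeholder for the backend intrinsic
    return old => new
end

backend_modify!(Int32[1, 2], 1, +, 5)    # takes the specialized path
```

Because the specialized method is restricted by type, an `Int16` buffer falls through to the generic path and raises an error rather than producing a wrong result.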

Thereafter, it is a bit up in the air. We could wait and generalize JuliaGPU/GPUCompiler.jl#652 and state that UnsafeAtomics is the interface and all the special behavior is up to GPUCompiler to implement.

Some of the complexity is in my old CUDA PRs that I never quite finished, but JuliaGPU/CUDA.jl#1644 shows how to implement orderings with atomic fences as a fallback.

x-ref: JuliaGPU/CUDA.jl#1790
