ArrayFire #1126
What would need to be done? ArrayFire extends AbstractArray, so it looks like it could work out of the box. Have you tested it? I sadly don't own an AMD GPU to test it. |
Dunno, not my area of expertise, but I'm happy to fork and have a go. |
I tried the examples at https://fluxml.ai/Flux.jl/stable/gpu/# but using ArrayFire equivalents, and all was fine. It's difficult to tell if it's doing anything on the GPU [yet]. |
Have you seen #938 as well? It's based on ROCArrays, it gives a good idea on how to implement things and to test if they work |
Do some huge, repeated, matmul. Like a |
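A minimal sketch of that kind of check, assuming ArrayFire.jl and BenchmarkTools are installed (the 4096×4096 size is an arbitrary choice, not from the thread); converting the result back to an Array forces the asynchronous device computation to finish, so the timing isn't skewed by lazy evaluation:

```julia
using ArrayFire, BenchmarkTools

A  = rand(Float32, 4096, 4096)
Ag = AFArray(A)

# CPU baseline
@btime $A * $A

# Device matmul; Array() pulls the result back and forces evaluation
@btime Array($Ag * $Ag)
```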
Did something like that. AFArrays seem to work! But |> gpu, not so sure. |
Got it working. As you say, define the layers with ArrayFire, but you also have to define the datasets as ArrayFire arrays. AF only supports the sigmoid activation. But if I do all that, it works on AMD: 0.00017 s for a given dataset versus 0.034 s on the CPU (|> gpu doesn't do anything). Some of the speed difference there is undoubtedly the lack of latency on the async GPU calls. |
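A rough sketch of that recipe, assuming Flux and ArrayFire.jl are loaded; the layer sizes and random data are made up, and whether it actually runs depends on ArrayFire.jl's broadcast support (see the discussion further down):

```julia
using Flux, ArrayFire

# Build the layers directly from AFArrays (sigmoid only, per the comment above)
W1, b1 = AFArray(randn(Float32, 32, 784)), AFArray(zeros(Float32, 32))
W2, b2 = AFArray(randn(Float32, 10, 32)),  AFArray(zeros(Float32, 10))
m = Chain(Dense(W1, b1, σ), Dense(W2, b2, σ))

# The dataset has to live on the device as well
x = AFArray(rand(Float32, 784, 100))
y = AFArray(Float32.(Flux.onehotbatch(rand(0:9, 100), 0:9)))

loss(x, y) = Flux.mse(m(x), y)
loss(x, y)
```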
Cool!
Yeah, you'd need to change a bit of code for it to work. |
We could get it to offload to AF, but yeah, it's just sugar above CUDA. |
I've been thinking of setting up GPU backends for Flux, either baking them into the package (a la Plots, with the many backends) or splitting it up into lightweight libraries (eg FluxCuda, FluxArrayFire and FluxROCm) for example. Then it's up to the user to load the correct one, and then the gpu method would be the correct one as well. |
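A hypothetical sketch of the split-package idea: Flux keeps a generic gpu fallback, and a glue package (here called FluxArrayFire, which does not exist) adds methods for the types it knows about. The field names follow the Flux version current at the time of this thread:

```julia
# In Flux itself: gpu is a no-op unless a backend specialises it
gpu(x) = x

# In a hypothetical FluxArrayFire glue package:
using Flux, ArrayFire

gpu(x::AbstractArray{<:Number}) = AFArray(x)
gpu(d::Flux.Dense) = Flux.Dense(gpu(d.W), gpu(d.b), d.σ)
gpu(c::Flux.Chain) = Flux.Chain(map(gpu, c.layers)...)
```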
By the way @clive-g-brown, I'm not sure how AFArrays compare to CuArrays, but you might need to reconvert them to Arrays to save it to disk safely. |
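For the saving concern, one untested way to play it safe, assuming all layers are Dense, BSON is the serialiser, and `m` is the Chain from the sketch above:

```julia
using Flux, ArrayFire, BSON

# Pull the weights back to the host before writing to disk;
# plain Arrays are what BSON is known to round-trip reliably.
to_host(d::Dense) = Dense(Array(d.W), Array(d.b), d.σ)

m_cpu = Chain(map(to_host, m.layers)...)
BSON.@save "model.bson" m_cpu
```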
Great idea on the multiple back ends. You’d reach Mac users like me. |
Or just do it through Requires? Similar to |
I'll be honest here and say that I had never read into Requires 😋 |
It has its own save/load [I do wonder what serialize would do]. The wrapping seems more extensive than is documented on the front page; all the bits are there for more activation functions. https://github.com/JuliaGPU/ArrayFire.jl/blob/master/src/wrap.jl |
I only got so far with this; when I try LSTM layers it breaks on broadcasting. |
Did your code work on Cuda/CPU before? There are some things on broadcasting that became harder to get right with Zygote, I've seen other users commenting on it. |
CPU is fine. I don't have a CUDA test setup; I'll have to reimplement. |
AF seems to take some liberties with broadcasting, and it also trips over many basic NNlib functions because of that. Looking at the source, it appears to convert e.g. relu.(x) back to relu(x) and assume the function can run in vectorized form without explicit broadcast, which leads to that error. I suspect that's what's going on, anyway; it's a bit hard to tell, since it uses a global flag to control the broadcast somehow.

AF also has trouble with Zygote.gradient, since it has a try-catch block inside broadcasted(), which evidently isn't allowed. Then there are some arithmetic oddities that don't quite match Julia: e.g. you can do A*X with nd-arrays and it will matrix-multiply the deepest dimensions, which Julia doesn't do out of the box. That might cause inconsistent behaviour.

I've been doing a bit of feasibility assessment on either improving ArrayFire, reviving CLArrays, or creating a new package from scratch with a somewhat different approach. I'll probably have some extra time over the summer, so I'm giving it a proper go at least. Honestly, I'm leaning towards the last option. AF seems like a good idea at a glance, but I don't think I'd feel comfortable relying on it for anything non-trivial, and CLArrays might be too much for me to untangle. This is just a tentative heads-up in case someone has similar plans. Maybe we can compare notes. |
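A small sanity check along those lines, assuming ArrayFire.jl and NNlib are installed; the nd-array multiply at the end is left commented out because it is a MethodError on plain Arrays, which is exactly the inconsistency being described:

```julia
using ArrayFire, NNlib

x  = randn(Float32, 8, 8)
xd = AFArray(x)

# Does the device broadcast agree with plain Julia?
isapprox(Array(relu.(xd)), relu.(x))

# Julia's * does not batch-multiply the trailing dimensions of N-d arrays,
# so an AFArray method that does would silently diverge from Base:
A = randn(Float32, 4, 4, 3)
B = randn(Float32, 4, 4, 3)
# A * B   # MethodError for plain Arrays; reportedly works on AFArrays
```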
Thanks for that, useful to know. I hadn't grokked it completely. The issue is using AMD GPUs, which are compulsory on a Mac, so a MetalArrays would sort that; but that wouldn't work so well on Windows/Linux. PlaidML has a nice system: you choose a backend, and the options include Metal and OpenCL. I don't know if their Tile system is wrappable or compatible, although it clearly works with Python/Keras/TF. |
@jpsamaroo is doing stuff with AMD arrays. There's also been talk of Metal GPU codegen by @PhilipVinc on the #GPU Slack. |
That still leaves out at least Intel and ARM hardware for most platforms, and all non-NVIDIA hardware on Windows (until ROCm support is expanded a bit). I think OpenCL still has some mileage in it, at least until the remaining parties unveil their tailored APIs for select OSes; we'll have a dozen back-ends to support in a few years. I actually briefly looked into hacking TensorFlow.jl to use PlaidML via nGraph. I believe that would work, but it might take more effort than making a full backend from scratch, and it's also not a very Julian way to do things. |
That’s intriguing; I'll have a look at that over the weekend. |
I don't think ArrayFire is a good way forward for a language like Julia. As we can see from the above conversation, ArrayFire.jl has to do a variety of "unholy" things to be able to dispatch to the ArrayFire library, and is limited by what ArrayFire's underlying library is built to do. What I see as the best way forward is the following:
|
I just started learning Flux and have been trying to get ArrayFire to work. AFArray doesn't seem to have in-place operations and cannot be changed without altering the objectid, but then Params works on IdDicts. Is there a workaround for this? |
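One possible (untested) direction, sketched under the assumption that Zygote can differentiate the forward pass at all: skip the implicit Params/IdDict machinery and take gradients with respect to the model structure, rebuilding each layer with freshly allocated AFArrays every step. The field names, loss, and learning rate are illustrative only:

```julia
using Flux, Zygote

# Structural gradients: no IdDict, no in-place updates on AFArrays
function sgd_step(m::Chain, x, y; η = 1f-2)
    gs = gradient(m -> Flux.mse(m(x), y), m)[1]
    layers = map(m.layers, gs.layers) do d, g
        Dense(d.W .- η .* g.W, d.b .- η .* g.b, d.σ)
    end
    return Chain(layers...)
end
```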
Have you thought about an ArrayFire back end (as well as CUDA)? It would make Flux usable on AMD devices (Mac).