Have you thought about an ArrayFire back end (as well as CUDA)? It would make Flux usable on AMD devices (e.g. on a Mac).