Replies: 1 comment
-
Hi,

$> ./a.out
[GPU, direct] a[0] == 12.000000
[GPU, direct] a[nb_cell-1] == 12.000000
[GPU, direct] b[0] == 437.000000
[GPU, direct] b[nb_cell-1] == 437.000000
[GPU, backward] da[0] == 72.000000
[GPU, backward] da[nb_cell-1] == 72.000000
[GPU, backward] db[0] == 0.000000
[GPU, backward] db[nb_cell-1] == 0.000000

I just adapted the first version with shadow parameter for |
-
Hi,
Sorry to reopen this subject (cf. #1565), but I am now trying to port the CPU version above to the GPU for backward mode.
CPU implementation
Here is the updated CPU example from above, for forward and backward mode, using a functor instead of a lambda function: https://fwd.gymni.ch/zEqXXH
The approach used in the CPU example is really generic: I can now define any functor, and it will be differentiated by calling:
for the forward mode or
for the backward mode.
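To make the "loop + functor" idea concrete without the linked code, here is a minimal sketch in plain C++. It does not call Enzyme: the tangent functor is written by hand and only stands in for what an AD tool would generate. The names loop_on_dof, QuadraticFunctor, QuadraticFunctorTangent and forward_demo are assumptions made for this illustration (loop_on_dof by analogy with loop_on_dof_gpu), and the function b[i] = a[i]^2 + 3*a[i] is hypothetical, not the functor from the linked example.

```cpp
#include <cassert>
#include <vector>

// Generic loop: applies the functor to every cell index.
// (Assumed CPU counterpart of loop_on_dof_gpu; the name is hypothetical.)
template <typename Functor>
void loop_on_dof(int nb_cell, const Functor& f) {
    assert(nb_cell >= 0);
    for (int i = 0; i < nb_cell; ++i) f(i);
}

// Hypothetical functor, NOT the one from the linked example:
// b[i] = a[i]^2 + 3 * a[i]
struct QuadraticFunctor {
    const double* a;
    double* b;
    void operator()(int i) const { b[i] = a[i] * a[i] + 3.0 * a[i]; }
};

// Hand-written forward-mode tangent of QuadraticFunctor:
// db[i] = (2 * a[i] + 3) * da[i]
struct QuadraticFunctorTangent {
    const double* a;
    const double* da;
    double* db;
    void operator()(int i) const { db[i] = (2.0 * a[i] + 3.0) * da[i]; }
};

struct ForwardResult {
    std::vector<double> b;
    std::vector<double> db;
};

// Runs the primal and the tangent through the same generic loop,
// with every a[i] set to a0 and every input tangent da[i] seeded to 1.
ForwardResult forward_demo(int nb_cell, double a0) {
    std::vector<double> a(nb_cell, a0), da(nb_cell, 1.0);
    ForwardResult r{std::vector<double>(nb_cell), std::vector<double>(nb_cell)};
    loop_on_dof(nb_cell, QuadraticFunctor{a.data(), r.b.data()});
    loop_on_dof(nb_cell, QuadraticFunctorTangent{a.data(), da.data(), r.db.data()});
    return r;
}
```

With this sketch, forward_demo(4, 2.0) fills every b[i] with 10 and every db[i] with 7. The point is only that the loop is written once and any functor, primal or derivative, can be passed through it.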
GPU implementation
I do not know yet whether this "defined loop+functor" approach can be used on GPU.
According to the documentation on GPU, a naive transcription of the CPU version into CUDA will not work, because of the kernel launch
loop_on_dof_gpu<<<grd_topo, blk_topo>>>(nb_cell, f);
: https://fwd.gymni.ch/5CWooa
To do it right, I probably need to use Enzyme's custom derivative registration to define a custom forward and reverse pass for the wrapper function compute_gpu, as follows: https://fwd.gymni.ch/murZk4
NB: I am having trouble setting up the appropriate environment (llvm-14+cuda-11.2+clang-14+enzyme-0.81), so I have not been able to test this last code.
Am I on the right track to compute the gradient with Enzyme on GPU?
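One way to picture the restructuring is to leave the launch itself undifferentiated and pass a per-cell derivative functor through the same loop. The following host-only C++ sketch illustrates that idea under strong assumptions: loop_on_dof is an ordinary loop standing in for the kernel launch, the adjoint functor is written by hand instead of being generated by Enzyme, and QuadraticFunctor, QuadraticFunctorAdjoint and reverse_demo are hypothetical names, not taken from the linked code. It says nothing about whether Enzyme accepts this pattern on GPU.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the kernel launch. On GPU this role is played by
// loop_on_dof_gpu<<<grd_topo, blk_topo>>>(nb_cell, f); here it is a plain loop.
template <typename Functor>
void loop_on_dof(int nb_cell, const Functor& f) {
    assert(nb_cell >= 0);
    for (int i = 0; i < nb_cell; ++i) f(i);
}

// Hypothetical per-cell body, NOT the functor from the linked example:
// b[i] = a[i]^2 + 3 * a[i]
struct QuadraticFunctor {
    const double* a;
    double* b;
    void operator()(int i) const { b[i] = a[i] * a[i] + 3.0 * a[i]; }
};

// Hand-written per-cell adjoint of QuadraticFunctor.
// It accumulates into da[i] and then clears db[i], since b[i] is
// overwritten by the primal (a convention chosen for this sketch).
struct QuadraticFunctorAdjoint {
    const double* a;
    double* da;
    double* db;
    void operator()(int i) const {
        da[i] += (2.0 * a[i] + 3.0) * db[i];
        db[i] = 0.0;
    }
};

struct ReverseResult {
    std::vector<double> da;
    std::vector<double> db;
};

// Seeds every output adjoint db[i] with 1 and runs the adjoint functor
// through the same loop that would run the primal.
ReverseResult reverse_demo(int nb_cell, double a0) {
    std::vector<double> a(nb_cell, a0);
    ReverseResult r{std::vector<double>(nb_cell, 0.0), std::vector<double>(nb_cell, 1.0)};
    loop_on_dof(nb_cell, QuadraticFunctorAdjoint{a.data(), r.da.data(), r.db.data()});
    return r;
}
```

In this sketch, reverse_demo(4, 2.0) leaves every da[i] at 7 and every db[i] at 0. Each cell only touches its own index, so the per-cell adjoints do not race with each other; a functor that reads neighbouring cells would need atomic accumulation instead.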
If I am completely wrong, can you tell me whether this "defined loop+functor" approach has any chance of working on GPU?
If it is good:
void* __enzyme_register_gradient_compute_gpu_...[3] = {...}
which would be perfect !!!
__enzyme_fwddiff<void>((void*)compute_gpu<SimpleFunctor>, nb_cell, enzyme_dup, (void*)&f_gpu, (void*)&df_gpu);
as done on CPU?
Thanks a lot for your precious help.
Best regards