Bringing GPU programming to DFTK #697

GVigne · 2022-07-25T12:58:47Z

This is a work in progress and probably should not be merged into DFTK right away.
As part of my GSoC project (some additional information is available here), I am working on a GPU version of DFTK. One of the goals is to keep overall code changes as small as possible (I do not aim to build another DFTK package for GPUs).
This is the first step of the project: so far I have implemented the Kinetic, Local and NonLocal terms and can run computations through the self_consistent_field function. I have not yet managed to make the SCF solvers work, so I have disabled them for now.

Here is an MWE:

using DFTK
using CUDA
a = 10.263141334305942  # Lattice constant in Bohr
lattice = a / 2 .* [[0 1 1.]; [1 0 1.]; [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [ones(3)/8, -ones(3)/8];
terms_LDA = [Kinetic(), AtomicLocal(), AtomicNonlocal()]

# Set up a model (kinetic, local and nonlocal terms only) and discretize
# using a single k-point and an `Ecut` of 30 Hartree.
mod = Model(lattice, atoms, positions; terms=terms_LDA,symmetries=false)
basis = PlaneWaveBasis(mod; Ecut=30, kgrid=(1, 1, 1))
basis_gpu = PlaneWaveBasis(mod; Ecut=30, kgrid=(1, 1, 1), array_type = CuArray)

scfres = self_consistent_field(basis; tol=1e-3, solver=scf_damping_solver(1.0))
scfres_gpu = self_consistent_field(basis_gpu; tol=1e-3, solver=scf_damping_solver(1.0))

Any feedback is appreciated, be it on the code in itself or the implementation of new features!

mfherbst · 2022-07-25T19:08:19Z

Super awesome! I have not had a look at the code, but the interface is pretty much exactly how I'd like to have it.

antoine-levitt

This is awesome!! A lot of minor comments, but I'm really impressed with how simple it turns out to be (which of course you should take as a compliment: making things look simple is very hard). Most important is we should talk about a good API for defining new arrays.

src/PlaneWaveBasis.jl

src/densities.jl

antoine-levitt · 2022-07-25T19:30:46Z

src/guess_density.jl

@@ -66,7 +66,8 @@ function _guess_spin_density(basis::PlaneWaveBasis{T}, atoms, positions, magneti
        @warn("Returning zero spin density guess, because no initial magnetization has " *
              "been specified in any of the given elements / atoms. Your SCF will likely " *
              "not converge to a spin-broken solution.")
-        return zeros(T, basis.fft_size)
+        array_type = typeof(similar(basis.G_vectors, T, basis.fft_size))
+        return convert(array_type,zeros(T, basis.fft_size))


to be discussed
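
For reference in that discussion, one possible alternative to the `convert(array_type, zeros(...))` pattern in the hunk above is to allocate with `similar` and fill in place, which avoids building the array on the CPU first. This is only a sketch: `zeros_like` is a hypothetical helper, not a DFTK function, and a plain `Array` stands in for `basis.G_vectors`. The same call should also work with a GPU array type that supports `similar` and `fill!`.

```julia
# Hypothetical helper (not part of DFTK): allocate a zero-filled array with
# element type T and shape dims in the same array family as `ref`.
zeros_like(ref::AbstractArray, ::Type{T}, dims::Tuple) where {T} =
    fill!(similar(ref, T, dims), zero(T))

ref = rand(Float32, 4)                    # stand-in for basis.G_vectors
rho = zeros_like(ref, Float64, (2, 3, 4)) # stand-in for basis.fft_size
```

The result takes its container type from `ref` and its element type and shape from the arguments, so no intermediate CPU array is needed.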

src/guess_density.jl

antoine-levitt · 2022-07-25T19:34:47Z

src/orbitals.jl

-    ortho_qr(randn(Complex{T}, length(G_vectors(basis, kpt)), howmany))
-end
+    orbitals = similar(basis.G_vectors, Complex{T}, length(G_vectors(basis, kpt)), howmany)
+    randn!(TaskLocalRNG(), orbitals) #Force the use of GPUArrays.jl's random function if using the GPU


Interesting and a bit annoying. Make a note to discuss it with Valentin?

antoine-levitt · 2022-07-25T19:37:38Z

src/terms/local.jl

    model = basis.model

    # pot_fourier is <e_G|V|e_G'> expanded in a basis of e_{G-G'}
    # Since V is a sum of radial functions located at atomic
    # positions, this involves a form factor (`local_potential_fourier`)
    # and a structure factor e^{-i G·r}

-    pot_fourier = map(G_vectors(basis)) do G
+    #This operation needs to be done only once, so let's try to make it happen on CPU (else we needs to isbitsify the pseudopotentials)
+    pot_fourier = map(Array(G_vectors(basis))) do G


that's a bit wasteful (the copy)

As I tried to explain in the comment, either we use the G_vectors directly, in which case the entire map function needs to be a kernel, or we do it on the CPU (it only happens once, when we build the PlaneWaveBasis). If we don't want to make a copy, then all the elements used in the map need to be isbits, so we would have to convert the pseudopotentials to an isbits structure.
I agree it's a bit ugly, but since it only happens once I thought it was acceptable performance-wise.
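
To make the isbits constraint concrete, here is a minimal sketch using only Base Julia. Both structs are hypothetical stand-ins for a pseudopotential and are not DFTK types. A struct holding a `Vector` is not isbits, while one holding a fixed-size tuple is.

```julia
# Hypothetical stand-ins for a pseudopotential; neither is a DFTK type.
struct PspWithVectors
    Zion::Int
    coeffs::Vector{Float64}      # heap-allocated field, so the struct is not isbits
end

struct PspIsBits{N}
    Zion::Int
    coeffs::NTuple{N, Float64}   # fixed-size tuple stored inline
end

# Assumed conversion: copy the coefficients into a tuple.
PspIsBits(p::PspWithVectors) = PspIsBits(p.Zion, Tuple(p.coeffs))

psp_cpu = PspWithVectors(4, [1.0, 2.0, 3.0])
psp_bits = PspIsBits(psp_cpu)

isbitstype(PspWithVectors)   # false
isbits(psp_bits)             # true
```

Only values like `psp_bits` could be captured by a GPU kernel, which is why the alternative to the CPU copy would require converting the pseudopotentials.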

antoine-levitt · 2022-07-25T19:43:22Z

It'd be good to merge some bits before we get the full GPU story, so that it's easier for you to stay in sync and for us to review. E.g. you could put in separate PRs the uncontroversial fixes (e.g. zeros->similar, etc.) that we can just merge easily now, and the LOBPCG fixes to not use BlockArrays.

mfherbst

Some more detailed comments from my side.

Overall a very solid start @GVigne !

src/PlaneWaveBasis.jl

mfherbst · 2022-07-28T15:41:21Z

src/DFTK.jl

+using AbstractFFTs
+using GPUArrays
+using CUDA
+using Random


Not sure they should be here (and a hard dependency of DFTK) long-term.

I think we will need to discuss dependencies (especially if we want to move LOBPCG out of DFTK, that can take some work): I also didn't really know where to put my imports and how they were managed in a big package, so there is room for improvement.

src/PlaneWaveBasis.jl

mfherbst · 2022-07-28T15:56:48Z

src/eigen/preconditioners.jl

@@ -27,7 +27,7 @@ PreconditionerNone(basis, kpt) = I
 mutable struct PreconditionerTPA{T <: Real}
    basis::PlaneWaveBasis
    kpt::Kpoint
-    kin::Vector{T}  # kinetic energy of every G
+    kin::AbstractVector{T}  # kinetic energy of every G


I think I'd just make T an array type and use that directly.
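
A minimal sketch of what this suggestion could look like, with hypothetical struct names and only the `kin` field (the real `PreconditionerTPA` has more fields). Parametrizing on the array type keeps the field concretely typed, whereas an `AbstractVector{T}` field is abstract.

```julia
# Variant from the diff: the field type is abstract.
struct AbstractFieldPrecond{T <: Real}
    kin::AbstractVector{T}
end

# Suggested variant (hypothetical name): the type parameter is the array type.
struct ArrayTypePrecond{T <: AbstractVector{<:Real}}
    kin::T
end

kin = [0.5, 2.0, 4.5]
p_abstract = AbstractFieldPrecond(kin)
p_concrete = ArrayTypePrecond(kin)

isconcretetype(fieldtype(typeof(p_abstract), :kin))  # false
isconcretetype(fieldtype(typeof(p_concrete), :kin))  # true
```

With the second form, a GPU vector would simply give a different concrete `T`, with no change to the struct definition.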

mfherbst · 2022-07-28T15:57:52Z

src/fft.jl

-    ipFFT = FFTW.plan_fft!(tmp, flags=_fftw_flags(T))
-    opFFT = FFTW.plan_fft(tmp, flags=_fftw_flags(T))


Maybe this allows us to kill the FFTW dependency completely? Or at least remove it from this file?

We still might need FFTW when doing multi-threading. But for the fft file, I guess we could remove the dependency, although it would have to be replaced by AbstractFFTs.

src/terms/kinetic.jl

mfherbst · 2022-07-28T16:00:05Z

src/eigen/lobpcg_hyper_impl.jl

@@ -43,20 +43,100 @@
 vprintln(args...) = nothing


I think the modifications here should be a separate PR that we merge in first.

src/workarounds/gpu_computations.jl

GVigne added 8 commits July 6, 2022 09:30

LOBPCG with GPU support (CUDA). Does not yet support preconditionning

e17fb59

Merge branch 'master' into gpu_hpc

e80f5b6

MWE for self_consistent_field with GPU support (CUDA). Only works wit…

ed15b32

…h no SCF solver (solver=scf_damping_solver(1.0)) and just one Kinetic term.

Fix package version conflicts while merging

19bfa69

Stop using BlockArrays and use a custom BlockVector for GPU compatibi…

f4748ac

…lity in LOBPCG

GPU support for AtomicLocal term

94f1d2a

First GPU implementation of the non local term + LOBPCG enhancement

60d8041

Merge branch 'master' into gpu_hpc

fb6484a

mfherbst marked this pull request as draft July 25, 2022 18:39

mfherbst changed the title ~~(WIP) Bringing GPU programming to DFTK~~ Bringing GPU programming to DFTK Jul 25, 2022

add timed examples

cf1dc3c

antoine-levitt reviewed Jul 25, 2022

GVigne added 2 commits July 28, 2022 08:51

Change some code organisation after PR's feedback

11b85f0

Code organisation and performance optimisation after PR's feedback

abb99f4

mfherbst reviewed Jul 28, 2022

GVigne added 8 commits August 2, 2022 07:19

Code refactoring following PR's feedback

a89171a

PWB is now parametric on the array type: this also fixes type issues

44bcb61

Update workarounds: remove iszero and isone, add eigen

646b44c

Rename block_mul into * + build e on GPU

76c697d

Modify the change of basis functions to be GPU compatible

bd684d7

Merge branch 'master' into gpu_hpc

f02c954

Keep this branch synced with LOBPCG_GPU

15d1324

Add the Hartree term

62d9f79

GVigne closed this Aug 23, 2022
