Skip to content

Bringing GPU programming to DFTK #697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from
Closed

Conversation

GVigne
Copy link
Contributor

@GVigne GVigne commented Jul 25, 2022

This is a work in progress and probably shouldn't be merged immediately into DFTK.
As part of my GSoC (you can see some additional information here ), I am working on a GPU version of DFTK. One of the goal is to keep overall code changes as low as possible (I do not aim to build an other DFTK package using GPUs).
This is the first step in my project: so far I managed to implement the Kinetic, Local and NonLocal terms and am running computations through the self_consistent_field function. I have not yet managed to make the SCF solvers work, so I disabled them for now.

Here is a MWE:

using DFTK
using CUDA
a = 10.263141334305942  # Lattice constant in Bohr
lattice = a / 2 .* [[0 1 1.]; [1 0 1.]; [1 1 0.]]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
atoms     = [Si, Si]
positions = [ones(3)/8, -ones(3)/8];
terms_LDA = [Kinetic(), AtomicLocal(), AtomicNonlocal()]

# Setup an LDA model and discretize using
# a single k-point and a small `Ecut` of 5 Hartree.
mod = Model(lattice, atoms, positions; terms=terms_LDA,symmetries=false)
basis = PlaneWaveBasis(mod; Ecut=30, kgrid=(1, 1, 1))
basis_gpu = PlaneWaveBasis(mod; Ecut=30, kgrid=(1, 1, 1), array_type = CuArray)

scfres = self_consistent_field(basis; tol=1e-3, solver=scf_damping_solver(1.0))
scfres_gpu = self_consistent_field(basis_gpu; tol=1e-3, solver=scf_damping_solver(1.0))

Any feedback is appreciated, be it on the code in itself or the implementation of new features!

@mfherbst mfherbst marked this pull request as draft July 25, 2022 18:39
@mfherbst mfherbst changed the title (WIP) Bringing GPU programming to DFTK Bringing GPU programming to DFTK Jul 25, 2022
@mfherbst
Copy link
Member

Super awesome! I have not had a look at the code, but the interface is pretty much exactly how I'd like to have it.

Copy link
Member

@antoine-levitt antoine-levitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!! A lot of minor comments, but I'm really impressed with how simple it turns out to be (which of course you should take as a compliment: making things look simple is very hard). Most important is we should talk about a good API for defining new arrays.

@@ -66,7 +66,8 @@ function _guess_spin_density(basis::PlaneWaveBasis{T}, atoms, positions, magneti
@warn("Returning zero spin density guess, because no initial magnetization has " *
"been specified in any of the given elements / atoms. Your SCF will likely " *
"not converge to a spin-broken solution.")
return zeros(T, basis.fft_size)
array_type = typeof(similar(basis.G_vectors, T, basis.fft_size))
return convert(array_type,zeros(T, basis.fft_size))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be discussed

ortho_qr(randn(Complex{T}, length(G_vectors(basis, kpt)), howmany))
end
orbitals = similar(basis.G_vectors, Complex{T}, length(G_vectors(basis, kpt)), howmany)
randn!(TaskLocalRNG(), orbitals) #Force the use of GPUArrays.jl's random function if using the GPU
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting and a bit annoying. Make a note to discuss it with Valentin?

model = basis.model

# pot_fourier is <e_G|V|e_G'> expanded in a basis of e_{G-G'}
# Since V is a sum of radial functions located at atomic
# positions, this involves a form factor (`local_potential_fourier`)
# and a structure factor e^{-i G·r}

pot_fourier = map(G_vectors(basis)) do G
#This operation needs to be done only once, so let's try to make it happen on CPU (else we needs to isbitsify the pseudopotentials)
pot_fourier = map(Array(G_vectors(basis))) do G
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a bit wasteful (the copy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I tried to explain in the comment, either we use directly the G_vectors and then the entire map function needs to be a kernel, or we do it on CPU (it's only going to happen once, when we build the PlaneWaveBasis). If we don't want to make a copy, then all the element in the map need to be isbits, so we have to convert the pseudopotentials to a isbits structure.
I agree it's a bit ugly, but for performance purposes it only happens once so I thought it was ok.

@antoine-levitt
Copy link
Member

It'd be good to merge some bits before we get the full GPU story, so that it's easier for you to stay and sync and for us to review. Eg you could have in separate PRs the uncontroversial fixes (eg zeros->similar, etc) that we can just merge easily now, and the LOBPCG fixes to not use blockarrays

Copy link
Member

@mfherbst mfherbst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more detailed comments from my side.

Overall a very solid start @GVigne !

Comment on lines +16 to +19
using AbstractFFTs
using GPUArrays
using CUDA
using Random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure they should be here (and a hard dependency of DFTK) long-term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need to discuss dependencies (especially if we want to move LOBPCG out of DFTK, that can take some work): I also didn't really know where to put my imports and how they were managed in a big package, so there is room for improvement.

@@ -27,7 +27,7 @@ PreconditionerNone(basis, kpt) = I
mutable struct PreconditionerTPA{T <: Real}
basis::PlaneWaveBasis
kpt::Kpoint
kin::Vector{T} # kinetic energy of every G
kin::AbstractVector{T} # kinetic energy of every G
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd just make T an array type and use that directly.

Comment on lines -258 to -259
ipFFT = FFTW.plan_fft!(tmp, flags=_fftw_flags(T))
opFFT = FFTW.plan_fft(tmp, flags=_fftw_flags(T))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this allows us to kill the FFTW dependency completely? Or at least remove it from this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still might need FFTW when doing multi-threading. But for the fft file, I guess we could remove the dependency, although it would have to be replaced by AbstractFFTs.

@@ -43,20 +43,100 @@
vprintln(args...) = nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the modifications here should be a separate PR that we merge in first.

@GVigne GVigne closed this Aug 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants