-
Notifications
You must be signed in to change notification settings - Fork 93
Bringing GPU programming to DFTK #697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…h no SCF solver (solver=scf_damping_solver(1.0)) and just one Kinetic term.
Super awesome! I have not had a look at the code, but the interface is pretty much exactly how I'd like to have it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!! A lot of minor comments, but I'm really impressed with how simple it turns out to be (which of course you should take as a compliment: making things look simple is very hard). Most important is we should talk about a good API for defining new arrays.
src/guess_density.jl
Outdated
@@ -66,7 +66,8 @@ function _guess_spin_density(basis::PlaneWaveBasis{T}, atoms, positions, magneti | |||
@warn("Returning zero spin density guess, because no initial magnetization has " * | |||
"been specified in any of the given elements / atoms. Your SCF will likely " * | |||
"not converge to a spin-broken solution.") | |||
return zeros(T, basis.fft_size) | |||
array_type = typeof(similar(basis.G_vectors, T, basis.fft_size)) | |||
return convert(array_type,zeros(T, basis.fft_size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be discussed
ortho_qr(randn(Complex{T}, length(G_vectors(basis, kpt)), howmany)) | ||
end | ||
orbitals = similar(basis.G_vectors, Complex{T}, length(G_vectors(basis, kpt)), howmany) | ||
randn!(TaskLocalRNG(), orbitals) #Force the use of GPUArrays.jl's random function if using the GPU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting and a bit annoying. Make a note to discuss it with Valentin?
model = basis.model | ||
|
||
# pot_fourier is <e_G|V|e_G'> expanded in a basis of e_{G-G'} | ||
# Since V is a sum of radial functions located at atomic | ||
# positions, this involves a form factor (`local_potential_fourier`) | ||
# and a structure factor e^{-i G·r} | ||
|
||
pot_fourier = map(G_vectors(basis)) do G | ||
#This operation needs to be done only once, so let's try to make it happen on CPU (else we needs to isbitsify the pseudopotentials) | ||
pot_fourier = map(Array(G_vectors(basis))) do G |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a bit wasteful (the copy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I tried to explain in the comment, either we use directly the G_vectors and then the entire map function needs to be a kernel, or we do it on CPU (it's only going to happen once, when we build the PlaneWaveBasis). If we don't want to make a copy, then all the element in the map need to be isbits, so we have to convert the pseudopotentials to a isbits structure.
I agree it's a bit ugly, but for performance purposes it only happens once so I thought it was ok.
It'd be good to merge some bits before we get the full GPU story, so that it's easier for you to stay and sync and for us to review. Eg you could have in separate PRs the uncontroversial fixes (eg zeros->similar, etc) that we can just merge easily now, and the LOBPCG fixes to not use blockarrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more detailed comments from my side.
Overall a very solid start @GVigne !
using AbstractFFTs | ||
using GPUArrays | ||
using CUDA | ||
using Random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure they should be here (and a hard dependency of DFTK) long-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will need to discuss dependencies (especially if we want to move LOBPCG out of DFTK, that can take some work): I also didn't really know where to put my imports and how they were managed in a big package, so there is room for improvement.
@@ -27,7 +27,7 @@ PreconditionerNone(basis, kpt) = I | |||
mutable struct PreconditionerTPA{T <: Real} | |||
basis::PlaneWaveBasis | |||
kpt::Kpoint | |||
kin::Vector{T} # kinetic energy of every G | |||
kin::AbstractVector{T} # kinetic energy of every G |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd just make T an array type and use that directly.
ipFFT = FFTW.plan_fft!(tmp, flags=_fftw_flags(T)) | ||
opFFT = FFTW.plan_fft(tmp, flags=_fftw_flags(T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this allows us to kill the FFTW dependency completely? Or at least remove it from this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still might need FFTW when doing multi-threading. But for the fft file, I guess we could remove the dependency, although it would have to be replaced by AbstractFFTs.
@@ -43,20 +43,100 @@ | |||
vprintln(args...) = nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the modifications here should be a separate PR that we merge in first.
This is a work in progress and probably shouldn't be merged immediately into DFTK.
As part of my GSoC (you can see some additional information here ), I am working on a GPU version of DFTK. One of the goal is to keep overall code changes as low as possible (I do not aim to build an other DFTK package using GPUs).
This is the first step in my project: so far I managed to implement the Kinetic, Local and NonLocal terms and am running computations through the
self_consistent_field
function. I have not yet managed to make the SCF solvers work, so I disabled them for now.Here is a MWE:
Any feedback is appreciated, be it on the code in itself or the implementation of new features!