Weird behaviour of mean function in CuArray #1773
Hmm, that's not enough information to help you.
The same happens in a REPL outside VSCode. Example:
julia> A = CUDA.ones((640, 640, 32, 1))
julia> B = ones((640, 640, 32, 1))
julia> mean(A; dims=1)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 2, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 3, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
;;; …
[:, :, 30, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 31, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 32, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
julia> mean(A; dims=1)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 2, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 3, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
;;; …
[:, :, 30, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 31, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 32, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994
julia> mean(A; dims=[1,2])
1×1×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
1.0000081
[:, :, 2, 1] =
1.0000081
[:, :, 3, 1] =
1.0000081
;;; …
[:, :, 30, 1] =
1.0000081
[:, :, 31, 1] =
1.0000081
[:, :, 32, 1] =
1.0000081
julia> mean(A; dims=[1,2,3])
1×1×1×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
1.0000155
julia> mean(B; dims=[1,2,3])
1×1×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
1.0
julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 16 × AMD Ryzen 7 4800H with Radeon Graphics
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 1 on 16 virtual cores
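A cross-check worth running at this point (a sketch, assuming A is still the 640×640×32×1 CuArray of ones from above): take the same mean after copying the data back to the host. Sequential Float32 summation of 640 ones is exact, so a clean 1.0f0 here points at the GPU reduction rather than the data.
using CUDA, Statistics
A_host = Array(A)            # copy the CuArray back to the CPU
mean(A_host; dims=1)[1:2]    # expected to print exactly 1.0f0 for both elements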
I think I may have found what is causing this: the Statistics mean function uses the _mean method that is overridden in GPUArrays. It seems that when the inverse of the size of the reduced dimensions (a Float64) is converted to Float32 (in the case of a Float32 CuArray), a small error is introduced that creates the behavior I mentioned above. I executed this _mean function statement by statement and found this:
julia> A = CUDA.ones((640, 640, 32, 1))
640×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
julia> T = float(eltype(A))
Float32
julia> λ = convert(T, inv(_mean_denom(A, dims)))
0.0015625f0
julia> sum(Base.Fix1(*,λ), A; dims)
1×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 2, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 3, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994
;;; …
[:, :, 30, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 31, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994
[:, :, 32, 1] =
0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 … 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994 0.999994
But with a smaller array this does not happen.
julia> A = CUDA.ones((320, 320, 32, 1))
320×320×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
julia> T = float(eltype(A))
Float32
julia> λ = convert(T, inv(_mean_denom(A, dims)))
0.003125f0
julia> sum(Base.Fix1(*,λ), A; dims)
1×320×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
[:, :, 2, 1] =
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
[:, :, 3, 1] =
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
;;; …
[:, :, 30, 1] =
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
[:, :, 31, 1] =
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
[:, :, 32, 1] =
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
@mcabbott can you also check this?
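For reference, here is a rough sketch of the pattern traced statement by statement above. The _mean_denom helper is assumed here to be the product of the reduced dimension sizes; this illustrates the pattern from the snippets, not the exact GPUArrays source.
# assumed helper: number of elements being reduced over
_mean_denom(A, dims) = prod(size(A, d) for d in dims)
# pattern traced above: round 1/n to the array's float eltype first,
# then multiply every element by that rounded factor during the reduction
function _mean_sketch(A::AbstractArray, dims)
    T = float(eltype(A))
    λ = convert(T, inv(_mean_denom(A, dims)))   # e.g. Float32(1/640), already rounded
    return sum(Base.Fix1(*, λ), A; dims)
end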
I don't see this with the size above:
julia> size(a)
(640, 640, 32, 1)
julia> mean(a; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
1.0
1.0
(@v1.10) pkg> st CUDA GPUArrays
Status `~/.julia/environments/v1.10/Project.toml`
[052768ef] CUDA v4.0.1
[0c68f7d7] GPUArrays v8.6.2
julia> nextfloat(0.999994f0, 101)
1.0f0
What is the actual value you get? The compact printing may have lost some digits.
julia> mean(A; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.9999937
0.9999937
(@v1.8) pkg> st CUDA GPUArrays
Project YOLOv7 v0.1.0
Status `C:\Users\gabri\.julia\environments\v1.8\Project.toml`
[052768ef] CUDA v4.0.1
[0c68f7d7] GPUArrays v8.6.2
julia> nextfloat(0.9999937f0, 101)
0.9999997f0
I made the array larger just to check, and look:
julia> size(A)
(1280, 1280, 32, 1)
julia> mean(A; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
0.9999892
0.9999892
I noticed this in my Flux BatchNorm layers; they were producing some wrong values, and that's how I ended up seeing this problem with larger arrays. Another example, with smaller values:
julia> B = CUDA.fill(1.0f-5, (640, 640, 32, 1))
640×640×32×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
julia> mean(B; dims=1)[1:2]
2-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
1.0000034f-5
1.0000034f-5
Some accuracy experiments (on CPU):
julia> function countepsfrom(x::T, xtrue) where {T<:AbstractFloat}
target = T(xtrue)
for n in Iterators.flatten(zip(0:100, -1:-1:-100))
nextfloat(x, n) === target && return n
end
return (target - x) / eps(x)
end;
julia> mean1(x, λ=convert(eltype(x), inv(length(x)))) = sum(Base.Fix1(*,λ), x); # like CUDA at present
julia> mean1(ones(Float32, 640))
0.9999992f0
julia> countepsfrom(ans, 1)
13
julia> mean2(x, l=length(x)) = sum(Base.Fix2(/,l), x); # divide instead, same result
julia> mean2(ones(Float32, 640))
0.9999992f0
julia> countepsfrom(ans, 1)
13
julia> mean3(x) = sum(x)/length(x); # naive sum then divide
julia> mean3(ones(Float32, 640))
1.0f0
Float16, to look for overflow:
julia> mean1(ones(Float16, 10^5))
Float16(1.002)
julia> countepsfrom(ans, 1)
-2
julia> mean2(ones(Float16, 10^5))
Float16(0.0)
julia> mean3(ones(Float16, 10^5))
NaN16
julia> x = rand(Float16, 10^5);
julia> xbar = mean(big.(x)); # true result
julia> countepsfrom(mean1(x), xbar)
-2
julia> countepsfrom(mean2(x), xbar)
Inf16
julia> countepsfrom(mean3(x), xbar)
Inf16
The solution I proposed in JuliaGPU/GPUArrays.jl#453 fails for Float16 but not for Float32.
julia> mean4(x, λ=convert(eltype(x), inv(length(x)))) = sum(x) .* λ;
julia> mean4(ones(Float32, 640))
1.0f0
julia> mean4(ones(Float16, 10^5))
Inf16
julia> x = rand(Float16, 10^5);
julia> xbar = mean(big.(x));
julia> countepsfrom(mean4(x), xbar)
-2
julia> mean1(ones(Float32, 10^9))
0.99999976f0
julia> countepsfrom(ans, 1)
4
julia> mean4(ones(Float32, 10^9))
1.0f0
I have never worked with Float16, so I can't imagine how to deal with it in these operations.
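One possibility, sketched below with a hypothetical mean5 (an assumed name, not a GPUArrays method), is to accumulate in a widened type and convert back at the end. That avoids both the Float16 overflow of summing first and the per-element rounding of scaling first.
# mean5 is an assumed name; sum in widen(eltype), then divide and narrow back
mean5(x) = convert(float(eltype(x)), sum(widen(float(eltype(x))), x) / length(x))
mean5(ones(Float32, 640))     # 1.0f0  (the sum runs in Float64)
mean5(ones(Float16, 10^5))    # Float16(1.0)  (the sum runs in Float32, no Inf16)
Whether a widened accumulator is acceptable inside a GPU reduction (Float64 throughput, register pressure) is a separate question.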
The reason this doesn't reproduce consistently is the …
Describe the bug
Sometimes when using a CuArray, I see some weird behavior with some Flux normalizations.
Trying to find the problem, I saw that this happens (at least) when using the Statistics mean function.
If I execute it on the CPU it always gives the expected values, but on the GPU it sometimes gives this strange behavior.
To reproduce
The Minimal Working Example (MWE) for this bug:
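A sketch of the reproducer, pieced together from the REPL session quoted above (array sizes and the expected outputs are taken from that session):
using CUDA, Statistics
A = CUDA.ones(Float32, 640, 640, 32, 1)   # GPU array of ones
B = ones(640, 640, 32, 1)                 # CPU reference (Float64)
mean(A; dims=1)          # elements print as 0.999994 instead of 1.0
mean(A; dims=[1, 2, 3])  # 1.0000155 instead of 1.0
mean(B; dims=[1, 2, 3])  # exactly 1.0 on the CPU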
Manifest.toml
Version info
Details on Julia:
Julia 1.8.2
Details on CUDA: