Labels: compiler:llvm, performance
Description
On the latest master (as of 3e7cf64), autovectorization fails when using AllocArrays.jl v0.1.1 (latest) for an array copy:
using AllocArrays
function mycopy(dest, src, iter)
    # precondition: dest and src do not alias
    # precondition: the iterators of dest and src are equal
    for i in iter
        @inbounds dest[i] = src[i]
    end
    return dest
end
b = BumperAllocator(2^30);
arr = rand(Float32, 50000000);
arr2 = similar(arr);
a = AllocArray(arr);
# this is vectorized
mycopy(arr2, arr, eachindex(arr2));
# this is not
with_allocator(b) do
    c = similar(a)
    mycopy(c, a, eachindex(c))
end;
We can observe the effect on performance:
julia> @benchmark mycopy(arr2, arr, eachindex(arr2))
BenchmarkTools.Trial: 339 samples with 1 evaluation per sample.
Range (min … max): 14.181 ms … 16.030 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 14.609 ms ┊ GC (median): 0.00%
Time (mean ± σ): 14.704 ms ± 349.402 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▁▁ ▂▄▅▄█▄▂▄▁▁ ▁ ▁
▄▄█▆▆██▇██████████▇▄█▇▆▄▆█▄▅▆▃▆▃▅▄▄▄▃▃▃▄▁▃▁▁▃▄▁▃▁▁▁▁▁▃▁▁▁▁▁▃ ▄
14.2 ms Histogram: frequency by time 16 ms <
Memory estimate: 16 bytes, allocs estimate: 1.
julia> @benchmark with_allocator(() -> mycopy(c, a, eachindex(c)), b) setup=(reset!(b);c=similar(a);)
BenchmarkTools.Trial: 213 samples with 1 evaluation per sample.
Range (min … max): 20.352 ms … 55.386 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 22.032 ms ┊ GC (median): 0.00%
Time (mean ± σ): 22.426 ms ± 2.889 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▆█▂▂▆▅▇▃
▄▅█████████▆▄▆▄▃▅▁▁▃▁▃▁▁▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▃ ▃
20.4 ms Histogram: frequency by time 33.6 ms <
Memory estimate: 304 bytes, allocs estimate: 12.
Here is the LLVM code for the base case: https://pastebin.com/gCbp1uEX
and for the AllocArrays case: https://pastebin.com/bKYmswnd
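For anyone who wants to reproduce the comparison locally rather than read the pastebins, here is a rough sketch of one way to check whether the loop was vectorized. It assumes that vectorized Float32 code shows up in the IR as LLVM vector types such as `<8 x float>`, which is a heuristic and not a definitive test. The helpers `llvm_ir` and `has_vector_ops` are hypothetical names made up for this sketch; only `code_llvm` from InteractiveUtils is a real API.

```julia
using InteractiveUtils  # provides code_llvm

function mycopy(dest, src, iter)
    for i in iter
        @inbounds dest[i] = src[i]
    end
    return dest
end

# Hypothetical helper: return the optimized LLVM IR for f(args...) as a String.
llvm_ir(f, args...) =
    sprint(io -> code_llvm(io, f, map(typeof, args); debuginfo=:none))

# Hypothetical helper: heuristic check for LLVM vector types such as <8 x float>.
has_vector_ops(ir::AbstractString) = occursin(r"<\d+ x float>", ir)

src = rand(Float32, 1024)
dest = similar(src)
ir = llvm_ir(mycopy, dest, src, eachindex(dest))
println("vector types found in IR: ", has_vector_ops(ir))
```

The same `llvm_ir(mycopy, c, a, eachindex(c))` call, made inside the `with_allocator(b) do ... end` block from the reproducer, should give the IR for the AllocArrays case, so the two results can be compared directly.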