On a 1024x1024 Float32 matrix, sum on a CPU Array (a) is roughly 94x faster than on a oneArray (d_a):
julia> @benchmark sum($a)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 84.734 μs … 228.946 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 85.332 μs ┊ GC (median): 0.00%
Time (mean ± σ): 87.917 μs ± 7.545 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▇▃▂▄▁▁ ▁
████████▇██▇▇█▇▆█████▇██████▇▆▇▆▆▅▆▆▆▆▆▆▆▆▅▅▅▆▆▄▇█▆▅▅▅▆▄▄▃▅▄ █
84.7 μs Histogram: log(frequency) by time 120 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sum($d_a)
BenchmarkTools.Trial: 618 samples with 1 evaluation.
Range (min … max): 6.966 ms … 9.740 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.047 ms ┊ GC (median): 0.00%
Time (mean ± σ): 8.079 ms ± 602.087 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▁▄▃▄ ▁ ▁▇ ▅▄█▃▄ ▂ ▁ ▃▁▇▁ ▁ ▁▁ ▃
▂▂▃▃▄█████▇▇█▇██▇█████▄▇▆▆▃█▅▇▇▆▆███████████▅█████▃▄▃▆▂▅▄▂▃ ▅
6.97 ms Histogram: frequency by time 9.28 ms <
Memory estimate: 27.41 KiB, allocs estimate: 509.
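To put the gap in perspective, here is a rough effective-throughput calculation using only the median times reported above. It is a back-of-the-envelope sketch that assumes each Float32 element is read exactly once; the names `throughput`, `T_CPU` and `T_GPU` are just illustrative.

```julia
# Median times copied from the 1024x1024 benchmarks above.
const N = 1024
const BYTES = N * N * sizeof(Float32)   # 4 MiB of input data
const T_CPU = 85.332e-6                 # seconds, sum(a)
const T_GPU = 8.047e-3                  # seconds, sum(d_a)

# Effective throughput in GB/s, assuming each element is read exactly once.
throughput(bytes, t) = bytes / t / 1e9

cpu_gbs = throughput(BYTES, T_CPU)
gpu_gbs = throughput(BYTES, T_GPU)
slowdown = T_GPU / T_CPU                # how many times slower the GPU sum is

println("CPU: ", round(cpu_gbs; digits = 2), " GB/s")
println("GPU: ", round(gpu_gbs; digits = 2), " GB/s")
println("GPU/CPU time ratio: ", round(slowdown; digits = 1), "x")
```

This works out to roughly 49 GB/s on the CPU versus about 0.5 GB/s on the GPU, far below what a device memory bus should sustain for a simple reduction.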
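The GPU time scales with the input size (16x more elements takes about 14x longer), so this is probably the reduction kernel itself being slow rather than a fixed launch overhead. With a 4096x4096 matrix: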
julia> d_a = oneArray(rand(Float32, 4096, 4096));
julia> a = rand(Float32, 4096, 4096);
julia> @benchmark sum($a)
BenchmarkTools.Trial: 1682 samples with 1 evaluation.
Range (min … max): 2.918 ms … 3.185 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.964 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.967 ms ± 25.760 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅ █ ▆ ▃
▂▁▁▃▂▂▇▅▃▇█▃▃█▃▂██▃██▄▄█▇▃▄▇▂▃▇▃▂▅▅▂▃▄▂▂▃▂▂▃▃▂▂▃▂▁▃▂▂▂▂▁▁▂ ▃
2.92 ms Histogram: frequency by time 3.05 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sum($d_a)
BenchmarkTools.Trial: 45 samples with 1 evaluation.
Range (min … max): 112.776 ms … 113.728 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 113.151 ms ┊ GC (median): 0.00%
Time (mean ± σ): 113.186 ms ± 218.961 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁ ▁ ▁ ▄ ▄█▁ ▄▁ ▁▁ ▁
█▁▁▆▆▁▁▁█▆▁▁▁▁▁▁█▆█▆▁▆███▁██▁▁▁▆▆▆▆██▁▁▁▆█▆▁▁▁▁▆▁▁▆▁▁▁▁▁▆▁▁▁▆ ▁
113 ms Histogram: frequency by time 114 ms <
Memory estimate: 28.75 KiB, allocs estimate: 516.
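As a sanity check on the scaling argument, the growth factors can be computed directly from the median times of both runs. This is a minimal sketch using only the numbers above; the variable names are illustrative.

```julia
# Median times copied from the benchmarks above (seconds), keyed by matrix side length.
t_cpu = Dict(1024 => 85.332e-6, 4096 => 2.964e-3)
t_gpu = Dict(1024 => 8.047e-3, 4096 => 113.151e-3)

# Growth factors when going from a 1024x1024 to a 4096x4096 input.
elem_ratio = (4096 / 1024)^2            # 16x more elements
cpu_ratio = t_cpu[4096] / t_cpu[1024]
gpu_ratio = t_gpu[4096] / t_gpu[1024]

# If the GPU time were dominated by fixed launch/synchronization overhead,
# gpu_ratio would stay near 1; a ratio close to elem_ratio means the time
# grows with the amount of data the kernel processes.
println("elements: ", elem_ratio, "x")
println("CPU time: ", round(cpu_ratio; digits = 1), "x")
println("GPU time: ", round(gpu_ratio; digits = 1), "x")
```

The GPU ratio comes out at about 14x for 16x the data, which is consistent with the time being spent in per-element work inside the kernel rather than in a constant per-call cost.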