Replies: 3 comments
-
Hmm, something is going wrong here. In theory, you should just be able to do You could also consider
-
I am getting slightly different results, but still the fastest implementation is using

@benchmark Matrix(1.0I, 20000, 20000)
BenchmarkTools.Trial: 5 samples with 1 evaluation.
Range (min … max): 843.900 ms … 1.179 s ┊ GC (min … max): 0.10% … 9.66%
Time (median): 1.111 s ┊ GC (median): 10.25%
Time (mean ± σ): 1.025 s ± 165.337 ms ┊ GC (mean ± σ): 9.48% ± 8.79%
█ ▁ ▁ ▁
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█ ▁
844 ms Histogram: frequency by time 1.18 s <
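Memory estimate: 2.98 GiB, allocs estimate: 2.

using LoopVectorization  # provides @turbo and @tturbo

function i_turbo()
    # Allocate uninitialized storage; every element is written below.
    dp = Array{Float64}(undef, 20000, 20000)
    @turbo for j ∈ axes(dp, 2)
        for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_turbo()  # first call triggers compilation
@benchmark i_turbo()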
BenchmarkTools.Trial: 4 samples with 1 evaluation.
Range (min … max): 1.156 s … 1.562 s ┊ GC (min … max): 0.08% … 13.87%
Time (median): 1.333 s ┊ GC (median): 7.36%
Time (mean ± σ): 1.346 s ± 181.958 ms ┊ GC (mean ± σ): 7.68% ± 7.92%
█ █ █ █
█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
1.16 s Histogram: frequency by time 1.56 s <
Memory estimate: 2.98 GiB, allocs estimate: 2.

function i_tturbo()
    dp = Array{Float64}(undef, 20000, 20000)
    # @tturbo is the threaded counterpart of @turbo.
    @tturbo for j ∈ axes(dp, 2)
        for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_tturbo()
@benchmark i_tturbo()
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max): 499.124 ms … 1.105 s ┊ GC (min … max): 0.16% … 42.38%
Time (median): 963.943 ms ┊ GC (median): 36.06%
Time (mean ± σ): 861.720 ms ± 258.747 ms ┊ GC (mean ± σ): 30.91% ± 20.18%
█ █ █ █ █ █
█▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁█▁▁▁█ ▁
499 ms Histogram: frequency by time 1.11 s <
Memory estimate: 2.98 GiB, allocs estimate: 2.

function i_mt()
    dp = Array{Float64}(undef, 20000, 20000)
    # Columns are split across threads; each column is filled with a SIMD loop.
    Threads.@threads for j ∈ axes(dp, 2)
        @simd for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_mt()
@benchmark i_mt()
BenchmarkTools.Trial: 11 samples with 1 evaluation.
Range (min … max): 312.225 ms … 740.457 ms ┊ GC (min … max): 0.31% … 51.99%
Time (median): 473.053 ms ┊ GC (median): 28.27%
Time (mean ± σ): 476.158 ms ± 114.606 ms ┊ GC (mean ± σ): 28.81% ± 17.80%
█
▇▁▁▁▁▁▇▁▁▁▁▁▁▁▇▁▇▁▁▁▁▁█▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
312 ms Histogram: frequency by time 740 ms <
Memory estimate: 2.98 GiB, allocs estimate: 199.
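
Before comparing timings, it can help to confirm that a variant actually produces the identity matrix. The following is a minimal sketch, assuming a size-parameterized version of i_mt; the name identity_threads and the small test size are illustrative and not from the thread. It also recomputes the expected memory footprint of the 20000×20000 case, which should come out close to the 2.98 GiB reported above.

```julia
using LinearAlgebra

# Hypothetical size-parameterized variant of i_mt (name chosen for illustration).
function identity_threads(n::Integer)
    dp = Array{Float64}(undef, n, n)
    Threads.@threads for j in axes(dp, 2)
        @simd for i in axes(dp, 1)
            @inbounds dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

# Compare against the reference construction at a small size.
n = 512
@assert identity_threads(n) == Matrix(1.0I, n, n)

# Footprint of a 20000×20000 Float64 matrix, in GiB.
bytes = 20000^2 * sizeof(Float64)
println(bytes / 2^30)
```

The same check can be applied to the @turbo and @tturbo variants by giving them a size argument in the same way.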
-
Using the suggested method, I could get similar performance. I am wondering whether it could be improved further.

using Polyester  # provides @batch

function i_polyester()
    dp = Array{Float64}(undef, 20000, 20000)
    # minbatch=1250 gives each task at least 1250 columns.
    @batch minbatch=1250 for j ∈ axes(dp, 2)
        for i ∈ axes(dp, 1)
            dp[i, j] = ifelse(i == j, 1.0, 0.0)
        end
    end
    return dp
end

i_polyester()
@benchmark i_polyester()
BenchmarkTools.Trial: 10 samples with 1 evaluation.
Range (min … max): 266.641 ms … 633.969 ms ┊ GC (min … max): 0.28% … 47.30%
Time (median): 508.365 ms ┊ GC (median): 34.35%
Time (mean ± σ): 501.235 ms ± 116.029 ms ┊ GC (mean ± σ): 31.46% ± 15.68%
▁ ▁ █▁▁ ▁ ▁ ▁ ▁
█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▁▁▁▁▁▁▁▁█▁▁▁█▁▁█▁▁▁█ ▁
267 ms Histogram: frequency by time 634 ms <
Memory estimate: 2.98 GiB, allocs estimate: 2.
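
One variation that might be worth trying on the question of further improvement is to drop the per-element comparison: zero each column and then write the single diagonal entry. This is only a sketch under that assumption. The helper name identity_fill_diag is hypothetical, it uses Threads.@threads rather than @batch to stay dependency-free, and it has not been benchmarked here, so whether it helps at all on a memory-bound workload like this one is an open question.

```julia
using LinearAlgebra

# Hypothetical helper, not from the thread: zero each column, then set its diagonal entry.
function identity_fill_diag(n::Integer)
    dp = Array{Float64}(undef, n, n)
    Threads.@threads for j in axes(dp, 2)
        fill!(view(dp, :, j), 0.0)  # zero column j in place
        dp[j, j] = 1.0              # one write for the diagonal element
    end
    return dp
end

# Correctness check at a small size against the reference construction.
@assert identity_fill_diag(256) == Matrix(1.0I, 256, 256)
```

To compare it with the versions above, it could be timed the same way, e.g. with @benchmark identity_fill_diag(20000) after loading BenchmarkTools.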
-
Recently I found that creating a large identity matrix the default way (e.g. Matrix(1.0I, 20000, 20000)) could be relatively slow due to the limited memory bandwidth of a single thread. To address that, I found I could do better with something like below:

I am wondering if I could use LoopVectorization to achieve better performance, or simply minimize the use of native Threads.@threads. Thanks in advance!