Currently the library only supports performing a calculation on an Array, and it also returns an Array. It may be worthwhile to define scalar methods too.
```julia
julia> IVM.sin(1.1)
ERROR: MethodError: no method matching sin(::Float64)
You may have intended to import Base.sin
Closest candidates are:
  sin(::Array{Float32,N} where N) at C:\Users\yahyaaba\.julia\packages\IntelVectorMath\Gb348\src\setup.jl:72
  sin(::Array{Float64,N} where N) at C:\Users\yahyaaba\.julia\packages\IntelVectorMath\Gb348\src\setup.jl:72
Stacktrace:
 [1] top-level scope at none:0
```
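A naive scalar method could simply forward to the existing Array method. This is only an illustrative sketch (the scalar definition itself is hypothetical), and the temporary length-1 Array would likely dominate the cost of the call:

```julia
using IntelVectorMath  # exports the IVM alias

# Hypothetical sketch: forward a scalar call to the existing Array method.
# The temporary length-1 Array makes this slow; a real implementation would
# instead ccall the underlying VML kernel (e.g. vdSin) with n = 1 on
# pointers to the scalar, avoiding the allocation.
IVM.sin(x::Float64) = IVM.sin([x])[1]

IVM.sin(1.1)  # now returns a Float64 instead of throwing a MethodError
```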
This way we could use Intel even when calculating a single scalar number, which (if possible) would help fuse for-loops with broadcasted functions and use Julia's @avx or @simd features for parallelization instead.

We should check whether Intel provides a scalar API. If it only provides a vector API, and the function call uses the vector processing unit of the CPU, we cannot parallelize the function further: that would be like vectorizing an already vectorized function (albeit one of size 1), which has no effect.
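For example, with a scalar method defined, IVM calls could participate in ordinary fused broadcasting and explicit loops. This is a sketch assuming the hypothetical scalar `IVM.sin` above; whether the loop actually vectorizes is exactly the open question about the scalar API:

```julia
x = rand(10^6)
y = similar(x)

# Fused broadcast: a single pass over x with no intermediate arrays,
# which the current Array-only methods cannot participate in.
@. y = IVM.sin(x)^2 + 1.0

# Explicit loop; @simd only helps if the scalar call can be inlined
# and vectorized, i.e. if an SVML-style scalar entry point exists.
@simd for i in eachindex(x, y)
    @inbounds y[i] = IVM.sin(x[i])^2 + 1.0
end
```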
Related to #43, which could help implement the 3rd macro.

This could also solve #22, by using Intel only for scalar calls and providing SVML-like behavior through @avx or @simd.
Intriguing.
I suppose a few tests would be necessary to see whether the speed is comparable with Base.
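A quick first comparison could look like this (a sketch using BenchmarkTools, with the hypothetical scalar method from the sketch above):

```julia
using BenchmarkTools, IntelVectorMath

x  = 1.1
xs = [x]
@btime Base.sin($x)   # Base scalar baseline
@btime IVM.sin($x)    # hypothetical scalar method
@btime IVM.sin($xs)   # current Array path (length-1 vector), for reference
```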