Scalar Calculation? #44

Open
aminya opened this issue Mar 4, 2020 · 2 comments

Comments

@aminya
Member

aminya commented Mar 4, 2020

Currently the library only supports calculations on an Array, and it always returns an Array.

It may be worthwhile to define scalar methods too.

julia> IVM.sin(1.1)
ERROR: MethodError: no method matching sin(::Float64)
You may have intended to import Base.sin
Closest candidates are:
  sin(::Array{Float32,N} where N) at C:\Users\yahyaaba\.julia\packages\IntelVectorMath\Gb348\src\setup.jl:72
  sin(::Array{Float64,N} where N) at C:\Users\yahyaaba\.julia\packages\IntelVectorMath\Gb348\src\setup.jl:72
Stacktrace:
 [1] top-level scope at none:0

With scalar methods, Intel would only be used to compute a single number. That would (if possible) let us fuse for-loops with broadcasted functions and rely on Julia's @avx or @simd for parallelization instead.
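A minimal sketch of the fusion a scalar method would allow. Both `array_sin` and `scalar_sin` are hypothetical stand-ins built on `Base.sin`, not IntelVectorMath functions: the first mimics the current array-in, array-out style, the second a scalar method.

```julia
# Hypothetical stand-ins, not IntelVectorMath API:
# array_sin mimics the current array-only style, scalar_sin a scalar method.
array_sin(x::Array{Float64}) = map(sin, x)
scalar_sin(x::Float64) = sin(x)

# Array-only style: the sin result is materialized before the addition.
function unfused(a::Vector{Float64}, b::Vector{Float64})
    tmp = array_sin(a)   # allocates an intermediate array
    return tmp .+ b      # second pass over the data
end

# Scalar style: a single loop with no intermediate array.
# @simd only permits the compiler to vectorize; whether it does
# depends on the scalar kernel being called.
function fused!(out::Vector{Float64}, a::Vector{Float64}, b::Vector{Float64})
    @inbounds @simd for i in eachindex(out, a, b)
        out[i] = scalar_sin(a[i]) + b[i]
    end
    return out
end

a = [0.0, 0.5, 1.1]
b = [1.0, 2.0, 3.0]

out_loop = fused!(similar(a), a, b)
# Dot syntax fuses the same computation into one broadcast loop.
out_bcast = similar(a) .= scalar_sin.(a) .+ b
```

Whether this is actually faster than the array call would still need to be measured.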

We should check whether Intel provides a scalar API. If it only provides a vector API, and the function call already uses the CPU's vector processing unit, we cannot parallelize the function any further. That would be like vectorizing an already vectorized function (albeit on a vector of size 1), which has no effect.
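To make the "vector of size 1" case concrete, here is a rough sketch of what a scalar method would look like if only a vector API were available. `array_sin` is again a hypothetical stand-in for an array-only kernel, and `scalar_via_array` is an illustrative name, not something the package defines.

```julia
# Hypothetical stand-in for an array-only kernel (not the real IVM.sin).
array_sin(x::Array{Float64}) = map(sin, x)

# Scalar wrapper: box the number in a length-1 array, call the array
# kernel, then unbox the single result. Every call allocates, and the
# kernel still sees a "vector" of size 1.
scalar_via_array(x::Float64) = array_sin([x])[1]

y = scalar_via_array(1.1)
```

A wrapper like this would make `IVM.sin(1.1)` callable, but it does not give an outer loop anything extra to vectorize, which is the concern described above.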

Related to #43, which can help to implement the 3rd macro.

This could also solve #22, by using Intel only for the scalar call and providing SVML-like behavior through @avx or @simd.

Places to look into:

@Crown421
Collaborator

Crown421 commented Mar 4, 2020

Intriguing.
I suppose a few tests would be necessary to see if the speed is comparable with base.

@aminya
Member Author

aminya commented Mar 4, 2020

> Intriguing.
> I suppose a few tests would be necessary to see if the speed is comparable with base.

We should check whether Intel provides a scalar API. If it only provides a vector API, and the function call already uses the CPU's vector processing unit, we cannot parallelize the function any further. That would be like vectorizing an already vectorized function (albeit on a vector of size 1), which has no effect.
