Skip to content

Convert AbstractArrays with strides to NumPy arrays #876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 29 commits into from

Conversation

mkitti
Copy link
Member

@mkitti mkitti commented Jan 11, 2021

StridedArrays can currently be converted to NumPy arrays with PyCall.jl. However, other types of AbstractArray also have a stride method. In particular, views such as SubArray, ReshapedArray, ReinterpretArray, and PermutedDimsArray implement stride but are not included in StridedArray.

This pull request:

  1. Widens PyCall.NpyArray to accept any AbstractArray
  2. If the array is immutable, embeds a reference to the parent array in the PyObject.
  3. To support slicing, a new type PySlice is created.

@stevengj
Copy link
Member

Julia's PermutedDimsArray and StrideSubArray can transmitted to Python/Numpy without copying by taking advantage of numpy.transpose and Python's slice. The strategy is to create a view of the parent array and then use Numpy to perform similar view based operations on the array as was done in Julia

That shouldn't be necessary. You should be able to create a NumPy array directly from any array with strided data, just by broadening the signature of this method.

@mkitti
Copy link
Member Author

mkitti commented Jan 11, 2021

Julia's PermutedDimsArray and StrideSubArray can transmitted to Python/Numpy without copying by taking advantage of numpy.transpose and Python's slice. The strategy is to create a view of the parent array and then use Numpy to perform similar view based operations on the array as was done in Julia

That shouldn't be necessary. You should be able to create a NumPy array directly from any array with strided data, just by broadening the signature of this method.

Great. That will make things easier. I think we will need to modify some additional methods to dispatch to that method.

I'm not sure if I understand the dispatching with the code below in conversions.jl. It looks like we should be able to direct any AbstractArray with the method stride to the code you mention. Currently, such an array gets dispatched to:

  • PyObject(A::AbstractArray)
  • array2py(A::AbstractArray)
  • array2py(A::AbstractArray{<:Any, N}, dim::Integer, i::CartesianIndex{N})

function array2py(A::AbstractArray{<:Any, N}, dim::Integer, i::CartesianIndex{N}) where {N}
if dim > N # base case
return PyObject(A[i])
else # recursively store multidimensional array as list of lists
ilast = CartesianIndex(ntuple(j -> j == dim ? lastindex(A, dim) : i[j], Val{N}()))
o = PyObject(@pycheckn ccall((@pysym :PyList_New), PyPtr, (Int,), size(A, dim)))
for icur in cirange(i,ilast)
oi = array2py(A, dim+1, icur)
@pycheckz ccall((@pysym :PyList_SetItem), Cint, (PyPtr,Int,PyPtr),
o, icur[dim]-i[dim], oi)
pyincref(oi) # PyList_SetItem steals the reference
end
return o
end
end
array2py(A::AbstractArray) = array2py(A, 1, first(CartesianIndices(A)))
PyObject(A::AbstractArray) =
ndims(A) <= 1 || hasmethod(stride, Tuple{typeof(A),Int}) ? array2py(A) :
pyjlwrap_new(A)

Rather this needs to be directed to the NpyArray method you mentioned.

PyCall.jl/src/numpy.jl

Lines 175 to 195 in 318c23f

function NpyArray(a::StridedArray{T}, revdims::Bool) where T<:PYARR_TYPES
@npyinitialize
size_a = revdims ? reverse(size(a)) : size(a)
strides_a = revdims ? reverse(strides(a)) : strides(a)
p = @pycheck ccall(npy_api[:PyArray_New], PyPtr,
(PyPtr,Cint,Ptr{Int},Cint, Ptr{Int},Ptr{T}, Cint,Cint,PyPtr),
npy_api[:PyArray_Type],
ndims(a), Int[size_a...], npy_type(T),
Int[strides_a...] * sizeof(eltype(a)), a, sizeof(eltype(a)),
NPY_ARRAY_ALIGNED | NPY_ARRAY_WRITEABLE,
C_NULL)
return PyObject(p, a)
end
function PyObject(a::StridedArray{T}) where T<:PYARR_TYPES
try
return NpyArray(a, false)
catch
return array2py(a) # fallback to non-NumPy version
end
end

Perhaps we should dispatch using a Holy trait:

  • PyObject(A::AbstractArray)
  • py2array(A::AbstractArray, isstrided::Val{true})
  • NpyArray(A::AbstractArray, revdims::Bool))

Basically, this would mean that PyCall would try to convert any AbstractArray that has the method strided to a NumPy array rather than relying on StridedArray to dispatch.

Is changing PyObject dispatch in this way what you had in mind?

@stevengj
Copy link
Member

stevengj commented Jan 11, 2021

I'm not sure we want to introduce a trait. Just changing this signature to PyObject(a::AbstractArray{T}) where T<:PYARR_TYPES (and similarly for NpyArray , which should probably be npyarray) should suffice to duck-type it — any type that doesn't have strides and a pointer conversion will fall back to array2py.

@codecov-io
Copy link

codecov-io commented Jan 12, 2021

Codecov Report

Merging #876 (d04dc61) into master (ead312f) will increase coverage by 0.33%.
The diff coverage is 80.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #876      +/-   ##
==========================================
+ Coverage   67.63%   67.97%   +0.33%     
==========================================
  Files          20       20              
  Lines        1965     1992      +27     
==========================================
+ Hits         1329     1354      +25     
- Misses        636      638       +2     
Flag Coverage Δ
unittests 67.97% <80.64%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/PyCall.jl 69.56% <ø> (ø)
src/gc.jl 85.00% <62.50%> (-15.00%) ⬇️
src/conversions.jl 64.33% <84.21%> (+0.95%) ⬆️
src/numpy.jl 77.33% <100.00%> (+2.66%) ⬆️
src/pyinit.jl 82.82% <100.00%> (+0.17%) ⬆️
src/pyarray.jl 71.23% <0.00%> (+0.68%) ⬆️
src/pyoperators.jl 87.50% <0.00%> (+4.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ead312f...d04dc61. Read the comment docs.

@mkitti
Copy link
Member Author

mkitti commented Jan 12, 2021

I was trying to sort out the Core.TypeofVararg (see JuliaLang/julia#38136 ) issue with the test on line 53. Turns out that the AOT CI script needing an overhaul now that some issues have been resolved. Previously it was always testing master.

@mkitti mkitti changed the title WIP: Transpose and slice for PermutedDimsArray and StrideSubArray Convert AbstractArrays with stride to NumPy arrays Jan 13, 2021
@mkitti mkitti requested a review from stevengj January 17, 2021 08:04
@mkitti
Copy link
Member Author

mkitti commented Jan 17, 2021

@stevengj Let me know if you would like to split the slice and PySlice code into another PR. I could also rebase this if you would like.

@mkitti
Copy link
Member Author

mkitti commented Jan 28, 2021

I removed the slice code from this pull request. If we're interested, I can submit that as a distinct pull request. The diff is now smaller.

@PhilipVinc
Copy link
Contributor

What’s the status on this? I would find it useful, so I would be willing to commit some time if necessary.

@mkitti
Copy link
Member Author

mkitti commented Apr 22, 2021

Hi @stevengj , I was going through my old PRs, and I was wonder where we were on this pull request to expand no-copy conversion of Julia arrays to NumPy arrays? Is there anything you would like to change before merging?

@mkitti mkitti changed the title Convert AbstractArrays with stride to NumPy arrays Convert AbstractArrays with strides to NumPy arrays May 16, 2021
@PhilipVinc
Copy link
Contributor

@stevengj bump

@mkitti
Copy link
Member Author

mkitti commented May 19, 2021

For those who need this sooner rather than later, here's a version I created for a Discourse post:

julia> using PyCall

julia> A = rand(Int, 256);

julia> pytypeof( PyObject(reinterpret(UInt64, A)) )
PyObject <class 'list'>

julia> pytypeof( PyObject(@view(A[1:100])) )
PyObject <class 'list'>

julia> module PyCallHack
           import PyCall: NpyArray, PYARR_TYPES, @npyinitialize, npy_api, npy_type
           import PyCall: @pycheck, NPY_ARRAY_ALIGNED, NPY_ARRAY_WRITEABLE, pyembed
           import PyCall: PyObject, PyPtr
           const HACKED_ARRAYS = Union{SubArray{T}, Base.ReinterpretArray{T}, Base.ReshapedArray{T}, Base.PermutedDimsArray{T}} where T <: PYARR_TYPES
           function NpyArray(a::HACKED_ARRAYS{T}, revdims::Bool) where T <: PYARR_TYPES
               @npyinitialize
               size_a = revdims ? reverse(size(a)) : size(a)
               strides_a = revdims ? reverse(strides(a)) : strides(a)
               p = @pycheck ccall(npy_api[:PyArray_New], PyPtr,
                   (PyPtr,Cint,Ptr{Int},Cint, Ptr{Int},Ptr{T}, Cint,Cint,PyPtr),
                   npy_api[:PyArray_Type],
                   ndims(a), Int[size_a...], npy_type(T),
                   Int[strides_a...] * sizeof(eltype(a)), a, sizeof(eltype(a)),
                   NPY_ARRAY_ALIGNED | NPY_ARRAY_WRITEABLE,
                   C_NULL)
              return PyObject(p, a)
           end
           pyembed(po::PyObject, jo::HACKED_ARRAYS) = pyembed(po, jo.parent)
       end
Main.PyCallHack

julia> pytypeof( PyObject(reinterpret(UInt64, A)) )
PyObject <class 'numpy.ndarray'>

julia> pytypeof( PyObject(@view(A[1:100])) )
PyObject <class 'numpy.ndarray'>

@PhilipVinc
Copy link
Contributor

@stevengj bump

@mkitti
Copy link
Member Author

mkitti commented Jul 11, 2021

I've initiated the registration of https://github.com/mkitti/NumPyArrays.jl which brings most of the functionality of this PR to a new type, NumPyArray. NumPyArray is just a wrapper of PyArray but has a constructor for an AbstractArray.

julia> using PyCall, NumPyArrays

julia> rA = reinterpret(Int8, rand(UInt8, 4,4))
4×4 reinterpret(Int8, ::Matrix{UInt8}):
 115   -4  -20   -32
 -43   13  -62   -36
 -84  -37   91  -109
 114  -40    9    73

julia> pytypeof(PyObject(rA))
PyObject <class 'list'>

julia> nprA = NumPyArray(rA)
4×4 NumPyArray{Int8, 2}:
 115   -4  -20   -32
 -43   13  -62   -36
 -84  -37   91  -109
 114  -40    9    73

julia> pytypeof(nprA)
PyObject <class 'numpy.ndarray'>

julia> sum(nprA)
-52

julia> pointer(rA) == pointer(nprA)
true

julia> np = pyimport("numpy")
PyObject <module 'numpy' from 'C:\\Users\\kittisopikulm\\.julia\\conda\\3\\lib\\site-packages\\numpy\\__init__.py'>

julia> np.sum(nprA)
-52

julia> PyObject(rA).dtype
ERROR: KeyError: key :dtype not found
Stacktrace:
 [1] __getproperty(o::PyObject, s::Symbol)
   @ PyCall ~\.julia\packages\PyCall\BD546\src\PyCall.jl:307
 [2] getproperty(o::PyObject, s::Symbol)
   @ PyCall ~\.julia\packages\PyCall\BD546\src\PyCall.jl:312
 [3] top-level scope
   @ REPL[33]:1

julia> PyObject(nprA).dtype
PyObject dtype('int8')

julia> py"""
       def add_one(A):
           return A + 1

       """

julia> py_add_one = py"add_one"
PyObject <function add_one at 0x00000000014BFF70>

julia> py_add_one(nprA)
4×4 Matrix{Int8}:
 116   -3  -19   -31
 -42   14  -61   -35
 -83  -36   92  -108
 115  -39   10    74

julia> py_add_one(rA)
ERROR: PyError ...

julia> py_add_one(PyObject(rA))
ERROR: PyError ...

@mkitti
Copy link
Member Author

mkitti commented Dec 28, 2021

I intend on closing this pull request on January 11th.

@mkitti
Copy link
Member Author

mkitti commented Jan 24, 2022

Closing this. See https://github.com/mkitti/NumPyArrays.jl if interested.

@mkitti mkitti closed this Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants