Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import operator
from pandas.core.ops.array_ops import comparison_op
# Define a custom ndarray subclass
class TestArray(np.ndarray):
    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)
# Define test data
lvalues_1_dim = [1, 2, 3]
rvalues_0_dim = 1
# 0-dim and 1-dim numpy arrays
np_array_lvalues = np.array(lvalues_1_dim)
np_array_rvalues = np.array(rvalues_0_dim)
# 0-dim and 1-dim TestArray arrays
test_array_lvalues = TestArray(lvalues_1_dim)
test_array_rvalues = TestArray(rvalues_0_dim)
# Define a comparison operator
op = operator.gt
# Calling op on left and right np arrays works as expected
print(f"{op(np_array_lvalues, np_array_rvalues)=}")
# Calling comparison_op on left and right np arrays works as expected
print(f"{comparison_op(np_array_lvalues, np_array_rvalues, op)=}")
# Calling op on left and right TestArray arrays works as expected
print(f"{op(test_array_lvalues, test_array_rvalues)=}")
# Calling comparison_op on left and right TestArray arrays raises TypeError
try:
    print(f"{comparison_op(test_array_lvalues, test_array_rvalues, op)=}")
except TypeError as e:
    print(f"TypeError raised as expected: {e}")
# Expected output:
# op(np_array_lvalues, np_array_rvalues)=array([False, True, True])
# comparison_op(np_array_lvalues, np_array_rvalues, op)=array([False, True, True])
# op(test_array_lvalues, test_array_rvalues)=TestArray([False, True, True])
# TypeError raised as expected: len() of unsized object
Issue Description
Due to the recent behaviour change of item_from_zerodim for subclasses of np.ndarray (see #62981), comparison_op in array_ops.py raises an unexpected TypeError when rvalues is a 0-dim object of such a subclass. The relevant part of the code (lines 309-329):
rvalues = lib.item_from_zerodim(rvalues)
if isinstance(rvalues, list):
    # We don't catch tuple here bc we may be comparing e.g. MultiIndex
    # to a tuple that represents a single entry, see test_compare_tuple_strs
    rvalues = np.asarray(rvalues)
if isinstance(rvalues, (np.ndarray, ABCExtensionArray)):
    # TODO: make this treatment consistent across ops and classes.
    # We are not catching all listlikes here (e.g. frozenset, tuple)
    # The ambiguous case is object-dtype. See GH#27803
    if len(lvalues) != len(rvalues):
        raise ValueError(
            "Lengths must match to compare", lvalues.shape, rvalues.shape
        )
Until recently, it was guaranteed that after item_from_zerodim rvalues would be either a scalar (np.float64 or similar) if it had been 0-dim, or an np.ndarray or similar type if its dimension was higher. Now, if rvalues is an instance of a type that subclasses np.ndarray, a 0-dim array is left unchanged by item_from_zerodim. It then enters the isinstance(rvalues, (np.ndarray, ABCExtensionArray)) if-block, where the following TypeError is raised because len cannot be called on a 0-dim array:
    if len(lvalues) != len(rvalues):
                       ~~~^^^^^^^^^
TypeError: len() of unsized object
As shown in the reproducible example above, calling the operator op directly on the TestArray values works fine:
>>> op(test_array_lvalues, test_array_rvalues)
TestArray([False, True, True])
I would therefore suggest adding a check that skips the length comparison when rvalues is a 0-dim array. A possible implementation would be to extend the if-statement with a dimension check:
if isinstance(rvalues, (np.ndarray, ABCExtensionArray)) and rvalues.ndim != 0:
    # TODO: make this treatment consistent across ops and classes.
    # We are not catching all listlikes here (e.g. frozenset, tuple)
    # The ambiguous case is object-dtype. See GH#27803
    if len(lvalues) != len(rvalues):
        raise ValueError(
            "Lengths must match to compare", lvalues.shape, rvalues.shape
        )
Because rvalues.ndim != 0 is only evaluated once isinstance(rvalues, (np.ndarray, ABCExtensionArray)) has returned True, and both of these types have the ndim property, this should not cause problems for rvalues objects that lack the property. A slightly safer variant would be getattr(rvalues, "ndim", 1) != 0. In both variants the != 0 part could be left out, but I think including it makes it more explicit that the array must not be 0-dim.
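The following is a minimal, self-contained sketch of the suggested guard, using only NumPy. Note that check_lengths is a hypothetical stand-in for the length-check block in comparison_op, not the actual pandas code, and ABCExtensionArray is left out to keep the example free of pandas internals.

```python
import operator

import numpy as np


class TestArray(np.ndarray):
    """Minimal ndarray subclass, as in the reproducible example."""

    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)


def check_lengths(lvalues, rvalues):
    """Hypothetical stand-in for the length check in comparison_op.

    The ndim guard is the proposed change: a 0-dim array skips the
    check and is left for the operator to broadcast.
    """
    if isinstance(rvalues, np.ndarray) and rvalues.ndim != 0:
        if len(lvalues) != len(rvalues):
            raise ValueError(
                "Lengths must match to compare", lvalues.shape, rvalues.shape
            )


lvalues = TestArray([1, 2, 3])
zero_dim = TestArray(1)

# len() is undefined for 0-dim arrays, which is the source of the TypeError.
try:
    len(zero_dim)
except TypeError as err:
    print(f"len() failed: {err}")

# With the guard, the 0-dim right operand skips the length check ...
check_lengths(lvalues, zero_dim)
# ... and the operator broadcasts it against the 1-dim left operand.
result = operator.gt(lvalues, zero_dim)
print(result.tolist())

# A genuine length mismatch between 1-dim arrays is still rejected.
try:
    check_lengths(lvalues, TestArray([1, 2]))
except ValueError as err:
    print(f"mismatch rejected: {err.args[0]}")
```

The guard only changes behaviour for 0-dim arrays. Comparisons between 1-dim arrays of different lengths still raise the existing ValueError.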
Expected Behavior
I would expect that calling comparison_op with a 0-dim rvalues whose type is a subclass of np.ndarray skips the length check, since the 0-dim array will be broadcast to the matching length either through res_values = op(lvalues, rvalues) in the should_extension_dispatch case or through _na_arithmetic_op. For the example above, I would expect the following result:
>>> comparison_op(test_array_lvalues, test_array_rvalues, op)
TestArray([False, True, True])
Installed Versions
INSTALLED VERSIONS
commit : 7bf6660
python : 3.13.9
python-bits : 64
OS : Linux
OS-release : 5.15.153.1-microsoft-standard-WSL2
Version : #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : C.UTF-8
pandas : 3.0.0.dev0+2728.g7bf6660984
numpy : 2.3.5
dateutil : 2.9.0.post0
pip : 25.3
Cython : 3.2.1
sphinx : None
IPython : 9.4.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.2
bottleneck : 1.6.0
fastparquet : None
fsspec : 2025.10.0
html5lib : 1.1
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : 3.10.8
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : 22.0.0
pyiceberg : None
pyreadstat : None
pytest : 9.0.1
python-calamine : None
pytz : 2025.2
pyxlsb : None
s3fs : 2025.10.0
scipy : 1.16.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.25.0
qtpy : None
pyqt5 : None