indexing with a Categorical of Intervals is inefficient #61928

@flying-sheep

This line converts the IntervalIndex into a numpy object array:

cat_array = hash_array(np.asarray(categories), categorize=False)

Then, in this block, hash_object_array raises a TypeError, and the except branch converts that object array into strings:

TypeError: (-0.00872, 0.439] of type <class 'pandas._libs.interval.Interval'> is not a valid type for hashing, must be string or null

try:
    vals = hash_object_array(vals, hash_key, encoding)
except TypeError:
    # we have mixed types
    vals = hash_object_array(
        vals.astype(str).astype(object), hash_key, encoding
    )
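
A minimal sketch of the two steps described above, using only public API (pd.interval_range, np.asarray, pd.util.hash_array). The sample intervals are made up for illustration, and whether the string fallback is actually hit depends on the pandas version installed, so treat this as a way to inspect the behavior rather than a definitive reproduction:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: five intervals standing in for the categories.
categories = pd.interval_range(start=0, end=5)

# Step 1: np.asarray materializes one Python-level Interval object per category.
as_objects = np.asarray(categories)
print(as_objects.dtype)     # object
print(type(as_objects[0]))  # pandas Interval

# Step 2: hash the object array the same way the quoted line does.
# On affected versions this goes through the TypeError fallback internally.
hashed = pd.util.hash_array(as_objects, categorize=False)
print(hashed.dtype, len(hashed))

# What the fallback ends up hashing: the string form of every interval.
as_strings = as_objects.astype(str).astype(object)
print(as_strings[:2])
```

Both the object conversion and the string conversion cost time in proportion to the number of categories, which is where the overhead comes from when there are many intervals.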

Labels: Categorical, Interval, Performance
