Skip to content

Conversation

lshaw8317
Copy link
Collaborator

@lshaw8317 lshaw8317 commented Aug 21, 2025

Edited code to try and make fancy indexing faster (avoid searching full index every time via ndindex:as_subidx) and incorporate 1D fast path better. Could maybe be revised to allow for interleaved slice and integer arrays (currently it follows ndindex in only allowing slices before and after the integer arrays). Improvements are not huge, but there is a performance benefit, compared to the old algorithm, which found intersecting chunks via ndindex, and looped over those chunks, converting the full index to a local chunk index at every iteration. This involved searching the whole index for every loop (it remains as the default algorithm for boolean indexing).

The algorithm adaps he Zarr vindexing implementation which uses a sorting-based algorithm. Intersecting chunks are found (for slices and integer arrays), the integer array is sorted (by chunks for N>1 integer arrays, by index for a single integer array), and then the code loops over the chunks, taking the indexes corresponding to the chunk from the sorted integer array. For a single integer array, only the relevant part of the chunk (between the minimum and maximum index within a chunk) is loaded - for several integer arrays, because of how get_slice_numpy is written, it is necessary to load the whole chunk and then index on it.
Hence on avoids repeated searches over the whole index. There is a small detail in using np.unique, which does not quite function like np.bincount, since it sorts and may do a copy, which can cause a slowdown if not handled correctly.

Improvements could be made to try and find a way to use np.bincount directly, as I still do not completely trust np.unique to be fast. Also one could rewrite to interleave slices and indices (matching numpy). ndindex.expand also seems to consume quite a bit of memory (via copying) when broadcasting indices, I think np.broadcast would just do a view so might be better. Finally, one could rewrite get_slice_numpy to allow continuous, flat returns, and not necessarily only slices (a sequence of slices, which is multidimensional ''rectangular'', differs from a range from one multidimensional index to another, which is essentially 1D). See #441.

From the plot one can see that the new algorithm is just as fast as the old optimised 1D path, although it handles more cases.
newfancybench

@lshaw8317 lshaw8317 marked this pull request as ready for review August 26, 2025 08:08
Copy link
Member

@FrancescAlted FrancescAlted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look great to me. It would be interesting to see how the new performance compares with h5py and zarr, for a reference.

@lshaw8317 lshaw8317 mentioned this pull request Jul 25, 2025
7 tasks
@lshaw8317 lshaw8317 merged commit 142f903 into main Aug 26, 2025
12 of 13 checks passed
@lshaw8317 lshaw8317 deleted the fancyIndex branch August 26, 2025 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants