Fancy index #459

lshaw8317 · 2025-08-21T07:27:12Z

Edited the code to make fancy indexing faster (avoiding a search over the full index on every iteration via ndindex's `as_subindex`) and to incorporate the 1D fast path better. It could perhaps be revised to allow interleaved slices and integer arrays (currently it follows ndindex in only allowing slices before and after the integer arrays). The improvements are not huge, but there is a performance benefit compared to the old algorithm. That algorithm found intersecting chunks via ndindex and looped over them, converting the full index to a local chunk index at every iteration, which meant searching the whole index on every loop (it remains the default algorithm for boolean indexing).

The algorithm adapts the Zarr vindexing implementation, which uses a sorting-based approach. Intersecting chunks are found (for slices and integer arrays), the integer array is sorted (by chunk for N>1 integer arrays, by index for a single integer array), and the code then loops over the chunks, taking the indices that belong to each chunk from the sorted integer array. For a single integer array, only the relevant part of the chunk (between the minimum and maximum index within that chunk) is loaded; for several integer arrays, because of how get_slice_numpy is written, the whole chunk has to be loaded and then indexed.
Hence one avoids repeated searches over the whole index. One subtlety is the use of np.unique, which does not behave quite like np.bincount: it sorts and may make a copy, which can cause a slowdown if not handled carefully.
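A minimal NumPy sketch of the two cases described above, for illustration only. It assumes a plain in-memory array stands in for the chunked storage, indices are non-negative, and (for the N-D case) the integer arrays are already broadcast to equal-length 1-D coordinate arrays. The function names `take_1d_sorted` and `take_nd_by_chunk` and the chunk sizes are hypothetical; this is not the python-blosc2 code, which reads compressed chunks through get_slice_numpy.

```python
import numpy as np


def take_1d_sorted(data, idx, chunk_len):
    """Single integer array: sort by index, read only [min, max] of each chunk."""
    idx = np.asarray(idx, dtype=np.int64)
    order = np.argsort(idx, kind="stable")
    sorted_idx = idx[order]
    # Once sorted, indices that fall in the same chunk form contiguous runs.
    chunk_ids = sorted_idx // chunk_len
    starts = np.flatnonzero(np.diff(chunk_ids, prepend=-1))
    stops = np.append(starts[1:], len(sorted_idx))
    out = np.empty(len(idx), dtype=data.dtype)
    for start, stop in zip(starts, stops):
        sel = sorted_idx[start:stop]
        lo, hi = sel[0], sel[-1] + 1
        block = data[lo:hi]  # partial read, stays inside one chunk
        # Scatter back into the caller's original order.
        out[order[start:stop]] = block[sel - lo]
    return out


def take_nd_by_chunk(data, coords, chunks):
    """Several integer arrays: sort points by flat chunk id, read whole chunks."""
    coords = tuple(np.asarray(x, dtype=np.int64) for x in coords)
    # Number of chunks per dimension (ceiling division).
    nchunks = tuple(-(-s // c) for s, c in zip(data.shape, chunks))
    flat_chunk = np.ravel_multi_index(
        tuple(x // c for x, c in zip(coords, chunks)), nchunks
    )
    order = np.argsort(flat_chunk, kind="stable")
    # Per-chunk counts in one pass with np.bincount instead of np.unique.
    counts = np.bincount(flat_chunk, minlength=int(np.prod(nchunks)))
    offsets = np.concatenate(([0], np.cumsum(counts)))
    out = np.empty(len(flat_chunk), dtype=data.dtype)
    for cid in np.flatnonzero(counts):
        pts = order[offsets[cid]:offsets[cid + 1]]
        origin = tuple(
            int(i) * c for i, c in zip(np.unravel_index(cid, nchunks), chunks)
        )
        # Load the whole chunk (NumPy clips slices at the array edge) ...
        block = data[tuple(slice(o, o + c) for o, c in zip(origin, chunks))]
        # ... then index into it with chunk-local coordinates.
        out[pts] = block[tuple(x[pts] - o for x, o in zip(coords, origin))]
    return out


data1 = np.arange(100) * 10
idx = [57, 3, 58, 99, 3, 12]
r1 = take_1d_sorted(data1, idx, 16)
assert np.array_equal(r1, data1[idx])

data2 = np.arange(10 * 12).reshape(10, 12)
rows, cols = [9, 0, 4, 9], [11, 0, 5, 2]
r2 = take_nd_by_chunk(data2, (rows, cols), (4, 5))
assert np.array_equal(r2, data2[rows, cols])
```

`np.unique(flat_chunk, return_counts=True)` would give the same nonzero counts, but, as noted above, it sorts, duplicating work the `argsort` has already done. `np.bincount` counts in a single pass at the cost of an array as long as the total number of chunks.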

Further improvements could include finding a way to use np.bincount directly, since I am still not fully convinced that np.unique is fast. One could also rewrite the code to interleave slices and integer arrays (matching NumPy). ndindex.expand also seems to consume quite a lot of memory (via copying) when broadcasting indices; I think np.broadcast would just produce a view, so it might be better. Finally, one could rewrite get_slice_numpy to allow contiguous, flat returns, not only slices (a sequence of slices, which is multidimensional and "rectangular", differs from a range from one multidimensional index to another, which is essentially 1D). See #441.
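The memory point about broadcasting can be illustrated with NumPy alone. Note that `np.broadcast` itself only builds a broadcast object (shape and iterator); the functions that return broadcast views are `np.broadcast_arrays` and `np.broadcast_to`. This sketch does not inspect ndindex.expand; it only contrasts a broadcast view with a materialized copy, the kind of copy the text suspects is happening.

```python
import numpy as np

rows = np.arange(1000, dtype=np.int64).reshape(-1, 1)  # shape (1000, 1)
cols = np.arange(1000, dtype=np.int64).reshape(1, -1)  # shape (1, 1000)

# np.broadcast_arrays returns views: no new index data is allocated,
# the broadcast axis simply gets a stride of 0.
brows, bcols = np.broadcast_arrays(rows, cols)
print(brows.shape, brows.strides)         # (1000, 1000) (8, 0)
print(np.shares_memory(brows, rows))      # True

# Materializing the broadcast index allocates the full array for each index.
full_rows = np.ascontiguousarray(brows)
print(full_rows.nbytes)                   # 8000000
print(np.shares_memory(full_rows, rows))  # False
```

For two 1000-element index arrays, the views cost nothing extra, whereas each materialized index costs 8 MB.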

From the plot one can see that the new algorithm is as fast as the old optimised 1D path, while handling more cases.

FrancescAlted

Looks great to me. It would be interesting to see how the new performance compares with h5py and zarr, for reference.

bench/ndarray/fancy_index.py

src/blosc2/ndarray.py

tests/ndarray/test_ndarray.py

Luke Shaw and others added 9 commits August 13, 2025 10:13

Improve handling of 1d keys

b0f6ecd

Merge branch 'main' into fancyIndex

5926e8a

Passes all findex tests

8a82c0c

Uncomment tests

1fc4732

Now passes tests

3507e80

Streamlining code and trying to add 1D fast path

5a14db4

Minor memory optimisations

0883a1c

Added 1D fast path using bincount

4ffe594

Streamline code a little

75228d3

lshaw8317 marked this pull request as ready for review August 26, 2025 08:08

FrancescAlted approved these changes Aug 26, 2025

Luke Shaw added 2 commits August 26, 2025 11:09

Cleaning up code

6c2371b

Merge branch 'main' of github.com:Blosc/python-blosc2 into fancyIndex

3e4ea86

lshaw8317 mentioned this pull request Jul 25, 2025

Optimise fancy indexing further #441

Open

7 tasks

Edits to fancy_index.py

47bcb89

lshaw8317 merged commit 142f903 into main Aug 26, 2025
12 of 13 checks passed

lshaw8317 deleted the fancyIndex branch August 26, 2025 11:38
