
Commit a516232

Explain why we are using blocks for chunks in h5py and zarr
1 parent 3f2a7fc


bench/large-tree-store.py

Lines changed: 6 additions & 2 deletions
@@ -12,6 +12,10 @@
 This benchmark creates N numpy arrays with sizes following a normal distribution
 and measures the time and memory consumption for storing them in TreeStore, h5py, and zarr.

+The arrays in h5py/zarr are compressed with the same defaults as in TreeStore.
+Moreover, the chunks for storing arrays in h5py/zarr are set to Blosc2's blocks
+(first partition) which should lead to same compression ratio as in TreeStore.
+
 Note: This adapts to zarr v3+ API if available.
 """

@@ -154,7 +158,7 @@ def store_arrays_in_h5py(arrays, output_file):
     else:
         grp = f[group_name]

-    # Store array with compression
+    # Store array with compression; use arr.blocks (first partition in Blosc2) as chunks
     grp.create_dataset(dataset_name, data=arr[:],
                        # compression="gzip", shuffle=True,
                        # To compare apples with apples, use Blosc2 compression with Zstd compression
@@ -213,7 +217,7 @@ def store_arrays_in_zarr(arrays, output_dir):
     else:
         grp = root[group_name]

-    # Store array with blosc2 compression
+    # Store array with blosc2 compression; use arr.blocks (first partition in Blosc2) as chunks
    if zarr.__version__ >= "3":
        grp.create_array(
            name=dataset_name,
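For readers who want to see the idea outside the diff, below is a minimal sketch (not part of the commit) of wiring each backend's chunk shape to Blosc2's block shape. It assumes arr is a blosc2.NDArray, whose .blocks attribute holds its block shape; the file names, the dataset name, and the use of hdf5plugin.Blosc2 for HDF5 compression are illustrative assumptions rather than details taken from the benchmark script.

import blosc2
import h5py
import hdf5plugin  # assumption: supplies the Blosc2 filter for h5py
import numpy as np
import zarr

# A small Blosc2 array; .chunks and .blocks describe its two-level partitioning
arr = blosc2.asarray(np.random.normal(size=(1_000, 1_000)))

# h5py: reuse Blosc2's block shape as the HDF5 chunk shape
with h5py.File("example.h5", "w") as f:
    f.create_dataset("data", data=arr[:], chunks=arr.blocks,
                     **hdf5plugin.Blosc2(cname="zstd"))

# zarr (v3 API, matching the version check in the benchmark):
# the zarr chunk shape follows Blosc2's blocks as well
root = zarr.open_group("example.zarr", mode="w")
z = root.create_array(name="data", shape=arr.shape, dtype=arr.dtype,
                      chunks=arr.blocks)
z[:] = arr[:]

Matching the chunk shape to Blosc2's blocks keeps the unit of compression comparable across TreeStore, h5py, and zarr, which is what the "apples with apples" comment and the new docstring lines are aiming at.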
