@keewis (Collaborator) commented Oct 27, 2025

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

Building on zarr-developers/zarr-python#3534, this draft PR allows writing variable-sized chunks to Zarr.
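With regular chunks, finding the chunk that holds a given index is plain integer division by the chunk size. With variable-sized chunks, a lookup needs the cumulative chunk offsets instead. A minimal sketch in plain Python; both function names are hypothetical and not part of xarray or zarr-python:

```python
import bisect
from itertools import accumulate


def chunk_index_regular(index: int, chunk_size: int) -> int:
    # Regular grid: every chunk (except possibly the last) has the same size.
    return index // chunk_size


def chunk_index_variable(index: int, chunk_sizes: tuple[int, ...]) -> int:
    # Variable grid: compute the end offset of each chunk, e.g.
    # (31, 28, 31) -> [31, 59, 90], then find the first end offset > index.
    ends = list(accumulate(chunk_sizes))
    if not 0 <= index < ends[-1]:
        raise IndexError(f"index {index} out of bounds for size {ends[-1]}")
    return bisect.bisect_right(ends, index)


print(chunk_index_regular(15, 10))             # 1
print(chunk_index_variable(40, (31, 28, 31)))  # 1 (0-based index 40 falls in February)
```

The prefix sums only need to be computed once per dimension, so each lookup stays logarithmic in the number of chunks.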

To see this in action, try:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray @ git+https://github.com/keewis/xarray.git@variable-chunking",
#   "zarr @ git+https://github.com/jhamman/zarr-python.git@feature/rectilinear-chunk-grid",
# ]
# ///

import numpy as np
import xarray as xr

rng = np.random.default_rng(seed=0)
values = rng.normal(size=(365, 20))

ds = xr.Dataset(
    {"a": (["time", "x"], values)},
    coords={"time": xr.date_range("2025-01-01", freq="d", periods=365)}
)
chunked = ds.chunk({"time": xr.groupers.TimeResampler(freq="ME"), "x": 10})

chunked.to_zarr(
    "variable_chunks.zarr",
    mode="w",
    safe_chunks=False,
    zarr_format=3,
    consolidated=False,
)

ds = xr.open_dataset("variable_chunks.zarr", engine="zarr", chunks={})
print(ds.chunksizes)
# Frozen({'time': (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31), 'x': (10, 10)})
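The time chunk sizes in the printed output are the month lengths of 2025, because the TimeResampler(freq="ME") grouper puts each calendar month of the daily axis into its own chunk. A quick standard-library cross-check; the helper name is hypothetical:

```python
import calendar


def monthly_chunk_sizes(year: int) -> tuple[int, ...]:
    # calendar.monthrange returns (weekday of the 1st, number of days);
    # the number of days per month is the expected chunk size along "time".
    return tuple(calendar.monthrange(year, month)[1] for month in range(1, 13))


print(monthly_chunk_sizes(2025))
# (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
```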

At the moment, this requires safe_chunks=False because I haven't changed the chunk alignment machinery yet.
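For context on safe_chunks: the check guards against several Dask chunks writing into the same Zarr chunk in parallel, which requires every Dask chunk boundary to coincide with a Zarr chunk boundary. A simplified sketch of that condition for a single dimension, written so it also accepts variable-sized Zarr chunks. The helpers are hypothetical and deliberately ignore the special handling of region and append writes:

```python
from itertools import accumulate


def boundaries(chunk_sizes):
    # Interior chunk boundaries along one dimension, e.g. (3, 4, 2) -> {3, 7}.
    return set(list(accumulate(chunk_sizes))[:-1])


def dask_chunks_aligned(dask_chunks, zarr_chunks):
    # Safe for parallel writes if no Zarr chunk is split between two Dask
    # chunks, i.e. every Dask boundary is also a Zarr boundary.
    if sum(dask_chunks) != sum(zarr_chunks):
        raise ValueError("chunks must cover the same dimension length")
    return boundaries(dask_chunks) <= boundaries(zarr_chunks)


months = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
print(dask_chunks_aligned(months, months))      # True
print(dask_chunks_aligned((59, 306), months))   # True
print(dask_chunks_aligned((100, 265), months))  # False
```

Because the condition is expressed through boundary sets rather than a single chunk size, it does not assume the Zarr chunks are regular.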

cc @d-v-b, @jhamman, @dcherian

github-actions bot added the io, topic-backends, and topic-zarr labels Oct 27, 2025
# while dask chunks can be variable sized
# https://dask.pydata.org/en/latest/array-design.html#chunks
if var_chunks and not enc_chunks:
    if zarr_format == 3:
@keewis (Collaborator Author) commented Oct 27, 2025

This check is probably not sufficient.
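The snippet above branches on zarr_format alone. One way to see why that may not be enough: whether a chunk layout fits a regular chunk grid depends on the chunk sizes themselves (the same two conditions the Zarr v2 error messages in the reviewed hunk check), and only layouts that fail them need a rectilinear grid. A hypothetical sketch of that decision; the returned dicts are made up for illustration and are not the metadata format used by zarr-python#3534:

```python
def is_regular(sizes):
    # Fits a regular grid if all chunks but the last share one size and the
    # last chunk is no larger than the first (mirrors the Zarr v2 checks).
    return len(sizes) <= 1 or (len(set(sizes[:-1])) == 1 and sizes[-1] <= sizes[0])


def describe_chunk_grid(var_chunks):
    # var_chunks: Dask-style chunks, one tuple of sizes per dimension.
    if all(is_regular(sizes) for sizes in var_chunks):
        # A single chunk shape (the first chunk along each dimension) suffices.
        return {"name": "regular", "chunk_shape": tuple(sizes[0] for sizes in var_chunks)}
    # Otherwise every chunk size along every dimension has to be recorded.
    return {"name": "rectilinear", "chunk_sizes": [list(sizes) for sizes in var_chunks]}


print(describe_chunk_grid(((10, 10), (20,))))
# {'name': 'regular', 'chunk_shape': (10, 20)}
print(describe_chunk_grid(((31, 28, 31), (10, 10)))["name"])
# rectilinear
```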

@keewis marked this pull request as draft October 27, 2025 16:34
@jhamman (Member) commented Oct 27, 2025

> We need zarr-python>=3, which doesn't work with @jhamman's fork because it doesn't have tags for versions above 3.0.0b2

I just pushed tags to my fork!

@keewis (Collaborator Author) commented Oct 27, 2025

Thanks, I've changed the example back to using your fork.

Comment on lines 307 to +315
 if any(len(set(chunks[:-1])) > 1 for chunks in var_chunks):
     raise ValueError(
-        "Zarr requires uniform chunk sizes except for final chunk. "
+        "Zarr v2 requires uniform chunk sizes except for final chunk. "
         f"Variable named {name!r} has incompatible dask chunks: {var_chunks!r}. "
         "Consider rechunking using `chunk()`."
     )
 if any((chunks[0] < chunks[-1]) for chunks in var_chunks):
     raise ValueError(
-        "Final chunk of Zarr array must be the same size or smaller "
+        "Final chunk of a Zarr v2 array must be the same size or smaller "
A member commented:

This is not correct: unfortunately, it's not as simple as "Zarr V3 supports variable-length chunking but Zarr V2 doesn't".


Labels: io, topic-backends, topic-zarr (Related to zarr storage library)

3 participants