[do not merge] proof of concept for unified v2 / v3 codecs #3276

Draft: wants to merge 130 commits into base: main

Commits
32e60d2
modernize typing
d-v-b Feb 21, 2025
f104f27
lint
d-v-b Feb 24, 2025
be6dedd
new dtypes
d-v-b Feb 26, 2025
f0dfbbf
rename base dtype, change type to kind
d-v-b Feb 26, 2025
06db4f6
start working on JSON serialization
d-v-b Feb 27, 2025
2bb4707
get json de/serialization largely working, and start making tests pass
d-v-b Feb 27, 2025
edcb7eb
tweak json type guards
d-v-b Feb 27, 2025
3fd0bf8
fix dtype sizes, adjust fill value parsing in from_dict, fix tests
d-v-b Feb 27, 2025
404a71c
mid-refactor commit
d-v-b Mar 2, 2025
aaeeb98
working form for dtype classes
d-v-b Mar 2, 2025
ec934b8
remove unused code
d-v-b Mar 2, 2025
8369ffc
use wrap / unwrap instead of to_dtype / from_dtype; push into v2 code…
d-v-b Mar 2, 2025
0aa1e49
push into v2
d-v-b Mar 3, 2025
de24a14
remove endianness kwarg to methods, make it an instance variable instead
d-v-b Mar 3, 2025
31a39d6
make wrapping safe by default
d-v-b Mar 4, 2025
2079efe
dtype-specific tests
d-v-b Mar 4, 2025
46a761b
more tests, fix void type default value logic
d-v-b Mar 5, 2025
3507eff
fix dtype mechanics in bytescodec
d-v-b Mar 5, 2025
53205ca
remove __post_init__ magic in favor of more explicit declaration
d-v-b Mar 7, 2025
ba9c06e
fix tests
d-v-b Mar 9, 2025
04f3b84
refactor data types
d-v-b Mar 12, 2025
925b9e2
start design doc
d-v-b Mar 13, 2025
e2fce7f
more design doc
d-v-b Mar 13, 2025
a583cd3
update docs
d-v-b Mar 13, 2025
e0b662d
fix sphinx warnings
d-v-b Mar 13, 2025
ed0c76b
tweak docs
d-v-b Mar 13, 2025
79a8fd2
info about v3 data types
d-v-b Mar 13, 2025
5e15369
adjust note
d-v-b Mar 13, 2025
14da662
fix: use unparametrized types in direct assignment
d-v-b Mar 13, 2025
a050f3b
start fixing config
d-v-b Mar 17, 2025
48f883b
Update src/zarr/core/_info.py
d-v-b Mar 17, 2025
6c70eac
add placeholder disclaimer to v3 data types summary
d-v-b Mar 17, 2025
fe2754a
make example runnable
d-v-b Mar 17, 2025
eb48a3c
placeholder section for adding a custom dtype
d-v-b Mar 17, 2025
d5e376f
define native data type and native scalar
d-v-b Mar 17, 2025
fc3297a
update data type names
d-v-b Mar 17, 2025
9c13c85
fix config test failures
d-v-b Mar 17, 2025
3e0e61b
call to_dtype once in blosc evolve_from_array_spec
d-v-b Mar 17, 2025
a8d815a
refactor dtypewrapper -> zdtype
d-v-b Mar 19, 2025
d90e6a0
update code examples in docs; remove native endianness
d-v-b Mar 19, 2025
b66f077
adjust type annotations
d-v-b Mar 20, 2025
9eae82a
fix info tests to use zdtype
d-v-b Mar 20, 2025
d6727a3
remove dead code and add code coverage exemption to zarr format checks
d-v-b Mar 20, 2025
8b517ee
fix: add special check for resolving int32 on windows
d-v-b Mar 20, 2025
97c645b
add dtype entry point test
d-v-b Mar 20, 2025
5d617e5
remove default parameters for parametric dtypes; add mixin classes fo…
d-v-b Mar 21, 2025
2fef5b2
refactor: use inheritance to remove boilerplate in dtype definitions
d-v-b Mar 24, 2025
2315068
Update docs/user-guide/data_types.rst
d-v-b Mar 24, 2025
13af518
update data types documentation, and expose core/dtype module to autodoc
d-v-b Mar 24, 2025
9a2eb93
add failing endianness round-trip test
d-v-b Mar 24, 2025
803b14d
fix endianness
d-v-b Mar 24, 2025
5bef120
additional check in test_explicit_endianness
d-v-b Mar 24, 2025
4dc9cd2
add failing test for round-tripping vlen strings
d-v-b Mar 24, 2025
e31e813
route object dtype arrays to vlen string dtype when numpy > 2
d-v-b Mar 25, 2025
844a94d
relax endianness mismatch to a warning instead of an error
d-v-b Mar 25, 2025
aa19cca
use public dtype module for docs instead of special-casing the core d…
d-v-b Mar 25, 2025
528cf28
use public dtype module for docs instead of special-casing the core d…
d-v-b Mar 25, 2025
cdc83a8
silence mypy error about array indexing
d-v-b Mar 25, 2025
78747c9
add release note
d-v-b Mar 25, 2025
901be0d
fix doctests, excluding config tests
d-v-b Mar 25, 2025
6eb707d
revert addition of linkage between dtype endianness and bytes codec e…
d-v-b Mar 26, 2025
5f0e60f
remove Any types
d-v-b Mar 26, 2025
d8bf274
add docstring for wrapper module
d-v-b Mar 26, 2025
7d6b86e
simplify config and docs
d-v-b Mar 26, 2025
5382e18
update config test
d-v-b Mar 26, 2025
233e051
fix S dtype test for v2
d-v-b Mar 26, 2025
a3a17df
fully remove v3jsonencoder
d-v-b Apr 28, 2025
421cf0b
refactor dtype module structure
d-v-b Apr 29, 2025
317f5cc
add timedelta64
d-v-b Apr 29, 2025
1dd36b3
refactor time dtypes
d-v-b Apr 30, 2025
b91ebb6
widen dtype test strategies
d-v-b May 1, 2025
5a2c48d
wip: begin creating isomorphic test suite for dtypes
d-v-b May 2, 2025
4c67302
finish common tests
d-v-b May 2, 2025
4140ca0
wip: test infrastructure for dtypes
d-v-b May 7, 2025
b1aa6ae
wip: use class-based tests for all dtypes
d-v-b May 7, 2025
813a3b9
fill out more tests, and adjust sized dtypes
d-v-b May 8, 2025
a832110
wip: json schema test
d-v-b May 12, 2025
557ecdd
add casting tests
d-v-b May 13, 2025
3484a1c
use relative link for changes
d-v-b May 13, 2025
b58346a
typo
d-v-b May 13, 2025
aa156f2
make bytes codec dtype logic a bit more literate
d-v-b May 13, 2025
5c51c52
increase deadline to 500ms
d-v-b May 13, 2025
0a2b567
fewer commented sections of problematic lru_store_cache section of th…
d-v-b May 13, 2025
4b2b6ec
add link to gh issue about lru_cache for sharding codec
d-v-b May 13, 2025
b737e67
attempt to speed up hypothesis tests by reducing max array size
d-v-b May 13, 2025
d4615e0
clean up docs
d-v-b May 13, 2025
aafb348
remove placeholder
d-v-b May 13, 2025
3ba3c22
make final example section doctested and more readable
d-v-b May 13, 2025
d5154c0
revert change to auto chunking
d-v-b May 13, 2025
d936c0e
revert quotation of literal type
d-v-b May 13, 2025
906caf7
lint
d-v-b May 13, 2025
6d34f7e
fix broken code block
d-v-b May 13, 2025
ef1c722
specialize test to handle stringdtype changes coming in numpy 2.3
d-v-b May 13, 2025
b4f2a59
add docstring to _TestZDType class
d-v-b May 13, 2025
c5cacca
type hints
d-v-b May 15, 2025
20e45a2
add numcodecs protocol
d-v-b May 16, 2025
165d106
expand changelog
d-v-b May 16, 2025
e1c7fbc
tweak docstring
d-v-b May 16, 2025
0101776
support v3 nan strings in JSON for float dtypes
d-v-b May 19, 2025
1f09128
revert removal of metadata chunk grid attribute
d-v-b May 21, 2025
cc6d741
use none to denote default fill value; remove old structured tests; u…
d-v-b May 22, 2025
b12e30c
add item size abstraction
d-v-b May 22, 2025
deb3068
rename fixed-length string dtypes, and be strict about the numpy obje…
d-v-b May 22, 2025
2b725ee
remove vestigial use of to_dtype().itemsize()
d-v-b May 22, 2025
03259c6
remove another vestigial use of to_dtype().itemsize()
d-v-b May 22, 2025
9a87b3d
emit warning about unstable dtype when serializing Structured dtype t…
d-v-b May 23, 2025
de76df0
put string dtypes in the strings module
d-v-b May 24, 2025
b4f1063
make tests isomorphic to source code
d-v-b May 24, 2025
7b6c78c
remove old string logic
d-v-b May 25, 2025
63ad7f5
use scale_factor and unit in cast_value for datetime
d-v-b May 26, 2025
e0b5a64
add regression testing against v2.18
d-v-b May 27, 2025
6437c8d
truncate U and S scalars in _cast_value_unsafe
d-v-b May 27, 2025
d9ab8da
docstrings and simplification for regression tests
d-v-b May 27, 2025
3302161
changes necessary for linting with regression tests
d-v-b May 27, 2025
4a301d9
improve method names, refactor type hints with typeddictionaries, fix…
d-v-b May 29, 2025
12bbb07
fix storage info discrepancy in docs
d-v-b May 29, 2025
463789b
fix docstring that was troubling sphinx
d-v-b May 29, 2025
e665cef
wip: add vlen-bytes
d-v-b May 29, 2025
35116af
add vlen-bytes
d-v-b May 29, 2025
73c3c45
wip
d-v-b Jun 26, 2025
6295578
wip
d-v-b Jun 30, 2025
64f234e
add image codecs test
d-v-b Jul 3, 2025
6eb3298
wip
d-v-b Jul 10, 2025
e463d0a
pass tests
d-v-b Jul 20, 2025
2cfc848
expand example
d-v-b Jul 20, 2025
60939c2
revert to main
d-v-b Jul 21, 2025
31c95ca
recover from bad rebase
d-v-b Jul 21, 2025
a2bc655
remove off-target changes
d-v-b Jul 21, 2025
50c6b48
update imagecodecs example
d-v-b Jul 22, 2025
9055a1a
Merge branch 'main' into feat/numcodecs-compat
d-v-b Aug 4, 2025

51 changes: 51 additions & 0 deletions examples/image_codecs.py
@@ -0,0 +1,51 @@
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+# "zarr @ git+https://github.com/d-v-b/zarr-python.git@a2bc6555",
+# "imagecodecs==2025.3.30",
+# "pytest"
+# ]
+# ///
+
+# "zarr @ git+https://github.com/zarr-developers/zarr-python.git@main",
+from typing import Literal
+
+import numcodecs
+import numpy as np
+import pytest
+from imagecodecs.numcodecs import Jpeg
+
+import zarr
+
+numcodecs.register_codec(Jpeg)
+jpg_codec = Jpeg()
+
+
+@pytest.mark.parametrize("zarr_format", [2, 3])
+def test(zarr_format: Literal[2, 3]) -> None:
+    store = {}
+    if zarr_format == 2:
+        z_w = zarr.create_array(
+            store=store,
+            data=np.zeros((100, 100, 3), dtype=np.uint8),
+            compressors=jpg_codec,
+            zarr_format=zarr_format,
+        )
+    else:
+        z_w = zarr.create_array(
+            store=store,
+            data=np.zeros((100, 100, 3), dtype=np.uint8),
+            serializer=jpg_codec,
+            zarr_format=zarr_format,
+        )
+    z_w[:] = 2
+    z_r = zarr.open_array(store=store, zarr_format=zarr_format)
+    assert np.all(z_r[:] == 2)
+    if zarr_format == 2:
+        print(z_r.metadata.to_dict()["compressor"])
+    else:
+        print(z_r.metadata.to_dict()["codecs"])
+
+
+if __name__ == "__main__":
+    pytest.main([__file__, f"-c {__file__}", "-s"])
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -82,7 +82,8 @@ test = [
     "pytest-xdist",
     "packaging",
     "tomlkit",
-    "uv"
+    "uv",
+    "imagecodecs"
 ]
 remote_tests = [
     'zarr[remote]',
@@ -383,6 +384,7 @@ module = [
     "tests.test_indexing",
     "tests.test_properties",
     "tests.test_sync",
+    "tests.test_v2",
     "tests.test_regression.scripts.*"
 ]
 ignore_errors = true
61 changes: 59 additions & 2 deletions src/zarr/abc/codec.py
@@ -1,11 +1,21 @@
 from __future__ import annotations

 from abc import abstractmethod
-from typing import TYPE_CHECKING, Generic, TypeVar
+from collections.abc import Mapping
+from typing import (
+    TYPE_CHECKING,
+    Generic,
+    Literal,
+    TypedDict,
+    TypeVar,
+    overload,
+)
+
+from typing_extensions import ReadOnly

 from zarr.abc.metadata import Metadata
 from zarr.core.buffer import Buffer, NDBuffer
-from zarr.core.common import ChunkCoords, concurrent_map
+from zarr.core.common import ChunkCoords, NamedConfig, ZarrFormat, concurrent_map
 from zarr.core.config import config

 if TYPE_CHECKING:
@@ -34,6 +44,21 @@
 CodecInput = TypeVar("CodecInput", bound=NDBuffer | Buffer)
 CodecOutput = TypeVar("CodecOutput", bound=NDBuffer | Buffer)

+TName = TypeVar("TName", bound=str, covariant=True)
+
+
+class CodecJSON_V2(TypedDict, Generic[TName]):
+    id: ReadOnly[TName]
+
+
+CodecConfig_V3 = NamedConfig[str, Mapping[str, object]]
+
+CodecJSON_V3 = str | CodecConfig_V3
+
+# The widest type we will accept for a codec JSON
+# This covers v2 and v3
+CodecJSON = str | Mapping[str, object]
+

 class BaseCodec(Metadata, Generic[CodecInput, CodecOutput]):
     """Generic base class for codecs.
@@ -157,6 +182,34 @@ async def encode(
         """
         return await _batching_helper(self._encode_single, chunks_and_specs)

+    @overload
+    def to_json(self, zarr_format: Literal[2]) -> CodecJSON_V2[str]: ...
+    @overload
+    def to_json(self, zarr_format: Literal[3]) -> NamedConfig[str, Mapping[str, object]]: ...
+
+    def to_json(
+        self, zarr_format: ZarrFormat
+    ) -> CodecJSON_V2[str] | NamedConfig[str, Mapping[str, object]]:
+        raise NotImplementedError
+
+    @classmethod
+    def _from_json_v2(cls, data: CodecJSON) -> Self:
+        raise NotImplementedError
+
+    @classmethod
+    def _from_json_v3(cls, data: CodecJSON) -> Self:
+        raise NotImplementedError
+
+    @classmethod
+    def from_json(cls, data: CodecJSON, zarr_format: ZarrFormat) -> Self:
+        if zarr_format == 2:
+            return cls._from_json_v2(data)
+        elif zarr_format == 3:
+            return cls._from_json_v3(data)
+        raise ValueError(
+            f"Unsupported Zarr format {zarr_format}. Expected 2 or 3."
+        )  # pragma: no cover
+

 class ArrayArrayCodec(BaseCodec[NDBuffer, NDBuffer]):
     """Base class for array-to-array codecs."""
@@ -447,3 +500,7 @@ async def wrap(chunk: CodecInput | None, chunk_spec: ArraySpec) -> CodecOutput |
         return await func(chunk, chunk_spec)

     return wrap
+
+
+# Raised when a codec JSON data is invalid
+class CodecValidationError(ValueError): ...
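
The `to_json` / `from_json` hooks added to `BaseCodec` above raise `NotImplementedError`, so each concrete codec is expected to supply its own v2 and v3 forms. As a rough, standalone illustration of that dispatch pattern (not code from this PR), the sketch below uses a hypothetical `ToyGzip` class that does not import zarr. It assumes the v2 form is a flat mapping keyed by `id` and that the v3 form is either a bare name string or a `name` / `configuration` mapping, which is how `CodecJSON_V2` and `NamedConfig` appear to be shaped in the diff. The type aliases and the exception class are redefined locally and only mirror the names in the diff.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from typing import Literal, Union

# Local stand-ins that mirror names from the diff; they are not imported from zarr.
ZarrFormat = Literal[2, 3]
CodecJSON = Union[str, Mapping[str, object]]


class CodecValidationError(ValueError):
    """Raised when codec JSON is invalid (same name as the class in the diff)."""


@dataclass(frozen=True)
class ToyGzip:
    """Hypothetical codec with a single parameter, used only for illustration."""

    level: int = 5

    def to_json(self, zarr_format: ZarrFormat) -> dict[str, object]:
        if zarr_format == 2:
            # Assumed v2 shape: flat mapping keyed by "id".
            return {"id": "gzip", "level": self.level}
        if zarr_format == 3:
            # Assumed v3 shape: "name" plus a nested "configuration" mapping.
            return {"name": "gzip", "configuration": {"level": self.level}}
        raise ValueError(f"Unsupported Zarr format {zarr_format}. Expected 2 or 3.")

    @classmethod
    def _from_json_v2(cls, data: CodecJSON) -> ToyGzip:
        if isinstance(data, Mapping) and data.get("id") == "gzip":
            level = data.get("level", 5)
            if isinstance(level, int):
                return cls(level=level)
        raise CodecValidationError(f"Invalid v2 JSON for ToyGzip: {data!r}")

    @classmethod
    def _from_json_v3(cls, data: CodecJSON) -> ToyGzip:
        if data == "gzip":
            # A bare string selects the codec with default parameters.
            return cls()
        if isinstance(data, Mapping) and data.get("name") == "gzip":
            config = data.get("configuration", {})
            if isinstance(config, Mapping):
                level = config.get("level", 5)
                if isinstance(level, int):
                    return cls(level=level)
        raise CodecValidationError(f"Invalid v3 JSON for ToyGzip: {data!r}")

    @classmethod
    def from_json(cls, data: CodecJSON, zarr_format: ZarrFormat) -> ToyGzip:
        # Same dispatch as BaseCodec.from_json in the diff.
        if zarr_format == 2:
            return cls._from_json_v2(data)
        if zarr_format == 3:
            return cls._from_json_v3(data)
        raise ValueError(f"Unsupported Zarr format {zarr_format}. Expected 2 or 3.")


# Round-trip the same codec instance through both JSON forms.
codec = ToyGzip(level=9)
for fmt in (2, 3):
    assert ToyGzip.from_json(codec.to_json(fmt), zarr_format=fmt) == codec
```

Keeping the per-format parsing in `_from_json_v2` and `_from_json_v3` means one codec class can serve both formats, which seems to be the point of the proof of concept; whether real codecs validate their input as strictly as this toy does is not shown in the diff.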
2 changes: 1 addition & 1 deletion src/zarr/codecs/_v2.py
@@ -19,7 +19,7 @@


 @dataclass(frozen=True)
-class V2Codec(ArrayBytesCodec):
+class _V2Codec(ArrayBytesCodec):
     filters: tuple[numcodecs.abc.Codec, ...] | None
     compressor: numcodecs.abc.Codec | None
