Conversation

@hmaarrfk (Contributor) commented Jan 1, 2024

closes #98

@mverleg (Owner) left a comment

It would be nice to update the readme. I'm happy to do it myself if you prefer.
It would be nice to update the readme. I'm happy to do it myself if you prefer.

else:
return _scalar_to_numpy(data_json, nptype)
# This code path is mostly for 0-dimensional arrays
# numpy scalars are separately decoded
@mverleg (Owner):
What about scalars that were serialized with encode_scalars_inplace before these changes?

elif isinstance(obj, scalar_types):
dct = hashodict((
('__numpy_scalar__', obj.item()),
('dtype', str(obj.dtype)),
@mverleg (Owner):

This new approach seems great and indeed in an ideal world it would work perfectly.

...

However, it was found in issue #18 that Python treats some numpy scalars as primitives, and refuses to call encoders for them (presumably for performance).

Which ones depend on the Python version, for extra confusion, although I guess Python 2 is less important now.

In any case, this function in Python 3 works for a lot of types, but not for float64, which is an important one. There are two concerns with this:

  • While keeping the numpy scalar type is better in general, doing it half the time seems like it adds more confusion than it's worth.
  • We've been doing it this way; it'd be a (slightly) breaking change to start doing it differently.

I think we'll need to think a bit more about this; maybe make it opt-in, or skip the scalars and handle 0-dimensional arrays only, if that's possible.
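[Editor's note] The fast-path behavior described above can be demonstrated without numpy: a pure-Python `float` subclass (standing in for `np.float64`, which also subclasses Python's `float`) is serialized by `json.dumps` without the `default` hook ever being called. A minimal sketch, using a hypothetical `F64` stand-in class:

```python
import json

class F64(float):
    """Stand-in for np.float64, which is also a subclass of float."""

calls = []

def default(obj):
    # In pyjson_tricks this would wrap numpy scalars in a
    # {"__numpy_scalar__": ..., "dtype": ...} dict; here we just record calls.
    calls.append(type(obj).__name__)
    return float(obj)

out = json.dumps({"x": F64(1.5)}, default=default)
print(out)    # {"x": 1.5} -- emitted via the plain-float fast path
print(calls)  # [] -- default() was never invoked for the float subclass
```

Because the encoder's type check is `isinstance(obj, float)`, subclass instances never reach `default`, which is exactly why `float64` comes out bare while `float32` gets wrapped.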

@hmaarrfk (Contributor, Author):

omg, I feel so ashamed not to have tested float64....

I literally tested uint8 through uint64, int8 through int64, datetime64s, and float32.... but not float64....

For reference, this is the current output:

{
    "uint32": {
        "__numpy_scalar__": 1,
        "dtype": "uint32"
    },
    "int32": {
        "__numpy_scalar__": 1,
        "dtype": "int32"
    },
    "float32": {
        "__numpy_scalar__": 1.0,
        "dtype": "float32"
    },
    "float64": 1.0,
    "datetime64[ns]": {
        "__numpy_scalar__": 1704235669639528000,
        "dtype": "datetime64[ns]"
    },
    "datetime64[us]": {
        "__numpy_scalar__": {
            "__datetime__": null,
            "year": 2024,
            "month": 1,
            "day": 2,
            "hour": 22,
            "minute": 47,
            "second": 49,
            "microsecond": 639546
        },
        "dtype": "datetime64[us]"
    }
}

@hmaarrfk hmaarrfk force-pushed the serialize_scalars branch 3 times, most recently from 0061bd4 to 3f9a396 Compare January 3, 2024 03:10
@hmaarrfk (Contributor, Author) commented Jan 3, 2024

The test you added in ac5c8be is in direct conflict with the spirit here of serializing numpy dtypes in a round-trip way.

@mverleg (Owner) commented Jan 4, 2024

The test you added in ac5c8be is in direct conflict with the spirit here of serializing numpy dtypes in a round-trip way.

Yeah I get that, but I think it could be useful to preserve backwards compatibility (and test to ensure we do so). That's what I meant in the previous review.

We've been doing it this way; it'd be a (slightly) breaking change to start doing it differently.

Maybe it's useful enough to break compatibility but I'm not sure. I guess people who really like the old format could use the primitives=True flag. I'll think about it.

@mverleg (Owner) commented Jan 26, 2024

I think it's best to keep this one for consideration in version 4.0 because of the backwards-compatibility concerns.

@mverleg mverleg added the v4.0 label Jan 26, 2024
@hmaarrfk (Contributor, Author):
Honestly, I think the challenges with float vs np.float64 occur because the default json encoder is used as a base. If you want, we can take a heavy-handed approach, effectively copying _make_iterencode and adding a stricter check against the float type.
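[Editor's note] A sketch of that heavy-handed approach (not pyjson_tricks code; `F64` stands in for `np.float64` and `default` is a hypothetical scalar handler): a recursive encoder that uses a strict `type(obj) is float` check, so float subclasses are routed through the handler instead of the plain-float fast path.

```python
import json

def encode(obj, default):
    """Recursively encode obj, but use strict type checks so that float
    *subclasses* reach `default` instead of the plain-float fast path
    (unlike the isinstance check in json's _make_iterencode)."""
    if type(obj) in (float, int, str, bool) or obj is None:
        return json.dumps(obj)
    if isinstance(obj, float):  # a subclass such as np.float64
        return encode(default(obj), default)
    if isinstance(obj, dict):
        return "{" + ", ".join(
            json.dumps(str(k)) + ": " + encode(v, default)
            for k, v in obj.items()) + "}"
    if isinstance(obj, (list, tuple)):
        return "[" + ", ".join(encode(v, default) for v in obj) + "]"
    return encode(default(obj), default)

class F64(float):
    """Stand-in for np.float64."""

def default(obj):
    # hypothetical handler mirroring the __numpy_scalar__ wrapping above
    if isinstance(obj, float):
        return {"__numpy_scalar__": float(obj), "dtype": "float64"}
    raise TypeError(type(obj))

encoded = encode({"x": F64(1.0), "y": 2.0}, default)
```

Under this scheme the `F64` value gets wrapped while the plain `2.0` stays bare, which is the consistency the thread is after; the cost is reimplementing (and maintaining) the encode loop.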

Ultimately, you could "solve" this issue by switching to another JSON serializer, such as orjson.

Your choice.

This isn't a deal-breaker for me at this exact moment; however, round-tripping is quite important to us, and as the challenges increase this might make me look elsewhere as a user.

Understanding your timeline for 4.0 would help us decide whether to stick with pyjson_tricks and continue to contribute to this wonderful project!

@hmaarrfk (Contributor, Author):
This came up again as we were reviewing our test suite and trying to clean up our warnings. It seems the main thing left is deciding how to deal with:

Maybe it's useful enough to break compatibility but I'm not sure. I guess people who really like the old format could use the primitives=True flag. I'll think about it.

Is that in fact the only outstanding issue? It might be possible, if I create enough test cases, to find a pattern for how to address these old datasets.
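[Editor's note] For what it's worth, the two formats can be distinguished mechanically on the decode side: new-format scalars arrive as a `{"__numpy_scalar__": ..., "dtype": ...}` dict, while old-format ones are plain JSON primitives and pass through an `object_hook` untouched. A stdlib-only sketch (with a `(value, dtype)` placeholder standing in for the real `np.dtype(...).type(...)` reconstruction):

```python
import json

def scalar_hook(dct):
    """object_hook for json.loads: restore new-format numpy scalars.
    With numpy available this would return np.dtype(dct["dtype"]).type(value);
    here a (value, dtype) pair stands in as a placeholder."""
    if set(dct) == {"__numpy_scalar__", "dtype"}:
        return (dct["__numpy_scalar__"], dct["dtype"])
    return dct

new_style = '{"x": {"__numpy_scalar__": 1.0, "dtype": "float32"}}'
old_style = '{"x": 1.0}'  # pre-change output: the scalar was stored bare

decoded_new = json.loads(new_style, object_hook=scalar_hook)
decoded_old = json.loads(old_style, object_hook=scalar_hook)
```

The asymmetry, of course, is that old-format data cannot recover its dtype; a tolerant decoder like this preserves backwards compatibility but can only round-trip data written in the new format.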

Merging this pull request may close: Roundtripping inconsistent for objects of shape=() and numpy scalars...