Serialize scalars and 0-dimensional arrays #99
Conversation
mverleg
left a comment
It would be nice to update the readme. I'm happy to do it myself if you prefer.
```python
else:
    return _scalar_to_numpy(data_json, nptype)
    # This code path is mostly for 0-dimensional arrays
    # numpy scalars are separately decoded
```
What about scalars that were serialized with `encode_scalars_inplace` before these changes?
```python
elif isinstance(obj, scalar_types):
    dct = hashodict((
        ('__numpy_scalar__', obj.item()),
        ('dtype', str(obj.dtype)),
```
This new approach seems great and indeed in an ideal world it would work perfectly.
...
However, it was found in issue #18 that Python treats some numpy scalars as primitives and refuses to call encoders for them (presumably for performance).
Which ones depends on the Python version, for extra confusion, although I guess Python 2 is less important now.
In any case, this function in Python 3 works for a lot of types, but not for float64, which is an important one. There are two concerns with this:
- While keeping the numpy scalar type is better in general, doing it only half the time seems like it adds more confusion than it's worth.
- We've been doing it this way, so it'd be a (slightly) breaking change to start doing it differently.
I think we'll need to think a bit more about this; maybe make it opt-in, or skip the scalars and handle 0-dimensional arrays only, if that's possible.
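A minimal sketch of why float64 slips through (using a plain `float` subclass as a stand-in for `np.float64`, which subclasses Python's `float` in Python 3): the `json` encoder serializes native primitives and their subclasses directly and never invokes the `default` hook for them, so a custom encoder never gets the chance to wrap such values.

```python
import json

class FakeFloat64(float):
    """Stand-in for np.float64, which subclasses Python's float."""
    pass

def default(obj):
    # json only calls this hook for types it cannot serialize natively
    raise AssertionError("default() was called")

# A float subclass is emitted as a plain JSON number; default() is skipped,
# so a custom encoder cannot wrap it in a __numpy_scalar__ dict.
print(json.dumps(FakeFloat64(1.0), default=default))  # 1.0
```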
omg, i feel so ashamed to not have tested float64....
I literally tested uint8 through uint64, int8 through int64, datetime64s, and float32.... but not float64....
For reference, this is the current output:
```json
{
  "uint32": {
    "__numpy_scalar__": 1,
    "dtype": "uint32"
  },
  "int32": {
    "__numpy_scalar__": 1,
    "dtype": "int32"
  },
  "float32": {
    "__numpy_scalar__": 1.0,
    "dtype": "float32"
  },
  "float64": 1.0,
  "datetime64[ns]": {
    "__numpy_scalar__": 1704235669639528000,
    "dtype": "datetime64[ns]"
  },
  "datetime64[us]": {
    "__numpy_scalar__": {
      "__datetime__": null,
      "year": 2024,
      "month": 1,
      "day": 2,
      "hour": 22,
      "minute": 47,
      "second": 49,
      "microsecond": 639546
    },
    "dtype": "datetime64[us]"
  }
}
```
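For what it's worth, a decoder for the `__numpy_scalar__` wrapper format shown above could look something like this. This is a hedged sketch, not json-tricks code: `numpy_scalar_hook` is a hypothetical name, and it only handles the simple cases where the wrapped value is a plain number.

```python
import json
import numpy as np

def numpy_scalar_hook(dct):
    # Hypothetical object_hook for the {'__numpy_scalar__': ..., 'dtype': ...}
    # format: rebuild the numpy scalar from its dtype name and Python value.
    if '__numpy_scalar__' in dct:
        return np.dtype(dct['dtype']).type(dct['__numpy_scalar__'])
    return dct

restored = json.loads('{"__numpy_scalar__": 1, "dtype": "uint32"}',
                      object_hook=numpy_scalar_hook)
print(type(restored).__name__, restored)  # uint32 1
```

Note that this simple hook would not cover the `datetime64[us]` case above, where the wrapped value is itself an encoded `__datetime__` dict rather than a plain number.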
The test you added in ac5c8be is in direct conflict with the spirit here of serializing numpy dtypes in a round-trip way.
Yeah, I get that, but I think it could be useful to preserve backwards compatibility (and test to ensure we do so). That's what I meant in the previous review.
Maybe it's useful enough to break compatibility, but I'm not sure. I guess people who really like the old format could use the
I think it's best to keep this one for consideration in version 4.0, because of the backwards compatibility concerns.
Honestly, I think the challenges with
Ultimately, you could "solve" this issue by trying to use another json parser like
Your choice. This isn't a deal breaker for me just at this exact moment; however, round-tripping is something that is quite important to us, and this might make me look elsewhere as a user as the challenges increase. Understanding your timeline for 4.0 would be helpful in us making our decision to stick with
This somewhat came up as we were reviewing our test suite and trying to clean up our warnings. It seems that the main thing left is deciding how to deal with
Is that in fact the only outstanding issue? It might be possible, if I create enough test cases, to find a pattern for how to address these old datasets.
closes #98