-
Couldn't load subscription status.
- Fork 11
Add check for empty datasets in NWB containers #584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Add check for empty datasets in NWB containers #584
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @anoushkajain!
I think this looks good but there are a couple more pieces to add before merging.
- We may need to expand the check to other array-like dataset objects that could be present in NWB files that haven’t been written to disk yet. I've added a suggestion to the code related to this.
- Can you add tests including passing and failing cases with the different types of datasets.
- Can you add a new best practice section to the documentation recommending that objects with empty datasets should not be included in the file.
Also in the future, feel free to @ me or request a review whenever you are ready for a PR to be looked at!
| Inspector messages for each empty dataset found, or None if no empty datasets are found. | ||
| """ | ||
| for field_name, field in getattr(nwb_container, "fields", dict()).items(): | ||
| if not isinstance(field, (h5py.Dataset, zarr.Array)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may also want to check for other empty array-like objects here, including numpy arrays and DataIO objects. Could use something like:
| if not isinstance(field, (h5py.Dataset, zarr.Array)): | |
| is_array_data = hasattr(value, "shape") and hasattr(value, "dtype") | |
| if not is_array_data: |
| continue | ||
|
|
||
| # Check if the dataset has zero elements | ||
| if field.size == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if it will be necessary but for the DataIO objects you may need to use get_data_shape from the nwbinspector.utils module.
| return None | ||
|
|
||
|
|
||
| @register_check(importance=Importance.CRITICAL, neurodata_type=NWBContainer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure yet whether this should be a critical check or a best practice violation. There was a discussion here about whether empty datasets should be allowed: NeurodataWithoutBorders/pynwb#2065.
Fix #324