Metadata inspection #1792
base: main
I should google this
This lets the idea of just giving this thing a file path work, but the implementation is very hacky
fix syntax for setting config file parameters
make namespace depth consistent
Works really nicely! This summary table is also useful for determining whether a directory of images has unique names, by comparing the number of images found with the number of unique basenames. Super helpful.
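The found-vs-unique-basename check mentioned above can also be done by hand for a quick sanity check. This is a minimal sketch, not part of PlantCV: the helper name count_basenames and the default extension set are hypothetical, and it assumes every image lives somewhere under a single root directory.

```python
import os
from collections import Counter


def count_basenames(root, extensions=(".png", ".jpg", ".jpeg", ".tif", ".tiff")):
    """Hypothetical helper: walk a directory tree and compare the number of
    image files found with the number of unique file basenames."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            # Match image extensions case-insensitively (".JPG" counts as ".jpg")
            if os.path.splitext(fname)[1].lower() in extensions:
                names.append(fname)
    counts = Counter(names)
    # Basenames that occur in more than one subdirectory
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return len(names), len(counts), duplicates
```

If the first number is larger than the second, the third value lists the basenames that collide across subdirectories.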
I noticed some inconsistencies in the updating.md file, which I went ahead and resolved, e.g. parallel.function_name vs plantcv.parallel.function_name. The first form is more consistent with other entries, where we document plantcv.main_function_name (rather than the technically more accurate plantcv.plantcv.main_function_name).
Thanks Haley, I ended up adding some more entries later on for v5 changes, so I'll take a look at those as well.
Describe your changes
This adds a metadata inspection tool, plantcv.parallel.inspect_dataset, for inspecting a configuration file before running a workflow in parallel. It can also take a file path, but with reduced functionality. inspect_dataset returns two dataframes: the first summarizes which steps removed images and what was removed at each step; the second is the unsummarized dataframe of the images that were found, their metadata, and whether each one was kept and would be passed to your workflow.
Type of update
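The two return values described above can be illustrated with a small pandas sketch. This is not the PlantCV implementation: the function name inspect_images, its records/filters inputs, and the removed_by, kept, and n_removed columns are all hypothetical. Each filtering step only sees images that survived the earlier steps, so every removed image is attributed to exactly one step.

```python
import pandas as pd


def inspect_images(records, filters):
    """Hypothetical sketch of a summary table plus a per-image detail table.

    records: list of dicts, one per image, each with a "filepath" key plus metadata.
    filters: ordered list of (step_name, predicate); predicate(row) -> True keeps the image.
    """
    detail = pd.DataFrame(records)
    detail["removed_by"] = None
    for step, keep in filters:
        # Only images that no earlier step removed are eligible for removal here
        active = detail["removed_by"].isna()
        fails = active & ~detail.apply(keep, axis=1)
        detail.loc[fails, "removed_by"] = step
    detail["kept"] = detail["removed_by"].isna()
    # Summary: number of images removed by each step, in filter order
    summary = (
        detail.loc[~detail["kept"], "removed_by"]
        .value_counts()
        .reindex([step for step, _ in filters], fill_value=0)
        .rename_axis("step")
        .reset_index(name="n_removed")
    )
    return summary, detail
```

Keeping the un-summarized table alongside the summary means a surprising count in the summary can be traced back to the exact images and metadata values that caused it.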
This is a new feature.
Associated issues
Closes #1788
Additional context
At least two different goals were brought up for this feature. My thought was that it is mostly for checking a configuration file before running something in parallel, while @kmurphy61 and some others might want a more general-purpose tool for broader file exploration.
There are a few places where this is still a little clunky, the most obvious to me being what it does when there is a metadata.json or SnapshotInfo.csv file. In those cases I think it could reasonably do nothing. I think this will also be more useful after the changes from PR 1774 and PR 1776, which allow multiple imgformats and default to all image formats, respectively.
For the reviewer
See this page for instructions on how to review the pull request.
plantcv/mkdocs.yml
updating.md