@joshqsumner joshqsumner (Contributor) commented Sep 15, 2025

Describe your changes
This PR adds a metadata inspection tool, plantcv.parallel.inspect_dataset, for checking a configuration file before running a workflow in parallel. It can also take a file path instead of a configuration file, but with reduced functionality.

inspect_dataset returns two dataframes. The first is a summary of which steps removed images and what each step removed. The second is the unsummarized dataframe listing the images that were found, their metadata, and whether each one was kept and would be passed to your workflow.
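To make the relationship between the two returned tables concrete, here is a minimal standard-library sketch. It is not the PlantCV implementation and does not call plantcv: the `inspect` function, the step names, the column names, and the file paths are all hypothetical, and plain lists of dicts stand in for the two dataframes.

```python
from collections import Counter
from pathlib import PurePosixPath


def inspect(paths, steps):
    """Return (summary, detail) for a list of file paths.

    `steps` is an ordered list of (name, keep_predicate) pairs. An image is
    attributed to the first step whose predicate rejects it.
    Hypothetical stand-in, not plantcv.parallel.inspect_dataset.
    """
    detail = []
    for p in paths:
        # Name of the first step that rejects this path, or None if all pass.
        removed_by = next((name for name, keep in steps if not keep(p)), None)
        detail.append({
            "filepath": p,
            "basename": PurePosixPath(p).name,
            "status": "kept" if removed_by is None else f"removed: {removed_by}",
        })
    # The summary is just the detail table counted by status.
    counts = Counter(row["status"] for row in detail)
    summary = [{"status": s, "n_images": n} for s, n in counts.items()]
    return summary, detail


# Illustrative inputs only; these paths and filters are assumptions.
paths = [
    "images/plant1_VIS.png",
    "images/plant2_VIS.png",
    "images/plant1_NIR.png",
    "images/notes.txt",
]
steps = [
    ("imgformat", lambda p: p.endswith(".png")),
    ("metadata_filters", lambda p: "_VIS" in p),
]
summary, detail = inspect(paths, steps)
for row in summary:
    print(row)
```

The point of the sketch is that the summary can always be rebuilt from the per-image table, so the per-image table is the one to consult when a count in the summary looks wrong.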

Type of update
This is a new feature.

Associated issues
Closes #1788

Additional context
At least two different goals were brought up for this feature. My thought was that it is mostly for checking a configuration file before running something in parallel, whereas @kmurphy61 and some other people might want a more general-purpose tool for broader file exploration.

There are a few places where this is still a little clunky, the most obvious to me being what it does when there is a metadata.json or SnapshotInfo.csv file. In those cases I think it could do nothing and that would be fine. I also think this will be more useful after the changes from PR 1774 and PR 1776, which allow multiple imgformats and default to all image formats, respectively.

For the reviewer
See this page for instructions on how to review the pull request.

  • PR functionality reviewed in a Jupyter Notebook
  • All tests pass
  • Test coverage remains 100%
  • Documentation tested
  • New documentation pages added to plantcv/mkdocs.yml
  • Changes to function input/output signatures added to updating.md
  • Code reviewed
  • PR approved

@joshqsumner joshqsumner added the new feature and work in progress labels Sep 15, 2025
deepsource-io bot commented Sep 15, 2025

Here's the code health analysis summary for commits 2766a24..ba90f5c. View details on DeepSource ↗.

Analysis Summary

| Analyzer | Status | Summary | Link |
| --- | --- | --- | --- |
| Python | ✅ Success | | View Check ↗ |
| Test coverage | ✅ Success | | View Check ↗ |

Code Coverage Report

| Metric | Aggregate | Python |
| --- | --- | --- |
| Branch Coverage | 100% | 100% |
| Composite Coverage | 100% | 100% |
| Line Coverage | 100% | 100% |
| New Branch Coverage | 100% | 100% |
| New Composite Coverage | 100% | 100% |
| New Line Coverage | 100%, ✅ Above Threshold | 100%, ✅ Above Threshold |

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

@joshqsumner joshqsumner added the ready to review label and removed the work in progress label Sep 17, 2025
@joshqsumner joshqsumner mentioned this pull request Sep 18, 2025
8 tasks
@HaleySchuhl HaleySchuhl self-requested a review October 6, 2025 20:10
@github-project-automation github-project-automation bot moved this to Pull Requests in PlantCV4 Oct 6, 2025
@HaleySchuhl HaleySchuhl added this to the PlantCV v4.10 milestone Oct 6, 2025
@HaleySchuhl HaleySchuhl (Contributor) left a comment

Works really nicely! The summary table is also useful for determining whether the images in a directory have unique names, by comparing the number of images found with the number of unique basenames. Super helpful.
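The uniqueness check described above amounts to comparing the number of files found with the number of distinct basenames. A minimal standard-library sketch of that comparison follows; the `duplicate_basenames` helper and the example paths are hypothetical and are not part of PlantCV.

```python
from collections import Counter
from pathlib import PurePosixPath


def duplicate_basenames(paths):
    """Return the sorted basenames that occur more than once in `paths`."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative paths only: the same basename appears in two subdirectories.
paths = [
    "day1/plant1.png",
    "day2/plant1.png",
    "day1/plant2.png",
]
n_found = len(paths)
n_unique = len({PurePosixPath(p).name for p in paths})
dupes = duplicate_basenames(paths)

# If the two counts differ, at least one basename is shared between files.
print(n_found, n_unique, dupes)
```

When the two counts match, every image can be identified by its basename alone; when they differ, the list of duplicates shows which names collide.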

I noticed some inconsistency in the updating.md file, which I went ahead and resolved, e.g. parallel.function_name vs plantcv.parallel.function_name. The first form is more consistent with other instances, where we document plantcv.main_function_name (rather than the technically more accurate plantcv.plantcv.main_function_name).

@joshqsumner (Contributor, Author)

I noticed some inconsistency in the updating.md file, which I went ahead and resolved, e.g. parallel.function_name vs plantcv.parallel.function_name. The first form is more consistent with other instances, where we document plantcv.main_function_name (rather than the technically more accurate plantcv.plantcv.main_function_name).

Thanks Haley. I ended up adding some more entries later on for v5 stuff, so I'll take a look at those as well.

Labels

new feature, ready to review

Projects

Status: Pull Requests

Development

Successfully merging this pull request may close these issues.

metadata inspection tool

3 participants