Adding Figure Test #4980
Conversation
Sorry this took so long to open, but I'm really looking forward to any discussions, questions, or suggestions you might have on this proof of concept. This is still an early version, so I'm very open to feedback on how to make it cleaner, more maintainable, and easier to use for contributors.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #4980   +/-   ##
==========================================
  Coverage    98.57%   98.57%
==========================================
  Files          304      304
  Lines        23645    23645
==========================================
  Hits         23309    23309
  Misses         336      336
```

☔ View full report in Codecov by Sentry.
Thanks for starting this, @medha-14! I like the third approach, too. And I'm also fine with the first approach: yes, for repository hygiene and security reasons it's not good practice to store binary data in a Git repository, but smaller binaries like PNGs or other images are perfectly fine. We had many of them in the repo for the longest time, and they didn't end up inflating the repository size because they didn't change much over time, so I'd be okay with adding a few.
The good thing is that we have a repository for precisely this purpose: https://github.com/pybamm-team/pybamm-data. The bad thing is that we'll need to publish the baseline images in a new release in that repository and update the version here (or create a new registry for storing the hashes of these images). Here are some factors, IMO, that we should consider before making a decision:
- What are other packages that employ figure testing doing in practice (besides image-heavy packages such as Matplotlib itself)? We discussed SunPy before, and I remember they were using the second approach.
- Another thing to look at would be how much we've updated the plotting tests across the Git history on average over the past two years, as that would be a reasonable estimate of how disruptive the process could be.

This is a good idea; let's see how it turns out!
Thanks, @agriyakhetarpal, for taking an interest in this!
Actually, each test for plotting functionality produces its own baseline image, so we will indeed accumulate one image per mpl_image_compare decorator.
Yes, each @pytest.mark.mpl_image_compare test yields exactly one baseline image (or hash entry). If you parametrize a test, you get one file per parameter combination.
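For illustration, a figure test under this scheme could look like the following minimal sketch (the test name and plotted data are placeholders, not taken from this PR):

```python
import matplotlib.pyplot as plt
import pytest


# Minimal pytest-mpl sketch: the decorator makes pytest-mpl compare the
# returned figure against its stored baseline image (or hash).
# The test name and data below are illustrative placeholders.
@pytest.mark.mpl_image_compare
def test_plot_voltage():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2])
    ax.set_xlabel("Time [h]")
    ax.set_ylabel("Voltage [V]")
    return fig
```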
Each PNG baseline averages about 0.095 MiB (≈100 kB). So the total footprint is simply (number of tests) × 0.095 MiB. With our current few tests, that’s well under 1 MiB; even 50 images would be around 4.75 MiB.
Yes, they'd live alongside the plotting tests.
Yes. I've tested emitting SVG baselines by setting `savefig_kwargs={"format": "svg"}` and `extension="svg"`, but SVG comparisons require extra dependencies (Inkscape, for SVG→PNG conversion). On the other hand, they offer human-readable diffs and often smaller file sizes than bitmaps.
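A sketch of that configuration, using the kwargs quoted above (whether they are accepted may depend on the installed pytest-mpl version):

```python
import matplotlib.pyplot as plt
import pytest


# Sketch of the SVG baseline setup described above. The kwargs are the
# ones quoted in this thread; support may vary by pytest-mpl version.
@pytest.mark.mpl_image_compare(
    savefig_kwargs={"format": "svg"},
    extension="svg",
)
def test_plot_svg():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2])
    return fig
```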
Yes, the baseline images are quite sensitive to changes in the test setup. If we tweak which variables we're plotting, how many time points we use, or even styling details like line widths or fonts, the resulting image can change.
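A hedged sketch of marker options that can reduce this sensitivity (these are standard pytest-mpl options; the tolerance value is an arbitrary example):

```python
import matplotlib.pyplot as plt
import pytest


# Options that make figure comparisons less brittle: pin the Matplotlib
# style, strip text (to sidestep font-rendering differences), and allow
# a small RMS tolerance. The tolerance value is an arbitrary example.
@pytest.mark.mpl_image_compare(style="default", remove_text=True, tolerance=2)
def test_plot_stable():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2], linewidth=1.5)
    return fig
```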
From what I have seen in other repos like SunPy and Astropy, they make this whole process easier with CI automation: when new images are added, a GitHub Action takes care of updating the data repo. We could do something similar in PyBaMM.
Yes, you get one file per parameter index (e.g. test_plot-0-baseline.png, test_plot-1-baseline.png). Running `pytest --mpl-generate-hash-library=hashes.json --mpl` updates all the hashes in the hash library (i.e., hashes.json). After running the command, the user can review the updated hashes and commit the new JSON. On the next CI run, pytest-mpl will use these new values as the reference for comparisons.
I was unable to find any other packages accomplishing this differently; repositories like Astropy and SunPy are the primary ones I could find using figure tests, and they also accomplish it with pytest-mpl.
Description
Fixes #4885
This PR introduces a proof of concept for adding figure-based testing to PyBaMM. The goal is to ensure that the visual output of plots remains consistent and any changes are intentional.
There are three modes for implementing figure tests:
1. Image Comparison Mode
This mode stores baseline images directly in the repository and compares them with the images generated during tests.
Pros: Easy to debug; failed tests show the exact difference visually.
Cons: Increases the repository size significantly over time.
2. Hash Comparison Mode
This mode stores only the hash of the image and compares hashes during testing.
Pros: Very lightweight; no increase in repo size.
Cons: Hard to debug visually; hash mismatches don’t indicate what changed in the plot.
3. Hybrid Mode (implemented here)
This mode stores the hashes in the main repository and saves the actual baseline images in a separate external repository (e.g., pybamm-figures).
Pros: Keeps the main repo small while still allowing visual debugging; contributors can view image diffs using the external repo.
Cons: Requires coordination between two repositories when updating baselines.
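As a rough sketch of how the hybrid mode could be wired up with pytest-mpl (the URL and paths are illustrative assumptions, not this PR's actual configuration; pytest-mpl allows `baseline_dir` to point at a remote URL and `hash_library` at a local JSON file):

```python
import matplotlib.pyplot as plt
import pytest


# Hybrid-mode sketch: hashes are tracked in the main repository, while
# full baseline images live in an external repository. The URL and paths
# below are illustrative placeholders.
@pytest.mark.mpl_image_compare(
    hash_library="tests/test_hashes.json",
    baseline_dir="https://raw.githubusercontent.com/pybamm-team/pybamm-figures/main/",
)
def test_plot_hybrid():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2])
    return fig
```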
It also provides a detailed report of the tests and the status of each test when we run pytest. We can host these reports and integrate them into the workflow so users can access them when pushing a commit.
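One way to generate such a report, assuming a recent pytest-mpl release (the results directory name is an arbitrary choice for this sketch):

```bash
# Run the figure tests and emit an HTML summary of every comparison;
# "results" is an arbitrary output directory.
pytest --mpl --mpl-generate-summary=html --mpl-results-path=results
```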
When a test fails, contributors can inspect the changed image visually and update both the external baseline image and the hash in test_hashes.json if the change is valid.
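A sketch of those update steps using pytest-mpl's generation flags (the test path is an illustrative placeholder):

```bash
# Regenerate baseline images locally, to be uploaded to the external repo;
# the test path is an illustrative placeholder.
pytest --mpl-generate-path=baselines tests/unit/test_plotting

# Regenerate the hash library tracked in the main repository
pytest --mpl --mpl-generate-hash-library=tests/test_hashes.json tests/unit/test_plotting
```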
In the current implementation, only plotting functions that return a figure and are decorated are included in this test system. Failures will appear clearly via hash mismatches, and debugging will be supported visually through the separate image repo. There is some risk of flakiness (e.g., due to randomness or font rendering differences), but this can be minimized by seeding randomness and using consistent rendering settings.
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #)
Important checks:
Please confirm the following before marking the PR as ready for review:
nox -s pre-commit
nox -s tests
nox -s doctests