Adding Figure Test #4980
Conversation
Sorry this took so long to open, but I'm really looking forward to any discussions, questions, or suggestions you might have on this proof of concept. This is still an early version, so I'm very open to feedback on how to make it cleaner, more maintainable, and easier to use for contributors.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #4980   +/-   ##
==========================================
  Coverage    98.57%   98.57%
==========================================
  Files          304      304
  Lines        23645    23645
==========================================
  Hits         23309    23309
  Misses         336      336
```

☔ View full report in Codecov by Sentry.
Thanks for starting this, @medha-14! I like the third approach, too. And I'm also fine with the first approach: yes, for repository hygiene and security reasons it's not good practice to store binary data in a Git repository, but smaller binaries like PNGs or other images are perfectly fine. We had many of them in the repo for the longest time, and they didn't end up inflating the repository size because they didn't change much over time, so I'd be okay with adding a few.
The good thing is that we have a repository for precisely this purpose: https://github.com/pybamm-team/pybamm-data. The bad thing is that we'll need to publish the baseline images in a new release in that repository and update the version here (or create a new registry for storing the hashes of these images). Here are some factors, IMO, that we should consider before making a decision:
- What are other packages that employ figure testing doing in practice (besides image-heavy packages such as Matplotlib itself)? We discussed SunPy before, and I remember they were using the second approach.
- Another thing to look at would be how much we've updated the plotting tests across the Git history on average over the past two years, as that would be a reasonable estimate of how disruptive the process could be.

This is a good idea; let's see how it turns out!
Thanks, @agriyakhetarpal, for taking an interest in this!
Actually, each test for plotting functionality produces its own baseline image, so we will indeed accumulate one image per mpl_image_compare decorator.
Yes, each @pytest.mark.mpl_image_compare test yields exactly one baseline image (or hash entry). If you parametrize a test, you get one file per parameter combination.
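For illustration, a figure test under this scheme could look like the following minimal sketch (the test name and plotted data are placeholders, not taken from this PR):

```python
import matplotlib.pyplot as plt
import pytest


# Minimal pytest-mpl sketch: the decorator makes pytest-mpl compare the
# returned figure against its stored baseline image (or hash).
# The test name and data below are illustrative placeholders.
@pytest.mark.mpl_image_compare
def test_plot_voltage():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2])
    ax.set_xlabel("Time [h]")
    ax.set_ylabel("Voltage [V]")
    return fig
```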
Each PNG baseline averages about 0.095 MiB (≈100 kB). So the total footprint is simply (number of tests) × 0.095 MiB. With our current few tests, that’s well under 1 MiB; even 50 images would be around 4.75 MiB.
Yes, they'd live alongside the plotting tests.
Yes. I've tested emitting SVG baselines by setting `savefig_kwargs={"format": "svg"}` and `extension="svg"`, but SVG comparisons require extra dependencies (Inkscape, for SVG→PNG conversion). On the other hand, they offer human-readable diffs and often smaller file sizes than bitmaps.
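A sketch of that configuration, using the kwargs quoted above (whether they are accepted may depend on the installed pytest-mpl version):

```python
import matplotlib.pyplot as plt
import pytest


# Sketch of the SVG baseline setup described above. The kwargs are the
# ones quoted in this thread; support may vary by pytest-mpl version.
@pytest.mark.mpl_image_compare(
    savefig_kwargs={"format": "svg"},
    extension="svg",
)
def test_plot_svg():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2])
    return fig
```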
Yes, the baseline images are quite sensitive to changes in the test setup. If we tweak which variables we're plotting, how many time points we use, or even styling details like line widths or fonts, the resulting image can change.
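A hedged sketch of marker options that can reduce this sensitivity (these are standard pytest-mpl options; the tolerance value is an arbitrary example):

```python
import matplotlib.pyplot as plt
import pytest


# Options that make figure comparisons less brittle: pin the Matplotlib
# style, strip text (to sidestep font-rendering differences), and allow
# a small RMS tolerance. The tolerance value is an arbitrary example.
@pytest.mark.mpl_image_compare(style="default", remove_text=True, tolerance=2)
def test_plot_stable():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2], linewidth=1.5)
    return fig
```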
From what I have seen in other repos like SunPy and Astropy, they make this whole process easier with CI automation: when new images are added, a GitHub Action takes care of updating the data repo. We could do something similar in PyBaMM.
Yes, you get one file per parameter index (e.g. test_plot-0-baseline.png, test_plot-1-baseline.png). Running `pytest --mpl-generate-hash-library=hashes.json --mpl` updates all the hashes in the hash library (i.e., hashes.json). After running the command, the user can review the updated hashes and commit the new JSON. On the next CI run, pytest-mpl will use these new values as the reference for comparisons.
I was unable to find any other packages accomplishing this differently; repositories like Astropy and SunPy are the primary ones I could find using figure tests, and they also accomplish it with pytest-mpl.
Description
Fixes #4885
This PR introduces a proof of concept for adding figure-based testing to PyBaMM. The goal is to ensure that the visual output of plots remains consistent and any changes are intentional.
There are three modes for implementing figure tests:
1. Image Comparison Mode
This mode stores baseline images directly in the repository and compares them with the images generated during tests.
Pros: Easy to debug; failed tests show the exact difference visually.
Cons: Increases the repository size significantly over time.
2. Hash Comparison Mode
This mode stores only the hash of the image and compares hashes during testing.
Pros: Very lightweight; no increase in repo size.
Cons: Hard to debug visually; hash mismatches don’t indicate what changed in the plot.
3. Hybrid Mode (implemented here)
This mode stores the hashes in the main repository and saves the actual baseline images in a separate external repository (e.g., pybamm-figures).
Pros: Keeps the main repo small while still allowing visual debugging; contributors can view image diffs using the external repo.
Cons: Requires coordination between two repositories when updating baselines.
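As a rough sketch of how the hybrid mode could be wired up with pytest-mpl (the URL and paths are illustrative assumptions, not this PR's actual configuration; pytest-mpl allows `baseline_dir` to point at a remote URL and `hash_library` at a local JSON file):

```python
import matplotlib.pyplot as plt
import pytest


# Hybrid-mode sketch: hashes are tracked in the main repository, while
# full baseline images live in an external repository. The URL and paths
# below are illustrative placeholders.
@pytest.mark.mpl_image_compare(
    hash_library="tests/test_hashes.json",
    baseline_dir="https://raw.githubusercontent.com/pybamm-team/pybamm-figures/main/",
)
def test_plot_hybrid():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3.7, 3.5, 3.2])
    return fig
```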
It also provides a detailed report of the tests and the status of each test when we run pytest. We can host these reports and integrate them into the workflow so users can access them when pushing a commit.
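One way to generate such a report, assuming a recent pytest-mpl release (the results directory name is an arbitrary choice for this sketch):

```bash
# Run the figure tests and emit an HTML summary of every comparison;
# "results" is an arbitrary output directory.
pytest --mpl --mpl-generate-summary=html --mpl-results-path=results
```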
When a test fails, contributors can inspect the changed image visually and update both the external baseline image and the hash in test_hashes.json if the change is valid.
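A sketch of those update steps using pytest-mpl's generation flags (the test path is an illustrative placeholder):

```bash
# Regenerate baseline images locally, to be uploaded to the external repo;
# the test path is an illustrative placeholder.
pytest --mpl-generate-path=baselines tests/unit/test_plotting

# Regenerate the hash library tracked in the main repository
pytest --mpl --mpl-generate-hash-library=tests/test_hashes.json tests/unit/test_plotting
```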
In the current implementation, only plotting functions that return a figure and are decorated are included in this test system. Failures will appear clearly via hash mismatches, and debugging will be supported visually through the separate image repo. There is some risk of flakiness (e.g., due to randomness or font rendering differences), but this can be minimized by seeding randomness and using consistent rendering settings.
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #)
Important checks:
Please confirm the following before marking the PR as ready for review:
nox -s pre-commit
nox -s tests
nox -s doctests