
Conversation


@AdiBak commented Oct 15, 2025

Summary

Improves the plotting system by separating benchmarks into individual subplots for better readability, as requested in #156.

Changes Made

  • Changed layout from single-axis plots to clean 2x3 grid with individual subplots per benchmark
  • Added a new compilation metric showing the ratio compiled_gates / raw_gates
  • Generates individual files for compile-time, gate-count, gate-ratio, rel-err-ideal, rel-err-noisy
  • Added option to use linear vs log scale (gate ratio uses linear, others use log)

Files Changed

  • plotting/plot_latest_benchmark.py - Updated plotting functions to use subplot layout and add gate ratio metric

Technical Details

  • Replaced generate_plot() with generate_compilation_subplots() and generate_simulation_subplots()
  • Changed from dynamic row calculation to fixed 2x3 grid layout
  • Added use_log_scale configuration option to plot configs
  • Added compiled gate ratio calculation: compiled_multiq_gates / raw_multiq_gates (sketched just after this list)
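A minimal sketch of that calculation, assuming the benchmark DataFrame carries compiled_multiq_gates and raw_multiq_gates columns (the gate_ratio column name and the example rows are hypothetical; the PR's actual code may differ):

import pandas as pd

# Hypothetical example rows; real values come from the benchmark results.
df = pd.DataFrame(
    {
        "compiler": ["ucc", "qiskit"],
        "compiled_multiq_gates": [120, 150],
        "raw_multiq_gates": [200, 200],
    }
)

# Ratio of multi-qubit gates after vs. before compilation
# (< 1 means the compiler reduced the gate count).
df["gate_ratio"] = df["compiled_multiq_gates"] / df["raw_multiq_gates"]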

Key Improvements

  • Each benchmark has its own subplot
  • 2x3 layout for easy comparison
  • Linear scale for gate ratio (values typically 0-1.5; see the config sketch after this list)
  • Separate files per metric for organization
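To make the scale switch concrete, here is an illustrative pair of plot-config entries (field names follow the diff quoted later in this thread; the gate_ratio column name is assumed, and the PR's exact entries may differ):

compilation_plot_configs = [
    {
        "y_col": "compiled_multiq_gates",
        "title": "Gate Counts",
        "ylabel": "Compiled Gate Count",
        "use_log_scale": True,  # counts span orders of magnitude across circuits
    },
    {
        "y_col": "gate_ratio",  # assumed column name for the new metric
        "title": "Compiled Gate Ratio",
        "ylabel": "Compiled / Raw Multi-Qubit Gates",
        "use_log_scale": False,  # ratios cluster around 0-1.5, so linear reads better
    },
]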

Images

Compilation:

[images: latest_compiler_benchmarks_by_circuit_compiled-ratio, latest_compiler_benchmarks_by_circuit_compiled-multiq-gates, latest_compiler_benchmarks_by_circuit_compile-time]

Simulation:

[images: latest_simulation_benchmarks_by_circuit_rel-err-noisy, latest_simulation_benchmarks_by_circuit_rel-err-ideal]

Testing

Tested with the latest benchmark data from the ucc-benchmarks-8-core-U22.04 runner.

Issue Addressed

#156.


Thanks so much to @jordandsullivan for the guidance and feedback!

@jordandsullivan
Contributor

@AdiBak looks like the tests are failing right now due to a ruff formatting/style complaint (basically our CI/CD pipeline requires certain code best practices). If you run ruff check --fix and then commit those changes, you should be good to go :)

@jordandsullivan
Contributor

@natestemen @bachase asking for your review here if you have an opinion on the format of the plots, and whether we should add any of them to the landing page.

@bachase requested a review from Copilot October 16, 2025 13:12
Contributor

Copilot AI left a comment


Pull Request Overview

This PR improves the plotting system by transitioning from single-axis plots to individual subplots for better visualization of benchmark data. The changes create a cleaner 2x3 grid layout where each benchmark gets its own subplot and introduce a new gate ratio metric for compilation analysis.

  • Replaced single-axis plotting with subplot-based visualization using a fixed 2x3 grid layout
  • Added a new compilation metric showing the ratio of compiled gates to raw gates
  • Implemented separate file generation for each metric type with configurable linear/log scaling
Comments suppressed due to low confidence (1)

plotting/plot_latest_benchmark.py:1

  • [nitpick] The simulation plot configs don't include a use_log_scale parameter, but the compilation configs do. Consider adding this parameter for consistency, even if it defaults to True.
import argparse
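A sketch of what that consistency change could look like (the titles are placeholders and the rel_err_ideal column name is inferred from the output filenames, not confirmed):

simulation_plot_configs = [
    {
        "y_col": "rel_err_ideal",
        "title": "Relative Error (Ideal)",
        "ylabel": "Relative Error",
        "use_log_scale": True,  # explicit, matching the compilation configs
    },
]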


Comment on lines 117 to 184
def generate_simulation_subplots(
    df: pd.DataFrame,
    plot_configs: list[dict],
    latest_date: str,
    out_path: Path,
    use_pdf: bool = False,
):
    """Generate subplots for simulation benchmarks with separate subplot per benchmark."""
    # Configure matplotlib for LaTeX output if PDF export is requested
    if use_pdf:
        plt.rcParams.update(
            {
                "text.usetex": True,  # for matching math & fonts (optional)
                "font.family": "serif",
            }
        )

    benchmarks = sorted(df["benchmark_id"].unique())
    compilers = df["compiler"].unique()
    n_benchmarks = len(benchmarks)
    ncols = 3
    nrows = 2

    # Create separate figures for each metric (like compilation plots)
    for config in plot_configs:
        fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
        axes = axes.flatten()
        color_map = get_compiler_colormap()

        for i, ax in enumerate(axes):
            if i < n_benchmarks:
                benchmark = benchmarks[i]
                sub = df[df["benchmark_id"] == benchmark]

                # Extract values for each compiler
                values = []
                compiler_names = []
                for compiler in compilers:
                    row = sub[sub["compiler"] == compiler]
                    if not row.empty:
                        values.append(row[config["y_col"]].values[0])
                        compiler_names.append(compiler)

                # Create bars
                x_positions = np.arange(len(compiler_names))
                bars = ax.bar(
                    x_positions,
                    values,
                    color=[color_map.get(compiler, "#4C72B0") for compiler in compiler_names],
                    width=0.5,
                )

                ax.set_xticks(x_positions)
                ax.set_xticklabels(compiler_names, rotation=30, ha="right")
                ax.set_title(f"Benchmark: {benchmark}")
                ax.set_ylabel(config["ylabel"])
            else:
                ax.set_visible(False)

        plt.suptitle(f"{config['title']} (Date: {latest_date})", fontsize=16)
        plt.tight_layout(rect=[0, 0, 1, 0.96])

        # Save with metric-specific filename
        metric_name = config["y_col"].replace("_", "-")
        metric_out_path = out_path.parent / f"{out_path.stem}_{metric_name}{out_path.suffix}"
        print(f"Saving plot to {metric_out_path}")
        fig.savefig(metric_out_path, dpi=300, bbox_inches="tight")
        plt.close(fig)

Copilot AI Oct 16, 2025


The generate_simulation_subplots function contains nearly identical code to generate_compilation_subplots. Consider extracting the common subplot generation logic into a shared helper function to reduce code duplication.

@AdiBak
Author

AdiBak commented Oct 17, 2025

Hi,
Thanks for the feedback! I refactored the code to have a helper plotting function generate_subplots that is now used for compilation and simulation subplot generation, and removed the unused bars variable.
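For readers skimming the thread, the shape of that refactor is roughly the following (a sketch only; the signature is assumed from the diff above, and the actual helper may differ):

def generate_subplots(df, plot_configs, latest_date, out_path, use_pdf=False):
    """Shared helper for compilation and simulation subplot generation."""
    # Body is essentially the generate_simulation_subplots code quoted above,
    # except ax.bar(...) is now called purely for its side effect (the unused
    # `bars` variable is gone).
    ...


def generate_compilation_subplots(df, plot_configs, latest_date, out_path, use_pdf=False):
    """Generate subplots for compilation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)


def generate_simulation_subplots(df, plot_configs, latest_date, out_path, use_pdf=False):
    """Generate subplots for simulation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)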

@jordandsullivan
Contributor

Thanks @AdiBak -- it looks like the tests are still failing on the formatting issue -- can you run ruff check --fix locally?

Collaborator

@bachase left a comment


Thanks for this work @AdiBak. I left a few comments/questions.

For overall feedback: for the compiler metrics (gate count/compile-time), I don't find the format more convenient to read than what we have now for a quick overview/glance at performance. I think it does show the "Gate Counts" plot isn't so good, since it mixes scales of different circuits. I would consider replacing that with a compiled ratio plot instead, but keeping the same "all in one plot" format.

For the simulation plots, I also like the style in #156 for the compiled vs. uncompiled vs. ideal, but note that isn't quite what this PR does. Regardless, I wouldn't include the simulation ones in the README.md until we better calibrate the noise levels.

If we want to switch to this style, I'd lean towards making a BENCHMARK.md or something which links to all the images and adds a little context on how to interpret them, and then linking to that from the README and the docs.

All that being said, I don't mind auto-generating these in addition to what we have today, but I'm not in favor of just replacing directly.

    use_pdf: bool = False,
):
    """Generate subplots for compilation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)
Collaborator


Since generate_compilation_subplots (and generate_simulation_subplots later on) just directly call generate_subplots, I would just eliminate these functions and call generate_subplots directly.

Contributor


@bachase could you clarify what you mean by

I think it does show the "Gate Counts" plot isn't so good, since it mixes scales of different circuits. I would consider replacing that with a compiled ratio plot instead, but keeping the same "all in one plot" format

Are you referring to our existing plots when you say "it does show the "Gate Counts" plot isn't so good"?

If I understand correctly, you are saying that we could replace the existing all-in-one plot of gate counts, which currently is not terribly informative given it mixes circuits with wildly different numbers of gates, with an all-in-one plot of compiled ratio, yes?

Author


Since generate_compilation_subplots (and generate_simulation_subplots later on) just directly call generate_subplots, I would just eliminate these functions and call generate_subplots directly.

Sure, will remove the functions and directly call generate_subplots in the plot_compilation and plot_simulation functions.
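A sketch of the resulting call sites (plot_compilation and plot_simulation come from the comment above; the config-list names and argument order are assumptions):

def plot_compilation(df, latest_date, out_path, use_pdf=False):
    # Wrapper eliminated: call the shared helper directly.
    generate_subplots(df, compilation_plot_configs, latest_date, out_path, use_pdf)


def plot_simulation(df, latest_date, out_path, use_pdf=False):
    generate_subplots(df, simulation_plot_configs, latest_date, out_path, use_pdf)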

Collaborator


@jordandsullivan correct; replacing the plot below with compiled ratio instead, but keeping the same format of all circuits in the same plot with different bars per compiler. Can also consider adding a second line to the benchmark name axis labels that has the # of original 2-qubit gates pre-compilation

[image: the current all-in-one "Gate Counts" plot]
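A sketch of the two-line axis-label idea (the benchmarks and raw_gate_counts sequences and the raw_multiq_gates source column are assumptions, not part of this PR):

labels = [
    f"{bench}\n({raw} raw 2q gates)"
    for bench, raw in zip(benchmarks, raw_gate_counts)
]
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=30, ha="right")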

                if config.get("use_log_scale", True):
                    ax.set_yscale("log")
            else:
                ax.set_visible(False)
Collaborator


If there are more than 6 benchmark results, does this silently just not show the additional plots? I'd consider either supporting an arbitrary number of benchmark results, or at least asserting/erroring if it's not the hard-coded 6 results.

Author


That's a good point. An error could perhaps be raised when the number of benchmarks exceeds the 6 available subplots.
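One possible guard, as a sketch against the fixed 2x3 grid (continuing from the function in the diff above; not in the PR):

n_slots = nrows * ncols  # 6 for the fixed 2x3 grid
if n_benchmarks > n_slots:
    raise ValueError(
        f"{n_benchmarks} benchmarks exceed the {nrows}x{ncols} grid's "
        f"{n_slots} subplots; enlarge the grid or filter the input."
    )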

                # Extract values for each compiler
                values = []
                compiler_names = []
                for compiler in compilers:
Collaborator


What is the motivation for this loop, versus just working with sub directly? That is, wouldn't sub[config["y_col"]] give you the values array?

Depending on the answer I may have additional feedback on the code below.

Author


I was thinking I would have to iterate over the compilers to get the values for each specific compiler, but in hindsight that was overcomplicating it. Using sub directly works too.
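For concreteness, the loop-free version might look like this (a sketch, assuming each benchmark's sub-frame has one row per compiler):

sub = df[df["benchmark_id"] == benchmark]
compiler_names = sub["compiler"].tolist()
values = sub[config["y_col"]].tolist()
x_positions = np.arange(len(compiler_names))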

"y_col": "compiled_multiq_gates",
"title": "Gate Counts",
"ylabel": "Compiled Gate Count",
"use_log_scale": True,
Collaborator


It might be better to have ylabel be "Compiled Multi-Qubit Gate Count" to be clear it doesn't include single qubit gates.

Collaborator


Also noting this is a gap in the existing code already!

Collaborator


This is more of a style nit, but the log scale for the prep_select and qv benchmarks has way more tick marks labeled. Is there a nice way to make it less "busy"?
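One option might be to cap the number of labeled major ticks and hide the minor tick labels with matplotlib's standard locators (a sketch, not part of this PR):

from matplotlib.ticker import LogLocator, NullFormatter

ax.set_yscale("log")
ax.yaxis.set_major_locator(LogLocator(base=10, numticks=4))  # at most ~4 labeled decades
ax.yaxis.set_minor_formatter(NullFormatter())  # suppress minor tick labels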

@jordandsullivan
Contributor

@AdiBak let us know if you need any help with the changes @bachase suggested. I know it's your first time contributing, so we want to make sure you have the support you need :)

@AdiBak
Author

AdiBak commented Oct 17, 2025

Thanks so much for the feedback and support! I appreciate it and I would love some guidance on a few points:

  • Would you want to have the existing "all-in-one" gate counts plot replaced by an "all-in-one" compiled ratio plot? This could be more informative at once, since ratios are comparable across different circuit sizes.
  • The prep_select and qv log-scale plots are indeed too "busy", with too many tick marks. I wonder how I could make the ticks less crowded according to the scale.
  • Should I use the style from Make plots more readable #156 for the simulation plot rather than the subplots, for easier viewability? Or should I keep the subplot approach but make it look more like Make plots more readable #156?

Thanks!

…_subplots and generate_simulation_subplots, and use sub
@bachase
Collaborator

bachase commented Oct 18, 2025

Would you want to have the existing "all-in-one" gate counts plot replaced by an "all-in-one" compiled ratio plot? This could be more informative at once, since ratios are comparable across different circuit sizes.

As per the comment above, yes, I think that would be great.

The prep_select and qv log scale plots are indeed too "busy" with too many tick marks. I wonder how I could make the ticks less crowded according to the scale.

I don't have a great suggestion to change it, just an observation. Not a blocker from me.

Should I use the style from #156 for the simulation plot rather than the subplots, for easier view-ability? Or should I keep the subplot approach but make it look more like #156?

I don't have a strong POV here yet. @jordandsullivan or @natestemen ?

Member

@natestemen left a comment


I like the changes here a lot! Mostly a question, but did you try fixing the y-axis bounds to be the same across subplots? Maybe not, since the original issue specified otherwise, but it would make it much easier to understand how the scale of these circuits changes across benchmarks. It might make other information more difficult to read, however, so it might not be worth it.

Just curious as this is the first thing I thought of when looking at the plots.
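For reference, matplotlib makes that experiment a one-keyword change (a sketch of the idea, reusing the grid parameters from the diff above):

fig, axes = plt.subplots(
    nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False,
    sharey=True,  # every subplot shares the same y-axis bounds
)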

@jordandsullivan
Contributor

Hi @AdiBak -- just checking in here, do you think you'd be able to complete this PR in the next couple of weeks?
