
Conversation


@AdiBak commented Oct 15, 2025

Summary

Improves the plotting system by separating benchmarks into individual subplots for better readability, as requested in #156.

Changes Made

  • Changed layout from single-axis plots to clean 2x3 grid with individual subplots per benchmark
  • Added a new compilation metric showing the ratio compiled_gates / raw_gates
  • Generates individual files for compile-time, gate-count, gate-ratio, rel-err-ideal, rel-err-noisy
  • Added option to use linear vs log scale (gate ratio uses linear, others use log)

Files Changed

  • plotting/plot_latest_benchmark.py - Updated plotting functions to use subplot layout and add gate ratio metric

Technical Details

  • Replaced generate_plot() with generate_compilation_subplots() and generate_simulation_subplots()
  • Changed from dynamic row calculation to fixed 2x3 grid layout
  • Added use_log_scale configuration option to plot configs
  • Added compiled gate ratio calculation: compiled_multiq_gates / raw_multiq_gates (sketched just after this list)
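A minimal sketch of that calculation, assuming the benchmark DataFrame carries compiled_multiq_gates and raw_multiq_gates columns (the gate_ratio column name and the example rows are hypothetical; the PR's actual code may differ):

import pandas as pd

# Hypothetical example rows; real values come from the benchmark results.
df = pd.DataFrame(
    {
        "compiler": ["ucc", "qiskit"],
        "compiled_multiq_gates": [120, 150],
        "raw_multiq_gates": [200, 200],
    }
)

# Ratio of multi-qubit gates after vs. before compilation
# (< 1 means the compiler reduced the gate count).
df["gate_ratio"] = df["compiled_multiq_gates"] / df["raw_multiq_gates"]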

Key Improvements

  • Each benchmark has its own subplot
  • 2x3 layout for easy comparison
  • Linear scale for gate ratio (values typically 0-1.5; see the config sketch after this list)
  • Separate files per metric for organization
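To make the scale switch concrete, here is an illustrative pair of plot-config entries (field names follow the diff quoted later in this thread; the gate_ratio column name is assumed, and the PR's exact entries may differ):

compilation_plot_configs = [
    {
        "y_col": "compiled_multiq_gates",
        "title": "Gate Counts",
        "ylabel": "Compiled Gate Count",
        "use_log_scale": True,  # counts span orders of magnitude across circuits
    },
    {
        "y_col": "gate_ratio",  # assumed column name for the new metric
        "title": "Compiled Gate Ratio",
        "ylabel": "Compiled / Raw Multi-Qubit Gates",
        "use_log_scale": False,  # ratios cluster around 0-1.5, so linear reads better
    },
]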

Images

Compilation:

[images: latest_compiler_benchmarks_by_circuit_compiled-ratio, latest_compiler_benchmarks_by_circuit_compiled-multiq-gates, latest_compiler_benchmarks_by_circuit_compile-time]

Simulation:

[images: latest_simulation_benchmarks_by_circuit_rel-err-noisy, latest_simulation_benchmarks_by_circuit_rel-err-ideal]

Testing

Tested with the latest benchmark data from the ucc-benchmarks-8-core-U22.04 runner.

Issue Addressed

#156.


Thanks so much to @jordandsullivan for the guidance and feedback!

@jordandsullivan
Contributor

@AdiBak looks like the tests are failing right now due to a ruff formatting/style complaint (basically our CI/CD pipeline requires certain code best practices). If you run ruff check --fix and then commit those changes, you should be good to go :)

@jordandsullivan
Contributor

@natestemen @bachase asking for your review here if you have an opinion on the format of the plots, and whether we should add any of them to the landing page.

@bachase requested a review from Copilot October 16, 2025 13:12
Contributor

Copilot AI left a comment


Pull Request Overview

This PR improves the plotting system by transitioning from single-axis plots to individual subplots for better visualization of benchmark data. The changes create a cleaner 2x3 grid layout where each benchmark gets its own subplot and introduce a new gate ratio metric for compilation analysis.

  • Replaced single-axis plotting with subplot-based visualization using a fixed 2x3 grid layout
  • Added a new compilation metric showing the ratio of compiled gates to raw gates
  • Implemented separate file generation for each metric type with configurable linear/log scaling
Comments suppressed due to low confidence (1)

plotting/plot_latest_benchmark.py:1

  • [nitpick] The simulation plot configs don't include a use_log_scale parameter, but the compilation configs do. Consider adding this parameter for consistency, even if it defaults to True.
import argparse
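A sketch of what that consistency change could look like (the titles are placeholders and the rel_err_ideal column name is inferred from the output filenames, not confirmed):

simulation_plot_configs = [
    {
        "y_col": "rel_err_ideal",
        "title": "Relative Error (Ideal)",
        "ylabel": "Relative Error",
        "use_log_scale": True,  # explicit, matching the compilation configs
    },
]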


Comment on lines 117 to 184
def generate_simulation_subplots(
    df: pd.DataFrame,
    plot_configs: list[dict],
    latest_date: str,
    out_path: Path,
    use_pdf: bool = False,
):
    """Generate subplots for simulation benchmarks with separate subplot per benchmark."""
    # Configure matplotlib for LaTeX output if PDF export is requested
    if use_pdf:
        plt.rcParams.update(
            {
                "text.usetex": True,  # for matching math & fonts (optional)
                "font.family": "serif",
            }
        )

    benchmarks = sorted(df["benchmark_id"].unique())
    compilers = df["compiler"].unique()
    n_benchmarks = len(benchmarks)
    ncols = 3
    nrows = 2

    # Create separate figures for each metric (like compilation plots)
    for config in plot_configs:
        fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
        axes = axes.flatten()
        color_map = get_compiler_colormap()

        for i, ax in enumerate(axes):
            if i < n_benchmarks:
                benchmark = benchmarks[i]
                sub = df[df["benchmark_id"] == benchmark]

                # Extract values for each compiler
                values = []
                compiler_names = []
                for compiler in compilers:
                    row = sub[sub["compiler"] == compiler]
                    if not row.empty:
                        values.append(row[config["y_col"]].values[0])
                        compiler_names.append(compiler)

                # Create bars
                x_positions = np.arange(len(compiler_names))
                bars = ax.bar(
                    x_positions,
                    values,
                    color=[color_map.get(compiler, "#4C72B0") for compiler in compiler_names],
                    width=0.5,
                )

                ax.set_xticks(x_positions)
                ax.set_xticklabels(compiler_names, rotation=30, ha="right")
                ax.set_title(f"Benchmark: {benchmark}")
                ax.set_ylabel(config["ylabel"])
            else:
                ax.set_visible(False)

        plt.suptitle(f"{config['title']} (Date: {latest_date})", fontsize=16)
        plt.tight_layout(rect=[0, 0, 1, 0.96])

        # Save with metric-specific filename
        metric_name = config["y_col"].replace("_", "-")
        metric_out_path = out_path.parent / f"{out_path.stem}_{metric_name}{out_path.suffix}"
        print(f"Saving plot to {metric_out_path}")
        fig.savefig(metric_out_path, dpi=300, bbox_inches="tight")
        plt.close(fig)

Copilot AI Oct 16, 2025


The generate_simulation_subplots function contains nearly identical code to generate_compilation_subplots. Consider extracting the common subplot generation logic into a shared helper function to reduce code duplication.

@AdiBak
Author

AdiBak commented Oct 17, 2025

Hi,
Thanks for the feedback! I refactored the code to have a helper plotting function generate_subplots that is now used for compilation and simulation subplot generation, and removed the unused bars variable.
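For readers skimming the thread, the shape of that refactor is roughly the following (a sketch only; the signature is assumed from the diff above, and the actual helper may differ):

def generate_subplots(df, plot_configs, latest_date, out_path, use_pdf=False):
    """Shared helper for compilation and simulation subplot generation."""
    # Body is essentially the generate_simulation_subplots code quoted above,
    # except ax.bar(...) is now called purely for its side effect (the unused
    # `bars` variable is gone).
    ...


def generate_compilation_subplots(df, plot_configs, latest_date, out_path, use_pdf=False):
    """Generate subplots for compilation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)


def generate_simulation_subplots(df, plot_configs, latest_date, out_path, use_pdf=False):
    """Generate subplots for simulation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)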

@jordandsullivan
Contributor

Thanks @AdiBak -- it looks like the tests are still failing on the formatting issue -- can you run ruff check --fix locally?

Collaborator

@bachase left a comment


Thanks for this work @AdiBak. I left a few comments/questions.

For overall feedback: for the compiler metrics (gate count/compile-time), I don't find the format more convenient to read than what we have now for a quick overview/glance at performance. I think it does show the "Gate Counts" plot isn't so good, since it mixes scales of different circuits. I would consider replacing that with a compiled ratio plot instead, but keeping the same "all in one plot" format.

For the simulation plots, I also like the style in #156 for the compiled vs. uncompiled vs. ideal, but note that isn't quite what this PR does. Regardless, I wouldn't include the simulation ones in the README.md until we better calibrate the noise levels.

If we want to switch to this style, I'd lean towards making a BENCHMARK.md or something which links to all the images and adds a little context on how to interpret them, and then linking to that from the README and the docs.

All that being said, I don't mind auto-generating these in addition to what we have today, but I'm not in favor of just replacing directly.

    use_pdf: bool = False,
):
    """Generate subplots for compilation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)
Collaborator


Since generate_compilation_subplots (and generate_simulation_subplots later on) just directly call generate_subplots, I would just eliminate these functions and call generate_subplots directly.

Contributor


@bachase could you clarify what you mean by

I think it does show the "Gate Counts" plot isn't so good, since it mixes scales of different circuits. I would consider replacing that with a compiled ratio plot instead, but keeping the same "all in one plot" format

Are you referring to our existing plots when you say "it does show the "Gate Counts" plot isn't so good"?

If I understand correctly, you are saying that we could replace the existing all-in-one plot of gate counts, which currently is not terribly informative given it mixes circuits with wildly different numbers of gates, with an all-in-one plot of compiled ratio, yes?

Author


Since generate_compilation_subplots (and generate_simulation_subplots later on) just directly call generate_subplots, I would just eliminate these functions and call generate_subplots directly.

Sure, will remove the functions and directly call generate_subplots in the plot_compilation and plot_simulation functions.
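A sketch of the resulting call sites (plot_compilation and plot_simulation come from the comment above; the config-list names and argument order are assumptions):

def plot_compilation(df, latest_date, out_path, use_pdf=False):
    # Wrapper eliminated: call the shared helper directly.
    generate_subplots(df, compilation_plot_configs, latest_date, out_path, use_pdf)


def plot_simulation(df, latest_date, out_path, use_pdf=False):
    generate_subplots(df, simulation_plot_configs, latest_date, out_path, use_pdf)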

Collaborator


@jordandsullivan correct; replacing the plot below with compiled ratio instead, but keeping the same format of all circuits in the same plot with different bars per compiler. Can also consider adding a second line to the benchmark name axis labels that has the # of original 2-qubit gates pre-compilation

[image: the current all-in-one "Gate Counts" plot]
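A sketch of the two-line axis-label idea (the benchmarks and raw_gate_counts sequences and the raw_multiq_gates source column are assumptions, not part of this PR):

labels = [
    f"{bench}\n({raw} raw 2q gates)"
    for bench, raw in zip(benchmarks, raw_gate_counts)
]
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=30, ha="right")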

                if config.get("use_log_scale", True):
                    ax.set_yscale("log")
            else:
                ax.set_visible(False)
Collaborator


If there are more than 6 benchmark results, does this silently just not show the additional plots? I'd consider either supporting an arbitrary number of benchmark results, or at least asserting/erroring if it's not the hard-coded 6 results.

Author


That's a good point. An error could perhaps be raised when the number of benchmarks exceeds the 6 available subplots.
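One possible guard, as a sketch against the fixed 2x3 grid (continuing from the function in the diff above; not in the PR):

n_slots = nrows * ncols  # 6 for the fixed 2x3 grid
if n_benchmarks > n_slots:
    raise ValueError(
        f"{n_benchmarks} benchmarks exceed the {nrows}x{ncols} grid's "
        f"{n_slots} subplots; enlarge the grid or filter the input."
    )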

                # Extract values for each compiler
                values = []
                compiler_names = []
                for compiler in compilers:
Collaborator


What is the motivation for this loop, versus just working with sub directly? That is, wouldn't sub[config["y_col"]] give you the values array?

Depending on the answer I may have additional feedback on the code below.

Author


I was thinking I would have to iterate over the compilers to get the values for each specific compiler, but in hindsight that was overcomplicating it. Using sub directly works too.
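For concreteness, the loop-free version might look like this (a sketch, assuming each benchmark's sub-frame has one row per compiler):

sub = df[df["benchmark_id"] == benchmark]
compiler_names = sub["compiler"].tolist()
values = sub[config["y_col"]].tolist()
x_positions = np.arange(len(compiler_names))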

"y_col": "compiled_multiq_gates",
"title": "Gate Counts",
"ylabel": "Compiled Gate Count",
"use_log_scale": True,
Collaborator


It might be better to have ylabel be "Compiled Multi-Qubit Gate Count" to be clear it doesn't include single qubit gates.

Collaborator


Also noting this is a gap in the existing code already!

Collaborator


This is more of a style nit, but the log scale for the prep_select and qv benchmarks has way more tick marks labeled. Is there a nice way to make it less "busy"?
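One option might be to cap the number of labeled major ticks and hide the minor tick labels with matplotlib's standard locators (a sketch, not part of this PR):

from matplotlib.ticker import LogLocator, NullFormatter

ax.set_yscale("log")
ax.yaxis.set_major_locator(LogLocator(base=10, numticks=4))  # at most ~4 labeled decades
ax.yaxis.set_minor_formatter(NullFormatter())  # suppress minor tick labels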

@jordandsullivan
Contributor

@AdiBak let us know if you need any help with the changes @bachase suggested. I know it's your first time contributing, so we want to make sure you have the support you need :)

@AdiBak
Author

AdiBak commented Oct 17, 2025

Thanks so much for the feedback and support! I appreciate it and I would love some guidance on a few points:

  • Would you want to have the existing "all-in-one" gate counts plot replaced by an "all-in-one" compiled ratio plot? This could be more informative at once, since ratios are comparable across different circuit sizes.
  • The prep_select and qv log-scale plots are indeed too "busy", with too many tick marks. I wonder how I could make the ticks less crowded according to the scale.
  • Should I use the style from Make plots more readable #156 for the simulation plot rather than the subplots, for easier viewability? Or should I keep the subplot approach but make it look more like Make plots more readable #156?

Thanks!

…_subplots and generate_simulation_subplots, and use sub
@bachase
Collaborator

bachase commented Oct 18, 2025

Would you want to have the existing "all-in-one" gate counts plot replaced by an "all-in-one" compiled ratio plot? This could be more informative at once, since ratios are comparable across different circuit sizes.

As per the comment above, yes, I think that would be great.

The prep_select and qv log scale plots are indeed too "busy" with too many tick marks. I wonder how I could make the ticks less crowded according to the scale.

I don't have a great suggestion to change it, just an observation. Not a blocker from me.

Should I use the style from #156 for the simulation plot rather than the subplots, for easier view-ability? Or should I keep the subplot approach but make it look more like #156?

I don't have a strong POV here yet. @jordandsullivan or @natestemen ?

Member

@natestemen left a comment


I like the changes here a lot! Mostly a question, but did you try fixing the y-axis bounds to be the same across subplots? Maybe not, since the original issue specified otherwise, but it would make it much easier to understand how the scale of these circuits changes across benchmarks. It might make other information more difficult to read, however, so it might not be worth it.

Just curious as this is the first thing I thought of when looking at the plots.
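For reference, matplotlib makes that experiment a one-keyword change (a sketch of the idea, reusing the grid parameters from the diff above):

fig, axes = plt.subplots(
    nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False,
    sharey=True,  # every subplot shares the same y-axis bounds
)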

@jordandsullivan
Contributor

Hi @AdiBak -- just checking in here, do you think you'd be able to complete this PR in the next couple of weeks?
