Improve plotting: separate subplots per benchmark + add gate ratio metric #174
base: main
Conversation
@AdiBak looks like the tests are failing right now due to a ruff formatting/style complaint (basically our CI/CD pipeline requires certain code best practices). If you run …
@natestemen @bachase asking for your review here if you have an opinion on the format of the plots, and whether we should add any of them to the landing page.
Pull Request Overview
This PR improves the plotting system by transitioning from single-axis plots to individual subplots for better visualization of benchmark data. The changes create a cleaner 2x3 grid layout where each benchmark gets its own subplot, and introduce a new gate ratio metric for compilation analysis.
- Replaced single-axis plotting with subplot-based visualization using a fixed 2x3 grid layout
- Added a new compilation metric showing the ratio of compiled gates to raw gates
- Implemented separate file generation for each metric type with configurable linear/log scaling
Comments suppressed due to low confidence (1)
plotting/plot_latest_benchmark.py:1
- [nitpick] The simulation plot configs don't include a `use_log_scale` parameter, but the compilation configs do. Consider adding this parameter for consistency, even if it defaults to True.
```python
import argparse
```
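A minimal sketch of that consistency fix (the config keys `y_col`, `title`, `ylabel`, and `use_log_scale` come from this PR; the simulation column and title values below are hypothetical placeholders):

```python
# Sketch: simulation plot configs with an explicit use_log_scale key,
# mirroring the compilation configs. Column/title values here are
# illustrative, not the PR's actual simulation configs.
simulation_plot_configs = [
    {
        "y_col": "simulation_time",  # hypothetical column name
        "title": "Simulation Time",
        "ylabel": "Runtime (s)",
        "use_log_scale": True,  # explicit, matching the suggested default
    },
]
```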
plotting/plot_latest_benchmark.py (outdated diff)

```python
def generate_simulation_subplots(
    df: pd.DataFrame,
    plot_configs: list[dict],
    latest_date: str,
    out_path: Path,
    use_pdf: bool = False,
):
    """Generate subplots for simulation benchmarks with separate subplot per benchmark."""
    # Configure matplotlib for LaTeX output if PDF export is requested
    if use_pdf:
        plt.rcParams.update(
            {
                "text.usetex": True,  # for matching math & fonts (optional)
                "font.family": "serif",
            }
        )

    benchmarks = sorted(df["benchmark_id"].unique())
    compilers = df["compiler"].unique()
    n_benchmarks = len(benchmarks)
    ncols = 3
    nrows = 2

    # Create separate figures for each metric (like compilation plots)
    for config in plot_configs:
        fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
        axes = axes.flatten()
        color_map = get_compiler_colormap()

        for i, ax in enumerate(axes):
            if i < n_benchmarks:
                benchmark = benchmarks[i]
                sub = df[df["benchmark_id"] == benchmark]

                # Extract values for each compiler
                values = []
                compiler_names = []
                for compiler in compilers:
                    row = sub[sub["compiler"] == compiler]
                    if not row.empty:
                        values.append(row[config["y_col"]].values[0])
                        compiler_names.append(compiler)

                # Create bars
                x_positions = np.arange(len(compiler_names))
                bars = ax.bar(
                    x_positions,
                    values,
                    color=[color_map.get(compiler, "#4C72B0") for compiler in compiler_names],
                    width=0.5,
                )

                ax.set_xticks(x_positions)
                ax.set_xticklabels(compiler_names, rotation=30, ha="right")
                ax.set_title(f"Benchmark: {benchmark}")
                ax.set_ylabel(config["ylabel"])
            else:
                ax.set_visible(False)

        plt.suptitle(f"{config['title']} (Date: {latest_date})", fontsize=16)
        plt.tight_layout(rect=[0, 0, 1, 0.96])

        # Save with metric-specific filename
        metric_name = config["y_col"].replace("_", "-")
        metric_out_path = out_path.parent / f"{out_path.stem}_{metric_name}{out_path.suffix}"
        print(f"Saving plot to {metric_out_path}")
        fig.savefig(metric_out_path, dpi=300, bbox_inches="tight")
        plt.close(fig)
```
Copilot AI · Oct 16, 2025
The generate_simulation_subplots function contains nearly identical code to generate_compilation_subplots. Consider extracting the common subplot generation logic into a shared helper function to reduce code duplication.
Hi, …
Thanks @AdiBak -- it looks like the tests are still failing on the formatting issue -- can you run …
Thanks for this work @AdiBak. I left a few comments/questions.
For overall feedback: for the compiler metrics (gate count/compile time), I don't find this format more convenient to read than what we have now for a quick overview/glance of performance. I think it does show the "Gate Counts" plot isn't so good, since it mixes scales of different circuits. I would consider replacing that with a compiled ratio plot instead, but keeping the same "all in one plot" format.
For the simulation plots, I also like the style in #156 for the compiled vs. uncompiled vs. ideal comparison, but note that isn't quite what this PR does. Regardless, I wouldn't include the simulation ones in the README.md until we better calibrate the noise levels.
If we want to switch to this style, I'd lean towards making a BENCHMARK.md or something which links to all the images, adds a little context on how to interpret them, and is linked from the README and the docs.
All that being said, I don't mind auto-generating these in addition to what we have today, but I'm not in favor of just replacing directly.
plotting/plot_latest_benchmark.py (outdated diff)

```python
    use_pdf: bool = False,
):
    """Generate subplots for compilation benchmarks with separate subplot per benchmark."""
    generate_subplots(df, plot_configs, latest_date, out_path, use_pdf)
```
Since `generate_compilation_subplots` (and `generate_simulation_subplots` later on) just directly call `generate_subplots`, I would just eliminate these functions and call `generate_subplots` directly.
@bachase could you clarify what you mean by
> I think it does show the "Gate Counts" plot isn't so good, since it mixes scales of different circuits. I would consider replacing that with a compiled ratio plot instead, but keeping the same "all in one plot" format

Are you referring to our existing plots when you say "it does show the 'Gate Counts' plot isn't so good"?

If I understand correctly, you are saying that we could replace the existing all-in-one plot of gate counts, which currently is not terribly informative given it mixes circuits with wildly different numbers of gates, with an all-in-one plot of compiled ratio, yes?
> Since `generate_compilation_subplots` (and `generate_simulation_subplots` later on) just directly call `generate_subplots`, I would just eliminate these functions and call `generate_subplots` directly.
Sure, will remove the functions and directly call `generate_subplots` in the `plot_compilation` and `plot_simulation` functions.
@jordandsullivan correct; replacing the plot below with compiled ratio instead, but keeping the same format of all circuits in the same plot with different bars per compiler. Can also consider adding a second line to the benchmark name axis labels that has the # of original 2-qubit gates pre-compilation.
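A minimal sketch of that two-line label idea (the `raw_multiq_gates` column comes from this PR's metric; the `latest` frame with one row per benchmark is assumed for illustration):

```python
# Sketch: label each benchmark with its name plus the pre-compilation
# 2-qubit gate count on a second line.
labels = [
    f"{name}\n({raw} raw 2q gates)"
    for name, raw in zip(latest["benchmark_id"], latest["raw_multiq_gates"])
]
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=30, ha="right")
```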
```python
                if config.get("use_log_scale", True):
                    ax.set_yscale("log")
            else:
                ax.set_visible(False)
```
If there are more than 6 benchmark results, does this silently just not show the additional plots? I'd consider either supporting an arbitrary number of benchmark results, or at least asserting/erroring if it's not the hard-coded 6 results.
That's a good point. An error could perhaps be raised when the number of benchmarks exceeds the 6 available subplots.
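A minimal sketch of such a guard, reusing the `n_benchmarks`, `nrows`, and `ncols` names from the function above:

```python
# Sketch: fail loudly instead of silently hiding benchmarks that
# don't fit in the fixed 2x3 grid.
if n_benchmarks > nrows * ncols:
    raise ValueError(
        f"{n_benchmarks} benchmarks exceed the {nrows * ncols} available subplots"
    )
```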
plotting/plot_latest_benchmark.py (outdated diff)

```python
                # Extract values for each compiler
                values = []
                compiler_names = []
                for compiler in compilers:
```
What is the motivation for this loop, versus just working with `sub` directly? That is, wouldn't `sub[config["y_col"]]` give you the values array?
Depending on the answer I may have additional feedback on the code below.
I was thinking I would have to iterate the compilers to get the values for each specific compiler, but in hindsight that was overcomplicating things. Using `sub` directly works too.
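A minimal sketch of that simplification, using the same names as the function above (assumes `sub` holds one row per compiler):

```python
# Sketch: read the values straight off the benchmark's sub-frame
# instead of looping over compilers one by one.
sub = df[df["benchmark_id"] == benchmark]
values = sub[config["y_col"]].to_numpy()
compiler_names = sub["compiler"].to_numpy()
```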
| "y_col": "compiled_multiq_gates", | ||
| "title": "Gate Counts", | ||
| "ylabel": "Compiled Gate Count", | ||
| "use_log_scale": True, |
It might be better to have the `ylabel` be "Compiled Multi-Qubit Gate Count" to be clear it doesn't include single-qubit gates.
Also noting this is a gap in the existing code already!
This is more of a style nit, but the log scale for the prep_select and qv benchmarks has way more hash marks labeled. Is there a nice way to make it less "busy"?
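For reference, one common matplotlib way to reduce that clutter (a sketch, not something this PR implements): keep the major decade ticks and hide the minor tick labels that a log scale adds.

```python
from matplotlib.ticker import LogLocator, NullFormatter

# Sketch: assumes `ax` is the busy log-scaled subplot.
ax.set_yscale("log")
ax.yaxis.set_major_locator(LogLocator(base=10))
ax.yaxis.set_minor_formatter(NullFormatter())  # drop minor tick labels
```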
Thanks so much for the feedback and support! I appreciate it and I would love some guidance on a few points: …

Thanks!
…_subplots and generate_simulation_subplots, and use sub
As per comment above, yes, I think that would be great.

I don't have a great suggestion to change it, just an observation. Not a blocker from me.

I don't have a strong POV here yet. @jordandsullivan or @natestemen?
natestemen left a comment
I like the changes here a lot! Mostly a question, but did you try fixing the y-axis bounds to be the same across subplots? Maybe not, since the original issue specified otherwise, but it would make it much easier to understand how the scale of these circuits changes across benchmarks. It might make other information more difficult to read, however, so it might not be worth it.
Just curious as this is the first thing I thought of when looking at the plots.
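A minimal sketch of one way to try that, via matplotlib's built-in axis sharing in the `plt.subplots` call from the plotting function shown earlier:

```python
# Sketch: share y-limits across subplots so bar heights are directly
# comparable between benchmarks.
fig, axes = plt.subplots(
    nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False, sharey=True
)
```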
Hi @AdiBak -- just checking in here, do you think you'd be able to complete this PR in the next couple of weeks?
Summary

Improves the plotting system by separating benchmarks into individual subplots for better readability, as requested in #156.

Changes Made

- Added a new gate ratio metric: `compiled_gates / raw_gates`

Files Changed

- `plotting/plot_latest_benchmark.py` - Updated plotting functions to use subplot layout and add gate ratio metric

Technical Details

- Replaced `generate_plot()` with `generate_compilation_subplots()` and `generate_simulation_subplots()`
- Added a `use_log_scale` configuration option to plot configs
- Gate ratio computed as `compiled_multiq_gates / raw_multiq_gates`
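For illustration, a minimal sketch of how such a ratio column could be derived with pandas (assuming the benchmark DataFrame carries `raw_multiq_gates` and `compiled_multiq_gates` columns; not necessarily the PR's exact code):

```python
# Sketch: ratio of compiled to raw multi-qubit gates; values below 1.0
# mean the compiler reduced the multi-qubit gate count.
df["gate_ratio"] = df["compiled_multiq_gates"] / df["raw_multiq_gates"]
```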
Key Improvements

Images

Compilation: [compilation plot images]

Simulation: [simulation plot images]
Testing

Tested with the latest benchmark data from the ucc-benchmarks-8-core-U22.04 runner.

Issue Addressed

#156.
Thanks so much to @jordandsullivan for the guidance and feedback!