rfc: collect and provide structured benchmark programs #32

DRovara · 2025-11-02T11:57:40Z

This PR proposes RFC 0032, which advocates for the addition of a benchmark suite for structured quantum programs.

Problem

There is no standard benchmark suite for evaluating compiler support for structured control flow (e.g., if, for, dynamic indexing).

Solution

Create a set of benchmarks written in Jeff to fill this gap. This will help drive compiler development, allow for better tool evaluation, and highlight Jeff's capabilities in representing these advanced programs.

This RFC outlines the initial set of benchmarks and invites community feedback.

Rendered RFC

burgholzer

Hey @DRovara 👋🏼

Thanks for kicking this off! Great to see the first RFC ;-)
I went through the document top-to-bottom once and accumulated some comments. Most of them are pretty minor and should be fairly easy to address.

I have one more general comment or request for changes:
Would it make sense to define a list of "defining features" for a structured program and then create a table for the benchmarks that highlights which features a particular program uses?
Something like:

- Loops with compile-time bounds
- Loops with runtime bounds (true while loops)
- Dynamic qubit indexing (within loops)
- Conditional quantum instructions (e.g., depending on mid-circuit measurement results)
- Qubit reuse (e.g., via reset instructions)
- Dynamic qubit allocation

This might make it a little easier to handle the fact that it's hard to put some of the algorithms into a single category.
The table could also include a column with a short description and a column for references to each algorithm as well as, potentially, a checkbox that could be ticked once the algorithm is implemented.
This could create a more structured (pun intended) way of organizing the different benchmarks than the current section-based layout.
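
A minimal sketch of how such a feature table could be generated, assuming the six features listed above as columns. The `Benchmark` class, the `render_table` helper, and the feature assignment of the example entry are illustrative assumptions, not part of the RFC.

```python
from dataclasses import dataclass, field

# Column names copied from the feature list suggested in this comment.
FEATURES = [
    "Loops with compile-time bounds",
    "Loops with runtime bounds",
    "Dynamic qubit indexing",
    "Conditional quantum instructions",
    "Qubit reuse",
    "Dynamic qubit allocation",
]


@dataclass
class Benchmark:
    """One row of the overview table (hypothetical structure)."""
    name: str
    description: str
    features: set[str] = field(default_factory=set)
    implemented: bool = False


def render_table(benchmarks):
    """Render the benchmarks as a Markdown table with one column per feature."""
    header = ["Benchmark", "Description", *FEATURES, "Implemented"]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]
    for bench in benchmarks:
        unknown = bench.features - set(FEATURES)
        if unknown:
            raise ValueError(f"unknown features for {bench.name}: {sorted(unknown)}")
        marks = ["x" if feature in bench.features else "" for feature in FEATURES]
        done = "[x]" if bench.implemented else "[ ]"
        lines.append("| " + " | ".join([bench.name, bench.description, *marks, done]) + " |")
    return "\n".join(lines)


# Example entry; the feature assignment is an illustrative guess.
example = Benchmark(
    "QAOA with Fixed Repetitions",
    "Problem and mixer layers in a loop with a fixed number of layers",
    {"Loops with compile-time bounds"},
)
print(render_table([example]))
```

Keeping the features as a set per benchmark lets a program that spans several categories simply tick several columns.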

rfcs/text/0032-structured-benchmark-programs.md

josh146

Nice work @DRovara 💪

rfcs/text/0032-structured-benchmark-programs.md

josh146 · 2025-11-03T16:20:41Z

rfcs/text/0032-structured-benchmark-programs.md

+
+*Quantum Error Correction (QEC) is one of the most important applications of structured control flow in quantum computing. These benchmark programs implement QEC protocols that involve structured operations at any point in the program.*
+
+- *Magic State Distillation*: Magic state distillation protocols utilize loops and conditionals to iteratively improve the fidelity of magic states, which are essential for fault-tolerant quantum computing. ([Bravyi & Kitaev, 2005](https://arxiv.org/abs/quant-ph/0403025))


Do we want to also consider magic state synthillation? Cultivation?

rfcs/text/0032-structured-benchmark-programs.md

glassnotes · 2025-11-03T20:16:52Z

Thanks for compiling this, @DRovara! (I don't know if I have approval privileges, but it looks great; my comments are mostly just discussion points.)

mark-koch

Nice work, @DRovara! Just leaving a few minor suggestions and possible discussion points.

rfcs/text/0032-structured-benchmark-programs.md

mark-koch · 2025-11-04T14:23:03Z

rfcs/text/0032-structured-benchmark-programs.md

+- Maintenance Overhead: The addition of a new benchmark suite requires ongoing maintenance to ensure that the benchmarks remain relevant and up-to-date with the latest advancements in quantum computing and compiler technologies.
+- Complexity: Introducing structured benchmarks may increase the complexity of the Jeff repository, potentially making it more challenging for new users to navigate and understand the available resources.
+- Limited Adoption: If the benchmarks are not widely adopted by the quantum computing community, their impact may be limited, reducing the incentive for compiler developers to implement support for structured control flow.
+- Comparisons are not simple: For users, it might not be straightforward to know what to compare against when compiling these structured programs. For a full, fair evaluation, a meaningful baseline needs to be established and a more precise methodology for comparison is likely necessary.


I think this is quite important. It would be good if the benchmark challenge came with a way of measuring the performance of programs (e.g. T count, two-qubit gate count etc). In the presence of control-flow, this would probably require actually running the Jeff program:

For programs without branching on mid-circuit measurement outcomes, we could just collect the applied gates in an execution trace. There is no need for any actual quantum simulation since everything is deterministic. We still have to figure out how to actually execute Jeff, though.

For mid-circuit measurements, we could either:

- Run an actual quantum simulator and sample outcomes, averaging performance metrics over many shots. This doesn't scale to large programs.
- Use weighted random coin flips to classically pick outcomes. This might be unfair for optimisations that rely on realistic measurement distributions.
- Provide the measurement outcomes that should be tested against as part of the benchmark suite. This breaks once people develop optimisations that introduce/delete/reorder measurements.

But maybe this is complicated enough to be discussed in a different RFC?
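
To make the trace-based idea concrete, here is a rough sketch under strong assumptions: the program is a toy nested-tuple format invented for this example (it is not Jeff), gate names such as `t` and `cx` are placeholders, and measurement outcomes come either from a supplied list or from weighted coin flips. No quantum simulation takes place.

```python
import random
from collections import Counter

TWO_QUBIT_GATES = {"cx", "cz"}  # placeholder gate names


def execute(program, next_outcome):
    """Walk the toy program and return the list of applied gates (the trace)."""
    trace, bits = [], {}

    def step(ops):
        for op in ops:
            kind = op[0]
            if kind == "gate":
                _, name, qubits = op
                trace.append((name, qubits))
            elif kind == "for":  # loop with a known iteration count
                _, count, body = op
                for _ in range(count):
                    step(body)
            elif kind == "measure":  # outcome is chosen classically
                _, qubit, bit = op
                bits[bit] = next_outcome(qubit)
            elif kind == "if":  # branch on a stored measurement bit
                _, bit, body = op
                if bits[bit]:
                    step(body)
            else:
                raise ValueError(f"unknown operation: {kind}")

    step(program)
    return trace


def metrics(trace):
    """Count T gates, two-qubit gates and total gates in a trace."""
    names = Counter(name for name, _ in trace)
    two_qubit = sum(n for gate, n in names.items() if gate in TWO_QUBIT_GATES)
    return {"t_count": names["t"] + names["tdg"], "two_qubit": two_qubit, "total": len(trace)}


def fixed_outcomes(values):
    """Outcomes shipped with the benchmark suite (third option above)."""
    iterator = iter(values)
    return lambda qubit: next(iterator)


def coin_flips(p_one, seed=0):
    """Weighted random coin flips (second option above)."""
    rng = random.Random(seed)
    return lambda qubit: int(rng.random() < p_one)


def average_metrics(program, shots, p_one=0.5, seed=0):
    """Average the metrics over many coin-flip executions."""
    sampler = coin_flips(p_one, seed)
    totals = Counter()
    for _ in range(shots):
        totals.update(metrics(execute(program, sampler)))
    return {key: value / shots for key, value in totals.items()}


program = [
    ("for", 3, [("gate", "h", (0,)), ("gate", "cx", (0, 1)), ("gate", "t", (1,))]),
    ("measure", 1, "m"),
    ("if", "m", [("gate", "t", (0,)), ("gate", "cx", (1, 0))]),
]
print(metrics(execute(program, fixed_outcomes([1]))))
print(average_metrics(program, shots=100, p_one=0.5))
```

The sketch also shows the weakness noted above: an optimisation that removes or reorders the `measure` operation would consume the supplied outcome list differently.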

Yeah, I think quantitative metrics + evaluation deserves a separate discussion.

mark-koch · 2025-11-04T14:29:05Z

rfcs/text/0032-structured-benchmark-programs.md

+- *VQE Ansatz with Fixed Repetitions*: A variational ansatz circuit that applies a set of parameterized gates in a loop with a predetermined number of repetitions. ([Peruzzo et al., 2014](https://arxiv.org/abs/1304.3061))
+- *QAOA with Fixed Repetitions*: A Quantum Approximate Optimization Algorithm circuit that applies problem and mixer Hamiltonians in a loop with a fixed number of layers. ([Farhi et al., 2014](https://arxiv.org/abs/1411.4028))
+
+## Static Loops with Dynamic Qubit Indexing


A different approach to generating these kinds of programs is to take flat QASM2 benchmarking circuits and try to recover the loop structure in them. In particular, we could have a look at this paper, where they try to find polyhedral iteration domains to delinearise flat programs.
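
As a very rough illustration of the idea, and explicitly not the polyhedral method from the paper, the sketch below only checks whether a flat gate list is an exact repetition of a shorter block. The `find_repetition` helper and the tuple encoding of gates are assumptions made for this example; loops whose body changes per iteration (for instance through shifting qubit indices) are not detected.

```python
def find_repetition(gates):
    """Return (body, count) such that gates == body * count, with the shortest body.

    A flat list with no repetition is returned unchanged with count 1.
    """
    n = len(gates)
    for period in range(1, n + 1):
        # A candidate body must tile the whole list exactly.
        if n % period == 0 and gates == gates[:period] * (n // period):
            return gates[:period], n // period
    return gates, 1  # only reached for an empty list


# Four identical layers written out flat, as in an unrolled circuit.
flat = [("h", (0,)), ("cx", (0, 1)), ("rz", (1,))] * 4
body, count = find_repetition(flat)
print(f"for _ in range({count}): {body}")
```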

rfcs/text/0032-structured-benchmark-programs.md

glassnotes · 2025-11-04T18:34:58Z

Yeah, I think quantitative metrics + evaluation deserves a separate discussion.

rfcs/text/0032-structured-benchmark-programs.md

bachase

Great RFC and also great feedback from others.

One high-level consideration, which might prompt a change to the RFC format, would be to add example use cases or user flows in the guide-level section. For example, what is the process for contributing a benchmark? How would I use an existing benchmark program to assess my compiler's performance? How would I participate in the "challenge"?

I'm not blocking on that suggestion, as getting these initial benchmarks seems mostly clear. But it might help disentangle questions around how these benchmarks will be used and what is out of scope for this contribution.

rfcs/text/0032-structured-benchmark-programs.md

bachase · 2025-11-04T21:10:21Z

rfcs/text/0032-structured-benchmark-programs.md

+
+There are several potential drawbacks to consider with this proposal:
+
+- Maintenance Overhead: The addition of a new benchmark suite requires ongoing maintenance to ensure that the benchmarks remain relevant and up-to-date with the latest advancements in quantum computing and compiler technologies.


We can also spin up a separate jeff-bench repo or similar, at least to avoid any challenges managing dependencies for the core jeff code from the benchmark generation code.

I think we should be good here. We can use inline script metadata for the Python scripts that generate benchmarks. uv has great support for these.
See https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies for some good documentation on that.
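
For illustration, a generator script with inline script metadata might look roughly like the sketch below. The script contents, the function names, and the choice of `numpy` as a dependency are assumptions for this example; only the `# /// script` comment block follows the inline metadata format described in the linked uv guide. The standard-library fallback exists solely so the sketch also runs outside uv.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "numpy",
# ]
# ///
"""Hypothetical generator for a fixed-depth, QAOA-style benchmark description."""
import json
import math
import random


def layer_plan(n_qubits: int, n_layers: int) -> dict:
    """Describe the loop structure of the benchmark on a line of qubits."""
    if n_qubits < 2 or n_layers < 1:
        raise ValueError("need at least two qubits and one layer")
    return {
        "qubits": n_qubits,
        "layers": n_layers,
        "edges": [[q, q + 1] for q in range(n_qubits - 1)],
    }


def random_angles(n_layers: int, seed: int = 0) -> list:
    """Draw two angles per layer, using numpy when it is available."""
    try:
        import numpy as np  # installed by uv from the metadata block above
    except ImportError:
        # Fallback for running the sketch with a plain interpreter.
        rng = random.Random(seed)
        return [[rng.uniform(0.0, math.pi) for _ in range(2)] for _ in range(n_layers)]
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, np.pi, size=(n_layers, 2)).tolist()


def main() -> None:
    plan = layer_plan(n_qubits=4, n_layers=3)
    plan["angles"] = random_angles(plan["layers"])
    print(json.dumps(plan, indent=2))


if __name__ == "__main__":
    main()
```

Assuming the file is saved as `generate_benchmark.py` (a hypothetical name), `uv run generate_benchmark.py` resolves the declared dependencies in an isolated environment before running it, so the generation code adds nothing to the dependencies of the core project.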

dime10

Thanks Damian, this is a great first RFC!

dime10 · 2025-11-04T22:48:07Z

rfcs/text/0032-structured-benchmark-programs.md

+| conditionals on originally classical values   | Conditional blocks are used where the condition depends on values that were *not* measurement results. |
+| conditionals on measurement results           | Conditional blocks are used where the condition depends on values that depend on measurement results. |


I'm curious: for the conditionals we separate out classical dynamic values from "quantum dynamic values" (i.e. measurement results), but we didn't do this for the loops (I think both are contained in the dynamically-bounded loops).

Yes, that could also be done, but I did not want to add too many different categories.

At the end of the day, (I believe) the main reason why dynamically bounded loops are so difficult for compilers is that they cannot be "unrolled" at compile time. In that case, it does not make things much more difficult whether these values were measurement results or not.

Conditionals, on the other hand, are a bit simpler than loops, so adding this distinction there might be more helpful.

But that's not a strong opinion on my end. If we think this new category should be added, I am also fine with it.
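
The unrolling argument can be sketched on a toy IR. The `Gate` and `Loop` classes below are invented for this example and do not reflect Jeff's actual representation; a loop bound is either an integer known at compile time or the name of a value that only exists at runtime.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Gate:
    name: str
    qubit: int


@dataclass(frozen=True)
class Loop:
    # int: bound known at compile time; str: name of a runtime value
    bound: Union[int, str]
    body: tuple


def unroll(ops):
    """Unroll every loop with a compile-time bound; keep the others as loops."""
    out = []
    for op in ops:
        if isinstance(op, Loop):
            body = unroll(op.body)
            if isinstance(op.bound, int):
                out.extend(body * op.bound)  # static bound: repeat the body
            else:
                # Runtime bound: the origin of the value (measurement or
                # classical input) makes no difference, the loop must stay.
                out.append(Loop(op.bound, tuple(body)))
        else:
            out.append(op)
    return out


h = Gate("h", 0)
print(unroll([Loop(3, (h,))]))           # three copies of the gate
print(unroll([Loop("n_rounds", (h,))]))  # loop is preserved
```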

Yeah, I'm not necessarily advocating for splitting the loop category as well, but I'm also not sure why the distinction exists for conditionals. In my mind, a value is either available in advance (static) or computed at runtime (dynamic), whether from a measurement or not.
I guess the idea might be to distinguish dynamic values that are impossible to convert to static ones from "pseudo" dynamic ones. For example, program arguments (and any value deterministically depending on them) can just be supplied and compilation specialized to this instantiation of arguments, so maybe they are not "true" dynamic values, whereas measurement results are impossible to know in advance. But you could ascribe this property to classical non-deterministic values as well.
At the end of the day, though, if a compiler doesn't receive those concrete instantiations, it will face the same challenge in compilation whether it's "true" dynamism or not 🤔

This is related to some of my comments. From an algorithmic perspective I see an important distinction between the two. Is that distinction less important at the IR or compiler level? I was thinking that the results coming from the quantum device might be relevant for benchmarking metrics (or even have implications for program optimization?).

> This is related to some of my comments. From an algorithmic perspective I see an important distinction between the two.

Could you elaborate on the importance of this distinction from an algorithmic perspective?

> Is that distinction less important at the IR or compiler level?

Well I think from a practical "intermediate" compiler perspective the only thing that matters is whether you have access to these values or not. For instance, if you know the value of a conditional predicate, there is no reason not to flatten it. For loop bounds, you would have the choice to flatten, or not, or be able to make some other deductions (e.g. resource counts) if you know those values. But if you don't know the value of the conditional predicate, you have to preserve it considering both branches. Does it matter here whether the predicate is of quantum or classical origin? I'm not so sure.

Having said that, if you are compiling for a concrete architecture there may be a big difference. A quantum-origin conditional has to be compiled to some instructions on the control system, whereas a classical-origin one might be compiled to a side processor.
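
A small sketch of this argument, again on a toy IR invented for the example (the `Predicate`, `If`, and `flatten` names are assumptions, not Jeff constructs): a conditional is flattened whenever the predicate value is known, and preserved with both branches otherwise, independent of where the predicate came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gate:
    name: str
    qubit: int


@dataclass(frozen=True)
class Predicate:
    name: str
    origin: str                   # "classical" or "measurement"; informational only
    value: Optional[bool] = None  # None: not known to the compiler


@dataclass(frozen=True)
class If:
    pred: Predicate
    then: tuple
    orelse: tuple = ()


def flatten(ops):
    """Inline the taken branch of every conditional whose predicate is known."""
    out = []
    for op in ops:
        if isinstance(op, If):
            then, orelse = flatten(op.then), flatten(op.orelse)
            if op.pred.value is None:
                # Unknown predicate: both branches must be preserved,
                # whatever the origin of the value.
                out.append(If(op.pred, tuple(then), tuple(orelse)))
            else:
                out.extend(then if op.pred.value else orelse)
        else:
            out.append(op)
    return out


x, z = Gate("x", 0), Gate("z", 0)
print(flatten([If(Predicate("flag", "classical", True), (x,), (z,))]))  # only x remains
print(flatten([If(Predicate("m0", "measurement"), (x,), (z,))]))        # conditional kept
```

The `origin` field is never inspected by `flatten`, which mirrors the point above; a backend targeting a concrete architecture could still use it to decide where the branch is evaluated.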

> Could you elaborate on the importance of this distinction from an algorithmic perspective?

It's admittedly hand-wavy. The value (1) is the result of a fundamentally different kind of computational process that runs on a separate device, and (2) carries nondeterminism beyond regular classical randomness, in that there could be errors due to noise, additional readout mitigation required, etc. that affect both the value of the variable and the subsequent branch it takes.

In any case, my argument is getting a bit more philosophical, and I don't want this to block the PR 😅

burgholzer

Great work @DRovara 👏🏼
Just spotted a typo and a missing dot. Otherwise this is spot on 🎯

rfcs/text/0032-structured-benchmark-programs.md

burgholzer · 2025-11-13T20:37:55Z

I think we should be good here. We can use inline script metadata for the Python scripts that generate benchmarks. uv has great support for these.
See https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies for some good documentation on that.

DRovara added 2 commits November 2, 2025 12:49

rfc: 📝 introduce RFC on structured benchmark programs

fbaf45b

docs: 📝 update RFC id to 0032

0111dac

burgholzer requested changes Nov 3, 2025

DRovara added 2 commits November 3, 2025 10:48

docs: 📝 add NOTE environment for NOTE in rfc readme

6406a2c

docs: 📝 add new algorithm classes

c7ada5c

josh146 approved these changes Nov 3, 2025

docs: 📝 add overview table with program types and features to rfc des…

b06732b

…cription

bachase self-requested a review November 3, 2025 18:32

glassnotes reviewed Nov 3, 2025

DRovara added 4 commits November 4, 2025 11:53

docs: 📝 fix note syntax in RFC document

2d398ee

docs: 📝 incorporate some of the feedback on the RFC

dc46b6a

docs: 📝 incorporate further RFC review comments

026a7f1

docs: 📝 add description of "?" symbol to legend

ec73c59

mark-koch reviewed Nov 4, 2025

glassnotes reviewed Nov 4, 2025

bachase approved these changes Nov 4, 2025

dime10 reviewed Nov 5, 2025

DRovara added 3 commits November 6, 2025 16:20

docs: 📝 mark shor as arbitrary-size and and potentially dynamically-b…

fcbd5d6

…ounded loops

docs: 📝 further updates to RFC based on discussion

c695843

docs: 📝 potentially clear up terminology for static/dynamic loops

3ebde9e

glassnotes reviewed Nov 10, 2025

DRovara added 3 commits November 13, 2025 13:44

docs: 📝 add information from recent discussion to "Guide-level explan…

0756fca

…ation"

docs: 📝 add information on format features to "Guide-level explanation"

87f3d14

docs: 📝 add newly suggested benchmark programs to the table

c554a31

DRovara requested a review from burgholzer November 13, 2025 14:10

burgholzer approved these changes Nov 13, 2025

DRovara and others added 2 commits November 13, 2025 22:08

Update rfcs/text/0032-structured-benchmark-programs.md

35dbca9

Co-authored-by: Lukas Burgholzer <[email protected]>

Update rfcs/text/0032-structured-benchmark-programs.md

c066d5f

Co-authored-by: Lukas Burgholzer <[email protected]>


