Collaborator

@DRovara DRovara commented Nov 2, 2025

This PR proposes RFC 0032, which advocates for the addition of a benchmark suite for structured quantum programs.

Problem

There is no standard benchmark suite for evaluating compiler support for structured control flow (e.g., if, for, dynamic indexing).

Solution

Create a set of benchmarks written in Jeff to fill this gap. This will help drive compiler development, allow for better tool evaluation, and highlight Jeff's capabilities in representing these advanced programs.

This RFC outlines the initial set of benchmarks and invites community feedback.

Rendered RFC

Collaborator

@burgholzer burgholzer left a comment

Hey @DRovara 👋🏼

Thanks for kicking this off! Great to see the first RFC ;-)
I went through the document top-to-bottom once and accumulated some comments. Most of them are pretty minor and should be fairly easy to address.

I have one more general comment or request for changes:
Would it make sense to define a list of "defining features" for a structured program and then create a table for the benchmarks that highlights which features a particular program uses?
Something like:

  • Loops with compile-time bounds
  • Loops with runtime bounds (true while loops)
  • Dynamic qubit indexing (within loops)
  • Conditional quantum instructions (e.g., depending on mid-circuit measurement results)
  • Qubit reuse (e.g., via reset instructions)
  • Dynamic qubit allocation

This might make it a little easier to handle the fact that it's hard to put some of the algorithms into a single category.
The table could also include a column with a short description and a column for references to each algorithm as well as, potentially, a checkbox that could be ticked once the algorithm is implemented.
This could create a more structured (pun intended) way of organizing the different benchmarks than the current section-based layout.
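
To make the suggested table concrete, here is a minimal sketch that renders such a feature matrix as a Markdown table. The feature names are taken from the list above; the benchmark-to-feature assignments and the `render_feature_table` helper are placeholders for illustration, not an agreed classification.

```python
# Feature names come from the list above. The benchmark entries and the
# features ticked for them are illustrative placeholders only.
FEATURES = [
    "Loops with compile-time bounds",
    "Loops with runtime bounds",
    "Dynamic qubit indexing",
    "Conditional quantum instructions",
    "Qubit reuse",
    "Dynamic qubit allocation",
]

BENCHMARKS = {
    "VQE Ansatz with Fixed Repetitions": {
        "features": {"Loops with compile-time bounds"},
        "implemented": False,
    },
    "QAOA with Fixed Repetitions": {
        "features": {"Loops with compile-time bounds"},
        "implemented": False,
    },
    "Magic State Distillation": {
        "features": {"Conditional quantum instructions"},
        "implemented": False,
    },
}


def render_feature_table(benchmarks, features):
    """Return a Markdown table with one row per benchmark and one column per feature."""
    header = ["Benchmark", *features, "Implemented"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for name, info in benchmarks.items():
        unknown = info["features"] - set(features)
        if unknown:
            raise ValueError(f"{name}: unknown features {sorted(unknown)}")
        cells = [name]
        cells += ["x" if feature in info["features"] else "" for feature in features]
        cells.append("[x]" if info["implemented"] else "[ ]")
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_feature_table(BENCHMARKS, FEATURES))
```

Keeping the features as a set per benchmark means an algorithm that spans several categories simply ticks several columns.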

Collaborator

@josh146 josh146 left a comment

Nice work @DRovara 💪


*Quantum Error Correction (QEC) is one of the most important applications of structured control flow in quantum computing. These benchmark programs implement QEC protocols that involve structured operations at any point in the program.*

- *Magic State Distillation*: Magic state distillation protocols utilize loops and conditionals to iteratively improve the fidelity of magic states, which are essential for fault-tolerant quantum computing. ([Bravyi & Kitaev, 2005](https://arxiv.org/abs/quant-ph/0403025))

Collaborator

Do we want to also consider magic state synthillation? Cultivation?

@bachase bachase self-requested a review November 3, 2025 18:32
@glassnotes
Collaborator

Thanks for compiling this @DRovara ! (I don't know if I have approval privileges, but it looks great, my comments are mostly just discussion points.)

Collaborator

@mark-koch mark-koch left a comment

Nice work @DRovara ! Just leaving a few minor suggestions and possible discussion points

- Maintenance Overhead: The addition of a new benchmark suite requires ongoing maintenance to ensure that the benchmarks remain relevant and up-to-date with the latest advancements in quantum computing and compiler technologies.
- Complexity: Introducing structured benchmarks may increase the complexity of the Jeff repository, potentially making it more challenging for new users to navigate and understand the available resources.
- Limited Adoption: If the benchmarks are not widely adopted by the quantum computing community, their impact may be limited, reducing the incentive for compiler developers to implement support for structured control flow.
- Comparisons are not simple: For users, it might not be straightforward to know what to compare against when compiling these structured programs. For a full, fair evaluation, a meaningful baseline needs to be established and a more precise methodology for comparison is likely necessary.

Collaborator

I think this is quite important. It would be good if the benchmark challenge came with a way of measuring the performance of programs (e.g. T count, two-qubit gate count etc). In the presence of control-flow, this would probably require actually running the Jeff program:

  • For programs without branching on mid-circuit measurement outcomes, we could just collect the applied gates in an execution trace. There is no need to do any actual quantum simulation since everything is deterministic. We still have to figure out how to actually execute Jeff programs, though.
  • For mid-circuit measurements, we could either:
    • Run an actual quantum simulator and sample outcomes, averaging performance metrics over many shots. This doesn't scale to large programs.
    • Use weighted random coin flips to classically pick outcomes. This might be unfair for optimisations that rely on realistic measurement distributions.
    • Provide the measurement outcomes that should be tested against as part of the benchmark suite. This breaks once people develop optimisations that introduce/delete/reorder measurements.

But maybe this is complicated enough to be discussed in a different RFC?
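
As a rough sketch of the trace-based counting and the weighted coin-flip option described above: the nested-tuple program format, the function names, and the `p_one` parameter are all assumptions made for illustration. This is not Jeff and not a proposed metric API, and no quantum simulation takes place.

```python
import random
from collections import Counter

# Toy structured program (NOT Jeff), written as nested Python tuples:
#   ("gate", name, qubits)              apply a gate
#   ("repeat", n, body)                 loop with a known trip count
#   ("measure", qubit, bit)             store an outcome under the name `bit`
#   ("if", bit, then_body, else_body)   branch on a stored outcome


def trace_gate_counts(program, p_one=None, rng=None):
    """Walk the program once and count the gates that are actually applied.

    Measurement outcomes are drawn as classical weighted coin flips:
    `p_one` maps a bit name to the assumed probability of outcome 1
    (0.5 if the bit is not listed).
    """
    p_one = p_one or {}
    rng = rng or random.Random()
    counts = Counter()
    bits = {}

    def run(body):
        for op in body:
            kind = op[0]
            if kind == "gate":
                counts[op[1]] += 1
            elif kind == "repeat":
                for _ in range(op[1]):
                    run(op[2])
            elif kind == "measure":
                bits[op[2]] = int(rng.random() < p_one.get(op[2], 0.5))
            elif kind == "if":
                run(op[2] if bits[op[1]] else op[3])
            else:
                raise ValueError(f"unknown operation: {kind!r}")

    run(program)
    return counts


def average_gate_counts(program, shots, p_one=None, seed=0):
    """Average the per-shot gate counts over `shots` independent traces."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(shots):
        total.update(trace_gate_counts(program, p_one, rng))
    return {gate: n / shots for gate, n in total.items()}


EXAMPLE = [
    ("repeat", 3, [("gate", "cx", (0, 1)), ("gate", "t", (1,))]),
    ("measure", 0, "m0"),
    ("if", "m0", [("gate", "x", (1,))], []),
]

if __name__ == "__main__":
    print(average_gate_counts(EXAMPLE, shots=1000, p_one={"m0": 0.25}))
```

For a program without measurements a single trace is already exact; with measurements, the average depends entirely on the assumed `p_one` values, which is where the fairness concern above comes from.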

Collaborator

Yeah, I think quantitative metrics + evaluation deserves a separate discussion.

- *VQE Ansatz with Fixed Repetitions*: A variational ansatz circuit that applies a set of parameterized gates in a loop with a predetermined number of repetitions. ([Peruzzo et al., 2014](https://arxiv.org/abs/1304.3061))
- *QAOA with Fixed Repetitions*: A Quantum Approximate Optimization Algorithm circuit that applies problem and mixer Hamiltonians in a loop with a fixed number of layers. ([Farhi et al., 2014](https://arxiv.org/abs/1411.4028))

## Static Loops with Dynamic Qubit Indexing

Collaborator

A different approach to generating these kinds of programs is taking flat QASM2 benchmarking circuits and trying to recover loop structure in them. In particular, we could have a look at this paper where they try to find polyhedral iteration domains to delinearise flat programs.
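
To give a feel for what recovering loop structure means, here is a deliberately naive sketch. It is not the polyhedral method from the paper: it only folds runs of literally identical consecutive blocks into a repeat node, so it misses loops whose qubit indices change per iteration. The `recover_repeats` name, the `("repeat", n, body)` node, and the string encoding of gates are assumptions for illustration.

```python
def recover_repeats(ops, max_period=8):
    """Greedily fold consecutive identical blocks of a flat gate list into
    ("repeat", n, block) nodes. Blocks of up to `max_period` ops are tried."""
    out = []
    i = 0
    while i < len(ops):
        best_period, best_reps = 1, 1
        for period in range(1, max_period + 1):
            block = ops[i:i + period]
            if len(block) < period:
                break  # not enough ops left for a block of this size
            reps = 1
            while ops[i + reps * period:i + (reps + 1) * period] == block:
                reps += 1
            # Prefer the candidate that covers the most ops; on ties the
            # shorter block found first is kept.
            if reps >= 2 and reps * period > best_reps * best_period:
                best_period, best_reps = period, reps
        if best_reps >= 2:
            out.append(("repeat", best_reps, ops[i:i + best_period]))
            i += best_reps * best_period
        else:
            out.append(ops[i])
            i += 1
    return out


if __name__ == "__main__":
    flat = ["h 0", "cx 0 1", "rz 1", "cx 0 1", "rz 1", "cx 0 1", "rz 1", "h 0"]
    print(recover_repeats(flat))
```

Handling index-shifted patterns such as `cx 0 1`, `cx 1 2`, `cx 2 3` is exactly where an iteration-domain analysis would be needed instead of exact matching.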

Collaborator

@bachase bachase left a comment

Great RFC and also great feedback from others.

One high-level consideration, which might prompt a change to the RFC format, would be to add example use cases or user flows in the guide-level section. For example, what is the process for contributing a benchmark? How would I use an existing benchmark program to assess my compiler's performance? How would I participate in the "challenge"?

I'm not blocking on that suggestion, as getting these initial benchmarks seems mostly clear. But it might help disentangle questions around how these benchmarks will be used and what is out of scope for this contribution.


There are several potential drawbacks to consider with this proposal:

- Maintenance Overhead: The addition of a new benchmark suite requires ongoing maintenance to ensure that the benchmarks remain relevant and up-to-date with the latest advancements in quantum computing and compiler technologies.

Collaborator

We can also spin up a separate jeff-bench repo or similar, at least to avoid any challenges managing dependencies for the core jeff code from the benchmark generation code.

Collaborator

I think we should be good here. We can use inline script metadata for the Python scripts that generate benchmarks. uv has great support for these.
See https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies for some good documentation on that.
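
For illustration, a generator script with inline script metadata could look roughly like the sketch below. The file name `generate_qaoa.py`, the `qaoa_layers` helper, the ring-graph layout, and the JSON output are assumptions made for this example; a real generator would emit Jeff and list its actual third-party packages under `dependencies`.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
#
# The comment block above is the inline script metadata; third-party
# packages needed by a real generator would be listed under `dependencies`.
# Everything below is a hypothetical placeholder generator.
import json


def qaoa_layers(num_qubits, num_layers):
    """Describe a ring-graph QAOA circuit as one loop with a fixed trip count."""
    cost = [
        {"gate": "rzz", "qubits": [q, (q + 1) % num_qubits]}
        for q in range(num_qubits)
    ]
    mixer = [{"gate": "rx", "qubits": [q]} for q in range(num_qubits)]
    return {"kind": "repeat", "count": num_layers, "body": cost + mixer}


if __name__ == "__main__":
    print(json.dumps(qaoa_layers(num_qubits=4, num_layers=2), indent=2))
```

Saved as `generate_qaoa.py`, this would be run with `uv run generate_qaoa.py`, and the dependencies declared in the header stay with the script rather than with the core Jeff package.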

Collaborator

@dime10 dime10 left a comment

Thanks Damian, this is a great first RFC!

Comment on lines +98 to +99
| conditionals on originally classical values | Conditional blocks are used where the condition depends on values that were *not* measurement results. |
| conditionals on measurement results | Conditional blocks are used where the condition depends on values that depend on measurement results. |

Collaborator

@dime10 dime10 Nov 4, 2025

I'm curious, for the conditional we separate out classical dynamic values from "quantum dynamic values" (i.e. measurement results), but we didn't do this for the loops (I think both are contained in the dynamically-bounded loops).

Collaborator Author

Yes, that could also be done, but I wanted to not add too many different categories.

At the end of the day, (I believe) the main reason why dynamically bounded loops are so difficult for compilers is because they cannot be "unrolled" at compile time. In that case, it does not really make it much more difficult if these values were measurement results or not.

As conditionals are a bit simpler than loops, on the other hand, adding this distinction there might be more helpful.

But that's not a strong opinion on my end; if we think this new category should be added, I am also fine with it.

Collaborator

@dime10 dime10 Nov 6, 2025

Yeah I'm not necessarily advocating for splitting the loop category as well, but I'm also not sure why the distinction exists with conditionals. In my mind, a value is either available in advance (static) or computed at runtime (dynamic), whether from a measurement or not.
I guess the idea might be to distinguish dynamic values that are impossible to convert to static ones from "pseudo" dynamic ones, for example program arguments (and any value deterministically depending on them) can just be supplied and compilation specialized to this instantiation of arguments, so maybe they are not "true" dynamic values, versus measurement results which are impossible to know in advance. But you could ascribe this property to classical non-deterministic values as well.
At the end of the day though, if a compiler doesn't receive those concrete instantiations, for example, it will have the same challenge in compilation whether it's "true" dynamicism or not 🤔

Collaborator

This is related to some of my comments. From an algorithmic perspective I see an important distinction between the two. Is that distinction less important at the IR or compiler level? I was thinking that the results coming from the quantum device might be relevant for benchmarking metrics (or even have implications for program optimization?).

Collaborator

> This is related to some of my comments. From an algorithmic perspective I see an important distinction between the two.

Could you elaborate on the importance of this distinction from an algorithmic perspective?

> Is that distinction less important at the IR or compiler level?

Well I think from a practical "intermediate" compiler perspective the only thing that matters is whether you have access to these values or not. For instance, if you know the value of a conditional predicate, there is no reason not to flatten it. For loop bounds, you would have the choice to flatten, or not, or be able to make some other deductions (e.g. resource counts) if you know those values. But if you don't know the value of the conditional predicate, you have to preserve it considering both branches. Does it matter here whether the predicate is of quantum or classical origin? I'm not so sure.

Having said that, if you are compiling for a concrete architecture there may be a big difference. A quantum-origin conditional has to be compiled to some instructions on the control system, whereas a classical-origin one might be compiled to a side processor.
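
The point about known versus unknown values can be sketched on a toy IR. The nested-tuple format, the `specialize` function, and the `known` mapping are assumptions for illustration (this is not Jeff), and loop bodies here do not use an induction variable. Unrolling every loop with a known bound is just one possible choice, as noted above.

```python
def specialize(body, known):
    """Flatten control flow whose controlling value is available; keep the rest.

    Toy IR (not Jeff):
      ("gate", name, qubit)
      ("if", pred_name, then_body, else_body)
      ("for", bound, loop_body)    bound is an int or the name of a value
    `known` maps value names to concrete values, e.g. supplied program
    arguments. Measurement results would never appear in it.
    """
    out = []
    for op in body:
        kind = op[0]
        if kind == "if":
            _, pred, then_body, else_body = op
            if pred in known:
                # Predicate available: inline only the branch that is taken.
                taken = then_body if known[pred] else else_body
                out.extend(specialize(taken, known))
            else:
                # Predicate unavailable, whatever its origin: keep both branches.
                out.append(
                    ("if", pred, specialize(then_body, known), specialize(else_body, known))
                )
        elif kind == "for":
            _, bound, loop_body = op
            inner = specialize(loop_body, known)
            if isinstance(bound, int):
                out.extend(inner * bound)          # static bound: unroll
            elif bound in known:
                out.extend(inner * known[bound])   # supplied bound: unroll
            else:
                out.append(("for", bound, inner))  # unknown bound: keep the loop
        else:
            out.append(op)
    return out


PROGRAM = [
    ("for", "n", [("gate", "h", 0)]),
    ("if", "flag", [("gate", "x", 0)], [("gate", "z", 0)]),
    ("if", "m0", [("for", 2, [("gate", "t", 0)])], []),
]

if __name__ == "__main__":
    print(specialize(PROGRAM, {"n": 3, "flag": False}))
```

The pass never asks where a value came from, only whether it is in `known`, which mirrors the argument that the origin matters less at this level than availability.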

Collaborator

> Could you elaborate on the importance of this distinction from an algorithmic perspective?

It's admittedly hand-wavy: (1) the value is the result of a fundamentally different kind of computational process that runs on a separate device, and (2) there is nondeterminism beyond regular classical randomness, in that errors due to noise, additional readout mitigation, etc. could affect both the value of the variable and the subsequent branch that is taken.

In any case, my argument is getting a bit more philosophical, and I don't want this to block the PR 😅

@DRovara DRovara requested a review from burgholzer November 13, 2025 14:10
Collaborator

@burgholzer burgholzer left a comment

Great work @DRovara 👏🏼
Just spotted a typo and a missing dot. Otherwise this is spot on 🎯

