
alexanderivrii
Member

Summary

In #14433 we added an extremely naive Clifford+T/Tdg optimization pass that aims to reduce the total number of T/Tdg-gates in a Clifford+T/Tdg circuit by combining consecutive pairs of T-gates into S-gates and consecutive pairs of Tdg-gates into Sdg-gates. This PR replaces it with a much better algorithm, implemented in Rust. We also believe the algorithm is exact on 1-qubit sequences (that is, it produces the minimum number of T/Tdg-gates).

The idea comes from discussions with Shelly, Julien, Ali and Simon. In essence, we apply the Litinski transform to 1-qubit sequences of Clifford+T/Tdg gates. We process the gates in the sequence one at a time. At each point we have a running list R of $\pm \pi/8$ rotations and a trailing Clifford operator C. When we encounter a new Clifford gate, we simply merge it into C. When we encounter a new T or Tdg-gate, we convert it into an RZ-rotation (while keeping track of the global phase) and swap this rotation with C (this does not change C but may change the axis of the rotation). We append this rotation to R and then check whether it can be combined with the previous rotation in R. For example, if the last rotation in R is an $RX(\pi/8)$-rotation and we are appending an $RX(-\pi/8)$-rotation, then the two rotations simply cancel out. On the other hand, if the last rotation in R is an $RX(\pi/8)$-rotation and we are appending another $RX(\pi/8)$-rotation, then the two rotations combine into a Clifford gate, which is merged into C. After we have processed every gate, we rewrite the rotations in R using Clifford+T/Tdg gates and express C in terms of Clifford gates.
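The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Rust code in this PR: it uses 2x2 floating-point matrices instead of precomputed tables, keeps the global phase inside C, and all names (`optimize`, `rebuild`, `unitary`, the lowercase gate strings) are hypothetical. A rotation `(axis, k)` stands for $\exp(-i k \pi/8 \cdot P)$, with $k = \pm 1$.

```python
import cmath
import math

# 2x2 matrices as nested tuples; numerics stand in for the PR's lookup tables.
R2 = 1 / math.sqrt(2)
I2 = ((1, 0), (0, 1))
PAULI = {"X": ((0, 1), (1, 0)), "Y": ((0, -1j), (1j, 0)), "Z": ((1, 0), (0, -1))}
CLIFFORD = {"h": ((R2, R2), (R2, -R2)), "s": ((1, 0), (0, 1j)), "sdg": ((1, 0), (0, -1j)),
            "x": PAULI["X"], "y": PAULI["Y"], "z": PAULI["Z"]}
T_GATES = {"t": ((1, 0), (0, cmath.exp(1j * math.pi / 4))),
           "tdg": ((1, 0), (0, cmath.exp(-1j * math.pi / 4)))}


def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def dag(a):
    return tuple(tuple(complex(a[j][i]).conjugate() for j in range(2)) for i in range(2))


def scale(c, a):
    return tuple(tuple(c * x for x in row) for row in a)


def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def rotation(axis, k):
    """exp(-i * k * pi/8 * P) = cos(k pi/8) I - i sin(k pi/8) P."""
    c, s = math.cos(k * math.pi / 8), math.sin(k * math.pi / 8)
    return tuple(tuple(c * I2[i][j] - 1j * s * PAULI[axis][i][j] for j in range(2)) for i in range(2))


def optimize(gates):
    """Return (R, C) such that unitary(gates) == C @ R[-1] @ ... @ R[0]."""
    rots, c = [], I2
    for g in gates:
        if g in CLIFFORD:
            c = mul(CLIFFORD[g], c)  # a Clifford gate is merged into C
            continue
        if g not in T_GATES:
            raise ValueError(f"not a Clifford+T gate: {g}")
        k = 1 if g == "t" else -1
        # T = e^{i pi/8} exp(-i pi/8 Z); swapping it with C turns Z into C^dag Z C.
        m = mul(dag(c), mul(PAULI["Z"], c))
        axis, sign = next((a, s) for a, p in PAULI.items() for s in (1, -1) if close(m, scale(s, p)))
        c = scale(cmath.exp(1j * k * math.pi / 8), c)  # keep the global phase in C
        k *= sign
        if rots and rots[-1][0] == axis:
            total = rots.pop()[1] + k  # 0: the pair cancels; +-2: the pair is a Clifford
            if total:
                c = mul(c, rotation(axis, total))
        else:
            rots.append((axis, k))
    return rots, c


def unitary(gates):
    u = I2
    for g in gates:
        u = mul(CLIFFORD.get(g) or T_GATES[g], u)
    return u


def rebuild(rots, c):
    u = I2
    for axis, k in rots:
        u = mul(rotation(axis, k), u)
    return mul(c, u)


seq = ["t", "h", "t", "t", "s", "x", "tdg", "h", "t", "t"]
rots, c = optimize(seq)
assert close(rebuild(rots, c), unitary(seq))
print(len(rots), "rotations remain out of", sum(g in T_GATES for g in seq))
```

Floating-point matrices are fine for short sequences like this one, but rounding error accumulates in C on long inputs, which is one reason an exact table-based representation is preferable in a real pass.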

Details and comments

In the above algorithm we need to reason about operators that can be constructed using 1q-Clifford gates. Unlike the other Clifford classes, we need to keep track of the global phase. This leads to $192 = 24 \times 8$ possible operators, corresponding to $24$ single-qubit Cliffords multiplied by a factor of $e^{\pi k i/4}$, $k=0,\dots,7$. To reason about what happens when a Clifford gate is appended or prepended to such a Clifford operator, we have precomputed the tables for appending/prepending H and S-gates. Similarly, we have precomputed tables for evolving RX, RY, RZ rotations using such a Clifford operator. This leads to a fast but somewhat ugly implementation. @ShellyGarion is investigating if we can replace these precomputed tables by an explicit construction.
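As a sanity check on the count $192 = 24 \times 8$, the group generated by H and S can be enumerated numerically. The sketch below is an illustration only (helper names such as `generate` and `strip_phase` are made up here, and it is unrelated to how the precomputed tables in this PR were built): it closes {H, S} under multiplication and then counts the elements with and without the global phase.

```python
import math

R2 = 1 / math.sqrt(2)
IDENTITY = ((1, 0), (0, 1))
H = ((R2, R2), (R2, -R2))
S = ((1, 0), (0, 1j))


def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def key(a):
    """Hashable fingerprint of a matrix, rounded to absorb float noise."""
    return tuple((round(complex(x).real, 6), round(complex(x).imag, 6)) for row in a for x in row)


def strip_phase(a):
    """Rescale so that the first nonzero entry is real and positive."""
    first = next(complex(x) for row in a for x in row if abs(x) > 1e-9)
    phase = first / abs(first)
    return tuple(tuple(x / phase for x in row) for row in a)


def generate(generators):
    """Breadth-first closure of the generators under left-multiplication."""
    seen = {key(IDENTITY): IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        nxt = []
        for m in frontier:
            for g in generators:
                p = mul(g, m)
                if key(p) not in seen:
                    seen[key(p)] = p
                    nxt.append(p)
        frontier = nxt
    return list(seen.values())


group = generate([H, S])
classes = {key(strip_phase(m)) for m in group}                           # Cliffords up to phase
scalars = [m for m in group if key(strip_phase(m)) == key(IDENTITY)]     # pure global phases
print(len(group), len(classes), len(scalars))  # 192 24 8
```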

@ajavadia has Python code that also reimplements the exact resynthesis of Clifford+T/Tdg circuits. In fact, Ali's code has both 1-qubit and multiple-qubit versions, but here I am looking at the 1-qubit one. On the following example,

from qiskit import QuantumCircuit
from qiskit.quantum_info import random_clifford

circuit = QuantumCircuit(1)
for i in range(10000):
    circuit.t(0)
    circuit.compose(random_clifford(1, seed=i*23+17).to_circuit(), [0], inplace=True)

both Ali's code and the code in this PR reduce the number of $T$-gates from 10000 to 3288. However, the implementation in this PR is about 800x faster, taking 0.0063 seconds compared to 5.2329 seconds.

At this point the pass is still very naive and only cancels pairs of adjacent T-gates and pairs of
adjacent Tdg-gates.

A change in behavior: the pass raises an error if the circuit has non-(Clifford+T) gates.
The optimization applies to sequences of 1-qubit Clifford+T/Tdg gates. We believe that for 1-qubit circuits
we get optimal T-counts.
@alexanderivrii alexanderivrii added this to the 2.3.0 milestone Sep 8, 2025
@alexanderivrii alexanderivrii requested a review from a team as a code owner September 8, 2025 08:37
@alexanderivrii alexanderivrii added the Changelog: New Feature Include in the "Added" section of the changelog label Sep 8, 2025
@github-project-automation github-project-automation bot moved this to To do in Transpiler Sep 8, 2025
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

@ShellyGarion ShellyGarion added the fault tolerance related to fault tolerance compilation label Sep 8, 2025
@ShellyGarion ShellyGarion added the mod: transpiler Issues and PRs related to Transpiler label Sep 8, 2025
@coveralls

coveralls commented Sep 8, 2025

Pull Request Test Coverage Report for Build 17572998592

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 216 of 245 (88.16%) changed or added relevant lines in 4 files are covered.
  • 18 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.002%) to 88.376%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/transpiler/src/passes/optimize_clifford_t.rs | 211 | 240 | 87.92% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/circuit/src/parameter/parameter_expression.rs | 1 | 82.79% |
| crates/circuit/src/parameter/symbol_expr.rs | 1 | 73.15% |
| qiskit/transpiler/passes/layout/vf2_utils.py | 1 | 93.71% |
| crates/qasm2/src/lex.rs | 3 | 91.75% |
| qiskit/transpiler/passes/layout/vf2_post_layout.py | 12 | 91.12% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 17500557721 | 0.002% |
| Covered Lines | 92406 |
| Relevant Lines | 104560 |

💛 - Coveralls

Member

@mtreinish mtreinish left a comment


I haven't reviewed this in depth yet, but from a quick skimming one thing stuck out to me about all the constant arrays that I left an inline comment on. It was a small thing but that would potentially impact performance so I wanted to mention it before giving a full review.

// Precomputed tables used in the algorithm.

// Index of the Clifford1q operator -> corresponding Clifford circuit
const CIRCUIT: &[&[StandardGate]; 24] = &[
Member


Typically I'd expect all of these arrays to be:

Suggested change
const CIRCUIT: &[&[StandardGate]; 24] = &[
static CIRCUIT: [[StandardGate]; 24] = [

instead of const slices. In practice, a const is basically a compiler directive that inlines the value everywhere it is used, while a static lives at a single address in memory that is loaded with the binary. For arrays like this a static is normally more efficient, because using a const multiple times or in a loop is basically like doing:

let mut idx = 0;
let mut val = 0;
loop {
     // With a const, the array is conceptually rebuilt at every use.
     let foo = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z];
     val += foo[idx];
     idx += 1;
     if idx >= foo.len() {
         break;
     }
}

every time you access these. If you're only using it once then it probably doesn't matter, though. It's worth benchmarking to be sure, of course, but this is why arrays like this are typically defined as statics rather than const slices in other places.

Member Author


thanks! I did not understand this difference before

Member Author

@alexanderivrii alexanderivrii Sep 9, 2025


I have changed const to static in 673580f. In particular, the CIRCUIT array is now of the form

static CIRCUIT: [&[StandardGate]; 24] = [&[], &[StandardGate::H], ...]

I was not able to get rid of the inner slices though, as different entries consist of different numbers of gates.

Oh, on the circuit from this PR's summary this had absolutely no effect on performance.

&[StandardGate::S],
&[StandardGate::H, StandardGate::S],
&[StandardGate::S, StandardGate::H],
&[StandardGate::S, StandardGate::H, StandardGate::S],
Member

@ShellyGarion ShellyGarion Sep 9, 2025


A minor comment:
I think that for the 6 representatives of the Cliffords, it might be better to choose:
[I, H, S, HS, SdgH, SHS]
since they give exactly the following Cliffords:

Clifford: Stabilizer = ['+Z'], Destabilizer = ['+X']
Clifford: Stabilizer = ['+X'], Destabilizer = ['+Z']
Clifford: Stabilizer = ['+Z'], Destabilizer = ['+Y']
Clifford: Stabilizer = ['+Y'], Destabilizer = ['+Z']
Clifford: Stabilizer = ['+X'], Destabilizer = ['+Y']
Clifford: Stabilizer = ['+Y'], Destabilizer = ['+X']

You chose SH instead of SdgH, which gives "-" instead of "+":
Clifford: Stabilizer = ['+X'], Destabilizer = ['-Y']

The tables would need to be updated accordingly, but they should become more symmetric.
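The stabilizer/destabilizer pairs listed above can be reproduced with a small numeric check. This is a hedged sketch, not the Qiskit `Clifford` class: it assumes each label is read in circuit order (so "SdgH" means Sdg first, then H), takes the stabilizer to be the image of Z and the destabilizer to be the image of X under conjugation, and uses hypothetical helper names (`image`, `tableau`).

```python
import math

R2 = 1 / math.sqrt(2)
GATES = {"I": ((1, 0), (0, 1)), "H": ((R2, R2), (R2, -R2)),
         "S": ((1, 0), (0, 1j)), "Sdg": ((1, 0), (0, -1j))}
PAULI = {"X": ((0, 1), (1, 0)), "Y": ((0, -1j), (1j, 0)), "Z": ((1, 0), (0, -1))}


def mul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)) for i in range(2))


def dag(a):
    return tuple(tuple(complex(a[j][i]).conjugate() for j in range(2)) for i in range(2))


def image(u, pauli):
    """Name the signed Pauli U P U^dagger, e.g. '+Y' or '-Y'."""
    m = mul(u, mul(PAULI[pauli], dag(u)))
    for name, p in PAULI.items():
        for sign, label in ((1, "+"), (-1, "-")):
            if all(abs(m[i][j] - sign * p[i][j]) < 1e-9 for i in range(2) for j in range(2)):
                return label + name
    raise ValueError("operator is not a Clifford")


def tableau(circuit):
    """Return (stabilizer, destabilizer) for a list of gate names in time order."""
    u = GATES["I"]
    for g in circuit:
        u = mul(GATES[g], u)  # later gates multiply from the left
    return image(u, "Z"), image(u, "X")


for circ in ([], ["H"], ["S"], ["H", "S"], ["Sdg", "H"], ["S", "H", "S"], ["S", "H"]):
    print("".join(circ) or "I", tableau(circ))
```

Under these assumptions the first six circuits give only "+" signs, while the last one (S followed by H) gives the destabilizer "-Y" mentioned above.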

@ajavadia
Member

This looks good! I was initially a bit confused about what your code does, but now I understand it, so I will write up my understanding here. Feel free to use any part of it in the docstrings.

This pass can be run as a peephole optimization pass on a circuit written over the "Clifford+T" gateset. More precisely it collapses all chains of 1-qubit gates containing Clifford+T/Tdg into a minimal usage of T/Tdg. For a chain containing m gates, the runtime is O(m).

Linear-time complexity comes from the fact that in the special case of one-qubit gates, there will be no commutation opportunities between two rotations unless there is also a merge/cancel opportunity. This is no longer true for multi-qubit rotations because we must potentially consider repeated commutations until we find a rotation to merge/cancel with (see Zhang's algorithm arXiv:1903.12456).

Optimality comes from the fact that again in the special case of 1-qubit gates if we have a gate sequence such that no two consecutive rotations commute, then the sequence is optimal. This is because a chain of m rotations of this form necessarily has smallest denominator exponent (sde) in its channel representation equal to m, and we know sde is equal to the optimal T count, see arXiv:1308.4134. Since our algorithm ensures that no consecutively commuting rotations remain in the circuit (by merging/cancelling), the circuit it produces is optimal.
