Optimize the `OptimizeCliffordT` transpiler pass. #14996
At this point the pass is still very naive and only cancels pairs of adjacent T-gates and pairs of adjacent Tdg-gates. A change in behavior: the pass raises an error if the circuit has non-(Clifford+T) gates.
The optimization applies to sequences of 1-qubit Clifford+T/Tdg gates. We believe that for 1-qubit circuits we get optimal T-counts.
Co-authored-by: Shelly Garion <[email protected]>
I haven't reviewed this in depth yet, but from a quick skim one thing stuck out to me about all the constant arrays, and I left an inline comment on it. It is a small thing, but it could potentially impact performance, so I wanted to mention it before giving a full review.
// Precomputed tables used in the algorithm.

// Index of the Clifford1q operator -> corresponding Clifford circuit
const CIRCUIT: &[&[StandardGate]; 24] = &[
Typically I'd expect all of these arrays to be:
- const CIRCUIT: &[&[StandardGate]; 24] = &[
+ static CIRCUIT: [[StandardGate]; 24] = [
instead of const slices. The difference in practice is that the `const` version is basically a compiler directive that inlines the value everywhere it's used, while the `static` lives at a single address in memory that is loaded with the binary. Normally for arrays like this it's more efficient to use a `static`, because with a `const`, if you're using it multiple times or in a loop, it's basically like doing:
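// Illustration only: a..z stand in for the elements of the constant array.
let mut idx = 0;
let mut val = 0;
loop {
    // With a `const`, the array is conceptually rebuilt at each use.
    let foo = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z];
    val += foo[idx];
    idx += 1;
    if idx >= foo.len() {
        break;
    }
}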
every time you access these. If you're only using it once then it probably doesn't matter though. It's worth benchmarking to be sure, of course, but this is why in other places arrays like this are typically defined as statics and not const slices.
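A minimal sketch of the two declaration styles, for comparison. The table names and contents are hypothetical and chosen only for illustration. Whether the compiler really materializes a copy at each use of the `const` depends on optimization, so this shows the semantics rather than a measured cost.

```rust
// Hypothetical lookup tables, used only for illustration.
// A `const` is substituted at every use site; a `static` has one fixed address.
const TABLE_AS_CONST: [u32; 4] = [10, 20, 30, 40];
static TABLE_AS_STATIC: [u32; 4] = [10, 20, 30, 40];

/// Sums the table through the `const`: each `TABLE_AS_CONST` expression
/// semantically denotes a fresh copy of the array value.
fn sum_from_const() -> u32 {
    let mut total = 0;
    for idx in 0..TABLE_AS_CONST.len() {
        total += TABLE_AS_CONST[idx];
    }
    total
}

/// Sums the table through the `static`: every access reads the same memory.
fn sum_from_static() -> u32 {
    TABLE_AS_STATIC.iter().sum()
}
```

Both functions return the same value; only the storage model differs, which is why benchmarking is the way to decide whether the change matters.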
thanks! I did not understand this difference before
I have changed `const` to `static` in 673580f. In particular, the `CIRCUIT` array is now of the form
static CIRCUIT: [&[StandardGate]; 24] = [&[], &[StandardGate::H], ...]
I was not able to get rid of the inner slices though, as different entries consist of different numbers of gates.
Oh, on the circuit from this PR's summary this had absolutely no effect on performance.
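A small self-contained sketch of why the inner slices remain: a `static` array has a fixed element type, so entries of different lengths have to sit behind slice references. `DemoGate`, `DEMO_CIRCUITS` and `demo_circuit` are hypothetical names for illustration and are not the types used in the PR.

```rust
// Hypothetical stand-in for a gate enum; not the real `StandardGate` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DemoGate {
    H,
    S,
}

// One shared table in static memory; the inner slices may differ in length.
static DEMO_CIRCUITS: [&[DemoGate]; 4] = [
    &[],
    &[DemoGate::H],
    &[DemoGate::S],
    &[DemoGate::H, DemoGate::S],
];

/// Returns the gate sequence stored at `index` (panics if out of range).
fn demo_circuit(index: usize) -> &'static [DemoGate] {
    DEMO_CIRCUITS[index]
}
```

Each entry is a reference to an array of its own length, coerced to `&[DemoGate]`, so the outer array stays uniform.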
&[StandardGate::S],
&[StandardGate::H, StandardGate::S],
&[StandardGate::S, StandardGate::H],
&[StandardGate::S, StandardGate::H, StandardGate::S],
A minor comment:
I think that for the 6 representatives of the Cliffords, it might be better to choose:
[I, H, S, HS, SdgH, SHS]
since they give exactly the following Cliffords:
Clifford: Stabilizer = ['+Z'], Destabilizer = ['+X']
Clifford: Stabilizer = ['+X'], Destabilizer = ['+Z']
Clifford: Stabilizer = ['+Z'], Destabilizer = ['+Y']
Clifford: Stabilizer = ['+Y'], Destabilizer = ['+Z']
Clifford: Stabilizer = ['+X'], Destabilizer = ['+Y']
Clifford: Stabilizer = ['+Y'], Destabilizer = ['+X']
You chose `SH` instead of `SdgH`, which gives "-" instead of "+":
Clifford: Stabilizer = ['+X'], Destabilizer = ['-Y']
The tables would need to be updated accordingly, but they should then be more symmetric.
This looks good! I was initially a bit confused about what your code does, but now I understand it, so I will write down my understanding here. Feel free to use any part of it in the docstrings.

This pass can be run as a peephole optimization pass on a circuit written over the "Clifford+T" gate set. More precisely, it collapses every chain of 1-qubit Clifford+T/Tdg gates into an equivalent chain with a minimal number of T/Tdg gates. For a chain containing $m$ gates, the runtime is $O(m)$.

Linear-time complexity comes from the fact that, in the special case of one-qubit gates, there are no commutation opportunities between two rotations unless there is also a merge/cancel opportunity. This is no longer true for multi-qubit rotations, because we may have to consider repeated commutations until we find a rotation to merge/cancel with (see Zhang's algorithm, arXiv:1903.12456).

Optimality comes from the fact that, again in the special case of 1-qubit gates, a gate sequence in which no two consecutive rotations commute is optimal. This is because a chain of $m$ rotations of this form necessarily has smallest denominator exponent (sde) in its channel representation equal to $m$, and we know that the sde is equal to the optimal T-count (see arXiv:1308.4134). Since our algorithm ensures that no consecutive commuting rotations remain in the circuit (by merging/cancelling), the circuit it produces is optimal.
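The merge/cancel idea can be sketched in a few lines. This is a simplified illustration, not the PR's table-based implementation: it handles only `H`, `S`, `Sdg`, `T` and `Tdg`, ignores global phase, and only counts the remaining rotations instead of emitting a circuit. The names `Gate1q`, `SignedPauli` and `reduced_t_count` are hypothetical. The trailing Clifford is tracked through its conjugation action on X, Y, Z, which is enough to find the axis of each new rotation.

```rust
/// Signed Pauli: axis 0 = X, 1 = Y, 2 = Z, with sign +1 or -1.
type SignedPauli = (usize, i8);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Gate1q {
    H,
    S,
    Sdg,
    T,
    Tdg,
}

/// Hypothetical helper: number of pi/8 rotations left after cancelling and
/// merging adjacent rotations about the same axis.
fn reduced_t_count(gates: &[Gate1q]) -> usize {
    // images[p] = C^dagger * P_p * C for the trailing Clifford C (phase ignored).
    let mut images: [SignedPauli; 3] = [(0, 1), (1, 1), (2, 1)];
    let mut rotations: Vec<SignedPauli> = Vec::new();
    let neg = |p: SignedPauli| -> SignedPauli { (p.0, -p.1) };

    for &gate in gates {
        let [fx, fy, fz] = images;
        match gate {
            // Appending a Clifford gate g: new image of P is old image of g^dagger P g.
            Gate1q::H => images = [fz, neg(fy), fx],
            Gate1q::S => images = [neg(fy), fx, fz],
            Gate1q::Sdg => images = [fy, neg(fx), fz],
            Gate1q::T | Gate1q::Tdg => {
                // Move the Z-rotation in front of C: its axis becomes C^dagger Z C.
                let (axis, mut sign) = fz;
                if gate == Gate1q::Tdg {
                    sign = -sign;
                }
                match rotations.last().copied() {
                    // Same axis, opposite angle: the two rotations cancel.
                    Some((a, s)) if a == axis && s != sign => {
                        rotations.pop();
                    }
                    // Same axis, same angle: the pair is a Clifford Q; fold it into C.
                    Some((a, _)) if a == axis => {
                        rotations.pop();
                        for img in images.iter_mut() {
                            if img.0 != axis {
                                let eps: i8 = if (img.0 + 1) % 3 == axis { 1 } else { -1 };
                                *img = (3 - img.0 - axis, img.1 * sign * eps);
                            }
                        }
                    }
                    _ => rotations.push((axis, sign)),
                }
            }
        }
    }
    rotations.len()
}
```

For instance, `reduced_t_count(&[Gate1q::T, Gate1q::S, Gate1q::T])` finds no remaining rotation, because the two T-gates end up on the same axis and merge into a Clifford, whereas `[T, H, T]` keeps both rotations since they act on different axes. Each gate is handled in constant time, which matches the linear runtime described above.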
Summary
In #14433 we added an extremely naive `Clifford+T/Tdg` optimization pass that aims to reduce the total number of `T/Tdg`-gates in a `Clifford+T/Tdg` circuit by combining consecutive pairs of `T`-gates into `S`-gates and consecutive pairs of `Tdg`-gates into `Sdg`-gates. This PR completely replaces this by a much better algorithm which is furthermore implemented in Rust. We also believe the algorithm is exact (that is, produces the minimum number of `T/Tdg`-gates).

The idea comes from discussions with Shelly, Julien, Ali and Simon. In essence, we apply the Litinski transform to 1-qubit sequences of `Clifford+T/Tdg` gates. We iteratively process the gates in the sequence. At each point we have a running list `R` of $\pm \pi/8$ rotations and a trailing Clifford operator `C`. When we encounter a new Clifford gate, we simply merge it into `C`. When we encounter a new `T` or `Tdg`-gate, we convert it into an RZ-rotation (while keeping track of the global phase) and swap this rotation with `C` (this does not change `C` but may change the axis of the rotation). We append this rotation to `R` and then check if it can be combined with the previous rotation in `R`. As an example, if the last rotation in `R` is an $RX(\pi/8)$-rotation and we are appending an $RX(-\pi/8)$-rotation, then the two rotations simply cancel out. On the other hand, if the last rotation in `R` is an $RX(\pi/8)$-rotation and we are appending another $RX(\pi/8)$-rotation, then the two rotations can be combined into a Clifford gate and then merged into `C`. After we process every gate, we rewrite rotations in `R` using `Clifford+T/Tdg` gates and express `C` in terms of Clifford gates.

Details and comments
In the above algorithm we need to reason about operators that can be constructed using 1q-Clifford gates. Unlike the other `Clifford` classes, we need to keep track of the global phase. This leads to $192 = 24 \times 8$ possible operators, corresponding to $24$ single-qubit Cliffords multiplied by a factor of $e^{\pi k i/4}$, $k=0,\dots,7$. To reason about what happens when a Clifford gate is appended or prepended to such a Clifford operator, we have precomputed the tables for appending/prepending `H` and `S`-gates. Similarly, we have precomputed tables for evolving `RX`, `RY`, `RZ` rotations using such a Clifford operator. This leads to a fast but somewhat ugly implementation. @ShellyGarion is investigating if we can replace these precomputed tables by an explicit construction.

@ajavadia has Python code that also reimplements the exact resynthesis of `Clifford+T/Tdg` circuits. In fact, Ali's code has both 1-qubit and multiple-qubit versions, but here I am looking at the 1-qubit one. On the following example, both Ali's code and the code in this PR reduce the number of $T$-gates from `10000` to `3288`. However, the implementation in this PR is about 800x faster, taking `0.0063` seconds compared to `5.2329` seconds.
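The count of $192 = 24 \times 8$ phase-tracked operators mentioned in the details can be checked numerically. The sketch below is an independent sanity check, not code from the PR: it enumerates the matrix group generated by `H` and `S` by breadth-first search, using rounded floating-point entries as hash keys. All function names are hypothetical.

```rust
use std::collections::{HashSet, VecDeque};

type Complex = (f64, f64); // (real, imaginary)
type Matrix = [[Complex; 2]; 2];

fn cmul(a: Complex, b: Complex) -> Complex {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let mut out = [[(0.0, 0.0); 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                let p = cmul(a[i][k], b[k][j]);
                out[i][j] = (out[i][j].0 + p.0, out[i][j].1 + p.1);
            }
        }
    }
    out
}

/// Rounds every entry so that numerically equal matrices hash identically.
fn key(m: &Matrix) -> [i64; 8] {
    let mut k = [0i64; 8];
    for i in 0..2 {
        for j in 0..2 {
            k[4 * i + 2 * j] = (m[i][j].0 * 1e6).round() as i64;
            k[4 * i + 2 * j + 1] = (m[i][j].1 * 1e6).round() as i64;
        }
    }
    k
}

/// Counts the distinct matrices (global phase included) generated by H and S.
fn count_group_generated_by_h_and_s() -> usize {
    let r = std::f64::consts::FRAC_1_SQRT_2;
    let h: Matrix = [[(r, 0.0), (r, 0.0)], [(r, 0.0), (-r, 0.0)]];
    let s: Matrix = [[(1.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]];
    let id: Matrix = [[(1.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)]];

    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(key(&id));
    queue.push_back(id);
    while let Some(m) = queue.pop_front() {
        for g in [&h, &s] {
            let next = matmul(g, &m);
            if seen.insert(key(&next)) {
                queue.push_back(next);
            }
        }
    }
    seen.len()
}
```

Calling `count_group_generated_by_h_and_s()` yields 192 distinct matrices, which agrees with 24 Cliffords times 8 phase factors.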