Lightweight Detection of a Small Number of Large Errors in a Quantum Circuit

Suppose we want to implement a unitary $U$, for instance a circuit for some quantum algorithm. Suppose our actual implementation is a unitary $\tilde{U}$, which we can only apply as a black-box. In general it is an exponentially-hard task to decide whether $\tilde{U}$ equals the intended $U$, or is significantly different in a worst-case norm. In this paper we consider two special cases where relatively efficient and lightweight procedures exist for this task. First, we give an efficient procedure under the assumption that $U$ and $\tilde{U}$ (both of which we can now apply as a black-box) are either equal, or differ significantly in only one $k$-qubit gate, where $k=O(1)$ (the $k$ qubits need not be contiguous). Second, we give an even more lightweight procedure under the assumption that $U$ and $\tilde{U}$ are Clifford circuits which are either equal, or different in arbitrary ways (the specification of $U$ is now classically given while $\tilde{U}$ can still only be applied as a black-box). Both procedures only need to run $\tilde{U}$ a constant number of times to detect a constant error in a worst-case norm. We note that the Clifford result also follows from earlier work of Flammia and Liu, and of da Silva, Landon-Cardinal, and Poulin. In the Clifford case, our error-detection procedure also allows us to efficiently learn (and hence correct) $\tilde{U}$ if we have a small list of possible errors that could have happened to $U$; for example if we know that only $O(1)$ of the gates of $\tilde{U}$ are wrong, this list will be polynomially small and we can test each possible erroneous version of $U$ for equality with $\tilde{U}$.


Introduction
With the first tentative steps for implementing quantum computations on larger numbers of qubits (53 qubits in the case of Google's quantum supremacy experiment [AM19]) comes the need to verify whether those implementations actually work as intended. In contrast to classical computations, we cannot just "open up" the computer midway through the computation to check whether everything is still on track and then allow the computation to continue, because measurements on the intermediate quantum state typically destroy the superposition; and learning the quantum state takes exponential effort in the number of qubits in general. Similarly, simulation of a general n-qubit quantum circuit to determine what the intended output should be on a given input state becomes infeasible once n is around 50. Even reasonably-well implemented circuits of simple quantum operations ("gates") can still be marred by many different types of errors: a few large errors (where a gate or qubit is totally wrong or even absent), or many smallish errors (for example slight overrotations), or some combination of both. Strategies are needed to deal with these. In the long run, when we have sufficiently many physical qubits available to encode our logical qubits by error-correcting codes, such faults could in principle all be dealt with by the machinery of fault-tolerant quantum computing. In particular, the "threshold theorem" [AB08] says that arbitrarily long fault-tolerant quantum computing is possible with low overhead, assuming the fault-rate per qubit per time-step is a sufficiently small constant and the errors are not too correlated. But even here, there could be errors due to the mis-specification of the programme to be run. In the near-to-medium-term future we will not have sufficiently many qubits available to do fault-tolerant computing, and we need more "lightweight" methods to verify (and hopefully correct) quantum circuits.
By lightweight we mean that the verification procedures should not use very complicated quantum operations beyond running Ũ as a black-box, and should need only polynomial (ideally only linear) additional classical effort in the number of qubits and gates of the tested circuits.
In this paper we are interested in testing the full computation, thought of as a black-box, and testing its behaviour on an arbitrary input, not just the all-zeros state (as is important, for example, if the circuit is to be applied as a subroutine within a larger computation). Accordingly, the verification procedure should test for closeness of the ideal circuit U and the actually implemented circuit Ũ in a worst-case norm.
Let us first discuss what specific norm is appropriate to measure distance between unitaries U and Ũ (see the survey [MW16, Section 5.1] and references therein for a more extensive discussion). When measuring distance between two states |φ⟩ and |ψ⟩, the canonical distance measure is the trace distance, which is defined as half the difference in Schatten-1 norm between the corresponding density matrices:

D(|φ⟩, |ψ⟩) = ½ ‖ |φ⟩⟨φ| − |ψ⟩⟨ψ| ‖₁.

The trace distance gives exactly the maximal total variation distance between the probability distributions obtained from |φ⟩ and |ψ⟩, respectively, maximized over all possible measurements. The trace distance between |φ⟩ and |ψ⟩ turns out to be equal to

D(|φ⟩, |ψ⟩) = √(1 − |⟨φ|ψ⟩|²).
This D(|φ⟩, |ψ⟩) satisfies the triangle inequality, but is not a distance in the strictest sense of the word, because |φ⟩ and −|φ⟩ have distance 0 even though they're not equal. This is, however, as it should be, because such global-phase differences have no physical significance. When comparing different unitaries U and Ũ in the worst case, it is natural to maximize the trace distance between U|φ⟩ and Ũ|φ⟩ over all |φ⟩. This gives the following distance:

D_max(U, Ũ) = max_{|φ⟩} D(U|φ⟩, Ũ|φ⟩).

This is actually the special case of the diamond-norm distance, restricted to the case of unitaries.¹ Similarly to the trace distance, this distance cannot "see" the difference in global phase between U and e^{iθ}U (unless we can turn the global phase into a relative phase by conditional operations).² Detecting the difference between U and Ũ as measured by D_max is like finding a needle in a haystack: two n-qubit unitaries may have large D_max-distance while being equal on all but one of the elements in some particular 2ⁿ-element basis. The difference would only show up in one out of 2ⁿ possible "directions". Consider the example where the ideal unitary U is the n-qubit identity and the actual implementation Ũ is the identity with one of the 2ⁿ diagonal entries negated; here D_max(U, Ũ) is large (equal to 1), yet the well-known lower bound for quantum search [BBBV97] implies that Ω(√2ⁿ) black-box applications of Ũ are necessary in order to detect the difference from identity with constant probability. And in a complexity-theoretic context, where the unitaries are not given as a black-box but as explicit polynomial-size quantum circuits, deciding whether D_max(U, Ũ) is close to 0 or close to 1 is known to be QIP-complete [Wat09, Theorem 13]. Still, some non-trivial verification can be done in special cases without doing an exponential amount of work, and that is the topic of this paper.
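This needle-in-a-haystack phenomenon is easy to see numerically. The following small numpy sketch (our illustration, not part of the paper's analysis) evaluates the average-case distance D(U, Ũ) = √(1 − |Tr(U†Ũ)|²/2²ⁿ), used later in Section 2.1, for the negated-diagonal-entry example, and exhibits a state witnessing D_max = 1:

```python
import numpy as np

n = 8
d = 2 ** n
U = np.eye(d)                      # ideal circuit: identity
Ut = np.eye(d); Ut[-1, -1] = -1    # implementation: one diagonal entry negated

# Average-case distance D(U, Ũ) = sqrt(1 - |Tr(U† Ũ)|² / 2^{2n})
D_avg = np.sqrt(1 - abs(np.trace(U.conj().T @ Ut)) ** 2 / d ** 2)

# Worst case: |φ⟩ = (|0...0⟩ + |1...1⟩)/√2 has ⟨φ|U†Ũ|φ⟩ = 0,
# so D_max(U, Ũ) = sqrt(1 - 0²) = 1.
phi = np.zeros(d); phi[0] = phi[-1] = 1 / np.sqrt(2)
overlap = phi @ (U.conj().T @ Ut) @ phi

print(D_avg)         # ≈ 2/2^{n/2}: exponentially small in n
print(abs(overlap))  # 0.0, witnessing D_max = 1
```

So a test whose detection probability scales with D will essentially never notice this error, even though the worst-case distance is maximal.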
We will consider two types of U, Ũ in the following subsections: (1) arbitrary unitaries, which we can think of as (possibly very large) circuits over an arbitrary universal set of gates, for instance {H, T, CNOT}. Here we will be able to efficiently detect large D_max-distance if U and Ũ differ in only one k-qubit gate, with k = O(1). And (2) unitaries corresponding to Clifford circuits. Here we will be able to efficiently detect a difference between any two Clifford circuits U and Ũ. In both cases our procedures only need to run the circuits a constant number of times in order to detect a constant distance in worst-case norm. In case (2), if the number of faulty gates in our Clifford circuit is O(1), then we can actually find what those errors are in polynomial time.

Circuits over a universal gate set
Suppose we want to test whether two n-qubit unitaries U and Ũ over an arbitrary gate-set are equal or not. We can apply these unitaries as a black-box, but cannot look inside them. For example, we can think of U as corresponding to an implementation of some s-gate quantum circuit on a chip, which for whatever reason we already know to be a correct implementation. Ũ is another chip that has just come off the production line and that is supposed to equal U, but that may or may not be different (faulty) in one or more of the s elementary gates. We want to test whether U and Ũ are either equal, or far in D_max-distance.
In Section 2.1 we describe a well-known test that compares U and Ũ by effectively comparing their "Choi states". By running U ⊗ I on the first half of n EPR-pairs, running Ũ ⊗ I on the first half of another batch of n EPR-pairs, and comparing the two resulting 2n-qubit states³ with a swap-test, we obtain a test with acceptance probability given by

p = ½ D(U, Ũ)²,   (1)

where D is an "average-case" distance measure defined by⁴

D(U, Ũ) = √(1 − |Tr(U†Ũ)|²/2²ⁿ).

The following equality justifies calling D(U, Ũ) an "average case" [MW16, Proposition 21]:

D(U, Ũ)² = ((2ⁿ + 1)/2ⁿ) · ∫ D(U|φ⟩, Ũ|φ⟩)² dφ,

where the integral is according to the Haar measure, and (2ⁿ + 1)/2ⁿ is very close to 1 already for small n. Hence our test is sensitive to a difference in trace distance in an average direction.⁵ That is of course much weaker than we want, because U and Ũ can have large distance D_max(U, Ũ) even when the detection probability of Eq. (1) is exponentially close to 0. However, we show that if U and Ũ differ in only one gate on k = O(1) qubits (in the case where our circuit has some fixed spatial geometry: these qubits need not be contiguous), then the D_max and D distances are closely related, and one is large iff the other is large. This gives a relatively lightweight procedure to compare two black-box circuits that differ in at most one k-qubit gate. Note that the procedure does not tell us what or where the erroneous gate is. This really concerns one extreme end of the spectrum of possible ways in which a circuit can fail: the relatively simple situation where one k-qubit gate is significantly wrong (the k = O(1) qubits need not be contiguous, and the k-qubit gate that is wrong could be built up from multiple elementary gates, some of which may be wrong), while the other gates in the circuit are essentially perfect. The picture we have in mind is analogous to a chip, where bits or qubits are led through a physical circuit, on which each gate has its own location.
This setting does not really correspond to the current proposals for implementing quantum circuits on superconducting or ion-trap hardware, where typically many of the gates can be slightly faulty, and gradual deterioration is going on all over the place. However, our picture could correspond to optical implementations of quantum computers, where the optical set-up implementing a circuit on fly-by photonic qubits has one erroneous location, while everything else works essentially as intended. It could also correspond to the situation where we have a classical program driving near-perfect quantum hardware, where the classical program has one erroneous instruction somewhere, leading to one gate not doing what it's supposed to do (near-perfect quantum hardware that receives the wrong instructions still fails).
As an application, our test can be used to winnow out the faulty circuits from a production line where each circuit has small probability f of having one faulty gate. Using our test we can reduce the fraction of faulty circuits from f to anything we want (see Section 2.3).
What if Ũ has more than one faulty gate compared to U? One would expect two errors to be no harder to detect than one error. Unfortunately, as we show in Section 2.4, there are cases where U and Ũ differ significantly in two 1-qubit gates and have large D_max(U, Ũ)-distance, yet the two errors conspire to make D(U, Ũ) (and hence the detection probability of our test) exponentially small.
Our test for circuits over an arbitrary gate set assumes the ability to create 2n EPR-pairs, to maintain coherence between the two halves of the EPR-pairs during the run of the circuits, and to apply a swap-test to two 2n-qubit states. This is reasonably lightweight but not quite as lightweight as we would like our test to be.

Clifford circuits
In order to enable more lightweight testing, we then turn our attention to a specific gate-set. Clifford circuits use the gate-set consisting of the Pauli matrices, the Hadamard gate H, the phase gate S, and CNOT. This gate-set is not universal; it becomes universal when adding for instance a T-gate, or when we start with certain "magic states" as part of our initial state and allow classical conditioning on the outcomes of intermediate one-qubit measurements (using Clifford gates we can then implement a T-gate).
We will consider the situation where we would like to implement a Clifford circuit U, of which we now have a classical description. We also have an implementation of a (possibly different) Clifford circuit Ũ that we can run as a black-box. In Section 3 we give a relatively lightweight procedure for testing (with success probability close to 1) whether U = Ũ or not, which only uses O(1) runs of the black-box circuit Ũ together with single-qubit state preparations at the start, and single-qubit measurements at the end. In fact, even with one run of Ũ we already have probability ≥ 1/4 of detecting a difference between U and Ũ. This also means that our test still works if the errors are different in each run (i.e., if Ũ is a different erroneous Clifford in different runs).
The reason we can have such a lightweight procedure for testing Clifford circuits, is that such circuits correspond to linear maps of the set of n-qubit Pauli matrices to itself (up to an overall phase ±1), and that two different such maps actually differ on at least half of the 4ⁿ n-qubit Paulis. Our test thus selects an n-qubit Pauli at random, and indirectly checks (by appropriate single-qubit measurements on the state obtained by running Ũ on an appropriate product state) whether Ũ transforms that Pauli as U would have done. This test is inspired by a test due to Richard Jozsa [Joz17], which however uses O(n) runs of Ũ rather than our O(1) runs.
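The "differ on at least half of the Paulis" fact is easy to check by brute force for tiny n. The sketch below is our illustration (the actual test of Section 3 uses only product-state preparations and single-qubit measurements, not dense matrices): it compares U = CNOT with a hypothetical faulty version carrying one extra S gate on the first qubit, and counts on how many of the 4² two-qubit Paulis the two conjugation maps disagree.

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])
Y = 1j * X @ Z
PAULIS = [I2, X, Y, Z]

def paulis(n):
    # all 4^n n-qubit Pauli matrices
    for combo in product(PAULIS, repeat=n):
        P = combo[0]
        for Q in combo[1:]:
            P = np.kron(P, Q)
        yield P

S = np.diag([1, 1j])
CNOT = np.eye(4)[[0, 1, 3, 2]]

U = CNOT
Ut = np.kron(S, I2) @ CNOT   # hypothetical fault: one extra S on qubit 1

# count the Paulis on which U P U† and Ũ P Ũ† differ (including signs)
differ = sum(not np.allclose(U @ P @ U.conj().T, Ut @ P @ Ut.conj().T)
             for P in paulis(2))
frac = differ / 4 ** 2
print(frac)   # at least 1/2 for distinct Cliffords
```

A uniformly random Pauli therefore exposes the difference with probability at least 1/2 per run, which is what makes an O(1)-run test possible.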
In contrast to the procedure of the previous section, this test can distinguish any two different Cliffords, and we do not need to make any assumptions about U and Ũ differing in only one gate. However, if we do additionally assume that U and Ũ differ in at most one gate, then we can not only detect the presence of an error, but even find what it is. More generally, if we can compute from U any small list of candidate circuits that is promised to contain Ũ, then we can use our test from this section to identify Ũ by running over all circuits V in our list and testing whether V = Ũ.⁶ For example, if we know that the implemented circuit Ũ was obtained from the ideal specification U by O(1) gates that were replaced by other gates, then this list has a size that is only polynomial in the number of qubits and gates of U. Having found Ũ, we have learnt the error(s), which hopefully enables us to correct it (them).

Remark.
After we finished our section on the above test for equivalence of Clifford circuits, we discovered that this result also follows from earlier work of Flammia and Liu [FL11] and da Silva, Landon-Cardinal, and Poulin [dSLCP11] about estimating the fidelity between quantum states and between quantum channels, together with the additional observation that distinct Cliffords have noticeably large D-distance (equivalently, small entanglement fidelity). We give the details in Section 3.4.

Related work
With the development of medium-size quantum computers, verification of their properties is receiving more and more attention. Here we will mention some of the main approaches and results, referring to the recent survey [EHW + 20] and the many references therein for more.
From a theoretical standpoint, an important recent result is Mahadev's verification protocol [Mah18]. This sits at the end of a long line of works in the area of blind quantum computation [AS06, BFK09] where a single verifier (who should be as efficient and as classical as possible) checks a quantum computation by interacting with one or more polynomial-time quantum provers. Mahadev's protocol allows a purely-classical polynomial-time machine to verify the computation of a polynomial-time quantum machine (under reasonable cryptographic assumptions). However, even though everything in Mahadev's protocol is polynomial, and hence "efficient" from a theoretical perspective, in practice the protocol is anything but lightweight: it leads to very significant overheads on the side of the quantum computation, and several rounds of communication between the quantum computer and the classical verifier. It is also designed to test the computation starting from a fixed initial state, so does not test according to a worst-case distance measure. Mahadev's 4-message protocol was subsequently improved to a non-interactive and zero-knowledge protocol in [ACGH20], but that improved protocol is not very lightweight either.
A much more bottom-up approach to verification is to test the building blocks of the quantum algorithm: the elementary gates. There have been some positive results on testing universal sets of quantum gates, for instance [DMMS07]. However, testing gates in isolation is not enough to verify their behavior in the context of a larger circuit, where the surrounding components may adversely affect gates that would have worked fine in isolation. Randomized benchmarking [EAZ05, DCEL09, PRY+17] is an approach to test sequences of gates: roughly speaking one runs a random sequence of gates from a fixed gate set (often restricted to Clifford circuits on a small number of qubits) followed by their inverse, and then tests to what extent the resulting operation is the identity, as it should be. This approach beautifully isolates the average entanglement fidelity of the gates (see footnote 4) from state preparation and measurement ("SPAM") errors. Note that the average entanglement fidelity of the gates, which is what randomized benchmarking tries to measure, is an average-case measure that may or may not give information about the worst-case errors of these gates [Wal15, KLDF14].
Closer to the second part of this paper is the work of Low [Low09], who studied efficient testing and even identifying (learning) of Clifford circuits. He showed how to fully learn an unknown Clifford circuit U using O(n) runs of U and U†, but assuming the ability to run U† is a stronger assumption than we are willing to entertain here. Low also points (at the end of his Section III.B) to work of Harrow and Winter [HW12] which implies that O(n²) runs of U suffice to learn it (without using U†), but their work is information-theoretic in nature and assumes the ability to do complicated joint measurements on the O(n²) output states of the runs of U. The general philosophy we espouse here (looking for lightweight schemes) is also embodied in the verification protocol described by Jozsa and Strelchuk in [JS17]. Last but not least, we already mentioned the very related work of Flammia and Liu [FL11] and [dSLCP11], which we discuss in Section 3.4.
2 Testing circuits over an arbitrary gate set

2.1 Using the two circuits separately
In this section we study the situation where we have two s-gate quantum circuits, U and Ũ, over an arbitrary set of one- and two-qubit gates. We can run these in a black-box fashion and want to test whether they are either equal, or substantially different in operator norm. We will give a relatively lightweight test that works if U and Ũ differ in at most one gate.
We start by reminding the reader of a simple test that is sensitive to average-case distance between U and Ũ (see [MW16, Section 5.1.3] and references therein). We will assume it is possible to create a maximally entangled state on 2n qubits; a simple circuit that starts from |0⟩^{⊗2n} and applies n Hadamard gates and n CNOTs will do this. We also assume we can do controlled-swap gates. Such 3-qubit gates are not quite as lightweight as we'd ideally like to be, but still much lighter than universal quantum computation. Now consider the following test:
1. Run U ⊗ I on a 2n-qubit maximally entangled state to produce the state |ψ_U⟩.
2. Run Ũ ⊗ I on another 2n-qubit maximally entangled state to produce the state |ψ_Ũ⟩.
3. Run a swap-test on |ψ_U⟩ and |ψ_Ũ⟩ and output the measured bit.⁷
This test uses O(n + s) gates. It is easy to calculate that the probability p that the test outputs 1 equals

p = ½ (1 − |⟨ψ_U|ψ_Ũ⟩|²),

and that ⟨ψ_U|ψ_Ũ⟩ = Tr(U†Ũ)/2ⁿ.
This gives a relation between p and the average-case distance D:

p = ½ D(U, Ũ)², where D(U, Ũ) = √(1 − |Tr(U†Ũ)|²/2²ⁿ).

If U and Ũ are equal (up to global phase) then p will be 0, and otherwise p will be positive. Measurement outcome 1 thus tells us that U and Ũ are different (by more than a global phase). The detection probability is large iff D(U, Ũ) is large. This test will therefore be useful, for example, if Ũ were a version of U hit by random errors, because random errors tend to create deviations in many "directions" simultaneously and hence give a non-negligible distance D(U, Ũ). However, our main focus here is to design a test that is sensitive to the worst-case distance D_max(U, Ũ), because if that distance is small, then U and Ũ produce approximately the same states no matter what initial state they are applied to. In general the relation between these worst-case and average-case norms is fairly weak. For example, if U = I and Ũ has one of its diagonal entries set to −1, then D_max(U, Ũ) = 1 but D(U, Ũ) = √(4/2ⁿ − 4/2²ⁿ) is exponentially small. The above test will thus have exponentially small probability of detecting the large D_max-distance in this case. We now show that at least the gap cannot be much more than in the previous example.
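These identities are easy to sanity-check numerically. The sketch below is our illustration, using the "vec" identity that (U ⊗ I)|Φ⟩ equals vec(U)/√(2ⁿ) for the maximally entangled state |Φ⟩; it verifies the overlap formula and the resulting output-1 probability for random 3-qubit unitaries.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar(d):
    # Haar-random unitary via QR of a complex Gaussian matrix
    q, r = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q * (np.diag(r) / abs(np.diag(r)))

n = 3
d = 2 ** n
U, Ut = haar(d), haar(d)

# Choi states |ψ_U⟩ = (U ⊗ I)|Φ⟩; as flat vectors these are vec(U)/√d.
psi_U = U.reshape(-1) / np.sqrt(d)
psi_Ut = Ut.reshape(-1) / np.sqrt(d)

overlap = np.vdot(psi_U, psi_Ut)   # should equal Tr(U†Ũ)/2ⁿ
p = (1 - abs(overlap) ** 2) / 2    # swap-test output-1 probability
D_avg_sq = 1 - abs(np.trace(U.conj().T @ Ut)) ** 2 / d ** 2

print(abs(overlap - np.trace(U.conj().T @ Ut) / d))   # ~0
print(abs(p - D_avg_sq / 2))                          # ~0
```

Note that p never exceeds 1/2: the swap-test accepts two identical states with probability 1 and two orthogonal states with probability 1/2.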
Theorem 1. For all n-qubit unitaries U and Ũ we have D_max(U, Ũ) ≤ 2^{(n+1)/2} · D(U, Ũ).

Proof. Let µ = min_{|φ⟩} |⟨φ|U†Ũ|φ⟩|, and |φ⟩ be a minimizing state, so that D_max(U, Ũ)² = 1 − µ². Let B be an orthonormal basis that contains |φ⟩ as one of its 2ⁿ states. We have

|Tr(U†Ũ)| ≤ Σ_{|b⟩∈B} |⟨b|U†Ũ|b⟩| ≤ µ + 2ⁿ − 1.

We now bound

D(U, Ũ)² = 1 − |Tr(U†Ũ)|²/2²ⁿ ≥ 1 − (µ + 2ⁿ − 1)²/2²ⁿ = (1 − µ)(2ⁿ⁺¹ − 1 + µ)/2²ⁿ ≥ (1 − µ)/2ⁿ ≥ (1 − µ²)/2ⁿ⁺¹ = D_max(U, Ũ)²/2ⁿ⁺¹,

which implies the inequality of the theorem.
The above theorem is a strengthening (for the special case of unitaries) of a more general but quadratically weaker bound relating these two distances due to Magesan, Gambetta, and Emerson [MGE12], and used for instance in [Wal15, Eq. (3)] and [KLDF14, p. 2]. Now we make the simple but powerful observation that if U and Ũ differ only in one k-qubit gate (G vs G̃), then the two norms are within a factor of roughly 2^{k/2} of one another. Specifically, let U = U₁(G ⊗ I_{2ⁿ⁻ᵏ})U₂ and Ũ = U₁(G̃ ⊗ I_{2ⁿ⁻ᵏ})U₂, where U₁ and U₂ are arbitrary unitaries, and G and G̃ are k-qubit gates. For notational simplicity we wrote G and G̃ as acting on the first k qubits of the state, but in fact they may act on any subset of k of the n qubits, not necessarily contiguous. We have Tr(U†Ũ) = Tr(G†G̃ ⊗ I_{2ⁿ⁻ᵏ}) = 2ⁿ⁻ᵏ Tr(G†G̃), and hence D(U, Ũ) = D(G, G̃). We also have U†Ũ = U₂†(G†G̃ ⊗ I_{2ⁿ⁻ᵏ})U₂, and hence D_max(U, Ũ) = D_max(G, G̃). Therefore, using Theorem 1 (applied to the k-qubit unitaries G and G̃), the probability of detecting a difference between U and Ũ with one run of our test is

p = ½ D(U, Ũ)² = ½ D(G, G̃)² ≥ D_max(G, G̃)²/2ᵏ⁺² = D_max(U, Ũ)²/2ᵏ⁺².

In particular, if the worst-case distance is D_max(U, Ũ) ≥ ε and k = O(1) (say, U and Ũ differ only in one k-qubit gate, or in one block of errors that affects only k qubits, not necessarily contiguous), then our detection probability is p = Ω(ε²). We can efficiently increase this detection probability to close to 1: if we run O(log(1/δ)/ε²) tests, then if U and Ũ are equal then all tests will output 0, while if D_max(U, Ũ) ≥ ε then with probability ≥ 1 − δ at least one of the tests will output 1.
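The identity D(U, Ũ) = D(G, G̃), i.e. the fact that a single-gate error is not "diluted" by the surrounding circuit, can be checked numerically. Below is a small numpy sketch (our illustration) with Haar-random stand-ins for U₁, U₂ and for the correct and faulty gate, placed on the first k qubits for simplicity:

```python
import numpy as np

rng = np.random.default_rng(4)

def haar(d):
    q, r = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q * (np.diag(r) / abs(np.diag(r)))

n, k = 4, 2
d, dk = 2 ** n, 2 ** k
U1, U2 = haar(d), haar(d)
G, Gt = haar(dk), haar(dk)       # correct vs faulty k-qubit gate

U = U1 @ np.kron(G, np.eye(d // dk)) @ U2
Ut = U1 @ np.kron(Gt, np.eye(d // dk)) @ U2

def D_avg(A, B):
    dim = A.shape[0]
    return np.sqrt(1 - abs(np.trace(A.conj().T @ B)) ** 2 / dim ** 2)

print(D_avg(U, Ut))   # equals D(G, G̃): the error is not diluted
print(D_avg(G, Gt))
```

The two printed values agree to machine precision, since Tr(U†Ũ) = 2ⁿ⁻ᵏ Tr(G†G̃) and the traces are normalized by the respective dimensions.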
The 1/ε²-factor in the number of tests could be improved to 1/ε using amplitude amplification [BHMT02], but that would be a much less lightweight procedure: it also requires the ability to apply controlled versions of U and Ũ as well as their inverses, which may be technologically rather demanding. In any case, if we can apply inverses then there is an easier test that only uses n EPR-pairs instead of 2n: apply Ũ and then U⁻¹ to the first half of a 2n-qubit maximally entangled state, reverse the Hs and CNOTs that prepared the entangled state, and check (by a measurement in the computational basis) whether you get back |0⟩^{⊗2n}; the probability of any other outcome equals D(U, Ũ)², so this provides an estimate of D(U, Ũ).
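That last claim is quickly confirmed numerically: applying Ũ followed by U⁻¹ to the first half of the maximally entangled state |Φ⟩ returns to |Φ⟩ with probability |Tr(U†Ũ)|²/2²ⁿ = 1 − D(U, Ũ)². A small numpy sketch (our illustration, with Haar-random stand-ins for the circuits):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 8  # n = 3

def haar(d):
    q, r = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q * (np.diag(r) / abs(np.diag(r)))

U, Ut = haar(d), haar(d)

# |Φ⟩ on 2n qubits; apply Ũ then U⁻¹ = U† to its first half and compute the
# probability of returning to |Φ⟩ (equivalently, of reading 0^{2n} after
# undoing the Hadamards and CNOTs that prepared |Φ⟩).
phi = np.eye(d).reshape(-1) / np.sqrt(d)
back = np.kron(U.conj().T @ Ut, np.eye(d)) @ phi
p_zero = abs(np.vdot(phi, back)) ** 2

D_avg_sq = 1 - abs(np.trace(U.conj().T @ Ut)) ** 2 / d ** 2
print(abs((1 - p_zero) - D_avg_sq))   # ~0: failure probability is D(U,Ũ)²
```

This variant avoids the controlled-swap gates entirely, at the price of assuming access to the inverse U⁻¹.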

If we can apply the circuits conditionally
In the case where we cannot apply conditional versions of U and Ũ, as in the previous section, differences in their global phases are physically meaningless and we cannot detect them. Now suppose we have slightly more power: we can apply U and Ũ in a conditional manner, but not their inverses. This allows for a slightly more efficient test that uses 2n + 1 qubits instead of 4n:
1. Prepare H|0⟩ tensored with a 2n-qubit maximally entangled state (2n + 1 qubits in total).
2. Conditioned on the first qubit being |0⟩, apply U to the first n-qubit block; conditioned on the first qubit being |1⟩, apply Ũ to the first n-qubit block.
3. Apply H to the first qubit and measure it.
The probability that the above algorithm outputs 1 is

p = ½ (1 − Re(Tr(U†Ũ))/2ⁿ).

Note that Tr(U†Ũ) is not squared here, in contrast to the expression for the probability in the previous section. Hence this test is sensitive to the relative phase between U and Ũ. In particular, if Ũ = U then p = 0, while if Ũ = −U then p = 1.
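This phase-sensitivity is easy to confirm by simulating the test directly. In the sketch below (our illustration), after the final Hadamard the |1⟩-branch of the control qubit is ((U − Ũ) ⊗ I)|Φ⟩/2, whose squared norm is the output-1 probability:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # n = 3 qubits per register

def haar(d):
    q, r = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q * (np.diag(r) / abs(np.diag(r)))

def p_one(U, Ut):
    # After the final Hadamard, the amplitude vector of the |1⟩ branch
    # of the control qubit is ((U - Ũ) ⊗ I)|Φ⟩ / 2.
    phi = np.eye(d).reshape(-1) / np.sqrt(d)   # maximally entangled |Φ⟩
    branch1 = np.kron(U - Ut, np.eye(d)) @ phi / 2
    return np.vdot(branch1, branch1).real

U, Ut = haar(d), haar(d)
p_equal = p_one(U, U)      # identical circuits
p_phase = p_one(U, -U)     # global-phase flip
formula = (1 - np.trace(U.conj().T @ Ut).real / d) / 2

print(p_equal)                        # 0.0
print(p_phase)                        # 1.0: the phase flip is now detected
print(abs(p_one(U, Ut) - formula))    # ~0: matches (1 - Re Tr(U†Ũ)/2ⁿ)/2
```

The swap-test of the previous section would output p = 0 in both the Ũ = U and Ũ = −U cases, which is exactly the phase-blindness this conditional test removes.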

Reducing the fault-rate in a production line of circuits
Suppose we have a production line that is intended to produce identical circuits that implement a particular unitary U. Like everything else in life, the production line is not perfect. Assume that each circuit is perfect (i.e., equal to U) with probability 1 − f and faulty with probability f, meaning its D_max-distance from the ideal U is at least ε; for example because U and Ũ differ in exactly one gate like before.⁸ If we don't do anything, we expect a fraction of roughly f of the circuits to be faulty. We would like to reduce this fraction by efficiently identifying the faulty circuits. We can achieve this by comparing the circuits against each other, using the fact that most are probably correct. Note that we are not assuming here that we can run the ideal U as a black-box.
Assume we have a test that, given two circuits U₁ and U₂, can distinguish between the cases U₁ = U₂ (up to global phase) and D_max(U₁, U₂) ≥ ε with success probability ≥ 2/3 (for example, our test from Section 2.1 will do that if the distance is due to one faulty gate). Note that we can reduce the error probability of this test from 1/3 to small δ by running it O(log(1/δ)) times and taking the majority outcome among those runs.
Let us take a batch of n circuits coming off the production line, with n odd. By a Chernoff bound, the probability that more than half of them are faulty is at most e^{−D(1/2||f)n}, where D(p||q) = p ln(p/q) + (1 − p) ln((1 − p)/(1 − q)) is the Kullback-Leibler divergence (a.k.a. relative entropy) between binary distributions with probabilities p and q respectively, measured in nats rather than bits. If f is bounded away from 1/2, then D(1/2||f) = Ω(1) and e^{−D(1/2||f)n} is exponentially small in n. Now suppose we run our test on each of the n(n − 1)/2 pairs in the batch, with error probability reduced to δ ≪ 1/n². Then, except with probability p_E ≤ n(n − 1)δ/2 + e^{−D(1/2||f)n} ≪ 1, all tests succeed and more than half of the circuits in the batch are correct. Condition on this event below.
Each circuit in the batch will be involved in n − 1 tests. For every good circuit, at least half of the tests it is involved in will be with other good circuits and hence will say "equal". For faulty circuits, more than half of the tests it is involved in will be with good circuits and hence these will say "not equal". Accordingly, if we throw away the circuits where more than half of the tests say "not equal", then we will exactly eliminate the faulty circuits from this batch.
With probability p E , the event we conditioned on did not happen, but the worst that can occur in that case is that we err on all n circuits in that batch, in the sense of throwing away all good circuits from the batch and keeping all faulty ones. Since p E is exponentially small in n, this bad event only negligibly affects the expected fraction of circuits we mishandled.
By choosing the batch-size n large enough, we can thus reduce the expected fault rate from f to anything we want. The number of black-box runs used for analyzing each batch of n circuits is O(n(n − 1)/2 · log(1/δ)) = O(n² log n).
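The whole batch procedure can be sketched as a toy Monte Carlo simulation, abstracting each circuit to a good/faulty label and each pairwise test to a comparison that errs with probability δ (our illustration; the batch size 11, fault rate f = 0.2 and δ = 10⁻³ are arbitrary choices, and we pessimistically let faulty circuits test "equal" to each other):

```python
import numpy as np

rng = np.random.default_rng(3)

def filter_batch(n=11, f=0.2, delta=1e-3):
    # One batch: each circuit is faulty w.p. f; each pairwise test errs
    # w.p. delta. Keep a circuit iff at most half of its n-1 tests say
    # "not equal".
    faulty = rng.random(n) < f
    votes = np.zeros(n)                      # "not equal" counts per circuit
    for i in range(n):
        for j in range(i + 1, n):
            differ = faulty[i] != faulty[j]  # ideal test outcome
            if rng.random() < delta:         # test errs with prob. delta
                differ = not differ
            votes[i] += differ
            votes[j] += differ
    keep = votes <= (n - 1) / 2
    return faulty, keep

kept_faulty = kept_total = 0
for _ in range(200):
    faulty, keep = filter_batch()
    kept_faulty += int(np.sum(faulty & keep))
    kept_total += int(np.sum(keep))

out_rate = kept_faulty / kept_total
print(out_rate)   # well below the input fault rate f = 0.2
```

The surviving fault rate is dominated by the rare batches in which the faulty circuits happen to form a large fraction, exactly as in the Chernoff-bound analysis above.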

Detecting two faults is hard for our test in the worst case
The test of Section 2.1 works to detect a one-gate error, because if only one gate is affected then there is a fairly tight relation between the average-case distance D(U, Ũ) that we can test for, and the worst-case distance D_max(U, Ũ) that we would like to test for. What if there are two faulty gates in Ũ? One might expect that detecting two errors should be easier than detecting one, but unfortunately this turns out to be false (at least in the worst case) because the two faults can conspire to destroy the close relation between the worst-case and average-case distance measures.
Here's a simple example. Let V be the n-qubit Cⁿ⁻¹NOT gate, which applies an X-gate to the last qubit conditioned on the first n − 1 qubits being in basis state |1ⁿ⁻¹⟩. Suppose U = (I ⊗ H)V(I ⊗ H) and Ũ = V. In other words, the intended H-gates on the last qubit at the start and the end of the circuit are replaced by identities, so only two of the gates of U are faulty. Because HXH = Z, U is the circuit that applies a Z-gate to the last qubit conditioned on the first n − 1 qubits being |1ⁿ⁻¹⟩. The matrix U†Ũ is therefore the identity except that it has ZX = iY in its lower-right 2 × 2 corner. Hence min_{|φ⟩} |⟨φ|U†Ũ|φ⟩| = 0, as witnessed for instance by taking |φ⟩ = |1ⁿ⟩. This implies D_max(U, Ũ) = 1. On the other hand, Tr(U†Ũ) = 2ⁿ − 2, hence D(U, Ũ)² = 1 − (1 − 2/2ⁿ)² ≈ 4/2ⁿ. The latter implies that one run of our test only has exponentially small probability of detecting the large D_max-distance between U and Ũ. In other words, our test fails miserably to detect two or more adversarially placed faulty gates.
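This adversarial example can be verified directly for small n. The numpy sketch below (our illustration) builds V, U and Ũ as dense matrices and checks the trace and the vanishing diagonal entry:

```python
import numpy as np

n = 6
d = 2 ** n
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

V = np.eye(d)                     # C^{n-1}NOT: swaps the last two basis states
V[[d - 2, d - 1]] = V[[d - 1, d - 2]]
IH = np.kron(np.eye(d // 2), H)   # I ⊗ H: Hadamard on the last qubit
U = IH @ V @ IH                   # intended circuit
Ut = V                            # both H-gates replaced by identity

M = U.conj().T @ Ut
print(np.trace(M).real)           # 2ⁿ - 2
print(abs(M[d - 1, d - 1]))       # 0: |1ⁿ⟩ witnesses D_max(U, Ũ) = 1
print(1 - abs(np.trace(M)) ** 2 / d ** 2)   # D(U, Ũ)² ≈ 4/2ⁿ
```

Already at n = 6 the average-case distance squared is about 0.06 while the worst-case distance is maximal, and the gap widens exponentially with n.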

Testing Clifford circuits
Let P = {I, X, Y, Z} be the set of 1-qubit Paulis. Note that distinct non-identity Paulis anti-commute (XZ = −ZX, etc.) and that Y = iXZ. Let Pₙ = {I, X, Y, Z}^{⊗n} be the set of 4ⁿ n-qubit Paulis. These matrices are unitary and Hermitian, and hence self-inverse. An n-qubit Clifford circuit U consists of Pauli gates, Hadamard gates (H), phase gates (S), and CNOT gates. These are exactly the unitaries that map (by conjugation) all elements of Pₙ to elements of Pₙ, possibly with an overall phase of ±1. We assume there are no intermediate measurements of qubits in the middle of the circuit; these may all be pushed to the end using some auxiliary qubits and CNOTs.
In this section we will deal with the situation where we want to implement an n-qubit Clifford circuit U, which we know fully (i.e., we have a classical description of it). Instead we have a Clifford circuit Ũ that we can apply as a black-box. Our goal here is to test whether U = Ũ and, if not, to figure out how they differ so we can correct the errors.

What it means for two Clifford circuits to be different
As mentioned, conjugation by a Clifford circuit U maps elements of Pₙ to elements of Pₙ, up to an overall phase ±1, and it is well known that this map (ignoring the ±1s) corresponds to a linear map F₂²ⁿ → F₂²ⁿ, where F₂ is the field of two elements. Here we represent I by 00 ∈ F₂², X by 10, Z by 01, and Y by 11, so we may identify an n-qubit Pauli with an element of F₂²ⁿ. For example, we can identify P = X ⊗ Z with the 4-bit vector (1, 0, 0, 1)ᵀ. The correspondence between a Clifford and its associated linear map with signs seems to be folklore. It can be derived from the connection with the symplectic group, see for instance [KS14, Section I.A] (see also [Gro06, Section II.B], though that applies to qudits of odd dimension). We give a simple proof below for completeness.
Theorem 2 (folklore). Let U be an n-qubit Clifford circuit, and define the associated map U: Pₙ → ±Pₙ by U(P) = UPU†. There exists an invertible matrix M_U ∈ F₂^{2n×2n} such that U(P) ∈ {M_U P, −M_U P} (where with slight abuse of notation we view P both as an n-qubit Pauli and as an element of F₂²ⁿ).
Proof. The circuit U is just a composition of Pauli gates, H, S, and CNOT gates. Hence it suffices to prove the theorem for each of these gates and then compose the linear maps.
First, when conjugating a 1-qubit Pauli $P$ with a 1-qubit Pauli gate $U$, we just get $P$ back, with a minus sign if $P$ and $U$ anticommute; we ignore the sign for the purposes of this theorem. The corresponding matrix $M_U$ is just the identity.
Second, conjugation by $H$ swaps $X$ and $Z$ (and maps $Y$ to $-Y$), so in the 2-bit representation $M_H$ swaps the two bits. Third, conjugation by $S$ maps $X$ to $Y$ and fixes $Z$, so $M_S$ maps $(x, z)$ to $(x, x \oplus z)$.
Fourth, conjugation by CNOT maps 2-qubit Paulis to 2-qubit Paulis as given for instance in Figure 3 of [KRUW10]. It may be verified that in the 4-bit representation (ordering the bits as $x_1, z_1, x_2, z_2$, with qubit 1 the control) this map corresponds to the following $4 \times 4$ matrix:
$$M_{\mathrm{CNOT}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Theorem 3. Let $U$ and $\tilde{U}$ be $n$-qubit Clifford circuits with $M_U = M_{\tilde{U}}$. Then there is an $R \in \mathcal{P}^n$ such that conjugation by $U$ and conjugation by $R\tilde{U}$ are the same map.

Proof. First, by right-multiplying $U$ and $\tilde{U}$ with $\tilde{U}^\dagger$, we may assume without loss of generality that $\tilde{U} = I$ and hence $M_U = M_{\tilde{U}} = I$. We now want to show that conjugation by $U$ is the same map as conjugation by some $R \in \mathcal{P}^n$.
Since $M_U = I$, conjugation by $U$ maps each $P \in \mathcal{P}^n$ to itself, times a sign $s_P$. Since every density matrix $\rho$ is a linear combination of the $P \in \mathcal{P}^n$, these signs fully determine the action of $U$ on all density matrices: if $\rho = \sum_P a_P P$, then $U \rho U^\dagger = \sum_P a_P U P U^\dagger = \sum_P a_P s_P P$.
Let us first consider the $n$ signs $s_{X_j}$ induced by the action of $U$ on $X_j = I^{\otimes j-1} \otimes X \otimes I^{\otimes n-j}$ (for $j = 1, \ldots, n$), and the $n$ signs $s_{Z_j}$ corresponding to $Z_j = I^{\otimes j-1} \otimes Z \otimes I^{\otimes n-j}$. We now show that we can choose a (unique) $R \in \mathcal{P}^n$ consistent with all the signs $s_{X_j}$ and $s_{Z_j}$. Consider $j = 1$: choosing $R_1 = I, X, Z, Y$ realizes the sign pairs $(s_{X_1}, s_{Z_1}) = (+,+), (+,-), (-,+), (-,-)$, respectively, since in each case $R_1 X R_1 = s_{X_1} X$ and $R_1 Z R_1 = s_{Z_1} Z$. Doing the same for each $j$ gives $R = R_1 \otimes \cdots \otimes R_n$.
We now claim that this choice of $R$ (which has $M_R = I$, like all Pauli circuits) not only has the same signs $s_P$ as $U$ for all $P \in \{X_1, \ldots, X_n, Z_1, \ldots, Z_n\}$, but in fact has the same signs $s_P$ for all $4^n$ $P \in \mathcal{P}^n$. To that end, fix an arbitrary $P$, and write it as
$$P = c \prod_{j=1}^n X_j^{a_j} Z_j^{b_j}$$
for some $a_1, \ldots, a_n, b_1, \ldots, b_n \in \{0,1\}$, and some overall phase $c \in \{1, -1, i, -i\}$ which comes from the fact that $Y = iXZ$. Inserting $I = U^\dagger U$ in many places, we can write
$$U P U^\dagger = c \prod_{j=1}^n (U X_j U^\dagger)^{a_j} (U Z_j U^\dagger)^{b_j} = c \prod_{j=1}^n (s_{X_j} X_j)^{a_j} (s_{Z_j} Z_j)^{b_j} = \left(\prod_{j=1}^n s_{X_j}^{a_j} s_{Z_j}^{b_j}\right) P.$$
This shows that $s_P = \prod_{j=1}^n s_{X_j}^{a_j} s_{Z_j}^{b_j}$, so all $4^n$ signs $s_P$ are fully determined by the $2n$ signs $s_{X_1}, s_{Z_1}, \ldots, s_{X_n}, s_{Z_n}$. But by the same calculation, $R$ induces exactly the same signs for all $P \in \mathcal{P}^n$. Hence conjugation by $U$ and by $R$ are the same map on $\mathcal{P}^n$ (and by linearity are the same map on all $n$-qubit density matrices).
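As a sanity check on the 2-bit-per-qubit representation used above, the following sketch (Python/NumPy; all code and names are ours) recovers the single-gate actions by direct conjugation, including a candidate linear map for CNOT in the bit ordering $x_1, z_1, x_2, z_2$ with qubit 1 as control:

```python
import numpy as np

# 1-qubit Paulis keyed by their (x, z) bit-pair: I=00, X=10, Z=01, Y=11.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z
PAULI = {(0, 0): I2, (1, 0): X, (0, 1): Z, (1, 1): Y}

def decode(M):
    """Return (bits, sign) such that M == sign * PAULI[bits]."""
    for bits, P in PAULI.items():
        for sign in (1, -1):
            if np.allclose(M, sign * P):
                return bits, sign
    raise ValueError("not a signed 1-qubit Pauli")

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

def column(U, bits):
    """Bits and sign of U P U† for the 1-qubit Pauli P with the given bits."""
    return decode(U @ PAULI[bits] @ U.conj().T)

assert column(H, (1, 0)) == ((0, 1), 1)   # H X H† = Z: M_H swaps the bits
assert column(H, (0, 1)) == ((1, 0), 1)   # H Z H† = X
assert column(S, (1, 0)) == ((1, 1), 1)   # S X S† = Y: M_S maps (x,z) to (x, x+z)
assert column(S, (0, 1)) == ((0, 1), 1)   # S Z S† = Z

# CNOT (control = qubit 1), bit order (x1, z1, x2, z2):
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
                dtype=complex)
M_CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 0, 1]])
for e in np.eye(4, dtype=int):
    P = np.kron(PAULI[tuple(e[:2])], PAULI[tuple(e[2:])])
    img = (M_CNOT @ e) % 2
    Q = np.kron(PAULI[tuple(img[:2])], PAULI[tuple(img[2:])])
    conj = CNOT @ P @ CNOT            # CNOT is self-inverse
    assert np.allclose(conj, Q) or np.allclose(conj, -Q)
```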

Our test for detecting a difference between two Clifford circuits
The previous theorems can be used to design an efficient test to detect whether two Clifford circuits (one given classically, the other as a quantum black-box) are equal or not. The test is based on the observation (used for instance in Freivalds's well-known randomized algorithm for verifying matrix multiplication [Fre77]) that one can detect whether two matrices are equal by comparing their images on a random vector: if the matrices are equal then these images will be the same, but if the matrices are different then these images will differ with high probability. In our scenario, if two Clifford circuits $U$ and $\tilde{U}$ differ by more than an $n$-qubit Pauli, then the associated maps $U : \mathcal{P}^n \to \pm\mathcal{P}^n$ and $\tilde{U} : \mathcal{P}^n \to \pm\mathcal{P}^n$ will give different $n$-qubit Paulis (even when ignoring their signs) on at least half of all $4^n$ Paulis:

Theorem 4. Let $U$ and $\tilde{U}$ be $n$-qubit Clifford circuits that have distinct associated matrices $M_U$ and $M_{\tilde{U}}$ (equivalently, conjugation by $U$ and by $R\tilde{U}$ are distinct maps for all $R \in \mathcal{P}^n$). Then $M_U P \neq M_{\tilde{U}} P$ for at least $\frac{1}{2}4^n$ of the $P \in \mathcal{P}^n$.

Proof. Consider the matrix $M_U - M_{\tilde{U}} \in \mathbb{F}_2^{2n \times 2n}$. This is a nonzero matrix, hence its kernel has dimension at most $2n-1$, which means that $(M_U - M_{\tilde{U}})P = 0$ for at most $2^{2n-1}$ different $P$s. Therefore $M_U P \neq M_{\tilde{U}} P$ for at least $2^{2n} - 2^{2n-1} = \frac{1}{2}4^n$ of the $P \in \mathcal{P}^n$.

Of course, it is possible that $U$ and $\tilde{U}$ only differ by an $n$-qubit Pauli, and we have to consider that case separately.

Now suppose we have a Clifford circuit $\tilde{U}$ that is intended to implement a known Clifford circuit $U$. We can run $\tilde{U}$ but not its inverse, and want to test whether it indeed equals the intended $U$. Our test starts by choosing a uniformly random $P \in \mathcal{P}^n$. We compute $U^\dagger(P) = U^\dagger P U$ (see footnote 9), which is a signed $n$-qubit Pauli $Q = s\,Q_1 \otimes \cdots \otimes Q_n \in \pm\mathcal{P}^n$.
Footnote 9: A classical computer can do this in time linear in the number of gates of $U$: use the $2n$-bit representation and update it gate-by-gate according to the action of the Clifford gates as described in the proof of Theorem 2, while also keeping track of the overall phase $\pm 1$. Note that since we want to do this for $U^\dagger$, we have to reverse the order of the gates given by $U$ and invert each gate (which only affects the $S$-gate, since the other Clifford gates are self-inverse).

Note that if we start with an eigenstate of $Q$ and apply $U$ to it, then we obtain an eigenstate of $P$ itself, with the same eigenvalue. Our test prepares a tensor-product eigenstate $|\psi_{in}\rangle$ of $Q$ as follows, for $j = 1, \ldots, n$: if $Q_j \in \{X, Y, Z\}$, then set the $j$th qubit of $|\psi_{in}\rangle$ to either the $+1$-eigenstate or the $-1$-eigenstate of $Q_j$, each with probability 1/2; if $Q_j = I$, then set the $j$th qubit of $|\psi_{in}\rangle$ to one of the $+1$-eigenstates $|0\rangle$ or $|1\rangle$, each with probability 1/2 (equivalently, we can think of this as the maximally mixed state $\frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$). By construction, $|\psi_{in}\rangle$ is an eigenstate of $Q$ with an eigenvalue $\lambda \in \{+1, -1\}$ that we know. Now we run $\tilde{U}$ on $|\psi_{in}\rangle$ and measure the $\pm 1$-valued observable $P$ on the state $\tilde{U}|\psi_{in}\rangle$.
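The state-preparation step can be sketched in simulation as follows (Python/NumPy; the helper `eigenstate` and all names are ours, not from the paper). Given a random Pauli string $Q$ with random per-qubit signs, it builds the product state $|\psi_{in}\rangle$ and checks that it is an eigenstate of $Q$ with the predicted eigenvalue $\lambda$; for simplicity we fix $|0\rangle$ where $Q_j = I$, which the test would randomize:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULI = {'I': I, 'X': X, 'Y': Y, 'Z': Z}

def eigenstate(P, sign):
    """A (+1 or -1)-eigenvector of a non-identity 1-qubit Pauli P."""
    vals, vecs = np.linalg.eigh(P)          # eigenvalues -1 and +1, ascending
    return vecs[:, np.argmin(np.abs(vals - sign))]

rng = np.random.default_rng(1)
q = [str(rng.choice(list('IXYZ'))) for _ in range(3)]   # random Pauli string Q
signs = [int(rng.choice([1, -1])) for _ in range(3)]    # random per-qubit signs

psi, lam = np.array([1], dtype=complex), 1
for qj, sj in zip(q, signs):
    if qj == 'I':
        psi = np.kron(psi, np.array([1, 0], dtype=complex))  # |0>
    else:
        psi = np.kron(psi, eigenstate(PAULI[qj], sj))
        lam *= sj

Q = np.array([[1]], dtype=complex)
for qj in q:
    Q = np.kron(Q, PAULI[qj])

# |psi_in> is an eigenstate of Q with the known eigenvalue lambda:
assert np.allclose(Q @ psi, lam * psi)
```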
If $U = \tilde{U}$, then the measurement gives the known value $\lambda$ as outcome with probability 1. However, we claim that if $U$ and $\tilde{U}$ are different Cliffords, then we will see the opposite outcome $-\lambda$ with probability at least 1/4. To prove that claim, we make a case-distinction for the two ways in which $U$ and $\tilde{U}$ can differ (our test doesn't need to know which of the two cases applies).

Case 1:
The matrices $M_U$ and $M_{\tilde{U}}$ are distinct. Let $\tilde{Q} = \tilde{U}^\dagger(P) = \tilde{s}\,\tilde{Q}_1 \otimes \cdots \otimes \tilde{Q}_n \in \pm\mathcal{P}^n$. We don't know what $\tilde{Q}$ is, since we don't know what $\tilde{U}$ is. However, by Theorem 4 we have $Q_1 \otimes \cdots \otimes Q_n \neq \tilde{Q}_1 \otimes \cdots \otimes \tilde{Q}_n$ with probability at least 1/2, over our random choice of $P$. In this case, measuring $P$ on $\tilde{U}|\psi_{in}\rangle$ will give a value different from $\lambda$ with probability 1/2, which can be seen as follows, by examining the different ways in which $Q$ and $\tilde{Q}$ could differ (ignoring their overall signs, which do not affect the probabilistic argument below):

1. There is a location $j$ where $Q_j, \tilde{Q}_j \in \{X, Y, Z\}$ but $Q_j \neq \tilde{Q}_j$. Then $|\psi_{in}\rangle_j$ is a $\pm 1$-eigenstate of $Q_j$ but not of $\tilde{Q}_j$. It is a property of the eigenstates of the non-identity Paulis that $\langle\psi_{in}|_j \tilde{Q}_j |\psi_{in}\rangle_j = 0$, which means that the $j$th qubit will contribute a uniformly random sign to the measurement outcome.
2. There is a location $j$ where $Q_j = I$ and $\tilde{Q}_j \in \{X, Y, Z\}$. Then the $j$th qubit has been set to the maximally mixed state, which is an equal mixture of the $+1$-eigenstate and the $-1$-eigenstate of $\tilde{Q}_j$. Again, the $j$th qubit will contribute a uniformly random sign to the measurement outcome.
3. There is a location $j$ where $Q_j \in \{X, Y, Z\}$ and $\tilde{Q}_j = I$. In this case $|\psi_{in}\rangle_j$ is always a $+1$-eigenvector of $\tilde{Q}_j$, but it is a $+1$-eigenstate or a $-1$-eigenstate of $Q_j$ with probability 1/2 each. Again, the $j$th qubit will contribute a uniformly random sign to the measurement outcome.
There could be multiple locations $j$ where $Q_j \neq \tilde{Q}_j$; each will add a random sign, and these multiply out to one uniformly random sign. The probability that this random sign equals the value $\lambda$ that we expect to obtain as measurement outcome if $U = \tilde{U}$, is 1/2. Accordingly, since $Q$ and $\tilde{Q}$ differ in at least one $j$ with probability $\geq 1/2$, our probability to detect a difference between $U$ and $\tilde{U}$ is $\geq 1/4$.
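The counting argument behind Theorem 4 that underlies this case (a nonzero matrix over $\mathbb{F}_2$ has kernel of dimension at most $2n-1$) can be checked by brute force for small $n$. A sketch in Python/NumPy (our own code, not from the paper):

```python
import itertools
import numpy as np

def fraction_detected(D, n):
    """Fraction of vectors v in F_2^{2n} with D v != 0 over F_2."""
    hits = sum(1 for v in itertools.product((0, 1), repeat=2 * n)
               if ((D @ np.array(v)) % 2).any())
    return hits / 4 ** n

# Any nonzero difference matrix D = M_U - M_Utilde is "caught" by a
# uniformly random Pauli (i.e. a uniformly random vector) w.p. >= 1/2:
n = 2
rng = np.random.default_rng(0)
for _ in range(20):
    D = rng.integers(0, 2, size=(2 * n, 2 * n))
    if D.any():
        assert fraction_detected(D, n) >= 0.5
```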

Case 2:
There is an $R \in \mathcal{P}^n \setminus \{I^{\otimes n}\}$ s.t. conjugation by $U$ and by $R\tilde{U}$ are the same map.
Since $P$ is uniformly random, in each location $j$ where $R_j \neq I$, the Paulis $R_j$ and $P_j$ at that location will commute with probability 1/2 (namely if $P_j$ is chosen to be $I$ or $R_j$) and anticommute with probability 1/2 (namely if $P_j$ is chosen to be one of the other two Paulis), independently of what happens in the other locations. In the locations $j$ where $R_j = I$, $R_j$ always commutes with $P_j$. Hence $RPR = P$ with probability 1/2 and $RPR = -P$ with probability 1/2. We know $U|\psi_{in}\rangle$ is a $\lambda$-eigenstate of $P$. But then it will be a $-\lambda$-eigenstate of $RPR$ with probability 1/2. Hence $\tilde{U}|\psi_{in}\rangle$ will be a $-\lambda$-eigenstate of $P$ with probability 1/2.
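The sign-flip count used here can likewise be verified exhaustively for small $n$: for any fixed non-identity Pauli $R$, conjugation by $R$ flips the sign of exactly half of all Paulis $P$. A sketch (Python/NumPy, names ours):

```python
import itertools
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
Y = 1j * X @ Z
PAULIS = [I, X, Y, Z]

def kron_all(ms):
    out = np.array([[1]], dtype=complex)
    for m in ms:
        out = np.kron(out, m)
    return out

n = 2
R = kron_all([X, I])          # any fixed non-identity Pauli R
flips = sum(1 for combo in itertools.product(PAULIS, repeat=n)
            if np.allclose(R @ kron_all(combo) @ R, -kron_all(combo)))
# R P R = -P for exactly half of all 4^n Paulis P:
assert flips == 4 ** n // 2
```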
In sum, our test will output the known value $\lambda \in \{+1, -1\}$ with probability 1 if $U = \tilde{U}$, but will output $-\lambda$ with probability at least 1/4 if $U \neq \tilde{U}$. This allows us to detect that $U$ and $\tilde{U}$ are different Cliffords.
The cost of this test is essentially as small as could be: computing $Q = U^\dagger(P)$ has classical cost linear in the size of the known circuit $U$; then we need to prepare the $n$-qubit tensor-product state $|\psi_{in}\rangle$, run $\tilde{U}$ once on it, and measure $P$ on the resulting state. This gives us constant probability of detecting a difference between the two Clifford circuits $U$ and $\tilde{U}$ if there is one. Note that $n$ single-qubit Pauli measurements according to $P = P_1 \otimes \cdots \otimes P_n$ would also suffice: the expectation value of (indeed, the whole distribution of) the product of the $n$ single-qubit measurement outcomes is the same as that of $P$. This might be easier to realize technologically than one overall $\pm 1$-valued $n$-qubit measurement.
Running our test $k$ times, with a fresh random $P$ in each run, will detect $U \neq \tilde{U}$ with success probability $\geq 1 - (3/4)^k$. Setting $k = \lceil \log(1/\delta)/\log(4/3) \rceil$, the detection probability is $\geq 1 - \delta$. If we fix $\delta$ to some small constant, then we need to run our test only a constant number of times in order to achieve such high success probability. As we noted in the introduction, our test still works to detect whether the implemented Clifford circuit equals $U$ or not, even if the errors are different in each run (i.e., if $\tilde{U}$ is a different erroneous Clifford in different runs).
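As a quick numerical check of this repetition count (assuming the stated $(3/4)^k$ failure bound; the helper name is ours):

```python
import math

def runs_needed(delta):
    """Smallest k with (3/4)^k <= delta, i.e. k = ceil(log(1/delta)/log(4/3))."""
    return math.ceil(math.log(1 / delta) / math.log(4 / 3))

k = runs_needed(0.01)
# k runs suffice, and k-1 runs would not:
assert (3 / 4) ** k <= 0.01 < (3 / 4) ** (k - 1)
```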
Our ability to detect a difference from $U$ with probability as close to 1 as we wish using $O(1)$ runs of $\tilde{U}$ also means that a small (but constant) additional error probability, due to the unavoidable noise and decoherence in each of these runs, still leaves us with high success probability.

Finding the error(s)
The previous section gave a test to see whether an $n$-qubit Clifford circuit $U$ (of which we have a classical description) equals another $n$-qubit Clifford circuit $\tilde{U}$ (which we can run as a black-box) or differs from it in some way. If we are in the latter situation, it would be nice if we could efficiently find out where and what the difference is.
Using a number of runs of the above test, we can indeed identify the error, or at least something equivalent to it. The idea is the following: the known circuit $U$ acts on $n$ qubits and has $s$ gates, so the number of circuits $U'$ that differ from $U$ in one gate (or one Pauli error) is relatively small, only $O(s)$. Accordingly, we can just run the above test for each of those $U'$, testing whether the known Clifford circuit $U'$ equals the circuit $\tilde{U}$ (which we can still run as a black-box).
Note that the same idea also works if there can be up to $d$ gate-differences instead of one. However, the number of circuits $U'$ that are within $d$ errors of $U$ grows roughly like $s^d$, so the number of tests grows quickly (though still polynomially if $d = O(1)$). Having learnt $\tilde{U}$, we can correct it.
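A back-of-the-envelope count of the candidate list (the helper and the number $g$ of alternative gates per location are illustrative placeholders of ours, not quantities from the paper):

```python
import math

def num_candidates(s, d, g):
    """Upper bound on the number of circuits differing from a known s-gate
    circuit in at most d gate locations, with g alternatives per location
    (g is an illustrative placeholder)."""
    return sum(math.comb(s, j) * g ** j for j in range(d + 1))

assert num_candidates(1000, 1, g=15) == 1 + 1000 * 15          # O(s) for d = 1
assert num_candidates(1000, 2, g=15) == 1 + 15000 + math.comb(1000, 2) * 225
```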
3.4 Deriving the same Clifford-testing result from [FL11] and [dSLCP11]

As mentioned in the introduction, after finishing our Clifford test of Section 3.2, we discovered that something very similar can be derived from work of Flammia and Liu [FL11] and da Silva, Landon-Cardinal, and Poulin [dSLCP11]. Specifically, Flammia and Liu [FL11] describe a procedure that, given the classical description of a Clifford circuit $U$ and the ability to run another quantum operation $\tilde{U}$ as a black-box, estimates (with success probability $\geq 1 - \delta$) their entanglement fidelity up to additive error $\leq \varepsilon$ using $O(\frac{1}{\varepsilon^2}\log(1/\delta))$ runs of $\tilde{U}$. Very similarly to ours, each run of $\tilde{U}$ in their procedure starts with a product state of eigenstates of a random Pauli, and ends with a Pauli measurement on the final state. In fact, their procedure even works if $\tilde{U}$ is a general quantum channel (CPTP map) rather than a unitary.
For general unitary circuits, the entanglement fidelity $|\frac{1}{2^n}\mathrm{Tr}(U^\dagger\tilde{U})|^2$ can be arbitrarily close to 1, which means one has to have arbitrarily small $\varepsilon$ to "see" the difference between the case $U = \tilde{U}$ and the case where $U$ and $\tilde{U}$ are distinct but have a lot of overlap. However, in the special case where $U$ and $\tilde{U}$ are distinct Clifford circuits, we show below that the entanglement fidelity is at most 1/2. Hence running the Flammia-Liu procedure with constant $\varepsilon < 1/4$ suffices to detect (with probability $\geq 1 - \delta$) any difference between Clifford circuits $U$ and $\tilde{U}$, using just $O(\log(1/\delta))$ runs of $\tilde{U}$ on product-state inputs and with a Pauli measurement at the end, just like our test.
Theorem 5. Let $U$ and $\tilde{U}$ be $n$-qubit Clifford circuits such that conjugation by $U$ and conjugation by $\tilde{U}$ are distinct maps. Then $|\frac{1}{2^n}\mathrm{Tr}(U^\dagger\tilde{U})|^2 \leq 1/2$.

Proof. It suffices to prove that $|\mathrm{Tr}(U)|^2 \leq 2^{2n-1}$ for every non-identity Clifford $U$ (applied to $U^\dagger\tilde{U}$). If $U = U_1 \otimes \cdots \otimes U_n$ is a product of Paulis, then $\mathrm{Tr}(U) = \prod_{j=1}^n \mathrm{Tr}(U_j) = 0$, because at least one of the $U_j$'s must be $X$, $Y$ or $Z$, which have trace 0.
If, on the other hand, $U$ is not a product of Paulis, then by Theorem 4, conjugation by $U$ maps at least half of all $P \in \mathcal{P}^n$ to $\pm P'$ for some $P' \neq P$.
Let $|\psi\rangle = \frac{1}{\sqrt{2^n}}\sum_{i \in \{0,1\}^n} |i\rangle|i\rangle$ be the $2n$-qubit maximally entangled state. It is well known (and easy to verify) that for all $2^n$-dimensional matrices $A$ and $B$, we have $\langle\psi|(A \otimes B)|\psi\rangle = \frac{1}{2^n}\mathrm{Tr}(A^T B)$.
Note that the $2^{2n}$ states $(I \otimes P)|\psi\rangle$, $P \in \mathcal{P}^n$, form an orthonormal set; using the expansion $|\psi\rangle\langle\psi| = \frac{1}{2^{2n}}\sum_{P \in \mathcal{P}^n} P^T \otimes P$, we get
$$|\mathrm{Tr}(U)|^2 = 2^{2n}\,|\langle\psi|(I \otimes U)|\psi\rangle|^2 = \frac{1}{2^n}\sum_{P \in \mathcal{P}^n} \mathrm{Tr}(P \cdot U P U^\dagger).$$
For at least half of all $P \in \mathcal{P}^n$, $U P U^\dagger$ is $\pm P'$ for some $P' \neq P$, in which case $\mathrm{Tr}(P \cdot U P U^\dagger) = 0$. For the other $P \in \mathcal{P}^n$ (of which there are at most $\frac{1}{2}2^{2n}$), where $U P U^\dagger = \pm P$, we have $\mathrm{Tr}(P \cdot U P U^\dagger) = \pm 2^n$. Hence we obtain our desired upper bound:
$$|\mathrm{Tr}(U)|^2 \leq \frac{1}{2^n} \cdot \frac{1}{2}2^{2n} \cdot 2^n = 2^{2n-1}.$$
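The bound $|\mathrm{Tr}(V)|^2 \leq 2^{2n-1}$ for non-identity Cliffords can be spot-checked numerically (a Python/NumPy sketch of ours, not from the paper):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
                dtype=complex)

def check(V, n):
    """Verify |Tr(V)|^2 <= 2^{2n-1} for an n-qubit Clifford V."""
    assert abs(np.trace(V)) ** 2 <= 2 ** (2 * n - 1) + 1e-9

# A few non-identity Cliffords; the bound holds for all of them:
check(H, 1)
check(S, 1)                       # |Tr(S)|^2 = |1+i|^2 = 2, meeting the bound
check(X, 1)
check(CNOT, 2)                    # |Tr(CNOT)|^2 = 4 <= 8
check(CNOT @ np.kron(H, I2), 2)
```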

Future work
The goal of lightweight testing and verification of quantum circuits is an important one, especially considering the severe limitations of medium-term quantum computing hardware.
In this paper we gave several examples of non-trivial tests one can do to efficiently check whether two circuits are equal or differ in a worst-case distance measure, and in some cases to find the error. Our tests are far from satisfactory, though, and we hope they can be improved in various directions. Below we mention some questions for future work:

• Simpler tests. Can we design better tests that are more lightweight? In particular, the preparation of $2n$ EPR-pairs in Section 2, and the preservation of entanglement among those qubits for the duration of the test, is hard to realize in experiments.
Can we do something like this with much less entanglement? (See footnote 3 for one approach.)

• More general errors. In Section 3 we handled the situation where our Clifford circuit $U$ is implemented as a circuit $\tilde{U}$ which may be wrong, but is assumed to still be a Clifford circuit. However, errors can be of many types. What about testing for a Clifford circuit with one arbitrary unitary but possibly non-Clifford error $V$? Such a $V$ can be written as a linear combination of the Paulis, so something should be possible along the lines of this paper, but we have not worked this out yet. Of course, an even more general setting would be arbitrary, not-even-unitary errors on some of the qubits, which correspond to arbitrary CPTP maps; in this case we should aim at detecting a large distance in something like the "diamond norm" rather than the $D_{\max}$-norm.
• While our Clifford test of Section 3 does not care whether there are one or more faulty gates, the test for general circuits of Section 2 does. As we showed in Section 2.4, the close relation between the average-case $D$-distance between two circuits (which is what we can test for) and their worst-case $D_{\max}$-distance (which is what we would like to test for) already disappears when we have two faulty gates instead of one. How can we detect the presence of multiple faulty gates in the general, non-Clifford situation?
• In some cases one can conjugate a possibly-faulty gate with random gates in order to convert adversarial noise to random noise (see e.g. the work of Wallman and Emerson [WE16]). Can we use that somehow? Such an approach might help bridge the gap between average-case and worst-case distance measures.