Randomized Benchmarking as Convolution: Fourier Analysis of Gate Dependent Errors

We show that the Randomized Benchmarking (RB) protocol is a convolution amenable to Fourier space analysis. By adopting the mathematical framework of Fourier transforms of matrix-valued functions on groups established in recent work from Gowers and Hatami [Sbornik: Mathematics 208, 1784 (2017)], we provide an alternative proof of Wallman's [Quantum 2, 47 (2018)] and Proctor's [Phys. Rev. Lett. 119, 130502 (2017)] bounds on the effect of gate-dependent noise on randomized benchmarking. We show explicitly that as long as our faulty gate-set is close to the targeted representation of the Clifford group, an RB sequence is described by the exponential decay of a process that has exactly two eigenvalues close to one and the rest close to zero. This framework also allows us to construct a gauge in which the average gate-set error is a depolarizing channel parameterized by the RB decay rates, as well as a gauge which maximizes the fidelity with respect to the ideal gate-set.


Introduction
Randomized benchmarking (RB) [11,12,20,22,23] is a workhorse of the quantum characterization community. Used to bound errors in a variety of physical implementations of quantum processors [3,4,8,9,16,17,27,30,31,36], RB has been expanded broadly from its original assumptions of errorless control and depolarizing, gate-independent noise in an effort to quantify a wide variety of more-realistic error models [5,10,14,15,18,25,33-35]. Making rigorous the analyses of these more-realistic models is still an active area of research [23,24,28,32]. Of particular interest in this manuscript are RB sequences with gate-dependent errors, that is, each individual physical gate in the RB gate set is associated with its own independent error process.
In the initial attempt to bound the effect of gate-dependent errors, Magesan et al. use a linearization technique to treat gate-dependent errors as a perturbation with respect to a uniform error channel [23,24]. This approach defines gate error relative to a fixed representation of the operations being benchmarked, which is problematic because RB decay rates are invariant under transformations of this representation, resulting in very loose bounds on RB decay with respect to the gate error. Chasseur and Wilhelm [7] analyzed non-perturbative gate-dependent error in the context of a modified RB protocol accounting for leakage errors. In roughly parallel works, Wallman [32] and Proctor et al. [28] give explicit examples where perturbation terms are non-negligible and where the Magesan bounds are too loose to be practically useful; they additionally justify the exponential decay of RB. The revised methods in both manuscripts, though different in detail, involve deriving the average of RB decay sequences from what is essentially the power of a matrix. This guarantees that for generic, gate-dependent noise, the benchmarking decay will always look like the sum of two exponentials, with small corrections, independent of the gate fidelity with respect to the Clifford group in any fixed representation.
Here we develop an alternative proof that emphasizes clarity and intuition over mathematical rigor, showing that RB can be described as a convolution, and therefore that some of its properties are more transparent in Fourier space. We use a Fourier transform from Gowers and Hatami [19], which extends some techniques from previous work by Moore and Russell [26]. This transform maps matrix-valued functions that act on the elements of a general group onto matrix-valued functions of the group's irreducible representations, and it has all the properties of a traditional Fourier transform (an inverse, a convolution identity, and Parseval's theorem) which allow us to formalize and simplify RB more naturally. In addition, this Fourier analysis provides the tools to construct gauge (i.e., similarity) transformations in which either the average gate error channel is a generalized depolarizing channel fully characterized by the RB decay rates or the average gate fidelity is maximized. We believe that in the latter case this is the first such construction.
The outline of this manuscript is as follows: in Section 2, we review the basics of randomized benchmarking and show that an RB sequence can be thought of as a convolution; in Section 3, we review matrix-valued Fourier transforms; in Section 4 we apply this Fourier transformation to the super-operator representation of the Clifford group; in Section 5, we compactly reproduce Wallman's proof of the effects of gate-dependent noise; in Section 6, we show how the eigenvectors of the Fourier transform can be used to construct gauges; finally, in Section 7, we apply this Fourier technique to reproduce examples from Proctor [28] and Wallman [32], as well as an example of our own exploring leakage characterization and the relevance of global phases to the Clifford group.

Randomized benchmarking as convolution
In this section we review the basics of randomized benchmarking and introduce some notation. Quantum information theorists sometimes fail to distinguish between groups and representations, but we will make their distinction explicit. Consider the operation of a quantum processor as a function φ : U(2^n) → Q(2^n), mapping elements of the unitary group on n qubits, U(2^n), to the space of quantum processes, Q(2^n). This mapping is consistent with Markovian error processes (otherwise we might parameterize our maps by some side-channel information, i.e., φ = φ(u, α)) and in principle allows for leakage by the projection of a larger map to the computational subspace. Q(2^n) is the space of completely-positive, trace non-increasing maps, whose elements can be expressed as real 4^n × 4^n matrices using the standard super-operator description of a quantum process in the computational basis (e.g., Liouville, natural, and Pauli transfer matrix representations). In this way we can think of the operation of our quantum processor as a matrix-valued function of a group.
In any practical quantum computing application we restrict ourselves to a finite number of fundamental quantum operations, and likewise it can be useful to benchmark our quantum processor by its behavior with respect to a finite group. In this manuscript we will assume we are benchmarking with respect to the Clifford group, C, though the presented techniques are more general. Randomized benchmarking consists of the following: draw m group elements g_1, ..., g_m uniformly at random, apply the corresponding physical gates in sequence, apply the physical gate corresponding to the inverse of the composition, and record the survival probability of the initial state. Ignoring preparation and measurement, the sequence average

E_{g_1,...,g_m} [ φ((g_m ··· g_1)^{-1} C) φ(g_m) ··· φ(g_1) ]

is itself a matrix-valued function of a group element C, though in standard randomized benchmarking we only evaluate C = e (the group identity element). There is, however, a natural re-indexing of this expression in terms of the partial products h_j = g_j g_{j-1} ··· g_1,

E_{h_1,...,h_m} [ φ(C h_m^{-1}) φ(h_m h_{m-1}^{-1}) ··· φ(h_2 h_1^{-1}) φ(h_1) ],

that now looks like a nested series of convolutions. In the next section, we will describe a Fourier technique that transforms matrix-valued functions of a group to matrix-valued functions of that group's irreducible representations, σ. In this Fourier space convolutions are mapped to products, and therefore

(φ ∗ φ ∗ ··· ∗ φ)˜(σ) = φ̃(σ)^{m+1},

where the tilde denotes the Fourier transform. In the limit of Markovian noise, the exponential decay of an RB sequence (i.e., the observation from Proctor and Wallman that RB is described by a matrix power) is a direct consequence of it being a convolution. The exact form of the decay depends completely on the spectrum of φ̃, the Fourier transform of our faulty gate set, which we will discuss in some detail in Sec. 5.
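The convolution structure is easy to check numerically for a concrete group. The sketch below is our own construction (not code from the manuscript): it generates the 24 single-qubit Clifford Pauli transfer matrices (PTMs) from X_{π/2} and Y_{π/2}, attaches an arbitrary, illustrative gate-dependent z-rotation error, and verifies the convolution identity on the two Clifford irreps.

```python
import numpy as np

# Pauli-transfer-matrix (PTM) machinery for the single-qubit Clifford group.
P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm(U):
    return np.array([[np.real(np.trace(P[j] @ U @ P[k] @ U.conj().T)) / 2
                      for k in range(4)] for j in range(4)])

gx, gy = ptm((P[0] - 1j*P[1])/np.sqrt(2)), ptm((P[0] - 1j*P[2])/np.sqrt(2))
group, frontier = {}, [np.eye(4)]
while frontier:                      # BFS over words in X_{pi/2}, Y_{pi/2}
    nxt = []
    for R in frontier:
        k = tuple(np.round(R, 6).ravel())
        if k not in group:
            group[k] = R
            nxt += [gx @ R, gy @ R]
    frontier = nxt
cliff = list(group.values())         # the 24 ideal Clifford PTMs
idx = {k: i for i, k in enumerate(group)}
look = lambda R: idx[tuple(np.round(R, 6).ravel())]

def ft(fs, sigmas):                  # ~f(sigma) = E_g f(g) (x) conj(sigma(g))
    return sum(np.kron(f, s.conj()) for f, s in zip(fs, sigmas)) / len(fs)

def conv(f, h):                      # (f*h)(g) = E_g' f(g g'^-1) h(g')
    return [sum(f[look(Rg @ Rgp.T)] @ h[j] for j, Rgp in enumerate(cliff)) / 24
            for Rg in cliff]

# A toy gate-dependent noisy gate-set: a different small z-rotation appended
# to every Clifford (an arbitrary, illustrative error model of our choosing).
rng = np.random.default_rng(1)
rz = lambda a: ptm(np.diag([np.exp(-1j*a/2), np.exp(1j*a/2)]))
phi = [rz(0.05 * rng.standard_normal()) @ R for R in cliff]

sig_I = [R[:1, :1] for R in cliff]   # trivial irrep of the Clifford group
sig_P = [R[1:, 1:] for R in cliff]   # 3-dimensional "Pauli" irrep

# Convolution theorem: the transform of a convolution is a product of
# transforms, checked here on both Clifford irreps.
conv_ok = all(np.allclose(ft(conv(phi, phi), s), ft(phi, s) @ ft(phi, s))
              for s in (sig_I, sig_P))
print(conv_ok)
```

Iterating the convolution m+1 times (m random gates plus the inversion gate) therefore maps, in Fourier space, to the matrix power φ̃(σ)^{m+1}, which is the matrix-power behavior referenced above.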

Fourier transforms for matrix-valued functions on finite groups
Here we will briefly review Section 3 of Gowers and Hatami [19] (which is itself in part a review and a consolidation of notation), covering Fourier transforms of matrix-valued functions on finite groups.
For matrix-valued functions f, h : G → C^{d_f × d_f} and an irreducible representation σ of G with dimension d_σ, the Fourier transform is defined as

f̃(σ) ≡ E_{g ∈ G} f(g) ⊗ σ(g)*,

a d_f d_σ × d_f d_σ matrix. Gowers and Hatami show that this somewhat strange object has analogs of all the properties we would like a Fourier transform to have, namely:

1. (Parseval's identity 1) E_g ‖f(g)‖²_HS = Σ_σ d_σ ‖f̃(σ)‖²_HS,
2. (Parseval's identity 2) E_g Tr[f(g)† h(g)] = Σ_σ d_σ Tr[f̃(σ)† h̃(σ)],
3. (Convolution identity) (f ∗ h)˜(σ) = f̃(σ) h̃(σ), where (f ∗ h)(g) ≡ E_{g'} f(g g'^{-1}) h(g'),
4. (Inverse Fourier transform) f(g) = Σ_σ d_σ Tr_σ[f̃(σ)(1 ⊗ σ(g)^T)],
5. (U² identity) a relation between the U² (box) norm of f and the singular values of the f̃(σ),

where Σ_σ denotes a sum over all inequivalent irreducible representations of the group G, and Tr_σ is the partial trace over the second subsystem. We include item 5 for completeness although it is not necessary for this proof; without formally defining the U² or box norms, we just mention that they involve the sum of singular values to the fourth power. The only norms we require in this manuscript are the Hilbert-Schmidt norm ‖·‖²_HS, the sum of squares of the singular values, and the operator norm ‖·‖_op, the maximum of the singular values.
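The identities above can be verified numerically for a concrete group. The following sketch (our own construction, not code from the manuscript) checks Parseval's identity and the inverse transform for the ideal single-qubit Clifford gate-set in the Pauli-transfer-matrix representation, where only the two Clifford irreps σ_I and σ_P contribute.

```python
import numpy as np

# Build the 24 single-qubit Clifford PTMs from X_{pi/2} and Y_{pi/2}.
P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm(U):
    return np.array([[np.real(np.trace(P[j] @ U @ P[k] @ U.conj().T)) / 2
                      for k in range(4)] for j in range(4)])

gx, gy = ptm((P[0] - 1j*P[1])/np.sqrt(2)), ptm((P[0] - 1j*P[2])/np.sqrt(2))
group, frontier = {}, [np.eye(4)]
while frontier:
    nxt = []
    for R in frontier:
        k = tuple(np.round(R, 6).ravel())
        if k not in group:
            group[k] = R
            nxt += [gx @ R, gy @ R]
    frontier = nxt
cliff = list(group.values())

def ft(fs, sigmas):                  # ~f(sigma) = E_g f(g) (x) conj(sigma(g))
    return sum(np.kron(f, s.conj()) for f, s in zip(fs, sigmas)) / len(fs)

sig_I = [R[:1, :1] for R in cliff]   # trivial irrep
sig_P = [R[1:, 1:] for R in cliff]   # 3-dim Pauli irrep
fI, fP = ft(cliff, sig_I), ft(cliff, sig_P)

# Parseval: E_g ||f(g)||_HS^2 = sum_sigma d_sigma ||~f(sigma)||_HS^2.
# For the ideal gate-set only sigma_I and sigma_P contribute.
parseval_ok = np.isclose(np.mean([np.sum(R**2) for R in cliff]),
                         1*np.sum(np.abs(fI)**2) + 3*np.sum(np.abs(fP)**2))

# Inverse: f(g) = sum_sigma d_sigma Tr_sigma[~f(sigma)(1 (x) sigma(g)^T)],
# with Tr_sigma the partial trace over the second (irrep) factor.
def inv_ft(ftilde, sg, df=4):
    d = sg.shape[0]
    M = ftilde @ np.kron(np.eye(df), sg.T)
    return np.real(M.reshape(df, d, df, d).trace(axis1=1, axis2=3))

inverse_ok = all(np.allclose(R, inv_ft(fI, sI) + 3*inv_ft(fP, sP))
                 for R, sI, sP in zip(cliff, sig_I, sig_P))
```

Both checks pass to machine precision, which is a useful sanity test of the dimension factors d_σ appearing in the identities.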
The main result of the Gowers and Hatami manuscript is a stability theorem. Broadly speaking, it states that if a function mapping a group to matrices is approximately a homomorphism, ‖φ(g_1 g_2) − φ(g_1)φ(g_2)‖_HS < ε for every g_1, g_2 ∈ G, then φ must be close to a (not-necessarily irreducible) representation ρ of the group, ‖φ(g) − U†ρ(g)U‖_HS < δ for every g ∈ G. Interestingly, φ and ρ may not have the same dimension, and thus U is not necessarily square. Intuitively, we might expect an RB experiment to estimate the first expression in the stability theorem, that is, the ease with which we can invert large sequences of gates determines how well we approximate a homomorphism. The second expression is essentially an average gate fidelity with some choice of gauge given by U. The stability theorem allows us to relate these two metrics, either for finite groups such as the Clifford group or more generally for compact groups such as the special unitary group. One minor caveat is that the stability theorem only applies if ‖φ(g)‖_op ≤ 1, which is not always the case for quantum processes (e.g., the amplitude damping channel), but many of the proof techniques are applicable in the following analysis.

Fourier transform of the ideal Clifford group
Before characterizing the Fourier transform of a faulty implementation of the Clifford group, we should understand what to expect in the ideal case. Let's start with some useful properties of this Fourier transform when it is applied to representations themselves. First off, the Fourier transform of a representation of a group is a projector. To show this, assume φ is a representation of G; then

φ̃(σ)² = E_{g,h} φ(g)φ(h) ⊗ σ(g)*σ(h)* = E_{g,h} φ(gh) ⊗ σ(gh)* = φ̃(σ).

It is worth noting that the converse is not true; not every projector in Fourier space inverts to a group representation. But what if φ is an irreducible representation? In that case, the Fourier transform φ̃(σ) is a rank-1 projector |ψ⟩⟨ψ| if φ and σ are equivalent representations, and it is zero otherwise. Here equivalence is defined up to a similarity transform, i.e., φ and σ are equivalent iff φ = SσS^{-1} for some S. We can determine the rank of the projector through the trace and the orthogonality of characters as follows:

Tr[φ̃(σ)] = E_g Tr[φ(g)] Tr[σ(g)*] = E_g χ_φ(g) χ_σ(g)* = 1   (orthonormality of characters under group expectation),

where the last equality holds when φ and σ are equivalent irreps, and the expectation vanishes otherwise.
Furthermore, we observe that the partial trace of φ̃(σ) is a maximally mixed state:

Tr_φ[φ̃(σ)] = E_g Tr[φ(g)] σ(g)* = E_g χ_φ(g) σ(g)* = 1_{d_σ}/d_σ,

implying |ψ⟩⟨ψ| must have full Schmidt rank. In other words, this projector |ψ⟩⟨ψ| (the non-vanishing component of the Fourier transform of an irreducible representation) is one very familiar to quantum information theorists, namely, it is locally equivalent to the maximally entangled bi-partite pure state |Φ⟩ = (1/√d_φ) Σ_j |j⟩|j⟩, but with respect to a more generic local similarity transformation as opposed to a local unitary transformation.
In the super-operator representation, the Clifford group on n qubits is a direct sum of two irreducible representations: the identity irrep, σ_I (i.e., the identity Pauli operator is preserved by unitary operations), and a 4^n − 1 dimensional irrep, σ_P (i.e., there exists some Clifford that maps every Pauli string to any other Pauli string excluding the identity). In the ideal case our only non-zero Fourier components are both rank-1 projectors, given by

φ̃(σ_I) = |ψ_I⟩⟨ψ_I|   and   φ̃(σ_P) = |ψ_P⟩⟨ψ_P|,

where |ψ_I⟩ is a length-4^n vector of the form 1 ⊕ 0_{4^n−1} (a one followed by 4^n − 1 zeros) and |ψ_P⟩ is a length-4^n(4^n − 1) vector given by 0_{4^n−1} ⊕ |Φ⟩ (4^n − 1 zeros prepended to a maximally entangled state on a (4^n − 1) × (4^n − 1) dimensional Hilbert space). We have included all the irreducible representations of the single-qubit Clifford group and its character table in Appendix A.
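These structural claims can be confirmed numerically for the single-qubit Clifford group. The sketch below (our own, with an assumed Pauli-transfer-matrix convention) checks that φ̃(σ_I) and φ̃(σ_P) are rank-1 projectors of the stated form and that the partial trace over the process factor is maximally mixed.

```python
import numpy as np

# Build the 24 single-qubit Clifford PTMs from X_{pi/2} and Y_{pi/2}.
P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm(U):
    return np.array([[np.real(np.trace(P[j] @ U @ P[k] @ U.conj().T)) / 2
                      for k in range(4)] for j in range(4)])

gx, gy = ptm((P[0] - 1j*P[1])/np.sqrt(2)), ptm((P[0] - 1j*P[2])/np.sqrt(2))
group, frontier = {}, [np.eye(4)]
while frontier:
    nxt = []
    for R in frontier:
        k = tuple(np.round(R, 6).ravel())
        if k not in group:
            group[k] = R
            nxt += [gx @ R, gy @ R]
    frontier = nxt
cliff = list(group.values())

def ft(fs, sigmas):
    return sum(np.kron(f, s.conj()) for f, s in zip(fs, sigmas)) / len(fs)

sig_I = [R[:1, :1] for R in cliff]
sig_P = [R[1:, 1:] for R in cliff]
fI, fP = ft(cliff, sig_I), ft(cliff, sig_P)

# Both Fourier components are projectors of rank 1.
proj_ok = np.allclose(fI @ fI, fI) and np.allclose(fP @ fP, fP)
ranks = (np.linalg.matrix_rank(fI), np.linalg.matrix_rank(fP))

# |psi_I> = (1, 0, 0, 0): a one followed by 4^n - 1 zeros.
psiI_ok = np.allclose(fI, np.diag([1., 0, 0, 0]))

# The partial trace over the process factor is maximally mixed on sigma_P.
pt = fP.reshape(4, 3, 4, 3).trace(axis1=0, axis2=2)
mixed_ok = np.allclose(pt, np.eye(3) / 3)
```

The maximally mixed partial trace is exactly the full-Schmidt-rank statement from the previous paragraph, here for d_σ = 3.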

Analyzing RB with gate dependent errors
We can now analyze randomized benchmarking with gate-dependent errors. First, it will be useful to divide both sides of the Parseval identities (Eqs. 5 and 6) by the dimension of the map, d_φ (note that d_φ = 4^n for an n-qubit system). Rescaling the Hilbert-Schmidt norm (or trace inner product) this way defines the fidelity of entanglement, F_e, which is bounded above by 1 for a quantum process. Therefore,

Σ_σ d_σ ‖φ̃(σ)‖²_HS = E_g ‖φ(g)‖²_HS ≤ d_φ.

Assuming that our experimental colleagues aren't just banging rocks together, φ is a decent approximation of the Clifford group, φ_ideal, in the computational basis. If we assume an average fidelity of 1 − δ we obtain

E_g F_e(φ(g), φ_ideal(g)) = [⟨ψ_I|φ̃(σ_I)|ψ_I⟩ + (4^n − 1)⟨ψ_P|φ̃(σ_P)|ψ_P⟩]/4^n = 1 − δ.

It is useful to denote the diagonal matrix elements t ≡ ⟨ψ_I|φ̃(σ_I)|ψ_I⟩ and p ≡ ⟨ψ_P|φ̃(σ_P)|ψ_P⟩. As a consequence of complete positivity (p ≤ t) and the trace non-increasing property of quantum maps (t ≤ 1) (see Appendix B), we can bound

1 − [4^n/(4^n − 1)] δ ≤ p ≤ t ≤ 1,

i.e., p and t are both fairly close to 1.
The largest singular values of the Fourier matrices φ̃(σ_I) and φ̃(σ_P) are lower bounded by t and p, respectively. We can upper bound the size of the next largest singular value, q, in any of the Fourier matrices by assuming q is the only other non-vanishing singular value. Using Eq. 5 we have

t² + (4^n − 1)p² + q² ≤ 4^n,

where the maximum q for t and p consistent with Eq. 13 is of order

q ≲ √(2 · 4^n δ).

While the exponential scaling in n is scary and reminiscent of diamond-norm bounds on average fidelity, bounds on q are actually quite reasonable for small n; e.g., δ ≲ 13.3% and δ ≲ 3.1% are enough to ensure that q ≲ 1 for one- and two-qubit systems, respectively. Tighter bounds probably exist if we restrict to Fourier transforms generated by completely positive process matrices.
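The maximization behind the quoted thresholds can be sketched numerically. The constraint set below is our reading of the argument above (a single residual singular value q, the Parseval budget, and t + (4^n − 1)p = 4^n(1 − δ) with p ≤ t ≤ 1); the function and its name are ours, not the manuscript's.

```python
import numpy as np

def q_max(n, delta, grid=200001):
    """Numerically maximize q^2 = 4^n - t^2 - (4^n - 1) p^2 subject to
    t + (4^n - 1) p = 4^n (1 - delta) and p <= t <= 1 (our reading of the
    constraints in the text)."""
    d = 4**n
    best = 0.0
    for p in np.linspace(0.0, 1.0, grid):
        t = d * (1 - delta) - (d - 1) * p
        if p <= t <= 1:
            best = max(best, d - t**2 - (d - 1) * p**2)
    return np.sqrt(best)

# Thresholds quoted in the text: one- and two-qubit systems.
print(q_max(1, 0.133), q_max(2, 0.031))
```

Under these assumptions both quoted infidelities give q just below 1, consistent with the claim that the subleading singular value stays controlled for small systems.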
At this point our proof is essentially finished, with all the heavy lifting done by Parseval's identity. In the small-error limit, our Fourier transform has a good unit-rank approximation in both the σ_I and σ_P representations. This implies that there can be at most one eigenvalue that is not (nearly) zero in each of these irreps, and we will call these eigenvalues t̃ and p̃. It would be convenient if t̃ and p̃ were bounded by the diagonal matrix elements t and p relative to some fixed choice of gauge for φ_ideal, but we will show that this is not generally true in Sec. 7. As we look at longer RB sequences, or raise the Fourier transforms to higher powers, our spectrum will be dominated by these two eigenvalues to O(δ^{m/2}). Since both the inverse Fourier transform and the final expectation value are linear operations, we find that the average survival probability of a length-m RB sequence takes the form

E[S_m] = A t̃^m + B p̃^m + O(δ^{m/2}),

for constants A and B set by state preparation and measurement, which is what we set out to show: randomized benchmarking generically follows an exponential decay parameterized by at most two rates.
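The two-dominant-eigenvalue structure can be observed directly. The following sketch (our own toy gate-dependent error model, not the manuscript's) computes the spectrum of the single-qubit gate-set Fourier transform in both irreps and confirms one eigenvalue near 1 per irrep with the remainder near zero.

```python
import numpy as np

# Build the 24 single-qubit Clifford PTMs from X_{pi/2} and Y_{pi/2}.
P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm(U):
    return np.array([[np.real(np.trace(P[j] @ U @ P[k] @ U.conj().T)) / 2
                      for k in range(4)] for j in range(4)])

gx, gy = ptm((P[0] - 1j*P[1])/np.sqrt(2)), ptm((P[0] - 1j*P[2])/np.sqrt(2))
group, frontier = {}, [np.eye(4)]
while frontier:
    nxt = []
    for R in frontier:
        k = tuple(np.round(R, 6).ravel())
        if k not in group:
            group[k] = R
            nxt += [gx @ R, gy @ R]
    frontier = nxt
cliff = list(group.values())

def ft(fs, sigmas):
    return sum(np.kron(f, s.conj()) for f, s in zip(fs, sigmas)) / len(fs)

sig_I = [R[:1, :1] for R in cliff]
sig_P = [R[1:, 1:] for R in cliff]

# Toy gate-dependent coherent noise: a different small z-rotation per gate
# (an illustrative choice of ours, not the paper's error model).
rng = np.random.default_rng(0)
rz = lambda a: ptm(np.diag([np.exp(-1j*a/2), np.exp(1j*a/2)]))
phi = [rz(0.05 * rng.standard_normal()) @ R for R in cliff]

eI = np.sort(np.abs(np.linalg.eigvals(ft(phi, sig_I))))[::-1]
eP = np.sort(np.abs(np.linalg.eigvals(ft(phi, sig_P))))[::-1]
# eI[0] ~ t~ and eP[0] ~ p~ dominate; all other eigenvalues are small,
# so long sequences decay as A t~^m + B p~^m.
```

Raising these Fourier matrices to the m-th power then suppresses everything but t̃^m and p̃^m, which is the exponential decay derived above.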

Gauges and Eigenvectors
We have completed our proof using the spectrum of the Fourier transform, but before moving on to examples we should briefly discuss the related eigenvectors of the Fourier transform and how we can use them to construct gauge transformations. Following Gowers and Hatami [19], we can matricize the Fourier eigen-equation φ̃(σ)|v⟩ = λ|v⟩ to rewrite it as a matrix equation:

E_g φ(g) V σ(g)† = λ V,

where V is a d_φ × d_σ matrix that contains the d_φ d_σ elements of the eigenvector v. We can choose ⟨v|v⟩ = d_σ as the normalization for v for reasons that will soon become apparent. By joining the two dominant eigen-equations from the previous section we can rewrite Eq. 18 as

E_g φ(g) S_dep φ_ideal(g)† = S_dep D_{p̃,t̃},

where we define the two d_φ × d_φ matrices D_{p̃,t̃} and S_dep as

D_{p̃,t̃} ≡ diag(t̃, p̃, ..., p̃)   and   S_dep ≡ (V_t̃ | V_p̃).

The expression (V_t̃ | V_p̃) denotes a matrix where the column vector V_t̃ has been prepended to the columns of V_p̃. Our choice of the eigenvector normalization ensures that in the small-error limit S_dep is close to the identity, i.e., full-rank and invertible, and therefore

E_g [S_dep^{-1} φ(g) S_dep φ_ideal(g)†] = D_{p̃,t̃}.

The eigenvectors corresponding to the eigenvalues t̃ and p̃ provide a unique similarity, or gauge, transformation in which the average of the individual gate error channels is a generalized depolarizing channel (i.e., a depolarizing map composed with a channel that uniformly decreases the trace) with parameters t̃ and p̃. We define φ_dep(g) ≡ S_dep^{-1} φ(g) S_dep, the gate-set in the depolarizing gauge, and in this gauge the average fidelity of entanglement is given by

E_g F_e(φ_dep(g), φ_ideal(g)) = [t̃ + (4^n − 1)p̃]/4^n.

It is tempting to suggest that the gauge S_dep is optimal, meaning that it maximizes the gate fidelity, but this is not generally true. Consider Eq. 13 with a general gauge transformation S:

E_g F_e(S^{-1}φ(g)S, φ_ideal(g)) = [⟨ψ_I(S)|φ̃(σ_I)|ψ_I(S)⟩ + (4^n − 1)⟨ψ_P(S)|φ̃(σ_P)|ψ_P(S)⟩]/4^n.

Using the cyclic property of the trace we can instead apply the transformation S^{-1} to the ideal gate-set, which can't change φ_ideal's irrep decomposition, and define |ψ_I(S)⟩ and |ψ_P(S)⟩ as the non-trivial eigenvectors of the Fourier transform of Sφ_ideal S^{-1}. Fourier transform matrices may not be diagonalizable, and therefore the quadratic forms in Eq. 23 are not generally bounded by the maximum eigenvalues t̃ and p̃. We can, however, construct the optimal gauge transformation, S_opt, leading to process matrices φ_opt(g), by the observation that quadratic forms are invariant under symmetrization, and so instead of constructing a similarity transformation from the eigenvectors of φ̃(σ) we could instead use the eigenvectors of (φ̃(σ) + φ̃(σ)^T)/2, which is always diagonalizable. This similarity transformation will maximize the average gate fidelity, but since the average error channel is not necessarily a generalized depolarizing channel, this reduced error rate is not easily extracted from repeated applications of the gate-set [6,28,29,32]. We expect that in the small-error limit most gate-set Fourier transforms are nearly diagonalizable, that is, they are nearly rank-1 with a large diagonal matrix element, and therefore S_dep ≈ S_opt to O(√δ). Additionally, for either the depolarizing or optimal gauge transformations, the transformed gate-sets may no longer be completely positive.
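The depolarizing-gauge construction can be carried out explicitly for a toy single-qubit gate-set. Below is a sketch of our own (the noise model and the eigenvector phase-fixing are illustrative choices): we build S_dep from the two dominant Fourier eigenvectors, normalized so that ⟨v|v⟩ = d_σ, and check that the average error channel in that gauge is diagonal with entries t̃ and p̃.

```python
import numpy as np

# Build the 24 single-qubit Clifford PTMs from X_{pi/2} and Y_{pi/2}.
P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm(U):
    return np.array([[np.real(np.trace(P[j] @ U @ P[k] @ U.conj().T)) / 2
                      for k in range(4)] for j in range(4)])

gx, gy = ptm((P[0] - 1j*P[1])/np.sqrt(2)), ptm((P[0] - 1j*P[2])/np.sqrt(2))
group, frontier = {}, [np.eye(4)]
while frontier:
    nxt = []
    for R in frontier:
        k = tuple(np.round(R, 6).ravel())
        if k not in group:
            group[k] = R
            nxt += [gx @ R, gy @ R]
    frontier = nxt
cliff = list(group.values())

def ft(fs, sigmas):
    return sum(np.kron(f, s.conj()) for f, s in zip(fs, sigmas)) / len(fs)

sig_I = [R[:1, :1] for R in cliff]
sig_P = [R[1:, 1:] for R in cliff]

# Toy gate-dependent coherent noise (an illustrative choice of ours).
rng = np.random.default_rng(0)
rz = lambda a: ptm(np.diag([np.exp(-1j*a/2), np.exp(1j*a/2)]))
phi = [rz(0.05 * rng.standard_normal()) @ R for R in cliff]

def dominant(M, d_sigma):
    """Dominant eigenpair; eigenvector made real and scaled to |v|^2 = d_sigma."""
    w, V = np.linalg.eig(M)
    i = np.argmax(np.abs(w))
    v = V[:, i]
    v = np.real(v * np.exp(-1j * np.angle(v[np.argmax(np.abs(v))])))
    return w[i].real, v * np.sqrt(d_sigma) / np.linalg.norm(v)

t_, vt = dominant(ft(phi, sig_I), 1)
p_, vp = dominant(ft(phi, sig_P), 3)
S = np.column_stack([vt, vp.reshape(4, 3)])     # S_dep = (V_t | V_p)
Sinv = np.linalg.inv(S)

# Average error channel in the depolarizing gauge: should equal
# diag(t~, p~, p~, p~), a generalized depolarizing channel.
avg_err = np.mean([Sinv @ F @ S @ R.T for F, R in zip(phi, cliff)], axis=0)
depol_ok = np.allclose(avg_err, np.diag([t_, p_, p_, p_]), atol=1e-8)
```

With the chosen normalization S_dep is close to the identity, as claimed in the text, so the inversion is well conditioned.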

Examples
In this section we will look at three examples of cases where the standard analysis of RB becomes complicated. The first two examples are taken from the literature, both showing how fairly simple error models can lead to RB decays that are not commensurate with the average gate fidelity. The third example describes an ideal gate-set acting on a system with a leakage level. We treat these examples numerically, including a Mathematica notebook detailing these calculations, as well as details on Clifford irreps and Fourier transforms, in the supplementary material [1].

Example 1 from Proctor
In Example 1 of Proctor [28], the Clifford group is generated by composite pulse sequences of faulty X_{π/2} and Y_{π/2} gates. The error in this case is a small z-rotation appended to each generator, i.e., X_{π/2} = exp(−iθσ_z/2) exp(−i(π/2)σ_x/2) and Y_{π/2} = exp(−iθσ_z/2) exp(−i(π/2)σ_y/2). Physically, this is a coherent memory error caused by something like a detuning or mis-timing. There is a gate dependence in this error model because Clifford gates are not all composed from a uniform number of composite pulses. We note that we are not sure our decomposition of the Clifford gates into X_{π/2} and Y_{π/2} rotations is exactly the same as the decomposition in Proctor (see Appendix A), but any differences seem to have a very small effect on the numerical outcome.
We consider the case where θ = 0.1 (as in Proctor), where we find that E_g F_e(φ(g), φ_ideal(g)) = 1 − 3.70 × 10^-3. The largest eigenvalues of our Fourier decomposition, and thus the RB decay rates, are t̃ = 1 and p̃ = 1 − 2.94 × 10^-5, which yield an RB estimate of E_g F_e(φ_dep(g), φ_ideal(g)) = 1 − 2.20 × 10^-5. This two-order-of-magnitude discrepancy between the RB estimate and the average fidelity is in agreement with the previous simulations. The next largest eigenvalue of the Fourier transform, φ̃, is 1.88 × 10^-3, and so we can confidently model the RB decay as a single exponential.
From this analysis we also obtain the optimal similarity transformation, with E_g F_e(φ_opt(g), φ_ideal(g)) = 1 − 1.62 × 10^-5, only a modest improvement over the RB estimate in this case.
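This example can be re-derived in a few lines. The sketch below is our own reconstruction: the generator decomposition comes from a breadth-first search and therefore likely differs from Proctor's compilation in detail (as the text warns), so we check only the qualitative conclusion, namely that the RB-estimated infidelity is far smaller than the average gate infidelity, rather than the exact quoted values.

```python
import numpy as np

P = [np.eye(2, dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]], dtype=complex)]

def ptm(U):
    return np.array([[np.real(np.trace(P[j] @ U @ P[k] @ U.conj().T)) / 2
                      for k in range(4)] for j in range(4)])

gx, gy = ptm((P[0] - 1j*P[1])/np.sqrt(2)), ptm((P[0] - 1j*P[2])/np.sqrt(2))
theta = 0.1
rz = ptm(np.diag([np.exp(-1j*theta/2), np.exp(1j*theta/2)]))
fx, fy = rz @ gx, rz @ gy            # faulty generators: z-error appended

# BFS over generator words; the first (shortest) word found per Clifford
# fixes our compilation, which may differ from Proctor's.
ideal, faulty = {}, {}
frontier = [(np.eye(4), np.eye(4))]
while frontier:
    nxt = []
    for R, F in frontier:
        k = tuple(np.round(R, 6).ravel())
        if k not in ideal:
            ideal[k], faulty[k] = R, F
            nxt += [(gx @ R, fx @ F), (gy @ R, fy @ F)]
    frontier = nxt
cliff = list(ideal.values())
phi = [faulty[tuple(np.round(R, 6).ravel())] for R in cliff]

def ft(fs, sigmas):
    return sum(np.kron(f, s.conj()) for f, s in zip(fs, sigmas)) / len(fs)

# Average entanglement infidelity in the computational basis...
delta = 1 - np.mean([np.trace(R.T @ F) / 4 for R, F in zip(cliff, phi)])
# ...versus the RB estimate from the dominant sigma_P eigenvalue p~,
# using 1 - (t~ + 3 p~)/4 with t~ = 1 for unitary (unital, TP) errors.
p_ = np.max(np.abs(np.linalg.eigvals(ft(phi, [R[1:, 1:] for R in cliff]))))
rb_infid = 3 * (1 - p_) / 4
print(delta, rb_infid)
```

With this compilation the RB estimate again undershoots the computational-basis infidelity by a large factor, the qualitative discrepancy Proctor's example is designed to exhibit.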

Example from Wallman
While Proctor showed an example where the average overlap with the ideal Clifford group in the computational basis overestimates the decays in RB, Wallman showed an example where the opposite can be true [32]. Wallman's error model is that every gate is affected by a uniform depolarizing channel (a map that preserves the identity and shrinks every other Pauli element by ν), and half of the Cliffords experience an additional z-error (again parameterized by θ, but now applied to the Clifford and not the generators). By varying which half of the Cliffords we apply the z-error to, we obtain a family of error channels, all of which have the same average gate error in the computational basis.
In accordance with Wallman's example, we choose ν = 0.99 and θ = 0.09 and sample 10,000 instances of the error channel out of the (24 choose 12) possible ways to apply z-errors to half of the Cliffords. As expected, there is no variance in the average gate error in the computational gauge, which is given by E_g F_e(φ(g), φ_ideal(g)) = 1 − 8.50 × 10^-3. The error rate derived from RB and the depolarizing frame is very similar in the average case, E_g F_e(φ_dep(g), φ_ideal(g)) = 1 − (8.50 ± 0.12) × 10^-3, but can either over- or under-estimate the error in the computational basis. In the 10,000 trials the maximum over- and under-estimations relative to the computational gauge error were less than 5% of the total error. We also calculated the average gate error in the optimal frame and found that E_g F_e(φ_opt(g), φ_ideal(g)) = 1 − (8.24 ± 0.06) × 10^-3. The distribution of average errors in the optimal frame is somewhat tighter, but the error can still vary a significant amount. In all cases the optimal gauge provides a lower error rate than either the computational or depolarizing gauges, as expected.

Leakage characterization
The final example in this manuscript doesn't explore an error process per se; instead we examine the embedding of a qubit into a qutrit, a standard technique in characterizing leakage errors in superconducting [7,13] and semiconducting [21] qubit implementations. In the ideal case we implement this embedding as a mapping from the 24 single-qubit Clifford unitary matrices to qutrit matrices that act like the identity on the leakage space, that is, C_j → C_j ⊕ 1. The corresponding process matrices are now 9 × 9, and we can use the Gell-Mann matrices as a basis for expansion as opposed to the Pauli matrices.
Even in the case of perfect gates a peculiar thing happens: there are now many non-zero eigenvalues in the gate-set Fourier transform. This is because the mapping described in the previous paragraph is not a representation of a group, and therefore its Fourier transform will not be a projector. The special unitary group is a double cover of the rotation group, e.g., X_π X_π = −1, and in the embedding we have chosen this global phase becomes a relative phase between the logical and leakage spaces and cannot be ignored.
In the qutrit embedding, the group generated by X_{π/2} and Y_{π/2} is the 48-element group CSU(2,3), as opposed to the 24-element Clifford group S_4. CSU(2,3) shares all five of S_4's irreducible representations but has three additional irreps that are not present in the smaller group. One such unshared irrep, which we call σ_u, is generated by the unitary representation of the Clifford gates: X_{π/2} = e^{-i(π/2)(σ_x/2)} and Y_{π/2} = e^{-i(π/2)(σ_y/2)}. One might think that this would necessitate the use of CSU(2,3) in all cases, qubit or qutrit, but note that we never used the bare unitary representation of the group in the preceding analysis, only the process matrices. Constructing a process matrix from the unitary representation involves a tensor product of the unitary representation with itself; in the qubit case, σ_u ⊗ σ_u = σ_I ⊕ σ_P. σ_I and σ_P are both irreps that are shared with S_4, and thus for an unembedded qubit we are able to substitute S_4 for the larger group, since this representation has no dependence on the additional phase from CSU(2,3).
When we try to embed into a qutrit, our unitary representation is now σ_u ⊕ σ_I, and after converting to a process matrix we have (σ_u ⊕ σ_I) ⊗ (σ_u ⊕ σ_I) = σ_I ⊕ σ_P ⊕ σ_u ⊕ σ_u ⊕ σ_I. This representation now has σ_u's in the direct sum, and therefore what was a global phase can no longer be ignored. Additionally, our process matrix is now the direct sum of five irreps, and therefore the Fourier transform will have five unit eigenvalues instead of only two. In a practical setting it's not clear that we really need to twirl over this larger group if the initial state and measurement of the RB process have no weight in the leakage subspace, but we have found it can greatly ease theoretical analysis.
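The failure of the embedded gates to form a representation of S_4 can be demonstrated directly. In the sketch below (our own construction) the qubit superoperators kron(U, conj(U)) are closed under multiplication because the superoperator kills the global phase, while for some pairs of qutrit-embedded gates the SU(2) double cover's −1 survives as a relative phase between the logical and leakage blocks, pushing the product outside the embedded gate-set.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
ux = (np.eye(2) - 1j * X) / np.sqrt(2)       # X_{pi/2}
uy = (np.eye(2) - 1j * Y) / np.sqrt(2)       # Y_{pi/2}

def sup(U):                                  # phase-insensitive superoperator
    return np.kron(U, U.conj())

def key(U):
    return tuple(np.round(sup(U), 6).ravel())

# One unitary representative per projective Clifford element (BFS modulo phase).
reps, frontier = {}, [np.eye(2, dtype=complex)]
while frontier:
    nxt = []
    for U in frontier:
        if key(U) not in reps:
            reps[key(U)] = U
            nxt += [ux @ U, uy @ U]
    frontier = nxt
us = list(reps.values())

def emb(U):                                  # qutrit embedding U -> U (+) 1
    V = np.eye(3, dtype=complex)
    V[:2, :2] = U
    return np.kron(V, V.conj())

def in_set(M, mats, tol=1e-6):
    return any(np.linalg.norm(M - N) < tol for N in mats)

qubit_set = [sup(U) for U in us]
emb_set = [emb(U) for U in us]

# Qubit superoperators close under multiplication (a true representation)...
qubit_closed = all(in_set(A @ B, qubit_set) for A in qubit_set for B in qubit_set)
# ...but some products of embedded superoperators land outside the embedded
# gate-set, so the embedding is not a representation of S_4.
emb_fails = not all(in_set(A @ B, emb_set) for A in emb_set for B in emb_set)
```

This is precisely why the embedded gate-set must be analyzed with the larger group CSU(2,3), for which the phase is tracked consistently.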

Concluding remarks
In this manuscript we have shown that randomized benchmarking is a convolution and therefore is more natural to explore with Fourier analysis. In Fourier space, we directly see that RB with Markovian noise is described by powers of a fixed matrix, regardless of any gate-dependent noise. When our processes are a good approximation of the Clifford group in the computational basis, this matrix has exactly two eigenvalues close to one while the rest are small, implying that the RB survival probability is always well described as a sum of two exponentials. Additionally, this formalism allows us to construct gauge transformations that either a) map the average error operator to a general depolarizing channel parametrized by the RB decay rates or b) maximize the average gate fidelity with respect to the ideal Clifford gates in the computational basis. We have applied this formalism to examples previously explored in the literature.
We have answered the question of "what randomized benchmarking actually measures" as the error rate in a specific gauge -that in which the average error channel commutes with every group element, i.e., it is a generalized depolarizing channel -and not in the gauge in which the error rate obtains a minimum. It's not clear which of these quantities will be more important to the design and validation of fault-tolerant quantum processors where errors can be made approximately depolarizing through twirling in the error correction process, though for small errors we conjecture these two gauges are nearly equivalent because the Fourier transforms are always nearly invertible.
In conclusion, matrix-valued Fourier transforms can greatly simplify the analysis of RB. Even for simulation, it is more straightforward to numerically analyze the spectral properties of a handful of matrices than to approximate nested averages with Monte Carlo integration, though taking the Fourier transform for a group as large as the 2-qubit Clifford group is quite cumbersome. We suspect that going forward, the techniques presented here will greatly ease explorations of non-Markovian and context-dependent noise's effect on randomized benchmarking.

A Clifford group representations
In this appendix we review the representations of the single qubit Clifford group (both with and without a global phase). In both cases the Clifford group has two generators corresponding to π/2 rotations which we will abbreviate as x and y for this appendix.

A.1 The single Clifford group, no phase
The single-qubit Clifford group modulo a global phase is better known as the group S_4, the symmetric group on four letters (group [24,12] in the GAP numbering system [2]). We can divide this group into its conjugacy classes according to

c_0 = {e}
c_1 = {x^2, y^2, y^3 x^2 y}
c_2 = {x, y, x^3, y^3, y^3 x y, y^3 x^3 y}
c_3 = {x^2 y, y x^2, x y^2, y^2 x, y x y, y^3 x y^3}
c_4 = {x y, y x, x^3 y^3, y^3 x^3, x y^3, y^3 x, x^3 y, y x^3},

which yields the character table for S_4; this table, along with a choice of generators for these irreps, is detailed in the supplementary material [1].

A.2 The single Clifford group, global phase

When we restore the global phase to the single-qubit Clifford group we get the order-48 group CSU(2,3), the 2 × 2 conformal special unitary matrices acting on the finite field of three elements, or group [48,28] according to GAP. The conjugacy classes are now given by

c_0 = {e}
c_1 = {x^4}
c_2 = {x^2, y^2, y^3 x^2 y, x^6, y^6, y^7 x^2 y}
c_3 = {x, y, y^3 x^3 y, x^7, y^7, y^7 x y}
c_4 = {x^3, y^3, y^3 x y, x^5, y^5, y^7 x^3 y}
c_5 = {x^3 y, y x^3, x y^3, y^3 x, x^5 y, y^7 x^3, y^5 x, x^7 y^3}
c_6 = {x y, y x, x^3 y^3, y^3 x^3, x^7 y, y^7 x, x^5 y^3, y^5 x^3}
c_7 = {x^2 y, y x^2, y x y, y^3 x y^3, x y^2, y^2 x, x^6 y, y^5 x^2, x^5 y^2, y^6 x, y^5 x y, y^7 x y^3},

which yields the corresponding character table. We can generate the irreps of CSU(2,3) with exactly the same generators as S_4, together with three additional irrep generators (which now contain the more familiar definitions of X_{π/2} and Y_{π/2} in the computational basis); these are also detailed in the supplementary material [1].
B Proof that p ≤ t ≤ 1

To show how p ≤ t ≤ 1 is implied by φ being completely positive and trace non-increasing, it helps to be more explicit in our construction of process matrices. We define a process by its action on products of Pauli matrices, P_j, a complete basis for Hermitian operators. By vectorizing or column-stacking the Pauli product matrices, |P_j⟩, we can write our quantum processes as real 4^n × 4^n matrices. We know that φ_ideal is composed of Clifford operators, which are unitary operations that map Pauli strings to other Pauli strings. This implies that the process matrices for φ_ideal will have exactly one non-zero entry of ±1 in each row and column. Furthermore, φ_ideal has the block structure σ_I ⊕ σ_P, with σ_I spanned by |I⟩ and σ_P by the remaining 4^n − 1 basis elements |P_{j≠I}⟩. Let's define two orthogonal projectors, Π_I = |I⟩⟨I| and Π_P = 1 − Π_I. One can show that

t = ⟨ψ_I|φ̃(σ_I)|ψ_I⟩ = E_g Tr[φ(g) Π_I φ_ideal†(g) Π_I]

and

p = ⟨ψ_P|φ̃(σ_P)|ψ_P⟩ = E_g Tr[φ(g) Π_P φ_ideal†(g) Π_P]/(4^n − 1).
Furthermore, since φ_ideal is a unitary map, Π_I φ_ideal(g) Π_I = Π_I, and we can simplify the expression for t to

t = E_g ⟨I|φ(g)|I⟩.
If t > 1, then there must exist a g such that ⟨I|φ(g)|I⟩ > 1, which would violate our assumption that φ is trace non-increasing.
Showing that p ≤ t involves a similar but more involved argument. p is an equally weighted average of terms of the form ±⟨P_j|φ(g)|P_k⟩, and so it must be the case that for some h ∈ G and for some P_n and P_m not equal to the identity there must exist ⟨P_n|φ(h)|P_m⟩ ≥ p. We can now apply the map φ(h) to one of the two positive semidefinite operators I ± P_m, which yields

ρ ≡ Σ_j c_j P_j,   where c_j ≡ ⟨P_j|φ(h)|I⟩ ± ⟨P_j|φ(h)|P_m⟩,

and we have used the observation that a trace non-increasing map must have ⟨I|φ(h)|P_m⟩ = 0 (otherwise the trace of one of I ± P_m would increase under the action of φ(h)). We introduced the sign ambiguity earlier so that we can ensure that c_n = ⟨P_n|φ(h)|I⟩ ± ⟨P_n|φ(h)|P_m⟩ has magnitude |c_n| ≥ p.
To complete the argument we need to show that this ρ necessarily has a negative eigenvalue which, since ρ is Hermitian, can be shown by providing a |ψ⟩ such that ⟨ψ|ρ|ψ⟩ < 0. Let's construct a set of 2^n − 1 commuting Pauli strings that contains P_n by considering products of P_n's constituent single-qubit Pauli operators (replacing any identities with Z operations). As an example, if P_n = ZIX, we would say P(α) = Z^{α_0} Z^{α_1} X^{α_2}, where we've indexed these 2^n Pauli strings by a binary vector α ∈ {0, 1}^n. There is a natural tensor-product basis for this set of operators, the eigenbasis of the constituent single-qubit Pauli operators, which we index by another binary vector β, e.g., |ψ(β)⟩ = |(β_0)_Z⟩|(β_1)_Z⟩|(β_2)_X⟩. We can now write the expectation value

⟨ψ(β)|ρ|ψ(β)⟩ = Σ_α c_α (−1)^{α·β},

where we have utilized that any Pauli string outside of our commuting set has an expectation value of zero with respect to any |ψ(β)⟩. The c_α's may all have arbitrary signs, but it should be clear that we can choose a β such that all terms in the sum are negative. That β will lead to a minimum, and so, if t < p we are guaranteed a process matrix that does not map positive operators to other positive operators.