Randomized benchmarking with gate-dependent noise

We analyze randomized benchmarking for arbitrary gate-dependent noise and prove that the exact impact of gate-dependent noise can be described by a single perturbation term that decays exponentially with the sequence length. That is, the exact behavior of randomized benchmarking under general gate-dependent noise converges exponentially to a true exponential decay of exactly the same form as that predicted by previous analysis for gate-independent noise. Moreover, we show that the operational meaning of the decay parameter for gate-dependent noise is essentially unchanged, that is, we show that it quantifies the average fidelity of the noise between ideal gates. We numerically demonstrate that our analysis is valid for strongly gate-dependent noise models. We also show why alternative analyses do not provide a rigorous justification for the empirical success of randomized benchmarking with gate-dependent noise.


I. INTRODUCTION
The development of practical, large-scale devices that process quantum information relies upon techniques for efficiently characterizing the quality of control operations on the quantum level. Fully characterizing quantum processes [1,2] requires resources that scale exponentially with the number of qubits despite improvements such as compressed sensing and direct fidelity estimation [3][4][5][6][7]. Quantum process tomography is also limited by state preparation and measurement (SPAM) errors [8,9], although this can be addressed at the cost of increased resources using gate-set tomography [10,11].
An alternative to full characterization is to accept partial characterizations based on figures of merit that can be estimated efficiently. The canonical such figure is the fidelity of an experimental implementationG of an ideal unitary channel G(ρ) = GρG † averaged over input states distributed according to the Haar measure dψ, At present, the only efficient partial characterizations are randomized benchmarking (RB) [12][13][14][15][16] and variants thereof [17][18][19][20][21][22][23]. The RB protocol of Ref. [16] provides an estimate of the average fidelity over a group of operations from how the fidelity of sequences of operations decays as more operations are applied. RB protocols scale efficiently with the number of qubits and are robust to state-preparation and measurement (SPAM) errors. Consequently, RB protocols have become an important baseline for the validation and verification of quantum operations [24,25] and have recently been used to efficiently optimize over experimental control procedures [26].
In this paper, we first show that gate-dependent fluctuations have to be surprisingly negligible in order for the bounds derived in Ref. [27] to be small enough to justify fitting an RB experiment to the zeroth-or first-order decay models derived therein. Specifically, we show that the norm of any gate-dependent fluctuations from the average noise has to be significantly smaller than 1−f (E) where f (E) = f (E, I) and E is the average noise, in order to justify ignoring the higher order terms in Ref. [27]. For example, consider an implementation of exp(iθZ) with only stochastic noise E with 1 − f (E) = r and calibration errors in the value of θ. In order to fit to a zeroth order model based on the rigorous analysis of Ref. [27], θ has to be calibrated to within 0.01r radians. This shows that the looseness of the bounds for the particular example of Ref. [28] is in fact generic.
We then prove that, under the assumption of Markovian noise, the full effect of gate-dependent fluctuations can be described by a single perturbation term that decays exponentially with the sequence length m. We also prove that the RB decay parameter is the fidelity of a suitably defined average noise process E. Informally, our main result (theorem 4) is that for gate-dependent, but trace-perserving and Markovian noise, the average survival probability over all RB sequences of length m is for some constants A and B, where δ 1 , δ 2 quantify the gate-dependence of the noise and δ 2 is small for good implementations of gates by theorem 3. We prove all theorems without restricting to trace-preserving noise, as we anticipate that near-term implementations of RB on encoded qubits [29] will involve significant loss from the encoded space due to imperfect error-correction procedures. This paper is structured as follows. We introduce notation in section II and review the RB protocol of Ref. [16] in section III. In section IV we show that in order to justify fitting RB experiments to a single curve based upon the rigorous analysis of Ref. [27], any gate-dependent fluctuations have to be extremely small. We then discuss an ambiguity in the decomposition of noise in section V and show how it can be exploited to cancel out the majority of gate-dependent fluctuations in section VI. We illustrate the accuracy of our new analysis numerically in section VII. We conclude by discussing some implications of our results for implementing and interpreting RB and the relation between the present work and the independent analysis of Ref. [28].

II. NOTATION
We use the following notation throughout this paper.
• All operators except density operators are denoted by Roman font (e.g., A). The normalized identity matrix isÎ d = I d / √ d.
• Ideal channels are denoted by calligraphic font (e.g., C), where the ideal channel U corresponding to a unitary operator U is U(ρ) = U ρU † .
• The unital component of a trace-preserving map is the linear map defined by G u (A) = G(A − Tr(A)/d).
• A noisy implementation of an ideal channel is denoted with an overset˜(e.g.,C denotes the noisy implementation of C).
• Noncommutative products of subscripted objects are denoted by the shorthand where the set X is implicit from the context.
We use the following matrix representation of quantum channels (variously known as the Liouville, natural and Pauli transfer matrix representation) wherever a concrete representation is required. Let {e 1 , . . . , e n } be the canonical orthonormal unit basis of C n and {B 1 , . . . , B d 2 } be a traceorthonormal basis of C d×d , that is, Tr(B † j B k ) = δ j,k . Defining the linear map | * : and the adjoint map A| = |A † , we have A|B = Tr(A † B). A channel C maps ρ to C(ρ), which can be represented by the matrix where we abuse notation slightly by using the same notation for the abstract channel and its matrix representation. For the remainder of the paper, we assume the operator basis is Hermitian and that B 1 =Î d , so that the matrix representation of all Hermiticity-preserving maps, including all completely positive and trace-preserving (CPTP) maps, are real-valued.
III. RANDOMIZED BENCHMARKING PROTOCOL Algorithm 1: Randomized Benchmarking 1Choose a positive integer m; 2 Choose a sequence G = (G 1 , . . . , G m ) ∈ G m of m elements of G uniformly at random and set G m+1 = (G m:1 ) † ; 3 Estimate the expectation value (also known as the survival probability) Q G = Tr[QG m+1:1 (ρ)] of an observable Q after preparing a d-level system in a state ρ and applying the sequence of m + 1 operations G 1 , G 2 , . . . , G m+1 ; 4 Repeat steps 2-3 to estimate the average over random sequences E G Q G to a desired precision; 5 Repeat steps 1-4 for different values of m and fit to the decay model to estimate p [16].
The RB protocol in algorithm 1 is a powerful tool for efficiently characterizing the quality of the experimental implementations of a group of operations G that is also a unitary two-design [15]. Despite the ubiquitous usage of RB, best practice choices for the sequence lengths and the number of sequences at each length are still unknown, though see Refs. [30][31][32] for some guidance. A unitary two-design is any set of unitary channels G such that uniformly sampling G reproduces the first and second moments of the Haar measure. The canonical example of a unitary two-design is the n-qubit Clifford group.
The unitary two-design condition can also be stated in terms of how a general channel is 'twirled' by the group, where the twirl of a channel C over a group G is E G (G † CG). A unitary two-design is any group G such that any CPTP channel is 'twirled' into a depolarizing channel [15,33] Unitary two-designs also twirl a general linear map C into a depolarizing map composed with a channel that uniformly decreases the trace, that is, into a channel of the form where the values of p and t can be computed from the following [30].
Lemma 1. For any unitary two-design G and channel C, where In theorem 4, we prove that eq. (7) holds, up to a small and exponentially decaying perturbation, for gate-dependent noise that is time-independent and Markovian, generalizing the original analysis to more realistic noise.

IV. PRIOR ANALYSIS OF RB
Ref. [28] demonstrated that, for some noise models, the bounds on the gate-dependent perturbations derived in Ref. [27] are so large that they cannot be neglected. We now show that the looseness of the bounds is generic, and more severe than identified by Ref. [28]. More specifically, we show that even the statistical uncertainties on noise that appears gate-independent from full characterization procedures such as gate-set tomography [11] will generally be consistent with noise for which the higher order terms in the analysis of Ref. [27] dominate the exponential decay.
The zeroth-order RB decay curve in eq. (7) was derived in [27] under the assumption that the noise is gate-and time-independent and so can be written asG = GE for all G ∈ G for some fixed E. Deviations from an exponential decay curve may be due to one of three violations of the assumption, namely, 1. non-Markovian noise processes; 2. time-dependent noise; or 3. gate-dependent noise.
Numerical simulations of RB experiments have shown that the standard decay curve in eq. (7) holds (approximately) for a variety of physically relevant noise models with gate-dependent and non-Markovian noise, although the 'true' and estimated fidelity often differ by a factor of approximately two [31,34].
To allow for gate-dependent noise, Ref. [27] defined the average noise to be and expandedG = GE + G∆ G , where E G (∆ G ) = 0 by construction. Ref [27] then proved that for any fixed positive integers k and m, the kth order terms in ∆ G were upper-bounded by is the induced trace norm of a linear map L. Ref. [27] also obtained similar conditions for timedependent noise and derived the first-order correction.
As the bound on higher order terms is derived using the triangle inequality and the submultiplicativity of the induced trace norm, it should be expected to be loose. However, this bound gives an in-principle proof of the robustness of RB to some level of gate-dependent noise.
To see that the bound on the gate-dependent terms is prohibitively loose, note that the sequence length m is varied and the contributions from any order diverge for sufficiently large m. More precisely, the total contributions from all orders k ≥ 1 and k ≥ 2 are at most respectively. Therefore m ≈ 0.01/γ and m ≈ 0.1/γ are the largest sequence lengths for which the zeroth-and first-order analysis of Ref. [27] rigorously hold to within 0.01. The exponential decay curve can be written as where r(E) = 1 − f (E). In order to fit the exponential decay, sequences of length m ≈ 1/r(E) are required [32]. Therefore in order for the higher-order terms in the analysis of Ref. [27] to be negligible, we require γ ≤ 0.01r(E) (for the zeroth-order model) or γ ≤ 0.1r(E) (for the first-order model) respectively. Moreover, constructing higher-order models or a lower rigorous accuracy will change the prefactor, but not the scaling with r(E). Certifying that the norms of the gate-dependent perturbations satisfy such a bound is likely to require fully reconstructing the process matrices for each noise channel with statistical errors in every entry smaller than 0.01r(E) (i.e., ≈ 10 −5 for current experiments).
To illustrate the potential contribution from gate-dependent noise and the need for tighter analysis, consider the following hypothetical implementation of the single-qubit Clifford group (that is, the 24 elements of SU(2) that permute the single-qubit Paulis X, Y and Z up to an overall sign). Supoose that half the elements are implemented with only depolarizing noise, that is,G = GD p , and the other half are implemented with an additional rotation around the z-axis of the Bloch sphere by some angle θ ∈ (0, π 2 ), that is, For this noise model, the average noise is E = D p (I 4 + Z θ )/2 by eq. (12) and so [35]. The maximization in the induced trace norm will be achieved by any state in the xy plane (e.g., (|0 + |1 )/ √ 2), giving For the zeroth-order model to be valid, we require γ ≤ 0.01r(E), that is, |θ| 0.01(1 − p). For example, for a depolarizing rate of p = 0.99, a gate-dependent coherent error that decreases the fidelity by more than 10 −7 makes the zeroth-order analysis of Ref. [27] inapplicable. This dramatic difference arises from the disconnect between norms and the fidelity [36].

V. REPRESENTING GATE-DEPENDENT NOISE
We now discuss an ambiguity in how experimental noise can be abstractly represented, and how specific conventions can be adopted to greatly simplify the analysis of RB. For gate-dependent noise, Ref. [27] wroteG = GE G and perturbatively expandedG around GE where E is an average noise process as defined in eq. (12) (more precisely, Ref. [27] also allowed for time-dependent effects).
The analysis of Ref. [27] also holds for the more general gate-dependent noise model LGR G , where L and R G are gate-independent and gate-dependent noise processes acting from the left and right (that is, after and before a ideal gate) respectively. The analysis in section IV holds for any such decomposition and so while some optimization over L and R G can be performed, it is not obvious that such optimization will allow the bounds derived in Ref. [27] to be useful. However, in hindsight the bounds of Ref. [27] were loose precisely because they bounded each perturbation term separately, which heuristically appeared to be sufficient.
We now identify a specific decomposition that we will prove causes the majority of the perturbation terms to cancel. We will then prove thatG ≈ LGR for these choices of L and R in theorem 3.
Remark. When the noise is independent of the gate, that is,G = LGR for all G ∈ G for some L and R, then L and R are solutions to eq. (20), as can be seen by substitutingG = LGR into eq. (20) and using lemma 1.
For gate-dependent noise such that R is invertible, we can setG = L G GR, so that the noise between two ideal gates G and H is RL G . SubstitutingG = L G GR into eq. (20b) and multiplying by R −1 gives As the Haar measure is unitarily invariant and p is a convex function, gives the fidelity of the average noise between ideal gates to the identity via eq. (3). If R is not invertible, we can introduce an arbitrarily small perturbation into theG to make R invertible and so eq. (22) will hold to an arbitrary precision. WhenG = GD p,t , E G (G u ⊗G) and E G (G) each have only one nonzero eigenvalue, namely, p and t respectively [19,20]. By the Bauer-Fike theorem [37], all the eigenvalues η of a matrix M + δM satisfy where spec(M ) is the spectrum of M and δM is the operator norm, that is, the largest singular value, of δM . As the operator norm is submultiplicative, obeys the triangle inequality, and A ⊗ B = A B , the largest eigenvalues of E G (G u ⊗G) and E G (G) will differ from p and t by at most E G G − GD p ,t for any scalars p and t . Therefore the largest eigenvalues in absolute value will generally be unique and hence must be real as they are the eigenvalues of real matrices.
Proof. Equation (20) where restricting to G u enforces the orthogonality constraints on L |Î d = 0 and Î d |R = 0. Equations (24a) and (24c) are left-and right-eigenvector problems of a real matrix and so t can be set to any eigenvalue of E G (G) and L and R to the corresponding non-trivial eigenvectors (recall that the set of left-and right-eigenvalues of a real matrix are identical). We choose the largest eigenvalue as it will result in the smallest ∆ G . To turn eqs. (24b) and (24d) into eigenvalue equations, we can use the vectorization map that maps a matrix to a vector by stacking the columns vertically, that is, The vectorization map satisfies the identity Applying eq. (26) to eqs. (24b) and (24d) and using G T = G † (as the matrix basis is Hermitian) gives and so p can be set to any eigenvalue of E G (G u ⊗G) and L and R to the corresponding non-trivial eigenvectors (or equivalently, to any eigenvalue of E G (G ⊗ G u ) as the two are related by a unitary transformation and so have the same spectrum). The maps L and R identified are solutions to eigenvector equations and so are only determined up to a normalization constant. As D q,u commutes with G for all scalars q and v and all G ∈ G, we can set L → LD q,u and R → D r,v R to satisfy eq. (20c).
The development of gate-set tomography has highlighted a gauge freedom in quantum theory, in that gauge transformations of the form applied to states, channels and measurement operators respectively do not change any observable properties for any invertible linear map S [11]. A recent criticism of RB theory is that previous analysis was not gauge invariant [28]. As this transformation does not change any observable properties, the quantities estimated by any characterization protocol should be invariant under gauge transformations. While applying a gauge transformationG → S −1G S changes L and R to L → SL and R → RS −1 respectively, the noise between ideal gates, namely, RL, is gauge invariant and so the present analysis is gauge-invariant. We now prove that the solutions L and R to eq. (20) also ensure that is small whenG ≈ GD p,t . To prove that ∆ G =G − LGR is small where L and R satisfy eq. (20), it is convenient to transform to the gaugeG I such that L = I (i.e., set S = L −1 ). In this gauge, eq. (20c) becomes In particular, ∆ G is on the order of G − GD p,t ∈ O( √ 1 − p) where M is the operator norm of M , that is, the largest singular value of M .
Proof. By the triangle inequality, submultiplicativity, and unitary invariance of the operator norm, for all G ∈ G. Equation (31) holds for any R. Now let R be a solution to eq. (20) forG I and expand R = D p,t + R 1 andG I = GD p,t +G 1 . Substituting these expansions into eq. (20b) and using eq. (29) gives Canceling the common term in eq. (32), multiplying both sides by D 1/p,1/t from the left, taking the operator norm and using the triangle inequality and submultiplicativity of the operator norm gives Rearranging and using the unitary invariance of the operator norm gives

VI. ANALYZING RB WITH ARBITRARILY GATE-DEPENDENT NOISE
Experimental implementations of RB almost always produce relatively clean exponential decays. We now explain this empirical success by proving that the average survival probability over RB sequences of length m is equal to the zeroth-order model of Ref. [27] with a single perturbation term for a suitably defined average noise model. The perturbation term will generally be negligible for noise that is close to the identity by theorem 3 and for m ≥ 3 (for numerical values, see fig. 2). Even under this condition, more significant (and potentially oscillating) perturbations are possible for m ≈ 1, potentially explaining experimentally observed fluctuations as seen in, e.g., [24, Fig. 3b]. The decay constants correspond to the average loss of trace and fidelity of the noise between ideal gates via eq. (22). for some δ 1 and δ 2 that quantify the amount of gate dependence.
Remark. Equation (35) reduces to the standard single-exponential model plus an exponentially decreasing perturbation whenG is a CPTP map for all G as then E G (G ⊗ G) and E G (G ⊗G) are both CPTP maps and so all eigenvalues are in the unit disc and the eigenvalue with eigenvector close to |Î d is 1 [38]. The constants A and B and the perturbation term are with ∆ j = ∆ G j =G j −LG j R and where L and R are solutions to eq. (20). The quantities bounding the perturbation term are where δ 2 is bounded in theorem 3 in the gauge where L = I. Note that ρ 1 , Q 1 ≤ 1 in a standard gauge, however, the gauge transformation required to set L = I may change these values and introduce negative eigenvalues to ρ so that the maximization for the induced norm in δ 1 is not restricted to positive semi-definite inputs.
Proof. The average map over all randomized benchmarking sequences G of length m is Let L and R satisfy and define Then for any integer 1 ≤ j ≤ m + 1 we can write where ∆ 0:1 = I d =G m+1:m+2 by convention. Moreover, as G m+1:1 = I d and any set of m elements of G are statistically independent and uniformly distributed, we can apply eq. (40a) recursively and use the fact that D p,t commutes with unital channels to obtain for j = 1. Similarly, we can use lemma 1 and eq. (40) for j ≥ 2 to obtain Therefore we can apply eqs. (42) and (43) and then recursively apply eqs. (42) and (44) to obtain The average survival probability over all random sequences is then where m = Q|E G (∆ m+1:1 )|ρ . Simulated RB data (green circles) and predicted values from theorem 4 (blue line) and eq. (12) (purple dashed line, without the uncertainties due to the bound on the perturbation term) for the detuning noise model considered by Ref. [28], where the noisy Clifford gates are constructed from √ X and √ Y gates composed with exp(0.05iZ). The uncertainties in the individual simulated data points are standard errors of the mean over 1000 random sequences. As observed by Ref. [28], the predicted decay using the value from the zeroth-order model of Ref. [27] is substantially different to the observed value, although note that the values are consistent to within the bounds on the gate-dependent contributions from Ref. [27].
We now bound the perturbation term. By the triangle inequality and submultiplicativity,

VII. NUMERICS
In figs. 1 and 2 we illustrate the reliability of our analysis by comparing the estimates obtained by fitting the simulated RB experiments described below under two noise models (detuning and random unitaries) to the decay rates predicted by theorem 4 and Ref [27]. In fig. 1, we show that theorem 4 predicts the correct decay curve for the the noise model identified by Ref. [28] as violating the predictions of Ref. [27]. In fig. 2, we show that our analysis correctly predicts the decay rate for many noise models where the generating gates have small and independent random unitary errors, even when the values from the zeroth-order model of Ref. [27] differ from the observed decay rate by orders of magnitude due to the looseness of the bounds on the gate-dependent perturbations.
A simulated RB experiment consisted of the following.  We chose not to include SPAM errors as the purpose of the simulations is to demonstrate that the present analysis is consistent with the simulated values even when the estimate based off the zeroth-order model of Ref. [27] differs by orders of magnitude, as noted by Ref. [28]. The presence of two constants significantly increases the numerical uncertainty of the fitting routine and we do not wish to employ the method for estimating the constants used in Ref. [39] as it has not been analyzed for gate-dependent noise.

VIII. CONCLUSION
We have shown that in order to justify fitting experiments with gate-dependent noise to a single exponential curve based on the analysis of Ref. [27], any gate-dependent fluctuations in the noise have to be extremely negligible, to the point that even statistical uncertainties in tomographic reconstructions will typically be consistent with noise for which the analysis of Ref. [27] does not rigorously justify fitting RB experiments to a single exponential.
We then rigorously proved that the contribution of gate-dependent noise can be bounded by a single perturbation term that decays exponentially in the sequence length. This was demonstrated numerically in figs. 1 and 2, where our estimates were consistent with simulated values even in regimes where the bounds from Ref. [27] are consistent with the true decay and the estimated decay based on the zeroth-order model differing by orders of magnitude as highlighted by Ref. [28].
Moreover, we proved that the decay rate is the fidelity of the average noise between idealized gates. More specifically, we can writeG = LGR G where L is a solution to eq. (20). Then for trace-preserving noise the RB decay rate is p = E G (R G L), that is, the average survival probability between ideal gates.
By rigorously reducing the potential impact of gate-dependent errors, the present work enables randomized benchmarking to more reliably diagnose the presence of non-Markovian noise. More precisely, there are two key assumptions that may not hold well enough for the standard exponential decay model to be valid, namely, Markovianity and time-independence. However, the only effect that time-dependent noise can have is to alter the rate at which the survival probability decays [12,30]. Consequently, any "revivals" in RB experiments, that is, statistically significant increases in the average survival probability as m increases [31], can be attributed directly to the noise being non-Markovian.
An intriguing observation from the present analysis is that gate-dependent fluctuations may produce a significant deviation from the exponential fit for short sequences. This deviation is possible because the first few values of m are the most sensitive to gate-dependent fluctuations, which may be as large as 0.1 for m = 1 and 0.01 for m = 2. Consequently, these sequence lengths should be omitted when fitting to eq. (7), but can also indicate the presence of gate-dependent noise.
Alternative protocols based on randomized benchmarking have been developed that also assume gate-independent noise [17][18][19][20][21][22][23]. We leave applying the current analysis to these protocols to future work. One main open problem of the current work is that the solutions L and R from theorem 2 are not guaranteed to be CPTP maps, even under rescaling Recent work has criticized previous RB analysis for not being invariant under gauge transformations [28]. The present analysis is invariant under such transformations as the decay rate is an eigenvalue problem and applying a gauge transformationG → S −1G S maps L → SL and R → RS −1 in theorem 2 and so p(RL) is gauge-invariant.
To resolve the gauge ambiguity, Ref. [28] suggested optimizing the gauge to obtain a better fidelity and then demonstrated that this approach does not predict the correct decay rates in some explicit cases based on the zeroth order theory of Ref. [27]. This gauge-optimization is not supported by previous analysis because, as discussed in section IV, previous bounds on perturbations are extremely loose for reasonable levels of gate dependence and so there generally is no guarantee that a gauge exists that makes the gate-dependent terms small enough to truly be negligible.
Ref. [28] also provided independently developed methods to more closely approximate the RB decay rate under gate-dependent noise. The first method represents a fundamentally different approach but requires further development to obtain concrete predictions. While the authors refer to it as an "exact" formula, the expression in eq. (35) is also exact and was presented in Ref. [27]. They then specify that the relevant eigenvalues "dictate" the RB decay rate, where there are 2 O(n 2 ) eigenvalues and it is unclear which eigenvalues, and what linear combinations thereof, are relevant to the RB decay curve.
The second method of Ref. [28] is much more useful and independently gives the same decay value as the present analysis. However, the present analysis surpasses that of Ref. [28] in two key ways. First, and most importantly, the present analysis gives a precise interpretation of the decay rate, namely, that it is the fidelity of the noise between ideal gates. In contrast, Ref. [28] states that "it is not clear that r (a function of the RB decay rate) should be called (the) "fidelity", except in the special case of gate-independent error maps." Second, the current analysis is more robust to gate-dependent perturbations and has practical implications such as not fitting to the first few sequence lengths. In contrast, the analysis of Ref. [28] does not explain the empirical success of RB to date without further analysis, as the constant perturbation term from their analysis arises from a bound on the norm and so can be on the order of √ 1 − p ≈ 0.1 for all m for RB decay rates on the order of p = 0.99.