Quantum circuit debugging and sensitivity analysis via local inversions

As the width and depth of quantum circuits implemented by state-of-the-art quantum processors rapidly increase, circuit analysis and assessment via classical simulation are becoming unfeasible. It is crucial, therefore, to develop new methods to identify significant error sources in large and complex quantum circuits. In this work, we present a technique that pinpoints the sections of a quantum circuit that affect the circuit output the most and thus helps to identify the most significant sources of error. The technique requires no classical verification of the circuit output and is thus a scalable tool for debugging large quantum programs in the form of circuits. We demonstrate the practicality and efficacy of the proposed technique by applying it to example algorithmic circuits implemented on IBM quantum machines.


Introduction
Quantum computing hardware is undergoing rapid maturation and scaling, especially with superconducting circuit, trapped ion and, neutral atom platforms [1,2,3,4]. In tandem, the development of increasingly sophisticated compilation tools makes it possible to write quantum programs in relatively high-level languages and execute them on quantum information processors (QIPs) with 20-50 qubits [5,6,7,8]. Successful execution, however, is not guaranteed due to the significant error rates present in today's, and likely near-future, QIPs [9]. Accurately predicting a program's success rate is increasingly difficult as the number of qubits and complexity of quantum circuits increases. This difficulty is because (i) classical simulation of ideal quantum circuits scales exponentially with qubit number and circuit depth, and (ii) predictions of circuit success from low-level characterization data, like individual gate errors, require unfeasible simulations of noisy quantum circuits or unreliable approximations.
Fernando A. Calderon-Vargas: facalde@sandia.gov Mohan Sarovar: mnsarov@sandia.gov This situation begs for developing efficient in-situ debugging tools that enable reasoning about the source of errors in the output of a quantum circuit. How can one diagnose the most likely source of errors? Especially for QIPs with 50 or more qubits? This debugging task is made difficult by the fact that, unlike classical computers, the state of a QIP cannot be queried mid-execution without exponential resources (state tomography). Entirely new methods for debugging are therefore necessary to tackle this problem.
This work introduces a method for debugging quantum circuit implementations through a sensitivity analysis of the circuit output. Our method enables identifying which circuit layers most influence the circuit output and discovering time-dependent error modes such as degradation of gates as the circuit progresses. The method is based on locally inverting individual layers of a circuit. Specifically, to study the errors induced by a particular "target" layer, we add two extra layers after it -a quasi-inverse of the target layer and the target layer again. In the absence of noise, these extra layers cancel each other and the circuit output is unaffected. In the presence of noise, however, the noise of the target layer is amplified. This local inversion is repeated for each circuit layer to be studied, and the circuit's output distribution is estimated for each version. By comparing these distributions, we can identify which layers perturb the circuit output the most.
The practical utility of identifying the layers that influence the circuit output the most is the certainty that any effort to improve the gates in those layers will have the largest impact on the circuit outcome. For example, many of the quantum control techniques used to improve gate fidelity, like dynamically corrected gates [10,11] and composite pulse sequences [12,13,14,15], usually require longer gate times or long gate sequences, making them impractical to use in every noisy gate in a circuit. Knowing, however, the location of the most problematic gates in a circuit would allow us to efficiently use the quantum control techniques mentioned above.
Surprisingly, the noisy gates affecting the circuit output the most are not always the noisiest as identified by benchmarking of individual gates [16,17,18]. We show that the impact of noise in a given layer is not only a function of the error rates of the individual gates that compose that layer but also of the algorithmic structure imposed by the entire circuit. Thus, the circuitspecific information we obtain about the impact of noise at a given layer on the circuit output cannot be reliably obtained from low-level characterization of individual gates, even in the absence of crosstalk or temporal instability. Our method is efficient and scalable in the low-noise regime; it requires the execution of circuits only slightly (O(1)) deeper than the original circuit and requires no classical computation of circuit outputs or simulation of noise models. Our technique does not require knowledge of whether the output is correct or not, nor does it provides information on its correctness. The resource requirements and sample complexity of our debugging technique are discussed in Section 5.
Our technique should not be confused with methods for finding bugs in quantum programs [19,20,21,22] that use different assertions to help locate faulty statements in high-level specifications of quantum programs. Our method of local inversions, however, could be used in conjunction with quantum program assertions to readily locate bugs at multiple levels of quantum program specifications. We also note that the technique of local inversions of layers or individual gates has been proposed for error mitigation, in which case it is often referred to as unitary folding [23] or identity insertion [24,25]. However, for error mitigation, the noise amplification provided by layer inversion is used for observable extrapolation and not for circuit debugging, as discussed in this work.
We demonstrate our method using simulations and experiments, using circuits implementing the quantum approximate optimization algorithm (QAOA) for Max-Cut [26] and quantum Fourier transform (QFT) [27]. This paper is organized as follows. In Sec. 2, we introduce the concept of local inversion and our analysis techniques. Section 3 presents numerical simulations demonstrating our debugging technique. In Sec. 4, we present the results of the experimental implementation of our technique on cloud-accessible IBM QIPs. We give concluding remarks in Sec. 5.

Local inversions of a quantum circuit
Consider a quantum circuit C represented as a series of d layers, where each layer L contains one or more quantum gates acting concurrently on an n-qubit system. We can write a d-layer circuit as Note that we only include gates in our definition of circuits and do not include state preparation and measurement (SPAM) quantum operations. Our strategy of amplifying the noise in circuit layers through local inversions cannot be applied to layers that include SPAM operations since these are non-invertible operations. As a result, our method cannot perform debugging or sensitivity analysis of SPAM operations. Figure 1: The local inversion technique shows the extent to which the errors in each of the d layers (L) of a circuit C perturb the output probability distribution. This is done by calculating the total variation distances (TVDs) between the output probability distribution of the circuit C and the probability distributions of each of the d circuits C (i) . In each C (i) circuit, we invert the i-th layer and repeat it such that, in the absence of error on Li, LiL −1 i Li = Li. The relative magnitude of the TVD between the probability distributions of C and C (i) is proportional to the degree the circuit outcome is perturbed by the errors in layer Li (see Appendix A).
Each layer L in Eq. (1), in the absence of any noise, implements a unitary evolution U (L) ∈ SU(2 n ) on the n-qubit quantum state represented by the density operator ρ. Typically, some subset of noisy layers in a circuit C affect the circuit outcomes the most. These dominant noisy layers are not necessarily those with the noisiest gates since the structure of the circuit can amplify or reduce the effect of errors on individual gates on the quantum state at the output of the circuit. We propose to use local inversions to identify the dominant noisy layers (see Fig. 1). We define the i-inverse of the circuit in Eq. (1) as where we invert the i-th layer and repeat it. The inverse of a layer i, L −1 i , is defined to be composed of gates that would ideally invert all the ideal gates in L i . This means that, in the absence of error on the ith layer, the outputs of both inverted and non-inverted circuits are the same. The actual composition of the inverse layers depends on the gate set available on the QIP. In some cases, L −1 i might be implemented as multiple physical layers because constructing the inverse of a native gate might require multiple native gates. In the following, we work with the commonly used universal gate set {cnot, √ X, Z(θ)}, where the Z(θ) rotations are executed virtually through a phase offset to subsequent drives, and thus take zero clock cycles and are almost error-free [28]. All layers are composed of some combination of these gates. The corresponding inverse layers, in this case, always consist of one physical layer since . This final inverse requires only one physical layer since the Z(θ) gates are virtual. In such cases, the inverse layer will have a similar error map as the original layer, and most likely, the error maps will be close in magnitude.
We view i-inversions as local perturbations of the circuit and calculate how those local perturbations affect the circuit outcomes. To do so, we first define p(k|C ) as the probability distribution over measurement outcomes at the output of the circuit C , where k ∈ B n with B n being the set of all length n bitstrings. We assume that all qubits are measured in the computational basis at the end of the circuit and that the input state is some fixed, but not necessarily known, initial state ρ 0 of all n qubits. We then assess the impact of perturbations induced by i-inversion by computing the distance between the original and perturbed output distributions, d[p(·|C ), p(·|C (i) )]. In this work, we use the total variation distance (TVD) for the distance measure d 1 . There are, however, many other choices for d, e.g., Kullback-Leibler divergence and Jensen-Shannon divergence, each with a slightly different operational or statistical interpretation. In Appendix D, we detail a distance measure intimately related to sensitivity analysis that provides fine-grained information in the lownoise regime. However, since this regime is not necessarily realistic with current and near-future devices, we opt to use the TVD. Explicitly, (3) which is interpreted as a measure of distinguishability of p(·|C ) and p(·|C (i) ); i.e., the TVD is the greatest probability that one can discriminate two distributions based on one draw.
If, instead of inverting the i-th layer, we replace the gates in that layer with their ideal (noiseless) version, then p(k|C (i,ideal) ) would be the probability distribution of the i-ideal circuit C (i,ideal) . We can then use Eq. (3) to obtain an expression for the TVD between p(k|C (i,ideal) ) and the probability distribution of the noisy circuit C , (p(k|C )), obtaining: In Appendix A, we show that if the error maps for the target layer and its inverse are similar, one can expect that η (i) ≈ 2η (i,ideal) . This provides some operational meaning to the TVD quantities we calculate through local inversion. Decreasing the error in layer i will move the circuit output closer to the ideal output (in terms of TVD between output distributions) by roughly η (i) /2. In this work, we assume that the circuit output is measured in the computational basis and evaluate errors using the TVD measure between outputs in this basis. For some quantum circuits, e.g., primitives like the quantum Fourier transform, it might be important also to understand errors at the circuit output in other bases, e.g., phase errors. Our technique can be easily modified to focus on such errors or average over possible types of errors by modifying the basis of measurement by adding a local Pauli layer at the end of the circuit.

Numerical simulations of circuit debugging via local inversion
Before we present experimental results from the implementation of our circuit debugging procedure, we discuss some of the features of the approach using numerical simulations. We consider two quantum circuits: a 4-qubit circuit implementing the QAOA [26] on a Max-Cut problem and a 4-qubit circuit implementing the QFT [27]. These circuits are shown in Figs. 2(a) and 2(d), respectively. They are expressed in terms of the set of gates used by IBM's quantum processors: {cnot, √ X, Z(θ)}. In these circuit diagrams, we denote virtual Z(θ) gates by a vertical line to indicate that they do not take up a clock cycle. For the QAOA circuit, we have two sets of angles for the Z rotations that define the QAOA variational angles, one random and the other optimized, both shown in Table 1. The parameters used for the optimized QAOA circuit were obtained with pyQAOA [29], a python package that optimizes QAOA circuits using gradient-based methods. Table 1: Sets of angles, labeled "optimized" and "random", used in the circuit implementing QAOA shown in Fig. 2 4 3.8411 4.471 θ 5 1.532 1.592 θ 6 3.404 3.538 θ 7 4.937 5.131 The noise model used in the simulations is Markovian and crosstalk-free, and is based on process matrices derived from performing one-and two-qubit gate-set tomography (GST) [30,31] on the five-qubit quantum processor IBM-Q Ourense [32]. Each of the elementary gates, (cnot, √ X, I), has a process matrix that has both coherent and stochastic errors, and these are explicitly presented in Appendix B. We assume the Z(θ) gates are ideal. Although this noise model does not accurately capture all errors (e.g., crosstalk), it is sufficient to demonstrate the important features of circuit debugging through local inversions.
The local inversion technique gauges the effect perturbations in each layer have on the circuit output distribution. The magnitude of such effects does not only depend on how noisy the gates present in that layer are, it also depends on which gates are present in the circuit, i.e., the circuit structure, and for variational circuits, like in the QAOA case, the circuit parameters. Illustrating this, Fig. 2(b) shows the TVD, Eq. (3), between the original and i-inverse circuit output distribution for two almost identical 4-qubit QAOA circuits, the only difference being the values of the angles of their Zrotations (see Table 1). Even though the difference in magnitude between the optimized and random angles is small, it is enough to modify which layers affect the output distribution the most. As shown in Fig. 2(b), the second layer (L 2 ) is the dominant one when we use the set of optimized angles, while the dominant layer for the set of random angles is the fourth layer (L 4 ).
Next we demonstrate how local inversion technique can be used to detect gates with temporally varying noise (i.e., drift). A common instance of this is gate degradation over the course of a circuit in some quantum computing platforms; e.g., due to heating of vi-brational modes Mølmer-Søreson gates can degrade in fidelity over the course of a long circuit. This is an effect that cannot be modeled using standard characterization methods that assume static process matrices. In Fig. 2(e) we demonstrate the signatures of gate degradation in η (i) . We simulate the 4-qubit circuit implementing a QFT, where in addition to the standard error model described above, we add two-qubit depolarizing channels to the cnot gates between qubits 1 and 2 (we label the first qubit 0), and increase the depolarizing probability with each successive application of the gate to model gate degradation (this increase is indicated by the darker shades of red in Fig. 2(d)). The depolarizing probability used for the red shaded cnot gates in layers 6, 7, 12, 13, and 14 are {p 1 = 0.025, p 2 = 0.051, p 3 = 0.076, p 4 = 0.102, p 5 = 0.127}, respectively. Figure 2(e) shows the TVDs for the circuits with and without gate degradation. The case where the cnot gate is degrading is clearly identified by an increasing TVD for later layers, thus showing the proposed technique's sensitivity to gate degradation during a circuit.
Finally, we comment on the correlation between η (i) and η (i,ideal) , where the latter is shown in Figs. 2(c) and 2(f). These quantities should be correlated, especially in the weak-noise regime, and indeed the plots clearly show that η (i) ≈ 2η (i,ideal) for most i, for both example circuits. The notable deviation from this behavior is in layers 1, 8, and 9 of the QAOA circuit, where η (i,ideal) is greater than η (i) (for both circuits with optimized and random angles). This mismatch in behavior is also reflected in measures of correlation between η (i) and η (i,ideal) : this correlation, for the QAOA circuit with optimized (random) angles is 0.91 (0.93), while for the QFT circuit with (without) depolarizing channels the correlation is 0.997 (0.99). To understand this mismatch, and thus the lower degree of correlation, note that layers 1, 8, and 9 of the QAOA circuit are composed entirely of √ X gates. As discussed in Appendix B, the √ X gates have coherent noise that cancels when composed with its inverse. As a result, the layer inversion process does not amplify the noise in these layers as effectively as it does other layers. In contrast, the cnot gates are self-inverse and composing them does not result in any error cancellation with this error model. Below, we discuss a refinement of the basic layer inversion procedure presented above that mitigates such error cancellation and allows one to experimentally see the impact of noise in layers 1,8 and 9 on the circuit output. Meanwhile, note that since the QFT circuit has no layers composed entirely of √ X gates, and fewer √ X gates in general, such a mismatch between η (i) and η (i,ideal) is not seen in Fig. 2 where the subscripts indicate the respective graph edge. We consider two sets of Z-rotation angles (one is chosen randomly and used to find the second set via numerical optimization) to exemplify how changes in the angles of Z rotations can influence which circuit layers affect the output distribution the most. The circuit is divided into nine layers. The values for the Z-rotation angles are given in Table 1. (b) TVD, Eq. (3), between the original and i-inverted circuit output distribution, obtained with the set of optimized (in violet) and random (in light orange) rotation angles. The layers with the largest TVD are those affecting the output distribution the most. Note that for the set of optimized angles the layer that is most influential is the second layer L2, while for the set of random angles the fourth layer L4 is the one affecting the output the most. (c) TVD (Eq. (4)) between the original and i-ideal circuit output distributions. The Pearson correlation coefficients (hereinafter correlations) between the TVDs, η (i) and η (i,ideal) , for the sets of optimized (in light purple) and random (in yellow) angles are 0.91 and 0.93, respectively. (d) 4-qubit QFT circuit used to show the layer inversion technique sensitivity to gate degradation during the circuit. We demonstrate this by adding depolarization channels to certain gates in the circuit. The circuit is divided into 15 layers. The first layer corresponds to the tensor product of Hadamard gates acting on the initial |0 ⊗4 state to prepare an input state that gives a circuit output distribution that is peaked/concentrated in the computational basis, since the ideal circuit has a definite outcome. The circuit also takes into account the idle gates present in each layer. The red shaded cnot gates include two-qubit depolarizing channels with increasing depolarizing probabilities; the larger the probability the darker the shade. (e) TVD, Eq. (3), between the original and i-inverted circuit output distributions without (in blue) and with (in maroon) gate degradation. (f) TVD (Eq. (4)) between the original and i-ideal circuit output distributions. The correlations between the TVDs, η (i) and η (i,ideal) , for the circuits with (in red) and without (in cerulean) gate degradation are 0.997 and 0.99, respectively.

Technique refinements: multiple layer inversion and averaging over random Pauli layers
The first refinement is the simple observation that one can perform multiple inversions of a layer and construct circuits of the form where m is the number of times the layer is being inverted. The effect of doing this is to amplify the perturbation from the target layer on the circuit output, which, as we demonstrate below, is useful for computing the sensitivity of the circuit output to the noise in a layer when it is small, or when there are sources of confounding noise like temporal instability of gate parameters on the timescale of the execution of a circuit debugging experiment. That is, consider that the total circuit debugging experiment consists of executing d + 1 circuits, each N shots times. Here, d is the number of layers in the original circuit being debugged and N shots is the number of measurements of each circuit. If gate parameters drift during this experiment time, then the observed difference between estimated p(k|C ) and p(k|C (i) ) could be due to this parameter drift rather than the amplified noise in layer i. In Appendix E we discuss the techniques developed in Ref. [33] for detecting the presence of such temporal instability and post-selecting data to avoid confounding by drift. In addition, performing multiple layer inversions, like described in Eq. (5) is a way of magnifying the effect of noise in layer i, which can be useful to increase the signal in the presence of small amounts of drift.
To motivate the second refinement, note that the degree of noise amplification produced by the local inversion technique depends, in part, on the type of noise present in the target layer (see Eq. (15)). For purely stochastic noise, the layer inversion readily amplifies the noise. Coherent noise, on the other hand, can lead to damping or cancellation [34], e.g., when a gate overrotates and its inverse under-rotates in the same amount they can cancel each other, leaving our technique insensitive to errors that do impact the circuit output distribution. To prevent this scenario, we refine our method by introducing a layer of random single-qubit Pauli gates before and after the inverse of each layer (see Fig. 3(a)) in such a way that, in the absence of noise, the circuit output is unaltered and, in the presence of coherent noise, the average over many such Pauli "twirls" removes any noise self-cancellation [35]. Each time the circuit is executed, a new set of random Pauli gates are chosen such that the circuit takes the form where m is the number of layer inversions, P is a layer of random single-qubit Pauli gates (chosen independently for each execution of the circuit), andP is another layer of single-qubit Pauli gates satisfying If local Pauli gates do not have negligible noise and are likely to introduce significant additional noise into the circuit, then it is preferable to minimize the number of random Paulis in between layer inversions. To achieve this, one can omit adding random single-qubit Pauli gates on qubits that are operated on by gates that have low chance of coherent error in L i , e.g., qubits that are idle or have a Z(θ) gate applied to them in layer i. We do this in the simulations and experiments described below.
To demonstrate this second refinement of the circuit debugging technique, we return to the QAOA circuit without random Pauli with noisy random Pauli with noiseless random Pauli (Pauli twirls) in an iinverse circuit C (i) makes the TVD, between the output probability distribution of the original circuit C and the average output distribution of C (i) over many Pauli twirls, sensitive to all coherent and incoherent noises. (b) TVDs between the original and i-inverted 4-qubit QAOA circuit with optimized angles (Fig. 2(a)) (i) without random Pauli gates (in blue), (ii) with noisy random Pauli gates inserted during inversion (in green), and (iii) with noiseless random Pauli gates inserted during inversion (in orange). In (i) the basic local inversion technique fails to notice that there are noisy gates in the first and last two layers. With the addition of random Pauli gates during inversion (cases (ii) and (iii)), our technique becomes sensitive again to the noise in the first and last two layers.
with optimized angles presented in Figs. 2(a-c), where we find a mismatch between η (i) and η (i,ideal) for layers composed entirely of √ X gates, and now include the random Pauli layers in the local inversions. We compare three cases: circuit debugging (i) without random Pauli gates (same as in Fig. 2), (ii) with noisy random Pauli gates (each Pauli is constructed using the noisy √ X gate and ideal Z(θ) rotations), and (iii) with noiseless (perfect) random Pauli gates. For cases (ii) and (iii), we calculate the output probability distribution of 100 instances of the circuit, each with a random set of Pauli gates, and then take the average. Figure 3(b) clearly shows that, with random Pauli gates, either noisy or perfect, the local inversion technique correctly identifies the effect of the layers with coherent noise on the circuit output (layers 1,8 and 9). In fact, with the use of perfect random Pauli gates the correlation between η (i) and η (i,ideal) (shown in Fig. 2(c)) increases from 0.91 to 0.99. The correlation when noisy random Pauli gates are used (0.84) is lower because the Pauli gates themselves introduce new noise that is not present in the original circuit. Therefore, we conclude that if singlequbit Pauli gates can be implemented with low noise, it is advantageous to perform the random Pauli twirling during layer inversions.
Finally, we note that there is an alternative version of the first refinement that involves the insertion of m repetitions of the target layer L, without any inversion, such that L m is equivalent to the identity in the absence of noise. This approach has the advantage that it removes the problem of error cancellation. The only limitation of this approach is that a target layer containing different types of gates will not necessarily be equal to the identity after m repetitions. Of course, one can compile a circuit into a form where only one type of gate is applied in each layer, but this will increase the depth of circuits, which is usually not preferred, especially for near-term QIPs where it is desired to minimize circuit depth.

Experimental demonstration of circuit debugging
The experimental results presented in this section were obtained using the IBM QIP ibmq jakarta. This QIP has the qubit layout and connectivity shown in Fig. 4(d). Using qubits 1, 3, and 5 on ibmq jakarta we implemented a 3-qubit QFT circuit, and we used qubits 1, 3, 5, and 6 to implement a 4-qubit QAOA for Max-Cut circuit using two different sets of angles (see Table 1). The gate errors and gate times for IBM ibmq jakarta at the time of the QFT and QAOA experiments are presented in Figs. 4(d) and 4(h), respectively. In Appendix C we detail the steps involved in these experiments.
Similarly to the QFT circuit in Fig. 2(a), the first layer of the 3-qubit QFT circuit in Fig. 4(a) is a tensor product of Hadamard gates to prepare an input state that gives a circuit output distribution that is peaked/concentrated in the computational basis, since the ideal circuit has a definite outcome [36]. For this circuit, we obtained experimental results using single and triple local inversion (m = 1 and m = 3, respectively) for circuits without (Fig. 4(b)) and with random Pauli gates (Fig. 4(c)). From the gate errors and gate times shown in the table in Fig. 4(d), one would expect a layer like L 11 , which has a √ X and the noisiest cnot in the circuit, to be among the layers with the largest TVD. Instead, in all cases, η (11) is dominated by η (4) , η (6) , η (9) , η (16) , which correspond to layers with a single cnot between qubits 3 and 5, which is not the noisiest nor the longest gate according to the table in Fig. 4(d). In fact, for all cases, except the m = 1, without random Paulis experiment (light purple in Fig. 4(b)), η (11) is also dominated by η (8) , η (10) , η (15) , η (17) , which also correspond to layers with a single cnot between qubits 3 and 5. The increased sensitivity of the circuit output to this gate could be due to the circuit structure, or because the actual error rates for the cnot gates during these experiments were different from the reported values after calibration (e.g., due to drift). This is an example of the useful information our circuit debugging technique provides, which cannot be inferred from low-level characterization information about each of the gates present in the circuit. In Figs. 4(b) and 4(c) we also see the effect of multiple local inversions: the TVDs from the triple local inversion experiments are considerably larger than those from the single local inversion one. This large difference in magnitude between the TVDs from triple and single local inversions is most likely due to nonnegligible higher-order error terms of the error map (see Eq. 11). Note also that, even though the triple inversion enlarges the TVD in each layer, the TVDs for layers with only Z-rotations are not enlarged. This is expected since the layers with only Z(θ) gates are not actually being locally inverted in the QIP (see Appendix C), and any TVD change is likely due to temporal instability of the QIP. In Appendix D, we discuss the effect of multiple local inversions using another distance measure and some of the advantages and disadvantages it presents. Lastly, Figure 4(c) shows the TVDs for a single-layer inversion and triple-layer inversion with random Pauli gates, of which many are twice as large as those without random Pauli gates shown in Fig. 4(b). This large difference in magnitude seems to be a combination of coherent noise and new noise introduced by the random Pauli gates.  Table 1). We show TVDs between the original and the i-inverted circuits without and with (Figs. 4(f) and 4(g), respectively) random Pauli layers inserted during the inversion. We show here only the results for triple layer inversion (m = 3) for conciseness. The first observation is that depending on the QAOA angles used (optimized or random), the dominant layers can be different. This is similar to what was observed in the numerical simulation shown in Fig. 2(b), where the output of two circuits with the    Fig. 4(h), the rest of the dominant layers have cnot gates between qubits 3 and 5, which are the least noisy. Thus, although noise in the highest error gate affects the output distribution, other gates also have significant influence. Note as well that, with the exception of layer L 6 , none of the layers with two cnot gates (the two noisiest gates according to the table in Fig. 4(h)) are among the layers with largest TVD. This is another example of the practical utility of the local inversion technique, it provides circuit-specific information about the impact of noisy gates on the output distribution that cannot be reliably obtained from low-level characterization of individual gates.

Discussion
We have introduced a new method for debugging quantum circuit implementations based on local inversion of individual layers of a circuit. Through analysis, simulation, and experiment, we have demonstrated that this technique enables the identification of circuit layers that affect the circuit output the most and detects gate degradation and temporal instability. Moreover, we have shown that the impact of a given layer's error depends not only on the error rates of the gates that form the layer but also on the structure of the entire circuit. As such, the technique can be used to determine the source of errors in the output of quantum algorithmic circuits.
We note that our technique can extract error behavior that is inefficient or impossible to derive from the characterization of individual gates, which is the dominant mode of quantum computer characterization at present. For example, the technique captures emergent error modes such as temporal variation of gates or dependence on gate behavior on neighboring gates. Such emergent error modes can be pathological, but if they are relevant to the algorithmic circuit that is being implemented, our technique will detect them. Moreover, our technique has the advantage of not needing classical simulation of circuits in order to extract these emergent error modes.
The resources required for circuit debugging based on layer inversions generally scale favorably with circuit width (n, number of qubits) and depth (d). In addition to the execution of the original circuit to be debugged, C , the technique requires the execution of d modified circuits, The number of executions of each circuit (measurement shots) is the only quantity with slightly complex scaling. Consider that underlying the debugging technique is the task of learning two distributions -p(·|C ) and p(·|C (i) ) -and computing the distance between them. The sample complexity of learning an arbitrary discrete distribution over 2 n elements to L 1 error is Θ(2 n / 2 ), and the worst-case distributions to learn are typically those with high entropy. Therefore, the number of circuit executions required for our circuit debugging protocol scales exponentially in the worst case, which physically will correspond to cases where either (i) the algorith-mic circuit C produces a close-to-uniform distribution over outcomes or (ii) the circuit implementations are so noisy that the output is uniformly distributed across the outcomes (e.g., a completely depolarized output state). We argue that in most cases of interest, the output distribution will have a low entropy because (i) quantum algorithms typically produce low-entropy probability distributions over the measurement bit strings, and (ii) any algorithmic circuit for which one would perform debugging would be in the low-noise regime, where the output is not completely depolarized. In such cases, the sample complexity of distribution learning is much more favorable. In fact, for a distribution over 2 n elements with k modes (loosely, peaks or valleys), learning to L 1 error requires poly(k,log(n), 1/ ) samples [37]. Moreover, there is also the alternative of combining the local inversion with mirror circuits, which can be used to obtain low-entropy output probability distributions [35]. Therefore, in cases of practical application, none of the resources required for circuit debugging via local inversions scale super-polynomially with n or d.
We note that our method could be combined with error-mitigation techniques like the zero-noise extrapolation [38,39,25] and probabilistic error cancellation [38,40]. For zero-noise extrapolation, local inversion could be used to identify the layer/gate that perturbs the output distribution the most and use that layer/gate as a starting point for noise amplification and extrapolation. For probabilistic error cancellation, the main limitation is the sampling overhead, which depends on the number of qubits as well as the number of noisy layers. By focusing only on the layers affecting the outcome the most, our technique could potentially lead to lower sampling overheads without undermining the error mitigation. Finally, our technique can be used to assess the relative merits of different compilations of a quantum algorithm -i.e., hardware-and noise-aware compilations attempt to reduce the effect of hardware noise or limited connectivity on quantum algorithm accuracy and our debugging technique can be used to assess their effectiveness.
The deployment of the circuit debugging tool we have developed to variational quantum circuits bears commenting upon. As we have shown, the most influential layer for a circuit depends on the values of the circuit parameters. That influential layer can, therefore, change depending on where one is in the variational parameter landscape during an optimization. Thus, the debugging protocol should be applied at each parameter value encountered during the optimization. We note that this is consistent with how error mitigation is typically performed for variational circuits -the variational cost function is error mitigated at each value of the variational parameters. Therefore, as mentioned above, one could use our protocol at each step in the parameter optimization to identify the layers or gates to amplify during error mitigation.
Finally, we note that Patel et al. [41] have also recently developed a similar protocol for "identifying the most critical operations in quantum circuits" through local inversions. We note that several elements distinguish our work from Ref. [41]: (i) we present theoretical analysis to justify the local inversion technique and present an operationally meaningful interpretation of the experimentally measured quantity, (ii) we develop the Pauli twirling refinement to mitigate error cancellation effects that might lead to erroneous conclusions, and (iii) we incorporate drift detection into the debugging protocol in order to minimize conflation by temporal variation of gates.
We conclude by noting that as QIPs mature and naturally attempt to execute increasingly complex quantum algorithms, lightweight, in-situ diagnostics and debugging tools such as the one we introduced in this work will be essential. A Analysis of local inversions in the lowerror regime It is instructive to examine the behavior of η (i) (Eq. (3)) perturbatively, in the low-error regime. In order to do so, we find it convenient to use the vectorized form of ρ, i.e., |ρ := vec(ρ) = (ρ 11 , ρ 21 , . . . , ρ n1 , ρ 12 , . . . , ρ nn ) T , then vec(U ρU † ) = (Ū ⊗ U )vec(ρ) = U |ρ [42], where U =Ū ⊗ U andŪ is the complex conjugate of U . Note that, when noise is present, each layer L actually implements (E • U) |ρ , where E is a completely positive trace preserving error map. Then, the probability distribution for a depth-d circuit C , for k ∈ B n , is given by where ρ 0 is the initial state of all n qubits and k| is the vectorized (non-ideal) POVM element that corresponds to measuring bit string k. We can pull all the error maps to the left, obtaining where C 0 = U d · · · U 1 is the error-free circuit and We assume that the error map E i in each layer is always close to the identity process 1. The error map in the i-th layer can be written as [43] where E i is the error map's error generator. Given that the error map is close to the identity, we can expand it as follows: where δ ≡ ||E i || and d is the circuit's depth. Then, to first order in δ, Eq. (8) becomes: (12) where . We can follow the same analysis for the probability distribution over outcomes for the circuit C (i) (Eq. (2)) with the i-th layer inverted, and obtain the following Here E (−1) i is the error map for the layer that inverts layer i and E (−1) i is its error generator. The probability distribution over outcomes of the i-inverse circuit C (i) is, therefore, approximately equal to the probability distribution given by the original circuit C plus contributions from error generators corresponding to the pair of extra layers introduced by the local inversion but modified by the unitary operators preceding and following them. Interestingly, the operator (15) in Eq. (14) shows that under certain conditions the local inversion technique becomes insensitive to some types of error. For example, Λ i = 0 whenever E i = E (−1) i and E i anti-commutes with U i . An example would be a Zaxis coherent error on a Pauli X gate. Similarly, and E i commutes with U i . This is true for, e.g., X-rotation gates with over-rotation or underrotation coherent errors, which are exactly canceled by the inversion process. As an error generator can be decomposed into elementary error generators [43], this equation for Λ i tells us that we are sensitive to the rate of errors of certain error generators, and not others. But, as we show in Section 3.1, the insertion and averaging over random Pauli layers recovers the local inversion sensitivity to all types of errors.
We can also write explicit expressions for the TVD in the definition of η (i) under these approximations, This is the sense in which the error rates of a given layer are amplified by the layer inversion and show up in the TVD-based comparisons we compute of output distributions.
Similarly, under these approximations, the i-ideal circuit C (i,ideal) probability distribution to first-order in δ would be wherẽ Combining the previous two equations with the definition of η (i,ideal) (Eq.(4)) we obtain the following expressions: Note that if the error maps for the target layer and its inverse are similar, one can expect The two cases where one can prove ∆ ) and E i commutes (anti-commutes) with U i . Even if this equality does not hold, we expect a significant correlation between the two TVDs in the low-noise regime and/or when the inverse of native gates are similar to the original gates.

B Process matrices for simulated error model
We list the process matrices used in our error model for simulations presented in the main text. These are derived from GST experiments run on the IBM Q Ourense, and were also used in Ref. 32. These process matrices are completely-positive trace-preserving (CPTP) estimated maps of the corresponding operations and are given in the Pauli transfer basis.
Note that to obtain the corresponding CPTP linear maps in the computational basis we use the following equation: [44]: where d = 2 n is the Hilbert space dimension, P j ∈ {I, X, Y, Z} ⊗n are elements of Pauli group, and R is the process matrix in the Pauli transfer basis.
In Table 2, we list error metrics (entanglement infidelity and diamond distance to ideal gate) for these process matrices.
The error metrics indicate that the √ X gate has significant coherent noise. The diamond distance is first-

Gate label
Ent. Infidelity 1/2 diamond distance I 2.8 · 10 −3 1.7 · 10 −2 √ X 8.8 · 10 −4 1.1 · 10 −2 cnot 1.9 · 10 −2 5.0 · 10 −2 order sensitive to coherent errors, while the entanglement fidelity is only second-order sensitive, and both are first-order sensitive to stochastic errors. Therefore, in the absence of coherent errors, we expect these error metrics to be similar, and the two orders of magnitude difference between them for the √ X gate is strong in-dication of coherent noise. Moreover, we can explicitly see how there is noise cancellation when the √ X and √ X −1 = Z(π) √ XZ(−π) process matrices are concatenated by examining the following average gate fidelities: where (g) ideal denotes the ideal version of a gate g, and F (g, E) is the average gate fidelity between g and the channel E, which in term of process matrices in the Pauli transfer basis for each of these operations, [45]. If there are no error cancellations the two fidelities above will be the same, since both channels involve the application of three √ X gates (recall that the Z(θ) rotations are ideal). However, the fact that the fidelity in Eq. (22) is greater than the one in Eq. (23) indicates that there was cancellation of errors present in the √ X channel when it was concatenated with its inverse.

C Experimental procedure
Here we explicitly detail the steps involved in the experiments with the IBM QIP ibmq jakarta: 1. Using the directed acyclic graph representation of the QAOA or QFT circuit, we split it into several layers, where, if possible, we group √ X and cnot gates in the same layers and group the non-Clifford Z(θ) gates in separate layers.

For a circuit with d layers we create d + 1 circuits,
where the first one is the original circuit without layer inversion and each of the remaining d circuits has a different target layer being locally inverted.
3. We create a list composed of r copies of the d + 1 circuits such that r(d + 1) ≤ 300, which is the maximum number of circuits per job for ibmq jakarta.
4. When random Pauli gates are included, these are randomly selected for each of the d i-inverse circuits. We do not apply random Pauli gates to the target layers with only Z(θ) gates since these are virtual and essentially perfect [28].
5. Finally, we send 6 consecutive jobs (delay between executed jobs is just a few seconds because we had sole access to the QIP), each with 290 (286) QFT (QAOA) circuits and 1024 shots per circuit. Therefore, for each QFT (QAOA) circuit we have 61,440 (67,584) bit strings that are used to estimate their respective output probability distributions.
We execute 60 (66) copies of each QFT (QAOA) circuit to collect a large number of samples for each circuit's output. This could be directly achieved by increasing the number of shots for each measurement from 1024, but we found that there were hardware limitations to doing this. The execution of several copies makes the data more affected by temporal instability of the QIP since the experiment is executed over a longer period of time. However, we can post-process the data to detect for significant temporal instability and remove drift-affected data if they are present. This procedure is detailed in Appendix E. We found only a minor impact of drift in the experiments reported here, and for the data presented in the main text, no data removal was necessary.
In order to be consistent in the use of the proposed technique, we apply the local inversion to each of the circuit layers, including those with only Z(θ) gates. However, since the Z-rotations in IBM QIPs are virtual gates [28], the target layers, L i , with only Z(θ) gates and their respective extra layers ([L −1 i L i ] m ) are automatically combined before the circuit is executed in the QIPs. This is done even if the optimization_level in the Qiskit function execute is set to zero to avoid any circuit optimization before its execution. Consequently, every time a target layer has only Z-rotations, the corresponding circuit that is executed is simply the original circuit without any layer inversion. We then end up comparing the probability distribution of the same circuit output distribution at different times, one obtained at the beginning of the experiment and the other obtained some time later. Interestingly, this gives us a qualitative way to detect parameter drift in the system between different times of the experiment.

D Fisher information based analysis
In the main text we used TVD as a distance measure between the original and perturbed output distribution. We also mentioned that other choices are possible. Here we introduce and discuss another choice for the distance measure.
For a parameterized probability distribution p(x|ϑ) over finite number of outcomes k, the Fisher information (FI) is [46] which is the variance of ∂ ∂ϑ ln p(k|ϑ) (known as the score function). Generalizing to the multiparameter case, the Fisher information matrix (FIM) for a probability distribution over sample space K and parameterized by m With random Pauli gates parameters, p (k|ϑ 1 , . . . ϑ m ) with k ∈ K, has elements: The FI measures the average sensitivity of the probability distribution, p, to variations in its parameters [46]. In circuit debugging we aim to quantify the sensi-tivity of the circuit output to a particular layer in the circuit, and therefore the FI seems like it would be a good metric to use for this purpose. However, a quantum circuit's output distribution is not continuous parameterized by its layers, and therefore we introduce a discrete approximation to the FIM, by approximating the derivatives in the definition above with finite differences. We define the quasi-Fisher information matrix (qFIM) with elements: (26) Here, p k|C (i) is the probability distribution that corresponds to the circuit with the i-th layer inverted (C (i) ), and p(k|C ) is the output distribution for the original circuit without any inverted layers. For a dlayer circuit the qFIM is, therefore, a d × d real symmetric matrix whose rows and columns correspond to each inverted layer. In the low-error regime, the perturbed probability distributions, p(k|C (i) ), are close to p(k|C ) and therefore this approximation of the derivative is expected to be good. However, if the QIP deviates from the low-error regime, then the qFIM does not have a good interpretation. In current generation QIPs, there is no guarantee of low error, especially due to non-Markovian error modes like parameter drift, which is why in the main text we have used the TVD as the measure of distance between distributions -TVD does not have an interpretation in terms of sensitivity analysis, but it is more robust.
Insight into the layers that influence the output distribution the most is attained by examining qFIM's eigenvalues and eigenvectors, where the largest element (in absolute value) of the eigenvector that corresponds to the largest eigenvalue identifies the most dominant layer in the circuit [47]. As an example, for the 3-qubit QFT circuit in Fig. 5(a), we execute experiments using single and triple local inversion (m = 1 and m = 3, respectively) for circuits without and with random Pauli gates (Figs. 5(b) and 5(c),respectively). The qFIM eigenspectra for the two cases (insets in Figs. 5(b) and 5(c)) show single dominant eigenvalues, and thus all the useful information is mostly found in the components of the eigenvectors of the dominant eigenvalue. The absolute value of the components of the eigenvectors are represented by the bar charts in Fig. 5(b), for the case without random Pauli gates, and in Fig. 5(c) for the case with random Pauli gates. Figure 5 also shows that, in contrast to the TVDs, the multiple local inversion has a damping effect on the layers with Z-rotations only. This is a consequence of the noise amplification of each i-inverted layer along with the fact that the layers with Z(θ) gates are not being inverted, and the fact that we are plotting normalized eigenvectors. Therefore, the qFIM is an alternative metric to the TVD for analyzing the effects of layer inversions for circuit debugging, especially if the QIP under test is guaranteed to be in the low-error regime.
Given that the ICTs indicate which circuits vary but are often less sensitive than the aggregate LLR, we can take advantage of these two test statistics by implementing both tests with significance levels adjusted appropriately. To do so, we implement the following twostep strategy proposed in Ref. 33. For a global significance α, (i), implement the aggregate LLR test at a significance level α/2. If context dependence is detected, set β = α; otherwise, set β = α/2. (ii), implement the ICTs using a Hochberg correction at a significance of β. If there are more than two contexts in the data (S all > 2), we can also implement a pairwise comparison between different pairs of contexts, where we implement the above procedure more than once with S = 2, as well as a joint comparison of all contexts with S = S all . And so, in order to maintain a global significance of α, we can split α over each implementation of the above procedure (this is known as generalized Bonferroni correction [49]). Now, in the main text we presented 8 experiments: 1. Four 3-qubit QFT experiments.
(a) QFT circuits with single local inversion without random Pauli gates.
(b) QFT circuits with triple local inversion without random Pauli gates.
(c) QFT circuits with single local inversion with random Pauli gates.
(d) QFT circuits with triple local inversion with random Pauli gates.
(a) QAOA circuits with the set of optimized angles and triple local inversion, without random Pauli gates.
(b) QAOA circuits with the set of random angles and triple local inversion, without random Pauli gates.
(c) QAOA circuits with the set of optimized angles and triple local inversion, with random Pauli gates.
(d) QAOA circuits with the set of random angles and triple local inversion, with random Pauli gates.
Each experiment data is divided in 6 contexts (data sets). We want to test for drift in each experiment in two ways: jointly comparing the six contexts and, also, comparing between different pairs of contexts. This results in 13 comparisons between contexts in total. To maintain a global significance of 5%, therefore, we perform each comparison between contexts at a significance of (5/13)% ≈ 0.38%. We implement the aforementioned two-step strategy, where, accordingly, the aggregate LLR test is performed at (5/26)% ≈ 0.19% significance, and the ICT tests are then performed using the Hochberg correction.  The pairwise comparison between contexts for each experiment gives a more detailed picture of the presence of drift in the circuits. Figure 6 shows an example of the pairwise context comparison results obtained for the QAOA with optimized angles and without random Pauli gates. The upper triangle (in red) of the plot shows the N σ for each pairwise comparison; the threshold for drift detection for each pair is N σ,threshold = 3.07. The lower triangle (in blue) of the plot shows the number of circuits that are found to have statistical significant drift for each pairwise comparison. Drift is not detected between a pair of contexts if the number of circuits found to have statistical significant drift is zero and N σ < N σ,threshold ; drift is detected otherwise. From the results in Fig. 6, we can see that drift is not detected only between the data set pairs {4, 5} and {5, 6}, the former having the smallest standard deviation. However, when we compare the TVD obtained from averaging all six data sets with the TVD obtained from averaging only the data sets 4 and 5 (with no detectable drift in between), we find a correlation > 0.99 between them and, as shown in Fig. 7(b), differences larger than the sum of their respective TVD standard deviations are found exclusively in the layers with only Z rotations. The large differences in these Z-layers is not surprising because we already mentioned in Appendix C that these layers give us a qualitative way to detect parameter drift in the system between different times of the experiment (contexts) and indeed statistical significant drift has been formally detected with the tools presented in this appendix. What is more relevant for the debugging purpose of our technique are the layers with no Z rotations. The TVDs of those non-Z layers in Fig. 7(b) do not show differences larger than the sum of their respective standard deviations, and thus the results obtained using all six data sets are not significantly different from the results that we would obtain if we decide to keep only the data sets 4 and 5. In other words, the TVD results obtained using the average of all six data sets are not appreciably affected by drift. Interestingly, we get to the same conclusion when we apply the same statistical analysis to the other seven experiments. Because of this, and the fact that, in general, statistical variation decreases with the size of the data set, we decided to keep the six data sets in the results presented in the main text.
Finally, note that for each QAOA (QFT) experiment, each job is composed of 11 (10) repetitions per circuit, each with 1024 shots, and thus we can associate each measurement with a different time-stamp or context. In fact, we applied the same statistical analysis to the data sets within each job and found that statistical significant drift errors were much more scarce than those from the comparison between jobs (therefore, the temporal instability on this device is more significant at timescales longer than a single job), and their impact on the averaged results was also found to be negligible.