Estimating Coherent Contributions to the Error Profile Using Cycle Error Reconstruction

Mitigation and calibration schemes are central to maximizing the computational reach of today’s Noisy Intermediate-Scale Quantum (NISQ) hardware, but these schemes are often specialized to address exclusively either coherent or decoherent error sources. Quantifying the two types of errors is therefore a desirable feature when benchmarking error suppression tools. In this paper, we present a scalable and cycle-centric methodology for obtaining a detailed estimate of the coherent contribution to the error profile of a hard computing cycle. The protocol that we suggest is based on Cycle Error Reconstruction (CER), also known as K-body Noise Reconstruction (KNR). This protocol is similar to Cycle Benchmarking (CB) in that it provides a cycle-centric diagnostic based on Pauli fidelity estimation [1]. We introduce an additional hyper-parameter in CER by allowing the hard cycles to be folded multiple times before being subject to Pauli twirling. Performing CER for different values of this added hyper-parameter allows estimating the coherent error contributions through a generalization of the fidelity decay formula. We confirm the accuracy of our method through numerical simulations on a quantum simulator, and perform proof-of-concept experiments on three IBM chips, namely ibmq_guadalupe, ibmq_manila, and ibmq_montreal. In all three experiments, we measure substantial coherent errors biased in Z.


Introduction
Today's engineering advances in quantum computing hardware architectures allow vendors to produce processors with a considerable number of qubits. Despite these impressive advances, the current technology does not yet enable large-scale fault-tolerant quantum computing (FTQC) to be implemented. As a result, computations today and into the intermediate future will be done on Noisy Intermediate-Scale Quantum (NISQ) [2] platforms. Noise on these devices can result in both decoherent and coherent errors. Suppressing noise on these hardware platforms and extending the coherence times will rely on improvements in error mitigation protocols and error suppression methods.
Coherent errors are the result of unitary processes occurring in the closed system formed by the qubits' Hilbert space. They arise from several sources such as miscalibrated control parameters, external fields, and crosstalk (e.g. undesired internal fields). They are usually suppressed through advanced calibration and pulse compensation methods [31][32][33][34]. Alternatively, coherent errors can be effectively transformed into decoherent errors via Randomized Compiling (RC) [35,36], and then mitigated with tools aimed at decoherent errors [30].
Because the two types of errors have different natures, there are specific error suppression tools aimed at suppressing only coherent errors, while others impact decoherent errors either exclusively, or more efficiently. Given the fundamentally different nature of coherent and decoherent errors and the likelihood that FTQC will not be available in the intermediate future, it will be important to develop fast and accurate methods to independently measure their impact on quantum computing outcomes. Developing such capabilities can provide an essential means to test and compare the effect of different error suppression and calibration tools.
There exist many different frameworks and protocols for characterizing quantum computing processing errors. One approach focuses on characterizing specific infinitesimal generators of motion from a Hamiltonian or Lindbladian viewpoint. While crucial, such approaches don't immediately connect with the diagnostics of integrated evolutions such as quantum gates. Let's define a quantum instruction as a set of one or a few qudit labels, together with a linear operation to apply over these. The operation can be a gate, some measurement operation, or some state preparation. A cycle can be defined as a set of instructions targeting disjoint sets of qubits, together with a schedule. For example, a cycle could consist of a round of simultaneous CNOT gates over the processor. Due to various physical effects such as crosstalk, parallel instructions often substantially influence each other. This observation means that characterizing instructions in isolation can lead to misguided diagnostics. Indeed, a cycle error channel depends on a non-obvious function of every instruction that is being applied, as well as on the precise relative timing of those instructions.
As such, to better contextualize error profiles, we focus on a cycle-centric approach to process characterization. Given the considerable number of parallelizable instructions as well as the complexity of their joint error profile, a cycle-centric approach is particularly relevant in the NISQ era and beyond. A well-known cycle-centric error diagnostic scheme is Gate Set Tomography (GST) [37][38][39][40][41][42]. GST provides detailed information about both coherent and decoherent errors, but the diagnostic information addresses non-randomized cycles.
In the current work, we instead focus on characterizing effective dressed cycles tied to randomly compiled circuits. The idea is to provide detailed diagnostic information that closely connects with the effective performance of circuits run under RC. One convenient aspect of effective dressed cycles is that they can be characterized via Pauli infidelity estimation methods [43,44]. In particular, the Cycle Benchmarking (CB) [1] protocol features a valuable framework to characterize effective dressed cycles with great precision. As such, more advanced error diagnostic tools, such as Cycle Error Reconstruction (CER) [34,45], repurposed the structure of CB circuits to gather detailed error profiles attached to effective dressed cycles. Note that CER is also known as k-body Noise Reconstruction (KNR) [46].
However, neither CB nor CER was initially designed to provide a budget for coherent and decoherent contributions to the measured error rates. As such, we introduce a new CB-structured method for separately measuring both the decoherent and main coherent cycle errors based on CER data. To distinguish between decoherent and coherent errors, we rely on their different propagation principles (linear vs quadratic). This technique can be as fine as desired, depending on the number of marginal error probabilities that are included. One important purpose of making the distinction between coherent and decoherent errors (other than understanding the error budget in a more complete manner) is that this can lead to calibration strategies such as knowledge-based dynamical decoupling or compiled compensations of coherent errors into future cycles.
We acknowledge that both GST and our extended CER do provide a measurement of the noise. However, CER considers effective dressed cycles, which are slightly different objects than the deterministic cycles characterized in the GST method. Of course, one could deduce the error maps of effective dressed cycles by combining various GST-obtained error profiles, but the advantage of the extended CER is that it leverages the tailored properties of the noise under RC to accelerate the characterization procedure.
In this paper we lay out our work as follows. In Section 2, we summarize the main principles describing the propagation of coherent and decoherent errors in repeated error channels. In Section 3, we outline the background material necessary to understand a CB-structured error characterization scheme known as CER [34,45], and we introduce our protocol extension to CER. We finish the section with a generalized decay model that connects the output data of the scheme to the effective coherent contributions to the error profile of the hard cycle. In Section 4, we further expand on the fitting model related to our protocol. This section also provides numerical evidence that the error profile obtained from our suggested experiment matches the underlying error model. In Section 5, we apply our extension to the CER protocol on three separate IBM quantum computing hardware platforms and present the central results of this research. As a proof of concept, we measure the effective coherent and decoherent error rates marginalized on an ancilla during an entangling cycle. Fidelities for various sequence lengths are calculated from the circuit runs and compared to the values predicted by a proposed polynomial global fitting function. We show through our implementations that the coherent error manifests primarily as a Pauli Z error on IBM platforms due to ZZ static coupling in transmon qubits. In Section 6, we summarize our results and expand on their relevance regarding the benchmarking and designing of error suppression tools for gate-based quantum computing architectures. In addition, there are several supplemental sections that expand the discussion in the main sections of the paper.

Theory - Coherent and decoherent error propagation
Coherent and decoherent quantum errors in a quantum computing circuit propagate in intrinsically different ways. In this section, we explicitly demonstrate the error propagation formula for the simple case of an $x$-fold error channel $\mathcal{E}^x$.
Recent work has also analyzed error propagation in quantum computers [47]. That work showed that the decoherent error increases linearly with circuit depth but plateaus after some threshold circuit depth. Our work takes such an analysis into the coherent error domain, and shows, both theoretically and through implementations on IBM quantum processors, that repeated coherent errors propagate quadratically to leading order.
A quantum state can be described by a density matrix $\rho$, which is a positive semidefinite, unit-trace matrix. This allows representing pure states (if and only if the rank of $\rho$ is 1) and probabilistic mixtures thereof. The evolution of a quantum mechanical state $\rho$ is often described by the von Neumann equation,

$$\frac{d\rho}{dt} = -i[H, \rho], \tag{1}$$

where $H$ is a Hermitian matrix referred to as the Hamiltonian. However, the von Neumann equation only applies to the deterministic evolution of a closed system. For open quantum systems, a more general evolution formula is given by the Lindblad master equation [48]:

$$\frac{d\rho}{dt} = \Lambda[\rho] = -i[H, \rho] + \sum_i \left( L_i \rho L_i^\dagger - \frac{1}{2} \left\{ L_i^\dagger L_i, \rho \right\} \right), \tag{2}$$

where $L_i$ are traceless matrices referred to as Lindblad operators or Lindblad jumps, and where the linear operator $\Lambda$ is referred to as the Lindbladian. A Lindbladian is one of the general forms of Markovian and time-homogeneous master equations describing the general non-unitary evolution of the density matrix $\rho$.
An error channel $\mathcal{E}$ can be represented as a linear operator acting on density matrices. In this work, we consider error channels $\mathcal{E}$ which are close to the identity and that are the result of a finite-time evolution, over a duration $\Delta t$, of a time-independent Lindbladian. For what follows, we pick $\Delta t = 1$ without loss of generality, as it simply fixes the scales of the matrices $H$ and $\{L_i\}_i$ appearing in the Lindbladian. We emphasize that although the $d \times d$ matrices ($d = 2^n$ where $n$ is the number of qubits) $H$ and $L_j$ appearing in Eq. (2) do not depend on time, we don't have to assume that the error channel $\mathcal{E}$ is the result of a time-independent error process. The error process itself could involve time-dependent features, but could be equivalent to an integral over a time-independent evolution. Indeed, if the error process is weak (which implies that the generators of motion can be time-averaged), then after the time-dependent equation is integrated, the solution can be either exactly expressed or well approximated as the integral of a time-independent process. Starting from Eq. (3), we can construct a column-vectorized picture (i.e. the $d \times d$ density matrix $\rho$ becomes a $d^2 \times 1$ vector $\mathrm{col}(\rho)$ obtained by stacking the columns of $\rho$) and rewrite the equation as Eq. (4), in which a $d^2 \times d^2$ matrix plays the role of the matrix form of the Lindbladian. The solution of Eq. (4) is given by Eq. (6).
Therefore our error channel $\mathcal{E}$ can be expressed as $\mathcal{E} = e^{\Lambda}$. In this context we are viewing $\Lambda$ as the generator of the process matrix $\mathcal{E}$. From this perspective, if all of the terms in $\Lambda$ can be calculated, then effectively one knows the error process matrix $\mathcal{E}$.
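For reference, one standard way to write the column-vectorized picture is sketched below. It assumes the column-stacking identity $\mathrm{col}(AXB) = (B^T \otimes A)\,\mathrm{col}(X)$ and the usual Lindblad form $\dot{\rho} = -i[H,\rho] + \sum_j (L_j \rho L_j^\dagger - \tfrac{1}{2}\{L_j^\dagger L_j, \rho\})$. The hat notation $\hat{\Lambda}$ for the matrix form is introduced here for illustration only and may differ from the convention used in Eqs. (4)-(6).

```latex
\begin{align}
\frac{d}{dt}\,\mathrm{col}(\rho) &= \hat{\Lambda}\,\mathrm{col}(\rho), \\
\hat{\Lambda} &= -i\left(I \otimes H - H^{T} \otimes I\right)
  + \sum_j \left( L_j^{*} \otimes L_j
  - \tfrac{1}{2}\, I \otimes L_j^{\dagger} L_j
  - \tfrac{1}{2}\, \left(L_j^{\dagger} L_j\right)^{T} \otimes I \right), \\
\mathrm{col}(\rho(t)) &= e^{\hat{\Lambda} t}\,\mathrm{col}(\rho(0)).
\end{align}
```

Setting $t = \Delta t = 1$ in the last line recovers the statement that the error channel is the exponential of the Lindbladian.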
To properly discuss the strength of specific terms appearing in the Lindbladian, let's fix an expansion basis and express the effective Hamiltonian $H$ and the Lindblad operators $L_j$ in that basis. In this work, we only consider errors induced by near-local physical interactions. To provide a notion of locality, we have to consider the geometric features of a given device. We will simply consider an interaction graph, where qubits can undergo controlled entangling operations if the vertices are connected. Given a connectivity graph, we constrain Lindblad and Hamiltonian operators to be at most geometrically $k$-local for some small integer $k$, which means that the operators can act on connected subgraphs of at most $k$ qubits (note that the operators can act trivially on some of the qubits, allowing for gaps). The accuracy of the upcoming Eq. (11) degrades exponentially fast in the size of the locality constraint $k$. However, given some array-like topology, if we assume that errors stem from 1- or 2-body interactions in some rotating frame, then an $N$-qubit subsystem ($N \leq n$) should be affected by $O(N)$ terms in the Lindbladian. As a result (see Appendix A.2), the action of the error channel $e^{\Lambda}$ on a limited qubit support can be well approximated through an early-truncated Taylor expansion.
To be more specific, first consider Pauli fidelities, defined for a generic channel $\mathcal{E}$ acting on an $n$-qubit system as

$$f_P(\mathcal{E}) := \frac{1}{2^n} \mathrm{Tr}\left[ P^\dagger \mathcal{E}(P) \right]. \tag{9}$$

We denote the infidelity as $\delta f_P(\mathcal{E}) = 1 - f_P(\mathcal{E})$. Given a Lindbladian $\Lambda$ such that $\mathcal{E} = e^{\Lambda}$, let's further define the coherent and decoherent contributions to the infidelities, $\delta f_P^{\text{coh.}}$ and $\delta f_P^{\text{decoh.}}$ respectively, as in Eq. (10). In this work, we approach the propagation of errors by studying the Pauli fidelities associated with the repeated channel $\mathcal{E}^x = e^{x\Lambda}$, as given in Eq. (11). Notice that both sums above are performed over the Paulis which anti-commute with $P$, and the second sum is further performed over the Lindblad jumps' indices $j$. In Appendix A.2, we provide a proof of Eq. (11), and expand on the quadratic term in $x$.
If we Pauli-twirl the error $\mathcal{E}$, and denote the result as $\mathcal{E}^{\mathrm{RC}}$ to refer to randomized compiling, we can sensibly characterize it in terms of Pauli error probabilities, since the twirled error channel becomes a probabilistic sum over Pauli errors:

$$\mathcal{E}^{\mathrm{RC}}[\rho] = \sum_{P \in \mathbb{P}_n} e_P \, P \rho P^\dagger, \tag{12}$$

where $\mathbb{P}_n$ is the $n$-qubit Pauli basis, and where $\{e_P\}_P$ is a probability distribution over Pauli errors $P$. We can translate between Pauli error probabilities $e_P$ and Pauli fidelities $f_P$ by using the Walsh-Hadamard transform (see [49] as well as Eq. (60)). We can define the coherent and decoherent contributions to the error probability $e_P$, denoted $e_P^{\text{coh.}}$ and $e_P^{\text{decoh.}}$ respectively, through Eq. (13); substituting these in Eq. (10) yields Eq. (14). Just as in Eq. (11), error probabilities corresponding to a Pauli-twirled repeated channel $(\mathcal{E}^x)^{\mathrm{RC}}$ closely follow a quadratic equation:

$$e_P(\mathcal{E}^x) \simeq x^2 \, e_P^{\text{coh.}} + x \, e_P^{\text{decoh.}} \tag{15}$$

We elaborate on Eq. (15) in Appendix A.3. To a good approximation, coherent errors first induce a purely quadratic error growth, as opposed to the purely linear initial growth characteristic of decoherent errors. This difference in propagation speed is the key to differentiating the two error sources.
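As a hedged numerical illustration of the quadratic versus linear growth discussed above, the sketch below folds two single-qubit channels and evaluates their Pauli-X infidelity. The coherent channel is a small Z rotation of angle `theta` and the decoherent one is dephasing with probability `p`; both channels, their parameter values, and all helper names are illustrative assumptions rather than the error model of the experiments.

```python
import cmath
import math

# Pauli X as a 2x2 complex matrix (nested lists keep the sketch dependency-free).
X = [[0j, 1 + 0j], [1 + 0j, 0j]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply_channel(kraus_ops, op):
    """Apply a channel, given by its Kraus operators, to a 2x2 operator."""
    out = [[0j, 0j], [0j, 0j]]
    for k in kraus_ops:
        term = matmul(matmul(k, op), dagger(k))
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j]
    return out

def pauli_x_infidelity(kraus_ops, folds):
    """Return 1 - Tr[X E^folds(X)] / 2 for the folded channel."""
    op = X
    for _ in range(folds):
        op = apply_channel(kraus_ops, op)
    prod = matmul(X, op)
    return 1.0 - ((prod[0][0] + prod[1][1]) / 2).real

theta = 0.02  # assumed coherent over-rotation angle about Z
p = 0.001     # assumed dephasing probability

coherent = [[[cmath.exp(-0.5j * theta), 0j], [0j, cmath.exp(0.5j * theta)]]]
dephasing = [
    [[math.sqrt(1 - p) + 0j, 0j], [0j, math.sqrt(1 - p) + 0j]],  # sqrt(1-p) * I
    [[math.sqrt(p) + 0j, 0j], [0j, -math.sqrt(p) + 0j]],         # sqrt(p) * Z
]

# Doubling the number of folds roughly quadruples the coherent infidelity
# and roughly doubles the decoherent one.
coh_ratio = pauli_x_infidelity(coherent, 2) / pauli_x_infidelity(coherent, 1)
dec_ratio = pauli_x_infidelity(dephasing, 2) / pauli_x_infidelity(dephasing, 1)
print(coh_ratio, dec_ratio)
```

The two ratios separate cleanly only while the folded error remains small; for larger `theta` or more folds, higher-order terms in x start to matter.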

Learning the Error Profile Attached to a Cycle
In this section, we provide the background material necessary to understand a CB-structured error characterization scheme known as Cycle Error Reconstruction (CER), which is also known as k-body Noise Reconstruction (KNR). Once the basics of CER are covered, we shall introduce our extension to the protocol, which complements the standard CER diagnostics with a measure of the coherent contribution to the error profile of the hard cycle.

Cycle Error Reconstruction (CER) Error Diagnostic Protocol
In quantum computing, there are typically hard and easy cycles, which usually consist of parallel entangling gates and parallel single-qubit gates respectively. Hard cycles dictate the circuit performance compared to easy cycles. A dressed cycle refers to a composite cycle comprising a hard cycle followed (or preceded) by an easy cycle.
CER specifically provides error profiles attached to cycles relevant to circuits performed under Randomized Compiling (RC), a well-known error suppression technique [50][51][52][53][54]. To understand RC, consider a noisy application circuit composed of $m$ sequential dressed cycles $\mathcal{C}_i$ interleaved with cycle-dependent error channels $\mathcal{E}_i$:

$$\mathcal{C} = \mathcal{E}_m \mathcal{C}_m \cdots \mathcal{E}_2 \mathcal{C}_2 \, \mathcal{E}_1 \mathcal{C}_1 .$$

RC replaces the above application circuit $\mathcal{C}$, meant to be run $N_{\text{shots}}$ times, with $n_{\text{rand}}$ randomly sampled equivalent circuits (over a structured distribution of circuits), each to be run $N_{\text{shots}}/n_{\text{rand}}$ times. The effect of RC is to effectively replace each noisy dressed cycle $\mathcal{E}_i \mathcal{C}_i$ with what is referred to as an effective dressed cycle $\mathcal{E}_i^{\mathrm{RC}} \mathcal{C}_i$. The difference here is that the generic error channel $\mathcal{E}_i$ is replaced with a Pauli stochastic error channel $\mathcal{E}_i^{\mathrm{RC}}$ of the form

$$\mathcal{E}_i^{\mathrm{RC}}[\rho] = \sum_{P \in \mathbb{P}_n} e_{P,i} \, P \rho P^\dagger ,$$

where $\mathbb{P}_n$ is the $n$-qubit Pauli basis, and where $\{e_{P,i}\}_P$ is a probability distribution over Pauli errors $P$ corresponding to the $i$th cycle. The error distribution naturally depends on the cycle $\mathcal{C}_i$. CER specifically learns information about the distribution $\{e_{P,i}\}_P$. To avoid scalability limitations, CER doesn't usually try to learn every probability value $e_{P,i}$, because there are $4^n$ of them. Instead, it learns marginal error probabilities over constrained unions of instruction labels. That is, if we let $A$ be a union of instruction labels (e.g. $\{1\} \cup \{2, 3\}$), and let $A^c$ be its complement over the architecture (e.g. $\{4, \ldots, n\}$), then the marginal probability of the error $P$ (e.g. XXZ) over $A$ is defined as the sum of the probabilities of all Pauli errors that act as $P$ on $A$, taken over every possible Pauli acting on $A^c$. CER is tomographically complete because learning all marginals is equivalent to learning the probability distribution $\{e_{P,i}\}_P$. However, due to locality constraints in realistic error models, poly($n$) marginals are often sufficient to accurately reconstruct the error profile of a given cycle. For instance, in a planar architecture, if we look at the marginal error probabilities for all pairs of gates in a given cycle that are occurring on two directly connected sets of qubits, we would only need to gather $O(C \cdot n)$ error probabilities, where $C$ denotes the average number of connections per qubit (in a rectangular lattice, $C = 4$). If we want to further collect next-to-nearest neighbor error correlations, we would need $O(C^2 \cdot n)$ error estimates. Keeping the same two examples but for a fully connected architecture, the costs would instead be $O(n^2)$ and $O(n^3)$ respectively. A more thorough discussion regarding marginal probabilities and reduced error models in CER can be found in [34].
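The marginalization described above can be sketched in a few lines. The example assumes that a Pauli error distribution is stored as a dictionary from Pauli strings to probabilities; that data layout, the function name, and the numerical values are hypothetical choices made for illustration and are not part of the CER protocol of [34].

```python
from collections import defaultdict

def marginal_error_probabilities(error_probs, subset):
    """Sum Pauli error probabilities over all qubits outside `subset`.

    error_probs maps Pauli strings such as "XIZ" to probabilities;
    subset lists the (0-indexed) qubit positions to keep.
    """
    marginal = defaultdict(float)
    for pauli_string, prob in error_probs.items():
        reduced = "".join(pauli_string[q] for q in subset)
        marginal[reduced] += prob
    return dict(marginal)

# Hypothetical 3-qubit error distribution (values chosen for illustration).
error_probs = {
    "III": 0.97,
    "IIZ": 0.01,
    "ZII": 0.01,
    "ZIZ": 0.005,
    "IXI": 0.005,
}

# Marginal over qubit 0 alone, and over the pair (0, 2).
marginal_q0 = marginal_error_probabilities(error_probs, subset=[0])
marginal_q0_q2 = marginal_error_probabilities(error_probs, subset=[0, 2])
print(marginal_q0)
print(marginal_q0_q2)
```

In this toy distribution the correlated error ZIZ shows up in the pair marginal as ZZ, while the single-qubit marginal only records that qubit 0 suffered a Z error.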
The marginal error probabilities associated with the stochastic channel $\mathcal{E}_i^{\mathrm{RC}}$ are retrieved from the Pauli fidelities defined in Eq. (9). That is, given an effective cycle of interest $\mathcal{E}^{\mathrm{RC}} \mathcal{C}$, the CER protocol involves the estimation of a set of various Pauli fidelities, $\{f_P(\mathcal{E}^{\mathrm{RC}})\}_P$, and converts them into a marginal error distribution for the cycle of interest. The Pauli fidelities are themselves extracted from circuit-level observables.
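The conversion between Pauli fidelities and Pauli error probabilities can be sketched with the standard Walsh-Hadamard relation $f_P = \sum_Q (\pm 1)\, e_Q$, where the sign is $+1$ when $P$ and $Q$ commute and $-1$ otherwise. The function names and the example distribution below are assumptions made for illustration; the brute-force loop over all $4^n$ Paulis is only meant for the small marginals that CER targets.

```python
from itertools import product

def anticommute(p, q):
    """True if the Pauli strings p and q anti-commute."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1

def all_paulis(n):
    """All n-qubit Pauli strings over the alphabet I, X, Y, Z."""
    return ["".join(t) for t in product("IXYZ", repeat=n)]

def fidelities_from_probs(error_probs, n):
    """Pauli fidelities f_P from Pauli error probabilities e_Q."""
    paulis = all_paulis(n)
    return {
        p: sum((-1.0 if anticommute(p, q) else 1.0) * error_probs.get(q, 0.0)
               for q in paulis)
        for p in paulis
    }

def probs_from_fidelities(fidelities, n):
    """Inverse transform: error probabilities e_Q from Pauli fidelities f_P."""
    paulis = all_paulis(n)
    return {
        q: sum((-1.0 if anticommute(p, q) else 1.0) * fidelities[p]
               for p in paulis) / 4 ** n
        for q in paulis
    }

# Hypothetical single-qubit error distribution with a dominant Z error.
error_probs = {"I": 0.98, "X": 0.005, "Y": 0.005, "Z": 0.01}
fids = fidelities_from_probs(error_probs, n=1)
recovered = probs_from_fidelities(fids, n=1)
print(fids)
print(recovered)
```

A Z-biased error lowers the X and Y fidelities more than the Z fidelity, which is why a coherent Z error is read off from the X and Y decays.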
To be more explicit, let's specify CER circuits with three parameters, namely a set of commuting Paulis, $S$, which is dictated by a state preparation and measurement (SPAM) strategy, a number of dressed cycle repetitions $m$, and a string $s$ sampled from a random variable $X$, which contains the information about the RC-randomized dressings. The role of the three parameters can be visualized in Fig. 1 (set $x = 1$), where the random string $s$ dictates a choice of easy cycles. Each sampled CER circuit $\mathcal{C}(S, m, s)$ yields a set of fidelity-like numbers $\{f^{\text{circuit}}_P(m, s)\}_{P \in S}$ associated with the set $S$ of commuting Pauli operators determined by the SPAM strategy. The expectation of $f^{\text{circuit}}_P(m, s)$ over the strings $s$ in $X$ obeys a decay formula of the form:

$$\mathbb{E}_{s \sim X}\left[ f^{\text{circuit}}_P(m, s) \right] = A_P \, f_P(\mathcal{E}^{\mathrm{RC}})^m , \tag{19}$$

where the constant $A_P$ depends on SPAM errors, and where $\mathcal{E}^{\mathrm{RC}}$ is the Pauli stochastic error channel associated with the effective dressed cycle of interest. In practice, the LHS of Eq. (19) is replaced by a sample mean, and the Pauli fidelities $f_P(\mathcal{E}^{\mathrm{RC}})$ are obtained by gathering sample mean estimators for various values of dressed cycle repetitions $m$.
A thorough description of the CER protocol together with pseudo-code can be found in [34].
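As a minimal sketch of how a Pauli fidelity could be pulled out of sample means gathered at several sequence lengths, the example below assumes a decay of the form $A_P f_P^m$ and fits it by linear least squares on the logarithm of the data. This log-linear fit is a simple stand-in chosen for illustration, not the estimator described in [34], and the data are synthetic and noiseless.

```python
import math

def fit_exponential_decay(lengths, means):
    """Fit means ~ A * f**m by linear least squares on log(means).

    Requires strictly positive means; returns the pair (A, f).
    """
    n = len(lengths)
    logs = [math.log(v) for v in means]
    mean_m = sum(lengths) / n
    mean_log = sum(logs) / n
    slope = (
        sum((m - mean_m) * (y - mean_log) for m, y in zip(lengths, logs))
        / sum((m - mean_m) ** 2 for m in lengths)
    )
    intercept = mean_log - slope * mean_m
    return math.exp(intercept), math.exp(slope)

# Synthetic sample means generated from assumed values of A_P and f_P.
lengths = [4, 8, 12, 16, 32]
true_A, true_f = 0.95, 0.99
sample_means = [true_A * true_f ** m for m in lengths]

A_est, f_est = fit_exponential_decay(lengths, sample_means)
print(A_est, f_est)
```

Because the SPAM constant only enters through the intercept, the fitted decay rate is insensitive to it, which is the property that makes this family of protocols robust to SPAM errors.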

An extension to CER to learn coherent contributions to errors
CER is meant to provide a diagnostic of effective RC cycles. Let the implementation of a hard cycle be expressed as the product $\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}}$, where $\mathcal{E}_{\text{hard}}$ is the cycle error channel attached to the ideal cycle implementation $\mathcal{C}_{\text{hard}}$. Similarly, let the implementation of $n$-qubit Pauli easy cycles $Q \in \mathbb{P}_n$ be expressed as the product $Q \cdot \mathcal{E}_Q$ (error channels can be placed on the left or right of the ideal implementation without loss of generality, but may differ depending on which side is chosen). To a good approximation, the fidelities given by CER are those of Eq. (20) [34], which correspond to the Pauli fidelities of the effective dressed cycle $\mathcal{E}^{\mathrm{RC}} \cdot \mathcal{C}_{\text{hard}}$ relative to the ideal cycle $\mathcal{C}_{\text{hard}}$. Notice that those fidelities don't provide a quantitative budgeting of the coherent and decoherent contributions to the infidelity.
The easy cycles are randomized (the random aspect of the circuit is specified by a randomly sampled string s) but chosen together with m and the SPAM circuits such that the whole circuit amounts to a Pauli (which is accounted for in post-processing).
To distinguish coherent error contributions from decoherent ones, we propose to perform CER sequences on folded hard cycles $(\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}})^x$ for $x$ such that $\mathcal{C}_{\text{hard}}^x = \mathcal{C}_{\text{hard}}$. The circuit representation of the experiments can be visualized in Fig. 1. As such, the pseudo-code for this generalization is almost identical to that of CER. This idea is not generally equivalent to considering $\mathcal{E}_{\text{hard}}^x \cdot \mathcal{C}_{\text{hard}}$, but as we shall see, the two are very closely related in many physically realistic scenarios. Let $c$ be the smallest positive integer such that $\mathcal{C}_{\text{hard}}^c = I$, also referred to as the period of $\mathcal{C}_{\text{hard}}$. For instance, if the hard cycle consists of parallel CNOTs, we get $c = 2$ since CNOT is self-inverting. Without loss of generality, let's express $\mathcal{E}_{\text{hard}}$ in terms of a Lindbladian $\Lambda_{\text{hard}}$ decomposed into components $\Lambda^{(j)}_{\text{hard}}$ (Eq. (21)), where the components $\Lambda^{(j)}_{\text{hard}}$ phase-commute with $\mathcal{C}_{\text{hard}}$ according to Eq. (22). Now, let's consider multiplying the noisy hard cycle $\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}}$ with itself $c$ times, where $c$ is the period of $\mathcal{C}_{\text{hard}}$:

$$(\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}})^c = e^{\Lambda_{\text{echo}}}, \tag{23}$$

where $\Lambda_{\text{echo}}$ is a transformation of the original Lindbladian $\Lambda_{\text{hard}}$. From the Baker-Campbell-Hausdorff formula [55], we get

$$\Lambda_{\text{echo}} = c \, \Lambda^{(0)}_{\text{hard}} + O\left(\|\Lambda_{\text{hard}}\|^2\right). \tag{24}$$

In other words, up to second-order corrections, the Lindbladian of the echoed transformation $\Lambda_{\text{echo}}$ is the period $c$ times the component of the Lindbladian that commutes with the hard cycle $\mathcal{C}_{\text{hard}}$. Therefore, Eq. (24) essentially states that by folding the noisy cycle $\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}}$, we effectively evolve the component of $\mathcal{E}_{\text{hard}}$ that commutes with $\mathcal{C}_{\text{hard}}$. This is often the only substantial error component. Indeed, $\mathcal{E}_{\text{hard}} \approx e^{\Lambda^{(0)}_{\text{hard}}}$ is usually a good approximation if the hard cycle of interest $\mathcal{C}_{\text{hard}}$ consists of a round of parallel pulses, since weak non-commuting noise components are expected to be echoed out by the drive, similarly to Eq. (24). By performing CER sequences on folded hard cycles $(\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}})^x$ where $\mathcal{C}_{\text{hard}}^x = \mathcal{C}_{\text{hard}}$, we effectively get an experiment to obtain the fidelities given in Eq. (25), using Eq. (24).
In Eq. (25), $\Lambda_Q := \log(\langle \mathcal{E}_Q \rangle_{Q \in \mathbb{P}_n})$, and the second line is obtained from the first-order Baker-Campbell-Hausdorff formula [55]. Standard CER fidelities from Eq. (20) are obtained by choosing $x = 1$. In the scenario where $\mathcal{E}_{\text{hard}} \approx e^{\Lambda^{(0)}_{\text{hard}}}$ and where the easy cycle has a very small error component compared to the hard cycle ($\|\Lambda_Q\| \ll \|\Lambda_{\text{hard}}\|$), we immediately get Eq. (26). More generally, we get Eq. (27), where the small term $\epsilon(x)$ is implicitly defined and can approximately take the form of a constant plus a linear term in $x$ if we Taylor expand Eq. (25). In terms of physics, the linear term in $x$ in the expansion of $\epsilon(x)$ can arise if the coherent part of $(\Lambda_{\text{hard}} - \Lambda^{(0)}_{\text{hard}}) + \Lambda_Q$ coherently interferes with the folded Lindbladian $x\Lambda^{(0)}_{\text{hard}}$. Combining Eq. (27) with the results of Sections 2 and 3.1, we get that by performing CER sequences on folded hard cycles $(\mathcal{E}_{\text{hard}} \cdot \mathcal{C}_{\text{hard}})^x$ where $\mathcal{C}_{\text{hard}}^x = \mathcal{C}_{\text{hard}}$, we should expect decay formulas of the form of Eq. (28), where the coherent and decoherent Pauli infidelities $\delta f_P^{\text{coh.}}$ and $\delta f_P^{\text{decoh.}}$ are defined in Eq. (10), but are to be associated with the effective Lindbladian $\Lambda^{(0)}_{\text{hard}}$ that commutes with $\mathcal{C}_{\text{hard}}$, and where $\epsilon(x)$ is a linear function of $x$. The impact of $\epsilon(x)$ on the estimates of the decoherent term is expected to be negligible in the regime where the decoherence induced by the hard cycle dominates over the coherent interference between $(\Lambda_{\text{hard}} - \Lambda^{(0)}_{\text{hard}}) + \Lambda_Q$ and the folded Lindbladian $x\Lambda^{(0)}_{\text{hard}}$. Finally, the coherent contributions $e_P^{\text{coh.}}$ to the error probabilities $e_P(\mathcal{E}_{\text{hard}})$ of the hard cycle $\mathcal{C}_{\text{hard}}$ are obtained by gathering the quadratic components from Eq. (28) and by performing the Walsh-Hadamard transform.
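The final step, gathering the quadratic components and applying the Walsh-Hadamard transform, can be sketched for a single qubit as follows. The sketch assumes that the per-cycle Pauli infidelity at fold number $x$ behaves as a quadratic polynomial in $x$, and it uses the single-qubit relation $e_Z = (\delta f_X + \delta f_Y - \delta f_Z)/4$. The variable names, the normal-equation fit, and the numerical values are illustrative assumptions and not the fitting procedure used in this work.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(m[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = (m[r][3] - tail) / m[r][r]
    return sol

def fit_quadratic(xs, ys):
    """Least-squares fit ys ~ a*x**2 + b*x + c via the normal equations."""
    basis = [[x * x, x, 1.0] for x in xs]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
           for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    return solve3(ata, aty)

folds = [1, 3, 5, 7, 9]

# Hypothetical (quadratic, linear, constant) infidelity coefficients per Pauli,
# mimicking a coherent Z error: it affects the X and Y fidelities only.
true_coeffs = {
    "X": (4e-6, 2e-4, 1e-4),
    "Y": (4e-6, 2e-4, 1e-4),
    "Z": (0.0, 1e-4, 1e-4),
}
infidelity = {
    pauli: [a * x * x + b * x + c for x in folds]
    for pauli, (a, b, c) in true_coeffs.items()
}

# Quadratic component of each Pauli infidelity, then the single-qubit
# Walsh-Hadamard relation to get the coherent Z error probability.
quad_infid = {pauli: fit_quadratic(folds, ys)[0] for pauli, ys in infidelity.items()}
e_coh_Z = (quad_infid["X"] + quad_infid["Y"] - quad_infid["Z"]) / 4
print(e_coh_Z)
```

Since the Walsh-Hadamard transform is linear, it can be applied to the quadratic coefficients directly rather than to the full fidelities.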

Extracting Coherent and Decoherent Qubit Errors Using CER
In this section, we elaborate on a simple example of the CER extension protocol introduced in Section 3.2. This allows us to expand further on the subtleties of the fitting model, such as the properties of the underlying covariance matrix. Finally, we provide numerical evidence that the error profile obtained from our suggested experiment matches the underlying error model.
We frame our analysis and fitting procedure around a simple proof-of-concept experiment for the sake of concreteness and clarity. In particular, we focus on the errors occurring on an idling qubit during a hard cycle where the nearby qubits undergo a CNOT gate. We focus on this example for two reasons. First, we judged that simplicity would allow us to expand more explicitly on the fitting procedure and analysis. Second, based on the physics behind IBM processors, we expected (and observed) a noticeable coherent Z error on the idling qubit resting next to the CNOT operation [56]. This highlights the relevance of our method for the characterization of coherent crosstalk effects. All that said, we emphasize once more that our method extends beyond the characterization of a single qubit because it consists of a straightforward generalization of CER, and that CER has been demonstrated on multi-qubit hard cycles [34,45].

Fitting Model
As discussed in Section 3.2, our experiment consists of sampling CER circuits for various circuit parameter tuples $(x, m, B, s)$:
• $x$ is the number of folded hard cycles per dressed cycle;
• $m$ is the number of dressed cycles;
• $B$ denotes a state preparation and measurement (SPAM) basis as a set of commuting Pauli observables. We omit $B$ in the expression of the Pauli fidelities since it is implicitly chosen through the $P$ Pauli index;
• $s$ denotes a randomly sampled string that encodes the random part of the circuit construction for the fixed parameters $(x, m, B)$.

Figure 2: (Axis label: linearly growing Z error, $\text{lin}_Z/2$.) The blue cropped ellipse represents the 1σ confidence region of 2 fitted parameters, namely $\text{lin}_Z/2$ and $\text{cst}_Z/2$, based on the covariance matrix returned by the fitting function. These 2 parameters appear in the 12-parameter model given by Eq. (29). Recall that those parameters are physically constrained to be positive, hence the cropping of the region. The fit was performed on simulated data in order to include exact values (see Section 4.3). The solid blue dot is the value returned by the fit, and the star is the exact value of the parameter pair. The dotted line represents a tradeoff line where $(\text{lin}_Z + \text{cst}_Z)/2$ is set to the exact value, for different values of $\text{lin}_Z$ and $\text{cst}_Z$. The longer principal axis of the ellipse in the figure is nearly aligned with the dotted tradeoff line. This illustrates that the estimate of $(\text{lin}_Z + \text{cst}_Z)/2$ is much more precise than the estimate of $(\text{lin}_Z - \text{cst}_Z)/2$. For the simulated error model, we used the relaxation times from ibmq_montreal, and introduced a coherent Z error with an effective rate of 0.002.
From those parameterized sampled circuits, we generate fidelity-like quantities $f^{\text{circuit}}_P(x, m, s)$. Consider the simplest instance of such an experiment on a single qubit. According to Eq. (28), the sample average of $f^{\text{circuit}}_P(x, m, s)$ obeys the 12-parameter decay model of Eq. (29). There are many possible methods to retrieve estimates for the 12 parameters ($A_X$, $A_Y$, $A_Z$, $\text{quad}_X$, $\text{quad}_Y$, $\text{quad}_Z$, $\text{lin}_X$, $\text{lin}_Y$, $\text{lin}_Z$, $\text{cst}_X$, $\text{cst}_Y$, $\text{cst}_Z$). A simple one is to start with an initial guess and perform a minimization algorithm with the cost function defined in Eq. (30), where the sum over $P$, $m$ and $x$ is performed over the desired Paulis, sequence lengths and numbers of folded hard cycles. For instance, in this work, we used the Trust-Region-Reflective (TRF) least-squares algorithm to fit the parameters to the fidelity data [57]. In the general case, for $N$ different Pauli fidelities to estimate, the number of parameters in the fit becomes $4N$. Since CER focuses on error probabilities marginalized over a small number of qubits, the number of parameters is guaranteed to be manageable unless we observe long-range error correlations.
As shown in the previous section, the coefficients $\text{quad}_P$ appearing in the quadratic part of the exponential decay are in direct correspondence with the effective coherent contribution to the infidelity of the hard cycle on the qubit (Eq. (31)). The linear coefficients most realistically correspond to the decoherent contribution to the infidelity of the hard cycle on the qubit, although some linear effects could in principle be induced by a coherent effect between the randomized easy cycle and the folded hard cycle. Easy cycles are usually much less error-prone than hard cycles, meaning that the linear coefficient is often entirely dominated by the decoherent contribution. Finally, the constant terms $\text{cst}_P$ are due to the errors occurring during the easy cycles, as well as to the components of the hard cycle error that do not commute with the hard cycle $\mathcal{C}_{\text{hard}}$ (see Eq. (27)).
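To make the structure of such a global fit concrete, the sketch below writes down one possible reading of the 12-parameter single-qubit model together with a sum-of-squares cost function. The functional form is an assumption inferred from the surrounding description: each Pauli fidelity decays as $A_P (1 - \delta_P(x))^m$, with $\delta_P(x)$ summing the quadratic, linear and constant terms of the two Paulis that anti-commute with $P$. It may differ from Eqs. (29)-(30), and all parameter values are made up for illustration.

```python
import copy

PAULIS = ("X", "Y", "Z")

def model_fidelity(params, pauli, x, m):
    """Assumed decay model for the Pauli fidelity of `pauli`.

    On one qubit, every Pauli error Q != P (Q not the identity) anti-commutes
    with P, so it contributes to the infidelity seen by P.
    """
    infidelity = sum(
        params["quad"][q] * x * x + params["lin"][q] * x + params["cst"][q]
        for q in PAULIS
        if q != pauli
    )
    return params["A"][pauli] * (1.0 - infidelity) ** m

def cost(params, data):
    """Sum of squared residuals between the model and measured mean fidelities."""
    return sum(
        (model_fidelity(params, pauli, x, m) - value) ** 2
        for (pauli, x, m), value in data.items()
    )

# Hypothetical "true" parameters: a coherent Z error plus weak decoherence.
true_params = {
    "A": {"X": 0.98, "Y": 0.98, "Z": 0.99},
    "quad": {"X": 0.0, "Y": 0.0, "Z": 4e-5},
    "lin": {"X": 1e-4, "Y": 1e-4, "Z": 2e-4},
    "cst": {"X": 1e-4, "Y": 1e-4, "Z": 1e-4},
}
folds = [1, 3, 5, 7, 9]
lengths = [4, 8, 12, 16, 32]

# Noiseless synthetic data: 3 Paulis x 5 fold values x 5 sequence lengths.
data = {
    (p, x, m): model_fidelity(true_params, p, x, m)
    for p in PAULIS
    for x in folds
    for m in lengths
}

# Dropping the coherent term must increase the cost.
perturbed = copy.deepcopy(true_params)
perturbed["quad"]["Z"] = 0.0
cost_true = cost(true_params, data)
cost_perturbed = cost(perturbed, data)
print(cost_true, cost_perturbed)
```

A bounded minimizer could then be run on this cost; for example, `scipy.optimize.least_squares` accepts `method="trf"` together with `bounds`, which is one way to keep every parameter between 0 and 1.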

Resource requirements
CER and its newly introduced variant are designed to scale well with the number of qubits $n$. The number of circuits for such experiments scales as the number of marginal error probabilities to extract, and often, due to constraints in the correlations of errors, only a polynomial number of marginals is enough to accurately describe the error profile. Moreover, other hyper-parameters, such as the number of shots and the number of random circuit samples, can be kept constant and still yield multiplicative precision estimates on the error probabilities. Regarding the required number of shots, it is shown in [58] that the number of shots can be kept constant no matter the error rates as long as the sequence lengths are appropriately chosen (the optimal choice of sequence lengths is discussed in [58]). It is worth mentioning that the number of shots bounds the relative precision of the error probability estimates, so it is worth picking a few thousand shots to ensure 2 significant digits on the estimates. Regarding the number of random circuit samples, it is known that due to concentration inequalities, randomly compiled channels converge exponentially fast to their true average with the number of samples. For instance, in [59] it is shown that 30 circuit samples are enough to heavily suppress coherent errors (see Fig. 2 of that paper).

Fitting Model Simulation Results and Analysis
To anticipate our hardware-based experiments, we perform numerical simulations of our experiment in the simple case of a single qubit subject to a noise source coming from a nearby entangling gate. To measure the systematic increase in the magnitude of the error from the noise, the number of CNOT gates is systematically increased. Measurements are taken for hard gate cycles of 1, 3, 5, 7 and 9 CNOT gates, and the simulations are run for 20,000 shots per random circuit. We used the values 4, 8, 12, 16 and 32 for the sequence lengths. For the simulated error model, we use the relaxation times from ibmq_montreal, and introduce a coherent Z error with an effective rate of 0.002 to simulate the presence of coherent crosstalk.
In our numerical analysis, we notice that $\mathrm{lin}_P$ and $\mathrm{cst}_P$ are strongly anti-correlated, inducing a large uncertainty for their difference $\mathrm{lin}_P - \mathrm{cst}_P$. This is well illustrated in Fig. 2, where we can further see that the sum $\mathrm{lin}_P + \mathrm{cst}_P$ is estimated with high precision and accuracy. For this reason, we contrast the coherent contribution to the error rates from the hard cycle with the sum of the other contributions (see Fig. 3). In the regime where the twirling operations are nearly perfect, the contributions $\mathrm{lin}_P + \mathrm{cst}_P$ can be interpreted as the sole result of decoherence during the hard cycle. More generally, one could place tighter upper bounds on the constants $\mathrm{cst}_P$, based on physical assumptions or on additional benchmarking data. For example, in Fig. 2, one could reduce the uncertainty on the difference $\mathrm{lin}_P - \mathrm{cst}_P$ by upper-bounding the constant Z error as $\mathrm{cst}_Z/2 \leq 0.0005$. This would crop the confidence region and improve the precision and level of detail of the error profile, but would inherently rely on an assumption.
In all of the computations, the error parameters appearing in the model described by Eq. (29) are bounded between 0 and 1 (they are usually close to 0). As such, we use the Trust-Region-Reflective (TRF) least-squares algorithm, which supports such bound constraints, to fit the parameters to the data [57]. We compare the estimated error profile to an exact reference in Fig. 3 and find a satisfactory degree of agreement.
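As an illustration of a bounded TRF fit, the following minimal sketch fits a simplified single-Pauli decay model to noise-free synthetic data using SciPy. The model `f(m, x) = A * (1 - (quad*x**2 + lin*x + cst))**m` is a hypothetical stand-in and not the coupled 12-parameter model of Eq. (29); the function names and the "true" parameter values are assumptions chosen only for the example.

```python
import numpy as np
from scipy.optimize import least_squares

def model(params, m, x):
    # Hypothetical simplified decay (NOT Eq. (29)): per-cycle error is a
    # quadratic polynomial in the folding number x, compounded over m cycles.
    A, quad, lin, cst = params
    return A * (1.0 - (quad * x**2 + lin * x + cst)) ** m

def residuals(params, m, x, f_obs):
    return model(params, m, x) - f_obs

# Hyper-parameter grid quoted in the text: 5 foldings x 5 sequence lengths.
folds = np.array([1, 3, 5, 7, 9])
lengths = np.array([4, 8, 12, 16, 32])
x_grid, m_grid = np.meshgrid(folds, lengths)
x = x_grid.ravel().astype(float)
m = m_grid.ravel().astype(float)

# Assumed "true" parameters (illustrative only) and noise-free synthetic data.
truth = np.array([0.98, 2e-4, 1e-3, 5e-4])
f_obs = model(truth, m, x)

# Bounded fit: every parameter is constrained to [0, 1], as in the text.
fit = least_squares(
    residuals,
    x0=np.array([0.9, 1e-3, 1e-3, 1e-3]),
    bounds=(0.0, 1.0),
    method="trf",
    ftol=1e-12, xtol=1e-12, gtol=1e-12,
    args=(m, x, f_obs),
)
```

The box constraints keep every fitted rate non-negative by construction. With noisy data, the same call applies, but the individual `lin` and `cst` estimates would be expected to carry larger uncertainty than their sum.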

Hardware results and analysis
Starting with the proposed fitting model discussed in Section 4.1, we note that the quadratic equations $f_X$, $f_Y$ and $f_Z$ in Eq. (29) are interconnected, because each direction (X, Y, Z) has anti-commuting terms for X, Y, and Z that are coupled across each dimension of the model. This requires each of the coefficients in the polynomial to have both a coherent ($\mathrm{quad}_P$) and a decoherent ($\mathrm{lin}_P$) fitting parameter. Including all possible terms that anti-commute with each of the X, Y, and Z axes in Eq. (29) requires twelve parameters (three $A_P$ parameters, three $\mathrm{quad}_P$ parameters, three $\mathrm{lin}_P$ parameters and three $\mathrm{cst}_P$ parameters). One advantage of such an approach is that the global fit to these twelve parameters ensures overall positive error rates. The global fit produces 75 values (five CNOT repetitions × five sequence lengths × three fidelities $f_X$, $f_Y$ and $f_Z$).
For each of the three hardware platforms, we construct graphs of the component fidelities $f_X$, $f_Y$ and $f_Z$ versus sequence length for hard cycle CNOT repetitions 1, 3, 5, 7, and 9. The 75 values produced by the fitting model are plotted on these graphs with star symbols (⋆). Fig. 6 shows an example of these graphs for the ibmq_manila hardware platform.
To compare these computations with experimental data, we selected the cloud-accessible IBM quantum computing hardware platforms as a proof of concept of our experimental design derived in Section 3.2. The cycle and marginal distribution selected are modelled on the circuit design implemented to measure the spin-spin correlation function of a Heisenberg spin chain [60]. Fig. 4 is a block diagram of the spin-spin correlation function circuit showing CNOT gates in the circuit and an ancilla qubit used to measure the computational results. Fig. 5 shows a more detailed sub-circuit used in the time evolution of the Heisenberg spin chain. As shown in the diagrams, the types of circuits that we consider feature a spectator ancillary qubit that is left idling during the trotterized evolution of the spin chain. As such, we focused our interest on the coherent and decoherent error profile marginalized on this ancilla. In this simple instance, the hard cycle of interest consists of a CNOT acting on a pair of qubits neighboring the idle ancilla. These types of circuits motivated this work.
We implement the circuit on three different IBM quantum computing hardware platforms (ibmq_guadalupe, ibmq_montreal, and ibmq_manila). Appendix B discusses the specific circuit implementations on each of the hardware platforms. These circuits are randomly compiled 30 times with different Paulis for each sequence length on each of the three hardware platforms, using 10,000 shots for ibmq_manila and 20,000 shots for ibmq_guadalupe and ibmq_montreal. We color-code each of the 30 results from these experimental measurements for each sequence length and plot them on the graph of $f_X$, $f_Y$ and $f_Z$ versus sequence length for each hard cycle CNOT repetition (1, 3, 5, 7, and 9) and each of the three hardware platforms. We also plot the sample mean of the 30 data points for each hard cycle CNOT repetition at each sequence length with the diamond symbol (⋄) and a standard deviation error bar. Fig. 6 shows these results for the ibmq_manila hardware platform. We construct similar computations and graphs for both ibmq_montreal and ibmq_guadalupe.

Figure 5: Sub-circuit used in the time evolution (see the $e^{-iHt}$ block of Fig. 4) of the 4-site Heisenberg spin chain. Four instances of this sub-circuit implement the $e^{-iHt}$ block of Fig. 4.
Upon closer examination of Fig. 6, it can be seen that some of the 75 pairs of stars and diamonds do not overlay each other on the graph. For example, the 3-CNOT folding star-diamond pairs for all of the sequence lengths 4, 8, 12, 16, and 32 in the component fidelity $f_Y$ are slightly displaced from each other. Similar examples of these star-diamond pair displacements can also be seen throughout the other CNOT foldings in the figure. However, these displacements are rather small. We calculated the coefficient of determination, also referred to as $R^2$ [61], to three significant digits for all three devices (for ibmq_manila, $R^2 = 0.894$; for ibmq_montreal, $R^2 = 0.989$; for ibmq_guadalupe, $R^2 = 0.990$). The fact that these values are well above 0.8 indicates that the model accounts for most of the variance in the data.
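For reference, the coefficient of determination can be computed as in the following minimal sketch. It uses the standard definition $R^2 = 1 - SS_{res}/SS_{tot}$; the function name and the argument convention (sample means as observations, model values as predictions) are assumptions made for illustration, not necessarily the exact procedure used for the values quoted above.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

In the setting of Fig. 6, `observed` would hold the 75 diamond values (sample means) and `predicted` the 75 star values (model estimates).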
We then implement a minimization procedure on the star-diamond data. The goal is to globally minimize the distance between the diamond values and the star values for all 75 star-diamond combinations. We perform this global minimization by computing the square of the difference between each pair of stars and diamonds, dividing by that specific star value, and then summing over all 75 star-diamond pairs. This calculation is essentially a minimization of the $\chi^2$ statistic to improve the goodness of the fit.
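The statistic described in words above can be sketched as follows. The function and argument names are hypothetical; `diamonds` stands for the measured sample means and `stars` for the corresponding model values.

```python
import numpy as np

def chi_square(diamonds, stars):
    """Sum over all pairs of (diamond - star)^2 / star."""
    d = np.asarray(diamonds, dtype=float)
    s = np.asarray(stars, dtype=float)
    return float(np.sum((d - s) ** 2 / s))
```

Since the star values are fidelities close to 1 in most of the data, dividing by them mainly changes the weighting of the low-fidelity points (long sequences, many foldings) relative to an unweighted sum of squares.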
As discussed in Section 4.3, a more detailed analysis of the 12-parameter fit shows that $\mathrm{lin}_P/2$ and $\mathrm{cst}_P/2$ are strongly anti-correlated, which is why we report their sum and difference. As stated in Eq. (31), the $\mathrm{quad}_P/2$ values contain the coherent contribution to the error rate from the hard cycle, while $(\mathrm{lin}_P + \mathrm{cst}_P)/2$ contains the other contributions. We plot these terms for each of the three IBM hardware platforms in Fig. 7. This figure is the central result of this research project and shows the effective error rate versus Pauli error (X, Y and Z) for the designated single qubit on each of the hardware platforms. These results, as well as the data from the simulator, are shown in Table 1. The table clearly shows that the Pauli Z $\mathrm{lin}_P/2$ and $\mathrm{cst}_P/2$ terms have the strongest error measured on the single qubit on each of the hardware platforms.
It is well known that the static ZZ coupling in a transmon qubit is always present and leads to both coherent and incoherent errors [62]. Recent works have theoretically analyzed the effect of crosstalk on simultaneous gate operation in a tunable ZZ coupling-based qubit architecture [63,64]. Our results confirm that a CNOT gate indeed affects the idling qubit, and that the coherent error is predominantly a Pauli Z error. The results are obtained via the extended CER protocol described in Section 3. The numerical values of the error rates with their respective uncertainties are contained in Table 1.
We observed this noise bias toward Pauli Z error due to the effects of static ZZ coupling in a transmon qubit. This increase in the single-qubit error as the number of CNOT hard cycles is increased can also be seen graphically through heatmap plots for the single qubit on ibmq_guadalupe, ibmq_montreal and ibmq_manila, as shown in Fig. 10, Fig. 11 and Fig. 12 in Appendix C. Although there is some increase in the X and Y Pauli errors, these heatmaps also graphically illustrate that it is the Z Pauli error that shows the greatest increase as the number of CNOT hard cycles increases. We observe from our heatmaps that the Pauli Z error increases rapidly compared to the X and Y errors when the number of CNOTs increases from 1 to 9. This rapid scaling of the Z error is the signature of coherent errors. Thus, our characterization scheme shows not only that Pauli Z error is the dominant error on the IBM quantum processors, but also that a substantial fraction of it is the result of coherent processes.
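The different scaling of coherent and stochastic errors under folding can be illustrated with a toy single-qubit calculation. The sketch below is not the paper's error model: it assumes a pure Z rotation for the coherent case and a Z-flip channel for the stochastic case, both calibrated to the same single-cycle rate of 0.002 (the value used in the simulation above, reused here purely for illustration). All function names are hypothetical.

```python
import numpy as np

Z = np.diag([1.0, -1.0]).astype(complex)

def coherent_z_error(theta, x):
    # x repetitions of the rotation U = exp(-i*theta*Z/2); the twirled
    # Z-error probability is |Tr(Z U^x)/2|^2 = sin^2(x*theta/2).
    U = np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])
    U_x = np.linalg.matrix_power(U, x)
    return abs(np.trace(Z @ U_x) / 2.0) ** 2

def stochastic_z_error(p, x):
    # Probability of an odd number of Z flips after x independent cycles.
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** x)

p = 0.002                             # assumed single-cycle Z error rate
theta = 2.0 * np.arcsin(np.sqrt(p))   # rotation angle with the same rate at x = 1
folds = [1, 3, 5, 7, 9]
coherent = [coherent_z_error(theta, x) for x in folds]
stochastic = [stochastic_z_error(p, x) for x in folds]
```

For small rates, the coherent contribution grows roughly as $x^2$ while the stochastic one grows roughly as $x$, which is the qualitative behavior that the added folding hyper-parameter is designed to expose.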

Summary
We successfully demonstrate the design and implementation of an efficient and scalable diagnostic method that quantitatively differentiates between coherent and decoherent errors in cycles. The characterization scheme that we suggest in the present work differs from existing error diagnostic methods such as GST [37][38][39][40][41][42] in that it is targeted toward the characterization of the effective dressed cycles present in randomly compiled circuits. This is an important tool because many error suppression techniques implemented today improve circuit performance by mitigating either decoherent or coherent errors, but not both. Our method can therefore be used to measure the impact of error suppression suites on the type of errors that they specifically target.
We leverage a CB-structured error characterization protocol known as CER [1,34,45]. The original method was designed to measure the error profile on effective dressed cycles, which are tailored to have purely decoherent errors via a compiling method known as RC [35]. We expand on the basic CER method by introducing an additional hyper-parameter (labeled x in this work), which corresponds to the number of hard cycle repetitions before being subject to Pauli twirling (see Fig. 1). This additional hyper-parameter allows for quantitative estimates of the coherent error contributions to be computed through a generalization of the fidelity decay formula (Eq. (28)).
Our data analysis relies on the different propagation formulas of coherent and decoherent errors in folded error channels (see Section 2). As a proof of concept, we test our method both physically and numerically by reconstructing the effective error profile on a single-qubit ancilla left idling during a cycle. The numerical simulation confirms a strong agreement between the error profile estimated by our protocol and the exact underlying error model. The data obtained from IBM hardware platforms (ibmq_guadalupe, ibmq_manila, and ibmq_montreal) reveal a substantial level of coherent errors occurring on the idling ancilla, induced by an entangling operation (see Fig. 7).
In terms of next steps, we note that the fitting method derived in the current work only relies on the sample averages of fidelities, and does not take into consideration the shape of the distribution of the fidelities for fixed (m, x) hyper-parameters. As shown in Fig. 6, coherent errors affect more properties of the fidelity distributions than just the mean, and we leave as an open problem the refinement of the fitting function based on those considerations, as well as the statistical optimization of the choice of hyper-parameters. Finally, we leave the demonstration of our proposed CB-structured method as a means to design and benchmark error suppression tools for future work.

Table 1: Various fitted parameters corresponding to the 12-parameter model presented in Eq. (29), for three different hardware platforms. We include the difference and the sum of $\mathrm{lin}_P/2$ and $\mathrm{cst}_P/2$ since these parameters are strongly anti-correlated (see Section 4.3). From Eq. (31), the $\mathrm{quad}_P/2$ column contains the coherent contribution to the error rate from the hard cycle, while the $(\mathrm{lin}_P + \mathrm{cst}_P)/2$ column contains the other contributions. These error rates are shown in Fig. 7.

Acknowledgments
Definition 4. Let's define the coherent and decoherent infidelities of $P \in \mathcal{P}_n$ as:

Proof. A way to interpret the total evolution $e^{\Lambda}$ is to consider $\Lambda$ as a transition matrix. The total evolution depicted by $e^{\Lambda}$ is then the sum over all paths, and each path is weighted by $1/J!$, where $J$ is the number of jumps. The essence of the proof will be to quantify and categorize the transition amplitudes, and to sum up the different paths. For conciseness, we omit the graph dependence in the area of effect function, i.e. $A(M|G)$ is replaced by $A(M)$.
Let's define the transition amplitude from the Pauli $P \in \mathcal{P}_n$ to the Pauli $Q \in \mathcal{P}_n$ as

From the definition of the Lindblad matrix, Eq. (5), we more specifically get $\mathrm{Tr}\, Q L_j^{\dagger} L_j P$.

It follows from the Hermiticity-preserving nature of the evolution that $t_{P \to Q}$ remains real. Let's decompose the operators $L_j$ (Eq. (41a)), $L_j^{\dagger}$ (Eq. (41b)) and $H$ (Eq. (41c)) by inserting summations over the Pauli operators $S$ as follows:

The above expression can be further simplified. Let's define the commutation function,

By using the commutation function and by taking the real projection of $t_{P \to Q}$, we get

where we explicitly labeled the coherent and decoherent transition amplitudes. Let's break the transitions into two cases.
Case 1: $PQ = QP$

First, if $PQ = QP$, then $\chi_{P,Q} = 1$ and we get

In the special case where $P = Q$, we get

Notice that the sum in one of the expressions above has 2 constraints, namely $\chi_{P,S} = -1$ and $A(S) \leq k$. With these, the number of terms in the sum of Eq. (47) scales proportionally to the weight $w(P)$ of $P$, and scales exponentially in $k$ and in the level of connectivity of the architecture's graph $G$.
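The commutation function used in this case analysis takes the value $+1$ when two Paulis commute and $-1$ when they anti-commute. A minimal sketch for Pauli operators written as strings over the alphabet I, X, Y, Z is given below; the string representation and the function name are assumptions made for illustration.

```python
def commutation(p: str, q: str) -> int:
    """Return +1 if the Pauli strings p and q commute, -1 if they anti-commute."""
    if len(p) != len(q):
        raise ValueError("Pauli strings must act on the same number of qubits")
    # Two single-qubit Paulis anti-commute iff both are non-identity and differ.
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    # The full strings anti-commute iff an odd number of sites anti-commute.
    return 1 if anti % 2 == 0 else -1
```

For example, `commutation("XX", "ZZ")` evaluates to `1` because the two site-wise anti-commutations cancel, while `commutation("XI", "ZI")` evaluates to `-1`.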
In Eq. (46) the transition amplitude still scales as the squared magnitude of the Lindbladian terms. Applying the Cauchy-Schwarz inequality to the transition amplitude, we get:

Because $PQ = QP$, we find that $PQS$ anti-commutes with $Q$ just like $S$, meaning that they belong to the same set of operators. This means that if $S$ is summed over an indexed set $\{S \mid \chi_{Q,S} = -1,\ A(S + PQS) \leq k\} =: \{s_1, s_2, \cdots\}$, then $PQS$ is simultaneously summed over a permutation $\tau$ of that set:

It follows from majorization inequalities that the sum is maximized for the trivial permutation [65]:

The number of terms in the sum over $S$ is constrained since
1. $S$ has to anti-commute with the transition endpoint $Q$ (i.e. $SQ = -QS$);
2. $S + PQS$ has to be geometrically $k$-local (i.e. $A(S + PQS) \leq k$).

Since $PQ \neq I$, the second condition implies that non-zero transitions must obey $A(PQ) \leq k$; in simpler terms, $Q$ must differ from $P$ by a $k$-local operator. To get a better picture of non-zero transitions, imagine $P$ as a product of different "Pauli islands", defined as follows:

Definition 5 (Pauli Islands). A Pauli $P$ is said to be a $k$-island if any tensor product partitioning $P = Q \otimes R$ obeys

Any Pauli $P$ can be partitioned into a product of islands, and this product is unique. For instance, for $k = 2$ and a chain topology, $P = (X_2 X_3 X_4) \otimes (Z_6 Y_7)$ is a product of two islands.
Rules 1 (Transition rules for $PQ = QP$). The allowed transitions must obey the following rules:
1. From the onset, $PQ = QP$ and $Q \neq P$.
2. An allowed transition is the creation of a new geometrically $k$-local island. For instance, with our $P = (X_2 X_3 X_4) \otimes (Z_6 Y_7)$ example, we could have an endpoint of the form Q
3. An allowed transition can be the geometrically $k$-local modification of a single island. Still with our $P = (X_2 X_3 X_4) \otimes (Z_6 Y_7)$ example, we could have an endpoint of the form Q
4. The annihilation of an island is forbidden! This is because of the first constraint on $S$, which states that it must anti-commute with $Q$, and forces $S$ to connect to the endpoint $Q$.
5. Through the second rule, islands that are less than $2k$ edges apart can be merged via island modifications. Still with our $P = (X_2 X_3 X_4) \otimes (Z_6 Y_7)$ example, we could have a single island endpoint: Q

By looking carefully at the above rules, notice that for a non-zero double transition $t_{P \to Q}\, t_{Q \to P}$ to start and end at $P$ and pass by $Q$ such that $QP = PQ$, it has to be a geometrically $k$-local modification of a single island (see Rules 1, item 3). The reason for this is that although island creations have non-zero amplitudes, annihilation transitions are prohibited. Only a few triple transitions $t_{P \to Q}\, t_{Q \to R}\, t_{R \to P}$ can start and end with $P$ and involve an island creation: 1) start with an island creation that is within $2k$ edges, 2) merge the created island with one of $P$'s islands in the second transition, 3) return to $P$ through a single island modification.
With this argument in mind, let's bound the total amplitude of double transitions $t_{P \to Q}\, t_{Q \to P}$ that start and end at $P$ and pass by $Q$ such that $QP = PQ$:

Notice that the constrained double sum over $Q$ and $S$ ($S'$) goes over some $k$-local areas connected to $P$ and over some $k$-local jumps in those regions. Therefore, up to a

where $T$ is a term with a magnitude bounded as

where the second line was obtained by using

The second factor in Eq. (56) scales as the average single transition $t_{Q \to Q}$ over all Paulis $Q$ with the same support as $P$:

A.3 Derivation of the Error Probabilities of a repeated channel
Error processes are not necessarily stochastic, meaning that it doesn't always make sense to discuss error probabilities. However, we can always project the error channel onto a stochastic one, and then consider the resulting error probability distribution.
As such, we define the effective Pauli error probabilities as the Pauli error probabilities of a channel once it is projected onto its Pauli stochastic component. When it comes to Pauli stochastic channels, there is a duality between Pauli fidelities and Pauli error probabilities. The two are in fact related by a linear operation known as the Walsh-Hadamard transform. That is, if we consider a vector of fidelities $f$ (such as $f = (f_I, f_X, f_Y, f_Z)$), we can obtain the vector of error probabilities $e$ (such as $e = (e_I, e_X, e_Y, e_Z)$) via:

where $W$ is the Walsh-Hadamard matrix. The entry $W_{ij}$ is $1$ if the $j$th Pauli in the domain vector commutes with the $i$th Pauli in the image, and $W_{ij}$ is $-1$ if the $j$th Pauli in the domain vector anti-commutes with the $i$th Pauli in the image. Therefore, we can get error probabilities from fidelities by applying the Walsh-Hadamard transform to the fidelities given by Lemma 1. Let's approximate the elements of the fidelity vector

If we apply the Walsh-Hadamard transform $W$ to the vector composed of these approximated elements, we get an approximate error vector $e$ with elements

More mathematical details regarding the transformation from fidelities to effective error rates are contained in [34].
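As a concrete single-qubit illustration of this duality, the sketch below builds $W$ from the commutation rule stated above and converts between fidelities and error probabilities. The $1/4$ normalization in the fidelity-to-probability direction is the usual convention for one qubit and is an assumption here, since the normalization of the transform is not restated in this section; the function names are hypothetical.

```python
import numpy as np

PAULIS = "IXYZ"

def chi(a: str, b: str) -> int:
    # +1 if the single-qubit Paulis a and b commute, -1 if they anti-commute.
    return 1 if a == "I" or b == "I" or a == b else -1

# Walsh-Hadamard matrix: W[i, j] = +1 (commute) or -1 (anti-commute).
W = np.array([[chi(a, b) for b in PAULIS] for a in PAULIS])

def probabilities_to_fidelities(e):
    # f_P = sum_Q chi(P, Q) * e_Q
    return W @ np.asarray(e, dtype=float)

def fidelities_to_probabilities(f):
    # Inverse transform; W @ W = 4 * identity for a single qubit.
    return W @ np.asarray(f, dtype=float) / 4.0

# Illustrative Z-biased error vector (e_I, e_X, e_Y, e_Z), not measured data.
e_example = np.array([0.97, 0.005, 0.005, 0.02])
f_example = probabilities_to_fidelities(e_example)
e_back = fidelities_to_probabilities(f_example)
```

With this Z-biased example, the fidelities $f_X$ and $f_Y$ are lower than $f_Z$, because a Z error anti-commutes with X and Y but commutes with Z.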

B IBM Quantum Computing Hardware Architectures
The calculations for this project were run on three different IBM hardware platforms (ibmq_manila, ibmq_guadalupe, and ibmq_montreal). On the 5-qubit ibmq_manila platform, qubits q0, q1, q2, q3, and q4 are used, with q0 as the ancilla and q1-q2 as the two-qubit entangling gate. The ibmq_guadalupe processor is a 16-qubit platform. On this processor the qubits used are q6, q7, q10, q12, and q13, with q10 as the ancilla and q6-q7 as the two-qubit entangling gate. For the computation done on the 27-qubit ibmq_montreal platform, qubits q18, q21, q23, q24, and q25 are used, with q23 as the ancilla and q18-q21 as the two-qubit entangling gate. Fig. 9 graphically shows these qubit topologies for each platform. We also ran computations on the Keysight TrueQ simulator [66] for the analysis discussed in the supplemental material in Section 4.3.
Figure 9 panels: ibmq_montreal qubit layout, ibmq_guadalupe qubit layout, ibmq_manila qubit layout.

The key metrics associated with each of the processors that help characterize the processor performance are:
• Quantum volume (QV): This value measures the performance of gate-based quantum computers, regardless of their underlying technology.
• Circuit Layer Operations per Second (CLOPS): A measure of how many layers of a QV circuit a quantum processing unit (QPU) can execute per unit of time. The CLOPS is calculated using three key attributes to measure the performance of near-term quantum computers (quality, speed, and scale) [67].
• The Falcon family of devices are medium-scale circuits. They were deployed by IBM as a test environment for demonstrating performance and scalability improvements over previous generation processors. Specifically, the r4 is the first revision to deploy multiplexed readout. Previous designs required an independent signal pathway on the chip, as well as in the dilution refrigerator and control electronics, for qubit state readout.
The ibmq_guadalupe is a 16-qubit Falcon r4p system with a quantum volume of 64 and 2.4K CLOPS. The computations on ibmq_guadalupe use qubits 6, 7, 10, 12, and 13, with qubit 10 being used as the ancilla qubit. The ibmq_montreal is a 27-qubit Falcon r4 system with a quantum volume of 128 and 2.0K CLOPS.

C CER Protocol Measurements and Heatmap Methodology
This project measures the noise on a single qubit (q10 on ibmq_guadalupe, q23 on ibmq_montreal and q0 on ibmq_manila). The gates on this qubit are set to the identity, so ideally it should remain unaffected by the additional folded CNOT hard cycles implemented through the extended CER protocol. The heatmaps show that the measured behavior deviates from this expectation.
The increase in the measured noise on the single qubit is shown graphically in Fig. 10 to Fig. 12. These figures are heatmap error signatures showing the magnitude of the error recorded on these qubits. A dark color represents a relatively low value for the error; warmer colors represent a higher value. To the right of each figure is a bar with a color gradation and numbers that set the scale for that heatmap, representing the level of the infidelity being measured.
Each of the three figures (Fig. 10 to Fig. 12) shows five sub-figures representing the single-qubit error measurements for CNOT hard cycle repetitions 1, 3, 5, 7, and 9 for each of the three IBM hardware platforms. The y-axis label of the heatmap shows the X, Y, and Z Pauli errors for the qubit of interest (q0 for ibmq_manila, q10 for ibmq_guadalupe and q23 for ibmq_montreal).
These heatmaps also clearly show that as the number of hard cycles is increased from 1 to 9, the magnitude of the single qubit Pauli errors (especially the Z error) grows by an order of magnitude on each of the three different hardware platforms.This is a graphical signature that the CNOT folded cycles are contributing an increasing magnitude of error that is detected by the deviation of the single spectator qubit from what should have been an identity gate signature.

$L_j$ as linear combinations of Pauli operators: where $\mathcal{P}_n$ denotes the n-qubit Pauli basis defined by the Kronecker products of the Pauli matrices I, X, Y, Z. The coefficients obey $h_P \in \mathbb{R}$, $\ell_{j,P} \in \mathbb{C}$. In practice, the squared magnitudes $|h_P|^2$ and $|\ell_{j,P}|^2$ are upper-bounded by the 2-qubit error rates because they are ultimately tied to geometric near-local interactions.

Figure 1: Visual representation of the CB-structured circuits in our error diagnostic scheme. The core cycles in gray are wrapped in state preparation and measurement (SPAM) circuits specified by a set of commuting Paulis S. Dressed cycles are repeated m times and they each contain the x-folded hard cycle of interest $(\mathcal{E}_{\mathrm{hard}} \circ \mathcal{C}_{\mathrm{hard}})^x$, where $\mathcal{C}_{\mathrm{hard}}^x = \mathcal{C}_{\mathrm{hard}}$. The easy cycles are randomized (the random aspect of the circuit is specified by a randomly sampled string s) but chosen together with m and the SPAM circuits such that the whole circuit amounts to a Pauli (which is accounted for in post-processing).

Figure 3: Error profile obtained from a simulation of our experiment. The coherent contributions refer to the hard cycle coherent errors. For the simulated error model, we used the relaxation times from ibmq_montreal, and introduced a coherent Z error with an effective rate of 0.002.

Figure 6: Fidelities $f_X$, $f_Y$ and $f_Z$ versus sequence lengths for CNOT repetitions 1, 3, 5, 7, and 9 for ibmq_manila. Each CNOT repetition is color coded and identified in the legend in the upper left portion of the figure. Each CNOT repetition was randomly compiled 30 times with different Paulis at each sequence length. Each individual data point plotted is color matched to the corresponding CNOT repetition. The average value from the 30 different data points for each CNOT repetition at each sequence length is then plotted on the graph with the diamond symbol (⋄) and a standard deviation error bar. The global fitting from the model produces 75 values for the estimated mean, which are plotted with star symbols (⋆).

P.D. was supported in part by the U.S. Department of Energy (DoE) under award DE-AC05-00OR22725. S.K.R. acknowledges financial support from the J. William Fulbright Foreign Scholarship Board and the Fulbright Commission in India (USIEF) through a Fulbright Nehru Doctoral Research Fellowship 2021-2022. We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors,

Definition 3 (decoherent error rate over a set of qubits). Consider a set of qubits V. We define the decoherent error rate over those qubits as

Figure 8: Visualization of different Paulis $P \in \mathcal{P}_{16}$ in a planar 16-qubit architecture. a) Visualization of the subgraph associated with the area of effect $A(P|G)$ on a planar architecture. The Pauli in figure a) has an area of effect of 7. b) Visualization of k-extended supports of a Pauli for $k \in \{0, 1, 2\}$.

Figure 9: IBM quantum hardware platforms and specific qubits used on each to run the 4-qubit spin-spin correlation function circuit. The black circles indicate the specific qubits used for the KNR two-qubit CNOT and ancilla effective dressed cycles and Pauli fidelity calculations.

Figure 10: Heatmap of the ibmq_guadalupe processor. To the right of each figure is a bar with a color gradation and numbers that set the scale for that heatmap, representing the level of the infidelity being measured.

Figure 11: Heatmap of the ibmq_montreal processor. To the right of each figure is a bar with a color gradation and numbers that set the scale for that heatmap, representing the level of the infidelity being measured.

Figure 12: Heatmap of the ibmq_manila processor. To the right of each figure is a bar with a color gradation and numbers that set the scale for that heatmap, representing the level of the infidelity being measured.

Table 1 :
Various fitted parameters corresponding to the 12-parameter model presented in Eq. (