Towards a general framework of Randomized Benchmarking incorporating non-Markovian Noise

The rapid progress in the development of quantum devices is in large part due to the availability of a wide range of characterization techniques that allow us to probe, test and adjust them. Nevertheless, these methods often rely on approximations that hold only in rather simplistic circumstances. In particular, assuming that error mechanisms stay constant in time and do not depend on the past will become untenable as quantum processors continue scaling up in depth and size. We establish a theoretical framework for the Randomized Benchmarking protocol encompassing temporally-correlated, so-called non-Markovian noise, at the gate level, for any gate set belonging to a wide class of finite groups. We obtain a general expression for the Average Sequence Fidelity (ASF) and propose a way to obtain average gate fidelities of full non-Markovian noise processes. Moreover, we obtain conditions that are fulfilled when an ASF displays authentic non-Markovian deviations. Finally, we show that even though gate-dependence does not translate into a perturbative term within the ASF, as in the Markovian case, the non-Markovian sequence fidelity nevertheless remains stable under small gate-dependent perturbations.


Introduction
Having effective and efficient methods for characterization, verification and validation is of vital importance for the production of useful quantum devices, and it will remain an integral part of them, serving as their debugging component. The Randomized Benchmarking (RB) protocol [1][2][3][4][5] has become the method par excellence to begin a characterization process, being highly economical in many respects in comparison with, e.g., Quantum Process Tomography [6] or Gate Set Tomography [7], despite providing only a limited amount of information. In essence, RB consists of a series of steps whereby sequences of gates drawn at random are applied and averaged over many sequence runs, with the outputs allowing one to estimate average error rates within the device. The simplicity of the core idea of RB has led to a plethora of methods for different specific purposes, for different sets of gates or under different noise profiles [8].
A remarkable feature of standard RB under simplifying assumptions is that the profile of the data produced by it, which we call an Average Sequence Fidelity (ASF), follows an exponential decay in the number of gates that are applied. Furthermore, in this case, State Preparation and Measurement (SPAM) errors do not affect such decay [1][2][3][4][5]. This is not a universal feature of RB, but rather a consequence of both the particular gates being benchmarked and the approximations made regarding the underlying noise. In the simplest case, these correspond to multi-qubit Clifford gates, and to assumptions such as time-independence, gate-independence and Markovianity 1. Beyond these assumptions, it is known that within the Markovian approximation the decays stay relatively benign: time-dependent ASFs become an exponential with decay rates changing between steps, and gate-dependence simply adds a perturbative term, itself decaying exponentially in sequence length.
Most literature to date on noise characterization techniques, including RB, begins by assuming Markovianity. The characterization of errors beyond this assumption has remained largely unexplored. Due to the very fragility of open quantum systems and the often unstable nature of their surroundings, the Markovian assumption can quickly become ineffective, rendering standard methods unreliable. Indeed, time dependence in effects such as drift [9][10][11], crosstalk [12][13][14], leakage [15][16][17], as well as other memory effects, is a ubiquitous and unavoidable feature [10,11,18-24] of open quantum systems. Therefore, a framework for the characterization of temporally-correlated (non-Markovian) errors is crucial in order to reach scalability and reliability in quantum devices.
In this regime of non-Markovian noise, both particular cases [25][26][27][28] and the general ASF [29] have been studied, showing that the simplicity of RB can still be exploited. These studies, however, are to date restricted to gate sets forming a unitary 2-design, of which the multi-qubit Clifford group is an example [30]. Furthermore, as in the case of multi-qubit Cliffords, these gates are often built up from other, smaller generators, rather than being simple operations themselves. While a generalization to gate sets forming finite groups in the case of Markovian noise has been addressed in quite some generality [8,31,32], it has been hitherto unclear what happens in the non-Markovian scenario, and whether this regime can also be incorporated into a more general RB framework.
In this manuscript, we establish a framework for RB under generic non-Markovian noise, as long as it is gate-independent, for any gate set that forms a finite group admitting a multiplicity-free representation. This framework generalizes both the Markovian and non-Markovian cases for standard gate-independent RB considered thus far, and can be tailored to most specific-purpose RB protocols developed within the Markovian assumption [8]. In particular, it lets us swiftly obtain a general ASF, propose an operational non-Markovian fidelity figure of merit, and obtain conditions under which noise will produce uniquely non-Markovian RB data. Furthermore, given that gate-dependence and more general context-dependence are also generally unavoidable in realistic scenarios, we argue that the Markovian, gate-dependent results of [33,34] do not generalize trivially to the non-Markovian case. Nevertheless, we show stability under small gate-dependent perturbations, and discuss potential ways for general gate-dependence to be incorporated within our framework.
We thus establish a comprehensive, albeit by no means exhaustive, theoretical framework for RB with a wide class of gate sets for temporally-correlated noise at the gate level. Given the pressing need to understand time-dependent noise effects in quantum technologies as these scale up in size and depth, this work seeks to eventually incorporate non-Markovianity within a general framework for RB, in the spirit of [8], or further within a more general framework for quantum device characterization.
We begin with a review of standard RB and non-Markovianity in Section 2. We then present our main results, which are distributed as follows: In Section 3, we obtain a non-Markovian and gate-independent ASF for finite groups which clearly captures the role of the environment in carrying information about the average noise in a RB experiment. We do this by introducing quantities called quality maps, the central objects carrying average noise rates and the temporal correlations therein.
In Section 4, we propose a method to operationally quantify average non-Markovian gate fidelities of full RB sequences by consistently averaging over initial states and measurements.
In Section 5, we obtain sufficient and necessary conditions on the noise in order to witness authentically non-Markovian deviations in the ASF. In particular, we find that the noise on the full environment plus system has to be such that intermediate states increase in purity by at least the square of the size of the system. In Section 6, we incorporate gate-dependence and argue that the results in [33,34], showing that gate-dependence induces a single exponentially vanishing perturbative term on a Markovian ASF, do not carry over to the realm of non-Markovian noise. Nevertheless, we show that both the ASF and the variance of the sequence fidelity remain stable under small gate-dependent perturbations in the noise.
Finally, in Section 7, we give some perspective of our results and propose ways of moving forward in the treatment of non-Markovian noise within characterization protocols and techniques.
2 An overview of Randomized Benchmarking and non-Markovianity

Standard RB
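Consider a sequence of $m$ quantum gates, $\mathcal{G}_m := \bigcirc_{i=1}^{m} \mathcal{C}_i := \mathcal{C}_m \circ \cdots \circ \mathcal{C}_1$, with $\circ$ denoting gate composition, followed by an inverse sequence, i.e., an undo-gate, $\mathcal{C}_{m+1} := \bigcirc_{i=m}^{1} \mathcal{C}_i^{-1}$. This amounts to an overall map that is the identity gate. In the standard gate-independent RB protocol (shown in detail in Appendix A.4), one considers a composition sequence $\mathcal{S}_m := \bigcirc_{i=1}^{m+1} (\Lambda_i \circ \mathcal{C}_i)$, where the Completely Positive Trace Preserving (CPTP) maps $\Lambda_i$ model gate-independent noise inherent to the physical realization of the gates. This is equivalently expressed as
$$\mathcal{S}_m = \Lambda_{m+1} \circ \bigcirc_{i=1}^{m} \left( \mathcal{G}_i^{\dagger} \circ \Lambda_i \circ \mathcal{G}_i \right),$$
with $\mathcal{G}_i := \mathcal{C}_i \circ \cdots \circ \mathcal{C}_1$. Experimentally, for a suitable initial state $\rho$ and a Positive Operator Valued Measurement (POVM) element $M$, the protocol outputs give an estimate of the ASF,
$$\mathcal{F}_m := \mathbb{E}\, \mathrm{tr}\left[ M\, \mathcal{S}_m(\rho) \right], \qquad (1)$$
where averaging, denoted by $\mathbb{E}$, is taken over all gates $\mathcal{G}_i$. This is the central quantity in standard RB.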
Whenever the gates $\mathcal{G}_i$ belong to a unitary 2-design, i.e., a set with second moments identical to those of the uniform unitary group, and when the noise is time-independent, i.e., $\Lambda_1 = \Lambda_2 = \cdots = \Lambda_{m+1}$, the ASF takes the form of a decaying exponential in the sequence length, with the rate of decay capturing the average gate fidelity of the physical gates with respect to the ideal ones 2, and the SPAM errors being absorbed in both a multiplicative and an offset constant. In general, the functional form of the ASF depends not only on the specific gate set to be benchmarked, but also on the assumptions made about the noise. Classes of non-Clifford gate sets have been considered [8,12,31,32,35-38], and the assumptions on the noise have been relaxed, e.g., for time-dependent [4,5,39], gate-dependent [4,33,34,40,41] or non-Markovian noise [27][28][29], although to this day the least explored regime is arguably that of non-Markovian noise.

Non-Markovian Quantum Processes
Non-Markovianity generally refers to a dependence of subsequent outcomes on previous ones. In the context of RB, it implies that the noise at a given step is temporally correlated with the noise that preceded it, and that the data outputs cannot be obtained by modeling noise as local quantum channels. The functional form of the ASF for unitary 2-designs in the non-Markovian regime is no longer that of a decaying exponential in sequence length, but rather that of a non-trivial function of the memory within the noise [29]: this makes the extraction of operationally meaningful figures of merit a more elaborate task than in the Markovian case, despite the simplicity of the RB protocol. Nevertheless, the mere fact that no physical system can be completely isolated from its surroundings requires considering the presence of temporal correlations.
Classically, non-Markovianity can be described by a stochastic process $\{X_t\}$ where information is being sent between timesteps such that the state of the system is conditionally dependent on the past, i.e.,
$$\mathbb{P}(x_k \,|\, x_{k-1}, \ldots, x_0) = \mathbb{P}(x_k \,|\, x_{k-1}, \ldots, x_{k-\ell}) \qquad (2)$$
for any integers $0 \leq \ell \leq k$ and sequences of event outcomes $x_i$, with $\mathbb{P}(\cdot|\cdot)$ denoting a conditional probability. In particular, when $\ell = 1$, the process is called Markovian and when $\ell = 0$ it is called random; otherwise, the process is non-Markovian with Markov order $\ell$. The fact that the complexity in describing non-Markovian processes increases exponentially with increasing Markov order can be seen from joint probabilities requiring up to $\ell$-point correlations within the respective conditional probabilities.
2 There are some caveats to this statement having to do with the concept of gauge-freedom in the representation of the gates; for detail, see e.g., the section "Randomized Benchmarking and Average Fidelity" of [8].
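The Markov condition in Eq. (2) can be checked numerically on small classical examples. The following is a minimal sketch, not part of the original analysis: the three-step binary process, the transition matrix `T`, the initial distribution `p0` and the helper `has_markov_order_one` are all illustrative assumptions. One joint distribution is built from a Markov chain, and a second one lets the last outcome depend on the first outcome instead of the most recent one.

```python
import numpy as np

def has_markov_order_one(joint, atol=1e-12):
    """Check P(x2 | x1, x0) == P(x2 | x1) for a joint pmf indexed [x0, x1, x2]."""
    cond_full = joint / joint.sum(axis=2, keepdims=True)   # P(x2 | x1, x0)
    pair = joint.sum(axis=0)                               # P(x1, x2)
    cond_last = pair / pair.sum(axis=1, keepdims=True)     # P(x2 | x1)
    return bool(np.allclose(cond_full, cond_last[None, :, :], atol=atol))

# Hypothetical initial distribution and transition matrix, T[a, b] = P(next=b | current=a).
p0 = np.array([0.3, 0.7])
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Markov chain: x2 depends on x1 only.
markov_joint = np.einsum("a,ab,bc->abc", p0, T, T)
# Process with memory: x2 depends on x0, skipping x1.
memory_joint = np.einsum("a,ab,ac->abc", p0, T, T)

is_markov = has_markov_order_one(markov_joint)
is_memoryful = not has_markov_order_one(memory_joint)
```

In the second process the conditional distribution of the last outcome still changes once the first outcome is known, which is the classical counterpart of the temporal correlations discussed in this section.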
Quantum mechanically, the process tensor 3 framework [49-56] takes into account the invasive nature of observation to unambiguously provide a generalization of the condition in Eq. (2), as shown in [50]. In this case, the medium for information to be sent across timesteps is an environment $E$, defined by a Hilbert space $\mathcal{H}_E$ that is part of a bipartite closed system $\mathcal{H}_E \otimes \mathcal{H}_S$, with $S$ being the system of interest. Henceforth we set the respective dimensions as $d_E d_S := \dim(\mathcal{H}_E \otimes \mathcal{H}_S)$. Then, for an initial state $\rho$ of $SE$, and upon measuring a POVM $\mathcal{J}_k := \{M^{(k)}_{x_k}\}_{x_k}$ on system $S$, we may describe the probability of observing a sequence of quantum events $x_k, \ldots, x_0$ by
$$\mathbb{P}(x_k, \ldots, x_0 \,|\, \mathcal{J}_k, \ldots, \mathcal{J}_0) = \mathrm{tr}\left[ M^{(k)}_{x_k} \rho_k \right], \qquad (3)$$
where $\rho_k := \mathrm{tr}_E[\,\mathcal{U}_k\, \mathcal{A}^{(k-1)}_{x_{k-1}} \cdots\, \mathcal{U}_1\, \mathcal{A}^{(0)}_{x_0} (\rho)]$ is the state of system $S$ at the $k$-th timestep, with $\mathcal{U}_i$ being unitary maps on $SE$ describing the evolution of the full system between timesteps and $\mathcal{A}_{x_i}$ being Completely Positive (CP) maps acting on system $S$ alone: precisely, each $\mathcal{A}^{(i)}_{x_n}$ is an experimental intervention represented by a CP map with state outcome $x_n$, and such that $\sum_{x_n} \mathcal{A}^{(i)}_{x_n}$ is a CPTP map. We drop the super-indices in Eq. (3) for clarity, which we may write more succinctly as the inner product
$$\mathbb{P}(x_k, \ldots, x_0 \,|\, \mathcal{J}_k, \ldots, \mathcal{J}_0) = \mathrm{tr}\left[ \Upsilon_k \Theta_k^{\mathrm{T}} \right], \qquad (4)$$
where $\mathrm{T}$ denotes a transpose, and $\Upsilon_k$ and $\Theta_k$ are tensors containing all dynamics $\{\mathcal{U}_i\}$ and all interventions $\{\mathcal{A}_i\}$; in the Choi-Jamiołkowski representation, these take the form given in Eq. (5), where $\mathrm{aux} := A_1 B_1 \ldots A_k B_k$, with $A_i$, $B_i$ being $d_S$-dimensional auxiliary spaces, $\mathcal{S}_i$ being a swap map between $S$ and $A_i$, and $\psi = \sum |ii\rangle\langle jj|$ being an unnormalized maximally entangled state.
3 The object we call process tensor is also known in different settings as quantum comb [42,43]
The process tensor framework thus allows one to neatly separate the underlying dynamical source for any given quantum process, including all temporal correlations therein, from all experimentally controllable operations. This description is entirely general as a quantum stochastic process framework [52,55], and similarly the instruments used to describe interventions are entirely general and can be temporally correlated themselves. Similar to the case of quantum states, the choice of employing a Choi state representation in Eq. (5) allows us to readily deduce properties of the process. In particular, temporal properties get codified as spatial properties within the Choi state, so that a Markov process takes an uncorrelated form, $\Upsilon^{(\mathrm{Mkv})} := \bigotimes_i Y_{i:i-1} \otimes \rho_S$, with $Y_{i:i-1}$ being individual Choi states of dynamics connecting the $(i-1)$-th and $i$-th steps. This implies that we may quantify the non-Markovianity of a process by simply quantifying its distinguishability from the closest Markovian one, i.e., $\mathcal{N} := \min_{\Upsilon^{(\mathrm{Mkv})}} d(\Upsilon, \Upsilon^{(\mathrm{Mkv})})$, for any operationally meaningful distance measure $d(\cdot, \cdot)$.

Non-Markovian RB
As stochastic processes are ubiquitous in science, the process tensor framework has proven useful in a wide range of topics, from the foundational [52,57-61] to the applied, in the characterization and control of quantum devices [62][63][64][65]. In the case of RB, it is clear that we can describe the ASF in Eq. (1) as a contraction of process tensors without a need to assume Markovianity for the noise. That is, we now have the ASF of Eq. (7), where $\mathcal{I}_E$ is an identity map on $E$, while $\Lambda$ is now a CPTP map on $SE$ and $\mathcal{G}_i$ acts solely on $S$. The probabilities rendered by the ASF now correspond to Eq. (4) by replacing $\mathcal{U}_i \to \Lambda_i$ in $\Upsilon_m$, and replacing $\mathcal{A}_i \to \mathcal{G}_i^{\dagger} \circ \mathcal{G}_{i-1}$ in $\Theta_m$, which is then averaged over each gate $\mathcal{G}_i$.
As it will prove convenient, henceforth we will employ the superoperator [66] (a.k.a. Liouville [39] or natural [67]) representation of quantum channels, whereby quantum states get represented as vectors and quantum channels as matrices, both in spaces with respective dimensions squared. This is briefly detailed in Appendix A.3.
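As a concrete illustration of this representation, the sketch below vectorizes a single-qubit state and builds the matrix of a channel from its Kraus operators. The column-stacking convention, the amplitude-damping example and the helper names `vec` and `superop` are assumptions made for illustration; Appendix A.3 may use a different but equivalent convention.

```python
import numpy as np

def vec(x):
    """Column-stacking vectorization, so that vec(A X B) = (B^T kron A) vec(X)."""
    return x.reshape(-1, order="F")

def superop(kraus):
    """Matrix of the channel rho -> sum_k K rho K^dagger acting on vec(rho)."""
    return sum(np.kron(k.conj(), k) for k in kraus)

# Hypothetical example channel: amplitude damping with strength gamma.
gamma = 0.2
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

rho = np.array([[0.6, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.4]])

direct = sum(k @ rho @ k.conj().T for k in kraus)   # channel applied to the matrix
via_superop = superop(kraus) @ vec(rho)             # channel applied to the vector
output_trace = vec(np.eye(2)) @ via_superop         # vectorized identity contracts to a trace
```

In this picture, composing channels becomes matrix multiplication, and contracting with the vectorized identity implements a trace, which is how the partial trace over the environment enters the superoperator form of the ASF.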

Notation 1.
In particular, we distinguish the vectorized and the superoperator representation, respectively, by the double ket notation, $|\cdot\rangle\!\rangle$, and by hats on maps, $\hat{\mathcal{X}}$.
That is, the ASF for a non-Markovian RB experiment is equivalently written as in Eq. (9), where $\mathbb{1}_E$ is an identity on a $d_E^2$ vector, with $\langle\!\langle \cdot | := (|\cdot\rangle\!\rangle)^{\mathrm{T}}$ being a co-vector. We will simply denote the spaces by $E$ or $S$, with their dimensionality being implied by context.
In [29], the exact functional form of the non-Markovian gate-independent ASF was computed for unitary 2-designs, and a set of methods was presented to estimate features of the noise. Despite this, such a functional form remains somewhat obscure mathematically. Below we expand the class of gates considered to finite groups, and along the way we obtain an ASF that is much more transparent as to the mechanism behind the data of an RB experiment subject to gate-independent non-Markovian noise.

Non-Markovian Average Sequence Fidelity Beyond Unitary 2-Designs
We now relax the unitary 2-design restriction on the gates to be benchmarked to any finite-group admitting a multiplicity-free representation 4 . For all necessary background details on the representation of finite groups, we refer to Appendix A.1.

Notation 2.
Henceforth, we let $G$ be a finite subgroup of the $d_S$-dimensional unitary group $\mathrm{U}(d_S)$, such that the superoperator representation $\phi$ of the $\mathcal{G}$ gates in Eq. (7) and Eq. (9), which decomposes as
$$\phi(g) \simeq \bigoplus_{\pi \in R_G} \phi_\pi(g)^{\oplus n_\pi}, \qquad (10)$$
is multiplicity-free, i.e., $n_\pi = 1$ for all $\phi_\pi$, and where $\phi_\pi$ are the irreducible representations and $R_G$ is a set of labels for the corresponding irreducible subspaces.
4 Relaxing this restriction can be done similarly as in [8]; an in-depth analysis of this case might eventually be needed, given its experimental relevance, see e.g. [68].
We begin with a proposition and a definition, both of which will make clearer the expression for the ASF of a RB experiment under non-Markovian noise with gates sampled uniformly from the group G.
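Proposition (Subspace φ-twirl). Let $\mathcal{T}_\phi$ denote the so-called twirl associated to the representation $\phi$ of $G$, i.e.,
$$\mathcal{T}_\phi(\hat{\mathcal{X}}) := \frac{1}{|G|} \sum_{g \in G} \phi(g)^{\dagger}\, \hat{\mathcal{X}}\, \phi(g);$$
then, for a given CP map $\Lambda$, we can write
$$\mathcal{T}_\phi(\hat{\Lambda}) = \sum_{\pi \in R_G} \hat{\mathcal{Q}}_\pi \otimes \hat{\mathcal{P}}_\pi, \qquad \hat{\mathcal{Q}}_\pi := \sum_{e, e', \varepsilon, \varepsilon'} f^{ee'\varepsilon\varepsilon'}_{\pi}\, |ee'\rangle\!\rangle\langle\!\langle \varepsilon\varepsilon'|, \qquad (13)$$
where $\hat{\mathcal{P}}_\pi$ is a projector operator onto the irreducible subspace defined by $\phi_\pi$, and where
$$f^{ee'\varepsilon\varepsilon'}_{\pi} := \frac{\mathrm{tr}\left[ \langle\!\langle ee'|\hat{\Lambda}|\varepsilon\varepsilon'\rangle\!\rangle\, \hat{\mathcal{P}}_\pi \right]}{\mathrm{tr}\, \hat{\mathcal{P}}_\pi}, \qquad (14)$$
with all $\{|e\rangle\}$, $\{|e'\rangle\}$, $\{|\varepsilon\rangle\}$, $\{|\varepsilon'\rangle\}$ being arbitrary orthonormal bases for $E$, and with identities on system $S$ being implicit (e.g., $\langle e|$ means $\langle e| \otimes \mathbb{1}_S$).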
The proof of Proposition 3 can be found in Appendix B.1; it follows directly from Schur's lemma applied to the φ-twirl.
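To make the action of the twirl concrete in the simplest setting, the following sketch treats the special case without an environment, using the single-qubit Pauli group as the finite group. This choice of group, the amplitude-damping noise and the helper names `ptm` and `pauli_twirl` are assumptions for illustration only. In the Pauli basis, the superoperator representation of this group splits into four distinct one-dimensional irreducible representations, so the twirl should keep only the diagonal of the noise matrix, and the diagonal entries are the quality factors.

```python
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def ptm(kraus):
    """Superoperator of a channel in the normalized Pauli basis (real 4x4 matrix)."""
    r = np.zeros((4, 4))
    for j, pj in enumerate(PAULIS):
        out = sum(k @ pj @ k.conj().T for k in kraus)
        for i, pi in enumerate(PAULIS):
            r[i, j] = np.real(np.trace(pi @ out)) / 2
    return r

def pauli_twirl(r):
    """Average of G^dagger R G over the four Pauli gates (phases are irrelevant)."""
    gates = [ptm([p]) for p in PAULIS]   # each is a diagonal matrix of signs
    return sum(g.T @ r @ g for g in gates) / len(gates)

# Hypothetical noise: amplitude damping, whose Pauli-basis matrix is not diagonal.
gamma = 0.3
amp_damp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
            np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

noise = ptm(amp_damp)
twirled = pauli_twirl(noise)
quality_factors = np.diag(twirled)   # f_pi = tr(noise P_pi) / tr(P_pi), one per Pauli
```

The off-diagonal entry of `noise` that couples the identity to Z is removed by the twirl, in line with Schur's lemma: only components proportional to the projectors onto irreducible subspaces survive.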
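In the absence of an environment, i.e., with $\Lambda = \Lambda^{(\mathrm{Mkv})}$ being a noise map acting solely on $S$, this is $\mathcal{T}_\phi(\hat{\Lambda}^{(\mathrm{Mkv})}) = \sum_\pi f_\pi \hat{\mathcal{P}}_\pi$, where $f_\pi := \mathrm{tr}(\hat{\Lambda}^{(\mathrm{Mkv})} \hat{\mathcal{P}}_\pi) / \mathrm{tr}(\hat{\mathcal{P}}_\pi)$ was labeled a quality factor in [32] within the context of RB. This motivates the following.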
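A length-1 quality map corresponds to that in Eq. (13), i.e., $\hat{\mathcal{Q}}_{n=1,\pi} = \hat{\mathcal{Q}}_\pi$, and such that $f^{ee'\varepsilon\varepsilon'}_{n=1,\pi} = f^{ee'\varepsilon\varepsilon'}_{\pi}$. For general $n$, the $n$-point quality factor $f^{ee'\varepsilon\varepsilon'}_{n,\pi}$ can be read as the components of an environment-dependent $n$-point function with correlations between maps $\Lambda_1, \ldots, \Lambda_n$, mediated through the environment via the $\epsilon_i$ components. This is because we may equivalently write the $n$-point quality factor as a product of single 1-point quality factors contracted through the environment, that is,
$$f^{ee'\varepsilon\varepsilon'}_{n,\pi} = \sum f^{ee'\epsilon_{n-1}\epsilon'_{n-1}}_{(n),\pi}\, f^{\epsilon_{n-1}\epsilon'_{n-1}\epsilon_{n-2}\epsilon'_{n-2}}_{(n-1),\pi} \cdots f^{\epsilon_1\epsilon'_1\varepsilon\varepsilon'}_{(1),\pi},$$
where $f^{abcd}_{(i),\pi} := \mathrm{tr}(\langle\!\langle ab|\hat{\Lambda}_i|cd\rangle\!\rangle \hat{\mathcal{P}}_\pi) / \mathrm{tr}(\hat{\mathcal{P}}_\pi)$, i.e., the subindex in parentheses refers to the quality factor being associated to the $i$-th noise map $\Lambda_i$, and the sum is over all $\epsilon_i$ and $\epsilon'_i$ indices, with the $e, e', \varepsilon, \varepsilon'$ indices being free (summed over in Eq. (16)).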
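The contraction through the environment can be made explicit in a toy model. The sketch below is an illustration under several assumptions that are not in the original text: a one-qubit environment, a one-qubit system twirled over the Pauli group, unitary noise on the joint space, the Pauli basis on E in place of the $|ee'\rangle\!\rangle$ basis, and the helper names `rotation`, `ptm` and `quality_map`. For the Pauli group each irreducible subspace is one-dimensional, so the length-1 quality map for the sector labelled by a system Pauli is just the corresponding block of the joint noise matrix, and for time-independent noise the length-n quality map is its n-th matrix power.

```python
import numpy as np
from itertools import product

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]
SE_BASIS = [np.kron(e, s) for e, s in product(PAULIS, PAULIS)]  # index 4*e + s

def rotation(pauli, theta):
    """exp(-i theta P) for an operator P that squares to the identity."""
    return np.cos(theta) * np.eye(len(pauli)) - 1j * np.sin(theta) * pauli

def ptm(u, basis):
    """Pauli-basis superoperator of the unitary map rho -> u rho u^dagger."""
    d = u.shape[0]
    return np.array([[np.real(np.trace(bi @ u @ bj @ u.conj().T)) / d
                      for bj in basis] for bi in basis])

def quality_map(r_se, pi):
    """Block Q_pi[e, e'] = R[(e, pi), (e', pi)] of the noise on E (x) S."""
    return r_se.reshape(4, 4, 4, 4)[:, pi, :, pi]

theta = 0.3
u_e, u_s = rotation(Y, 0.4), rotation(X, theta)
r_product = ptm(np.kron(u_e, u_s), SE_BASIS)               # no E-S coupling
r_coupled = ptm(rotation(np.kron(Z, X), theta), SE_BASIS)  # E-S interaction

pi = 3  # sector labelled by the system Pauli Z
q1_product = quality_map(r_product, pi)
q1_coupled = quality_map(r_coupled, pi)
q3_coupled = np.linalg.matrix_power(q1_coupled, 3)  # length-3 map, time-independent noise
```

Without coupling, the quality map factorizes into a scalar quality factor times the environment's own dynamics, so the environment carries no information about the system noise. With the coupling chosen here, the block instead becomes diag(cos 2θ, 1, 1, cos 2θ) in the environment's Pauli basis, which cannot be written as a quality factor times a valid channel on E.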
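In the absence of an environment, with $\Lambda^{(\mathrm{Mkv})}_1, \ldots, \Lambda^{(\mathrm{Mkv})}_m$ acting solely on $S$, the quality map turns into a scalar $\mathcal{Q}_{m,\pi} = \prod_i f_{(i),\pi}$, a product of $m$ quality factors $f_{(i),\pi} = \mathrm{tr}(\hat{\Lambda}^{(\mathrm{Mkv})}_i \hat{\mathcal{P}}_\pi) / \mathrm{tr}(\hat{\mathcal{P}}_\pi)$, which are now all independent of each other. In the context of RB, this renders the ASF as $\mathcal{F}^{(\mathrm{Mkv})}_m = \sum_\pi f_{(1),\pi} \cdots f_{(m),\pi} \langle\!\langle M|\hat{\Lambda}^{(\mathrm{Mkv})}_{m+1} \hat{\mathcal{P}}_\pi|\rho_S\rangle\!\rangle$ for time-dependent noise, or as a linear combination of exponentials $\mathcal{F}^{(\mathrm{Mkv})}_m = \sum_\pi f^m_\pi \langle\!\langle M|\hat{\Lambda}^{(\mathrm{Mkv})} \hat{\mathcal{P}}_\pi|\rho_S\rangle\!\rangle$, as in [31,32], for time-independent noise. For unitary 2-designs, as detailed in Appendix A.6, there are two invariant subspaces with $f_{\pi=1} = 1$ and $f_{\pi=2} = (\mathrm{tr}[\hat{\Lambda}^{(\mathrm{Mkv})}] - 1)/(d_S^2 - 1)$, with the remaining $\langle\!\langle M|\hat{\Lambda}^{(\mathrm{Mkv})} \hat{\mathcal{P}}_\pi|\rho_S\rangle\!\rangle$ corresponding to the SPAM error constants $A$ and $B$. The trace-preserving property of the noise gives rise to the constant unity factor $f_{\pi=1}$ for the trivial subspace; otherwise, for trace non-increasing noise, this quality factor corresponds to a trace-loss quantifier.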
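The unitary 2-design case can be evaluated directly for a single qubit. The sketch below is an illustration under assumed inputs (amplitude-damping noise of strength `gamma`, preparation and measurement both in the state |0>, and hypothetical helper names): it computes the quality factor of the traceless subspace and the two SPAM constants, and assembles the resulting time-independent Markovian ASF.

```python
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def ptm(kraus):
    """Superoperator of a channel in the normalized Pauli basis."""
    r = np.zeros((4, 4))
    for j, pj in enumerate(PAULIS):
        out = sum(k @ pj @ k.conj().T for k in kraus)
        for i, pi in enumerate(PAULIS):
            r[i, j] = np.real(np.trace(pi @ out)) / 2
    return r

def pauli_vector(op):
    """Vectorized Hermitian operator in the normalized Pauli basis."""
    return np.array([np.real(np.trace(p @ op)) for p in PAULIS]) / np.sqrt(2)

gamma, d = 0.1, 2
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
noise = ptm(kraus)

# Quality factor of the traceless subspace, f = (tr[noise] - 1) / (d^2 - 1).
f = (np.trace(noise) - 1) / (d**2 - 1)

# Projectors onto the trivial (identity) and traceless subspaces.
P_trivial = np.diag([1.0, 0.0, 0.0, 0.0])
P_traceless = np.eye(4) - P_trivial

zero = np.array([[1, 0], [0, 0]], dtype=complex)
rho, meas = pauli_vector(zero), pauli_vector(zero)
B = meas @ noise @ P_trivial @ rho      # offset SPAM constant
A = meas @ noise @ P_traceless @ rho    # multiplicative SPAM constant

def asf(m):
    """Markovian, time-independent ASF for a unitary 2-design: A f^m + B."""
    return A * f**m + B
```

For this example the constants come out as A = (1 - γ)/2 and B = (1 + γ)/2, so the sequence fidelity starts at 1 for m = 0 and decays exponentially towards B, with the decay rate set by f alone.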
Still within the Markovian case, as explained in [32], for non-Clifford gate sets the quality factors do not always have the same straightforward interpretation as a noise strength or trace loss, precisely because their contributions to the ASF end up in different irreducible subspaces, although these still generally provide information about the quality of the noisy gates. Turning to the non-Markovian case, we first present the following.
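Theorem (Average Sequence Fidelity). Given a non-Markovian RB sequence $\hat{\mathcal{S}}_m$ of length $m$ as in Eq. (9), with the gates $\mathcal{G}_i$ satisfying Eq. (10) with irreducible representations $\phi_\pi$, and $\Lambda_i$ being CP maps on $SE$, the corresponding ASF with an initial state $\rho$ and measurement POVM element $M$ is given by
$$\mathcal{F}_m = \sum_{\pi \in R_G} \langle\!\langle \mathbb{1}_E \otimes M |\, \hat{\Lambda}_{m+1} \left( \hat{\mathcal{Q}}_{m,\pi} \otimes \hat{\mathcal{P}}_\pi \right) | \rho \rangle\!\rangle, \qquad (19)$$
where $R_G$ is the set of labels for the spaces associated to each irreducible representation, the $\hat{\mathcal{P}}_\pi$ are projector operators onto these, and $\hat{\mathcal{Q}}_{m,\pi}$ is the length-$m$ quality map associated to the maps $\Lambda_1, \Lambda_2, \ldots, \Lambda_m$ on the $\pi$-th irreducible subspace, as per Definition 3.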
The proof follows directly from Proposition 3 and Definition 3 but is presented coherently as well in Appendix B.1.
Despite dealing with a more general and abstract scenario than that of the unitary 2-design case, the ASF in Theorem 3 is conceptually rather simple, in that information about intermediate noise, including all its correlations, is carried by the length-$m$ quality maps $\mathcal{Q}_{m,\pi}$ through the environment across the $\pi$ irreducible subspaces. Similar to the Markov case, these subspaces might be thought of as quality sectors for the noise, with the trivial subspace corresponding to the ideal noiseless case. As shown in detail in Appendix B.2, the unitary 2-design case reduces to the expression obtained in [29], written in terms of maps $\mathcal{A}_m$ and $\mathcal{B}_m$ that contain corresponding quality maps related to the depolarizing effect of the average over system $S$. Now, given that in realistic scenarios temporal correlations are effectively finite [69,70], a relevant case is that in which the ASF displays non-Markovian behavior only over a length smaller than the full sequence length. We then have the following.
Corollary (Finite non-Markovianity). Let the noise be of the form in Eq. (20), where $\Gamma$ is a CP map between $SE$ spaces and $\Phi^{(\mathrm{Mkv})}$ is a CP map solely between $S$ spaces. Then, for an RB sequence of length $m$ with time-independent noise, i.e., $\Lambda_i = \Lambda_j$ for all $i \neq j$, the ASF can be written in terms of the quality factors $f_\pi$ associated to $\Phi^{(\mathrm{Mkv})}$ and the quality maps $\mathcal{Q}_{n,\pi}$ associated to $n$ copies of $\Gamma$. The proof is shown in detail in Appendix B.6 for the more general time-dependent case, where the noise at the $i$-th step is modeled similarly. In particular, Eq. (21) becomes a relevant perturbative non-Markovian expansion for $q$ between 0 and 1/2. This may serve to analyze finite non-Markovian deviations on smaller sequence length intervals whenever a model of the form in Eq. (20) is available. In any case, Eq. (22) describes a finite-memory ASF with a limited number of steps displaying non-Markovian deviations; this serves as a generalization of the finite-memory case for 2-designs introduced in [29], and similarly may be used to estimate the Markov order, as well as several related features, for finite-memory effects in quantum devices.
In general, the central quantity capturing average noise in non-Markovian RB remains the quality maps Q m,π . In the following section we propose a way to operationally extract information about these through a slight modification to the standard RB protocol.

Average Process Fidelity
The main advantage of RB is that decays of the ASF can be straightforwardly estimated experimentally; in particular in the Markovian case, obtaining relevant figures of merit, such as the average gate fidelity of the physical gates relative to the ideal ones, is reduced to a fitting problem. In the case of finite groups, the fitting has to be done to a linear combination of exponentials: this can be achieved via a modified RB protocol called Character Randomized Benchmarking [32], allowing to operationally estimate the individual decays over each irreducible subspace (including the non multiplicity-free case [8,68]). While in the non-Markovian case it is possible to execute an analogous Character RB protocol, as pointed out in Appendix B.7, the fact that the individual irreducible parts of the ASF are a non-trivial function of the environment, remains.
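In the case of Markovian, local noise in system $S$, the so-called average gate fidelity of physical gates, which we can model as $\Lambda^{(\mathrm{Mkv})} \circ \mathcal{C}_i$, with respect to the ideal gates $\mathcal{C}_i$, is equivalent to that of $\Lambda^{(\mathrm{Mkv})}$ with respect to the identity $\mathcal{I}_S$, which is defined as
$$\mathsf{F}(\Lambda^{(\mathrm{Mkv})}) := \int d\psi\, \langle\psi|\, \Lambda^{(\mathrm{Mkv})}(|\psi\rangle\langle\psi|)\, |\psi\rangle, \qquad (23)$$
with pure states $|\psi\rangle \in \mathcal{H}_S$. As in [32], it can be shown that this gate fidelity is related to the data outputs of the RB protocol as
$$\mathsf{F}(\Lambda^{(\mathrm{Mkv})}) = \frac{d_S^{-1} \sum_{\pi \in R_G} \mathrm{tr}(\hat{\mathcal{P}}_\pi)\, f_\pi + 1}{d_S + 1}, \qquad (24)$$
where $f_\pi$ here are the quality factors of $\Lambda^{(\mathrm{Mkv})}$, which can be estimated individually through Character RB.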
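The relation between quality factors and the average gate fidelity can be checked numerically for a single qubit. The sketch below is illustrative and rests on assumptions not in the original text: amplitude-damping noise, the two unitary 2-design subspaces (trivial and traceless), and an exact state average over the six Pauli eigenstates, which is valid because these states form a state 2-design and the integrand is quadratic in the state. The two routes are computed independently.

```python
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def ptm(kraus):
    """Superoperator of a channel in the normalized Pauli basis."""
    r = np.zeros((4, 4))
    for j, pj in enumerate(PAULIS):
        out = sum(k @ pj @ k.conj().T for k in kraus)
        for i, pi in enumerate(PAULIS):
            r[i, j] = np.real(np.trace(pi @ out)) / 2
    return r

def fidelity_from_quality_factors(r, d=2):
    """(sum_pi tr(P_pi) f_pi / d + 1) / (d + 1) with the two 2-design subspaces."""
    f_trivial = r[0, 0]                                  # tr(P) = 1
    f_traceless = (np.trace(r) - r[0, 0]) / (d**2 - 1)   # tr(P) = d^2 - 1
    weighted = 1 * f_trivial + (d**2 - 1) * f_traceless
    return (weighted / d + 1) / (d + 1)

def fidelity_from_states(kraus):
    """Average of <psi|Lambda(|psi><psi|)|psi> over the six Pauli eigenstates."""
    s = 1 / np.sqrt(2)
    states = [np.array(v, dtype=complex) for v in
              ([1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s])]
    values = []
    for psi in states:
        out = sum(k @ np.outer(psi, psi.conj()) @ k.conj().T for k in kraus)
        values.append(np.real(psi.conj() @ out @ psi))
    return float(np.mean(values))

gamma = 0.25
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

fid_quality = fidelity_from_quality_factors(ptm(kraus))
fid_states = fidelity_from_states(kraus)
```

Both routes give the same number for this channel, which illustrates why, in the Markovian case, estimating the individual quality factors is enough to recover the average gate fidelity.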
For the non-Markovian case, however, the quality map is dependent on the environment $E$, and furthermore, SPAM errors might correlate $S$ with $E$, thus affecting the final fidelity outputs. This stresses the need for a figure of merit benchmarking full quantum noise processes, rather than individual noise rates, whenever errors are temporally-correlated. In a sense, too, it is the error correlations within SPAM that matter, rather than the local errors themselves.
Frameworks generalizing the RB theory and technique, such as [71] and [72], have been proposed and could provide new ideas to tackle this problem. Here we point out the following. Let us define the operator $\hat{\mathcal{F}}^S_{m,\pi}$ as in Eq. (25), where $\Lambda_0$, acting on the full $SE$, encodes state preparation errors and correlations, for some fiducial pure state $\varepsilon$ of $E$, and so that Eq. (19) turns into $\mathcal{F}_m = \sum_\pi \langle\!\langle M|\hat{\mathcal{F}}^S_{m,\pi}|\rho_S\rangle\!\rangle$ for some prepared initial state $\rho_S$ of $S$. We can regard the output data of the ASF as a distribution in initial states $\rho_S$ and measurements $M$, which are the parameters we can still fix; further averaging over initial states and POVM elements is equivalent to obtaining the average gate fidelity of $\sum_\pi \hat{\mathcal{F}}^S_{m,\pi}$, and individual instances for each subspace, $\hat{\mathcal{F}}^S_{m,\pi}$, can be estimated via Character RB. As per the definition in Eq. (23), for simplicity, we may consider first randomizing both the initial state and the measurement element with a fixed unitary map $\mathcal{U}$, drawn uniformly at random from a unitary 2-design, so that averaging over it gives the average gate fidelity of the map $\hat{\mathcal{F}}^S_{m,\pi}$ with respect to the identity. That is, we take the initial state and the measurement as in Eq. (26), where $\mathcal{N}$ and $\mathcal{M}$ are noise CP maps and $\mathcal{U}$ is a unitary map, all acting solely between $S$ spaces, and with $|\psi\rangle$, $|r\rangle$ an arbitrary vector and an orthonormal basis vector of $S$, respectively. The initial state $\rho_S$ is now a random initial noisy state with pure target state $|\psi\rangle$, and $M_r$ is the $r$-th element of the random POVM [73] $\{M_r\}$. The local SPAM noise, $\mathcal{N}$ and $\mathcal{M}$, which we take as independent of $\mathcal{U}$, are due to randomization and can now simply be absorbed in $\Lambda_0$ and $\Lambda_{m+1}$ in Eq. (25). This could equivalently be done by drawing the gates $\mathcal{G}_1 = \mathcal{G}_m = \mathcal{U}$ on each run of the RB protocol from a unitary 2-design.
As detailed in Appendix C, averaging over such initial states and measurements, and taking the noise to be trace-preserving, we get Eqs. (27)-(29), which directly give an average gate fidelity for the full $m$-step noisy process, taking into account all correlations within, including those induced by the SPAM noise. Ideally, the target initial state and measurement should satisfy $\langle r|\psi\rangle = 1$, where no information about the average gate fidelity is obtained.
In particular, Eq. (28) is such that Eq. (30) holds, where $\hat{\mathcal{S}} := \hat{\Lambda}'_0\, \hat{\mathcal{S}}_\varepsilon\, \hat{\mathrm{tr}}_E\, \hat{\Lambda}'_{m+1}$ is a SPAM-only operator, here with $\hat{\mathcal{S}}_\varepsilon := (|\varepsilon\rangle \otimes \mathbb{1}) \otimes (|\varepsilon\rangle \otimes \mathbb{1})^*$ and primed operators, $\Lambda'_0$ and $\Lambda'_{m+1}$, indicating the absorbed $\mathcal{N}$ and $\mathcal{M}$ local SPAM noise, respectively. While Eq. (27) already gives an operationally meaningful average gate fidelity for a full RB process (more rightly called an average process fidelity), one may estimate separately the noise influence from SPAM-induced correlations from that within the quality map of the process via Character RB and Eq. (30), albeit requiring a prior estimate for the weight of the SPAM noise.
Whenever the noise is effectively Markovian, clearly, averaging over initial states and measurements simply amounts to averaging the SPAM terms; while redundant when only the gate set is of interest, this could also serve to estimate systematic average SPAM error rates. The average process fidelity in Eq. (28) for the Markov case simply turns the trace into a product of quality parameters, including that of the SPAM noise; for the unitary 2-design case these can be related back to individual average gate fidelities, and the ASF is simply $\mathbb{E}_{\rho_S, M_r}[\mathcal{F}_m] = \alpha p^m q + \beta$, for $\alpha$, $\beta$ constants depending only on the inner product of targets $|\langle r|\psi\rangle|^2$, and $q$ the average noise-strength of the SPAM noise. This is worked out in detail in Appendix C.1.
A crucial difference now is that not only the decay profile of the ASF, but also the average fidelity for the whole process, depends non-trivially on the sequence length $m$ through the environment via the quality maps $\mathcal{Q}_{m,\pi}$. Then again, this is a consequence of the presence of multi-time correlations. While this implies that it is not possible to fit a simple function of sequence length to the ASF, it can serve as a diagnosis for non-Markovianity and a means to quantify it, as discussed next.

Signature of non-Markovian noise in Randomized Benchmarking
A first signature of non-Markovianity in a RB experiment is the display of deviations from an exponential decay in the ASF [29]. However, deviations might not be evident or significant, either statistically after short sequence lengths, or fundamentally due to the noise itself, making the experiment blind to non-Markovianity, or the noise might be time-dependent but Markovian, not displaying multi-time correlations but rather some arbitrary temporal dependence changing the exponential rates of decay of the ASF data. Similarly, gate-dependence generally generates deviations from an exponential [34,39]. Anyhow, using the framework established before, we may look for unique signatures of non-Markovianity.
In [39] it is pointed out that, in the context of RB, if some of the (quality) parameters are observed to be greater than 1, the experimental noise must be non-Markovian. Wallman and Flammia notice that the decay of the ASF for time-dependent Markovian noise is given by products of time-dependent quality parameters, so that these can only change the rate of decay, as all of them are upper-bounded by unity. Having an increase within the ASF with increasing sequence length thus points to non-Markovianity. As can be seen with the analysis presented above, the same will hold for RB with the finite group $G$, with the decay specified by $f_{(1),\pi} \cdots f_{(m),\pi}$, and with each quality factor satisfying $f_{(i),\pi} = \mathrm{tr}(\hat{\Lambda} \hat{\mathcal{P}}_\pi) / \mathrm{tr}(\hat{\mathcal{P}}_\pi) \leq 1$ for all $\pi$, due to $\Lambda$ being trace non-increasing, or in particular trace-preserving.
That is, in general having $\mathcal{F}_n > \mathcal{F}_m$ for any $n > m$ must imply a non-Markovian effect, as this cannot be explained within the Markovian framework 5. Indeed, in the following lemma we give sufficient and necessary conditions, which can only be satisfied in a non-Markovian framework, for the condition $\mathcal{F}_n > \mathcal{F}_m$ with $n > m$ to occur.

Lemma (Conditions for non-monotonic ASF).
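Let $n, m$ be positive integers such that $n > m$, and let $\mathcal{F}_n$ and $\mathcal{F}_m$ be two ASFs corresponding to the same underlying noise process described by CPTP maps $\Lambda_1, \ldots, \Lambda_m$ and $\Lambda_1, \ldots, \Lambda_n$, respectively, acting on $SE$. A sufficient condition for $\mathcal{F}_n > \mathcal{F}_m$ is given by Eq. (31), to hold for all $\pi$, where $\mathcal{Q}_{m,\pi}$ and $\mathcal{Q}_{n,\pi}$ are the quality maps corresponding to $\mathcal{F}_m$ and $\mathcal{F}_n$, respectively; i.e., it is a condition on the difference of the matrix quality maps. The condition in Eq. (32) is necessary, where $\|\cdot\|$ denotes the operator norm, here corresponding to the maximum singular value.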
The proof is shown in Appendix D, mainly relying on the inequalities $\mathrm{tr}(XY) \leq \|X\| \mathrm{tr}(Y)$ for positive $X, Y$, and $\|\mathrm{tr}_A(X_{AB})\| \leq d_A \|X_{AB}\|$ proven in [74]. While Eq. (31) is an expected consequence of all information about the average noise being carried within the quality maps, Eq. (32) places a relevant necessary constraint on the noise represented by the $\Lambda_i$ maps. Clearly, the conditions in Eq. (31) and Eq. (32) cannot be satisfied within Markovianity, i.e., if the quality maps are quality factors and if the noise maps act solely on subsystem $S$.
In particular, consider time-independent noise, $\Lambda_i = \Lambda_j = \Lambda$ for all $i \neq j$, and $n = m + 1$; then Eq. (32), now $\|\Lambda\| > d_S$, says that the maximum increase of purity by the map Λ on the full SE state at the corresponding step must be over $d_S^2$, which already rules out, for example, coherent (unitary) noise. For an increase in the ASF after a sequence length $n - m > 0$ with time-independent noise, we get $\|\Lambda\| > d_S^{1/(n-m)}$. Moreover, this is relevant as well when using the inequality $\|X\| \leq \sqrt{d}$ for any CPTP map acting between d-dimensional spaces (Theorem II.I in [75]), as it implies that the environment needs to be larger than the system, $d_E > d_S$, for an increase to be witnessed in a subsequent step. In general, as the trace terms in Eq. (32) are in practice normally different from zero, and almost always close to $\dim_{ES}^2$, this condition can be interpreted analogously, as requiring a corresponding minimum increase of purity by the noise, with the ratio of traces being close to unity.
While Markovian time-dependence can be obtained via a superfluous environment getting discarded between steps, non-Markovianity implies a time-dependence such that an environment correlates timesteps with each other. In particular, with Markovian time-dependence, one can estimate average gate fidelities over arbitrary sequence length intervals, as also shown in [39], because temporal modularity is such that products of quality factors satisfy $(f_1 \cdots f_n)/(f_1 \cdots f_m) = f_{m+1} \cdots f_n$ for any $n > m$ (and for all π, which we omitted here). An analogous property is not satisfied by quality maps, precisely due to any given step depending on all previous ones.
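As a minimal sketch of the two Markovian properties just described, the following Python snippet builds the decay from a list of hypothetical time-dependent quality factors (assumed here to lie in [0, 1]), checks the temporal modularity of their products, and flags steps at which a sequence of ASF values increases. The numerical values and the helper names `markovian_decay` and `revival_steps` are illustrative assumptions, not part of the protocol.

```python
from itertools import accumulate
from operator import mul


def markovian_decay(quality_factors):
    """Cumulative products f_1, f_1*f_2, ..., f_1*...*f_m for a fixed irrep label."""
    return list(accumulate(quality_factors, mul))


def revival_steps(asf):
    """Indices m at which the next value exceeds the current one, asf[m + 1] > asf[m]."""
    return [m for m in range(len(asf) - 1) if asf[m + 1] > asf[m]]


# Hypothetical time-dependent Markovian quality factors, all in [0, 1].
factors = [0.99, 0.97, 0.98, 0.95, 0.99]
decay = markovian_decay(factors)

# Temporal modularity: (f_1...f_n)/(f_1...f_m) equals f_{m+1}...f_n.
m, n = 2, 5
ratio = decay[n - 1] / decay[m - 1]
tail = markovian_decay(factors[m:n])[-1]

# Hypothetical ASF values with an increase, which no such product can produce.
asf_with_revival = [0.90, 0.80, 0.74, 0.77, 0.70]

print(revival_steps(decay))             # empty list for the Markovian product
print(revival_steps(asf_with_revival))  # the step preceding the increase
```

In an experiment the ASF values carry statistical uncertainty, so an observed increase would need to be significant with respect to the error bars before being attributed to non-Markovianity.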
Understanding such unique non-Markovian deviations more deeply could be relevant for using them to one's advantage, whether for control, mitigation or otherwise, of the noise in question. Up to this point, however, we have assumed gate-independence for the noise; a remaining question is thus: what is the impact of gate-dependence, and/or context errors, when considered together with non-Markovianity?

6 Gate-dependence in non-Markovian Randomized Benchmarking
In [33,34], it was shown that, when taking into account gate-dependence within the noise and assuming Markovianity, the ASF for an RB experiment with unitary 2-designs behaves as an exponential plus a perturbative term due to gate-dependence, which itself decays exponentially in sequence length. This result is further extended in [32] to RB with the group G as we have considered here, with the same conclusion. Further analyses in more generality can also be seen in Refs. [8,40]. This result, however, does not extend to the case of non-Markovian noise.
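Here, to make this point$^6$, we follow the argument in [33], which starts by noticing that, instead of using the Λ maps, we could have chosen to model the noisy gates as $\mathcal{J} := \mathcal{L} \circ \mathcal{G} \circ \mathcal{R}$ for CP maps $\mathcal{L}$ and $\mathcal{R}$; this would render an ASF equivalent to the standard Markovian one with unitary 2-designs. Having gate-dependent noise means that either $\mathcal{L}$, $\mathcal{R}$, or both, depend on $\mathcal{G}$, and we can write, e.g., $\mathcal{J}^{(g)} := \mathcal{L}_g \circ \mathcal{G} \circ \mathcal{R}$. More generally, then, we may define the map $\Delta_g := \mathcal{J}^{(g)} - \mathcal{J}$ capturing all gate-dependence in the noisy gates. Denoting $\hat{X}_{j:i} = \hat{X}_j \cdots \hat{X}_i$, we can expand the corresponding noisy sequence as

$$\hat{\mathcal{J}}^{(g)}_{m+1:1} = (\hat{\mathcal{J}}_{m+1} + \hat{\Delta}_{m+1}) \cdots (\hat{\mathcal{J}}_1 + \hat{\Delta}_1)$$
$$\qquad\quad = \hat{\mathcal{J}}_{m+1:1} + \hat{\Delta}_{m+1:1} + (\text{terms mixing } \hat{\mathcal{J}}_i \text{ and } \hat{\Delta}_j), \qquad (33)$$

where here $\Delta_i = \Delta_{g_i}$, and where the second line follows from the (multi) binomial theorem. Of particular relevance is the rightmost term in the last line in Eq. (33), which mixes $\mathcal{J}$ and Δ terms: the result in [33] (and its generalization in [32]) implies that for Markovian noise, such mixed terms do not contribute to the ASF, and that the term $\Delta_{m+1:1}$ gives a contribution that vanishes exponentially in sequence length.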
Generalizing this result to the group G, as in [32], but with non-Markovian noise, would require the existence of $\mathcal{L}$ and $\mathcal{R}$ satisfying the properties in Eq. (34), where $\mathbb{E}$ here again is uniform averaging over the group G, and with $\hat{D}_g$ defined through some (length-1) quality map $Q_\pi$, and where $\hat{P}_\pi$ is a projector onto the irreducible subspace defined by the representation $\phi_\pi$ of G as in Eq. (10).
Following [32], we start by plugging the definition of $\hat{D}_g$ into Eq. (34a) and Eq. (34b), together with the multiplicity-free decomposition in Eq. (10). Now both $\hat{P}_\pi$ and φ(g) act solely on S, so that, without loss of generality and for all λ, the corresponding identities can be written explicitly; we can then vectorize both sides$^7$ and reorder spaces. At this point, for Markovian noise, the quality map $Q_\pi$ is just a quality factor $f_\pi$, thus giving right and left eigenvalue equations for the operator $\mathbb{E}[(\phi_\pi(g)^* \otimes \hat{\mathcal{J}}^{(g)})^T]$, and unfolding the proof of the existence of $\mathcal{L}$, $\mathcal{R}$ operators satisfying all of Eq. (34). This is not possible in general for the non-Markovian case, simply because of the presence of the environment E. Furthermore, Eq. (39) makes manifest that average gate-dependent errors will get carried within E through quality maps.
$^7$ By means of $\mathrm{vec}(AXB) = (A \otimes B^T)\,\mathrm{vec}(X)$; this follows from the definition we employ here, $\mathrm{vec}(|i\rangle\langle j|) := |ij\rangle$.
We may notice, however, that if the gate-dependence is small, in the sense of the physical implementation being a perturbation of strength ε of the gate-independent one, with ε scaled such that $\|\Delta_g\| \leq 1$, then the ASF and its variance remain close to the unperturbed ones. This can be shown as in [39] (see Appendix E.2): the ASF corresponding to a sequence of perturbed gates $\hat{\mathcal{J}}^{(\epsilon,g)}_{m+1:1}$ stays within a prescribed distance of $F_m$ whenever ε is small enough, where in particular, with non-Markovian noise, the scaling with increasing $d_E$ also requires a smaller ε, in a sense also pointing out the relevance of the environment for general, not necessarily small, gate-dependence. For the variance, similarly, for some $\delta_V > 0$, an analogous condition on ε bounds the difference between the variances of the unperturbed and perturbed sequence fidelities. This bound would thus require a perturbation four times smaller for a change in variance of the same magnitude as a change in the average of the sequence fidelity. Both bounds are almost the same as in [39], with the extra $d_E$ factor being a consequence of the noise acting on the whole SE; the fact that RB is done only on subsystem S is of no consequence, as pointed out in Appendix E.2.
The main upshot is that gate-dependence or more general contextual effects, together with non-Markovian noise, in RB and almost surely in any other technique, would require dealing further with the environment. One possible way forward could be to study non-Markovian gate-dependent RB with Fourier analysis, similar to [40]; this has the potential to be not only more compact and simpler, but also to allow for more generality and to yield deeper consequences.

Conclusions
We have established a Randomized Benchmarking (RB) framework for non-Markovian noise with gate sets forming a finite group that admits a multiplicity-free representation. Despite this being a more general and abstract case than that of unitary 2-designs [29], it renders a much clearer functional form of the Average Sequence Fidelity (ASF), described in terms of quality maps, which we identify as the objects carrying average noise through the environment that mediates temporal correlations. Quality maps naturally generalize the concept of quality parameters [32], from Markovian to non-Markovian RB, as the central quantities capturing average noise rates within the ASF. The main difference between quality maps and quality factors is that the former capture all temporal correlations within an RB experiment; as such, all information to be known about the intrinsic noise is carried through these, with all remaining quantities being experimental choices for the benchmarked gates, initial state and measurement.
The main obstacle to operationally extracting average error rates from quality maps can be traced back to the environment. Furthermore, the mere fact that prepared initial states and measurements can give rise to correlated State Preparation and Measurement (SPAM) errors appears as a downside to employing the RB technique under this type of noise. Nevertheless, we have provided a means to bridge this gap and obtain an operational estimate of gate fidelities for full non-Markovian RB processes by simply extending the RB protocol to include a coherent averaging over initial states and measurements. This effectively removes the environmental functional dependence and renders a single figure of merit as an average fidelity of the full RB process, including the correlated SPAM. Further estimates of average noise of gates alone can then potentially be made through Character RB [32] or otherwise.
In non-Markovian RB, given that noise rates at a given timestep will depend on all previous ones, gate fidelities for single gates make sense only when relative to a whole noise process. This prompts pointing out that purely non-Markovian behavior in RB can be detected, and accounted for as above, whenever fidelities are seen to increase in subsequent timesteps. We obtained conditions under which these deviations can be observed, and in particular we highlight that it is necessary for the noise at a given step to increase the purity of the full SE state at such step by over $d_S^2$, together with having an environment with a size over that of the system, $d_E > d_S$, to witness such an increase in a subsequent step.
Finally, arguably one of the major hurdles to be cleared in any benchmarking or characterization technique, is incorporating non-Markovianity together with gate-dependence or a more general context-dependence. We noticed that, while the results of [32,33], showing that gate-dependence introduces a single perturbative term in the ASF (itself decaying exponentially in sequence length) do not extend trivially to the non-Markovian case, the stability bound of [39] on the variance of the sequence fidelity still holds in the non-Markovian case when incorporating the size of the environment, and similarly for the ASF.
An overall highlight in our analysis is that the role of the environment in non-Markovian RB is to carry temporal correlations within the noise, as well as gate-dependence and potentially more general context dependence, through quality maps: further studying these quantities could give relevant insights into non-Markovian error mitigation or other advantageous uses of memory effects. This could be done, e.g., for specific microscopic models of noise stemming from the open quantum systems literature [76,77], or with general approaches, such as treating RB as convolution [40]. While non-Markovianity has stayed relatively in the dark when considered in the context of noise characterization techniques, it almost certainly can be smoothly incorporated into a generalized framework for RB in the spirit of [8], and further allow for a more comprehensive understanding and control of memory effects in quantum technologies.

A Background preliminaries
Here we present some of the essential background related to representation theory of finite groups as well as randomized benchmarking; for the first, references such as [78,79] can be consulted for an in-depth treatment, while for RB we mainly follow the results of [32].

A.1 Representations of finite groups
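A group is a set G and an operation · satisfying:
(Closure) For all $g_1, g_2 \in G$, also $g_1 \cdot g_2 \in G$.
(Associativity) For all $g_1, g_2, g_3 \in G$, $(g_1 \cdot g_2) \cdot g_3 = g_1 \cdot (g_2 \cdot g_3)$.
(Identity) There exists e ∈ G such that e · g = g · e = g for all g ∈ G.
(Inverse) For every g ∈ G there exists $g^{-1} \in G$ such that $g \cdot g^{-1} = g^{-1} \cdot g = e$.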
Henceforth we omit the · symbol and take group multiplication to be the group operation.
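For concreteness, the standard group axioms (closure, associativity, identity and inverses) can be checked by brute force on a small example. The sketch below does so for the symmetric group on three elements, with permutations stored as tuples; the choice of group and the function names `compose` and `is_group` are assumptions made only for illustration.

```python
from itertools import permutations, product


def compose(p, q):
    """Composition of permutations stored as tuples: (p . q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def is_group(elements, op):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    closure = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    if not closure:
        return False
    associative = all(op(op(a, b), c) == op(a, op(b, c))
                      for a, b, c in product(elements, repeat=3))
    identities = [e for e in elements
                  if all(op(e, g) == g and op(g, e) == g for g in elements)]
    if not associative or len(identities) != 1:
        return False
    e = identities[0]
    return all(any(op(g, h) == e and op(h, g) == e for h in elements)
               for g in elements)


S3 = list(permutations(range(3)))   # the six permutations of three objects
print(is_group(S3, compose))        # the full set satisfies all axioms
print(is_group(S3[:3], compose))    # an arbitrary subset generally does not
```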
A representation of a group provides a way of dealing with abstract groups as linear transformations of a vector space. In particular, we restrict ourselves to representations by unitary linear operators in U(V) for a complex finite-dimensional vector space V, and we may define a representation of a group G as a map $\phi : G \to U(V)$ such that $\phi(g_1 g_2) = \phi(g_1)\phi(g_2)$ for all $g_1, g_2 \in G$, i.e., with the group operation being preserved under matrix multiplication of φ. Here we will normally take V to be a vector space of d × d matrices.
We call a representation φ reducible if there exists some transformation S such that $S\phi(g)S^{-1} = \phi_1(g) \oplus \phi_2(g)$ for all g ∈ G, with $\phi_1$ and $\phi_2$ representations of smaller dimension. Otherwise, a representation is called irreducible. This implies that any reducible representation can be written as a direct sum of irreducible ones: more generally, Maschke's theorem [79] ensures that, for all g ∈ G, $\phi(g) \simeq \bigoplus_{\pi \in R_G} \phi_\pi(g)^{\oplus \mu_\pi}$, where $R_G$ is a set of labels for the irreducible representations, and $\mu_\pi$ is a non-negative integer denoting the multiplicity (number of equivalent copies) of $\phi_\pi$. In this manuscript we deal solely with groups admitting multiplicity-free representations, meaning $\mu_\pi = 1$ for all π.
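A small numerical illustration may help here. The sketch below, under the assumption that we take the permutation matrices of the symmetric group on three elements as the representation, verifies the homomorphism property φ(g1 g2) = φ(g1)φ(g2) and exhibits an invariant vector, which shows that this representation is reducible (it contains the trivial representation). All names are illustrative.

```python
from itertools import permutations, product


def compose(p, q):
    """Composition of permutations stored as tuples: (p . q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def perm_matrix(p):
    """Permutation matrix sending basis vector e_j to e_{p[j]}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


S3 = list(permutations(range(3)))

# Homomorphism property: phi(p . q) = phi(p) phi(q) for all group elements.
is_homomorphism = all(
    perm_matrix(compose(p, q)) == matmul(perm_matrix(p), perm_matrix(q))
    for p, q in product(S3, repeat=2)
)

# The all-ones vector is fixed by every phi(g): a one-dimensional invariant
# subspace, so the representation is reducible.
ones = [[1], [1], [1]]
has_invariant_vector = all(matmul(perm_matrix(p), ones) == ones for p in S3)

print(is_homomorphism, has_invariant_vector)
```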

A.2 Schur's lemma and twirling
A central result in representation theory is known as Schur's lemma: in a nutshell, it states that the only matrices that commute with all elements of an irreducible representation of a group are constant matrices (i.e., scalar multiples of 1). This is stated as in Lemma 1 of [12], which in turn refers to [78] for detailed proofs; similarly other standard literature can be consulted for an in-depth treatment, e.g., [79]. Here we employ a consequence of Schur's lemma, rather than the lemma itself, applying to the so-called twirl of an operator given a representation of a group.
Let G be a finite group and V be some finite-dimensional complex vector space as above. For a representation φ of G, the twirl $T_\phi$ is defined by $T_\phi(A) := \frac{1}{|G|}\sum_{g \in G} \phi(g)\, A\, \phi(g)^\dagger$ for all linear maps A : V → V.

Lemma (Lemma 1 in [32]). Let G be a finite group and let φ be a multiplicity-free representation of G on a complex vector space V with decomposition $\phi(g) \simeq \bigoplus_{\pi \in R_G} \phi_\pi(g)$ into inequivalent irreducible subrepresentations $\phi_\pi$. Then for any linear map A : V → V the twirl of A over G takes the form $T_\phi(A) = \sum_{\pi \in R_G} \frac{\mathrm{tr}(A P_\pi)}{\mathrm{tr}(P_\pi)} P_\pi$, where $P_\pi$ is the projector onto the support of the representation $\phi_\pi$.
The multiplicity-free requirement is relaxed in [8] in the context of Markovian randomized benchmarking, but here we will solely focus on the former case.
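The statement of the lemma can be checked numerically on a toy case. The sketch below assumes the two-element group represented by {1, Z} on a qubit, which splits into two inequivalent one-dimensional irreducible representations with projectors |0><0| and |1><1|, and compares the explicit group average with the projector formula. The example matrix and helper names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def twirl(rep, A):
    """Group average (1/|G|) sum_g phi(g) A phi(g)^dagger."""
    n = len(A)
    total = [[0j] * n for _ in range(n)]
    for U in rep:
        term = matmul(matmul(U, A), dagger(U))
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return [[x / len(rep) for x in row] for row in total]


def twirl_from_projectors(projectors, A):
    """Projector form sum_pi tr(A P_pi)/tr(P_pi) P_pi of the twirl."""
    n = len(A)
    out = [[0j] * n for _ in range(n)]
    for P in projectors:
        weight = trace(matmul(A, P)) / trace(P)
        out = [[out[i][j] + weight * P[i][j] for j in range(n)] for i in range(n)]
    return out


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


identity = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
P0 = [[1, 0], [0, 0]]   # support of the trivial irrep
P1 = [[0, 0], [0, 1]]   # support of the sign irrep
A = [[1, 2 + 1j], [3, 4]]   # arbitrary example matrix

lhs = twirl([identity, Z], A)
rhs = twirl_from_projectors([P0, P1], A)
print(close(lhs, rhs))
```

In this example the twirl simply removes the off-diagonal entries of A, which is what the projector formula predicts for two one-dimensional irreducible subspaces.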

A.3 The superoperator representation
To apply Lemma A.2, we may use the so-called superoperator representation (a.k.a. natural, Liouville or vectorized representation). The idea is to represent quantum channels as matrices acting on vectorized states (in an extended space of dimension squared). It is similar in spirit to the Choi-Jamiołkowski representation [67], in the sense that it maps channels to matrices, although different in that it does not necessarily map them to a quantum state (density matrix). Here we are mainly going to need the definition $\mathrm{vec}(|i\rangle\langle j|) := |ij\rangle$, so that generally $|X\rangle\rangle := \mathrm{vec}(X)$, for any matrix X. This implies that for a Completely Positive (CP) map Φ, $|\Phi(X)\rangle\rangle = \hat{\Phi}|X\rangle\rangle$, where, for Kraus operators $\varphi_\mu$ of Φ, $\hat{\Phi} = \sum_\mu \varphi_\mu \otimes \varphi_\mu^*$, and where * here means entry-wise conjugate$^8$. The Liouville representation of Φ is here denoted by $\hat{\Phi}$, and we will generally distinguish maps, $\mathcal{X}$, when they are in the Liouville representation with a circumflex on top, $\hat{\mathcal{X}}$. The main point here is that quantum maps become matrices in this representation, which will let us work with them more easily without necessarily trying to extract properties of the maps in question. A clear exposition of this representation can be found e.g., in Section V.B of [39].
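The row-stacking convention and the resulting matrix form of a channel can be verified directly. The following sketch assumes the convention vec(|i><j|) = |ij> stated above, checks the identity vec(AXB) = (A ⊗ B^T) vec(X) on arbitrary integer matrices, and builds the Liouville matrix of a channel from its Kraus operators as the sum of K ⊗ K*; the completely dephasing channel used as an example, and all function names, are assumptions for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def conj(A):
    return [[complex(x).conjugate() for x in row] for row in A]


def kron(A, B):
    """Kronecker product with entry A[i][j]*B[k][l] at row (i,k), column (j,l)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def vec(X):
    """Row-stacking vectorization, so that vec(|i><j|) = |ij>."""
    return [x for row in X for x in row]


def apply(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]


def liouville(kraus_ops):
    """Liouville matrix sum_mu K_mu (x) conj(K_mu) in the row-stacking convention."""
    dim = len(kraus_ops[0]) ** 2
    total = [[0j] * dim for _ in range(dim)]
    for K in kraus_ops:
        term = kron(K, conj(K))
        total = [[total[i][j] + term[i][j] for j in range(dim)] for i in range(dim)]
    return total


def channel(kraus_ops, rho):
    """Direct action sum_mu K_mu rho K_mu^dagger."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for K in kraus_ops:
        term = matmul(matmul(K, rho), transpose(conj(K)))
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out


A = [[1, 2], [3, 4]]
X = [[0, 1], [5, 2]]
B = [[2, 0], [1, 3]]
identity_holds = vec(matmul(matmul(A, X), B)) == apply(kron(A, transpose(B)), vec(X))

# Example channel (an assumption): complete dephasing, Kraus operators |0><0| and |1><1|.
kraus = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
rho = [[0.5, 0.5], [0.5, 0.5]]
liouville_matches = apply(liouville(kraus), vec(rho)) == vec(channel(kraus, rho))

print(identity_holds, liouville_matches)
```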

A.4 The standard randomized benchmarking protocol
A standard Randomized Benchmarking (RB) protocol proceeds as follows:
1. Prepare an initial state ρ on the system of interest S.
2. Sample m distinct elements, $C_1, C_2, \ldots, C_m$, uniformly at random from a given gate set K containing the corresponding inverse elements. Let $C_{m+1} := C_1^\dagger \circ \cdots \circ C_m^\dagger$, where ∘ denotes composition of maps. We refer to $C_{m+1}$ as an undo-gate.
3. Apply the composition $\bigcirc_{i=1}^{m+1} C_i$ on ρ. In practice, this amounts to applying a noisy sequence $S_m := \bigcirc_{i=1}^{m+1} J_i$ of length m on ρ, where $J_i$ are the physical noisy gates associated to $C_i$.
4. Estimate the sequence fidelity $\mathrm{tr}[M S_m(\rho)]$ for a measurement with POVM element M.
5. Repeat the previous steps for many random sequences of the same length m and average the results to obtain the Average Sequence Fidelity (ASF), $F_m$.
6. Examine the behavior of the ASF, $F_m$, over different sequence lengths m.
$^8$ This follows from $\mathrm{vec}(AXB) = (A \otimes B^T)\,\mathrm{vec}(X)$. Depending on the definition of the vec map, one may find this property with the right-hand side arranged differently. The definition in Eq. (50) effectively stacks the rows of the matrix in a vector. Another common definition is $\mathrm{vec}(|a\rangle\langle b|) = |ba\rangle$, which looks odd in bra-ket notation but is natural at the level of matrices, because it stacks the columns of the matrix into the vector; this definition would lead to a slightly different Liouville representation of quantum maps.
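The steps above can be sketched as a toy simulation. The snippet below assumes a single qubit, the Pauli matrices as the gate set, Markovian gate-independent depolarizing noise of strength p after every gate, and ideal state preparation and measurement; gates are sampled with replacement for simplicity. These are illustrative assumptions and not the noise models studied in this work. Under them the sequence fidelity is the same for every sequence, (1 + p^(m+1))/2, which the code reproduces.

```python
import random

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]  # hypothetical gate set, closed under inverses


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def noisy_gate(U, rho, p):
    """Ideal gate U followed by depolarizing noise of strength p (assumed model)."""
    out = matmul(matmul(U, rho), dagger(U))
    tr = out[0][0] + out[1][1]
    return [[p * out[i][j] + (1 - p) * tr * I2[i][j] / 2 for j in range(2)]
            for i in range(2)]


def sequence_fidelity(m, p, rng):
    gates = [rng.choice(PAULIS) for _ in range(m)]   # step 2: random gates
    total = I2
    for U in gates:
        total = matmul(U, total)                     # product U_m ... U_1
    gates.append(dagger(total))                      # step 2: undo-gate
    rho = [[1, 0], [0, 0]]                           # step 1: rho = |0><0|
    for U in gates:                                  # step 3: noisy sequence
        rho = noisy_gate(U, rho, p)
    return complex(rho[0][0]).real                   # step 4: M = |0><0|


def average_sequence_fidelity(m, p, num_sequences=50, seed=0):
    rng = random.Random(seed)                        # step 5: average over sequences
    return sum(sequence_fidelity(m, p, rng)
               for _ in range(num_sequences)) / num_sequences


p = 0.95
asf = {m: average_sequence_fidelity(m, p) for m in (1, 5, 10, 20)}
for m, value in asf.items():                         # step 6: decay in m
    print(m, round(value, 6), round(0.5 + 0.5 * p ** (m + 1), 6))
```

With this convention the noise acts m + 1 times (including the undo-gate), so the exponent is m + 1; the extra factor of p would usually be absorbed into the SPAM constants of the fit.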

A.5 Markovian finite-group time-independent average sequence fidelity
In standard, Markovian, time-independent and gate-independent RB, we have a sequence $S_m = \bigcirc_{i=1}^{m+1} \Lambda \circ C_i$, where $C_{m+1} := C_1^\dagger \circ \cdots \circ C_m^\dagger$ with $C(\cdot) = G(\cdot)G^\dagger$ and here $C^\dagger(\cdot) = G^\dagger(\cdot)G$, both for G ∈ G unitary representations being the target gates, and where the Completely Positive Trace Preserving (CPTP) map Λ models the noise. Now, consider a change of variables $G_j = \bigcirc_{i=1}^{j} C_i$, in terms of which the sequence can be equivalently rewritten. Thus we have, for an initial state ρ and a measurement with POVM element M, over a sequence length m, the ASF $F_m(\rho, M) = \mathbb{E}\,\mathrm{tr}[M S_m(\rho)]$, where $\mathbb{E}$ implicitly means uniform average over the gates G. We will henceforth omit the explicit ρ and M dependence.
We can equivalently express Eq. (56) in the superoperator representation. Then it follows by Lemma A.2 that the ASF decomposes into a sum over π of terms decaying as $f_\pi^m$, where here, as done in the main text and as we do onward, we denote by $\hat{P}_\pi$ the projector onto the irreducible space corresponding to $\phi_\pi$, and where $f_\pi := \mathrm{tr}(\hat{\Lambda}\hat{P}_\pi)/\mathrm{tr}(\hat{P}_\pi)$ is called a quality parameter, as in [32].

A.6 Standard case: Markovian, time-independent, 2-design
Consider the case of the group G forming at least a 2-design, i.e., when uniformly averaging over it gives the same result as uniformly averaging over the whole unitary group. When we move to the superoperator representation, we may take the gates to the form $\hat{G} = G \otimes G^*$ for unitary G of dimension $d_S$ and where $G^*$ denotes entry-wise complex conjugate; this is a representation with support on $\mathbb{C}^{d_S} \otimes \mathbb{C}^{*d_S}$, which can be decomposed into invariant subspaces having projectors (see e.g. [80,81]) $\hat{P}_1 = \Psi$ and $\hat{P}_2 = \mathbb{1} - \Psi$, where $\Psi = \sum_{i,j}|ii\rangle\langle jj|/d_S$ can be thought of as a partial transpose after a swap (and normalized). Hence too, assuming that Λ is also trace-preserving, the quality parameters are $f_1 = 1$ and $f_2 = (\sum_\mu |\mathrm{tr}\,\lambda_\mu|^2 - 1)/(d_S^2 - 1) =: p$, where here $\lambda_\mu$ are the Kraus operators of the S noise map Λ. Noticing that $\Psi|\rho\rangle\rangle = \Psi\,\mathrm{vec}(\rho) = \sum_i |ii\rangle/d_S = |\mathbb{1}/d_S\rangle\rangle$, which is simply the superoperator representation of $\mathbb{1}/d_S$, this gives the standard result of a single exponential decay in p.

B Finite group non-Markovian randomized benchmarking

B.1 Time-independent non-Markovian noise
With this setup, the only difference for the case of non-Markovian RB is that now the noise Λ acts jointly on a system S and an environment E, with respective dimensions $d_S$ and $d_E$, at every step of the RB sequence, and the gates G are now assumed to act solely on S [29]. Keep in mind that if we employ the superoperator representation, the environment will implicitly have two copies$^9$. So we may now apply Schur's lemma on subspace S for each set of environment indices, where $|e\rangle, |e'\rangle, |\epsilon\rangle, |\epsilon'\rangle$ are E orthonormal basis vectors and where now we define $f^{ee'\epsilon\epsilon'}_\pi := \mathrm{tr}(\hat{\Lambda}^{ee'\epsilon\epsilon'}_S \hat{P}_\pi)/\mathrm{tr}(\hat{P}_\pi)$, with $\hat{\Lambda}^{ee'\epsilon\epsilon'}_S$ the corresponding environment matrix elements of $\hat{\Lambda}$, where here $\lambda_\mu$ are the Kraus operators of the CP map Λ, which is acting on the full SE system (i.e., there is an implicit identity operator on S). Hence, for the case of time-independent non-Markovian noise, the ASF involves $f^{ee'\epsilon\epsilon'}_m$, which is now a quality factor containing all correlated individual quality factors defined in Eq. (66). Notice that a time-independence in the sense of all Λ maps being identical at each timestep does not provide any simplification given the environmental dependence in each quality factor.

B.2 Standard case: non-Markovian, time-independent, 2-design
For the unitary 2-design case, as in Section A.6, we haveP 1 = Ψ andP 2 = 1 − Ψ where here again Ψ = |ii jj|/d S . Now, however, where here λ µ are Kraus operators of the full SE noise map Λ, and the maps $ Λ and Θ Λ are exactly those defined in [29], The compositions of the $ and Θ maps are now simply products: first, we have for m = 2, so that it follows that f ee εε m,π=1 = e|Θ •m Λ (|ε ε |)|e . Similarly, and thus similarly,

B.3 From quality parameters to quality maps
In general, we can think of the whole term $Q_{m,\pi} := \sum_{e,e',\epsilon,\epsilon'} f^{ee'\epsilon\epsilon'}_{m,\pi} |ee'\rangle\langle\epsilon\epsilon'|$, where $f^{ee'\epsilon\epsilon'}_{m,\pi}$ is the quality factor defined in Eq. (68), as the Liouville representation of an object that we may label as a quality map, or equivalently we can label this object generally as a quality tensor, given that each $f_{m,\pi}$ can be thought of precisely as a Matrix Product Operator (MPO) with quality parameter nodes and environment bonds.
We may then equivalently write the full non-Markovian ASF in terms of the quality maps $Q_{m,\pi}$ and the maps $\mathcal{P}_\pi$ associated to the projectors $\hat{P}_\pi$: these will depend on each particular group case and in general they are not projective maps (as one would initially be led to think), e.g., for the unitary 2-design case $\hat{P}_1 = \Psi$ corresponds to $\mathcal{P}_1(\cdot) = \frac{\mathbb{1}}{d_S}\mathrm{tr}(\cdot)$ and $\hat{P}_2 = \mathbb{1} - \Psi$ corresponds to $\mathcal{P}_2(\cdot) = (\cdot) - \frac{\mathbb{1}}{d_S}\mathrm{tr}(\cdot)$.
where here λ (i) µ are the Kraus operators of the respective SE map Λ i . Notice that non-Markovianity is a time-dependence in itself, with the time-labels constituting an extra layer of time-dependence where the noise maps Λ can themselves differ arbitrarily between timesteps. The crucial difference between Markovian and non-Markovian time-dependence, however, is that the Markovian one simply refers to an explicit dependence on timesteps but not to a temporal correlation between these: this is clear if one compares Eq. (85) with its non-Markovian counterpart in Eq. (87): it is apparent that Markovian time-dependent ASFs can be reproduced by non-Markovian time-independent ones with a large enough environment.

B.5 The quality map in the Markov limit
Consider a single step, m = 1, and two distinct ASF, one Markovian and the other non-Markovian. Suppose for the Markov case we have some CP map Φ acting on S such that while for some other non-Markovian case we have an SE map Λ such that The two quantities are actually somewhat akin: let Φ have a Stinespring dilation representation Φ(·) = tr E [U(ν ⊗ ·)] for some superfluous environment E, an SE unitary channel U and a pure state ν := |ν ν| on E. Then the Kraus operators of Φ are φ µ = µ|U|ν , so that The non-Markovian quality parameter will reduce to something of this form in the Markov limit: indeed, in such a case, within the full ASF, we can interchange the partial trace with the noise due to the undo gate, which would have the form Λ 2 = I E ⊗ Λ (Mkv) 2 , thus rendering a term δ ee , and hence reducing effectively to Eq. (91). For the general sequence length case, this limit essentially corresponds to Markovianizing the process tensor as described in [29], i.e., tracing out the environment at every step.

B.6 Average sequence fidelity finite memory perturbation
Consider now noise maps that take a convex combination form, where Φ i acts solely on system S while Λ i acts on SE, and we are interested in the case where q i is small and much less than 1/2. Because the Kraus operators of this map are of the form ν }, the quality parameters are where the dependence on the noise maps is explicit.
Thus, consider first the case m = 2; we will now omit the noise map dependence, as it should be clear whether these refer for Λ i or Φ i whenever they have upper E indices or otherwise, let us also omit the π subindices for now and let r i := (1 − q i ) to shorten notation, then where Γ here denotes Γ i , Γ j , . . . , Γ m with i < j < m and similarly for Λ and Φ.
If the noise is time-independent then the general m case will simply be a binomial expansion. Otherwise we may express the non-Markov contribution as a sum over permutations of Markov and non-Markov quality parameters. Let us look then at m = 3, again keeping in mind that $r_i := (1 - q_i)$, where the notation with arguments $(\ldots, \Lambda_{i_m})$ must be such that $i_1 < \cdots < i_\ell < i_{\ell+1} < \cdots < i_m$. This is irrelevant in the Markovian quality parameters and could be relaxed, but an extra factor would need to be added to avoid over-counting. The notation is suggestive because in the static case these will just correspond to binomial coefficients and powers of the Markovian quality parameters as $\binom{m}{\ell}(1-q)^{\ell} q^{m-\ell} f^{\ell} f^{ee'\epsilon\epsilon'}_{m-\ell}$. We may thus write the general m case, and hence the ASF for this noise model with any finite group, as a sum over ℓ of such terms, where the first term is the single Markovian contribution, dominant whenever q ≈ 0.
For static noise, Γ := Γ i = Γ j , this reduces to which more easily can be read as a correction to the Markovian ASF with contributions at each intermediate step.
We can further relabel $\ell' = m - \ell$ to obtain a binomial expansion dominated by the leading terms whenever $0 < q \ll 1/2$, i.e.
In this sense this constitutes a perturbative expansion with finite-memory corrections to the ASF.
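To illustrate why the expansion is dominated by its leading terms, the sketch below computes only the binomial weights for static noise, assuming that a term with ℓ non-Markovian steps out of m carries the weight C(m, ℓ)(1 − q)^(m−ℓ) q^ℓ. The quality parameters multiplying each weight are left out, and the values of m and q are hypothetical.

```python
from math import comb


def memory_weights(m, q):
    """Binomial weights C(m, l) (1 - q)^(m - l) q^l for l = 0, ..., m memory steps."""
    return [comb(m, l) * (1 - q) ** (m - l) * q ** l for l in range(m + 1)]


m, q = 10, 0.01          # hypothetical sequence length and mixing probability
weights = memory_weights(m, q)

for l, w in enumerate(weights[:4]):
    print(l, f"{w:.3e}")  # weights fall off quickly with the number of memory steps
```

For small q, truncating the sum after the first few values of ℓ therefore keeps almost all of the weight, which is the sense in which the finite-memory terms act as corrections to the Markovian ASF.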

B.7 Character randomized benchmarking
The character of a group G associated to the rep φ is a function χ φ : G → R. In particular, it satisfies withP φ the projector onto the support of all the subreps of G equivalent to φ and |φ| is the dimension of the representation φ.
In particular, the core idea in the Character RB protocol of [32] is that we can attach a gate from a subgroup of the RB group that we are benchmarking to isolate individual quality parameters.
Explicitly, fix a $\lambda \in R_G$; now consider H ⊂ G such that the Liouville representation has a subrepresentation $\hat{\phi}$, with character function $\chi_{\hat{\phi}}$, that has support inside the rep $\phi_\lambda$ of G. This means $\hat{P}_{\hat{\phi}} \subset \hat{P}_\lambda$. Then, attaching some extra gate h ∈ H to the original RB sequence, we can estimate the corresponding character-weighted average, which isolates the part of the ASF with quality tensor $\hat{Q}_{m,\lambda}$. Whereas in the Markovian case one can readily fit single exponentials and extract average gate fidelities, for the non-Markovian case a further analysis or adapted protocol to extract the noise maps would be required, e.g., as we show, the averaging over initial states and measurements.

C Quality parameters and process fidelity
Single quality parameters arise from twirling via Schur's lemma, and the usual average gate fidelity of some channel Φ with the identity can also be directly related to a twirl. In fact, from the definition of the uniform average gate fidelity of a map Φ with respect to the identity, where µ is the Haar measure and U is a uniformly drawn random unitary, and given the formula for operators on a d-dimensional space, one can deduce that We may just employ invariance of the trace under a twirl with a subgroup G of the unitary group so that via Schur's lemma, In the standard case of Markovian RB with unitary 2-designs this gives a simple relation of the average gate fidelity with the noise-strength p because the only quality parameters are f 1 = 1 and f 2 = p, the latter of which contains the term tr[Φ] such that and for this reason, in this scenario it suffices to run a standard RB protocol to extract average gate fidelities of the noise. More generally for finite groups (with a multiplicity-free representation) character RB described in Section B.7 extracts each quality factor, allowing to estimate the respective average gate fidelity.
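As a numerical companion to this relation, the sketch below uses the standard formula for the average gate fidelity of a channel with the identity in terms of the trace of its Liouville matrix, F = (tr Φ̂ + d)/(d(d + 1)), together with the 2-design quality parameter f_2 = (tr Φ̂ − 1)/(d² − 1) for trace-preserving noise. The depolarizing channel and the function names are assumptions chosen for illustration.

```python
def liouville_trace_depolarizing(p, d):
    """Trace of the Liouville matrix of rho -> p*rho + (1 - p)*tr(rho)*1/d."""
    return p * d ** 2 + (1 - p)


def average_gate_fidelity(liouville_trace, d):
    """Average gate fidelity with the identity: (tr Phi_hat + d) / (d (d + 1))."""
    return (liouville_trace + d) / (d * (d + 1))


def two_design_quality_factor(liouville_trace, d):
    """f_2 = tr(Phi_hat (1 - Psi)) / tr(1 - Psi) for trace-preserving noise."""
    return (liouville_trace - 1) / (d ** 2 - 1)


p, d = 0.98, 2           # hypothetical noise strength and system dimension
t = liouville_trace_depolarizing(p, d)
fidelity = average_gate_fidelity(t, d)
f2 = two_design_quality_factor(t, d)

print(f2, fidelity, p + (1 - p) / d)
```

For this channel the quality parameter coincides with p and the fidelity reduces to p + (1 − p)/d, which is why fitting the single exponential decay suffices in the Markovian 2-design case.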
For the non-Markovian case, however, after an RB protocol, the quality tensor is still cloistered between the noise due to the undo gate and the state preparation error, both of which can act on the full SE. While in the Markovian case these two can be absorbed as State Preparation and Measurement (SPAM) errors not affecting the ASF decay, this is not the case for non-Markovian noise. Furthermore, non-Markovianity as a time-dependence is explicit in the quality map as all quality factors are correlated with each other through E, as discussed in Appendix B.4.
Thus we propose quantifying the average fidelity of a noise process as a whole by means of RB. Whenever this average fidelity factorizes, we will recover the standard Markovian case; this points to the possibility as well that other clear functional forms of this process fidelity could be identified for certain types of noise. Consider then the following. Suppose we have a non-Markov ASF given as in Eq. (83); we care about how well the inputs are being mapped to outputs on average, thus let us define a map F S m,π such that i.e., for some CPTP map Λ 0 and a fiducial pure state ε of E, we have The map Λ 0 thus explicitly refers to a state preparation error correlating the input solely in space S with the environment E. The previous discussion then means we wish to extract the average gate fidelity of the map F S m,π . The ASF contains only two free inputs which are chosen arbitrarily, ρ S and M : if we average over these, we could effectively extract the gate fidelity of F S m,π . Of course, this added step in the RB protocol would only be relevant in the non-Markov case, i.e., when deviations from an exponential are observed, as otherwise it would simply amount of averaging the SPAM terms.
To extract an average gate fidelity we require a second moment over a given distribution, so consider where U is a unitary belonging to a 2-design, sampled at uniformly at random, {|s } d S s=1 is a basis vector of S, so that M is the s th -POVM element of such random POVM [73], |ψ is a pure state in S, and each N x is some CPTP map modeling SPAM due to the randomization. Overall we would now have a distribution of ASFs specified by Then we can average over these inputs and measurements so that denoting by K (m,π) µ the Kraus operators of F S m,π and by N (x) µ the ones of N x , we get where we defined K (m,π) β as the Kraus operators of the composition The maps F S m,π and F S m,π are essentially equivalent, as we may simply absorb the N on the respective Λ maps by defining where the N maps act solely on S.
means the average gate fidelity of π F S m,π ; we can now employ Eq. (105) to get The full map F S m,π sends S input states to S output states, so it must be CP; we can now further demand the whole of π F S m,π to be CPTP, so that tr[ π αβγ K (m,π) αγβ K † (m,π) αγβ ] = d S ; this implies the noise being trace-preserving, then where the second line can be identified noticing that is still a quantity containing the RB protocol, i.e., average noise, with respect to any finite group.
The dependence in m will generally be non-trivial, as it is directly related to a dependence of the environment E, except for specific cases or whenever there are assumptions on the noise, e.g., for Markov time-independent noise, and other particular cases might also display some useful functional structure.

C.1 Process fidelity -The Markovian case
Clearly when averaging over initial states and measurements, the ASF of a Markovian RB experiment should simply average the SPAM terms. This might still be useful in case average SPAM error rates are of interest. First, Eq. (117) turns into with all maps now acting solely on S. Then, which amounts exactly to uniformly averaging the SPAM terms over noisy measurement elements and initial states. Now, however, we have where f is a quality factor of the SPAM noise. Here clearly all of the m-steps are being taken into account and the SPAM errors are directly incorporated: this is because the average over initial states and measurements will always contain both the Λ 0 and Λ m+1 maps.
For the unitary 2-design / Clifford case, with time-independence and CPTP noise, it can be seen that where being the corresponding SPAM quality factor or noise-strength. (128) In the Markovian time-dependent case, it is possible to estimate average gate (in)fidelities on arbitrary sequence length intervals [39] by taking ratios of ASFs, precisely because these do not depend on all previous steps, i.e., the quality factors satisfy for m 2 > m 1 . As we argue in Appendix B.4, time-dependent Markovian ASFs are contained in non-Markovian ones, either time-independent or time-dependent, a crucial difference, however, is that quality factors in time-dependent Markovian ASFs only change its rate of decay, so it is not possible to obtain a higher ASF in increasing sequence lengths, while there is virtually no reason why non-Markovian ASFs should also be constrained in this way.
Consider then n > m such that F n > F m for RB experiments with the exact same group and noise up to step m. This would imply that which will hold wheneverΛ n+1 Q n,π ⊗ 1 −Λ m+1 Q m,π ⊗ 1 0 (130) i.e. where the matrix difference on the left side is positive definite for all π, in turn implying tr tr S Λ n+1 Q n,π − tr tr S Λ m+1 Q m,π > 0, where the sum here is over all environment e, indices. Let us first consider the simplest case, n = m+1, so that εε |Λ E m+1 |ee > 0, and furthermore, consider finite memory such that effectively m = 1, i.e. (1),π f 1 1 εε which by definition means and so, d S tr(Λ 2 ) tr P π < tr Λ 2 Λ E 3 ⊗P π , where furthermore, we may use tr(XY ) ≤ X tr(Y ) for positive semidefinite X, Y , on the right-handside, so that tr(Λ 3 ) Λ 2 > d S tr(Λ 2 ).
The general form of inequality (135) is now clear for any $m$ and $n$, and simplifies to $\|\Lambda_{m+1}\| \cdots \|\Lambda_n\|\, \mathrm{tr}(\Lambda_{n+1}) > d_S\, \mathrm{tr}(\Lambda_{m+1})$, repeatedly employing $\mathrm{tr}(XY) \leq \|X\|\, \mathrm{tr}(Y)$ for positive semidefinite $X$, $Y$. Both trace terms are lower-bounded by zero, although realistically they are less than, and close to, $d_E d_S$.
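The trace inequality used repeatedly above can be checked numerically. The sketch below assumes that the norm in $\mathrm{tr}(XY) \leq \|X\|\,\mathrm{tr}(Y)$ is the operator (spectral) norm, and restricts to random real $2 \times 2$ positive semidefinite matrices so that the largest eigenvalue has a closed form; all helper names are hypothetical.

```python
import random
from math import sqrt

def random_psd_2x2(rng):
    """Return A^T A for a random real 2x2 matrix A (PSD by construction)."""
    a, b, c, d = (rng.uniform(-1, 1) for _ in range(4))
    off = a * b + c * d
    return [[a * a + c * c, off], [off, b * b + d * d]]

def trace(M):
    return M[0][0] + M[1][1]

def trace_of_product(X, Y):
    """tr(XY) computed entrywise, without forming the product."""
    return sum(X[i][j] * Y[j][i] for i in range(2) for j in range(2))

def operator_norm_psd(M):
    """Largest eigenvalue of a symmetric PSD 2x2 matrix (closed form)."""
    t = trace(M)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return (t + sqrt(max(t * t - 4 * det, 0.0))) / 2

def check_trace_bound(trials=1000, seed=0):
    """Return True if tr(XY) <= ||X|| tr(Y) held on every random PSD pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        X, Y = random_psd_2x2(rng), random_psd_2x2(rng)
        # Small tolerance absorbs floating-point rounding only.
        if trace_of_product(X, Y) > operator_norm_psd(X) * trace(Y) + 1e-12:
            return False
    return True
```

Calling `check_trace_bound()` samples a thousand pairs; a single violation would return `False`. The check is only a sanity test of the inequality, not of the noise maps themselves.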
In particular, for time-independent noise this reduces to which says that $\Lambda$ must be able to increase the purity of its input by at least $d_S^{2/(n-m)}$, ruling out, e.g., coherent noise. Furthermore, by $\|X\| \leq \sqrt{d}$ for a CPTP map $X$ acting between $d$-dimensional spaces (theorem II.I in [75]), one gets a lower bound on the environment dimension $d_E$.

E Gate-dependence

E.1 General case - gate-dependence is taken to the environment
Suppose now that, instead of modeling the noisy gates through the maps $\Lambda$, we choose to employ $\mathcal{J} := \mathcal{L} \circ \mathcal{G} \circ \mathcal{R}$ for some given CP maps $\mathcal{L}$ and $\mathcal{R}$. A time-independent and gate-independent RB sequence then reads which produces an ASF equivalent to the one using the noise maps $\Lambda$, here with the outermost noisy maps accounting for SPAM errors. We may now incorporate gate-dependence by taking where $\Delta_g$ is the map containing the gate-dependent contribution, which is not necessarily small. In the non-Markovian case, $\mathcal{L}$, $\mathcal{R}$, $\Delta_g$, and $\mathcal{J}$ all act on SE, while the ideal $\mathcal{G}$ acts on S alone.
Let us now denote $\mathcal{X}_{j:i} = \mathcal{X}_j \circ \cdots \circ \mathcal{X}_i$, and $X_{j:i} = X_j \cdots X_i$, for any maps $\mathcal{X}_i, \ldots, \mathcal{X}_j$ or matrices $X_i, \ldots, X_j$. We can now expand the corresponding noisy sequence as where $\Delta_i = \Delta_{g_i}$, the second line follows from the multi-binomial theorem, and the last line contains terms mixing $\mathcal{J}$ and $\Delta$. The results in [32,33] imply that, for Markovian noise, such mixed terms do not contribute to the ASF, and that the term $\Delta_{m+1:1}$ gives a contribution that vanishes exponentially in sequence length.
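The structure of this expansion can be made concrete with a small numerical sketch. It assumes noisy gates of the form $J_i + \Delta_i$ represented by random $2 \times 2$ real matrices (purely illustrative stand-ins, not actual noise maps) and checks that the ordered product of the sums equals the sum over all bit strings $s$, where position $i$ carries $\Delta_i$ if $s_i = 1$ and $J_i$ otherwise. Terms are grouped by the number of $\Delta$ factors, i.e. the Hamming weight of $s$.

```python
import random
from itertools import product
from math import comb

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def ordered_product(mats):
    """Return X_k ... X_1 for mats = [X_1, ..., X_k] (later factors on the left)."""
    out = mats[0]
    for M in mats[1:]:
        out = matmul(M, out)
    return out

def expand_by_order(J, D):
    """Group the 2^k terms of prod_i (J_i + D_i) by their number of D factors."""
    k, dim = len(J), len(J[0])
    by_order = {n: [[0.0] * dim for _ in range(dim)] for n in range(k + 1)}
    counts = {n: 0 for n in range(k + 1)}
    for s in product((0, 1), repeat=k):
        term = ordered_product([D[i] if bit else J[i] for i, bit in enumerate(s)])
        n = sum(s)  # Hamming weight of the bit string s
        by_order[n] = matadd(by_order[n], term)
        counts[n] += 1
    return by_order, counts

rng = random.Random(1)
dim, k = 2, 4  # k plays the role of m + 1

def rand_mat():
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

J = [rand_mat() for _ in range(k)]
D = [rand_mat() for _ in range(k)]

by_order, counts = expand_by_order(J, D)
full = ordered_product([matadd(Ji, Di) for Ji, Di in zip(J, D)])

# Re-sum all orders and compare with the direct product.
resummed = by_order[0]
for n in range(1, k + 1):
    resummed = matadd(resummed, by_order[n])
max_err = max(abs(full[i][j] - resummed[i][j])
              for i in range(dim) for j in range(dim))
```

Here `by_order[0]` is the pure $J$ product, `by_order[k]` is the pure $\Delta$ product, and the intermediate orders collect the mixed terms; `counts[n]` equals the binomial coefficient `comb(k, n)`.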
Extending the previous result to a non-Markovian ASF in gate-dependent noise would require $\mathcal{L}$ and $\mathcal{R}$ to satisfy the properties where averages are again uniform over the group G, with for some E-to-E maps $Q_\pi$.
In the Markov case, these are simply eigenvalue equations (as the quality factor is a scalar), so the solutions for $\mathcal{L}$ and $\mathcal{R}$ follow from their being eigenvectors of the average operator on the left-hand side.

E.2 Stability under gate-dependent perturbations
While this is not possible in the non-Markovian case, we may consider a perturbative gate-dependence and look at how the ASF and the variance of the sequence fidelity change. As we now show, this can be done in a similar way as for the Markov and Clifford case in [39]. That is, take the noisy gates to be for some perturbation parameter scaled such that $\|\Delta_g\| \leq 1$ for all $g$, and denote the sequence fidelity by where $Z_m$ is the gate-independent sequence fidelity and where $\Xi^{(n)}_m$ contains gate-dependent contributions to $n$th order in the perturbation parameter, i.e., all terms containing $n$ factors of $\Delta$ in the gate sequence. Specifically, we can write which is equivalent to the second and third terms of Eq. (143), just highlighting the number $n$ of $\Delta$ terms; here $H(s)$ is the number of bits equal to 1 in the bit string $s$ (more generally, its Hamming weight) and $\Delta_i := \Delta_{g_i}$. Now then, the gate-dependent ASF, $F^{(g)}_m$, has the form and the gate-dependent term satisfies the following,