Relaxation of Multitime Statistics in Quantum Systems

Equilibrium statistical mechanics provides powerful tools to understand physics at the macroscale. Yet the question remains how this can be justified from a microscopic quantum description. Here, we extend the ideas of pure-state quantum statistical mechanics, which focus on single-time statistics, to show the equilibration of isolated quantum processes. Namely, we show that most multitime observables for sufficiently large times cannot distinguish a nonequilibrium process from an equilibrium one, unless the system is probed for an extremely large number of times or the observable is particularly fine-grained. A corollary of our results is that the degree of non-Markovianity and other multitime characteristics of a nonequilibrium process also equilibrate.


Introduction
Figure 1: (a) A general quantum process Υ composed of infinitesimal unitary evolution operators U_δ and some initial state ρ. (b) A multitime instrument A_k, corresponding to any sequence of quantum operations (including measurements), is applied at times k on the accessible system Hilbert space, labeled S, with implicit identity operators I in between. (c) The resultant expectation value is sampled from the marginal process Υ_k (purple dotted comb). (d) For most A_k, this is indistinguishable from an equilibrium process Ω (green dotted comb), composed of dephasing $ rather than unitary evolution.

Conventional open quantum system theory [1,2] emphasizes understanding single-time expectation values ⟨X(t)⟩_ρ := tr[X(t)ρ], where X(t) is a linear operator in the Heisenberg picture and ρ is a density operator describing some initial state. However, such single-time expectation values reveal only a small fraction of the available information about a quantum process compared to multitime correlation functions of the form

⟨X_k(t_k) ⋯ X_1(t_1)⟩_ρ = tr[X_k(t_k) ⋯ X_1(t_1) ρ],    (1)

which appear, for example, in linear response theory [3][4][5], the formalism of non-equilibrium
Green's functions [6], the quantum regression theorem [7], and also play an integral role in our modern understanding of quantum stochastic processes and (non-)Markovianity, as we explain later in greater detail [8][9][10].
Due to the greater informational content of multitime correlation functions, they may reveal richer non-equilibrium features. It is then natural to ask whether such objects tend towards an equilibrium quantity in generic many-body situations. We can phrase this question rigorously by reformulating Eq. (1) as an expectation value of a single object,

⟨X_k(t_k) ⋯ X_1(t_1)⟩_ρ = tr[O_k Υ_k] =: ⟨O_k⟩_Υ,    (2)

where Υ_k and O_k are tensor representations of a quantum process and a multitime quantum observable, respectively, over times k := {t_1, . . . , t_k}.
That is, O_k encodes the time-ordered sequence of operators {X_k, X_{k−1}, . . . , X_1}, now in the Schrödinger picture, applied at times k. Here, Υ encompasses all time evolution operators and an initial state ρ, with the tensor Υ_k being the 'marginal process' encompassing only times k [11,12], as depicted in Fig. 1 (a)-(c), and detailed below.
Recasting the above question, we ask whether we can replace the process Υ, which contains time dependencies, with a corresponding time-independent equilibrated process Ω, such that

⟨O_k⟩_Υ ≈ ⟨O_k⟩_Ω for most times k.    (3)
If so, how can we define Ω? For which observables is this valid? And how universal is this equilibration?
In this work, we show that Eq. (3) indeed holds for generic many-body quantum processes. Further, in a follow-up work we show that Eq. (3) holds over finite time intervals [13]. In both cases, the error in the approximation is shown to be smaller than the inverse effective dimension (participation ratio) of a quantum state σ with respect to a Hamiltonian H = Σ_n E_n P_n, defined as

d_eff(σ) := 1 / tr[$(σ)²], with $(·) := Σ_n P_n (·) P_n    (4)

being the dephasing map with respect to H. The effective dimension naturally appears in many contexts [14][15][16][17][18]. For instance, the recurrence time for an isolated quantum system typically scales exponentially with d_eff [19]. Intuitively then, it acts as a quantifier for the validity of a statistical description of the system. A system with a small effective dimension essentially behaves as a small quantum system, and so is not expected to display thermodynamic behavior. On the other hand, in realistic many-body situations the effective dimension scales exponentially with system size [14,20].
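To make the definition concrete, here is a minimal numerical sketch (our own illustration; the function names are not from the paper) of the dephasing map and effective dimension of Eq. (4):

```python
import numpy as np

def dephase(rho, P):
    """Dephasing map $(rho) = sum_n P_n rho P_n for a list of eigenprojectors P."""
    return sum(p @ rho @ p for p in P)

def effective_dimension(rho, P):
    """d_eff = 1 / tr[$(rho)^2]: the participation ratio over energy levels."""
    chi = dephase(rho, P)
    return 1.0 / np.trace(chi @ chi).real

rng = np.random.default_rng(0)
d = 8
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2                       # generic Hermitian Hamiltonian
_, V = np.linalg.eigh(H)
P = [np.outer(V[:, n], V[:, n].conj()) for n in range(d)]

# A state spread uniformly over all eigenstates attains the maximum d_eff = d,
psi = V.sum(axis=1) / np.sqrt(d)
rho_flat = np.outer(psi, psi.conj())
# while a single energy eigenstate has the minimum d_eff = 1.
rho_eig = P[0]
print(round(effective_dimension(rho_flat, P), 6))  # -> 8.0
print(round(effective_dimension(rho_eig, P), 6))   # -> 1.0
```

The two extremes match the intuition in the text: small d_eff means the system is "effectively small", while a state spread over exponentially many levels behaves statistically.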
Our results extend and complement related work, such as Ref. [21], which proves an equilibration bound for thermal two-point correlation functions that can be seen as a special case of our general result, Eq. (3). Formulating our results in terms of instruments and process tensors has the advantage of allowing us to determine the equilibration of operationally meaningful multitime properties, such as the degree of non-Markovianity (see Section 6), and beyond [13]. In both instances, however, it is the large effective dimension that dictates equilibration and the related phenomenon of Markovianization [22]. We comment further on the relation between Ref. [21] and the present work in Section 6 and in Appendix C.
Eq. (3) addresses the foundational problem at the heart of quantum statistical physics: how equilibrium conditions emerge from underlying unitary dynamics. Here, we show that to witness nonequilibrium signatures of a process describing many-body dynamics, one requires an astronomically large number of measurements. This further substantiates the widely held belief that quantum mechanics alone should be enough to derive statistical mechanics and the emergence of thermodynamics, without any additional assumptions [18,[23][24][25].
We begin by casting generic quantum dynamics as a process tensor Υ [8,9,12,26], independent of the observables, using the quantum combs formalism [27]. This innovation then allows us to unambiguously define a corresponding equilibrium process Ω, which we show to be nearly indistinguishable from Υ. We will show that Eq. (3) holds for multitime correlation functions under general conditions, extending the usual paradigm of equilibration, which largely considers only single-time observables [14,20,28,29] (Ref. [21] being an exception). However, we go further by providing a platform for comparing nonequilibrium processes with equilibrium ones. For instance, we show that the non-Markovianity of a process is restricted by the process equilibration bounds, while in a follow-up work we show that this extends to a range of other general multitime features, such as classicality of measurement statistics [13]. While equilibration is a necessary condition for thermalization, other assumptions are needed for this equilibrium state to be of Gibbs form [18,30]. Our results on process equilibration and the restriction of non-Markovianity explore a different facet of the question of thermalization of isolated systems, indicating a stability of equilibrium against perturbations and a relaxation of quantum memory.

Tensor Representation of Quantum Processes
Consider an isolated finite-dimensional quantum system, as in Fig. 1 (c). Let the initial state of the system, ρ, evolve to time t_1 under unitary evolution generated by a Hamiltonian H. We write the unitary dynamics from t_{ℓ−1} to t_ℓ in its superoperator form as U_ℓ(·) := e^{−iH∆t_ℓ} (·) e^{iH∆t_ℓ}, where ∆t_ℓ = t_ℓ − t_{ℓ−1}. In accordance with Eq. (1), we interrogate the system at times k := {t_1, . . . , t_k} with operations {A_1, . . . , A_k}. The A_i are called instruments [31] and are completely positive (CP) maps, representing any physical operations in a laboratory. In principle, these act on an experimentally accessible space, which we refer to as the system (S); we describe the rest as the environment (E), such that A ≡ A_S ⊗ I_E. If the A_i are further specified to be trace preserving (TP), they may represent a complete set of outcomes of a measurement (or simply a quantum channel); otherwise they generally correspond to a subset of outcomes. This is the most general way to describe measurements and observables in quantum theory.
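The superoperator form of the unitary step can be checked directly. The following sketch (our own illustration, not from the paper) builds U_ℓ as a matrix acting on vectorized density operators, using the row-major vectorization identity vec(AXB) = (A ⊗ B^T) vec(X):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2
dt = 0.7
E, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * E * dt)) @ V.conj().T   # e^{-iH dt}

# Superoperator matrix: with row-major vec, vec(U rho U^dag) = (U ⊗ conj(U)) vec(rho)
U_super = np.kron(U, U.conj())

G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real                             # random density operator

lhs = U @ rho @ U.conj().T                            # map form of U_l
rhs = (U_super @ rho.flatten()).reshape(d, d)         # matrix form of U_l
print(np.allclose(lhs, rhs))  # -> True
```

This matrix representation of superoperators is the same device used later (Appendix B) when maps are traded for Choi states.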
We can then cast the full process as

⟨A_k⟩_Υ = tr[A_k^T Υ_k].    (5)

Here, Υ_k is called a process tensor and A_k is a multitime instrument, explicitly defined in Eq. (18). To see how Eq. (5) can be decomposed into these objects, see Fig. 1; further explanation can be found in Appendix B.
Any observable can be decomposed as a linear combination of a family of instruments, and so any correlation function (2) can be obtained from corresponding expectation values of instruments:

⟨O_k⟩_Υ = Σ_i α_i ⟨A_k^(i)⟩_Υ, with α_i ∈ ℂ.    (6)

This is shown explicitly in Appendix A. The equilibration of Eq. (1) follows as a corollary of our main results due to Eq. (6), while instruments have the conceptual advantage of being meaningful interventions in a laboratory.
Eq. (5) is the multitime generalization of the Born rule [27,40,41], where Υ_k plays the role of a state (it has the properties of an unnormalized density matrix) and A_k that of a measurement, accounting for the invasive nature of measurements in quantum mechanics [31]. When k = 1, we obtain the usual quantum mechanical expectation value with A_k → A_1, which is a measurement operator: an element of a positive operator valued measure (POVM). Equipped with these concepts, we are now in a position to define an equilibrated process.

Equilibrium Process
The key advantage of using the process tensor formalism is that all correlations and dynamics are stored in a single object, Υ_k. This allows us to define an equilibrium process Ω_k, corresponding to the nonequilibrium Υ_k, as the infinite-time average over all time intervals k,

Ω_k := lim_{T→∞} (1/T^k) ∫_0^T d∆t_1 ⋯ ∫_0^T d∆t_k Υ_k.    (7)

That is, the equilibrium process is defined by replacing all unitary evolutions with dephasing (defined in Eq. (4)), as depicted in Fig. 1 (c)-(d).
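As a numerical sanity check of this definition (a toy sketch with a hand-picked non-degenerate spectrum of our choosing, not the paper's model), the long-time average of unitarily evolved states can be compared against the dephased state:

```python
import numpy as np

# Hand-picked spectrum with no repeated gaps (an assumption made for this demo)
E = np.array([0.0, 1.0, 2.3, 3.7, 5.2, 7.1])
d = len(E)
rng = np.random.default_rng(2)
V, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

def evolve(rho, t):
    U = V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T
    return U @ rho @ U.conj().T

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Dephasing with respect to H removes all off-diagonals in the energy basis
c = V.conj().T @ rho @ V
omega = V @ np.diag(np.diag(c)) @ V.conj().T

# The long-time average of rho(t) converges to the dephased state omega
T, steps = 5000.0, 20000
avg = sum(evolve(rho, t) for t in np.linspace(0.0, T, steps)) / steps
print(np.max(np.abs(avg - omega)) < 1e-2)  # -> True
```

Off-diagonal terms in the energy basis pick up phases e^{−i(E_n − E_m)t}, which average to zero; this is the single-step version of replacing each U_ℓ by the dephasing map $.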
The equilibrium process Ω_k is the reference against which we will compare an arbitrary process, in order to define process equilibration. A multitime expectation value on such a process then corresponds to

⟨A_k⟩_Ω = tr[A_k^T Ω_k].    (8)

The key feature here is that this expectation value is independent of the intervention times.
Up until now we have defined k-step processes. However, both Υ and Ω are defined over all times [11,12]. In fact, it is the instrument that is defined on k steps, such that Eq. (5) means that we are sampling from a marginal process Υ_k ⊂ Υ [11]. This allows the physical interpretation of ⟨A_k⟩_Ω as the average of ⟨A_k⟩_Υ over all possible interrogation times an experimenter could choose for applying the multitime instrument A_k; see Fig. 1. This marginalization to a finite number of measurement times can be interpreted as a coarse graining in time, which, along with coarse graining in space, is needed for general process equilibration, as we will see in the results that follow. First, we quantify how well a given k-time instrument can tell apart Υ from Ω.

Equilibration of Multitime Observables
Dynamical equilibration of quantum processes says that, for almost all times k, a nonequilibrium process Υ cannot be distinguished from the corresponding equilibrium process Ω. This is stronger than the usual notions of equilibrium, as it encompasses multitime observables and allows for almost arbitrary interventions. For all following results, we assume that the full SE is isolated, initially in the state ρ, and so evolves unitarily. Our only assumption on the dynamics is that the time-independent Hamiltonian that dictates the SE evolution obeys the non-resonance condition:

E_n − E_m = E_{n′} − E_{m′} if and only if m = m′ and n = n′ (for n ≠ m and n′ ≠ m′).

This condition is not so restrictive in many-body systems without contrived symmetries, and can in fact be loosened; see Ref. [13]. This means that our results are valid for arbitrary non-equilibrium states ρ, with minimal restrictions on the dynamics.
Our main results follow from the following bound on the expectation value of any k-time instrument A_k applied on the system (of dimension d_S) of a process Υ and its corresponding equilibrium Ω:

⟨ |⟨A_k⟩_Υ − ⟨A_k⟩_Ω|² ⟩_∞ ≤ (2^k − 1) d_S^{2k} / d_eff,    (9)

where ⟨·⟩_∞ denotes the infinite-time average over the intervals ∆t_1, . . . , ∆t_k, and d_eff is the smallest effective dimension (4) appearing along the process. A somewhat tighter bound, which depends on intermediate effective dimensions within the process and on operator norms of the multitime instrument A_k, can be found in Appendix C, together with a proof of Eq. (9). While all of the results in this paper may be stated in terms of this tighter bound, we have presented Eq. (9) due to its clear physical interpretation: one can immediately compare the size of the accessible system space to (an estimate of) the effective dimension, to determine whether one's measurement statistics will equilibrate. Note that if the process ends after only one intervention, the tighter bound reduces to the single-time equilibration result of Ref. [29].
As we have discussed, d_eff typically scales exponentially with the number of particles N, and so for macroscopic systems and coarse enough observables the right-hand side of this bound is vanishingly small unless one probes the system an extremely large number of times, k ∼ N/(2 log₂ d_S). This is clearly not possible in realistic many-body situations, where N ∼ O(10²³), and where experimental limitations restrict the degrees of freedom that an instrument may access (corresponding to the size of d_S). Even constructing an experiment in the lab with k = 100 measurement times, and determining all correlations between them, is practically unfeasible.
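The quoted threshold is simple arithmetic; a sketch, under the assumption d_eff ∼ 2^N:

```python
import math

# The bound's r.h.s. goes as d_S^{2k} / d_eff with d_eff ~ 2^N, so equilibration can
# only fail once d_S^{2k} is comparable to 2^N, i.e. k ~ N / (2 log2 d_S).
def k_threshold(N, d_S):
    return N / (2 * math.log2(d_S))

print(k_threshold(1e23, 2))  # -> 5e+22: astronomically many probings for N ~ 10^23
print(k_threshold(100, 2))   # -> 50.0: out of reach even for a modest 100-spin system
```

Even at qubit-level access (d_S = 2), the number of interrogations needed to expose nonequilibrium features grows linearly with particle number, which is hopeless at macroscopic N.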
Considering that ⟨A_k⟩_Υ is a random variable on {(∆t_1, ∆t_2, . . . , ∆t_k) : ∆t_i ≥ 0}, ⟨A_k⟩_Ω is exactly its mean and Eq. (9) gives an upper bound on its variance. We can therefore use Chebyshev's inequality to bound the probability P of deviation from equilibrium multitime observables, arriving at our first main physical result on the equilibration of multitime observables.
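The passage from a variance bound to a deviation probability is just Chebyshev's inequality; a quick numerical sanity check, with a generic skewed distribution standing in for ⟨A_k⟩_Υ over random times:

```python
import numpy as np

rng = np.random.default_rng(3)
# Chebyshev: for any random variable, P(|X - mean| >= eps) <= var / eps^2.
x = rng.exponential(scale=1.0, size=200_000)   # mean 1, variance 1
eps = 2.0
empirical = np.mean(np.abs(x - 1.0) >= eps)
bound = 1.0 / eps**2
print(empirical <= bound)  # -> True (empirical ~ 0.05 vs bound 0.25)
```

In Result 1, the role of the variance is played by the right-hand side of Eq. (9), so the deviation probability inherits the 1/d_eff suppression.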

Result 1. For any k-point correlation function,
P( |⟨O_k⟩_Υ − ⟨O_k⟩_Ω| ≥ ε ) ≤ (Σ_i |α_i|)² (2^k − 1) d_S^{2k} / (ε² d_eff),    (10)

computed over a process Υ and equilibrated Ω, where the α_i are defined in the decomposition Eq. (6) and ε > 0.
A proof of this, following from Eq. (9), can be found in Appendix D. This result states that, for most k-time observables O_k, Υ and Ω look the same when the effective dimension is large. It can be interpreted as the statement that, for typical many-body systems (with a large effective dimension), the overlap between measurement operators and the system state ρ is unbiased for most times, even with respect to temporally nonlocal apparatuses. Of course, this is a statement about most times, and does not exclude transient non-equilibrium behavior. For example, soon after a quantum quench, non-equilibrium behavior may well be apparent; see also Ref. [56].
Since no single multitime observable can differentiate between a nonequilibrium and an equilibrium process on average, what about a family of optimal observables? That is, given an Υ, how finely grained does an instrument A_k need to be for one to distinguish it from Ω? We will now address this with an alternative result derived from Eq. (9).

Process Equilibration Through Coarse Observables
In contrast to Result 1, rather than fixing an observable, we now consider a set of allowed instruments with which a process could be probed, and ask how well we can differentiate between Υ and Ω in the best-case scenario. Mathematically, these possible measurements are represented by a set M_k of multitime POVMs on at most k times, which can carry memory. We define the operational diamond norm distance between two processes as

‖Υ_k − Ω_k‖_{M_k} := (1/2) max_{A_k ∈ M_k} Σ_⃗x | tr[(A_k^⃗x)^T (Υ_k − Ω_k)] |,    (11)

where ⃗x denotes an outcome of the instrument A_k. In the limit of M_k encompassing all possible measurements, this reduces to the generalized diamond norm distance for quantum combs [32,57]. This is the natural measure of the distance between processes, and corresponds operationally to the probability of mistaking them when measuring with the most distinguishing instrument available.
From our previous bound, Eq. (9), one can show that the average multitime operational diamond norm distance between Υ and Ω satisfies

⟨ ‖Υ_k − Ω_k‖_{M_k} ⟩_∞ ≤ (1/2) √( (2^k − 1) d_S^{2k} S_{M_k} / d_eff ),    (12)

where S_{M_k} is the combined total number of outcomes over all measurements in the set M_k, meaning that it is proportional to the resolution of an experimental apparatus. This quantity is a multitime analogue of that defined in Ref. [29], with a technical definition given in Eq. (41). While S_{M_k} is in general large, a typical macroscopic d_eff will nonetheless out-scale it. Given that the diamond norm is non-negative, we can now use Markov's inequality to give a result on the distinguishability of a given process from an equilibrium one, given a set of possible measurements.

Result 2.
For any quantum process Υ, with equilibrated process Ω, and a set of multitime measurements M_k,

P( ‖Υ_k − Ω_k‖_{M_k} ≥ ε ) ≤ (1/2ε) √( (2^k − 1) d_S^{2k} S_{M_k} / d_eff ).    (13)

A proof of Eq. (12) and of this result can be found in Appendix E. Therefore, even with an optimal set of measurements (temporally nonlocal, but with a reasonable number of finite outcomes), a nonequilibrium process looks like an equilibrated process when d_eff is large. Only in the limiting case of a highly fine-grained measuring apparatus could one feasibly distinguish between Υ and Ω.
This latest result readily implies Eq. (3), as long as the constituent instruments of O_k in the decomposition Eq. (6) are in the span of M_k, i.e., as long as they are in principle experimentally realizable. If the total process is on a finite-dimensional space, then quantum recurrences occur even for multitime observables. As such, it is also possible to generalize the above results to finite time intervals; this is nontrivial and will be presented elsewhere [13].

Equilibration, Markovianization, & Thermalization
Our results on process equilibration, and hence on correlation functions, bound the degree of detectable non-Markovianity in Υ. One can quantify this in an operational sense [9,26,58] by considering the application of two sequential instruments, A_− and A_+, with respective outcomes {x_−} and {x_+}. From Eq. (5), the conditional probability for x_+ to occur given x_− is then

P_{Υ|A_±}(x_+|x_−) = tr[(A_+^{x_+} ⊗ A_−^{x_−})^T Υ] / tr[(I_+ ⊗ A_−^{x_−})^T Υ],    (14)

where I_+ is the identity on the + part of the process.

Figure 2: The non-Markovianity N of a process Υ and corresponding Ω of a single qubit coupled to a random matrix environment, with a varying effective dimension d_eff (details can be found in Appendix G). One can see that the non-Markovianity of a general process, N(Υ), trends towards an equilibrium quantity, N(Ω), with increasing d_eff. Strikingly, the equilibrium non-Markovianity is also small, and decays with d_eff. The instruments A_± used to probe these processes involved a repeated measure-and-independently-prepare protocol, with a system-level causal break between A_− and A_+, as displayed in the upper right. The red boxes in this circuit represent unitary evolution, with uniformly sampled 5 ≤ ∆t_1, ∆t_2, ∆t_3 ≤ 50, and dephasing, for Υ and Ω respectively. This was maximized over the difference in outcomes of A_−, and averaged over A_+, to determine the degree of non-Markovianity of each process.
The instruments are chosen to be causally separate, i.e., A_− := A_{j:1} ends on an output of the process and A_+ := A_{k:j+1} begins on the next input. This ensures that no information is transmitted via the system from the past to the future (see the inset of Fig. 2). A process is then Markovian according to this instrument if and only if the above conditional probability is independent of the outcome x_−: P_{Υ|A_±}(x_+|x_−) = P_{Υ_+|A_+}(x_+). Conversely, we define the degree of non-Markovianity by

N_{A_±}(Υ) := max_{x_−, x_−′} ⟨ |P_{Υ|A_±}(x_+|x_−) − P_{Υ|A_±}(x_+|x_−′)| ⟩_{x_+},    (15)

where ⟨·⟩_{x_+} denotes the average over outcomes x_+. In other words, this quantifies the memory transmitted through the environment which is measurable by the apparatus A_±. Note that there could be hidden non-Markovian effects, which would only be detectable if A_± encompassed some later times and/or many instruments [59,60]. Indeed, in such a many-time case, divisible quantum dynamics is not necessarily enough to determine quantum non-Markovianity [9,58].
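To illustrate the causal-break construction, here is a toy sketch of our own (a single qubit and a small generic environment, much smaller than the model of Fig. 2; the protocol details are our assumptions, not the paper's exact instruments). It compares the conditional statistics across a measure-and-reprepare step for unitary (Υ-type) and dephasing (Ω-type) evolution:

```python
import numpy as np

rng = np.random.default_rng(4)
dS, dE = 2, 4
d = dS * dE

# Generic GUE-like SE Hamiltonian standing in for the random matrix environment
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2
E, V = np.linalg.eigh(H)

def unitary_step(rho, dt):
    U = V @ np.diag(np.exp(-1j * E * dt)) @ V.conj().T
    return U @ rho @ U.conj().T

def dephase_step(rho, dt):
    # Equilibrium process: dephasing in the energy basis replaces unitary evolution
    c = V.conj().T @ rho @ V
    return V @ np.diag(np.diag(c)) @ V.conj().T

P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
ket0 = np.array([1.0, 0.0])

def ptrace_sys(rho):
    return np.einsum('aeaf->ef', rho.reshape(dS, dE, dS, dE))

def causal_break(rho, proj):
    """Measure the system (outcome proj), discard it, reprepare |0>: a causal break."""
    K = np.kron(proj, np.eye(dE))
    sel = K @ rho @ K.conj().T
    p = np.trace(sel).real
    return p, np.kron(np.outer(ket0, ket0), ptrace_sys(sel) / p)

phiE = rng.normal(size=dE) + 1j * rng.normal(size=dE)
phiE /= np.linalg.norm(phiE)
rho0 = np.kron(np.outer(ket0, ket0), np.outer(phiE, phiE.conj()))

def nonmarkovianity(step, dt1=7.3, dt2=11.1):
    """Half the summed change in future statistics, maximized over past outcomes."""
    rho1 = step(rho0, dt1)
    cond = []
    for proj in (P0, P1):
        _, rho_b = causal_break(rho1, proj)
        rho2 = step(rho_b, dt2)
        cond.append([np.trace(np.kron(pr, np.eye(dE)) @ rho2).real
                     for pr in (P0, P1)])
    return 0.5 * sum(abs(cond[0][i] - cond[1][i]) for i in range(2))

print(nonmarkovianity(unitary_step), nonmarkovianity(dephase_step))
```

Since the system is freshly prepared after the break, any dependence of the later statistics on the earlier outcome can only be carried by the environment, which is exactly the memory that Eq. (15) is designed to detect.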
This allows us to use Result 1 to make a statement about the detectable non-Markovianity of a general process.

Result 3. For any quantum process Υ, with equilibrated process Ω, and causal-break instrument A_±,

⟨ |N_{A_±}(Υ) − N_{A_±}(Ω)| ⟩_∞ ≤ η_k,    (16)

with ⟨·⟩_∞ the infinite-time average over the intervention times.
A full definition of the term η_k is given in Eq. (43) in Appendix F, alongside a proof of this result. The numerator of η_k depends on the outcome probabilities of the chosen instrument. This means that a given causal-break instrument will likely witness the equilibration of Markovianity, provided it has no outcome which occurs with probability ∼ O(1) for one process out of {Υ, Ω} and with probability ∼ O(1/d_eff) for the other. This additionally takes the effective dimension to dominate the other terms, which occurs under similar conditions to Result 1. In this case, the degree of non-Markovianity measurable by some A_± of any Υ will be close to that of Ω. This does not mean that Ω is necessarily Markovian. However, there is no dynamics in Ω, only dephasing, and so the non-Markovianity therein must be time-independent. Therefore, on average, the transient non-Markovianity of Υ is bounded by the effective dimension. We also expect the time-independent non-Markovianity to decay with effective system size, due to the prevalence of Markovian phenomena in macroscopic systems. Indeed, we numerically confirm this for a spin coupled to a random matrix environment; see Fig. 2. The model, detailed in Appendix G, is representative of a generic many-body evolution. Note that such a model can also be used to confirm the previous results.

These results are complementary to the recent work of Ref. [21], where it is shown that thermal two-time correlation functions factorize on average, assuming the system satisfies the Eigenstate Thermalization Hypothesis (ETH) [61][62][63][64][65][66][67][68][69]. This directly implies that memory encoded in two-point correlations washes out for a thermal state. We expect that methods from that work could be applicable to our results, in that a property such as the ETH may be necessary to arrive at entirely Markovian phenomena, in contrast to the small steady-state quantity generically observed (as seen in Fig. 2). This would be an interesting avenue for future research.
Result 3 can be extended to geometric measures of coherence, entanglement, and related quantities, which confine the process to the classical domain [46,47,70]. For related multitime equilibration bounds based on Result 2, see [13]. Here, we focus on Markovianization because of its direct relevance to thermalization, which we discuss in our concluding remarks.

Conclusions
Our results connect equilibration with quantum stochastic processes, which encompass correlations and (quantum) memory, in addressing the problem of the quantum foundations of thermodynamics. It has long been known that multitime non-Markovian correlations in a process vanish for a system in contact with a thermal bath in the weak system-bath coupling limit [71]. It was recently shown, using typicality arguments, that almost all multitime non-Markovian correlations vanish for Haar random unitary evolution [72] and for unitary t-design processes [22]. This is dubbed 'process Markovianization', where a subpart of an isolated quantum system closely resembles a Markovian process even though it is non-Markovian. Here, we argued that process equilibration also implies a form of Markovianization. The strength of these results lies in their robustness against (a reasonable number of) perturbations, remaining close to an equilibrium for most times, given a sufficient coarse graining in both time (small k) and space (small d_S and S_{M_k}). This in turn leads to a better understanding of the extra ingredients needed to obtain thermalization from isolated quantum theory. Process equilibration and Markovianization are necessary conditions for a stable steady state, and thus also for thermalization. How, then, is the emergence of Markovianity connected to the ETH? And how does quantum chaos fit into the picture of multitime correlations?
Regarding chaos, out-of-time-order correlators (OTOCs) [73] have recently been explored in the quantum combs framework [74,75]. Indeed, we expect that, with careful modifications, our results may extend to time-unordered correlators; see also Ref. [76] for equilibration results on a certain kind of OTOC. Finally, we have extended the current results to finite-time process equilibration [13], which may provide new ways to explore equilibration-time bounds [77][78][79][80].

A Multitime Observables from Instruments

We fix t_0 to be the time where the Schrödinger and Heisenberg pictures coincide. Then, an observable in the Heisenberg picture is defined as

X(t_n) := U_{n,0}^† X U_{n,0},

where U_{n,0} := e^{−iH(t_n−t_0)}, and X is the observable in the Schrödinger picture. Then, for an initial state ρ(t_0),

⟨X_n(t_n) . . . X_1(t_1)⟩ = tr[X_n U_n ⋯ X_1 U_1 (ρ(t_0))],

where U_ℓ(·) := U_{ℓ,ℓ−1} (·) U_{ℓ,ℓ−1}^† is the usual unitary evolution superoperator, and X_i(·) := X_i (·) 𝟙 is the operator sum representation of the linear map superoperator X_i [81]. Any linear map can be written as a (complex) linear combination of CP maps, so by the linearity of the trace we can write ⟨X_n(t_n) . . . X_1(t_1)⟩ as a linear combination of multitime instrument expectation values over a quantum process, where the A_i are CP and t_0 is fixed. This can be measured as an expectation value of an instrument A_k on a real physical process, which is described by the process tensor Υ.

B The Process Tensor
Here we give an explicit definition of quantum processes, corresponding to Fig. 1 in the text. Decomposing Eq. (5) from the text yields

⟨A_k⟩_Υ = tr[(A_k ⊗ ⋯ ⊗ A_1)^T Υ_k], with Υ_k := tr_E[U_k ∗ ⋯ ∗ U_1 ∗ ρ],    (18)

where ∗ is the link product [12,27], corresponding to a tensor product on the S space and a matrix product on E. The left-hand side of Eq. (5) in the main text contains A_i and U_i, which are abstract maps with many representations [12,81,82]; here we use the Choi state representation for these maps, U_i and A_i, resulting in the multitime Choi states Υ_k and A_k. With this machinery, we can define precisely what we mean by an equilibrated process, as in Eq. (7).
To see how one arrives at Eq. (18) from Eq. (5) in the main body, we here present a simpler example, constructing a link product for matrix multiplication, as opposed to Choi states as in Eq. (5). Consider the following trace equation with composite-space matrices µ, ν, π ∈ M_F ⊗ M_G and single-space matrices x, y, z ∈ M_F:

tr[µ (x ⊗ 1_G) ν (y ⊗ 1_G) π (z ⊗ 1_G)] = Σ_{a,…,f; α,…,γ} µ^{αβ}_{ab} x_{bc} ν^{βγ}_{cd} y_{de} π^{γα}_{ef} z_{fa},    (19)

with the superscript (subscript) indices belonging to the spaces G (F). Eq. (19) can be expressed as tr[Ξ R] by grouping the 'Latin' and the 'Greek' operators to define tensors on M_F^⊗3: R := Σ_{a,…,f} x_{bc} y_{de} z_{fa} |bdf⟩⟨ace| and Ξ := Σ_{α,…,γ; a,…,f} µ^{αβ}_{ab} ν^{βγ}_{cd} π^{γα}_{ef} |ace⟩⟨bdf|. To succinctly express the contraction of 'Greek' indices we employ the link product [12,27], which is a matrix product and trace on the G spaces and a tensor product on the F spaces: Ξ = tr_G[µ ∗ ν ∗ π]. The link product employed to define the process tensor and instruments as in Eq. (18) can be constructed in an equivalent way, but in the Choi representation as opposed to simple matrix multiplication; for further details see [12,13,27]. This has a number of advantages, including that Υ is a positive operator, entirely analogous to a single-time density operator [8,9,12].
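The Choi-representation analogue of this contraction can be verified numerically. The following sketch (our own illustration; the Choi ordering out ⊗ in and the two example channels are our conventions, not the paper's) checks that contracting two Choi states over their shared space reproduces the Choi state of the composed channel:

```python
import numpy as np

d = 2

def E_(i, j):
    M = np.zeros((d, d))
    M[i, j] = 1.0
    return M

def choi(channel):
    """Choi matrix on (out ⊗ in): C = sum_ij channel(E_ij) ⊗ E_ij."""
    return sum(np.kron(channel(E_(i, j)), E_(i, j)) for i in range(d) for j in range(d))

def apply_choi(C, rho):
    """Channel action recovered from the Choi state: Phi(rho) = tr_in[C (1 ⊗ rho^T)]."""
    M = (C @ np.kron(np.eye(d), rho.T)).reshape(d, d, d, d)
    return M.trace(axis1=1, axis2=3)

def link(C_psi, C_phi):
    """Link-product contraction over the shared 'mid' space: Choi of the composition."""
    # partial transpose of C_phi on its output ('mid') factor
    Tm = C_phi.reshape(d, d, d, d).transpose(2, 1, 0, 3).reshape(d * d, d * d)
    big = np.kron(C_psi, np.eye(d)) @ np.kron(np.eye(d), Tm)
    out = big.reshape(d, d, d, d, d, d).trace(axis1=1, axis2=4)  # trace out 'mid'
    return out.reshape(d * d, d * d)

Hd = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
phi = lambda r: np.diag(np.diag(r))   # dephasing channel (first step)
psi = lambda r: Hd @ r @ Hd           # Hadamard conjugation (second step)

lhs = choi(lambda r: psi(phi(r)))     # Choi of the composition, built directly
rhs = link(choi(psi), choi(phi))      # Choi of the composition, via contraction
print(np.allclose(lhs, rhs))  # -> True

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
print(np.allclose(apply_choi(rhs, rho), psi(phi(rho))))  # -> True
```

The matrix-product-and-trace over the shared space plays the role of the 'Greek' contraction above, while the remaining spaces combine by tensor product, exactly as in the definition of ∗.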

C Proof of Bound on Average Multitime Correlation Variance (Eq. (9))
Consider a partition of an isolated quantum system into a system (S) of interest and the rest, which we call an environment (E). The instruments A are then taken to act on S, and we suppress the notation of a tensor product with the identity on E: A ≡ A ⊗ 1_E. Consider the process Υ, together with the unitary evolution, to encompass a pure initial state ρ, from which the results can be generalized to mixed states via purification [29]. The time-independent Hamiltonian H = Σ_n E_n P_n describes the evolution between instruments, where P_n is the projector onto the energy eigenstate with energy E_n. We may in addition take these projectors to be rank one. This we can ensure, via a similar argument to [29], by choosing a basis for any degenerate subspace such that the SE state at any particular time step may overlap with only one of the degenerate basis states |n⟩ for each distinct energy. Therefore, with this choice, the total SE state between each instrument evolves unitarily in the subspace spanned by {|n⟩} as if according to a non-degenerate Hamiltonian H′ = Σ_n E_n |n⟩⟨n|, where {|n⟩} could be different for different 'steps' in the process. This argument holds for pure states at all times, and so only for pure instruments A_i, i.e., those whose action preserves the purity of the input state. Physically, non-pure instruments may only equilibrate the system more, as they increase mixing. More precisely, the object of interest, the infinite-time average ⟨|⟨A⟩_Υ − ⟨A⟩_Ω|²⟩_∞, is convex in mixtures of pure instruments, so any bound for pure instruments directly implies a bound for mixed instruments. Therefore, without loss of generality, we consider only the limiting worst-case scenario of exclusively pure instruments, thus allowing degenerate energy levels. Note that we do not allow energy gap degeneracies which do not correspond to energy degeneracies, as per the non-resonance condition in the main text.
Recalling the definitions in Eqs. (5), (7) and (8), the expectation values of A over the quantum processes Υ and Ω are, respectively,

⟨A⟩_Υ = tr[A_k U_k ⋯ A_1 U_1 (ρ)] and ⟨A⟩_Ω = tr[A_k $_k ⋯ A_1 $_1 (ρ)].

Here $_i is the dephasing operation with respect to the non-degenerate Hamiltonian H′_i, which describes the evolution between instruments A_{i−1} and A_i. Similarly, we define the effective dimension d_eff^i as in Eq. (4) but in terms of $_i. We keep track of this as the Hamiltonian may differ between instruments, due to the degenerate-energy modifications made above. Indeed, the results that follow may allow for a Hamiltonian which changes arbitrarily between steps, but this has nontrivial implications for the interpretation in terms of the process tensor.
We wish to bound the infinite-time average ⟨|⟨A⟩_Υ − ⟨A⟩_Ω|²⟩_∞. Expanding the unitary SE evolution in the energy eigenbases for arbitrary k instruments, each unitary step contributes a double sum over energy levels n_i, m_i with phases e^{−i∆t_i(E_{n_i}−E_{m_i})}, while the diagonal terms n_i = m_i reproduce the dephased evolution (Eq. (21)). Consider first the case with only two instruments (k = 2), where we have defined the multitime dephased state. If we label the energy levels in the complex conjugate to Eq. (22) with primed indices, n′_i and m′_i, taking the modulus square gives rise to exponentials e^{−i∆t_i[(E_{n_i}−E_{m_i})−(E_{n′_i}−E_{m′_i})]}. Then, after time-averaging over the ∆t_i independently for all times, given that we have assumed that there are no energy-gap degeneracies, the integrals kill all cross terms and we arrive at a sum over modulus-squared terms without an explicit energy phase dependence. For the other terms, the exponentials carry nonvanishing frequencies (E_{n_ℓ} − E_{m_ℓ}) − (E_{n′_ℓ} − E_{m′_ℓ}), which go to zero under the integral over ∆t_ℓ, given the no energy-gap degeneracies condition and that n_ℓ ≠ m_ℓ and n′_ℓ ≠ m′_ℓ. Therefore, for k = 2, the time-averaged difference reduces to the three sums of Eq. (24). Every instrument A_i can be expanded in its Kraus representation, A_i(·) = Σ_β K^β_i (·) K^{β†}_i [81,83]. Looking at the terms in Eq. (24) with a sum over n_i ≠ m_i at a single ∆t_i, we now prove the following identity: for σ a general density operator and A CP,

Σ_{n≠m} |σ_{nm}|² |⟨n| A |m⟩|² ≤ ∥A∥²_p / d_eff(σ),    (25)

where we have introduced the POVM element A := Σ_β K^{β†} K^β = A†, the energy eigenstate decomposition σ := Σ_{n,m} σ_{nm} |n⟩⟨m|, and the effective dimension with respect to an intermediate dephased state. In the fifth line we have used the fact that |σ_{nm}|² ≤ σ_{nn} σ_{mm} for any density operator σ, which follows directly from the Cauchy-Schwarz inequality and the fact that σ is positive Hermitian (with equality for pure σ). In the sixth line we have simply added the positive terms where m_i = n_i to the sum. In the eighth we have again used Cauchy-Schwarz, but for the Hilbert-Schmidt inner product, tr[A†B] ≤ ∥A∥_HS ∥B∥_HS with ∥A∥_HS := √(tr[A†A]). Finally, we have used the identity tr(XY) ≤ ∥X∥_p tr(Y) for positive operators X and Y, and that operator norms satisfy ∥X†X∥_p = ∥X∥²_p.
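The Cauchy-Schwarz step |σ_nm|² ≤ σ_nn σ_mm is easy to confirm numerically (a trivial sketch of our own):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
sigma = G @ G.conj().T
sigma /= np.trace(sigma).real          # random full-rank density operator

# |sigma_nm|^2 <= sigma_nn sigma_mm holds for every positive semidefinite sigma
ok = all(abs(sigma[n, m]) ** 2 <= sigma[n, n].real * sigma[m, m].real + 1e-12
         for n in range(d) for m in range(d))
print(ok)  # -> True
```

For a pure state the inequality is saturated, which is why the pure-instrument reduction above is the worst case.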
This identity is due to Hölder's inequality for the Hilbert-Schmidt inner product tr(XY), with the 1-norm tr(Y) and the infinity norm ∥X∥_p on the right-hand side [82]. We call this operator norm the POVM norm, so as not to confuse it with other norms used in this work; it is explicitly defined as

∥A∥_p := ∥ Σ_β K^{β†} K^β ∥_∞,

the largest eigenvalue of the POVM element associated with A. Note that this single-sum identity, Eq. (25), is essentially the same as that proved in Ref. [29].
We may apply this identity (25) directly to the second and third terms of Eq. (24), with A_1(ω_1) ≡ σ and A ≡ A_2 for the second term, and ρ ≡ σ with A ≡ A_2 $_2 A_1 for the third. Note that, as the composition of trace-non-increasing CP maps is again trace-non-increasing CP, the map A_2 $_2 A_1 admits a Kraus representation and thus a corresponding POVM element, which we have called A_{2:1}.
Next we need a new identity, Eq. (28), to handle the first term of Eq. (24). In its third line we have used that |ρ_{n_1 m_1}|² = ρ_{n_1 n_1} ρ_{m_1 m_1} for pure states, split the sums as Σ_{α,β} = Σ_{α≠β} + Σ_α δ_{αβ}, and added the (positive) extra terms n_1 = m_1 and n_2 = m_2 to the summations. In the third-last line we have chosen an orthogonal (canonical) Kraus representation for A_1, a minimal representation such that tr[K_α^{1†} K_β^1] ∝ δ_{αβ} [83]. At that point we have a term of the form of the seventh line of Eq. (25), and so can use that result to arrive at the penultimate line. In the final line we bring the sum inside, by the linearity of the trace and since Σ_i |x_i|² ≤ |Σ_i x_i|² for positive x_i. Therefore for k = 2 we arrive at Eq. (29). We now generalize the result (29) to arbitrary k instruments. Starting from Eq. (21), just as in the k = 2 case, we multiply it by its complex conjugate and take the infinite-time average over each ∆t_i. We arrive at |⟨A⟩_Υ − ⟨A⟩_Ω|²_∞ being bounded above by 2^k − 1 terms of the form of Eq. (30), with 1 ≤ i < j ≤ k, such that ∥A_{k:···:j}∥²_p is the POVM norm of the composition of CP maps A_k $_k A_{k−1} ··· $_{j+1} A_j, and C_{n,m} is a composition of CP maps, dephasing maps, and projections, all acting on ρ, whose exact form does not matter. The inequality (30) is found using Eq. (25) for a single sum over n_i ≠ m_i, and the generalization of Eq. (28) for multiple sums. In other words, the final (leftmost) projector is what determines the form of the inequality for each term. We here give the proof of Eq. (30) for a triple sum over n_i ≠ m_i, which one can see generalizes to higher sums. In the second line we have included the positive diagonal terms n_i = m_i, and from the third line onward we have applied the reasoning behind the first steps of Eq. (28) to eliminate the α ≠ β cross terms. This is done by choosing a canonical Kraus representation for the combined Kraus operators R_{α′ n_2 α} := K^2_{α′} |n_2⟩⟨n_2| K^1_α.
Note that this is indeed a Kraus operator, since Σ_{α′,n_2,α} R†_{α′ n_2 α} R_{α′ n_2 α} ≤ 1. In the second-last inequality we have used Σ_i x_i ≤ Σ_i |x_i|, and included extra positive terms when n_2 (m_2) ≠ n′_2 (m′_2). The rest of the proof follows essentially the same steps as Eq. (28), and this argumentation generalizes to larger sums. Now we can count all the terms in the expansion of Eq. (21) for arbitrary k instruments, by noting that the leftmost projector determines the bound on each term. There are exactly 2^k − 1 terms: for instance, exactly one term in the expansion has no projectors (only dephasing); two terms have the leftmost projector at the first position (acting directly on A_1), with either dephasing or another projector at the zero position (acting on ρ); 2² = 4 terms have the leftmost projector at the second position (with a choice of dephasing or projector for positions zero and one); and so on.
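The canonical (orthogonal) Kraus representation invoked above can be constructed explicitly from the eigendecomposition of a channel's Choi matrix. A minimal sketch, assuming a standard Choi-matrix convention (the helper `random_channel_kraus` is our own construction, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_channel_kraus(d, r):
    # Random CPTP map: normalize r random matrices so that sum_j K_j† K_j = 1
    A = rng.normal(size=(r, d, d)) + 1j * rng.normal(size=(r, d, d))
    S = sum(a.conj().T @ a for a in A)
    w, V = np.linalg.eigh(S)
    S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.conj().T
    return [a @ S_inv_sqrt for a in A]

def canonical_kraus(kraus):
    # Canonical Kraus operators from the eigenvectors of the Choi matrix;
    # orthogonality tr[K_a† K_b] ∝ δ_ab follows from eigenvector orthonormality
    d = kraus[0].shape[0]
    choi = np.zeros((d * d, d * d), dtype=complex)
    for K in kraus:
        w = K.T.reshape(d * d)          # w[i*d + m] = K[m, i]
        choi += np.outer(w, w.conj())
    vals, vecs = np.linalg.eigh(choi)
    return [np.sqrt(lam) * v.reshape(d, d).T
            for lam, v in zip(vals, vecs.T) if lam > 1e-12]

d = 3
K_orig = random_channel_kraus(d, 4)
K_can = canonical_kraus(K_orig)

# Canonical set is orthogonal: tr[K_a† K_b] ∝ δ_ab
for a in range(len(K_can)):
    for b in range(len(K_can)):
        if a != b:
            assert abs(np.trace(K_can[a].conj().T @ K_can[b])) < 1e-9

# Both sets represent the same channel (same action on a test state)
rho = np.eye(d) / d
out1 = sum(K @ rho @ K.conj().T for K in K_orig)
out2 = sum(K @ rho @ K.conj().T for K in K_can)
assert np.allclose(out1, out2)
```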
Finally, we may also extend these results to measurements using correlated instruments {A_i}, which are represented by appending an ancilla space W to SE [85]. Instruments are then taken to act on the experimentally accessible space SW, in which case they are often called testers [27]. Projective measurements are then performed on the ancilla space at some stage of the process, in order to determine the correlations. In terms of our previous results, this means that for an ancilla initial state γ_W we take ρ → ρ_SE ⊗ γ_W and A(·) → A_SW ⊗ I_E(·), where we take the ancilla to be a perfectly controlled space such that only SE evolves unitarily: P_n → (P_n)_SE ⊗ 1_W. It is straightforward to see that all previous results hold in this case.

D Proof of Result 1
Chebyshev's inequality states that for any random variable x with mean µ and variance σ², and for any δ > 0, P(|x − µ| ≥ δ) ≤ σ²/δ². We use that Eq. (10) is a bound on the variance, and choose δ appropriately to arrive at Result 1.
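As a quick illustration (ours, not part of the proof), Chebyshev's bound is easy to check empirically; since the inequality also holds exactly for the empirical distribution of any sample, the assertions below are guaranteed to pass:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200_000)   # any distribution will do
mu, var = x.mean(), x.var()

for delta in (1.0, 2.0, 4.0):
    p = np.mean(np.abs(x - mu) >= delta)       # empirical tail probability
    assert p <= var / delta ** 2               # Chebyshev bound
```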

E Proof of Result 2
We first provide a proof of Eq. (12), from which Result 2 follows directly. Note that this proof closely follows one given in Ref. [29].
Consider that each A_w ∈ M_k has outcomes {⃗x} = {(x_1, x_2, . . . , x_w)} corresponding to the instrument A_⃗x, where w ≤ k. We can then bound the time-averaged diamond norm, as defined in Eq. (11), where in the second line we have used max_i |a_i| ≤ Σ_i |a_i|, in the fourth we have used Eq. (9), and in the penultimate line that w ≤ k. We have also defined the total number of outcomes for all instruments in the set of (up to) k-time measurements M_k, where card(M) is the cardinality of the instrument M (also called the Kraus rank), i.e. the number of Kraus operators in its minimal (canonical) Kraus representation.
Markov's inequality states that for any non-negative random variable X with mean µ, and for any δ > 0, P(X ≥ δ) ≤ µ/δ. Using the bound on the mean, Eq. (12), Result 2 follows.
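A matching empirical check of Markov's inequality (again our own sketch; the bound holds exactly for the empirical measure of any non-negative sample):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=200_000)   # non-negative samples
mu = X.mean()

for delta in (0.5, 1.0, 3.0):
    assert np.mean(X >= delta) <= mu / delta   # Markov bound
```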

F Proof of Result 3
Consider the instruments A ± and the definition of non-Markovianity as described above Result 3.
Define ⟨A_x^−⟩_Υ := ⟨I^+ ⊗ A_x^−⟩_Υ. Consistent with the definition of N, take the non-Markovianity of Υ and Ω to be N_Υ(A_±) and N_Ω(A_±), respectively. In the first line, without loss of generality, we have assumed that N_Υ(A_±) ≥ N_Ω(A_±); otherwise all x/y^− are replaced with x′/y′^−, which is accounted for by the maximum in the penultimate line. In the third line we used that N_Ω(A_±) is defined as a maximum over x′ and y′, so any other choice x and y gives a smaller number. In the fifth line we have reintroduced the absolute value (as in the first line we assumed the entire right-hand side is positive), and factored the denominators of the first two terms. This allows us in the following line to expand the geometric series with respect to the term r = ⟨A_{x/y}^−⟩_{Υ−Ω} / ⟨A_{x/y}^−⟩_Ω. This requires the assumption, without loss of generality, that |r| < 1. In the alternative case (r > 1 for this choice of r) there is a geometric expansion for the third and fourth terms instead (with r′ = ⟨A_{x/y}^−⟩_{Ω−Υ} / ⟨A_{x/y}^−⟩_Υ, so that |r′| < 1 is then ensured), and this is accounted for by the maximum in the last line (if Λ = Υ (Ω), then its complement is Λ̄ = Ω (Υ)). Finally, we have used the triangle inequality, and that |x − y| ≤ 2 max{|x|, |y|}.

G Numerical Analysis of Equilibrium Process Non-Markovianity (Fig. 2)

Our model for the numerical simulation is a single spin coupled to a random matrix environment, with the system-environment Hamiltonian specified as follows. To mimic a large bath as well as possible with finite numerical resources, random matrix theory is used to sample the bath coupling operator B and the free bath Hamiltonian H_E. Specifically, the energies of the bath are randomly sampled from the interval [0, 1].
Furthermore, the bath coupling operator is constructed as B = (R + R^T)/2 − diag[(R + R^T)/2], where R is a random matrix with entries sampled from the interval [−1, 1], and the system splitting ω is used as a sharp cutoff for deciding which bath energies may couple in the matrix R (see Fig. 3). Consequently, B is a Hermitian operator with diagonal elements set to zero (to make the bath 'passive'), and with the far-from-diagonal elements also set to zero (to simulate a more physical interaction between energy eigenstates, compared to a fully populated B). The other numerical parameters are the level splitting of the qubit, set to ω = 0.5 (ω should be smaller than one, otherwise the system energy dominates the bath energy); the tunneling of the qubit, set to ∆ = 0.2; and the system-bath coupling strength, set to λ = 0.1 (λ should not be too small, so that the system can 'talk' to many levels of the finite bath). Hence, the only free parameter of the Hamiltonian in the following is the dimension d_E of the bath Hilbert space. Note that there is no averaging over the random matrices in the Hamiltonian: only a single instance is used.
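The sampling procedure above can be sketched as follows; the way the cutoff enters R, and the explicit qubit Hamiltonian (splitting ω on σ_z, tunneling ∆ on σ_x, σ_z coupling to the bath), are our assumptions about details the text leaves implicit:

```python
import numpy as np

rng = np.random.default_rng(42)

# Parameters quoted in the text
omega, Delta, lam = 0.5, 0.2, 0.1

def sample_bath(d_E):
    # Bath energies sampled uniformly from [0, 1]
    E = rng.uniform(0.0, 1.0, size=d_E)
    # Random matrix with entries in [-1, 1]
    R = rng.uniform(-1.0, 1.0, size=(d_E, d_E))
    # Sharp cutoff at the system splitting omega: only levels within omega of
    # each other in energy may couple (our reading of how the cutoff enters R)
    mask = np.abs(E[:, None] - E[None, :]) <= omega
    B = (R * mask + (R * mask).T) / 2
    np.fill_diagonal(B, 0.0)  # 'passive' bath: zero diagonal
    return np.diag(E), B

d_E = 40
H_E, B = sample_bath(d_E)

# Assumed assembly of the full SE Hamiltonian
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
H_S = 0.5 * omega * sz + Delta * sx
H_SE = (np.kron(H_S, np.eye(d_E)) + np.kron(np.eye(2), H_E)
        + lam * np.kron(sz, B))   # the system coupling operator is an assumption

assert np.allclose(H_SE, H_SE.conj().T)  # Hermitian by construction
```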
A random pure initial SE state ρ is generated via a random complex vector in the computational basis, followed by normalization. This is to simulate a generic situation, with a relatively large d eff on average.
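A sketch of this initialization, together with the standard effective dimension d_eff = 1/Σ_n p_n² evaluated in a fixed basis (here the computational basis, as a stand-in for the energy eigenbasis; this choice is ours):

```python
import numpy as np

rng = np.random.default_rng(7)

def random_pure_state(d):
    # Random complex vector in the computational basis, then normalized
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    return psi / np.linalg.norm(psi)

def effective_dimension(psi):
    # d_eff = 1 / sum_n p_n^2, with p_n the populations in the chosen basis
    p = np.abs(psi) ** 2
    return 1.0 / np.sum(p ** 2)

d = 512
psi = random_pure_state(d)
d_eff = effective_dimension(psi)

# For generic random states d_eff concentrates around d/2, i.e. it is large
assert 0.25 * d < d_eff < 0.75 * d
```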
To simulate the instrument A_±, a random rank-1 measurement on the (single qubit) system state (with the identity acting on the variable-sized environment) is generated for steps 2 and 3, with a new system state ρ_{S,i} randomly generated independently after each step, such that e.g. the input to step 3 is ρ_3 = ρ_{S,3} ⊗ tr_S[A_2($(ρ_2))]. By a random measurement, we mean a POVM element constructed by choosing a (normalized) Kraus operator with uniformly random complex components. Across the three steps, together this is the instrument A_+. 20 iterations of a step 1 measurement A_− are also randomly generated, and for a given A_+, ρ, and H_SE, the largest difference in the expectation value ⟨A_±⟩_{Υ/Ω} for varying A_− is computed. This simulates the different possible outcomes one could obtain for a step 1 measurement, and what effect this may have on the final measurement. For a given H_SE and ρ, this is repeated and averaged over 25 separately sampled A_+.

[Figure 2 caption:] The instruments A_± used to probe these processes involved a repeated measure-and-independently-prepare protocol, with a system-level causal break between A_− and A_+, as displayed in the upper right of Fig. 2. The unitary evolution of this process was generated with uniformly sampled 0.01 ≤ ∆t_1, ∆t_2, ∆t_3 ≤ 0.5. The difference in outcomes was maximized over A_−, and averaged over A_+, to determine the non-Markovianity. Produced using [86,87].
Two unitary processes Υ_ℓ and Υ_s were used in the analysis, in addition to the dephased process Ω. Random long time intervals were used for Υ_ℓ, picked uniformly such that 5 ≤ ∆t_1, ∆t_2, ∆t_3 ≤ 50, and short time intervals for Υ_s, such that 0.01 ≤ ∆t_1, ∆t_2, ∆t_3 ≤ 0.5. Υ_s produced relatively random results for the non-Markovianity, which is expected, as the process does not have enough time to relax from its initial conditions. The results for Υ_s are shown in Fig. 4, while the results for Υ_ℓ are part of Fig. 2. The standard error for the data depicted in Fig. 2 was too small to include as error bars: the average for Υ_ℓ was σ_av = 1.0 × 10⁻³, and for Ω it was σ_av = 1.8 × 10⁻⁴.
For each value of d_E, increasing in steps of two up to d_E = 400, the entire computation is repeated for 40 iterations of generating a random H_SE and a random initial state ρ, with the final non-Markovianity being the average of these runs. Fig. 2 was plotted using a moving average in bins of 5 points, so that the trend is more easily discernible.
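The smoothing step can be sketched as a simple moving average (our own minimal implementation of the binning described above):

```python
import numpy as np

def moving_average(y, w=5):
    # Simple moving average with window w, as used to smooth the Fig. 2 trend
    kernel = np.ones(w) / w
    return np.convolve(y, kernel, mode='valid')

y = np.arange(10, dtype=float)
ma = moving_average(y)
# the average of consecutive integers in a length-5 window is the window centre
assert np.allclose(ma, np.arange(2, 8))
```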