How Dynamical Quantum Memories Forget

Motivated by recent work showing that a quantum error correcting code can be generated by hybrid dynamics of unitaries and measurements, we study the long time behavior of such systems. We demonstrate that even in the"mixed"phase, a maximally mixed initial density matrix is purified on a time scale equal to the Hilbert space dimension (i.e., exponential in system size), albeit with noisy dynamics at intermediate times which we connect to Dyson Brownian motion. In contrast, we show that free fermion systems -- i.e., ones where the unitaries are generated by quadratic Hamiltonians and the measurements are of fermion bilinears -- purify in a time quadratic in the system size. In particular, a volume law phase for the entanglement entropy cannot be sustained in a free fermion system.

Motivated by recent work showing that a quantum error correcting code can be generated by hybrid dynamics of unitaries and measurements, we study the long time behavior of such systems. We demonstrate that even in the "mixed" phase, a maximally mixed initial density matrix is purified on a time scale equal to the Hilbert space dimension (i.e., exponential in system size), albeit with noisy dynamics at intermediate times which we connect to Dyson Brownian motion. In contrast, we show that free fermion systems -i.e., ones where the unitaries are generated by quadratic Hamiltonians and the measurements are of fermion bilinears -purify in a time quadratic in the system size. In particular, a volume law phase for the entanglement entropy cannot be sustained in a free fermion system.
Recently it has been argued that a low-dimensional (even a one-dimensional) quantum system which mixes unitary evolution by local circuits with local measurements can act as a quantum memory [1][2][3][4][5][6][7]. If one records the outcomes of the measurements, this process can protect nontrivial quantum information. Here, we investigate the long-time dynamics of this process to understand how the system ultimately "forgets," i.e., if the system is used to store quantum information, how the information necessarily is lost by these measurements.
To study this long time dynamics, we ignore the spatial structure. The system consists of just a single Hilbert space of high dimension N , with N even. Our model consists of alternating two different steps: first, a unitary evolution, followed by a measurement of a single bit of information 1 , represented by a rank N/2 projector. We can also choose to conjugate the measurements by the unitary, and so the model can be described by measuring a single bit of information at each step, with the measurement basis changing each time. Thus, if we evolve by unitary U 1 , then measure projector P 1 , then evolve by unitary U 2 , then measure projector P 2 , this is equivalent, up to an overall unitary, to measuring projector U † 1 P 1 U 1 , followed by measuring projector U † 1 U † 2 P 2 U 2 U 1 . We keep track of the quantum trajectories by writing down the measurement outcomes, so in particular pure states always evolve to pure states along such trajectories.
We consider two different cases, that we term "many-body" and "free fermion". In the many-body case, the unitaries as chosen to be Haar random. The term "many-body" is a bit of a misnomer: we have some fixed high-dimensional Hilbert space, perhaps formed by tensoring many qubits, so a better term might be "high-dimensional single body". Nevertheless, we persist in using the term many-body; in particular, one may hope that sufficiently deep quantum circuits for a tensor product Hilbert space can be well-approximated by our Haar random measurements [8][9][10]. In the free fermion case, the Hilbert space is a Fock space of fermions, and measurements are only allowed to be of fermion bilinears. In the many-body case we will find that the system preserves information up to a time scale proportional to N (which, for a tensor product Hilbert space, is exponentially large in the number of degrees of freedom). However, up to that time, we will find large sample-to-sample fluctuations in how well information is preserved. In contrast, in the free fermion case, on a system with n modes (hence, 2 n -dimensional Hilbert space), we prove that the purification time is ∼ n 2 . While not exponential, this time is still slower by a factor of n than the purification time with optimally chosen measurements, which is only proportional to n. Indeed, if measurements can be done in parallel, one can purify a many-body or free fermion system in a single step with n commuting measurements, such as measuring the Pauli Z operator on each of the n qubits.
To understand how well the system preserves quantum information, we will use a reference system. We will start with a state that is maximally entangled between the system and the reference, so that the reduced density matrix ρ on the system is maximally mixed. We then study the evolution of this reduced density matrix under a sequence of measurements [5]. Before describing results, it is helpful to see the output of a numerical simulation of the many-body case for a Hilbert space of dimension N = 2000, shown in Fig. 1. At each step we choose a rank N/2 projector randomly, by choosing P = U P 0 U † for some fixed projector P 0 with U a unitary chosen from Haar measure. The red "measurement" curve is the physically meaningful case: we compute the probability tr(P 0 ρ) that the measurement has a given outcome, and with that probability replace the density matrix ρ with (P 0 ρP 0 )/ tr(P 0 ρ), otherwise replacing it with (1−P 0 )ρ(1−P 0 )/ tr((1−P 0 )ρ). The black "post-selection" curve is not physically meaningful; here we always replace ρ with P 0 ρP 0 / tr(P 0 ρ). The quantity being plotted is the "purity" of ρ, defined to be tr(ρ 2 ).
One sees that the dynamics has roughly three regimes. First, at early times in both cases the purity grows linearly with only small fluctuations. Then, at intermediate times, the purity has noisy dynamics, and can actually decrease. Indeed, one can observe long fluctuations in which the purity decreases for many steps. Finally, the purity becomes close to 1 and converges to 1. The long time convergence is quite different between the measurement and postselected case, with an exponential convergence for measurement, but not for postselection. This will follow from Eqs. (5) and (10).
Perhaps the most interesting question is how the decrease in purity can occur. For one thing, it seems strange as it suggests that a random measurement can actually restore quantum information that has been lost. Indeed, it is possible for a measurement to reduce purity (or increase entropy), but importantly, the square-root purity averaged over measurement outcomes cannot decrease after a measurement (and similarly, the entropy averaged over measurement cannot increase). We show this in Appendix A. Still, though, it may be surprising that large fluctuations occur for random choices of projector, and that they are still present even for N = 2000. One heuristic explanation as to why they are present is that as the purity increases, the system starts to get a few larger eigenvalues, and this reduces the tendency of the system to self-average. Our analysis in Section 1 shows mathematically what happens: the average purity increases in a single step by an amount proportional to order 1/N . There are terms which are of order 1/N 2 and smaller, but we ignore those. At the same time, the variance of the purity after a single step is of order 1/N also, i.e. the root-mean square is 1/ √ N . Thus, it suggests the picture that the purity obeys a biased diffusion equation; the bias and the noise both depend on purity. At a time scale of order N , both bias and noise are equally important; that is, the Peclet number is of order 1, independent of N at this time scale. We show that for a maximally mixed state, as well as a nearly pure state, the variance is negligible, which explains why the early and late time dynamics are approximately noiseless.
This picture that the purity obeys a biased diffusion equation is not completely correct: in general, the drift and diffusion of the purity depend on traces of higher powers so one cannot write down a Markovian evolution for the purity alone. In the special case that the density matrix has rank 2, one can describe the dynamics just in terms of the purity: see Fig. 2 and Appendix B. For low rank density matrices, we are able to describe the dynamics of the set of eigenvalues by a diffusion equation with drift.
In the case of free fermion dynamics, we consider two cases: (1) particle-number conserving dynamics and (2) general free fermion evolution, which may involve pairing. In both cases we use the formalism of Gaussian states [11]. The free fermion unitaries are chosen from U (N ) or SO(2N ) respectively, randomly with respect to Haar measure. We use a proxy for the second Renyi entropy which is only well defined for Gaussian states but has the advantage of being much easier to work with. This proxy entropy is equal to the second Renyi entropy for the maximally mixed state and all pure Gaussian states, and stays within fixed positive bounds of it for all Gaussian states, so understanding this proxy entropy is just as good as understanding the actual second Renyi entropy (and hence purity) insofar as the purification dynamics goes. Our main result is that if we start with a state on n modes whose proxy entropy is s n log 2 -i.e., the entropy density is s log 2 -then measuring a single mode must decrease the proxy entropy by at least an amount of order s 2 , when averaged over the two possible measurement outcomes. This leads to a rigorous bound on the purification time of order n 2 , showing that this maximally entangling free fermion system is in a purifying phase [5,12]. We expect, but do not prove, that any free fermion system, no matter how non-local, will purify in a time polynomial with n. Our result is also consistent with the lack of a volume law phase in free fermion unitary-measurement dynamics [13][14][15][16].
The paper is organized as follows. In Section 1 we give results for average change in purity and fluctuations in purity for the many-body case, and analyze these results. In Section 2 we give a Brownian motion picture for eigenvalues of the density matrix, valid when the rank of the matrix is small compared to N . In Section 3 we discuss the free-fermion case. Finally, we give a brief discussion in Section 4. In Appendix A, we prove that entropy, averaged over measurement outcomes, decreases after measurement. In Appendix A we prove general entanglement inequalities. In Appendix B we derive necessary formulas for averages of products of traces over choices of unitary matrices, using a set of "Schwinger-Dyson equations" [17], for which we give a self-contained derivation. There are of course many tools one could use, such as the Weingarten calculus, and other readers may prefer that. The Schwinger-Dyson equations however have the advantage of being relatively simple to use and of naturally organizing the result in powers of N −1 .
Note: After finishing this preprint, we became aware of Ref. [18] which found a similar exponential time purifying behavior using a capillary-wave description of domain walls in an effective statistical mechanics model describing the mixed phase of a 1d local hybrid measurement-unitary circuit.
1 Many-Body Dynamics

Post-selected case
We wish to compute the entropy of ρ = tr(P ρ) −1 P ρP.
where ρ is a density matrix on an N dimensional Hilbert space, and P = U P 0 U † is a random rank N/2 projector. Here P 0 is a fixed rank N/2 projector and U a Haar-random unitary. For computational purposes, we use the 'purity', i.e. the trace of the square of the density matrix, as our entropy measure. The purity is maximal for a pure state, where it is 1, and achieves its minimum of 1/N for a maximally mixed state. At large N we expect tr P ρ to be close to 1/2, so we write tr P ρ = 1/2 + δ. Then where E denotes the average over the unitaries U . Let us estimate how big δ can be. Using Eq. (37) of Appendix B we have that so E[δ 2 ] is O(N −1 ) and thus with high probability δ is smaller than N −1/2+ for any > 0.
In fact, we can give even better bounds on the fluctuations in δ using concentration of measure. We have E[δ] = 0. Regard δ = tr P ρ − 1/2 = tr U P U † ρ − 1/2 as a function of U . By a triangle inequality, this function is 2-Lipschitz using the operator norm as a metric for U 2 Hence, this function is also 2-Lipschitz using Hilbert-Schmidt norm as a metric. Hence, (see for example theorem 5.17 of [19]), the probability that |δ| ≥ x for any x > 0 is bounded by exp(−Ω(N x 2 )). So, the terms of order δ 3 are bounded by O(poly(log(N ))/N 3/2 ) with probability 1 − 1/N α for any constant α and so we may neglect them when estimating to order 1/N . We formalize this into the following 'non-perturbative' result: Lemma 1. For any α > 1, the following holds: Proof. We bound the expectation value of the "error": tr P ρP ρ (tr P ρ) 2 − 4 tr P ρP ρ − 16(tr P ρP ρ)δ + 48(tr P ρP ρ)δ 2 by dividing into two cases.
In Eq. (42) Appendix B we compute the first three terms in Eq. (3) to order 1/N , using the Schwinger-Dyson equations to perform averages over U , to obtain: where the dots represent terms the asymptotically small error term in Eq. (3). When ρ is close to a maximally mixed state, the traces of the higher powers of ρ on the right hand side of Eq. (4) can be neglected, and we see that the purity of ρ increases by 1/N . This explains the initial linear growth of purity in Fig. 1.
Note that by monotonicity of Schatten p-norms, (tr given a bound on the operator norm U − V . We will prove the bound in a more general case allowing U, V to be non-unitary but requiring that U , V ≤ 1. We We claim that we can ignore the last U − V 2 term; indeed, given any U, V , and any integer k consider the sequence of operators U 0 = U, U 1 , U 2 , . . . , U k = V given by linearly interpolating between U and V so that When ρ is close to a pure state, i.e. for tr ρ 2 = 1 − with small, this gives E tr P ρP ρ (tr P ρ) 2 where now the · · · denote higher order terms in and N −1 . Naively iterating this would give evolving in time t (number of steps) as N/t, implying purification at a time scale scale t ∼ N 2 . However, we cannot ignore non-linearities in the noise at those time scales, so we cannot draw any sharp conclusions about the late time behavior in the post-selected case. In Eq. (56) of Appendix B we also calculate the noise to leading order in 1/N : Again, the dots indicate an error term that is bounded to be asymptotically smaller than the leading 1/N term by Lemma 1. When tr ρ 2 is small, tr ρ 3 and tr ρ 4 are upper bounded by (tr ρ 2 ) 3/2 and (tr ρ 2 ) 2 respectively, and lower bounded by 0, which shows that, to leading order in tr ρ 2 , the noise is upper bounded by 4(tr ρ 2 ) 2 /N . This explains the lack of noise during the initial growth of the purity in Fig. 1. Conversely, for a nearly pure state, tr ρ 2 = 1 − , we show in Eq. (59) in Appendix B that the noise is upper bounded by 4 2 /N . This is consistent with the lack of noise when the purity is close to 1 in Fig. 1.

Measurement case
Now let us consider the case without post-selection. We want to compute tr P ρP ρ (tr P ρ) 2 + (tr(I − P )ρ) where I is the identity matrix. This is the desired value since the measurements outcomes occur with probability tr P ρ and tr(I − P )ρ respectively. After cancelling the probability against one power of the trace in the denominator, and using the fact that the probability distribution for P is invariant under where again the dots represent an asymptotically small error bounded in the same way as that in Lemma 1. As shown in Eq. (44) in Appendix B, the result is Once again we see a linear initial growth in steps of 1/N , consistent with the red measurement curve in Fig. 1. The second term is in fact larger than 1 we see that the change ∆F of F in the large N limit obeys This implies that for any initial probabilistic ensemble of ρ, the ensemble average purity converges to 1 exponentially with characteristic time N . This has two immediate consequences. First, the impurity F is exponentially small with probability exponentially close to 1. Indeed, since F ≥ 0, Markov's inequality implies that for any a ∈ (0, 1) This implies in particular that the variance of the purity, which is at most E F 2 , decreases exponentially. In fact, this ensemble variance can be computed more explicitly. We average the square of the post-measurement purity over both measurement outcomes and the unitaries U . We get the same answer as in the post-selected case (see Eq. (58) in Appendix B): so we obtain the same bounds as in the post-selected case: for small tr ρ 2 the noise is bounded by 4(tr ρ 2 ) 2 /N , whereas for a nearly pure state, tr ρ 2 = 1 − with small , the noise is bounded by 4(tr ρ 2 ) 2 /N . This is consistent with the lack of noise seen in the red measurement curve in Fig. 1 at early and late times. Second, the von Neumann entropy S vN also is exponentially small with probability exponentially close to 1. To see this, we observe that for any state ρ of impurity F = 1 − tr ρ 2 there exists a pure state σ within trace distance 1 2 ρ − σ 1 ≤ F . Then, the continuity of von Neumann entropy (the Fannes-Audenaert inequality [20]) implies that for any In Appendix B.3 we explicitly work out the case of rank 2 density matrices ρ, where all the terms in the various equations can be expressed in terms of the purity tr ρ 2 .

Connection to Dyson Brownian motion
Brownian motion is a stochastic process of a point x(t) in R d where for each (small) time step dt the point is displaced by x(t + dt) − x(t) = dx that is drawn from the Gaussian distribution with mean zero and variance dt. It is well known that in the limit dt → 0, the density of the points evolves according to the heat equation. Dyson Brownian motion [21] is a Brownian motion in R d 2 that is identified with the space of all Hermitian matrices. More concretely, the update rule for a Hermitian matrix X(t) is that the increment X(t+dt)−X(t) is drawn from the Gaussian unitary ensemble where each independent matrix element is normalized to have variance dt. The probability density σ of the eigenspectra {λ 1 (t), . . . , λ d (t)} of X(t) follows the Dyson partial differential equation: The purpose of this section is to derive an analogous differential equation for the dynamics of probabilistic ensembles of states according to the "measurement" setting defined in the introduction. 3 We will derive the following equation that holds for an arbitrary real-valued smooth function F : λ → F (λ) of the eigenspectrum {λ 1 , . . . , λ d } of a state of rank ≤ d N : where the expectation is over the distribution of ρ at the given time. If desired, one may turn this into an equation for the probability density of eigenspectra by applying the adjoint differential operator D † . To have a differential equation we are of course taking some limit of our discrete measurement process. Here, we are taking N , the dimension of the total Hilbert space, to infinity while setting the "infinitesimal" time step dt to be equal to 1 N . In other words, Eq. (15) governs the dynamics of the ensemble average of F accurately to order 1 N when the initial state ρ has rank at most d N . Before we start the derivation, we note that the equation implies that if F is a function of the sum of all eigenvalues a λ a = 1, then ∂ t EF = 0, as expected. Indeed, if F (λ) = f ( a λ a ) for some smooth real function f , then ∂ a F = f and ∂ a ∂ b = f and the first sum vanishes because the summand is antisymmetric under a ↔ b and the second sum becomes (tr ρ 2 ) − 2(tr ρ 2 )(tr ρ) + (tr ρ 2 )(tr ρ) 2 = 0.
Note also that the calculation of the previous section is reproduced.
which agrees with Eq. (9). Observe that Eq. (10) is reproduced by the first term (the level repulsion term) of DF in Eq. (15).

Derivation
Here we derive Eq. (15) assuming d is a constant independent of N . Put For any given unitary U and a (fixed) projector P of rank N/2, we find it convenient to introduce M defined by where the first and second moments are 4 4 These are computed by where I is the identity and Swap is the operator that swaps the two tensor factors. Observe that these moments agree with those of Gaussian unitary ensemble, rescaled by dt, up to the leading order.
After the measurement of ρ by {U † P U, I − U † P U } we have two outcomes whose spectra coincide with those of Given M , we obtain ρ M with probability 1 2 (1 + tr M ρ) and ρ −M with probability 1 2 (1 − tr M ρ). Thus, the expectation of F (ρ ) for any function F over post-measurement states ρ is given by Since the Haar measure on U is invariant under the left multiplication by V where V † P V = I − P , we see that the random matrix M has the same distribution as −M , implying that the two terms of Eq. (21) are the same. We conclude that the post-measurement ensemble given ρ is where M is determined from U by Eq. (18), and dM is the induced measure. Hence, M ab M cd dM = δ ad δ bc dt by Eq. (19). Let ρ M have eigenvalues λ 1 , . . . , λ d . By second order perturbation theory, where 1 + tr M ρ in front of λ a is to normalize λ so that a λ a = 1. So the change in the eigenvalue is For an arbitrary smooth function F from spectra to R, we calculate the expected value of F with respect to the post-measurement ensemble in Eq. (22) as follows. That is, EF = F dM . We will ultimately want to take the expectation over a general ensemble, but for the moment we are assuming that the pre-measurement ensemble is a Dirac delta distribution. Abbreviate ∂ ∂xa F (x 1 , . . . , x d ) as ∂ a F . We only keep terms up to order dt.
In the second term of the last line we may ignore the denominator because ξ a ξ b is already O(dt). Using Eq. (19), we arrive at Eq. (15) after integrating over a pre-measurement distribution, and letting dt → 0.

(Un)Importance of Haar randomness
In the above derivation, we assumed d N is fixed and used the following: (i) the first and the second moments of M is given by Eq. (19), and (ii) lim dt→0 E M 3 /dt = 0. 5 The first condition is obviously used, and the second is to estimate the error term by Taylor's theorem. (Since F is a smooth function over a compact domain, the derivatives are all bounded.) This implies that our differential equation Eq. (15)

Free Fermion Dynamics
We now specialize to the case of free fermion dynamics. Consider the 2 n dimensional Fock space of n fermionic modes, acted on by creation and annihilation operators a † µ and a µ , µ = 1, . . . , n. In this section we will study non-interacting measurement-unitary dynamics. This is dynamics where the observables being measured are quadratic in the a µ , a † µ , and the unitaries being applied are exponentials of anti-Hermitian operators quadratic in the a µ , a † µ . We will allow for non-particle conserving processes, and will find it useful to work with the Majorana operators Our main result in this section is that, starting from the maximally mixed state, or, more generally, any mixed Gaussian state, a free fermion system purifies after ∼ n 2 random free fermion measurements. This is slower than e.g. a system in an area-law entanglement phase, which purifies in time ∼ n, but much faster than a general interacting system, whose purification time scales like the many body Hilbert space dimension (2 n in this case), as we shall see in the next section. In the framework of Gullans and Huse [5], a free fermion hybrid measurement-unitary circuit is thus always in the purifying phase, consistent with the intuition that a quantum error correcting code cannot dynamically emerge from free fermion dynamics. We emphasize that we make no locality assumptions: the free fermion unitaries are fully random and thus non-local.
Since our evolution takes place entirely within the space of Gaussian states, let us recall some basic facts about these states, following Ref. [11]. A Gaussian state ρ is a density matrix that, when written as a polynomial in the Majorana operators γ j with each γ j appearing with exponent 0 or 1 in each term, can be put into the form when the γ j are replaced with Grassmann numbers θ j . Here M is a real anti-symmetric 2n-by-2n matrix known as the correlation matrix. We have Any such M can be transformed by an SO(2n) rotation R into the following block-diagonal form: where the λ µ satisfy −1 ≤ λ µ ≤ 1 and are known as the Williamson eigenvalues of M . Implementing the rotation R on Fock space, the state ρ is transformed into Since iγ 2µ−1 γ 2µ = a µ a † µ − a † µ a µ is the operator that measures the fermion parity of mode µ, we see that ρ 0 is a tensor product state where each mode µ is independently filled or empty with probabilities 1 2 (1 − λ µ ) and 1 2 (1 + λ µ ) respectively. The correlation matrix of ρ 0 is Before proceeding, it will also be useful to define the following function on Gaussian states: where M µν = 2 Tr (ρ a µ a † ν ) − δ µν is discussed below. S proxy (ρ) is a proxy for the second Renyi entanglement entropy S 2 (ρ) = n log 2 − 1 2 Tr log(1 − M 2 ) because it agrees with it on the maximally mixed state (where M = 0) and all Gaussian pure states (where M 2 = −1), and in all other cases stays within order 1 constant multiples of S 2 . In the rest of this section we will establish bounds on the rate at which S proxy decreases; these will immediately translate to bounds on the second Renyi entropy.

Particle number conserving dynamics
If we start with ρ 0 and apply dynamics that conserves U (1) particle number, then the system moves through a restricted set of Gaussian states which are slightly easier to work with, and whose form we now derive. A U (1) particle number conserving unitary acts on the operator algebra by: where U is a unitary n-by-n matrix. Written in terms of Majoranas this is: where the 2n-by-2n orthogonal matrix O describes the action of the unitary U on C n viewed as a real vector space R 2n . Explicitly, the 2-by-2 block spanning rows 2µ − 1 and 2µ and columns 2ν − 1 and 2ν of O (µ, ν = 1, . . . , n) is Now consider a Gaussian state ρ =Û † ρ 0Û , whereÛ is the Fock space operator that implements the action of U in Eq. (27). We have so the correlation matrix of ρ is M = OM 0 O T . Again identifying R 2n with C n , we see that M is the underlying real space action of the anti-Hermitian operator −iM = −iUM 0 U † , where M 0 is the diagonal matrix with λ µ on the diagonal. We have M µν = 2 Tr (ρ a µ a † ν ) − δ µν . This class of Gaussian states is precisely the class in which pairing correlations Tr (ρ a µ a ν ) all vanish.
Let us now consider ∆S proxy , the change in S proxy (ρ) = (log 2)(n − Tr M 2 ), averaged over measurement outcomes, after measuring the occupation number of the first mode, i.e., the observable iγ 1 γ 2 . In Appendix C we show that where M 11 is the entry in the first row and first column of M. We thus see that, averaged over measurement outcomes the proxy entropy cannot increase, consistent with the same facts about the von Neumann and second Renyi entropies as proved in Appendix A. Now imagine that we have an ensemble of density matrices, one that results for example from the application of several steps of a hybrid unitary-measurement circuit. Letting the bar denote the average over this ensemble, we have: Now suppose that our circuit consists of Haar random free fermion unitaries interspersed with measurements of the first mode. In this case, 1 − (M 2 ) 11 = 1 − 1 n Tr M 2 = S proxy /(n log 2). Letting s = S proxy /(n log 2) be the density of the proxy entropy, we then have This equation roughly means that when the average proxy entropy density is s, we learn at least ∼ s 2 about the system by measuring a single mode. It implies that s(t) ≤ (1 + t/n) −1 , where t measures the number of unitary-measurement steps taken. Note that due to the convexity properties above, this is a rigorous upper bound on s(t). We thus see that the system loses half of its entropy density in a time ∼ n, and its purity becomes of order 1 in a time ∼ n 2 . We expect these results to hold for a much more general class of free fermion unitaries, since the Haar random case intuitively corresponds to the situation where mixing is maximal. Indeed, one generalization is a protocol where one alternates the application of an arbitrary free fermion unitary (not necessarily Haar random, and possibly different at each step) and measurement of each site with some nonzero probability. In this case, an argument similar to the above shows that the proxy entropy decreases as ∼ 1/t, implying that a mixed phase cannot be sustained in such a system. As a consequence, an entanglement volume law phase cannot be sustained in such a system either.

Particle number non-conserving dynamics
Now let us perform the same calculation for a general Gaussian state ρ, with 2n-by-2n real antisymmetric correlation matrix M which may now contain non-zero pairing correlations. To find the correlation matrix of the post-measurement state we use the techniques of Sec. VIII of Ref. [11]. Let K be the 2n-by-2n matrix K pq = δ p1 δ q2 − δ p2 δ q1 . Let α = M 12 , let Q pq = δ p1 δ q1 + δ p2 δ q2 be the projector on the first two basis vectors, and let P = 1 − Q. Note that QM Q = αK. The probabilities of the first mode being empty and filled are p ± = (1 ± α)/2, and the post-measurement correlation matrix is We thus have On the other hand, We note that the only non-zero entries of KM P M K are in the upper left 2 by 2 block; these are ) and y = (m 21 , 0, m 23 , . . . , m 2(2n) ) be the first two rows of M . Note that |x| 2 = α 2 + B and |y| 2 = α 2 + A. Using these facts, the expression in Eq. (32) simplifies after some algebra to where we take the determinant of only the upper left 2-by-2 block of Q(1 + M 2 )Q. The matrix 1 + M 2 is positive semidefinite since its eigenvalues all lie between 0 and 1, so the same is true of Q(1 + M 2 )Q, and hence the determinant above is always non-negative. This shows that the proxy entropy, averaged over measurement outcomes, always decreases. Let us now start from the maximally mixed state and alternately apply SO(2n) Haar-random unitaries and measurement operations. We claim that this hybrid unitary-measurement circuit purifies in time ∼ n 2 . To see this, we have to average the above formula for ∆S proxy over Gaussian states with with correlation matrices of the form OM O T . This requires computing averages of various quartic expressions in the entries of O, which we do using the Weingarten calculus. 6 We use the fact 6 The average of any quartic polynomial in Haar random O ∈ SO(2n) can be computed as follows. The average E = commutes with R ⊗ R for any R ∈ SO(2n). Therefore [22], we must have E = xI + yS + zW for some x, y, z ∈ R where I = 2n j,k=1 |j, k j, k|, S = 2n j,k=1 |k, j j, k|, and W = 2n j,k=1 |j, j k, k|. Observe that Tr I = 4n 2 , Tr S = 2n, and Tr W = 2n. Now, Tr E, Tr ES, and Tr EW can be computed directly, determining the coefficients x, y, z.
. The average of the determinant in the above equation can then be expressed in terms of the spectrum of 1 + M 2 ; in fact, all that enters is the sum of the eigenvalues, Tr (1 + M 2 ), and the sum of the squares of the eigenvalues, Tr (1 + M 2 ) 2 . We have, to leading order in 1/n, S proxy n log 2 (33) We thus again see that, as long as the proxy entropy is much greater than 1, the proxy entropy density decreases at a rate at least as fast as the square of this density, leading to similar bounds as in the particle number conserving case. In particular, starting from the maximally mixed state, half of the proxy entropy is lost in time ∼ n, and the purity becomes of order 1 in time ∼ n 2 . However, from the above bound we cannot determine that the state will eventually purify (i.e., that the purity will tend to 1), only that it will reach purity of order 1. Also, the bound in Eq. (33) may not apply to protocols other than the Haar random case. Indeed, in Ref. [16] it is shown that a 1-dimensional fermionic system in which random iγ j γ j+1 operators are measured (for both parities of j) lies in the bond percolation universality class, and there we expect that after t measurements -which corresponds to 1 + 1d time ∼ n -the entropy should be of order n/t. Furthermore, it is relatively easy to construct a (still free fermion) generalization of this model, whose associated statistical mechanical model is the loop model with crossings of Ref. [23]. At any point corresponding to the 'Goldstone phase' of the associated loop model, the entropy -which corresponds to the spanning number -for a fixed ratio of n/t scales as log n.

Discussion
We have analyzed the many-body and free-fermion case. If the Hilbert space dimension N in the many-body case equals 2 n for some system of n qubits, then in the many-body case, the relaxation time is exponentially long in the number of degrees of freedom. This contrasts strongly with the freefermion case where the relaxation time is only polynomially long in the number of degrees of freedom -more precisely, it is of order n 2 . It is an interesting question whether some intermediate behavior is possible. Let us mention one further toy model. Consider a system of n qubits, with Pauli measurements and Clifford unitaries applied. If we apply a sequence of measurements Z 1 , Z 2 , Z 3 , . . ., with no intervening unitaries, then the system purifies in linear time. On the other hand, if we apply random Cliffords in between measurements of a fixed Pauli, it is easy to see that the purification time is exponential in n. In this process, the many-body state can be described as a stabilizer state with ≤ n linearly independent stabilizers. Initially, the maximally mixed state has no stabilizers. Given a state with k linearly independent stabilizers, a further measurement of a product of Paulis will add a new stabilizer to the state (hence decreasing the entropy by 1 bit) if the new stabilizer commutes with all the previous stabilizers and is linearly independent of them. Without loss of generality we can fix the k given stabilizers to be Z 1 , Z 2 , . . . , Z k . If we measure a random product of Paulis, the probability that it commutes with the previous stabilizers is exponentially small in k. 7 Thus, the purification proceed monotonically: each stabilizer measurement can either reduce the entropy by 1 bit or leave it unchanged. However, the purification time indeed is exponential in n.
Our results and this toy model suggest that there might be a dichotomy between polynomial and exponential relaxation times. However, perhaps richer relaxation behavior may be observed at a phase transition between these two possibilities.

Acknowledgments
LF acknowledges the support of NSF DMR 1939864, and useful conversations with Matthew P. A. Fisher.

A Appendix: Entropy Inequalities
Here we prove that entropy cannot increase on average after a measurement and square-root purity cannot decrease on average after a measurement.

Lemma 2.
Given any density matrix ρ, and any measurement (either a projective measurement or more generally a POVM), the average, over measurement outcomes, of the von Neumann entropy of the state after measurement, cannot increase.
Proof. Note first that since a POVM can be implemented by a projective measurement in a larger Hilbert space, it suffices to consider the case of projective measurements. Consider a tripartite system, with three subsystems A, B, C. Choose subsystem A have have reduced density matrix ρ A = ρ and choose C to be an arbitrary purification of ρ, i.e., the density matrix ρ AC on AC is a pure state. We use B as a register for the measurement: initially, B is in some fixed state |0 , and then some unitary on AB is applied so that the state of B in some given basis records the outcome of the measurement. This unitary is chosen in the obvious way so that if ρ is in the range of any of the projectors defining the projective measurement, then the reduced density matrix on A is left unchanged. We write τ for the density matrix of the tripartite system after this unitary is applied.
Then, the entropy of A after measurement, averaged over measurement outcomes, is equal to S(τ A ) − S(τ B ). Since C still purifies AB, this is equal to S(τ BC ) − S(τ B ). By subadditivity, this is ≤ S(τ C ) = S(ρ).
We now prove a similar result for the square-root of the purity of a quantum state. Then, Lemma 3. Given any density matrix ρ, and any measurement (either a projective measurement or more generally a POVM), the average, over measurement outcomes, of the square-root purity of the state after measurement, cannot decrease.
Proof. As before, consider only projective measurements. Label measurement outcomes by an index i and regard ρ as a block matrix with row and column blocks labeled by this index i. Let ρ ij denote the submatrix in the i-th row and j-th column block. Each normalized block σ i = ρ ii / tr ρ ii is a post-measurement state, which comes with probability p i = tr ρ ii . Hence, our goal is to show This inequality follows from the case where the index i assumes only two values since one can consider coarser blocking of ρ and subdivide it. So, assume that ρ consists of A = ρ 11 , B = ρ 12 = ρ † 21 , and D = ρ 22 . Squaring the inequality, we see that our inequality is equivalent to tr(B † B) ≤ tr(A 2 ) tr(D 2 ). linearly independent given that it commutes is 1 − 4 −(n−k) since this is the probability that it is not equal to the identity on the remaining n − k qubits.
Let us work in the basis where B is diagonal by taking singular value decomposition of B. (In this basis, A and D are not necessarily diagonal.) Since ρ is positive semidefinite, the determinant of any principal 2-by-2 block must be nonnegative. In particular, A jj D jj − (B jj ) 2 ≥ 0. Summing over j, we have . This completes the proof.
Using (P ) = N/2, (ρ) = 1 this can be re-written as: Now, we only care about the terms in brackets to zeroth order in 1/N , so we can repeatedly use Schwinger-Dyson equations to turn each P in each term in brackets to a 1/2. Note that the trace of a product of an arbitrary number of P 's and various powers of ρ is always bounded by 1, as can be seen by working in the eigenbasis of ρ, inserting complete sets of states between all the terms, and using the fact that i|P |j ≤ 1 for any unit vectors |i , |j . Thus we can get uniform bounds on the terms at order N −k , k ≥ 1, and hence are justified in dropping them. The result is: Now let X = P ρ 2 (P ρ). Then (HP ρ 2 )(P ρ) → (HP ρ 2 )(P ρ) + i (HHP ρ 2 )(P ρ) − (HP Hρ 2 )(P ρ) + (HP ρ 2 )(HP ρ) − (HP ρ 2 )(P Hρ) Multiplying by exp − 1 2 tr M M † and integrating over M gives, using Eqs. (34) and (35): which can be rearranged to Again replacing all the P 's with 1/2's in the term in brackets gives  (2) which is Eq. (4) of Section 1. These same three terms enter into Eq. (9): tr P ρP ρ tr P ρ = tr ρ 2 + 1 N 1 − 2(tr ρ 3 ) + (tr ρ 2 ) 2 + . . .

B.2 Schwinger-Dyson computation of the noise term
We now derive the noise for the post-selected and measurement cases, Eqs. (6) and (12) in Section 1. First let us treat the post-selected case, and derive Eq. (6). We have: so we just need to compute the expectation values of (P ρP ρ) 2 , (P ρP ρ) 2 δ, and (P ρP ρ) 2 δ 2 . We use the same Schwinger-Dyson methods as above, but give fewer details here.
Computing the first term: First take X = (P ρP ρ)(P ρP ρ). We obtain: (P ρP ρ)(P ρP ρ) = 1 2 (P ρ 2 )(P ρP ρ) For the terms with coefficient N −1 above, we only need to compute them to order N 0 . This can be done by repeated use of Schwinger-Dyson equations in which only two terms are of order N : essentially, these equations allow us to repeatedly replace P with 1 2 . We obtain, to order N 0 : Now we take X = (P ρ 2 )(P ρP ρ). We obtain (P ρ 2 )(P ρP ρ) = 1 2 (ρ 2 )(P ρP ρ) + 1 N −2(P ρ 2 P ρP ρ) + 2(P ρ 3 P ρ) Recalling that δ = (P ρ) − 1 2 , we therefore have that to order N −1 , On the other hand, using our previous computation we have that, to order N −1 : Subtracting these two equations we therefore see that to order N −1

E[
(P ρP ρ) 2 (P ρ) 4 ] − E[ which is just Eq. (6) in Section 1. Now let us do the computation in the case of measurement. Here we want to average over both the measurement outcomes and the unitaries U . Thus we want to compute E (P ρ) (P ρP ρ) 2 (P ρ) 4 + ((I − P )ρ) Writing the denominator (P ρ) = 1 2 + δ and expanding in δ as usual, we obtain E 2 (P ρP ρ) 2 (P ρ) 3 = 16(P ρP ρ) 2 By the same argument as in Lemma 1 the O(δ 3 ) terms add up to an error that is asymptotically smaller than 1/N , so we just need to compute the first 3 terms. Using Eqs. (51) to (53) these add up to E 2 (P ρP ρ) 2 (P ρ) 3 = (ρ 2 ) 2 + 1 N 2(ρ 2 ) + 4(ρ 4 ) − 12(ρ 2 )(ρ 3 ) + 6(ρ 2 ) 3 On the other hand, squaring Eq. (9) gives, to O(N −1 ): Subtracting these two gives E 2 (P ρP ρ) 2 (P ρ) 3 − E 2 (P ρP ρ) (P ρ) Let us make some comments about higher order contributions to this. Note that for ρ = 1 N 1, all of the terms in the above expression are O(N −4 ). One could ask if there are any O(N −2 ) contributions in this case. The N −2 term will look similar to the above, with a sum of products of traces of powers of ρ. By noting that the right hand side has to vanish identically for ρ = 1 N 1 (after summing the whole 1/N expansion), we see that, among the N −2 terms, there cannot be a constant piece. Hence we conclude that for ρ close to 1 N 1 (i.e. after a few applications of random P 's) the noise will be O(N −3 ).