Solvable model of deep thermalization with distinct design times

We study the emergence over time of a universal, uniform distribution of quantum states supported on a finite subsystem, induced by projectively measuring the rest of the system. Dubbed deep thermalization, this phenomenon represents a form of equilibration in quantum many-body systems stronger than regular thermalization, which only constrains the ensemble-averaged values of observables. While there exist quantum circuit models of dynamics in one dimension where this phenomenon can be shown to arise exactly, these are special in that deep thermalization occurs at precisely the same time as regular thermalization. Here, we present an exactly-solvable model of chaotic dynamics where the two processes can be shown to occur over different time scales. The model is composed of a finite subsystem coupled to an infinite random-matrix bath through a small constriction, and highlights the role of locality and imperfect thermalization in constraining the formation of such universal wavefunction distributions. We test our analytical predictions against exact numerical simulations, finding excellent agreement.


Introduction
The advent of quantum simulators-systems of many quantum particles whose interactions are finely programmable [1]-is beginning to allow the exploration of novel, complex many-body phenomena, particularly out of equilibrium [2,3,4,5,6]. Beyond their advanced control capabilities, one unique aspect of quantum simulators is the nature of the information they provide on the underlying quantum state. In many physical realizations (e.g., ultracold atoms with quantum gas microscopes, individually trapped ions or Rydberg atoms, superconducting qubits etc.), measurements involve the simultaneous read-out of the state of individual degrees of freedom, such as the occupation number of particles at a site in an optical lattice, or whether an atom is in its ground or excited state. In such cases, we can think of a measurement in a given run of the experiment as obtaining a classical 'snapshot' of the configuration of the entire system. Running an experiment multiple times, we then get a collection of snapshots, which constitutes microscopically-resolved information about the state. This represents a fundamentally new way of interrogating quantum many-body systems (as compared to more traditional macroscopic, coarse-grained probes, such as linear response to external perturbations or expectation values of order parameters). The vast amount of information that is generated by this type of probe immediately raises a number of questions. For example, what universal phenomena can we uncover based on the information content of these bit-strings? Can such information be used to characterize novel phases of matter, whether in or out of equilibrium? And what sorts of applications, especially for quantum information science, can we develop that leverage such large amounts of data?
There have been many recent developments along these fronts. Measuring local degrees of freedom during an otherwise-unitary many-body evolution was found to give rise to a novel kind of nonequilibrium, 'measurement-induced' phases and associated phase transitions, driven by a competition between entanglement creation and destruction [7,8,9,10,11,12,13,14,15]. Advanced data processing methods have been developed which allow for the efficient extraction of many properties of a quantum system by manipulating only compact, classical descriptions of its state, dubbed 'classical shadows' [16,17,18], which are constructed from snapshots in locally-randomized measurement bases. In the context of digital quantum simulators, the statistical analysis of classical snapshots is at the heart of the random circuit sampling problem, an ideal test of quantum computational advantage with noisy, intermediate-scale quantum hardware, which has been the subject of intense theoretical and experimental study in recent years [19,20,21,22,23,24,25]. Relatedly, methods based on the statistics of such classical snapshots have been proposed to quantify the many-body fidelity between two quantum states, allowing the benchmarking of quantum simulator performance [26,27].
In this work, we consider a novel nonequilibrium universality enabled by this measurement paradigm, which can be understood as a generalization of the concept of quantum thermalization in isolated quantum many-body systems-the process by which local subsystems approach thermal equilibrium despite the global dynamics being unitary and thus reversible [28,29,30,31]. Conventionally, such a phenomenon is explained by the fact that a local subsystem A typically becomes entangled with its surroundings B (the 'bath', which is taken to be large), over time. Ignoring the precise state of the bath, the state on A is then described by a statistical mixture of pure quantum states which replicates thermal equilibrium predictions on average (i.e., expectation values in an appropriate Gibbs ensemble), in accordance with statistical mechanical principles. However, the measurement capabilities of quantum simulators, as described above, motivate a refinement of the above concept: it is not necessary to assume that the bath is inaccessible; instead, one can study properties of the local subsystem A conditioned upon particular configurations of the bath B, information which is accessible from global measurement snapshots. In particular, one can imagine obtaining (classical) information of the state of B, which can then be used to index a pure quantum state on A arising from the collapse of the global wavefunction after the measurement; the collection of all such quantum states is called the projected ensemble [26,32].
Much of the interest in studying these ensembles lies in the possibility that they might exhibit a universal limiting form at late times. Indeed, it has been conjectured [26,32] that for ensembles built from quantum many-body states undergoing chaotic quench dynamics, and in the absence of conservation laws or at very high energy density (corresponding to infinite temperature), the limiting form should be the unitarily-invariant (i.e., Haar) distribution over the Hilbert space-i.e., any pure state should be as likely as any other. One can view this as a generalization of the fundamental statistical mechanical principle that a local subsystem tends to maximize its entropy subject to global conservation laws (none, in this case). For the projected ensemble, the entropy in question is that of the distribution of pure states over the Hilbert space. This condition can therefore be referred to as deep thermalization, as opposed to regular thermalization, which only constrains the behavior of ensemble-averaged observables, i.e., the distribution's mean.
That such universal distributions emerge was recently rigorously proven in a large family of models of quantum dynamics in one dimension [33,34]. Formally, the convergence to the uniform ensemble can be systematically probed by studying how well each statistical moment of the ensemble replicates that of the Haar distribution, a condition known in quantum information theory as forming an approximate quantum state k-design [35,36,37], where integer k ≥ 1 enumerates the statistical moments. It is interesting, then, to ask about the timescales t k for which a projected ensemble forms a good k-design in quench dynamics (within some fixed accuracy, captured by an appropriate distance measure), and the physical processes which govern these time scales. Note that by construction, the design times t k are monotonically increasing (t k+1 ≥ t k ), and that t 1 is the regular thermalization time. Ref. [33] found that t k = t 1 for all k in a Floquet Ising model tuned to a selfdual point, a result which was later extended to dual-unitary circuits 1 under more general conditions [34,43]. Further, breaking of dual-unitarity in (1 + 1)-dimensional unitary circuits was shown to gap t ∞ away from t 1 in general, via a mapping to measurementinduced dynamical purification [34], which limits the transfer of information over long spatial distances in the system. Aside from these results, our understanding of how t k scales with k and with other parameters of dynamics in general is still rather limited.
Here, we consider a distinct and complementary mechanism that can gap t ∞ away from t 1 , having to do with local bottlenecks to the transfer of information in the system. To do so, we will focus on a model of chaotic quantum dynamics with minimal structure, meant to capture the basic feature of a spatially-local system: that the subsystem of interest A interacts with the rest of the universe (an unstructured 'bath') through the mediation of another subsystem (informally, its boundary). This minimal requirement is enough to give rise to an interesting separation between regular and deep thermalization times, with the model remaining simple enough to allow for exact analytical results using random-matrix methods.
We find that the projected ensemble on the subsystem of interest A, in the limit of an infinite random-matrix bath, almost always realizes the so-called Scrooge [26,44] or Gaussian Adjusted Projected (GAP) [45,46] ensemble-a maximally-entropic distribution of pure states consistent with a given density matrix, which is of independent interest in quantum information theory. We further compute statistical moments of this distribution and derive an exact form for the design times t k in our model. As a consequence of this result we show that (i) t k grows as log(k) for sufficiently small moments k, and (ii) this growth asymptotes at large k to t ∞ = 2t 1 . This represents the second analytically-solved model of deep thermalization, after the dual-unitary circuit model of Ref. [33], and the first to exhibit a nontrivial separation between distinct design times t k . We conjecture that the latter phenomenology should also be present in systems in any dimension as long as they harbor local interactions, and whenever thermalization is approximate, i.e., when ρ A is close to, but not exactly equal to, a thermal equilibrium state (dual-unitary circuits are exceptional in this sense as thermalization there is exact [47]). Thus, our findings provide a better understanding of the physical phenomenon of deep thermalization beyond the specific setting of the model studied in this work.
The paper is structured as follows. In Sec. 2 we briefly review key concepts and introduce our model of dynamics. An analytical solution for the design times is derived in Sec. 3 and is tested against exact numerical simulations in Sec. 4. Finally, we summarize our results and discuss directions for future research in Sec. 5.

Setup
In this section, we first present a brief review of the object of interest, the projected ensemble, as well as the quantum information-theoretic formalism used to characterize it, namely quantum state designs (Sec. 2.1). For a more detailed review, see Refs. [32,34]. We then define the specific model of dynamics considered in this work (Sec. 2.2).

The projected ensemble and quantum state designs
Consider a pure quantum state |Ψ AB on a bipartite system AB. An ensemble E of pure states on A, called the projected ensemble 2 [32], can be naturally defined from it via the outcomes of projective measurements on subsystem B: (1) Here d B is the Hilbert space dimension of B, z labels an orthonormal basis 3 of B, p(z) is the probability of obtaining the outcome z upon measuring B, and |ψ z A is the postmeasurement pure state of subsystem A, conditional on the outcome z. Explicitly, these are given by If |Ψ AB is a typical state in the many-body Hilbert space H A ⊗ H B , for sufficiently large d B the projected ensemble can be shown to acquire a universal form with high probability, namely its states are distributed according to the Haar (i.e., unitarily-invariant) measure on A [32]. The same behavior was conjectured to emerge at late times in quantum chaotic many-body dynamics from simple initial states without conservation laws or at infinite temperature, hence representing a novel kind of nonequilibrium universality. This was supported by extensive numerical results [32]; later, rigorous exact results were obtained for dynamics in (1 + 1)-dimensional non-integrable dual-unitary circuits [33,34,43]. We note the projected ensemble is not only a theoretical construct, but is in fact also accessible in many present-day experimental platforms of programmable quantum simulators, owing to their ability to obtain site-resolved measurement data of the global system: by classically post-processing the data (for example, analyzing the distribution of bit-strings corresponding to measurement outcomes of A, correlated to a particular measurement outcome of B), one can probe the statistics of the projected ensemble [26].
As mentioned in the introduction, the approach of the distribution of states in the projected ensemble to the uniform distribution can be characterized in a systematic fashion: one can ask about the extent to which the projected ensemble E reproduces the kth moment of the an ensemble of states drawn from the Haar measure. If the agreement is perfect, E is said to be a quantum state k-design, a concept from quantum information theory which has significant relevance in applications like randomized benchmarking and tomography [17,18,35,50,51,52]. By allowing for a finite error > 0 in agreement (quantified by a distance measure, introduced below), this can be extended to the notion of an approximate quantum state k-design.
Concretely, one can construct "moment operators" from E, which are density matrices on a k-fold replicated Hilbert space H ⊗k A , defined for all integer values of k ≥ 1: (we omit the subscript A for brevity). It is worth emphasizing that ρ (k) = ρ ⊗k (i.e. the k-th moment operator for the ensemble E is not equal to k copies of the density matrix ρ = z p(z) |ψ z ψ z |), and in fact ρ (k) is generally not a function of ρ alone; in other words, the ensemble E genuinely contains more information than its associated density matrix ρ.
For the particular ensemble of states distributed according to the Haar measure, the moment operators read whereσ is an operator that shuffles the k replicas of the Hilbert space according to a permutation σ ∈ S k , andΠ symm = 1 k! σ∈S kσ is the projector on the permutation-symmetric sector of H ⊗k A , which has dimension k+d A −1 k . The condition of E being a quantum state k-design is equivalent to the statement that ρ (k) = ρ (k) H . As the moment operator is a density matrix, we can quantify the distance between two moment operators ρ 1 . In this work, we will find it convenient to consider a distance measure based on the Frobenius norm ∆ (k) = 0 is then equivalent to E being a quantum state k-design, while ∆ (k) ≤ is then equivalent to E being an -approximate quantum state k-design. A closely related quantity is the frame potential of the ensemble, defined from the purity of the moment operators: For the Haar ensemble, the frame potential is F , may be rewritten in terms of the frame potentials of the two ensembles (projected and Haar) as This directly implies F (k) ≥ F H . Now, consider a quantum many-body state |Ψ AB which arises in quench dynamics, beginning from a disentangled state. We can define the time taken for its projected ensemble to form an -approximate state k-design, i.e., the minimum t such that ∆ (k) (t) ≤ for some arbitrary small threshold . This defines the design times t k for each k. Due to the fact that ∆ (k+1) ≥ ∆ (k) as shown in Ref. [34], these times are non-decreasing in k, t k+1 ≥ t k . For k = 1 one recovers the 'regular thermalization' time, since the first moment operator is simply the reduced density matrix of A: ρ (1) = Tr B |Ψ Ψ| = ρ A ; thus t 1 is the time taken to achieve ρ A ρ (1) i.e., thermalization to infinite temperature 4 . The design times with k > 1 may be viewed as probing timescales associated with stronger forms of thermalization: equilibration not just of (averaged) expectation values, but also of statistical distributions of the conditional wavefunctions |ψ z themselves, which we may call deep thermalization. In particular, t ∞ ≡ lim k→∞ t k (which exists due to monotonicity of k, though a priori may be infinite) is the time taken to completely reproduce the Haar measure. Such deep thermalization is genuinely distinct from (and stronger than) conventional thermalization: it is fully possible to achieve the latter but not the former. As a paradigmatic example, a maximally-entangled "EPR pair" state |Ψ AB = 1 is very far from the Haar measure 5 .

Model
A key feature of any geometrically local, short-range interacting system (which many physical systems are) is that a contiguous subsystem A exchanges information with the rest of the system through the mediation of other local subsystems. For example, a ball of radius ξ in a d-dimensional lattice, {r ∈ Z d : |r| < ξ}, is surrounded by a shell {r ∈ Z d : ξ < |r| < ξ } (ξ being another arbitrary radius > ξ) which contains a finite number of degrees of freedom 6 ; with short-range interactions, any exchange of information with the outside world (|r| > ξ ) must necessarily be mediated by this shell, which may act as a bottleneck and slow down the dynamics (compared e.g. to a non-local, all-to-all interacting system).
Here we consider a minimal example of a system featuring this mechanism, with a single "local bottleneck" mediating interactions between a finite subsystem of interest and the rest of the system, with the latter harboring nonlocal, all-to-all interactions. Our aim is to understand the effect of such locality constraints on the projected ensemble and on the emergence of higher state designs, i.e. deep thermalization. Concretely, we consider a bipartite system AB, where B is the "bath" to be measured in order to generate the projected ensemble at A. In addition, we partition B into two subsystems B 1 and B 2 . A, B 1 and B 2 have Hilbert spaces of dimension d A , d B 1 and q, respectively. Subsystem A interacts only with degrees of freedom in B 1 , while B 1 and B 2 interact strongly and without spatial structure. This is shown schematically in Fig. 1(a).
The system is initialized in a disentangled state |0 A |0 B 1 |0 B 2 and subsequently evolves under a unitary circuit comprised of alternating gates on AB 1 The gates U t , V t are sampled from the Haar measure on the respective unitary groups (this means there is no additional locality structure inside the subsystems). We define a time step to consist of a unitary (U t ⊗ I B 2 )(I A ⊗ V t ); the system is evolved for T time steps (t = 0, . . . T − 1) and also by an additional gate (I A ⊗ V T ), following which B is measured in an orthonormal basis {|z : z = 1, . . . d B 1 q}, inducing a projected ensemble at A. This is illustrated in Fig. 1 At this point it is worth remarking on different sources of randomness in the problem. We wish to characterize the universal distributions on A induced solely by randomness in 5 It is easy to see that the ensemble's frame potential is d −1 A , independent of k. Thus ∆ (1) = 0 but e.g. ∆ (2) = (dA − 1)/2 ≥ 1. 6 In fact, there are infinitely many such shells, with increasing radii ξ , ξ , . . . , giving a hierarchy of potential bottlenecks that may prevent or slow down the transmission of information over long distances; Ref. [34] studies an example of this mechanism in (1 + 1)-dimensional quantum circuits, where the loss of information about measurement outcomes over long distances limits the effective size of the projected ensemble and slows down deep thermalization. Here instead we consider the effect of a single local bottleneck. The bond connecting the two blocks has dimension d C = d 2T B1 . One tensor can be viewed as a state |φ z of subsystem C (green box, right), while the other is a map E from C to A (blue box, left). We analyze the two parts in Sec. 3.1 and 3.2 respectively. the measurement outcomes at B (that is, for a fixed realization of random gates {U t , V t }). However, in our calculations we will utilize averages over the gates {V t }. Note though that this is purely a technical tool to simplify calculations: we will find that the behavior of the (gate-averaged) projected ensemble is typical, allowing us to make statements about the behavior of the projected ensemble for almost all circuit realizations.
Finally we remark on the structure of our toy model and how it relates to more realistic chaotic dynamics. Informally, subsystem B 1 plays the role of a local bottleneck that mediates correlations between A and the rest of the universe, B 2 , which is a large, unstructured random-matrix environment whose dimension q will be taken to infinity. Clearly, this is a very favorable scenario for the generation of randomness on A: realistic local models in d-dimensional space will generically feature additional "bottlenecks" at all length scales, which we disregard here by taking a single bottleneck and an unstructured random environment. For this reason, this serves as a useful minimal example of the effects of locality on deep thermalization: it is natural to conjecture that the exact results derivable in this model may turn into bounds for more realistic and structured models of dynamics.

Analytical solution
In this Section we derive the main results of our work: an analytical computation of the distance measures ∆ (k) for the model of dynamics defined in Sec. 2.2, leading to exact expressions for all design times and thus for deep thermalization, in the q → ∞ limit. Our strategy is to divide the problem in two parts, as sketched in Fig. 1(c): first a treatment of the environment B 2 in the large-q limit (Sec. 3.1) and then an analysis of subsystem A through a replica method (Sec. 3.2). The results are then unpacked in Sec. 3 Figure 2: (a) Section of the quantum circuit in Fig. 1(c) representing the environment. The tensor enclosed in the green dashed box can be viewed as a state |Φ defined on systems B (top legs) and C (left legs). Upon projecting system B on a basis state |z , one obtains a state |φ z on the bond space C, which has dimension d C = d 2T B1 (T = 3 in this example). (b) Tensor network contraction representing the pseudo-frame potential F (n,k) , Eq. (10). R = 2(n + k) is the total number of replicas. Semicircles represent operators acting on R replicas of the Hilbert space, e denotes the identity and Q is as in Eq. (11). (c) The same tensor network contraction upon averaging V 0 , . . . V T over the Haar measure on (14)), the inner product vanishes unless the permutation σ takes the form τ π 1 π 2 , with τ a transposition of the central 2k replicas as shown and π 1,2 ∈ S n+k arbitrary.

Environment in the large-q limit
Here we show that, in the large-q limit, subsystem B (the "environment" with which subsystem A interacts, and which is measured at the final time) can be replaced by a Haar-random state in the 'bond space', i.e. the temporal cut 7 C separating A from B.
The circuit setup is illustrated in Fig. 2(a), where the environment (B, comprised of B 1 and B 2 , with dimensions d B 1 and q respectively, see Sec. 2) is initialized in |0 and measured at the end to find computational basis state |z (z ∈ {1, . . . d B 1 q}). This gives a state |φ z at the timelike subsystem C, which is made of 2T qudits of dimension d B 1 each (T is the circuit depth as defined earlier), giving a total dimension d C = d 2T B 1 . We may view this state as belonging to a projected ensemble on C from a bipartite state |Φ BC , where B is being measured. Concretely, let us introduce the unnormalized states |φ z C ≡ B z|Φ BC , with |Φ the tensor in Fig. 2(a), which obeys 8 Φ|Φ = d 1/2 C . Then, the ensemble at C is given by Note the normalization of p(z) follows from z p(z) = d In what follows, we are going to show that for q → ∞, E C approaches an ensemble of states distributed according to the Haar measure on H C . To do so, we aim to compute the frame potential: (9) To make progress, we use a replica technique [43] to define a quantity F (n,k) which reduces to the frame potential F (k) upon taking the "replica limit" n → 1 − k: where R = 2(n + k) denotes the total number of replicas. The F (n,k) quantity, represented as a tensor network contraction in Fig. 2 These tensor products can be averaged exactly [63], and to leading order in q → ∞ one gets (12) where Wg d (σ, R) is the Weingarten function [63] for R replicas of a d-dimensional Hilbert space evaluated on a permutation σ ∈ S R , e ∈ S R is the identity permutation, andσ represents an operator that shuffles the R replicas according to a permutation σ:σ |i 1 , .
Additionally, here and in the following we use a standard operatorstate mapping notation where an operator O = a,b O ab |a b| defines a state |O) = ab O ab |a ⊗ |b on a doubled Hilbert space, with inner product (O|O ) = Tr(O † O ). Thus e.g. |σ)(σ| in Eq. (12) is a super-operator whose action is defined by O →σTr(σ † O).
The computation of F (n,k) , Eq. (10), features T + 1 uncorrelated gates V t , all of which are independently averaged according to Eq. (12). We have that Wg d B 1 q (e, R) = (d B 1 q) −R to leading order in large q, contributing an overall prefactor of (d B 1 q) −R(T +1) . In addition, the averaging produces T inner products between permutations (acting on B 2 only), as sketched in Fig. 2(c). We have (σ|π) = q |σ −1 π| where | · · · | denotes the number of cycles in a permutation. To leading order in large q, this forces all permutations to be the same: if σ = π we have q |σ −1 π| = q |e| = q R ; otherwise σ −1 π has strictly fewer than R cycles giving a subleading contribution. Combining all these inner products yields a factor of q RT .
In all, we obtain where the three factors inside the summation arise from contraction ofσ with the circuit's boundary conditions at the initial time (|0 ), final time (operator Q from Eq. (11), representing projection on all possible z 1 , z 2 outcomes on various replicas), and spatial edge (tracing out C) respectively, as illustrated in Fig. 2(c). For the first factor, we note that (as |0 ⊗R is permutation-invariant) 0 ⊗R σ 0 ⊗R = 0|0 R = 1. The second factor is sketched diagrammatically in Fig. 2(d). The summation over z 1,2 (see Eq. (11)) breaks up into two pieces: d B 1 q terms with z 1 = z 2 , and (d B 1 q) 2 − d B 1 q terms with z 1 = z 2 . There is no further dependence on q, so in the q → ∞ limit we can focus on the latter terms. We get where we have inserted 0 and 1 as arbitrary (orthogonal) computational basis states on B.
The term 0 ⊗n 1 ⊗k 1 ⊗k 0 ⊗n σ 0 ⊗n 1 ⊗k 0 ⊗k 1 ⊗n serves to restrict the summation to permutations in the form σ = τ π 1 π 2 , where τ is a transposition of the central 2k replicas, while π 1,2 act on the first and last n + k replicas, respectively, as illustrated in Fig. 2(e). One can see that any permutation not in this form would produce an inner product 0|1 = 0.
In conclusion we arrive at where in the last line we have traced out the 2n replicas (on which the transpositionτ does not act) obtaining (ρ (k) H,C ) ⊗2 , and then noted that Tr(τ A ⊗2 ) = Tr(A 2 ). Finally, having obtained an analytical function of R, we continue it to R = 2(n + k) = 2 (i.e. n = 1 − k), which gives the desired result: lim Given that F (k) ≥ F (k) H in each realization (following Eq. (7)), we can conclude that F (k) = F (k) H almost always (over the Haar measure for gates V t ). This means that statistically speaking, we can replace the entire subsystem B with a Haar-random state on the temporal cut C in order to analyze the formation of state designs on A.
Some comments are in order here. First, we note that the same result holds in dualunitary circuits with suitable initial states and final measurement bases in 1+1 dimensions [33,34,43]: there, too, one obtains the Haar distribution in the bond space C in the limit of an infinite bath, N B → ∞ where N B is the number of degrees of freedom in B (the corresponding thermodynamic limit in our case is q → ∞). In that case, the Haar distribution emerges essentially from a deep random unitary circuit running in the space direction; here instead it emerges from measuring part of a given highly-entangled state |Φ BC , similarly to the case of a global Haar-random state |Ψ AB studied in Ref. [32]. Secondly, we note that realistic models of dynamics may fail to produce Haar-randomness in the bond space (C in our case); this is indeed the subject of Ref. [34], where it was shown that (1 + 1)-dimensional unitary circuits may fail to achieve this due to dynamical purification in the space direction, i.e., the loss of information about measurement outcomes over very long distances. Here, by taking an all-to-all interacting random-matrix environment, we have avoided any such issues. Therefore this is a favorable scenario for the formation of state designs at A, and we expect that results obtained in this model may provide bounds for more realistic short-range-interacting models.

Projected ensemble on A
With the above result we can greatly simplify the setup, by discarding the thermodynamicallylarge environment B 2 and focusing only on subsystem A and the temporal subsystem (or bond space) C, as sketched in Fig. 1(c). After observing outcome z on B, we have a state |φ z at C which gets mapped to A by a linear map 9 E. As we just saw, in the large-q (thermodynamic) limit the input state |φ z is effectively Haar-random on C, thus the question becomes: how well is this randomness propagated from C to A?
In the case of dual-unitary circuits in 1 + 1 dimensions, the randomness is propagated optimally: as long as d C ≥ d A , the Haar measure at C induces the Haar measure at A. This follows from the perfect thermalization of dual-unitary circuits [47]: because ρ A = I A /d A , E must act as an isometry on a subspace of H C (of the appropriate dimension d A ), which is enough to produce the Haar measure on H A [33]. However, for generic interactions (i.e. non-dual-unitary, or in higher dimension, etc), this is not the case: ρ A equals I/d A only approximately, and it is not a priori clear what the induced distribution of states at A looks like. This is the question we address next.
In terms of the states |φ z C = B z|Φ BC studied in Sec. 3.1, the projected ensemble at A takes the form It follows that the expectation of a function f (|ψ ) on the ensemble E is where we have introduced the normalized vectors |φ z = |φ z / |φ z . The result of Sec. 3.1 is that the ensemble in Eq. (8) is Haar-distributed in the thermodynamic limit q → ∞, i.e., with dφ the Haar measure and g any sufficiently regular function. With this substitution, Eq. (18) becomes which defines the limiting form of the projected ensemble at A: where again dφ is the Haar measure. The resulting ensemble, Eq. (21), is known as the Scrooge ensemble [26,44] or Gaussian Adjusted Projected (GAP) ensemble [45,46], which is the maximally entropic 10 ensemble of pure states compatible with a given reduced density matrix-in this case, As a technical remark, we note that Eq. (21) is seemingly different from the conventional construction of the Scrooge/GAP ensemble (as presented in [26,44]), as the former involves 9 The map E is obtained by combining the gates {Ut} as shown diagrammatically in Fig. 1(c). objects acting on two Hilbert spaces of different dimensions (A and C) while the latter only one. However, it can be straightforwardly shown that the two definitions are equivalent, see Appendix A.

Moments of the Scrooge/GAP ensemble near infinite temperature
To recapitulate, the analysis of Sec. 3.1 and 3.2 has shown that, in the large-q (thermodynamic) limit, the projected ensemble constructed from our model of dynamics realizes (almost always) the Scrooge/GAP ensemble, Eq. (21), induced by the appropriate density matrix, Eq. (22). With this result in hand, we are now in a position to study the formation of higher state designs at A. However, we remark that the derivation which follows is more general: it may be applied to the Scrooge/GAP ensemble built from any density matrix ρ on any system, as long as ρ is very close to thermalization at infinite temperature. Such an ensemble may potentially be obtained through other means (i.e., not by the projected ensemble on our model) and is an object of independent interest.
In order to characterize the emergence of quantum state designs, we aim to compute the kth moment operators for the ensemble in Eq. (21): At this point, for k > 1 the presence of a negative power 1 − k prevents us from directly performing the integral over the Haar measure dφ. This issue can be overcome by using a replica trick [43] analogous to the one in Eq. (10): we define a "pseudo-moment operator" which reduces to ρ (k) A for n = 1 − k. We take n > 0 in order to perform the calculation; at the end we will take the "replica limit" n → 1 − k and obtain the desired result. By using the fact that φ|φ = Tr |φ φ|, we may re-write this as where we have introduced n new replicas of A in addition to the k replicas that go into the definition of the k-th moment operator, and the trace is taken over these new replicas We can now perform the integral over the Haar measure dφ exactly: where ρ is permutation-symmetric: for any integer m > 0, we have Next, we use Eq. (22) to eliminate EE † from Eq. (26), obtaining where we have collected the awkward prefactors into a function g which can be seen to equal 1 for all d A , d C when n + k = 1, and can thus be safely dropped for our purpose. The expression Eq. (28) is noteworthy as it only involves the reduced density matrix ρ A and the moment operator of the Haar ensemble. It is clear that H,A , independent of n, and we have an exact quantum state k-design for all k, i.e., the exact Haar measure at A. Conversely, any deviation ρ A = I/d A (imperfect thermalization) directly maps onto a deviation between the kth moment operators. This deviation is what we aim to quantify next.
Let us assume that ρ (1) where µ is a Hermitian operator that represents the slowest observable in the process of relaxation towards equilibrium at infinite temperature. By definition, we have At late times, when µ 2 is small, we can expand Eq. (28) in powers of µ: where µ j denotes a µ operator acting on replica A j only (with I on all other replicas). Focusing on the linear term, we can split the sum into two parts: • Terms with j > n: tracing out A 1 , . . . A n directly gives ρ (k) H,A k j=1 µ j ; • Terms with 1 ≤ j ≤ n: due to permutation symmetry these n terms are all identical, giving n contributions of The latter can be computed as follows. Decomposing ρ H,A into permutations σ ∈ S k+1 , we find that permutations with σ(k + 1) = k + 1 give a contribution ∝ Tr(µ) = 0 (µ is traceless by definition). On the other hand, permutations with σ(k + 1) = j ≤ k give a contribution ∝ µ jπ , where π ∈ S k is a permutation induced on the remaining k replicas 11 . We have 11 There is a one-to-one correspondence between π ∈ S k and σ ∈ S k+1 with the constraint σ(k + 1) = j.
Adding both contributions, we conclude At this point, all the n auxiliary replicas have been traced out, and the parameter n appears only as a prefactor. It is possible to take the replica limit 12 n → 1 − k, which gives This is one of the central results of our work, directly connecting imperfect thermalization (quantified by µ) to the discrepancy between k-th moments of the projected and Haar ensembles.
With this result in hand, we may now evaluate the distance measure Eq. (5): where we used the facts that i µ i commutes with ρ H,A (being permutation-symmetric) and that (ρ H,A in order to simplify the result. The summation over replicas i, j breaks up into two parts: • Terms i = j give k(k − 1) identical contributions of Tr(ρ (2) H,A µ 1 µ 2 ); using ρ (2) H,A = e+τ d A (d A +1) (where e, τ ∈ S 2 are the identity permutation and the transposition, respectively), and the fact that µ is traceless, we get Putting it all together, and using Eq. (29) to eliminate µ 2 , we arrive at our main result:

Scaling of design times
With the result above, Eq. (35-36), we are now in a position to discuss the scaling of design times in the projected ensemble for our model. We begin by observing that, by the definition Eq.
A is the second Renyi entropy of subsystem A, measured in bits. In chaotic dynamics starting from a disentangled state, we expect the entropy to grow linearly in time [64], S 2 (A) ∼ v E t (the proportionality constant v E is known as the entanglement velocity), until near saturation to the maximal value log 2 (d A ) = N A . At that point, achieved when t ≈ t * = N A /v E , we expect a crossover and an exponential convergence to the asymptote: In models described by the statistical mechanics of an entanglement membrane [65,66,67,68,69], one expects that the two velocities governing the early-time ballistic growth and the late-time exponential convergence should in fact be the same, v E ≡ v E , related to the surface tension of the entanglement membrane 13 . Our model also displays this behavior in the q → ∞ limit: as shown in Appendix B, we have We note that log 2 (d B 1 ) plays the role of |∂A|, the size of the boundary of subsystem A (i.e., the number of qubits with which A interacts directly). In a spatially local system, one indeed expects v E ∝ |∂A|, and t * ∝ |A|/|∂A| ∼ ξ if A is a ball of radius ξ in a d-dimensional lattice. Motivated by these considerations, in the following we use v E for the late-time rate of convergence of S 2 towards N A ; however this is a slight abuse of notation, and one should be mindful that this identification need not work in general, e.g. in integrable systems, dual-unitary circuits, etc. As a consequence of Eq. (37) we have, at late times, and thus the thermalization time, defined with respect to an arbitrary threshold 1: By using our main result Eq. (35-36) we can carry out the same reasoning for the k-th moment, which gives We have thus obtained the exact design times for this problem, for all k ≥ 1, in the q → ∞ limit: This is the second exact calculation of design times in a model of quantum dynamics (after the self-dual kicked Ising model studied in Ref. [33]), and the first to show a separation between distinct design times. In the following we comment on the scaling of design times with k and d A in two regimes of interest.
Large d A . If k d A , the design times from Eq. (41) approximately reduce to showing a nontrivial growth with k. However this growth is only logarithmic in k (in other words, the successive k-designs are formed extremely quickly in dynamics, with k(t) ∼ exp(t)). It is interesting to ask how this compares to other models. Numerical results on chaotic Hamiltonian dynamics at infinite temperature [32], while in a limited dynamic range (1 ≤ k ≤ 4), are suggestive of a stronger, possibly algebraic growth t k ∼ k α . Such a difference in scaling might originate from features of dynamics that are absent in our toy model: geometric locality and energy conservation, both of which could slow down deep thermalization. The latter in particular is known to parametrically change the scaling of the thermalization time (due to energy diffusion) and it is interesting to ask how it might affect deep thermalization.
Large k. The logarithmic growth of t k in Eq. (42) eventually arrests altogether when k ≈ d A . This is also the regime in which state designs for different k cease to be fully independent (i.e., for such a k, a k-design is automatically an approximate (k + 1)-design [52]). At large k we have where the approximation holds for d A 1. The time scale t ∞ governs the emergence of the Haar measure at A (fulfilling the state k-design condition for all k's). Remarkably, we see that (i) a uniform distribution of states (to within fixed accuracy ) is generated in finite time in this model, and (ii) in the limit N A → ∞ (taken after k → ∞) the emergence of the Haar distribution is governed by a renormalized entanglement velocity, v Haar ≡ v E /2. Appropriately, this takes longer than thermalization; precisely a factor of 2 longer in this model.

Numerical simulations
To test our analytical results, we perform exact numerical simulations of the circuit model. We aim to compute the ratio of design distances ∆ (k) /∆ (1) in order to confirm the relationship in Eq. (35)(36); the scaling of design times follows from the behavior of this ratio.
In order to maximize the accessible environment size, we choose subsystems A and B 1 of minimal size, d A = d B 1 = 2. We also set q = 2 L−2 for integer L so that the total Hilbert space corresponds to L qubits. We initialize a state |0 ⊗L and evolve it under the circuit in Fig. 1(b), where the gates U t and V t are sampled from the Haar measure on U (4) and U (2 L−1 ) respectively. After each time step T = 1, . . . t max = 20 we construct the exact moment operators ρ (k) for k = 1, . . . k max = 7 and evaluate the distance measures ∆ (k) (t). We repeat this process for many realizations of the Haar-random gates and average the results. See Appendix C for further details on the implementation and efficiency of this algorithm.
Numerical results for the gate-averaged ∆ (k) (t) are shown in Fig. 3(a-c), for systems of size up to L = 28 (i.e. q 6.7 × 10 7 ). Focusing first on k = 1, i.e. regular thermalization, we see an exponential decay followed by saturation to a minimum. This final plateau is seen to be a finite-size effect: it corresponds to the finite value one would obtain from a global Haar-random state on AB, which is ∼ 1/ √ q, as we explain in Appendix D. For q → ∞ the exponential decay would continue indefinitely. Moving now to k > 1, we see that the design distances ∆ (k) , while qualitatively replicating the behavior of ∆ (1) , weakly increase with k. The k-dependence is visibly slower in the intermediate-time exponential regime and faster in the late-time plateau. In Fig. 3(df) we analyze this k-dependence in detail, at different system sizes and times. For times  corresponding to the exponential-decay regime, we see an excellent agreement with the analytical prediction (dashed line). The quality of the agreement is limited by two factors: • At early time, ∆ (1) (T ) is not yet very small, so the O((∆ (1) ) 2 ) corrections in Eq. (35) are significant. Their effect can be seen clearly e.g. in Fig. 3(f), with curves for T = 1, 2, 3 showing progressively better agreement.
• At later time, finite-q effects become dominant. As soon as the ∆ (k) (T ) curves start peeling off from the exponential decay to settle onto the finite-size plateau value ∼ q −1/2 , the agreement with the prediction gets worse. The k-dependence crosses over to that of a Haar-random state, which in this case As a consequence of these two effects, the quality of the agreement with the analytical prediction is non-monotonic: it improves with time at first, and then becomes worse once finite-size effects become substantial.
However, in the q → ∞ limit, finite-size effects would disappear and we would recover the predicted behavior for arbitrarily late times. This is corroborated by the finite-size trends visible in Fig. 3(d-f): with increasing q, the time window of validity of our solution increases, and the agreement itself improves with time within this window. This represents strong evidence in support of our main analytical result, Eq. (35-36).

Discussion and Conclusion
We have presented a toy model of chaotic quantum dynamics where the process of deep thermalization can be studied exactly in a suitable thermodynamic limit. The simple structure of the model (a subsystem of interest A, a "bottleneck" B 1 , and an infinite bath B 2 ) enables this solvability while capturing a basic feature of spatially-local, short-range interacting systems in any dimension: that information may only be exchanged in and out of a subsystem via the mediation of other subsystems. This simple fact has important consequences on the process of thermalization and deep thermalization alike.
Our key result, in Eq. (35,36), shows how approximate thermalization, described by a nonzero distance ∆ (1) between the reduced density matrix ρ A and the infinite-temperature thermal state I/d A , translates to a larger distance ∆ (k) between the higher moments of the projected and Haar ensembles. In other words, the two ensembles become increasingly more distinguishable as one looks at higher moments of their distributions. A nontrivial growth of design times t k vs k follows: a logarithmic growth for k d A (d A being the Hilbert space dimension of A), saturating at a finite value t ∞ = 2t 1 . This is the second known exact solution for the design times and deep thermalization in models of chaotic dynamics (following the results of Ref. [33] on the self-dual kicked Ising chain and its generalization to dual-unitary circuits in one dimension [34,43]) and it is the first to exhibit distinct time scales t k . This crucially depends on imperfect (or asymptotic) thermalization, which is the generic fate of chaotic many-body systems. Dual-unitary circuits (with suitable initial states) are exceptional in this regard in that they instead thermalize perfectly in a finite time: one has ρ A = I/d A exactly after a finite time t 1 = N A in quench dynamics [47]. Thus they evade our present result and show a collapse of design times, t k = t 1 for all k.
Further, it is worth commenting on how our results compare to other bounds on deep thermalization that have been obtained in two settings. First, Ref. [72] proved that, if one chooses the measurement basis in the bath B Haar-randomly, then the projected ensemble yields an approximate state design with high probability whenever the reduced density matrix is close to thermalization at infinite temperature: ρ A ≈ I A /d A . More precisely, given a deviation from ideal thermalization δ = 1 2 ρ A − I A /d A 1 ≤ 1 2d A , one gets an -approximate state k-design with This bound is directly applicable to our model, as the final unitary gate V T Haar-randomizes the basis on the bath. We have found that our model realizes an -approximate state k- , where f is as in Eq. (36). This implies 14 < (2k + 1)d A δ; using the assumption that d A δ < 1, and this is seen to be consistent with the bound from Ref. [72], Eq. (45). On the other hand, the techniques we used here may be directly adapted to analyze the setting considered in Ref. [72], yielding = Θ(δ). This suggests that the bound in Eq. (45), which obeys = O( √ δ) for δ → 0, may not be tight. Secondly, Ref. [34] derived a bound on t ∞ in one-dimensional unitary circuits: where v p is an additional velocity scale (the "purification velocity"). The derivation of this bound is based on loss of information about measurement outcomes over long distances, and generally only applies in one spatial dimension. On the contrary, the exact results about our toy model characterize the effects of imperfect thermalization (ρ A = I A /d A , a strictly local condition) on the formation of higher state designs, in a situation that is arguably a best-case scenario for deep thermalization (a large, all-to-all interacting random-matrix bath, as opposed to a structured, geometrically-local one). Thus it is natural to conjecture that the results for this model might instead hold as bounds for more general models: e.g., we may conjecture that t ∞ > 2t 1 in geometrically-local models in any dimension. As the arguments leading to the two bounds are a priori independent, one would expect both of them to apply jointly in one-dimensional chaotic unitary circuits, giving t ∞ > max{2, v E /v p }t 1 . It would be interesting to further investigate the interplay between the two bounds.
Further, it is tempting to speculate that our result for the ratio ∆ (k) /∆ (1) may be optimal, i.e., that no ensemble consistent with a given density matrix ρ can approximate a k-design better than the Scrooge/GAP ensemble; this idea seems supported by the numerical results of Fig. 3(d-f), where the data never dips below the Scrooge/GAP prediction. Proving this statement is an interesting question for future work.
By a similar reasoning, our results for the design times t k may also represent bounds in more general settings-e.g., one may conjecture that the growth of t k with k should be at least logarithmic (for k d A ) in any local, non-perfectly-thermalizing model. This appears consistent with numerical results on Hamiltonian dynamics in spin chains [32], where in addition to locality one also has energy diffusion as a possible bottleneck. It would be interesting to understand the impact of conservation laws on deep thermalization, starting e.g. from quantum circuit models with U (1) charge conservation [73,74,75,76] or integrable circuit models [77,78,79,80]. A particularly tractable setting may be fermionic Gaussian dynamics, for which the existence of a limiting ensemble was recently shown [81].
Haar-random vector of size d A (i.e., a projector acting on a Haar-random state produces another Haar-random state of the appropriate size, up to normalization). We can thus replace the Haar measure dφ on C with the analogous measure on A, obtaining the usual ρ-distortion construction of the Scrooge/GAP ensemble.

B Thermalization of subsystem A
Here we derive the exact form of the purity of subsystem A as a function of time T , in the limit q → ∞, averaged over Haar-random gates U t .
Starting from the reduced density matrix ρ A as written in Eq. (22), we have This is represented diagrammatically in Fig. 4(a), where the boundary conditions are permutations of the two replicas, e, τ ∈ S 2 : the identity e before and after each gate on B 1 , and the transposition τ at the final time on A. We note that d −2 B 1 |e)(e| B 1 is a super-operator implementing the fully depolarizing (or erasure) channel on each replica of B 1 -i.e., it is the partial trace on B 1 followed by introducing the fully-mixed state This Markovian evolution is a consequence of taking the q → ∞ limit on B 2 , which introduces irreversibility in the local dynamics of A and B 1 .
We compute the purity, Eq. (46), averaged over the choice of gates {U t : t = 0, . . . T − 1} which make up E. Each U t is sampled independently and identically form the Haar measure. We compute the average iteratively, starting from the last gate, U T −1 . Acting on the operator |τ ) A ⊗ |ê) B 1 above it, the Haar-averaged gate yields This result admits a simple interpretation. At the final time T we have a "domain wall" boundary condition, with permutation τ in A and e in B 1 . This configuration is evolved backward in time and gives two possible configurations: one where the whole system is polarized along e, and one where it is polarized along τ . This structure of the Haar average for second-moment quantities was used previously to obtain exact results on entanglement growth, by mapping to a simple random walk for an extended 1 + 1-dimensional quantum circuit [59,65,82]. Moving down along the tensor network, we must contract the result of Eq. (47) with d −2 B 1 |ê)(ê| B 1 (i.e. the erasure channel on B 1 ), obtaining ). The first term, shown in Fig. 4(b), is identical to the initial boundary condition (a domain wall between τ and e) up to a prefactor; the second term, shown in Fig. 4(c), is a uniform |e) AB 1 state, which is invariant under the remaining backward evolution 15 . This is enough to derive the following recursion: e e e τ (a)

B1
e e e (c) A ) after evolution for time T = 3. The | · · · ) symbols denote operators on two replicas of the appropriate Hilbert space, with |0) = |0 0| ⊗2 (initial state on A), and |e) and |τ ) are replica permutations. Note that d −2 B1 |e) corresponds to two replicas of the completely-mixed state on B 1 , (I B1 /d B1 ) ⊗2 . Upon averaging U T −1 over the Haar measure one obtains two terms: (b) the same diagram as in (a), but with T → T − 1; (c) a diagram with a uniform |e) boundary condition at the final time, which simply computes the trace of the state and is trivial due to trace-preservation. This gives the recursion in Eq. (49).
The recursion is solved exactly (with initial condition P A (0) = 1) by This obeys several sanity checks: d −1 A < P A (T ) ≤ 1 (corresponding to a valid purity), P A (∞) = d −1 A (corresponding to the completely mixed state), and P A (T ) ≡ 1 for d B 1 = 1 (trivial subsystem B 1 , so that A is in fact isolated and remains pure).
For a large subsystem d A 1, Eq. (50) reduces to P A (T ) d −1 A + d −2T B . This has two limiting behaviors for the annealed average of the second Renyi entropy: which yields Eq. (37), proving our claim that the early-time slope of S 2 (A, T ) coincides with the time constant of the exponential approach to the asymptote S 2 (A) = N A . This result is also expected to hold more generally in extended chaotic systems where entanglement is described by a 'minimal membrane' picture [65,66,67,68,69]. There, the entropy is given by the free energy of a classical statistical system with a partition function Z = γ e −E(γ) . Here γ labels membranes that partition the spacetime volume into regions whose boundaries are A and its complement, and E(γ) is a suitable cost functional akin to a surface tension. In a local quantum circuit of depth T in arbitrary spatial dimension d, for a contiguous region A, there are two distinct kinds of membranes to consider: temporal ones, E(γ) ∼ |∂A|T , and spatial ones, E(γ) ∼ N A . The former are dominant at small T , giving ballistic growth of entanglement, motivating the definition of an entanglement velocity v E via E(γ) = v E T . The latter become dominant at large T , giving saturation to a volume-law entangled state. In general we have S − log 2 (2 −N A + 2 −v E T ), which, expanded at large T , gives Note however that exceptions to the above behavior have been recently discovered in finite one-dimensional random-unitary circuits with certain architectures and boundary conditions [70,71].

C Details on computational method
Here we present some details on the numerical method used to produce the data in Fig. 3. As explained in Sec. 4, we use d A = d B 1 = 2 and q = 2 L−2 , and we begin with the all-zero state |Ψ(0) = |0 ⊗L . There are two computational tasks to discuss: how to implement time evolution, and how to build the moment operator ρ (k) from a given state.

D ∆ (k) for Haar-random states
Here we derive the form of the design distance measures ∆ (k) for Haar-random states on a bipartite system AB and show that it matches the post-saturation behavior observed in our numerical simulations, Fig. 3. We employ the same replica trick as in Sec. 3.1 and compute the Haar-average of the pseudo-frame potential where |Ψ AB is the state of AB (to be averaged over the Haar measure) and ψ z A = B z|Ψ AB is the unnormalized post-measurement state of A. The operator Q B is as in Eq. (11) and R = 2(n + k) is the total number of replicas. On average we have Next, we have (Q|σ) B = d B + d B (d B − 1)δ σ,τ π 1 π 2 , where τ and π 1,2 are as in Fig. 2(e) and the two terms come from expanding Q into z 1 = z 2 and z 1 = z 2 terms. Following the same manipulations as in Sec. 3.1, one gets and thus, taking n = 1 − k (i.e. R = 2), Finally, we obtain Incidentally, we note that this identity may serve as an alternative approach to derive the results of Ref. [32] on obtaining an -approximate k-design from the projected ensemble on a Haar-random state with a given probability. Thus far we have made no assumptions on d A,B . Next, we plug in the values used in our numerical simulations, see Sec. 4: d A = 2, d B = 2q. We find (∆ (k) ) r.m.s. = [E(∆ (k) ) 2 ] 1/2 = 3k 4q + 1 This shows at once that we have a finite-size "floor" ∆ (1) ∼ q −1/2 , attained when the dynamics has successfully Haar-randomized the entire system AB, and that in this regime the k-dependence is ∆ (k) = √ k∆ (1) . Notably the ratio ∆ (k) /∆ (1) is unbounded, in contrast with our result Eq. (35) which for the same value of d A gives ∆ (k) = 3k/(k + 2)∆ (1) ≤ √ 3∆ (1) . This accounts for the distinct behaviors visible in Fig. 3(a-c) in the two regimes.