Emergent classicality in general multipartite states and channels

In a quantum measurement process, classical information about the measured system spreads throughout the environment. Meanwhile, quantum information about the system becomes inaccessible to local observers. Here we prove a result about quantum channels indicating that an aspect of this phenomenon is completely general. We show that for any evolution of the system and environment, for everywhere in the environment excluding an $O(1)$-sized region we call the "quantum Markov blanket," any locally accessible information about the system must be approximately classical, i.e. obtainable from some fixed measurement. The result strengthens the earlier result of arXiv:1310.8640 in which the excluded region was allowed to grow with total environment size. It may also be seen as a new consequence of the principles of no-cloning or monogamy of entanglement. Our proof offers a constructive optimization procedure for determining the "quantum Markov blanket" region, as well as the effective measurement induced by the evolution. Alternatively, under channel-state duality, our result characterizes the marginals of multipartite states.


Introduction
By the monogamous nature of entanglement, a single quantum system cannot be highly entangled with many others. From a dynamical perspective, this monogamy constrains the spreading of information. The no-cloning theorem provides a simple example of such a constraint; more generally, quantum information cannot be widely distributed with high fidelity.
Constraints on information spreading also shed light on the quantum-to-classical transition. Many questions remain about precisely how and when classical behavior emerges from quantum many-body systems. When a small system interacts with a large environment, the environment often acts as a measuring apparatus, decohering the system in some basis. This paradigm is further elaborated by research programs on decoherence and "quantum Darwinism," describing how certain observables of the system are "selected" by the environment [1][2][3][4][5][6].
Brandão et al. [7] proved a powerful monogamy theorem constraining the spread of quantum information. In a sense elaborated in Section 6, they show that some aspects of the decoherence process must exist for any quantum channel. They consider general time-evolutions of a system A initially uncorrelated with a large multipartite environment $B_1 \otimes \cdots \otimes B_n$. Their result states that for a large fraction of environmental subsystems $B_i$, the only information about A that is accessible on $B_i$ must be classical, i.e. it must be obtainable from a fixed measurement on A. Crucially, they show that the relevant measurement on A is independent of the subsystem $B_i$ of interest. Thus the system A must "appear classical" to an observer at $B_i$, in the sense that the only accessible information about A is classical.
However, the abovementioned result only constrains a large fraction of environmental subsystems. For a fixed error tolerance, the number of subsystems left unconstrained by the theorem increases arbitrarily with the total size of the environment. Intuitively, this growth seems to contradict the monogamy of entanglement, which suggests the fragment of the environment with non-classical information about A must have bounded extent. In other words, monogamy suggests the results of [7] can be greatly improved.
In this paper, we obtain this stronger constraint on quantum information spreading. Our Theorem 1 shows that for large environments, for everywhere in the environment excluding some O(1)-sized subsystem Q, the locally accessible information about A must be approximately classical, i.e. obtainable from some fixed measurement on A. This result corroborates the above intuition from monogamy. The statement is totally general, applicable to arbitrary quantum channels and quantum states. We call the excluded region Q the "quantum Markov blanket," or simply the Markov blanket, following the terminology in classical statistics [8].
The proof of our result may be framed constructively as an optimization procedure, allowing numerical demonstrations on small systems. The central idea of the proof is to imagine expanding a small region of the environment to gradually encompass the entire system. During this process, one learns gradually more about the input system A. Through a greedy algorithm, one calculates an optimized path of expansion that extracts the most information from A. By strong subadditivity, even an optimal path must reach some "bottleneck" such that further expanding the region does not yield additional information about A. Analyzing this bottleneck gives rise to the result. The simple mathematical argument is presented in Section 4, along with the path-based interpretation.
We also provide a numerical example involving a small spin chain in Section 5. Based on the proof of Theorem 1, our numerical algorithm identifies the quantum Markov blanket and the effective measurement induced on a subsystem by the dynamics.

Review
We briefly review quantum channels, channel-state duality, and measure-and-prepare channels. Readers familiar with this material may wish to skip to the results in Section 3, but the discussion relating static constraints like monogamy to dynamical constraints like no-cloning may still be of interest.
Recall that quantum channels describe the most general time-evolution of a quantum system, including interactions with an environment. We denote a general quantum channel Λ from system A to B as a map Λ : D(A) → D(B), where D(X) generally denotes the space of density matrices on system X. Such a map is called a channel whenever it is completely positive and trace-preserving.

Channel-state duality
The channel-state duality allows one to associate every channel with an essentially unique state, called the Choi state. The correspondence defines a dictionary that translates between "dynamical" properties of channels and "static" properties of states.
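In particular, given any channel $\Lambda: D(A) \to D(B)$, we define its Choi state
$$\rho^\Lambda_{A'B} = (\mathrm{id}_{A'} \otimes \Lambda)\left(|\Gamma\rangle\langle\Gamma|_{A'A}\right),$$
where A' is a reference system isomorphic to A. That is, we define $\rho^\Lambda_{A'B}$ by acting Λ on subsystem A of an input state $|\Gamma\rangle\langle\Gamma|_{A'A}$ maximally entangled between A and A'. Different choices of maximally entangled pure state $|\Gamma\rangle$ yield different Choi states, related by unitaries on A'. From the Choi state, we can recover the action of the channel as follows. It is helpful to first choose bases; let $|\Gamma\rangle_{A'A} = d_A^{-1/2}\sum_i |i\rangle_{A'}|i\rangle_A$, with $d_A = \dim(A)$, be the maximally entangled state with respect to some orthonormal bases $\{|i\rangle_A\}$, $\{|i\rangle_{A'}\}$. For any $\tau_A \in D(A)$, define $\tau_{A'} \in D(A')$ so that $\tau_A$ and $\tau_{A'}$ are given by the same matrix in the $|i\rangle_A$ and $|i\rangle_{A'}$ bases, respectively. Then we can recover the channel from the Choi state using the formula
$$\Lambda(\tau_A) = d_A\, \mathrm{Tr}_{A'}\!\left[\left(\tau_{A'}^{T} \otimes \mathbb{1}_B\right) \rho^\Lambda_{A'B}\right],$$
where the transpose is taken in the $|i\rangle_{A'}$ basis. Choi's theorem states that a linear map $\Lambda: D(A) \to D(B)$ is a channel iff the corresponding Choi operator $\rho^\Lambda_{A'B}$ is a quantum state with $\mathrm{Tr}_B(\rho^\Lambda_{A'B})$ maximally mixed. This correspondence is also called the Choi-Jamiolkowski isomorphism; see [9] for an extensive elaboration.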
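The recovery formula above can be sanity-checked numerically. The following is a minimal NumPy sketch, assuming the basis convention $|\Gamma\rangle = d_A^{-1/2}\sum_i |i\rangle_{A'}|i\rangle_A$ with A' as the first tensor factor. The qubit amplitude-damping channel is an illustrative choice not taken from the text, and the helper names `choi_state` and `apply_from_choi` are hypothetical.

```python
import numpy as np

def choi_state(kraus, d_in):
    """rho_{A'B} = (id_{A'} ⊗ Λ)(|Γ><Γ|_{A'A}) for Λ(X) = Σ_k K_k X K_k^†,
    with |Γ> = d_in^{-1/2} Σ_i |i>_{A'}|i>_A (A' is the first tensor factor)."""
    gamma = np.eye(d_in).reshape(-1) / np.sqrt(d_in)
    G = np.outer(gamma, gamma.conj())
    rho = 0
    for K in kraus:
        IK = np.kron(np.eye(d_in), K)  # id_{A'} ⊗ K
        rho = rho + IK @ G @ IK.conj().T
    return rho

def apply_from_choi(rho, tau, d_in, d_out):
    """Recover Λ(τ) = d_A Tr_{A'}[(τ^T ⊗ 1_B) ρ], transpose in the |i> basis."""
    M = (np.kron(tau.T, np.eye(d_out)) @ rho).reshape(d_in, d_out, d_in, d_out)
    return d_in * np.einsum('iaib->ab', M)  # partial trace over A'

# Illustrative channel (our choice): qubit amplitude damping with rate g.
g = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
         np.array([[0, np.sqrt(g)], [0, 0]])]
rho = choi_state(kraus, 2)

tau = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
direct = sum(K @ tau @ K.conj().T for K in kraus)
recovered = apply_from_choi(rho, tau, 2, 2)

# Choi's theorem: Tr_B(rho) should be maximally mixed on A'.
marg = np.einsum('iaja->ij', rho.reshape(2, 2, 2, 2))
print(np.allclose(recovered, direct), np.allclose(marg, np.eye(2) / 2))
```

The transpose on τ only matters when τ has complex off-diagonal entries, which is why the test input above has an imaginary part.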
The channel-state duality allows one to relate dynamical and static properties. The dynamical properties of a channel, characterizing information transfer from input to output, become static properties of the Choi state, characterizing correlations between the input (or rather the reference system) and the output.
Constraints on dynamical properties of channels therefore entail constraints on correlation properties of states, and vice versa. The equivalence of no-cloning and monogamy of entanglement provides a simple example. Because our main results constitute a more elaborate example, we explain this simple example first.
Consider a hypothetical cloning channel $\Lambda: D(A) \to D(B_1 \otimes B_2)$, where $B_1$ and $B_2$ are isomorphic to A. For Λ to properly clone, we demand that the reduced channels $\Lambda_{B_1}$ and $\Lambda_{B_2}$ are identity channels. However, under channel-state duality, reduced channels correspond to reduced states, and identity channels correspond to maximally entangled states. So the Choi state $\rho_{A'B_1B_2}$ must have A' maximally entangled with both $B_1$ and $B_2$. Hence the no-cloning theorem (forbidding perfect cloning) automatically implies a simple monogamy theorem (forbidding maximal entanglement with two different systems), and vice versa.

Measure-and-prepare channels
An important type of channel for the subsequent discussion is the "measure-and-prepare" channel. Such a channel takes the form
$$\mathcal{E}(\rho) = \sum_\alpha \mathrm{Tr}(M_\alpha \rho)\, \sigma_\alpha$$
for some states $\{\sigma_\alpha\}$ and for some operators $\{M_\alpha\}$ that form a positive operator-valued measure (POVM), i.e. $M_\alpha \ge 0$ and $\sum_\alpha M_\alpha = \mathbb{1}$. Such a channel has the physical interpretation of performing a generalized measurement with the POVM $\{M_\alpha\}$ and then preparing a state $\sigma_\alpha$ determined by the measurement outcome α. Note the states $\sigma_\alpha$ are not required to be orthogonal, and they may even be identical, in which case the channel is constant and transmits no information about the hypothetical measurement outcome. An important special case of measure-and-prepare channels is a "quantum-classical" channel. Such a channel takes the form
$$\mathcal{M}(\rho) = \sum_\alpha \mathrm{Tr}(M_\alpha \rho)\, |\alpha\rangle\langle\alpha|$$
for some POVM $\{M_\alpha\}$ and orthonormal basis $\{|\alpha\rangle\}$. (In what follows, a "quantum-classical channel on system A" refers to such a channel from A to some auxiliary system spanned by $\{|\alpha\rangle\}$.) Likewise, a "classical-quantum" channel takes the form $\rho \mapsto \sum_\alpha \mathrm{Tr}(\rho\, |\alpha\rangle\langle\alpha|)\, \sigma_\alpha$. A measure-and-prepare channel may then be seen as a quantum-classical channel (the "measurement") followed by a classical-quantum channel (the "preparation"). A channel is measure-and-prepare iff it is "entanglement-breaking," i.e. iff it produces a separable state whenever it acts on one half of an entangled pair. Relatedly, a channel is measure-and-prepare iff its Choi state is separable [10,11]. For measure-and-prepare channels expressed as above in Eqn. 2, the Choi state takes the form
$$\rho^{\mathcal{E}}_{A'B} = \sum_\alpha p_\alpha\, \rho^\alpha_{A'} \otimes \sigma_\alpha,$$
where
$$p_\alpha = \frac{\mathrm{Tr}(M_\alpha)}{d_A}, \qquad \rho^\alpha_{A'} = \frac{M_\alpha^T}{\mathrm{Tr}(M_\alpha)},$$
with $d_A = \dim(A)$ and the transpose taken in the $|i\rangle_{A'}$ basis used to define $|\Gamma\rangle$ (a different choice of $|\Gamma\rangle$ changes $\rho^\alpha_{A'}$ only by a unitary on the reference system). The expression is arranged so that the coefficients $p_\alpha$ form a probability distribution and the operators $\rho^\alpha_{A'}$ are normalized states. We say two measure-and-prepare channels can be written using the same measurement if they use the same POVM $\{M_\alpha\}$.
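Likewise, we say two separable states $\rho_{AB_1}$ and $\rho_{AB_2}$ can be written using the same ensemble of states $\{p_\alpha, \rho^\alpha_A\}$ on A if they both take the form
$$\rho_{AB_1} = \sum_\alpha p_\alpha\, \rho^\alpha_A \otimes \sigma^\alpha_{B_1}, \qquad \rho_{AB_2} = \sum_\alpha p_\alpha\, \rho^\alpha_A \otimes \sigma^\alpha_{B_2}$$
for some choice of states $\sigma^\alpha_{B_1}$ and $\sigma^\alpha_{B_2}$. These notions are equivalent under channel-state duality. Note that a single measure-and-prepare channel may sometimes be written using two different measurements, and likewise a single separable state may be written using two different ensembles.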
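The separable form of a measure-and-prepare Choi state can be checked directly. This minimal NumPy sketch builds the Choi state from the definition $\rho_{A'B} = d_A^{-1}\sum_{ij} |i\rangle\langle j|_{A'} \otimes \Lambda(|i\rangle\langle j|_A)$ and compares it with $\sum_\alpha p_\alpha \rho^\alpha_{A'} \otimes \sigma_\alpha$, using $p_\alpha = \mathrm{Tr}(M_\alpha)/d_A$ and $\rho^\alpha_{A'} = M_\alpha^T/\mathrm{Tr}(M_\alpha)$. The unsharp $\sigma_y$ POVM and the preparations $|0\rangle\langle 0|$, $|+\rangle\langle +|$ are illustrative assumptions, and the helper names are hypothetical.

```python
import numpy as np

def measure_prepare(povm, states):
    """Linear map X -> Σ_α Tr(M_α X) σ_α (measure with {M_α}, prepare σ_α)."""
    return lambda X: sum(np.trace(M @ X) * s for M, s in zip(povm, states))

def choi(channel, d):
    """rho_{A'B} = (1/d) Σ_ij |i><j|_{A'} ⊗ Λ(|i><j|_A)."""
    E = np.eye(d)
    return sum(np.kron(np.outer(E[i], E[j]), channel(np.outer(E[i], E[j])))
               for i in range(d) for j in range(d)) / d

d = 2
# Illustrative two-outcome qubit POVM: an unsharp sigma_y measurement.
sy = np.array([[0, -1j], [1j, 0]])
povm = [(np.eye(2) + 0.6 * sy) / 2, (np.eye(2) - 0.6 * sy) / 2]
states = [np.diag([1.0, 0.0]), np.full((2, 2), 0.5)]   # |0><0| and |+><+|

rho = choi(measure_prepare(povm, states), d)

# Separable form: p_α = Tr(M_α)/d_A and ρ^α_{A'} = M_α^T / Tr(M_α).
p = [np.trace(M).real / d for M in povm]
decomp = sum(pa * np.kron(M.T / np.trace(M).real, s)
             for pa, M, s in zip(p, povm, states))
# Same expression without the transpose, for comparison.
no_transpose = sum(pa * np.kron(M / np.trace(M).real, s)
                   for pa, M, s in zip(p, povm, states))
print(np.allclose(rho, decomp), np.allclose(rho, no_transpose))
```

Because $\sigma_y^T = -\sigma_y$, dropping the transpose swaps the roles of the two POVM elements and breaks the equality; the $\sigma_y$ POVM was chosen to make that visible.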
The main result of this paper is similar in spirit to a no-cloning or monogamy result, and likewise by the channel-state duality it will have nearly equivalent dynamical and static formulations, constraining either the dynamical properties of channels or the correlation properties of states.

Main result
As discussed in Section 2, channel-state duality allows the result to be formulated as a statement about either channels or states. We first describe Theorem 1 for channels, because it is more directly related to the emergence of effective classicality described in Section 6.1. (The logic of the proofs, however, begins with Theorem 2 for states.) Theorem 1 considers arbitrary channels with many outputs, and it characterizes the reduced channels onto small subsets of outputs. It states that for all small subsets of outputs except those overlapping some fixed O(1)-sized excluded subset, the corresponding reduced channels are measure-and-prepare, and moreover they use the same measurement. We denote this excluded region Q, or also the "quantum Markov blanket." (The term "Markov blanket" follows terminology in classical statistics [8].) The result strengthens Theorem 2 of [7].
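such that for all output subsets R of size |R| disjoint from Q, we have
$$\|\Lambda_R - \mathcal{E}_R\|_\diamond \le \epsilon,$$
using a measure-and-prepare channel $\mathcal{E}_R(\rho) = \sum_\alpha \mathrm{Tr}(M_\alpha \rho)\, \sigma^\alpha_R$, where $\|\mathcal{N}\|_\diamond = \max_{\rho_{A'A}} \|(\mathrm{id}_{A'} \otimes \mathcal{N})(\rho_{A'A})\|_1$ is the diamond norm on channels $\mathcal{N}: D(A) \to D(B)$. The measurement $\{M_\alpha\}$ does not depend on the choice of subset R, while the prepared states $\sigma^\alpha_R$ may depend on R.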
This theorem is illustrated in Fig. 1.
Variations of Theorem 1. A stronger version of Eqns. 5 and 6 in Theorem 1 uses a variant of the diamond norm; this version is described in Eqns. 24 and 25. The following variation is weaker than Eqn. 24 but perhaps simpler: one can replace Eqns. 5 and 6 with Eqn. 7. Here and throughout this work, $\|\cdot\|_1$ denotes the Schatten 1-norm. Note the superior dependence on $d_A$ in Eqn. 7 compared to that of Eqn. 5, while constraining a weaker norm (when interpreting the entire LHS of Eqn. 7 as a norm). We can also offer the following variation on Eqn. 5: for any sizes |R|, |Q|, there exist a region Q of size |Q| and a POVM $\{M_\alpha\}$ such that Eqn. 8 holds for all regions R of size |R| disjoint from Q, where $\Omega_{d_A,d_R}$ is the dimensional factor from Eqn. 31. Setting $\Omega_{d_A,d_R} = d_A^2$ on the RHS of Eqn. 8, we recover Eqn. 5.
The dimensional factor $\Omega_{d_A,d_R}$ arises from the relation between the "one-way LOCC" norm and the (Schatten) 1-norm; see more discussion surrounding Lemma 1. This dimensional factor may be suboptimal and could possibly be improved in future work.
As a final variation, in Eqn. 8 and related expressions, we can also write a slightly tighter upper bound by using the replacement
$$\frac{|R|}{|Q|} \;\to\; \frac{1}{\lfloor |Q|/|R| + 1 \rfloor},$$
where $\lfloor\cdot\rfloor$ is the integer floor function. We use the simpler (but looser) bound only for readability; note we will need $|R|/|Q| \ll 1$ for the theorems to be useful, in which case the above replacement is only a small improvement.
Remarks on Theorem 1. We refer to Q as O(1)-sized because for a fixed size |R| and error tolerance $\epsilon$, the size |Q| depends on neither the total number of outputs nor the dimension of each output. That is, the upper bounds in Eqn. 5 and elsewhere do not depend on $n$ or $\dim(B_i)$. Physically, the region Q is where any (non-negligible) locally accessible quantum information about A must be stored. Therefore by no-cloning or monogamy of entanglement, no quantum information about A can be locally accessible outside this region. Meanwhile, Q will also contain any locally accessible classical information about A. However, unlike the quantum information, the classical information may also be present in copies outside of Q.
An essential point is that the measurement {M α } in this theorem does not depend on R, so that apart from the O(1)-sized region Q, different "observers" in different parts of the system can only receive classical information about the input in the same "generalized basis," i.e. resulting from the same POVM on A. (The observers may also receive no information at all.) This supports the "objectivity" of the emergent classical description of quantum systems; see Section 6 for more discussion.
We now formulate the result for states rather than channels.
Figure 1: Illustration of Theorem 1. The quantum channel Λ is shown acting on a state $\rho_A$, with a partial trace over the complement of the output region R. For any R that does not overlap the "Markov blanket" Q, the reduced channel is approximately a "measure-and-prepare" channel. Importantly, the measurement $\{M_\alpha\}$ on A is independent of the choice of region R.
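for some choice of states $\{\sigma^\alpha_R\}_\alpha$ that depends on the choice of R. The ensemble of states $\{p_\alpha, \rho^\alpha_A\}$ does not depend on the choice of R. Here $S(A)$ denotes the von Neumann entropy of $\rho_A$, and the above "one-way LOCC norm" for (differences of) bipartite states on AR is defined as
$$\|X_{AR}\|_{\mathrm{LOCC}\leftarrow} = \max_{\mathcal{M}_R} \left\|(\mathrm{id}_A \otimes \mathcal{M}_R)(X_{AR})\right\|_1,$$
with maximization taken over quantum-classical channels $\mathcal{M}_R$ on R (see Eqn. 2 for a definition).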
Variations of Theorem 2. Note we can also replace $S(A)$ above with its upper bound $\log d_A$. For two bipartite states ρ, σ on AR, the above "one-way LOCC norm" [12,13], commonly denoted $\|\rho - \sigma\|_{\mathrm{LOCC}\leftarrow}$ or $\|\rho - \sigma\|_{\mathrm{LOCC}_1}$, is related to the maximum probability of distinguishing between ρ and σ using local operations on A and R and "one-way" classical communication from R to A (but not vice versa). It satisfies
$$\|\rho - \sigma\|_{\mathrm{LOCC}\leftarrow} \le \|\rho - \sigma\|_1 \le \Omega_{d_A,d_R}\, \|\rho - \sigma\|_{\mathrm{LOCC}\leftarrow},$$
with $\Omega_{d_A,d_R}$ as in Eqn. 31. The above relation is introduced by Lemma 1 in Appendix A. In the context of Theorem 2, we can then conclude the same bound in the 1-norm, at the cost of an extra factor of $\Omega_{d_A,d_R}$ on the RHS. Finally, we also have the slight strengthening noted in Eqn. 9. The state version of the theorem implies the channel version, by applying the state version to the Choi state of the channel. Conversely, the channel version can only be used to directly prove the state version for states that have maximally mixed marginal on A, since otherwise the state is not the Choi state of a channel.
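To make the one-way LOCC norm concrete, the sketch below evaluates $\|(\mathrm{id}_A \otimes \mathcal{M}_R)(X)\|_1$ for a qubit R and maximizes over a grid of rank-1 projective measurements, the same restriction used in the numerics of Section 5. Because general POVMs and the continuum of directions are not searched, the result is in general only a lower bound on $\|X\|_{\mathrm{LOCC}\leftarrow}$. The Bell-state example and all function names are illustrative assumptions.

```python
import numpy as np

def trace_norm(X):
    """Schatten 1-norm of a Hermitian matrix."""
    return float(np.abs(np.linalg.eigvalsh(X)).sum())

def measured_norm(X, basis, dA):
    """||(id_A ⊗ M_R)(X)||_1 for the projective measurement on a qubit R whose
    outcome vectors are the columns of `basis`. The output is block diagonal
    in the outcome, so the norm is a sum over blocks <v_k|_R X |v_k>_R."""
    X4 = X.reshape(dA, 2, dA, 2)  # indices (a, r, a', r')
    return sum(trace_norm(np.einsum('r,arbs,s->ab', v.conj(), X4, v))
               for v in basis.T)

def locc_norm_qubit(X, dA, grid=41):
    """Grid search over rank-1 projective measurements on the qubit R."""
    best = 0.0
    for theta in np.linspace(0, np.pi, grid):
        for phi in np.linspace(0, 2 * np.pi, grid):
            c, s = np.cos(theta / 2), np.sin(theta / 2)
            basis = np.array([[c, -np.exp(-1j * phi) * s],
                              [np.exp(1j * phi) * s, c]])
            best = max(best, measured_norm(X, basis, dA))
    return best

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
X = np.outer(phi_plus, phi_plus) - np.eye(4) / 4   # Bell state minus maximally mixed
locc = locc_norm_qubit(X, dA=2)
tn = trace_norm(X)
print(locc, tn)   # ≈ 1.0 and 1.5
```

For this X every projective measurement on R gives the same value, 1, while the full 1-norm is 1.5, so the two norms genuinely differ on entangled inputs.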

Proofs
The proof builds on methods developed in [7,14,15]. First we will show Theorem 2 for states. Afterward, we will use channel-state duality to obtain the theorem for channels.

Footnote 5: The result in [7] might initially appear to have superior dependence on $d_A = \dim(A)$ compared to Theorem 1, despite constraining fewer outputs. But a side-by-side comparison reveals that our Theorem 1 actually has smaller error even for large $d_A$. To make the comparison, first set $|Q| = \delta n$ (with |Q| in our notation and δ in the notation of [7]). Then note that for the bound in [7] to be useful, n must also be at least of order $d_A^6 \log(d_A)$.

We make use of the (quantum) mutual information, defined for a state ρ on a system containing subsystems X, Y as
$$I(X;Y)_\rho = S(X)_\rho + S(Y)_\rho - S(XY)_\rho,$$
where S(·) is the von Neumann entropy. We will suppress the state in the subscript when it is clear from context. We also make use of the (quantum) conditional mutual information [16], defined for a state ρ on a system containing subsystems X, Y, Z as
$$I(X;Y|Z)_\rho = S(XZ)_\rho + S(YZ)_\rho - S(Z)_\rho - S(XYZ)_\rho,$$
which one reads as "the mutual information between X and Y, conditioned on Z." The quantity is always non-negative, and the non-negativity is equivalent to strong subadditivity [17]. Classically, the conditional mutual information quantifies how much information X and Y have about each other after conditioning on knowledge of Z. When the (quantum) conditional mutual information is small, the state on XYZ forms an approximate (quantum) Markov chain [18]. In that case, the conditioned region Z is sometimes referred to as a "Markov blanket" or "Markov shield." The Markov blanket protects X from direct correlations with Y (or vice versa) in the sense that X and Y are independent when conditioned on the Markov blanket. The region Q of our main theorems is precisely such a Markov blanket. In other words, the correlations between X and Y are (almost) entirely mediated by their separate correlation with Q, if the conditional mutual information $I(X;Y|Q)$ is small. The mutual information obeys a "chain rule" stating that for any state on subsystems $X, Y_1, \ldots, Y_n$,
$$I(X; Y_1 \cdots Y_n) = \sum_{i=1}^{n} I(X; Y_i | Y_1 \cdots Y_{i-1}),$$
which can be verified by the definition of conditional mutual information, using a telescoping sum. This simple equality may already be used to derive a monogamy result similar to Theorem 2 but not as powerful. First note the LHS of Eqn. 16 is upper bounded by $2\log(\dim(X))$, independent of n. Because each of the n terms on the RHS is non-negative and their sum has a constant upper bound, most of them must be small. In particular, for any q, no more than q terms can be larger than $1/q$ times the upper bound. So all but q of the subsystems $Y_1, \ldots, Y_n$ have the property that
$$I(X; Y_i | Y_1 \cdots Y_{i-1}) \le \frac{2\log(\dim(X))}{q}.$$
When X and $Y_i$ have low conditional mutual information conditioned on some third subsystem, they are close to separable. (More precisely, the above LHS upper bounds the "squashed entanglement" $E_{sq}(X;Y_i)$ between X and $Y_i$, which is an entanglement measure defined using conditional mutual information. In [13] the authors demonstrate that states with small squashed entanglement are close to separable states, in the appropriate norm.) So for most $Y_i$, the state on $XY_i$ is close to separable. The above statement is already close to the desired Theorem 2, but it is weaker in an important way. We want to prove not only that most reduced states on $XY_i$ are close to separable, but also that they are close to separable when using a fixed ensemble of states on X independent of $Y_i$, in the sense of Eqn. 4. Equivalently, when we use channel-state duality to translate the claim to the channel setting, we want the reduced channels to be measure-and-prepare channels using the same measurement.
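The chain rule and the counting argument above can be checked numerically. The NumPy sketch below computes von Neumann entropies in bits, forms the conditional mutual information from its definition, and verifies on a random four-qubit pure state (X plus $Y_1, Y_2, Y_3$, an arbitrary test case) that the terms are non-negative, that they sum to $I(X;Y_1Y_2Y_3) \le 2\log\dim(X)$, and that at most q of them exceed $2\log(\dim X)/q$. The function names are hypothetical.

```python
import numpy as np

def ptrace(rho, dims, keep):
    """Reduced state on the sorted subsystem list `keep` of a state on ⊗ dims."""
    t = rho.reshape(dims + dims)
    for ax in sorted(set(range(len(dims))) - set(keep), reverse=True):
        # trace the row axis `ax` against its column partner
        t = np.trace(t, axis1=ax, axis2=ax + t.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def cmi(rho, dims, x, y, z):
    """I(X;Y|Z) = S(XZ) + S(YZ) - S(Z) - S(XYZ); an empty z gives I(X;Y)."""
    def S(sub):
        return entropy(ptrace(rho, dims, sorted(sub))) if sub else 0.0
    return S(x + z) + S(y + z) - S(z) - S(x + y + z)

# Random pure state on X ⊗ Y_1 ⊗ Y_2 ⊗ Y_3 (four qubits); the seed is arbitrary.
rng = np.random.default_rng(0)
psi = rng.normal(size=16) + 1j * rng.normal(size=16)
psi /= np.linalg.norm(psi)
rho, dims = np.outer(psi, psi.conj()), [2, 2, 2, 2]

total = cmi(rho, dims, [0], [1, 2, 3], [])                        # I(X; Y1 Y2 Y3)
terms = [cmi(rho, dims, [0], [i], list(range(1, i))) for i in (1, 2, 3)]
U = 2 * np.log2(2)                                                # 2 log dim(X)
counts = {q: sum(t > U / q for t in terms) for q in (1, 2, 3)}
print(total, sum(terms), counts)
```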
The result we need is stated below in Proposition 1, and it provides the core of the argument leading to Theorem 2 and then Theorem 1. Proposition 1 also enables our improvement over the analogous results in [7], by using a simpler and more efficient optimization.
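Proposition 1. Let ρ be a state on $A \otimes B_1 \otimes \cdots \otimes B_n$. For any q and any size |R|, there exists a region $Q \subset \{B_1, \ldots, B_n\}$ of size $|Q| \le q$, along with a quantum-classical channel $\mathcal{M}_Q$ on Q, such that for all regions $R \subset \{B_1, \ldots, B_n\}$ of size |R| with $R \cap Q = \emptyset$,
$$\max_{\mathcal{M}_R} I(A;R|Q)_{(\mathcal{M}_Q \otimes \mathcal{M}_R)(\rho)} \le \frac{S(A)}{\lfloor q/|R| + 1 \rfloor},$$
where $\lfloor\cdot\rfloor$ is the integer floor function, and the maximum is taken over all quantum-classical channels $\mathcal{M}_R$ on R.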
We refer to the region Q as a quantum Markov blanket that "covers" or "shields" the region A. See also the discussion below Eqn. 15. Proposition 1 essentially states that A has small correlation with any region R when conditioned on some measurement $\mathcal{M}_Q$ on some sufficiently large region Q, and moreover this measurement need not depend on the choice of R. The region Q is "small," or rather O(1)-sized, in the sense that for fixed error tolerance (when viewing the RHS of Eqn. 18 as an "error," i.e. a deviation from zero conditional mutual information), Q does not scale with the total number of subsystems n.

Figure 2: The increase of mutual information in each step is given by some conditional mutual information, shown in panel (b), where the first red term corresponds to the first red arrow, and so on. These are non-negative by strong subadditivity. The proof of Proposition 1 considers the greedily optimized path, chosen by maximizing the terms in panel (b) from left to right. Because the mutual information has a constant upper bound, for a long enough path we are guaranteed to find a "bottleneck," where the conditional mutual information along any subsequent edge to any subsequent node must be small. Note that the mutual information is actually computed after applying a quantum-classical channel, which must also be optimized (as in the main text).
Proof of Proposition 1. A visual representation of the argument is sketched in Fig. 2 for the case of n = 4, |R| = 1 and summarized in the caption.
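First, choose the region $S_1 \subset \{B_1, \ldots, B_n\}$ of size |R| and the quantum-classical channel $\mathcal{M}_{S_1}$ on $S_1$ such that $S_1$ and $\mathcal{M}_{S_1}$ together maximize $I(A;S_1)_{\mathcal{M}_{S_1}(\rho)}$. Next, choose the region $S_2 \subset \{B_1, \ldots, B_n\}$ of size |R|, disjoint from $S_1$, and the quantum-classical channel $\mathcal{M}_{S_2}$ on $S_2$ such that $S_2$ and $\mathcal{M}_{S_2}$ together maximize the quantity
$$I(A;S_2|S_1)_{(\mathcal{M}_{S_1} \otimes \mathcal{M}_{S_2})(\rho)}.$$
Continuing, choose the region $S_3 \subset \{B_1, \ldots, B_n\}$ of size |R|, disjoint from $S_1 \cup S_2$, and the quantum-classical channel $\mathcal{M}_{S_3}$ on $S_3$ so that $S_3$ and $\mathcal{M}_{S_3}$ together maximize the quantity
$$I(A;S_3|S_1S_2)_{(\mathcal{M}_{S_1} \otimes \mathcal{M}_{S_2} \otimes \mathcal{M}_{S_3})(\rho)},$$
and so on, until $S_1, \ldots, S_m$ have been chosen, with $m = \lfloor q/|R| + 1 \rfloor$. By the chain rule (Eqn. 16),
$$\sum_{j=1}^{m} I(A;S_j|S_1 \cdots S_{j-1}) = I(A;S_1 \cdots S_m) \le S(A),$$
where every quantity is evaluated on $(\mathcal{M}_{S_1} \otimes \cdots \otimes \mathcal{M}_{S_m})(\rho)$, and the final inequality holds because the measured region $S_1 \cdots S_m$ is classical, so that $S(A|S_1 \cdots S_m) \ge 0$. The LHS of the inequality in Eqn. 19 has m terms, each of which is non-negative by strong subadditivity. Then the average term is at most $m^{-1} S(A)$, and at least one of the terms must be less than or equal to the average. Denote this term the i-th term. Then
$$I(A;S_i|S_1 \cdots S_{i-1}) \le \frac{S(A)}{m}.$$
Moreover, by our construction of $S_i$ and $\mathcal{M}_{S_i}$, these choices maximized the LHS above. So for any region R of size |R| disjoint from $S_1 \cdots S_{i-1}$, and for any quantum-classical channel $\mathcal{M}_R$ on R,
$$I(A;R|S_1 \cdots S_{i-1})_{(\mathcal{M}_{S_1} \otimes \cdots \otimes \mathcal{M}_{S_{i-1}} \otimes \mathcal{M}_R)(\rho)} \le \frac{S(A)}{m}.$$
Letting $Q = S_1 \cdots S_{i-1}$ and $\mathcal{M}_Q = \mathcal{M}_{S_1} \otimes \cdots \otimes \mathcal{M}_{S_{i-1}}$, we have obtained the desired result, and $|Q| \le |R|(m-1) \le q$ by construction.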
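The greedy construction in this proof can be illustrated with a classical toy version. This is a simplified sketch, not the numerical code used in the paper: it assumes all variables are classical, so the optimization over quantum-classical channels $\mathcal{M}_{S_j}$ is dropped, and it takes |R| = 1. It picks $S_1, \ldots, S_m$ greedily, then cuts the path at the smallest increment, which is one valid choice of a term no larger than the average. The toy distribution and the function names are hypothetical.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cmi(P, x, y, z):
    """Classical I(X;Y|Z) from a joint array P with one axis per variable."""
    def H(axes):
        if not axes:
            return 0.0
        drop = tuple(a for a in range(P.ndim) if a not in axes)
        return entropy(P.sum(axis=drop))
    return H(x + z) + H(y + z) - H(z) - H(x + y + z)

def greedy_blanket(P, m):
    """Axis 0 is A; axes 1..n are B_1..B_n. Greedily pick singletons
    S_1, ..., S_m, each maximizing I(A; S_j | S_1 ... S_{j-1}), then cut the
    path before the smallest increment (the 'bottleneck')."""
    path, gains = [], []
    for _ in range(m):
        rest = [b for b in range(1, P.ndim) if b not in path]
        best = max(rest, key=lambda b: cmi(P, [0], [b], path))
        gains.append(cmi(P, [0], [best], path))
        path.append(best)
    i = int(np.argmin(gains))  # a term no larger than the average
    return path[:i], gains

# Toy distribution: A is a fair bit copied into B_1 and B_2; B_3 is independent noise.
P = np.zeros((2, 2, 2, 2))
for a, noise in itertools.product(range(2), repeat=2):
    P[a, a, a, noise] = 0.25

Q, gains = greedy_blanket(P, m=2)
HA = entropy(P.sum(axis=(1, 2, 3)))
worst = max(cmi(P, [0], [b], Q) for b in range(1, 4) if b not in Q)
print(Q, gains)   # [1] [1.0, 0.0]
```

Here $B_1$ and $B_2$ carry the same copy of A, so once $B_1$ is in Q the remaining subsystems add nothing, and the bound $H(A)/m$ from the proof holds with room to spare.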
Proof of Theorem 2 for states. The proof of Theorem 2 for states now proceeds as follows. We begin with the setup and conclusion of Proposition 1. We conclude that for any q, there exists a region $Q \subset \{B_1, \ldots, B_n\}$ of size $|Q| \le q$, along with a quantum-classical channel $\mathcal{M}_Q$ on Q, such that for all regions $R \subset \{B_1, \ldots, B_n\}$ of size |R| with $R \cap Q = \emptyset$, and for all quantum-classical channels $\mathcal{M}_R$ on R,
$$I(A;R|Q)_{(\mathcal{M}_Q \otimes \mathcal{M}_R)(\rho)} \le \frac{S(A)}{\lfloor q/|R| + 1 \rfloor}.$$
Then we apply Lemma 2 from Appendix A to the state $(\mathcal{M}_Q \otimes \mathcal{M}_R)(\rho_{AQR})$ to conclude that there exist probabilities $p_\alpha$ and states $\rho^\alpha_A$ satisfying Eqn. 20, with the maximum again over quantum-classical channels $\mathcal{M}_R$ on R. Note that the quantities $p_\alpha, \rho^\alpha_A$ produced by Lemma 2 depend only on $\rho_{AQ}$ and $\mathcal{M}_Q$, not on the choice of R.
We have nearly arrived at the conclusion of Theorem 2. Note that if |Q| < q, we can add q − |Q| arbitrary extra subsystems to Q so that |Q| = q. Then, using this enlarged region, Eqn. 20 holds a fortiori for all R with $R \cap Q = \emptyset$, and for simplicity we formulate Theorem 2 in terms of |Q| rather than the parameter q of Proposition 1.
Thus we arrive at the conclusion of Theorem 2 for states.
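Proof of Theorem 1 for channels. Given the channel Λ, consider its Choi state
$$\rho^\Lambda_{A'B_1 \cdots B_n} = (\mathrm{id}_{A'} \otimes \Lambda)\left(|\Gamma\rangle\langle\Gamma|_{A'A}\right)$$
for a maximally entangled state $|\Gamma\rangle_{A'A}$ and reference system A' isomorphic to A. Then apply Theorem 2 for states to this Choi state, with A' in the role of A. We obtain a region Q and an ensemble $\{p_\alpha, \rho^\alpha_{A'}\}$, independent of R, such that for every R disjoint from Q the reduced Choi state $\rho^\Lambda_{A'R}$ is close in one-way LOCC norm to $\sum_\alpha p_\alpha \rho^\alpha_{A'} \otimes \sigma^\alpha_R$. We can relate this one-way LOCC norm to the (Schatten) 1-norm using Lemma 1, to obtain Eqn. 22, which bounds $\|\rho^\Lambda_{A'R} - \sum_\alpha p_\alpha \rho^\alpha_{A'} \otimes \sigma^\alpha_R\|_1$ at the cost of an extra factor of $\Omega_{d_A,d_R}$, the dimensional factor from Eqn. 31. Eqn. 22 is almost the desired conclusion of Theorem 1; we just need to translate between Choi states and channels.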
Recall from Section 2 that reduced channels correspond to reduced states of the corresponding Choi state, and measure-and-prepare channels correspond to separable Choi states. So the first term on the LHS of Eqn. 22 is the Choi state of the reduced channel $\Lambda_R$, and the second term is the Choi state of some measure-and-prepare channel. In particular, referring to Eqn. 3, the second term on the LHS of Eqn. 22 is the Choi state of the corresponding measure-and-prepare channel $\mathcal{E}_R$, whose measurement does not depend on R. Now we just need to relate the (Schatten) 1-norm for Choi states to the diamond norm for the corresponding channels. For any channels $\mathcal{N}_1, \mathcal{N}_2$ on A with corresponding Choi states $\rho^{\mathcal{N}_1}, \rho^{\mathcal{N}_2}$, a well-known lemma (see e.g. Lemma 6 of [7]) gives the relation
$$\|\mathcal{N}_1 - \mathcal{N}_2\|_\diamond \le d_A\, \|\rho^{\mathcal{N}_1} - \rho^{\mathcal{N}_2}\|_1.$$
Finally, Eqn. 8 directly follows from Eqn. 22 and the above translations between channels and Choi states. Setting $\Omega_{d_A,d_R} = d_A^2$, the conclusion of Theorem 1 follows as well.
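The relation between the diamond norm and the Choi-state distance can be sanity-checked by sampling. Computing a diamond norm exactly requires a semidefinite program, so this NumPy sketch instead evaluates $\|(\mathrm{id}_{A'} \otimes (\mathcal{N}_1 - \mathcal{N}_2))(\psi_{A'A})\|_1$ on random pure inputs, each of which is a lower bound on $\|\mathcal{N}_1 - \mathcal{N}_2\|_\diamond$, and checks that none exceeds $d_A \|\rho^{\mathcal{N}_1} - \rho^{\mathcal{N}_2}\|_1$, the standard upper bound. The amplitude-damping and identity channels, the sample count, and the helper names are illustrative assumptions.

```python
import numpy as np

def apply_id_kraus(kraus, rho, d_ref):
    """(id_{A'} ⊗ N)(rho) for N(X) = Σ_k K_k X K_k^†, with A' as the first factor."""
    out = 0
    for K in kraus:
        IK = np.kron(np.eye(d_ref), K)
        out = out + IK @ rho @ IK.conj().T
    return out

def trace_norm(X):
    """Schatten 1-norm of a Hermitian matrix."""
    return float(np.abs(np.linalg.eigvalsh(X)).sum())

d = 2
g = 0.3
N1 = [np.array([[1, 0], [0, np.sqrt(1 - g)]]), np.array([[0, np.sqrt(g)], [0, 0]])]
N2 = [np.eye(2)]                                   # identity channel

gamma = np.eye(d).reshape(-1) / np.sqrt(d)         # |Γ>_{A'A}
Gam = np.outer(gamma, gamma)
choi_gap = trace_norm(apply_id_kraus(N1, Gam, d) - apply_id_kraus(N2, Gam, d))

# Each random pure input gives a lower bound on the diamond norm of N1 - N2.
rng = np.random.default_rng(1)
samples = []
for _ in range(2000):
    psi = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
    psi /= np.linalg.norm(psi)
    rho = np.outer(psi, psi.conj())
    samples.append(trace_norm(apply_id_kraus(N1, rho, d) - apply_id_kraus(N2, rho, d)))

print(choi_gap, max(samples), d * choi_gap)
```

Sampling can only certify that the upper bound is not violated on the inputs tried; it does not compute the diamond norm itself.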
Note the additional factors of $d_A$ in Theorem 1 compared to Theorem 2. One factor of $d_A^2$ arose from the factor of $d_A$ in Eqn. 23. The other factors arose from the $d_A^2$, or more generally the $\Omega_{d_A,d_R}$ factor in Eqn. 31, stemming from Lemma 1.
Proof of further variations of Theorem 1. Alternatively, we can obtain a result for channels that avoids the factor of $d_A^2$ or $\Omega_{d_A,d_R}$ noted above by translating directly from Eqn. 20. In that case, we can modify Theorem 1 for channels to conclude Eqn. 24, where we have defined a modified diamond norm, the "diamond norm restricted to one-way LOCC," defined for a channel $\mathcal{N}: D(A) \to D(B)$ as
$$\|\mathcal{N}\|_{\diamond,\mathrm{LOCC}\leftarrow} = \max_{\rho_{A'A}} \max_{\mathcal{M}_B} \left\|\left(\mathrm{id}_{A'} \otimes (\mathcal{M}_B \circ \mathcal{N})\right)(\rho_{A'A})\right\|_1,$$
with the maximization taken over input states $\rho_{A'A}$ and quantum-classical channels $\mathcal{M}_B$ on B. Note the advantage of this bound compared to the statement of Theorem 1 using the diamond norm: here we have only $d_A$ on the RHS rather than $d_A^3$. To interpret this norm, note that for two channels $\mathcal{N}_1, \mathcal{N}_2$, the distance $\|\mathcal{N}_1 - \mathcal{N}_2\|_{\diamond,\mathrm{LOCC}\leftarrow}$ measures the maximum distinguishability of $\mathcal{N}_1, \mathcal{N}_2$ when feeding them some state $\rho_{A'A}$ entangled with a reference system A' and then using one-way LOCC on A' and B to distinguish the two outputs, i.e. using only local operations on A', B and one-way classical communication from B to A'. We then also have a relation (Eqn. 26) between this norm and the quantity bounded in Eqn. 7; applied to Eqn. 24, it yields Eqn. 7 of Theorem 1.
In closing, we note that some more naive extensions of the proof methods in [7] would fail here, as described in Footnote 7.

Examples and numerics
Because Theorem 1 applies to any channel, it will be helpful to consider a few very different cases. Take A to be a single qubit, and take B to consist of n qubits $B_1, \ldots, B_n$. We discuss several simple examples before turning to a detailed numerical example.
• Let Λ : D(A) → D(B) be the constant channel that takes every input to some constant state on B. Then all the reduced channels are also constant, and moreover they are measure-and-prepare channels in a trivial sense: they can be expressed as any measurement on A followed by a preparation of some constant state, independent of the outcome of the measurement. Thus Theorem 1 easily holds, and in fact the approximation has zero error, and one could even take the excluded region Q to be the empty set.
• Let Λ be a Haar-random isometry. Then for A fixed and n large, the reduced channels on small subsets will again be approximately constant channels. Thus the theorem applies as before.
• Let Λ faithfully transmit A to some $B_i$, while preparing an arbitrary state on the remaining outputs. Then the reduced channel $\Lambda_{B_i}$ is the identity channel, and the excluded region Q must contain $B_i$. The remaining reduced channels are constant channels, and thus the error in Theorem 1 is already zero for |Q| = 1.
• Let Λ copy the 0/1 (computational-basis) value of A to every output $B_i$. Then every reduced channel $\Lambda_{B_i}$ is a measure-and-prepare channel, measuring in the 0/1 basis and likewise preparing the 0/1 state. Thus the error in Theorem 1 is already zero for empty Q.
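One concrete way to realize a channel that copies the 0/1 value of A to every output, assumed here rather than taken from the text, is the fan-out isometry $|i\rangle_A \mapsto |i\rangle^{\otimes n}$. The NumPy sketch below builds its Choi vector for three outputs and checks that each reduced Choi state on $A'B_k$ equals $\sum_\alpha \tfrac12 |\alpha\rangle\langle\alpha|_{A'} \otimes |\alpha\rangle\langle\alpha|_{B_k}$, i.e. a 0/1 measurement followed by preparing $|\alpha\rangle$. Its overlap with a Bell state is only 1/2, in line with monogamy. The helper name is hypothetical.

```python
import numpy as np

def two_site_rdm(psi, i, j, n):
    """Reduced state of qubits i and j of a pure n-qubit state (qubit 0 leftmost)."""
    t = np.moveaxis(psi.reshape([2] * n), [i, j], [0, 1]).reshape(4, -1)
    return t @ t.conj().T

n_out = 3
# Choi vector of the fan-out isometry |i>_A -> |i>^{⊗ n_out}, i in {0, 1}:
# (|0>_{A'}|000> + |1>_{A'}|111>)/sqrt(2), with A' as qubit 0.
psi = np.zeros(2 ** (n_out + 1))
psi[0] = psi[-1] = 1 / np.sqrt(2)

# Separable target for every B_k: Σ_a (1/2)|a><a|_{A'} ⊗ |a><a|_{B_k}.
target = np.diag([0.5, 0.0, 0.0, 0.5])
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)

rdms = [two_site_rdm(psi, 0, k, n_out + 1) for k in range(1, n_out + 1)]
for k, rho in enumerate(rdms, start=1):
    overlap = float(phi_plus @ rho @ phi_plus)   # fidelity with a Bell state
    print(k, np.allclose(rho, target), round(overlap, 3))
```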
A final example will be demonstrated numerically. Consider a qubit A that couples to a spin chain environment E of n − 1 qubits, $E = E_1 \otimes \cdots \otimes E_{n-1}$. The qubit begins in an arbitrary input state $\rho_A$, and the environment begins in some initial state $|\psi_0\rangle_E$. Then the joint system AE evolves unitarily under a joint Hamiltonian $H_{AE}$ for some time t. Coupling the extra qubit to the spin chain produces the channel
$$\Lambda(\rho_A) = e^{-iH_{AE}t}\left(\rho_A \otimes |\psi_0\rangle\langle\psi_0|_E\right)e^{iH_{AE}t}.$$
If desired, one may re-label the systems to obtain $\Lambda: D(A) \to D(B_1 \otimes \cdots \otimes B_n)$, matching the notation of Theorem 1. For our numerical example, we take $H_{AE}$ to be the mixed-field Ising model with translation-invariant interaction term, with couplings chosen as in Eqn. 1 of [19], in particular with g = −1.05, h = 0.5, so that the Hamiltonian is chaotic and far from any integrable model. We take the initial environment state $|\psi_0\rangle_E$ to be the ground state of the same Hamiltonian restricted to E. We choose $H_{AE}$ to have open boundary conditions: we attach a single extra qubit A to one end of an open spin chain with n − 1 qubits. Physically, we expect energy from A to flow into the cool environment E, so this example is more representative of diffusion than of a measurement process. However, it still illustrates the spread of information about A into E.

Footnote 7: One might naively guess that Theorem 1 of [7] could be used to prove our Theorem 1 with the following trick. First apply the former theorem, which excludes some region Q that grows with n. Then because Q is large for large n, focus on the reduced channel to Q and apply the theorem to this channel alone. Iterate the result in this fashion until the remaining region Q is O(1)-sized. However, this method suffers two flaws. First, for a fixed error tolerance, more careful analysis reveals that the final region Q will still grow with n, albeit more slowly. Second, each iteration of the theorem yields a new measurement for the measure-and-prepare channels, and these measurements will generally be different.
For short times, any information about A will be confined to a small effective light-cone near the end of the chain where A was attached. The interior of this light-cone will constitute the optimized Markov blanket Q, and the reduced channels $A \to E_i$ for $E_i$ outside Q will be nearly constant. For longer times, the details depend on the dynamics of $H_{AE}$, and a larger Q may be required to ensure the remaining reduced channels are close to measure-and-prepare. However, for fixed error tolerance, Theorem 1 guarantees |Q| will have some finite maximum extent, independent of the size of E. Thus Q need not grow forever, even as the light-cone increases, and even if the environment were arbitrarily large.
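The following NumPy sketch sets up a small instance of this example to probe the light-cone picture. Several ingredients are assumptions rather than details taken from the text: the Hamiltonian is written as $H = \sum_i Z_iZ_{i+1} + g\sum_i X_i + h\sum_i Z_i$ with the quoted g and h (the exact convention of Eqn. 1 of [19] may differ), the environment has only five qubits, and the channel is encoded by evolving half of $|\Gamma\rangle_{A'A}$, so the output is the pure Choi vector on A'AE. It does not run the Markov-blanket optimization; it only reports the mutual information $I(A';E_k)$ between the reference and each environment qubit, a crude proxy for how far information about A has spread. The function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def site_op(op, i, n):
    """Embed a single-qubit operator on qubit i of an n-qubit chain."""
    return np.kron(np.kron(np.eye(2 ** i), op), np.eye(2 ** (n - i - 1)))

def mixed_field_ising(n, g=-1.05, h=0.5):
    """ASSUMED form: H = Σ Z_i Z_{i+1} + g Σ X_i + h Σ Z_i, open chain."""
    H = sum(site_op(Z, i, n) @ site_op(Z, i + 1, n) for i in range(n - 1))
    return H + sum(g * site_op(X, i, n) + h * site_op(Z, i, n) for i in range(n))

def choi_vector(n_env, t):
    """Pure Choi vector on A' A E_1 ... E_{n_env} (in that qubit order) for
    rho_A -> U (rho_A ⊗ |psi0><psi0|_E) U^†, U = exp(-i H_AE t).
    A is qubit 0 of H_AE, so it is attached at the E_1 end of the chain."""
    psi0 = np.linalg.eigh(mixed_field_ising(n_env))[1][:, 0]   # ground state of H_E
    w, V = np.linalg.eigh(mixed_field_ising(n_env + 1))
    U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T
    gamma = np.array([1, 0, 0, 1]) / np.sqrt(2)               # |Γ>_{A'A}
    return np.kron(np.eye(2), U) @ np.kron(gamma, psi0)

def two_site_rdm(psi, i, j, n):
    """Reduced state of qubits i and j of a pure n-qubit state."""
    t = np.moveaxis(psi.reshape([2] * n), [i, j], [0, 1]).reshape(4, -1)
    return t @ t.conj().T

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def mutual_info_ref(psi, k, n):
    """I(A'; qubit k) in bits, where A' is qubit 0."""
    r4 = two_site_rdm(psi, 0, k, n).reshape(2, 2, 2, 2)
    rho_ref = np.einsum('abcb->ac', r4)      # trace out qubit k
    rho_k = np.einsum('abad->bd', r4)        # trace out A'
    return entropy(rho_ref) + entropy(rho_k) - entropy(r4.reshape(4, 4))

n_env = 5
n = n_env + 2                                 # A', A, E_1, ..., E_5
for t in (0.0, 1.0):
    psi = choi_vector(n_env, t)
    # E_k sits at qubit index k + 1
    print(t, [round(mutual_info_ref(psi, k + 1, n), 4) for k in range(1, n_env + 1)])
```

At t = 0 all of these mutual informations vanish, since A has not yet interacted with E; at later times they are expected to become nonzero first near $E_1$, the end of the chain where A is attached.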
This example is depicted in Fig. 3. For each fixed t, and for each size |Q| = 1, ..., n, we construct an optimized Markov blanket Q of size |Q| and the associated optimal quantum-classical channel $\mathcal{M}_Q$, for the case of |R| = 1. The procedure for constructing Q is described in the proof of Proposition 1.
The construction involves an optimization over quantum-classical channels $\mathcal{M}_R$ at each step. Here, we further restrict to simple projective measurements with rank-1 projections. Although this restricted optimization is not equivalent to an optimization over all quantum-classical channels, the result nonetheless implies the upper bounds of Theorem 1, because Eqn. 32 of Lemma 1 still holds for this restricted optimization (see Footnote 8). We perform the optimization numerically with a naive global optimization algorithm.

In Fig. 3, for each Q, we plot the quantity
$$\alpha_Q = \max_{R}\, \max_{\mathcal{M}_R}\, I(A';R|Q)_{(\mathcal{M}_Q \otimes \mathcal{M}_R)(\rho^\Lambda)},$$
where $\rho^\Lambda$ is the Choi state of Λ, and the maximum is over all regions R of size |R| = 1 disjoint from Q, and all quantum-classical channels $\mathcal{M}_R$ on R (using only projective measurements). The channel $\mathcal{M}_Q$ is the optimal quantum-classical channel obtained together with Q. The significance of the above quantity is that it upper bounds the distance of reduced channels $\Lambda_R$ to measure-and-prepare channels. In particular, from the discussion around Eqn. 7, for all regions R of the fixed size |R| disjoint from Q, there is a measure-and-prepare channel $\mathcal{E}_R$ with measurement independent of R whose distance from $\Lambda_R$, maximized over inputs $\rho \in D(A)$, is controlled by $\alpha_Q$ (Eqn. 28). Fig. 3 also includes the upper bound on $\alpha_Q$ given by Proposition 1. Evidently it is not very tight, and so for this example the bound of Theorem 1 is not tight either. For other examples, it may be tighter. Promising directions for future work include exploring the tightness of the bound, improving the bound for general states and channels, or alternatively improving the bound by specializing to a natural class of dynamics relevant to many-body physics.

Footnote 8: Restricting the optimization to projective measurements with rank-1 projections is most efficient, but it does not entail the same optimum. Alternatively, we can always restrict the optimization over quantum-classical channels, without loss, to quantum-classical channels with at most $\dim(R)^2$ outcomes that use only rank-1 POVM elements. To see this, first note that by convexity, an optimum will always occur on a so-called "extremal POVM," and these have at most $\dim(R)^2$ outcomes [20]. Second, if any of the POVM elements of some optimum are not rank-1, the same optimum can be achieved with a rank-1 fine-graining, because the latter can be post-processed via coarse-graining into the original optimum, and the coarse-graining channel cannot increase the 1-norm.

Figure 3: For each t and |Q| = 1, ..., 8, we numerically calculate the optimal Markov blanket Q of size |Q|, which best mediates the correlations between the input A and the rest of the spin chain. For the present example, in each case we find the optimal Q consists of the |Q| contiguous qubits at the end of the chain where A was attached. For the optimal Q, we plot the quantity $\alpha_Q$ of Eqn. 27, which has the interpretation of bounding the distance of the reduced channels (outside Q) to measure-and-prepare channels, as in Eqn. 28. We also plot the upper bound on $\alpha_Q$ given by Proposition 1. The figure demonstrates that at later times, a larger Markov blanket Q is needed to ensure the remaining reduced channels are nearly measure-and-prepare. However, for fixed error tolerance, Theorem 1 guarantees |Q| to have some finite maximum extent.

Further discussion
We proved Theorem 1 constraining the spread of quantum information in multi-output channels. Alternatively, Theorem 2 constrains the correlation structure of multipartite states. By constraining all but an O(1) number of subsystems, these results give a much stronger constraint than the result of [7], which inspired the present work.
To compare our Theorem 1 explicitly with the analogous result of [7], observe that their Theorem 2 becomes, in our notation, the bound of Eqn. 29. Compare this to our Eqn. 5. Their upper bound on $|Q|$ evidently scales with $n$, whereas our Eqn. 5 is independent of $n$. Their bound may appear to have superior scaling with respect to $d_A$, $|R|$, and the error parameter. However, it is only useful when $n > |Q|$, in which case the RHS of Eqn. 29 is at least as large as the RHS of our Eqn. 5. So in fact our bound is tighter in all regimes.

Emergent classicality
One significant motivation is to explain the emergence of the effective classicality of the quantum world, as discussed in [7]. An important ingredient in any such explanation is decoherence [21]. Suppose a previously isolated system $A$ interacts with a large environment $B$. Trace out $A$ and consider the resulting channel $A \to B$. According to the standard narrative of decoherence, if the environment decohered the system, then any reduced channel $A \to B_i$ must be measure-and-prepare, with the measurement taken in the "pointer" basis for $A$, determined by the details of the decoherence process [22]. Perhaps surprisingly, our results (beginning with those of [7]) demonstrate that an aspect of this classical structure exists in all large states and channels. Continuing with the previous example, let us first examine a less interesting case. It is possible that after the interaction, $A$ is maximally entangled with $B_1$. In that case, there is little sense in which $A$ has been robustly measured in some pointer basis: no systems other than $B_1$ have obtained any knowledge of $A$, so the information about $A$ has not spread. Regardless, Theorem 1 holds. The more interesting application of Theorem 1 occurs when some information about $A$ does become widely accessible to local observers $B_i$ in the environment. In that case, Theorem 1 states that the transmission of information $A \to B_i$ to these observers may be approximated as the result of some observer-independent measurement on $A$. The POVM $\{M_\alpha\}$ produced by Theorem 1 is effectively the pointer basis for this measurement process.
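The decoherence narrative above can be made concrete in a minimal toy model. The model is assumed purely for illustration and is not taken from this paper. It uses qubits and a CNOT "fan-out" interaction that copies the computational-basis value of $A$ into each of $n$ environment qubits $B_i$, all initialized to $|0\rangle$. After tracing out $A$ and every environment qubit except $B_i$, each reduced channel $A \to B_i$ equals the same measure-and-prepare channel: measure in the computational (pointer) basis, then prepare $|k\rangle\langle k|$. All function names are hypothetical.

```python
import numpy as np


def fanout_unitary(n_env):
    """Permutation unitary |a>_A |b_1..b_n>_B -> |a>_A |b_1 XOR a, .., b_n XOR a>_B (A is the leading qubit)."""
    n = n_env + 1
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for x in range(dim):
        bits = [(x >> (n - 1 - k)) & 1 for k in range(n)]  # big-endian, bits[0] = A
        a = bits[0]
        out = [a] + [b ^ a for b in bits[1:]]
        U[int("".join(map(str, out)), 2), x] = 1.0
    return U


def reduce_to_qubit(rho, n, k):
    """Reduced density matrix of qubit k (0 = A) of an n-qubit state."""
    perm = [k] + [j for j in range(n) if j != k]
    t = rho.reshape([2] * (2 * n)).transpose(perm + [n + j for j in perm])
    t = t.reshape(2, 2 ** (n - 1), 2, 2 ** (n - 1))
    return np.einsum("ajbj->ab", t)  # trace over all other qubits


def channel_A_to_Bi(rho_A, n_env, i):
    """Couple A to a |0...0> environment by fan-out, then keep only B_i (i = 1..n_env)."""
    env = np.zeros((2 ** n_env, 2 ** n_env))
    env[0, 0] = 1.0
    U = fanout_unitary(n_env)
    out = U @ np.kron(rho_A, env) @ U.T  # U is a real permutation, so U^dagger = U.T
    return reduce_to_qubit(out, n_env + 1, i)


def measure_and_prepare_z(rho_A):
    """Measure in the computational (pointer) basis and prepare |k><k|."""
    return np.diag(np.diag(rho_A))


rng = np.random.default_rng(0)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = g @ g.conj().T
rho /= np.trace(rho)
n_env = 3
for i in range(1, n_env + 1):
    # prints True for each i
    print(i, np.allclose(channel_A_to_Bi(rho, n_env, i), measure_and_prepare_z(rho)))
```

Replacing the fan-out by a SWAP between $A$ and $B_1$ would reproduce the "less interesting case" above. There, $B_1$ receives the full quantum state, and the other $B_i$ receive nothing about $A$.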
In discussions of decoherence in many-body systems, a particular subsystem is often identified as "the system," which is decohered by the remaining subsystems identified as "the environment." This distinction may depend on particular features of the dynamics. However, the authors of [7] point out that their results (and by extension ours) remove the need for a presupposed split between system and environment; instead, we can choose any subsystem as the input system and treat the remaining subsystems as the environment. Still, the decomposition of the total system into subsystems, including the decomposition of $B$ into regions $B_1, \dots, B_n$, may affect the POVM determined by Theorem 1, posing a question for future work.
The great generality of Theorems 1 and 2 also leaves many important gaps in the explanation of emergent classicality. On one hand, we have shown that for everywhere in the environment excluding an O(1)-sized region, any locally accessible information about a subsystem must be approximately classical. On the other hand, as discussed in the examples of Section 5, in many cases the environment will contain no locally accessible information about the subsystem. In these cases, there may be no effective classical description of the dynamics. For instance, the computational degrees of freedom inside a quantum computer certainly have no such description.
Given that not all dynamics exhibit effective classicality, one must still ask which types of many-body dynamics admit such an effective description, and which subsystems or degrees of freedom in particular exhibit this classicality. See [3,5,23,24] for some discussion of this nature.

Compatible channels and states
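Our results may also be framed in terms of the theory of compatibility [25,26]. On a tripartite system $AB_1B_2$, two reduced states (or "marginals") $\rho_{AB_1}$ and $\rho_{AB_2}$ are "compatible" if there exists a joint state $\rho_{AB_1B_2}$ with those marginals. Similarly, two channels $\Lambda_1: D(A) \to D(B_1)$ and $\Lambda_2: D(A) \to D(B_2)$ are compatible if there exists a joint channel $\Lambda_{12}: D(A) \to D(B_1 \otimes B_2)$ whose reduced channels are $\Lambda_1$ and $\Lambda_2$. Two measure-and-prepare channels that can be expressed using the same measurement are always compatible, since the outcome of the shared measurement can be copied to both outputs before the preparations. The converse is not true in general: there exist compatible measure-and-prepare channels that cannot be written using the same measurement (see Appendix B).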
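The claim that a shared measurement implies compatibility can be checked directly. The joint channel $\Lambda_{12}(\rho) = \sum_\alpha \mathrm{Tr}(M_\alpha \rho)\, \sigma^{(1)}_\alpha \otimes \sigma^{(2)}_\alpha$ has $\Lambda_1$ and $\Lambda_2$ as its reduced channels, because each $\sigma^{(j)}_\alpha$ has unit trace. The sketch below verifies this numerically for one assumed example: a computational-basis measurement on a qubit, with $\Lambda_1$ preparing $|0\rangle, |1\rangle$ and $\Lambda_2$ preparing $|+\rangle, |-\rangle$. The helper names are hypothetical.

```python
import numpy as np


def measure_prepare(rho, povm, states):
    """rho -> sum_a Tr(M_a rho) sigma_a."""
    return sum(np.trace(M @ rho) * s for M, s in zip(povm, states))


def joint_channel(rho, povm, states1, states2):
    """One measurement, outcome sent to both outputs: sum_a Tr(M_a rho) sigma1_a ⊗ sigma2_a."""
    return sum(np.trace(M @ rho) * np.kron(s1, s2) for M, s1, s2 in zip(povm, states1, states2))


def trace_out_second(X, d1, d2):
    return np.einsum("ajbj->ab", X.reshape(d1, d2, d1, d2))


def trace_out_first(X, d1, d2):
    return np.einsum("jajb->ab", X.reshape(d1, d2, d1, d2))


def proj(v):
    return np.outer(v, v.conj())


ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)

povm = [proj(ket0), proj(ket1)]        # shared computational-basis measurement on A
states1 = [proj(ket0), proj(ket1)]     # Lambda_1 prepares |0>, |1> on B_1
states2 = [proj(plus), proj(minus)]    # Lambda_2 prepares |+>, |-> on B_2

rng = np.random.default_rng(0)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = g @ g.conj().T / np.trace(g @ g.conj().T)

J = joint_channel(rho, povm, states1, states2)
print(np.allclose(trace_out_second(J, 2, 2), measure_prepare(rho, povm, states1)))  # True
print(np.allclose(trace_out_first(J, 2, 2), measure_prepare(rho, povm, states2)))   # True
```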
From the perspective of compatibility, Theorem 1 states that for any large collection of compatible channels, all but $O(1)$-many channels must be approximately measure-and-prepare, and moreover, they must be expressible using the same measurement. The existence of compatible channels that do not arise from the same measurement, shown in Appendix B, highlights the non-trivial nature of the second statement.

Previous monogamy-related results
Quantum de Finetti theorems characterize the marginals of permutation-invariant states, which are approximately separable for large systems [27]. Thus de Finetti theorems corroborate the monogamy of entanglement. Our result may be seen as a quantum de Finetti-type theorem for non-permutation-invariant systems. For instance, the result about k-extendible states in Corollary 2 of [13] may be seen as a special case of our Theorem 2 when specialized to permutation-invariant states. Likewise, compare to Theorem 1 of [14].
Early work in the direction of de Finetti-type results for non-permutation-invariant systems includes the "decoupling" theorems of [15]. These show that for large multipartite states, after conditioning the state on a measurement of a small random subset of qudits, the marginals on most other small subsets are approximately product states. (The measurement "decouples" them.) The result of [7] and our Theorem 2 may also be seen as decoupling theorems in this sense.
The technique of using small conditional mutual information $I(X:Y|Z)$ to show that $\rho_{XY}$ is close to separable was developed by [13], where they use the one-way LOCC norm. The use of the one-way LOCC norm in Theorem 2, supported by Lemma 1, is a technique inspired by [14], where it was applied to obtain de Finetti theorems. The method is further developed by [7,14,28].
In [29] the authors demonstrate the tradeoff between quantum and classical correlations. In particular, if $A$ and $B$ have near-maximal classical correlation, then $A$ cannot have quantum correlations with any other system. Using this result, one can show that in the setup of our Theorem 1, if even a single system $B_i$ receives near-maximal classical information about $A$, then the other reduced channels must automatically be approximately measure-and-prepare. This fact also relates to the discussion about "objectivity of outcomes" in [7]. However, our results, and those of [7], do not require that any subsystem of the environment receive near-maximal classical information about $A$.

Future work
There are many opportunities for future work. The optimality of Theorems 1 and 2 is unknown. Certainly many channels will fail to saturate the inequalities. Are the bounds tight for some channels, or can they be generally improved? Some dependence on the dimension $d_A$ of the input system is necessary, but the exact dependence is unclear. Likewise, the optimality of our bound with respect to the size $|Q|$ of the excluded region is also unclear.
None of our results depends on the size of the environmental subsystems $B_i$. Moreover, Proposition 1 and Theorem 2 have no explicit dependence on the dimension $d_A$ of the input system. These results already hold naively for infinite-dimensional inputs, as long as the state has finite entropy $S(A)$. On the other hand, more care is required in Theorem 1 for channels when $A$ is infinite-dimensional.
References [30,31] extend the results of [7] to infinite-dimensional input systems A. Essentially, they replace the dimensional dependence with the assumption that the system A has bounded energy. The energy is taken with respect to some reference Hamiltonian; if the Hamiltonian's density of states does not grow too quickly, then the energy constraint implies an entropy constraint, which then replaces the dimensional dependence. We imagine similar techniques could be used to extend our results to infinite-dimensional systems, combining our Proposition 1 with the tools developed in [30,31].
We are motivated by the emergence of effective classical descriptions of quantum many-body systems. While our results demonstrate that some aspects of classicality are generic, an effective classical description requires more detailed properties of the dynamics. Identifying these properties is an important area of research. Moreover, the bound in Theorem 1 might be improved by specializing to some natural class of dynamics relevant for many-body physics.
Finally, this effective classicality suggests to us there exist efficient classical simulations of some quantum many-body systems. We hope our numerical method in Section 5 for determining the quantum Markov blanket and effective measurements may be useful here.
Lemma 1. Let $L_{AB}$ be any Hermitian operator on $AB$. Then … and …, where $\|\cdot\|_{\mathrm{LOCC}\leftarrow}$ is the "one-way LOCC norm" [7]. (This case has the most straightforward proof.) The case of $\Omega = 4\min\{d_A, d_B\}^{3/2}$ follows from Corollary 9 of [32]; the use of this result was brought to our attention by Lemma A9 of [31]. The case of $\Omega = \sqrt{153\, d_A d_B}$ follows from Theorem 15 of [12]; see also [13]. Finally, the case of $\Omega = 2d_B - 1$ follows from Theorem 16 of [33].⁹

We also note the following useful variant of Lemma 5 in [7], i.e. the case of $\Omega = d_A^2$. There they showed, as noted above, the corresponding bound with the maximization taken over quantum-classical channels $\mathcal{M}_B$ on $B$. Moreover, inequality 32 still holds when the maximization is restricted to quantum-classical channels implemented by projective measurements, rather than more general POVMs. This slight strengthening of Lemma 5 of [7] is useful for the numerical applications discussed in Section 5. The proof of the modified lemma follows from the proof in [7] after noting that, for any Hermitian operator $X$, …, where the optimization on the right-hand side yields the same answer whether taken over all channels $\mathcal{M}$, just quantum-classical channels, or just quantum-classical channels implemented by projective measurements.
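The identity used in this proof did not survive extraction, so we do not restate it. One standard fact with the stated property, assumed here for illustration, is $\|X\|_1 = \max_{\mathcal{M}} \|\mathcal{M}(X)\|_1$ for Hermitian $X$, with the maximum already attained by the rank-1 projective measurement in an eigenbasis of $X$. The sketch checks both halves numerically: the eigenbasis measurement attains $\|X\|_1$, and random rank-1 projective measurements never exceed it. The function names are hypothetical.

```python
import numpy as np


def qc_trace_norm(X, basis):
    """||M(X)||_1 for the qc channel projecting onto the columns u_k of `basis`.

    M(X) = sum_k <u_k|X|u_k> |k><k| is diagonal, so its trace norm is sum_k |<u_k|X|u_k>|.
    """
    return float(np.sum(np.abs(np.einsum("ik,ij,jk->k", basis.conj(), X, basis))))


def haar_unitary(d, rng):
    """Haar-random d x d unitary (QR of a complex Gaussian matrix, phase-corrected)."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


rng = np.random.default_rng(0)
d = 4
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X = (g + g.conj().T) / 2                      # a random Hermitian operator

evals, evecs = np.linalg.eigh(X)
trace_norm = np.abs(evals).sum()
eig_value = qc_trace_norm(X, evecs)           # measure in the eigenbasis of X
random_values = [qc_trace_norm(X, haar_unitary(d, rng)) for _ in range(1000)]

print(np.isclose(eig_value, trace_norm))             # True: the eigenbasis attains ||X||_1
print(max(random_values) <= trace_norm + 1e-12)      # True: no measurement exceeds it
```

The upper bound holds because $|\mathrm{Tr}(P_k X)| \le \mathrm{Tr}(P_k |X|)$ for each projector $P_k$, and $\sum_k P_k = I$.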
We excerpt the next lemma from the proof in [7].
Lemma 2 (adapted from the argument in [7]). Let $\rho_{ABC}$ be any state on $ABC$, let $\mathcal{M}_C$ be any quantum-classical channel on $C$ (see Eqn. 2), and let $\epsilon = I(A:B|C)_{\mathcal{M}_C(\rho)}$. Then …, where the quantum-classical channel $\mathcal{M}_C$ measures the POVM $\{M^\alpha_C\}_\alpha$ and …

For convenience we repeat the argument used in [7].

Proof. The state $\mathcal{M}_C(\rho)$ is a quantum-classical state that is classical on $C$, i.e. … Note that in general …, where $D(\cdot\|\cdot)$ is the relative entropy and the inequality follows from quantum Pinsker's inequality. Then …, where the second inequality follows from the convexity of both the 1-norm and the function $x \mapsto x^2$. The result follows.

⁹ The use of [33] for this purpose was brought to our attention by Ludovico Lami.
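The displayed inequalities of Lemma 2 were lost in extraction. The sketch below only checks numerically the generic chain described in the proof, for a random state that is classical on $C$: mutual information as a relative entropy, then quantum Pinsker, then convexity of $x \mapsto x^2$. It assumes entropies are measured in bits. Under that convention the chain gives $\sum_\alpha p_\alpha \|\rho^\alpha_{AB} - \rho^\alpha_A \otimes \rho^\alpha_B\|_1 \le \sqrt{2 \ln 2\, \epsilon}$. This constant is our assumption and is not claimed to be the paper's exact statement. The symbols $p_\alpha$, $\rho^\alpha_{AB}$ and the helper names are illustrative.

```python
import numpy as np


def entropy_bits(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))


def marginals(rho_ab, dA, dB):
    """Return (rho_A, rho_B) from rho_AB."""
    t = rho_ab.reshape(dA, dB, dA, dB)
    return np.einsum("ajbj->ab", t), np.einsum("jajb->ab", t)


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(1)
dA = dB = 2
p = np.array([0.3, 0.7])                             # outcome probabilities p_alpha on C
blocks = [random_state(dA * dB, rng) for _ in p]     # conditional states rho^alpha_AB

# For a state classical on C: I(A:B|C) = sum_alpha p_alpha I(A:B)_{rho^alpha},
# and I(A:B) = S(A) + S(B) - S(AB) = D(rho_AB || rho_A ⊗ rho_B).
eps = 0.0
avg_dist = 0.0
for pa, r in zip(p, blocks):
    rA, rB = marginals(r, dA, dB)
    eps += pa * (entropy_bits(rA) + entropy_bits(rB) - entropy_bits(r))
    avg_dist += pa * np.abs(np.linalg.eigvalsh(r - np.kron(rA, rB))).sum()

# Pinsker (bits): ||rho - sigma||_1^2 <= 2 ln(2) D(rho || sigma); Jensen for x -> x^2 then
# bounds the p-weighted average trace distance by sqrt(2 ln(2) eps).
bound = np.sqrt(2 * np.log(2) * eps)
print(avg_dist <= bound)  # True
```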

B Compatible measure-and-prepare channels with distinct measurements
Measure-and-prepare channels are those of the form (Eqn. 1) $\rho \mapsto \sum_\alpha \mathrm{Tr}(M_\alpha \rho)\,\sigma_\alpha$ for some POVM $\{M_\alpha\}$ and set of prepared states $\{\sigma_\alpha\}$. Note that in general this decomposition into a measurement and a preparation is not unique; sometimes a different POVM and preparation yield the same channel.
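As a small illustration of this non-uniqueness, consider a qubit example of our own choosing, not taken from the paper. A sharp computational-basis measurement followed by the mixed preparations $\mathrm{diag}(3/4, 1/4)$ and $\mathrm{diag}(1/4, 3/4)$ gives the same channel as the unsharp POVM $\{\mathrm{diag}(3/4, 1/4), \mathrm{diag}(1/4, 3/4)\}$ followed by the pure preparations $|0\rangle\langle 0|, |1\rangle\langle 1|$. Both output $\mathrm{diag}(1/4 + x/2,\; 3/4 - x/2)$ with $x = \langle 0|\rho|0\rangle$. The helper name is hypothetical.

```python
import numpy as np


def measure_prepare(rho, povm, states):
    """rho -> sum_a Tr(M_a rho) sigma_a."""
    return sum(np.trace(M @ rho) * s for M, s in zip(povm, states))


# Decomposition 1: sharp computational-basis measurement, mixed preparations.
povm_sharp = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
prep_mixed = [np.diag([0.75, 0.25]), np.diag([0.25, 0.75])]

# Decomposition 2: unsharp (noisy) measurement, pure preparations |0><0|, |1><1|.
povm_unsharp = [np.diag([0.75, 0.25]), np.diag([0.25, 0.75])]
prep_pure = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

rng = np.random.default_rng(0)
for _ in range(5):
    g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = g @ g.conj().T
    rho /= np.trace(rho)
    same = np.allclose(measure_prepare(rho, povm_sharp, prep_mixed),
                       measure_prepare(rho, povm_unsharp, prep_pure))
    print(same)  # True each time
```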
In this Appendix we demonstrate that there exist measure-and-prepare channels that are compatible (in the sense of Section 6.2) but cannot be written using the same measurement. That is, there exists some channel $\Lambda_{12}: D(A) \to D(B_1 \otimes B_2)$ whose reduced channels are both measure-and-prepare but cannot be expressed using the same POVM.
One can verify by inspection that the reduced states $\rho_{AB_1}$ and $\rho_{AB_2}$ coincide with the Choi states of the reduced channels $\Lambda_1, \Lambda_2$. To verify that $\Lambda_{12}$ is a valid channel, we need only verify that it is completely positive, or equivalently that $\rho_{AB_1B_2}$ is a positive semidefinite operator. Diagonalizing the above 8-by-8 matrix, one finds the eigenvalues are nonnegative for $p \in \left[\frac{1}{2} - \frac{1}{2\sqrt{2}}, \frac{1}{2} + \frac{1}{2\sqrt{2}}\right]$. Thus for any such $p$, the channels $\Lambda_1, \Lambda_2$ are compatible.
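The criterion used here is that a linear map is completely positive if and only if its Choi operator $\sum_{ij} |i\rangle\langle j| \otimes \Lambda(|i\rangle\langle j|)$ is positive semidefinite. The normalized Choi state differs from this operator only by a positive factor. The appendix's 8-by-8 matrix is not reproduced here, so the sketch below applies the same test to two assumed qubit examples. A measure-and-prepare channel passes. The transpose map fails, since its Choi operator is the swap, with eigenvalue $-1$. The helper names are hypothetical.

```python
import numpy as np


def choi(channel, d_in):
    """Choi operator sum_{ij} |i><j| ⊗ channel(|i><j|), input factor first."""
    blocks = []
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0
            blocks.append(np.kron(E, channel(E)))
    return sum(blocks)


def is_completely_positive(channel, d_in, tol=1e-10):
    """A linear map is CP iff its Choi operator is positive semidefinite."""
    return bool(np.linalg.eigvalsh(choi(channel, d_in)).min() >= -tol)


def measure_prepare_z(rho):
    """Measure in the computational basis, prepare |+> on outcome 0 and |-> on outcome 1."""
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    minus = np.array([1.0, -1.0]) / np.sqrt(2)
    return rho[0, 0] * np.outer(plus, plus) + rho[1, 1] * np.outer(minus, minus)


def transpose_map(rho):
    return rho.T


print(is_completely_positive(measure_prepare_z, 2))  # True
print(is_completely_positive(transpose_map, 2))      # False (its Choi operator is the swap)
```

Applying `is_completely_positive` to the appendix's $\Lambda_{12}$ for a grid of $p$ values would be one way to confirm the stated interval numerically.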