Witnessing environment dimension through temporal correlations

We introduce a framework to compute upper bounds for temporal correlations achievable in open quantum system dynamics, obtained by repeated measurements on the system. As these correlations arise by virtue of the environment acting as a memory resource, such bounds are witnesses for the minimal dimension of an effective environment compatible with the observed statistics. These witnesses are derived from a hierarchy of semidefinite programs with guaranteed asymptotic convergence. We compute non-trivial bounds for various sequences involving a qubit system and a qubit environment, and compare the results to the best known quantum strategies producing the same outcome sequences. Our results provide a numerically tractable method to determine bounds on multi-time probability distributions in open quantum system dynamics and allow for the witnessing of effective environment dimensions through probing of the system alone.


Introduction
Physical systems are never truly isolated, and inevitably interact with their surrounding environment [7,55]. As a consequence, information within the system leaks away into its surroundings, leading to entanglement between the system and the environment. In many instances, this leaked information may be partially recovered at a later time, leading to non-Markovian dynamics, i.e., non-negligible memory effects and complex correlations in time [8,48,56].
Like their spatial counterpart, temporal correlations in quantum mechanics fundamentally differ from those that can be observed in the classical case. Such differences between classical and quantum temporal correlations have been noted since the works of Leggett and Garg [26,40,41,68]. In the presence of memory, a clear distinction between an underlying process, carrying temporal correlations, and the measurement process, probing said correlations, can be obtained by employing correlation kernels [1,42] or higher-order quantum maps [54] for their description.
Broadly speaking, the dimension of a physical system, i.e., the number of perfectly distinguishable states, imposes fundamental constraints over the temporal correlations it is able to produce [12]. In this sense, the physical dimension acts as a memory resource, restricting the amount of information stored about the past that is capable of affecting the future. This memory constraint leads to different behaviors (i.e., different achievable temporal correlations) in classical, quantum, or more general physical theories [12,27,35,46,61,67]. In particular, quantum memories are known to allow for a larger set of correlations than classical ones of the same size [12,13,67]. However, this advantage only holds for restricted memory size, i.e., if the memory size is unrestricted, all correlations compatible with a time-ordered causal structure may be achieved with either classical or quantum memories [27,35]. Therefore, understanding the nature of dimensional constraints on observable correlations sheds light on fundamental differences between our descriptions of physical systems, as well as their connection with causality.
These limitations imposed by the dimensionality of physical systems have been exploited for the construction of "dimension witnesses", inequalities which, when violated, certify the minimum dimension of the system compatible with made observations [9,11,29,31,58,60,61]. In a similar spirit, our approach allows for the computation of upper bounds on the temporal correlations achievable in an open system dynamics with an environment of bounded dimension, so that any violations of these bounds certify the minimum dimension of the effective environment. Employing techniques from entanglement detection [25,49], these bounds are obtained by exploiting the inherent symmetries of the problem. In particular, we relax the a priori non-linear problem of computing maximum joint probabilities in sequential measurements to a numerically tractable hierarchy of semidefinite programs (SDPs) [5]. Both the computational accessibility of the final formulation of the problem, as well as the non-triviality of the resulting bounds, are then demonstrated for paradigmatic examples, showing that, indeed, joint probability distributions obtained from probing an open system alone provide viable means to deduce dimensional properties of the a priori experimentally inaccessible environment it is coupled to.

Figure 1: The sequence of measurement outcomes a = a_1 … a_L is obtained by repeated preparations of a probe state ρ^0_S, fixed and the same at every time step, which is left to interact with the environment, then followed by measurements (semicircles). The environment acts as a memory resource, capable of establishing long-term correlations between measurements.
The paper is organized as follows. In Sec. 2, we introduce the temporal sequence protocol, the notion of temporal correlations, and their mathematical description. Sec. 3 discusses the application of temporal correlations to the task of characterizing open systems and their environment. Sec. 4 describes the semidefinite program we constructed to bound temporal correlations, with Sec. 5 presenting the numerical results we obtained in their optimization. In Sec. 6 we comment on the various challenges involved in realizing these numerical optimizations, and how we have overcome them. In Sec. 7 we comment on several additional technical details involving our work, with Sec. 8 covering our conclusions and future outlook.

The measurement protocol
We assume throughout that an experimenter is able to prepare the system in a known state and perform measurements in a certain basis. Its dynamics, on the other hand, are not under experimental control, and are governed by its inevitable interaction with the environment. Typically, this environment is a much larger system, generally inaccessible, and featuring complex dynamics. Together, system and environment undergo closed, i.e., unitary evolution. Their interaction leads to an imprinting of information on the probe system, which can be used to learn something about the environment by means of the probe alone. In fact, this is a common way of making indirect measurements on (typically large many-body) systems that are not fully controlled [6,14]. For example, a small probe can be used for estimating an unknown parameter of a larger (many-body) environment [22], in particular temperature [3,21,47,57].
We now introduce the details of our scenario and the notation used. By H_S and H_E we denote the finite-dimensional Hilbert spaces of system and environment, with d_S = dim H_S and d_E = dim H_E, and their joint space as H_ES = H_E ⊗ H_S. The experiment involves L identical measurements on the probe system, at discrete time steps, with each outcome a_ℓ ∈ A collected into a sequence a = a_1 … a_L. For simplicity, we consider A = {0, 1} in the following; the generalization to arbitrary outcomes is straightforward. Measurements on the system are described by a positive operator-valued measure (POVM) (E^a_S)_a, i.e., E^a_S ≥ 0 and Σ_{a∈A} E^a_S = 1_S.
The system is initially in the state ρ^0_S, interacts with the environment via the global unitary U(·) = U · U†, then it is measured and re-prepared in the state ρ^0_S. This procedure is repeated a total of L times; see Fig. 1. The unitary operation is assumed to be the same at each iteration (corresponding to, for example, the situation of a time-independent system-environment Hamiltonian and temporally equidistant measurements); see Sec. 7.1 for more details.
In the following, we consider a measure-and-prepare operation of the form M_a(·) = tr[E^a_S (·)] ρ^0_S, with the generalization to arbitrary operations being straightforward. The probability of a sequence of outcomes can then be written as p(a|d_E) = tr[((M_{a_L} ⊗ id_E) ∘ U ∘ ⋯ ∘ (M_{a_1} ⊗ id_E) ∘ U)(ρ^0_S ⊗ ρ^0_E)], where p(a|d_E) denotes that it is obtained with an environment of dimension d_E. Our task is to establish upper bounds on such probabilities given a finite amount/dimension of "memory" d_E. The assumption of identical measurements is essential for capturing this notion of memory, as arbitrary time-dependent operations would impose no non-trivial restriction on the probabilities.
For simplicity, we consider the maximization of the probability of a given sequence a, namely, to find ω^a_{d_E} such that p(a|d_E) ≤ ω^a_{d_E} for all possible correlations generated by an environment of dimension d_E. The generalization to arbitrary linear functions of the distributions (p(a|d_E))_a is straightforward.
We formulate the problem via the Choi-Jamiołkowski (CJ) isomorphism [19,37], which involves taking multiple copies of the original Hilbert space. We note that the decomposition of sequential maps in Eq. (2) is equivalent to the standard formulation of temporal scenarios in the quantum comb [17], process matrix [18,52], and process tensor [54] formalisms.
Witnesses for open system dynamics

Given a probe system S, the first question we may ask is whether or not it is open, i.e., if it is interacting with an environment at all, or equivalently, whether d_E > 1. To answer this question, we compute ω^a_1 := max_{ρ^0_S, (E^a_S)_a} p(a|d_E = 1), giving the inequality p(a|d_E = 1) ≤ ω^a_1. As an example, the maximum for the sequence a = 00101 is ω^a_1 = (3/5)^3 (2/5)^2 = 0.03456, which we explain how to obtain shortly. If we perform an experiment and observe p(a = 00101) = 0.5, then we must conclude the system is open. In other words, the inequality Eq. (4) acts as a witness for open systems. However, if no violation is observed, the experiment is inconclusive.
To compute this maximum, we first note that, for our choice of M_a, since both ρ^0_S and E^a_S are the same at each measurement, all outcomes are independent and identically distributed. Writing q_a = tr[ρ^0_S E^a_S] as the probability of outcome a, with Σ_a q_a = 1, and using n_a as the number of occurrences of a symbol a in a, we have p(a|d_E = 1) = Π_a q_a^{n_a}. (5) The global maximum ω^a_1 can be found analytically with standard techniques for optimizing {q_a}_a (see App. C), and is given by ω^a_1 = Π_a (n_a / L)^{n_a}, (6) with the convention q_a^{n_a} = 1 for q_a = n_a = 0. The maximum is independent of d_S and is obtained with E^a_S = (n_a / L) 1_S and any ρ^0_S. Similar principles apply when developing witnesses for the case of d_E > 1, although the calculation of the maximum becomes non-trivial, as the probability no longer factorizes as in Eq. (5). Concretely, we want to obtain the tightest bounds of the form p(a|d_E) ≤ ω^a_{d_E}, (7) which hold for all possible realizations of the sequence a in an experiment. If a violation of Eq. (7) is observed, we can certify that the dimension is greater than d_E.
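The closed-box maximum of Eq. (6) is simple enough to check numerically. The following sketch (our own helper, not part of the paper's implementation) computes ω^a_1 for any binary sequence:

```python
from collections import Counter

def omega_1(seq: str) -> float:
    """Maximum probability of an outcome sequence for d_E = 1:
    outcomes are i.i.d., and the optimum sets q_a = n_a / L  (Eq. (6))."""
    L = len(seq)
    prob = 1.0
    for n_a in Counter(seq).values():
        prob *= (n_a / L) ** n_a
    return prob

print(omega_1("00101"))  # (3/5)^3 * (2/5)^2 = 0.03456
print(omega_1("000"))    # 1.0: constant sequences are trivially deterministic
```

For a = 001 this gives 4/27 ≈ 0.148, matching the value quoted alongside Table 1.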
In the simple case of d_E = 1, we were able to optimize over ρ^0_S and (E^a_S)_a. In the general case, however, optimizing over all possible protocols (i.e., all possible ρ^0_S, (E^a_S)_a and U) is difficult. Nevertheless, some general considerations can be made about how the bounds depend on these parameters; see Sec. 7.3. In the following, we assume fixed and well-characterized preparations and measurements, in agreement with our initial assumptions on the probe system. The optimization we need to perform is, thus, ω^a_{d_E} = max_U p(a|d_E), (8) where assumptions about the initial state of the environment can be removed by convexity and symmetry arguments; see Sec. 7.2. In the following section, we explain how to compute the non-trivial maxima ω^a_{d_E} via convex optimization methods.

A hierarchy of semidefinite programs bounding temporal correlations
The maxima in Eq. (7) require that the probe state ρ^0_S and measurements (E^a_S)_a are characterized and fixed throughout the experiment, as discussed in the previous section. To obtain upper bounds for the maximum ω^a_{d_E}, we formulate the problem as a hierarchy of semidefinite programs (SDPs) [5]. This class of optimization problems is ubiquitous in quantum information theory, offering strict convergence of solutions for many classes of problems, as well as having efficient algorithms widely available for solving them [5,70]. For a recent compendium on SDPs in quantum information science, see [59,64]. In the following, we explain the general steps taken in formulating our problem as an SDP, which can be solved numerically and, importantly, efficiently. A more detailed step-by-step formulation is included in App. A.
Under the CJ representation, the repeated applications of the unitary and measurements can each be written as single operators existing in a larger space, encompassing multiple time steps, with their input and output spaces interleaved. This is illustrated in Fig. 2. Via Eq. (3), we define the CJ operator C_U of the unitary channel U in Eq. (9). With this, we can write the L repeated applications of U as C_U^{⊗L}, and the measure-and-prepare operations as a single operator X, where the final measurement M̃_a enters without a re-preparation, obtained by partially tracing M_a over its output space. As the goal is to obtain a maximum, the initial environment state ρ^0_E can be chosen to be pure by convexity arguments, fixed to be |0⟩⟨0|_E by unitary invariance, and considered as part of the experimental setup X without loss of generality; see Sec. 7.2. We can then express the probability of a as in Eq. (11). Note that by Eq. (3) and Eq. (9), C_U is positive, rank-1, and normalized, which makes C_U^{⊗L} a pure symmetric separable quantum state. In addition, C_U satisfies a linear partial trace constraint, which arises from the trace-preserving property of each individual unitary. With all of these observations, we reformulate Eq. (8) as the equivalent optimization problem in Eq. (12). As C_U enters the objective function as a tensor power, and we further require it to be rank-1, the problem is both non-linear and non-convex with respect to C_U, and thus it cannot be directly solved as an SDP. To reach an SDP relaxation of Eq. (12), we first transform it into a chain of equivalent problems. The problem can be made convex by replacing C_U^{⊗L} with a separable state on the symmetric subspace, Φ_{A_1^L} = Σ_i p_i φ_i^{⊗L} with p_i ≥ 0, Σ_i p_i = 1. (13) By convexity, the maximum will be achieved for a rank-1 Φ_{A_1^L}, therefore this relaxation leaves the optimal value unchanged. Thus, we replace C_U^{⊗L} in Eq. (12) with Φ_{A_1^L}, subject to Φ_{A_1^L} ∈ SEP_L, the projection condition of Eq. (14), and the partial trace constraint of Eq. (12), where SEP_L is the set of fully separable L-partite states, and P^+_L is the projector onto the symmetric subspace of L systems (see the definition in Eq. (36) in App. B.1). In fact, the constraints in Eq. (14) are exact, as it can be shown that all separable states in the symmetric subspace are of the form in Eq. (13); see App. A.1.1 for a proof.
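For L = 2, the projector onto the symmetric subspace takes the simple form P^+_2 = (1 + SWAP)/2. The following self-contained sketch (our own illustration, using plain nested lists in place of a linear-algebra library) constructs it and checks its defining properties:

```python
from itertools import product

def sym_projector(d: int):
    """Projector onto the symmetric subspace of two d-dimensional systems,
    P+_2 = (identity + SWAP) / 2, as a dense d^2 x d^2 matrix."""
    dim = d * d
    P = [[0.0] * dim for _ in range(dim)]
    for i, j in product(range(d), repeat=2):
        row = i * d + j      # basis index of |ij>
        swp = j * d + i      # basis index of |ji>
        P[row][row] += 0.5   # identity part
        P[row][swp] += 0.5   # SWAP part
    return P

def matmul(A, B):
    n, m = len(A), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(m)] for i in range(n)]

P = sym_projector(2)
PP = matmul(P, P)
# A projector is idempotent, and its trace equals the subspace dimension,
# here C(d + 1, 2) = 3 for two qubits.
print(all(abs(PP[i][j] - P[i][j]) < 1e-12 for i in range(4) for j in range(4)))
print(sum(P[i][i] for i in range(4)))  # 3.0
```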
So far, we transformed the original problem into an equivalent one, but due to the condition Φ_{A_1^L} ∈ SEP_L it is not yet an SDP. This requirement can be obtained by relaxing the original problem via the quantum de Finetti theorem [15] (see also [20]), which tells us that Φ_{A_1^L} can be approximated as the reduced state of a larger symmetric (and potentially entangled) state Φ_{A_1^N}, i.e., Φ_{A_1^L} ≈ tr_{A_{L+1}^N} Φ_{A_1^N}, where tr_{A_{L+1}^N} is the trace over systems L + 1, …, N.
While this relaxation a priori only provides an upper bound for the original problem, it establishes a hierarchy of approximate solutions, which are known to converge exactly to the separable set as N increases [25]. The de Finetti theorem, and analogous results for permutationally invariant (or exchangeable), rather than symmetric, operators have found broad applications in quantum information theory, from entanglement detection [24,25,49] to more general optimization problems, such as evaluating convex roofs of entanglement measures [66], constrained bilinear optimization [4], dimension-bounded quantum games [38], as well as rank-constrained optimization [73]. All the above mentioned results exploit the basic idea of using either the symmetric subspace or permutation invariance to relax non-linear constraints. In particular, Ref. [4] analyzed the optimization over pairs of quantum channels, basing their construction on permutation-invariant operators, and showing the partial trace constraint necessary to translate the usual procedure (e.g., in the entanglement detection approach) from states to channels. Moreover, Ref. [73] introduced the idea of a rank-constrained optimization based on the symmetric subspace, which is central to impose the rank constraint in Eq. (12). Our construction is inspired by these works and could be derived starting from some of these results. However, it is more straightforward to provide a direct construction in terms of the de Finetti theorem; see App. A.2.
Ultimately, we arrive at an SDP relaxation of the original problem, Eq. (15). Here, N ≥ L is the size of the symmetric state Φ_{A_1^N} used in the approximation of the separable L-partite state Φ_{A_1^L}. The last constraint in Eq. (15) is required to ensure that Φ_{A_1^N} represents a sequence of trace-preserving maps, and it can be enforced on a single step due to the symmetry constraint on Φ_{A_1^N}; see App. A.1.2. Optionally, one can also add further linear constraints to the SDP to improve its approximation to the separable set, e.g., entanglement witnesses [33,65] or the positive partial transpose (PPT) criterion [53], Φ^{T_α} ≥ 0, where T_α denotes the partial transpose with respect to a bipartition α. These conditions are not necessary for convergence, but they may provide better results [49] at an extra computational cost.
Once approximate solutions ω^{a,N}_{d_E} are obtained from Eq. (15), they can be used to establish a convergent hierarchy of upper bounds on the temporal correlations for each sequence, i.e., p(a|d_E) ≤ ω^a_{d_E} ≤ ω^{a,N+1}_{d_E} ≤ ω^{a,N}_{d_E}. (16) Consequently, the above formulates a sequence of SDPs capable of approximating the maxima ω^a_{d_E} to arbitrary precision. Numerical results obtained from any of these SDPs (i.e., for any N ≥ L) can be used to construct dimension witnesses, which, when violated, certify the minimum dimension of the effective environment interacting with the system. A software implementation of the SDP in Eq. (15), however, is not straightforward even for small values of {d_S, d_E, L, N}, as the memory requirements quickly render the problem computationally intractable. Obtaining the numerical results presented in the next section thus required a significant amount of optimization; see Sec. 6 for details. A schematic outline of all steps undertaken to formulate the SDP and obtain the numerical results is presented in Fig. 3.
The asymptotic convergence of the hierarchy of bounds in Eq. (16) for the SDP in Eq. (15), without PPT constraints, follows from the error bounds of the quantum de Finetti theorem [16]. If PPT constraints are included, we only have partial analytical results for the asymptotic error. Numerical evidence and previous results involving PPT constraints [49] lead us to conjecture an asymptotic scaling governed by a function f(L, d_ES) of L and d_ES. We leave the analysis of these asymptotic error bounds to a future investigation.

Numerical results
In this section, we discuss the numerical results we obtained for ω^{a,N}_{d_E}. The SDPs were run with CVXPY [2,23] using the solver SCS [50,51], on a compute server with an Intel Xeon Gold 5218 24-core processor at 2.294 GHz, and with 128 GB of RAM. For d_E = 1, optimization time was on the order of seconds, whereas d_E = 2 scenarios required from 4 up to 14 hours to complete.
For the explicit computation, we chose the measurement protocol specified in Eq. (19). For this particular choice, the bounds ω^a_{d_E} are the same as for the isolated-system case studied in [67], where explicit time evolutions were found through gradient-descent techniques.
The analytical maxima for d_E = 1 described in Sec. 3 served as a useful test for the soundness of our approach. We ran the SDP for all binary sequences (up to 0 ↔ 1 relabeling symmetry), for L = 2, 3, 4 and N = L, L + 1. Every case was tested with and without the additional PPT constraints for comparison. Results for L = 2, 3 are shown in Table 1, in which it can be observed that, for d_E = 1, either the PPT constraints or a symmetric extension of a single extra system appears to be sufficient for achieving the exact analytical maximum.

Table 1: [SDP results for all binary sequences with L = 2, 3.] Note that 4/27 ≈ 0.148. Either PPT constraints or an extension by one extra system (i.e., N = L + 1) was sufficient to achieve the analytical maximum in this case. Similar results were obtained for L = 4, omitted here for conciseness.
We have also solved the SDP for the case d_S = d_E = 2, for various sequences without a trivial maximum probability (i.e., ω^a_{d_E} < 1), but only without the additional PPT constraint, as the requirements needed for its addition would have exceeded the available memory. Additionally, since the involved SDPs are only numerically tractable for N ≈ L, and no convergence was observed for the values we managed to compute, we can only claim to have obtained upper bounds for the maximum ω^a_{d_E}, as in Eq. (16). As our choice of probe state and measurements (Eq. (19)) for d_S = d_E = 2 reproduces the quantum scenarios previously investigated in [67], as we show in Sec. 7.3, we can compare our upper bounds against the largest known achievable values, obtained through explicit time evolutions found via gradient-descent techniques. This allows us to estimate the range of values containing the maxima ω^a_2. These comparisons are shown in Table 2, where, for instance, we see 0.437341 ≤ ω^{001}_2 ≤ 0.521219.

Table 2: Upper bounds (SDP) and largest known achievable values (GD) for various L, N, and a. The achievable values were obtained via gradient descent [67] and act as lower bounds on the maximum, but are not suitable as witnesses. All results are in agreement, and for a = 001, we observe convergence of the bounds as N increases.

As a concrete example of a violation of this bound, for d_S = 2 and d_E = 3, one can construct an explicit unitary in H_ES for which, with the ρ^0_E, ρ^0_S and E^a_S we have chosen, p(001|d_E = 3) = 1 > ω^{001}_2, implying that the bound found for d_E = 2 is not only non-trivial, but actually witnesses environment dimension, since it can indeed be violated by means of a larger environment (d_E = 3 in this case). In fact, this example certifies that ω^{001}_3 = 1; see also Sec. 7.3.

Implementation
This section outlines the concrete steps we have taken to relax the original problem to a hierarchy of SDPs and to make these SDPs numerically tractable; see App. B for more details. As noted previously, a straightforward implementation is not computationally tractable, even for short sequences, due to the large number of variables and constraints involved.
We have addressed these numerical challenges by exploiting several properties of the problem. Firstly, the symmetry constraint imposed on Φ_{A_1^N} can be satisfied automatically by expressing Φ_{A_1^N} in terms of a basis for the symmetric subspace in the numerical implementation of the SDP. While this provides a significant reduction in the total number of involved variables and constraints, it is by itself insufficient to make the problem tractable.
However, thanks to our specific choice of initial states and measurements, the sparsity of the objective function could be exploited to further reduce the number of variables and constraints, by eliminating all of those that do not affect the objective, either directly or indirectly. This relaxation generally results in a sparse outer approximation of the original problem, i.e., in a value Ω^{a,N}_{d_E} ≥ ω^{a,N}_{d_E}. In our particular case, as we show in App. B.2, this elimination procedure is exact, thus yielding an optimal value Ω^{a,N}_{d_E} = ω^{a,N}_{d_E}.

Choice of parameters
Before a concrete implementation, one must first choose the initial environment state ρ^0_E, as well as fixing the probe state and measurements. As the objective function is convex in ρ^0_E, and the optimization is over all unitaries, we may, without loss of generality, fix a pure initial state ρ^0_E = |0⟩⟨0|_E and eliminate any assumptions on the environment state; see Sec. 7.2. On the other hand, as assumptions on d_S, ρ^0_S and (E^a_S)_a are experiment-dependent, one must search for "proper" probe states and measurements on a case-by-case basis, but optimal choices, leading to a larger bound, can also be addressed in general terms; see Sec. 7.3. For the choice of sequence, in particular, if it is too simple relative to d_E, the maximum attainable probability may be trivial (i.e., equal to one), such that no optimization is required and no witness can be constructed. Therefore, it is important to choose sequences which have a non-trivial maximum for a given d_E. For closed systems, this problem has been solved via the notion of "deterministic complexity" [67], which defines the minimum requirements for a sequence to be able to occur deterministically. In the open-system case, the conditions for determinism involve not only the available memory, as was the case for the isolated system, but also the dimension of the system. We elaborate on these conditions in Sec. 7.3.

Symmetric representation
Effectively solving any of the SDPs in the hierarchy requires a significant amount of optimization, as a naive implementation quickly becomes computationally intractable. As a rough example, without any simplifications, Φ_{A_1^N} is a square matrix of size (d_ES^2)^N, which in the simplest non-trivial scenario, d_E = d_S = 2 and N = L = 3, already results in a 4096 × 4096 matrix, with over 16 million complex scalar variables. The partial trace constraint alone involves over 1 million linear equations between these variables, making the SDP numerically intractable in this naive formulation.
In practice, several simplifications can be made; see App. B for full details. Φ_{A_1^N} and X ⊗ 1_{A_{L+1}^N} may be written directly in terms of an operator basis for the symmetric subspace, thus automatically satisfying the symmetry constraint, which significantly reduces the number of variables. Defining an isometry between the symmetric subspace of A_1^N, denoted by S_N (with dimension dim S_N), and a canonical basis for S_N, we may cast all involved objects directly as dim S_N × dim S_N matrices ϕ and x, indexed by the "types" t used in the canonical representation of symmetric states |sym(t)⟩_{A_1^N}; see [34]. Under this representation, the objective function becomes tr[xϕ], and the constraints may be formulated in terms of the entries ϕ_{t,t′} directly, without resorting to the full space A_1^N. For d_ES = 4 and N = 3, the matrix ϕ is of size 816 × 816, with around 600 thousand variables: a reduction to 4% of the original 16 million. With further work, the partial trace constraints can also be written efficiently in this representation.
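The size reduction quoted above follows from counting types: the symmetric subspace of N copies of a d-dimensional space has dimension C(d + N − 1, N). A quick check (the helper name is ours):

```python
from math import comb

def sym_dim(d: int, n: int) -> int:
    """Dimension of the symmetric subspace of n copies of a d-dimensional
    system: the number of 'types', C(d + n - 1, n)."""
    return comb(d + n - 1, n)

d_ES = 4                      # d_E * d_S for a qubit system and qubit environment
full = (d_ES ** 2) ** 3       # naive side length of Phi for N = 3: 4096
sym = sym_dim(d_ES ** 2, 3)   # side length in the symmetric representation: 816
print(full, sym)              # 4096 816
print(sym ** 2 / full ** 2)   # fraction of matrix entries kept, about 0.04
```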

Sparse implementation
While these are significant improvements, they still prove to be insufficient, as the dense representation of the symmetric problem involves too many variables and constraints to be solvable. As a concrete example, for the simplest case of a = 001, d_E = 2 and L = N = 3, the SDP solver SCS attempts to allocate a dense array of over 3 TB in size.
To address this, we have developed an algorithm that exploits any sparsity of the objective function and, whenever possible, constructs a sparse relaxation of the original problem. This is achieved by iteratively selecting which variables of ϕ and which of its constraints are strictly necessary to solve the problem, due to their direct or indirect influence on the objective function. While generally a relaxation, due to the explicit structure of our problem this sparse implementation is exact, yielding ω^{a,N}_{d_E}. A detailed explanation of our algorithm is available in App. B.2.
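This variable-selection step can be pictured as a reachability search on a bipartite graph of variables and constraints. The sketch below is a generic illustration of the idea (names and data layout are ours, not the implementation of App. B.2): starting from the variables in the objective, it repeatedly pulls in every constraint touching a kept variable, together with all variables that constraint couples to.

```python
from collections import deque

def reachable_variables(objective_vars, constraints):
    """Keep only variables that influence the objective directly or through
    shared constraints. `constraints` is a list of sets of variable names."""
    keep = set(objective_vars)
    queue = deque(objective_vars)
    used = [False] * len(constraints)
    active = []                      # constraints that must be retained
    while queue:
        v = queue.popleft()
        for idx, cons in enumerate(constraints):
            if not used[idx] and v in cons:
                used[idx] = True
                active.append(cons)
                for w in cons - keep:
                    keep.add(w)
                    queue.append(w)
    return keep, active

# Toy instance: x0 appears in the objective; x1 is coupled to x0 by a
# constraint, x2 to x1; x3 is fully decoupled and can be dropped.
keep, active = reachable_variables({"x0"}, [{"x0", "x1"}, {"x1", "x2"}, {"x3"}])
print(sorted(keep))  # ['x0', 'x1', 'x2']
print(len(active))   # 2
```

Dropping a variable this way is only safe when, as in our problem, the discarded constraints cannot affect the optimum, which is exactly what App. B.2 establishes.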
For the state and measurements considered, our technique was capable of reducing the number of variables and constraints immensely, to less than 1% of the symmetric case (see Table 3).This allowed us to successfully compute upper bounds for various sequences up to N = 4 and d E = 2 (Table 2).
As can be observed in Table 3, the sparse problems can clearly be simplified further by eliminating redundant variables and constraints. We opted to leave this task to the numerical pre-solver, as it only took a few minutes of computing time. Solving the SDP for N = 3 was possible within minutes, but for N = 4, from 4 up to 14 hours were needed, depending on the sequence.

Repeated unitaries
In the following, we discuss in which cases the condition of repeated unitaries is physically justified. To see this, imagine that the evolution of the system is governed by a time-independent Hamiltonian H_SE. The corresponding unitaries will be of the form U(t) = e^{−iH_SE t}. This assumption of time-independence is always possible, as the environment may include any source of time dependence. Ideally, then, we would obtain the same unitary if we perform the measure-and-prepare operations equally spaced in time, say, by an interval Δt. One should take into account, however, some uncertainty in the time measurement. Let us then assume that our choice of time for performing a measurement is distributed according to a distribution q(t), centered around Δt. This means that instead of transforming our system according to the unitary U(Δt), we transform it, on average, according to the mapping Λ := ∫ U(t) q(t) dt, seemingly breaking our assumption of repeated unitaries. Since Λ is a valid completely positive trace-preserving (CPTP) map, it can be dilated by means of a larger environment into a unitary U′, namely Λ(ρ_SE) = tr_{E_1}[U′(ρ_SE ⊗ |0⟩⟨0|_{E_1})]. To complete the argument that this situation can still be described by repeated unitaries, it is enough to show that the same can be done for multiple copies of it, namely, that the repeated operation Λ^L can be dilated to some unitary U_L. To do so, it is enough to provide at each time step i a new environment E_i prepared in the correct initial state |0⟩⟨0|_{E_i}. Let us compute explicitly the case of two time steps; the general case is straightforward. We define W_{E_1E_2} as the swap operator between systems E_1 and E_2, and we set U := W_{E_1E_2} ∘ U′_{SEE_1}. Applying U twice to ρ_SE ⊗ |0⟩⟨0|_{E_1} ⊗ |0⟩⟨0|_{E_2} then reproduces Λ² on SE, with σ_{SEE_1} := U′_{SEE_1}(ρ_SE ⊗ |0⟩⟨0|_{E_1}) the state after the first step. The effect of the final W_{E_1E_2} operation is to swap the spaces E_1 and E_2, which are irrelevant after the operations on SE have been performed. Evidently, this argument can straightforwardly be extended to more time steps. In summary, this means that the condition of "the same unitary" is satisfied as long as the time choice is always drawn from the same distribution. In other words, it is sufficient to always "probabilistically repeat" the same operation.
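As a small numerical illustration of the first step of this argument (our own toy example, not from the paper), one can check that an average of unitaries is trace-preserving, and hence a valid CPTP candidate for the dilation above. Here the "time average" is a discrete mixture of two single-qubit unitaries:

```python
# Verify trace preservation of Lambda(rho) = sum_i q_i U_i rho U_i^dagger
# for a 50/50 mixture of the identity and the Pauli-X unitary.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]                   # two unitaries drawn at different times
rho = [[0.75, 0.1j], [-0.1j, 0.25]]    # an arbitrary qubit state, trace 1

terms = [matmul(matmul(U, rho), dagger(U)) for U in (I, X)]
out = [[0.5 * (terms[0][i][j] + terms[1][i][j]) for j in range(2)] for i in range(2)]

trace = out[0][0] + out[1][1]
print(trace.real)  # 1.0: the averaged map preserves the trace
```

The Stinespring dilation then turns this non-unitary average back into a unitary on a larger space, which is the content of the argument above.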
Finally, we remark that even if this procedure may use a very large environment, we are only interested in showing that we can always assume there is a unitary dynamics; the notion of an effective environment then takes care of estimating the smallest environment consistent with the observed statistics. Moreover, we also remark that time-independent operations are necessary, as unrestricted time-dependent operations can achieve arbitrarily long temporal correlations even with bounded memory size [46].

Effective environment and initial state
The environment of a quantum system, intended as all the physical systems surrounding and possibly interacting with it, is typically a very high-dimensional system, if not directly assumed to be infinite-dimensional. From this perspective, we want to make sense of the notion of an effective environment. Consider a global transformation of the system and environment. As a first approximation, we can say that the global unitary is of the form U_SE ⊗ U_E′, where U_SE is an entangling unitary between the system and the effective environment and U_E′ acts on the rest of the environment. At the same time, an evolution of the form U_SE ⊗ U_E′ is just an approximation of the full evolution of the environment, as we expect its state to thermalize after some time interval. Nevertheless, from a physical perspective this approximation is still valid if the time required for a single run of the experiment, i.e., the measurement of the temporal sequence, is much shorter than the time needed for the effective environment to thermalize. This is to be expected, as the environment is composed of many particles that may interact with each other with different strengths. Underlying this expectation is the assumption, known as Markovian embedding [10,71,72] and frequently employed in the description and modeling of open quantum system dynamics [43-45,63], that the environment can always be split into two parts: a far environment, leading to irretrievable, memoryless information loss, and an effective environment, which can transport memory. Due to the irretrievable information loss, the dynamics of the system and the effective environment is then described by a general (non-unitary) CPTP map, a situation which, as discussed in the previous section, can again be dilated to repeated unitaries.
Reversing this perspective, namely, looking at the problem of characterizing the effective environment from temporal correlation experiments, we may say that this characterization is inherently dependent on the typical time scales of a single experimental run. This difference in time scales and the eventual thermalization of the environment are, however, essential for repeating the experiment to collect statistics, as it is required that the initial state of the environment is always (probabilistically) described by the same state ρ^0_E in each experimental run. As previously mentioned, this is typically a thermal state, but it does not need to be characterized in our approach. In fact, since the SDP maximizes the temporal correlations, which are linear in the initial state of the environment, we know that the maximum is always achieved with a pure state. Up to local unitaries, we can thus always assume it to be ρ^0_E = |0⟩⟨0|_E. The bound calculated for this state is, then, valid for any possible initial state of the environment.
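The pure-state claim is just the variational characterization of the largest eigenvalue: for any Hermitian H, the maximum of tr(ρH) over density matrices is attained at the top eigenvector. A minimal numerical sketch, with a random Hermitian H standing in for the actual (linear) dependence of the correlations on the initial environment state:

```python
import numpy as np

rng = np.random.default_rng(7)

# Random Hermitian operator standing in for the linear dependence of the
# temporal correlations on the initial environment state (an illustrative
# stand-in, not the paper's actual objective).
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2

# max over density matrices of tr(rho H) = largest eigenvalue, at a pure state.
evals, evecs = np.linalg.eigh(H)
v = evecs[:, -1]
rho_pure = np.outer(v, v.conj())
assert abs(np.trace(rho_pure @ H).real - evals[-1]) < 1e-9

# No randomly drawn mixed state can do better.
best_mixed = 0.0
for _ in range(200):
    B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = B @ B.conj().T
    rho /= np.trace(rho).real
    best_mixed = max(best_mixed, np.trace(rho @ H).real)
assert best_mixed <= evals[-1] + 1e-9
```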
Finally, somewhat independently of the explicit experimental situation, we may see our setup as a question of simulation resources: what is the smallest dimension an environment coupling to a known system must have in order to reproduce the observed statistics in a unitary way? Seen in this way, the results we present attribute a "simulation hardness" (in the sense of required environment dimension) to each sequence, which is agnostic with respect to concrete time scales or experimental limitations, but rather inherent to the respective sequence.

Conditions for deterministic realizations
The maximum probability for a given sequence increases as more memory is available, since larger environments can always simulate the dynamics of smaller ones. Therefore, if the sequence a is sufficiently simple, its maximum may be trivial for a given d_E, i.e., ω^a_{d_E} = 1. The maxima ω^a_{d_E} depend on the sequence, as not all sequences are equally "difficult" to produce with a given amount of memory d_E. For example, it is easy to see that the sequence '000' can always be produced with unit probability, while the sequence '001' cannot be produced with unit probability if the environment is not of sufficient size [67]. This observation suggests a relevant notion of "complexity" of sequences, which offers a more fundamental relationship between a and d_E. Since our goal is to establish non-trivial maxima on the probabilities of sequences, the natural question to ask is: how large should d_E be such that a can occur deterministically?
Understanding the conditions under which this occurs allows us to pick only scenarios featuring non-trivial maxima. It is useful to recall the following.

Definition (Deterministic Complexity [67]). Let a be a sequence of measurement outcomes, produced by repeated identical measurements on an isolated d-dimensional system. The deterministic complexity of the sequence a, or DC(a), is the minimum d such that p(a|d) = 1 can occur.

DC(a) is a property of the sequence a and is independent of the model (quantum or classical). First, we establish a correspondence between the single-system and open-system scenarios. Let d_S = |A|, so that we may choose E^a_S = |a⟩⟨a|_S and ρ^0_S = |0⟩⟨0|_S. Then, the changes in the environment state due to the unitary and measurements can be written succinctly in terms of the completely positive maps

I_a(ρ_E) := tr_S[(1_E ⊗ E^a_S) U (ρ_E ⊗ ρ^0_S) U†].

Therefore, under the action of (I_a)_{a∈A}, we may interpret the environment as a single isolated "memory" system on which the maps I_a act sequentially, as in [67]. The converse mapping, from the sequential measurements on an isolated memory system to the open-system scenario, is the dilation map with the probe system acting as the ancilla and the environment as the memory. This construction is well known, but it is worth expanding it in detail in order to understand the conditions on the ancilla (i.e., the probe system) needed to implement it.
Let n(a) denote the number of unique symbols appearing in a, with n(a) ≤ |A|, and let us assume d_S ≥ n(a). Now, consider an instrument given by maps Ĩ_a of the form Ĩ_a(ρ) = K_a ρ K_a†, with a single Kraus operator per outcome; stacking the K_a into an isometry, this can always be completed into a unitary matrix U. Thus Ĩ_a can be written in the form of Eq. (26). As we have chosen d_S = |A|, ρ^0_S = |0⟩⟨0|_S and E^a_S = |a⟩⟨a|_S in our implementation of the SDP (Sec. 6), there is a direct correspondence between the two scenarios; see Fig. 4. Therefore, the upper bounds obtained by the SDP in Eq. (15) can be compared with the known achievable values from [67], as shown in Table 2 and discussed in Sec. 5.
This construction applies to deterministic models, as they have deterministic transitions between states, i.e., each is described by a single Kraus operator per outcome. We thus have:

Proposition 1. If both d_E ≥ DC(a) and d_S ≥ n(a), then there is a choice of probe state and measurements such that ω^a_{d_E} = 1.

Note that the condition d_S ≥ n(a) is strictly required for deterministic production of a. In fact, if p(a|d_E) = 1, every symbol must occur deterministically. Then, at each step the system must be in a state ρ^a_S such that tr[ρ^a_S E^a_S] = 1. Thus, the measurements E^a_S must be able to perfectly discriminate between the states {ρ^a_S}_a, which is possible only if d_S ≥ n(a). We have established:

Proposition 2. If d_S < n(a), then ω^a_{d_E} < 1 for any choice of probe state or measurements, and any environment dimension d_E.
In conclusion, these results tell us that non-trivial bounds appear for d_E < DC(a), that we can compare these bounds with the single-system scenario whenever d_S ≥ n(a), and that |A| = 2 and L = 3 is the smallest scenario displaying non-trivial memory effects.

Choice of sequence
As briefly noted in Section 7.3, in order to construct a witness for d_E > d, it is vital to choose a sequence a with sufficient deterministic complexity. If the chosen sequence is too short or too simple (i.e., if DC(a) ≤ d), then the bound ω^a_d is trivial and the dimension d_E cannot be tested with such a sequence.
However, due to the difficulty of computing these bounds, and the fact that longer sequences generally become less probable (and therefore less likely to violate the witness inequality), one should generally choose sequences that are as short as possible while having a deterministic complexity equal to the minimum environment dimension one wishes to witness.
As a concrete example, if we wish to witness d_E > 3, we should choose a sequence a with deterministic complexity DC(a) = 4, e.g., a = 0001, which is the shortest sequence satisfying this requirement. Then, a violation of the bound ω^{0001}_3 informs us that, in fact, d_E ≥ 4.
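Deterministic complexity can be brute-forced for short sequences. The sketch below assumes the eventually-periodic characterization suggested by the deterministic models above (an orbit of p transient states followed by a cycle of length q, with p + q ≤ d, reproduces a iff a_i = a_{i+q} beyond the preperiod); the function name is ours, and this characterization should be checked against [67] before relying on it:

```python
def deterministic_complexity(a):
    """Minimal d such that a deterministic machine on d states outputs a:
    an orbit of p transient states followed by a cycle of length q (p + q <= d)
    reproduces a iff a[i] == a[i + q] for all i beyond the preperiod p."""
    L = len(a)
    for d in range(1, L + 1):
        for q in range(1, d + 1):
            p = d - q
            if all(a[i] == a[i + q] for i in range(p, L - q)):
                return d
    return L

assert deterministic_complexity('000') == 1   # constant: a single state suffices
assert deterministic_complexity('001') == 3
assert deterministic_complexity('0001') == 4  # shortest sequence with DC = 4
```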

Conclusions and outlook
We presented a method to lower bound the dimension of the environment interacting with a probe system, based only on the statistics of measurements performed on the (partially characterized) system, and without any assumptions on the environment or the dynamics. This is achieved via a hierarchy of semidefinite programs that upper bound the temporal correlations achievable in various experimental scenarios, under the assumption of finite memory. Such bounds can be applied to the detection of the effective environment size in open-system dynamics, as well as to the certification of the minimum environment dimension compatible with observations.
To keep the discussion simple, we applied the optimization to a single sequence. It is straightforward to adapt the objective function to arbitrary linear functions of the full probability distribution (p(a|d_E))_a, such as those appearing in the temporal inequalities derived in, e.g., [12,35,46,61]. This may, in principle, lead to better witnesses. We leave the numerical exploration of this problem to a future investigation.
We assumed a joint unitary evolution between system and environment, which leads to a CJ representation of these maps in terms of symmetric states. As explained in Sec. 7.1, this assumption can usually be justified on physical grounds. Nevertheless, a natural question to ask is how our results change if we consider arbitrary CPTP maps. In that case, the SDP should be modified by replacing the symmetry constraint with permutation invariance, i.e., P_σ Φ_{A_1^N} P_σ† = Φ_{A_1^N} for all permutations σ ∈ S_N; see Eq. (35). We expect that bounds for CPTP maps may be larger than those for unitary channels of the same dimension, especially for objective functions involving more than a single sequence, but numerical optimization is significantly more costly in this case, as the permutation-invariant operator basis is considerably larger than in the symmetric case [69, Ch. 7]. While it is also possible to write such CPTP maps in terms of a dilation of the environment, this approach would likely result in more extraneous variables in the SDP, i.e., the terms in the subspace orthogonal to the dilation ancilla's initial state, possibly rendering the optimization intractable.
Additionally, the SDP in Eq. (15) and its related implementation techniques are, in fact, quite general, and can be applied to a wide range of scenarios beyond what we have considered here. As the operator X from Eq. (10) is fixed, any choice of intermediate operations between the unitaries is possible: e.g., the initial system and environment states could be correlated, and the intermediate maps M_a could each be replaced by arbitrary joint operations, even time-dependent ones. Such approaches could then, for example, be used to bound observables for specific types of processes, e.g., quantum processes with only classical memory [30], by assuming additional entanglement-breaking channels on the environment. Therefore, provided the problem remains numerically tractable, our techniques are independent of the explicit choice of measurements.
While the high dimensionality of the current formulation of the SDP quickly renders general numerical implementations intractable, our approach nevertheless offers new avenues for bounding temporal correlations and relating them to open-system dynamics. Ultimately, the techniques developed herein should be taken as a proof of concept for future developments and improvements. It remains to be seen whether more efficient numerical techniques, or alternative outer approximations, are better suited to such problems; we leave further investigation of these avenues to future work.
Nevertheless, the success of our approach highlights the wealth of information contained in temporal correlations, and the potential of new techniques for characterizing large complex systems by means of a small probe alone, by exploiting non-trivial properties of the temporal correlations achievable by systems of bounded size.

A Formulation of the SDP

Let ℓ = 1, . . . , L enumerate the time steps, and let I_ℓ and O_ℓ be the input and output spaces of the ℓ-th unitary evolution U, respectively. For convenience, we write A_ℓ := I_ℓ ⊗ O_ℓ for the joint space at step ℓ, and A_a^b := A_a ⊗ · · · ⊗ A_b for the sequential spaces from a to b. We assume that all spaces are isomorphic, i.e., I_ℓ ≅ O_ℓ ≅ H_ES for all ℓ, but we preserve labels for clarity. We can thus write the respective maps in the Choi-Jamiołkowski representation (see Eq. (3)), where C(Λ) is the Choi matrix of the map Λ and C_U is normalized such that it corresponds to a quantum state, which is useful during the implementation of the symmetric representation of the problem (App. B.1). As before, we highlight that the local input (I) and output (O) spaces of these maps are interleaved: for the ℓ-th U, I = I_ℓ and O = O_ℓ, but for M_{a_ℓ} we have I = O_ℓ and O = I_{ℓ+1}. Since the final output state is discarded (i.e., we are only concerned with the probability of outcome sequences), the final O is traced out, yielding M̃_a in the equation above. This structure of the spaces is illustrated in Fig. 6. We may now specify the probability of the sequence a as

p(a|d_E) = tr[(C_U)^{⊗L} X^T],    (29)

where the correcting normalization factor (d_ES)^L was incorporated into X (to make up for the normalization of C_U), and X^T denotes the transpose with respect to the basis chosen for the isomorphism. In order to obtain an upper bound for p(a|d_E), our goal is to optimize Eq. (29) over all possible unitaries, in terms of C_U. As per the above definitions, it follows that C_U must satisfy the constraints:
• C_U ≥ 0 and rank C_U = 1, as it represents a unitary channel,

• tr_O[C_U] = 1_I/d_ES, as it is a trace-preserving map.
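These constraints are easy to verify numerically for a concrete unitary channel. Assuming the common convention C_U = |Φ_U⟩⟨Φ_U| with |Φ_U⟩ = (1 ⊗ U)|Ω⟩/√d and |Ω⟩ = Σ_i |ii⟩ (the paper's exact basis conventions may differ), a sketch:

```python
import numpy as np

d = 3
rng = np.random.default_rng(0)

# A random d x d unitary from the QR decomposition of a Gaussian matrix.
Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(Z)

# Choi state of the unitary channel: |Phi_U> = (1 ⊗ U)|Omega>/sqrt(d),
# with |Omega> = sum_i |ii> the unnormalized maximally entangled vector.
omega = np.zeros(d * d, dtype=complex)
for i in range(d):
    omega[i * d + i] = 1.0
phi = (np.kron(np.eye(d), U) @ omega) / np.sqrt(d)
C_U = np.outer(phi, phi.conj())

# Constraint 1: rank-1 (and positive semidefinite) by construction.
assert np.linalg.matrix_rank(C_U) == 1
# Constraint 2: trace preservation reads tr_O[C_U] = 1_I / d.
C4 = C_U.reshape(d, d, d, d)       # indices (I, O, I', O')
tr_O = np.einsum('iaja->ij', C4)   # trace over the output factor
assert np.allclose(tr_O, np.eye(d) / d)
assert abs(np.trace(C_U).real - 1) < 1e-9
```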
We may thus define our optimization problem as:

Optimization Problem 1. The initial formulation of the problem.
This may be immediately relaxed to a convex form, without affecting the maximum of the objective function, as:

Optimization Problem 2. The convex relaxation of the original problem.
However, this optimization problem is non-linear, not only due to the tensor product in |ϕ_i⟩⟨ϕ_i|^{⊗L}, but also due to the rank-1 nature of |ϕ_i⟩⟨ϕ_i|. Let us now define our target set T as

T := conv{ |ϕ⟩⟨ϕ|^{⊗L} : tr_O[|ϕ⟩⟨ϕ|] = 1_I/d_ES }.

Elements of this set will not generally be rank-1, but the optimal solutions of Problems 1 and 2 belong to this set. Our goal now is to approach T by means of further relaxations, which is achieved by exploiting the symmetry of Φ_{A_1^L}.

A.1 The unitary channel constraints
The symmetric structure of Φ_{A_1^L} requires all subspaces to be in the same local state. We can relax this by considering instead the separable set on L parties,

SEP_L := conv{ Φ_1 ⊗ · · · ⊗ Φ_L : Φ_ℓ ∈ B(H_{A_ℓ}), Φ_ℓ ≥ 0 },

such that all local spaces become independent. Here, conv denotes the convex hull of the set, and B the set of bounded operators. We must now find ways to restore the rank-1 and permutation-invariance constraints for the optima over this set. For now, we highlight that T ⊂ SEP_L, but simply switching from Φ_{A_1^L} ∈ T to Φ_{A_1^L} ∈ SEP_L does change our problem, so we must impose further constraints on the set SEP_L to restore the original problem in T. The first step is to restore the symmetry and rank-1 properties of the optimum of Φ_{A_1^L}. After this, we may restore the partial trace constraint.
Let σ ∈ S_n be a permutation from the set S_n of all permutations on n symbols. We define, respectively, the permutation operator and the projector onto the symmetric subspace as

P_σ (|ψ_1⟩ ⊗ · · · ⊗ |ψ_n⟩) := |ψ_{σ^{-1}(1)}⟩ ⊗ · · · ⊗ |ψ_{σ^{-1}(n)}⟩,    P_n^+ := (1/n!) Σ_{σ∈S_n} P_σ.

The symmetric subspace on n copies of H is defined as Sym^n(H) := { |ψ⟩ ∈ H^{⊗n} | P_n^+ |ψ⟩ = |ψ⟩ }, and the space of symmetric operators is given by the set Sym^n(B(H)) := { A ∈ B(H^{⊗n}) | P_n^+ A = A }; see [34]. Note that, as ρ is Hermitian, the symmetry condition can be equivalently written either as P_n^+ ρ = ρ or P_n^+ ρ P_n^+ = ρ: if P_n^+ ρ = ρ, Hermiticity gives ρ P_n^+ = ρ as well. More details about the symmetric subspace can be found in App. B.1.
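For small n and d, the projector P_n^+ can be assembled directly by averaging permutation matrices; its trace recovers dim Sym^n(H) = C(n+d-1, n). A sketch:

```python
import numpy as np
from itertools import permutations

def permutation_operator(sigma, d):
    """Matrix permuting the n tensor factors of (C^d)^{⊗n} according to sigma."""
    n = len(sigma)
    dim = d ** n
    P = np.zeros((dim, dim))
    for idx in range(dim):
        digits = [(idx // d**k) % d for k in reversed(range(n))]  # |i1 ... in>
        jdx = 0
        for pos in range(n):
            jdx = jdx * d + digits[sigma[pos]]
        P[jdx, idx] = 1.0
    return P

def symmetric_projector(n, d):
    """P_n^+ = (1/n!) sum_sigma P_sigma."""
    perms = list(permutations(range(n)))
    return sum(permutation_operator(s, d) for s in perms) / len(perms)

P = symmetric_projector(3, 2)
assert np.allclose(P @ P, P)             # idempotent: a projector
assert round(np.trace(P)) == 4           # dim Sym^3(C^2) = C(2+3-1, 3) = 4
assert np.allclose(P @ np.ones(8), np.ones(8))  # fully symmetric vector is fixed
```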

By utilizing Φ_{A_1^L} as our variable as in Eq. (34), we have lost both symmetry and rank-1 guarantees at the optimum. However, both can be restored by ensuring Φ_{A_1^L} is in the symmetric subset of SEP_L; see, e.g., [39]. For completeness, we provide in the following an elementary proof that every ρ ∈ Sym^n(B(H)) ∩ SEP_n admits a decomposition into identical pure product states,

ρ = Σ_i p_i |ϕ_i⟩⟨ϕ_i|^{⊗n}.    (37)

Since ρ is separable, we can write it as a convex mixture of pure product states, i.e.,

ρ = Σ_i p_i |Φ_i⟩⟨Φ_i|,    (38)

with |Φ_i⟩ an n-party pure product state, i.e., |Φ_i⟩ = ⊗_{j=1}^n |ϕ_{i,j}⟩, with the {|ϕ_{i,j}⟩}_j not necessarily equal for a given i. However, ρ ∈ Sym^n(B(H)) implies ran(ρ) ⊆ Sym^n(H), and if we prove that ran(|Φ_i⟩⟨Φ_i|) ⊆ Sym^n(H) for all i, then we find that the {|ϕ_{i,j}⟩}_j must be identical for each i, as otherwise |Φ_i⟩ fails to be symmetric. To show this, we note that, by Eq. (38) and positivity of ρ and |Φ_i⟩⟨Φ_i|, we have Ker(ρ) = ∩_i Ker(|Φ_i⟩⟨Φ_i|). By Hermiticity, it follows that ran(ρ) = (Ker(ρ))^⊥ = span(∪_i ran(|Φ_i⟩⟨Φ_i|)), which concludes the proof.
Importantly, Eq. (37) restores the symmetry and rank-1 properties for optimum solutions of Prob. 2 when Φ_{A_1^L} ∈ Sym^L(B(H)) ∩ SEP_L. In light of this, in the following we use SymSEP_n := Sym^n(B(H)) ∩ SEP_n.

A.1.2 The trace-preserving constraint
We now restore the trace-preserving constraint tr_O[|ϕ_i⟩⟨ϕ_i|] = 1_I/d_ES for states expressed as in Eq. (37). This is established by proving that it is sufficient to satisfy this condition on a single party of the convex mixture.

Proposition 3. For any Φ_{A_1^L} ∈ SymSEP_L, it is sufficient to impose the partial trace constraint on a single party of the convex mixture.

Proof. We provide a generalization of the arguments in [73, App. A]. From Eq. (37), we know that Φ_{A_1^L} admits a decomposition into identical pure product states. We introduce a one-party auxiliary map E measuring the deviation of the first party from the partial trace constraint; due to symmetry, the same holds if the map is applied to the second party instead. Defining Ẽ as its restriction to Hermitian operators, and as E acts on Hermitian operators and is Hermiticity-preserving, we have E = Ẽ; the explicit distinction is only made for added clarity in the following proof. Applying the map to the convex mixture in Eq. (40), we obtain a convex combination of the deviations E_i of the individual terms. We need E_i = 0 for all i in order to ensure that Φ_{A_1^L} obeys the partial trace constraint globally in the convex mixture. To prove that this is the case, we pair the combination against an arbitrary operator G; the pairing must vanish for any G. Since the E_i enter with strictly positive weights p_i, this can only be true for all G if all E_i = 0. Furthermore, by the symmetry of each term in the mixture, it is then sufficient to ensure the constraint is satisfied on a single party. From Eq. (37), the converse statement follows trivially.

Thus, any state in SymSEP_L satisfying the single-party partial trace constraint is part of the target set T, and vice versa.

A.1.3 The final rank-constrained problem over SEP
Given all of the previous results, we have now established a relation between the sets T and SEP_L, which allows us to remove the non-linear dependence on the tensor product (C_U)^{⊗L} by replacing the search space with SymSEP_L, while obeying the linear constraint on the partial trace of one party. We thus obtain:

Optimization Problem 3. Rank-1 and symmetric optimum (exact).

However, Φ_{A_1^L} ∈ SEP_L is still not a linear constraint, and is highly non-trivial to enforce, as a full characterization of the separable set is NP-hard [32]. Therefore, while a priori appearing more manageable, the above problem is still not in a form that can be tackled numerically. Fortunately, the symmetry constraint can also be exploited to obtain approximations of the set of symmetric separable states.

A.2.1 Approximation of SEP via the quantum de Finetti theorem
Ensuring Φ_{A_1^L} ∈ SEP_L cannot be done exactly, so as it stands the problem is still numerically intractable. However, it is possible to define arbitrarily precise outer approximations of the separable set through symmetric extensions and the quantum de Finetti theorem [15,20]. Concretely, we can approximate, in principle to any desired precision, a Φ_{A_1^L} ∈ SEP_L by a marginal over a larger symmetric (and not necessarily separable) state Φ_{A_1^N}, provided N is sufficiently large. For any finite N, this leads to an outer approximation of T, establishing a convergent hierarchy of outer approximations for any N ≥ L. More precisely, using the quantum de Finetti theorem we can establish asymptotic error bounds on this approximation [16, Cor. 1]; in terms of the trace norm, the error vanishes as O(1/N). As the symmetry requirement applies to both Φ_{A_1^L} and Φ_{A_1^N}, this application of the quantum de Finetti theorem for approximating Φ_{A_1^L} ∈ SEP_L is straightforward in the SDP in Eq. (15), only requiring the use of the larger state Φ_{A_1^N} as the optimization variable, and a suitable adjustment of the objective function using (X^T ⊗ 1_{A_{L+1}^N}).

A.2.2 Additional separability constraints
To improve the approximation, at extra computational cost, we may also include additional separability constraints. In our case, we consider constraints of positive partial transpose (PPT) on Φ_{A_1^N} across all non-equivalent bipartitions. Here, "non-equivalent bipartitions" refers to the fact that it is unnecessary to include all bipartitions, as the state Φ_{A_1^N} is symmetric, and thus permutation invariant. Therefore, the only bipartitions needed are α_k, where, e.g., we transpose only the first k subspaces, for 1 ≤ k ≤ ⌊N/2⌋, with symmetry taking care of the remaining constraints.
The existence of entangled states with positive partial transpose [36] means these constraints reduce the feasible convex set to that of PPT states, not the separable states. While entangled PPT states are not valid solutions of the original problem, they still yield adequate approximations and provide upper bounds on the actual optimal values. Including PPT constraints performs at least as well as optimizing without them (i.e., the solution can only improve), and in certain cases [49] convergence improves significantly, to O(1/N²) as opposed to O(1/N) in their absence. As shown in Table 1, for the d_E = 1 case the PPT constraints were sufficient to recover the exact analytical bounds.
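The PPT condition itself is simple to implement: transpose one tensor factor and check for negative eigenvalues. A minimal two-qubit illustration (not the Φ_{A_1^N} bipartitions of the actual SDP):

```python
import numpy as np

def partial_transpose(rho, d1, d2):
    """Transpose the second tensor factor of a state on C^{d1} ⊗ C^{d2}."""
    r = rho.reshape(d1, d2, d1, d2)      # indices (i, a, j, b) of |ia><jb|
    return r.transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

# A Bell state fails the PPT test: its partial transpose has eigenvalue -1/2.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi)
lam_min = np.linalg.eigvalsh(partial_transpose(bell, 2, 2)).min()
assert np.isclose(lam_min, -0.5)

# A separable state (here, the maximally mixed state) always passes.
prod = np.eye(4) / 4
assert np.linalg.eigvalsh(partial_transpose(prod, 2, 2)).min() >= -1e-12
```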

A.2.3 SDP for SEP relaxation
By all of the above results, we arrive at the following SDP relaxation of the original problem:

Optimization Problem 4. Final outer approximation (SDP),

with A denoting the set of non-equivalent bipartitions (App. A.2.2). The solutions of this SDP provide upper bounds for the maxima ω^a_{d_E}.
B.1 Symmetric representation of the problem

We define an isometry V_N between the canonical basis of S_N and an orthonormal basis of the symmetric subspace, i.e., the normalized symmetric vectors |Sym(t)⟩ in the full space A_1^N:

V_N := Σ_t |Sym(t)⟩⟨t|.

Note that this isometry acts as a projector in A_1^N, i.e., V_N V_N† = P_N^+. Writing ϕ := V_N† Φ_{A_1^N} V_N, we may write it concisely as Φ_{A_1^N} = V_N ϕ V_N†, so that the objective function, which acts entirely within the symmetric subspace, can be written as

tr[Φ_{A_1^N} (X^T ⊗ 1_{A_{L+1}^N})] = tr[ϕ x],  with  x := V_N† (X^T ⊗ 1_{A_{L+1}^N}) V_N.    (54)

Positivity of Φ_{A_1^N} can be guaranteed by requiring ϕ ≥ 0 directly, as ϕ and Φ_{A_1^N} have the same non-zero eigenvalues. With the chosen normalization of C_U, we may also write tr[ϕ] = 1. In practice, it is sufficient to compute x numerically by means of Eq. (54), as computing the projection of X^T ⊗ 1_{A_{L+1}^N} onto the symmetric subspace analytically can be cumbersome, and this computation only needs to be performed once, before the numerical optimization.
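For small instances, such an isometry can be obtained numerically as any orthonormal basis of the range of P_N^+; the sketch below (two qubits, a toy diagonal X) checks the projector property and the compression x = V† X V:

```python
import numpy as np
from itertools import permutations

d, n = 2, 2
dim = d ** n

# Symmetrizer P+ on (C^d)^{⊗n}, assembled by averaging permutation matrices.
perms = list(permutations(range(n)))
P = np.zeros((dim, dim))
for sigma in perms:
    for idx in range(dim):
        digits = [(idx // d**k) % d for k in reversed(range(n))]
        jdx = 0
        for pos in range(n):
            jdx = jdx * d + digits[sigma[pos]]
        P[jdx, idx] += 1.0 / len(perms)

# Columns of V form an orthonormal basis of the symmetric subspace (range of P+).
w, v = np.linalg.eigh(P)
V = v[:, w > 0.5]
assert V.shape == (4, 3)                         # dim Sym^2(C^2) = 3
assert np.allclose(V.T @ V, np.eye(V.shape[1]))  # V† V = identity on Sym^n
assert np.allclose(V @ V.T, P)                   # V V† = P+

# Compressing a symmetric operator X into the small basis: x = V† X V.
X = np.diag([1.0, 2.0, 2.0, 3.0])
x = V.T @ X @ V
assert np.allclose(V @ x @ V.T, P @ X @ P)
```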
To rewrite the partial trace constraints of the SDP directly in this symmetric basis, we must find a way to express partial traces in terms of the symmetric representation. In the following, we generalize some results known for qubits [62] to arbitrary local dimension. But first, recall that the constraint we wish to rewrite is given by

tr_{O_1}[Φ_{A_1^N}] = (1_{I_1}/d_ES) ⊗ tr_{A_1}[Φ_{A_1^N}].    (56)

Treating types as ordinary vectors, we may define addition between types with the same d component-wise, writing t = r + s. This corresponds to the fact that Sym^{n+m}(H) ⊂ Sym^n(H) ⊗ Sym^m(H) [62]. Generalizing upon this idea, we may split an n-partite symmetric basis vector into m parts of sizes k_ℓ, with Σ_ℓ k_ℓ = n,

|Sym(t)⟩_{A_1^n} = Σ_{(r_1,...,r_m)} ⊗_{ℓ=1}^m |Sym(r_ℓ)⟩_{B_ℓ},    (57)

where the sum is over all tuples (r_1, . . . , r_m) of types of sizes (k_1, . . . , k_m) such that Σ_ℓ r_ℓ = t. Here, B_ℓ := A_{b_ℓ+1}^{b_ℓ+k_ℓ}, with b_ℓ = Σ_{i=1}^{ℓ-1} k_i, corresponding to the subspace of the ℓ-th part of the decomposition. Importantly, this decomposition is performed in the full space with non-normalized vectors; a normalized version of Eq. (57) can also be written, but working under such a normalization is cumbersome and inefficient due to the several coefficients involved. Instead, we have chosen to normalize directly in terms of the original types t, t′, which makes normalization straightforward. To adapt the partial trace constraint, we first write the state Φ_{A_1^N} as in Eq. (22), and using Eq. (57) split the basis into two parts of sizes (1, N-1), obtaining

Φ_{A_1^N} = Σ_{t,t′} φ̃_{t,t′} Σ_{r+s=t} Σ_{r′+s′=t′} |Sym(r)⟩⟨Sym(r′)|_{A_1} ⊗ |Sym(s)⟩⟨Sym(s′)|_{S_{N-1}},    (59)

where the proper normalization is now taken care of by defining φ̃_{t,t′} := ϕ_{t,t′} ((t)(t′))^{-1/2}. Since r and r′ are types for a single party, the |Sym(r)⟩_{A_1} are simply vectors of the canonical basis {|u⟩}_{u=1}^d; thus, we will adopt the notation |u(r)⟩_{A_1} = |Sym(r)⟩_{A_1} in what follows. The tensor product form allows for the simplified application of the partial traces in Eq. (56). By linearity of the partial trace, we can treat each term of Eq. (59) separately, so that the terms on the left (Y_L) and right (Y_R) hand sides of Eq.
(56) can be written as sums over the corresponding terms. A few simplifications are now evident. First, we observe that tr[|u(r)⟩⟨u(r′)|_{A_1}] = δ_{r,r′}, where δ_{r,r′} is the Kronecker delta. Secondly, we may split the states |u(r)⟩_{A_1} into the local input and output spaces. Using the above results, and correcting for the missing normalizations, we can apply the isometry 1_{I_1} ⊗ V_{N-1}, so that Eqs. (60) and (61) become Eq. (64). Here, we have written 1_{I_1} = Σ_{i=1}^{d_ES} |i⟩⟨i|_{I_1} so as to make the I_1 ⊗ S_{N-1} decomposition explicit in both expressions. By inserting Eq. (64) back into the sum of Eq. (59), we can appreciate the fact that Eq. (64) neatly separates each term of the sum into square matrices indexed by the entries (i, s) and (i′, s′). The constraint of Eq. (56) then tells us we must sum over all t, t′ on both sides, after which we can apply the equality constraint element-wise by simply matching the resulting (i, s, i′, s′) entries. As the extra normalization ((s)(s′))^{1/2} of Eq. (64) is always equal between matched elements, it cancels out in the element-wise equality constraint. Thus, it is sufficient to use the normalization of φ̃_{t,t′}.
During implementation in software, the summations in Eq. (59) can be performed efficiently by taking into account the fact that valid decompositions of the form t = r + s are very restricted in number, and can be efficiently enumerated, grouped, filtered and counted. By assigning a tuple (i, o, s) to each t, the tracing over input and output spaces, and the applications of the Kronecker deltas, can thus be computed efficiently. Therefore, the linear constraints between variables ϕ_{t,t′} can be constructed by bucketing terms ϕ_{t,t′} by matching |i⟩⟨i′|_{I_1} ⊗ |s⟩⟨s′|_{S_{N-1}}, with normalization simplified by attaching it to ϕ_{t,t′} at the very end, i.e., using φ̃_{t,t′}.
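The enumeration step can be sketched as follows; the helper names are ours, and the bookkeeping of (i, o, s) tuples and Kronecker deltas is omitted:

```python
from math import comb

def types(n, d):
    """All types (occupation vectors) of n symbols over a d-letter alphabet."""
    if d == 1:
        return [(n,)]
    return [(k,) + rest for k in range(n + 1) for rest in types(n - k, d - 1)]

# The number of types equals the symmetric-subspace dimension C(n+d-1, n).
assert len(types(3, 2)) == comb(3 + 2 - 1, 3)
assert types(2, 2) == [(0, 2), (1, 1), (2, 0)]

def one_party_splits(t):
    """Valid decompositions t = r + s with r a single-party type, i.e., a
    unit vector e_u: decrement each non-zero entry of t in turn."""
    splits = []
    for u in range(len(t)):
        if t[u] > 0:
            s = list(t)
            s[u] -= 1
            splits.append((u, tuple(s)))
    return splits

assert one_party_splits((2, 1)) == [(0, (1, 1)), (1, (2, 0))]
```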
A similar strategy may be employed to define partial transpositions through the symmetric representation, as was done in [62] for the qubit case, in order to implement the PPT condition. However, for d_E = 2 in our case, this approach would render the matrices and the number of constraints too large for a viable computation, and we therefore did not pursue a generalization of this idea. The optimizations we performed for d_E = 1 involving PPT constraints, as shown in Table 1, used Φ_{A_1^N} directly, following Eq. (22), which was still tractable in this case.
While the above steps lead to a significantly smaller representation of the problem, they do not generally reduce the number of variables and constraints enough for the SDP to be numerically tractable. For concreteness, in the smallest non-trivial case of d_E = d_S = 2, this gives for N = 2, . . . , 5 a symmetric space of size D_S = 136, 816, 3 876, 15 504, . . . , such that the SDP would be written in terms of (D_S)² = 18 496, 665 856, 15 023 376, 240 374 016, . . . variables. These examples illustrate how quickly our problem can become numerically intractable, even with symmetry taken into account.
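These sizes follow from dim Sym^N(C^D) = C(D+N-1, N) with local dimension D = (d_E d_S)² = 16, which is quickly verified:

```python
from math import comb

d_E = d_S = 2
D = (d_E * d_S) ** 2                     # local dimension of A_l = I_l ⊗ O_l: 16

D_S = [comb(D + N - 1, N) for N in range(2, 6)]
assert D_S == [136, 816, 3876, 15504]                               # D_S values
assert [x**2 for x in D_S] == [18496, 665856, 15023376, 240374016]  # variables
```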

B.2 Sparse implementation of the SDP
In the following, we describe in detail the algorithm we have developed to obtain a sparse version of the SDP. This is an essential step in rendering the problem numerically tractable. The algorithm works by heuristically exploiting any existing sparsity of the original problem, in particular of its objective function, in order to automatically construct a sparse outer approximation. In our particular case, the sparse SDP resulting from our algorithm was not only far simpler, it was in fact an exact sparse representation of the original problem. Below, we also discuss the general conditions required for this to be the case. We highlight that the algorithm yields a block structure, with sparse blocks of variables given by B_r. However, imposing positivity for each block is not always possible in this case, as the blocks themselves may have a sparse structure incompatible with positivity constraints. Concretely, there might be no matrix X ≥ 0 with sparsity structure given by X_ij = 0 ∀ (i, j) ∉ θ⁰_ext. To address this in general, we instead assume all the blocks are dense. We can then define the initial completed sparsity by including all missing indices (i, j) within each block. In graph-theoretical terms, we turn each connected component into a complete graph, i.e., every pair of vertices shares an edge, and use the non-zero entries of the resulting adjacency matrix to define θ⁰_comp; see Fig.
8, right panel. Denoting by B_r the dense block matrices, we can define the first sparse approximation X⁰ of X, such that X⁰ ≥ 0 can be safely imposed through block-positivity, with no risk of rendering the SDP infeasible. The next step is to analyze how this sparse structure affects the other constraints in the SDP. The procedures above expanded our initial base sparsity θ⁰_base by including a larger set of indices (and therefore, variables) of the original dense X, resulting in θ⁰_comp. However, these additional variables may themselves appear in other linear constraints, involving further variables we have not yet taken into account. These inter-dependencies between variables lead to a cascade of new constraints and variables that must be considered, i.e., the additional indices included in θ⁰_comp specify variables which appear in constraints that were previously ignored, as they did not involve variables specified by θ⁰_base. Therefore, we repeat the above procedures until a stable and completely self-contained set of indices emerges: we define a new base sparsity θ¹_base := θ⁰_comp, and repeat the process until θ^{n+1}_base = θ^n_base, at which point we call this stable set the effective sparsity θ* of the problem. In our applications, the algorithm converges within 2 to 4 iterations.
Note that, by construction, any variable X_ij for (i, j) ∈ θ* appears in the objective function directly or, if not, affects the variables in the objective either directly (through linear constraints) or indirectly (through positivity constraints and, in turn, through further constraints, and so on, recursively). The goal of the above algorithm is to automatically discover a set of variables and constraints which is self-sufficient, given any objective function and linear constraints for an SDP.
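A toy version of this closure loop, with hypothetical index sets (the actual implementation operates on the SDP's real objective and constraint supports):

```python
def effective_sparsity(objective_support, constraint_supports, diag_indices):
    """Grow a self-contained sparsity pattern theta* for an SDP variable X.

    Index pairs are pulled in from any constraint touching the current
    pattern, and connected components are completed into dense blocks,
    until the pattern stabilizes."""
    base = set(objective_support) | {(i, i) for i in diag_indices}
    while True:
        before = set(base)
        # 1. pull in every linear constraint sharing a variable with the pattern
        for supp in constraint_supports:
            if supp & base:
                base |= supp
        # 2. complete each connected component of indices into a dense block
        nodes = {i for ij in base for i in ij}
        parent = {v: v for v in nodes}
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        for (i, j) in base:
            parent[find(i)] = find(j)
        comps = {}
        for v in nodes:
            comps.setdefault(find(v), set()).add(v)
        base = {(i, j) for comp in comps.values() for i in comp for j in comp}
        if base == before:
            return base

# Toy problem: objective touches entry (0,1); constraints couple it onward.
theta = effective_sparsity({(0, 1)}, [{(1, 1), (1, 2)}], {0, 1, 2, 3})
assert len(theta) == 10   # dense 3x3 block on {0,1,2} plus the (3,3) diagonal
theta2 = effective_sparsity({(0, 1)}, [{(1, 1), (1, 2)}, {(2, 2), (2, 3)}],
                            {0, 1, 2, 3})
assert len(theta2) == 16  # the cascade eventually fills the full 4x4 pattern
```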
While the algorithm generally provides an outer approximation, it can also lead to an exact sparse version of the original problem if the unused linear constraints are homogeneous, which is the case for the problem we consider. To understand why, we define the set K*⊥ of unused constraints as those acting entirely outside θ*. If the unused constraints are all homogeneous, i.e., b_k = 0 for all k ∈ K*⊥, then the original dense problem can be solved entirely within θ*, without affecting the objective function, as any sparse solution can be made into a feasible solution of the dense problem. This can be understood as follows. Let X* be a solution to the sparse problem, and X*⊥ a matrix in the complementary space θ*⊥, containing all indices not appearing in θ*. Any constraints discarded by the algorithm (i.e., the indices in K*⊥) act entirely on θ*⊥. Then, as the objective function contains only variables within θ*, we may write a dense solution X = X* + X*⊥ with X*, X*⊥ ≥ 0, such that all constraints ⟨C^(k), X⟩ = b_k decouple onto orthogonal spaces. Now, if the linear constraints involving X*⊥ are all homogeneous (b_k = 0 for all k ∈ K*⊥), we may set X*⊥ = 0 directly, simultaneously satisfying all constraints of the dense problem without affecting the objective function. Therefore, any sparse solution X* is also a valid solution to the dense problem, with the same value of the objective: the sparse problem is exact. Note that the diagonal indices needed to be included in θ⁰_base in order to ensure that X ≥ 0 follows from X* ≥ 0, as otherwise positivity of X could always be satisfied for any X* by choosing arbitrarily large diagonal entries in X*⊥. Despite the success of this algorithm in giving a sparse solution to our particular problem, there is in general no guarantee that θ* will be sparse. It may be the case that the algorithm eventually includes all entries of X, in which case an alternative approach must be used, e.g.,
stopping the algorithm before a stable sparsity pattern is reached (generally resulting in an outer approximation), using a more refined completion procedure (e.g., block-wise chordal completions [28,74]), or exploiting additional structure of the original problem. Moreover, the sparsity of the original problem, and thus the efficacy of the above algorithm, relies heavily on the choice of basis in which the problem is represented. Fortunately, for our choice of parameters, the objective function and the resulting θ* were sufficiently sparse for the SDP to be numerically tractable. Furthermore, as the partial trace constraints tr_{O_1}[V_N† ϕ V_N] = (1_{I_1}/d_ES) ⊗ tr_{A_1}[V_N† ϕ V_N] are all homogeneous, this approach led to an exact sparse representation of Eq. (15). Finally, we note that other techniques exist to formulate sparse SDPs [28,74], typically relying on the problem having an inherent "aggregate sparsity" (the joint sparse structure of F and all C^(k) taken together), or being analytically amenable to a sparser representation by an appropriate change of variables. In our problem, no such structure is a priori apparent: only the objective is inherently sparse, with the linear constraints jointly involving all variables in a non-trivial manner. In such cases, our algorithm is a suitable technique for obtaining an "effective" sparsity focused on the objective function's inherent sparse structure, with the advantage of being more agnostic to the problem's overall structure.

C Maximum probability for closed systems
Here, we provide a proof of the analytical bounds for the d_E = 1 case mentioned in Sec. 3. To this end, recall that, in the sequential measurement protocol, the probability of a sequence of outcomes can be written in terms of the measure-and-prepare map M_a(ρ_ES) = tr_S[ρ_ES (1_E ⊗ E_S^a)] ⊗ ρ_S^0. If d_E = 1, the unitary corresponds only to a local rotation on the system, and can be absorbed into the state ρ_S^0 and the measurements E_S^a. Thus, the maximum depends only on the state and measurements. To compute it, we first note that, since both ρ_S^0 and (E_S^a)_a are the same at each step and M_a is a measure-and-prepare operation, all outcomes are independent and identically distributed. Writing q_a = tr[ρ_S^0 E_S^a] for the probability of outcome a, with Σ_a q_a = 1, and n_a for the number of occurrences of symbol a in a, we can write the probability as

p(a | d_E = 1) = ∏_{ℓ=1}^{L} q_{a_ℓ} = ∏_{a∈A} q_a^{n_a},    (75)

with n_a ∈ ℕ, Σ_{a∈A} n_a = L, and q_a > 0. For simplicity, we assume that a contains every symbol of A at least once, such that n_a, q_a > 0; otherwise, we could simply restrict to a smaller alphabet A where this is the case. The maximum probability ω_a^1 can be found with standard techniques, such as Lagrange multipliers. Using the Lagrangian

L = ∏_{a∈A} q_a^{n_a} − α (Σ_{a∈A} q_a − 1),    (77)

we calculate the partial derivative with respect to each q_a and equate it to zero: ∂L/∂q_a = (n_a/q_a) p − α = 0, ∀a.
Since q_a > 0 for all a and p > 0, we may rewrite this as q_a = n_a p / α. Summing over a and applying the constraints Σ_a q_a = 1 and Σ_a n_a = L, we obtain α = L p, and thus q_a = n_a / L. The maximum value is therefore

ω_a^1 = ∏_{a∈A} (n_a / L)^{n_a}.

Note that this solution is unique if q_a > 0 and p > 0; the only alternative solutions would require q_a = 0 for some a, and thus p = 0 < ω_a^1. Therefore, this is indeed the global maximum. We emphasize that this direct calculation of ω_a^1 crucially depends on the specific form of the instrument we employ. If the respective outcomes do not correspond to an M_a of measure-and-prepare form, the joint probabilities p(a | d_E) do not factorize as in Eq. (75), and an alternative approach must be used.
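As a sanity check of the closed-form maximum ω_a^1 = ∏_a (n_a/L)^{n_a}, the following sketch compares it against a brute-force search over the probability simplex, using a binary example with counts n_0 = 2, n_1 = 1 (e.g., the sequence a = 001).

```python
import numpy as np

# Counts of a binary example sequence, e.g. a = 001: n_0 = 2, n_1 = 1, L = 3.
n = np.array([2, 1])
L = n.sum()

# Analytical maximum from the Lagrange-multiplier solution q_a = n_a / L.
omega = np.prod((n / L) ** n)  # (2/3)^2 * (1/3) = 4/27

# Brute-force check over the simplex q_0 + q_1 = 1: the i.i.d. probability
# p = q_0^2 * q_1 never exceeds omega, and attains it near q_0 = 2/3.
q0 = np.linspace(0.0, 1.0, 100001)
p = q0 ** n[0] * (1.0 - q0) ** n[1]
assert p.max() <= omega + 1e-9
assert abs(p.max() - omega) < 1e-6
print(omega)
```

The same check extends to larger alphabets by sampling the simplex, but for the binary case the one-dimensional grid already confirms both the value and the location q_a = n_a/L of the maximum.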

Figure 1 :
Figure 1: Diagram of the sequential measurement protocol, involving a partially characterized probe system and an uncharacterized environment, here separated by the dashed line. The sequence of measurement outcomes a = a_1 … a_L is obtained by repeated preparations of a probe state ρ_S^0, fixed and identical at every time step, which is left to interact with the environment and is then measured (semicircles). The environment acts as a memory resource, capable of establishing long-term correlations between measurements.

Figure 2 :
Figure 2: The protocol with each time step having its own set of input and output Hilbert spaces, denoted by the vertical dotted lines. Note that the input and output spaces of U and the M_a are interleaved: U has inputs I_ℓ and outputs O_ℓ, whereas M_{a_ℓ} has inputs O_ℓ and outputs I_{ℓ+1}.

Figure 3 :
Figure 3: Schematic of all steps undertaken to compute an upper bound for ω_a^{d_E} by formulating and solving the SDP problem. Detailed descriptions of each step are given in Apps. A and B. The tractable/intractable labels, on the right, refer to the case d_E = 2.

equality constraints, resulting in approximately 300 thousand constraints in the d_ES = 4 and N = 3 scenario: a reduction to 28% of the original. This symmetric representation is explained in more detail in App. B.1.

Figure 4 :
Figure 4: The open system and environment scenario can be equivalently interpreted as the dilation of the scenario involving sequential measurements I_a on an isolated environment system.

Figure 5 :
Figure 5: Schematic of all steps undertaken to compute an upper bound for ω_a^{d_E} by formulating and solving the SDP problem. The tractable/intractable labels, on the right, refer to the case d_E = 2.

Figure 6 :
Figure 6: Diagram of the specific protocol discussed in this work, with each time step written in terms of distinct input and output Hilbert spaces, denoted by the dotted lines. Note that the input and output spaces of U and the M_a are interleaved: U has inputs I_ℓ and outputs O_ℓ, whereas M_{a_ℓ} has inputs O_ℓ and outputs I_{ℓ+1}.

Figure 7 :
Figure 7: An adjacency matrix A and its corresponding graph G(A), with three connected components. Non-zero entries are depicted with the colors (and markers) of the component they belong to. Self-edges, corresponding to (i, i) entries, are omitted.

Figure 8 :
Figure 8: (Left) Under the permutation π = [(5273)(4)(16)], the matrix A of Fig. 7 becomes block-diagonal, with blocks possibly containing "gaps" indicating that not all pairs of vertices share an edge in that component. (Right) The blocks can be completed by adding the missing entries (i.e., edges).

Table 1 :
Results of the SDP for d_E = 1, compared with the analytical maxima ω_a^1.

Table 2 :
Comparison between SDP upper bounds and the best known achievable values for the maximum ω_a^{d_E}, for d_E = 2. Best known values were obtained through gradient descent (GD).

Table 3 :
Comparison between the number of variables and linear constraints in the symmetric problem vs. its sparse implementation, for the sequence a = 001.Our algorithm achieves a vast reduction in the number of variables and constraints, while still allowing exact solutions for our problem.