Bounding sets of sequential quantum correlations and device-independent randomness certification

An important problem in quantum information theory is that of bounding sets of correlations that arise from making local measurements on entangled states of arbitrary dimension. Currently, the best-known method to tackle this problem is the NPA hierarchy: an infinite sequence of semidefinite programs that provides increasingly tight outer approximations to the desired set of correlations. In this work we consider a more general scenario in which one performs sequences of local measurements on an entangled state of arbitrary dimension. We show that a simple adaptation of the original NPA hierarchy provides an analogous hierarchy for this scenario, with comparable resource requirements and convergence properties. We then use the method to tackle some problems in device-independent quantum information. First, we show how one can robustly certify over 2.3 bits of device-independent local randomness from a two-qubit state using a sequence of measurements, going beyond the theoretical maximum of two bits that can be achieved with non-sequential measurements. Finally, we show tight upper bounds to two previously defined tasks in sequential Bell test scenarios.


Introduction
The correlations between outcomes of local measurements made on entangled quantum systems are known to exhibit a rich structure. Firstly, they are generally stronger than correlations attainable via classical resources, a phenomenon known as Bell nonlocality [1,2]. Secondly, sets of quantum correlations are known to contain both smooth and flat boundaries [3,4], and there exist correlations whose realisation requires infinite-dimensional entangled states [5], even in scenarios involving small and finite alphabet sizes.
All of this makes characterising, and optimising over, the set of quantum correlations a highly non-trivial and potentially undecidable problem. At the same time, being able to optimise over the entire set of quantum correlations is crucial for many areas of quantum information theory. This is especially true in device-independent quantum information, where quantum systems are treated as black boxes and one makes no assumption on the physical dimension of the underlying state. A major breakthrough in this direction came with the discovery of the NPA hierarchy [6,7]. It characterises the set of quantum correlations via a sequence of increasingly tight outer approximations, each expressed as a semidefinite program (SDP). Consequently, the NPA hierarchy has become a vital tool for studying device-independent protocols in the standard scenario in which they are usually considered, commonly referred to as a Bell test. There, a bipartite state is shared between two parties, each of whom makes a number of local measurements in order to generate the data used in the protocol.
In recent years a number of works have also considered sequential Bell test scenarios, in which the parties make a sequence of local measurements that obey a time-ordered causal structure [8,9,10,11,12] (see figure 1). Such scenarios have been shown to be relevant for Bell nonlocality, for example via the phenomenon of hidden nonlocality [10,11,12]. Sequential measurement scenarios are also known to provide an advantage in device-independent randomness certification [13] and, we expect, in many other device-independent protocols. Further to this, sequential measurement scenarios also play a role in demonstrations of contextuality [14] and Leggett-Garg type tests of nonclassicality [15].

Figure 1: A sequential Bell scenario in which both parties perform a sequence of two measurements on their halves of a bipartite quantum state. In this work we develop methods to characterise the sets of probability distributions that can arise in such scenarios involving arbitrary numbers of parties in each sequence.
It is thus very desirable to develop methods to characterise the correlations arising in sequential Bell test scenarios. In this work we show that such a characterisation is possible by augmenting the original NPA hierarchy with a finite number of additional linear constraints. This provides a sequence of outer approximations to the corresponding set of correlations, each defined via a suitable SDP. The resource requirements and convergence properties are analogous to those of the NPA hierarchy. We then apply our hierarchy to several problems in quantum information. First, we investigate device-independent randomness certification. We show how to use the hierarchy to robustly certify over 2.3 bits of local randomness from a two-qubit state via a simple sequential measurement strategy. This goes beyond the theoretical maximum of two bits that is achievable in non-sequential Bell scenarios. We then show that two previously studied strategies are optimal for strategies of any dimension, up to numerical precision. These are the strategies for the simultaneous violation of two CHSH inequalities [9] and for the violation of the sequential Bell inequality defined in [8].
We note that the recent work [16] also describes a sequence of SDP relaxations for generic quantum-causal networks that can be applied to the sequential structures we consider; see the discussion for further information.

Quantum correlations
In a standard Bell scenario, two spatially separated players perform measurements on their local share of a bipartite state, chosen according to some random inputs $x, y = 1, \dots, m$, and then collect the corresponding outputs $a, b = 1, \dots, d$. The resulting correlations $P(a,b|x,y)$ are called quantum, written $P(a,b|x,y) \in Q$, if they can be expressed as $\mathrm{tr}[\rho\, A^x_a \otimes B^y_b]$ for some bipartite quantum state $\rho$ and local measurement operators $A^x_a$ and $B^y_b$. Here one can take $\rho$ pure and the measurements projective without loss of generality. This is because any measurement on a mixed state can be realised as a projective measurement on a purification of the state in a larger Hilbert space [17]. Thus,
$$P(a,b|x,y) \in Q \iff P(a,b|x,y) = \langle\psi| A^x_a \otimes B^y_b |\psi\rangle, \tag{1}$$
where the measurement operators are orthogonal projectors,
$$A^x_a A^x_{a'} = \delta_{a,a'} A^x_a, \qquad B^y_b B^y_{b'} = \delta_{b,b'} B^y_b, \tag{2}$$
normalised as $\sum_a A^x_a = \sum_b B^y_b = \mathbb{1}$. Since the state and measurements appearing in (1) are potentially infinite dimensional, the problem of deciding membership in, or optimising over, the set $Q$ is highly non-trivial. Currently, the only general-purpose technique to tackle such a problem is the NPA hierarchy [6,7], which we recap shortly.
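To make the Born-rule expression (1) concrete, here is a minimal sketch in plain Python with no external libraries. It evaluates $P(a,b|x,y) = \langle\psi|A^x_a \otimes B^y_b|\psi\rangle$ for $|\phi^+\rangle$ and the textbook CHSH-optimal qubit projectors, then checks that the CHSH value reaches $2\sqrt{2}$. The state, the measurement angles and the helper names (`kron`, `projectors`, `chsh`) are illustrative assumptions and are not part of the method described here.

```python
import itertools
import math

# Illustrative sketch: dense real matrices as nested lists. All operators used
# here are real, so the transpose plays the role of the adjoint.

def kron(a, b):
    """Kronecker product of two square matrices (Alice's factor first)."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def expval(psi, op):
    """<psi| op |psi> for a real state vector psi."""
    dim = len(psi)
    return sum(psi[i] * op[i][j] * psi[j] for i in range(dim) for j in range(dim))

def projectors(angle):
    """Projectors onto the +1 (index 0) and -1 (index 1) eigenspaces of
    cos(angle) sigma_z + sin(angle) sigma_x."""
    c, s = math.cos(angle), math.sin(angle)
    obs = [[c, s], [s, -c]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    return [[[(eye[i][j] + sign * obs[i][j]) / 2 for j in range(2)]
             for i in range(2)] for sign in (1, -1)]

# Assumed strategy: |phi+> = (|00> + |11>)/sqrt(2), basis index = 2*alice + bob.
PSI = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
ALICE = [projectors(0.0), projectors(math.pi / 2)]          # x = 0, 1
BOB = [projectors(math.pi / 4), projectors(-math.pi / 4)]   # y = 0, 1

def correlations():
    """P(a,b|x,y) = <psi| A^x_a (x) B^y_b |psi>, as in equation (1)."""
    return {(a, b, x, y): expval(PSI, kron(ALICE[x][a], BOB[y][b]))
            for a, b, x, y in itertools.product(range(2), repeat=4)}

def chsh(p):
    """CHSH functional sum_{a,b,x,y} (-1)^(a+b+xy) P(a,b|x,y)."""
    return sum((-1) ** (a + b + x * y) * v for (a, b, x, y), v in p.items())

P = correlations()
print(chsh(P))  # approximately 2*sqrt(2) = 2.8284...
```

Only this one strategy is evaluated. Deciding whether an arbitrary $P$ lies in $Q$ is the hard problem that the NPA hierarchy addresses.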

Sequential quantum correlations
In this work we consider sequential measurement scenarios, where a quantum system is subjected to local measurements that obey a time-ordered structure (see figure 1). Consider first a single quantum system $|\psi\rangle$, of potentially uncountably infinite dimension, that undergoes a sequence of $n$ measurements with inputs $x_i$ and outcomes $a_i$. The first measurement outcome and its corresponding post-measurement state are described by a set of Kraus operators $\{K^{x_1}_{a_1,\mu_1}\}$. For finite-dimensional systems, the (sub-normalised) post-measurement state after obtaining outcome $a_1$ takes the form
$$\rho_{a_1|x_1} = \sum_{\mu_1} K^{x_1}_{a_1,\mu_1} |\psi\rangle\langle\psi| K^{x_1\dagger}_{a_1,\mu_1}, \tag{3}$$
with $P(a_1|x_1) = \mathrm{tr}\,\rho_{a_1|x_1}$ and $\sum_{a_1,\mu_1} K^{x_1\dagger}_{a_1,\mu_1} K^{x_1}_{a_1,\mu_1} = \mathbb{1}$. The sum over $\mu_1$ is needed because several Kraus operators may be associated with a single measurement outcome. For infinite-dimensional systems one generally replaces the sum with an integral:
$$\rho_{a_1|x_1} = \int \mathrm{d}\mu_1\, K^{x_1}_{a_1,\mu_1} |\psi\rangle\langle\psi| K^{x_1\dagger}_{a_1,\mu_1}, \tag{4}$$
where again $P(a_1|x_1) = \mathrm{tr}\,\rho_{a_1|x_1}$ and $\sum_{a_1}\int \mathrm{d}\mu_1\, K^{x_1\dagger}_{a_1,\mu_1} K^{x_1}_{a_1,\mu_1} = \mathbb{1}$. Continuing this process for the entire sequence with inputs $x = (x_1, \dots, x_n)$ and outputs $a = (a_1, \dots, a_n)$, one finds
$$P(a|x) = \langle\psi| A^x_a |\psi\rangle, \qquad A^x_a = \int \mathrm{d}\mu_1 \cdots \mathrm{d}\mu_n\, K^{x_1\dagger}_{a_1,\mu_1} \cdots K^{x_n\dagger}_{a_n,\mu_n} K^{x_n}_{a_n,\mu_n} \cdots K^{x_1}_{a_1,\mu_1}. \tag{5}$$
To ease notation we have left the time-step dependence of the Kraus operators implicit. That is, $\{K^{x_1}_{a_1,\mu_1}\}$ and $\{K^{x_2}_{a_2,\mu_2}\}$ are in general different sets of operators, which is understood from the input/output indices. We define the set of sequential quantum correlations $Q_{SEQ}$ as those that arise from performing sequential measurements locally on a bipartite quantum state $|\psi\rangle$, i.e.
$$P(a,b|x,y) \in Q_{SEQ} \iff P(a,b|x,y) = \langle\psi| A^x_a \otimes B^y_b |\psi\rangle,$$
where the measurement operators $A^x_a$ and $B^y_b$ have the sequential structure (5). Can we define a hierarchy, analogous to the NPA hierarchy for $Q$, to characterise the set $Q_{SEQ}$? In this work we show how this can be achieved efficiently, via a simple adaptation of the original NPA hierarchy.
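The composition in (5) can be checked numerically on a single qubit. The sketch below, in plain Python with hypothetical parameters, builds the effective operator $A^{x}_{a} = K^{x_1\dagger}_{a_1} K^{x_2\dagger}_{a_2} K^{x_2}_{a_2} K^{x_1}_{a_1}$ for a two-step sequence with one Kraus operator per outcome. It then confirms that the marginal over the later outcome $a_2$ does not depend on the later input $x_2$. The first step is either a projective $\sigma_z$ measurement or an assumed weak $\sigma_x$ measurement with strength `THETA`. The second step is a projective $\sigma_z$ or $\sigma_x$ measurement. None of these choices are taken from the text.

```python
import itertools
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def lin(c1, m1, c2, m2):
    """c1*m1 + c2*m2 for 2x2 matrices."""
    return [[c1 * m1[i][j] + c2 * m2[i][j] for j in range(2)] for i in range(2)]

S = 1 / math.sqrt(2)
KETS = {"0": [1.0, 0.0], "1": [0.0, 1.0], "+": [S, S], "-": [S, -S]}

def proj(label):
    v = KETS[label]
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

THETA = 0.5  # hypothetical strength of the weak first-step measurement
c, s = math.cos(THETA), math.sin(THETA)

# Step 1: Kraus operators K^{x1}_{a1} (mu_1 is trivial: one operator per outcome).
FIRST = {0: [proj("0"), proj("1")],
         1: [lin(c, proj("+"), s, proj("-")), lin(s, proj("+"), c, proj("-"))]}
# Step 2: projective measurements of sigma_z (x2 = 0) or sigma_x (x2 = 1).
SECOND = {0: [proj("0"), proj("1")], 1: [proj("+"), proj("-")]}

def sequential_operator(x1, x2, a1, a2):
    """A^{x1 x2}_{a1 a2} = K1^dag K2^dag K2 K1, cf. equation (5)."""
    k = matmul(SECOND[x2][a2], FIRST[x1][a1])
    return matmul(transpose(k), k)  # real matrices: dagger = transpose

PSI = [math.cos(0.3), math.sin(0.3)]  # arbitrary real test state

def prob(op):
    return sum(PSI[i] * op[i][j] * PSI[j] for i in range(2) for j in range(2))

P = {(a1, a2, x1, x2): prob(sequential_operator(x1, x2, a1, a2))
     for a1, a2, x1, x2 in itertools.product(range(2), repeat=4)}

for x1, a1 in itertools.product(range(2), repeat=2):
    # Marginal over a2, computed separately for each later input x2.
    marginals = [sum(P[(a1, a2, x1, x2)] for a2 in range(2)) for x2 in range(2)]
    print(f"x1={x1} a1={a1} P(a1|x1) for x2=0,1: {marginals}")
```

The two printed marginals agree because $\sum_{a_2} K^{x_2\dagger}_{a_2} K^{x_2}_{a_2} = \mathbb{1}$ for every $x_2$. Later inputs therefore cannot influence earlier outcomes.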

The NPA hierarchy
Before explaining our method, we review the NPA hierarchy [6,7]. The NPA hierarchy provides a sequence of tests, each of which checks membership in a set $Q_k \supseteq Q$, such that $Q_1 \supseteq Q_2 \supseteq \cdots \supseteq Q$. To see how the NPA hierarchy works, consider some state and projective measurements $|\psi\rangle$, $A^x_a$, $B^y_b$ with corresponding correlations $P(a,b|x,y)$. Define sets $S_k$ consisting of the identity operator and all products of the operators $A^x_a$ and $B^y_b$ up to degree $k$, and denote by $S_k^{(i)}$ the $i$th element of $S_k$. Next, define the moment matrix of order $k$, $\Gamma_k$, with elements
$$(\Gamma_k)_{ij} = \langle\psi| S_k^{(i)\dagger} S_k^{(j)} |\psi\rangle. \tag{7}$$
By construction, the matrix $\Gamma_k$ has the following properties:
i. $\Gamma_k$ satisfies a number of linear constraints stemming from the orthogonality properties (2), the normalisation of the measurement operators, and the commutation of Alice's and Bob's operators. For example, $\langle\psi|A^x_a A^x_{a'}|\psi\rangle = 0$ for $a \neq a'$ and $\langle\psi|[A^x_a, B^y_b]|\psi\rangle = 0$. We can write these constraints as $\mathrm{tr}[\Gamma_k G_i] = 0$ for some suitable fixed matrices $G_i$.
ii. $\Gamma_k$ contains some elements that correspond to observable probabilities. For example, $\langle\psi|A^x_a B^y_b|\psi\rangle = P(a,b|x,y)$. We write these constraints as $\mathrm{tr}[\Gamma_k F_j] = P_j$, where the $F_j$ are fixed matrices and $P_j$ denotes the corresponding observed probability. Similarly, taking $S_k^{(1)} = \mathbb{1}$, the normalisation of the state gives $(\Gamma_k)_{11} = 1$.
iii. $\Gamma_k^\dagger = \Gamma_k$ and $\Gamma_k$ is positive semidefinite (see [6] for a simple proof).
Imagine that we are given some correlation $P(a,b|x,y)$ for which we want to test membership in $Q$. If $P \in Q$, there exist a state and measurements leading to $P$, and a corresponding matrix $\Gamma_k$ satisfying the above conditions. We thus have a necessary condition for $P \in Q$:
NPA hierarchy (level k):
$$\text{find } \Gamma_k \quad \text{s.t.} \quad \Gamma_k \succeq 0, \quad (\Gamma_k)_{11} = 1, \quad \mathrm{tr}[\Gamma_k G_i] = 0 \;\; \forall i, \quad \mathrm{tr}[\Gamma_k F_j] = P_j \;\; \forall j. \tag{8}$$
We denote by $Q_k$ the set of correlations for which the above problem is feasible at level $k$.
Since the test is a necessary condition for $P \in Q$, we have $Q_k \supseteq Q$. As the test contains only linear and positive-semidefinite constraints, it can be cast as an SDP feasibility problem and solved efficiently (in the size of the matrix $\Gamma_k$) by a suitable solver. We thus have a sequence of SDPs, each of which provides a relaxation of the problem of deciding membership in $Q$. Since $\Gamma_k$ is a principal submatrix of $\Gamma_{k+1}$, one has $\Gamma_{k+1} \succeq 0 \Rightarrow \Gamma_k \succeq 0$, and so $Q_{k+1} \subseteq Q_k$. Furthermore, one can optimise linear combinations of the probabilities $P_j$ over $Q_k$. To do so, one removes the final constraint in (8) and uses the corresponding linear combination of the elements $\mathrm{tr}[\Gamma_k F_j]$ as the objective function of the SDP. One can then obtain certified upper and lower bounds to the problem via duality theorems of convex optimisation. In practice, relevant problems can be tackled in this way at low levels of the hierarchy that are tractable on a desktop computer.
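As an illustration of (7) and of properties i to iii, the sketch below builds the level-1 moment matrix for the CHSH strategy on $|\phi^+\rangle$. It uses the generating set $\{\mathbb{1}, A^0_0, A^1_0, B^0_0, B^1_0\}$. The last outcome of each measurement is dropped because, by normalisation, it is a linear combination of the others. The sketch computes $(\Gamma_1)_{ij}$ as inner products of the vectors $S^{(i)}|\psi\rangle$ and checks positivity with a small symmetric-elimination routine. It only verifies that a *given* quantum strategy yields a feasible point of (8). Solving (8) for an unknown $\Gamma_k$ would require an SDP solver, which is not shown. The helper names and the strategy are hypothetical.

```python
import math

def kron(a, b):
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def apply(op, v):
    return [sum(op[i][j] * v[j] for j in range(len(v))) for i in range(len(op))]

def plus_projector(angle):
    """Projector onto the +1 eigenspace of cos(angle) sigma_z + sin(angle) sigma_x."""
    c, s = math.cos(angle), math.sin(angle)
    return [[(1 + c) / 2, s / 2], [s / 2, (1 - c) / 2]]

def is_psd(mat, tol=1e-9):
    """PSD test for a real symmetric matrix by LDL^T-style elimination:
    pivots must be non-negative, and a zero pivot must have a zero row."""
    a = [row[:] for row in mat]
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            if any(abs(a[k][j]) > 1e-6 for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return True

EYE = [[1.0, 0.0], [0.0, 1.0]]
PSI = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]  # |phi+>
A = [plus_projector(0.0), plus_projector(math.pi / 2)]           # A^0_0, A^1_0
B = [plus_projector(math.pi / 4), plus_projector(-math.pi / 4)]  # B^0_0, B^1_0

# Generating set S_1 = {1, A^0_0, A^1_0, B^0_0, B^1_0}, embedded in the joint space.
S1 = [kron(EYE, EYE)] + [kron(op, EYE) for op in A] + [kron(EYE, op) for op in B]
VECS = [apply(op, PSI) for op in S1]

# (Gamma_1)_{ij} = <psi| S_i^dag S_j |psi> = <S_i psi, S_j psi> (real vectors).
GAMMA = [[sum(u * w for u, w in zip(vi, vj)) for vj in VECS] for vi in VECS]

for row in GAMMA:
    print([round(v, 4) for v in row])
print("PSD:", is_psd(GAMMA))
```

The first row reproduces the single-party probabilities ($\langle\psi|A^x_0|\psi\rangle = 1/2$). Because $(A^x_0)^2 = A^x_0$, these equal the corresponding diagonal entries, which is a projectivity constraint of type i. The $A$ to $B$ block contains $P(0,0|x,y)$, as in property ii.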
In principle, one can use sets other than $S_k$ to generate the moment matrix (7), with each choice giving a different relaxation of $Q$. Often, and in the examples we present later, we will use a level midway between levels 1 and 2, often called level 1+AB. This level is defined by the set
$$S_{1+AB} = \{\mathbb{1}\} \cup \{A^x_a\} \cup \{B^y_b\} \cup \{A^x_a B^y_b\}. \tag{9}$$
This set defines the lowest level in the hierarchy of [18], and defines the so-called set of 'almost quantum' correlations [19]. As we will see, this set is often sufficient to obtain non-trivial and even tight bounds for relevant optimisation problems.
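The set (9) is easy to enumerate. The following sketch lists the words of $S_1$ and $S_{1+AB}$ for a bipartite scenario with $m$ inputs and $d$ outputs per party and reports the resulting size of the moment matrix. It keeps $d-1$ outcomes per input, a common choice justified by normalisation. Labels such as `A0|1` are hypothetical notation for $A^0_1$. Treating a sequence of measurements as a single measurement corresponds to taking $m$ and $d$ to be the numbers of input and output strings.

```python
from itertools import product

def projector_labels(party, m, d):
    """Labels for a party's projectors, keeping d-1 outcomes per input
    (the last outcome is fixed by normalisation)."""
    return [f"{party}{x}|{a}" for x in range(m) for a in range(d - 1)]

def generating_set(level, m, d):
    """Words (tuples of labels) in S_1 or S_{1+AB}; () is the identity."""
    alice = projector_labels("A", m, d)
    bob = projector_labels("B", m, d)
    words = [()] + [(op,) for op in alice] + [(op,) for op in bob]
    if level == "1+AB":
        words += [(a, b) for a, b in product(alice, bob)]
    elif level != "1":
        raise ValueError("this sketch only covers levels '1' and '1+AB'")
    return words

for level in ("1", "1+AB"):
    size = len(generating_set(level, m=2, d=2))
    print(f"level {level}: moment matrix is {size} x {size}")
```

For the CHSH scenario this gives a $5 \times 5$ matrix at level 1 and a $9 \times 9$ matrix at level 1+AB. The $A^x_a B^y_b$ words grow only quadratically with the number of operators per party.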

NPA hierarchy for sequential correlations
First, let us state our main technical result regarding the characterisation of sequential quantum correlations.
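Fact 1. A correlation $P(a,b|x,y)$ belongs to $Q_{SEQ}$ if and only if it can be written as $P(a,b|x,y) = \langle\psi|A^x_a \otimes B^y_b|\psi\rangle$ for some state $|\psi\rangle$ and measurement operators $A^x_a$, with $x = (x_1,\dots,x_n)$ and $a = (a_1,\dots,a_n)$, satisfying
$$A^x_a A^x_{a'} = \delta_{a,a'} A^x_a, \tag{10}$$
$$\sum_{a_{k+1},\dots,a_n} A^x_a = \sum_{a_{k+1},\dots,a_n} A^{x'}_a \quad \text{whenever } x_j = x'_j \text{ for all } j \le k, \tag{11}$$
$$A^x_a A^{x'}_{a'} = 0 \quad \text{whenever } x_j = x'_j \text{ for all } j \le k \text{ and } (a_1,\dots,a_k) \neq (a'_1,\dots,a'_k), \tag{12}$$
for all $1 \le k \le n$, and similarly for $B^y_b$.
Note that the projective condition (10) is in fact implied by the more general condition (12), together with normalisation. One can therefore equivalently take only (11) and (12) in the above.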
Proof. We first prove that any correlations in $Q_{SEQ}$ can be realised using measurement operators satisfying (10), (11) and (12). This can be proven by considering Stinespring dilations of the sequential measurements; see appendix A. For example, for a sequence of two measurements (see figure 2), one finds that Alice's full measurement operator can be written as
$$A^{x_1x_2}_{a_1a_2} = U^{x_1\dagger}_1 U^{x_2\dagger}_2 \left(\mathbb{1} \otimes \Pi_{a_1} \otimes \Pi_{a_2}\right) U^{x_2}_2 U^{x_1}_1, \tag{13}$$
which describes a projective measurement and thus satisfies (10). Here, $U^{x_1}_1$ acts trivially (with the identity) on the third Hilbert space in the product, and $U^{x_2}_2$ acts trivially on the second Hilbert space, as shown graphically in figure 2.
The constraint (11) holds for any set of measurement operators that are realised sequentially, as can be seen from (5). It reflects the fact that the measurement operators defining the first $k$ measurements must be independent of the last $n-k$ inputs, since these occur later in the sequence. These operators are obtained by marginalising $A^x_a$ over the last $n-k$ outcomes. The Stinespring dilation of the measurement operators described previously retains the sequential structure of the measurement, and so this constraint holds for the projective measurement operators as well. For example, by summing over $a_2$ in (13), one finds an operator that is independent of $x_2$.
Finally, given the Stinespring dilations of the sequential measurements, property (12) follows from the orthogonality of the $\Pi_{a_j}$'s. Consider again the sequence of two measurements in (13). For $x_1 = x'_1$ and $a_1 \neq a'_1$ one has (omitting the tensor products and the identity operator)
$$A^{x_1x_2}_{a_1a_2} A^{x_1x'_2}_{a'_1a'_2} = U^{x_1\dagger}_1 U^{x_2\dagger}_2 \Pi_{a_2} U^{x_2}_2\, \Pi_{a_1}\Pi_{a'_1}\, U^{x'_2\dagger}_2 \Pi_{a'_2} U^{x'_2}_2 U^{x_1}_1 = 0, \tag{14}$$
where we have used $[U^{x_2}_2, \Pi_{a_1}] = 0$ (and likewise for $x'_2$). Generalising this argument to a sequence of any length gives (12).
We now show the opposite direction, i.e. that any measurement operators satisfying (10), (11) and (12) admit a sequential realisation. In fact, to show this we need only conditions (10) and (11). Consider a projective measurement with two input labels $x_1, x_2$ and two output labels $a_1, a_2$, defined by the measurement operators $A^{x_1x_2}_{a_1a_2}$. Assume that these measurement operators satisfy (11). The measurement can be realised sequentially as follows. The first device performs a measurement with Kraus operators $K^{x_1}_{a_1} = \sum_{a_2} A^{x_1x_2}_{a_1a_2}$. These operators are projective and, due to (11), independent of $x_2$. The value $x_1$ is then sent to the second device using a classical channel. The second device measures $K^{x_2}_{a_2} = \sum_{a_1} A^{x_1x_2}_{a_1a_2}$, which depends on the received value of $x_1$. The measurement operator describing the full sequence is therefore
$$K^{x_1\dagger}_{a_1} K^{x_2\dagger}_{a_2} K^{x_2}_{a_2} K^{x_1}_{a_1} = K^{x_1}_{a_1} K^{x_2}_{a_2} K^{x_1}_{a_1} = A^{x_1x_2}_{a_1a_2}, \tag{15}$$
as required. In the first equality we have used the fact that the Kraus operators are Hermitian and projective by construction. The final equality follows from $A^{x_1x_2}_{a_1a_2}$ being projective. In the above we made use of a communication channel that we have not explicitly modelled but which can be realised sequentially; see appendix B for a proof in which this channel is explicit. The general result for sequences of any length follows by applying the same technique inductively along the sequence.
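The reverse construction in the proof, $K^{x_1}_{a_1} = \sum_{a_2} A^{x_1x_2}_{a_1a_2}$ and $K^{x_2}_{a_2} = \sum_{a_1} A^{x_1x_2}_{a_1a_2}$, can be checked numerically on a toy example. Purely for illustration, the sketch below assumes the two-qubit projective family $A^{x_1x_2}_{a_1a_2} = \Pi^{x_1}_{a_1} \otimes \Pi^{x_2}_{a_2}$, where $\Pi^0$ and $\Pi^1$ are the $\sigma_z$ and $\sigma_x$ eigenprojectors. It verifies (10), (11) with $k=1$, and (12) with $k=1$. It also checks that the Kraus operators built from $A$ reproduce $A^{x_1x_2}_{a_1a_2}$ as in (15). The helper names are hypothetical, and a family with genuinely correlated steps would be a stronger test.

```python
import itertools
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def proj(v):
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

S = 1 / math.sqrt(2)
PI = {0: [proj([1.0, 0.0]), proj([0.0, 1.0])],  # sigma_z eigenprojectors
      1: [proj([S, S]), proj([S, -S])]}          # sigma_x eigenprojectors
BITS = (0, 1)
ZERO = [[0.0] * 4 for _ in range(4)]

# Assumed toy family A^{x1 x2}_{a1 a2} = Pi^{x1}_{a1} (x) Pi^{x2}_{a2}.
A = {(x1, x2, a1, a2): kron(PI[x1][a1], PI[x2][a2])
     for x1, x2, a1, a2 in itertools.product(BITS, repeat=4)}

def k_first(x1, x2, a1):
    """K^{x1}_{a1} = sum_{a2} A^{x1 x2}_{a1 a2}; by (11) x2 is irrelevant."""
    return add(A[(x1, x2, a1, 0)], A[(x1, x2, a1, 1)])

def k_second(x1, x2, a2):
    """K^{x2}_{a2} = sum_{a1} A^{x1 x2}_{a1 a2}; x1 is received classically."""
    return add(A[(x1, x2, 0, a2)], A[(x1, x2, 1, a2)])

def check_all():
    """Report which of (10), (11), (12) and (15) hold for the toy family."""
    keys = list(A)
    pairs = list(itertools.product(BITS, repeat=2))
    # (10): A^x_a A^x_a' = delta_{a,a'} A^x_a for identical input strings.
    proj_ok = all(
        close(matmul(A[(x1, x2, a1, a2)], A[(x1, x2, b1, b2)]),
              A[(x1, x2, a1, a2)] if (a1, a2) == (b1, b2) else ZERO)
        for x1, x2, a1, a2 in keys for b1, b2 in pairs)
    # (11) with k = 1: the a2-marginal does not depend on x2.
    marg_ok = all(close(k_first(x1, 0, a1), k_first(x1, 1, a1)) for x1, a1 in pairs)
    # (12) with k = 1: same x1 and different a1 give orthogonal operators.
    orth_ok = all(
        close(matmul(A[(x1, x2, a1, a2)], A[(x1, y2, 1 - a1, b2)]), ZERO)
        for x1, x2, a1, a2 in keys for y2, b2 in pairs)
    # (15): K^{x1}_{a1} K^{x2}_{a2} K^{x1}_{a1} reproduces A^{x1 x2}_{a1 a2}.
    seq_ok = all(
        close(matmul(matmul(k_first(x1, x2, a1), k_second(x1, x2, a2)),
                     k_first(x1, x2, a1)), A[(x1, x2, a1, a2)])
        for x1, x2, a1, a2 in keys)
    return {"(10)": proj_ok, "(11)": marg_ok, "(12)": orth_ok, "(15)": seq_ok}

print(check_all())
```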
Having established Fact 1, we may define a hierarchy of relaxations to $Q_{SEQ}$ as follows. Define moment matrices $\Gamma_k$ as in (7) using the projective measurement operators $A^x_a$ and $B^y_b$ (i.e. satisfying (10)), leading to constraints analogous to (8). At this point, the relaxation is equivalent to the standard NPA hierarchy, treating each sequence of measurements as a single measurement. The constraints (10), (11) and (12) are linear constraints on the measurement operators and thus imply additional linear constraints on $\Gamma_k$. One can therefore add these extra constraints to (8) in the form of extra fixed matrices $G^{SEQ}_i$. This leads us to the following hierarchy for sequential quantum correlations.
Sequential hierarchy (level k):
$$\text{find } \Gamma_k \quad \text{s.t.} \quad \Gamma_k \succeq 0, \quad (\Gamma_k)_{11} = 1, \quad \mathrm{tr}[\Gamma_k G_i] = 0 \;\; \forall i, \quad \mathrm{tr}[\Gamma_k G^{SEQ}_i] = 0 \;\; \forall i, \quad \mathrm{tr}[\Gamma_k F_j] = P_j \;\; \forall j.$$
We call $Q^k_{SEQ}$ the set defined at level $k$ of this hierarchy. As with the NPA hierarchy, the sets $Q^k_{SEQ}$ can be optimised over via SDP solvers with a comparable resource overhead. Note that, due to the normalisation of the measurement operators and (11), some of the measurement operators can be written as linear combinations of others. In practice, such operators can be excluded from the sets $S_k$, which increases efficiency by decreasing the size of $\Gamma_k$. Including them would only introduce linear dependencies between the rows and columns of $\Gamma_k$, which do not affect the constraint $\Gamma_k \succeq 0$. This process also introduces further constraints on the now smaller $\Gamma_k$. For example, suppose $A^x_a$ and $A^{x'}_{a'}$ are two measurement operators that have been removed from $S_k$ through this process. Expressing them as linear combinations of the remaining elements of $S_k$, the constraint (12) gives a polynomial operator identity that implies further constraints on the moment matrix.
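To see how much normalisation and (11) shrink the generating set, one can count how many of Alice's sequential measurement operators remain linearly independent. The sketch below writes normalisation and (11) as linear equations over the symbols $A^x_a$ plus the identity. It computes their rank with exact `fractions.Fraction` arithmetic and returns the number of operators that cannot be eliminated. It does not build or solve an SDP, the function names are hypothetical, and it ignores the further polynomial identities from (12) mentioned above. For two inputs and two outputs per step and a sequence of two steps, it gives 10 operators instead of the 12 obtained when the sequence is treated as a single measurement.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a list of rows of Fractions, by exact Gaussian elimination."""
    rows = [row[:] for row in rows]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [u - f * v for u, v in zip(rows[i], rows[r])]
        r += 1
    return r

def independent_operators(m, d, n, sequential=True):
    """Number of operators A^x_a (x, a strings of length n) left after
    eliminating normalisation and, if sequential, constraint (11)."""
    xs = list(product(range(m), repeat=n))
    outs = list(product(range(d), repeat=n))
    col = {key: i for i, key in enumerate(product(xs, outs))}
    one = len(col)  # extra column for the identity operator

    def new_row():
        return [Fraction(0)] * (one + 1)

    rows = []
    for x in xs:  # normalisation: sum_a A^x_a - 1 = 0
        row = new_row()
        for a in outs:
            row[col[(x, a)]] += 1
        row[one] -= 1
        rows.append(row)
    if sequential:  # (11): marginals agree for inputs sharing a length-k prefix
        for k in range(1, n):
            for x, xp in product(xs, repeat=2):
                if x == xp or x[:k] != xp[:k]:
                    continue
                for prefix in product(range(d), repeat=k):
                    row = new_row()
                    for a in outs:
                        if a[:k] == prefix:
                            row[col[(x, a)]] += 1
                            row[col[(xp, a)]] -= 1
                    rows.append(row)
    return len(col) - rank(rows)

print(independent_operators(2, 2, 2, sequential=False))  # 12: sequence as one measurement
print(independent_operators(2, 2, 2))                    # 10: with constraint (11)
```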

Convergence of the hierarchy
Since the conditions (10) and (11) characterise precisely the set of sequential measurement operators and are linear constraints, one can use the same methods as in [7] to prove convergence of the hierarchy. In fact, one can extract a quantum state and measurement operators from the moment matrix $\Gamma_\infty$ corresponding to the asymptotic level of the hierarchy. It is then straightforward to see that the added linear constraints $G^{SEQ}_i$ enforce that the extracted measurement operators satisfy property (11), and hence have a sequential realisation. Technically speaking, convergence is proven to a set $\tilde{Q}_{SEQ} \supseteq Q_{SEQ}$. Here, $\tilde{Q}_{SEQ}$ is the set of sequential quantum correlations in which the tensor product structure is replaced by the weaker constraint that Alice's and Bob's measurement operators commute, i.e.
$$p(a,b|x,y) \in \tilde{Q}_{SEQ} \iff p(a,b|x,y) = \langle\psi| A^x_a B^y_b |\psi\rangle,$$
where $[A^x_a, B^y_b] = 0$ for all $a, x, b, y$ and the measurement operators have the sequential structure (5). This commuting-operator formalism is used in algebraic quantum field theory [20], and it is known that there exist scenarios for which $Q_{SEQ} \subsetneq \tilde{Q}_{SEQ}$ [21].

Relaxations of local correlations
The hierarchy can also be used to define semidefinite programming relaxations of the set of 'time ordered local correlations' defined in [8]. Such correlations are those that can be obtained by a local hidden variable model that respects the sequential causal structure of the scenario. The idea is essentially the same as that presented in [22]. As we show in appendix C, any such hidden variable model can be seen as a special case of a quantum strategy in which all measurement operators of the same party commute. For the sequential scenario, one therefore just has to add to $\Gamma_k$ the additional linear constraints implied by the relations $[A^x_a, A^{x'}_{a'}] = 0$ and $[B^y_b, B^{y'}_{b'}] = 0$.

Applications
In the rest of this article we use our methods to tackle a number of open questions in quantum information theory. Code implementing our method in Python can be found in the GitLab repository https://gitlab.com/josephbowles/sequentialnpa.

Robust device-independent certification of more than 2 bits of local randomness
One of the most important applications of the NPA hierarchy is bounding the amount of randomness one can certify from an observed probability distribution in the device-independent setting [23,24,25,26,27,28,29,30]. A common figure of merit is the local guessing probability. It is defined as the maximum probability with which an adversary, usually called Eve, could guess the value of one of the local outputs for a fixed local input. More precisely, consider the set of tripartite probability distributions $p_{ABE}(a,b,e|x,y)$ for Alice, Bob and Eve that have a realisation in quantum theory. Here Eve has no input and the same output alphabet as Bob. Having a quantum realisation means that $p_{ABE}(a,b,e|x,y) = \langle\psi| A^x_a \otimes B^y_b \otimes E_e |\psi\rangle$ for some state and measurements; we then write $p_{ABE} \in Q$. Define $p_{AB}(a,b|x,y)$ and $p_{BE}(b,e|y)$ to be the corresponding marginal distributions of $p_{ABE}(a,b,e|x,y)$. Consider Bob's input $y = y^*$ and an observed probability distribution $P_{obs}(a,b|x,y)$. The local guessing probability is the best probability with which Eve could guess $b$ given $y = y^*$, while simultaneously reproducing $P_{obs}$ when marginalising over her output. That is,
$$G(y^*) = \max_{p_{ABE} \in Q} \sum_{e=1}^{|b|} p_{BE}(e,e|y^*) \quad \text{s.t.} \quad p_{AB}(a,b|x,y) = \sum_e p_{ABE}(a,b,e|x,y) = P_{obs}(a,b|x,y), \tag{17}$$
where $|b|$ is the size of Bob's output alphabet. To define the local guessing probability in the sequential scenario, one imposes that the distribution $p_{ABE}$ be realised by a sequential quantum strategy. The local guessing probability for Bob's input $y^*$, given an observed distribution $P_{obs}(a,b|x,y)$, then becomes
$$G(y^*) = \max_{p_{ABE}} \sum_{e} p_{BE}(e,e|y^*) \quad \text{s.t.} \quad p_{AB}(a,b|x,y) = \sum_e p_{ABE}(a,b,e|x,y) = P_{obs}(a,b|x,y), \tag{18}$$
where $e$ has the same alphabet as $b$ and $p_{ABE}$ has a sequential realisation, i.e.
$$p_{ABE}(a,b,e|x,y) = \langle\psi| A^x_a \otimes B^y_b \otimes E_e |\psi\rangle,$$
with the measurement operators $A^x_a$ and $B^y_b$ having the structure (5). In appendix D we show how upper bounds to (18) can be obtained efficiently using our hierarchy.
In the standard Bell scenario, the local guessing probability (17) is always lower bounded by $1/d^2$, where $d$ is the local Hilbert space dimension of the state used to obtain the observed correlations. This follows from the fact that extremal measurements acting on a Hilbert space of dimension $d$ have at most $d^2$ outcomes [30,31]. Hence, the amount of randomness, expressed as the min-entropy $-\log_2(G)$, is at most $2\log_2(d)$ bits. However, if one imposes the sequential structure on the local measurements, one can no longer bound the number of outcomes of extremal measurements in this way. In [13] Curchod et al. use this to obtain arbitrarily small local guessing probabilities from any two-qubit entangled pure state. Their protocol uses a single Alice and a sequence of Bobs.
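The arithmetic linking guessing probabilities and min-entropy is worth making explicit. Here is a minimal sketch that assumes only the definition $H_{min} = -\log_2 G$ used above. It evaluates the non-sequential qubit bound of two bits and the guessing probability corresponding to 2.3 bits. It also shows the value for an idealised, hypothetical case in which both of Bob's outcomes for $y^* = (1,2)$ are uniform and unpredictable, with $b_1$ binary and $b_2$ ternary. The function names are hypothetical, and the last number is an illustration rather than a certified bound.

```python
import math

def min_entropy(guessing_probability):
    """Min-entropy H_min = -log2(G), in bits."""
    return -math.log2(guessing_probability)

def nonsequential_bound_bits(d):
    """Upper bound 2*log2(d) bits, from G >= 1/d**2 in the standard scenario."""
    return min_entropy(1 / d ** 2)

print(nonsequential_bound_bits(2))  # 2.0 bits for qubits
print(2 ** -2.3)                    # G needed for 2.3 bits: about 0.203
# Idealised illustration only: six equally likely, unpredictable outcome
# pairs (b1, b2) would give G = 1/6.
print(min_entropy(1 / 6))           # log2(6), about 2.585 bits
```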
The construction in [13] has two disadvantages, however. Firstly, the number of measurements that Alice makes grows quickly with the amount of certified randomness. For example, to certify more than two bits of local randomness one needs at least 14 measurements for Alice. Secondly, the authors prove that the protocol is noise resistant in principle, but precise upper bounds on the guessing probability could not be proven for any nonzero level of noise. The method therefore cannot be used in practice. In the following we show that one can use our hierarchy to certify more than two bits of local randomness in a simple sequential scenario using only two measurements for Alice. Moreover, we use our hierarchy to calculate upper bounds to the guessing probabilities in the presence of noise, thus making the scheme experimentally relevant.
To generate the observed correlations $P_{obs}$ we consider a scenario involving one Alice and a sequence of two Bobs, which we call Bob$_1$ and Bob$_2$. Alice and Bob$_1$ share the two-qubit isotropic state (20), i.e. a mixture of $|\phi^+\rangle$ with white noise, with noise parameter $\eta$ and $|\phi^+\rangle = [|00\rangle + |11\rangle]/\sqrt{2}$. Alice performs one of two measurements given by the observables $\cos\mu\, \sigma_z \pm \sin\mu\, \sigma_x$, where $\tan\mu = \sin 2\theta$ and $\theta$ is a free parameter. Bob$_1$ performs one of two measurements. For $y_1 = 0$ he performs a projective measurement of $\sigma_z$ with Kraus operators $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$. For $y_1 = 1$ he performs the two-outcome measurement defined by the Kraus operators (21). The parameter $\theta$ controls the strength of this measurement. For $\theta = 0$ it is a projective measurement in the $x$ direction, and for $\theta = \pi/4$ it is non-interacting. Bob$_2$ performs one of three measurements. For $y_2 = 0, 1$ he performs a projective measurement of $\sigma_z$ or $\sigma_x$. For $y_2 = 2$ he performs the symmetric three-outcome POVM given by the measurement operators
$$M_{b_2} = \tfrac{1}{3}\left(\mathbb{1} + \vec{v}_{b_2} \cdot \vec{\sigma}\right),$$
where $\vec{v}_{b_2} = (\sin(\tfrac{2\pi}{3} b_2), 0, \cos(\tfrac{2\pi}{3} b_2))$. The inspiration for these measurements is the following. For $y_1 = 1$, the post-measurement state shared between Alice and Bob$_2$ will be one of two partially entangled states, depending on the value of $b_1$. The correlations obtained by performing the measurements for $x, y_2 = 0, 1$ on these states are known to self-test the corresponding states and measurements [32]. We expect, although we have not proven it, that this implies two things. First, the state shared between Alice and Bob$_1$ must be $|\phi^+\rangle$. Second, the measurement for Bob$_1$ must be (21). This essentially implies that one must have $p(b_2|y_2 = 2) = 1/3$, leading to more than two bits of randomness. In figure 3 we present upper bounds to $G(y^* = (1,2))$ obtained in this way as a function of $\eta$, with $\theta = 7\pi/32$, calculated using level 1+AB of the hierarchy. For low noise, one can surpass two bits of randomness.
Moreover, for noise levels of up to close to 4% (well within experimental reach), our strategy outperforms the non-sequential strategy. In that strategy one performs, on the same state, the measurements that maximally violate the CHSH Bell inequality. We leave a more detailed analysis of noise, including detector inefficiencies, to future work.
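Some ingredients of this strategy can be sanity-checked with a few lines of plain Python. The sketch below assumes a specific form for Bob$_1$'s $y_1 = 1$ Kraus operators: $K_0 = \cos\theta\,|+\rangle\langle+| + \sin\theta\,|-\rangle\langle-|$ and $K_1 = \sin\theta\,|+\rangle\langle+| + \cos\theta\,|-\rangle\langle-|$. This is a hypothetical choice that matches the stated limits ($\theta = 0$ projective in $x$, $\theta = \pi/4$ non-interacting) but is not taken from (21). On the noiseless state $|\phi^+\rangle$, the sketch checks that each outcome $b_1$ occurs with probability $1/2$. It also checks that each outcome leaves a partially entangled state with Schmidt coefficients $\cos\theta$ and $\sin\theta$. Finally, it verifies that the three-outcome POVM for $y_2 = 2$ sums to the identity and gives $p(b_2) = 1/3$ on a maximally mixed reduced state. It computes no guessing probabilities, and the helper names are hypothetical.

```python
import math

THETA = 7 * math.pi / 32  # value used for figure 3
S = 1 / math.sqrt(2)
PLUS, MINUS = [S, S], [S, -S]
PHI_PLUS = [S, 0.0, 0.0, S]  # basis index = 2*alice + bob

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def lin(c1, m1, c2, m2):
    return [[c1 * m1[i][j] + c2 * m2[i][j] for j in range(2)] for i in range(2)]

def weak_x_kraus(theta):
    """Hypothetical form of Bob1's y1 = 1 Kraus operators (not taken from (21))."""
    pp, mm = outer(PLUS, PLUS), outer(MINUS, MINUS)
    c, s = math.cos(theta), math.sin(theta)
    return [lin(c, pp, s, mm), lin(s, pp, c, mm)]

def apply_on_bob(k, psi):
    """(1 (x) K)|psi> for a two-qubit vector indexed as 2*alice + bob."""
    return [sum(k[b][bp] * psi[2 * a + bp] for bp in range(2))
            for a in range(2) for b in range(2)]

def schmidt_coefficients(psi):
    """Sorted singular values of the 2x2 coefficient matrix of a real state."""
    norm2 = sum(v * v for v in psi)
    det = abs(psi[0] * psi[3] - psi[1] * psi[2])
    disc = math.sqrt(max(norm2 ** 2 - 4 * det ** 2, 0.0))
    return sorted([math.sqrt((norm2 + disc) / 2), math.sqrt(max((norm2 - disc) / 2, 0.0))])

for b1, k in enumerate(weak_x_kraus(THETA)):
    out = apply_on_bob(k, PHI_PLUS)
    p_b1 = sum(v * v for v in out)
    post = [v / math.sqrt(p_b1) for v in out]
    print(b1, round(p_b1, 6), [round(x, 4) for x in schmidt_coefficients(post)])

def trine(b2):
    """M_b2 = (1 + v_b2 . sigma)/3 with v_b2 = (sin(2 pi b2/3), 0, cos(2 pi b2/3))."""
    vx, vz = math.sin(2 * math.pi * b2 / 3), math.cos(2 * math.pi * b2 / 3)
    return [[(1 + vz) / 3, vx / 3], [vx / 3, (1 - vz) / 3]]

POVM = [trine(b) for b in range(3)]
# On Bob's maximally mixed reduced state, p(b2) = tr(M_b2)/2.
print([round((m[0][0] + m[1][1]) / 2, 6) for m in POVM])
print(round(math.atan(math.sin(2 * THETA)), 4))  # Alice's mu from tan(mu) = sin(2 theta)
```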

Monogamy of nonlocality in sequential measurement scenarios
Figure 3: Blue (seq. strategy, level $Q^{1+AB}_{SEQ}$): lower bound to the local randomness as a function of the noise parameter $\eta$ for our sequential measurement strategy, obtained at level 1+AB of our sequential hierarchy. Red (CHSH strategy, NPA level 4): corresponding local randomness obtainable with the same state in a non-sequential scenario using measurements that lead to the maximal violation of the CHSH Bell inequality, obtained at level 4 of the NPA hierarchy.

Consider a scenario involving one Alice and two Bobs, where each party has two inputs and two outputs, with inputs and outputs labelled by 0 and 1. The value of the CHSH Bell functional between Alice and Bob$_1$ is
$$\mathrm{CHSH}_{AB_1} = \sum_{a,b_1,x,y_1} (-1)^{a+b_1+xy_1} P_{AB_1}(a,b_1|x,y_1),$$
where $P_{AB_1}$ is the marginal distribution between Alice and Bob$_1$. We may define the average CHSH Bell functional between Alice and Bob$_2$ as
$$\mathrm{CHSH}_{AB_2} = \frac{1}{2}\sum_{y_1} \sum_{a,b_1,b_2,x,y_2} (-1)^{a+b_2+xy_2} P(a,b_1,b_2|x,y_1,y_2),$$
i.e. the CHSH Bell functional between Alice and Bob$_2$, averaged over $b_1$ and a uniform choice of $y_1$. The values of $\mathrm{CHSH}_{AB_1}$ and $\mathrm{CHSH}_{AB_2}$ are subject to monogamy, due both to the monogamy of correlations and to the sequential measurement constraints. Silva et al. investigate this in [9]. For two-qubit systems they find an optimal trade-off (24) between $\mathrm{CHSH}_{AB_1}$ and $\mathrm{CHSH}_{AB_2}$, which can be saturated with an appropriate choice of measurements. We use the sequential NPA hierarchy to investigate this trade-off for systems of general dimension. We numerically maximise the value of $\mathrm{CHSH}_{AB_2}$ conditioned on values of $\mathrm{CHSH}_{AB_1}$ at level 1+AB of the hierarchy (see figure 4). We find that the values obtained match those of (24) up to the precision of the SDP solver. We therefore conjecture that the strategies presented in [9] are optimal in any dimension. This is somewhat surprising, since one may expect to gain an advantage from higher-dimensional systems. For example, higher dimensions would allow Bob$_1$ to communicate the values of $y_1$ and $b_1$ perfectly to Bob$_2$, which in principle could increase the value of $\mathrm{CHSH}_{AB_2}$.
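The qualitative trade-off can be illustrated for qubits with a short Heisenberg-picture computation. This is an illustration in the spirit of [9], not a reproduction of their trade-off (24), and it rests on several assumptions. Alice and the Bobs share $|\phi^+\rangle$, Alice measures $(\sigma_z \pm \sigma_x)/\sqrt{2}$, and Bob$_2$ measures $\sigma_z$ or $\sigma_x$. Bob$_1$ performs an unsharp measurement of $\sigma_z$ or $\sigma_x$ with hypothetical Kraus operators $K_\pm = \cos\varphi\,\Pi_\pm + \sin\varphi\,\Pi_\mp$. Bob$_1$'s effective observable is then $K_+^2 - K_-^2$. Bob$_2$'s effective observable is obtained by applying Bob$_1$'s instrument in the Heisenberg picture, summing over $b_1$ and averaging over a uniform $y_1$, as in the definition of $\mathrm{CHSH}_{AB_2}$. For this family the code reproduces $\mathrm{CHSH}_{AB_1} = 2\sqrt{2}\cos 2\varphi$ and $\mathrm{CHSH}_{AB_2} = \sqrt{2}(1 + \sin 2\varphi)$, so both exceed 2 for intermediate $\varphi$. It covers only this qubit family and is not an SDP bound.

```python
import math

S = 1 / math.sqrt(2)
EYE = [[1.0, 0.0], [0.0, 1.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def lin(c1, m1, c2, m2):
    return [[c1 * m1[i][j] + c2 * m2[i][j] for j in range(2)] for i in range(2)]

def unsharp_kraus(obs, phi):
    """Hypothetical unsharp measurement of obs: K_pm = cos(phi) P_pm + sin(phi) P_mp."""
    p_plus, p_minus = lin(0.5, EYE, 0.5, obs), lin(0.5, EYE, -0.5, obs)
    c, s = math.cos(phi), math.sin(phi)
    return [lin(c, p_plus, s, p_minus), lin(s, p_plus, c, p_minus)]

def correlator(a_obs, b_obs):
    """<phi+| A (x) B |phi+> = tr(A^T B)/2 for real 2x2 observables."""
    return sum(a_obs[i][j] * b_obs[i][j] for i in range(2) for j in range(2)) / 2

ALICE = [lin(S, Z, S, X), lin(S, Z, -S, X)]
BOB_OBS = [Z, X]

def chsh(bob):
    e = [[correlator(ALICE[x], bob[y]) for y in range(2)] for x in range(2)]
    return e[0][0] + e[0][1] + e[1][0] - e[1][1]

def tradeoff(phi):
    instruments = [unsharp_kraus(obs, phi) for obs in BOB_OBS]  # y1 = 0, 1
    # Bob1's effective observables K_+ K_+ - K_- K_- (Kraus operators are real symmetric).
    bob1 = [lin(1.0, matmul(k[0], k[0]), -1.0, matmul(k[1], k[1])) for k in instruments]
    # Bob2's effective observables: sum over b1 of K O K, averaged over uniform y1.
    bob2 = []
    for obs in BOB_OBS:
        acc = [[0.0, 0.0], [0.0, 0.0]]
        for kraus in instruments:
            for k in kraus:
                acc = lin(1.0, acc, 0.5, matmul(matmul(k, obs), k))
        bob2.append(acc)
    return chsh(bob1), chsh(bob2)

for phi in (0.0, 0.3, math.pi / 4):
    c1, c2 = tradeoff(phi)
    print(f"phi={phi:.3f}  CHSH_AB1={c1:.4f}  CHSH_AB2={c2:.4f}")
```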

Tight bounds on sequential Bell inequalities
In [8] Gallego et al. present a Bell inequality (see equation 51 therein) that defines a facet of the set of correlations that admit a sequential time-ordered local model. The scenario involves one Alice and two Bobs, with each party performing one of two dichotomic measurements. The inequality is constructed from a set of correlators (25), combined into a Bell expression (26). For sequential time-ordered local correlations this expression is bounded by 2. The authors show that it is possible to violate the inequality up to a value of $2\sqrt{2}$ using a sequential quantum strategy. This provides a lower bound to the maximum violation achievable with sequential quantum strategies. Using our hierarchy at level 1+AB, we are able to certify a corresponding upper bound that agrees with the value $2\sqrt{2}$ up to the precision of the SDP solver. We therefore expect that the strategy given in [8] is optimal for this inequality.

Figure 4: Maximal value of $\mathrm{CHSH}_{AB_2}$ as a function of $\mathrm{CHSH}_{AB_1}$. Curves: optimal (qubits); SEQ hierarchy level 1+AB; NPA hierarchy level 2. The values obtained at level 1+AB of the sequential hierarchy match the optimal values for qubit strategies found in [9]. To show the effect of our new constraints, we plot the same bounds obtained via the standard NPA hierarchy at level 2, treating the two Bobs as a single party.

Discussion
We have presented a general method to bound sets of correlations arising from performing sequential measurements on entangled quantum states. Our techniques can be seen as part of a collection of works that extend the original applicability of the NPA hierarchy to scenarios of restricted dimension [33,34] and entanglement [18], classicality [22], and modified causality [16].
We note that the techniques described in [16] can in principle deal with the sequential causal structures considered in this work. More specifically, one could use their method to treat 'quantum exogenous' variables by explicitly using the unitaries in (13) as operators in the generating set $S_k$ and defining the resulting relaxation. This method is significantly less efficient, however, since one needs to go to high levels (with large moment matrices) of the corresponding relaxation, and no convergence properties are proven. Given these points, it would be interesting to study whether our method could be extended to other causal scenarios, or be used to improve the efficiency of the method in [16]. For example, can our method be applied to give a convergent hierarchy for a generic causal structure involving latent quantum variables?
The NPA hierarchy is often used as a numerical method to bound fidelities in self-testing protocols [35]. One avenue of research would therefore be to investigate whether sequential measurement scenarios can improve self-testing fidelity bounds, by adapting the current method to our hierarchy. Another would be to investigate the self-testing of quantum channels, to which sequential measurement scenarios are naturally related. Finally, it would also be interesting to use our method to investigate to what extent sequential measurements can improve other device-independent protocols. For example, can our advantages in local guessing probability be translated into practical improvements in the rates of randomness extraction or quantum key distribution protocols?

A Stinespring dilation of sequential measurements
Consider the first measurement of a sequence, with Kraus operators $\{K^{x_1}_{a_1,\mu_1}\}$. This measurement can be realised as follows. Introduce ancilla spaces $A_1$ and $A'_1$ and the ancilla state $|0\rangle|0\rangle$. Define an operator $U^{x_1}_1$ via its action on the state $|\psi\rangle|0\rangle|0\rangle$ as
$$U^{x_1}_1 |\psi\rangle|0\rangle|0\rangle = \sum_{a_1} \int \mathrm{d}\mu_1\, K^{x_1}_{a_1,\mu_1}|\psi\rangle |a_1\rangle|\mu_1\rangle.$$
One has $\langle\phi|\langle 0|\langle 0| U^{x_1\dagger}_1 U^{x_1}_1 |\psi\rangle|0\rangle|0\rangle = \langle\phi|\psi\rangle$ for all $|\psi\rangle, |\phi\rangle$. It follows that $U^{x_1}_1$ can be extended to a unitary operator acting on the full space. Measure the $A_1$ space in the $\{|a_1\rangle\}$ basis, obtaining outcome $a_1$. Conditioning on outcome $a_1$ and tracing out the $A_1$ and $A'_1$ spaces, one finds (4). We have thus reproduced the first measurement in the sequence. Introducing a fresh ancilla and repeating this for the second measurement in the sequence, we find
$$A^{x_1x_2}_{a_1a_2} = U^{x_1\dagger}_1 U^{x_2\dagger}_2 \left(\mathbb{1} \otimes \Pi_{a_1} \otimes \Pi_{a_2}\right) U^{x_2}_2 U^{x_1}_1,$$
where the $\Pi_{a_i}$'s are projectors onto the corresponding spaces. The full measurement $A^x_a$ is thus projective. We may repeat this process for a sequence of arbitrary length, and hence $A^x_a$ can be taken to be projective without loss of generality.

B Detailed proof of fact 1
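Here we give a proof of the reverse direction of Fact 1, in which we explicitly model the communication channel in the Kraus operators. Enlarge the system via an ancilla state so that the full state is $|\psi\rangle \otimes |0\rangle$. This ancilla space will be used as a communication channel in the following. The first device performs a measurement with Kraus operators $K^{x_1}_{a_1} = \sum_{a_2} A^{x_1x_2}_{a_1a_2} \otimes V_{x_1}$, where $V_{x_1}$ is a unitary on the ancilla satisfying $V_{x_1}|0\rangle = |x_1\rangle$. These operators are independent of $x_2$ due to (11). The second device measures the (projective) Kraus operators $K^{x_2}_{a_2} = \sum_{x_1,a_1} A^{x_1x_2}_{a_1a_2} \otimes |x_1\rangle\langle x_1|$. The measurement operator describing the full sequence, restricted to the initial ancilla state $|0\rangle$, is therefore
$$(\mathbb{1}\otimes\langle 0|)\, K^{x_1\dagger}_{a_1} K^{x_2\dagger}_{a_2} K^{x_2}_{a_2} K^{x_1}_{a_1}\, (\mathbb{1}\otimes|0\rangle) = \Big(\sum_{a'_2} A^{x_1x_2}_{a_1a'_2}\Big)\Big(\sum_{a'_1} A^{x_1x_2}_{a'_1a_2}\Big)\Big(\sum_{a''_2} A^{x_1x_2}_{a_1a''_2}\Big) = A^{x_1x_2}_{a_1a_2}.$$
The resulting correlations are as desired.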

C Hierarchy for time ordered local correlations
Here we show how to modify our hierarchy for sequential quantum correlations, introduced in the main text, in order to approximate the set of time ordered local correlations. Following [8], we say that the correlations from a Bell scenario are time ordered local if they can be described by the model
$$P(a,b|x,y) = \int_\lambda \mathrm{d}\lambda\, \rho(\lambda)\, p(a|x,\lambda)\, p(b|y,\lambda), \tag{32}$$
where, for all values of $\lambda$, the distribution $p(a|x,\lambda)$ satisfies the sequential no-signalling constraint
$$\sum_{a_{k+1},\dots,a_n} \left[ p(a|x,\lambda) - p(a|x',\lambda) \right] = 0 \quad \forall a_1,\dots,a_k, \quad \forall x, x' \text{ s.t. } x_j = x'_j \text{ for } j \le k, \tag{33}$$
and similarly for $p(b|y,\lambda)$. Correlations of the above form are the only ones that can be achieved with classical means in a sequential Bell scenario. It is well known that, by using the constraints in (33), the model (32) can be reduced to a sum over deterministic strategies, namely
$$P(a,b|x,y) = \sum_\lambda p(\lambda)\, D_{SEQ}(a|x,\lambda)\, D_{SEQ}(b|y,\lambda), \tag{34}$$
where the deterministic probability distributions split into a product,
$$D_{SEQ}(a|x,\lambda) = \prod_{k=1}^n D(a_k|x_1,\dots,x_k,\lambda).$$
Here $D(a_k|x_1,\dots,x_k,\lambda)$ corresponds to deterministically outputting $a_k = \lambda(x_1,\dots,x_k)$. The output depends on the strategy given by $\lambda(\cdot)$ and on the inputs of all previous boxes in the sequence, and similarly for Bob's strategy.
Determining whether a given distribution admits a decomposition of this form is an instance of linear programming. Indeed, it amounts to checking whether the distribution can be written as a convex combination of a finite number of extremal points. These points are given by all possible choices of deterministic strategies $D_{SEQ}(a|x,\lambda)$, $D_{SEQ}(b|y,\lambda)$. This linear program quickly becomes computationally intractable, since the number of extremal points increases exponentially with the number of inputs. Moreover, for each additional box in the sequence, the scaling is even worse than in the equivalent multipartite locality scenario, because the possible strategies for each box depend on the inputs of all the previous boxes.
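To see how quickly the linear program grows, the sketch below enumerates the deterministic sequential strategies of a single party, where the $k$th output is a function of $(x_1,\dots,x_k)$. It assumes $m$ inputs and $d$ outputs per step and $n$ steps. The number of such strategies is $\prod_{k=1}^n d^{m^k}$. For comparison, an unconstrained box with $m^n$ inputs and $d^n$ outputs has $(d^n)^{m^n}$ deterministic strategies. By (34), the extremal points of the linear program are products of one such strategy for Alice and one for Bob. The function names are hypothetical.

```python
from itertools import product

def sequential_strategies(m, d, n):
    """Enumerate deterministic sequential strategies for one party: step k is a
    lookup table from the input prefix (x_1, ..., x_k) to the output a_k."""
    tables = []
    for k in range(1, n + 1):
        prefixes = list(product(range(m), repeat=k))
        tables.append([dict(zip(prefixes, outs))
                       for outs in product(range(d), repeat=len(prefixes))])
    return list(product(*tables))

def respond(strategy, x):
    """Output string a = (a_1, ..., a_n) for the input string x."""
    return tuple(table[x[:k + 1]] for k, table in enumerate(strategy))

def count_sequential(m, d, n):
    return d ** sum(m ** k for k in range(1, n + 1))

def count_unrestricted(m, d, n):
    """Deterministic boxes with m**n inputs and d**n outputs, ignoring (33)."""
    return (d ** n) ** (m ** n)

strategies = sequential_strategies(2, 2, 2)
print(len(strategies), count_sequential(2, 2, 2), count_unrestricted(2, 2, 2))  # 64 64 256
# Every enumerated strategy obeys (33): a_1..a_k depend only on x_1..x_k.
xs = list(product(range(2), repeat=2))
ok = all(respond(s, x)[:k] == respond(s, xp)[:k]
         for s in strategies for k in (1, 2) for x in xs for xp in xs if x[:k] == xp[:k])
print(ok)
```

Adding a third step with $m = d = 2$ already gives $2^{14} = 16384$ strategies per party, and the count for the joint Alice and Bob polytope is the square of this.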
That is why we are interested in relaxing the linear program to an SDP, in a similar spirit to [22]. In particular, the objective is a way of determining whether a distribution is sequentially local that, despite being a relaxation, works in many relevant cases and scales better with the number of inputs and boxes. In the following we show how to do this by adapting our sequential hierarchy. The first step is to find a particular realisation of sequentially local correlations in terms of quantum measurements on a quantum state. Namely, we look for a realisation of the form $p(a,b|x,y) = \mathrm{tr}(\rho_{AB}\, A^x_a \otimes B^y_b)$.
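Now, it can easily be checked that correlations of the form (34) can be reproduced by the following choice of state, and of measurements for Alice and Bob respectively:
$$\rho_{AB} = \sum_\lambda p(\lambda)\, |\lambda\rangle\langle\lambda| \otimes |\lambda\rangle\langle\lambda|, \qquad A^x_a = \sum_\lambda D_{SEQ}(a|x,\lambda)\, |\lambda\rangle\langle\lambda|, \qquad B^y_b = \sum_\lambda D_{SEQ}(b|y,\lambda)\, |\lambda\rangle\langle\lambda|.$$
It is also easy to verify that measurements of this form satisfy the constraints (10) and (11). In particular, the second property follows directly from the fact that the deterministic strategies $D_{SEQ}(a|x,\lambda)$ and $D_{SEQ}(b|y,\lambda)$ satisfy the no-signalling condition (33). Moreover, since all measurement operators are diagonal in the $\{|\lambda\rangle\}$ basis, it follows that $[A^x_a, A^{x'}_{a'}] = 0$ and $[B^y_b, B^{y'}_{b'}] = 0$.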
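The diagonal realisation can be checked explicitly for a small case. The sketch below treats a single party with $m = d = 2$ and a sequence of $n = 2$ boxes. It stores each $A^x_a = \sum_\lambda D_{SEQ}(a|x,\lambda)|\lambda\rangle\langle\lambda|$ by its diagonal, with one basis vector per deterministic sequential strategy, and verifies (10) and (11). Commutation holds trivially because all operators are diagonal in the same basis. The helper names are hypothetical.

```python
from itertools import product

M, D, N = 2, 2, 2  # inputs per step, outputs per step, number of steps

def sequential_strategies(m, d, n):
    tables = []
    for k in range(1, n + 1):
        prefixes = list(product(range(m), repeat=k))
        tables.append([dict(zip(prefixes, outs))
                       for outs in product(range(d), repeat=len(prefixes))])
    return list(product(*tables))

LAMBDAS = sequential_strategies(M, D, N)  # one basis vector |lambda> per strategy
XS = list(product(range(M), repeat=N))
OUTS = list(product(range(D), repeat=N))

def respond(lam, x):
    return tuple(lam[k][x[:k + 1]] for k in range(N))

# A^x_a = sum_lambda D_SEQ(a|x, lambda) |lambda><lambda|, stored as its diagonal.
A = {(x, a): [1 if respond(lam, x) == a else 0 for lam in LAMBDAS]
     for x in XS for a in OUTS}

def satisfies_10():
    """0/1 diagonals with exactly one outcome per (x, lambda): orthogonal projectors."""
    return all(sum(A[(x, a)][i] for a in OUTS) == 1
               for x in XS for i in range(len(LAMBDAS)))

def satisfies_11():
    """Marginals over the last N-k outcomes agree for inputs with equal length-k prefixes."""
    for k in range(1, N):
        for x, xp in product(XS, repeat=2):
            if x[:k] != xp[:k]:
                continue
            for prefix in product(range(D), repeat=k):
                lhs = [sum(A[(x, a)][i] for a in OUTS if a[:k] == prefix)
                       for i in range(len(LAMBDAS))]
                rhs = [sum(A[(xp, a)][i] for a in OUTS if a[:k] == prefix)
                       for i in range(len(LAMBDAS))]
                if lhs != rhs:
                    return False
    return True

# Diagonal matrices always commute, so [A^x_a, A^x'_a'] = 0 holds automatically.
print(len(LAMBDAS), satisfies_10(), satisfies_11())
```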
In other words, the set of time ordered local correlations can be obtained by means of locally commuting sequential quantum measurements. These commutativity conditions imply additional linear constraints on the moment matrix elements, expressed by some fixed matrices $G^{LOC}_i$. We can thus define the following hierarchy.
Hierarchy for sequential local correlations (level k):
$$\text{find } \Gamma_k \quad \text{s.t.} \quad \Gamma_k \succeq 0, \quad (\Gamma_k)_{11} = 1, \quad \mathrm{tr}[\Gamma_k G_i] = \mathrm{tr}[\Gamma_k G^{SEQ}_i] = \mathrm{tr}[\Gamma_k G^{LOC}_i] = 0 \;\; \forall i, \quad \mathrm{tr}[\Gamma_k F_j] = P_j \;\; \forall j.$$
We call $L^k_{SEQ}$ the set defined at level $k$ of this hierarchy. By construction, each $L^k_{SEQ}$ defines an outer approximation of the set of time ordered local correlations. Replacing a linear programming characterisation of the exact set with an SDP relaxation gives a clear computational advantage. At each fixed level $k$, the number of variables in the moment matrix $\Gamma_k$ scales polynomially with the number of input choices for $x_1, \dots, x_n$ and $y_1, \dots, y_n$. The linear program, in contrast, scales exponentially. This may allow one to probe scenarios that would otherwise be practically impossible to handle with linear programming methods.