Probing the Non-Classicality of Temporal Correlations

Correlations between spacelike separated measurements on entangled quantum systems are stronger than any classical correlations and are at the heart of numerous quantum technologies. In practice, however, spacelike separation is often not guaranteed and we typically face situations where measurements have an underlying time order. Here we aim to provide a fair comparison of classical and quantum models of temporal correlations on a single particle, as well as timelike-separated correlations on multiple particles. We use a causal modeling approach to show, in theory and experiment, that quantum correlations outperform their classical counterpart when allowed equal, but limited communication resources. This provides a clearer picture of the role of quantum correlations in timelike separated scenarios, which play an important role in foundational and practical aspects of quantum information processing.

Specifically, one research direction has been to study how limited resources such as entropy [9], memory [10][11][12] or dimension [13][14][15] can lead to a quantum advantage over classical models. On the other hand, one can relax the local causality assumption in Bell's theorem, aiming to explain quantum correlations by classical models augmented with communication [16][17][18][19][20][21][22][23][24][25]. However, communication is typically only allowed for the classical system, leading to an unfair comparison between classical and quantum models, and it remains unclear to what extent these results hold when equal communication power is given to quantum mechanics [23].
Here we use a causal modeling approach to allow a fair comparison of classical and quantum models of timelike separated correlations. First, we show that for two time-ordered spatially separated measurements, augmented with a limited amount of classical communication, quantum correlations outperform their classical counterpart. Second, we show that, in contrast to spatial Bell-type scenarios [26], there are faithful classical causal models reproducing all the temporal correlations obtained from a series of projective quantum measurements on a single quantum system. However, we find that non-classical correlations can arise in this scenario when considering a slightly weaker classical causal model that is nonetheless strictly stronger than previous results [27], which are contained as a special case. Finally, we derive Bell-type inequalities for the above scenarios and demonstrate that they are violated in a photonic experiment.
1 Causal modeling and timelike Bell scenarios

In the following we employ the formalism of Bayesian networks [28], which provides a natural framework for classical causal modeling. A central concept in this framework is that of a directed acyclic graph (DAG), which consists of a set of nodes, representing the relevant random variables (see footnote 1) in the considered situation, and directed edges, representing the causal relations between those variables. A set of variables X_1, ..., X_n forms a Bayesian network with respect to some DAG if and only if the probability distribution p(x_1, ..., x_n) can be decomposed as

$$p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid pa_i), \tag{1}$$

where PA_i stands for the set of graph-theoretical parents of the variable X_i (i.e. all variables that have a direct causal influence over X_i). Without loss of generality each variable can be understood as a deterministic function of its parents plus local noise U_i that supplies potential randomness, x_i = f_i(pa_i, u_i). This formalism thus enables a distinction between simple statistical correlations and actual causation by explicitly specifying the underlying mechanism generating the data.
Here we are interested in DAGs containing so-called latent variables, which are empirically inaccessible. In the context of Bell's theorem [1] these are also known as hidden variables. For any set of observed correlations, there are in general many DAGs with hidden variables that could have produced these observations. Among these, causal inference is particularly interested in those fulfilling the conditions of minimality and faithfulness. Minimality requires that, given two possible causal models, we choose the simpler one, that is, the one capable of generating the smallest set of correlations (including the observed one). Faithfulness, in turn, requires the causal model to be able to explain the observed data without resorting to fine-tuning of the causal-statistical parameters. In other words, any observed (conditional) independence should be a consequence of the causal structure itself, rather than of a specific choice of parameters. Faithful (i.e. non-fine-tuned) models are therefore robust against changes in the causal parameters and thus the preferred choice.
To illustrate the last point, consider the paradigmatic causal structure of Bell's theorem in Fig. 1a. This structure intuitively reflects the causal assumptions of Bell's theorem, leading to the so-called local hidden-variable (LHV) models. First, the two parties are assumed to be spacelike separated, such that the correlations between the measurement outcomes A and B can only be mediated via a common source Λ, implying that p(a, b|x, y, λ) = p(a|x, λ)p(b|y, λ). Second, it is assumed that the experimenters can freely choose which observables to measure (represented by the random variables X and Y), independently of how the system was prepared, that is, p(x, y, λ) = p(x, y)p(λ). Note that these constraints implied by the causal model appear at an unobservable level since they explicitly involve the hidden variables Λ. Yet, they also imply observable constraints in the form of no-signaling conditions, expressed as p(a|x, y) = p(a|x) and p(b|x, y) = p(b|y), and Bell inequalities [1,29].

Footnote 1: We adopt the standard convention that uppercase letters label random variables while their values are denoted in lower case.

Figure 1: (a) Two observers, Alice and Bob, each have the choice of two measurements represented by the random variables X and Y, respectively. The correlations between their measurement outcomes, modeled as random variables A and B, respectively, are mediated solely by a common cause in their past, the hidden variable Λ. (b) Bell's causal model augmented with one-way communication from Alice to Bob. The initial state of the joint system is specified by the ontic state Λ. First, Alice performs a measurement with setting x, obtaining outcome a. She then sends a message m to Bob, who performs a measurement with setting y, obtaining outcome b.
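As a minimal sketch of how the LHV assumptions translate into an observable Bell inequality, the following Python snippet enumerates all deterministic local strategies for two settings per party and evaluates one CHSH-type combination of correlators. The choice of the CHSH expression and its sign convention are assumptions made for illustration; they are not taken from the text above. Because every LHV distribution is a convex mixture of deterministic strategies, the maximum over this finite list bounds all LHV models.

```python
from itertools import product

def chsh_value(a_of_x, b_of_y):
    """CHSH sum for one deterministic local strategy.

    a_of_x[x] and b_of_y[y] are outcomes in {0, 1}; the correlator for a
    setting pair is (-1)**(a + b). The (x, y) = (1, 1) term enters with a
    minus sign (an assumed, conventional choice).
    """
    total = 0
    for x, y in product((0, 1), repeat=2):
        corr = (-1) ** (a_of_x[x] + b_of_y[y])
        sign = -1 if (x, y) == (1, 1) else 1
        total += sign * corr
    return total

# A deterministic strategy is a pair of lookup tables x -> a and y -> b.
strategies = [(a, b)
              for a in product((0, 1), repeat=2)
              for b in product((0, 1), repeat=2)]

local_bound = max(chsh_value(a, b) for a, b in strategies)
print(local_bound)  # 2
```

Any value above this bound is therefore incompatible with the causal structure of Fig. 1a.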
While quantum correlations obey the no-signaling conditions, they violate Bell inequalities [1,29] and are thus in conflict with the assumptions behind the causal structure in Fig. 1a. In order to maintain a classical causal explanation, the model in Fig. 1a must therefore be augmented with additional resources; something that can only be done at the cost of introducing fine-tuning [26]. For instance, the causal structure in Fig. 1b can reproduce all quantum correlations, but at the same time allows, in principle, for non-local correlations between X and B. Hence, in order to satisfy the no-signaling condition p(b|x, y) = p(b|y) the causal parameters must be chosen from a set of measure zero [28], a signature of fine-tuning.
Studying such non-local classical models can provide valuable insights into the relation between classical and quantum theory, and their applications [2]. However, at the same time such models lead to an unfair comparison, since allowing for communication makes not only classical, but also quantum models more powerful. In practice, it is more natural to assume a certain underlying causal structure, and ask what can be achieved with classical and quantum resources? Bell's theorem is a particular case of this broader question, referring to spacelike separated events. However, there are often situations where the events are timelike rather than spacelike ordered. Examples include central quantum information tasks, such as teleportation [30], superdense coding [31], and measurement-based quantum computation [32], as well as prepare-and-measure scenarios [13], sequential Bell scenarios [33,34] and a sequence of measurements on a single quantum system [27].
2 Non-classicality of timelike correlations augmented by communication

Consider the scenario in Fig. 1b, where two distant parties, Alice and Bob, share pre-established correlations (represented by Λ) and are allowed one-way communication (the message M). As shown in Ref. [16], a classical model of this form (for a large enough message M) is enough to reproduce all the correlations obtainable from local measurements on two-qubit entangled states as in Fig. 1a, which are described by

$$p(a, b \mid x, y) = \mathrm{Tr}\left[\left(M_a^x \otimes M_b^y\right)\rho\right], \tag{2}$$

where M_a^x and M_b^y are measurement operators for Alice and Bob, respectively, and ρ is the density matrix describing the shared quantum state. If, in contrast, we impose the causal structure of Fig. 1b also on the quantum case, that is, Bob's measurement may depend on Alice's measurement setting and outcome, then the set of correlations is described by

$$p(a, b \mid x, y) = \mathrm{Tr}\left[\left(M_a^x \otimes M_b^{y,x,a}\right)\rho\right]. \tag{3}$$

Note that Bob's measurement operator now explicitly depends on the values of X and A (via the message M). Clearly, if there are no restrictions on the dimensionality of the message, every distribution of the form above can also be obtained by the classical hidden-variable model in Fig. 1b, whose decomposition is referred to as Eq. (4). In fact, it is enough to choose m = x to reproduce all possible one-way signalling correlations. To see this, note that the quantum correlations arising from Eq. (3) are of the form p(a, b|x, y) = p(b|x, y, a)p(a|x), and that a can be made a deterministic function a = f_a(x, λ). Hence a does not carry any information that is not already contained in λ and x.
Notwithstanding, this picture changes if we impose restrictions on the message sent from Alice to Bob. Consider that each party measures three dichotomic observables (i.e. x, y = 0, 1, 2 and a, b = 0, 1) and that Alice is bound to send a binary message (m = 0, 1). In this case, every classical model must obey the inequality of Ref. [16], referred to as inequality (5) in the following. Furthermore, it was shown in Ref. [16] that this inequality also holds for correlations from local measurements on entangled quantum states, while it can be violated by more powerful no-signalling correlations. Hence, while one bit of communication is in this scenario sufficient for a classical model to simulate quantum correlations (without communication), it is not sufficient to simulate all possible no-signalling correlations. However, just as communication-augmented classical models become more powerful, so do quantum models. Specifically, local measurements on entangled quantum states augmented with one bit of communication can indeed violate inequality (5) [23], thus showing that under fair comparison, quantum advantage persists in such a timelike-separated Bell scenario.

Figure 2: The two photons are entangled to an arbitrary degree using a non-deterministic controlled-NOT gate (CNOT), based on nonclassical interference in a partially polarizing beam splitter (PPBS) [35]. Alice and Bob then perform local projective measurements on their share of the entangled state, which are implemented using a set of half-waveplates (HWP) and quarter-waveplates (QWP), a Glan-Taylor polarizer (GT) and single-photon counters (APD).
As an example, consider that Alice and Bob share a maximally entangled state |Ψ^+⟩ = (|00⟩ + |11⟩)/√2. Alice performs local measurements with settings A_0 = A_1 = X̂, A_2 = Ẑ and encodes her measurement setting in a message m to Bob. For x = 0 she sends m = 0 and for x = 1 or x = 2 she sends m = 1. Assuming that all inputs are equally likely, this message has an entropy of H(m) ≈ 0.92 and thus contains less than 1 bit of information. If Bob receives m = 0, he […]

Experimentally, we can test inequality (5) with qubits encoded in the polarization of single photons, see Fig. 2. Two single photons are first entangled using a non-deterministic controlled-NOT gate, and then distributed to Alice and Bob. By varying the input states this configuration can produce states with an arbitrary degree of entanglement, quantified by the concurrence C [36], see Appendix B for more details. For simplicity, the message m has been directly taken into account in Bob's measurement basis. The experimental results in Fig. 3 show a clear violation of inequality (5), up to S_1bit = 6.66 ± 0.02.

Figure 3: Experimental test of inequality (5) for temporally-separated spatial correlations. The inequality is tested for a family of states with varying degree of entanglement as quantified by the concurrence. Data points include 3σ statistical error bars and the theory prediction is shown as solid lines. Data points in the grey region are compatible with inequality (5), while it is violated for data in the white region. Blue data and theory correspond to the fixed measurement scheme outlined in the text, whereas the orange data and theory are obtained when the measurement settings are optimized for a given non-maximally entangled state, see Appendix B for more details.
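The quoted message entropy can be checked with a few lines of Python. This is a minimal sketch assuming, as stated in the text, three equally likely settings x and the encoding m = 0 for x = 0, m = 1 otherwise; the helper name `shannon_entropy` is chosen here for illustration.

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Uniform x in {0, 1, 2}: m = 0 occurs with probability 1/3, m = 1 with 2/3.
p_m = [1 / 3, 2 / 3]
h_m = shannon_entropy(p_m)
print(round(h_m, 2))  # 0.92
```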
3 Classical causal models for a sequence of projective measurements

Above we have analyzed the correlations between two timelike separated parties, whose measurements follow an underlying time order. We will now focus on a series of projective measurements performed on a single quantum system. A measurement with setting x, obtaining outcome a, is described by a set of projective operators M_a^x, such that after the measurement the system, initially in state ρ_1, is left in a state given by

$$\rho_2 = \rho_a^x = M_a^x.$$

In the next time step a measurement with setting y, producing outcome b, is performed, leaving the system in a state ρ_3 = ρ_b^y = M_b^y, and so forth.
Similar to a standard Bell experiment, the classical causal description of this scenario, illustrated in Fig. 4, involves a random variable Λ_1, the ontic state [37], which fully specifies the initial state of the system. The probability that a measurement x produces the outcome a is then given by p(a|x) = Σ_{λ_1} p(a|x, λ_1)p(λ_1). Here we have explicitly used the measurement independence assumption [22,38] p(λ_1|x) = p(λ_1), which states that the measurement setting can be chosen independently of how the system is prepared. After the measurement, the system will be in a potentially different state Λ_2. For a projective measurement, Λ_2 is fully specified by the setting and outcome of the preceding measurement, and does not depend directly on the pre-measurement state Λ_1, that is, p(λ_2|x, a, λ_1) = p(λ_2|x, a). This implies that all correlations between Λ_1 and Λ_2 are mediated via the measurement. To see this, note that the quantum probability distribution after a series of three sequential measurements is given by

$$p(a, b, c \mid x, y, z) = \mathrm{Tr}\left[M_a^x \rho_1\right]\, \mathrm{Tr}\left[M_b^y \rho_a^x\right]\, \mathrm{Tr}\left[M_c^z \rho_b^y\right]. \tag{6}$$

In particular, any potential correlation between the measurement outcome c in the third time step and the measurement setting and outcome in the first time step (x and a, respectively) is screened off by the intermediate measurement setting and outcome (y and b), that is, p(c|a, b, x, y, z) = p(c|b, y, z). This is similar to the no-signalling constraints arising in a Bell scenario and thus imposes restrictions on the classical causal models describing such a scenario. Specifically, a causal model that reproduces this independence without resorting to fine-tuning cannot contain a causal link of the form Λ_2 → Λ_3, since such a link can generate unwanted correlations between the variables X, A and C (not mediated by Y, B).
In fact, any model that contains such a link, can only satisfy the condition p(c|a, b, x, y, z) = p(c|b, y, z) by virtue of causal parameters chosen from a set of measure zero [28], that is, the parameters are fine-tuned in such a way that these correlations are hidden from the observational data [26].
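The screening-off property can be illustrated numerically. The sketch below uses the Bloch-vector form of the Born rule for qubit projective measurements: for a state with Bloch vector r and a measurement along the unit vector n, outcome s = ±1 occurs with probability (1 + s n·r)/2 and leaves the state with Bloch vector s n. The specific angles, the two choices of first setting, and all function names are arbitrary assumptions made for this example.

```python
from itertools import product
from math import cos, sin, isclose

def bloch(theta, phi):
    """Unit Bloch vector for polar angle theta and azimuth phi."""
    return (sin(theta) * cos(phi), sin(theta) * sin(phi), cos(theta))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def joint(r0, nx, ny, nz):
    """p(a, b, c) for sequential projective measurements along nx, ny, nz."""
    p = {}
    for a, b, c in product((0, 1), repeat=3):
        sa, sb, sc = (-1) ** a, (-1) ** b, (-1) ** c
        pa = (1 + sa * dot(nx, r0)) / 2        # Born rule on the initial state
        pb = (1 + sb * sa * dot(ny, nx)) / 2   # state after step 1 is sa * nx
        pc = (1 + sc * sb * dot(nz, ny)) / 2   # state after step 2 is sb * ny
        p[a, b, c] = pa * pb * pc
    return p

def cond_c(p, a, b, c):
    """p(c | a, b) computed from the joint distribution."""
    return p[a, b, c] / (p[a, b, 0] + p[a, b, 1])

r0 = bloch(0.7, 0.3)                             # arbitrary pure initial state
x_settings = [bloch(1.1, 0.4), bloch(2.3, 1.9)]  # two choices of first setting
ny, nz = bloch(2.0, 1.3), bloch(0.5, 2.2)        # fixed second and third setting

# For every (b, c), p(c | a, b, x, y, z) should not depend on a or x.
screened = True
for b, c in product((0, 1), repeat=2):
    values = [cond_c(joint(r0, nx, ny, nz), a, b, c)
              for nx in x_settings for a in (0, 1)]
    screened = screened and all(isclose(v, values[0]) for v in values)
print(screened)
```

The check only covers the sampled settings; it is an illustration of the independence relation, not a proof.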
Following the above description, the case of 3 sequential projective measurements on a single qubit can be represented in terms of the causal structure in Fig. 4. The temporal correlations p(a, b, c|x, y, z) compatible with this causal structure can then be decomposed as (with straightforward generalization to more time steps)

$$p(a, b, c \mid x, y, z) = \sum_{\lambda_1, \lambda_2, \lambda_3} p(a \mid x, \lambda_1)\, p(b \mid y, \lambda_2)\, p(c \mid z, \lambda_3)\, p(\lambda_1)\, p(\lambda_2 \mid x, a)\, p(\lambda_3 \mid y, b). \tag{7}$$
Note that without any restrictions on the dimensionality the hidden states Λ_i can contain the full information about the measurement settings and outcomes of the previous time steps. This implies that, in contrast to the spacelike Bell scenario, all temporal correlations of the form of Eq. (6) can be faithfully reproduced by the classical causal model in Fig. 4. This naturally raises the question of whether further restrictions on the classical causal model might reveal a quantum advantage. Similar to the so-called prepare-and-measure scenarios [13], one might expect that quantum systems of a given dimension give rise to measurement statistics that cannot be reproduced by classical systems Λ_i of the same dimension. Since restrictions on the dimension of Λ_i in models of the form of Fig. 4 would lead to non-convex sets that are technically very challenging to characterize [39,40], one typically considers convex relaxations. The resulting models contain the model of interest as a special case, but allow for shared randomness between the parties. For example, using the results of Ref. [17] we show in Appendix A that classical hidden states of dimension 4 (two bits of classical information), together with shared randomness between the parties, are enough to reproduce all correlations from a series of projective measurements on a single qubit. For a fair comparison to a qubit, we then considered hidden states of dimension two and could not find a difference between quantum and classical correlations. In light of these results it would be very interesting to test the model of Fig. 4 with two-dimensional hidden states and no shared randomness. Due to the complexity of characterizing such non-convex sets, however, this remains an open question.

Figure 4: A sequence of three projective measurements on a single qubit. (a) A single quantum system is subject to a sequence of projective measurements at times t_1, t_2, and t_3 with settings x, y, z, and obtaining outcomes a, b, c, respectively. (b) A general classical causal model for the scenario in (a). Note that, although causal graphs are formulated without reference to any spacetime structure, here we have drawn the graph such that the horizontal direction can be identified with time.

Besides restrictions on the dimension of the hidden state, certain physical constraints and assumptions might naturally lead to weaker causal models for a sequence of projective measurements on a single qubit. A well-known example is the Leggett-Garg (LG) model [27] for testing macroscopic realism. This model is based on the assumption of noninvasive measurability, stating that it should be possible, in principle, to determine (measure) the state of a system without perturbing it. This is a special case of Eq. (7), where the hidden variable state is unchanged by the measurements and constant throughout the experiment, that is, λ_i = λ, which implies that

$$p(a, b, c \mid x, y, z) = \sum_{\lambda} p(a \mid x, \lambda)\, p(b \mid y, \lambda)\, p(c \mid z, \lambda)\, p(\lambda). \tag{8}$$

Note that this is the usual local hidden variable description encountered in Bell's theorem.
Further, since the expectation values of a sequence of projective measurements on a single qubit are the same as for local measurements on a pair of entangled particles [6], quantum correlations also violate macroscopic realism, manifest as violations of so-called Leggett-Garg inequalities [41]. The result, however, relies critically on the assumption of noninvasive measurability, which is difficult to justify for a single quantum system. In fact, quite generally, the correlations obtained by sequential measurements on a quantum system will display signaling (e.g. p(b|x, y) ≠ p(b|x', y)), as opposed to the model in Eq. (8), which only allows for non-signaling correlations. In other terms, the Leggett-Garg model implies independence relations that are not observed in the experiment. In this sense, in order to test incompatibility with the Leggett-Garg model we do not need to take the strength of the correlations into account and can simply look for violations of independence relations implied by the model [42].
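A short numerical illustration of this signaling, using the Bloch-vector form of the Born rule (outcome s = ±1 along n on a state r has probability (1 + s n·r)/2 and leaves the state s n). The initial state |0⟩ and the two first-step settings Ẑ and X̂ are assumptions picked to make the effect maximal; the function name `p_b` is hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def p_b(r0, nx, ny, b):
    """p(b | x, y): marginal of the second outcome, summed over the first outcome a."""
    sb = (-1) ** b
    total = 0.0
    for a in (0, 1):
        sa = (-1) ** a
        pa = (1 + sa * dot(nx, r0)) / 2              # first measurement
        total += pa * (1 + sb * sa * dot(ny, nx)) / 2  # second measurement
    return total

Z_AXIS, X_AXIS = (0, 0, 1), (1, 0, 0)
r0 = Z_AXIS  # qubit prepared in |0>

print(p_b(r0, Z_AXIS, Z_AXIS, 0))  # 1.0  (first setting Z)
print(p_b(r0, X_AXIS, Z_AXIS, 0))  # 0.5  (first setting X)
```

The marginal of the second outcome changes with the first setting, which no model of the form of Eq. (8) can reproduce.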
This raises the question of whether one can find examples of temporal quantum experiments for which faithful classical causal models can reproduce all independence relations while at the same time being incompatible with the generated correlations. Next we show that this is indeed the case.

Quantum incompatibility with a weaker classical model for sequential projective measurements
From a causal inference perspective, given some observed probability distribution the goal is to find a faithful causal model reproducing all the (conditional) independence relations implied by this distribution. As a concrete example, consider a sequence of three projective measurements (see Fig. 4a) on an initial maximally mixed qubit state ρ_1 = 𝟙/2. It is easy to verify that for arbitrary projective measurements, the outcome of the third measurement is independent of the setting of the first measurement, i.e. p(x, c) = p(x)p(c). In this case, the causal model in Fig. 4b is no longer faithful because it allows for correlations between the variables X and C. Instead, the most general causal model reproducing such an independence relation is shown in Fig. 5 where, in comparison with the causal model in Fig. 4, the causal link between the variable B and Λ_3 is removed. Any distribution compatible with this model has a decomposition given by

$$p(a, b, c \mid x, y, z) = \sum_{\lambda_1, \lambda_2, \lambda_3} p(a \mid x, \lambda_1)\, p(b \mid y, \lambda_2)\, p(c \mid z, \lambda_3)\, p(\lambda_1)\, p(\lambda_2 \mid a, x)\, p(\lambda_3 \mid y). \tag{9}$$

Crucially, this classical model faithfully captures the observed independence p(x, c) = p(x)p(c) that holds for a wide range of relevant experimental scenarios. This includes arbitrary measurements on an initially maximally mixed state, as well as arbitrary initial states in the xy-plane of the Bloch sphere for the measurements in the experimental implementation below. As detailed in Appendix A, any correlations compatible with Eq. (9) must respect inequality (10), where the joint expectation values are defined as ⟨A_x B_y⟩ = Σ_{a,b} (−1)^{a+b} p(a, b|x, y) and ⟨BC⟩_{xyz} = Σ_{a,b,c} (−1)^{b+c} p(a, b, c|x, y, z). This inequality can, however, be violated by a sequence of projective measurements on any initial qubit state. For instance, choosing measurement settings A_0 = Ẑ, A_1 = X̂, B_0 = −C_1 = (Ẑ + X̂)/√2, and B_1 = −C_0 = (Ẑ − X̂)/√2 (where X̂ and Ẑ are the Pauli operators) obtains a value of S_τ3 = 4 + 2√2 > 6. Furthermore, for any initial state in the xy-plane of the Bloch sphere, the resulting probability distribution p(a, b, c|x, y, z) respects the independence relation p(c|x) = p(c) implied by the model under test. Through unitary rotations this implies that for any fixed initial quantum state, one can generate temporal correlations that cannot be explained by non-fine-tuned models.
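The independence of the third outcome from the first setting can be checked numerically for the maximally mixed initial state and the settings quoted above. This is a sketch in the Bloch-vector picture (outcome s = ±1 along n on a state r has probability (1 + s n·r)/2, post-measurement state s n); mapping the observables to Bloch vectors in the x-z plane and the function name `p_c` are assumptions of this example.

```python
from itertools import product
from math import sqrt, isclose

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def p_c(r0, nx, ny, nz, c):
    """p(c | x, y, z): marginal of the third outcome, summed over a and b."""
    sc = (-1) ** c
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        sa, sb = (-1) ** a, (-1) ** b
        pa = (1 + sa * dot(nx, r0)) / 2
        pb = (1 + sb * sa * dot(ny, nx)) / 2
        pc = (1 + sc * sb * dot(nz, ny)) / 2
        total += pa * pb * pc
    return total

s = 1 / sqrt(2)
A = [(0, 0, 1), (1, 0, 0)]                  # A_0 = Z, A_1 = X
B = [(s, 0, s), (-s, 0, s)]                 # B_0 = (Z + X)/sqrt(2), B_1 = (Z - X)/sqrt(2)
C = [tuple(-v for v in B[1]),               # C_0 = -B_1
     tuple(-v for v in B[0])]               # C_1 = -B_0
r0 = (0, 0, 0)                              # maximally mixed state

independent = all(
    isclose(p_c(r0, A[x], B[y], C[z], c), 0.5)
    for x, y, z, c in product((0, 1), repeat=4)
)
print(independent)
```

For this initial state the third outcome is uniformly distributed for every setting triple, so in particular it carries no information about x.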
Experimentally we test inequality (10) with photonic polarization qubits for an initial maximally mixed state, see Fig. 5a. For the intermediate measurement the system is coupled to a meter in the state |0⟩. A measurement of the meter in the computational basis {|0⟩, |1⟩} achieves a projective measurement of the system in a basis that is chosen by appropriate single-qubit unitaries applied to the system before and after the interaction. Notably, this measurement design can be straightforwardly generalized to more than three parties by replicating the von Neumann measurement. The experimental results in Fig. 6 demonstrate a clear violation of inequality (10) by a series of three projective measurements on a single qubit, achieving a value of S_τ3 = 6.65 ± 0.01.

Discussion
Our results show, both in theory and experiment, that the discrepancy between the classical and quantum descriptions typically associated with spatial correlations extends to temporal correlation scenarios. The latter arise naturally in situations where spacelike separation cannot be practically guaranteed, or when a sequence of measurements is performed on the same quantum system. In the case of spatial correlations with an underlying time order, we have shown that a Bell-type inequality, designed to test classical models augmented with limited communication, displays a quantum violation if and only if the quantum protocol is also augmented with communication power. This highlights that a quantum advantage persists in a fair comparison of classical and quantum resources.
In a purely temporal-correlations scenario we have shown that, as opposed to spatial Bell scenarios [26], there is a faithful (non-fine-tuned) classical model capable of simulating all correlations from projective measurements on quantum states. It remains an open question whether these models are equivalent when imposing the same dimensionality on quantum and classical models. Notwithstanding, we identified quantum violations of inequalities associated with a slightly less powerful classical model that includes the Leggett-Garg model as a special case [27]. We have experimentally observed such a quantum violation, which demonstrates a stronger form of non-classicality of the correlations arising from a temporal sequence of projective measurements. Furthermore, our experimental design allows for tuning of the intermediate measurement strength, which will enable future studies of non-Markovian models with some residual correlations between the ontic states Λ_i over multiple time steps [43].
Our theoretical results are based on a causal modeling approach and thus formulated without reference to a background spacetime structure. The direction of time is only implicitly deduced from the flow of information between the parties or sequence of measurements. Experimentally, the considered scenarios are not subject to the locality loophole. However, the results rely on the related assumption that there is no hidden communication channel (other than the ones implied by the models in Figs. 1b and 4) between the different time steps of the scenario under consideration. Practically, we also rely on a fair-sampling assumption to contend with imperfect detection efficiencies.
Temporal correlation scenarios play an important role in communication complexity problems [44] and in the search for a physical principle behind quantum nonlocality, such as information causality [45]. Our results thus provide an avenue towards a more systematic understanding of the quantum advantage arising in such scenarios, that not only may lead to new ways of processing information but also to new insights into the nature of quantum correlations.
A Classical models for a sequence of projective measurements on a qubit

A.1 Classical simulation with hidden states of dimension 4

Note that the sequential measurement scenario can be mapped to an equivalent sequence of quantum teleportations [30]. In the first time step Alice measures the state in her possession, generating some local probability distribution p(a|x), which she uses to prepare different states ρ_a^x that she sends to Bob via the usual quantum teleportation protocol. Then Bob measures the teleported system, generating a probability distribution p(b|x, a, y), and prepares states ρ_b^y that are teleported to Charlie. Clearly, any classical (hidden variable) protocol simulating the statistics of the measurements performed on the teleported states will immediately lead to a simulation of the equivalent sequence of projective measurements. As shown in Ref. [17], this can be achieved using only two bits of classical communication between the parties and an arbitrary amount of shared randomness, implying that hidden states of dimension 4 and shared randomness are enough to obtain all quantum correlations obtained by a sequence of projective measurements on a single qubit.
For a fair comparison of classical and quantum resources we have also considered whether a classical message of dimension 2 (1 bit of communication) is enough to simulate all projective measurements on a qubit state. To that aim we considered a scenario with two time steps. If Alice has the choice between two measurements (i.e. x = 0, 1) then one classical bit can carry all the information about the input X and, using the argument in the main text, all the one-way signalling correlations in this case can be simulated using a classical message of dimension 2. We thus considered the case where X assumes at least 3 different values and fully characterized the set of classical correlations for dichotomic measurements (a, b = 0, 1) with x = 0, 1, 2 and y = 0, 1. The polytope corresponding to this scenario is described by 864 inequalities (many being equivalent under the allowed symmetries given by party, input and output permutations). Considering qubit states and arbitrary projective qubit measurements we could not find any violation of these inequalities.
It is interesting to note that among these inequalities we find the dimension-witness inequality from Ref. [13], referred to as Eq. (A1), which is expressed in terms of B_xy = Σ_{b=0,1} (−1)^b p(b|x, y). As shown in Ref. [13], this inequality can be violated by measurements on a qubit. There, however, a slightly different situation is considered, the so-called prepare-and-measure scenario, where the variable X uniquely identifies the state ρ_x being prepared and to be measured in the second time step. In our case the states ρ_a^x to be measured in the second time step will depend on X, but also on the measurement outcome A, a feature that seems to be enough to preclude any violation of the inequality above (or any other defining the scenario). It would be interesting to derive inequalities for more general scenarios including more measurement settings or time steps to see whether any violations can be found.
A.2 Derivation of Bell-type inequalities bounding classical models for a sequence of projective measurements

In order to derive Bell-type inequalities for the temporal scenario in Fig. 5, first note that all correlations compatible with such a model are also compatible with a model implying that

$$p(a, b, c \mid x, y, z) = \sum_{\lambda} p(\lambda)\, p(a \mid x, \lambda)\, p(b \mid a, x, y, \lambda)\, p(c \mid y, z, \lambda). \tag{A2}$$
This follows from the fact that the arrow between the hidden variable at a given time step and the next measurement outcome (e.g. Λ_2 → B) can be replaced by directed arrows from the measurement choices (or measurement outcomes) of the previous step to the next one (e.g. X → B and A → B) plus a local noise variable (Λ_B → B). Note that these noise terms are implicitly present in the model in Fig. 5, where they have been absorbed into Λ_1, Λ_2, Λ_3. When making the above replacement, however, these local noise terms have to be introduced explicitly. They can then be combined into a variable Λ = (Λ_A, Λ_B, Λ_C) that acts as a common ancestor and source of randomness for all the measurement outcomes, see Fig. A1. In principle, one would further have to impose the independence of the local noise terms, that is, p(λ_A, λ_B, λ_C) = p(λ_A)p(λ_B)p(λ_C). This, however, would define a non-convex set that is very difficult to characterize [39,40]. Instead, we consider a more general convex relaxation of this set which contains the case of independent noise variables as a special case. As detailed in Ref. [25], the probability distributions compatible with Eq. (A2) define a convex polytope that can be characterized in terms of finitely many extremal points. Given the list of extremal points one can resort to standard convex optimization software [46] to find the dual description in terms of linear (Bell-type) inequalities. For quantum correlations arising from a sequence of three projective measurements it follows that ⟨ABC⟩ = ⟨A⟩⟨BC⟩. In other words, the full (three-point) correlator does not carry any information that is not already contained in the bipartite and single correlators. For this reason we focus our attention on inequalities involving the expectation values ⟨AB⟩ and ⟨BC⟩, which, due to the structure of the problem, can be defined in different ways. In the following we use the definitions given in the main text, ⟨A_x B_y⟩ = Σ_{a,b} (−1)^{a+b} p(a, b|x, y) and ⟨BC⟩_{xyz} = Σ_{a,b,c} (−1)^{b+c} p(a, b, c|x, y, z). Since there is a causal link between X and B, the correlations between B and C can explicitly depend on X. We then compute the Bell-type inequalities in this subspace, one of which is inequality (10) in the main text.

Figure A1: A DAG for the tripartite temporal correlations scenario that contains the DAG in Fig. 4 of the main text as a special case. As shown in Ref. [25] the correlations compatible with this DAG define a convex set such that Bell-type inequalities can be derived using standard convex optimization software [46].

A.3 Quantum violation of inequality (10)
In order to search for a possible quantum violation of inequality (10), we consider a single qubit in the initial pure state |Ψ⟩ = cos θ_0 |0⟩ + e^{iφ_0} sin θ_0 |1⟩.

(A7)
We generate this family of states by subjecting the separable state |D⟩ ⊗ (√(1 + κ)|H⟩ + √(1 − κ)|V⟩)/√2 to a controlled-NOT gate, as shown in Fig. 2 in the main text. The parameter 0 ≤ κ ≤ 1 turns out to be equal to the concurrence C of the resulting bipartite state, such that the generated amount of entanglement can be easily controlled by the initial state of the meter photon.
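The statement that κ equals the concurrence can be checked with a short calculation. The sketch below assumes that the |D⟩ = (|H⟩ + |V⟩)/√2 photon acts as the control of the CNOT (the text does not state which photon is the control), encodes |H⟩ as 0 and |V⟩ as 1, and uses the standard pure-state formula C = 2|c_00 c_11 − c_01 c_10|. The function names are hypothetical.

```python
from math import sqrt, isclose

def cnot_state(kappa):
    """Amplitudes (c00, c01, c10, c11) after a CNOT on |D> (x) |target>.

    Assumption: the first qubit (|D>) is the control; |H> -> 0, |V> -> 1.
    """
    d = (1 / sqrt(2), 1 / sqrt(2))                      # |D>
    t = (sqrt((1 + kappa) / 2), sqrt((1 - kappa) / 2))  # target qubit
    c00, c01 = d[0] * t[0], d[0] * t[1]   # control = 0: target unchanged
    c10, c11 = d[1] * t[1], d[1] * t[0]   # control = 1: target flipped
    return (c00, c01, c10, c11)

def concurrence(c):
    """Concurrence of a two-qubit pure state with amplitudes c."""
    c00, c01, c10, c11 = c
    return 2 * abs(c00 * c11 - c01 * c10)

for kappa in (0.0, 0.25, 0.5, 1.0):
    print(kappa, round(concurrence(cnot_state(kappa)), 6))
```

With the opposite control assignment the CNOT would leave |D⟩ invariant and produce no entanglement, which is why the control choice above is the natural reading.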
Using the fixed measurement protocol in the main text, the states of Eq. (A7) achieve a value of S_1bit = 4 + (1 + κ)√2 in inequality (5). It is, however, possible to optimize the measurement settings such that ideally every pure entangled quantum state violates inequality (5). Specifically, this amounts to modifying the protocol in the main text such that for m = 0 Bob measures B_0 = B_1 = B_2 = X̂, while for m = 1 he measures B_0 = (X̂ + κẐ)/√(1 + κ^2), B_1 = (X̂ − κẐ)/√(1 + κ^2) and B_2 = −X̂. Using this protocol, states of the form of Eq. (A7) achieve S_1bit = 4 + 2√(1 + κ^2) ≥ 6. This is reminiscent of the observation that, using optimized measurement settings, the CHSH inequality [47] can be violated by every pure two-qubit entangled quantum state [48]. In fact, inequality (5) contains a CHSH inequality for the settings A_1, A_2, B_0, B_1, which can be violated in the usual way, while the additional communication in our protocol can be used to maximize the remaining four terms up to the maximal value of 4 for every state of the form Eq. (A7). Hence, the expectation value of inequality (5) can be written as S_1bit = 4 + S_chsh, where S_chsh is the CHSH parameter corresponding to the first four terms of inequality (5).
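The CHSH contribution for the optimized settings can be reproduced numerically. The following sketch rests on several assumptions that are not spelled out in the text: the state is taken to be the CNOT output with real amplitudes (a, b, b, a)/√2, where a = √((1 + κ)/2) and b = √((1 − κ)/2); Alice's settings are A_1 = X̂ and A_2 = Ẑ as in the main-text example; and the CHSH combination is ⟨A_1 B_0⟩ + ⟨A_1 B_1⟩ + ⟨A_2 B_0⟩ − ⟨A_2 B_1⟩. All helper names are hypothetical.

```python
from math import sqrt, isclose

X = ((0, 1), (1, 0))    # Pauli X
Z = ((1, 0), (0, -1))   # Pauli Z

def lin(alpha, P, beta, Q):
    """Linear combination alpha * P + beta * Q of two 2x2 matrices."""
    return tuple(tuple(alpha * P[i][j] + beta * Q[i][j] for j in range(2))
                 for i in range(2))

def expval(psi, P, Q):
    """<psi| P (x) Q |psi> for real amplitudes psi[2 * i + j]."""
    return sum(psi[2 * i + j] * P[i][k] * Q[j][l] * psi[2 * k + l]
               for i in range(2) for j in range(2)
               for k in range(2) for l in range(2))

def state(kappa):
    """Assumed two-qubit state with concurrence kappa."""
    a, b = sqrt((1 + kappa) / 2), sqrt((1 - kappa) / 2)
    return (a / sqrt(2), b / sqrt(2), b / sqrt(2), a / sqrt(2))

def chsh(kappa):
    """CHSH value for A_1 = X, A_2 = Z and Bob's optimized B_0, B_1."""
    n = sqrt(1 + kappa ** 2)
    B0 = lin(1 / n, X, kappa / n, Z)
    B1 = lin(1 / n, X, -kappa / n, Z)
    psi = state(kappa)
    return (expval(psi, X, B0) + expval(psi, X, B1)
            + expval(psi, Z, B0) - expval(psi, Z, B1))

for kappa in (0.0, 0.5, 1.0):
    print(kappa, round(chsh(kappa), 6), round(2 * sqrt(1 + kappa ** 2), 6))
```

Under these assumptions the two printed columns agree, consistent with S_1bit = 4 + S_chsh = 4 + 2√(1 + κ^2).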