Causation does not explain contextuality

Realist interpretations of quantum mechanics presuppose the existence of elements of reality that are independent of the actions used to reveal them. Such a view is challenged by several no-go theorems that show quantum correlations cannot be explained by non-contextual ontological models, where physical properties are assumed to exist prior to and independently of the act of measurement. However, all such contextuality proofs assume a traditional notion of causal structure, where causal influence flows from past to future according to ordinary dynamical laws. This leaves open the question of whether the apparent contextuality of quantum mechanics is simply the signature of some exotic causal structure, where the future might affect the past or distant systems might get correlated due to non-local constraints. Here we show that quantum predictions require a deeper form of contextuality: even allowing for arbitrary causal structure, no model can explain quantum correlations from non-contextual ontological properties of the world, be they initial states, dynamical laws, or global constraints.


Introduction
The appeal of an operational physical theory is that it makes as few unwarranted assumptions about nature as possible. One simply assigns probabilities to experimental outcomes, conditioned on the list of experimental procedures required to realise these outcomes. Ideally, such operational theories are minimal: procedures that cannot be statistically discriminated are given the same representation in the theory. Quantum mechanics is an example of such a minimal operational theory: all the statistically significant information about the preparation procedure is contained in the quantum state, and the probability of an event (labelled by a Positive Operator Valued Measure (POVM) element) does not depend on any other information regarding the manner in which the measurement was achieved (such as the full POVM). However, one of the most debated questions in the foundations of the theory is whether one can go beyond this statistical level and also provide an ontological description of some actual state of affairs that occurs during each run of an experiment, that is, a statement about the world that tells us what is responsible for the observed experimental outcomes.
The task of providing such an ontological model for quantum theory has proven to be exceedingly difficult. A plethora of no-go theorems exists that describe the various natural assumptions one must forgo in order to produce an ontological model that accords with experiment. One such assumption is non-contextuality. Non-contextuality is ultimately an a priori assumption: non-contextual theories posit the existence of physical properties that do not depend on the way they are measured. There is a large literature discussing the various ways one may wish to cash out this notion more precisely. Broadly speaking, non-contextuality no-go theorems fall into two distinct categories. Kochen-Specker-style proofs show that quantum measurements cannot be regarded as deterministically uncovering pre-existing, or ontic, properties of systems [1][2][3]. Spekkens-style proofs, on the other hand, show that one cannot explain quantum statistics via ontological properties that mirror the context-independence seen at the operational level [4][5][6][7][8]. While both approaches are well justified and have led to interesting and relevant results, our own definition of non-contextuality is more closely related to the latter. This particular view of non-contextuality can more broadly be seen as an analogue of the no fine-tuning argument from causal modelling [9], an analogue of Leibniz's principle of the Identity of Indiscernibles [4,6], and a methodological assumption akin to Occam's razor.
Non-contextuality no-go theorems are not merely of foundational interest: they can also serve as security proofs for a range of simple cryptographic scenarios [10,11], and can herald a quantum advantage for computation [12] and for state discrimination [8]. Such results, however, require the assumption of a fixed background causal structure; at the very minimum, a single causal arrow from preparation to measurement. This leaves open the question of whether one can produce a non-contextual ontological model by allowing for a suitably exotic causal structure. Some authors attempt to explain quantum correlations by positing backwards-in-time causal influences [13][14][15][16][17][18][19], while others claim it is the existence of non-local constraints that does the explanatory work [20,21]. The rationale in both cases is that non-contextuality could emerge naturally in such models: physical properties might well be "real" and "counterfactually definite", but depend on future or distant measurements because of some physically motivated, although radically novel, causal influence. Such proposals do not fit neatly within the classical causal modelling framework, and so are not ruled out by recent work in this direction [9,22], nor by any of the existing no-go theorems.
In this paper, we characterise a new ontological models framework to prove that even if one allows for arbitrary causal structure, ontological models of quantum experiments are necessarily contextual. Crucially, what is contextual is not just the traditional notion of "state", but any supposedly objective feature of the theory, such as a dynamical law or boundary condition. Our finding suggests that any model that posits unusual causal relations in the hope of saving "reality" will necessarily be contextual. Finally, this work also represents a possible approach to how we ought to think of the generalised quantum processes of recent work [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37]. It is clear that any ontological reading of such processes will have to contend with the spectre of contextuality.
The paper is organised as follows. In Section 1 we present the traditional ontological models framework and clarify the rationale behind retrocausal explanations of quantum statistics. In Section 2 we introduce and justify the four primitive elements required to define our operational model: local regions, local controllables, outcomes and an environment. In Section 3 we define the three classes of operationally indistinguishable elements: events, instruments and processes. In Section 4 we characterise instrument and process non-contextuality according to these equivalence classes, and provide a generalised framework for a non-contextual ontological model. As this is the conceptual heart of our result, in Section 5 we clarify the scope and applicability of this framework via three examples. Using standard quantum theory and results from previous work [29,37], in Section 6 we characterise an operational model that accords with the experimental predictions of quantum theory. Section 7 puts these elements together to prove that one cannot produce an ontological model that is both process and instrument non-contextual and accords with the predictions of quantum theory. In Section 8 we consider the constraints imposed on ontological models when one only assumes instrument non-contextuality. We finish with a discussion.
1 An introduction to ontological models and retrocausal approaches.
The ontological models framework assumes that systems possess well-defined properties at all times [18,38,39]. The starting point is the very general claim that all experiments can be modelled operationally as sets of preparations, followed by transformations, followed by measurements, all performed upon some physical system. The set of all possible preparations, transformations and measurements is regarded as capturing the entire possibility space of any experiment and can be associated with the operational predictions of a particular theory. For example, an experiment can involve choices of possible preparation settings (labelled by the random variable $P$) and choices of possible measurement settings ($M$) with associated outcomes ($a$).¹ An operational model then predicts probabilities for outcomes for all possible combinations of preparations and measurements:
\[ \forall a, M, P: \quad p(a|M,P). \tag{1} \]
Such probabilistic predictions should coincide with the operational predictions of the theory in question. For example, in the case of quantum theory each preparation choice is modelled as a density operator $\rho_P$ on a Hilbert space $\mathcal{H}_A$ associated to a quantum system. Similarly, each measurement choice $M$ is associated with a positive operator valued measure $\{E_{a|M}\}$, whose elements correspond to particular outcomes $a$. The probabilities predicted by the theory are:
\[ p(a|M,P) = \mathrm{Tr}(\rho_P E_{a|M}). \tag{2} \]
An ontological extension of such an operational model further assumes that the system possesses well-defined ontological properties between the time of preparation and measurement. Such properties are collectively known as the "ontic state" and typically denoted by $\lambda$. In the ontological models framework each preparation procedure $P$ is presumed to select a particular ontic state $\lambda$ according to a fixed probability distribution $\mu_P(\lambda)$, and each measurement choice is presumed to output a particular outcome according to a fixed response function $\xi_{a|M}(\lambda)$.
That is, (i) every preparation $P$ can be associated to a normalised probability distribution over the ontic state space, $\mu_P(\lambda)$, such that $\int \mu_P(\lambda)\,d\lambda = 1$, and (ii) every measurement $M$, with outcomes $a$, can be associated to a set of response functions $\{\xi_{a|M}(\lambda)\}$ over the ontic states, satisfying $\sum_a \xi_{a|M}(\lambda) = 1$ for all $\lambda$.
As the ontic states are not directly observed, the operational statistics are obtained via marginalisation and we have:
\[ \forall a, M, P: \quad p(a|M,P) = \int \xi_{a|M}(\lambda)\,\mu_P(\lambda)\,d\lambda, \tag{3} \]
where for quantum theory:
\[ \forall a, M, P: \quad \mathrm{Tr}(\rho_P E_{a|M}) = \int \xi_{a|M}(\lambda)\,\mu_P(\lambda)\,d\lambda. \]
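The marginalisation over ontic states can be illustrated with a minimal sketch for a finite ontic state space, where the integral becomes a sum. Everything named here is hypothetical and chosen only for illustration: the two-valued ontic state, the preparations `P0` and `P_mix`, and the measurements `M_read` and `M_noisy` do not come from the framework itself.

```python
from fractions import Fraction

# Hypothetical toy model: a classical bit, so the ontic state lambda is 0 or 1.
ONTIC_STATES = (0, 1)

# mu[P][lam]: distribution over ontic states selected by preparation P.
mu = {
    "P0": {0: Fraction(1), 1: Fraction(0)},
    "P_mix": {0: Fraction(1, 2), 1: Fraction(1, 2)},
}

# xi[M][a][lam]: response function, the probability of outcome a given
# measurement M and ontic state lam.
xi = {
    "M_read": {0: {0: Fraction(1), 1: Fraction(0)},
               1: {0: Fraction(0), 1: Fraction(1)}},
    "M_noisy": {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
                1: {0: Fraction(1, 4), 1: Fraction(3, 4)}},
}


def is_valid_model():
    """Check the two normalisation conditions (i) and (ii) of the framework."""
    mu_ok = all(sum(dist.values()) == 1 for dist in mu.values())
    xi_ok = all(
        sum(responses[a][lam] for a in responses) == 1
        for responses in xi.values()
        for lam in ONTIC_STATES
    )
    return mu_ok and xi_ok


def operational_probability(a, M, P):
    """Discrete analogue of marginalising over the unobserved ontic state."""
    return sum(xi[M][a][lam] * mu[P][lam] for lam in ONTIC_STATES)
```

In this sketch the measurement setting never enters `mu`, which is the fixed-causal-order assumption that a retrocausal model would relax.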
The ontological models framework has been used in numerous works to clarify the manner in which quantum theory should be considered contextual [4][5][6][7][8][40]. The key assumption is that one can infer ontological equivalence from operational equivalence: for example, if two preparation procedures produce the same distributions over outcomes for all possible measurements, then any differences between them do not play a role in determining the ontic states of the system in question. Thus, the reason one cannot distinguish between the two equivalent preparations at the operational level is that there is no difference between the roles the preparations play at the ontological level. The view is that each use of a preparation device selects one from a set of possible ontic states according to exactly the same probability distribution in each run. Formally, if $p(a|M,P_1) = p(a|M,P_2)$ for all measurements $M$ and outcomes $a$, then both preparations specify the same distribution over ontic states:
\[ \mu_{P_1}(\lambda) = \mu_{P_2}(\lambda). \]
Similarly, if two measurements result in the same outcome statistics for all possible preparations, then both measurements are represented by the same fixed response functions. Formally, if $p(a|M_1,P) = p(a|M_2,P)$ for all preparations $P$ and outcomes $a$, then:
\[ \xi_{a|M_1}(\lambda) = \xi_{a|M_2}(\lambda). \]
Thus, in a non-contextual ontological model one can account for operational statistics according to Eq. 3. Implicit in this model is the belief that the ontic state screens off the preparations from the measurements (a property also known as λ-mediation [18]). In [40] it was shown that one can use the assumptions above to derive a contextuality proof: no model of the form of Eq. 3 can explain the statistics of quantum theory.
In this approach to contextuality [4][5][6][7][8]40], one assumes that ontic states determine correlations according to some fixed causal order. Formally, this is captured by Eq. 3: the preparation is assumed to cause the selection of a particular ontic state λ according to a fixed distribution µ(λ), and the measurement choice does not alter this value but merely determines the outcome probability, also according to a fixed probability distribution. This leaves open the question of whether one can explain the contextuality of quantum theory by postulating an alternative, retrocausal ontology: If the future can affect the past, then the state λ could depend on the measurement setting M , and Eq. (3) would not be justified.
Generally speaking, retrocausal approaches posit the existence of backwards-in-time causal influences to explain quantum correlations. The stated appeal of such approaches is that the consequent explanations retain some element of our classical notion of reality: local causality, determinate ontology, and counterfactual definiteness. For example, Price and Wharton explain Bell correlations by including a "zig-zag" of causal influence, passing via hidden variables that travel backwards in time from one measurement event to the source and then forwards in time to the distant measurement event [14]. Although not explicitly stated, there is also one further assumption underlying these approaches: such causal influences follow some kind of law-like behaviour. That is, one would not expect the rules by which such retrocausal influences propagate, or backward-in-time states evolve, to be completely ad hoc.
As stated in the introduction, we follow the Spekkens-style approach and also define non-contextuality in terms of operational equivalences. Where we depart, however, is in our particular choice of operational primitives. The usual primitives of preparations, transformations and measurements do not permit one to consider causal scenarios that move beyond the simplest causally ordered situations; in these models the notion of reality is defined in terms of properties that exist before a measurement takes place. The underlying ontology is therefore assumed to follow some ordinary causal structure, akin to the directed acyclic graphs of causal models [41]. In our model we wish to be able to consider more general situations, for example where we include any possible global dynamics, causal structure, space-time geometry or global constraints. In order to provide this alternative perspective we consider the primitive operational elements to be sets of labelled local regions, locally controllable properties and an environment.

Operational primitives
We define an operational model of any experiment to consist of local labelled regions ($A, B, C, \ldots$) where one can perform controlled operations that can be associated with outcomes. The regions align with concepts such as local laboratories, communicating parties (e.g. Alice and Bob) and local space-time regions (similar, e.g., to the operational framework of [42]). There is no a priori assumption that these regions be "fixed" or preassigned in some manner; they are simply labels for the locus of a set of controlled operations. Controlled operations generalise the notion of preparations, measurements and transformations, and can include the addition or subtraction of ancillary systems. Examples include the orientation of a wave-plate, the instigation of a microwave pulse, and the use of a photodetector. We call such local operations the local controllables. Each local controllable is represented as $\tilde{I}^X$, where the superscript $X = A, B, \ldots$ labels the associated region. We consider outcomes as labels associated to the result of choosing a particular local controllable; the outcomes for region $A$ are labelled $a = 0, 1, 2, \ldots$. Examples include the number of detected photons, the result of a spin measurement or the time of arrival of a photon. We allow the outcomes to have infinitely many possible values, as this enables us to use the same variable for local controllables that have different numbers of possible outcomes. In general, however, we expect that only a finite number of such outcomes is associated with non-zero probability.
Finally, we consider all the possible properties that could account for correlations between outcomes in the local regions. These include any global properties, initial states, connecting mechanisms, causal influence, or global dynamics. We call this the environment, $\tilde{W}$. Note that in our operational model environments and local controllables are by construction always uncorrelated. That is, if we see a property change in relation to a choice of local controllable we label this as an outcome and do not classify it as part of the environment.
We can thus describe an experiment by a set of regions, outcomes, local controllables and an environment. If we consider a particular run of an experiment there will in general be a collection of outcomes that occur, one for each local region. One can associate a joint probability to this set of outcomes and empirically verify probability assignments for each possible set of outcomes. An operational model for such an experiment allows one to calculate expected probabilities:
\[ p(a, b, c, \ldots | \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}). \]
The operational model thus specifies a distribution over outcomes for local controllables $\tilde{I}^A, \tilde{I}^B, \ldots$, and a shared environment $\tilde{W}$ (Fig. 1). Note that it should be possible to have ignorance over part of the environment and characterise this accordingly using the operational model. More explicitly, if $\tilde{\xi}$ represents the part of the environment about which we are ignorant, then the operational probabilities given the known part of the environment are obtained by marginalising over $\tilde{\xi}$:
\[ p(a, b, \ldots | \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}) = \sum_{\tilde{\xi}} p(a, b, \ldots, \tilde{\xi} | \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}) = \sum_{\tilde{\xi}} p(a, b, \ldots | \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}, \tilde{\xi})\, p(\tilde{\xi} | \tilde{W}), \tag{10} \]
where the second equality comes from the assumption that the local controllables are uncorrelated with the environment. As a concrete example, $\tilde{W}$ can describe the axis along which a spin-$\tfrac{1}{2}$ particle is prepared, while $\tilde{\xi}$ represents whether the spin is prepared aligned or anti-aligned with that axis.² The marginal (10) then describes a scenario where there is some probabilistic uncertainty about the spin's direction, i.e. about which value of $\tilde{\xi}$ occurs in any given run. Note that, for the particular case $p(\tilde{\xi}|\tilde{W}) = \tfrac{1}{2}$, we obtain the maximally mixed state irrespective of the axis, making the variable $\tilde{W}$ redundant. Such redundancies can be taken into account via operational equivalences.
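The spin example can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the framework: it represents the marginal at the level of density matrices, with the known part of the environment given by the polar angles `theta` and `phi` of the preparation axis and the unknown part by a hypothetical weight `p_aligned`. The function names are invented for this example.

```python
import math


def spin_projector(theta, phi, sign):
    """Projector (I + sign * n.sigma) / 2 onto spin along (+1) or against (-1)
    the axis n with polar angle theta and azimuthal angle phi."""
    nx = math.sin(theta) * math.cos(phi)
    ny = math.sin(theta) * math.sin(phi)
    nz = math.cos(theta)
    return [
        [(1 + sign * nz) / 2, sign * complex(nx, -ny) / 2],
        [sign * complex(nx, ny) / 2, (1 - sign * nz) / 2],
    ]


def marginal_state(theta, phi, p_aligned):
    """Average over the unknown alignment, weighting aligned by p_aligned."""
    up = spin_projector(theta, phi, +1)
    down = spin_projector(theta, phi, -1)
    return [
        [p_aligned * up[i][j] + (1 - p_aligned) * down[i][j] for j in range(2)]
        for i in range(2)
    ]


def is_maximally_mixed(rho, tol=1e-12):
    """True if rho equals the 2x2 identity divided by two, up to tol."""
    target = [[0.5, 0.0], [0.0, 0.5]]
    return all(abs(rho[i][j] - target[i][j]) < tol
               for i in range(2) for j in range(2))
```

For `p_aligned = 0.5` every axis yields the same state, so the axis label carries no operational information; for any other weight the axis matters.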

Events
We say that a pair composed of an outcome and the respective local controllable, $(a, \tilde{I}^A)$, is operationally equivalent to the pair $(a', \tilde{I}'^A)$ if the joint probabilities for $a, b, c, \ldots$ and $a', b, c, \ldots$ are the same for all possible outcomes and local controllables in the other regions $B, C, \ldots$, and for all environments $\tilde{W}$.
We denote an equivalence class of such pairs of outcomes and local controllables as an event:
\[ M^A := [(a, \tilde{I}^A)]. \]

Instruments
We define an instrument as the list of possible events for a local controllable $\tilde{I}^A$, where an event for some [...]. We say that $\tilde{I}^A$ is equivalent to $\tilde{I}'^A$ if they define the same list of possible events, and we denote the equivalence class by $I^A$. Note that our definition allows distinct instruments to share one or more events. Note also that our definition implies that the probability for an event does not depend on the particular instrument $I$, once we assume the event is possible given the instrument. We call this property operational instrument equivalence.³

Process
The process captures those physical features responsible for generating the joint statistics for a set of events, independently of the choice of local instruments. A process is defined as an equivalence class of environments, $W := [\tilde{W}]$. A simple example is the spatio-temporal ordering of regions. It is clear that the operational statistics of events in regions $A$ and $B$ can be different for the following two causal orderings: (i) $A$ is before $B$, (ii) $B$ is before $A$; thus the respective environments, $\tilde{W}_{(i)}$ and $\tilde{W}_{(ii)}$, will not be equivalent. On the other hand, for certain experiments we would not expect any difference in statistics for a simple rotation of the whole experiment by 45 degrees; these two environments will be represented by the same process $W$.
The above equivalences allow us to define a joint probability distribution over the space of events (rather than outcomes) conditioned on instruments (rather than local controllables) and the process (rather than the environment). As discussed above, this distribution satisfies operational instrument equivalence, which means that the joint probability for a set of events is either zero or independent of the respective instruments. Therefore, it can be expressed in terms of a frame function $f_W$ that maps events to probabilities and is normalised for each instrument:
\[ p(M^A, M^B, \ldots | I^A, I^B, \ldots, W) = f_W(M^A, M^B, \ldots)\, \chi_{I^A}(M^A)\, \chi_{I^B}(M^B) \cdots, \qquad \sum_{M^A \in I^A,\, M^B \in I^B, \ldots} f_W(M^A, M^B, \ldots) = 1, \]
where, for a set $S$, $\chi_S$ is the indicator function: $\chi_S(s) = 1$ for $s \in S$ and $\chi_S(s) = 0$ for $s \notin S$. Note that the indicator functions are necessary to make the whole expression a valid probability distribution, normalised over the entire space of events. Furthermore, and in contrast to similar expressions involving POVMs, the dependency on the instruments is crucial to allow for causal influence across the regions: integrating over the events of, say, region $A$ can result in a marginal distribution that still depends on $A$'s instrument and displays signalling from $A$ to other regions. However, the fact that the dependency on the instruments is solely through the indicator functions tells us that the causal relations can be attributed to the particular events realised in each experimental run, rather than to the whole instruments (which include the specification of events that did not happen).
In other words, the event "screens off" the instrument: once the event in a local region is known, further knowledge of the instrument does not allow for any better prediction about events in other regions.
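The role of the frame function and the indicator functions can be made concrete with a single-region sketch. The event labels, instrument names and numerical values below are hypothetical, chosen only so that two distinct instruments share one event; a single region cannot display signalling, so this only illustrates operational instrument equivalence and per-instrument normalisation.

```python
from fractions import Fraction

# Hypothetical frame function for one region: event label -> probability.
frame_function = {
    "e0": Fraction(1, 2),
    "e1": Fraction(1, 2),
    "e2": Fraction(1, 3),
    "e3": Fraction(1, 6),
}

# Hypothetical instruments: each is the set of events it makes possible.
# The event "e0" is shared by both instruments.
instruments = {
    "I1": frozenset({"e0", "e1"}),
    "I2": frozenset({"e0", "e2", "e3"}),
}


def indicator(event, instrument):
    """Indicator function: 1 if the event belongs to the instrument, else 0."""
    return 1 if event in instruments[instrument] else 0


def probability(event, instrument):
    """Probability of an event given an instrument: frame value times indicator."""
    return frame_function[event] * indicator(event, instrument)


def is_normalised(instrument):
    """The probabilities must sum to one over the whole space of events."""
    return sum(probability(e, instrument) for e in frame_function) == 1
```

The shared event `e0` receives the same probability under `I1` and `I2`, while an event outside the chosen instrument receives probability zero.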

Ontological model
The purpose of an ontological model is to introduce possible elements of reality. Typically, one assumes that the ontology is encoded in a "state", representing the physical properties of a system at a given time. Here we shift the focus from states to more general properties of the environment that are responsible for mediating correlations between regions. We represent the collection of all such properties by a single variable $\omega$, named the ontic process. We wish to clarify at this point that our ontic process captures the physical properties of the world that remain invariant under our local operations. That is, although we allow local properties to change under specific operations, we wish our ontic process to capture those aspects of reality that are independent of this probing. The interpretation of ontic processes and the relation with the usual notion of ontic states can be seen via the examples of the following section. Our ontological model specifies a joint probability for a set of outcomes, one at each local region, given the ontic process, the environment, and the set of local controllables. This joint probability reduces to the operational joint probability when the value of the ontic process is unknown:
\[ p(a, b, \ldots | \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}) = \int d\omega\; p(a, b, \ldots | \omega, \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W})\, p(\omega | \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}). \]
There are three natural assumptions one might require of an ontological model defined according to these operational equivalences:

Assumption 1. ω-mediation. The ontic process mediates all the correlations between regions, thus $\omega$ screens off outcomes from the environment, and we have:
\[ p(a, b, \ldots | \omega, \tilde{I}^A, \tilde{I}^B, \ldots, \tilde{W}) = p(a, b, \ldots | \omega, \tilde{I}^A, \tilde{I}^B, \ldots). \]

Assumption 2. Instrument non-contextuality. Operationally indistinguishable pairs of outcomes and local controllables should remain indistinguishable at the ontological level.
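That is, for operationally equivalent pairs $(a, \tilde{I}^A)$, $(a', \tilde{I}'^A)$,
\[ p(a, b, \ldots | \omega, \tilde{I}^A, \tilde{I}^B, \ldots) = p(a', b, \ldots | \omega, \tilde{I}'^A, \tilde{I}^B, \ldots), \]
which means that we can define a probability distribution on the space of events, conditioned on instruments and on the ontic process, in terms of a frame function $f_\omega$, such that:
\[ p(M^A, M^B, \ldots | \omega, I^A, I^B, \ldots) = f_\omega(M^A, M^B, \ldots)\, \chi_{I^A}(M^A)\, \chi_{I^B}(M^B) \cdots, \]
where $\chi$ is the indicator function, $\chi_X(x) = 1$ for $x \in X$ and $\chi_X(x) = 0$ for $x \notin X$, and $f_\omega$ maps events to probabilities:
\[ f_\omega(M^A, M^B, \ldots) \in [0, 1], \]
and is normalised for each set of events that corresponds to a particular instrument:
\[ \sum_{M^A \in I^A,\, M^B \in I^B, \ldots} f_\omega(M^A, M^B, \ldots) = 1. \]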

Assumption 3. Process non-contextuality.
For operationally equivalent environments $\tilde{W}$, $\tilde{W}'$ the assumption of process non-contextuality implies:
\[ p(\omega | \tilde{W}) = p(\omega | \tilde{W}'), \]
and we can define a function $g_W(\omega)$ that maps ontic processes to probabilities, given each process $W$:
\[ g_W(\omega) \in [0, 1], \]
that is normalised:
\[ \int d\omega\; g_W(\omega) = 1. \]
For an ontological model that satisfies the above three assumptions, the operational probability can now be expressed in terms of events, instruments and processes as:
\[ p(M^A, M^B, \ldots | I^A, I^B, \ldots, W) = \int d\omega\; f_\omega(M^A, M^B, \ldots)\, \chi_{I^A}(M^A)\, \chi_{I^B}(M^B) \cdots\, g_W(\omega). \]
Although ontic states, as they are usually understood, are not represented explicitly in our framework, they are not excluded. In the following section we present three examples to illustrate how such ontic states, with or without retrocausality, can be represented in our model.
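As a minimal sketch of a model satisfying all three assumptions, the snippet below takes a finite set of ontic processes and a single region, and computes the operational probability as a frame function for each ontic process, averaged with a weight that depends only on the process. All labels and numbers are hypothetical; the assumed form of the decomposition (indicator times frame function, summed against the weight) is stated in the comments.

```python
from fractions import Fraction as F

# Hypothetical instruments for one region, given as sets of possible events.
instruments = {"I1": {"e0", "e1"}, "I2": {"e0", "e2", "e3"}}

# Hypothetical ontic frame functions f_omega: one per ontic process.
# Each is normalised on every instrument (instrument non-contextuality).
f_omega = {
    "w0": {"e0": F(1), "e1": F(0), "e2": F(0), "e3": F(0)},
    "w1": {"e0": F(0), "e1": F(1), "e2": F(2, 3), "e3": F(1, 3)},
}

# Hypothetical weights g_W(omega): depend on the process W only
# (process non-contextuality), not on the instrument.
g = {"W": {"w0": F(1, 2), "w1": F(1, 2)}}


def ontic_frames_normalised():
    """Every f_omega must sum to one over the events of each instrument."""
    return all(sum(frame[e] for e in events) == 1
               for frame in f_omega.values()
               for events in instruments.values())


def operational_probability(event, instrument, process):
    """Assumed decomposition: indicator(event in instrument) times the
    sum over omega of f_omega(event) * g_W(omega)."""
    if event not in instruments[instrument]:
        return F(0)
    return sum(g[process][w] * f_omega[w][event] for w in f_omega)
```

The averaged function is again normalised on each instrument, so the operational model inherits operational instrument equivalence from the ontic one.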

Causally-ordered models
As a first example, let us consider a classical, deterministic scenario (without retrocausality) with two regions, $A$ in the past of $B$, each delimited by a past and future space-like boundary (see Fig. 2a). For a classical system, we can assign input states $\lambda^A_I$ and $\lambda^B_I$ to the past boundaries of $A$ and $B$, respectively, and output states $\lambda^A_O$ and $\lambda^B_O$ to the respective future boundaries. As measurements can be performed without disturbance on a classical system, we associate the input state in each region with the respective measurement outcome: $a \equiv \lambda^A_I$ and $b \equiv \lambda^B_I$. As local controllables we take deterministic local operations, defined as functions $f^X$ that map the input state of each region to the corresponding output:
\[ \lambda^X_O = f^X(\lambda^X_I), \]
where $X$ denotes the respective local region, $A$ or $B$. Assuming ordinary dynamical laws, the input state at $B$ can depend on the output at $A$ through some function:
\[ \lambda^B_I = w^B(\lambda^A_O). \]
The input state at $A$, on the other hand, does not depend on $B$, and thus has to be specified as an independent environment variable. The ontic process for this model is thus identified with the pair
\[ \omega \equiv (\lambda^A_I, w^B). \]
Indeed, knowing $\omega$ and the choice of local operations is sufficient to fully determine the measured outcomes:
\[ a = \lambda^A_I, \qquad b = w^B(f^A(\lambda^A_I)). \]
As the model is fully deterministic, and we have not introduced any redundant variables, there are no non-trivial equivalence classes. Explicitly, an event in region $A$ (and similarly for $B$) is given by the pair $(a, f^A)$, or equivalently by the input-output pair $(\lambda^A_I, \lambda^A_O)$, while the instrument is given by the collection of events given a choice of operation, which is just to say the instrument can be identified with the function $f^A$. We see in this example that the ontology, as traditionally understood, lies in the event variables $\lambda$. These variables are not independent of the local controllables, because the event at $B$ can depend on the operation performed at $A$.
However, there is still an aspect of the ontology that does not depend on the operations: the initial state λ A I and the functional relation w B . It is this invariant aspect of the ontology that we call a process.
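The causally ordered example can be sketched in a few lines. The function name `run_causally_ordered` and the bit-valued states are assumptions made for illustration; the structure follows the text: the ontic process is the pair of an initial state and a connecting function, and the local operations are free choices.

```python
def run_causally_ordered(initial_state, w_B, f_A, f_B):
    """Deterministic two-region model with A in the past of B.

    initial_state and w_B together play the role of the ontic process;
    f_A and f_B are the local operations chosen in each region.
    """
    a = initial_state          # outcome at A is A's input state
    output_A = f_A(a)          # local operation at A
    b = w_B(output_A)          # B's input is fixed by A's output
    output_B = f_B(b)          # local operation at B
    return {"a": a, "b": b,
            "event_A": (a, output_A), "event_B": (b, output_B)}


def identity(x):
    return x


def flip(x):
    return 1 - x


# One fixed ontic process, probed with two different operations at A.
omega = (0, identity)
unflipped = run_causally_ordered(*omega, f_A=identity, f_B=identity)
flipped = run_causally_ordered(*omega, f_A=flip, f_B=identity)
```

The outcome at `B` changes with the operation chosen at `A`, while `omega` itself is the same in both runs, which is the invariant aspect of the ontology referred to above.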

Time-travelling classical systems
General Relativity allows for space-time geometries with closed time-like curves, where a system can travel back in time and interact with its past self [43], thus providing physically motivated examples of scenarios that defy ordinary forward causality. Notably, qualitative analogies between quantum phenomena and classical time-travelling systems have been suggested [44], making the latter an interesting test-bed for generalised ontological models.
The example in the previous subsection can be readily generalised to a deterministic model of a classical system near closed time-like curves by allowing the input state at $A$ to depend on $B$ through some function:
\[ \lambda^A_I = w^A(\lambda^B_O). \]
The process is now given by two functions, $\omega \equiv (w^A, w^B)$ (Fig. 2b), with the causally ordered case recovered when one of the two is a constant. Compatibility with arbitrary local operations imposes constraints on the functions $w^A$, $w^B$ and, in the two-region case, it turns out that one of them in fact has to be constant [45,46]. However, for three or more regions, it is possible to find deterministic processes, with no constant component, that are still consistent with arbitrary local operations.⁴
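The two-region constraint can be checked by brute force in the smallest case. The sketch below assumes bit-valued states and takes a process to be consistent when, for every choice of local operations, exactly one self-consistent assignment of input states exists; this reading of "compatibility" and all names are assumptions for illustration, and the check covers only this toy case, not the general result of [45,46].

```python
from itertools import product

STATES = (0, 1)

# All four functions from a bit to a bit.
BIT_FUNCTIONS = {
    "const0": lambda x: 0,
    "const1": lambda x: 1,
    "identity": lambda x: x,
    "flip": lambda x: 1 - x,
}


def consistent_assignments(w_A, w_B, f_A, f_B):
    """All input pairs (a, b) with a = w_A(f_B(b)) and b = w_B(f_A(a))."""
    return [(a, b) for a, b in product(STATES, repeat=2)
            if a == w_A(f_B(b)) and b == w_B(f_A(a))]


def is_valid_process(w_A, w_B):
    """Valid if every pair of local operations has exactly one solution."""
    return all(len(consistent_assignments(w_A, w_B, f_A, f_B)) == 1
               for f_A, f_B in product(BIT_FUNCTIONS.values(), repeat=2))


def is_constant(w):
    return w(0) == w(1)


# Names of all valid two-region processes over bits.
valid = [(name_A, name_B)
         for (name_A, w_A), (name_B, w_B)
         in product(BIT_FUNCTIONS.items(), repeat=2)
         if is_valid_process(w_A, w_B)]
```

With both connecting functions set to the identity, choosing the identity at `A` and a flip at `B` leaves no consistent assignment, a toy version of the grandfather paradox; every process in `valid` has at least one constant component.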
Also in this case, the observed outcomes are fully determined once the process and the local operations are specified, as the unique fixed points of the composition of the process with the local operations $f^A, f^B, \ldots$. Thus, from the perspective of ordinary ontological models, time-travelling systems appear contextual, since it is impossible to assign a "state" to any region independently of the operations. Nonetheless, the relation between events, captured by the process, does not depend on the operations. Thus, following the terminology introduced here, models such as the above are both instrument and process non-contextual. (As in the previous causally ordered example, there are no non-trivial equivalence classes, so non-contextuality is straightforward.) More general models of classical closed time-like curves might impose restrictions on the accessible local operations.⁵ Even more generally, one can consider models where instruments are not associated with local input-to-output functions but with more general sets of input-output pairs, $I^A \subseteq \Lambda^A_I \times \Lambda^A_O$, where $\Lambda^A_I$ ($\Lambda^A_O$) is the state space associated with the past (future) boundary of the local region. In such models, a choice of instrument selects which pairs of input-output states are possible, while a deterministic process would determine, given all choices of instruments, which pairs are actually realised. In such models both the state in the past and the state in the future of a local region depend on the choice of instrument, so again they are necessarily contextual from the point of view of traditional ontological models. Yet, they remain instrument and process non-contextual as long as deterministic processes are considered.
In the above deterministic examples ω-mediation is satisfied trivially, because ontic and operational processes coincide. This can be generalised to situations where we have only partial knowledge about the environment. For example, we might not have full knowledge of the initial state, but only know the temperature T of a thermal bath from which the state is extracted; or the system might get coupled to some external environment during the evolution from one region to another. In all cases, we end up with partial knowledge of the ontic process, expressed by some probability p(ω|W ) where W represents all relevant accessible information about the environment (the temperature of the bath or other noise parameters). The resulting probabilistic operational model naturally satisfies the property of ω-mediation, because knowing the temperature or noise parameters does not provide more information than already encoded in the ontic process, namely in the underlying microstates and functional relations.
Note also that our construction of an ontological model respects the mobility of the boundary between local instruments and processes that one sees in ordinary applications of quantum theory. As a simple example, consider a preparation $P$ of a quantum system, followed by a measurement $M$. This can be modelled in three different ways: (i) with $P$ as part of the environment $\tilde{W}$, and $M$ as an instrument associated to a single local region, (ii) with $P$ and $M$ as instruments in two distinct local regions, and $\tilde{W}$ capturing both a channel between preparation and measurement and any additional information about the environment, or (iii) with both $P$ and $M$ characterising the instruments in a single local region and all other information about the environment modelled as $\tilde{W}$. For classical processes characterised as causal models, such a shift in perspective is formalised by the notion of "latent variables" [41]. An analogous notion of "latent laboratories" exists for quantum processes characterised as quantum causal models, and this formal structure likewise characterises the mobility of the boundary to which we refer [34].

All-at-once stochastic models
In the above examples of time-travelling systems, the ontic process (or at least certain aspects of it) can be understood as describing the dynamical evolution of systems between regions. Some retrocausal approaches attempt to provide an ontology for quantum mechanics that does not rely on any dynamical process; rather, one should consider all relevant events in space-time "at once". The appearance of quantum probabilities is then justified by the fact that the information available at a given time is not sufficient to fully determine the state of the system at all times (with the missing information possibly contained in some unknown boundary condition in the future). Our framework naturally captures all such models, because an ontic process need not be interpreted as a transformation: it simply represents the rule generating all relevant events given the local operations.
operations would simply represent a deviation from classical physics in the local region where the agent acts.
An instructive example is a toy model by Wharton [16], which represents a space-time scenario as a system in thermal equilibrium, with events at different space-time locations represented as states at different points in space. While having a clear ontological interpretation, this model offers qualitative analogies with quantum interference and, when analysed from an ordinary time-evolution perspective, displays an apparent contextuality. We show in detail in appendix A how (a generalisation of) Wharton's model fits within our framework and satisfies the requirements of ω-mediation and instrument and process non-contextuality.
The above three examples illustrate that it is indeed easy to represent many possible physical scenarios via ontological models that are both instrument and process non-contextual. Given the exotic nature of the latter two examples, it seems plausible that one could also produce such a model to explain quantum correlations. In the following sections we prove that this is not the case.

Quantum models
If one assumes that the results of experiments in local regions accord with quantum mechanics, then events can be associated with completely positive trace-non-increasing (CP) maps $\mathcal{M}^A : A_I \to A_O$, where the input and output spaces are the spaces of linear operators over the input and output Hilbert spaces of the local region [54]. Each set $I^A$ of CP maps that sums to a completely positive trace-preserving (CPTP) map is a quantum instrument [55]:
$$I^A = \{\mathcal{M}^A_j\}_j, \qquad \sum_j \mathcal{M}^A_j \ \text{is CPTP}.$$
An instrument thus represents the collection of all possible events that can be observed given a specific choice of local controllable. Given these definitions of events and instruments, one can predict the joint probability over possible events using a generalised form of the Born rule:
$$p(M^A, M^B, \ldots \mid W) = \mathrm{Tr}\left[\left(M^A \otimes M^B \otimes \cdots\right) W\right], \tag{34}$$
where $M^A, M^B, \ldots$ are the Choi-Jamiołkowski representations of the local CP maps associated with particular events, and W is a positive semi-definite operator associated with the relevant process [23,26,29]. We call W the process matrix, using the terminology of Ref. [29]. It is possible to derive this trace rule for probabilities by assuming linearity [29]; alternatively, one can derive linearity (and the trace rule) from the assumption of operational instrument equivalence alone [37]. The significance of this latter derivation is that the condition of operational instrument equivalence is formally identical to that of instrument non-contextuality, the only difference being that the latter includes the ontic process. Therefore, for each ontic process ω, the corresponding frame function can be expressed as
$$f_\omega(M) = \mathrm{Tr}\left[M\,\sigma(\omega)\right], \tag{35}$$
where we introduced the short-hand notation $M \equiv M^A \otimes M^B \otimes \cdots$ and σ(ω) is a process matrix [37]. We now wish to show that the function $g_W(\omega)$ that features in our ontological model, under the assumption of process non-contextuality, can be represented as
$$g_W(\omega) = \mathrm{Tr}\left[\eta(\omega)\,W\right], \tag{36}$$
where $\{\eta(\omega)\}_{\omega\in\Omega}$, Ω being the set of ontic processes, is a quantum instrument.
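The trace rule above can be checked numerically in the simplest case of a preparation followed by a unitary channel and a measurement. The sketch below is illustrative only and rests on assumptions that are not spelled out in the text: region A has a trivial input, so a preparation of the state ρ is represented by the operator ρ^T on A_O; region B has a trivial output, so an outcome is represented by a POVM element E on B_I; and the channel is represented by the operator [[V]] = Σ_rs |r><s| ⊗ V|r><s|V†. The function names are hypothetical. Under these conventions the rule reduces to the ordinary Born rule Tr[V ρ V† E].

```python
import numpy as np

def choi_unitary(V):
    """Return [[V]] = sum_{r,s} |r><s| (x) V|r><s|V^dagger on A_O (x) B_I."""
    d = V.shape[0]
    W = np.zeros((d * d, d * d), dtype=complex)
    for r in range(d):
        for s in range(d):
            ket_bra = np.zeros((d, d), dtype=complex)
            ket_bra[r, s] = 1.0
            W += np.kron(ket_bra, V @ ket_bra @ V.conj().T)
    return W

def process_probability(M, W):
    """Generalised Born rule: p = Tr[M W]."""
    return float(np.real(np.trace(M @ W)))

# Process: a Hadamard channel from A_O to B_I (illustrative choice).
V = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
W = choi_unitary(V)

# Event in A: prepare |0><0| (assumed convention: transpose of the state).
rho = np.array([[1, 0], [0, 0]], dtype=complex)
M_A = rho.T

# Events in B: projectors onto |+> and |->.
E_plus = np.array([[1, 1], [1, 1]], dtype=complex) / 2
E_minus = np.array([[1, -1], [-1, 1]], dtype=complex) / 2

p_plus = process_probability(np.kron(M_A, E_plus), W)
p_minus = process_probability(np.kron(M_A, E_minus), W)

# Ordinary Born rule for comparison.
born_plus = float(np.real(np.trace(V @ rho @ V.conj().T @ E_plus)))
print(p_plus, p_minus, born_plus)
```

Other Choi conventions move the transpose elsewhere (for example onto W); the probabilities are unaffected as long as one convention is used consistently.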
It is common in non-contextuality no-go theorems (as well as in the process matrix formalism) to assume preservation of probabilistic mixtures as an assumption that is independent of the assumption of non-contextuality. Here we instead derive it from our assumption of process non-contextuality. Consider two classical variables ξ, W used to describe the process, where we already take operational equivalences into account. Following the earlier example, we can think of W as describing a Cartesian axis, while ξ (the aspect of the process about which we are ignorant) describes whether a spin-1/2 particle is prepared aligned or anti-aligned to this axis. The operational probabilities given W, and the corresponding decomposition for ontological probabilities, are obtained by marginalisation:
$$p(M^A, M^B, \ldots \mid W) = \int d\xi\, p(M^A, M^B, \ldots \mid W, \xi)\, p(\xi \mid W) = \int d\xi \int d\omega\, p(M^A, M^B, \ldots \mid \omega)\, p(\omega \mid W, \xi)\, p(\xi \mid W),$$
where, in the last identity, we use the fact that p(ω|W, ξ) does not depend on the local controllables (and thus on the instruments) due to the assumption of ω-mediation; and p(ξ|W) is due to our assumption that the environment and local controllables (and thus process and instruments) are uncorrelated. Additionally, due to ω-mediation, we no longer need to condition the $M^A, M^B, \ldots$ directly on W and ξ. Now let us write $W_\xi$ for the process corresponding to the pair W, ξ. We have
$$g_W(\omega) = g_{\int d\xi\, W_\xi\, p(\xi \mid W)}(\omega) = \int d\xi\, g_{W_\xi}(\omega)\, p(\xi \mid W); \tag{38}$$
thus $g_W(\omega)$ is convex-linear in W. The first identity in Eq. (38) comes from the fact that probabilistic mixtures of quantum processes are represented as convex combinations, thus $W = \int d\xi\, W_\xi\, p(\xi \mid W)$. This in turn is a consequence of the trace formula for operational quantum probabilities (which is itself a consequence of operational instrument equivalence):
$$\mathrm{Tr}\left[\left(M^A \otimes M^B \otimes \cdots\right) W\right] = \int d\xi\, p(\xi \mid W)\, \mathrm{Tr}\left[\left(M^A \otimes M^B \otimes \cdots\right) W_\xi\right]$$
for all CP maps $M^A, M^B, \ldots$ Using standard linear-algebra arguments, $g_W(\omega)$ can be extended to a linear function over W, leading to the representation (36), $g_W(\omega) = \mathrm{Tr}[\eta(\omega) W]$.
Positivity and normalisation of probabilities then imply
$$\mathrm{Tr}\left[\eta(\omega)\, W\right] \geq 0, \qquad \int d\omega\, \mathrm{Tr}\left[\eta(\omega)\, W\right] = 1$$
for every process matrix W. Operators η(ω) as defined above can be understood as the Choi representation of CP maps that sum to a trace-preserving map; that is, $\{\eta(\omega)\}_{\omega\in\Omega}$ defines an instrument. In general, the CP maps η(ω) do not have to factorise over the separate regions, so it might not be possible to interpret them as local operations. This is not an obstacle, as such an interpretation is not required for the rest of the argument.

A quantum contradiction
To summarise the results so far, we have an operational rule for predicting the joint probabilities of outcomes according to quantum theory:
$$p(M \mid W) = \mathrm{Tr}\left[M W\right].$$
We also have an ontological model for predicting the joint probabilities under the assumptions of ω-mediation, instrument non-contextuality and process non-contextuality:
$$p(M \mid W) = \int d\omega\, f_\omega(M)\, g_W(\omega), \tag{44}$$
which, given the results of the last section, becomes
$$p(M \mid W) = \int d\omega\, \mathrm{Tr}\left[M \sigma(\omega)\right] \mathrm{Tr}\left[\eta(\omega) W\right].$$
If this accords with quantum predictions, then we should have
$$\mathrm{Tr}\left[M W\right] = \int d\omega\, \mathrm{Tr}\left[M \sigma(\omega)\right] \mathrm{Tr}\left[\eta(\omega) W\right]. \tag{46}$$
It has been noted [40] that a decomposition of the form (44) is akin to the expression of expectation values in terms of quasi-probability distributions [56,57]. However, the non-contextuality assumptions force both $f_\omega$ and $g_W$ to be ordinary, positive probability distributions. It is well known that quantum expectation values cannot be expressed in such a way. It is nevertheless instructive to consider an explicit contradiction within the present process framework.
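From (46),
$$W = \int d\omega\, \sigma(\omega)\, \mathrm{Tr}\left[\eta(\omega) W\right] = \int d\omega\, \sigma(\omega)\, g_W(\omega), \tag{48}$$
which follows from the fact that the operators M span the joint linear space. Eq. (48) tells us that W is a convex mixture of the operators σ(ω). If W is extremal, namely if it cannot be decomposed into a non-trivial convex combination of other processes, then $W \propto \sigma(\omega)$ whenever $g_W(\omega) \neq 0$. Denoting the support of $g_W$ by $\Omega_W$, i.e., $\omega \in \Omega_W \Leftrightarrow g_W(\omega) \neq 0$, we have $W \propto \sigma(\omega)\ \forall \omega \in \Omega_W$ for an extremal W.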
Consider now a process W that can be decomposed into two distinct mixtures of two sets of extremal processes, $\{W_j\}$ and $\{W'_k\}$ (we take discrete sets for simplicity):
$$W = \sum_j q_j W_j = \sum_k q'_k W'_k.$$
Since $g_W$ is convex-linear in W, we have $g_W = \sum_j q_j\, g_{W_j}$. This means that, for every $\omega \in \Omega_W$, there must be a j such that $g_{W_j}(\omega) \neq 0$. In other words, $\Omega_W = \bigcup_j \Omega_{W_j}$. By a similar argument, $\Omega_W = \bigcup_k \Omega_{W'_k}$. We thus see that each convex decomposition of W into distinct extremal processes corresponds to a partition of W's support into the extremal processes' supports. This in turn implies that each $\omega \in \Omega_W$ belongs to both $\Omega_{W_j}$ and $\Omega_{W'_k}$, for some j and k. As we have seen, this would imply
$$W_j \propto \sigma(\omega) \propto W'_k. \tag{50}$$
However, one can find many examples where no process in one decomposition is proportional to any process in the other. This implies a contradiction and shows that a decomposition such as (46) cannot exist for all CP maps and quantum processes. As a particular example of the above contradiction, consider a process W corresponding to a quantum channel from a region with a two-level output, $A_O$, to a region with a two-level input, $B_I$:
$$W = \frac{1}{2}\, \mathbb{1}^{A_O} \otimes \mathbb{1}^{B_I},$$
formed from the following two combinations of extremal processes:
$$W = \frac{1}{4}\left([[\mathbb{1}]] + [[X]] + [[Y]] + [[Z]]\right) = \frac{1}{4}\left([[U]] + [[UX]] + [[UY]] + [[UZ]]\right),$$
where X, Y and Z are the Pauli matrices, U is a unitary, and we used the notation $[[V]] := \sum_{rs} |r\rangle\langle s| \otimes V |r\rangle\langle s| V^\dagger$ for the Choi representation of a unitary V. It is clear that, for an appropriate choice of U, no $W_j$ is proportional to any $W'_k$, and we have a contradiction with (50).
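The non-uniqueness of extremal decompositions used in this argument can be illustrated numerically. The sketch below is a hedged example rather than a reproduction of the authors' calculation: it assumes that the process is the completely depolarising qubit channel, written once as a uniform mixture of the Pauli unitaries and once as a uniform mixture of the same unitaries multiplied by a fixed unitary U. The particular U (a rotation by an arbitrary angle about the (1,1,1) axis) and the helper names are hypothetical choices made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def choi_unitary(V):
    """Return [[V]] = sum_{r,s} |r><s| (x) V|r><s|V^dagger."""
    d = V.shape[0]
    W = np.zeros((d * d, d * d), dtype=complex)
    for r in range(d):
        for s in range(d):
            ket_bra = np.zeros((d, d), dtype=complex)
            ket_bra[r, s] = 1.0
            W += np.kron(ket_bra, V @ ket_bra @ V.conj().T)
    return W

def proportional(A, B, tol=1e-9):
    """True if A = c*B for a scalar c (equality case of Cauchy-Schwarz)."""
    overlap = abs(np.trace(A.conj().T @ B))
    return abs(overlap - np.linalg.norm(A) * np.linalg.norm(B)) < tol

# Hypothetical unitary: rotation about the (1,1,1)/sqrt(3) axis.
theta = 0.3
U = np.cos(theta) * I2 - 1j * np.sin(theta) * (X + Y + Z) / np.sqrt(3)

first = [choi_unitary(P) for P in PAULIS]        # the W_j
second = [choi_unitary(U @ P) for P in PAULIS]   # the W'_k

W_first = sum(first) / 4
W_second = sum(second) / 4

same_process = np.allclose(W_first, W_second)
any_proportional = any(proportional(Wj, Wk) for Wj in first for Wk in second)
print(same_process, any_proportional)
```

Both mixtures give the same operator, proportional to the identity, while no element of one decomposition is proportional to an element of the other. If U were itself a Pauli matrix (up to a phase), the second set would coincide with the first and the check would fail, which is why U has to be chosen appropriately.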

Process-contextual extensions of quantum theory
Contextuality proofs do not always require both preparation and measurement non-contextuality. Indeed, many no-go theorems focus on the requirement of measurement non-contextuality alone. Interestingly, even without preparation non-contextuality, measurement non-contextuality imposes strong constraints on the ontology. Essentially, any non-contextual ontology must reduce to the Beltrametti-Bugajski (BB) model [58], which identifies elements of reality with the quantum wave function. An important consequence of this result is that no measurement non-contextual extension of quantum theory exists that can provide more accurate predictions of experimental outcomes [5].
It is thus interesting to consider dropping the requirement of process non-contextuality in our framework, leaving instrument non-contextuality as the sole requirement. It is easy to see that instrument non-contextual, process-contextual models are possible. An example is a model where the ontic process is directly identified with the quantum process:
$$g_W(\omega) = \delta(\omega - W). \tag{60}$$
Operational probabilities are then recovered simply by using the "quantum process rule", Eq. (34), for the ontic frame function:
$$f_\omega(M) = \mathrm{Tr}\left[M \omega\right].$$
This "crude" ontological model is similar to the BB model. A difference is that the BB model only identifies pure quantum states with elements of reality, while in Eq. (60) any process counts as ontic, including those corresponding to mixed states or noisy channels. One could refine the above model by only allowing an appropriately defined "pure process" to be ontic. (See however Ref. [59] for possible ambiguities regarding such a definition.) A non-extendability result similar to that of [5] also holds in our case. As already discussed above, the only instrument non-contextual frame function must be given by Eq. (35), namely to every ontic process ω is associated a process matrix σ(ω). The implication is that an instrument non-contextual hidden variable cannot provide more information than that contained in a process matrix. We thus conclude that quantum mechanics admits no non-trivial, instrument non-contextual extension. Indeed, this result holds independently of any assumptions one may make about the causal structure of a possible underlying ontology. Therefore, even instrument non-contextuality alone poses strong restrictions on hidden variable models that attempt to leverage exotic causal structures to recover a non-contextual notion of reality.

Discussion
We have shown that it is not possible to construct an ontological model that is both instrument and process non-contextual and also accords with the predictions of quantum mechanics. We take both forms of non-contextuality to be very reasonable assumptions if one wishes some aspect of "reality" to be describable in a manner that is independent of the act of experimentation. Thus our work shows that models that posit unusual causal, global or dynamical relations will not solve a key quantum mystery, that of contextuality.
Standard no-go theorems show that quantum theory is not consistent with ontological models where the properties of a system exist prior to and independently of the way they are measured. A possible interpretation is that properties do exist, but they are in fact dependent on future actions. Here we have shown that hidden variable models that attempt to leverage such influence from the future have to violate some broader form of non-contextuality. This new notion of non-contextuality refers to the rules that dictate how local actions influence observed events, rather than to states and measurements.
We have introduced three assumptions in order to analyse non-contextuality in such scenarios where influence from the future is possible. The core idea is captured by the assumption of ω-mediation. This states that an agent's actions should affect the world according to rules or laws that do not themselves depend on such actions. Indeed, if the rules changed every time we changed how we intervened on the world, we would not call them "rules" to begin with. In the context of ontological models, this assumption allows one to assume that experiments uncover an aspect of nature that is unchanging.
The second assumption, instrument non-contextuality, states that operationally equivalent interventions should not produce distinct effects at the ontological level. We have shown that this assumption is compatible with scenarios that would be interpreted as contextual when viewed from an ordinary, time-oriented perspective. For example, we have illustrated that time-travelling models where states can depend on future interventions satisfy the requirement of instrument non-contextuality. Despite this generality, instrument non-contextuality is nonetheless sufficient to rule out all non-trivial hidden-variable extensions of quantum theory: Any additional variable that could provide better predictions for quantum statistics than ordinary quantum mechanics must be instrument contextual.
Our third assumption, process non-contextuality, states that the probabilistic assignment of the ontic description of an experiment should reflect the operationally equivalent arrangements of the same experiment. Here by "experiment" we mean the specification of the set of conditions under which agents can operate. That is, we include in this description all aspects of a physical scenario other than the choices of settings and the observed outcomes. Such aspects include what kind of systems are involved, the laws describing such systems, boundary conditions, etc. We have shown that no ontic model can satisfy this requirement of process non-contextuality, including those that directly identify quantum objects as ontic.
The distinction between background environment variables and locally controllable settings that one makes when describing experiments using our approach is of course mobile. What counts as a freely chosen parameter in one situation can count as a fixed parameter in another. Our result is robust under such a shift in perspective: no matter how we decide to describe a quantum experiment, it will not be possible to find an ontic representation for it that is both instrument and process non-contextual.
Finally, we draw attention to the fact that our results rely on complete matching to the operational predictions of quantum theory. This is a recognised feature of all ontological models that rely on operational equivalence classes and leaves open the possibility that particular ontological models might allow for some experimentally testable, different predictions. Thus, for proponents of particular retrocausal models, the door remains open to develop their ontology such that they can predict some possible deviation from quantum statistics. In the face of such statistical deviation, the possibility of a non-contextual ontological model remains open.

A Wharton's retrocausal toy model
The core idea of the model is to represent a system across space-time, analogously to the representation of a system in space in thermodynamical equilibrium. Rather than being determined by dynamical evolution, the states at each point in space-time are known with some probability. This is similar to how macrostates can be considered as providing probability distributions for microstates. In this model each event in space-time is represented as a site, labelled by the index j, within a lattice. At each site j we can have a particle in a state $\lambda_j$, whose possible values are assumed to be ±1 for simplicity. The entire system across space-time is treated "all-at-once" in the same way one would treat a spatially extended system, where each site represents a different location in space. The system is then associated with a Hamiltonian $H = -\sum_{\langle i,j \rangle} \lambda_i \lambda_j$, where the sum is taken over nearest neighbours according to the geometry of the lattice. All we know about the system is that it is in a thermal state with inverse temperature β; thus the probability of a given configuration $\vec{\lambda} := (\lambda_1, \lambda_2, \ldots)$ is $p(\vec{\lambda} \mid \beta) \propto e^{-\beta H(\vec{\lambda})}$. If we learn the state of one of the sites, we need to update the thermal distribution by conditioning on the observed value. However, since the model is supposed to represent a space-time configuration, the sites we can observe at any given time are restricted.
Figure 3: The retrocausal toy model of Ref. [16]. Each node j represents a location in space-time where a system can be found in a state $\lambda_j$, j = 1, 2, .... The state of the entire system is sampled from a thermal ensemble, defined by a Hamiltonian containing interactions between nodes connected by an edge, where each node is treated as a site in a spatially distributed lattice. (a) Observing the system at a given time reveals the state at one of the nodes, e.g. $\lambda_1 = 1$, upon which the probability assignment at the other nodes has to be updated. (b) The analogue of an interference experiment is represented by the insertion of an additional node in the future, which results in a different thermal state and thus in a different probability distribution for all states. An observer at an earlier time who ignores this possibility might interpret such a dependence on future actions as a form of contextuality.
Retrocausality is introduced by assuming that performing a measurement at any given time can result in the introduction of a new site, thus changing the geometry of the system, Fig. 3. Assuming a thermal state with a given temperature, the two geometries result in different probability distributions for the microstates. If the system is interpreted as time oriented, and the influence of the future intervention is ignored, then one might be led to the conclusion that it is impossible to assign non-contextual states of reality to the system. The analogy is seen with a quantum interference experiment, where a measurement in the future is assumed to change the conditions that determine the state of the system in its past. If the influence from the future measurement is included, argues Wharton, then one might be able to recover an ontic interpretation of quantum mechanics, where the quantum state simply represents lack of information about the underlying state. This model is interesting because causal influence is not mediated by an explicit mechanism, as opposed to ordinary dynamical systems including the time-travelling examples in the main text. Nonetheless, it is possible to fit this model into our general ontological framework, where the observed probabilities are mediated by an ontic process. Crucially, the model turns out to be both instrument and process non-contextual, showing that approaches of this type cannot reproduce the predictions of quantum theory.
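The dependence of the thermal statistics on a "future" site can be made concrete with a few lines of code. The sketch below is a minimal illustration under stated assumptions: the two geometries are hypothetical (they are not taken from the figure of Ref. [16]), and the inverse temperature is set to an arbitrary value. Site 0 is connected to sites 1 and 2, mimicking the two arms of an interferometer; in the second geometry an additional site 3 is connected to both arms. The code compares the probability that the two arms are found in the same state.

```python
import itertools
import math

def gibbs(n_sites, edges, beta):
    """Gibbs distribution p(lambda | beta) for H = -sum_<i,j> lambda_i lambda_j."""
    weights = {}
    for config in itertools.product((-1, +1), repeat=n_sites):
        energy = -sum(config[i] * config[j] for i, j in edges)
        weights[config] = math.exp(-beta * energy)
    partition = sum(weights.values())
    return {config: w / partition for config, w in weights.items()}

def marginal(dist, sites):
    """Marginal distribution over the listed sites."""
    out = {}
    for config, p in dist.items():
        key = tuple(config[s] for s in sites)
        out[key] = out.get(key, 0.0) + p
    return out

beta = 1.0                                # arbitrary illustrative value
edges_a = [(0, 1), (0, 2)]                # geometry without the extra site
edges_b = edges_a + [(1, 3), (2, 3)]      # extra "future" site 3 inserted

arms_a = marginal(gibbs(3, edges_a, beta), sites=(1, 2))
arms_b = marginal(gibbs(4, edges_b, beta), sites=(1, 2))

aligned_a = arms_a[(1, 1)] + arms_a[(-1, -1)]
aligned_b = arms_b[(1, 1)] + arms_b[(-1, -1)]
print(aligned_a, aligned_b)
```

Inserting the extra site increases the probability that the two arms agree, even though the inserted site lies "later" than both. Note that a site attached to only one existing site would not change the marginals of the others, because summing over its two states contributes the same factor for either neighbouring state; this is why the hypothetical extra site is linked to both arms.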

Classical systems on an arbitrary geometry
We consider a more general version of Wharton's model, with arbitrary geometry, an arbitrary set of discrete values for the states, and arbitrary local interactions. Consider a set $\mathcal{N}$ of $|\mathcal{N}| = N$ sites. Each site $j \in \mathcal{N}$ can contain a classical system whose state $\lambda_j$ takes values in some set $S_j$. The state of the entire system is thus described by a vector $\vec{\lambda} \equiv (\lambda_1, \ldots, \lambda_N) \in S := \times_{j \in \mathcal{N}} S_j$.
A Hamiltonian function $H(\vec{\lambda})$ is defined on the system. We assume that this Hamiltonian is local, namely it is a sum of terms representing local interactions between sites. A subset of sites $e \subset \mathcal{N}$ contributing to an interaction term is called a "hyperedge", and the set E of hyperedges defines a "hypergraph" over $\mathcal{N}$. The Hamiltonian can thus be decomposed as
$$H(\vec{\lambda}) = \sum_{e \in E} h_e(\vec{\lambda}_e),$$
where each term $h_e$ is a function on the space $L_e := \times_{j \in e} S_j$ and $\vec{\lambda}_e$ denotes the restriction of $\vec{\lambda}$ to the sites in e. By convention, we identify the state $\lambda_j = 0$ of system j with the "empty site", namely with no system in it. This implies that, for every hyperedge e containing j,
$$h_e(\vec{\lambda}_e) = 0 \quad \text{whenever } \lambda_j = 0.$$
In other words, each interaction term vanishes when one of the sites on which it acts is empty. In this way, "different geometries", corresponding to additional or missing sites, are simply represented as a particular choice of states in a fixed geometry. In our terminology, each site j represents a (space-time) region and each state $\lambda_j$ represents an event. We can interpret each event as "ontic"; however, since we assume that each ontic event can also be observed, ontic and operational events are identified.
No control. Before considering the possibility of interventions, it is useful to see how our framework applies to the simpler scenario with no interventions. In this case, "process" is synonymous with "state". Thus, a deterministic process is simply a specific microstate $\vec{\lambda}$, while a general probabilistic process is a probability distribution $P(\vec{\lambda})$. For the case of a thermal state, where the only information we can access about the environment is the inverse temperature β, the operational process is probabilistic, given by the Gibbs distribution
$$P_\beta(\vec{\lambda}) = \frac{e^{-\beta H(\vec{\lambda})}}{Z(\beta)}, \qquad Z(\beta) := \sum_{\vec{\lambda} \in S} e^{-\beta H(\vec{\lambda})}. \tag{64}$$
Since there are no irrelevant environment variables in this model, questions about contextuality do not arise: each value of the environment variable corresponds to just one process (i.e. to one probability distribution for the "events"). Therefore, at the formal level, we could identify the operational process with the ontic process; the resulting model would be process non-contextual by construction (instrument non-contextuality is even more trivial here, because there is no choice of instruments). A more natural ontological model is a deterministic one, where each ontic process (or ontic state) is identified with one microstate $\vec{\lambda}$. As required by the general formalism, the operational process provides a probability distribution over the possible ontic processes, and knowing the ontic process makes knowledge of the operational process redundant (the ontic process "screens off" the operational one), in agreement with the property of ω-mediation.
Local control. Local instruments are defined as subsets of events and represent the possibility of local control. Thus, in general, each possible instrument at a site $j \in \mathcal{N}$ corresponds to a subset $I^j \subset S_j$. As a simple case study, we consider the scenario where the only control is inserting or removing a site, as in Wharton's example. Therefore, for each region $j \in \mathcal{N}$ there are two possible instruments:
$$I^j_0 := \{0\}, \qquad I^j_1 := S_j \setminus \{0\}.$$
A prominent feature of this example is that instruments are disjoint sets, so there is never an event that belongs to two distinct instruments. This ensures the instrument non-contextuality of the model. As in the no-control case, a deterministic process corresponds to a specification of all events, while a probabilistic process corresponds to a probability distribution for the possible events. The possibility of control means that the events can now depend on the instruments, so the process must encode this dependency. Thus, a deterministic process is given by a set of functions
$$\omega_j : (I^1, \ldots, I^N) \mapsto \lambda_j \in S_j, \qquad j \in \mathcal{N},$$
such that, for each $j \in \mathcal{N}$,
$$\omega_j(I^1, \ldots, I^N) \in I^j. \tag{67}$$
Condition (67) simply says that if we choose to remove the system from site j ($I^j = I^j_0$), then there will be no system at site j ($\lambda_j = 0$), while if we choose to insert the system ($I^j = I^j_1$), then the system will be there, in one of its possible states ($\lambda_j \in S_j \setminus \{0\}$). For a probabilistic process, dependency on the instruments is encoded in a conditional probability distribution $p(\vec{\lambda} \mid \vec{I})$. The probabilistic version of the consistency condition is
$$p(\vec{\lambda} \mid \vec{I}) = 0 \quad \text{if } \lambda_j \notin I^j \text{ for some } j \in \mathcal{N}. \tag{68}$$
A more compact way to represent a (non-contextual) process is through a frame function, which can be defined piece-wise as
$$f(\vec{\lambda}) := p(\vec{\lambda} \mid \vec{I}) \quad \text{for } \lambda_1 \in I^1, \ldots, \lambda_N \in I^N. \tag{69}$$
The consistency condition (68) is then expressed as
$$p(\vec{\lambda} \mid \vec{I}) = \begin{cases} f(\vec{\lambda}) & \text{if } \lambda_j \in I^j \text{ for all } j \in \mathcal{N}, \\ 0 & \text{otherwise.} \end{cases} \tag{70}$$
Let us pause for a moment on this definition. The frame function is defined as a function $f : S \to [0, 1]$. That is, it assigns a probability to each N-tuple $\vec{\lambda} = (\lambda_1, \ldots, \lambda_N)$, without needing any additional information about the instruments.
In the case we are considering, different instruments are non-overlapping sets of states. Therefore, if we know the state $\lambda_j$, we automatically know the instrument $I^j$, and requiring "independence from the instruments" is completely trivial: either the site is there, and the instrument is $I^j_1$, or the site is not there, and the instrument is $I^j_0$. Once we know the state, there is nothing more the instrument can tell us. Technically, each value of $\vec{I}$ defines a subset in the domain of f, and the value f takes in each of these subsets is given by the conditional probability (69). The non-overlapping of different instruments is crucial for this construction: if the same $\vec{\lambda}$ could belong to two different instruments, we would not know which value of $p(\vec{\lambda} \mid \vec{I})$ to use to define the frame function. For overlapping instruments, the existence of a frame function is equivalent to the assumption of instrument non-contextuality.
Processes for the thermal state. Once again, the only environment variable is the inverse temperature β, which thus parametrises the operational processes. Given the above discussion, it should be clear that, for each β, we can write a conditional probability in the form (70), where the frame function is defined as in Eq. (69), with probabilities provided by the Gibbs distribution (64). Explicitly,
$$f_\beta(\vec{\lambda}) = \frac{e^{-\beta H(\vec{\lambda})}}{Z(\beta \mid \vec{I})} \quad \text{for } \vec{\lambda} \in \vec{I}, \tag{71}$$
where $Z(\beta \mid \vec{I}) := \sum_{\vec{\lambda} \in \vec{I}} e^{-\beta H(\vec{\lambda})}$.
Let us stress that, from the perspective of our framework, we might as well stop here: we already have a model that is both instrument and process non-contextual. The point of our theorem is to see if it is possible to write a given operational model in terms of an underlying non-contextual model; if that is possible, the operational model cannot reproduce the predictions of quantum mechanics. In this case we already have a non-contextual model, so we know it cannot reproduce quantum mechanics. Note that the theorem does not rely on any interpretation we might assign to ontic processes, events, etc.; it is simply a statement about properties that an ontological model can or cannot have. For the sake of completeness, and since we would more naturally associate ontology with determinism, we can write explicitly how a deterministic process model looks in the present case study. Recall that a deterministic process is a (multi-valued) function $\omega \equiv (\omega_1, \ldots, \omega_N)$ from the instruments to the events. For notational convenience, we can identify the two possible instruments $I^j_{x_j}$ at each site j with their label $x_j \in \{0, 1\}$. Therefore, a choice of instruments is given by N binary variables $x \equiv (x_1, \ldots, x_N)$, and a process is identified with $2^N$ N-tuples $\{\vec{a}_x\}_{x \in \{0,1\}^N}$, where $\vec{a}_x := \omega(\vec{I}_x)$. A deterministic ontological model is thus defined by a conditional probability distribution $p(\omega \mid \beta)$ that reproduces the operational probabilities via ω-mediation:
$$p(\vec{\lambda} \mid \vec{I}_x, \beta) = \sum_\omega p(\vec{\lambda} \mid \vec{I}_x, \omega)\, p(\omega \mid \beta), \tag{74}$$
where the sum should be understood as
$$\sum_\omega \equiv \sum_{\vec{a}_{0 \ldots 0} \in \vec{I}_{0 \ldots 0}} \cdots \sum_{\vec{a}_{1 \ldots 1} \in \vec{I}_{1 \ldots 1}},$$
with one sum for each $x \in \{0,1\}^N$, and the "ontic" probabilities are given by
$$p(\vec{\lambda} \mid \vec{I}_x, \omega) = \delta_{\vec{\lambda}, \vec{a}_x}. \tag{76}$$
We now show that the conditional probabilities for the ontic process in our thermal model are given by
$$p(\omega \mid \beta) = \prod_{x \in \{0,1\}^N} f_\beta(\vec{a}_x), \tag{78}$$
where the operational frame function $f_\beta$ is given by expression (71). To see that the conditional probabilities (78) provide an ontological model for the original thermal-state model, one can verify that, by putting together expressions (78) and (76) into Eq. (74), one indeed obtains the operational probabilities (70).
Explicitly, for $\vec{\lambda} \in \vec{I}_x$,
$$\sum_\omega \delta_{\vec{\lambda}, \vec{a}_x} \prod_{x'} f_\beta(\vec{a}_{x'}) = f_\beta(\vec{\lambda}) \prod_{x' \neq x}\ \sum_{\vec{a}_{x'} \in \vec{I}_{x'}} f_\beta(\vec{a}_{x'}) = f_\beta(\vec{\lambda}),$$
where we used the normalisation of the frame function, $\sum_{\vec{\lambda} \in \vec{I}_x} f_\beta(\vec{\lambda}) = 1$ for every collection of instruments $\vec{I}_x \equiv (I^1_{x_1}, \ldots, I^N_{x_N})$.
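The construction of this appendix can be checked by brute force on a small instance. The sketch below is illustrative and makes several assumptions that go beyond the text: three sites on a chain, site states in {0, +1, -1} with 0 denoting the empty site, pairwise interaction terms of the Ising form (which vanish automatically when a site is empty), an arbitrary inverse temperature, and an ontic distribution taken to be the product of the frame function over all instrument choices. All names are hypothetical. The code enumerates every deterministic process and verifies that averaging over them recovers the operational frame function.

```python
import itertools
import math
from functools import lru_cache

N = 3
EDGES = [(0, 1), (1, 2)]   # hypothetical chain geometry
BETA = 0.7                 # arbitrary inverse temperature

def energy(lam):
    # Each term vanishes when either site is empty (state 0).
    return -sum(lam[i] * lam[j] for i, j in EDGES)

def instrument_events(x):
    """Event tuples allowed by the instrument choice x in {0,1}^N."""
    per_site = [(0,) if xj == 0 else (-1, +1) for xj in x]
    return list(itertools.product(*per_site))

@lru_cache(maxsize=None)
def frame_function(lam):
    """Gibbs weight normalised within the unique instrument containing lam."""
    x = tuple(int(l != 0) for l in lam)
    partition = sum(math.exp(-BETA * energy(mu)) for mu in instrument_events(x))
    return math.exp(-BETA * energy(lam)) / partition

CHOICES = list(itertools.product((0, 1), repeat=N))

def ontic_processes():
    """Each deterministic process assigns one event tuple a_x to every choice x."""
    for events in itertools.product(*(instrument_events(x) for x in CHOICES)):
        yield dict(zip(CHOICES, events))

# Sum p(omega | beta) over all processes that output lam for the choice x.
recovered = {}
total = 0.0
for omega in ontic_processes():
    weight = math.prod(frame_function(a) for a in omega.values())
    total += weight
    for x, a in omega.items():
        recovered[(x, a)] = recovered.get((x, a), 0.0) + weight

max_error = max(
    abs(recovered[(x, lam)] - frame_function(lam))
    for x in CHOICES
    for lam in instrument_events(x)
)
print(total, max_error)
```

The number of deterministic processes grows very quickly with the number of sites (4096 for three sites), so this exhaustive check is only practical for very small lattices; the analytic argument above does not have this limitation.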