Classical causal models cannot faithfully explain Bell nonlocality or Kochen-Specker contextuality in arbitrary scenarios

In a recent work, it was shown by one of us (EGC) that Bell-Kochen-Specker inequality violations in phenomena satisfying the no-disturbance condition (a generalisation of the no-signalling condition) cannot in general be explained with a faithful classical causal model -- that is, a classical causal model that satisfies the assumption of no fine-tuning. The proof of that claim however was restricted to Bell scenarios involving 2 parties or Kochen-Specker-contextuality scenarios involving 2 measurements per context. Here we show that the result holds in the general case of arbitrary numbers of parties or measurements per context; it is not an artefact of the simplest scenarios. This result unifies, in full generality, Bell nonlocality and Kochen-Specker contextuality as violations of a fundamental principle of classical causality. We identify, however, an implicit assumption in the former proof, making it explicit here: that certain operational symmetries of the phenomenon are reflected in the model, rather than requiring fine-tuned choices of model parameters. This clarifies a subtle but important distinction between Bell nonlocality and Kochen-Specker contextuality.


Introduction
Bell nonlocality [1] and Kochen-Specker (KS) contextuality [2] are classically forbidden correlations characteristic of quantum phenomena. Bell nonlocality can be understood as the impossibility of explaining certain quantum correlations between space-like separated systems within a classical theory of causality, assuming relativistic causal structure [3]. KS-contextuality, on the other hand, can be understood, within the framework of ontological models [4], as the incompatibility of the predictions of quantum theory with the joint assumption of measurement noncontextuality (the assumption that the outcome statistics of a phenomenon should not depend on the measurement context) and outcome determinism.

J. C. Pearl: jason.pearl@griffithuni.edu.au; E. G. Cavalcanti: e.cavalcanti@griffith.edu.au
The fundamentally quantum nature of contextual and nonlocal correlations lies at the heart of many quantum protocols. Bell nonlocality is a key resource for quantum communication, with applications such as reducing communication complexity [5] and secure communication [6]. Since classical simulation of Bell correlations is possible (between time-like separated systems) via the addition of communication channels between the parties in the Bell test [7,8], quantum over classical advantages provided by Bell nonlocality can be understood as quantum protocols having access to correlations that can only be simulated classically with the aid of extra resources. KS-contextuality, on the other hand, has been identified as a key resource fuelling quantum over classical advantages in quantum computation [9][10][11][12].
A modern approach is to encode correlations for a set of observed variables in the framework of causal models, where a causal structure is represented as a directed acyclic graph (DAG) [13,14]. Recently, a framework was introduced to unify KS-contextuality and Bell nonlocality as violations of a fundamental principle of causal models: the principle of no-fine-tuning, or faithfulness [15]. In the framework of causal models, fine-tuning occurs when specific choices of parameters of the model (such as distributions over latent variables) hide from operational accessibility some causal connections available in the model. In [14] it was shown that representing certain Bell-inequality violations by classical causal models requires fine-tuning, and this result was extended to the case of KS contextuality in [15]. Considering a classical causal model to be (essentially) a classical simulation of a quantum phenomenon, this provides a novel approach to understanding the quantum over classical advantage provided by Bell-KS correlations: fine-tuning can be considered an unavoidable resource waste in any classical simulation, relative to the quantum realisation of the same correlations. This causal perspective also reinforces the program of revising the assumptions underlying the classical causal models framework, such as Reichenbach's principle of common cause [16,17], towards a general framework of quantum causal models [17][18][19][20][21][22][23].
The proofs that classical causal models for Bell-KS correlations require fine-tuning, however, are so far restricted to bipartite Bell scenarios [14] or KS scenarios with two measurements per context [15]. As quantum protocols can make use of large numbers of parties or measurements per context, a general proof is needed for this approach to have practical merit. Here we generalise the framework of [15] to arbitrary numbers of parties or measurements per context, demonstrating in full generality the need for fine-tuning in classical causal models for Bell-KS inequality violations.
In the present work, we also correct a subtle but important issue in the definition of no fine-tuning used in [15]. That definition did not account for the possibility that the same measurement could have different statistics depending on which random variable in a causal model it is associated with, which would represent a form of contextuality not ruled out by the notion of no fine-tuning used in that work. Here we find that an updated definition can correct this issue, by including the requirement that operational symmetries of the phenomenon must be reflected in the model, a requirement analogous to the notion of "operational no fine-tuning" recently introduced by Catani and Leifer [24].

This paper is organised as follows. In Section 2 we give a brief review of the formalism of classical causal models, and Section 3 then sets up a framework for describing Bell-KS contextuality scenarios within that formalism. The main result of this work is then given in Section 4. In Section 5, an example is given to demonstrate how to translate a well-known Kochen-Specker scenario (the Peres-Mermin square) into the causal framework. A technical proof of the main result is provided in Section 6. We conclude by discussing some important implications of this work, as well as providing some possible avenues for future research in Section 7.

Causal models
Causal models have been developed as a tool for connecting causal inferences and probabilistic observations, with a wide range of applications, from statistics to epidemiology, economics and computer science [13]. In this framework, a causal structure is represented by a graph G containing a set of observable variables of interest, as well as additional latent, or hidden, variables. Variables are represented as nodes, with causal links represented by directed edges (arrows). For a pair of variables {A, B}, A is considered to be the direct cause of B should the graph G contain a directed edge from A to B. Topologically ordered directed graphs (i.e. those that exclude the possibility of paradoxical causal loops) are known as directed acyclic graphs (DAGs).
Standard terminology will be used to refer to relationships between variables. If there is a directed path from A to B, then A is said to be an ancestor of B, and B is a descendant of A. If A has a directed edge to B (i.e. A is a direct cause of B), then A is said to be a parent of B. The set of all parents (all direct causes) of B is denoted by Pa(B); the set of all non-descendants of B is denoted by Nd(B). The Causal Markov Condition is the assumption that a variable X is conditionally independent of its non-descendants, given its parents. This conditional independence (C.I.) is denoted as $(X \perp\!\!\!\perp \mathrm{Nd}(X) \mid \mathrm{Pa}(X))$, meaning that $P(X \mid \mathrm{Nd}(X), \mathrm{Pa}(X)) = P(X \mid \mathrm{Pa}(X))$. For a DAG G containing variables $\{X_1, \dots, X_n\}$, the Causal Markov Condition implies that any probability distribution compatible with G factorises as

$$P(X_1, \dots, X_n) = \prod_{j=1}^{n} P(X_j \mid \mathrm{Pa}(X_j)). \qquad (1)$$

A procedure called d-separation (directional separation) can be used to obtain C.I. relations from a graph [13]. Here and henceforth we use a bold symbol to refer to (variables associated with) a set of nodes in a graph. A path p connecting a set of nodes $\mathbf{X}$ with a set of nodes $\mathbf{Y}$ is blocked (d-separated) by a set of nodes $\mathbf{Z}$ if and only if

1. p contains a chain A → B → C or a fork A ← B → C such that the middle node B is in $\mathbf{Z}$, or
2. p contains an inverted fork (collider) A → B ← C such that the middle node B is not in $\mathbf{Z}$ and such that no descendant of B is in $\mathbf{Z}$.

The sets $\mathbf{X}$ and $\mathbf{Y}$ are said to be d-separated by $\mathbf{Z}$, denoted $(\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z})_d$, if and only if $\mathbf{Z}$ blocks every path from a node in $\mathbf{X}$ to a node in $\mathbf{Y}$. The d-separation condition is a sound and complete criterion for conditional independence. Sound: if the d-separation condition $(\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z})_d$ is satisfied by a graph G, then all probability distributions compatible with G satisfy the C.I. relation $(\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z})$; complete: if all probability distributions compatible with G satisfy $(\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z})$, then G satisfies the d-separation condition $(\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z})_d$. Note that d-separation refers to a relation between $\mathbf{X}$, $\mathbf{Y}$ and $\mathbf{Z}$ relative to a graph, and can also be applied to a subgraph S of a graph G. A d-separation condition obeyed by a subgraph S is not necessarily obeyed by G, however.
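The blocking rules above can be checked mechanically. The following is a minimal sketch, not taken from the paper, that tests d-separation through the standard equivalent construction: restrict the DAG to the ancestors of the three sets, "moralise" it (connect co-parents and drop arrow directions), delete the conditioning set, and test reachability. The function names and the dictionary encoding of the graph are assumptions made for illustration, and the three sets are assumed to be disjoint.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """True iff zs blocks every path between xs and ys.

    `parents` maps each node to the collection of its direct causes;
    nodes without parents may be omitted from the mapping.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralise the ancestral subgraph: undirected parent-child links,
    # plus a link between every pair of parents sharing a child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(pa, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Delete the conditioning set and test whether xs still reaches ys.
    seen, stack = set(), list(xs)
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return not (seen & ys)
```

For example, with `chain = {"B": ["A"], "C": ["B"]}` the call `d_separated(chain, {"A"}, {"C"}, {"B"})` reports that the middle node of a chain blocks the path, while for a collider `{"B": ["A", "C"]}` the roles are reversed: the empty set blocks the path and conditioning on B opens it.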

Causal framework for Bell-Kochen-Specker contextuality & nonlocality
The framework used here generalises that of [15], where traditional ontological models for Bell nonlocality and contextuality were translated into the language of causal models. Some of the terminology follows that of [25].
A measurement scenario, or contextuality scenario, or compatibility scenario is specified by a set of k measurements M = {m 1 , . . . , m k }, a set O of possible outcomes for each measurement, and a compatibility structure C, defined to contain all subsets of jointly measurable members of M: a subset c ⊆ M is said to be jointly measurable, compatible, or to represent a measurement context iff c ∈ C. A special class of contextuality scenarios are n-partite Bell-nonlocality scenarios, where M can be decomposed into n disjoint subsets M = {M 1 , . . . , M n } such that each context c ∈ C contains exactly one element from each subset. We define a Kochen-Specker (KS) scenario as any contextuality scenario that is not a Bell scenario. We will also refer to an arbitrary (Bell or KS) contextuality scenario as a Bell-KS scenario.
Here we consider a general class of measurement scenarios, with no restriction on the number of measurements per context. For simplicity, however, and without loss of generality, we augment all contexts, where needed, with trivial measurements (that always give the same outcome), so that all contexts contain exactly the same number of measurements $n = \max_{c \in C} |c|$. Similarly, there is no loss of generality in assigning the same outcome set O to every measurement, as O can be made large enough to include all possible outcomes of all $m_i \in M$.
Given a measurement scenario, in each test, that is, in each run of the experiment, a set of n compatible measurements is chosen to be performed, via a set of random variables $X = \{X_1, X_2, \dots, X_n\}$. That is, in each run the random variables take values so as to form a measurement context: the values taken by $X_1, \dots, X_n$ together constitute some $c \in C$.

The respective outcomes are recorded by the set of random variables $A = \{A_1, A_2, \dots, A_n\}$, where $A_i$ records the outcome of the measurement selected by $X_i$ and $I = \{1, 2, \dots, n\}$ is the index set. For convenience, we denote an index subset by $\gamma \subseteq I$ such that $A_\gamma \subseteq A$ and $X_\gamma \subseteq X$. We then introduce the shortcut notations $A_{\setminus\gamma} = A \setminus A_\gamma$ and $X_{\setminus\gamma} = X \setminus X_\gamma$ for the complementary subsets of variables. We define a test so that for Bell scenarios, each variable $X_i$ (corresponding to the $i$th party in the test) is always chosen from its corresponding subset $M_i$.

Some remarks about the identification of measurements are in order. In Bell scenarios, each measurement choice can be thought of as a setting on a "black box" in possession of one of n agents. In KS scenarios, in which the measurement set M cannot be factorised that way, some further justification is needed to identify the "same" measurement in different contexts. This could be done, following [4], via operational equivalence classes. Depending on the implementation, each random variable $X_j$ could then be thought of either as an experimental "slot" in a process, or simply as an arbitrary label. For example, $X_j$ could refer to the $j$th measurement in a temporal sequence. Alternatively, it could refer to the $j$th measurement to be chosen in an arbitrary way, specifying a joint measurement only after all n measurements are chosen. In either of these cases, the same measurement m can be associated with different random variables $X_j$ in different tests.
A phenomenon is specified by a probability distribution P(AX) for all allowed values of the observable variables. Note that the formalism so far is independent of any causal structure. We now define a (classical) causal model for a phenomenon.

Definition 1 (Classical causal model).
A classical causal model Γ for a phenomenon $\mathcal{P}$ consists of a (possibly empty) set of latent variables Ξ, a DAG G with nodes {A, X, Ξ}, and a probability distribution $P(AX\Xi)$ compatible with G, such that $\mathcal{P}(AX) = \sum_{\Xi} P(AX\Xi)$.
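Marginal and conditional probabilities are calculated in the standard way, e.g. $\mathcal{P}(X) = \sum_{A} \mathcal{P}(AX)$ and $\mathcal{P}(A \mid X) = \mathcal{P}(AX)/\mathcal{P}(X)$, and similarly for the model probabilities $P(\cdot)$. If the marginal probability distribution $\mathcal{P}(A_\gamma \mid X) \equiv \sum_{A_{\setminus\gamma}} \mathcal{P}(A \mid X)$ for any compatible subset of measurement outcomes is independent of the context in which they are performed, the phenomenon is said to satisfy the condition of no-disturbance. Formally, no-disturbance comprises two conditions: (i) $\mathcal{P}(A_\gamma \mid X) = \mathcal{P}(A_\gamma \mid X_\gamma)$ for all $\gamma \subseteq I$, and (ii) $\mathcal{P}(A_i \mid X_i = m) = \mathcal{P}(A_j \mid X_j = m)$ for all $i, j \in I$ and every measurement m that can be associated with both $X_i$ and $X_j$.

The second of these conditions represents the requirement that when a measurement m is associated with more than one index (as can occur in KS scenarios), its marginals are independent of the index (and thus of the context).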
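To clarify our notation, in a scenario with three pairs of variables, the no-disturbance conditions include three constraints of the form $\mathcal{P}(A_1 \mid X_1 X_2 X_3) = \mathcal{P}(A_1 \mid X_1)$, with the analogous constraints holding for $A_2$ and $A_3$. In the language of causal models, these no-disturbance conditions are denoted by $(A_\gamma \perp\!\!\!\perp X_{\setminus\gamma} \mid X_\gamma)$. The decomposition axiom (3) can then be used to derive less general no-disturbance conditions for subsets of $X_{\setminus\gamma}$.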
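The axioms cited by number as (2), (3) and (4) are not written out in the text as it stands. As a reading aid, the block below records the standard semi-graphoid statements of symmetry, decomposition and weak union, on the assumption that these are the axioms intended, followed by one worked instance of decomposition for three pairs of variables with $\gamma = \{1\}$.

```latex
% Standard semi-graphoid axioms; assumed to correspond to the
% axioms the text cites as (2), (3) and (4).
\begin{align*}
\text{Symmetry:} \quad
  & (X \perp\!\!\!\perp Y \mid Z) \iff (Y \perp\!\!\!\perp X \mid Z) \\
\text{Decomposition:} \quad
  & (X \perp\!\!\!\perp YW \mid Z) \implies (X \perp\!\!\!\perp Y \mid Z) \\
\text{Weak union:} \quad
  & (X \perp\!\!\!\perp YW \mid Z) \implies (X \perp\!\!\!\perp Y \mid ZW)
\end{align*}
% Worked instance: a no-disturbance condition for A_1 against both
% remote settings implies the weaker condition against X_2 alone.
\begin{align*}
(A_1 \perp\!\!\!\perp X_2 X_3 \mid X_1)
  \implies (A_1 \perp\!\!\!\perp X_2 \mid X_1)
\end{align*}
```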
In Bell scenarios, when each measurement in X is space-like separated from all others, the no-disturbance condition is called the no-signalling condition. Note that in Bell scenarios no-signalling could be defined as $\mathcal{P}(A_\gamma \mid X_{\setminus\gamma}) = \mathcal{P}(A_\gamma)$. However, although this assumption is implied by our definition when the measurement settings are chosen independently, it can be violated if the choices are correlated, as is the case in general KS scenarios.
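To make the operational character of no-signalling concrete, here is a small sketch that checks it directly on a table of conditional probabilities for a bipartite scenario with binary settings and outcomes. The Popescu-Rohrlich box is used purely as a familiar test case and does not appear in the text above; the function names and the second, deliberately signalling, box are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def pr_box(a, b, x, y):
    """P(a, b | x, y) for the Popescu-Rohrlich box (all values in {0, 1})."""
    return Fraction(1, 2) if (a ^ b) == (x & y) else Fraction(0)

def signalling_box(a, b, x, y):
    """A deterministic box in which the first outcome copies the remote setting y."""
    return Fraction(1) if (a == y and b == 0) else Fraction(0)

def marginal_first(p, a, x, y):
    """P(A_1 = a | X_1 = x, X_2 = y), obtained by summing over A_2."""
    return sum(p(a, b, x, y) for b in (0, 1))

def marginal_second(p, b, x, y):
    """P(A_2 = b | X_1 = x, X_2 = y), obtained by summing over A_1."""
    return sum(p(a, b, x, y) for a in (0, 1))

def is_no_signalling(p):
    """Check that each marginal ignores the other party's setting."""
    for a, x in product((0, 1), repeat=2):
        if marginal_first(p, a, x, 0) != marginal_first(p, a, x, 1):
            return False
    for b, y in product((0, 1), repeat=2):
        if marginal_second(p, b, 0, y) != marginal_second(p, b, 1, y):
            return False
    return True
```

The check only inspects observable statistics, which is the sense in which no-signalling is a property of the phenomenon rather than of any causal model for it.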
It is important to note that no-disturbance and no-signalling are defined as properties of phenomena, that is, they are defined operationally. The following definitions instead deal with properties of causal models for a phenomenon; that is, they can be understood as ontological properties. We say that a phenomenon $\mathcal{P}$ satisfies a property pertaining to causal models when there exists a causal model for $\mathcal{P}$ that satisfies that property, and we say that a phenomenon $\mathcal{P}$ violates a property when no causal model for $\mathcal{P}$ satisfies that property.
Kochen-Specker noncontextuality (KSNC) [2] is the assumption that all measurements have a predetermined value independently of the context in which they are performed. Bell-locality, on the other hand, can be derived from two sets of assumptions [3]: Locality and Predetermination (Bell's 1964 theorem) or Local Causality (Bell's 1976 theorem). Local Causality is a stronger notion than Locality, but weaker than the conjunction of Locality and Predetermination.
While Bell's 1964 theorem (and the KS theorem) can be resolved with the simple rejection of Predetermination, Bell's 1976 theorem requires a more radical revision of classical notions of causality. The notion of causality built into Local Causality amounts to the classical causal framework reviewed here, with the causal graph implied by relativistic causal structure.
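Despite this conceptual difference, the mathematical constraints imposed by the assumptions of Bell-locality and KS-noncontextuality are essentially equivalent, and can be translated to the language of causal models through the condition of factorisability: a causal model satisfies factorisability if there is a latent variable Λ such that

$$P(A \mid X) = \sum_{\Lambda} P(\Lambda) \prod_{i=1}^{n} P(A_i \mid X_i \Lambda),$$

with the additional requirement, for KS scenarios, that $P(A_i \mid X_i = m, \Lambda) = P(A_j \mid X_j = m, \Lambda)$ for all $i, j$. The requirement for KS scenarios means that a measurement m has the same marginal statistics in the model independently of which random variable it is associated with.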
By the Fine-Abramsky-Brandenburger (FAB) theorem [25,26], the assumption of Kochen-Specker noncontextuality is equivalent to the existence of a factorisable model for a phenomenon satisfying no-disturbance. Bell-locality is the special case of KS-noncontextuality in a Bell scenario.
The set of KS-noncontextual phenomena for each scenario is bounded by the KS inequalities [27,28], which can be derived as the facets of a convex polytope [29,30] induced by the factorisability condition. These inequalities reduce to Bell inequalities [31] in Bell scenarios.
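Because the factorisable set is a convex polytope whose vertices are deterministic assignments, the bound of a linear inequality can be found by enumerating those vertices. The sketch below does this for the CHSH expression, which is used here only as the most familiar bipartite example and is not singled out in the text; the function names are illustrative.

```python
from itertools import product

def chsh_value(a0, a1, b0, b1):
    """CHSH expression for one deterministic assignment of +/-1 outcomes.

    a0, a1 are the first party's outcomes for its two settings,
    b0, b1 those of the second party.
    """
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1

def classical_bound():
    """Maximum of the CHSH expression over all 16 deterministic vertices."""
    return max(chsh_value(*s) for s in product((-1, 1), repeat=4))
```

Since the expression is linear in the probabilities, its maximum over all convex mixtures of these vertices equals its maximum over the vertices themselves, so the enumeration gives the bound for every factorisable phenomenon in this scenario.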
Every factorisable phenomenon can be modelled via a canonical causal model with a graph as given in Fig. 1, containing a latent variable Λ acting as a common cause between all outcomes $A_i$ and a latent variable Ω as a common cause to all measurement choices $X_i$. In a KS scenario, Ω can be thought of as encoding the choice of context; in Bell scenarios, it is usually assumed that the choices of measurement are mutually independent, which is encoded in the typical Bell-scenario graph by omitting Ω. However, this is unnecessary: any phenomenon compatible with the graph in Fig. 1 satisfies factorisability, as can be readily checked by applying the Causal Markov Condition to this graph and summing over Ω. What is required for the derivation of Bell inequalities is that Ω is not otherwise causally connected to the other variables in the graph. In Bell scenarios, Fig. 1 (and thereby factorisability) is motivated by relativistic causal structure, when the different parties' events are space-like separated, plus an assumption of "freedom of choice" or "statistical independence", which can be interpreted at the causal level as the requirement that Λ and Ω are not causally connected. Bell-locality can then be derived from an application of the Causal Markov Condition to the causal graph implied by relativistic causal structure (for a review, see [3]).

Figure 1: A canonical causal graph compatible with no-disturbance and no fine-tuning.
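The claim that correlated measurement choices do no harm can be illustrated numerically. The sketch below builds a two-slot distribution from the canonical structure (a hidden Λ feeding both outcomes, and a context distribution that stands for the effect of Ω after it has been summed out) and verifies no-disturbance with exact arithmetic. All parameter values and function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model parameters, chosen only for illustration.
P_LAMBDA = {0: F(1, 3), 1: F(2, 3)}              # P(Lambda)
P_CONTEXT = {(0, 0): F(1, 2), (0, 1): F(1, 4),   # P(X_1, X_2) after summing
             (1, 0): F(1, 8), (1, 1): F(1, 8)}   # over Omega; settings are correlated

def p_outcome(i, a, x, lam):
    """P(A_i = a | X_i = x, Lambda = lam) for slot i in {0, 1}."""
    p_one = F(1 + i + x + 2 * lam, 6)  # always strictly between 0 and 1
    return p_one if a == 1 else 1 - p_one

def joint(a1, a2, x1, x2):
    """Observable P(A_1 A_2 X_1 X_2), obtained by summing over Lambda."""
    return sum(P_LAMBDA[lam] * P_CONTEXT[(x1, x2)]
               * p_outcome(0, a1, x1, lam) * p_outcome(1, a2, x2, lam)
               for lam in P_LAMBDA)

def marginal(slot, a, x1, x2):
    """P(A_slot = a | X_1 = x1, X_2 = x2)."""
    if slot == 0:
        total = sum(joint(a, other, x1, x2) for other in (0, 1))
    else:
        total = sum(joint(other, a, x1, x2) for other in (0, 1))
    return total / P_CONTEXT[(x1, x2)]

def satisfies_no_disturbance():
    """Each slot's marginal must ignore the setting of the other slot."""
    first = all(marginal(0, a, x, 0) == marginal(0, a, x, 1)
                for a, x in product((0, 1), repeat=2))
    second = all(marginal(1, a, 0, x) == marginal(1, a, 1, x)
                 for a, x in product((0, 1), repeat=2))
    return first and second
```

The context distribution here is deliberately not a product of independent choices, which mimics a KS-type scenario; no-disturbance still holds because each outcome depends only on its own setting and on Λ.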
On the other hand, the justification of factorisability for KS scenarios, where measurements are not space-like separated, rests on more controversial grounds. It is typically derived with an assumption of outcome determinism that is arguably unjustified when formulated within the language of ontological models [4,33]. Here we show that, fortunately, this controversy can be avoided within the framework of causal models, as the condition of factorisability is implied by the principle of no fine-tuning, or faithfulness, a fundamental principle of causal models, without the need to invoke outcome determinism.

Consider a phenomenon that is known to satisfy the C.I. relation $(A \perp\!\!\!\perp B \mid C)$ (corresponding, for example, to a no-signalling condition). If the causal structure does not satisfy the d-separation $(A \perp\!\!\!\perp B \mid C)_d$, then the observed conditional independence can only arise due to specially fine-tuned values of the causal parameters. These fine-tuned parameters act to "hide" causal connections (for example, faster-than-light causation), creating the illusion of a C.I. relation at the operational level. A faithful causal model is then best understood to be a causal model with no hidden causal connections.
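A toy illustration of how fine-tuned parameters can hide a causal connection, not taken from the paper: let an outcome A be the exclusive-or of a setting X and a latent bit L, so that the graph contains the edge X → A. The names and parameter values below are hypothetical.

```python
from fractions import Fraction

def p_a_given_x(a, x, q):
    """P(A = a | X = x) when A = X xor L and the latent bit L has P(L = 1) = q."""
    p_a_is_one = q if x == 0 else 1 - q
    return p_a_is_one if a == 1 else 1 - p_a_is_one

def looks_independent(q):
    """True iff A appears statistically independent of X at the operational level."""
    return all(p_a_given_x(a, 0, q) == p_a_given_x(a, 1, q) for a in (0, 1))
```

With `q = Fraction(1, 2)` the observable statistics show A independent of X even though X is a direct cause of A; any other value, such as `Fraction(3, 5)`, reveals the dependence. The independence therefore holds only on a measure-zero set of parameters, which is what makes the model unfaithful.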
The symmetry condition in Def. 4.2 is an extension to Pearl's notion of faithfulness, but is essential for contextuality scenarios, as we will see in the final step of the proof of Theorem 1. The specific implication of this assumption that we need is that if the marginals of a phenomenon are symmetric with respect to exchange of labels associated with a measurement m (e.g. if $\mathcal{P}(A_i \mid X_i = m) = \mathcal{P}(A_j \mid X_j = m)$ for some i, j) the model should satisfy the same symmetry. This assumption is analogous to the notion of "operational no fine-tuning" recently introduced by Catani and Leifer [24]. Its requirement for KS scenarios highlights a subtle but important distinction in the assumption of no-fine-tuning required for KS scenarios vs Bell scenarios.
The motivation for no fine-tuning is analogous to that for Leibniz's principle of the identity of indiscernibles [34], which states that a theory should avoid postulating distinctions at the ontological level that are not reflected in operational distinctions. It can also be understood as the methodological principle underlying Einstein's principles of relativity and of equivalence [34].
In light of the above discussion, the violation of a Bell-KS inequality in a phenomenon P implies that (i) either the causal graph underlying the phenomenon does not have the form of the canonical causal graph in Fig. 1, or (ii) the classical causal model formalism needs to be rejected or modified so that this graph does not imply factorisability (e.g. as in the program of quantum causal models [17][18][19][20][21][22][23]). For Bell scenarios, the causal graph is motivated by relativity, and proposed resolutions of the type (i) above include violations of relativistic causality (e.g. in Bohmian mechanics [35]), retrocausality [36] or superdeterminism [37]. A priori, these alternative causal structures seem objectionable for different reasons, but they all share a common property: they require fine-tuning within a classical causal model. A natural question is therefore whether another modified causal structure, however exotic, could reproduce the violation of Bell inequalities while avoiding this objection. For KS scenarios, on the other hand, the canonical causal graph in Fig. 1 is not directly motivated by relativity, and the question is how can factorisability be motivated for these scenarios at all. The present result completes the partial results of [14,15] and resolves both of these questions at once, for arbitrary Bell-KS scenarios: it establishes that any classical causal model for such scenarios that is faithful to the no-disturbance conditions implies factorisability.

Main result
In [15] (following [14]), it was shown that no-fine-tuning leads to KS-noncontextuality for any phenomenon satisfying no-disturbance. The proof, however, was restricted to contextuality scenarios with two measurements per context (and bipartite Bell scenarios as a special case). Here we show that this result holds in general scenarios with arbitrary numbers of parties or measurements per context.

Theorem 1. Every phenomenon satisfying no-disturbance in an arbitrary contextuality scenario that has a faithful causal model is factorisable.
Theorem 1 leads to the following immediate corollaries:

Corollary 2. Every classical causal model that reproduces the violation of a Bell-KS inequality in a no-disturbance phenomenon in an arbitrary Bell-KS scenario requires fine-tuning.

Example scenario based on the Peres-Mermin square
In [15], it was shown how the causal framework for contextuality can be mapped onto a three-measurement scenario with two measurements per context introduced by Liang, Wiseman, and Spekkens [38]. Here, we demonstrate an example with three measurements per context based upon the Peres-Mermin square [39,40]. This scenario contains nine measurements $M = \{m_1, m_2, \dots, m_9\}$ with binary outcomes $O = \{-1, 1\}$. The compatibility structure can be conveniently represented by the hypergraph in Fig. 2 [41].
Rows and columns represent measurement contexts, that is, sets of jointly measurable measurements. Formally, the compatibility structure is denoted by $C = \{R_1, R_2, R_3, C_1, C_2, C_3\}$. For this scenario, n = 3, as each measurement context contains at most three measurements. In each run of the experiment, $X = \{X_1, X_2, X_3\}$ can take any triplet of values from C, and $A = \{A_1, A_2, A_3\}$ records the corresponding outcomes. Consider a phenomenon $\mathcal{P}(AX)$ that satisfies the no-disturbance relations $\mathcal{P}(A_\gamma \mid X) = \mathcal{P}(A_\gamma \mid X_\gamma)$.
In this scenario, γ ranges over the index subsets {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}, and so there is a no-disturbance relation for each of these subsets. From Theorem 1, any faithful classical causal model for this phenomenon must satisfy KS-noncontextuality. In this scenario, KS-noncontextuality implies an inequality [42]. Quantum theory predicts a state-independent violation of this inequality by two qubits, with the nine measurements represented by an array of Pauli spin matrices.
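The inequality and the Pauli array are not reproduced in the text as it stands, so the sketch below assumes the form commonly quoted in the literature: the sum of the three row correlators plus the first two column correlators minus the third column correlator, together with the textbook arrangement of two-qubit Pauli operators. Both the arrangement and the choice of which context carries the minus sign are assumptions and may differ from the labelling used in Fig. 2. The code checks by brute force that no assignment of ±1 values exceeds 4, while the operator products give 6.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker product of two 2x2 matrices, as a 4x4 nested list."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Assumed textbook arrangement of the Peres-Mermin square.
SQUARE = [
    [kron(SZ, I2), kron(I2, SZ), kron(SZ, SZ)],
    [kron(I2, SX), kron(SX, I2), kron(SX, SX)],
    [kron(SZ, SX), kron(SX, SZ), kron(SY, SY)],
]

def context_sign(ops):
    """Return s in {+1, -1} such that the product of the operators is s * identity."""
    prod = matmul(matmul(ops[0], ops[1]), ops[2])
    s = prod[0][0]
    assert all(prod[i][j] == (s if i == j else 0)
               for i in range(4) for j in range(4))
    return round(complex(s).real)

def quantum_value():
    """Value of the assumed expression; state-independent, since every product is +/- identity."""
    rows = [context_sign(SQUARE[r]) for r in range(3)]
    cols = [context_sign([SQUARE[r][c] for r in range(3)]) for c in range(3)]
    return sum(rows) + cols[0] + cols[1] - cols[2]

def classical_maximum():
    """Maximum of the same expression over all 2**9 noncontextual +/-1 assignments."""
    best = None
    for v in product((-1, 1), repeat=9):
        g = [v[0:3], v[3:6], v[6:9]]
        rows = sum(g[r][0] * g[r][1] * g[r][2] for r in range(3))
        cols = [g[0][c] * g[1][c] * g[2][c] for c in range(3)]
        value = rows + cols[0] + cols[1] - cols[2]
        best = value if best is None else max(best, value)
    return best
```

The gap arises because the product of all nine values would have to equal +1 when computed row by row and −1 when computed column by column, which no fixed assignment can satisfy.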

Proof of Theorem 1
To aid in the proof of Theorem 1, we introduce the graphical notations in Fig. 3 to represent sets of causal connections. A diagram using these shortcut notations represents the set of all DAGs compatible with all shortcut notations. A dashed line represents a connection of the type indicated or no connection.

Figure 3: Shortcut graphical notations for causal connections between X and Y.
The proof will make use of the following Lemma:

Lemma 1. Let a chained graph be a graph of the form below (Fig. 4), where A, B, C, D

The no-disturbance condition, when combined with the assumption of no fine-tuning, leads to the d-separation conditions

$$(A_\gamma \perp\!\!\!\perp X_{\setminus\gamma} \mid X_\gamma)_d \quad \text{for all } \gamma \subseteq I. \qquad (8)$$

The rest of the proof proceeds by deriving a set of d-separation conditions that must be obeyed by any causal model satisfying (8). These d-separation conditions then imply new C.I. relations in any joint distribution compatible with any faithful causal graph satisfying no-disturbance, which will be shown to imply factorisability in the joint distribution for any number of parties or measurements per context.
Step 1a. The class of DAGs we need to consider are those that include latent variables as common causes for observable variables or direct causal connections between them. There is no need to consider latent variables as intermediaries or common effects between variables, since those have no effect on the allowed probability distributions over the observable variables.

Figure 5: Remaining class of DAGs after Step 1a.
From (8), we can exclude any direct causal link or common cause between $\{A_i, X_{\setminus i}\}$ for all $i \in I = \{1, 2, \dots, n\}$. In particular, this excludes the possibility of a common cause between any two or more outcomes and a setting, or between any two or more settings and an outcome. This leaves us with the possibility of any causal link between $\{A_i, X_i\}$, $\{A_i, A_{\setminus i}\}$ and $\{X_i, X_{\setminus i}\}$ as shown in Fig. 5. Without loss of generality, we then introduce Λ and Ω as the sets of latent variables potentially acting as common causes between all outcomes and all settings, respectively, and the above remark implies that Λ and Ω are not directly causally connected. Considering intermediate latent variables as joint intermediaries would not allow for more general phenomena, and would not create any causal paths between the observable variables that are not already included in Fig. 5.
Step 1b. For every DAG represented by Fig. 5, and without loss of generality, all members of A and X can be grouped into subsets depending on the existence of certain causal connections, as shown in Fig. 6. All members of A with no direct causal connection to any member of X are denoted by the subset B. Remaining members of A are denoted by the subset C. Likewise, all members of X with no direct causal connection to any member in A are denoted by Y, while the remaining members are denoted by Z.
Step 2a. From (8), we can derive $(B \perp\!\!\!\perp Z \mid Y)_d$. Note that any path between B and Z must pass through at least one element of C. Therefore, for any such path, C acts as a middle node that is not in Y. For B to be d-separated from Z given Y, C must act as a collider in any path between B and Z. Since every member of C has a connection to one and only one member of Z, any member of C with a direct causal connection to B would be a non-collider middle node between B and Z. Thus, direct connections from C to B would violate $(B \perp\!\!\!\perp Z \mid Y)_d$, and are excluded, as shown in Fig. 7.

Figure 6: Shortcut representation of the class of DAGs in Fig. 5. Dashed circles represent the possibility of an empty set. B ⊆ A: all members of A with no causal connection to X; C ⊆ A: all members of A with some causal connection to X. A connection between two sets represents all possible connections of the type indicated between members of each node.
Step 2b. From Fig. 7 we see that Y cannot act as a middle node in paths between B and Z. Therefore B is d-separated from Z given any variable that is not a collider in a path between them. As Λ satisfies this condition, we find that

$$(B \perp\!\!\!\perp Z \mid \Lambda)_d. \qquad (9)$$

Using (8) again, we can write the d-separation condition $(C \perp\!\!\!\perp Y \mid Z)_d$. From the symmetry axiom (2), this can be rewritten as $(Y \perp\!\!\!\perp C \mid Z)_d$. From Lemma 1, it follows that all graphs in Fig. 7 satisfy a further d-separation condition which, from the weak union axiom (4), can be rewritten as $(Y \perp\!\!\!\perp CB \mid Z\Lambda)_d$. Reapplying the symmetry axiom (2) and rewriting BC = A, we arrive at the condition

$$(A \perp\!\!\!\perp Y \mid Z\Lambda)_d. \qquad (10)$$

Step 3a. Now we consider the causal connections between two arbitrary variables $C_i, C_j \in C$. From (8) and the decomposition axiom (3), we can write the condition

$$(C_j \perp\!\!\!\perp Z_i \mid Z_j)_d.$$

Consider a path from $Z_i$ to $C_j$ that passes through $C_i$. For this path to be blocked by $Z_j$, the middle node $C_i$ must be a collider. Thus, we can eliminate direct connections $C_i \to C_j$ between any two members of C, as shown in Fig. 8.

Figure 8: Representation of the set of graphs with C separated into $C_i$ and $C_{\setminus i}$ and Z separated into $Z_i$ and $Z_{\setminus i}$.
We now consider what d-separation conditions can be found between members of C, as this will be required for the final step. Considering Fig. 8, all paths between $C_i$ and $C_{\setminus i}$ can be divided in two classes: (i) those paths that go through $Z_i$ (the bottom half of Fig. 8) and (ii) those that go through BΛ (the top half). Consider the paths in (i). From (8) and the decomposition axiom (3), we can write

$$(C_i \perp\!\!\!\perp Z_{\setminus i} Y \mid Z_i)_d.$$

We can see from Fig. 8 that every path between $C_i$ and $Z_{\setminus i} Y$ that contains a collider in $Z_i$ violates this condition. Thus any such path must have a chain or fork with $Z_i$ as the middle node. Thus $Z_i$ blocks, as a chain or fork, all paths between $C_i$ and $C_{\setminus i}$ that go through $Z_i$. Therefore Z blocks all paths in (i) between $C_i$ and $C_{\setminus i}$. The paths in (ii) are blocked by conditioning on BΛ, as all such paths are chains or forks with B and/or Λ as a middle node. Thus, we find that all DAGs in Fig. 8 that satisfy condition (8) also satisfy

$$(C_i \perp\!\!\!\perp C_{\setminus i} \mid ZB\Lambda)_d. \qquad (11)$$

Any path through Λ is blocked by Λ because it is a fork. From the condition $(C_i \perp\!\!\!\perp Z_{\setminus i} Y \mid Z_i)_d$ above, we then find

$$(C_i \perp\!\!\!\perp Z_{\setminus i} \mid Z_i \Lambda)_d, \qquad (12)$$

as conditioning on Λ cannot make $C_i$ and $Z_{\setminus i}$ dependent.
Step 3c. This step consists of deriving the d-separation condition $(\Lambda \perp\!\!\!\perp X)_d$. It is not necessary to consider paths through B, as every path from Λ to a member of Z must pass through a member of C. For every member $C_i$ of C, we can separate the graphs in Fig. 8 into the two sub-graphs shown in Fig. 9.

Figure 9: Two sub-graphs of Fig. 8, where (a) considers a direct connection from $C_i$ to $Z_i$ with or without a common cause, and (b) excludes a connection from $C_i$ to $Z_i$. Together these graphs account for all graphs in Fig. 8.

Now consider the class of graphs in Fig. 9a. The conditions $(B \perp\!\!\!\perp Z \mid Y)_d$ and $(C_{\setminus i} \perp\!\!\!\perp Z_i \mid Z_{\setminus i})_d$ exclude the possibility of a common cause between $C_i$ and any member of B or $C_{\setminus i}$, as well as direct connections from B to $C_i$, as shown below in Fig. 10. Therefore there are no paths of the type $\Lambda - C_i - Z_i$ when there is a direct connection from $C_i$ to $Z_i$.

Figure 10: Elimination of causal connections from Fig. 9a.

Any path of the type $\Lambda - C_i - Z_i$ in Fig. 9b is blocked by the empty set, as $C_i$ acts as a collider with no descendants. Since every path between Λ and Z includes a sub-path of the form $\Lambda - C_i - Z_i$ in Fig. 9a or Fig. 9b, we can then write

$$(\Lambda \perp\!\!\!\perp X)_d. \qquad (13)$$
Step 4. The d-separation conditions derived in (9), (10), (11), (12) and (13) imply the corresponding C.I. conditions (14)-(18). Starting from the definition of conditional probability, the observable joint distribution is expanded, summed over Ω, rewritten with X = YZ using (14), and then, substituting A = BC, simplified using (15) and (16) to give Eq. (21).

Now note from Fig. 8 that no observable variable outside B can have a direct causal link to B. This is reflected in Eq. (21) by the fact that B is only dependent on Λ. Without loss of generality, we can thus let Λ determine B, as any phenomenon that is compatible with a model of this form is also compatible with a model where Λ determines B (after suitable fine-graining). Any information about B is then known given Λ. Since Λ determines B, $P(C \mid ZB\Lambda) = P(C \mid Z\Lambda)$. Using the definition of conditional probability,

$$P(C \mid Z\Lambda) = \prod_j P(C_j \mid C \setminus \{C_1, C_2, \dots, C_j\}, Z\Lambda). \qquad (22)$$
From (17), all C_j ∈ C are independent given ZΛ. From (18), all C_j ∈ C are independent of Z_{\j} given Z_j Λ. Thus

Applying this procedure to P(B | Λ), and since by definition Λ determines B, we can similarly write

We can finally write the observable joint distribution for any phenomenon satisfying no-disturbance and no-fine-tuning as:

For Bell scenarios, where each measurement is performed by a single party, this is a factorisable model, completing the proof.
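To make concrete what a factorisable model entails in the simplest Bell scenario, the sketch below builds a random two-party model in which a hidden variable λ screens off the two outcomes, then checks numerically that it satisfies no-disturbance (each party's marginal is independent of the other party's setting) and obeys the CHSH bound. The two-party, two-setting, two-outcome restriction, the function names and the variable names (`a`, `b`, `x`, `y`) are assumptions made for this illustration only. They do not follow the partition into B, C, Y and Z used in the proof.

```python
import random


def random_dist(n, rng):
    """Random probability distribution over n outcomes."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def factorisable_model(rng, n_lambda=4):
    """Return p(a, b, x, y) = sum_l P(l) P(a | x, l) P(b | y, l)."""
    p_lam = random_dist(n_lambda, rng)
    # resp_a[l][x] is a distribution over a in {0, 1}; likewise resp_b[l][y].
    resp_a = [[random_dist(2, rng) for _ in range(2)] for _ in range(n_lambda)]
    resp_b = [[random_dist(2, rng) for _ in range(2)] for _ in range(n_lambda)]

    def p(a, b, x, y):
        return sum(
            p_lam[l] * resp_a[l][x][a] * resp_b[l][y][b]
            for l in range(n_lambda)
        )

    return p


def marginal_a(p, a, x, y):
    """Marginal probability of outcome a for settings (x, y)."""
    return sum(p(a, b, x, y) for b in (0, 1))


def chsh(p):
    """CHSH expression E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    def corr(x, y):
        return sum(
            (-1) ** (a + b) * p(a, b, x, y)
            for a in (0, 1)
            for b in (0, 1)
        )
    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)


model = factorisable_model(random.Random(0))
# No-disturbance: the marginal of a does not depend on the remote setting y.
print(abs(marginal_a(model, 0, 0, 0) - marginal_a(model, 0, 0, 1)) < 1e-12)
# Factorisable models cannot exceed the classical CHSH bound of 2.
print(abs(chsh(model)) <= 2)
```

Any model of this form passes both checks regardless of the random seed, which is why Bell-inequality violations rule such models out.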
Step 5. For KS scenarios, a further step is needed, to justify that P (

Conclusion
In summary, we have shown that Kochen-Specker contextuality and Bell nonlocality, in fully general scenarios with arbitrary numbers of measurements per context or parties, and arbitrary numbers of outcomes per measurement, can both be understood as phenomena for which it is impossible to construct a faithful classical causal model. This means that these key quantum phenomena can be understood in a unified way as violations of the classical framework of causality. This result has several important consequences. Firstly, from a foundational perspective, it generalises the results of [14,15], confirming that this relationship between fine-tuning and Bell-KS inequality violations is fully general, and not an artefact of the simplest scenarios. This adds extra motivation for the program of quantum causal models [17-23], in which the classical framework of causality is extended into a framework where quantum correlations can potentially be explained without fine-tuning, thus removing the objectionable property that, according to the present result, is required of classical causal models for all Bell-KS correlations.
Secondly, as alluded to in the introduction, and in [15], this result gives a general motivation for the idea of quantifying quantum advantage via fine-tuning, as it shows that fine-tuning is a property of all classical simulations of phenomena displaying Bell-KS contextuality, which are key resources for quantum communication and computation protocols. It also casts new light on previous results about simulating Bell correlations with the aid of extra communication, such as the seminal work of Toner and Bacon [8] and subsequent results.
Another avenue for further research is to extend the principle of no fine-tuning to accommodate phenomena that do not satisfy no-disturbance, as is the case in non-ideal experiments. In [15], a generalised principle of no fine-tuning was proposed, whereby a causal model should not allow causal connections stronger than needed to explain the observed deviations from the no-disturbance condition. A recent work [32] has implemented a version of this principle, and shown that models that satisfy this property (dubbed "M-noncontextuality") are equivalent to models that satisfy the property of CbD-noncontextuality defined in the "Contextuality by Default" approach [43]. However, the class of causal models considered in [32] is not as general as the ones considered here: it only considers a minimal relaxation from the default causal structure, allowing for causal influences from the contexts to the measurement outcomes. This excludes by fiat a large class of candidate causal models. It would be interesting to know whether this relationship between generalised no-fine-tuning and CbD-contextuality holds in general, and whether this can lead to robust experimental tests.
Finally, we note that although this result unifies Bell nonlocality and Kochen-Specker contextuality as violations of no-fine-tuning in classical causal models, this does not imply that these phenomena represent all forms of quantum violations of classical causality. It has recently been shown [44], for example, that quantum correlations can violate the classical constraints on a "triangle scenario" [45] even in the absence of any choice of measurement settings, a phenomenon dubbed "quantum nonlocality without inputs". Since there are no choices of settings, there are no C.I. relations associated with no-disturbance conditions in those scenarios. Therefore the quantum violations in those kinds of scenarios are instances of neither Bell nonlocality nor KS contextuality, but of some fundamentally different kind of nonclassicality. Understanding the nature of this distinction is an interesting question for further study.