Beyond the Cabello-Severini-Winter framework: Making sense of contextuality without sharpness of measurements

By generalizing the Cabello-Severini-Winter (CSW) framework, we build a bridge from this graph-theoretic approach for Kochen-Specker (KS) contextuality to a hypergraph-theoretic approach for Spekkens' contextuality, as applied to Kochen-Specker type scenarios. Our generalized framework describes an experiment that requires, besides the correlations between measurements carried out on a system prepared according to a fixed preparation procedure (as in Bell-KS type experiments), the correlations between measurement outcomes and corresponding preparations that seek to make these measurement outcomes highly predictable. This latter feature of the experiment allows us to obtain noise-robust noncontextuality inequalities by applying the assumption of noncontextuality to both preparations and measurements, without requiring the assumption of outcome-determinism. Indeed, we treat all measurements on an equal footing: no notion of "sharpness" is presumed for them, hence no putative justification of outcome-determinism from sharpness is sought. As a result, unlike the CSW framework, we do not require Specker's principle (or the exclusivity principle) -- that pairwise exclusive measurement events must all be mutually exclusive -- as a fundamental constraint on (sharp) measurement events in any operational theory describing the experiment. All this allows us, for the case of quantum theory, to deal with nonprojective (or unsharp) measurements and resolve the pathologies they lead to in traditional approaches. Our noncontextuality inequalities are robust to the presence of noise in the experimental procedures, whether they are measurements or preparations, and are applicable to operational theories that need not be quantum.

Much recent work [1-6] has been devoted to obtaining constraints on operational statistics that follow from the assumption of noncontextuality within the framework proposed by Spekkens [7]. This generalized framework abandons the assumption of outcome determinism that is intrinsic to the Kochen-Specker (KS) framework [8], applies to arbitrary operational theories, and extends the notion of noncontextuality to arbitrary experimental procedures (preparations, transformations, and measurements) rather than measurements alone.
On the other hand, work along the lines of the traditional KS framework culminated in two recent approaches: the graph-theoretic framework of Cabello, Severini, and Winter (CSW) [9,10], where a general approach to obtaining graph-theoretic bounds on linear Bell-KS functionals was proposed, and the related hypergraph framework of Acín, Fritz, Leverrier, and Sainz (AFLS) [11], where an approach to characterizing sets of correlations was proposed. The CSW framework relates well-known graph invariants to upper bounds of Bell-KS inequalities, upper bounds on maximum quantum violations of these inequalities, and upper bounds on them in general probabilistic theories [12], denoted E1, which satisfy the "exclusivity principle" [10]. Complementary to this, the AFLS framework uses graph invariants in the service of deciding whether a given assignment of probabilities to measurement outcomes in a KS-contextuality experiment belongs to a particular set of correlations; they showed that membership in the quantum set of correlations (defined only for projective measurements in quantum theory) cannot be witnessed by a graph invariant [11]. Another recent approach [13] employs sheaf-theoretic ideas to formulate KS-contextuality.
In this paper we build a bridge from the CSW approach, where "classical" (i.e., KS-noncontextual) correlations are bounded by Bell-KS inequalities, to noise-robust noncontextuality inequalities in the Spekkens framework [7]. Unlike the criteria for KS-contextuality in the CSW framework, the operational criteria for contextuality à la Spekkens are robust to noise and therefore applicable to arbitrary positive operator-valued measures (POVMs) and mixed states in quantum theory.
Indeed, if one allows for POVMs in the definition of quantum correlations (rather than just projective measurements), then the separation between quantum and E1 correlations in the CSW framework breaks down. This is because any set of probabilities satisfying the "no-disturbance" or "no-signalling" condition (of which the E1 correlations are a subset, in general) can be achieved by (trivial) POVMs by simply multiplying an identity operator with every probability in such an assignment of probabilities. 1 By the lights of KS-noncontextuality as one's notion of classicality, then, trivial POVMs saturating the general probabilistic bound on the correlations are maximally nonclassical (i.e., maximally KS-contextual). However, we "know" intuitively that trivial POVMs are "classical", even if KS-noncontextuality as a notion of classicality doesn't quite capture that intuition. A simple operational sense in which trivial POVMs are "classical" is that they reveal nothing about the quantum state on which they are measured, being incapable of distinguishing any pair of states whatsoever. 2 This simple sense in which they are "classical" is, however, not captured by KS-noncontextuality as one's notion of classicality, since the experiment is restricted to considering correlations obtained from compatible sets of measurements implemented on the same preparation, and therefore no variation over preparations is taken into account in Bell-KS type inequalities. This makes such experiments incapable of witnessing the "triviality" of trivial POVMs, i.e., the fact that they correspond to a fixed probability distribution that doesn't vary even as the choice of preparation is varied. Moreover, since all nonprojective measurements are excluded by fiat in traditional Kochen-Specker type approaches [10,11], 3 one loses out on the potential to explore the possibilities that nontrivial, yet nonprojective, measurements offer with respect to contextuality.
4 Note that whenever we refer to "Bell-KS" functionals or inequalities for Kochen-Specker type experiments, we are not thinking of experiments that are Bell experiments [18-23], which have spacelike separation between multiple parties, each performing local measurements on a shared multipartite preparation. For the case of Bell experiments, trivial local POVMs assigned to each party in a Bell experiment do not lead to Bell violations for a simple reason: the trivial POVMs for each party are all compatible with each other, thereby admitting a joint probability distribution over their outcomes for each party; taking a product of these local joint probability distributions (one for each party) results in a joint distribution over all measurements of all parties, hence satisfying Bell inequalities. The fact that the POVMs are trivial ensures that the Bell inequalities are satisfied regardless of the choice of shared quantum state. On the other hand, forgetting the constraint of local POVMs, there always exist global trivial POVMs that can violate Bell inequalities: e.g., just take the Popescu-Rohrlich (PR) box distribution [24], and multiply an identity operator (on the joint Hilbert space of Alice and Bob) with each probability in the PR-box; this results in four trivial POVMs, defined over the joint Hilbert space, that together violate the CHSH inequality maximally. But, of course, this violation is uninteresting because it doesn't obey the locality constraint on the measurements in a Bell experiment. This is mathematically reflected in the fact that the PR-box distribution cannot be written as a convex mixture of product distributions, one for each party, hence the corresponding trivial POVM cannot be understood in terms of trivial local POVMs. Hence, it is the locality of the trivial POVMs in a Bell experiment that prevents them from violating a Bell inequality.
The fact that they are "trivial" in the sense of being unable to distinguish two quantum states plays a role in the sense that, regardless of the shared quantum state, these POVMs yield fixed distributions over the measurement outcomes, thus always allowing the construction of a fixed (that is, independent of the quantum state) global joint probability distribution over all measurements in a Bell scenario. Since there are no such locality constraints on the form of the POVM elements in a Kochen-Specker experiment, they can easily violate any KS-noncontextuality inequality. For example, the two-party CHSH experiment, considered as a Kochen-Specker experiment with four observables in a 4-cycle where adjacent pairs are jointly measurable, allows for trivial POVMs (like the PR-box trivial POVM above) that maximally violate the CHSH-type Bell-KS inequality in this scenario. By the lights of KS-noncontextuality, this violation would indicate the maximum possible KS-contextuality with respect to this CHSH-type inequality. 5 Hence, our criticism of KS-noncontextuality as a notion of classicality (in an experiment with no locality constraints on the measurements) does not extend to the case of Bell-locality (or local causality) as a notion of classicality in a Bell experiment, where the experiment must respect locality constraints on the measurements for a Bell inequality violation to be meaningful. It is the locality of the measurements implemented by the various parties in a Bell experiment that renders Bell-locality immune to the criticism we are directing at KS-noncontextuality in this paper. Indeed, any attempt at a unified approach to KS-contextuality and Bell-nonlocality in the traditional approach [10,11,13] suffers from the problem of not making this distinction (one of locality of POVMs) between the two kinds of experiments (Bell experiments vs. Kochen-Specker experiments) precise, choosing instead to dwell on their formal mathematical unification as an instance of the classical marginal problem [25]. The marginal problem formulation is perhaps most explicit in the case of marginal scenarios defined in Ref. [26] (see also Ref. [27]). This unification forces a certain dichotomy in these approaches: while in Bell scenarios, one need not restrict to any notion of a "sharp" measurement in the definition of probabilistic models (and thus claim "theory independence"), in Kochen-Specker scenarios, one must make some statement about the structure of the measurements (such as their presumed sharpness [28], or that their joint measurability [17,29] is restricted to commutativity [13]), rendering any putative "theory independence" claim unfounded. 6
Because of this pathology of POVMs with respect to KS-noncontextuality as a notion of classicality, all traditional treatments in the KS framework [10,11,13] restrict the set of quantum correlations to those which are achieved by projective measurements (rather than POVMs, generally) on a quantum state. With recent work on a sensible notion of "sharp" measurement in a general probabilistic theory [31,32], the current attitude of proponents (see, e.g., [28]) of the traditional KS framework (defending KS-noncontextuality as a sensible operational notion of classicality) is to restrict attention to sharp measurements in both quantum theory and general probabilistic theories.
5 See Appendix A for more discussion.
However, another logical possibility is available and, indeed, operationally better justified than KS-noncontextuality [33,35]: that one must revise one's notion of classicality as KS-noncontextuality to a notion of classicality that allows for arbitrary quantum measurements and witnesses the fact that trivial POVMs are indeed classical according to this revised notion, even in experiments where, unlike a Bell experiment, there is no constraint of locality on the measurements. At the same time, such a revised notion should be capable of recovering the traditional notion of KS-noncontextuality as classicality in the case of projective measurements in quantum theory. 7 Fortunately, we already have such a notion of classicality: namely, universal noncontextuality, as defined in the Spekkens framework [7,33]. In particular, for Kochen-Specker type experimental scenarios, we will consider the twin notions of preparation noncontextuality and measurement noncontextuality, taken together as a notion of classicality, to obtain noise-robust noncontextuality inequalities that generalize the KS-noncontextuality inequalities of CSW and witness nonclassicality even when the quantum correlations arising from arbitrary quantum measurements on any quantum state are allowed.
A key innovation of this approach is that it treats all measurements in an operational theory on an equal footing. No definition of "sharpness" is needed to justify or derive noncontextuality inequalities in this approach. Furthermore, if certain idealizations are presumed about the operational statistics, then these inequalities formally recover the usual Bell-KS inequalities à la CSW. Note that Bell-KS inequalities can be viewed as an instance of the classical marginal problem [25-27,30], i.e., as constraints on the (marginal) probability distributions over subsets of a set of observables that follow from requiring the existence of a global joint probability distribution over the set of all observables. Since the Bell-KS inequalities are only recovered under certain idealizations, but not otherwise, the noise-robust noncontextuality inequalities we obtain cannot in general be viewed as arising from a classical marginal problem. Hence, they cannot be understood within existing frameworks that rely on this property (reduction to the classical marginal problem) to formally unify the treatment of Bell-nonlocality and KS-contextuality [10,11,13]. This is a crucial distinction relative to the usual Bell-KS inequality type witnesses of KS-contextuality.
6 See Ref. [30] for how this lack of locality of measurements in a Kochen-Specker type experiment translates, at the ontological level, to the unreasonableness of assuming factorizability in the ontological model; this factorizability (or the stronger condition of outcome determinism) is invoked to justify the resulting derivation of Bell-KS inequalities as constraints from a classical marginal problem.
7 This is in contrast to what is usually done traditionally: that one insists on KS-noncontextuality as one's notion of classicality [10,11,13] and, for this notion to make sense, one restricts the scope of allowed measurements to just the projective measurements so that commutativity is equivalent to joint measurability [17,29]. If one lifts the restriction to projective measurements to allow arbitrary POVMs, then one is forced to modify KS-noncontextuality in order to avoid the pathology of trivial POVMs. We do this in a principled way in this paper, building on the approach of Ref. [7].
We now proceed to develop our framework as follows: Section II reviews the Spekkens framework for generalized noncontextuality [7]; Section III introduces a hypergraph framework that shares features of traditional frameworks for KS-contextuality [10,11] but is also augmented (relative to these traditional frameworks) with the ingredients necessary for obtaining noise-robust noncontextuality inequalities; Section IV defines a new hypergraph invariant that we need later on as a crucial new ingredient in our inequalities; and Section V obtains noise-robust noncontextuality inequalities in the framework defined in Section III and using the hypergraph invariant of Section IV, based on the technique proposed in Ref. [5]. Finally, we conclude with some discussion and open questions in Section VI.

II. SPEKKENS FRAMEWORK
We concern ourselves with prepare-and-measure experiments. A schematic of such an experiment is shown in Figure 1 where, for the sake of simplicity, we imagine a single source device that can perform any preparation procedure of interest (rather than a collection of source devices, each implementing a particular preparation procedure) and a single measurement device that can perform any measurement procedure of interest (rather than a collection of measurement devices, each implementing a particular measurement procedure). 8 The source device has a source setting, $S \in \mathbb{S}$, 9 that can be chosen to prepare a system in an ensemble of possible preparation procedures, $\{P_{[s|S]}\}_{s \in V_S}$, according to some probability distribution p(s|S). This means that the source device has one classical input S and two outputs: one output is a classical label $s \in V_S$ identifying the preparation procedure (in the ensemble $\{p(s|S), P_{[s|S]}\}_{s \in V_S}$) that is carried out when source outcome s is observed for source setting S (this source event is denoted [s|S]), and the other output is a system (quantum or otherwise) prepared according to the source event [s|S], i.e., preparation procedure $P_{[s|S]}$, with probability p(s|S). Thus, the assemblage of possible ensembles that the source device can prepare can be denoted by $\{\{p(s|S), P_{[s|S]}\}_{s \in V_S}\}_{S \in \mathbb{S}}$.
On the other hand, the measurement device has two inputs: a classical input $M \in \mathbb{M}$ specifying the choice of measurement setting to be implemented, and the system prepared according to preparation procedure $P_{[s|S]}$, on which this measurement M is carried out. The measurement device has one classical output $m \in V_M$ denoting the outcome of the measurement M implemented on a system prepared according to $P_{[s|S]}$, and which occurs with probability p(m|M, S, s).
8 Note that this is just a conceptual abstraction: in particular, the various possible measurement settings on the measurement device may, for example, correspond to incompatible measurements in quantum theory. The fact that we represent the different measurement settings by choices of knob settings $M \in \mathbb{M}$ on a single measurement device does not mean that it's physically possible to implement all the measurements represented by $\mathbb{M}$ jointly (in a single run of the experiment); it only means that the experimenter can choose to implement any of the measurements in the set $\mathbb{M}$ in a given run of the prepare-and-measure experiment. The same comment applies to our abstraction of preparation procedures to knob settings and outcomes of a single source device. Also, we take each of the sets $\mathbb{M}$ and $\mathbb{S}$ to include a tomographically complete set of measurements and preparations, respectively. This tomographic completeness property is necessary to be able to implement an actual contextuality experiment; see Refs. [2,33,34].
9 Here $\mathbb{S}$ is just a set of classical labels for different choices of knob settings on the source device.
We will be interested in the operational joint probability $p(m,s|M,S) \equiv p(m|M,S,s)\,p(s|S)$ for this prepare-and-measure experiment for various choices of $M \in \mathbb{M}$, $S \in \mathbb{S}$. Note that this operational description takes as primitive the operations carried out in the lab and restricts itself to predicting the probabilities of classical outcomes (i.e., m, s) given some interventions (i.e., classical inputs, M, S). To be able to define noncontextuality, the operational theory should admit a notion of operational equivalence, both for sources and for measurements. Two measurement events [m|M] and [m'|M'] are said to be operationally equivalent, denoted $[m|M] \simeq [m'|M']$, if no source event can distinguish them, i.e.,
$$p(m,s|M,S) = p(m',s|M',S) \quad \forall\, [s|S]. \qquad (1)$$
Similarly, two source events [s|S] and [s'|S'] are said to be operationally equivalent, denoted $[s|S] \simeq [s'|S']$, if no measurement event can distinguish them, i.e.,
$$p(m,s|M,S) = p(m,s'|M,S') \quad \forall\, [m|M]. \qquad (2)$$
In this paper, we will be primarily interested in the operational equivalence between the source settings themselves rather than the source events, i.e., operational equivalence between settings when one coarse-grains over their outcomes. More precisely, two source settings S and S' are said to be operationally equivalent, denoted $[\top|S] \simeq [\top|S']$, if no measurement event can distinguish them once all their outcomes are coarse-grained over, i.e.,
$$\sum_{s \in V_S} p(m,s|M,S) = \sum_{s' \in V_{S'}} p(m,s'|M,S') \quad \forall\, [m|M]. \qquad (3)$$
We use the notation $[\top|S]$ to denote coarse-graining over all the outcomes in $V_S$, i.e., the source event that "at least one of the source outcomes in the set $V_S$ occurs for source setting S."
Given the operational description of the experiment in terms of probabilities p(m, s|M, S), we want to explore the properties of any underlying ontological model for this operational description. Any such ontological model, defined within the ontological models framework [36], takes as primitive the physical system (rather than operations on it) that passes between the source and measurement devices, i.e., its basic objects are ontic states of the system, denoted λ ∈ Λ, that represent intrinsic properties of the physical system.
When a preparation procedure [s|S] is carried out, the source device samples from the space of ontic states Λ according to a probability distribution $\mu(\lambda|S,s) \in [0,1]$, where $\sum_{\lambda \in \Lambda} \mu(\lambda|S,s) = 1$, and the joint distribution over s and λ given S is given by $\mu(\lambda,s|S) \equiv \mu(\lambda|S,s)\,p(s|S)$. On the other hand, when a system in ontic state λ is input to the measurement device with measurement setting $M \in \mathbb{M}$, the probability distribution over the measurement outcomes is given by $\xi(m|M,\lambda) \in [0,1]$, where $\sum_{m \in V_M} \xi(m|M,\lambda) = 1$. The operational statistics p(m, s|M, S) result from coarse-graining over λ, i.e., $p(m,s|M,S) = \sum_{\lambda \in \Lambda} \xi(m|M,\lambda)\,\mu(\lambda,s|S)$.
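The coarse-graining over λ can be illustrated with a small sketch. The model below is a hypothetical toy example (two ontic states, one source setting, one measurement setting, with made-up numbers); none of the names or values come from the paper, and it only shows how the response functions and ontic distributions combine into operational statistics.

```python
from fractions import Fraction as F

# Hypothetical toy model (not from the paper): two ontic states, one source
# setting S with outcomes s in {0, 1}, one measurement M with outcomes m in {0, 1}.
ONTIC_STATES = ("lam0", "lam1")
P_S = {0: F(1, 2), 1: F(1, 2)}                  # p(s|S)
MU_GIVEN_S = {                                   # mu(lam|S, s)
    0: {"lam0": F(9, 10), "lam1": F(1, 10)},
    1: {"lam0": F(1, 10), "lam1": F(9, 10)},
}
XI = {                                           # xi(m|M, lam)
    "lam0": {0: F(1), 1: F(0)},
    "lam1": {0: F(0), 1: F(1)},
}

def mu_joint(lam, s):
    """mu(lam, s|S) = mu(lam|S, s) * p(s|S)."""
    return MU_GIVEN_S[s][lam] * P_S[s]

def operational_joint(m, s):
    """p(m, s|M, S) = sum over lam of xi(m|M, lam) * mu(lam, s|S)."""
    return sum(XI[lam][m] * mu_joint(lam, s) for lam in ONTIC_STATES)

# Full table of operational joint probabilities for the toy model.
TABLE = {(m, s): operational_joint(m, s) for m in (0, 1) for s in (0, 1)}
```

Exact rational arithmetic is used so that normalization of `TABLE` can be checked with equality rather than a floating-point tolerance.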
As such, it is always possible to build an ontological model for an operational theory. 10 It's only when additional assumptions are imposed on the ontological model that deciding its existence becomes a nontrivial problem. Such additional assumptions must, of course, play an explanatory role to be worth investigating. The assumption we are interested in is noncontextuality and its purpose is to explain the observed operational equivalences in an operational theory. But before we get to noncontextuality, we need to define what a context is: a context is any distinction between operationally equivalent procedures.
In quantum theory, for example, the preparation basis of the maximally mixed state of a qubit is an example of a preparation context since uniformly mixing the spin up and down eigenstates along any basis leads to the same quantum state. Similarly, when the statistics of a given measurement is inferred by coarse-graining the statistics obtained from measuring it jointly with one or the other measurement (and the two inferences agree), these latter measurements are examples of measurement contexts for the given measurement. In quantum theory, for example, consider three Hermitian operators A, B, C such that $[A,B] = 0$, $[A,C] = 0$, and $[B,C] \neq 0$: B and C are then measurement contexts for A.
Noncontextuality, motivated by the methodological principle of the identity of indiscernibles [7], is then an inference from the operational description to the ontological description of an experiment. It posits that the operational equivalences are preserved in the ontological model: the reason one cannot distinguish two operationally equivalent procedures is that there is, ontologically, no difference between them. Mathematically, the assumption of measurement noncontextuality entails that
$$[m|M] \simeq [m'|M'] \;\Rightarrow\; \xi(m|M,\lambda) = \xi(m'|M',\lambda) \quad \forall\, \lambda \in \Lambda,$$
while the assumption of preparation noncontextuality entails that
$$[\top|S] \simeq [\top|S'] \;\Rightarrow\; \mu(\lambda|S) = \mu(\lambda|S') \quad \forall\, \lambda \in \Lambda.$$
Here we denote $\mu(\lambda|S) \equiv \sum_{s \in V_S} \mu(\lambda,s|S)$, etc., for simplicity of notation, rather than use the notation $\mu(\lambda,\top|S)$ for these coarse-grained distributions.
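The qubit preparation-context example can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the Bloch-vector form of the Born rule for projective measurements along the x, y, z axes (which are tomographically complete for a qubit), and the setting names `S_z`, `S_x`, `S_biased` are hypothetical labels introduced here. It tests whether two source settings give the same statistics once their outcomes are coarse-grained over.

```python
from fractions import Fraction as F

def born(m, n, r):
    """p(m|M, S, s) for a qubit: outcome m = +1 or -1 of a projective
    measurement along axis n, on a state with Bloch vector r."""
    dot = sum(a * b for a, b in zip(n, r))
    return (1 + m * dot) * F(1, 2)

# Measurement settings: Pauli axes (assumed tomographically complete set).
MEASUREMENTS = {"X": (1, 0, 0), "Y": (0, 1, 0), "Z": (0, 0, 1)}

# Source settings: outcome s -> (p(s|S), Bloch vector of the prepared state).
SOURCES = {
    "S_z": {0: (F(1, 2), (0, 0, 1)), 1: (F(1, 2), (0, 0, -1))},
    "S_x": {0: (F(1, 2), (1, 0, 0)), 1: (F(1, 2), (-1, 0, 0))},
    "S_biased": {0: (F(3, 4), (0, 0, 1)), 1: (F(1, 4), (0, 0, -1))},
}

def coarse_grained(m, M, S):
    """sum over s of p(m, s|M, S): statistics of setting S with s ignored."""
    return sum(p_s * born(m, MEASUREMENTS[M], r) for p_s, r in SOURCES[S].values())

def operationally_equivalent(S1, S2):
    """True if no measurement event distinguishes the coarse-grained settings."""
    return all(
        coarse_grained(m, M, S1) == coarse_grained(m, M, S2)
        for M in MEASUREMENTS
        for m in (+1, -1)
    )
```

With these definitions, `S_z` and `S_x` are two preparation contexts of the same maximally mixed state, while the biased mixture `S_biased` is distinguishable from both by the Z measurement.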

III. HYPERGRAPH APPROACH TO KOCHEN-SPECKER SCENARIOS IN THE SPEKKENS FRAMEWORK
We will use the language of hypergraphs and their subgraphs to study Kochen-Specker type experimental scenarios in a framework that allows for operational noncontextuality inequalities à la Spekkens [7]. The presentation here is a hybrid one, discussing features of the CSW framework [9,10] in the notation of the AFLS framework [11], but extending both in ways appropriate for the purpose of this paper. Our goal is to demonstrate how the graph-theoretic invariants of CSW [10] can be repurposed towards obtaining noise-robust noncontextuality inequalities.
We do this in two parts: first, we define a representation of measurement events in the manner of Refs. [10,11], and then we define a representation of source events in the spirit of Ref. [1].

A. Measurements
The basic object for representing measurements is a hypergraph, Γ, with a set of vertices V(Γ) such that each vertex v ∈ V(Γ) denotes a measurement outcome, and a set of hyperedges E(Γ) such that each hyperedge e ∈ E(Γ) is a subset of V(Γ) and denotes a measurement consisting of outcomes in e. Here, $E(\Gamma) \subseteq 2^{V(\Gamma)}$ and $\bigcup_{e \in E(\Gamma)} e = V(\Gamma)$. Such a hypergraph satisfies the definition of a contextuality scenario à la AFLS [11]. We will further assume, unless specified otherwise, that the hypergraph is simple: that is, $e_1, e_2 \in E(\Gamma)$ and $e_1 \subseteq e_2 \Rightarrow e_1 = e_2$, or that no hyperedge is a strict subset of another. Such hypergraphs are also called Sperner families [37]. Two measurement events are said to be (mutually) exclusive if the vertices denoting them appear in a common hyperedge, i.e., if they can be realized in a single measurement. Here, "exclusive" refers to the fact that both measurement events cannot occur together for a given source event when the measurement corresponding to a hyperedge in which they appear together is implemented: hence, the sum of their occurrence probabilities cannot exceed 1.
A probabilistic model on Γ is an assignment of probabilities to the vertices v ∈ V(Γ) such that p(v) ≥ 0 for all v ∈ V(Γ) and $\sum_{v \in e} p(v) = 1$ for all e ∈ E(Γ).
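The two definitions above (simplicity of the hypergraph, and normalization of a probabilistic model on every hyperedge) translate directly into checks. The following is a minimal sketch; the four-vertex hypergraph and the function names are hypothetical choices made here for illustration.

```python
from fractions import Fraction as F

def vertices(edges):
    """V(Gamma): the union of all hyperedges."""
    return set().union(*edges)

def is_simple(edges):
    """No hyperedge is a strict subset of another (Sperner family)."""
    return not any(e1 < e2 for e1 in edges for e2 in edges)

def is_probabilistic_model(edges, p):
    """p(v) >= 0 for all vertices, and p sums to 1 on every hyperedge."""
    nonnegative = all(p[v] >= 0 for v in vertices(edges))
    normalized = all(sum(p[v] for v in e) == 1 for e in edges)
    return nonnegative and normalized

# Hypothetical scenario: two measurements sharing the outcome "c".
EDGES = [frozenset("abc"), frozenset("cd")]
GOOD = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 2), "d": F(1, 2)}
BAD = {"a": F(1, 2), "b": F(1, 2), "c": F(1, 2), "d": F(1, 2)}
```

Here `GOOD` is a probabilistic model on the scenario, while `BAD` fails normalization on the hyperedge {a, b, c}.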
Here we are assuming that, in fact, every vertex v represents an equivalence class of measurement events, denoted [m|M], and every edge e represents an equivalence class of measurements, denoted M. 11 This assumption is implicit in previous (hyper)graph-theoretic approaches to KS-contextuality [10,11]. The fact that each v represents an equivalence class of measurement events, [m|M], means that 1. any probabilistic model on Γ, viewed as operational probabilities for a given source event (that is p(v) ≡ p(v|S, s) ≡ p(m|M, S, s)), respects (by definition) the operational equivalences we have presumed between measurement events in the operational description of the experiment, 12 and 2. any probabilistic model on Γ, viewed as ontological probabilities for a given ontic state (that is, p(v) ≡ p(v|λ) ≡ ξ(m|M, λ)), respects (by definition) the assumption of measurement noncontextuality with respect to the presumed operational equivalences between measurement events.
Thus, it's the structure of a contextuality scenario that dictates the operational equivalences between measurement events that are of interest and it's the definition of a probabilistic model on such a contextuality scenario which ensures that such operational equivalences are respected by the operational probabilities as well as the fact that measurement noncontextuality is respected when the probabilistic model is viewed as an ontological assignment of probabilities by a fixed ontic state. We will therefore often write p(m, s|M, S) as p(v, s|S) and p(m|M, S, s) as p(v|S, s), where [s|S] is a source event. Similarly, we will also write ξ(m|M, λ) as p(v|λ), where λ is an ontic state.
Orthogonality graph of Γ, O(Γ): Given the hypergraph Γ, we construct its orthogonality graph O(Γ): that is, the vertices of O(Γ) are given by $V(O(\Gamma)) \equiv V(\Gamma)$, and the edges of O(Γ) are given by $E(O(\Gamma)) \equiv \{\{v, v'\} \mid v, v' \in e \text{ for some } e \in E(\Gamma)\}$. Each edge of O(Γ) denotes the exclusivity of the two measurement events it connects, i.e., the fact that they can occur as outcomes of a single measurement.
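The construction of O(Γ) can be sketched as follows. The function name and the example hypergraph are hypothetical, and the sketch assumes that only pairs of distinct vertices count as edges (no self-loops).

```python
from itertools import combinations

def orthogonality_graph(edges):
    """Return (V, E) of O(Gamma): same vertices as Gamma, with an edge
    between every pair of distinct vertices sharing some hyperedge."""
    verts = set().union(*edges)
    graph_edges = {
        frozenset(pair)
        for e in edges
        for pair in combinations(sorted(e), 2)
    }
    return verts, graph_edges

# Hypothetical scenario: two measurements sharing the outcome "c".
EDGES = [frozenset("abc"), frozenset("cd")]
O_VERTS, O_EDGES = orthogonality_graph(EDGES)
```

For this scenario the orthogonality graph has the edges ab, ac, bc, and cd; the pair {a, d} is not an edge, since a and d never appear in a common measurement.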
For any Bell-KS inequality constraining correlations between measurement events from O(Γ) (when all measurements are implemented on a given source event), we construct a subgraph G of O(Γ) such that the vertices of G, i.e., V(G), correspond to measurement events that appear in the inequality with nonzero coefficients, and two vertices share an edge in G if and only if they share an edge in O(Γ). More explicitly, consider a Bell-KS expression and its bound,
$$\sum_{v \in V(G)} w_v\, p(v|S,s) \leq R_{\rm KS},$$
where $w_v > 0$ for all v ∈ V(G) and $R_{\rm KS}$ is the upper bound on the expression in any operational theory that admits a KS-noncontextual ontological model. Often, but not always, these inequalities are simply of the form
$$\sum_{v \in V(G)} p(v|S,s) \leq R_{\rm KS},$$
where $w_v = 1$ for all v ∈ V(G). In keeping with the CSW notation [10], we will denote the general situation by a weighted graph (G, w), where w is a function that maps vertices v ∈ V(G) to weights $w_v > 0$. See Figures 2 and 3 for an example from the Klyachko-Can-Binicioglu-Shumovsky (KCBS) scenario [10,38]. Below, we make some remarks clarifying the scope of the framework described above before we move to the case of sources.
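In the CSW framework [10], the KS-noncontextual bound of such an inequality is given by the weighted independence number of (G, w). The brute-force sketch below illustrates this on the 5-cycle, the graph usually associated with the KCBS inequality; the function name is hypothetical, and the exhaustive search is only practical for small graphs.

```python
from itertools import combinations

def weighted_independence_number(vertices, edges, w):
    """Maximum total weight of a set of pairwise non-adjacent vertices,
    found by exhaustive search over all vertex subsets."""
    verts = list(vertices)
    best = 0
    for k in range(len(verts) + 1):
        for subset in combinations(verts, k):
            independent = all(
                frozenset(pair) not in edges for pair in combinations(subset, 2)
            )
            if independent:
                best = max(best, sum(w[v] for v in subset))
    return best

# The 5-cycle with unit weights (assumed here as the KCBS exclusivity graph).
C5_VERTICES = range(5)
C5_EDGES = {frozenset((i, (i + 1) % 5)) for i in range(5)}
UNIT_WEIGHTS = {v: 1 for v in C5_VERTICES}

ALPHA_C5 = weighted_independence_number(C5_VERTICES, C5_EDGES, UNIT_WEIGHTS)
```

With unit weights at most two of the five mutually constrained events can be assigned probability 1 by a deterministic assignment, which is the familiar KCBS bound of 2.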

Classification of probabilistic models
We classify the probabilistic models on a hypergraph Γ as follows:
• KS-noncontextual probabilistic models, C(Γ): a probabilistic model which is a convex combination of deterministic assignments $p: V(\Gamma) \to \{0,1\}$, where $\sum_{v \in e} p(v) = 1$ for all e ∈ E(Γ). In Ref. [11], this is referred to as a "classical model". 13
• Consistent exclusivity satisfying probabilistic models, $CE^{1}(\Gamma)$: a probabilistic model on Γ, $p: V(\Gamma) \to [0,1]$, such that (in addition to satisfying the definition of a probabilistic model) $\sum_{v \in c} p(v) \leq 1$ for all cliques c in the orthogonality graph O(Γ). This is the same as the set of E1 probabilistic models of Ref. [10]. Note that a clique in the orthogonality graph O(Γ) is a set of vertices that are pairwise exclusive (i.e., every vertex in this set shares an edge with every other vertex).
• General probabilistic models, G(Γ): Any p that satisfies the definition of a probabilistic model is a general probabilistic model.
We therefore have $C(\Gamma) \subseteq CE^{1}(\Gamma) \subseteq G(\Gamma)$ for any hypergraph Γ.
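The distinction between the three classes can be made concrete on a small scenario. The sketch below uses a hypothetical three-outcome "triangle" hypergraph in which every pair of outcomes shares a measurement but no single measurement contains all three; it enumerates the deterministic assignments (the extreme points of C(Γ)) and tests the clique constraints defining consistent exclusivity. It does not decide membership in C(Γ) for a general model, which would need a convex-decomposition (linear programming) step.

```python
from fractions import Fraction as F
from itertools import combinations, product

def is_model(edges, p):
    """Nonnegative and normalized on every hyperedge."""
    return all(x >= 0 for x in p.values()) and all(
        sum(p[v] for v in e) == 1 for e in edges
    )

def cliques(edges):
    """All nonempty vertex sets whose members pairwise share a hyperedge,
    i.e., the cliques of the orthogonality graph."""
    verts = sorted(set().union(*edges))
    for k in range(1, len(verts) + 1):
        for subset in combinations(verts, k):
            if all(any({u, v} <= e for e in edges)
                   for u, v in combinations(subset, 2)):
                yield subset

def in_CE1(edges, p):
    """Consistent exclusivity: a model whose clique sums never exceed 1."""
    return is_model(edges, p) and all(
        sum(p[v] for v in c) <= 1 for c in cliques(edges)
    )

def deterministic_models(edges):
    """All 0/1-valued probabilistic models on the hypergraph."""
    verts = sorted(set().union(*edges))
    candidates = (dict(zip(verts, bits))
                  for bits in product((0, 1), repeat=len(verts)))
    return [p for p in candidates if is_model(edges, p)]

# Hypothetical triangle scenario: three two-outcome measurements.
TRIANGLE = [frozenset("ab"), frozenset("bc"), frozenset("ac")]
UNIFORM = {v: F(1, 2) for v in "abc"}
```

On the triangle, `UNIFORM` is a general probabilistic model, but the clique {a, b, c} has total probability 3/2, so it violates consistent exclusivity, and no deterministic assignment exists at all. The inclusions between the classes can therefore be strict.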

Structural Specker's principle vs. Statistical Specker's principle
The CSW framework [10] restricts the scope of probabilistic models on a hypergraph to those satisfying consistent exclusivity (the E1 probabilistic models), motivated by what is sometimes called Specker's principle [39]: that is, "if you have several questions and you can answer any two of them, then you can also answer all of them". If by "questions" we understand measurement settings, then the principle says that a set of pairwise jointly implementable measurement settings is itself jointly implementable. Note that when we say a set of measurement settings is "jointly implementable", "jointly measurable", or "compatible", we mean that there exists another choice of a single measurement setting in the theory such that this measurement setting can reproduce the statistics of all the measurement settings in the set by coarse-graining. See Ref. [29] for an overview of joint measurability in quantum theory. As such, in its application to measurement settings, Specker's principle is a constraint on the structure of measurements allowed in a physical theory that respects it, e.g., measurement settings that correspond to projective measurements or PVMs (projection valued measures) in quantum theory. This is, for example, the reading adopted in Ref. [40]. On the other hand, we will often also refer to the "joint measurability" of a set of measurement events, by which we mean that this set of measurement events is a subset of the set of measurement outcomes for some choice of measurement setting. At the level of measurement events, 14 then, there are two distinct ways to read Specker's principle that one needs to keep in mind which we distinguish as structural Specker's principle vs. statistical Specker's principle. We define these two readings below: • Structural Specker's principle imposes a structural constraint on a contextuality scenario Γ.
This (strong) reading of Specker's principle applies to any set of measurement events, say M ⊆ V(Γ), where every pair of measurement events can arise as outcomes of a single measurement: that is, for each pair {v, v'} ⊆ M, there exists some e ∈ E(Γ) such that {v, v'} ⊆ e. The principle then states: Given a set M of pairwise jointly measurable measurement events in some contextuality scenario Γ, all the measurement events in M are jointly measurable, i.e., all the measurement events in the set can arise as outcomes of a single measurement: M ⊆ e for some e ∈ E(Γ).
Alternatively, the constraint of structural Specker's principle can be restated as: Every clique in the orthogonality graph of Γ, O(Γ), is a subset of some hyperedge in Γ.
Note that we haven't said anything directly about probabilities here: any Γ satisfying the above property is said to satisfy structural Specker's principle.
• Statistical Specker's principle (or consistent exclusivity) imposes a statistical constraint on probabilistic models on any contextuality scenario Γ representing measurement events in an operational theory.
This (weak) reading of Specker's principle imposes an additional constraint on a probabilistic model p ∈ G(Γ): given a set M of pairwise jointly measurable measurement events, p satisfies $\sum_{v \in M} p(v) \leq 1$.
This can also be expressed as: A probabilistic model p ∈ G(Γ) is said to satisfy statistical Specker's principle if the sum of probabilities it assigns to the vertices of every clique in the orthogonality graph of Γ, O(Γ), does not exceed 1, i.e., $\sum_{v \in c} p(v) \leq 1$ for all cliques c in O(Γ).
All probabilistic models that satisfy this constraint define the set of probabilistic models CE 1 (Γ) (or E1) for any contextuality scenario Γ regardless of whether Γ satisfies structural Specker's principle.
Any probabilistic model p on Γ such that p ∈ CE 1 (Γ) is said to satisfy statistical Specker's principle or, equivalently, consistent exclusivity [11].
Probabilistic models on any hypergraph belonging to the set of Γ which satisfy the (strong) structural Specker's principle obviously satisfy the (weak) statistical Specker's principle. This holds simply on account of the structure of such Γ: that is, for all Γ satisfying structural Specker's principle, we have CE¹(Γ) = G(Γ), so that any probabilistic model p ∈ G(Γ) also satisfies statistical Specker's principle: p ∈ CE¹(Γ). To see that CE¹(Γ) = G(Γ) for any Γ satisfying structural Specker's principle, note that every clique c in O(Γ) is a subset of some hyperedge in Γ, and the probabilities on a hyperedge are nonnegative and sum to 1; hence, for every clique c, $\sum_{v \in c} p(v) \leq 1$ for all p ∈ G(Γ), i.e., p ∈ CE¹(Γ).
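The two readings can be checked mechanically on small scenarios. The following is a minimal sketch, assuming a contextuality scenario is given as a list of hyperedges (sets of vertex labels) and a probabilistic model as a dictionary of exact fractions; the function names and the two toy scenarios are illustrative choices, not part of the framework, and the brute-force clique enumeration is only sensible for small Γ.

```python
from fractions import Fraction
from itertools import combinations

def orthogonality_edges(hyperedges):
    """Edges of O(Γ): unordered pairs of vertices that share a hyperedge."""
    return {frozenset(pair) for e in hyperedges for pair in combinations(e, 2)}

def cliques(hyperedges):
    """All cliques of O(Γ), found by brute force (small scenarios only)."""
    vertices = sorted(set().union(*hyperedges))
    edges = orthogonality_edges(hyperedges)
    return [frozenset(c)
            for r in range(1, len(vertices) + 1)
            for c in combinations(vertices, r)
            if all(frozenset(pair) in edges for pair in combinations(c, 2))]

def satisfies_structural(hyperedges):
    """Structural reading: every clique of O(Γ) lies inside some hyperedge."""
    return all(any(c <= e for e in hyperedges) for c in cliques(hyperedges))

def is_probabilistic_model(p, hyperedges):
    """p ∈ G(Γ): nonnegative and normalized on every hyperedge."""
    return (all(x >= 0 for x in p.values())
            and all(sum(p[v] for v in e) == 1 for e in hyperedges))

def satisfies_statistical(p, hyperedges):
    """Statistical reading: clique sums of p never exceed 1."""
    return all(sum(p[v] for v in c) <= 1 for c in cliques(hyperedges))

half = Fraction(1, 2)
# Three pairwise "jointly measurable" events with no common hyperedge.
triangle = [frozenset("ab"), frozenset("bc"), frozenset("ac")]
p_triangle = {"a": half, "b": half, "c": half}
# A single three-outcome measurement.
single = [frozenset("abc")]
p_single = {"a": half, "b": half, "c": Fraction(0)}
```

On the triangle scenario the clique {a, b, c} is not contained in any hyperedge, and the model assigning 1/2 to each vertex is a valid element of G(Γ) whose clique sum is 3/2, so it falls outside CE¹(Γ). On the single-hyperedge scenario both checks pass for every probabilistic model.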
While structural Specker's principle, imposing a constraint on the structure of Γ, implies that CE 1 (Γ) = G(Γ), 15 it remains an open question whether the converse is true: That is, given that CE 1 (Γ) = G(Γ) for some Γ, is it the case that Γ must then necessarily satisfy structural Specker's principle, namely, that every clique in O(Γ) is a subset of some hyperedge in Γ?
A positive answer to this question would answer Problem 7.2.3 of Ref. [11] asking for a characterization of Γ for which CE 1 (Γ) = G(Γ).
We have so far defined structural Specker's principle as a constraint on Γ and statistical Specker's principle as a constraint on a probabilistic model on any Γ. Any operational theory would typically allow many possible Γ to be realized by its measurement events as well as many possible probabilistic models to be realized on any Γ representing its measurement events. It will be useful for our discussion to define what we mean when we say an operational theory, say T, satisfies structural Specker's principle or that it satisfies statistical Specker's principle.
We denote by T(Γ) the set of probabilistic models achievable on any Γ by an operational theory T. Since an operational theory can only put further constraints on probabilistic models in G(Γ), we obviously have: T(Γ) ⊆ G(Γ).

1. T satisfies statistical Specker's principle:
We say an operational theory T satisfies statistical Specker's principle if T(Γ) ⊆ CE¹(Γ) for every contextuality scenario Γ. Since the satisfaction of statistical Specker's principle is a constraint on the statistical predictions of T, there must be some fact about the structure of T that leads to this constraint. This fact enforcing statistical Specker's principle could be some restriction arising from the structure of allowed measurement events and/or even the structure of allowed preparations in the operational theory T. For instance, this is the case for quantum theory when one only considers projective measurements implemented on an arbitrary quantum state, i.e., Q(Γ) ⊆ CE¹(Γ), where Q(Γ) denotes the set of probabilistic models that can be obtained in this way. More generally, one could relax the no-restriction hypothesis [41] in some particular way in T so that not all probabilistic models in G(Γ) are allowed in T(Γ). In the case of quantum theory, restricting attention to only projective measurements (as we just pointed out) rather than the more general case allowing arbitrary POVMs is one way of restricting the set of possible probabilistic models realizable with quantum states and measurements to a strict subset of G(Γ). Allowing arbitrary POVMs would lead to a violation of statistical Specker's principle by probabilistic models arising from quantum theory.
Let us now define what it means for an operational theory T to satisfy structural Specker's principle.
2. T satisfies structural Specker's principle: An operational theory T is said to satisfy structural Specker's principle if for any set of measurement events that are pairwise jointly measurable, i.e, measurement events in each pair arise as outcomes of some measurement in the theory, it is the case that all the measurement events in the set are jointly measurable, i.e., all the measurement events in the set arise as outcomes of a single measurement in the theory.
We will now show that a theory which satisfies structural Specker's principle also satisfies statistical Specker's principle. To do this, we consider a contextuality scenario Γ which may not satisfy structural Specker's principle and from it construct a contextuality scenario Γ′ which does satisfy the principle. The construction proceeds as follows:

3. Turn each maximal clique c in O(Γ) that is not a hyperedge in Γ into a hyperedge in Γ′ and include an additional vertex $v_c$ in this hyperedge. Here, a maximal clique in a graph is a clique that is not a strict subset of another clique, i.e., there is no vertex outside the clique that shares an edge with each vertex in the clique.
We then have $E(\Gamma') = E(\Gamma) \cup \{c \cup \{v_c\} : c \in C\}$ for the hyperedges of Γ′, where C is the set of maximal cliques in O(Γ) that are not hyperedges in Γ.
Note that as long as a theory T satisfies structural Specker's principle, converting maximal cliques in O(Γ) that are not hyperedges in Γ to hyperedges in Γ is a valid move within the theory since the resulting hyperedge would indeed constitute a valid measurement in the theory.

The resulting contextuality scenario Γ is thus given
, the two hypergraphs are isomorphic).
Our construction of Γ′ leads to the following properties:
• Γ′ satisfies structural Specker's principle (by construction) since every clique in O(Γ′) is a subset of some hyperedge in Γ′. Hence, it's also the case that statistical Specker's principle holds for probabilistic models on Γ′ as CE¹(Γ′) = G(Γ′).
Note that the construction of Γ′ relied on the fact that the theory we are considering satisfies structural Specker's principle. If the theory doesn't satisfy this principle, but one goes ahead with the construction of Γ′, then the new hyperedges in Γ′ may not constitute valid measurements in the theory.
• Probabilistic models in G(Γ′) are in one-to-one correspondence with probabilistic models in CE¹(Γ): for any probabilistic model $p_{\Gamma} \in \mathrm{CE}^1(\Gamma)$, we can define a probabilistic model $p_{\Gamma'} \in G(\Gamma')$ by setting $p_{\Gamma'}(v) = p_{\Gamma}(v)$ for all v ∈ V(Γ) and $p_{\Gamma'}(v_c) = 1 - \sum_{v \in c} p_{\Gamma}(v)$ for all c ∈ C. Similarly, for any $p_{\Gamma'} \in G(\Gamma')$ we have a probabilistic model $p_{\Gamma} \in \mathrm{CE}^1(\Gamma)$ if we simply ignore the probabilities assigned to the vertices $v_c$, c ∈ C, which do not appear in Γ.
• Hence, the set of probabilistic models on Γ that satisfy statistical Specker's principle, i.e., CE¹(Γ), is in one-to-one correspondence with the set of probabilistic models on Γ′, which (by construction) satisfies structural Specker's principle so that CE¹(Γ′) = G(Γ′).
We therefore have that $\mathrm{CE}^1(\Gamma) = G(\Gamma')|_{V(\Gamma)}$, where $G(\Gamma')|_{V(\Gamma)}$ denotes the probabilistic models induced on Γ by those on Γ′ (ignoring the probabilities assigned to vertices in V(Γ′)\V(Γ)).
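The completion step and the induced correspondence between models can be sketched in code. This is an illustrative sketch only: the names `complete` and `extend`, the label `v_abc` for the added vertex, and the three-hyperedge toy scenario are assumptions made for the example, and the extension rule p(v_c) = 1 − Σ_{v∈c} p(v) is the one implied by normalization on the new hyperedge.

```python
from fractions import Fraction
from itertools import combinations

def maximal_cliques(hyperedges):
    """Maximal cliques of O(Γ), by brute force (small scenarios only)."""
    vertices = sorted(set().union(*hyperedges))
    adjacent = {frozenset(pair) for e in hyperedges for pair in combinations(e, 2)}
    found = [frozenset(c)
             for r in range(1, len(vertices) + 1)
             for c in combinations(vertices, r)
             if all(frozenset(pair) in adjacent for pair in combinations(c, 2))]
    return [c for c in found if not any(c < d for d in found)]

def complete(hyperedges):
    """Return the hyperedges of Γ' and a map from each added clique c to v_c."""
    C = [c for c in maximal_cliques(hyperedges) if c not in hyperedges]
    new_vertex = {c: "v_" + "".join(sorted(c)) for c in C}
    return list(hyperedges) + [c | {new_vertex[c]} for c in C], new_vertex

def extend(p, new_vertex):
    """Extend a model in CE^1(Γ) to Γ' via p(v_c) = 1 - sum of p over c."""
    q = dict(p)
    for c, v_c in new_vertex.items():
        q[v_c] = 1 - sum(p[v] for v in c)
    return q

# Toy scenario: {a, b, c} is a maximal clique of O(Γ) but not a hyperedge.
gamma = [frozenset("abx"), frozenset("bcy"), frozenset("acz")]
p = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 4),
     "x": Fraction(1, 2), "y": Fraction(1, 2), "z": Fraction(1, 2)}

gamma_prime, added = complete(gamma)
p_prime = extend(p, added)
```

Here `gamma_prime` contains the three original hyperedges plus {a, b, c, v_abc}, and `p_prime` is normalized on all four. If the starting model violated consistent exclusivity on the clique, the extension would assign a negative value to the added vertex, which is the computational counterpart of the restriction to CE¹(Γ).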
It is conceivable that a particular Γ may not admit probabilistic models from an operational theory T, i.e., T(Γ) = ∅. On the other hand, if Γ admits a representation in terms of measurement events admissible in T, so that T(Γ) ≠ ∅, then two possibilities arise: Γ satisfies structural Specker's principle or it doesn't. If Γ satisfies structural Specker's principle then any probabilistic model in T(Γ) will satisfy statistical Specker's principle and we have Γ′ = Γ. If Γ does not satisfy structural Specker's principle, we consider its relation with the contextuality scenario Γ′ constructed from it that does satisfy structural Specker's principle. Such a Γ′ admits a representation in a theory T satisfying structural Specker's principle (that is, T(Γ′) ≠ ∅) as long as Γ admits such a representation (that is, T(Γ) ≠ ∅). Indeed, it's the satisfaction of structural Specker's principle in T that renders the construction of Γ′ from Γ physically allowed in T.
Thus, in a theory T that satisfies structural Specker's principle, the following holds: for every probabilistic model $p_{\Gamma'} \in T(\Gamma')$, the corresponding probabilistic model $p_{\Gamma}$ on Γ is uniquely fixed: it's obtained by just neglecting the probabilities assigned by $p_{\Gamma'}$ to the vertices in V(Γ′)\V(Γ).
We must therefore have $T(\Gamma) = T(\Gamma')|_{V(\Gamma)}$, where $T(\Gamma')|_{V(\Gamma)}$ denotes the set of probabilistic models induced on Γ by the set of probabilistic models in T(Γ′) under the correspondence we have already established above. We can now state and prove the following theorem:
Theorem 1. If an operational theory T satisfies structural Specker's principle, then it also satisfies statistical Specker's principle.
Proof. For any Γ that does not admit a probabilistic model in T, i.e., T(Γ) = ∅, statistical Specker's principle is trivially satisfied since ∅ ⊆ CE¹(Γ). For any Γ that does admit a probabilistic model in T, i.e., T(Γ) ≠ ∅, we can have one of two possibilities: either it satisfies structural Specker's principle, in which case T(Γ) ⊆ CE¹(Γ) = G(Γ), or it doesn't, in which case we consider the Γ′ constructed from it following the recipe we have already outlined so that we have: $T(\Gamma) = T(\Gamma')|_{V(\Gamma)}$. Since T satisfies structural Specker's principle, we have $T(\Gamma') \subseteq G(\Gamma') = \mathrm{CE}^1(\Gamma')$, and hence $T(\Gamma')|_{V(\Gamma)} \subseteq G(\Gamma')|_{V(\Gamma)} = \mathrm{CE}^1(\Gamma)$. That is, the theory T satisfies statistical Specker's principle on Γ: T(Γ) ⊆ CE¹(Γ). Overall, we have the desired result: T satisfies structural Specker's principle ⇒ T(Γ) ⊆ CE¹(Γ) ⊆ G(Γ) for all Γ, i.e., T satisfies statistical Specker's principle.
Thus, one way of enforcing that a particular operational theory T satisfies statistical Specker's principle, that is, T(Γ) ⊆ CE¹(Γ) ⊆ G(Γ) for all Γ, is to require that it satisfies structural Specker's principle, a constraint on the structure of measurement events in T. This is, for example, what is achieved in Ref. [31] by invoking a notion of "sharpness" for measurement events in an operational theory such that any set of sharp measurement events that are pairwise jointly measurable are all jointly measurable. That is, structural Specker's principle is satisfied in a theory with such sharp measurement events and, consequently, statistical Specker's principle, or what is more conventionally called consistent exclusivity [11], is also satisfied. But it's conceivable that there may be other ways to ensure that only a subset of CE¹(Γ) probabilistic models are allowed in T(Γ) for any Γ. What we wish to emphasize here is that it is by no means obvious (or at least, it needs to be proven) that the only way to restrict the set of probabilistic models T(Γ) to a subset of CE¹(Γ) for any Γ is to require that the theory T satisfy structural Specker's principle. 17 Note that statistical Specker's principle (or consistent exclusivity) is so intrinsic to the CSW approach [10] that they do not consider probabilistic models that do not satisfy this principle. 18 This will become important when we consider the fact that nonprojective measurements in quantum theory do not satisfy Specker's principle, structural or statistical (at the level of measurement events), nor even the stronger statement of Specker's principle for measurement settings (cf. Ref. [40]).
Indeed, such measurements admit contextuality scenarios Γ that are not possible with projective measurements, such as the one from three binary-outcome POVMs that are pairwise jointly measurable but not triplewise so [14][15][16], and the probabilistic models they give rise to can only be accommodated in the most general set of probabilistic models, G(Γ), since trivial POVMs can realize any probabilistic model at all. Specker's principle, structural or statistical, was motivated by the fact that projective measurements in quantum theory have the property that any set of pairwise orthogonal projectors can be measured together. This principle (in either reading, structural or statistical) would be obeyed in any theory where measurements have this property, and indeed, the more recent attitude of its proponents [28] is to restrict attention to "sharp" measurements in such theories [31,32], where the definition of "sharp" ensures the property of pairwise jointly measurable events being globally jointly measurable, a property which forms the motivational basis (and is sufficient) for statistical Specker's principle to hold (cf. Theorem 1). That is, they seem to regard statistical Specker's principle as grounded in (and physically justified by) structural Specker's principle. Theorem 1 is a quantitative statement of this intuition in the hypergraph formalism à la AFLS [11]. The work of Refs. [31,32] can be understood as bridging the gap between structural Specker's principle and statistical Specker's principle by formally defining a notion of sharp measurements in an operational theory such that structural Specker's principle holds for these sharp measurements.
17 Indeed, any putative theory yielding the set of almost quantum correlations (which satisfy statistical Specker's principle) [42] cannot satisfy Specker's principle (that pairwise jointly implementable measurement settings are all jointly implementable) for any notion of sharp measurements [40]. Whether structural Specker's principle, which is defined at the level of measurement events, can be upheld for an almost quantum theory, so that it falls in the category of operational theories with sharp measurements envisaged in Ref. [31], remains an open question.
18 As we have already noted, a noise-robust noncontextuality inequality of the type in Ref. [1] that is based on a logical proof of the KS theorem is not even obtainable if one restricted attention to probabilistic models satisfying CE¹. The upper bound on that inequality comes from a probabilistic model that does not satisfy CE¹.
On the other hand, and this is the key point for our purposes, if one wants to make no commitment about the representation of measurements in the operational theory (in particular, not requiring a notion of "sharpness"), then Specker's principle is not a natural constraint to impose on probabilistic models and, indeed, one must deal with the full set of probabilistic models G(Γ) on any contextuality scenario Γ rather than restrict oneself to the set of probabilistic models CE¹(Γ). It is for this reason that we are translating the notions from CSW [10] to the notational conventions of AFLS [11], the latter being a more natural choice for our purposes, allowing the language needed to articulate the difference between CE¹(Γ) and G(Γ) rather than excluding the latter by fiat or, perhaps, by an appeal to structural Specker's principle holding for sharp measurements in the landscape of operational theories under consideration (cf. Theorem 1). It is for all these reasons that the "exclusivity principle" à la CSW [10] is not enough to make sense of Spekkens contextuality applied to Kochen-Specker type scenarios. The framework we propose in this paper addresses this gap between the notions Spekkens contextuality (which applies to arbitrary measurements) requires in a hypergraph framework and those that the CSW framework [10] (which applies to "sharp" measurements) can provide in its graph-theoretic formulation.

4. Remark on the classification of probabilistic models: why we haven't defined "quantum models" as those obtained from projective measurements

The reader may note that we haven't tried to define any notion of a "quantum model" so far, having only adopted the definitions of Ref. [11] for KS-noncontextual models (C(Γ)), for models satisfying consistent exclusivity (CE¹(Γ)), and for general probabilistic models (G(Γ)).
The reason for this is that we do not wish to restrict ourselves to projective measurements in defining a "quantum model", unlike the traditional Kochen-Specker approaches [10,11]. In Ref. [11], a quantum model is defined as a probabilistic model that can be realized in the following manner: assign projectors $\{\Pi_v\}_{v \in V(\Gamma)}$ (defined on any Hilbert space) to all the vertices of Γ such that $\sum_{v \in e} \Pi_v = I$ for all e ∈ E(Γ), and we have $p(v) = \mathrm{Tr}(\rho \Pi_v)$, for some density operator ρ on the Hilbert space, I being the identity operator.
On the other hand, allowing arbitrary positive operator-valued measures (POVMs) in a definition of a quantum model (as we would rather prefer) means that, in fact, quantum models on a hypergraph Γ are as general as the general probabilistic models G(Γ), rendering such a definition redundant. This can be seen by noting that for any probabilistic model p ∈ G(Γ), one can associate positive operators to the vertices of Γ given by p(v)I such that for any quantum state ρ on some Hilbert space, we have p(v) = Tr(ρp(v)I), where I is the identity operator.
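The trivial-POVM argument can be made concrete with a short numerical check. The sketch below is illustrative only: it assumes a qubit (dimension 2), represents operators as nested lists of exact fractions, and uses a hypothetical three-vertex scenario with a model that violates consistent exclusivity; none of these choices come from the text.

```python
from fractions import Fraction

def trivial_effects(p, dim=2):
    """Assign the positive operator p(v) * I to each vertex v."""
    return {v: [[pv if i == j else Fraction(0) for j in range(dim)]
                for i in range(dim)]
            for v, pv in p.items()}

def trace_of_product(a, b):
    """Tr(a b) for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def effects_sum(effects, hyperedge, dim=2):
    """Entrywise sum of the effects assigned to the vertices of a hyperedge."""
    return [[sum(effects[v][i][j] for v in hyperedge) for j in range(dim)]
            for i in range(dim)]

half = Fraction(1, 2)
hyperedges = [frozenset("ab"), frozenset("bc"), frozenset("ac")]
p = {"a": half, "b": half, "c": half}          # a model outside CE^1
rho = [[Fraction(3, 4), Fraction(1, 4)],        # an arbitrary valid qubit state
       [Fraction(1, 4), Fraction(1, 4)]]

effects = trivial_effects(p)
identity = [[1, 0], [0, 1]]
```

Each hyperedge's effects sum to the identity, so they form a POVM, and Tr(ρ p(v) I) = p(v) for any unit-trace ρ. The quantum state plays no role in fixing the statistics, which is why a definition of "quantum model" that admits arbitrary POVMs collapses onto G(Γ).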
Our focus in this paper is not on quantum theory, in particular, even though the need to be able to handle noisy measurements and preparations (particularly, trivial POVMs) in quantum theory can be taken as a motivation for this work. Rather, our focus is on delineating the boundary between operational theories that admit noncontextual ontological models (for Kochen-Specker type experiments, suitably augmented with multiple preparation procedures, as outlined in this paper) and those that don't by obtaining noise-robust noncontextuality inequalities. In particular, we want these inequalities to indicate the noise thresholds beyond which an experiment cannot rule out the existence of a noncontextual ontological model with respect to the quantities of interest. This also means that making sense of quantum correlations in this approach requires one to pay attention not only to the measurements involved in an experiment but also the preparations; indeed, this shift of focus from measurements alone, to include multiple preparations (or source settings), is a fundamental conceptual difference between our approach and that of traditional Kochen-Specker contextuality frameworks [10,11,13].

Scope of this framework
Note that whenever we refer to the "CSW framework", we mean the framework of Ref. [10], which often differs from the framework of Ref. [9] in some respects, e.g., the normalization of probabilities in a given hyperedge, assumed in [10], but not in [9]. In Ref. [9], the authors write: Notice that in all of the above we never require that any particular context should be associated to a complete measurement: the conditions only make sure that each context is a subset of outcomes of a measurement and that they are mutually exclusive. Thus, unlike the original KS theorem, it is clear that every context hypergraph Γ has always a classical noncontextual model, besides possibly quantum and generalized models.
On the other hand, in Ref. [10], they write: "The fact that the sum of probabilities of outcomes of a test is 1 can be used to express these correlations as a positive linear combination of probabilities of events, $S = \sum_i w_i P(e_i)$, with $w_i > 0$." The latter presentation [10] is more in line with the "original KS theorem" [8], as well as the presentation in Ref. [11]. Since normalization of probabilities is thus presumed in Ref. [10], in keeping with the definition of a probabilistic model we have presented (following [11]), the graph invariants of CSW [10] refer, specifically, to subgraphs G of those hypergraphs Γ on which the set of KS-noncontextual probabilistic models is nonempty. In particular, our generalization of the CSW framework [10] in this paper says nothing about noise-robust noncontextuality inequalities from logical proofs of the Kochen-Specker theorem [8], which rely on hypergraphs Γ that admit no KS-noncontextual probabilistic models, i.e., KS-uncolourable hypergraphs. It also says nothing for the hypergraphs Γ that do not satisfy the property CE¹(Γ) = G(Γ). An example of such a hypergraph, which is not covered by our generalization of the CSW framework on both counts, is the 18 ray hypergraph first presented in Ref. [43], denoted $\Gamma_{18}$ (see Fig. 4 and Appendix B). Indeed, the study of noise-robust noncontextuality inequalities from such KS-uncolourable hypergraphs was initiated in Ref. [1], and a more exhaustive hypergraph-theoretic treatment of it will be presented in forthcoming work [44]. In this paper, we will restrict ourselves to KS-colourable hypergraphs, the study of which was initiated in Ref. [5], and, of these, only those KS-colourable hypergraphs Γ which satisfy CE¹(Γ) = G(Γ). Note that this is not a limitation of our general approach, which is based on Ref. [5] and applies to any KS-colourable hypergraph, but rather a limitation we inherit from the CSW framework [10] 19 since we want to leverage their graph invariants in obtaining our noise-robust noncontextuality inequalities. The study of other KS-colourable hypergraphs, in particular those which arise only with nonprojective measurements in quantum theory [14][15][16] and are outside the scope of traditional frameworks [10,11,13], will be taken up in future work.
To summarize, the measurement events hypergraphs Γ where the present framework (and the CSW framework [10]) applies must satisfy two properties: C(Γ) ≠ ∅ (that is, KS-colourability) and CE¹(Γ) = G(Γ). 20 In the next subsection, we define additional notions necessary to obtain noise-robust noncontextuality inequalities that make use of graph invariants from the CSW framework. These notions correspond to source events that are an integral part of our framework.

B. Sources
Having introduced the (hyper)graph-theoretic elements that we need to talk about measurement events, we are now in a position to introduce features of source events that are relevant in the Spekkens framework.
As we have argued previously, we require the measurement events hypergraph Γ to be such that C(Γ) ≠ ∅ and CE¹(Γ) = G(Γ) to be able to obtain noise-robust noncontextuality inequalities that use graph invariants from the CSW framework [10]. Now, in the CSW framework [10], every Bell-KS expression picks out a particular subgraph G of the orthogonality graph O(Γ) of the contextuality scenario Γ of interest. The vertices of G denote the measurement events of interest in a given Bell-KS expression and we have the following:
• A general probabilistic model p ∈ G(Γ) will assign probabilities to vertices in G such that: p(v) ≥ 0 for all v ∈ V(G) and p(v) + p(v′) ≤ 1 for every edge {v, v′} ∈ E(G).
• A probabilistic model p ∈ CE¹(Γ) will assign probabilities to vertices in G such that: p(v) ≥ 0 for all v ∈ V(G) and $\sum_{v \in c} p(v) \leq 1$ for every clique c ⊆ V(G).
• A probabilistic model p ∈ C(Γ) will assign probabilities to vertices in G such that: $p(v) = \sum_k \Pr(k)\, p_k(v)$, where Pr(k) ≥ 0, $\sum_k \Pr(k) = 1$, and for each k, $p_k$ is a set of deterministic assignments $p_k(v) \in \{0, 1\}$ to the vertices of G satisfying $p_k(v) + p_k(v') \leq 1$ for every edge {v, v′} ∈ E(G).
Since Γ is such that CE¹(Γ) = G(Γ), the condition $\sum_{v \in c} p(v) \leq 1$ for every clique c ⊆ V(G) on the probabilities assigned to vertices in G is redundant. We now define a simplified hypergraph, $\Gamma_G$, obtained from G as follows: convert all maximal cliques in G to hyperedges and add an extra (no-detection) vertex 21 to each such hyperedge.
20 As we have shown, when the operational theory T under consideration satisfies structural Specker's principle, we can always turn a hypergraph Γ that doesn't satisfy structural Specker's principle into a hypergraph Γ′ that satisfies it and for which, therefore, CE¹(Γ′) = G(Γ′) holds. This can be seen as justification for restricting oneself to probabilistic models satisfying consistent exclusivity in the CSW framework [10]: such a restriction is not really a restriction if the theory satisfies structural Specker's principle. On the other hand, we restrict ourselves to hypergraphs for which CE¹(Γ) = G(Γ) without assuming that T satisfies structural Specker's principle. The justification for this seemingly ad hoc restriction is simply that it is necessary in order to meaningfully leverage the graph invariants of CSW [10], in particular the fractional packing number, in our noise-robust noncontextuality inequalities. This will become clear when we obtain our noise-robust noncontextuality inequalities.
This $\Gamma_G$, for any G, will satisfy the property that $\mathrm{CE}^1(\Gamma_G) = G(\Gamma_G)$ and any probabilistic model on Γ assigning probabilities to measurement events in G will correspond to a probabilistic model on $\Gamma_G$ which also assigns the same probabilities to measurement events in G. Formally: $V(\Gamma_G) = V(G) \cup \{v_c\}_c$ and $E(\Gamma_G) = \{c \cup \{v_c\}\}_c$, with c ranging over the maximal cliques in G, where $v_c$ is the extra no-detection vertex added to the hyperedge corresponding to maximal clique c in G.
We have the following probabilistic model on $\Gamma_G$, given a probabilistic model p ∈ G(Γ): the probabilities assigned to the vertices in $V(G) \subseteq V(\Gamma_G)$ are the same as specified by p ∈ G(Γ) and the probabilities assigned to the remaining vertices in $V(\Gamma_G) \setminus V(G)$ are given by $p(v_c) = 1 - \sum_{v \in c} p(v)$, for every maximal clique c in G. Consider, for example, the KCBS scenario [5,10,38]: the 20-vertex Γ representing measurement events from five 4-outcome joint measurements (Fig. 2), its 5-vertex subgraph G involved in the KCBS inequality (Fig. 3), and the 10-vertex hypergraph $\Gamma_G$ constructed from G (Fig. 5).
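For the KCBS case, the passage from G to the simplified hypergraph can be sketched as follows. The function names, the tuple label `("nd", ...)` used for the no-detection vertices, and the illustrative assignment of 2/5 to every vertex of G are assumptions made for this example; the construction itself (one hyperedge per maximal clique, plus one extra vertex) follows the description above.

```python
from fractions import Fraction
from itertools import combinations

def maximal_cliques(vertices, edges):
    """Maximal cliques of the graph G, by brute force (small graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    found = [frozenset(c)
             for r in range(1, len(vertices) + 1)
             for c in combinations(sorted(vertices), r)
             if all(frozenset(pair) in edge_set for pair in combinations(c, 2))]
    return [c for c in found if not any(c < d for d in found)]

def build_gamma_G(vertices, edges):
    """Map each maximal clique c of G to the hyperedge c ∪ {v_c}."""
    return {c: c | {("nd", tuple(sorted(c)))}
            for c in maximal_cliques(vertices, edges)}

def extend_model(p, gamma_G):
    """Fix the no-detection probabilities by normalization on each hyperedge."""
    q = dict(p)
    for c, e in gamma_G.items():
        (v_c,) = e - c                      # the single added vertex
        q[v_c] = 1 - sum(p[v] for v in c)
    return q

# KCBS: G is the 5-cycle on vertices 1..5.
V = [1, 2, 3, 4, 5]
E = [(i, i % 5 + 1) for i in V]
gamma_G = build_gamma_G(V, E)
all_vertices = set().union(*gamma_G.values())

p = {v: Fraction(2, 5) for v in V}          # an illustrative assignment on G
p_G = extend_model(p, gamma_G)
```

The maximal cliques of the 5-cycle are its five edges, so the result has five 3-vertex hyperedges and ten vertices in total, matching the count quoted for the KCBS scenario.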
Given $\Gamma_G$, constructed from G, we can now define a hypergraph $\Sigma_G$ of source events as follows: for every hyperedge $e \in E(\Gamma_G)$, corresponding to the choice of measurement setting $M_e$, we define a hyperedge $e \in E(\Sigma_G)$ denoting a corresponding choice of source setting $S_e$. And for every vertex $v \in e\ (\in E(\Gamma_G))$, we define a vertex $v_e \in e\ (\in E(\Sigma_G))$. Hence, every measurement event [v|e] in $\Gamma_G$ corresponds to a vertex $v_e$ of $\Sigma_G$, and the number of such vertices in $V(\Sigma_G)$ is $|V(\Gamma_G)||E(\Gamma_G)|$. This means that the operational equivalences between the measurement events that are implicit in $\Gamma_G$ (such as [v|e] being operationally equivalent to [v|e′], where $e, e' \in E(\Gamma_G)$ are distinct hyperedges that share the vertex $v \in V(\Gamma_G)$, which represents an equivalence class of measurement events) are not carried over to the source events, where none is presumed to be operationally equivalent to any other. Here $v_e$ ($v_{e'}$) represents a source event $[s_e|S_e]$ ($[s_{e'}|S_{e'}]$), rather than an equivalence class of source events.
Besides these |V (Γ G )||E(Γ G )| vertices in V (Σ G ) and the associated hyperedges e ∈ E(Σ G ), we have an additional hyperedge e * ∈ E(Σ G ), representing a source setting S e * , containing two new vertices v 0 e * , v 1 e * ∈ V (Σ G ). Here An example of such a source events hypergraph was considered in Ref. [1], albeit without the additional source labelled by e * here [5]. We illustrate it here in Fig. 6 for the KCBS scenario.

IV. A KEY HYPERGRAPH INVARIANT: THE WEIGHTED MAX-PREDICTABILITY
Without loss of generality, we assume that the extremal probabilistic models on $\Gamma_G$ are in bijective correspondence with the ontic states of the physical system on which the measurements are carried out. Thus, the measurement noncontextual assignment of probabilities given by the response functions $\{\xi(v|\lambda)\}_{v \in V(\Gamma_G)}$ is convexly extremal for all λ ∈ Λ. The set of extremal probabilistic models on $\Gamma_G$ can then be denoted by the set of ontic states $\Lambda \equiv \Lambda_{\rm det} \cup \Lambda_{\rm ind}$, where $\Lambda_{\rm det}$ corresponds to the set of extremal probabilistic models that assign {0, 1}-valued probabilities to all the measurement events of $\Gamma_G$, and $\Lambda_{\rm ind}$ corresponds to the set of extremal probabilistic models that assign (0, 1)-valued probabilities to some (non-empty subset) of the measurement events of $\Gamma_G$.
We can now define a hypergraph invariant that will be relevant for the operational noncontextuality inequalities we derive:
$$\beta(\Gamma_G, q) \equiv \max_{\lambda \in \Lambda_{\rm ind}} \sum_{e \in E(\Gamma_G)} q_e\, \zeta(M_e, \lambda),$$
where $q_e \geq 0$ for all $e \in E(\Gamma_G)$ and $\sum_{e \in E(\Gamma_G)} q_e = 1$, and
$$\zeta(M_e, \lambda) \equiv \max_{v \in e} \xi(v|\lambda)$$
is the maximum probability of occurrence for any outcome of the measurement $M_e$ corresponding to the hyperedge $e \in E(\Gamma_G)$. We call $\beta(\Gamma_G, q)$ the weighted max-predictability of the measurement settings (i.e., hyperedges) in $\Gamma_G$, where the hyperedges $e \in E(\Gamma_G)$ are weighted by the probabilities given by the probability distribution $q \equiv \{q_e\}_{e \in E(\Gamma_G)}$.
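As a rough illustration of how the weighted max-predictability could be evaluated, the sketch below assumes that β(Γ_G, q) is the maximum, over indeterministic extremal models λ, of Σ_e q_e max_{v∈e} ξ(v|λ), which is the reading suggested by the description of ζ(M_e, λ) as the largest outcome probability of M_e. The list of indeterministic extremal models is taken as an input rather than computed; for the KCBS hypergraph the single model used here (1/2 on every vertex of G, 0 on the no-detection vertices) is the one identified later in the text. The function names and vertex labels are hypothetical.

```python
from fractions import Fraction

def zeta(hyperedge, xi):
    """Largest probability that xi assigns to any outcome in the hyperedge."""
    return max(xi[v] for v in hyperedge)

def max_predictability(hyperedges, q, indeterministic_models):
    """Weighted max-predictability over a supplied list of models in Λ_ind."""
    return max(sum(q[e] * zeta(e, xi) for e in hyperedges)
               for xi in indeterministic_models)

# KCBS hyperedges: {v_i, v_{i+1}, no-detection vertex}, i = 1..5 (cyclic).
hyperedges = [frozenset({i, i % 5 + 1, ("nd", i)}) for i in range(1, 6)]
q_uniform = {e: Fraction(1, 5) for e in hyperedges}

# The indeterministic extremal model: 1/2 on V(G), 0 on no-detection vertices.
xi_half = {v: (Fraction(1, 2) if isinstance(v, int) else Fraction(0))
           for e in hyperedges for v in e}

beta = max_predictability(hyperedges, q_uniform, [xi_half])
```

Because ζ equals 1/2 on every hyperedge for this model, the result does not depend on the choice of q in the KCBS case. For other hypergraphs one would first have to enumerate the extremal points of G(Γ_G), which this sketch does not attempt.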

V. NOISE-ROBUST NONCONTEXTUALITY INEQUALITIES
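Consider the positive linear combination of the probabilities of measurement events,
$$R([s|S]) \equiv \sum_{v \in V(G)} w_v\, p(v|S, s),$$
where $w_v > 0$ for all v ∈ V(G).
The fundamental result of CSW is that this quantity is bounded for different sets of correlations (KS-noncontextual, those realizable by projective quantum measurements, and those satisfying consistent exclusivity) by graph-theoretic invariants as follows:
$$R \overset{\rm KS}{\leq} \alpha(G, w) \overset{\rm Q}{\leq} \theta(G, w) \overset{\rm CE^1}{\leq} \alpha^*(G, w),$$
where KS denotes the set of probabilistic models $C(\Gamma_G)$, Q denotes the set of probabilistic models on $\Gamma_G$ achievable by projective quantum measurements, denoted $Q(\Gamma_G)$, and CE¹ denotes the set $\mathrm{CE}^1(\Gamma_G)$. The graph invariants of the weighted graph (G, w), namely, α(G, w), θ(G, w), and α*(G, w) are defined as follows:
1. Independence number α(G, w):
$$\alpha(G, w) \equiv \max_{I} \sum_{v \in I} w_v,$$
where I ⊆ V(G) is an independent set of vertices of G, i.e., a set of nonadjacent vertices of G, so that none of the vertices in this set shares an edge with any other vertex in the set.
2. Lovász theta number θ(G, w):
$$\theta(G, w) \equiv \max_{\{|u_v\rangle\}, |\psi\rangle} \sum_{v \in V(G)} w_v\, |\langle \psi | u_v \rangle|^2,$$
where $\{|u_v\rangle\}_{v \in V(G)}$, a set of unit vectors in $\mathbb{R}^d$ with $\langle u_v | u_{v'} \rangle = 0$ for every edge {v, v′} ∈ E(G), is called an orthonormal representation (OR) of the complement of G, namely, $\bar{G}$, and the unit vector $|\psi\rangle \in \mathbb{R}^d$ is called a handle.
3. Fractional packing number α*(G, w):
$$\alpha^*(G, w) \equiv \max_{\{p_v\}} \sum_{v \in V(G)} w_v\, p_v,$$
where $p_v \geq 0$ for all v ∈ V(G) and $\sum_{v \in c} p_v \leq 1$ for all cliques c in G.
Note that since we are always considering $\Gamma_G$ such that $\mathrm{CE}^1(\Gamma_G) = G(\Gamma_G)$, we, in fact, have the bounds
$$R \overset{\rm KS}{\leq} \alpha(G, w) \overset{\rm Q}{\leq} \theta(G, w) \overset{\rm GPT}{\leq} \alpha^*(G, w),$$
where GPT denotes the full set of probabilistic models on $\Gamma_G$, i.e., $G(\Gamma_G)$.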
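Two of these invariants can be checked by hand for small graphs. The sketch below computes the independence number by brute force and certifies the fractional packing number through linear-programming duality: a feasible packing gives a lower bound, a feasible fractional clique cover gives an upper bound, and when the two values coincide they equal α*(G, w). The function names and the choice of the unweighted 5-cycle as the example are illustrative; the Lovász theta number is not computed here since it requires a semidefinite program.

```python
from fractions import Fraction
from itertools import combinations

def independence_number(vertices, edges, w):
    """Maximum total weight of a set of pairwise nonadjacent vertices."""
    edge_set = {frozenset(e) for e in edges}
    best = 0
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            if not any(frozenset(pair) in edge_set
                       for pair in combinations(subset, 2)):
                best = max(best, sum(w[v] for v in subset))
    return best

def packing_value(p, cliques, w):
    """Value of a feasible fractional packing: a lower bound on α*(G, w)."""
    assert all(x >= 0 for x in p.values())
    assert all(sum(p[v] for v in c) <= 1 for c in cliques)
    return sum(w[v] * p[v] for v in p)

def covering_value(y, vertices, w):
    """Value of a feasible fractional clique cover: an upper bound on α*(G, w)."""
    assert all(x >= 0 for x in y.values())
    assert all(sum(y[c] for c in y if v in c) >= w[v] for v in vertices)
    return sum(y.values())

# Unweighted 5-cycle; its maximal cliques are its edges.
V = [1, 2, 3, 4, 5]
E = [(i, i % 5 + 1) for i in V]
w = {v: 1 for v in V}
cliques = [frozenset(e) for e in E]

alpha = independence_number(V, E, w)
lower = packing_value({v: Fraction(1, 2) for v in V}, cliques, w)
upper = covering_value({c: Fraction(1, 2) for c in cliques}, V, w)
```

For the 5-cycle the packing and covering values agree, which pins down the fractional packing number without calling an LP solver. Only the maximal cliques are passed in, since the constraints on smaller cliques follow from them when all $p_v$ are nonnegative.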
In terms of the notation we have already introduced, where R([s|S]) ≤ R KS was a Bell-KS inequality, we now have -from CSW [10] -that R KS = α(G, w).
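We need to define a new quantity not in the CSW framework, namely,
$$\mathrm{Corr} \equiv \sum_{e \in E(\Gamma_G)} q_e \sum_{m_e, s_e} \delta_{m_e, s_e}\, p(m_e, s_e | M_e, S_e), \quad (19)$$
where $\{q_e\}_{e \in E(\Gamma_G)}$ is a probability distribution, i.e., $q_e \geq 0$ for all $e \in E(\Gamma_G)$ and $\sum_{e \in E(\Gamma_G)} q_e = 1$, such that $\beta(\Gamma_G, q) < 1$ holds. 22 In previous work [1,5], we have taken q to be the uniform distribution $q_e = 1/|E(\Gamma_G)|$, but the derivation of the noncontextuality inequalities is independent of that choice (as we'll see here). Also, note that we have chosen a labelling convention for outcomes of source setting $S_e$ (namely, $s_e$) and measurement setting $M_e$ (namely, $m_e$) such that the source-measurement correlation of interest is the one picked out by $\delta_{m_e, s_e}$.
In the ontological model,
$$R([s_{e_*} = 0 | S_{e_*}]) = \sum_{\lambda \in \Lambda} \sum_{v \in V(G)} w_v\, \xi(v|\lambda)\, \mu(\lambda | S_{e_*}, s_{e_*} = 0).$$
Defining $R(\lambda) \equiv \sum_{v \in V(G)} w_v\, \xi(v|\lambda)$, we have $R([s_{e_*} = 0 | S_{e_*}]) = \sum_{\lambda \in \Lambda} R(\lambda)\, \mu(\lambda | S_{e_*}, s_{e_*} = 0)$. Similarly,
$$\mathrm{Corr} = \sum_{\lambda \in \Lambda} \sum_{e \in E(\Gamma_G)} q_e \sum_{m_e, s_e} \delta_{m_e, s_e}\, \xi(m_e | M_e, \lambda)\, \mu(s_e | S_e, \lambda)\, \nu(\lambda) \equiv \sum_{\lambda \in \Lambda} \mathrm{Corr}(\lambda)\, \nu(\lambda),$$
where we have used preparation noncontextuality: $\mu(\lambda | S_e) = \nu(\lambda)$ for all $e \in E(\Sigma_G)$. Using the fact that
$$\nu(\lambda) = \mu(\lambda | S_{e_*}) = p_0\, \mu(\lambda | S_{e_*}, s_{e_*} = 0) + (1 - p_0)\, \mu(\lambda | S_{e_*}, s_{e_*} = 1),$$
where $p_0 \equiv p(s_{e_*} = 0 | S_{e_*})$, we have $\mu(\lambda | S_{e_*}, s_{e_*} = 0) \leq \nu(\lambda)/p_0$. Note that, for any λ ∈ Λ, Corr(λ) is upper bounded as follows:
$$\mathrm{Corr}(\lambda) \leq \sum_{e \in E(\Gamma_G)} q_e\, \zeta(M_e, \lambda),$$
where $\zeta(M_e, \lambda) \equiv \max_{m_e} \xi(m_e | M_e, \lambda)$. If $\lambda \in \Lambda_{\rm det}$, then this upper bound is trivial, i.e., Corr(λ) ≤ 1. On the other hand, for all $\lambda \in \Lambda_{\rm ind}$, we have $\mathrm{Corr}(\lambda) \leq \beta(\Gamma_G, q) < 1$. Similarly, for $\lambda \in \Lambda_{\rm det}$ we have R(λ) ≤ α(G, w), while for $\lambda \in \Lambda_{\rm ind}$ we have R(λ) ≤ α*(G, w).
Thus, our noise-robust noncontextuality inequality now reads:
$$R([s_{e_*} = 0 | S_{e_*}]) \leq \alpha(G, w) + \frac{\alpha^*(G, w) - \alpha(G, w)}{p_0}\, \frac{1 - \mathrm{Corr}}{1 - \beta(\Gamma_G, q)}, \quad (32)$$
which can be rewritten as
$$\mathrm{Corr} \leq 1 - p_0\, (1 - \beta(\Gamma_G, q))\, \frac{R([s_{e_*} = 0 | S_{e_*}]) - \alpha(G, w)}{\alpha^*(G, w) - \alpha(G, w)}. \quad (33)$$
For a nontrivial upper bound, and hence the possibility of witnessing contextuality via this inequality, the upper bound on Corr should be strictly bounded above by 1, and the upper bound on R should be strictly bounded above by α*(G, w) (the algebraic upper bound on R), that is, $R > \alpha(G, w)$ and $\mathrm{Corr} > 1 - p_0 (1 - \beta(\Gamma_G, q))$. These are the minimal benchmarks necessary (besides the requirement of tomographic completeness of a finite set of procedures and the possibility of inferring secondary procedures with exact operational equivalences using convexity of the operational theory [2]) to witness contextuality in a Kochen-Specker type experiment adapted to our framework following Spekkens [7]. Suppose one achieves, by some means, a value of R = θ(G, w). When would this value be evidence of contextuality? For this to be the case, we must have:
$$\mathrm{Corr} > 1 - p_0\, (1 - \beta(\Gamma_G, q))\, \frac{\theta(G, w) - \alpha(G, w)}{\alpha^*(G, w) - \alpha(G, w)}.$$
Now, for the ideal quantum realization where measurement events are projectors, and the corresponding source events are eigenstates, it is always the case that Corr = 1, hence contextuality is witnessed. However, it's possible to witness contextuality even if Corr < 1, as long as it exceeds the lower bound we specified above. In a sense, for quantum theory, this allows for a quantitative account of the effect of nonprojectiveness in the measurements (or mixedness in preparations) on the possibility of witnessing contextuality, a feature that is absent in traditional Kochen-Specker approaches [9][10][11]13]. Indeed, as long as one achieves any value of R > α(G, w), it is possible to witness contextuality for a sufficiently high value of Corr (see Eq. (32)).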
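The trade-off between R and Corr can be explored numerically. The sketch below assumes the inequality takes the form R ≤ α + ((α* − α)/p_0)(1 − Corr)/(1 − β), which is what follows from combining the bounds R(λ) ≤ α on Λ_det, R(λ) ≤ α* on Λ_ind, Corr(λ) ≤ 1 on Λ_det, and Corr(λ) ≤ β on Λ_ind with preparation noncontextuality; the displayed equations are damaged in the source, so this form is an assumption of the example. The function names are hypothetical, and the sample values are those quoted for the KCBS scenario.

```python
from fractions import Fraction

def r_upper_bound(corr, p0, beta, alpha, alpha_star):
    """Noncontextual upper bound on R, capped at the algebraic bound α*."""
    bound = alpha + (alpha_star - alpha) / p0 * (1 - corr) / (1 - beta)
    return min(bound, alpha_star)

def corr_threshold(r, p0, beta, alpha, alpha_star):
    """Value of Corr that must be exceeded for R = r to witness contextuality."""
    return 1 - p0 * (1 - beta) * (r - alpha) / (alpha_star - alpha)

# KCBS values: α = 2, α* = 5/2, β = 1/2, and p0 = 1/3 in the ideal construction.
alpha, alpha_star = 2, Fraction(5, 2)
beta, p0 = Fraction(1, 2), Fraction(1, 3)

ideal_bound = r_upper_bound(Fraction(1), p0, beta, alpha, alpha_star)
noisy_bound = r_upper_bound(Fraction(9, 10), p0, beta, alpha, alpha_star)
needed_corr = corr_threshold(5 ** 0.5, float(p0), float(beta), 2.0, 2.5)
```

With perfect source-measurement correlation the bound reduces to the KCBS value α = 2, and it loosens linearly as Corr decreases until it reaches the algebraic maximum α*. The threshold function answers the converse question: for an observed R, how close to 1 must Corr be before the data are incompatible with a noncontextual model.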

A. Example: KCBS scenario
We will now illustrate our hypergraph framework by applying it to the KCBS scenario to make differences with respect to the CSW graph-theoretic framework [10] explicit.
The graph $G$ for the KCBS scenario is given in Fig. 3, the measurement events hypergraph $\Gamma_G$ is given in Fig. 5, and the source events hypergraph $\Sigma_G$ is given in Fig. 6. We then have

$$R([s_{e_*} = 0 | S_{e_*}]) = \sum_{v \in V(G)} w_v\, p(v | S_{e_*}, s_{e_*} = 0),$$

where the (vertex) weights $w_v = 1$ for all $v \in V(G)$, i.e., it's an unweighted graph, and we will use $\alpha(G)$ and $\alpha^*(G)$ to denote its independence number and fractional packing number, respectively. These are given by $\alpha(G) = 2$ and $\alpha^*(G) = 5/2$.
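The two graph invariants quoted above can be checked directly for the 5-cycle. The sketch below is an illustration under stated assumptions: it finds the independence number by brute force, and it certifies the fractional packing number by exhibiting a feasible primal assignment (weight 1/2 on every vertex) together with a feasible dual assignment (weight 1/2 on every edge) of equal value, relying on weak duality of linear programs. The variable names are ours, and only edge cliques are used because every clique of the 5-cycle is contained in an edge.

```python
from fractions import Fraction
from itertools import combinations

n = 5
vertices = list(range(n))
edges = [(i, (i + 1) % n) for i in vertices]  # the 5-cycle


def is_independent(subset):
    """True if no two vertices of the subset are adjacent."""
    chosen = set(subset)
    return not any(u in chosen and v in chosen for u, v in edges)


# Independence number by exhaustive search over vertex subsets.
alpha = max(
    len(s)
    for k in range(n + 1)
    for s in combinations(vertices, k)
    if is_independent(s)
)

# Primal: vertex weights p_v >= 0 with p_u + p_v <= 1 on every edge.
p = {v: Fraction(1, 2) for v in vertices}
primal_feasible = all(p[u] + p[v] <= 1 for u, v in edges)
primal_value = sum(p.values())

# Dual: edge weights y_e >= 0 covering every vertex with total weight >= 1.
y = {e: Fraction(1, 2) for e in edges}
dual_feasible = all(sum(y[e] for e in edges if v in e) >= 1 for v in vertices)
dual_value = sum(y.values())

# Equal primal and dual values certify optimality of both.
alpha_star = primal_value if primal_feasible and dual_feasible and primal_value == dual_value else None
```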
The source-measurement correlation term is given by

$$\mathrm{Corr} \equiv \sum_{e \in E(\Gamma_G)} q_e \sum_{m_e, s_e} \delta_{m_e, s_e}\, p(m_e, s_e | M_e, S_e) \tag{38}$$

for any choice of probability distribution $q \equiv \{q_e\}_{e \in E(\Gamma_G)}$. Note that the only extremal probabilistic model on $\Gamma_G$ corresponding to an indeterministic assignment (in $\Lambda_{\rm ind}$) assigns $\xi(v|\lambda) = \frac{1}{2}$ for all $v \in V(G)$. This means

$$\beta(\Gamma_G, q) = \frac{1}{2}.$$

The noncontextuality inequality of Eq. (33) then becomes (in the KCBS scenario)

$$R \leq 2 + \frac{1 - \mathrm{Corr}}{p_0},$$

or

$$\mathrm{Corr} \leq 1 - p_0\,(R - 2).$$

Recall that the KCBS inequality [10,38] reads $R \leq 2$ and it would be a valid noncontextuality inequality in our framework if and only if one can find measurements and preparations such that Corr = 1. In the standard KCBS construction [38] that violates the inequality $R \leq 2$, we have the five vertices in $G$ (say $v_i$, $i \in \{1, 2, 3, 4, 5\}$, labelled cyclically) associated with five projectors $\Pi_i = |l_i\rangle\langle l_i|$, $i \in \{1, 2, 3, 4, 5\}$, on a qutrit Hilbert space, given by the vectors $|l_i\rangle = (\sin\theta \cos\phi_i, \sin\theta \sin\phi_i, \cos\theta)$, $\phi_i = \frac{4\pi i}{5}$, and $\cos\theta = \frac{1}{\sqrt[4]{5}}$. The special source event $[s_{e_*} = 0 | S_{e_*}]$ is associated with the quantum state $|\psi\rangle = (0, 0, 1)$, so that

$$R([s_{e_*} = 0 | S_{e_*}]) = \sum_{i=1}^{5} |\langle l_i | \psi \rangle|^2 = 5 \cos^2\theta = \sqrt{5}.$$

See Fig. 7 for a depiction of the geometric configuration of these vectors.
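The geometry of the standard KCBS construction can be verified numerically. The sketch below assumes the vectors are |l_i⟩ = (sin θ cos φ_i, sin θ sin φ_i, cos θ) with φ_i = 4πi/5 and cos θ = 5^(-1/4), and the state is |ψ⟩ = (0, 0, 1); the helper names are ours. It checks that cyclically adjacent vectors are orthogonal and evaluates R as the sum of the five squared overlaps with |ψ⟩.

```python
import math

cos_theta = 5 ** -0.25
sin_theta = math.sqrt(1 - cos_theta ** 2)


def kcbs_vector(i):
    """Unit vector |l_i> with azimuthal angle phi_i = 4*pi*i/5."""
    phi = 4 * math.pi * i / 5
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def dot(u, v):
    """Real inner product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


vectors = [kcbs_vector(i) for i in range(1, 6)]
psi = (0.0, 0.0, 1.0)

# Cyclically adjacent vectors (i, i+1) should be orthogonal.
adjacent_overlaps = [dot(vectors[i], vectors[(i + 1) % 5]) for i in range(5)]

# R is the sum of the Born-rule probabilities |<l_i|psi>|^2.
R = sum(dot(psi, v) ** 2 for v in vectors)
```

The value of R obtained this way agrees with √5 ≈ 2.236 to numerical precision, which exceeds the KCBS bound of 2.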
To turn this KCBS construction into an argument against noncontextuality in our approach, we need additional ingredients beyond the graph $G$. Firstly, for both the measurement events hypergraph $\Gamma_G$ and the source events hypergraph $\Sigma_G$, we denote the hyperedges by $e_i$, $i \in \{1, 2, 3, 4, 5\}$. In $\Gamma_G$, the measurement events for the setting $M_{e_i}$ are given by the projectors $\{\Pi_i, \Pi_{i+1}, I - \Pi_i - \Pi_{i+1}\}$, and the source events for the setting $S_{e_i}$ in $\Sigma_G$ are the corresponding normalized states, each prepared with probability $\frac{1}{3}$. We thus have the operational equivalences we need between the source settings: each source setting averages to the maximally mixed state $\frac{I}{3}$. This choice of representation for $\Gamma_G$ and $\Sigma_G$ yields $p_0 = \frac{1}{3}$, Corr = 1, and $R([s_{e_*} = 0 | S_{e_*}]) = \sqrt{5}$, so that the inequality is violated. However, note that this is an idealization (under which Corr = 1). Typically, the source events and measurement events will not be perfectly correlated (Corr < 1), and the operational equivalences between the source settings need not correspond to the maximally mixed state. All that is required for a test of noncontextuality using this inequality is that the operational equivalences hold for some choice of preparations and measurements, which need not be the same as those in the ideal KCBS construction.
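As a worked illustration of the noise tolerance in this example, suppose (as an assumption made only for this estimate) that a noisy experiment still achieves the ideal values $R = \sqrt{5}$ and $p_0 = 1/3$, and that the KCBS form of the inequality is $\mathrm{Corr} \leq 1 - p_0 (R - 2)$. Contextuality is then witnessed only if Corr exceeds the following threshold:

```latex
\begin{align}
\mathrm{Corr} &> 1 - p_0\,(R - 2) \\
              &= 1 - \tfrac{1}{3}\left(\sqrt{5} - 2\right) \\
              &\approx 1 - \tfrac{1}{3}\,(0.2361) \\
              &\approx 0.9213 .
\end{align}
```

In other words, under these assumptions the source-measurement correlations may fall short of perfect predictability by at most about 8% before the violation disappears.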

A. Measurement-measurement correlations vs. source-measurement correlations
Note that the usual Kochen-Specker experiment, as conceptualized in Refs. [9][10][11][13], for example, involves only the quantity R([s|S]), representing correlations between various measurement events when all the measurements are implemented on a system prepared according to the same preparation procedure, denoted by the source event [s|S]. Thus, R represents measurement-measurement correlations on a system prepared according to a fixed choice of preparation procedure.
On the other hand, the experiment we have conceptualized in this paper involves, besides the quantity R, a quantity Corr representing source-measurement correlations, characterizing the quality of the measurements in terms of their response to corresponding preparations.
Our noncontextuality inequalities represent a trade-off relation that must hold between R and Corr in an operational theory that admits a noncontextual ontological model. Here we note that the first example of such a tradeoff relation, albeit only for the case of operational quantum theory with unsharp measurements, appeared in Ref. [14] as the Liang-Spekkens-Wiseman (LSW) inequality [15] which has been shown to be experimentally violated in Ref. [45]. 23 And, indeed, the developments reported in Ref. [5] and the present paper have their origins in the idea of such a trade-off relation that first appeared in Ref. [14].
B. Can our noise-robust noncontextuality inequalities be saturated by a noncontextual ontological model?
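A natural question concerns the tightness of these noncontextuality inequalities, i.e., can they be saturated by a noncontextual ontological model? This requires one to specify a noncontextual ontological model for which

$$\mathrm{Corr} = 1 - p_0\,\big(1 - \beta(\Gamma_G, q)\big)\, \frac{R - \alpha(G, w)}{\alpha^*(G, w) - \alpha(G, w)},$$

or, equivalently,

$$R = \alpha(G, w) + \frac{\alpha^*(G, w) - \alpha(G, w)}{p_0}\, \frac{1 - \mathrm{Corr}}{1 - \beta(\Gamma_G, q)}.$$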
23 This experiment, however, is not in a position to make claims about contextuality without presuming the operational theory is quantum theory simply because the LSW inequality presumes operational quantum theory. The noncontextuality inequalities in this paper do not require the operational theory to be quantum theory and can therefore be experimentally tested using techniques from Refs. [2,34,46].
The assumption of measurement noncontextuality is already implicit in our characterization of the response functions ξ(m e |M e , λ), and for this reason it is, indeed, trivial to satisfy measurement noncontextuality while saturating these noncontextuality inequalities. Measurement noncontextuality, alone, in fact even allows R = α * (G, w). On the other hand, as in traditional Bell-KS type treatments, if outcome determinism is presumed, then we know that there exists a necessary and sufficient set of Bell-KS inequalities (each corresponding to a particular choice of R([s|S])) that are saturated by a KS-noncontextual ontological model: this just corresponds to the case R([s|S]) = α(G, w) for any such Bell-KS inequality. Indeed, our noise-robust noncontextuality inequalities corresponding to these choices of R([s|S]) can always be saturated when Corr = 1, because in that case outcome determinism is justified by preparation noncontextuality (cf. Ref. [5]) and the inequalities are identical to the Bell-KS inequalities.
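To show that the noncontextuality inequalities can be saturated, we construct a noncontextual ontological model with the following constraints:

1. For any $\lambda \in \Lambda_{\rm det}$: for all $s_e$, $e \in E(\Sigma_G) \backslash \{e_*\}$, $\mu(s_e | S_e, \lambda) = \delta_{s_e, x_e}$, where $x_e$ is such that $\xi(m_e | M_e, \lambda) = \delta_{m_e, x_e}$. This choice of $\mu(s_e | S_e, \lambda)$ for $\lambda \in \Lambda_{\rm det}$ ensures that $\mathrm{Corr}(\lambda) = 1$.

2. For any $\lambda \in \Lambda_{\rm ind}$: for all $s_e$, $e \in E(\Sigma_G) \backslash \{e_*\}$, $\mu(s_e | S_e, \lambda) = \delta_{s_e, y_e}$, where $y_e$ is such that $\xi(m_e = y_e | M_e, \lambda) = \max_{m_e} \xi(m_e | M_e, \lambda)$. This choice of $\mu(s_e | S_e, \lambda)$ for $\lambda \in \Lambda_{\rm ind}$ ensures that $\mathrm{Corr}(\lambda) = \sum_{e \in E(\Gamma_G)} q_e\, \zeta(M_e, \lambda)$.

3. Let us define $\Lambda_{\rm Corr\,max} \subseteq \Lambda_{\rm ind}$ and $\Lambda_{R\,\rm max} \subseteq \Lambda_{\rm ind}$ such that $\mathrm{Corr}(\lambda) = \beta(\Gamma_G, q)$ for all $\lambda \in \Lambda_{\rm Corr\,max}$ and $R(\lambda) = \alpha^*(G, w)$ for all $\lambda \in \Lambda_{R\,\rm max}$. For our construction to work, we require that the polytope of probabilistic models on $\Gamma_G$ be such that $\Lambda_{\rm Corr\,max} \cap \Lambda_{R\,\rm max} \neq \emptyset$, i.e., there exist indeterministic extremal probabilistic models corresponding to $\lambda \in \Lambda_{\rm Corr\,max} \cap \Lambda_{R\,\rm max}$ that maximize both $R(\lambda)$ and $\mathrm{Corr}(\lambda)$.

4. The distribution $\mu(\lambda | S_{e_*}, s_{e_*} = 0)$ is such that [...]

5. [...]

6. Note that the assumption of preparation noncontextuality is implicit in the fact that $\nu(\lambda) = \mu(\lambda | S_e)$ for all $e \in E(\Sigma_G)$, and none of the constraints above tamper with that assumption. Measurement noncontextuality is implicit in the structure of the polytope of probabilistic models on $\Gamma_G$, and the vertices of this polytope correspond to the ontic states $\Lambda$.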
With these constraints on the noncontextual ontological model in hand, constraints 1, 2, 3, and 4, followed by constraint 5, fix the value of Corr (Eq. (64)), and constraints 1, 2, 3, and 4 likewise fix the value of R. Plugging these expressions for Corr and R into the left-hand side of the condition for saturation, Eq. (49), of the noncontextuality inequality, one finds that the condition holds, so that our noncontextual ontological model, subject to the specified constraints, saturates the noncontextuality inequality.
The crucial condition we need for this noncontextual ontological model to work is that there exists at least one extremal probabilistic model on Γ G (corresponding to a λ ∈ Λ ind ) such that R(λ) = α * (G, w) and Corr(λ) = β(Γ G , q). For all such Γ G , we have shown that our noncontextuality inequalities will be saturated.
This leaves us with some open questions: Does this crucial condition hold for all Γ G of interest? If it doesn't hold for some Γ G , is it still possible: 1) to obtain a different noncontextual ontological model that saturates Eq. (32), or 2) to derive a (tight) noncontextuality inequality that is possibly different from Eq. (32) but is saturated by a noncontextual ontological model?
Note that for the case of Γ G that admit indeterministic extremal probabilistic models with probability assignments only in {0, 1/2}, e.g., the n-cycle scenarios discussed in Ref. [5], this condition always holds.
C. Can trivial POVMs ever violate these noncontextuality inequalities? No.
Recall that a trivial POVM is defined as an assignment of positive operators $p(v) I$ to the vertices of $\Gamma_G$, where $I$ is the identity operator on some Hilbert space and $p : V(\Gamma_G) \to [0, 1]$, such that $\sum_{v \in e} p(v) = 1$ for all $e \in E(\Gamma_G)$, is a probabilistic model on $\Gamma_G$. Consider trivial POVMs corresponding to any KS-noncontextual probabilistic model (that is, a convex mixture of deterministic vertices, $\Lambda_{\rm det}$). In this case Corr is at most 1. This means that the upper bound on $R$ from our noncontextuality inequality, Eq. (33), will be greater than or equal to $\alpha(G, w)$, whereas we know that for a KS-noncontextual probabilistic model, $R \leq \alpha(G, w)$. Hence, there is no violation of our noncontextuality inequality for such trivial POVMs.
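Now consider trivial POVMs that correspond to the indeterministic vertices, $\Lambda_{\rm ind}$, or their convex mixtures. We know that for these trivial POVMs, $\mathrm{Corr} \leq \beta(\Gamma_G, q)$. For any $R \leq \alpha^*(G, w)$ that is achieved by these trivial POVMs, our noncontextuality inequality reads

$$\mathrm{Corr} \leq 1 - p_0\,\big(1 - \beta(\Gamma_G, q)\big)\, \frac{R - \alpha(G, w)}{\alpha^*(G, w) - \alpha(G, w)}.$$

A sufficient condition for this inequality to be satisfied is that

$$\beta(\Gamma_G, q) \leq 1 - p_0\,\big(1 - \beta(\Gamma_G, q)\big)\, \frac{R - \alpha(G, w)}{\alpha^*(G, w) - \alpha(G, w)}, \tag{68}$$

which reduces, for $R > \alpha(G, w)$, to

$$p_0 \leq \frac{\alpha^*(G, w) - \alpha(G, w)}{R - \alpha(G, w)},$$

where the upper bound is greater than or equal to 1, since $\alpha(G, w) < R \leq \alpha^*(G, w)$. This is trivially satisfied since $p_0 \leq 1$. For $R < \alpha(G, w)$, the sufficient condition of Eq. (68) is again trivially satisfied since it reduces to

$$p_0 \geq -\frac{\alpha^*(G, w) - \alpha(G, w)}{\alpha(G, w) - R},$$

and we must anyway have $p_0 \geq 0$. For $R = \alpha(G, w)$, the sufficient condition reduces to $\beta(\Gamma_G, q) \leq 1$, which is again trivially satisfied since $\beta(\Gamma_G, q) < 1$ by definition.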
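In general, a probabilistic model achieved by trivial POVMs can be in the convex hull of both deterministic ($\Lambda_{\rm det}$) and indeterministic ($\Lambda_{\rm ind}$) vertices, with the total weight on deterministic vertices denoted by $\Pr(\Lambda_{\rm det})$ and that on indeterministic vertices by $\Pr(\Lambda_{\rm ind})$, so that $\Pr(\Lambda_{\rm det}) + \Pr(\Lambda_{\rm ind}) = 1$. We then have

$$\mathrm{Corr} \leq \Pr(\Lambda_{\rm det}) + \beta(\Gamma_G, q)\, \Pr(\Lambda_{\rm ind}).$$

A sufficient condition for satisfaction of the noncontextuality inequality is then

$$\Pr(\Lambda_{\rm det}) + \beta(\Gamma_G, q)\, \Pr(\Lambda_{\rm ind}) \leq 1 - p_0\,\big(1 - \beta(\Gamma_G, q)\big)\, \frac{R - \alpha(G, w)}{\alpha^*(G, w) - \alpha(G, w)},$$

which becomes

$$p_0 \leq \Pr(\Lambda_{\rm ind})\, \frac{\alpha^*(G, w) - \alpha(G, w)}{R - \alpha(G, w)}$$

when $R > \alpha(G, w)$. Noting that

$$R \leq \alpha(G, w)\, \Pr(\Lambda_{\rm det}) + \alpha^*(G, w)\, \Pr(\Lambda_{\rm ind}) = \alpha(G, w) + \big(\alpha^*(G, w) - \alpha(G, w)\big)\, \Pr(\Lambda_{\rm ind}),$$

we have

$$\Pr(\Lambda_{\rm ind})\, \frac{\alpha^*(G, w) - \alpha(G, w)}{R - \alpha(G, w)} \geq 1,$$

so that the sufficient condition for satisfaction of the noncontextuality inequality becomes $p_0 \leq 1$, which is trivially satisfied. When $R = \alpha(G, w)$, the sufficient condition becomes $\beta(\Gamma_G, q) \leq 1$, which is again trivially satisfied.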
Finally, when $R < \alpha(G, w)$, the sufficient condition becomes

$$p_0 \geq -\Pr(\Lambda_{\rm ind})\, \frac{\alpha^*(G, w) - \alpha(G, w)}{\alpha(G, w) - R},$$

which is again trivially satisfied since $p_0 \geq 0$. Hence trivial POVMs cannot yield a violation of our noncontextuality inequalities. This is the sense in which trivial POVMs cannot lead to nonclassicality in our approach, unlike the case of traditional Kochen-Specker approaches [9][10][11][13]. To violate our noncontextuality inequalities, the POVMs must necessarily have some nontrivial projective component (that is not the identity operator or zero) but they need not be projectors.
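The argument of this subsection can be spot-checked numerically. The sketch below is an illustration under stated assumptions: it takes the inequality in the form Corr ≤ 1 − p0(1 − β)(R − α)/(α* − α), places weight t = Pr(Λ_ind) on indeterministic vertices, and tests the most adversarial case in which Corr and R simultaneously attain their largest values compatible with t. The function names and the grid resolution are ours.

```python
def inequality_holds(corr, r, p0, alpha, alpha_star, beta, tol=1e-12):
    """Assumed form of the noise-robust inequality, with a small float tolerance."""
    return corr <= 1 - p0 * (1 - beta) * (r - alpha) / (alpha_star - alpha) + tol


def worst_case(t, alpha, alpha_star, beta):
    """Largest Corr and R compatible with weight t on indeterministic vertices."""
    corr_max = (1 - t) + t * beta
    r_max = (1 - t) * alpha + t * alpha_star
    return corr_max, r_max


def trivial_povms_never_violate(alpha, alpha_star, beta, steps=50):
    """Scan t in [0, 1] and p0 in (0, 1]; return False if any violation is found."""
    for i in range(steps + 1):
        t = i / steps
        corr_max, r_max = worst_case(t, alpha, alpha_star, beta)
        for j in range(1, steps + 1):
            p0 = j / steps
            if not inequality_holds(corr_max, r_max, p0, alpha, alpha_star, beta):
                return False
    return True


kcbs_check = trivial_povms_never_violate(alpha=2.0, alpha_star=2.5, beta=0.5)
```

Since a larger Corr and a larger R both make the inequality harder to satisfy, passing the check at these extreme values covers all smaller values as well.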

D. Open questions
We collect here the open questions raised in this paper, and also raise other open questions that merit further research:

1. Characterizing structural Specker's principle from probabilistic models on a hypergraph $\Gamma$: The question is if there are any other $\Gamma$ that also satisfy $\mathcal{CE}^1(\Gamma) = \mathcal{G}(\Gamma)$.

2. Conditions for saturating the noise-robust noncontextuality inequalities: Does there exist at least one extremal probabilistic model on $\Gamma_G$ (corresponding to an indeterministic vertex of the polytope, $\lambda \in \Lambda_{\rm ind}$) such that $R(\lambda) = \alpha^*(G, w)$ and $\mathrm{Corr}(\lambda) = \beta(\Gamma_G, q)$, for all $\Gamma_G$ of interest, i.e., $\Gamma_G$ such that $\mathcal{C}(\Gamma_G) \neq \emptyset$ and $\mathcal{CE}^1(\Gamma_G) = \mathcal{G}(\Gamma_G)$? If it doesn't hold for some $\Gamma_G$, is it still possible: 1) to obtain a different noncontextual ontological model that saturates Eq. (32), or 2) to derive a (tight) noncontextuality inequality that is possibly different from Eq. (32) but is saturated by a noncontextual ontological model?

3. Properties of the weighted max-predictability, $\beta(\Gamma_G, q)$: Since the crucial new hypergraph-theoretic ingredient in our inequalities is the weighted max-predictability, it would be interesting to understand properties of this hypergraph invariant on both counts: as a new mathematical object in its own right, one we haven't been able to find a reference to in the hypergraph theory literature, as well as an important parameter of a hypergraph relevant for noise-robustness of a noise-robust noncontextuality inequality. Indeed, as we point out in footnote 23, identifying a distribution $q$ (in the definition of Corr, Eq. (19)) that minimizes $\beta(\Gamma_G, q)$ for a given $\Gamma_G$ would lead to better noise-robustness in the inequalities of Eqs. (32) or (33).

4. Noise-robust applications of quantum protocols based on KS-contextuality: A general research direction is to construct noise-robust versions of applications that have previously been suggested for KS-contextuality. Our approach provides a recipe for doing this for any Bell-KS inequality appearing in such applications. Besides serving as a witness for strong nonclassicality [48] (i.e., Spekkens contextuality), 24 noise-robust versions of these applications can help benchmark the experiments in terms of the noise that can be tolerated while still witnessing nonclassicality. Examples of such applications include those from Refs. [50][51][52][53][54][55].

24 As opposed to weak nonclassicality that can arise in epistemically restricted classical theories [49]. See also the talk at Ref. [48], 41:43 minutes, for a short discussion.

VII. CONCLUSIONS
We have presented a hypergraph framework for obtaining noise-robust noncontextuality inequalities corresponding to KS-colourable scenarios, suitably augmented with preparation procedures, in the spirit of Spekkens contextuality [7]. In addition to the graph invariants from the graph-theoretic framework of CSW, this framework uses a new hypergraph invariant (Eq. (11)) that we call the weighted max-predictability. Our approach is general enough to be applicable to any situation involving noisy preparations and measurements that arises from a KS-colourable contextuality scenario.

ACKNOWLEDGMENTS
I would like to thank Andreas Winter for his comments on an earlier version of some of these ideas, Tobias Fritz for the ping-pong and the sing-song in which we often talked about hypergraphs, Rob Spekkens for the often argumentative, but always productive, conversations over lunch, and participants at the Contextuality conference (CCIOSA) at Perimeter Institute, during July 24-28, 2017, for very stimulating discussions that fed into the narrative of this paper. I would also like to thank David Schmid [...]

[...] The quantum probability, given a shared quantum state $\rho_{AB}$ defined on $\mathcal{H}_A \otimes \mathcal{H}_B$, is given by $p(a, b | x, y) = \mathrm{Tr}\big(\rho_{AB}\, A^{(x)}_a \otimes B^{(y)}_b\big)$. For trivial POVMs, a global joint probability distribution which reproduces the above as marginals is simply given by their product. Hence, trivial POVMs never violate any Bell-CHSH inequality for this scenario.

CHSH-type contextuality scenario: 4-cycle
We now consider the Bell-CHSH scenario without the constraint of spacelike separation. What the lack of spacelike separation means from the quantum perspective is that one no longer needs to model this spacelike separation by requiring a tensor product structure, or (more generally) by requiring the commutativity of the observables that are jointly measured [13,56,57]. That is, there is no physical justification for imposing the tensor product structure or the commutativity of jointly measured observables. 25 Thus, we have the Hilbert space $\mathcal{H}$ and we consider four binary-outcome POVMs, $A^{(x)} \equiv \{A^{(x)}_a\}_{a \in \{0,1\}}$ and $B^{(y)} \equiv \{B^{(y)}_b\}_{b \in \{0,1\}}$, with $x, y \in \{0, 1\}$. Further, the following sets of POVMs are jointly measurable: $\{A^{(x)}, B^{(y)}\}$ for all $x, y \in \{0, 1\}$. When the POVM elements commute, a joint POVM is given by the product $G^{(xy)}_{ab} = A^{(x)}_a B^{(y)}_b$ for all $a, b, x, y \in \{0, 1\}$. In the absence of such commutativity, the joint POVM cannot be written as a product.

25 On the other hand, what this lack of spacelike separation means from the perspective of an ontological model is that one no longer has a justification for assuming factorizability [13] and, consequently, the generalization of Fine's theorem [25] fails to prove that there is no loss of generality in assuming outcome determinism in discussions of KS-contextuality (unlike the case of Bell scenarios, where factorizability is justified by spacelike separation); there is a definite loss of generality, in that measurement noncontextual and outcome-indeterministic ontological models that are non-factorizable are not empirically equivalent to measurement noncontextual and outcome-deterministic (or KS-noncontextual) ontological models. See Ref. [30] for a discussion of this aspect.
The quantum probability, given a quantum state $\rho$ on $\mathcal{H}$, is given by

$$p(a, b | x, y) = \mathrm{Tr}\big(\rho\, G^{(xy)}_{ab}\big)$$

for $a, b, x, y \in \{0, 1\}$. Note that this probability depends on the joint measurement $G^{(xy)}$ implementing $A^{(x)}$ and $B^{(y)}$ together, and that, in general, there may be multiple choices of $G^{(xy)}$ possible. This is easy to see since there is one undetermined positive operator in the joint measurement that is not fixed by $A^{(x)}$ or $B^{(y)}$, i.e., we can write the POVM elements of $G^{(xy)}$ as:

$$G^{(xy)}_{01} = A^{(x)}_0 - G^{(xy)}_{00}, \quad G^{(xy)}_{10} = B^{(y)}_0 - G^{(xy)}_{00}, \quad G^{(xy)}_{11} = I - A^{(x)}_0 - B^{(y)}_0 + G^{(xy)}_{00},$$

where the operator $G^{(xy)}_{00}$ represents the freedom in the choice of how the joint measurement might be implemented within quantum theory. This freedom reflects the fact that since the jointly measured observables are no longer spacelike separated, it is possible to introduce correlations between them that are stronger than what is allowed in the corresponding Bell scenario in quantum theory. The strength of these correlations is only limited by the constraint that all four elements of $G^{(xy)}$ be positive operators. Thus, we have that $A^{(x)}$ is jointly measurable with $B^{(y)}$ and $G^{(xy)}$ denotes a joint POVM of $A^{(x)}$ and $B^{(y)}$.

Now, consider the case when all the POVM elements are trivial, i.e., each element of $A^{(x)}$ and $B^{(y)}$ is proportional to the identity operator, with proportionality constants $q$ that form a probability distribution over outcomes for each setting. In particular, consider the case where $q = \frac{1}{2}$ for all $a, b, x, y \in \{0, 1\}$. A possible joint POVM for these trivial POVMs is then the product POVM, $G^{(xy)}_{ab} = \frac{1}{4} I$. If one restricted joint measurability of $A^{(x)}$ and $B^{(y)}$ to just commutativity, a sufficient but not necessary condition for joint measurability 26 [29], we would take the above choice of the product POVM as a "natural" one. Being a product of trivial POVMs, this choice will never lead to a violation of the CHSH-type inequality for this scenario.
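The claim that product statistics never violate the CHSH-type inequality can be illustrated with a short enumeration. The sketch below assumes the unnormalized form of the expression, i.e., the sum of p(a,b|x,y) over all a, b, x, y with a⊕b = xy, whose classical bound is 3 and whose algebraic maximum is 4; the function names are ours. For contrast, it also evaluates the distribution p(a,b|x,y) = ½ δ_{a⊕b,xy} that is discussed later in this appendix.

```python
import random
from itertools import product


def chsh_value(p):
    """Sum of p(a, b, x, y) over all entries with a XOR b == x AND y."""
    return sum(
        p(a, b, x, y)
        for a, b, x, y in product((0, 1), repeat=4)
        if a ^ b == x & y
    )


def product_model(qa, qb):
    """Product statistics: qa[x] = Pr(a=0|x) and qb[y] = Pr(b=0|y)."""
    def p(a, b, x, y):
        pa = qa[x] if a == 0 else 1 - qa[x]
        pb = qb[y] if b == 0 else 1 - qb[y]
        return pa * pb
    return p


def pr_box(a, b, x, y):
    """PR-box statistics: probability 1/2 whenever a XOR b == x AND y."""
    return 0.5 if a ^ b == x & y else 0.0


# Trivial POVMs with all weights 1/2 (an assumption made for this example).
uniform_value = chsh_value(product_model((0.5, 0.5), (0.5, 0.5)))
pr_value = chsh_value(pr_box)

# Random product models, with a fixed seed for reproducibility.
rng = random.Random(0)
random_values = [
    chsh_value(product_model((rng.random(), rng.random()), (rng.random(), rng.random())))
    for _ in range(1000)
]
```

Because the expression is linear in each local weight, its maximum over product models is attained at deterministic assignments, which is why no sampled value exceeds 3.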
Indeed, the structure of a Bell scenario, requiring the decomposition of the Hilbert space as $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ (tensor product paradigm) or, more generally, imposing the commutativity requirement $[A^{(x)}_a, B^{(y)}_b] = 0$ (commutativity paradigm), is such that the only possible choice of joint measurement that can be implemented by spacelike separated parties is the one that corresponds to the product POVM, given by operators $G^{(xy)}_{ab} = A^{(x)}_a B^{(y)}_b$. However, this is not the only allowed joint measurement for these trivial POVMs, particularly when there is no locality constraint on the measurements from spacelike separation. 27 An extreme choice of joint POVM is the following:

$$G^{PR(xy)}_{ab} = \frac{1}{2}\, \delta_{a \oplus b, xy}\, I,$$

which leads to the probability distribution $p(a, b | x, y) = \frac{1}{2} \delta_{a \oplus b, xy}$ for any choice of quantum state. Hence, this joint POVM $G^{PR(xy)}$ always yields statistics corresponding to the PR-box, maximally violating the CHSH-type inequality for this scenario, namely,

$$\sum_{\substack{a, b, x, y \\ a \oplus b = xy}} p(a, b | x, y) \leq 3.$$

Physically, it's possible to implement this (without requiring any quantum resources) by providing a box that always produces these correlations between measurement settings denoted by $(xy) \in \{0, 1\}^2$, regardless of the input state. Such a black-box would maximally violate the CHSH-type inequality (viewed as a Bell-KS inequality witnessing KS-contextuality), but that shouldn't be surprising in the absence of spacelike separation. Also, the trivial PR-box joint POVM $G^{PR(xy)}_{ab}$ is a perfectly valid way to implement the joint measurement of trivial POVMs $A^{(x)}$ and $B^{(y)}$ within the standard paradigm of operational quantum theory. 28

26 Particularly in the absence of spacelike separation. It is the need to model spacelike separation in a quantum Bell experiment that makes commutativity a necessary (and sufficient) condition for joint measurability of spacelike separated observables in a Bell scenario.

27 To incorporate such a constraint, spacelike separation needs to be modelled via either the tensor product paradigm or the commutativity paradigm. Both these ways of modelling spacelike separation lead to the same set of quantum correlations for any finite-dimensional Hilbert space $\mathcal{H}$ [56]. The question of whether the two paradigms lead to the same set of correlations in the case of infinite dimensional Hilbert spaces is the subject of Tsirelson's problem [56,57]. Most studies of Bell-nonlocality are primarily concerned with finite dimensional Hilbert spaces; should one encounter infinite dimensional Hilbert spaces, the commutativity paradigm is the proper way to model spacelike separation.

To summarize, we note the following:

• Within the traditional framework of KS-noncontextuality, if one wants to go beyond projective measurements to arbitrary POVMs in a contextuality scenario, then one must restrict by fiat the notion of joint measurability to merely commutativity, in order to avoid the pathology of trivial POVMs violating the Bell-KS inequalities maximally. This is, for example, the attitude adopted in Ref. [13].
• However, if one is going beyond projective measurements, we know that commutativity is only a sufficient condition for joint measurability, not a necessary one [29].
• This brings us to our observation that the traditional notion of KS-noncontextuality is pathological once the most general situation in quantum theory is considered: arbitrary POVMs with the general notion of joint measurability (see, e.g., Ref. [29] for this notion and its relation to commutativity). In particular, in the absence of spacelike separation, there is no physical justification to restrict the notion of joint measurability to merely commutativity.
• A similar consideration applies at the level of a KS-noncontextual ontological model: there, factorizability is not justified in the absence of spacelike separation. So, on those grounds alone, one should go beyond KS-noncontextuality as one's notion of classicality; particularly, if one wants a notion of classicality that does not presume outcome determinism, just as local causality doesn't presume it. This was argued in Ref. [30]. We point out the pathology of trivial POVMs only to drive home this point in a different way.

[...] in the light of Spekkens contextuality in Ref. [1]. This hypergraph fails both criteria for the hypergraphs $\Gamma$ considered in this paper, namely, $\mathcal{C}(\Gamma) \neq \emptyset$ (KS-colourability) and $\mathcal{CE}^1(\Gamma) = \mathcal{G}(\Gamma)$. For probabilistic models on $\Gamma_{18}$, the following hold: $\mathcal{C}(\Gamma_{18}) = \emptyset$ and $\mathcal{CE}^1(\Gamma_{18}) \subsetneq \mathcal{G}(\Gamma_{18})$. This was considered in Ref. [1], where $\mathcal{CE}^1(\Gamma_{18})$ excludes the extremal probabilistic model in $\mathcal{G}(\Gamma_{18})$ that corresponds to the upper bound on the noise-robust noncontextuality inequality of Ref. [1]. As argued in Ref. [1], this noise-robust noncontextuality inequality is the appropriate operational generalization (to possibly noisy measurements) of the Kochen-Specker contradiction first demonstrated in Ref. [43]; this generalization cannot be accommodated in our generalization of the CSW framework [10].
If one extends the KS-uncolourable $\Gamma_{18}$ to a KS-colourable hypergraph $\Gamma_{27}$ with 9 "no-detection" events, one for each hyperedge, then we have $\mathcal{C}(\Gamma_{27}) \neq \emptyset$, but it's still the case that $\mathcal{C}(\Gamma_{27}) \subsetneq \mathcal{CE}^1(\Gamma_{27}) \subsetneq \mathcal{G}(\Gamma_{27})$ for this hypergraph. 29 Hence, $\Gamma_{27}$ cannot be understood in our generalization of the CSW framework either. 30

29 This follows from noting that extremal probabilistic models on $\Gamma_{18}$ are still extremal probabilistic models on $\Gamma_{27}$: ones where the no-detection events are assigned zero probabilities. See Theorem 2.5.3 of Ref. [11].

30 Note that adding these no-detection events is equivalent to allowing subnormalized probabilities (i.e., sum of probabilities assigned to measurement events in a hyperedge can be less than 1) on $\Gamma_{18}$. Hence, even allowing for subnormalization on $\Gamma_{18}$, which means that one is looking at probabilistic models on the hypergraph $\Gamma_{27}$, does not eliminate the gap between $\mathcal{CE}^1$ probabilistic [...]

FIG. 9. Going from the orthogonality graph, $G$, of $\Gamma_{18}$ to the hypergraph $\Gamma_G$ (on the right) to which our noise-robust noncontextuality inequality pertains.
Indeed, if one "blindly" writes down a CSW classical bound for some Bell-KS expression defined on $O(\Gamma_{18})$, then such a bound is equivalently a bound for the same Bell-KS expression defined on $\Gamma_{27}$ (where normalization is restored). Further, the E1 bound on $\Gamma_{18}$ is a $\mathcal{CE}^1$ bound on $\Gamma_{27}$. The GPT bound happens to agree with the $\mathcal{CE}^1$ bound for a particular Bell-KS expression (sum of all probabilities) but differs for some other Bell-KS expressions defined on this hypergraph. Consider, for example, the following three expressions (see Fig. 8): [...]

Thus, Expr 3 is a Bell-KS expression that discriminates between probabilistic models at all three levels of the hierarchy. Indeed, the upper bound on Expr 3 for $\mathcal{CE}^1(\Gamma_{27})$ models can be saturated by projective quantum realizations of the hypergraph, in particular the standard realization with 18 rays, with the zero operator for the no-detection events [43]. The fact that there exists such a Bell-KS expression as Expr 3 means that the $\mathcal{CE}^1$ upper bounds from the CSW approach can be violated by a general probabilistic model, i.e., the upper bounds for $\mathcal{CE}^1$ models and general probabilistic models don't agree, and we cannot take the graph-theoretic upper bounds of CSW for granted in our noise-robust noncontextuality inequalities. Indeed, the general probabilistic upper bound for any Bell-KS expression defined on a contextuality scenario is a hypergraph invariant (in the sense that it is a property that is shared by all hypergraphs isomorphic to each other) that may or may not be expressible as a graph invariant à la CSW.
What, then, do the bounds given by graph invariants of CSW for $O(\Gamma_{18})$ mean in our generalization of the CSW framework? Following our approach, outlined in Sec. III.B, we can go from $G = O(\Gamma_{18})$ to the hypergraph $\Gamma_G = \Gamma_{O(\Gamma_{18})}$ (see Fig. 9) for which we have (by construction) $\mathcal{C}(\Gamma_{O(\Gamma_{18})}) \neq \emptyset$ (so that the underlying hypergraph is no longer KS-uncolourable) and $\mathcal{CE}^1(\Gamma_{O(\Gamma_{18})}) = \mathcal{G}(\Gamma_{O(\Gamma_{18})})$ (so that, for any Bell-KS expression, the upper bound given by the fractional packing number $\alpha^*(G, w)$ in the CSW framework agrees with the general probabilistic upper bound). Since this construction proceeds by converting all maximal cliques in $\Gamma_{18}$ to hyperedges in $\Gamma_{O(\Gamma_{18})}$ and adding a new vertex to each such hyperedge, it achieves both purposes: firstly, adding a (no-detection) vertex to every maximal clique that is a hyperedge in $\Gamma_{18}$ ensures the KS-colourability of $\Gamma_{O(\Gamma_{18})}$, i.e., $\mathcal{C}(\Gamma_{O(\Gamma_{18})}) \neq \emptyset$, and secondly, adding a vertex to every maximal clique that is not a hyperedge in $\Gamma_{18}$ ensures that $\mathcal{CE}^1(\Gamma_{O(\Gamma_{18})}) = \mathcal{G}(\Gamma_{O(\Gamma_{18})})$. Once these two properties are satisfied, the graph invariants of CSW [10] become applicable to any Bell-KS expression defined for any set of vertices in the subhypergraph $\Gamma_{18}$ of $\Gamma_{O(\Gamma_{18})}$.
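The construction described above, in which every maximal clique of a graph becomes a hyperedge with one additional vertex, can be sketched in a few lines. This is an illustration only: it uses the 5-cycle as a small stand-in for the orthogonality graph of Γ18, a basic Bron-Kerbosch enumeration of maximal cliques (without pivoting), and hypothetical labels of the form ("extra", k) for the added vertices.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques of a graph given as {vertex: set of neighbours}."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidate vertices, x: already-processed vertices.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def hypergraph_from_graph(adj):
    """Turn each maximal clique into a hyperedge and add one new vertex to it."""
    ordered = sorted(maximal_cliques(adj), key=sorted)
    return [set(clique) | {("extra", k)} for k, clique in enumerate(ordered)]


# 5-cycle used as a small stand-in graph: vertex i is adjacent to i-1 and i+1 mod 5.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
gamma = hypergraph_from_graph(cycle5)
```

For the 5-cycle the maximal cliques are its five edges, so the resulting hypergraph has five hyperedges of three vertices each, with every original vertex shared between two hyperedges.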
Our noise-robust noncontextuality inequality then applies to the KS-colourable hypergraph Γ O(Γ18) , where the graph invariants of CSW make sense, rather than the KS-uncolourable hypergraph Γ 18 . On the other hand, an appropriate noise-robust noncontextuality inequality for the KS-uncolourable hypergraph Γ 18 is, then, the one reported in Ref. [1]. 31