Almost Quantum Correlations are Inconsistent with Specker's Principle

Ernst Specker considered a particular feature of quantum theory to be especially fundamental, namely that pairwise joint measurability of sharp measurements implies their global joint measurability (https://vimeo.com/52923835). Until now, Specker's principle has seemed incapable of singling out quantum theory from the space of all general probabilistic theories. In particular, its well-known consequence for experimental statistics, the principle of consistent exclusivity, does not rule out the set of correlations known as almost quantum, which is strictly larger than the set of quantum correlations. Here we show that, contrary to popular belief, Specker's principle cannot be satisfied in any theory that yields almost quantum correlations.


Introduction
The advent of quantum theory was accompanied by many conceptual controversies over the failure of intuitions from classical physics, e.g. the existence of wave-particle duality, the fundamental indeterminism apparent from the Born rule, and the nonseparability epitomized by entanglement, as pointed out by Einstein, Podolsky and Rosen [1,2]. More recently, we have witnessed the emergence of quantum information theory and a surge of interest in quantum foundations. As a consequence, there has been remarkable progress in proving theorems concerning ways in which Nature fails to be classical, given a well-defined notion of classicality that mathematically formalizes some intuition from classical physics.
This attitude can also be taken towards quantum theory. Are there ways in which Nature may fail to be quantum? That is, are there deviations from quantum theory in Nature, and if so, where should one look for them [3]? Recently, there has been much effort to identify the physical properties that single out the quantum world from the space of hypothetical alternatives. By doing this, not only can we learn more about quantum theory itself, but we also gain insight into where one might or might not hope to find failures of quantum theory.

Tomáš Gonda: tgonda@perimeterinstitute.ca. Accepted in Quantum 2018-08-17.
In the past two decades, there have been two major lines of research tackling this problem. On the one hand, there is the program of characterizing only the statistical aspect of quantum theory, i.e. recovering quantum correlations. On the other hand, there is that of deriving the structure of quantum theory from a set of simple axioms.
Statistical aspects of quantum theory come into play when exploring phenomena such as Bell nonlocality [4] and Kochen-Specker (KS) contextuality [5]. These cannot be explained by a classical model of the world, although they arise naturally within quantum theory. Despite being physically distinct [6], the notions of classicality challenged by these phenomena are mathematically similar, which makes it possible to study them in a unified manner [7,8].
Tackling the statistical aspects of quantum theory has provided us with considerable progress in articulating principles that are satisfied by quantum correlations. However, a key question remains. Are there any principles that uniquely identify the set of quantum correlations in the space of all conceivable correlations? The study of this question led to the conception of a set known as almost quantum correlations [9]. Initially defined only within Bell scenarios [9], almost quantum correlations have also been subsequently defined within general (KS-)contextuality scenarios [8]. This set of correlations strictly contains the quantum set, yet has the following remarkable property: Almost quantum correlations are consistent with all the principles that, to our knowledge, have so far been proposed to characterize quantum correlations [9]. In particular, almost quantum correlations satisfy the principle of consistent exclusivity (CE) [8], which we define at the end of section 2.4. Therefore, it is desirable to identify a principle capable of discriminating quantum from almost quantum correlations.
The other approach deals with structural aspects of quantum theory. This has proven more successful than the previous approach since, to date, there are already derivations of quantum theory from a set of reasonably simple axioms [10–20]. Hence, lately, there has been increased interest in how this successful approach could be used to address the aforementioned issues with almost quantum correlations. The hope is to clarify the relevant structural differences between quantum theory and physical theories capable of generating almost quantum correlations. For example, recent results show that any general probabilistic theory [21] giving rise to almost quantum correlations would have to violate the no-restriction hypothesis [22,23].
In this paper, we explore the structural aspects of a hypothetical almost quantum theory by investigating its statistical aspects, namely almost quantum correlations. To this end, we use the connections between two frameworks for describing contextuality scenarios: in terms of compatible measurements [7], and in terms of operational equivalences among measurement events [8]. First, in section 2, we review these two formalisms and their relation. Section 3 then presents Specker's principle and some of its consequences both for the structure of a theory and for the statistics arising from it. In particular, these include a family of sufficiency properties which we define therein. In section 4, we show that the ramifications of one of these sufficiency properties for measurement statistics are violated by almost quantum correlations. Thus, Specker's principle cannot be satisfied in any theory yielding almost quantum correlations. Finally, section 5 provides evidence that not all of the sufficiency properties are in conflict with almost quantum correlations, demonstrating the subtle nature of the contradiction between almost quantum correlations and Specker's principle.

Marginal Scenarios and the Event-Based Hypergraph Approach to Contextuality
Contextuality has been studied in a number of different formalisms [7,8,24]. The two that we are interested in here are traditional contextuality scenarios defined from compatibility structures of measurements, as formalised by Abramsky and Brandenburger in [7], and the more general approach by Acín, Fritz, Leverrier and Sainz [8]. In this section, we lay out the key concepts of both formalisms, their connections, and important sets of correlations (a.k.a. probabilistic models). We begin by making the notion of compatible measurements explicit.

Compatibility of Measurements
Compatible measurements [25,26]: Let $\{A_1, \dots, A_n\}$ be a set of n measurements with the corresponding outcome sets being $\{O_1, \dots, O_n\}$. We say that the elements of the set $\{A_1, \dots, A_n\}$ are compatible (or jointly measurable) if there exists a measurement A with outcome set $O_1 \times \cdots \times O_n$, such that all the measurements in the set $\{A_1, \dots, A_n\}$ can be simulated by post-processing the outcomes of A. That is, the statistics of $\{A_1, \dots, A_n\}$ for every preparation $\rho$ can be reconstructed perfectly by measuring the given preparation $\rho$ with A:
$$\Pr(a_i \mid A_i, \rho) = \sum_{\alpha \in O_1 \times \cdots \times O_n \,:\, \alpha_i = a_i} \Pr(\alpha \mid A, \rho) \qquad \text{for all } i \in \{1, \dots, n\},\ a_i \in O_i,$$
where $\alpha \in O_1 \times \cdots \times O_n$, and $a_i \in O_i$ is fixed in the sum on the right-hand side.
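As an example, consider two compatible measurements $A_1$ and $A_2$ in quantum theory, built from $\sigma_x$ and $\sigma_z$, two of the Pauli matrices. The outcome set for each measurement is $O = \{0, 1\}$ in this case. Denote the effects of $A_1$ and $A_2$ by $E^{(1)}_{a_1}$ and $E^{(2)}_{a_2}$. If by $\rho$ we denote the (arbitrary) state of a quantum system, then for any $a_1, a_2 \in O$, $\Pr(a_1 \mid A_1, \rho) = \operatorname{tr}(\rho E^{(1)}_{a_1})$ and $\Pr(a_2 \mid A_2, \rho) = \operatorname{tr}(\rho E^{(2)}_{a_2})$.
In order to show that $A_1$ and $A_2$ are compatible, we have to construct a measurement that can simulate them both. One such measurement is $A = \{N_{a_1,a_2}\}$, whose effects satisfy, for any $a_1, a_2 \in O$,
$$\sum_{a_2 \in O} N_{a_1,a_2} = E^{(1)}_{a_1}, \qquad \sum_{a_1 \in O} N_{a_1,a_2} = E^{(2)}_{a_2}.$$
It follows that, for all $\rho$, the probabilities of outcomes of $A_1$ and of $A_2$ can be simulated by coarse-graining the probabilities of outcomes of A:
$$\Pr(a_1 \mid A_1, \rho) = \sum_{a_2 \in O} \Pr(a_1, a_2 \mid A, \rho), \qquad \Pr(a_2 \mid A_2, \rho) = \sum_{a_1 \in O} \Pr(a_1, a_2 \mid A, \rho),$$
and thus $A_1$ and $A_2$ are compatible.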
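As a hedged illustration of this kind of construction (the operators below are an assumption, not necessarily those of the original example), consider unsharp Pauli measurements with effects $\tfrac12(\mathbb{1} + (-1)^a \eta\,\sigma)$ for a hypothetical sharpness parameter $\eta$, set to 1/2 here, together with the candidate joint measurement $N_{a_1,a_2} = \tfrac14(\mathbb{1} + \eta((-1)^{a_1}\sigma_x + (-1)^{a_2}\sigma_z))$. The Python sketch checks, in exact rational arithmetic, that the effects of N are positive semidefinite, sum to the identity, and have the unsharp $\sigma_x$ and $\sigma_z$ effects as marginals.

```python
from fractions import Fraction

# 2x2 real matrices as tuples of rows; sigma_x and sigma_z are real, so this suffices.
I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))
SZ = ((1, 0), (0, -1))
ETA = Fraction(1, 2)  # hypothetical sharpness parameter (an assumption of this sketch)


def lin(*terms):
    """Return sum(c * M) over the given (coefficient, 2x2 matrix) pairs."""
    return tuple(
        tuple(sum(c * M[r][k] for c, M in terms) for k in range(2)) for r in range(2)
    )


def is_psd(M):
    """A real symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][1] == M[1][0] and trace >= 0 and det >= 0


def effect(sigma, a, eta=ETA):
    """Unsharp dichotomic effect E_a = (1/2)(1 + (-1)^a * eta * sigma)."""
    half = Fraction(1, 2)
    return lin((half, I2), (half * (-1) ** a * eta, sigma))


def joint(a1, a2, eta=ETA):
    """Candidate joint effect N_{a1,a2} = (1/4)(1 + eta((-1)^a1 SX + (-1)^a2 SZ))."""
    q = Fraction(1, 4)
    return lin((q, I2), (q * eta * (-1) ** a1, SX), (q * eta * (-1) ** a2, SZ))


O = (0, 1)
# Coarse-graining N reproduces the measurement built from SX and the one built from SZ.
for a in O:
    assert lin((1, joint(a, 0)), (1, joint(a, 1))) == effect(SX, a)
    assert lin((1, joint(0, a)), (1, joint(1, a))) == effect(SZ, a)
# N is a valid measurement: PSD effects that sum to the identity.
assert all(is_psd(joint(a1, a2)) for a1 in O for a2 in O)
assert lin(*[(1, joint(a1, a2)) for a1 in O for a2 in O]) == I2
# For sharp (eta = 1) measurements this particular candidate is not PSD.
assert not is_psd(joint(0, 0, eta=Fraction(1)))
```

Each joint effect has trace 1/2 and determinant $(1 - 2\eta^2)/16$, so this particular candidate is a valid measurement exactly when $\eta \le 1/\sqrt{2}$.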
As a consequence of the definition of compatibility presented above, the marginal statistics of a set of compatible measurements have to be consistent with the existence of a joint probability distribution over $O_1 \times \cdots \times O_n$. Equivalently, whenever the statistics of a set of measurements fail to admit a joint probability distribution, there must necessarily be incompatibility within that set of measurements.
However, it is important to point out that the converse need not hold. That is, if for each preparation there exists a joint probability distribution over $O_1 \times \cdots \times O_n$ that gives the statistics of each of the measurements $\{A_1, \dots, A_n\}$, it does not follow that these are compatible measurements. The simplest counterexample is that of two projective measurements in quantum theory. Obviously, one can construct a joint distribution by taking the product of the respective distributions for each of the two measurements, regardless of whether they are compatible (i.e. commuting) or not. These considerations will be important for understanding the relations that hold among the sufficiency properties in section 3.
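To make the counterexample concrete, the following Python sketch (an illustration, not taken from the original work) uses the Bloch-vector form of the Born rule for a qubit, $\Pr(a \mid \sigma_k, \rho) = (1 + (-1)^a r_k)/2$, where $r$ is the Bloch vector of $\rho$. It checks that the product distribution over $O \times O$ reproduces the statistics of both the sharp $\sigma_x$ and the sharp $\sigma_z$ measurement, even though these two measurements do not commute.

```python
from fractions import Fraction
from itertools import product

O = (0, 1)


def born(r_k, a):
    """Pr(a | sigma_k, rho) = (1 + (-1)^a r_k) / 2 for Bloch component r_k."""
    return (1 + (-1) ** a * r_k) / 2


def product_joint(rx, rz):
    """Joint distribution on O x O formed as the product of the two marginals."""
    return {(ax, az): born(rx, ax) * born(rz, az) for ax, az in product(O, O)}


def marginals_match(rx, rz):
    """Check that the product distribution reproduces both single-measurement statistics."""
    joint = product_joint(rx, rz)
    ok_x = all(sum(joint[(ax, az)] for az in O) == born(rx, ax) for ax in O)
    ok_z = all(sum(joint[(ax, az)] for ax in O) == born(rz, az) for az in O)
    return ok_x and ok_z


# Example pure state with Bloch components r_x = 3/5, r_z = 4/5 (and r_y = 0).
rx, rz = Fraction(3, 5), Fraction(4, 5)
print(marginals_match(rx, rz))  # True, although sigma_x and sigma_z are incompatible
```

The existence of such a joint distribution therefore says nothing about whether a single measurement can simulate both observables.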

Compatibility Approach to Contextuality
The traditional compatibility approach [7], also known as the sheaf-theoretic approach, is concerned with a set of measurements and the compatibility relations among them. These relations are described by contexts, each of which is a set of compatible measurements. A contextuality scenario is defined by a triple (X, O, M) and is called a marginal scenario in this approach. X denotes the set of measurements, each of whose sets of outcome labels is, without loss of generality, assumed to be identical and equal to O. Finally, M is a subset of the power set of X that corresponds to the maximal contexts, i.e. those contexts which are not contained within any other one. Hence, the contexts in a given scenario are precisely the subsets of elements of M. An example of the marginal scenario usually referred to as Specker's triangle is presented in figure 1a.
Given a maximal context $x \in M$, let $O_x$ denote the set of all families of outcomes of the measurements in x. That is, $O_x$ naturally corresponds to the set of functions $x \to O$. An assignment of probabilities to measurement outcomes, $\{P_x(a) : x \in M, a \in O_x\}$, is called an empirical model. For the purposes of studying the phenomenon of contextuality, we focus on empirical models that satisfy the no-disturbance principle, which states that for any two contexts $x, x' \in M$, $P_x|_{x \cap x'} = P_{x'}|_{x \cap x'}$ holds. Here, $P_x|_{x \cap x'}$ denotes the marginal distribution of $P_x$ associated to the measurements in $x \cap x'$. In words, no-disturbance imposes that the marginal statistics of a subset of measurements do not depend on the context in which they could be measured. No-disturbance also provides justification for considering only maximal contexts: if we are interested in the statistics of a set of compatible measurements that do not form a maximal context, we can use any maximal context that contains all of them and get the same result.
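The no-disturbance condition can be checked mechanically. In the Python sketch below, an empirical model on Specker's triangle is stored as a dictionary mapping each maximal context (a pair of measurement labels) to a distribution over joint outcomes, and the marginals on the shared measurements of every two contexts are compared. The labels and the anticorrelated example model (each context yields the two anticorrelated outcomes with probability 1/2) are illustrative assumptions; since that model's single-measurement marginals are uniform, it satisfies no-disturbance.

```python
from fractions import Fraction
from itertools import combinations, product

O = (0, 1)
X = (1, 2, 3)
M = list(combinations(X, 2))  # maximal contexts of Specker's triangle (illustrative labels)


def anticorrelated_model():
    """Illustrative model: P_x(a) = 1/2 if the two outcomes differ, 0 otherwise."""
    half = Fraction(1, 2)
    return {
        x: {a: (half if a[0] != a[1] else Fraction(0)) for a in product(O, repeat=2)}
        for x in M
    }


def marginal(x, dist, subset):
    """Marginal of P_x on the measurements in `subset` (a subset of context x)."""
    idx = [x.index(m) for m in subset]
    out = {}
    for a, prob in dist.items():
        key = tuple(a[i] for i in idx)
        out[key] = out.get(key, 0) + prob
    return out


def no_disturbance(model):
    """Check P_x restricted to x ∩ x' equals P_x' restricted to x ∩ x' for all context pairs."""
    for x, y in combinations(model, 2):
        shared = tuple(m for m in x if m in y)
        if shared and marginal(x, model[x], shared) != marginal(y, model[y], shared):
            return False
    return True


print(no_disturbance(anticorrelated_model()))  # True
```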
An empirical model P is said to be quantum whenever its statistics may be recovered by performing, on some quantum state, projective measurements which satisfy the given compatibility relations.
There is a particular family of marginal scenarios, mentioned also in [27], that is of interest to us with regard to the discussion in this paper. We call them symmetric marginal scenarios and they are characterized by two parameters (n, k).

Symmetric marginal scenarios:
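An (n, k) symmetric marginal scenario is a triple (X, O, M) with |X| = n and $M = \{x \subseteq X : |x| = k\}$, i.e. the maximal contexts are precisely the k-element subsets of X.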
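The maximal contexts of an (n, k) symmetric marginal scenario can be enumerated directly. In the Python sketch below, measurements are labelled 1, ..., n and outcomes default to {0, 1}; both labelling choices are assumptions of the sketch. Specker's triangle is the case (3, 2), with three maximal contexts, and the (4, 2) scenario considered in section 4 has six.

```python
from itertools import combinations
from math import comb


def symmetric_marginal_scenario(n, k, outcomes=(0, 1)):
    """Return (X, O, M) for an (n, k) symmetric marginal scenario.

    X is labelled 1..n (an assumption); M lists every k-element subset of X.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    X = tuple(range(1, n + 1))
    M = [frozenset(c) for c in combinations(X, k)]
    return X, tuple(outcomes), M


for n, k in [(3, 2), (4, 2), (4, 3), (4, 4)]:
    X, O, M = symmetric_marginal_scenario(n, k)
    assert len(M) == comb(n, k)
    print((n, k), len(M), "maximal contexts")
```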

Events-Based Hypergraph Approach to Contextuality
The second formalism we discuss is the hypergraph-based formalism from reference [8]. Here, a contextuality scenario is defined as an events-based hypergraph H = (V, E) with vertices V and hyperedges E. The vertices correspond to the events in the scenario, each of which represents an outcome obtained from a device after it receives an input (the measurement choice). The hyperedges are sets of events representing all the possible outcomes given a particular measurement choice. The hypergraph approach assumes that every such measurement set is complete. That is, if the measurement corresponding to hyperedge e is performed, exactly one of the outcomes in e is obtained. Note that measurement sets may have non-trivial intersection. If an event appears in more than one hyperedge, it corresponds to outcomes of implementing different measurement choices that are operationally equivalent.

Figure 1: (a) Specker's triangle as a marginal scenario. (b) The events-based representation of the same scenario. The vertices are given by the events $(a_i a_j|ij)$, where $a_i, a_j \in O$ denote the outcomes of the measurements $A_i$ and $A_j$. Blue hyperedges correspond to measuring maximal contexts, while the remaining hyperedges arise via other, less trivial, measurement protocols. Specker's triangle corresponds to a (3, 2) symmetric marginal scenario, using the terminology introduced in section 2.2.
There is a close connection between the two approaches. In particular, given a marginal scenario (X, O, M), we can define an events-based hypergraph H[X, O, M] that corresponds to the same situation, in accordance with Appendix D of [8]. In a nutshell, the construction works as follows. The vertices are the events $(a|x)$, where $x \in M$ is a maximal context and $a \in O_x$ is a joint outcome of its measurements. The hyperedges, on the other hand, arise via measurement protocols [8, Def. D.1.4] (see next paragraph for an example), which correspond to adaptive choices of measurements from X. They are adaptive because the choice of an individual element of X can depend on the outcomes of the previously chosen ones.
As an example, consider Specker's triangle scenario depicted in figure 1. This scenario consists of three dichotomic measurements that are pairwise compatible but not triplewise compatible. Hence, as a marginal scenario (X, O, M), Specker's triangle is given by $X = \{A_1, A_2, A_3\}$, $O = \{0, 1\}$ and $M = \{\{A_1, A_2\}, \{A_1, A_3\}, \{A_2, A_3\}\}$. As an events-based hypergraph, Specker's triangle is shown in figure 1b. The vertices of this hypergraph are given by the events $V = \{(a_i a_j|ij)\}_{i<j}$, where the indices i and j run from 1 to 3 and refer to the three measurements in X. There are two types of hyperedges. First, there are the blue hyperedges of the form $\{(a_i a_j|ij)\}_{a_i, a_j \in O}$, which contain all the possible outcomes for the two measurements in a chosen context. Second, there are hyperedges corresponding to adaptive measurement protocols. As an example, consider the hyperedge depicted in dark green at the bottom left side of the hypergraph, containing the vertices (00|12), (10|12), (10|23) and (11|23). This hyperedge may be understood as the following protocol. First, one measures $A_2$. If the outcome is $a_2 = 0$, then $A_1$ is measured next; if instead $a_2 = 1$, the second measurement is $A_3$. The former possibility yields the events (00|12) and (10|12), while the latter gives (10|23) and (11|23).
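The construction can be made explicit for Specker's triangle. The Python sketch below assumes that every measurement protocol consists of one measurement followed by a second, outcome-dependent choice that completes a maximal context, and it encodes an event $(a_i a_j|ij)$ as the hypothetical tuple ((i, j), (a_i, a_j)) with i < j. Under these assumptions it finds 12 vertices and 9 distinct hyperedges (3 blue ones and 6 adaptive ones), and it recovers the dark green hyperedge described above.

```python
from itertools import product

O = (0, 1)
X = (1, 2, 3)
PAIRS = [(1, 2), (1, 3), (2, 3)]  # maximal contexts, stored with i < j


def event(m1, a1, m2, a2):
    """Event (a_i a_j | ij) with the pair stored in increasing order i < j."""
    if m1 < m2:
        return ((m1, m2), (a1, a2))
    return ((m2, m1), (a2, a1))


def specker_triangle_hypergraph():
    vertices = {((i, j), a) for (i, j) in PAIRS for a in product(O, repeat=2)}
    edges = set()
    # Blue hyperedges: all joint outcomes of one maximal context.
    for (i, j) in PAIRS:
        edges.add(frozenset(((i, j), a) for a in product(O, repeat=2)))
    # Adaptive protocols (assumed two-step): measure `first`, then choose the
    # second measurement depending on the observed outcome a (choice[a]).
    for first in X:
        others = [m for m in X if m != first]
        for choice in product(others, repeat=len(O)):
            edges.add(frozenset(event(first, a, choice[a], b) for a in O for b in O))
    return vertices, edges


V, E = specker_triangle_hypergraph()
print(len(V), len(E))  # 12 9
# Dark green hyperedge: measure A2, then A1 if a2 = 0, else A3.
green = frozenset({((1, 2), (0, 0)), ((1, 2), (1, 0)), ((2, 3), (1, 0)), ((2, 3), (1, 1))})
print(green in E)  # True
```

Protocols whose second measurement does not depend on the first outcome reproduce a blue hyperedge, which is why the set of hyperedges deduplicates them.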

Sets of Probabilistic Models for Events-Based Hypergraphs
The objects of study in the events-based approach are the so-called probabilistic models, which is a notion analogous to empirical models in the compatibility approach. A probabilistic model on a contextuality scenario H is a function $p : V \to [0, 1]$ denoting the conditional probability that an event v occurs when any measurement e with v as one of its outcomes is performed. The identification of outcomes by operational equivalences in the definition of the contextuality scenario implies that the probability p(v) is independent of the measurement e. Moreover, since the measurements are complete, the probabilities are normalized within each hyperedge. That is, $\sum_{v \in e} p(v) = 1$ for all $e \in E(H)$. Since the specification of a probabilistic model is a tuple of real numbers, we often treat a probabilistic model as a vector $\mathbf{p}$ embedded in a vector space with coordinates corresponding to the probabilities of events in V.
We will denote the set of all valid probabilistic models on H by G(H). There are various subsets of G(H) which are of particular interest. Here, we define the subsets corresponding to classical, quantum, and almost quantum models.
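The set of all classical models on H is denoted by C(H) and forms a polytope, whose vertices are the deterministic probabilistic models, i.e. those with $p(v) \in \{0, 1\}$ for every $v \in V$. A probabilistic model p is quantum if there exist a Hilbert space $\mathcal{H}$, a state $\rho \in B(\mathcal{H})$ and a projection operator $P_v$ on $\mathcal{H}$ associated to every $v \in V$, such that $\sum_{v \in e} P_v = \mathbb{1}$ for every $e \in E(H)$ and $p(v) = \operatorname{tr}(\rho P_v)$ for every $v \in V$. The set of all quantum models on H is denoted by Q(H).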
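For small scenarios the vertices of C(H) can be found by brute force. The Python sketch below does this for Specker's triangle, under the assumptions that an event $(a_i a_j|ij)$ is encoded as ((i, j), (a_i, a_j)) and that every measurement protocol is a two-step adaptive choice. Since a {0,1}-valued model normalized on the blue hyperedges selects exactly one event per context, it suffices to try all such selections and keep those normalized on every hyperedge. As an illustrative linear functional (our choice, not one from the text), it also computes the classical maximum of p(01|12) + p(01|23) + p(10|13), which is attained at a vertex of C(H).

```python
from itertools import product

O = (0, 1)
X = (1, 2, 3)
PAIRS = [(1, 2), (1, 3), (2, 3)]  # maximal contexts (hypothetical labels, i < j)


def event(m1, a1, m2, a2):
    """Event (a_i a_j | ij) stored as ((i, j), (a_i, a_j)) with i < j."""
    return ((m1, m2), (a1, a2)) if m1 < m2 else ((m2, m1), (a2, a1))


def hyperedges():
    """Blue hyperedges plus (assumed two-step) adaptive protocol hyperedges."""
    edges = {frozenset(((i, j), a) for a in product(O, repeat=2)) for (i, j) in PAIRS}
    for first in X:
        others = [m for m in X if m != first]
        for choice in product(others, repeat=2):
            edges.add(frozenset(event(first, a, choice[a], b) for a in O for b in O))
    return edges


def is_normalized(p, edges):
    """p is a valid probabilistic model iff it sums to 1 on every hyperedge."""
    return all(sum(p.get(v, 0) for v in e) == 1 for e in edges)


def deterministic_models():
    """All {0,1}-valued models: pick one event per context, keep normalized ones."""
    edges = hyperedges()
    found = []
    for picks in product(product(O, repeat=2), repeat=len(PAIRS)):
        p = {(pair, a): 1 for pair, a in zip(PAIRS, picks)}
        if is_normalized(p, edges):
            found.append(p)
    return found


models = deterministic_models()
print(len(models))  # 8: one per global assignment (a1, a2, a3)

# Illustrative functional: p(01|12) + p(01|23) + p(10|13).
TARGET = [((1, 2), (0, 1)), ((2, 3), (0, 1)), ((1, 3), (1, 0))]
classical_max = max(sum(m.get(v, 0) for v in TARGET) for m in models)
print(classical_max)  # 1
```

The adaptive hyperedges are what force the three context-wise choices to agree on shared measurements, so only the 8 globally consistent assignments survive.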
In the specific case of Bell …

The set of all almost quantum models on H is denoted by $Q_1(H)$.
The terminology "almost quantum" for $Q_1$ models comes from their close connection to the set of correlations in Bell scenarios known as "almost quantum correlations" [9]. Indeed, it has been proven that the set $Q_1(H)$ coincides with the set of almost quantum correlations whenever H is the hypergraph of a Bell scenario (for a discussion on how to define such Bell hypergraphs, we refer the reader to [8,28]).
A common feature shared by quantum and almost quantum models is that they both satisfy the consistent exclusivity principle [8,9]. Explicitly, this means that all the CE inequalities defined below hold for every almost quantum (and quantum) probabilistic model.

Consistent Exclusivity principle:
A theory satisfies the principle of consistent exclusivity (CE) if every probabilistic model p admissible by the theory satisfies all the relevant CE inequalities, namely $\sum_{v \in S} p(v) \le 1$ for every set $S \subseteq V$ of pairwise exclusive events, i.e. events any two of which belong to a common hyperedge. A stronger version of the Consistent Exclusivity principle imposes constraints on the probabilistic models of a contextuality scenario by considering its application in larger scenarios that contain the original one (see [8]).

CE models:
A probabilistic model p ∈ G(H) is consistently exclusive if it satisfies the CE principle. The set of all consistently exclusive models on H is denoted by CE(H) and forms a convex polytope.
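For Specker's triangle the CE inequalities can be checked by brute force. In the hedged Python sketch below, events are encoded as ((i, j), (a_i, a_j)), hyperedges are generated under the assumption that each protocol measures one observable and then an outcome-dependent second one, and two events are treated as exclusive when they share a hyperedge. The illustrative anticorrelated model, which assigns probability 1/2 to each event with $a_i \neq a_j$, is normalized on every hyperedge, yet the pairwise exclusive events (01|12), (01|23) and (10|13) carry total probability 3/2. It therefore lies in G(H) but not in CE(H).

```python
from fractions import Fraction
from itertools import combinations, product

O = (0, 1)
X = (1, 2, 3)
PAIRS = [(1, 2), (1, 3), (2, 3)]  # maximal contexts (hypothetical labels, i < j)


def event(m1, a1, m2, a2):
    """Event (a_i a_j | ij) stored as ((i, j), (a_i, a_j)) with i < j."""
    return ((m1, m2), (a1, a2)) if m1 < m2 else ((m2, m1), (a2, a1))


def hyperedges():
    """Blue hyperedges plus (assumed two-step) adaptive protocol hyperedges."""
    edges = {frozenset(((i, j), a) for a in product(O, repeat=2)) for (i, j) in PAIRS}
    for first in X:
        others = [m for m in X if m != first]
        for choice in product(others, repeat=2):
            edges.add(frozenset(event(first, a, choice[a], b) for a in O for b in O))
    return edges


def max_exclusive_weight(p, edges):
    """Largest total probability of a set of pairwise exclusive events (brute force)."""
    vertices = sorted({v for e in edges for v in e})
    exclusive = {(u, v) for e in edges for u in e for v in e if u != v}
    best = Fraction(0)
    for r in range(1, len(vertices) + 1):
        for S in combinations(vertices, r):
            if all((u, v) in exclusive for u, v in combinations(S, 2)):
                best = max(best, sum(p.get(v, 0) for v in S))
    return best


edges = hyperedges()
# Illustrative anticorrelated model: 1/2 on every event with a_i != a_j.
p = {(pair, a): Fraction(1, 2) for pair in PAIRS for a in product(O, repeat=2) if a[0] != a[1]}
print(all(sum(p.get(v, 0) for v in e) == 1 for e in edges))  # True: p is in G(H)
print(max_exclusive_weight(p, edges))  # 3/2 > 1: p violates a CE inequality
```

Brute force over all 2^12 subsets of events is only feasible for tiny scenarios; for larger ones the CE inequalities correspond to the cliques of the exclusivity graph.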

Specker's Principle and its Consequences
One can derive several constraints on the sets of symmetric marginal scenarios and the statistics they generate by considering consequences of Specker's principle, which we now introduce.
Specker's principle as originally stated: "If you have several questions and you can answer any two of them, then you can also answer all of them." [vimeo.com/52923835 (2009), also attributed in reference 29.] As stated, the principle refers to "questions", i.e. measurements, and imposes that a set of pairwise compatible measurements is itself compatible. We can formalize this interpretation of Specker's principle as follows.

Pairwise sufficiency for measurements: If in a set of measurements every pair is compatible, then all the measurements are compatible.
Under the principle of pairwise sufficiency for measurements, there would be no distinction between an (n, 2) symmetric marginal scenario and an (n, n) symmetric marginal scenario, as those would merely be different representations of the same compatibility relations.
This formalization of Specker's principle holds true in quantum theory for sharp (i.e. projective) measurements. Note that Specker's principle has elsewhere been invoked to motivate statistical constraints at the level of events, such as the principle of consistent exclusivity [8,24]. However, it is important to recognize that these statistical constraints are not equivalent to Specker's principle. Rather, they are implications thereof. In other words, Specker's principle pertains foremost to "questions" (i.e. measurements), and only secondarily to "answers" (i.e. events).
While consistent exclusivity is a statistical constraint implied by Specker's principle, it is not the only one. We distinguish consistent exclusivity (which applies to any contextuality scenario) from the following statistical constraint (which refers to (n, 2) symmetric marginal scenarios only).
Pairwise sufficiency for probabilistic models: If in a set of measurements every pair is compatible, then, for every preparation, the statistics generated by these measurements are marginals of some joint probability distribution.
Equivalently, we can say that in a theory satisfying pairwise sufficiency for probabilistic models, the set of probabilistic models that arises from an events-based hypergraph that can be obtained from an (n, 2) symmetric marginal scenario for some integer n is always classical.
Note that every compatible set of measurements admits a joint probability distribution over the outcomes of the measurements in the set, as implied by the definition of compatibility. Therefore, if a theory satisfies pairwise sufficiency for measurements, then it also satisfies pairwise sufficiency for probabilistic models. However, the converse need not hold, as has been pointed out at the end of section 2.1. This one-way relationship between the principle for measurements and the principle for probabilistic models is illustrated in figure 2.
Besides the concept of pairwise sufficiency, one can define another property called all-but-one sufficiency, which might at first appear to be less constraining. All-but-one sufficiency captures the same idea as pairwise sufficiency, but applies only to a subset of marginal scenarios, namely (n, n−1) symmetric marginal scenarios.
All-but-one sufficiency for measurements: If in a set of at least three measurements, the elements of every proper subset are compatible, then all the measurements are compatible.
Under the principle of all-but-one sufficiency for measurements, there would be no distinction between an (n, n−1) symmetric marginal scenario and an (n, n) symmetric marginal scenario for n ≥ 3, as those would merely be different representations of the same compatibility relations. The qualifier n ≥ 3 is needed, because otherwise all pairs of measurements would be compatible, and hence all contextuality scenarios would be deemed classical.
All-but-one sufficiency for probabilistic models: If in a set of at least three measurements, the elements of every proper subset are compatible, then, for every preparation, the statistics generated by these measurements are marginals of some joint probability distribution.
In other words, we can say that a theory satisfies the principle of all-but-one sufficiency for probabilistic models if every probabilistic model p is classical whenever the compatibility scenario is an (n, n−1) symmetric marginal scenario for some integer n ≥ 3.
Notice that if a theory satisfies pairwise sufficiency for measurements, then it also satisfies all-but-one sufficiency for measurements. Similarly, if a theory satisfies pairwise sufficiency for probabilistic models, then it satisfies all-but-one sufficiency for probabilistic models as well. Moreover, we now show that pairwise sufficiency and all-but-one sufficiency are actually equivalent at the level of measurements, even though they seem to be distinct at the level of probabilistic models, as the conjecture in section 5 would imply, if true. The implication relations among the various consequences of Specker's principle are summarized in figure 2.

Theorem 1. A theory satisfies the all-but-one sufficiency principle for measurements if and only if it satisfies the pairwise sufficiency principle for measurements.
Proof. The "if" statement holds trivially, so let us focus on the "only if" implication. Assume that a theory satisfies the all-but-one sufficiency principle for measurements, and let X be a set of n measurements of which every pair is compatible. For n = 3, X forms a (3, 2) symmetric marginal scenario, to which the assumption applies directly, so we may take n ≥ 4. By applying the assumption, we establish that every subset of X of size 3 is compatible, because every such subset is a (3, 2) symmetric marginal scenario. Since any subset of a compatible set is itself compatible (its statistics are obtained by further coarse-graining the joint measurement), every proper subset of any 4-element subset of X is now compatible. Applying the assumption again, we establish that every subset of X of size 4 is compatible. If n > 4, we iterate this argument until we establish that the only subset of size n, namely X itself, is compatible.
[Diagram] Top row: Specker's principle ⟺ Pairwise sufficiency for measurements ⟺ All-but-one sufficiency for measurements. Bottom row: Consistent exclusivity; Pairwise sufficiency for probabilistic models ⟹ All-but-one sufficiency for probabilistic models. Each principle in the top row implies (⟹) the one directly below it.
Figure 2: Some of the consequences of Specker's principle and the implications known to hold among them. The top row contains principles pertaining to the structure of measurements, while the bottom row contains principles pertaining to sets of probabilistic models. Text colour depicts whether a given statement holds in an almost quantum theory: the green statement is satisfied, the red statements are violated, and the black statement is the subject of conjecture in section 5.
One can naturally consider two hierarchies of principles akin to pairwise sufficiency and all-but-one sufficiency. For example, the principle of triplewise sufficiency for measurements states that triplewise compatibility implies joint compatibility. As another example, the all-but-two sufficiency principle for probabilistic models states that no nonclassical correlations can exist in a compatibility scenario wherein every collection of all-but-two measurements is compatible, i.e. an (n, n−2) symmetric marginal scenario.

Almost Quantum Models Violate Specker's Principle
Since projective measurements in quantum theory satisfy the principle of pairwise sufficiency, the set Q(H) for an events-based hypergraph H obtained from an (n, 2) symmetric marginal scenario coincides with C(H). Here, we demonstrate that there are probabilistic models in $Q_1(H)$ for H obtained from a (4, 2) symmetric marginal scenario (see figure 3), which are outside of C(H). This fact shows that almost quantum correlations violate "pairwise sufficiency for probabilistic models", and therefore any physical theory consistent with them violates "pairwise sufficiency for measurements" and, by doing so, also Specker's principle.
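In a (4, 2) symmetric marginal scenario, each facet of the classical polytope is either a CE constraint or a pentagonal inequality [30]. One instance of the pentagonal inequalities, equation (11), bounds from above by 2 a linear combination of the two-point terms $X_iX_j$ and the one-point terms $X_j$, where
$$X_iX_j = \sum_{a_i, a_j \in O} (-1)^{a_i + a_j}\, p(a_i a_j|ij), \qquad X_j = \sum_{a_j \in O} (-1)^{a_j}\, p(a_j|j),$$
and $p(a_j|j)$ is the marginal probability of outcome $a_j$ of measurement $A_j$, which by no-disturbance does not depend on the context used to compute it. Notice that the upper bound of 2 is satisfied for both classical and quantum probabilistic models [30]. This is expected, because it is well known that both classical and quantum theories respect Specker's principle.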
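The terms entering the pentagonal inequality are simple linear functions of the probabilistic model. The Python sketch below evaluates $X_iX_j$ and $X_j$ from dictionaries of pair and single-measurement probabilities (the dictionary encoding is an assumption of the sketch). It only evaluates these individual terms, not the full left-hand side of equation (11). For a deterministic model each term equals ±1, which is why classical bounds of such linear expressions can be found by enumerating deterministic assignments.

```python
from itertools import product

O = (0, 1)


def correlator(p_pair):
    """X_iX_j = sum over (a_i, a_j) of (-1)^(a_i + a_j) p(a_i a_j | ij)."""
    return sum((-1) ** (ai + aj) * p_pair[(ai, aj)] for ai, aj in product(O, O))


def one_point(p_single):
    """X_j = sum over a_j of (-1)^(a_j) p(a_j | j)."""
    return sum((-1) ** aj * p_single[aj] for aj in O)


def deterministic_pair(ai, aj):
    """Pair distribution that puts probability 1 on the outcomes (ai, aj)."""
    return {(bi, bj): int((bi, bj) == (ai, aj)) for bi, bj in product(O, O)}


# For a deterministic assignment the correlator equals (-1)^(a_i + a_j).
for ai, aj in product(O, O):
    assert correlator(deterministic_pair(ai, aj)) == (-1) ** (ai + aj)

# Uniform statistics give vanishing two-point and one-point terms.
uniform_pair = {k: 0.25 for k in product(O, O)}
print(correlator(uniform_pair), one_point({0: 0.5, 1: 0.5}))  # 0.0 0.0
```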
Maximising the left-hand side of equation (11) over all general probabilistic models by linear programming yields a value of $I^{*}_{\mathrm{pent}} = 6$. Similarly, maximising it over almost quantum probabilistic models via a semidefinite program [8] yields $I^{AQ}_{\mathrm{pent}} = 2.5$, which violates the classical and quantum bound of 2. This is despite the fact that almost quantum models satisfy all the CE inequalities.
This result reveals a counterintuitive aspect of almost quantum models. Namely, there exist scenarios in which all the measurements are pairwise compatible yet almost quantum models are strictly more general than classical models. Therefore, almost quantum models are inconsistent with Specker's principle, in that any set of measurements that gives rise to them cannot satisfy it.

Theorem 2.
There is no set of measurements, in any generalized probabilistic theory, which both gives rise to almost quantum models and also satisfies Specker's principle.
By theorems 1 and 2, we learn that there is no set of measurements, in any theory, which both gives rise to almost quantum models and also satisfies the principle of all-but-one sufficiency for measurements.
As a final comment, research on general physical theories beyond quantum theory has focused in particular on defining a subset of measurements, known as sharp measurements, which correspond to projective measurements in the case of quantum theory. There is still no consensus on how sharp measurements should be defined in a general theory [31,32]. In the following we argue that (i) any definition of sharp measurements in a theory that yields almost quantum correlations must violate Specker's principle, and (ii) any notion of sharpness in an almost quantum theory must deviate from the candidates proposed so far [31,32].

Corollary 2.1. If in any theory there is a notion of sharpness, relative to which the sharp measurements yield almost quantum correlations, then there are some sharp measurements in the theory which violate Specker's principle.
This corollary is an instance of theorem 2, whereby the set of measurements giving rise to almost quantum correlations is exactly the one that satisfies some (unspecified) notion of 'sharpness'.
This specific case can be motivated by analogy with quantum theory. As in quantum theory, in a hypothetical almost quantum theory one must also restrict the set of allowed measurements to a strict subset of the set of all measurements in order to pick out exactly the almost quantum correlations in Kochen-Specker contextuality scenarios. If all measurements were allowed, in either quantum or almost quantum theory, then one could realize any logically possible probabilistic model using the completely noisy effects [33]. Furthermore, the set of measurements allowed in quantum theory is exactly the set of sharp quantum measurements. In an almost quantum theory, it is unclear what this allowed set of measurements would be, but one might expect such measurements to correspond to some notion of sharpness as well.
For example, Corollary 2.1 sheds light on the notion of sharpness for general probabilistic theories that was proposed by Chiribella and Yuan [31,32]. The notion of sharpness there does imply Specker's principle, and as such no theory can give rise to the almost quantum models using only the sharp measurements in the sense of references [31,32]. In particular, such a theory cannot violate the pentagonal inequality presented in equation (11).

Almost Quantum Models and the All-But-One Sufficiency Principle
The example presented in the previous section teaches us that any almost quantum theory must violate the all-but-one sufficiency principle for measurements. Incidentally, it also demonstrates that almost quantum models fail to satisfy the all-but-two sufficiency principle for probabilistic models, since $C(H) \neq Q_1(H)$ for an events-based hypergraph H obtained from a (4, 2) symmetric marginal scenario. Now, is it possible that almost quantum models nevertheless satisfy the principle of all-but-one sufficiency for probabilistic models, despite its violation at the level of measurements by any almost quantum theory? It is conceivable that a physical theory might satisfy the principle of all-but-one sufficiency at the level of probabilistic models while violating the principle at the level of measurements.
It might feel unnatural to divorce a statistical constraint from any restriction on the structure of measurements. Nevertheless, this unexpected satisfaction of the all-but-one sufficiency principle for probabilistic models without motivation from the corresponding measurement-based principle appears to be a feature of any almost quantum theory.
We state this as a conjecture, and provide suggestive evidence in terms of a theorem that confirms the conjecture for the special case of dichotomic measurements.

Conjecture.
Almost quantum correlations satisfy the all-but-one sufficiency principle for probabilistic models. In other words, the sets $Q_1(H)$ and $C(H)$ coincide whenever H corresponds to an (n, n−1) symmetric marginal scenario.
Theorem 3, whose proof is presented in appendix A, establishes this equality in the special case where all measurements of the (n, n−1) symmetric marginal scenario have binary outcomes, in agreement with the conjecture above.
Thus, we conclude that almost quantum models satisfy some statistical consequences of Specker's principle, but not others. For example, they satisfy the principle of consistent exclusivity, as well as the principle of all-but-one sufficiency for probabilistic models (at least for binary outcome measurements). On the other hand, almost quantum models satisfy neither the principle of pairwise sufficiency nor that of all-but-two sufficiency, be it for measurements or probabilistic models. This dichotomy challenges our ability to understand the statistical predictions of any almost quantum theory in terms of Specker-like restrictions at the level of measurements.

Conclusions
In this work, we have explored some consequences of Specker's principle and their consistency with almost quantum correlations. By studying contextuality scenarios on a single system, we have found a fundamental difference between quantum theory and any potential almost quantum theory, complementing the results by Sainz et al. [23]. There, a different no-go theorem (pertaining to almost quantum theories and the no-restriction hypothesis) was inferred from considerations of Bell scenarios involving multiple subsystems.
Our results imply that in any general probabilistic theory, the structure of measurements reproducing almost quantum models is in contradiction with Specker's principle. Accordingly, the notion of sharpness proposed in references [31,32] cannot be used to recover the almost quantum correlations. This result runs counter to sentiments implicit in earlier literature. Previously, almost quantum models appeared to be the purest embodiment of Specker's principle [9,24,29,34–36], in the sense that they are uniquely identified by consistent exclusivity and closure under wirings [8, Thm. 7.6.2] (see footnote 11).
However, consistent exclusivity is not the only constraint on probabilistic models implied by Specker's principle. Another such principle is pairwise sufficiency for probabilistic models, as defined in section 3. Despite the "success" of the almost quantum correlations relative to consistent exclusivity, we nevertheless witness their failure to satisfy Specker's principle when analyzed through the lens of pairwise sufficiency. As such, the above preconceptions must be reversed. Almost quantum models do not exemplify Specker's principle. Rather, they are antithetical to it.
For advocates of Specker's principle, our results challenge the possible physical significance of the almost quantum correlations [37]. More importantly, however, our findings restore the prominence of Specker's principle as a potential means for identifying the essence of quantum theory. The violation of Specker's principle by almost quantum models demonstrates that holistic considerations of Specker's principle enable greater insight than consistent exclusivity alone.
All the results in this manuscript were found solely by studying a rather limited set of contextuality scenarios, namely symmetric scenarios with binary-outcome measurements. Consideration of asymmetric compatibility structures or scenarios beyond binary-outcome measurements might illuminate stronger constraints and richer intuitions about almost quantum theory than the results presented here.

11 Almost quantum correlations are uniquely identified under the assumption that the set of correlations allowed in Nature contains the quantum one, and by considering the ramifications of consistent exclusivity even in the limit of many independent copies of the scenario.

A Proof of Theorem 3
For the proof of theorem 3, it is useful to introduce the concept of a generalized event in relation to the events as vertices of some hypergraph H. This notion refers to a coarse-graining of events in H. For instance, in the case of figure 1b, a generalized event $g = (a_1|1)$ can be defined either as the coarse-graining of $\{(a_1, a_2|1,2), (a_1, \bar{a}_2|1,2)\}$ or as that of $\{(a_1, a_3|1,3), (a_1, \bar{a}_3|1,3)\}$, since the measurement $A_1$ is compatible with both $A_2$ and $A_3$ and the no-disturbance condition holds. Each of these sets $\{(a_1, a_j|1,j), (a_1, \bar{a}_j|1,j)\}$ is then a refinement of g. The probability assigned to a generalized event is then given by summing the probabilities of all the events in any of its refinements.
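To make the coarse-graining concrete, the sketch below computes the probability of the generalized event $(a_1|1)$ in Specker's triangle from two different refinements and checks that they agree, as the no-disturbance condition guarantees. The model used here (perfect anticorrelation in every context) and all names (`model`, `generalized_event_prob`) are hypothetical illustrations, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

CONTEXTS = [(1, 2), (1, 3), (2, 3)]  # Specker's triangle: every pair is jointly measurable
HALF = Fraction(1, 2)

# Hypothetical no-disturbance model: in every context the two outcomes are
# perfectly anticorrelated, each of the two anticorrelated pairs with probability 1/2.
model = {
    ctx: {(u, v): (HALF if u != v else Fraction(0)) for u, v in product((0, 1), repeat=2)}
    for ctx in CONTEXTS
}

def generalized_event_prob(model, measurement, outcome, context):
    """Probability of the generalized event (outcome | measurement), computed by
    summing over its refinement inside `context`, i.e. over all outcomes of the
    other measurement in that context."""
    pos = context.index(measurement)
    return sum(prob for outcomes, prob in model[context].items() if outcomes[pos] == outcome)

via_12 = generalized_event_prob(model, 1, 0, (1, 2))
via_13 = generalized_event_prob(model, 1, 0, (1, 3))
print(via_12, via_13)  # 1/2 1/2
```

Exact `Fraction` arithmetic is used so that agreement between refinements is checked without rounding error.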
The notion of exclusivity of events can also be extended to generalized events. Two generalized events g, g′ are said to be exclusive if events in a refinement of g are pairwise exclusive with events in a refinement of g′. Equivalently, one can say that the union of any refinements of g and g′ is a set of pairwise exclusive events in V.
In this appendix we study almost quantum probabilistic models in a particular family of contextuality scenarios, which we call binary n-Specker scenarios. These are the events-based hypergraphs corresponding to the (n, n−1) symmetric marginal scenarios introduced in definition 2.2. Moreover, we restrict our attention to cases in which each measurement in X has two outcomes, so that we can choose O = {0, 1}. The set of contexts M consists of every subset of X with n−1 elements. That is, each proper subset of X is jointly measurable, but the whole set X of n measurements is not necessarily jointly measurable. Figure 1a depicts a 3-Specker scenario and figure 4 a 4-Specker scenario.
From the very definition of the scenario it immediately follows that every quantum empirical model in (X, O, M) has a realisation in terms of deterministic non-contextual hidden variable models: every pair of measurements in X belongs to some context, and in quantum theory a set of pairwise compatible measurements is jointly measurable. Here we show that the set of almost quantum models behaves similarly. The proof relies on showing that every facet of the classical polytope corresponds to a CE inequality, each of which is satisfied by both quantum and almost quantum models.
Let us introduce some extra notation. (12) In this notation, the no-disturbance condition guarantees that the marginal distribution $p(a_{i_1}, \dots, a_{i_k})$ is independent of the choice of context from which it is computed.
The rest of this appendix is dedicated to the proof of the theorem, which can be written in the language of n-Specker scenarios as follows. First of all, notice that the inclusions C(H) ⊆ Q(H) ⊆ $Q_1(H)$ ⊆ CE(H) hold in general for any contextuality scenario H [8]. Therefore, in order to prove theorem 3, we only need to show CE(H) ⊆ C(H). This can be broken down into three parts. Firstly, we define a finite set $I_n$ of linear inequality constraints on G(H) and the polytope P of probabilistic models that satisfy them. Then we show in lemma 3.1 that all these inequalities are actually CE inequalities, so that CE(H) ⊆ P holds. The proof concludes with lemma 3.2, which establishes the equality of C(H) and P.

For every odd k between 1 and n, every set S ⊆ X of k measurements, every assignment s of outcomes $a_i$ to the measurements $i \in S$, and every assignment o of outcomes to the measurements in X ∖ S, let

$$ I_n^{k,S,s,o} := \sum_{j=0}^{k-1} (-1)^j \sum_{\{i_1, \dots, i_j\} \subseteq S} p(a_{i_1}, \dots, a_{i_j}, o), \qquad (13) $$

with p(o) = 1 whenever k = n. We use the notation $I^k_n$ for the set of corresponding inequalities $I_n^{k,S,s,o} \geq 0$ with a given k.

12 For simplicity we will denote by $p(a_{i_1}, \dots, a_{i_k})$ the conditional probability distribution p(a_{i_1} …
We define the full set of inequalities, $I_n$, as the union of all $I^k_n$ for odd k between 1 and n. In fact, the right-hand side of equation (13) can be expressed in simpler terms, as we illustrate next. Given a probabilistic model p, which specifies a probability distribution for each proper subset of X, there is a function $f : O^X \to \mathbb{R}$ that recovers p by marginalization (see footnote 13) [7]. That is, by marginalizing f over any measurement i, we obtain the original probabilistic model for the context X ∖ {i}. Even though this function is not unique, we can write $I_n^{k,S,s,o}$ in terms of f uniquely. In particular, we have

$$ I_n^{k,S,s,o} = f(\bar{s}, o) + f(s, o), \qquad (15) $$

where $\bar{s}$ assigns to every measurement in S the outcome opposite to the one assigned by s. Notice that one can choose f to be a probability distribution over $O^X$ if and only if p is a classical probabilistic model.
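The step from (13) to (15) is an inclusion–exclusion identity, and it can be checked numerically. The sketch below assumes that (13) is the alternating sum $\sum_{j=0}^{k-1} (-1)^j \sum_{T \subseteq S, |T| = j} p(s|_T, o)$ and that (15) reads $f(\bar{s}, o) + f(s, o)$; it draws a random normalized (possibly negative) f for n = 4 and verifies the identity for every odd k. It also adds a multiple of the parity function $(-1)^{|x|}$, which leaves every marginal on n−1 measurements unchanged, to illustrate why the non-uniqueness of f does not affect $I_n^{k,S,s,o}$ when k is odd. All function names are hypothetical.

```python
import random
from itertools import combinations, product

def marginal(f, fixed):
    """Sum of f over all joint outcomes x that agree with the partial assignment
    `fixed` (a dict: measurement index -> outcome)."""
    return sum(v for x, v in f.items() if all(x[i] == a for i, a in fixed.items()))

def rhs_13(f, S, s, o):
    """Assumed form of (13): sum_{j=0}^{k-1} (-1)^j sum_{T subset of S, |T| = j} p(s|_T, o).
    Every term marginalizes f over at least one measurement, so it is part of p."""
    total = 0.0
    for j in range(len(S)):                      # j = 0, ..., k - 1
        for T in combinations(S, j):
            fixed = dict(o)                      # outcomes o on X \ S
            fixed.update({i: s[i] for i in T})   # outcomes s restricted to T
            total += (-1) ** j * marginal(f, fixed)
    return total

def identity_holds(f, n, tol=1e-9):
    """Check rhs_13 == f(s, o) + f(s_bar, o) for every odd k, every S with |S| = k,
    and all outcome assignments s on S and o on X \\ S."""
    for k in range(1, n + 1, 2):
        for S in combinations(range(n), k):
            rest = [i for i in range(n) if i not in S]
            for s_vals in product((0, 1), repeat=k):
                for o_vals in product((0, 1), repeat=n - k):
                    s, o = dict(zip(S, s_vals)), dict(zip(rest, o_vals))
                    x = tuple(s[i] if i in s else o[i] for i in range(n))
                    x_bar = tuple(1 - s[i] if i in s else o[i] for i in range(n))
                    if abs(rhs_13(f, S, s, o) - (f[x] + f[x_bar])) > tol:
                        return False
    return True

n = 4
rng = random.Random(0)
raw = {x: rng.uniform(-1.0, 1.0) for x in product((0, 1), repeat=n)}
shift = (1.0 - sum(raw.values())) / 2 ** n
f = {x: v + shift for x, v in raw.items()}            # quasi-distribution summing to 1
# Adding a multiple of the parity function leaves every (n-1)-marginal unchanged.
g = {x: v + 0.3 * (-1) ** sum(x) for x, v in f.items()}
print(identity_holds(f, n), identity_holds(g, n))      # True True
```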

Example 1.
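As an example, consider the 3-Specker scenario, also known as Specker's triangle. The two sets of inequalities that give rise to $I_3$ are

$$ I^1_3:\quad p(a_i, a_j) \geq 0 \quad \text{for all } i \neq j, $$
$$ I^3_3:\quad 1 - p(a_1) - p(a_2) - p(a_3) + p(a_1, a_2) + p(a_1, a_3) + p(a_2, a_3) \geq 0. $$

The first ones are just the positivity constraints, which are always CE inequalities. Likewise, inequalities in $I^3_3$ can be written in a form that is manifestly CE, as

$$ p(a_1, \bar{a}_2) + p(a_2, \bar{a}_3) + p(a_3, \bar{a}_1) \leq 1, $$

where $\bar{a}_i := a_i + 1 \pmod 2$. A generalization of the fact that $I_3$ consists of CE inequalities to all $I_n$ is the content of the next lemma.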
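As a quick check of the Specker-triangle case, the sketch below evaluates the k = 3 inequality, assumed here to take the CE form $1 - p(a_1, \bar{a}_2) - p(a_2, \bar{a}_3) - p(a_3, \bar{a}_1) \geq 0$, on all eight deterministic vertices of C(H) and on a hypothetical perfectly anticorrelated no-disturbance model. The three events are pairwise exclusive, so any model in CE(H) satisfies the bound; the anticorrelated model violates it and therefore lies outside CE(H). The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

CONTEXTS = [(1, 2), (1, 3), (2, 3)]

def flip(b):
    return 1 - b

def triangle_value(p, a):
    """Assumed CE form of the k = 3 inequality in I_3:
    1 - p(a1, ~a2 | 12) - p(a2, ~a3 | 23) - p(a3, ~a1 | 13) >= 0.
    `p` maps a context (i, j), i < j, to {(outcome_i, outcome_j): probability}."""
    a1, a2, a3 = a
    return (1
            - p[(1, 2)][(a1, flip(a2))]
            - p[(2, 3)][(a2, flip(a3))]
            - p[(1, 3)][(flip(a1), a3)])   # key order is (outcome of A1, outcome of A3)

def deterministic(x):
    """Vertex of C(H): the joint outcome x = (x1, x2, x3) occurs with certainty."""
    return {(i, j): {(u, v): Fraction(int((u, v) == (x[i - 1], x[j - 1])))
                     for u, v in product((0, 1), repeat=2)}
            for (i, j) in CONTEXTS}

HALF = Fraction(1, 2)
anticorrelated = {ctx: {(u, v): (HALF if u != v else Fraction(0))
                        for u, v in product((0, 1), repeat=2)}
                  for ctx in CONTEXTS}

a = (0, 0, 0)
values = [triangle_value(deterministic(x), a) for x in product((0, 1), repeat=3)]
print(values.count(0), values.count(1))   # 6 2  (saturated by 2^3 - 2 vertices)
print(triangle_value(anticorrelated, a))  # -1/2  (violated)
```

The two non-saturating vertices are x = a and x = ā, in line with the form $f(\bar{s}, o) + f(s, o)$ of the inequality in terms of f.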
Lemma 3.1. Every element of $I_n$ defined as above is a CE inequality.
Proof. We prove this result by induction on n. However, in order to perform the inductive step, we need the induction hypothesis to be slightly stronger than just the statement of lemma 3.1 for a particular value of n. of generalized events (which, by a slight abuse of notation, we refer to simply as events), such that the following properties hold:

13 The marginal of f over Ω ⊆ X is computed by summing the values of f over $O^{\Omega}$, which yields a function $O^{X \setminus \Omega} \to \mathbb{R}$.

Induction hypothesis ($\mathrm{IH}_n$)
the fact that the set of events being summed over is precisely $D^{s_m}_m$. Moreover, we use property (b) for $D^{s_{m-2}}_{m-2}$, which is a part of $\mathrm{IH}_{m-2}$. Finally, equation (29) replaces f with p. This is possible because $D^{s_m}_m$ satisfies property (c), which is justified in the following step.

IS(e) Direct inspection shows that every event in $D^{s_m}_m$ comprises a measurement that assigns a different outcome to $s_m$, and one that assigns a different outcome to $\bar{s}_m$. Hence, $D^{s_m}_m$ satisfies property (e) as well.

14 Note that when k is equal to n in $I_n^{k,S,s,o}$, S has to be X itself and o contains no outcomes.

IS(c) Since
Thus the proof that $\mathrm{IH}_{m-2}$ and $\mathrm{IH}_{m-1}$ imply $\mathrm{IH}_m$ is complete. As a consequence, the induction hypothesis holds for all integers greater than 1, which proves lemma 3.1.
An immediate corollary of lemma 3.1 is the inclusion CE(H) ⊆ P. The rest of the proof of theorem 3 consists of demonstrating P = C(H).

Lemma 3.2. The polytopes P and C(H) coincide.

Proof. The proof is divided into three parts, each corresponding to one of the claims below. Firstly, we establish the dimension of the polytopes, showing that it is the same for both. Next, we show that every facet of P contains a facet of C(H). Finally, we prove that C(H) does not have any facets other than those contained within facets of P. These facts together imply P = C(H).

Notice that every element of the probabilistic model p ∈ G(H) can be written as a linear combination of terms that refer to outcome 0 only. There are $\sum_{j=1}^{n-1} \binom{n}{j} = 2^n - 2$ such terms, which gives an upper bound on the dimensions of G(H), P and C(H).
One simple consequence of lemma 3.1 is that C(H) is a subset of P. Therefore, it suffices to show dim(C(H)) = $2^n - 2$. Note that the dimension of the classical polytope in an n-Specker scenario is precisely one less than the number of degrees of freedom in a joint probability distribution of n binary-outcome measurements, i.e. one less than $2^n - 1$ [38]. Hence, the classical polytope has dimension $2^n - 2$, as required.

In general, given a polytope R, a valid linear inequality A ≥ 0 corresponds to a facet of R if and only if dim(R) affinely independent vertices of R lie within the hyperplane A = 0. Moreover, if the above condition is satisfied, the facet of R lies within the hyperplane A = 0. Now, to every facet of P, we can associate an inequality from $I_n$. Furthermore, each inequality in $I_n$ is saturated by a set of exactly $2^n - 2$ vertices of C(H), and no two of these sets are the same. This fact can be seen easily by considering the form of inequalities in equation (15) in terms of f. These $2^n - 2$ vertices are affinely independent, since the only affine dependence among the $2^n$ vertices of C(H) involves all of them with nonzero coefficients. Since $2^n - 2$ is also the dimension of C(H) by claim 3.2.1, each inequality in $I_n$ is associated with a different facet of C(H). Therefore, every facet of P contains some facet of C(H), as C(H) ⊆ P. This relation between C(H) and P is depicted in figure 5. Moreover, every inequality in $I_n$ corresponds to a different facet of P. The number of facets of P is thus equal to the number $|I_n|$ of distinct inequalities in $I_n$, which is

$$ |I_n| = \frac{1}{2} \sum_{k \in \Omega} \binom{n}{k} \, 2^{k} \, 2^{n-k} = 2^{2n-2}, $$

where Ω denotes the set of odd integers between 1 and n, and the factor of 1/2 takes into account the fact that $I_n^{k,S,s,o} = I_n^{k,S,\bar{s},o}$, so that the corresponding inequality is counted only once.

Figure 5: Two polytopes, R (gray) and R′ (blue). R′ is contained within R, and each facet of R contains one of R′. This schematically depicts the relation between the classical polytope C(H) (which plays the role of R′) and the polytope P (which plays the role of R) proven in Claim 3.2.2.

The number of vertices of C(H) is $N := |O^n| = 2^n$.
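The counting in this step can be reproduced by brute force for small n. The sketch below assumes the form (15) of the inequalities, so that $I_n^{k,S,s,o}$ is nonzero on exactly the two deterministic vertices $(s, o)$ and $(\bar{s}, o)$. It collects these two-element sets for all odd k, checks that exactly $2^n - 2$ vertices saturate each inequality, that the identification $s \leftrightarrow \bar{s}$ is the only coincidence (giving $2^{2n-2}$ distinct sets), and that the dimension bound $\sum_{j=1}^{n-1} \binom{n}{j} = 2^n - 2$ holds. The function name is hypothetical.

```python
from itertools import combinations, product
from math import comb

def saturation_complements(n):
    """For each inequality I_n^{k,S,s,o} with k odd, collect the set of deterministic
    vertices of C(H) on which it does NOT vanish. Assuming (15),
    I = f(s_bar, o) + f(s, o), this set is {(s, o), (s_bar, o)}; the remaining
    2^n - 2 vertices saturate the inequality."""
    complements = set()
    for k in range(1, n + 1, 2):
        for S in combinations(range(n), k):
            for x in product((0, 1), repeat=n):          # x plays the role of (s, o)
                x_bar = tuple(1 - b if i in S else b for i, b in enumerate(x))
                complements.add(frozenset({x, x_bar}))   # s <-> s_bar counted once
    return complements

for n in range(3, 7):
    sets = saturation_complements(n)
    assert sum(comb(n, j) for j in range(1, n)) == 2 ** n - 2  # dimension bound
    assert all(len(c) == 2 for c in sets)      # 2^n - 2 vertices saturate each one
    assert len(sets) == 2 ** (2 * n - 2)       # |I_n|, no further coincidences
    print(n, len(sets))                        # 3 16, 4 64, 5 256, 6 1024
```

For n = 3 this gives 12 positivity constraints from $I^1_3$ and 4 inequalities from $I^3_3$, i.e. 16 in total.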
These correspond to the deterministic assignments, each of which gives probability 1 to a single joint outcome of the n measurements. As a matter of fact, no polytope with $2^n$ vertices and dimension $2^n - 2$ has more than $2^{2n-2}$ facets, which happens to be the number of facets of P.
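This bound can be checked numerically with the standard closed form for the number of facets of an even-dimensional cyclic polytope, $\#_{d-1}(C(2m, N)) = \frac{N}{N-m}\binom{N-m}{m}$, which by McMullen's upper bound theorem is the maximum over all d-polytopes with N vertices. The sketch below is an independent sanity check under that standard formula, not the paper's own derivation; `cyclic_facets_even` is a hypothetical helper name.

```python
from math import comb

def cyclic_facets_even(d, N):
    """Number of facets of the cyclic polytope C(d, N) for even d = 2m and N > d,
    via the closed form N / (N - m) * binom(N - m, m), in exact integer arithmetic."""
    if d % 2 or N <= d:
        raise ValueError("need even d and N > d")
    m = d // 2
    count, remainder = divmod(N * comb(N - m, m), N - m)
    assert remainder == 0  # the closed form always yields an integer
    return count

# Familiar cases: a heptagon has 7 edges, the 4-simplex has 5 facets.
print(cyclic_facets_even(2, 7), cyclic_facets_even(4, 5))   # 7 5

# Dimension d = 2^n - 2 with N = 2^n vertices, as for C(H) in an n-Specker scenario.
for n in range(3, 9):
    bound = cyclic_facets_even(2 ** n - 2, 2 ** n)
    print(n, bound, bound == 2 ** (2 * n - 2))              # ... True for every n
```

With $m = 2^{n-1} - 1$ the formula reduces to $\frac{2^n}{2^{n-1}+1}\binom{2^{n-1}+1}{2} = 2^{2n-2}$, so the upper bound is attained exactly by P.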

In greater generality, McMullen [39] showed that any convex polytope with a given dimension d and number of vertices N has at most as many faces of each dimension as the cyclic polytope C(d, N). Let us denote the number of facets of C(d, N) by $\#_{d-1}(C(d, N))$. Theorem 4 by Gale [40] asserts that for even dimensions, d = 2m, this number satisfies