General Probabilistic Theories with a Gleason-type Theorem

Gleason-type theorems for quantum theory allow one to recover the quantum state space by assuming that (i) states consistently assign probabilities to measurement outcomes and that (ii) there is a unique state for every such assignment. We identify the class of general probabilistic theories which also admit Gleason-type theorems. It contains theories satisfying the no-restriction hypothesis as well as others which can simulate such an unrestricted theory arbitrarily well when allowing for post-selection on measurement outcomes. Our result also implies that the standard no-restriction hypothesis applied to effects is not equivalent to the dual no-restriction hypothesis applied to states, which turns out to be less restrictive.


Introduction
More than sixty years ago, Mackey [1] asked whether the density operator represents the most general notion of a quantum state that is consistent with the standard description of observables as self-adjoint operators. Gleason [2] responded with a proof that-in separable Hilbert spaces of dimension greater than two-every state must admit an expression in terms of a density operator if it is to consistently assign probabilities to the measurement outcomes of such observables. In 2003, Busch [3] (and then Caves et al. [4]) generalized the idea of Gleason's theorem to observables represented by positive-operator measures (POMs). The resulting Gleason-type theorem (GTT) is not only much simpler to prove but also applies to two-dimensional Hilbert spaces, since the assumptions made are stronger than in Gleason's case.
In this paper, we investigate whether the Gleason-type theorem is special to quantum theory. Imagine that a theory different from quantum theory were to successfully describe Nature. Would a GTT still exist?
Our question is made explicit by posing it within the family of general probabilistic theories (GPTs) which have emerged as natural generalizations of quantum theory [5][6][7][8][9]. The framework of GPTs derives from operational principles and it encompasses both quantum and classical models. One of the motivations to explore these alternative theories has been to identify features which single out quantum theory among others of comparable structure. Our study contributes to that fundamental quest.
Effectively, Gleason and Busch establish a bijection between frame functions and density operators in quantum theory. Frame functions associate probabilities to the mathematical objects representing the possible outcomes of measurements in such a way that the probabilities assigned to all disjoint outcomes of a given measurement sum to unity. The rationale behind a frame function is that the probabilities of all measurement outcomes for all observables should define a unique state. If this were not the case, then two "different" states would be indistinguishable, both practically and theoretically.
Our strategy will be to generalise the concept of frame functions to GPTs in order to investigate whether they are in exact correspondence with the objects that represent states in these theories. We are able to identify all general probabilistic theories in which this correspondence continues to hold. We find that GTTs exist for GPTs satisfying the no-restriction hypothesis [10,11], as anyone familiar with the proof of Busch's result or the work of Gudder et al. [12] might expect. However, we also find other GPTs which admit a Gleason-type theorem, namely those that satisfy a "noisy version" of the no-restriction hypothesis (or come arbitrarily close to satisfying it). An alternative way to characterise this class of GPTs is to use the idea of simulation of measurements via classical processes and post-selection [13]. Any GPT in this class can simulate an arbitrarily good approximation to any observable in a related unrestricted GPT (i.e. the GPT satisfying the no-restriction hypothesis).
The existence of a GTT for GPTs such as quantum theory or real-vector-space quantum theory [14][15][16][17] has a number of consequences. It becomes possible, for example, to modify the axiomatic structure of the theories as it is no longer necessary to-separately and independently-stipulate both the state space and the observables of the theory. Our result can also be used to derive the standard GPT framework from operational assumptions different to those found in the literature. More specifically, the standard GPT framework is recovered if-after motivating the standard description of observables in GPTs-states are assumed to correspond to frame functions of these observables.
Additionally, our result also uncovers a new property of the no-restriction hypothesis in GPTs. The standard no-restriction hypothesis applied to effects turns out to be inequivalent to the dual assumption-the no-state-restriction hypothesis-applied to states. Consequently, deriving the GPT framework in two equivalent ways, starting from the structure of state spaces in one case or effect spaces in the other, leads to inequivalent frameworks if the respective no-restriction hypotheses are assumed.
To make the paper self-contained and to introduce the notation, we will first review concepts of the GPT framework relevant here. In Section 4, we define frame functions for GPTs and prove our main theorem. In Section 5, we provide three examples to demonstrate the simplification of the postulates required to specify an individual GPT. Section 6 strengthens our main theorem by defining frame functions only on a proper subset of all observables, the analog of projective-simulable observables. The stronger result leads to an alternative operational motivation for deriving part of the GPT framework. In Section 7 we summarize and discuss our results.

General probabilistic theories
The GPT framework allows one to define a broad family of theories of which quantum theory (in finite dimensional Hilbert spaces) is a member. Any (real or fictitious) system described by a GPT has the following fundamental property: there exists a finite set of fiducial measurement outcomes, the probabilities of which uniquely determine its state.¹ For example, the state of a spin-1/2 particle is determined by the probabilities of the +1 outcome of measuring spin observables in three orthogonal directions, as demonstrated by the Bloch vector description.
There are many different yet equivalent ways to formulate the GPT framework. To make this paper self-contained, let us briefly outline an intuitive approach to GPTs which is based on an operational derivation [7].

States
If a system has a minimal fiducial set consisting of d outcomes², its state space S is given by a convex, compact set of vectors of the form

ω = (p_1, . . . , p_d, 1)^T , (1)

where p_k ∈ [0, 1], k = 1 . . . d, are the probabilities of the fiducial outcomes. The extra dimension of the "ambient" vector space simplifies the description of measurement outcomes, as explained below. The convexity of the state space follows from the assumption that if one were to prepare the system in the states ω and ω′ = (p′_1, . . . , p′_d, 1)^T with probabilities λ and (1 − λ), respectively, then the probability of observing the k-th fiducial measurement outcome should equal λp_k + (1 − λ)p′_k; therefore, this mixed state should be represented by the vector λω + (1 − λ)ω′. A state ω is extremal if it cannot be written as a (non-trivial) convex combination of other states. The state space is assumed to be compact since, firstly, it must be bounded if the entries of the vector are to be between zero and one. Secondly, as an arbitrarily good approximation of a state would be operationally indistinguishable from the state itself, we also assume the state space is closed in the Euclidean topology. Throughout the manuscript we will use the Euclidean topology and the Euclidean norm ‖·‖.
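The mixing rule above can be sketched in a few lines of Python (the function name and the sample states are our illustration, not the paper's):

```python
# Sketch: GPT states as vectors (p_1, ..., p_d, 1); convex mixing is component-wise.
def mix(omega, omega_prime, lam):
    """Convex combination lam*omega + (1 - lam)*omega_prime of two state vectors."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(omega, omega_prime))

# Two states of a hypothetical d = 2 system and an equal mixture of them.
omega = (0.2, 0.9, 1.0)
omega_prime = (0.6, 0.1, 1.0)
mixed = mix(omega, omega_prime, 0.5)
print(mixed)  # approximately (0.4, 0.5, 1.0): entries stay in [0, 1], last stays 1
```

Note that the last component remains 1 under mixing, which is exactly why the "ambient" dimension is convenient: states stay on the affine plane u · ω = 1.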
As an example, consider a classical bit which may reside in one of two states called "0" and "1", or in a mixture of the two. If we know that the bit is in state 0 with probability p then it is in state 1 with probability (1 − p); in other words, the number p ∈ [0, 1] determines the state of the system. When performing the measurement which asks "Is the bit in state 0 or 1?", the outcome "The bit is in state 0." forms a complete set of fiducial measurement outcomes. Thus, the state space S_b of the bit can be represented by the line segment between (0, 1)^T and (1, 1)^T, as displayed in Fig. 1a (see Section 2.2). The end points of the segment correspond to the states 0 and 1, respectively, and their convex hull defines the state space S_b.

¹ To encompass quantum theory in toto, the restriction to a finite set of fiducial measurement outcomes would need to be relaxed; see Nuida et al. [18] and Lami et al. [19], for example.
² A fiducial set is minimal if there is no such set with fewer than d outcomes.

Effects and observables
The possible outcomes of measuring an observable in a GPT system with state space S correspond to effects which are linear maps e : R^{d+1} → R such that 0 ≤ e(ω) ≤ 1 for all states ω ∈ S; here e(ω) denotes the probability of observing the outcome e when a measurement M (with e as a possible outcome) is performed on a system in state ω. Due to the linearity of the map e, any effect can be uniquely expressed in the form

e(ω) = e · ω

for some vector e ∈ R^{d+1}. We will also use the term "effect" to refer to the vector e representing a map e. The linearity of effects is motivated by the assumption that they should respect the mixing of states with some parameter λ ∈ [0, 1]. More specifically, the following two events should occur with the same probability: (i) observing the outcome e of a measurement M performed on a system in a mixed state ω(λ) = λω + (1 − λ)ω′; (ii) observing the outcome e when the measurement M is performed with probability λ on a system in state ω and with probability (1 − λ) on a system prepared in state ω′.
This assumption implies that the map e should satisfy

e(λω + (1 − λ)ω′) = λe(ω) + (1 − λ)e(ω′) .

Thus, the map e is an affine function on the state space S which can be extended to a linear function on the vector space R^{d+1} containing S. The set of all effects associated with measurement outcomes in a specific GPT system is known as its effect space, E. The space E corresponds to a convex subset of R^{d+1}, as does the state space S. It necessarily contains the zero effect 0 and the unit effect u, as well as the vector (u − e) for every e ∈ E [11], which arises automatically as a valid effect. We also assume that the effect space spans the full (d + 1) dimensions of the vector space; otherwise the model would contain states which result in identical probabilities for all effects in the effect space, making them indistinguishable and hence operationally equivalent. Note that a d-dimensional state space comes with a (d + 1)-dimensional effect space. Extremal effects are defined by the property that they cannot be written as a (non-trivial) convex combination of other effects.
Observables are given by tuples ⟨e_1, e_2, . . .⟩ of elements of the effect space that sum to the unit effect u, with each effect in the tuple corresponding to a different possible outcome when measuring the observable. The position of an effect in the tuple encodes the label of the corresponding outcome. Given the observable D_e = ⟨e, u − e⟩, for example, we will say effect e represents the first possible outcome of measuring D_e since e occupies the first position in the tuple. A GPT should also specify which tuples of effects correspond to observables (or, in the language of [20], the GPT should specify the set of meters). We will assume throughout (except in Section 6) that any finite tuple of effects ⟨e_1, . . . , e_n⟩ satisfying

Σ_{j=1}^{n} e_j = u (7)

and

Σ_{j∈J} e_j ∈ E (8)

for any subset J ⊂ {1, . . . , n}, corresponds to an observable of the GPT system³. Eq. (8) ensures that the set of observables is closed under coarse-graining of outcomes.

The effect space E_b of the classical bit with state space S_b is given by the parallelogram depicted in Fig. 1a. The two-outcome measurement B answering "Is the bit in state 0 or 1?" is represented by the tuple B = ⟨e_0, e_1⟩ with e_0 = (−1, 1)^T and e_1 = u − e_0 = (1, 0)^T, which assign unit probability to the states 0 and 1, respectively.
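A minimal numerical sketch of the bit observable B (effect vectors and the state are our illustrative choices, using the convention that the state (p, 1)^T assigns probability p to the outcome represented by e_1):

```python
# Sketch: the two-outcome bit observable B = <e0, e1> with e0 + e1 = u.
def prob(effect, state):
    """Probability e(omega) = e . omega of observing `effect` on `state`."""
    return sum(a * b for a, b in zip(effect, state))

u = (0.0, 1.0)        # unit effect: u . (p, 1) = 1 for every state
e0 = (-1.0, 1.0)      # one outcome, probability 1 - p
e1 = (1.0, 0.0)       # the other outcome, probability p
state = (0.25, 1.0)   # a bit state with p = 0.25

assert tuple(a + b for a, b in zip(e0, e1)) == u  # effects sum to the unit effect
print(prob(e0, state), prob(e1, state))           # prints 0.75 0.25, summing to 1
```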

Equivalent GPTs
When considering a specific GPT it is sometimes useful to linearly transform its state and effect spaces. The Bloch vector describing a qubit state is a case in point since its components do not necessarily take values in the range [0, 1]. The Bloch vector representation of a qubit density operator is given by the vector (x, y, z, 1)^T, with the fourth component being the coefficient of the identity matrix. The linear relation between each of the coefficients r ∈ {x, y, z} and the probability p_r of finding the outcome "+1" when measuring σ_r reads explicitly p_r = (1 + r)/2.

Any linear transformation which preserves the inner product between states and effects of a given GPT system gives rise to an alternative representation. Suppose that we transform the state space S by an invertible (d + 1) × (d + 1) matrix M to the space S_M ≡ MS. Then we must apply the inverse transpose transformation M^{−T} ≡ (M^{−1})^T to the effect space, E_M ≡ M^{−T}E, in order that the probabilities remain invariant,

(M^{−T}e) · (Mω) = e · ω .

The transformed state and effect spaces continue to be convex subsets of R^{d+1}, and they can also be thought of as convex subsets of a vector space isomorphic to R^{d+1}. GPTs are often presented in this way (cf. [11,21] and references therein).

The standard formulation of quantum theory in finite dimensions is an example of representing the state and effect spaces of a theory as subsets of a vector space isomorphic to R^{d+1}. Quantum states are represented by density operators on C^d which form a convex subset of the real vector space of Hermitian operators on C^d, which is isomorphic to R^{d²}. Quantum effects, or elements of a POM, can also be embedded in this space with e(ω) = Tr(eω) for an operator e satisfying 0 ≤ ⟨ψ|e|ψ⟩ ≤ ⟨ψ|ψ⟩ for all rays |ψ⟩ ∈ C^d.
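The invariance of probabilities under such a change of representation can be checked numerically; here is a 2×2 sketch (the matrix M, its hand-computed inverse transpose, and the vectors are our illustrative choices):

```python
# Sketch: probabilities are invariant under S -> M S, E -> M^{-T} E.
def matvec(M, v):
    """Matrix-vector product for a matrix given as nested lists."""
    return tuple(sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

M = [[2.0, 0.0], [1.0, 1.0]]          # an invertible change of representation
M_inv_T = [[0.5, -0.5], [0.0, 1.0]]   # (M^{-1})^T, computed by hand for this M

omega = (0.25, 1.0)                   # a bit state
e = (1.0, 0.0)                        # a bit effect

lhs = dot(matvec(M_inv_T, e), matvec(M, omega))
assert abs(lhs - dot(e, omega)) < 1e-12   # same probability in both representations
```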
Using this representation of the state and effect spaces is convenient for d > 2 since it is cumbersome to explicitly describe the set of density operators by some set of constraints on vectors of the form given in Eq. (1) (see e.g. [22][23][24]).
As an explicit example, let us transform the GPT description of a classical bit with state space S_b by an invertible matrix M. The new state space, S_B ≡ MS_b, is the convex hull of the images of the extremal states 0 and 1 (previously located at (0, 1)^T and (1, 1)^T, respectively).
Similarly, the effect space, E B ≡ M −T E b , is given by the convex hull of the zero effect 0, the unit effect u and two other extremal effects, as pictured in Figure 1b.

Cones in GPTs
The notion of a positive cone is useful when studying the properties of state and effect spaces of a GPT. A positive cone is a subset of R d+1 that contains all non-negative linear combinations of its elements (see [25], in which our positive cone corresponds to a cone containing the origin). Positive cones may, for example, be generated from convex subsets of real vector spaces.

Definition 1.
The positive cone A_+ of a convex subset A of a real vector space is the set of vectors

A_+ = {xa | a ∈ A, x ≥ 0} .

Positive cones also arise from considering the space dual to a subset of vectors in an inner product space.

Definition 2. The dual cone A* of a subset A ⊆ R^{d+1} is the set of vectors

A* = {b ∈ R^{d+1} | a · b ≥ 0 for all a ∈ A} .
Figure 2a illustrates, for a classical bit, the dual cone S*_B of the state space S_B. It is easy to see that, in general, the effect space E of a GPT system must be contained within the dual cone S* of the state space in order that the effects assign non-negative probabilities to every state in the state space.
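As a numerical sanity check (using the bit effects and extremal states from the example above; all vectors are illustrative), non-negativity on the extremal states suffices because it extends to their convex hull:

```python
# Sketch: every bit effect lies in the dual cone S* of the state space,
# i.e. it assigns non-negative values to all states.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

extremal_states = [(0.0, 1.0), (1.0, 1.0)]                   # states 0 and 1
effects = [(0.0, 0.0), (0.0, 1.0), (-1.0, 1.0), (1.0, 0.0)]  # 0, u, e0, e1

# Non-negativity on extremal states implies non-negativity on all mixtures.
assert all(dot(e, w) >= 0 for e in effects for w in extremal_states)
```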
The following lemma describes a simple but important property of effect spaces related to the fact that the elements of its dual cone effectively span the ambient space.

Lemma 1. For any effect space E and any vector c ∈ R^{d+1} there exist vectors a, b ∈ E_+ such that c = a − b.

Proof. Firstly, the interior of E_+ is non-empty since E is convex and spans R^{d+1}. Let e be an interior point of E_+. As e is an interior point of E_+, we have that e + εc ∈ E_+ for some ε > 0 and we may take a = (e + εc)/ε and b = e/ε.
Two further lemmata (proven in Appendix A), which we will need later on, establish relations between positive cones and their dual cones. We will denote the closure of a set A by cl(A).

Lemma 2. Let A be a compact, convex subset of R^{d+1}; then A** = cl(A_+).

Lemma 3. For a compact and convex subset
Finally, it will also be important that the positive cone of a GPT state space is always a closed set (also proven in Appendix A).

Unrestricted GPTs
The main result of this paper is to establish a Gleason-type theorem for a class of GPTs which we will introduce in this section, namely almost noisy unrestricted GPTs. First, we describe the no-restriction hypothesis and the unrestricted GPTs it defines. Next, we define noisy unrestricted (NU) GPTs and, building on this concept, we define almost NU GPTs.

The no-restriction hypothesis
A particularly close relationship between state and effect spaces exists in GPTs that satisfy the no-restriction hypothesis [11], i.e. GPTs with effect spaces consisting of all linear maps e : R^{d+1} → R such that 0 ≤ e(ω) ≤ 1 for all ω ∈ S. In such an unrestricted theory the state space defines a unique effect space, and vice versa. The effect space of a system with state space S in an unrestricted GPT is given by

E(S) = S* ∩ (u − S*) ,

where u − S* = {u − e | e ∈ S*}. The classical bit is an example of an unrestricted GPT system. The cones S* and (u − S*) as well as their intersection are illustrated in Figure 2b. Conversely, if an unrestricted GPT system has an effect space E then a unique state space is associated with it, namely

W(E) = E* ∩ 1_{d+1} ,

where 1_{d+1} = {ω ∈ R^{d+1} | u · ω = 1}; we will omit the subscript d + 1 whenever the dimension is clear from the context. We have introduced the maps E and W in the context of unrestricted GPTs but they are well-defined for the state and effect spaces of any GPT. The maps will play an important role in the derivation of our main result (see Section 4).
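Membership in the unrestricted effect space E(S) = S* ∩ (u − S*) can be tested numerically for the bit: a vector e is a valid effect exactly when 0 ≤ e · ω ≤ 1 for all states, which it suffices to check on the extremal states (the helper name and sample vectors are our illustration):

```python
# Sketch: membership test for the unrestricted bit effect space E(S).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def in_unrestricted_effect_space(e, extremal_states):
    """True iff 0 <= e . omega <= 1 on every extremal state (hence on all states)."""
    return all(0.0 <= dot(e, w) <= 1.0 for w in extremal_states)

bit_states = [(0.0, 1.0), (1.0, 1.0)]                        # extremal bit states
print(in_unrestricted_effect_space((0.5, 0.25), bit_states))  # a valid effect
print(in_unrestricted_effect_space((2.0, 0.0), bit_states))   # not an effect
```

Checking 0 ≤ e · ω ≤ 1 simultaneously verifies e ∈ S* and u − e ∈ S*, since (u − e) · ω = 1 − e · ω.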

Noisy unrestricted GPTs
The class of NU GPTs consists of all unrestricted GPTs along with a special subset of restricted GPTs. The included restricted GPTs are those that can be thought of as unrestricted GPTs in which some (or all) of the observables can only be measured with a limited efficiency, or with some inherent noise.

Definition 3.
A GPT system is noisy unrestricted whenever its state space S and effect space E satisfy the following property: for every vector e ∈ E (S) there exists a number p e ∈ (0, 1] such that the rescaled vector p e e is contained in the effect space E.
It follows that NU GPT systems are exactly those in which the positive cones of the effect space E and the unrestricted effect space E(S) are equal. Moreover, Definition 3 is equivalent to the statement that each NU GPT system is closely related to an unrestricted GPT system in the following way: for each observable O = ⟨e_1, e_2, . . . , e_n⟩ in the unrestricted GPT system, there exists some p ∈ (0, 1] such that

O_p = ⟨pe_1, pe_2, . . . , pe_n, (1 − p)u⟩

is an observable in the NU GPT system, while the state spaces of the two systems are given by the same set. Thus, measuring the observable O_p of the NU GPT can be thought of as successfully measuring the observable O (of the associated unrestricted GPT) with probability p, and observing no outcome with probability (1 − p), regardless of the state of the system. In the language of [13], a measurement of O_p can simulate a measurement of O when one allows for post-selection on the outcomes. The case of p_e = 1 for all vectors e ∈ E(S) is included in Definition 3 for later convenience; in other words, "noiseless" unrestricted GPTs-i.e. those in which E = E(S) holds-are also considered to be NU GPTs. All other NU GPTs, however, are restricted, i.e. they violate the no-restriction hypothesis.

Figure 3 shows two modified versions of the bit GPT system that violate the no-restriction hypothesis, one of which is a NU GPT system while the other is not. Further examples of the three different varieties of GPT system-restricted, unrestricted and noisy unrestricted-can be found in Section 5.
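A noisy observable of this form is easy to construct numerically; the sketch below (helper name and bit vectors are our illustration) rescales each outcome by p and appends the "no outcome" effect (1 − p)u:

```python
# Sketch: build the noisy version O_p of an observable O = <e_1, ..., e_n>.
def noisy_observable(O, p, u):
    """Rescale each effect by p and append the no-outcome effect (1 - p) u."""
    outcomes = [tuple(p * x for x in e) for e in O]
    outcomes.append(tuple((1 - p) * x for x in u))
    return outcomes

u = (0.0, 1.0)
B = [(-1.0, 1.0), (1.0, 0.0)]         # the two-outcome bit observable
B_half = noisy_observable(B, 0.5, u)  # measure B successfully half of the time

# The noisy outcomes still sum to the unit effect u:
total = tuple(map(sum, zip(*B_half)))
assert total == u
```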

Almost noisy unrestricted GPTs
We now introduce almost NU GPTs as a relaxation of the class of NU GPTs. Like a NU GPT, a given almost NU GPT will be related to a unique unrestricted GPT. However, given an observable O of the unrestricted GPT, it will be sufficient for the almost NU GPT to include a noisy measurement of an observable arbitrarily close to O. In NU GPTs, there is a noisy measurement for each such observable, O, so the requirement is met. In a non-NU GPT it can only be met if the GPT has extremal effects arbitrarily close to the zero effect. For example, consider the almost NU bit depicted in Fig. 4. Every point on the boundary of the effect space is extremal and as the non-zero extremal effects approach zero they also approach the boundary of (E_B)_+, the positive cone of the unrestricted bit effect space.
In general an almost NU GPT is defined as follows.

Definition 4.
A GPT system is almost noisy unrestricted (aNU) whenever its state space S and effect space E satisfy the following condition: for every vector e ∈ E(S) and every ε > 0 there exists a vector e′ ∈ E(S) and a number p_{e′} ∈ (0, 1] such that ‖e − e′‖ < ε and the rescaled vector p_{e′}e′ is contained in the effect space E.

Alternatively, we can characterise aNU GPTs as those that can simulate any observable in their unrestricted counterpart arbitrarily well via post-selection. Explicitly, for every observable O = ⟨e_1, e_2, . . . , e_n⟩ in the unrestricted GPT system and every ε > 0, there exists a second observable (the approximation) O′ = ⟨e′_1, e′_2, . . . , e′_n⟩ of the unrestricted GPT system satisfying ‖e_j − e′_j‖ ≤ ε for all 1 ≤ j ≤ n, and also a probability p > 0 such that

O″ = ⟨pe′_1, pe′_2, . . . , pe′_n, (1 − p)u⟩

is an observable of the aNU GPT system. The fact that Def. 4 is a necessary condition for a GPT to have this property is clear. To show that it is also sufficient we need the following alternative characterisation of aNU GPTs which will also be key to proving our main result.

Lemma 5.
In an almost NU GPT the state space S and effect space E of each system are related by

E(S) = cl(E_+) ∩ (u − cl(E_+)) .

The proof can be found in Appendix A. Note that, as shown in the proof of Lemma 5, in an aNU GPT we find cl(E_+) = E(S)_+, a condition considered by Ludwig in his operational framework, see [26, Chapt. VI, Thm. 2.2.1].

Now we can show that an aNU GPT system contains an observable O″ which is a noisy version of an arbitrarily close approximation O′ to any observable O in its unrestricted counterpart. Let S and E be state and effect spaces of a system in an almost NU GPT and E(S) be the effect space of the corresponding unrestricted GPT system. Firstly, if O = ⟨u/n, u/n, . . . , u/n⟩ we may take O′ = O. Secondly, for every O = ⟨e_1, e_2, . . . , e_n⟩ ≠ ⟨u/n, u/n, . . . , u/n⟩ in the unrestricted system and ε > 0 define e′_j = δu/n + (1 − δ)e_j where δ = min{ε/‖e_j − u/n‖ | 1 ≤ j ≤ n such that e_j ≠ u/n}. Since E(S) spans R^{d+1} and contains u − e for all e ∈ E(S) it follows that u/n and hence e′_j are interior points of E(S)_+. By the equality cl(E_+) = E(S)_+ we have that e′_j ∈ E_+ and thus the observable

O″ = ⟨pe′_1, pe′_2, . . . , pe′_n, (1 − p)u⟩

is in the aNU GPT for some p > 0. This observable is a noisy version of the observable O′ = ⟨e′_1, e′_2, . . . , e′_n⟩.

Note that, compared to the condition in Lemma 5, NU GPTs satisfy the stronger condition E(S) = E_+ ∩ (u − E_+). Hence in GPTs which are aNU but not NU we find cl(E_+) ≠ E_+, i.e. the positive cones of the effect spaces are not closed. It follows from the proof of Lemma 4 that cl(E_+) ≠ E_+ only holds when there are extremal effects arbitrarily close to the zero effect, such as in the aNU bit in Fig. 4.

Gleason-type theorems for GPTs
Gleason's theorem is motivated by the idea that the probabilities of the outcomes of any measurement performed on a quantum system should uniquely define the state of the system. Thus, every state should come with a frame function, that is, a probability assignment on the space of projections (and later, quantum effects [3]) such that the probabilities of the disjoint outcomes of any measurement sum to unity. In order to formulate a GTT for GPTs, we need to generalize the concept of a frame function.

Definition 5.
A frame function on the effect space E of a GPT system is a map v : E → R such that

(V1) v(e) ∈ [0, 1] for all e ∈ E, and

(V2) Σ_{j=1}^{n} v(e_j) = 1 for every observable ⟨e_1, . . . , e_n⟩ of the system.

Considering measurements with only a finite number of possible outcomes is sufficient for our purposes; thus, assumption (V2) is only required to hold for finite sequences of effects. Countable sequences of effects may be required if one considers infinite-dimensional systems.
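For the bit system one can check numerically that the assignment v(e) = e · ω induced by a state ω satisfies both conditions (the state and effect vectors are our illustrative choices):

```python
# Sketch: a state omega induces a frame function v(e) = e . omega on the effects.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

omega = (0.7, 1.0)                       # a bit state (hypothetical example)
observable = [(-1.0, 1.0), (1.0, 0.0)]   # effects summing to the unit u = (0, 1)

v = [dot(e, omega) for e in observable]
assert all(0.0 <= x <= 1.0 for x in v)   # (V1): values are probabilities
assert abs(sum(v) - 1.0) < 1e-12         # (V2): values sum to unity
```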
In quantum theory the results of Gleason and Busch show that any frame function must correspond to a density operator. In other words, there are no states beyond those we already believe to exist under the assumption that states must correspond to frame functions. We will take the analog of this idea as the definition of a GTT for a GPT.

Definition 6.
A GPT admits a Gleason-type theorem if and only if for each of its systems every frame function on the effect space E can be represented by a state in the state space S.
In this definition, the state space S is prescribed by the GPT and may be a subset of the set W (E) of all mathematically well-defined states given the effect space E. The existence of a GTT would allow the set of all possible states of a GPT system to follow from the effect space via the natural assumption that a state can be uniquely defined by its propensity to take each possible value of every observable. The requirement that all mathematically possible states are realised in a theory could be thought of as dual to the no-restriction hypothesis, i.e. requiring that all effects have a corresponding measurement outcome. We will show, however, that the classes of GPTs that satisfy these requirements do not coincide.

A Gleason-type theorem for almost NU GPTs
After these preliminaries, let us state the main result of this paper which identifies the condition under which Gleason-type theorems exist for general probabilistic theories.

Theorem 1. A GPT admits a Gleason-type theorem if and only if it is an almost noisy unrestricted GPT.
Since quantum theory in finite dimensions is a GPT obeying the no-restriction hypothesis, Busch's result [3] is an immediate consequence of Theorem 1. The infinite-dimensional case, however, will not be treated here.

Consequences of a GTT for aNU GPTs
Before presenting the proof of Theorem 1, we briefly explain how a GTT allows one to simplify the postulates used to describe a specific GPT in an axiomatic approach. A simple way to state the postulates, often used for quantum theory, is to describe the mathematical objects that represent observables and states along with the rule for calculating the probabilities of measurement outcomes (supplemented by postulates describing the composition of systems and, possibly, the evolution of the system in time). In general, for some GPT system with effect space E and state space S, such postulates would take the following form:

(O) The observables of the system correspond exactly to the tuples of vectors ⟨e_1, e_2, . . .⟩ in E that sum to the vector u, with each vector corresponding to a possible disjoint outcome of measuring the observable.
(S) The states of the system correspond exactly to vectors ω ∈ S.
(P) When measuring the observable ⟨e_1, e_2, . . .⟩ on a system in state ω ∈ S, the probability to obtain outcome e_j is given by p_j(ω) = e_j · ω.
If there exists a GTT for the GPT in hand, then it could be recovered by replacing the postulates (S) and (P) by the operationally motivated assumption that every state must have a corresponding frame function defining its outcome probabilities, along with the converse assumption that every frame function must have a corresponding state in the theory. Consequently, one only needs to supplement postulate (O) with a single new postulate.
(F) There exists a state of the system for every frame function on the effect space E.

Our result also opens up an alternative step in establishing the GPT framework. In Section 2 we reviewed the derivation of the framework from operational principles which motivates the structure of state spaces in GPTs from the assumption of fiducial measurement outcomes, as in [6,7,9,21,27,28]. The approach then arrives at the effect-space structure by motivating effects as affine functions on the state space.
One may, however, invert this process and arrive at the same framework. By assuming that fiducial states exist one can motivate the structure of effect spaces in GPTs, namely as convex, compact subsets of a real vector space containing the zero vector and a vector u such that u − e is in the set for every effect e. At this point, the standard approach would motivate states as affine functionals on the effect space using arguments based on classical mixtures.
Our results offer a mathematically weaker alternative: it is sufficient to combine the effect-space structure with a minimal set of two-outcome observables and their convex combinations as described in Section 6. Then, a corollary to our Gleason-type theorem may be used to recover the structure of the state space (as a subset of W (E) where E is the effect space of the model) from the assumption that a state must come with a unique frame function on this minimal set of observables.
In other words, assuming that states are frame functions offers an alternative to the mathematically stronger assumption that states are linear functionals on the real vector space containing the effect space. The details of this approach are described in Appendix C.
We highlight one observation following from this derivation, namely, the inequivalence of the no-restriction hypothesis-as described in Section 3.1-and the dual assumption in the fiducial-states approach, which we will call the no-state-restriction hypothesis⁶. Recall that the no-restriction hypothesis assumes that, given a state space S, every mathematically valid effect is indeed an effect of the system, i.e. E = E(S). The no-state-restriction hypothesis says that, given an effect space E, every mathematically valid state appears in the theory, i.e. S = W(E). As shown in the proof of Theorem 1 below, the relation S = W(E) holds exactly for almost NU GPTs. Therefore we can conclude that the no-restriction and no-state-restriction hypotheses are not equivalent and the former constitutes a stronger assumption.
Since both approaches are operationally valid, the no-restriction hypothesis and the no-state-restriction hypothesis appear to be motivated equally well. One could argue that their inequivalence demonstrates that neither assumption is valid. In any case, if one is ready to assume either of them, one should be willing to assume the other, too. Thus, the stronger no-effect-restriction hypothesis should be adopted, as seems to have happened naturally in the literature, e.g. in [6,19,27,32,33]. Conversely, they should also both be rejected together, meaning no-go theorems such as in [28] would benefit from targeting the weaker no-state-restriction hypothesis.

Proof of Theorem 1
Using Definition 6 of a GTT, Theorem 1 shows that aNU GPTs are exactly the class of GPTs that admit GTTs. We will prove this result in two steps: (i) in Lemma 6, a frame function on a GPT effect space E is found to correspond to a vector in the set W (E) defined in Section 3.1; (ii) the set W (E) is found to correspond to the state space of a GPT system if and only if the GPT is in the class of aNU GPTs, in Lemmata 7 and 8.
Step (i): The proof of the following lemma follows the one for the quantum case given in [3].

Lemma 6. Any frame function v on the effect space E of a GPT system is of the form

v(e) = ω · e

for some vector ω ∈ W(E) and all effects e ∈ E.
Lemma 6 shows that if one defines states as frame functions on an effect space E, then the associated state space must be W (E).
Step (ii): We will now prove that the set W (E) corresponds to the state space of a GPT system with effect space E if and only if the GPT is an aNU GPT. Two lemmata will be needed to show that W (E (S)) = S holds for all GPTs while the relation E (W (E)) = E (S) only holds for aNU GPTs. The proofs can be found in Appendix D.
Lemma 7. For any GPT system with state space S, we have W(E(S)) = S.

⁶ Following this terminology, the no-restriction hypothesis would more accurately be called the no-effect-restriction hypothesis. For consistency with the literature and brevity we stick with the standard nomenclature.

Lemma 8. Given a GPT system with state and effect spaces S and E, respectively, the relation E(W(E)) = E(S) holds if and only if

E(S) = Ē+ ∩ (u − Ē+), (20)

where Ē+ denotes the closure of the positive cone E+, i.e. if and only if the GPT is almost noisy unrestricted.
We are now in a position to prove our main result, Theorem 1, announced in the previous section. It states that a general probabilistic theory admits a Gleason-type theorem if and only if it is almost noisy unrestricted. The result is an immediate consequence of the lemmata just shown.
Proof. By Lemma 6 we know that frame functions must be of the form v(e) = e · ω for some ω ∈ W(E). Thus, to conclude the proof we need to show that the state space of the system satisfies S = W(E) if and only if the GPT is almost noisy unrestricted. Assume first that Eq. (20) holds. Then Lemma 8 gives E(W(E)) = E(S). Now, by applying the W map to both sides of this equation and using Lemma 7 twice — once for the state space S and once for W(E) — we find W(E) = W(E(W(E))) = W(E(S)) = S. Secondly, assume that Eq. (20) does not hold, i.e. the closure Ē+ of E+ satisfies Ē+ ∩ (u − Ē+) ≠ E(S). Then Lemma 8 gives E(W(E)) ≠ E(S), which implies W(E) ≠ S: if we had W(E) = S, then E(W(E)) = E(S) would follow immediately. Hence the frame functions on E correspond exactly to the states in S if and only if the GPT is almost noisy unrestricted, which is the content of Theorem 1.

Examples and applications
In this section, we will consider examples of NU GPT systems to show how their axiomatic formulation simplifies due to the Gleason-type theorem they allow. We also highlight that a simple well-known non-quantum model introduced by Spekkens [35] does not belong to the class of almost NU GPTs and, therefore, does not come with a GTT.

Simplified axioms for a rebit and other unrestricted GPTs
Unrestricted GPTs are a well-studied class of GPTs. The rebit [14], for example, is a GPT system with a disc-shaped state space, which can be equivalently modeled (see Section 2.3) by the set of real density matrices of a qubit. Rebits are convenient low-dimensional building blocks for a toy model of quantum theory, giving rise to many characteristic features such as superposition, entanglement and non-locality [14,16,32]. In our notation, the state space S R of a rebit is a disc. The convex hull of the zero effect 0, the unit effect u and a continuous ring of extremal effects forms the rebit effect space E R illustrated in Figure 5a. The rebit satisfies the no-restriction hypothesis since the effect space E R is as large as is permitted in the GPT framework (see Eq. (17)).
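The disc state space and ring of effects can be checked numerically. The sketch below assumes a common Bloch-style parametrization — pure states ω_φ = (1, cos φ, sin φ)ᵀ and extremal effects e_θ = ½(1, cos θ, sin θ)ᵀ, paired by the Euclidean inner product — which is our illustrative choice and not necessarily the convention of Eq. (27):

```python
import numpy as np

# Assumed parametrization (illustrative, not necessarily Eq. (27)):
# pure rebit states omega_phi = (1, cos(phi), sin(phi))^T and extremal
# ring effects e_theta = 0.5 * (1, cos(theta), sin(theta))^T, so that
# e_theta . omega_phi = 0.5 * (1 + cos(theta - phi)).

def pure_state(phi):
    return np.array([1.0, np.cos(phi), np.sin(phi)])

def ring_effect(theta):
    return 0.5 * np.array([1.0, np.cos(theta), np.sin(theta)])

u = np.array([1.0, 0.0, 0.0])  # unit effect: u . omega = 1 for every state

for theta in np.linspace(0.0, 2 * np.pi, 40):
    for phi in np.linspace(0.0, 2 * np.pi, 40):
        p = ring_effect(theta) @ pure_state(phi)
        q = (u - ring_effect(theta)) @ pure_state(phi)
        assert -1e-12 <= p <= 1 + 1e-12    # every outcome has a valid probability
        assert abs(p + q - 1.0) < 1e-12    # e_theta and its complement sum to one
```

In these coordinates every ring effect and its complement form a valid dichotomic observable on every pure (and hence every mixed) rebit state.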
Let us now apply the general argument given at the end of Section 4.1 to a rebit, as a first example of an unrestricted GPT. Instead of using the GPT framework, we could have described the rebit of this hypothetical world in an axiomatic fashion, i.e. by assuming the axioms (O), (S) and (P) from Section 4.1, using the effect and state spaces E R and S R . Then, Theorem 1 states that, alternatively, we could postulate the rebit observables and, by considering the frame functions associated with them, recover both the state space S R and the probability rule. More explicitly, we replace postulates (S) and (P) by a single postulate with operational motivation.
(F) The states of a rebit correspond exactly to the frame functions on the effect space E R .
In other words, we effectively introduce the states of the rebit as probability assignments on the outcomes of measurements. The model created by the postulates (O) and (F) is equivalent to the original one in the sense that it makes exactly the same predictions. Classical bits, qubits and qudits as well as square bits (or squits, for short) are other unrestricted GPTs for which identical arguments result in a smaller set of axioms by means of our Gleason-type theorem. The state and effect spaces of bits and qudits were described in Section 2.3, while a squit or gbit [6] is a GPT with a square state space and an octahedral effect space, as illustrated in Figure 5b. Pairs of squits are often considered in the study of non-local correlations since they are capable of producing the super-quantum correlations of a PR-box [36].

A GPT with a GTT: the noisy rebit
Next, let us consider a noisy rebit, characterized by the property that the extremal rebit observables D e θ = (e θ , u − e θ ), θ ∈ [0, 2π), can be measured only imperfectly, i.e. with some efficiency p ∈ (0, 1); see Eq. (27) for the definition of the effects e θ . The state space of this NU GPT system coincides with that of the rebit, S n R ≡ S R . In order to define its effect space E n R , let us introduce two continuous rings of effects, given by the rescaled effects pe θ and their complements u − pe θ . These rings, along with the zero effect 0 and the unit effect u, form the extremal points of the noisy rebit effect space, depicted in Figure 6. While still being a GPT system, the model does not satisfy the no-restriction hypothesis: the effect space E n R is restricted to a proper subset of E R shown in Figure 5a. Nevertheless, Theorem 1 continues to apply: the noisy rebit admits a GTT, effectively due to the fact that there exist finite neighbourhoods of the zero effect and the unit effect in which E n R and E R coincide. Thus, the noisy rebit does not satisfy the no-restriction hypothesis, but it does satisfy the dual assumption of the no-state-restriction hypothesis (see Appendix C).
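A quick numerical check, under the same assumed Bloch-style coordinates as in the rebit sketch and taking the noisy rings to be p·e_θ and u − p·e_θ for an illustrative efficiency p = 0.8, confirms that the noisy effects still assign valid probabilities to every rebit state:

```python
import numpy as np

# Same assumed coordinates as the rebit sketch; the two noisy rings are
# taken to be p * e_theta and u - p * e_theta for an illustrative
# efficiency p = 0.8 (any p in (0, 1) works the same way).
p = 0.8
u = np.array([1.0, 0.0, 0.0])

def e(theta):
    return 0.5 * np.array([1.0, np.cos(theta), np.sin(theta)])

def omega(phi):
    return np.array([1.0, np.cos(phi), np.sin(phi)])

for theta in np.linspace(0.0, 2 * np.pi, 40):
    for phi in np.linspace(0.0, 2 * np.pi, 40):
        lo = (p * e(theta)) @ omega(phi)        # first noisy ring
        hi = (u - p * e(theta)) @ omega(phi)    # complementary ring
        assert -1e-12 <= lo <= 1 + 1e-12
        assert -1e-12 <= hi <= 1 + 1e-12
        assert abs(lo + hi - 1.0) < 1e-12       # still a valid two-outcome observable
```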
Repeating the argument presented in Section 5.1, we are able to simplify the definition of the noisy rebit in terms of postulates (O), (S) and (P), which introduce its effect space E n R , its state space S n R and the Born rule, respectively. The alternative axiomatic formulation in terms of only two postulates rests solely on the effect space of the system: (O) The observables of a noisy rebit correspond exactly to the tuples of vectors e 1 , e 2 , . . . in E n R that sum to the vector u, with each vector corresponding to a possible disjoint outcome of measuring the observable.
(F) The states of a noisy rebit correspond exactly to frame functions on the effect space E n R .
Mutatis mutandis, this procedure applies to any other aNU GPT.

A GPT without a GTT: the Spekkens toy model
In 2007, a toy theory was introduced [35] that is capable of reproducing a number of important quantum features, such as the existence of non-commuting observables, the impossibility of cloning arbitrary states and the presence of entanglement, while simultaneously admitting a description in terms of local hidden variables. Originally, Spekkens' model had been introduced without reference to the GPT framework. Here, we will consider the "convexification" of this model such that it becomes a GPT system, as described in [11]. Considered as a GPT system, Spekkens' model comes with a restricted effect space, and it is not part of an aNU GPT, which can be seen as follows. Its state space S S is a regular octahedron. Under the no-restriction hypothesis, the extremal effects (other than the zero and unit effects) associated with the space S S would be the vertices of a cube. In Spekkens' model, however, they are taken to be the vertices of another octahedron inscribed into this cube, as depicted in Figure 7; explicitly, the effect space E S is the convex hull of the zero and unit effects and the six extremal effects given by the (rescaled) vectors in Eq. (32). Since the toy model is not part of an aNU GPT system, Theorem 1 tells us that it does not admit a GTT: it is impossible to reproduce this GPT system by assuming that the states of the system are in one-to-one correspondence with the frame functions on the effect space. There are, in fact, more frame functions than states in S S . The frame functions correspond to all vectors in the set W(E S ), which is a strict superset of S S forming a cube around S S , in the same way that E(S S ) encloses E S in Figure 7.
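The gap between frame functions and states can be made concrete. The sketch below uses an assumed embedding — states ω = (1, x, y, z)ᵀ with |x| + |y| + |z| ≤ 1 and the six nontrivial extremal effects ½(1, ±1, 0, 0)ᵀ, ½(1, 0, ±1, 0)ᵀ, ½(1, 0, 0, ±1)ᵀ; the exact vectors of Eq. (32) may differ by convention — and verifies that every vertex of the surrounding cube defines a valid frame function while failing the octahedron condition, so W(E_S) strictly contains S_S:

```python
import numpy as np
from itertools import product

# Assumed embedding (illustrative): toy-model states omega = (1, x, y, z)^T
# with |x| + |y| + |z| <= 1 (the octahedron S_S), and the six nontrivial
# extremal effects 0.5 * (1, ±1, 0, 0)^T etc.
effects = [0.5 * np.array([1.0] + [s if i == j else 0.0 for j in range(3)])
           for i in range(3) for s in (1.0, -1.0)]

def in_W(w):
    """w defines a frame function: every effect probability lies in [0, 1]."""
    return all(0.0 <= f @ w <= 1.0 for f in effects)

def in_S(w):
    """w is a state of the toy model: the octahedron condition holds."""
    return abs(w[1]) + abs(w[2]) + abs(w[3]) <= 1.0

# Every vertex of the cube [-1, 1]^3 yields a valid frame function ...
cube = [np.array([1.0, a, b, c]) for a, b, c in product((1.0, -1.0), repeat=3)]
assert all(in_W(w) for w in cube)
# ... yet none of these vertices is a state, so W(E_S) strictly contains S_S.
assert not any(in_S(w) for w in cube)
```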
In order to recover the original model, one would have to place a restriction on which frame functions correspond to allowed states. This restriction can be considered analogous to relaxing the no-restriction hypothesis on the effect space.

A GTT for almost NU GPTs based on two-outcome observables
The definition of a frame function used in Section 4 is based on the idea that every sequence of effects e 1 , e 2 , . . . ∈ E satisfying Eqs. (7) and (8) corresponds to an observable. This assumption is, however, not operationally motivated and is therefore not part of the most general GPT framework. In this section we show that relaxing this assumption does not pose an obstacle to the existence of Gleason-type theorems. In quantum theory a Gleason-type theorem can already be derived by involving only a specific subset of all POMs [37], known as projective-simulable observables [38]. We will now show that a similar weakening of the assumptions continues to imply the result of Theorem 1 in the context of GPTs. In this way, we are able to extend our result to the most general type of GPTs.
Let us begin by introducing the idea of simulating the measurement of an observable by measuring other observables. This simulation is achieved in a GPT by classically mixing observables and post-processing measurement outcomes [39]. For example, we may measure the observables E = (e 1 , e 2 , u − e 1 − e 2 ) and F = (f , 0, u − f ) (35) with probabilities 1/3 and 2/3, respectively, followed by coarse-graining the first two outcomes to produce a dichotomic observable G, where the addition and scalar multiplication of observables is performed elementwise. The only post-processing necessary in the proof to follow is to add outcomes to an observable that occur with probability zero. For example, the two-outcome observable (e, u − e) simulates the three-outcome observable (e, u − e, 0) if one considers there to be a third outcome of measuring (e, u − e) which never occurs. Now consider the set of observables which may be simulated by two-outcome extremal observables, i.e. those described by an extremal effect e and its complement u − e; for brevity we will refer to such observables as simulable. Before giving the definition, denote by O(j) the j-th outcome of an observable O, and recall that a dichotomic observable has precisely two non-zero effects and, possibly, any number of copies of the zero effect.
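The mixing-and-coarse-graining procedure is elementary to implement. In the sketch below the effect vectors e₁, e₂ and f are arbitrary illustrative choices for a two-dimensional effect space with u = (1, 1)ᵀ; only the structure of the simulation matters, not these particular numbers:

```python
import numpy as np

# Illustrative effect vectors in R^2 with unit effect u = (1, 1)^T.
u = np.array([1.0, 1.0])
e1, e2 = np.array([0.3, 0.1]), np.array([0.2, 0.4])
f = np.array([0.6, 0.9])

E = np.array([e1, e2, u - e1 - e2])   # observable E = (e1, e2, u - e1 - e2)
F = np.array([f, 0.0 * u, u - f])     # observable F = (f, 0, u - f)

# Classical mixture: measure E with probability 1/3 and F with 2/3,
# combining equally labeled outcomes elementwise.
M = (1.0 / 3.0) * E + (2.0 / 3.0) * F
assert np.allclose(M.sum(axis=0), u)  # the mixture is again a valid observable

# Post-processing: coarse-grain the first two outcomes into one,
# producing a dichotomic observable G.
G = np.array([M[0] + M[1], M[2]])
assert np.allclose(G.sum(axis=0), u)
```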

Definition 7. A simulable n-outcome observable O satisfies

O(j) = Σ_{k=1}^{N} q(k|j) M(k), (36)

for some probability distribution q(k|j) such that Σ_{k=1}^{N} q(k|j) = 1 for 1 ≤ j ≤ m and some classical mixture M of dichotomic extremal observables.
The set of simulable observables is minimal in the sense that it is contained in any set of observables, under the operational assumption that a set of observables be simulation-closed [20]. To see this, we first note that each extremal effect must be included in some observable, otherwise it would be excluded from the effect space. Any observable containing a given extremal effect e can be coarse-grained to give the two-outcome extremal observable (e, u − e). Thus, in order to be closed under simulation, a set of observables must at least contain all those that can be simulated by two-outcome extremal observables.
Next, let us call a frame function simulable if the property (V2) in Definition 5 is required to hold for simulable observables only. Theorem 1 can now be strengthened, since the weaker properties of simulable frame functions already suffice for the proof.

Theorem 2. Let S and E be the state and effect spaces, respectively, of an NU GPT. Any simulable frame function v on E admits an expression
v (e) = e·ω, (37) for some ω ∈ S and all e ∈ E.

Proof. See Appendix E.
This theorem can be used to provide an alternative step in the operational derivation of the GPT framework as described in Section 4.1, with full details given in Appendix C.

Summary and Discussion
From a conceptual point of view, the results of this paper imply that each general probabilistic theory belongs to one of two distinct classes: either it admits, like quantum theory, a Gleason-type theorem which allows one to construct the set of possible states of the theory, or it does not admit a GTT.
In Lemma 6 (see Section 4) frame functions were found to be linear functionals on the effect space. If one considers this fact to be the main content of the Gleason-type theorems in quantum theory then the lemma proves that Gleason-type theorems exist for all GPTs. In this paper we have, however, taken the view that a Gleason-type theorem should establish a bijection between frame functions and states in the theory under consideration.
Interpreting GTTs in this way, Theorem 1 shows that a GPT admits such a theorem if and only if it is an almost noisy unrestricted GPT, of which classical and quantum models are examples. Requiring that there is a state in a theory for every frame function could be considered as dual to the no-restriction hypothesis which demands that to every mathematical effect there should correspond a measurement outcome. However, we have shown that the no-restriction hypothesis is more restrictive than requiring the existence of a GTT. On the one hand, every unrestricted GPT admits a GTT but, on the other, there are almost NU GPTs that admit a GTT but violate the no-restriction hypothesis. Phrased differently, the no-effect-restriction hypothesis is manifestly different from the no-state-restriction hypothesis.
In Section 4.1 we describe how a Gleason-type theorem can be used to derive the state space in a given GPT from the set of observables. The postulates (O), (S) and (P), which specify a given GPT, can be replaced by just two postulates, namely (O) and (F), when the description of states as frame functions is assumed. This reduction is only possible in almost NU GPTs.
Extensions of Gleason's theorem to beyond quantum theory have been considered previously in the literature. Gudder et al. [12] consider states on convex effect algebras. These algebras can be represented [40] by subsets K + ∩ (u − K + ) of a real linear space V in which K is a positive cone and u is an element of K + . This representation coincides with unrestricted effect spaces in our terminology. Morphisms (which coincide with frame functions in our terminology) were shown in [12] to extend to positive linear functionals on V . Barnum [41] pointed out that this result can be considered as a Gleason-type theorem demonstrating, in our terminology, the existence of a Gleason-type theorem for unrestricted GPTs. Theorem 1 extends this result to exactly the class of almost NU GPTs.
In recent work [42][43][44], alternatives to simplifying the postulates of quantum theory have been put forward by assuming, for example, the postulates of pure states and their dynamics in combination with operational reasoning. It would be interesting to study whether similar approaches also hold for other GPTs, or whether they are unique to quantum theory.
The current work relies heavily on the convex structure of GPTs. In future work we would like to establish which GPTs admit an analog of Gleason's original theorem, in the sense that the frame functions would only be defined on extremal effects where convexity arguments can no longer be made. Finally, it might be possible to establish a link between Gleason-type theorems and the set of almost-quantum correlations [45]. It is known that GPTs satisfying the no-restriction hypothesis cannot produce the set of almost quantum correlations in Bell scenarios [28]. If this result could be extended to almost NU GPTs then the existence of a GTT for a GPT would also preclude the possibility of that GPT producing the set of almost quantum correlations.
draft of our manuscript and for suggesting a way to correct it which triggered the extension of the result from NU GPTs to aNU GPTs. VJW acknowledges support by the Foundation for Polish Science (IRAP project, ICTQT, contract no. 2018/MAB/5, co-financed by EU within Smart Growth Operational Programme) and the Government of Spain (FIS2020-TRANQI and Severo Ochoa CEX2019-000910-S), Fundació Cellex, Fundació Mir-Puig, Generalitat de Catalunya (CERCA, AGAUR SGR 1381 and QuantumCAT).
Proof. Firstly, let a ∈ Ā+, the closure of the positive cone A+. Then there exists a sequence (a n ) n∈N with a n ∈ A+ such that a n → a as n → ∞. For any b ∈ A* we have a n · b ≥ 0 and, since the inner product is continuous, a · b ≥ 0. Therefore we have shown a ∈ A**.
Secondly, consider x ∉ Ā+. By the hyperplane separation theorem there exist h ∈ R^{d+1} and real numbers c 1 > c 2 such that a · h ≥ c 1 for all a ∈ Ā+ and x · h ≤ c 2 . For all a ∈ A+ and λ > 0 we have λa · h ≥ c 1 and therefore a · h ≥ c 1 /λ. Taking the limit λ → ∞ we find a · h ≥ 0 for all a ∈ A+ and hence h ∈ A*. Finally, since 0 ∈ Ā+ we find 0 = 0 · h ≥ c 1 > c 2 , and thus x · h ≤ c 2 < 0, meaning x ∉ A** since we found that h ∈ A*.
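The duality argument can be illustrated numerically for a polyhedral cone in R², where membership in A₊, A* and A** can each be tested directly; the generators g₁, g₂ and the sampling of dual directions below are our own illustrative choices:

```python
import numpy as np

# Polyhedral illustration: the cone A_+ generated by g1 = (1, 0) and
# g2 = (1, 1), its dual cone A*, and the double dual A**, which recovers
# the (closed) cone A_+.
g = np.array([[1.0, 0.0], [1.0, 1.0]])  # rows: generators of A_+

def in_primal(x, tol=1e-9):
    # x in A_+  iff  x = c1*g1 + c2*g2 with c1, c2 >= 0
    c = np.linalg.solve(g.T, x)
    return bool(np.all(c >= -tol))

def in_dual(h, tol=1e-9):
    # h in A*  iff  h . a >= 0 for all a in A_+ (enough to test generators)
    return bool(np.all(g @ h >= -tol))

def in_double_dual(x, grid=720, tol=1e-6):
    # x in A**  iff  x . h >= 0 for every h in A* (sampled unit directions)
    thetas = np.linspace(0.0, 2 * np.pi, grid, endpoint=False)
    hs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    duals = hs[[in_dual(h) for h in hs]]
    return bool(np.all(duals @ x >= -tol))

samples = [np.array([2.0, 1.0]), np.array([0.5, 0.5]), np.array([-1.0, 0.5]),
           np.array([1.0, -0.5]), np.array([0.0, 1.0])]
for x in samples:
    assert in_primal(x) == in_double_dual(x)   # A_+ = A** on these points
```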

Lemma 3. For a compact and convex subset A of R^{d+1}, the dual cone of A coincides with the dual cone of its positive cone A+, i.e. A* = (A+)*.
Proof. By Definition 2, a vector b is in the dual cone A* of A if and only if b · a ≥ 0 for all a ∈ A. Equivalently, we may require x(b · a) = b · (xa) ≥ 0 for all vectors a in the set A and all x ≥ 0, which holds if and only if b ∈ (A+)*.

Lemma 4. Given a GPT state space S, the positive cone S+ is closed.
Proof. Firstly, note that by definition the set S does not contain the zero vector 0. Let (x j ) j∈N be a convergent sequence in R^{d+1} such that x j ∈ S+ for all j ∈ N and x j → x as j → ∞. We will show that x ∈ S+. If x = 0, the statement holds. Now assume x ≠ 0. For each j ∈ N there exist p j > 0 and ω j ∈ S such that x j = p j ω j . By the Bolzano-Weierstrass theorem and the compactness of S, the sequence (ω j ) j∈N has a convergent subsequence (ω k(j) ) j∈N such that ω k(j) → ω ∈ S as j → ∞. Now we show that the corresponding subsequence (p k(j) ) j∈N neither diverges to infinity nor tends to zero. Firstly, assume that p k(j) → ∞ as j → ∞. Then, since x k(j) → x, we find the contradiction ω k(j) = x k(j) /p k(j) → 0 ∉ S. Secondly, assume that p k(j) → 0 as j → ∞. Then x k(j) = p k(j) ω k(j) → 0, contradicting our assumption that x ≠ 0.
Therefore, since (x k(j) ) j∈N converges, there exists a finite number p > 0 such that p k(j) → p as j → ∞. This means that the subsequence converges to an element of S+: x k(j) = p k(j) ω k(j) → pω ∈ S+ as j → ∞, and hence x = pω ∈ S+.

Lemma 5. In an almost NU GPT, the state space S and effect space E of each system are related by E(S) = Ē+ ∩ (u − Ē+), where Ē+ denotes the closure of the positive cone E+.
Proof. Consider a GPT system with state space S and effect space E. Assume first that the spaces satisfy E(S) = Ē+ ∩ (u − Ē+), where Ē+ denotes the closure of E+. Then if e ∈ E(S), we have e ∈ Ē+, so for every ε > 0 there exists e′ ∈ E+ such that ‖e − e′‖ < ε. Hence there exists p ∈ (0, 1] such that pe′ ∈ E, which means that the GPT system satisfies Definition 4. Conversely, assume that the GPT satisfies Definition 4. We will show that Ē+ = E(S)+, by firstly showing E(S)+ ⊆ Ē+ and secondly that Ē+ ⊆ E(S)+.
For every vector be ∈ E(S)+ (where e ∈ E(S) and b > 0) and every ε > 0 there exists ae′ ∈ E+ (where e′ ∈ E and a > 0) such that ‖be − bae′‖ < bε. Letting ε = δ/b, we find that for every f ∈ E(S)+ and δ > 0 there exists f′ ∈ E+ such that ‖f − f′‖ < δ. Since every point in E(S)+ is a point of closure of E+, we find E(S)+ ⊆ Ē+.
To prove that Ē+ ⊆ E(S)+ (with Ē+ the closure of E+), we begin by showing that E(S)+ is a closed set. Firstly, we have E(S)+ = (S* ∩ (u − S*))+ by Eq. (17). Then we find

(S* ∩ (u − S*))+ = S* (38)

as follows. The set on the left of this equation is clearly contained in that on the right since we have (S* ∩ (u − S*))+ ⊆ (S*)+ = S*. Furthermore, if e ∈ S*, then non-negative rescalings of e are also contained in S*: xe ∈ S* for all x ≥ 0. Since u · ω = 1 for all ω ∈ S, u is an internal point of S*. Thus, there exists an open ball B(u, ε) around u of radius ε > 0 contained in S*. Therefore, for x < ε/‖e‖ we have ‖u − (u − xe)‖ = x‖e‖ < ε and hence u − xe ∈ S*. By definition, xe ∈ (u − S*), hence we have xe ∈ S* ∩ (u − S*) and e ∈ (S* ∩ (u − S*))+, thus verifying Eq. (38). The dual cone S* is, by definition, the intersection of a collection of closed half-spaces; therefore S* = E(S)+ is closed. Now, since E(S)+ is closed and contains E+, we have Ē+ ⊆ E(S)+. Finally, we have shown Ē+ = E(S)+ = S*, and Eq. (17) then yields E(S) = S* ∩ (u − S*) = Ē+ ∩ (u − Ē+), completing the proof.

(S') There exist d fiducial measurement outcomes of observables whose probabilities determine the state of the system. These states are restricted to being represented by vectors in S.
The first part of the postulate, the existence of d fiducial measurement outcomes, determines that the state space can be embedded in R d and is convex, with convex combinations of vectors representing classical mixtures of the corresponding states. However, this assumption does not determine the "shape" of the state space, hence the inclusion of the second part of the postulate restricting the state space to S. For a specific GPT, the second part of the postulate may take a more natural-sounding form such as state vectors having modulus less than or equal to one. From (S'), using the standard operational assumption that effects must respect classical mixtures and the no-restriction hypothesis (see Section 2), the postulates (O) and (P) are recovered easily. Let us conclude by comparing this approach to our approach of using Theorem 1 in order to reduce the postulates (O), (S) and (P).
First, postulate (O) does not assume that there exist d fiducial outcomes. This property is a consequence in our approach once the states are identified as linear functionals on the effect space. Therefore, postulate (O) is not simply a stronger version of (S').
Second, in order to postulate the existence of d fiducial measurement outcomes, as is done in (S'), one assumes some knowledge of all the observables of the system; otherwise one would not know that the outcomes in question form a complete fiducial set. Therefore, axiom (S') makes assumptions about both the states and the observables of the system whereas (O) only concerns observables.
Finally, in the approach based on (S'), additional assumptions would be necessary to reconstruct an almost NU GPT which does not satisfy the no-restriction hypothesis because one could not use the no-restriction hypothesis to recover the postulates (O) and (P). However, such a GPT does admit a GTT, as Theorem 1 shows, and hence the first method would still be valid.

C The "fiducial state" derivation of the GPT framework
In the modern literature, the GPT framework is typically derived, as in Section 2, by first assuming the existence of fiducial measurement outcomes, then defining the state space of a system, followed by a full treatment of observables and their measurement; see for example [6,7,21,27,28,46]. However, one may equally consider the inverted argument, i.e. derive the framework from equivalent operational assumptions by positing a fiducial set of states, using it to define all possible measurements and their outcomes, and then finding the compatible mathematical description of states. Such a dual approach is not novel: for example, Ludwig describes the idea in his work on operational theories [26], and the test-space formalism [29,31] begins with the structure of measurement outcomes. Proceeding in this second, dual manner, the structure of effect spaces is established first; Theorem 2 then presents an alternative method for deriving the structure of state spaces, compared with the standard argument involving mixtures of measurement outcomes.
We begin by summarising the "fiducial states" derivation of the GPT framework in parallel with Section 2. Consider all the possible outcomes of the measurements of all the observables of a given system. We will assume that there exists a finite set of fiducial states such that any one of these outcomes, ζ, is uniquely determined by the probabilities of ζ being observed after a measurement (of which ζ is a possible outcome) is performed on the system in each of the fiducial states. In other words, for a system with d states in its fiducial set, an outcome may be identified with the vector e = (p 1 , . . . , p d )^T ∈ R^d, where p j is the probability of observing the outcome for a system in the jth fiducial state. This representation of measurement outcomes is derived from the operational assumption that one should be able to distinguish two distinct measurement outcomes by their statistics on a finite number of states, in analogy to assuming the possibility of distinguishing two distinct states from the probabilities of a finite number of measurement outcomes in the "fiducial measurements" approach.
In line with GPT terminology we will call the set of vectors corresponding to outcomes in a model the effect space and the vectors within this set effects. Note that the effects are now simply vectors and not linear functionals. For brevity, we will often refer to a measurement outcome as the effect by which it is represented.
In the bit example from Section 2, the fiducial set of states could be the "0" and "1" states. Thus the effect space would be a subset of R 2 .
We will assume the existence of an outcome that occurs with probability one for any state of the system. This outcome must be represented by the effect u = (1, . . . , 1)^T. Similarly, we assume the existence of an outcome that never occurs, represented by the effect 0 = (0, . . . , 0)^T. Any outcome e must have a complement, namely the outcome "not e" necessarily occurring with probability 1 − p j when the measurement of "e or not e" is performed on the jth fiducial state. Therefore, for any effect e = (p 1 , . . . , p d )^T the vector u − e = (1 − p 1 , . . . , 1 − p d )^T must also be in the effect space. Consider two measurements on the system, each with a discrete set of possible outcomes, and label the outcomes of each measurement with positive integers such that the first measurement has outcomes {e 1 , e 2 , . . .} and the second {e′ 1 , e′ 2 , . . .} (if a measurement has a finite number, n, of possible outcomes the labels j for j > n are assigned the zero effect). If a classical mixture of these measurements is performed, then the possible outcomes of this procedure can be represented by convex combinations of effects. Specifically, if the first measurement is performed with probability p and the second with probability 1 − p, then observing an outcome labeled j from this procedure must be represented by the vector pe j + (1 − p)e′ j in order to be consistent with the fiducial state set. Therefore we assume the effect space is convex. Finally, since an arbitrarily good approximation of an effect would operationally be indistinguishable from the effect itself, we assume the effect space is a closed subset of R^d.
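For the bit (d = 2 fiducial states), these closure requirements are easy to mechanize; the sketch below encodes the certain outcome, the impossible outcome, complements and mixtures exactly as stated above:

```python
import numpy as np

# The bit in the fiducial-states picture: two fiducial states "0" and "1",
# so an outcome is a probability vector e = (p1, p2) in [0, 1]^2.
u = np.array([1.0, 1.0])     # certain outcome
zero = np.array([0.0, 0.0])  # impossible outcome

def complement(e):
    # the outcome "not e", observed with probability 1 - p_j on each fiducial state
    return u - e

def mix(e, f, prob):
    # outcome of a classical mixture of two measurements
    return prob * e + (1.0 - prob) * f

e = np.array([1.0, 0.0])     # perfectly distinguishes "0" from "1"
assert np.allclose(complement(e), [0.0, 1.0])

m = mix(e, complement(e), 0.25)
assert np.all((m >= 0.0) & (m <= 1.0))  # mixtures stay inside the square [0, 1]^2
assert np.allclose(complement(u), zero)
```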
Returning to the bit example, we can build our effect space from the requirement of having a measurement that perfectly distinguishes "0" and "1", which must therefore have the outcomes (1, 0)^T and (0, 1)^T. Combined with the other requirements for an effect space, we find the bit effect space to be the square in Figure 8, a transformation of the bit effect space described in Section 2.2.
We have arrived at the same requirements for the structure of an effect space as were described in Section 2 (a convex, compact subset of a real vector space containing the zero vector, and a vector u such that u − e is in the set for every e in the set). We may now consider how states should be represented in the framework. We assume a state will be represented by a map ω from an outcome e to the probability of observing e when a measurement (of which e is a possible outcome) is performed on a system in state ω. From here we may derive the state space structure of the GPT framework using the standard operational assumptions or the alternative presented by Theorem 2.
On the one hand, the standard method for deriving the structure of the state space is to exploit the fact that we wish for outcome probabilities to respect mixtures, in analogy with the reasoning behind Eq. (5): ω(pe + (1 − p)e′) = pω(e) + (1 − p)ω(e′) for p ∈ [0, 1] and all effects e, e′. Thus each map ω admits an expression ω(e) = e · ω for all effects e and some vector ω ∈ W(E) ⊆ R^d. On the other hand, we have already assumed that a pair {e, u − e} forms a measurement and have introduced the formalism for describing mixtures of measurements; therefore the simulable measurements from Section 6 are already included in the framework. Theorem 2 then tells us that if a state ω is to assign probabilities to the possible outcomes of these measurements, such that the probabilities of all the outcomes sum to one, then ω(e) = e · ω, (45) for all effects e and some ω ∈ W(E) ⊆ R^d. Both approaches lead to the conclusion that the state space of a GPT with effect space E must be a subset of W(E). Although the conditions are mathematically different, there is no clear conceptual advantage to either argument.
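For the bit, the set W(E) can be computed directly: since ω(e) = e · ω is linear in e, it suffices to check the constraints on the extreme points of the square effect space. The sketch below (our own encoding) recovers exactly the probability vectors (q, 1 − q), in agreement with the state space S B of Figure 8:

```python
import numpy as np
from itertools import product

# Our own encoding of the bit: effect space E is the square [0, 1]^2 with
# u = (1, 1)^T. A vector omega lies in W(E) iff u . omega = 1 and
# e . omega is in [0, 1] for every effect e; by linearity it suffices to
# test the four extreme points (corners) of the square.
u = np.array([1.0, 1.0])
corners = [np.array(c, dtype=float) for c in product((0.0, 1.0), repeat=2)]

def in_W(omega, tol=1e-9):
    if abs(u @ omega - 1.0) > tol:
        return False
    return all(-tol <= e @ omega <= 1.0 + tol for e in corners)

# W(E) consists exactly of the probability vectors (q, 1 - q) with q in [0, 1]:
assert in_W(np.array([0.3, 0.7]))
assert in_W(np.array([1.0, 0.0]))
assert not in_W(np.array([1.2, -0.2]))  # normalised, but assigns e = (1, 0) a value above one
```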
The "fiducial states" derivation of the framework highlights the existence of a relative of the no-restriction hypothesis, which we will call the no-state-restriction hypothesis: the inclusion in the state space of all ω ∈ R^d satisfying e · ω ∈ [0, 1] for every effect e and u · ω = 1. Note that this is not equivalent to the no-restriction hypothesis in all cases; for example, the noisy bit model in Figure 3a satisfies the no-state-restriction hypothesis but not the no-restriction hypothesis.
Continuing the bit example, employing either the no-restriction or no-state-restriction hypothesis leads to the state space S B , the convex hull of the points (0, 1) T and (1, 0) T pictured in Figure 8.
Lemma 6. Let E be the effect space of a GPT system. Any frame function v on E admits an expression v(e) = e · ω for some vector ω ∈ W(E) and all effects e ∈ E.
Proof. Consider a finite set of effects e 1 , e 2 , . . . , e n ∈ E such that Σ_{j∈J} e j ∈ E for any subset J ⊆ {1, . . . , n}. First we show that a frame function v must be additive on any such set, i.e.

v(e 1 ) + v(e 2 ) + · · · + v(e n ) = v(e 1 + e 2 + · · · + e n ). (46)
We have that the tuples (e 1 , . . . , e n , u − Σ_{j=1}^{n} e j ) and (e 1 + · · · + e n , u − Σ_{j=1}^{n} e j ) are both observables in the GPT since they satisfy Eqs. (7) and (8). Hence, by property (V2) of a frame function we find v(e 1 ) + · · · + v(e n ) + v(u − Σ_j e j ) = 1 = v(e 1 + · · · + e n ) + v(u − Σ_j e j ), and Eq. (46) follows. The next step is to show the homogeneity of v on E, i.e. v(αe) = αv(e) for all α ∈ [0, 1] and e ∈ E. (49)
Note that the convexity of E ensures that rescaling an effect e by a factor α ≤ 1 produces another effect: αe = αe + (1 − α)0 ∈ E. Now consider two rational numbers p, q ∈ [0, 1] with p ≤ q. Using property (V1) of a frame function with argument (q − p)e ∈ E guarantees that v((q − p)e) ≥ 0. Also, we find by property (V2) of a frame function that v(qe) = v(qe − pe + pe) = v((q − p)e) + v(pe).
Thus, the values of frame functions on multiples of a given effect respect the ordering induced by the scale factors, v (pe) ≤ v (qe) .
Next, let p µ and q ν be sequences of rational numbers in the interval [0, 1] that tend to α from below and above, respectively. Since additivity implies v(pe) = pv(e) for rational p ∈ [0, 1], we have p µ v(e) = v(p µ e) ≤ v(αe) ≤ v(q ν e) = q ν v(e), so that the homogeneity of v claimed in Eq. (49) follows from taking the limit of both sequences. Thirdly, we construct a well-defined extension of the frame function v to E+, the positive cone associated with E (see Definition 1), such that v(a + b) = v(a) + v(b) holds for all a, b ∈ E+. To do so, consider two effects e 1 , e 2 ∈ E which give rise to the same vector in the positive cone via a = a 1 e 1 = a 2 e 2 ∈ E+, with 1 < a 1 < a 2 . Then we have v(e 2 ) = v((a 1 /a 2 ) e 1 ) = (a 1 /a 2 ) v(e 1 ), hence a 2 v(e 2 ) = a 1 v(e 1 ), and we may uniquely define the frame function on arbitrary vectors in the positive cone by v(a) := a 1 v(e 1 ).
Additivity of the extended frame function is easily seen to hold for vectors in the positive cone: consider vectors a = ae a and b = be b for e a , e b ∈ E and a, b > 1 and let c = a + b.
Noting that (a + b)/c ∈ E is an effect, we obtain v(a + b) = cv((1/c)(a + b)) = cv((1/c)a) + cv((1/c)b) = v(a) + v(b). A linear extension of a frame function v to the whole of R^{d+1} follows from the fact that any c ∈ R^{d+1} outside the positive cone E+ may be decomposed into c = a − b with a, b ∈ E+ by Lemma 1. If the decomposition is not unique, c = a − b = a′ − b′, we have a + b′ = a′ + b. It then follows from Eq. (46) that v(a) + v(b′) = v(a′) + v(b), i.e. v(a) − v(b) = v(a′) − v(b′). Therefore we may uniquely define the value of the frame function on the vector c via v(c) := v(a) − v(b), i.e. independently of the decomposition of the vector c. This extension of any frame function v on E to R^{d+1} must be linear. First we show additivity: let c j = a j − b j ∈ R^{d+1} for a j , b j ∈ E+; then v(c 1 + c 2 ) = v(a 1 + a 2 − (b 1 + b 2 )) = v(a 1 + a 2 ) − v(b 1 + b 2 ) = v(c 1 ) + v(c 2 ). To show homogeneity, let c = a − b ∈ R^{d+1} for a, b ∈ E+. Firstly, consider γ ≥ 0, in which case we have v(γc) = v(γa − γb) = γv(a) − γv(b) = γv(c). Secondly, consider γ < 0: v(γc) = v((−γ)(−c)) = (−γ)v(b − a) = (−γ)(v(b) − v(a)) = γv(c). Therefore, the extended map is linear and admits an expression v(a) = a · ω for the vector ω = Σ_{j=1}^{d+1} v(x j ) x j ∈ R^{d+1}, where {x 1 , . . . , x d+1 } is an orthonormal basis of R^{d+1}. Finally, requirements (V1) and (V2) on the behaviour of the frame function v on the effect space E imply that ω ∈ W(E), which concludes the proof.

E Proof of Theorem 2
Proof. Due to the convexity of the effect space E, we can express any effect e ∈ E as a convex combination e = Σ_j p_j e_j of extremal effects e_j, with real numbers p_j ∈ [0, 1] which sum to one. Thus we may simulate the two-outcome observable (e, u − e) = Σ_j p_j (e_j , u − e_j ) (74) as a classical mixture of extremal dichotomic observables. Mixing (e, 0, u − e) and (0, e, u − e) with equal probabilities likewise simulates the observable (e/2, e/2, u − e), and hence, by property (V2), v(e/2) + v(e/2) + v(u − e) = 1 = v(e) + v(u − e), so that v(e/2) = v(e)/2.
Secondly, for any effects e, e′ ∈ E such that e + e′ ∈ E, the corresponding observable is simulable by Eq. (74); comparing with Eq. (75) gives v(e/2) + v(e′/2) = v((e + e′)/2), so that v(e) + v(e′) = v(e + e′) follows, using v(e/2) = v(e)/2 from Eq. (77). By induction, any simulable frame function v satisfies the properties of a frame function as defined in Definition 5. Thus, by Theorem 1, any simulable frame function v admits the expression given in Eq. (37).