A structure theorem for generalized-noncontextual ontological models

It is useful to have a criterion for when the predictions of an operational theory should be considered classically explainable. Here we take the criterion to be that the theory admits of a generalized-noncontextual ontological model. Existing works on generalized noncontextuality have focused on experimental scenarios having a simple structure: typically, prepare-measure scenarios. Here, we formally extend the framework of ontological models as well as the principle of generalized noncontextuality to arbitrary compositional scenarios. We leverage a process-theoretic framework to prove that, under some reasonable assumptions, every generalized-noncontextual ontological model of a tomographically local operational theory has a surprisingly rigid and simple mathematical structure -- in short, it corresponds to a frame representation which is not overcomplete. One consequence of this theorem is that the largest number of ontic states possible in any such model is given by the dimension of the associated generalized probabilistic theory. This constraint is useful for generating noncontextuality no-go theorems as well as techniques for experimentally certifying contextuality. Along the way, we extend known results concerning the equivalence of different notions of classicality from prepare-measure scenarios to arbitrary compositional scenarios. Specifically, we prove a correspondence between the following three notions of classical explainability of an operational theory: (i) existence of a noncontextual ontological model for it, (ii) existence of a positive quasiprobability representation for the generalized probabilistic theory it defines, and (iii) existence of an ontological model for the generalized probabilistic theory it defines.


Introduction
For a given operational theory, under what circumstances is it appropriate to say that its predictions admit of a classical explanation? This article starts with the presumption that this question is best answered as follows: the operational theory must admit of an ontological model that satisfies the principle of generalized noncontextuality, defined in Ref. [1]. Admitting of a generalized-noncontextual ontological model subsumes several other notions of classical explainability, such as admitting of a positive quasiprobability representation [2,3], being embeddable in a simplicial generalized probabilistic theory (GPT) [4-7], and admitting of a locally causal model [8,9]. (Note that the first two of these results are first proved for general compositional scenarios in this paper.) Additionally, generalized noncontextuality can be motivated as an instance of a methodological principle for theory construction due to Leibniz, as argued in Ref. [10] and the appendix of Ref. [11]. Finally, operational theories that fail to admit of a generalized-noncontextual ontological model provide advantages for information processing relative to their classically explainable counterparts [12-19]. Because the notion of generalized noncontextuality is the only one we consider in this article, we will often refer to it simply as 'noncontextuality'.
To date, prepare-measure scenarios are the experimental arrangements for which the consequences of generalized noncontextuality have been most explored. A few works have also studied experiments where there is a transformation or an instrument intervening between the preparation and the measurement [18,20-24]. However, generalized noncontextuality has not previously been considered in experimental scenarios wherein the component procedures are connected together in arbitrary ways, that is, in arbitrary compositional scenarios. Indeed, generalized noncontextuality has not even been formally defined at the level of compositional theories prior to this work; rather, it and several related concepts have only been formally defined for particular types of scenarios. In this work, we give a process-theoretic [25-28] formulation of the various relevant notions of operational theories and of representations thereof, enabling the study of noncontextuality in arbitrary compositional scenarios, and indeed of the noncontextuality of operational theories themselves. We then derive a number of results regarding the structure of noncontextual representations of operational theories, and we ultimately put strong constraints on the nature of these representations.
Like Ref. [4], this work is sensitive to the distinction between operational theories and quotiented operational theories, commonly termed generalized probabilistic theories (or GPTs) [27-31]. In an operational theory, one understands the primitive processes (e.g., preparation, transformation, and measurement procedures) to be lists of laboratory instructions detailing actions that can be taken on some physical system. Such a theory also makes predictions for the statistics of outcomes in any given experimental arrangement (without making any attempt to explain these predictions). As lists of laboratory instructions, the processes in an operational theory contain details which are not relevant to the observed statistics; any such details are termed the context of the given process [1]. In contrast, a quotiented operational theory, or GPT, arises when one removes this context information by identifying any two processes that differ only by context, that is, which lead to all the same statistical predictions, and so are said to be operationally equivalent.
Our formalization of both operational theories and generalized probabilistic theories follows that of quotiented and unquotiented operational probabilistic theories (OPTs), as in Refs. [32-34]. For completeness, we provide a pedagogical introduction to our notation and conventions in Sec. 2. The framework presented here is also a precursor to a more novel framework presented in Ref. [35], which is motivated by the objective of cleanly separating the causal and inferential aspects of a theory.
There are multiple different representations of operational theories and generalized probabilistic theories that one can consider, often motivated by the aim of explaining the predictions of the theory by appealing to some underlying realist model of reality. The quintessential sort of explanation is an ontological model of an operational theory, which presumes that the systems passing between experimental devices have properties, and that the outcomes of measurements reveal information about these properties. A complete specification of these properties for a given system is termed its ontic state. The variability in this ontic state mediates causal influences between the devices. Ontological models may be defined for either operational theories or generalized probabilistic theories. Another type of representation which has been widely considered (particularly by the quantum optics community) is that of quasiprobabilistic representations. Such representations are only defined for generalized probabilistic theories, and can be viewed as ontological models using quasiprobability distributions, that is, analogues of probability distributions in which some of the values can be negative.
Our formalization of ontological models is more general than that which is usually given, since we define them in a compositional manner, and for arbitrary theories rather than for particular scenarios. (Although note that this was already done for the special case of quantum theory in Ref. [36].) Furthermore, our formalization of quasiprobabilistic representations is more general than that which is usually given, since we define them for arbitrary GPTs (not necessarily quantum). In this latter case, our formalization was strongly influenced by the prescription suggested in Ref. [37].
We can now return to the question of when a theory's predictions admit of a classical explanation.
As argued above, our guiding principle is that of noncontextuality. The principle of noncontextuality is a constraint on ontological models of operational theories: namely, that the representation of operational processes does not depend on their context. That is, operational processes which lead to identical predictions about operational facts are represented by ontological processes which lead to identical predictions about ontological facts. If such a noncontextual ontological model exists, we take it to be a classical realist explanation of the predictions of the operational theory. Hence, the notion of classical explainability for operational theories is the existence of a generalized-noncontextual ontological model.
If one takes the processes of a generalized probabilistic theory as the domain of one's representation map, then there is no context on which a given representation (be it an ontological model or a quasiprobabilistic representation) could conceivably depend. This point was first made in Ref. [4], and we expand on it in this work, in particular in Appendix A. In such an approach, one cannot directly take noncontextuality (independence of context) as a notion of classicality for generalized probabilistic theories. Still, Ref. [4] showed that, in prepare-measure scenarios, the notion of noncontextuality for operational theories induces a natural and equivalent notion of classicality for generalized probabilistic theories. In particular, a generalized probabilistic theory admits of a classical explanation if and only if there is a simplex that embeds its state space and furthermore the hypercube of effects that is dual to this simplex embeds its effect space. Such an embedding can be viewed as an ontological model of a GPT [4,5]. Hence, the resulting notion of classical explainability for a GPT is the existence of an ontological model for it. Our work extends this result to the case of arbitrary compositional theories and scenarios.
We also extend (from prepare-measure scenarios to arbitrary compositional scenarios) the proof that positive quasiprobabilistic models are in one-to-one correspondence with noncontextual ontological models [2,4]. Note that our proof, like the special case given in Ref. [4], corrects some issues with the original arguments in Ref. [2] (see footnote 1).
Footnote 1: Ref. [2] was not careful to distinguish between quotiented and unquotiented operational theories, and as such did not stipulate whether quantum theory was being considered as an operational theory or as a GPT. As a result, it failed to note that the most natural domain for a quasiprobabilistic representation is quantum theory as a GPT, while the domain of a noncontextual ontological representation is necessarily quantum theory as an operational theory. Also as a consequence, it argued that a positive quasiprobability representation is the same thing as a noncontextual ontological model. Our recasting of the relation between quasiprobability representations and noncontextual ontological representations is explicit about such distinctions, and consequently we show that a positive quasiprobability representation is not in and of itself a noncontextual ontological model. Rather, it is just that the sets of these are in one-to-one correspondence.
Note that simplex-embeddability can also be motivated as a notion of classicality as follows. First, a simplicial GPT, i.e., one in which all of the state spaces are simplices, transformations are arbitrary convex-linear maps between these simplices, effects are elements of the hypercubes which are dual to the simplices, and which is tomographically local, has been argued to capture a notion of classicality among operational theories [29,30]. If a GPT satisfies simplex-embeddability, then it follows that the set of states and the set of effects therein can be conceptualized as a subset of those arising in a simplicial GPT, implying that every experiment describable by the GPT can be simulated within the simplicial GPT. It follows that simplex-embeddability captures the possibility of simulatability within a classical operational theory, hence a notion of classicality. Furthermore, the existence of a positive quasiprobabilistic representation is a notion of classicality in the sphere of quantum optics. Hence, our results ultimately show that three independently motivated notions of classicality (namely these two, and the existence of a noncontextual ontological model of an operational theory) all coincide in general compositional situations (such as is relevant, for example, in quantum computation).
Most importantly, this equivalence allows us to prove that every noncontextual ontological model of a tomographically local operational theory which satisfies an assumption of diagram preservation has a rigid and simple mathematical structure. In particular, every such model is given by a diagram-preserving positive quasiprobabilistic model of the GPT associated with the operational theory, and we prove that every such quasiprobabilistic model is in turn a frame representation [3,38] that is not overcomplete. As a corollary, it follows that the number of ontic states in any such model is no larger than the dimension of the GPT space.
This rigid structure theorem and bound on the number of ontic states show that there is much less freedom in constructing noncontextual ontological models than previously recognized. In particular, they mean that once the representation of the states is fixed (i.e., by choice of frame) there is no remaining freedom in the representations of the measurements and transformations. Moreover, in many ontological models the number of ontic states is taken to be infinite (e.g., corresponding to points on the surface of the Bloch ball); however, all such models are immediately ruled out by our bound on the number of ontic states. These results also imply new proofs of the fact that operational quantum theory does not admit of a generalized-noncontextual model and simplify the problem of witnessing generalized contextuality experimentally.
Categorically, we view the GPT as being a particular monoidal category, and the representations thereof as being particular strong monoidal functors into subcategories of FVect_R (the category of linear maps between finite-dimensional real vector spaces). In the case of tomographically local GPTs, our structure theorem states that any such functor is naturally isomorphic to a standard representation of a tomographically local GPT within FVect_R. In particular, this means that ontological models for such theories, should they exist, are essentially unique.
We now summarize our key results and main assumptions in more detail.

Results
We begin by providing informal statements of our main results. The first result in this list extends the results of Ref. [4] from the case of prepare-measure scenarios to arbitrary scenarios. The second entry in the list constitutes the main technical result of this work. The third is a primary consequence of this for the study of noncontextual ontological models.
1. We refine and generalize the notions of quasiprobabilistic models and ontological models of operational theories and of GPTs to arbitrary compositional scenarios and theories, and we show a triple equivalence between:
(a) a positive quasiprobabilistic model of the GPT associated to the operational theory,
(b) an ontological model of the GPT associated to the operational theory, and
(c) a noncontextual ontological model of the operational theory.
2. We then prove a structure theorem for representations of a GPT which implies that:
(a) every diagram-preserving quasiprobabilistic model of a GPT is a frame representation that is not overcomplete, i.e., an exact frame representation,
(b) every diagram-preserving ontological model of a GPT is a positive exact frame representation, and
(c) every diagram-preserving noncontextual ontological model of an operational theory can be used to construct a positive exact frame representation of the associated GPT, and vice versa.
3. A key corollary of these is that the cardinality of the set of ontic states for a given system in any diagram-preserving ontological model is equal to the dimension of the state space of that system in the GPT. For instance, a noncontextual ontological model of a qudit must have exactly d^2 ontic states. Similarly, the dimension of the sample space of any diagram-preserving quasiprobabilistic model is the GPT dimension.
These results show that by moving beyond prepare-measure scenarios, the concept of a noncontextual ontological model of an operational theory becomes constrained to a remarkably specific and simple mathematical structure. Moreover, our bound on the number of ontic states yields new proofs of the impossibility of a noncontextual model of quantum theory (e.g., via Hardy's ontological excess baggage theorem [39]) and dramatic simplifications to algorithms for witnessing contextuality in experimental data (e.g., reducing the algorithm introduced in Ref. [4] from a hierarchy of tests to a single test).
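The counting in the third result can be illustrated with a short sketch. It assumes only two standard facts: the GPT state space of a qudit is spanned by the d × d Hermitian operators, which form a real vector space of dimension d^2, and under tomographic locality the dimension of a composite system is the product of the component dimensions. The function names below are hypothetical and chosen for illustration only.

```python
import math


def qudit_gpt_dimension(d: int) -> int:
    """Real dimension of the space of d x d Hermitian operators."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    return d * d


def composite_gpt_dimension(dims) -> int:
    """Under tomographic locality, GPT dimensions multiply across subsystems."""
    return math.prod(dims)


def consistent_with_structure_theorem(num_ontic_states, gpt_dimension) -> bool:
    """True only if the ontic-state count equals the GPT dimension."""
    return num_ontic_states == gpt_dimension


qubit_dim = qudit_gpt_dimension(2)
two_qubit_dim = composite_gpt_dimension([qubit_dim, qubit_dim])

print(qubit_dim, two_qubit_dim)
print(consistent_with_structure_theorem(math.inf, qubit_dim))
```

Under these assumptions, a model of a qubit whose ontic states are the points on the surface of the Bloch ball (an infinite set) fails the check, in line with the discussion above.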

Assumptions
The assumptions that are needed to prove our results will be formally introduced as they become relevant. For the sake of having a complete list in one place, however, we provide an informal account of them here. These assumptions can be divided into two categories.
First, we have assumptions limiting the sorts of operational theories that we are considering.
1. Unique deterministic effect: We consider only operational theories in which all deterministic effects (corresponding to implementing a measurement on the system and marginalizing over its outcome) are operationally equivalent [32].
2. Arbitrary mixtures: We assume that every mixture of procedures within an operational theory is also an effective procedure within that operational theory. That is, for any pair of procedures in the theory, there exists a third procedure defined by flipping a weighted coin and choosing to implement either the first or the second, depending on the outcome of the coin flip.
3. Finite dimensionality: We assume that the dimension of the GPT associated to the operational theory is finite.
4. Tomographic locality: For some of our results, we moreover limit our analysis to operational theories whose corresponding GPT is tomographically local (namely, where all GPT processes on composite systems can be fully characterized by probing the component systems locally [29]).
Second, we have assumptions that concern the ontological model (or quasiprobabilistic model).
1. Deterministic effect preservation: Any deterministic effect in the operational theory is represented by marginalization over the sample space of the system in the ontological (or quasiprobabilistic) model.
2. Convex-linearity: The representation of a mixture of procedures is given by the mixture of their representations, and the representation of a coarse-graining of effects is given by the coarse-graining of their representations.
3. Empirical adequacy: The ontological (or quasiprobabilistic) representations must make the same predictions as the operational theory.
4. Diagram preservation: The compositional structure of the ontological (or quasiprobabilistic) representation must be the same as the compositional structure of the operational theory. (Formally, this means that we take these representations to be strong monoidal functors.)
The most significant assumption regarding the scope of operational theories to which our results apply is that of tomographic locality. Among the assumptions concerning the nature of the ontological (or quasiprobabilistic) model, the only one that is not completely standard is that of diagram preservation.
As we will explain, however, the assumption of diagram preservation does not restrict the scope of applicability of our results; rather, it is a prescription for how one is to apply our formalism to a given scenario. Furthermore, our main results do not require the full power of diagram preservation, but rather can be derived from the application of this assumption to a few simple scenarios: the identity operation, the prepare-measure scenario, and the measure-and-reprepare operation. However, full diagram preservation is a natural generalization of these assumptions, as well as of a number of other standard assumptions that have been made throughout the literature on ontological models, and so we will build it into our definitions rather than endorsing only those particular instances that we need for the results in this paper. We discuss these points in more detail in Section 5, and provide a defense of full diagram preservation in Ref. [35].

Preliminaries
In this section we provide a pedagogical introduction to the diagrammatic notation that we will employ and its application to operational theories, tomographically local GPTs, and their ontological and quasiprobabilistic representations. This section should be treated largely as a review of the relevant literature, which we include to have a self-contained presentation of the necessary formalism for our main results.

Process theories
In this paper we will represent various types of theories as process theories [26], which highlights the compositional structures within these theories. We will express certain relationships that hold between these process theories in terms of diagram-preserving maps. We give a brief introduction to this formalism here. Readers who would like a deeper understanding of this approach can read, for example, Refs. [25-28,35].
A process theory P is specified by a collection of systems A, B, C, ... and a collection of processes on these systems. We will represent the processes diagrammatically, working with the convention that input systems are at the bottom and output systems are at the top. We will sometimes drop system labels when they are clear from context. Processes with no input are known as states, those with no outputs as effects, and those with neither inputs nor outputs as scalars. The key feature of a process theory is that this collection of processes P is closed under wiring processes together to form diagrams. Wirings of processes must connect outputs to inputs such that systems match and no cycles are created.
We will commonly draw 'clamp'-shaped higher-order processes [41,42], which we call testers. A tester can be thought of as something which maps a process from A to B to a scalar. Testers are not primitive notions within the framework of process theories; instead, they are always thought of as being built out of a particular state, effect, and auxiliary system (see footnote 3). In other words, the tester τ is really just shorthand notation for a triple (s_τ, e_τ, W_τ), where s_τ is a state of A together with the auxiliary system W_τ, e_τ is an effect on B together with W_τ, and the scalar assigned to a process f from A to B is
$$\tau(f) := e_\tau \circ (f \otimes \mathbb{1}_{W_\tau}) \circ s_\tau. \qquad (5)$$
Footnote 3: The recent work of Ref. [43] has shown how these, and other, higher-order processes can actually be incorporated as primitive notions within a process-theoretic framework.
A diagram-preserving map M : P → P′ from one process theory P to another, P′, is defined as a map taking systems in P to systems in P′, denoted as
$$S \mapsto M(S), \qquad (6)$$
and processes in P to processes in P′. Taking inspiration from Refs. [40,44], its action on a process f : A → B is written as
$$M :: f \mapsto M(f), \qquad M(f) : M(A) \to M(B), \qquad (7)$$
such that wiring together processes before or after applying the map is equivalent; that is, M(g ∘ f) = M(g) ∘ M(f) and M(f ⊗ g) = M(f) ⊗ M(g) whenever these composites are defined. In particular, this means that M maps the identity processes in P to identity processes in P′.
Remark 2.1. If we interpret these process theories as symmetric monoidal categories, then any strict monoidal functor $\mathsf{M}$ defines a diagram-preserving map $M$, simply by taking $M(A) = \mathsf{M}(A)$ and $M(f) = \mathsf{M}(f)$. Note that this latter equation is not obviously well-typed: for a process $f : A \otimes B \to C \otimes D$, Eq. (7) gives $M(f) : M(A) \otimes M(B) \to M(C) \otimes M(D)$, whereas $\mathsf{M}(f) : \mathsf{M}(A \otimes B) \to \mathsf{M}(C \otimes D)$. However, in the case of strict monoidal functors, we have that $\mathsf{M}(A \otimes B) = \mathsf{M}(A) \otimes \mathsf{M}(B)$, and so this is not actually a problem. If, on the other hand, one instead has a strong monoidal functor $\mathsf{M}$, in which this equality is relaxed to the natural isomorphism [45] $\mu_{A,B} : \mathsf{M}(A) \otimes \mathsf{M}(B) \cong \mathsf{M}(A \otimes B)$, one can still use this to define a diagram-preserving map. The difference is that now (following Ref. [40]) we need to use the natural isomorphisms $\mu$ in order to define $M(f)$; that is, we define $M(f) := \mu_{C,D}^{-1} \circ \mathsf{M}(f) \circ \mu_{A,B}$. In this paper we will always be considering strong monoidal functors where $\mathsf{M}(I) = I$, but if one merely had a natural isomorphism $\epsilon : \mathsf{M}(I) \cong I$, then one must also incorporate this natural isomorphism when defining the action of $M$ on states, effects, and scalars.
We will also use the concept of a sub-process theory, where the intuitive idea is that P′ is a sub-process theory of P, denoted P′ ⊆ P, if the processes in P′ are a subset of the processes in P that are themselves closed under forming diagrams. Formally, we do not require that a sub-process theory is such a theory, but only that it is equivalent to such a theory; that is, we say that P′ ⊆ P if and only if there exists a faithful strong monoidal functor from P′ into P.
The key process theory which underpins this work is FVect_R, defined as follows:
Example 2.2 (FVect_R). Systems are labeled by finite-dimensional real vector spaces V, where the composition of systems V and W is given by the tensor product V ⊗ W. Processes are defined as linear maps from the input vector space to the output vector space. Composing two processes in sequence corresponds to composing the linear maps, while composing them in parallel corresponds to the tensor product of the maps. If a process lacks an input and/or an output, then we view it as a linear map to or from the one-dimensional vector space R. Hence, processes with no input correspond to vectors in V and processes with no output to covectors, i.e., elements of V*. This implies that scalars (processes with neither inputs nor outputs) correspond to real numbers.
FVect_R is equivalent to the process theory of real-valued matrices. However, representing the former in terms of the latter requires artificially choosing a preferred basis for the vector spaces.
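As a concrete illustration of the two kinds of composition in FVect_R, the following minimal sketch represents linear maps as matrices in an arbitrarily chosen basis (which, as noted above, is an artificial choice). Sequential composition is the matrix product and parallel composition is the Kronecker product. The helper names `compose` and `tensor` and the example matrices are assumptions introduced here for illustration.

```python
def compose(g, f):
    """Sequential composition 'g after f': the ordinary matrix product g f."""
    assert len(g[0]) == len(f), "output of f must match input of g"
    return [[sum(g[i][k] * f[k][j] for k in range(len(f)))
             for j in range(len(f[0]))]
            for i in range(len(g))]


def tensor(f, g):
    """Parallel composition: the Kronecker product of f and g."""
    return [[f[i][j] * g[k][l]
             for j in range(len(f[0])) for l in range(len(g[0]))]
            for i in range(len(f)) for k in range(len(g))]


# Two transformations on R^2, a state of R^2 (no input), and an effect (no output).
f = [[1, 2], [0, 1]]
g = [[1, 0], [3, 1]]
h = [[2], [1]]       # state: linear map R -> R^2
k = [[1, -1]]        # effect: linear map R^2 -> R

scalar = compose(k, h)                        # a closed diagram is a real number
lhs = tensor(compose(g, f), compose(k, h))    # compose first, then place in parallel
rhs = compose(tensor(g, k), tensor(f, h))     # place in parallel first, then compose

print(scalar, lhs, rhs)
```

The agreement of `lhs` and `rhs` is the statement that the two ways of reading the same diagram give the same linear map, which is what closure under forming diagrams requires.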
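The first sub-process theory of FVect_R that we consider is the process theory of (sub)stochastic processes. Here, systems are labeled by finite sets Λ which compose via the Cartesian product. Processes with input Λ and output Λ′ correspond to (sub)stochastic maps, and can be thought of as functions $f : \Lambda \times \Lambda' \to [0,1]$, written $f(\lambda'|\lambda)$, where for all λ ∈ Λ we have $\sum_{\lambda' \in \Lambda'} f(\lambda'|\lambda) \le 1$. When this inequality is an equality, they are said to be stochastic (rather than substochastic). For any pair of functions $f : \Lambda \times \Lambda' \to [0,1]$ and $g : \Lambda' \times \Lambda'' \to [0,1]$ (where the output type of f matches the input type of g), sequential composition is given by $g \circ f : \Lambda \times \Lambda'' \to [0,1]$ via the following rule for composing the functions:
$$(g \circ f)(\lambda''|\lambda) := \sum_{\lambda' \in \Lambda'} g(\lambda''|\lambda') \, f(\lambda'|\lambda).$$
For any pair of functions $f : \Lambda \times \Lambda' \to [0,1]$ and $g : \Lambda'' \times \Lambda''' \to [0,1]$, parallel composition is given by $f \otimes g : (\Lambda \times \Lambda'') \times (\Lambda' \times \Lambda''') \to [0,1]$ via
$$(f \otimes g)(\lambda', \lambda'''|\lambda, \lambda'') := f(\lambda'|\lambda) \, g(\lambda'''|\lambda'').$$
It is sometimes more convenient or natural to take an alternative (but equivalent) point of view on this process theory (e.g., this view makes it more clear that this is a sub-process theory of FVect_R). In this alternative view, the systems are not simply given by finite sets Λ, but rather are taken to be the vector space of functions from Λ to R, denoted $\mathbb{R}^\Lambda$. Then, rather than taking the processes to be functions $f : \Lambda \times \Lambda' \to [0,1]$, one takes them to be linear maps from $\mathbb{R}^\Lambda$ to $\mathbb{R}^{\Lambda'}$, denoted by $f : \mathbb{R}^\Lambda \to \mathbb{R}^{\Lambda'}$, $v \mapsto f(v)$, where for all λ′ ∈ Λ′, we define $f(v)(\lambda') := \sum_{\lambda \in \Lambda} f(\lambda'|\lambda) v(\lambda)$. It is then straightforward to show that sequential composition of the stochastic processes corresponds to composition of the associated linear maps and that parallel composition of the stochastic processes corresponds to the tensor product of the associated linear maps. For example, for sequential composition we have that for all $v \in \mathbb{R}^\Lambda$,
$$g(f(v))(\lambda'') = \sum_{\lambda'} g(\lambda''|\lambda') \, f(v)(\lambda') = \sum_{\lambda'} \sum_{\lambda} g(\lambda''|\lambda') \, f(\lambda'|\lambda) \, v(\lambda) = \sum_{\lambda} (g \circ f)(\lambda''|\lambda) \, v(\lambda) = (g \circ f)(v)(\lambda'').$$
Moreover, consider processes with no input, that is, linear maps $p : \mathbb{R} \to \mathbb{R}^\Lambda$. By using the trivial isomorphism $\mathbb{R} \cong \mathbb{R}^\star$, where ⋆ is the singleton set ⋆ := {∗}, one can see that these correspond to functions $p : \star \times \Lambda \to [0,1]$, that is, to subnormalized probability distributions on Λ.
The second subtheory of FVect_R is QuasiSubStoch, which is the same as the process theory of (sub)stochastic processes, but where the constraint of positivity is dropped. The systems can be taken to be finite sets Λ, and the processes with input Λ and output Λ′ can be taken to be functions $f : \Lambda \times \Lambda' \to \mathbb{R}$. These are said to be quasistochastic (as opposed to quasisubstochastic) if they moreover satisfy $\sum_{\lambda' \in \Lambda'} f(\lambda'|\lambda) = 1$ for all λ ∈ Λ. The way that these compose and are represented in FVect_R is exactly the same as in the case of substochastic maps.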
Summarizing, we have:
Example 2.4 (QuasiSubStoch). We define QuasiSubStoch as the subtheory of FVect_R where systems are restricted to vector spaces of the form $\mathbb{R}^\Lambda$ and processes are those that correspond to quasi(sub)stochastic maps.
By construction, SubStoch ⊂ QuasiSubStoch ⊂ FVect_R; however, in contrast to FVect_R, SubStoch and QuasiSubStoch do come equipped with a preferred basis for each system. It is known that quantum theory as a GPT (QT) can be represented as a subtheory of QuasiSubStoch (see, for example, Ref. [37]).
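The inclusion SubStoch ⊂ QuasiSubStoch can be made concrete with a small sketch. It stores a map f(λ′|λ) as a matrix whose rows are indexed by the output λ′ and whose columns are indexed by the input λ, so that the normalization conditions become conditions on column sums and sequential composition becomes the matrix product. The function names and the example matrices are hypothetical and chosen only for illustration; exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction as F


def compose(g, f):
    """(g o f)(l''|l) = sum over l' of g(l''|l') f(l'|l); matrices are [output][input]."""
    assert len(g[0]) == len(f), "output set of f must match input set of g"
    return [[sum(g[i][k] * f[k][j] for k in range(len(f)))
             for j in range(len(f[0]))]
            for i in range(len(g))]


def column_sums(f):
    """Sum over outputs l' of f(l'|l), one entry per input l."""
    return [sum(row[j] for row in f) for j in range(len(f[0]))]


def is_substochastic(f):
    return all(x >= 0 for row in f for x in row) and all(s <= 1 for s in column_sums(f))


def is_stochastic(f):
    return all(x >= 0 for row in f for x in row) and all(s == 1 for s in column_sums(f))


def is_quasistochastic(f):
    # Normalized like a stochastic map, but entries may be negative.
    return all(s == 1 for s in column_sums(f))


f = [[F(1, 2), F(0)], [F(1, 2), F(1)]]    # stochastic
g = [[F(1), F(1, 3)], [F(0), F(1, 3)]]    # substochastic: second column sums to 2/3
q = [[F(3, 2), F(0)], [F(-1, 2), F(1)]]   # quasistochastic with a negative entry

gf = compose(g, f)
print(gf, is_substochastic(gf), is_quasistochastic(q), is_substochastic(q))
```

In this toy example the composite of a stochastic and a substochastic map is again substochastic, while `q` is normalized but fails positivity, so it lies in QuasiSubStoch and not in SubStoch.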

Operational theories
We now introduce a process-theoretic presentation of the framework of operational theories as defined in Ref. [1], resulting in a framework that is essentially that of (unquotiented) operational probabilistic theories [34]. An operational theory Op is given by a process theory specifying a set of physical systems and the processes which act on them (where processes are viewed as lists of lab instructions), together with a rule for assigning probabilities to any closed process. A generic laboratory procedure has an associated set of inputs and outputs. Of special interest are processes with no inputs and processes with no outputs. The former is viewed as a preparation procedure and the latter is viewed as an effect, corresponding to some outcome of some measurement. We depict the probability rule by a map p; that is, the application of p on any closed diagram yields a real number between 0 and 1. Note that this is not a diagram-preserving map, as it can only be applied to processes with no input and no output. (Nonetheless, we will see shortly how it has a diagram-preserving extension to arbitrary processes, namely, the quotienting map.) This probability rule must be compatible with certain relations that hold between procedures [31,41]. First, it must factorise over separated diagrams; for example, the probability assigned to two disconnected closed diagrams placed side by side is the product of the probabilities assigned to each. Moreover, if T_1 is a procedure that is a mixture of T_2 and T_3 with weights ω and 1 − ω respectively, then it must hold that for any tester τ, the probabilities obtained by applying τ to these procedures satisfy p(τ(T_1)) = ω p(τ(T_2)) + (1 − ω) p(τ(T_3)). Additionally, if one operational effect E_1 is the coarse-graining of two others, E_2 and E_3, then Pr(E_1, P) must be the sum of Pr(E_2, P) and Pr(E_3, P) for all P.
Our main result holds only for operational theories satisfying the following property: any two processes T and T′ that give the same statistics for all local preparations on their inputs and all local measurements on their outputs also give the same statistics in arbitrary circuits. Such operational theories are alternatively characterized by the fact that the GPT defined by quotienting them satisfies tomographic locality, as we show below.
Two processes with the same input systems and output systems are said to be operationally equivalent [1] if they give rise to the same probabilities no matter what closed diagram they are embedded in. The testers from Eq. (5) facilitate a convenient diagrammatic representation of this condition. That is, two processes T and T′ are operationally equivalent, denoted by T ≃ T′, if every tester assigns them equal probabilities, so that p(τ(T)) = p(τ(T′)) for all testers τ. It is easy to see that operational equivalence defines an equivalence relation. Hence, we can divide the space of processes into equivalence classes, and each process T in the operational theory can be specified by its equivalence class $\widetilde{T}$, together with a label $c_T$ of the context of T, specifying which element of the equivalence class it is. For a given T, $c_T$ provides all the information which defines that process which is not relevant to its equivalence class. Hence, each procedure is specified by a tuple, $T := (\widetilde{T}, c_T)$, and we will denote it as such when convenient. In the case of closed diagrams, the equivalence class can be uniquely specified by the probability given by the map p, and so any information beyond this forms the context of the closed diagram.
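The decomposition of a procedure into an equivalence class and a context label can be sketched on a toy example. The sketch assumes a finite list of hypothetical procedures and a fixed finite list of testers, with each procedure summarized by the tuple of probabilities it yields under those testers; in a genuine operational theory the set of testers is generally not finite, and the names and numbers below are invented purely for illustration.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical data: procedure name -> probabilities under a fixed list of two testers.
PROBS = {
    "T_a": (F(1, 2), F(1, 4)),
    "T_b": (F(1, 2), F(1, 4)),  # same statistics as T_a, different lab instructions
    "T_c": (F(1, 3), F(1, 4)),
}


def quotient(probs):
    """Map each procedure to (equivalence class, context label).

    The class is identified with the vector of tester probabilities; the
    context label records which member of the class the procedure is.
    """
    classes = defaultdict(list)
    for name, vector in probs.items():
        classes[vector].append(name)
    return {name: (vector, context)
            for vector, names in classes.items()
            for context, name in enumerate(names)}


decomposed = quotient(PROBS)
for name, (cls, ctx) in decomposed.items():
    print(name, cls, ctx)
```

Here `T_a` and `T_b` land in the same class and differ only in their context label, while `T_c` sits in a class of its own; discarding the second component of each tuple is the toy analogue of passing to the quotiented theory.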
Next, we define a quotienting map ∼ which maps procedures into their equivalence class (exactly as is done to construct quotiented operational probabilistic theories in Ref. [34]). Given a characterization of each procedure as a tuple of equivalence class and context, the quotienting map picks out the first element of this tuple, taking (T̃, c_T) → T̃.⁶ Diagrammatically, we have We prove that it is diagram-preserving in Appendix E.1. For processes which are closed diagrams, one can always choose the representative of the equivalence class to be the real number specified by the probability rule.
Hence, the map ∼ : Op → Õp, from the operational theory Op to the quotiented operational theory Õp, can be viewed as a diagram-preserving extension of the probability rule p. This implies that the quotiented operational theory reproduces the predictions of the operational theory, since It is worth noting that in the quotiented operational theory, a closed diagram is equal to a real number (the probability associated to it), while in the operational theory these are not equal until the map p is applied to the closed diagram.
We will assume that, for a given system A, all deterministic effects in the operational theory (each corresponding to implementing a measurement on the system and marginalizing over its outcome) are operationally equivalent. We denote these deterministic effects as: where c labels the context.

The GPT associated to an operational theory
It is well known [31][32][33] that a quotiented operational theory, Õp, is nothing but a generalized probabilistic theory [29,30], and in fact for this paper we view this as the definition of a GPT. We will now demonstrate this by showing that Õp is tomographic (a notion that will be defined momentarily), is representable in real vector spaces, is convex, and has a unique deterministic effect. This is analogous to how quotiented OPTs arise from unquotiented OPTs in [32,33]. Firstly, note that the quotiented operational theory is tomographic. For a generic process theory, P, being tomographic means that processes are characterized by scalars. That is, given any two distinct processes f, g : there must exist a tester h ∈ P that turns each of these processes into a closed diagram, i.e., a scalar, such that the scalars are distinct: That Eq. (32) implies Eq. (31) for processes in a quotiented operational theory is trivial; we now give the proof that Eq. (31) implies Eq. (32). Consider two distinct processes, T̃ and T̃′, in the quotiented operational theory (the images under the quotienting map of processes T and T′ in the operational theory) such that: By definition, we know that T̃ ≠ T̃′ implies that T ≄ T′, and hence there exists some tester τ such that Since the action of p is identical to that of ∼ on closed diagrams, this implies that Finally, we can use the fact that the quotienting map is diagram-preserving to write that there exists τ such that This establishes that Eq. (31) implies Eq. (32), and so the quotiented operational theory is tomographic. This means that we can identify an operational equivalence class of processes, T̃, with a real vector, K_T̃, living in R^{T_{A→B}}, where T_{A→B} denotes the set of testers for processes with input A and output B. Concretely, we define these vectors component-wise via Clearly, following on from the discussion around Eq.
(36), we have that K_T̃ = K_{T̃′} if and only if T̃ = T̃′. This vector space representation, however, is generally infinite dimensional, and gives a highly inefficient characterisation of processes. We can instead focus on some minimal subset of fiducial testers F_{A→B} ⊂ T_{A→B} which, for notational convenience, we index as The term fiducial means that this subset of testers satisfies two key properties. The first is that they must also suffice for tomography, i.e., We can therefore use these fiducial testers to define a finite-dimensional vector representation R_T̃ of a process T̃, defined componentwise via for α = 1,2,...,m_{A→B}. This new representation, R_T̃, has a straightforward relation to the original vector representation, K_T̃: to go from the original to the new representation, one simply restricts the K_T̃ vectors to the relevant subset of their components. We can think of this as a linear restriction map F_{A→B} : R^{T_{A→B}} → R^{F_{A→B}} :: K ↦ K|_{F_{A→B}}. This allows us now to relate these two representations via the observation that The second key property of fiducial sets of testers is that they define a linear compression of the K_T̃ vectors.
Formally, what we mean by this is that there is a linear map E_{A→B} : R^{F_{A→B}} → R^{T_{A→B}} which is the inverse to the restriction map F_{A→B} on vectors K_T̃, that is, for all We reiterate that F_{A→B} is taken to be a minimal fiducial set of testers, which means that it is a minimal-cardinality set of testers satisfying these two properties. Note that minimal fiducial sets are typically not unique.
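To make the two properties of a fiducial set concrete, here is a minimal sketch for a finite table of probabilities. It assumes a hypothetical toy theory (three states of a classical bit probed by four effects, playing the roles of processes and testers) and a hypothetical helper `select_fiducial`; neither comes from the text. For such a finite table, any maximal set of linearly independent columns suffices for tomography and admits a linear extension back to the full table, so its size equals the rank of the table.

```python
from fractions import Fraction

def select_fiducial(K):
    """Greedily pick tester indices whose columns of K are linearly independent.

    K[i][a] is the probability that tester a assigns to process i.
    The chosen columns span the column space of K, so every other
    column is a linear combination of them.
    """
    n_rows = len(K)
    basis = []    # reduced copies of the chosen columns
    pivots = []   # pivot row of each reduced column
    chosen = []   # indices of the chosen testers
    for a in range(len(K[0])):
        col = [Fraction(K[i][a]) for i in range(n_rows)]
        # Remove the components along the columns chosen so far.
        for b, p in zip(basis, pivots):
            factor = col[p] / b[p]
            col = [c - factor * x for c, x in zip(col, b)]
        pivot = next((i for i, c in enumerate(col) if c != 0), None)
        if pivot is not None:
            basis.append(col)
            pivots.append(pivot)
            chosen.append(a)
    return chosen

# Hypothetical table: rows are the states (1,0), (0,1), (1/2,1/2) of a classical bit;
# columns are the effects (1,1), (1,0), (0,1), (1/2,1/2).
K = [
    [1, 1, 0, Fraction(1, 2)],
    [1, 0, 1, Fraction(1, 2)],
    [1, Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)],
]

fiducial = select_fiducial(K)                     # indices of a minimal fiducial set
R = [[row[a] for a in fiducial] for row in K]     # restricted (compressed) vectors
```

In this toy table the greedy pass keeps the first two testers, and the restricted rows of `R` remain pairwise distinct, so the compression loses no information. A different column ordering would generally select a different set of the same size, in line with the remark that minimal fiducial sets are not unique.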
Consider now how the sequential composition of processes is represented. Given representations of a pair of processes (R_T̃, R_{T̃′}) we know (as R is injective) that we can determine T̃ and T̃′, compute their composition T̃′ ∘ T̃, and via Eq. (39) obtain R_{T̃′ ∘ T̃}. We denote the sequential composition map on the vector representation as R_{T̃′} □ R_T̃ := R_{T̃′ ∘ T̃}. Similarly, for parallel composition we can define R_{T̃′} ⊠ R_T̃ := R_{T̃′ ⊗ T̃}. As we demonstrate in Appendix E.2, both □ and ⊠ can be uniquely extended to bilinear maps on the relevant vector spaces. Specifically, we have: Lemma 2.5. The operation □ can be uniquely extended to a bilinear map and the operation ⊠ can be uniquely extended to a bilinear map This implies that transformations act linearly on the state space, and also that the summation operation distributes over diagrams, i.e.: It is generally easier to work with this vector representation of processes rather than directly with the abstract process theory of operational equivalence classes of procedures. We will do so when convenient, abusing notation by dropping the explicit symbol R, and simply denoting the vector representation of the equivalence classes in the same way as the equivalence classes themselves. That is, we will denote R_T̃ by T̃. For example, we will write Eq. (42) as Note that generic linear combinations such as Σ_i r_i T̃_i need not correspond to any process in the operational theory. However, some linear combinations correspond to mixtures and coarse-grainings, and these will correspond to other processes in the operational theory. Namely, if T_1 is a procedure that is a mixture of T_2 and T_3 with weights ω and 1 − ω, then by Eq.
(25) it follows that for all τ, which in turn implies that and so by the fact that quotiented operational theories are tomographic, Hence, the mixing relations between preparation procedures in the operational theory are captured by a convex structure in this representation. More generally we find that for arbitrary coefficients r_i. Hence, for example, the coarse-graining relations that hold among operational effects are captured by the linear structure in this representation of the quotiented operational theory.
If one makes the standard assumption that every possible mixture of processes in the operational theory is another process in the operational theory, then it follows that the quotiented operational theory is convex.
Finally, the fact that we assumed that each system A had a unique equivalence class of deterministic effects means that the quotiented operational theory will have a unique deterministic effect [32] for each system: In summary, a quotiented operational theory satisfies the key properties of a GPT: being tomographic, representability of each system in R^d (for some d), convexity, and uniqueness of the deterministic effect for each system. Henceforth, we will refer to the quotiented operational theory as the GPT associated to the operational theory, and we presume that every GPT can be achieved in this way.
For example, quantum theory qua operational theory is the process theory whose processes are laboratory procedures (including contexts), while quantum theory qua GPT is the process theory whose processes are completely positive [47,48] trace-nonincreasing maps, whose states are density operators, and so on.When one quotients quantum theory qua operational theory, one obtains quantum theory qua GPT.
It is worth noting that a quotiented operational theory should not be viewed as an instance of an operational theory⁷. In an operational theory, contexts are not merely permitted; they are required in the definition of the procedures. The primitives in an operational theory are laboratory procedures, and these necessarily involve a complete description of the context of a procedure. For example, "prepare the maximally mixed state" specifies a preparation when viewing quantum theory as a quotiented operational theory, but not when viewing it as an operational theory. In the latter case, one must additionally specify how this mixture is achieved, e.g., which ensemble of pure states was prepared, or which entangled state was prepared before tracing out a subsystem.

Tomographic locality of a GPT
Tomographic locality is a common assumption in the GPT literature; indeed, in some early work on GPTs it was considered such a basic principle that it was taken as part of the framework itself [30]. Intuitively, it states that processes can be characterized by local state preparations and local measurements. In this section, we will show that all tomographically local GPTs can be represented as subtheories Õp ⊂ FVect_R, using arguments similar to those in, e.g., the duotensor formalism in Ref. [31].
A GPT is said to satisfy tomographic locality if one can determine the identity of any process by local operations on each of its inputs and outputs: One can immediately verify that an operational theory satisfies Eq. (26) if and only if the GPT obtained by quotienting relative to operational equivalences is tomographically local.
There are many equivalent characterizations of tomographic locality for a GPT. The most useful for us is the following condition, first introduced in Sections 6.8 and 9.3 of [31], which allows us to show that tomographically local GPTs can be represented as subtheories of FVect_R.
Consider an arbitrary set of linearly independent and spanning states {P̃_j^A}_{j=1,2,...,m_A} on system A and an arbitrary set of linearly independent and spanning effects {Ẽ_i^B}_{i=1,2,...,m_B} on system B. (If the systems are composite, these should moreover be chosen as product states and product effects respectively.) Define the 'transition matrix' in this basis for a process T̃ from system A to system B as
Lemma 2.6. A GPT is tomographically local if and only if one can decompose the identity process, denoted 1_A, for every system A as where M_{1_A} is the matrix inverse of the transition matrix N_{1_A} of the identity process, that is, This was originally shown in [31, Sec. 6.8], and we reprove this in our notation in Appendix E.3.
The vector space spanned by the set {P̃_j^A}_{j=1,2,...,m_A} of states is R^{m_A}, and the vector space spanned by the set , which generically implies that M_{1_A} is not the identity matrix (nor is it equal to N_{1_A}), counter to intuitions one might have from working with orthonormal bases. The following corollary then makes explicit some extra structure which was implicit in the vector representation R_T̃ of the previous section. In particular, it shows that the vector space R^{m_{A→B}} of transformations from A to B is isomorphic to the vector space of linear maps from R^{m_A} to R^{m_B}, where a process T̃ is represented as a vector R_T̃ in the former and a matrix M_T̃ in the latter.
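The role of the inverse transition matrix in Lemma 2.6 can be checked numerically. The following is a minimal sketch assuming a hypothetical toy GPT (a classical bit, with a deliberately non-orthonormal choice of basis states and effects) and one consistent index convention, identity = Σ_{j,i} P_j [M]_{ji} E_i; the helper names and the convention are assumptions, not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(X):
    """Inverse of a 2x2 matrix with Fraction entries."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Columns of P are the basis states P_1 = (1, 0) and P_2 = (1/2, 1/2).
P = [[Fraction(1), Fraction(1, 2)],
     [Fraction(0), Fraction(1, 2)]]
# Rows of E are the basis effects E_1 = (1, 1) (the unit effect) and E_2 = (1, 0).
E = [[Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(0)]]

N_id = matmul(E, P)        # transition matrix of the identity: N[i][j] = E_i(P_j)
M_id = inverse_2x2(N_id)   # its matrix inverse

# Reassemble the identity process as sum_{j,i} P_j [M_id]_{ji} E_i.
identity = matmul(matmul(P, M_id), E)
```

Here `N_id` is neither the identity matrix nor equal to `M_id`, yet inserting `M_id` between the chosen states and effects recovers the identity map exactly. This is the behaviour one expects whenever the basis of states and the basis of effects are not dual to one another.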

Corollary 2.7. A GPT is tomographically local if and only if one can decompose any process T as
where
Proof. To prove that a GPT is tomographically local if one can decompose any process as in Eq. (53), simply note that for the special case of T̃ = 1_A, Eq. (53) implies Eq. (51), and hence, by Lemma 2.6, implies tomographic locality.
To prove the converse, we assume tomographic locality and apply Lemma 2.6 to decompose the input and output systems of an arbitrary process T̃ as which can be rewritten as: at which point we can simply identify
Given the vector representation of two equivalence classes, R_T̃ and R_{T̃′}, we showed how to compute the representation of either the sequential or parallel composition of these (via □ and ⊠, respectively). However, if we represent equivalence classes by matrices M_T̃ instead, then how must we represent the parallel and sequential composition of processes? It turns out that parallel composition is represented by where the ⊗ on the left represents the parallel composition of equivalence classes, while on the right it represents the tensor product of the two matrices. Meanwhile, the sequential composition of this matrix representation is given by where on the left-hand side ∘ represents the sequential composition of the equivalence classes, while on the right-hand side it represents matrix multiplication. These two facts are proven in Appendix E.4. The fact that Eq. (58) is not simply the sequential composition rule for FVect_R, namely the matrix product of M_{T̃′} and M_T̃, implies that this matrix representation is not a subtheory of FVect_R, nor even some other diagram-preserving representation of the GPT. This form of composition has, however, appeared numerous times in the literature, for example in Refs. [3,31,37,49]. There is, moreover, a well-known trick to turn this representation into a diagram-preserving representation in FVect_R: one simply defines a new matrix representation by replacing M_T̃ with M_T̃ ∘ N_{1_A}. It is then easy to verify that these do indeed compose using the standard composition rules (tensor products for parallel composition and matrix multiplication for sequential composition), and to verify that the identity process is represented by the identity matrix.
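The failure of plain matrix multiplication for the M matrices, and its repair by the rescaling M ∘ N of the identity, can be seen in a small numerical sketch. It assumes a hypothetical classical-bit GPT with non-dual basis states and effects, the convention M_T = M_1 N_T M_1 (with M_1 the inverse transition matrix of the identity), and hypothetical helper names; the composition rule M_{T′∘T} = M_{T′} N_1 M_T checked below is derived for this toy setting rather than quoted from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(X):
    """Inverse of a 2x2 matrix with Fraction entries."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Basis states (columns of P) and basis effects (rows of E) of a classical bit.
P = [[Fraction(1), Fraction(1, 2)], [Fraction(0), Fraction(1, 2)]]
E = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(0)]]
N_id = matmul(E, P)        # transition matrix of the identity process
M_id = inverse_2x2(N_id)   # its matrix inverse

def N(T):
    """Transition matrix N_T[i][j] = E_i(T P_j)."""
    return matmul(matmul(E, T), P)

def M(T):
    """Coefficient matrix of T in the expansion T = sum_{j,i} P_j [M_T]_{ji} E_i."""
    return matmul(matmul(M_id, N(T)), M_id)

def rep(T):
    """Rescaled representation M_T N_id, which composes by matrix multiplication."""
    return matmul(M(T), N_id)

flip = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]    # swaps the two bit values
reset = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(0)]]   # sends every state to (1, 0)
composite = matmul(reset, flip)                                     # reset after flip

twisted_product = matmul(matmul(M(reset), N_id), M(flip))   # M_{T'} N_1 M_T
naive_product = matmul(M(reset), M(flip))                    # M_{T'} M_T
```

In this example `M(composite)` agrees with `twisted_product` but not with `naive_product`, whereas `rep(composite)` equals the ordinary product `matmul(rep(reset), rep(flip))` and the identity process is sent to the identity matrix.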
Putting all of this together we arrive at the following.
Theorem 2.8. Any tomographically local GPT has a diagram-preserving representation in FVect_R given by the map on systems and the map on processes, where for some basis of states {P̃_j^A} and effects {Ẽ_k^B}, and where This result is implicit in the work of Refs. [31,34] and more explicit in the quantum case of [37].
Effectively this means that we can view any tomographically local GPT simply as a suitably defined subtheory Õp ⊂ FVect_R. For the remainder of this paper we restrict our attention to tomographically local GPTs, and we will moreover abuse notation and simply denote the linear maps in this representation by T̃ rather than by M_T̃ ∘ N_{1_A}, and similarly, the vector spaces as A rather than by R^{m_A}. That is, we will neglect to make the distinction between the quotiented operational theory and its representation as a subtheory of FVect_R, as preserving the distinction is unwieldy and typically unhelpful.
Quantum theory is an example of an operational theory, and it is well known that the GPT representation of quantum theory is tomographically local. The latter is a subtheory of FVect_R, as B(H) is a real vector space and completely positive trace-nonincreasing maps are just a particular class of linear maps between these vector spaces. Classical theory, the Spekkens toy model [50], and the stabilizer subtheory [51] for arbitrary dimensions are also tomographically local. Examples of GPTs which are not tomographically local are real quantum theory [52] and the real stabilizer subtheory.

Representations of operational theories and GPTs
One often wishes to find alternative representations of an operational theory or a GPT, e.g., as an ontological model or a quasiprobabilistic model (to be defined shortly). A key motivation for studying ontological models is the attempt to find an explanation for the statistics in terms of some underlying properties of the relevant systems, especially if this explanation can be said to be classical in some well-motivated sense. In this section, we introduce the definition of ontological models and quasiprobabilistic models, and in the next section we discuss under what conditions one can say that such representations provide a classical explanation of the operational theory or GPT which they describe.

Ontological models
An ontological model is a map associating to every system S a set Λ S of ontic states, and associating to every process a stochastic map from the ontic state space associated to the input systems to the ontic state space associated with the output systems.
It is important to distinguish between ontological models of operational theories and ontological models of GPTs, as was shown in Ref. [4]. In particular, the former allows for context-dependence while the latter does not. See App. A for a detailed discussion of this point.

Definition 2.9 (Ontological models of operational theories). An ontological model [53] of an operational theory Op is a diagram-preserving map ξ : Op → SubStoch, depicted as ξ :: from the operational theory to the process theory SubStoch, where the map satisfies three properties:
1. It represents all deterministic effects in the operational theory appropriately:
2. It reproduces the operational predictions of the operational theory (i.e., is empirically adequate). That is, for all closed diagrams: Pr(E, P).
3. It preserves the convex and coarse-graining relations between operational procedures. E.g., if T_1 is a procedure that is a mixture of T_2 and T_3 with weights ω and 1 − ω, respectively, then it must hold that

This diagrammatic definition of an ontological model reproduces the usual notions [1] of ontological representations of preparation procedures and of operational effects. In particular, an operational preparation procedure is an operational process with a trivial input, and by diagram preservation of ξ, this is mapped to a process in SubStoch with a trivial input, that is, to a probability distribution over the ontic states: Similarly, an operational effect is an operational process with a trivial output, and by diagram preservation of ξ it is mapped to a substochastic process with a trivial output, that is, to a response function over the ontic states:

Definition 2.10 (Ontological models of GPTs). An ontological model ξ of a GPT Õp is a diagram-preserving map ξ : Õp → SubStoch, depicted as ξ :: from the GPT to the process theory SubStoch, where the map satisfies three properties:
1. It represents the deterministic effect for each system appropriately:
2. It reproduces the operational predictions of the GPT (i.e., is empirically adequate), so that for all closed diagrams,
3. It preserves the convex and coarse-graining relations between operational procedures. E.g., if then it must hold that

In analogy with the discussion above, one has that normalized GPT states on some system are represented in an ontological model by probability distributions over the ontic state space associated with that system, while GPT effects are represented by response functions.
The state spaces in SubStoch form simplices, and so we will sometimes refer to an ontological model of a GPT as a simplex embedding.This terminology is a natural extension of the definition of simplex embedding in [4].

Quasiprobabilistic models
We now introduce quasiprobabilistic models of a GPT. One could analogously define quasiprobabilistic models of an operational theory (as diagram-preserving maps from Op to QuasiSubStoch). However, given that the expressive freedom offered by the possibility of context-dependence is sufficient to ensure that every operational theory admits of an ontological model, and hence a positive quasiprobabilistic model, there is no need to make use of the additional expressive freedom offered by allowing negative quasiprobabilities, and hence no motivation to introduce such models. On the other hand, in the case of GPTs, there does not always exist an ontological model; hence quasiprobabilistic models are a useful conceptual and mathematical tool for assessing the classicality of a GPT.

Definition 2.11 (Quasiprobabilistic models of GPTs). A quasiprobabilistic model of a GPT Õp is a diagram-preserving map ξ : Õp → QuasiSubStoch, depicted as ξ :: where the map satisfies three properties:
1. It represents the deterministic effect for each system appropriately:
2. It reproduces the operational predictions of the GPT (i.e., is empirically adequate), so that for all closed diagrams, Pr(Ẽ, P̃).
3. It preserves the convex and coarse-graining relations between operational procedures. E.g., if then it must hold that

One can see that the only technical distinction between an ontological model of a GPT and a quasiprobabilistic model of a GPT is that in the latter, the probabilities are replaced by quasiprobabilities, which are allowed to go negative.
In analogy with the discussion at the end of Section 2.3.1, one has that GPT states on some system are represented in a quasiprobabilistic model by quasidistributions over the sample space associated with that system, that is, functions on Λ normalised to 1 but where the values can be negative, while GPT effects are represented by arbitrary real-valued functions over the sample space.

Three equivalent notions of classicality
The only ontological models that constitute good classical explanations are those that satisfy additional assumptions. One such principle is that of (generalized) noncontextuality [1]. It was argued in Refs. [1,2,4,54] that an ontological model of an operational theory should be deemed a good classical explanation only if it is noncontextual. We now provide the definition of a noncontextual ontological model in the framework we have introduced here.

Definition 3.1 (A noncontextual ontological model of an operational theory). An ontological model of an operational theory ξ_nc : Op → SubStoch satisfies the principle of generalized noncontextuality if and only if every two operationally equivalent procedures in the operational theory are mapped to the same substochastic map in the ontological model. That is, if Another way of stating this condition is that the map ξ_nc does not depend functionally on the context of any processes in the operational theory, so that for all T := (T̃, c_T) one has ξ_nc(T) = ξ_nc(T̃).
Ontological models of GPTs (as we have defined them) cannot be said to be either generalized-contextual or generalized-noncontextual (in contrast to ontological models of operational theories, which can). This is because the domain of our notion of an ontological model of a GPT has no notion of a context on which the ontological representation could conceivably depend. (This was first pointed out in Ref. [4], and we explain it further in Appendix A.) However, Ref. [4] showed (in the context of prepare-and-measure scenarios) that the principle of noncontextuality nonetheless induces a notion of classicality within the framework of GPTs: namely, the GPT is said to have a classical explanation if and only if it admits of an ontological model. (Not all GPTs admit of an ontological model, even if the operational theory from which they are obtained as a quotiented theory does. This is a consequence of the representational inflexibility resulting from the lack of contexts on which the representation might depend.¹⁰) We now extend this result (Theorem 1 of Ref. [4]) from prepare-and-measure scenarios to arbitrary scenarios.

Proposition 3.2.
There is a one-to-one correspondence between noncontextual ontological models of an operational theory, ξ_nc : Op → SubStoch, and ontological models of the associated GPT, ξ : Õp → SubStoch.
¹⁰ Accordingly, the Beltrametti-Bugajski model [55] can be viewed as an ontological model of the single qubit subtheory qua operational theory, but not as an ontological model of the single qubit subtheory qua GPT. This can be seen by noting that this model is explicitly contextual while the single qubit subtheory qua GPT has no contexts. Equivalently, it can be seen by noting that the single qubit subtheory qua GPT does not admit of any ontological model. The same can be said of the 8-state model of Ref. [56] relative to the stabilizer qubit subtheory: it is an ontological model of the stabilizer qubit subtheory qua operational theory (a contextual ontological model) but not of the stabilizer qubit subtheory qua GPT. The latter has no contexts and does not admit of any ontological model.

Proof sketch. The idea of the proof is captured by the following diagram: where C is defined as a map, which is not diagram-preserving (hence the dashed arrow), and which takes any process T̃ in the GPT Õp to some process T = (T̃, c_T) in the operational theory. There always exists at least one such map C (in general, there exist many), and all of these satisfy ∼ ∘ C = Id (in general, no choice of C will satisfy C ∘ ∼ = Id). Now, consider an operational theory Op and the GPT Õp it defines.
Given an ontological model ξ of Õp, one can define a noncontextual model ξ_nc of Op via ξ_nc := ξ ∘ ∼. The map constructed in this manner cannot depend on the contexts of processes in the operational theory, since these are removed by the quotienting map ∼. As such, the map ξ_nc necessarily satisfies Eq. (68), and hence is indeed noncontextual.
Given a noncontextual ontological model ξ_nc of Op, one can define an ontological model ξ of Õp via ξ := ξ_nc ∘ C. Because the map ξ_nc does not depend on the context, the map constructed in this manner does not depend on the choice of C, and is unique.
For completeness, we prove in Appendix C that ξ_nc := ξ ∘ ∼ indeed satisfies the relevant constraints to be an ontological model of an operational theory, and similarly, that ξ := ξ_nc ∘ C satisfies the relevant constraints to be a valid ontological model of a GPT.
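The two constructions in the proof sketch, ξ_nc := ξ ∘ ∼ and ξ := ξ_nc ∘ C, can be mimicked with a few lines of bookkeeping. The sketch below assumes a hypothetical toy operational theory containing only three preparation procedures, each stored as a pair (equivalence class, context), and a hypothetical assignment `xi` of distributions over two ontic states; it illustrates only the context-independence argument, not diagram preservation or the other defining properties.

```python
# Each procedure is a pair (equivalence_class, context); all labels are hypothetical.
procedures = [
    ("mixed", "context 1"),
    ("mixed", "context 2"),
    ("pure_0", "context 1"),
]

def quotient(procedure):
    """The quotienting map ~ : keep the equivalence class, forget the context."""
    equivalence_class, _context = procedure
    return equivalence_class

def choose_context(equivalence_class):
    """One possible map C: pick some procedure belonging to the given class."""
    return next(p for p in procedures if quotient(p) == equivalence_class)

# A toy model of the quotiented theory: one distribution over two ontic states per class.
xi = {"mixed": (0.5, 0.5), "pure_0": (1.0, 0.0)}

def xi_nc(procedure):
    """Noncontextual model of the operational theory, xi_nc := xi after ~."""
    return xi[quotient(procedure)]

def xi_recovered(equivalence_class):
    """Model of the quotiented theory recovered as xi_nc after C."""
    return xi_nc(choose_context(equivalence_class))
```

By construction `xi_nc` assigns the same distribution to the two procedures in the class "mixed", and `xi_recovered` returns the original `xi` whichever representative `choose_context` happens to pick.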
Finally, we note that this notion of classicality of a GPT is closely linked to the positivity of quasiprobabilistic models.This result can be seen as an extension of the equivalence in Ref. [2] from the prepare-and-measure scenario to arbitrary compositional scenarios.

Definition 3.3 (Positive quasiprobabilistic model of a tomographically local GPT). A positive quasiprobabilistic model of a tomographically local GPT Õp is a quasiprobabilistic model ξ+ : Õp → QuasiSubStoch in which all of the matrix elements of the quasisubstochastic maps in the image of ξ+ are positive, that is, if and only if all of the quasisubstochastic maps in the image of ξ+ are substochastic.
Simply by examining the definitions, it is clear that a positive quasiprobabilistic model ξ+ : Õp → QuasiSubStoch of a GPT is equivalent to an ontological model ξ : Õp → SubStoch of that GPT. It follows that:

Proposition 3.4. There exists a positive quasiprobabilistic model of a GPT Õp if and only if there exists an ontological model of Õp.
Although Proposition 3.4 follows immediately from the relevant definitions, we have nonetheless highlighted it here. This is because a generic quasiprobabilistic model of a GPT has no meaningful conceptual relationship to an ontological model of a GPT, and so it is conceptually important to understand in what special cases the two notions coincide. Furthermore, we hope that highlighting this fact will encourage more dialogue between those researchers studying quasiprobabilistic models and those studying ontological models.

Structure theorem
With this framework in place, we can prove our main results. We start with a general theorem, leveraging the fact that Õp ⊂ FVect_R, as stated in Theorem 2.8. We then specialize to the various physically relevant cases.
where for each system A, χ_A : A → V_A is an invertible linear map within FVect_R. Moreover, the χ_A are uniquely determined by Eq. (69).
Note that we have colored the linear maps χ A to make it immediately apparent that they came from the associated diagram-preserving map.
The proof consists of three main arguments, provided explicitly in Appendix B and sketched here.
First, we leverage tomographic locality of the GPT, as well as convex-linearity and diagram preservation of the map, to prove that one can represent the action of the map on a generic process in terms of its action on states and effects Second, using convex-linearity of the map, we prove that one can represent the action of M on states and effects simply as some linear maps within FVect_R; that is, which relies on the isomorphism between vectors (or covectors) in V and linear maps from R to V (resp. V to R). Note that χ_B and ϕ_A are uniquely fixed by Eq. (71), which means that there can be no other choice made for the χ_A appearing in Eq. (69).
Next, we leverage empirical adequacy, that for all P̃ and Ẽ, together with the fact that they span the vector space and dual, to show that that is, that ϕ_A is the left inverse of χ_A.
Finally, we consider the representation of the identity as which is a consequence of diagram preservation, to prove that which means that ϕ_A is also the right inverse of χ_A, and hence that it is unique, so that we can write ϕ_A = χ_A^{-1}. This shows that the only freedom in the representation is in the representation of the states of the theory, via the choice of linear maps χ_S; after specifying these, one can uniquely extend to the representation of arbitrary processes. It also shows that the representation M is necessarily invertible, as we can always define the inverse of M by using the inverses of the χ's.
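The form described in the proof sketch, with states mapped by χ, effects by χ^{-1}, and hence a generic process T from A to B by χ_B ∘ T ∘ χ_A^{-1}, can be checked on a toy example. The sketch below assumes a single hypothetical system (a classical bit) and an arbitrarily chosen invertible matrix `chi`; all names are illustrative. A generic `chi` yields only a linear representation in FVect_R, so the represented state need not be a probability distribution.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(X):
    """Inverse of a 2x2 matrix with Fraction entries."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

chi = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]   # arbitrary invertible map
chi_inv = inverse_2x2(chi)

def rep_state(s):
    """States (column vectors) are mapped by chi."""
    return matmul(chi, s)

def rep_effect(e):
    """Effects (row vectors) are mapped by the inverse of chi."""
    return matmul(e, chi_inv)

def rep_process(T):
    """Transformations are conjugated: chi T chi^{-1}."""
    return matmul(matmul(chi, T), chi_inv)

state = [[Fraction(3, 4)], [Fraction(1, 4)]]
effect = [[Fraction(1), Fraction(0)]]
flip = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]
reset = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(0)]]

original = matmul(matmul(effect, flip), state)   # probability in the toy theory
represented = matmul(matmul(rep_effect(effect), rep_process(flip)), rep_state(state))
```

Both `original` and `represented` evaluate to the same 1x1 matrix, and `rep_process` respects sequential composition, because every internal χ^{-1} χ pair cancels. Imposing positivity on the images, as required for an ontological model, restricts which `chi` are admissible but does not change this form.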
One key consequence of this result is the following corollary, whose significance we investigate in Section 4.3.

Corollary 4.2. The dimension of the codomain V_A of the map χ_A is given by the dimension of the GPT vector space A.
Proof. The linear map χ_A is invertible, so the dimension of its domain and of its codomain must be equal, and its domain is the GPT vector space.
Note that the proof of the structure theorem and this subsequent corollary do not require the full generality of diagram preservation, only the (mathematically) much weaker conditions that: , and (76) We will give justifications of these (for the case of ontological models and quasiprobabilistic models) in Sec. 5, and will discuss further consequences of general diagram preservation in Sec. 4.4.

Diagram-preserving quasiprobabilistic models are exact frame representations
As mentioned in the introduction, SubStoch and QuasiSubStoch are subprocess theories of FVect_R, SubStoch ⊂ QuasiSubStoch ⊂ FVect_R. This implies that our main theorem applies to these special cases. The fact that the codomain is restricted can then equivalently be expressed as a constraint on the linear maps χ_A. In the case of quasiprobabilistic representations we obtain:

Proposition 4.3. Any diagram-preserving quasiprobabilistic model of a tomographically local GPT can be written as
for invertible linear maps {χ_S : S → R^{Λ_S}} within FVect_R for each system, where these satisfy χ_S = . (78)
Proof. Since ξ satisfies the requirements of Theorem 4.1 we immediately obtain Eq. (77). For the particular case of the deterministic effect, we have that Recall that, by definition, a quasiprobabilistic model satisfies Eq. (65): Combining these gives Composing both sides of this with χ_S gives Eq. (78).
The extra constraint of Eq. (78) is not part of the general structure theorem because an abstract vector space does not have a natural notion of discarding. Such a privileged notion is found within, for example, SubStoch, as the all-ones covector, which represents marginalization.¹¹ Since the χ_S are just invertible linear maps, this map can be seen as merely transforming from one representation of the GPT to another. Critically, however, one must note that the vector spaces in QuasiSubStoch are all of the form R^Λ, and so they come equipped with extra structure, namely a preferred basis and dual basis.
Hence, these representations are effectively singling out this preferred basis for the GPT.
To see this more explicitly, denote the preferred basis and cobasis for R^Λ as This means that, for any system A in the GPT, we can write χ_A as: whereby condition (78) becomes Similarly, we could run the same argument using χ_A^{-1} where (by Eq. (81)) ∀λ, Moreover, this decomposition, together with reversibility and hence A. We can then represent the action of ξ as: which can be viewed as a quasistochastic map defined by the conditional quasiprobability distribution
Finally, we note that Proposition 4.3 also implies that any quasiprobability representation constructed using an overcomplete frame necessarily fails to be diagram-preserving.

Diagram-preserving quasiprobabilistic models of quantum theory
We now consider the case of quantum theory as a GPT. The basis $\{F_\lambda\}_{\lambda\in\Lambda}$ is a basis for the real vector space of Hermitian operators for the system, while the cobasis $\{D_\lambda\}_{\lambda\in\Lambda}$ is a basis for the space of linear functionals on the vector space of Hermitian operators. The Riesz representation theorem [57] guarantees that every element $D_\lambda$ of the cobasis can be represented via the Hilbert-Schmidt inner product with some Hermitian operator, which we will denote as $D^*_\lambda$, such that for all $\rho$:
$$D_\lambda \circ \rho = \mathrm{tr}(D^*_\lambda\, \rho).$$
The condition in Eq. (86) becomes $\sum_\lambda D^*_\lambda = \mathbb{1}$, the condition in Eq. (88) becomes $\mathrm{tr}(F_\lambda) = 1$, and the condition of Eq. (90) becomes
$$\mathrm{tr}(D^*_{\lambda'} F_\lambda) = \delta_{\lambda\lambda'}. \tag{94}$$
It is clear, therefore, that $\{F_\lambda\}_\lambda$ and $\{D^*_\lambda\}_\lambda$ constitute a minimal frame and its dual (in the language of, for example, Ref. [3]). Hence, this representation is nothing but an exact frame representation, that is, one which is not overcomplete. In such a representation, a transformation $T$, represented by a completely positive trace-preserving map $\mathcal{E}_T$, will be represented as a quasistochastic map defined by the conditional quasiprobability distribution
$$\xi_T(\lambda'|\lambda) = \mathrm{tr}\big(D^*_{\lambda'}\, \mathcal{E}_T(F_\lambda)\big).$$
It is easy to see that any set of spanning and linearly independent vectors summing to the identity will define a suitable dual frame $\{D^*_\lambda\}$, and then the frame $\{F_\lambda\}$ itself is uniquely defined by Eq. (94). (Note in particular that the elements of the frame need not be pairwise orthogonal, nor must those of the dual frame.)
It has previously been shown that all quasiprobabilistic models of quantum theory are frame representations [3]. What we learn here is that diagram-preserving quasiprobabilistic models are necessarily the simplest possible frame representations, namely those that are not overcomplete.
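As a concrete illustration of an exact frame and its dual, the following minimal sketch checks the three conditions above for a single qubit. The particular choice of operators (Wootters-type phase-point operators, with the dual frame taken to be the frame rescaled by 1/2) is an assumption made here for illustration and is not taken from the text; all function and variable names are hypothetical.

```python
from itertools import product

# Identity and Pauli matrices as 2x2 nested lists.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def lin(coeffs, mats):
    """Return sum_k coeffs[k] * mats[k] for 2x2 matrices."""
    return [[sum(c * m[r][s] for c, m in zip(coeffs, mats)) for s in range(2)]
            for r in range(2)]

def tr_prod(a, b):
    """Hilbert-Schmidt pairing tr(a b) for 2x2 matrices."""
    return sum(a[r][s] * b[s][r] for r in range(2) for s in range(2))

def min_eigenvalue(h):
    """Smallest eigenvalue of a 2x2 Hermitian matrix."""
    half_trace = (h[0][0] + h[1][1]).real / 2
    det = (h[0][0] * h[1][1] - h[0][1] * h[1][0]).real
    return half_trace - (half_trace ** 2 - det) ** 0.5

# Four labels, matching the dimension of the space of 2x2 Hermitian operators.
labels = list(product((0, 1), repeat=2))

# Assumed frame: F_(a,b) = (I + (-1)^a Z + (-1)^b X + (-1)^(a+b) Y) / 2.
F = {(a, b): lin([0.5, 0.5 * (-1) ** a, 0.5 * (-1) ** b, 0.5 * (-1) ** (a + b)],
                 [I2, Z, X, Y])
     for a, b in labels}
# Assumed dual frame: D*_lambda = F_lambda / 2, so that the duals sum to the identity.
D = {lam: lin([0.5], [F[lam]]) for lam in labels}

dual_sum = lin([1, 1, 1, 1], [D[lam] for lam in labels])      # should be the identity
gram = {(l, m): tr_prod(D[l], F[m]) for l in labels for m in labels}  # should be delta
most_negative = min(min_eigenvalue(D[lam]) for lam in labels)  # negative for this frame
```

In this example the dual-frame operators have a negative eigenvalue, so the frame defines a quasiprobabilistic representation rather than an ontological one.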

Structure theorems for ontological models
In the case of ontological models of a GPT, we obtain:

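Proposition 4.4. Any diagram-preserving ontological model $\xi$ of a tomographically local GPT can be written as
$$\xi(T) = \chi_B \circ T \circ \chi_A^{-1} \quad \text{for every process } T : A \to B,$$
for invertible linear maps $\{\chi_S : S \to \mathbb{R}^{\Lambda_S}\}$ satisfying Eq. (78), and where moreover every pair $(\chi_A^{-1}, \chi_B)$ defines a positive map from the cone of transformations from $A$ to $B$ in the GPT Op to the cone of substochastic maps from $\Lambda_A$ to $\Lambda_B$ in SubStoch.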
The proof is given in Appendix D. Apart from positivity, the proof follows immediately from Proposition 4.3.
One can interpret this map from the GPT to the ontological model as an explicit embedding into a simplicial GPT as discussed in Ref. [4], but generalized to the case in which both the GPT under consideration and the simplicial GPT have arbitrary processes, not just states and effects. This follows from the positivity conditions, empirical adequacy, and the preservation of the deterministic effect.
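As in the case of quasiprobabilistic models, we can write out $\chi_A$ as
$$\chi_A = \sum_{\lambda\in\Lambda_A} |\lambda)\, D^A_\lambda,$$
where $|\lambda)$ denotes the preferred basis vector of $\mathbb{R}^{\Lambda_A}$ associated with $\lambda$, and where $\{D^A_\lambda\}_\lambda$ forms a basis for the vector space of the GPT defined by the operational theory. The positivity condition for $\chi_A$ discussed above immediately implies that each $D^A_\lambda$ is a linear functional which is positive on the state cone, and the normalization condition immediately implies that their sum over $\lambda$ is the deterministic effect. By a similar argument applied to $\chi_A^{-1} = \sum_\lambda F^A_\lambda\, (\lambda|$, each $F^A_\lambda$ is a vector in the vector space of states which is positive on all GPT effects.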
In the case of a GPT which satisfies the no-restriction hypothesis [32] (e.g., quantum theory and the classical probabilistic theory), this means that the $D^A_\lambda$ are effects (forming a measurement) and that the $F^A_\lambda$ are states. In the quantum case, the notion of positivity that we have expressed here reduces to positivity of the eigenvalues of the Hermitian operators. This provides another immediate proof that quantum theory, as a GPT, does not admit an ontological model: such a model would require an exact frame and dual frame for the space of Hermitian operators which are all positive, but it is known that such a basis and dual do not exist [3].
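We have shown (Prop. 3.2) that every noncontextual ontological model of an operational theory is equivalent to an ontological model of the GPT defined by the operational theory. Combining this with Proposition 4.4, it immediately follows that:
Corollary 4.5. For operational theories whose corresponding GPT satisfies tomographic locality, any noncontextual ontological model $\xi_{nc}$ can be written as
$$\xi_{nc}(T) = \chi_B \circ \widetilde{T} \circ \chi_A^{-1}, \qquad \chi_A = \sum_{\lambda\in\Lambda_A} |\lambda)\, D^A_\lambda,$$
where $\widetilde{T}$ is the GPT process obtained from the procedure $T : A \to B$ by quotienting, and where $\{D^A_\lambda\}_\lambda$ forms a basis for the vector space of the GPT defined by the operational theory.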
Previously, the notion of a noncontextual ontological representation seemed to be a highly flexible concept, but this corollary demonstrates that it in fact has a very rigid structure. Every noncontextual ontological model can be constructed in two steps: i) quotient to the associated GPT, ii) pick a basis for the GPT such that it is manifestly an ontological model. Furthermore, the only freedom in the representation is in the representation of the states in the theory (via this choice of basis); after specifying this, one can uniquely extend to the representation of arbitrary processes.

Consequences of the dimension bound
We can specialize Corollary 4.2 to the case of quasiprobabilistic representations or ontological representations of GPTs. In this case, it states that the dimension of the GPT vector space for a given system $A$, $\dim(A)$, is equal to the dimension of the codomain of the map $\chi_A$ defining the quasiprobabilistic or ontological representation of $A$, that is, the dimension of $\mathbb{R}^{\Lambda_A}$. In the case of an ontological model, this dimension is simply $|\Lambda_A|$, the cardinality of the set $\Lambda_A$ of ontic states of $A$, so the number of ontic states is equal to the dimension of the GPT space. Moreover, by considering Proposition 3.2, this immediately implies that for any operational theory whose corresponding GPT satisfies tomographic locality, if there exists a noncontextual ontological model thereof, then it must also have a number of ontic states equal to the dimension of the GPT state space. In the language of Hardy [39], this exactly means, for each system $A$, that the "ontological excess baggage factor", $|\Lambda_A|/\dim(A)$, must be exactly 1. In other words, demanding noncontextuality rules out ontological excess baggage. Since Hardy showed that all ontological models of a qubit must in fact have unbounded excess baggage, his result can immediately be combined with ours to give a new proof that the full statistics of processes on a qubit do not admit a noncontextual model.
In particular, our result implies that a diagram-preserving noncontextual ontological model of a qubit must have exactly 4 ontic states. This result extends to any subtheory of a qubit whose corresponding GPT is tomographically local, e.g., the stabilizer subtheory. Hence, it constitutes a stringent constraint on ontological models of the qubit stabilizer subtheory qua operational theory. For instance, it immediately guarantees that the 8-state model of Ref. [56] (which, as the name suggests, has 8 ontic states) must be contextual. Indeed, the 8-state model was previously shown to be contextual by a different argument which focused on the representation of transformation procedures in prepare-transform-measure scenarios [20].
Furthermore, our bound improves an algorithm first proposed in Ref. [4]. In particular, Ref. [4] gave an algorithm for determining if a GPT admits of an ontological model by testing whether or not the GPT embeds in a simplicial GPT of arbitrary dimension. The lack of a bound on this dimension means that there is no guarantee that the algorithm will ever terminate. Ref. [58] solves this problem by providing such a bound, namely, the square of the given GPT's dimension. Our result strengthens this bound, reducing it to the given GPT's dimension. In fact, our bound is tight, as there can never be an embedding of the GPT into a lower-dimensional space. These results simplify the algorithm dramatically: rather than testing for embedding in a sequence of simplicial GPTs of increasing dimension, one can simply perform a single test for embedding in a simplicial GPT of the same dimension as the given GPT.
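The counting behind these statements can be sketched in a few lines. The following is only a bookkeeping illustration, not an implementation of the embedding test of Ref. [4]; the function names are hypothetical, the excess baggage factor is assumed to be the ratio of the number of ontic states to the GPT dimension, and the sweep range used without the dimension bound (from the GPT dimension up to its square) is an assumption about how the search would be organized.

```python
from fractions import Fraction

def hermitian_space_dimension(hilbert_dim: int) -> int:
    """Real dimension of the space of Hermitian operators on C^d."""
    return hilbert_dim ** 2

def excess_baggage_factor(num_ontic_states: int, gpt_dim: int) -> Fraction:
    """Assumed form of the factor: number of ontic states over GPT dimension."""
    return Fraction(num_ontic_states, gpt_dim)

def passes_dimension_test(num_ontic_states: int, gpt_dim: int) -> bool:
    """Necessary (not sufficient) condition for noncontextuality: factor equals 1."""
    return excess_baggage_factor(num_ontic_states, gpt_dim) == 1

def simplex_dimensions_to_test(gpt_dim: int, use_dimension_bound: bool) -> list:
    """Candidate simplicial-GPT dimensions in which to test for an embedding."""
    if use_dimension_bound:
        return [gpt_dim]  # a single test suffices, per the bound discussed above
    # Assumed sweep when only the squared-dimension bound is available.
    return list(range(gpt_dim, gpt_dim ** 2 + 1))

qubit_dim = hermitian_space_dimension(2)                       # 4
eight_state_ok = passes_dimension_test(8, qubit_dim)           # 8-state model fails
four_state_ok = passes_dimension_test(4, qubit_dim)            # count is consistent
tests_without_bound = simplex_dimensions_to_test(qubit_dim, False)
tests_with_bound = simplex_dimensions_to_test(qubit_dim, True)
```

Passing the count is only a necessary condition: a model with the right number of ontic states must still satisfy the positivity constraints of Proposition 4.4.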
Yet another application of the dimension bound follows from the results of Ref. [59]. Ref. [59] demonstrates that the number of classical bits required to specify the ontic state in any (necessarily contextual) ontological model of the qubit stabilizer subtheory is quadratic in the number of qubits. This is contrasted to the case of the qutrit stabilizer subtheory, wherein there exists a (noncontextual) ontological model with linear scaling in the number of qutrits. The quadratic scaling result for the stabilizer qubit subtheory implies that, for a collection of qubits, the number of ontic states is necessarily greater than the dimension of the space of quantum density operators. Together with our dimension bound, this fact is sufficient to deduce the contextuality of the qubit stabilizer subtheory. Moreover, the fact that there exists a noncontextual ontological model for the qutrit stabilizer subtheory [60], together with our dimension bound, is sufficient to deduce the linear scaling in this case.

Diagram preservation implies ontic separability and more
Returning to representations of a GPT by some M : Op → FVect R satisfying the conditions of Theorem 4.1: if we make use of additional instances of diagram preservation beyond the three instances which we used in proving Theorem 4.1, then we can derive additional constraints on the representation.
One important and immediate consequence of diagram preservation for composite systems is that the composite system $AB$ is represented by the tensor product of the representations of the components:
$$\mathbb{R}^{\Lambda_{AB}} = \mathbb{R}^{\Lambda_A} \otimes \mathbb{R}^{\Lambda_B} \cong \mathbb{R}^{\Lambda_A \times \Lambda_B}.$$
That is, the sample space of a composite system is the Cartesian product of the sample spaces of the components.
This constraint has particular significance if we consider the case of ontological models, $\xi : \mathrm{Op} \to \mathbf{SubStoch}$, as this means that for an ontological model, the ontic state space of a composite system is the Cartesian product of the ontic state spaces of the components. We term the latter condition ontic separability (see Refs. [61,62]). It is a species of reductionism, asserting, in effect, that composite systems have no holistic properties. More precisely, the property ascriptions to composite systems are all and only the property ascriptions to their components. Yet another way of expressing the condition is that the properties of the whole supervene on the properties of the parts.
The assumption of ontic separability for ontological models has been discussed in many prior works [61,62], and has been a substantive assumption in certain arguments. For instance, in Ref. [63], ontic separability was used to demonstrate that in a noncontextual ontological model, all and only projective measurements are represented outcome-deterministically.
It is also worth noting that the assumption of preparation independence in the PBR theorem [64] follows from diagram preservation (e.g., Eq. (105)). This connection between PBR and preservation of compositional structure has been previously explored in Sec. 4 of Ref. [36], which uses this connection to derive a categorical version of the PBR theorem.
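Moreover, by considering parallel composition we can also obtain the following:
Proposition 4.6. Via diagram preservation, parallel composition implies an additional constraint on the linear maps $\chi_A$, namely,
$$\chi_{AB} = \chi_A \otimes \chi_B. \tag{100}$$
Proof. To begin, recall that the maps $\chi_S$ are defined by their action on states: $M(P) = \chi_S \circ P$ for every state $P$ of $S$. Next, applying this to a product state we have
$$M(P_A \otimes P_B) = \chi_{AB} \circ (P_A \otimes P_B).$$
Since $M$ is diagram-preserving, we have
$$M(P_A \otimes P_B) = M(P_A) \otimes M(P_B).$$
Recalling the definition of $\chi_S$ again, we conclude that
$$\chi_{AB} \circ (P_A \otimes P_B) = (\chi_A \otimes \chi_B) \circ (P_A \otimes P_B).$$
In a tomographically local GPT, the product states span the entire state space, and so this implies Eq. (100).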
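Now, if we consider the case of quasiprobabilistic representations, $\xi : \mathrm{Op} \to \mathbf{QuasiSubStoch}$, we obtain that
$$\chi_{AB} = \chi_A \otimes \chi_B : AB \to \mathbb{R}^{\Lambda_A} \otimes \mathbb{R}^{\Lambda_B}.$$
Given Eq. (84), this is equivalent to
$$D^{AB}_{(\lambda,\lambda')} = D^A_\lambda \otimes D^B_{\lambda'} \quad \text{for all } \lambda \in \Lambda_A,\ \lambda' \in \Lambda_B.$$
That is, diagram preservation implies that the frame representation must factorize across subsystems. In other words, the vector basis defining the frame representation must be a product basis.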
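The factorization condition can be checked numerically on a toy example. The sketch below uses two arbitrarily chosen invertible 2x2 matrices as stand-ins for $\chi_A$ and $\chi_B$ (these matrices, the product state, and all names are assumptions for illustration only) and verifies that their Kronecker product acts on a product state as the product of the local actions, which is the content of the argument above.

```python
from fractions import Fraction as Fr

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def kron_vec(u, v):
    """Tensor product of two vectors, ordered consistently with kron."""
    return [x * y for x in u for y in v]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

# Hypothetical local maps; each has unit column sums, so each preserves
# the all-ones covector (the analogue of the deterministic effect).
chi_A = [[Fr(2), Fr(0)], [Fr(-1), Fr(1)]]
chi_B = [[Fr(1), Fr(1, 2)], [Fr(0), Fr(1, 2)]]
chi_AB = kron(chi_A, chi_B)  # the map required on the composite system

# A product state of the composite system.
p_A = [Fr(1, 3), Fr(2, 3)]
p_B = [Fr(1, 4), Fr(3, 4)]

joint_then_map = matvec(chi_AB, kron_vec(p_A, p_B))
map_then_joint = kron_vec(matvec(chi_A, p_A), matvec(chi_B, p_B))
column_sums = [sum(chi_AB[r][c] for r in range(4)) for c in range(4)]
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than a numerical tolerance.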

Converses to structure theorems
In the above section we showed how all diagram-preserving quasiprobabilistic and ontological representations must have a particularly simple form given by a collection of invertible linear maps {χ A } satisfying certain constraints.
We now prove what is essentially the converse to each of these results. Consider defining a map from Op to $\mathbf{FVect}_{\mathbb{R}}$ by
$$M(T) := \chi_B \circ T \circ \chi_A^{-1} \quad \text{for every process } T : A \to B. \tag{108}$$
Under what conditions on the set $\{\chi_A\}$ is this map a quasiprobabilistic or ontological representation? To ensure that Eq. (108) defines a diagram-preserving map one must simply impose that
$$\chi_{AB} = \chi_A \otimes \chi_B.$$
This condition, together with invertibility and linearity of the $\chi_A$, easily implies that diagram preservation and indeed all the assumptions of Theorem 4.1 are satisfied.
To ensure that Eq. (108) defines a linear representation that is moreover a quasiprobabilistic representation, as in Def. 2.11, one must impose that $V_A = \mathbb{R}^{\Lambda_A}$ and that
$$\mathbf{1}_{\Lambda_A} \circ \chi_A = u_A,$$
where $\mathbf{1}_{\Lambda_A}$ is the all-ones covector and $u_A$ is the deterministic effect on $A$; this implies that the conditions in Def. 2.11 are satisfied. Finally, to ensure that Eq. (108) defines a quasiprobabilistic representation that is moreover an ontological representation, as in Def. 2.10, one must introduce a positivity constraint for this map. Specifically, one must have that
$$T \mapsto \chi_B \circ T \circ \chi_A^{-1}$$
defines a positive map from the cone of transformations from $A$ to $B$ in Op to the cone of stochastic maps from $\Lambda_A$ to $\Lambda_B$.
This provides a simple recipe for constructing linear representations: one simply needs to choose a family of invertible linear maps for each fundamental system (i.e., one that cannot be further decomposed into subsystems), and then define the others as the tensor product of these (so that general $\chi_A$ factorise over subsystems). Similarly, it provides a simple recipe for constructing quasiprobabilistic representations, using the same construction but where the $\chi_A$ preserve the deterministic effect. In the case of noncontextual ontological models, however, the recipe is less simple: one must not only choose invertible linear maps which factorise over subsystems and preserve the deterministic effect; one must also check the positivity condition (which is nontrivial, since for any particular map $\chi_A$, one must check the condition for every $\chi_B$).
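To make the recipe concrete, here is a minimal sketch for the simplest possible case, a classical bit viewed as a GPT whose processes are 2x2 substochastic matrices. The choice of $\chi$, the example processes, and all function names are assumptions for illustration. The sketch shows that a $\chi$ with unit column sums yields a representation that preserves composition and normalization but can fail positivity, whereas the identity choice of $\chi$ passes all three checks.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def represent(T, chi_in, chi_out):
    """Candidate representation T -> chi_out . T . chi_in^{-1}."""
    return matmul(matmul(chi_out, T), inverse_2x2(chi_in))

def column_sums(m):
    return [sum(m[r][c] for r in range(len(m))) for c in range(len(m[0]))]

def is_nonnegative(m):
    return all(entry >= 0 for row in m for entry in row)

identity = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
chi = [[Fr(2), Fr(0)], [Fr(-1), Fr(1)]]   # hypothetical choice; unit column sums
flip = [[Fr(0), Fr(1)], [Fr(1), Fr(0)]]   # bit flip
reset = [[Fr(1), Fr(1)], [Fr(0), Fr(0)]]  # reset to the first value

xi_flip = represent(flip, chi, chi)
xi_reset = represent(reset, chi, chi)
xi_composite = represent(matmul(reset, flip), chi, chi)

# Empirical adequacy for a state p and the all-ones effect u.
p = [[Fr(1, 3)], [Fr(2, 3)]]
u = [[Fr(1), Fr(1)]]
prob_gpt = matmul(u, p)[0][0]
prob_rep = matmul(matmul(u, inverse_2x2(chi)), matmul(chi, p))[0][0]
```

Here `xi_flip` has a negative entry, so this particular $\chi$ gives a quasiprobabilistic but not an ontological representation.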

Categorical reformulation
There is an elegant categorical reframing of our structure theorem, as suggested to us by one of the referees: any representation $M$ satisfying the assumptions of Theorem 4.1 is related to the canonical representation $R$ by a unique monoidal natural isomorphism $\chi : R \Rightarrow M$.
Proof. The components of the natural isomorphism are given by the $\chi_A$, as these go from $A \to V_A = M(A)$, and where we are abusing notation by denoting $R(A)$ simply as $A$. Eq. (69) ensures that these do define a natural transformation. To see this, first recall that we are abusing notation by suppressing the canonical representation $R$, so Eq. (69) really tells us that
$$M(T) \circ \chi_A = \chi_B \circ R(T),$$
which is what we need for this to be a natural transformation. Clearly it is moreover a natural isomorphism as every $\chi_A$ is invertible (Eqs. (73) and (75)). Eq. (100) then ensures that this is a monoidal natural isomorphism. To see this, note that the LHS of this equation is not quite the component of the natural isomorphism $\chi_{A\otimes B}$; instead we have $\chi_{AB} := \mu^{-1} \circ \chi_{A\otimes B} \circ \mu$, and so Eq. (100) is equivalent to the condition that $\chi_{A\otimes B} \circ \mu = \mu \circ (\chi_A \otimes \chi_B)$, which is exactly what we need for this to be a monoidal natural isomorphism. Uniqueness of this natural isomorphism follows from the fact that the components $\chi_A$ are uniquely determined, as noted in Theorem 4.1.
This means that ontological models, should they exist, are essentially unique, in that they are unique up to a unique natural isomorphism. There is, however, an important subtlety going on here. This natural isomorphism is given by viewing ontological models as living in $\mathbf{FVect}_{\mathbb{R}}$, and so the components of the natural isomorphism are just invertible real linear maps.
Alternatively, however, one could demand a stricter notion of isomorphism between ontological models by viewing them as living in SubStoch, in which case the components of a natural isomorphism would be invertible stochastic maps, i.e., permutations of the ontic states. This is a much stricter notion of isomorphism, and it is likely that in this sense there are many different ontological models for a given GPT. In certain situations, however, even this stricter notion of uniqueness can be proven; see, for example, Ref. [19].

Revisiting our assumptions
We have derived surprisingly strong constraints on the form of noncontextual ontological models of operational theories, and so it is important to examine the assumptions that went into deriving these constraints. These concerned both the types of operational theories under consideration and the types of ontological representations of these, and were summarized in Section 1.2. The majority of these are ubiquitous and well-motivated. The only notable restriction on the scope of operational theories we consider is the one induced by our assumption of tomographic locality (as discussed further in Section 5.2). Similarly, the only notable restriction on ontological (and quasiprobabilistic) models that warrants further discussion is that they are diagram-preserving.

Revisiting diagram preservation
In the case of ontological models, we will provide below a motivation for the instances of diagram preservation that we required for our proofs. Since we have defined quasiprobabilistic models as representations of operational theories wherein the only difference from an ontological model is that the probabilities are allowed to become quasiprobabilities (i.e., drawn from the reals rather than the interval [0,1]), it follows that these same motivations are also applicable to them. It is worth noting that among the quasiprobabilistic representations for continuous variable quantum systems that are most studied in the literature, the Wigner representation satisfies our definition$^{12}$ while the Q [65] and P representations [66,67] do not (as they are defined by overcomplete frames). There are also examples of both types among quasiprobabilistic representations of finite-dimensional systems in quantum theory. In particular, Ref. [68] defines a family of discrete Wigner representations, some of which satisfy the assumption of diagram preservation, and some of which do not.
Of special note among those that satisfy the assumption is Gross's discrete Wigner representation [69], which is the unique representation in this family that satisfies a natural covariance property. Ref. [19] further shows that this is the unique noncontextual ontological model for stabilizer subtheories in odd dimensions.
Although we endorse diagram preservation in its most general form, it is worth noting that our main results (given in Section 4) require only the following very specific instances of that assumption: (i) diagram preservation for prepare-measure scenarios,
$$\xi :: E \circ P \mapsto \xi(E) \circ \xi(P), \tag{112}$$
and diagram preservation for measure-and-reprepare processes,
$$\xi :: P \circ E \mapsto \xi(P) \circ \xi(E), \tag{113}$$
where $E$ and $P$ are connected only through the trivial system; and (ii) diagram preservation for the identity process,
$$\xi :: 1_A \mapsto 1_{\Lambda_A}. \tag{114}$$
These are easily justified. Eq. (112) captures the idea that the ontic state is the complete causal mediary between the preparation and the effect. This assumption is built into the very definition of the standard ontological models framework (implicitly in early work [1,62] and explicitly in later work [70,71]), and is assumed in virtually every past work on ontological models.
Eq. (113) is a similarly natural assumption. The natural view of the process
$$P \circ E \tag{115}$$
is that one has observed effect $E$ and then one has independently implemented the preparation $P$. There need not be any system acting as a causal mediary between $E$ and $P$. The natural ontological representation, therefore, is one wherein there is no ontic state mediating the two processes, as depicted in Eq. (113). Although we are not aware of this assumption having been made in previous works, it is directly analogous to the preparation-independence assumption made in Ref. [64] (which involved two independent states, rather than an independent effect and state).
Eq. (114) can be justified by noting that within the equivalence class of procedures associated to the identity operation in the GPT, there is the one which corresponds to waiting for a vanishing amount of time. In any reasonable physical theory, no evolution is possible in vanishing time, and hence the only valid ontological representation of such an equivalence class of procedures is the identity map on the ontic state space.
Because we consider the full assumption of diagram preservation to be a natural generalization of all of these specific assumptions, we have endorsed it in our definitions. See Appendix B of Ref. [35] for a more thorough defense of this full assumption.

Necessity of tomographic locality
The assumption of tomographic locality is common in the GPT literature, so we will not attempt a defense of it here. Nevertheless, it is natural to ask if the assumption is actually necessary to obtain our structure theorems. Here we provide an example which shows that it is. The operational theory we consider in our example is the real-amplitude version of the qutrit stabilizer subtheory of quantum theory.$^{13}$ In this subtheory, two systems are described by 45 parameters, whereas only $6^2 = 36$ parameters are available from local measurements, which immediately implies that the theory is not tomographically local (just as the real-amplitude version of the full quantum theory fails to be tomographically local [52]).
To begin with, consider the standard (complex-amplitude) qutrit stabilizer subtheory. Gross's discrete Wigner function [69] provides a (diagram-preserving) quasiprobability representation for qutrits for which the stabilizer subtheory is positively represented. By Corollary 3.5, this corresponds to a noncontextual ontological model of the subtheory. Indeed, this ontological model has been examined in Ref. [60], where it is shown that it can be reconstructed from an "epistemic restriction". Since the standard qutrit stabilizer subtheory is tomographically local, these models obey our structure theorems. In particular, the representation of $n$ qutrits uses $9^n$ ontic states, matching the dimension of the relevant space of density matrices.
Now consider the subtheory consisting of only those qutrit stabilizer procedures that can be represented using real amplitudes. This does not introduce any new operational equivalences, and so the model discussed above is still noncontextual when restricted to this subtheory. But now our structure theorem does not hold, because this model still uses $9^n$ ontic states even though the density matrices now live in a $\frac{1}{2} 3^n (3^n + 1)$-dimensional space.
Moreover, we can show that this sort of example is rather generic, that is, that there is no hope of obtaining a structure theorem with our dimension bound for any theory that is not tomographically local.
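The parameter counts quoted in this example can be reproduced directly. The following is a minimal sketch with hypothetical function names; it only performs the counting and does not construct the subtheory or the model.

```python
def complex_density_dim(n_qutrits: int) -> int:
    """Real dimension of Hermitian operators on (C^3)^{tensor n}: (3^n)^2 = 9^n."""
    return 9 ** n_qutrits

def real_density_dim(n_qutrits: int) -> int:
    """Dimension of real symmetric operators on (R^3)^{tensor n}: 3^n (3^n + 1) / 2."""
    d = 3 ** n_qutrits
    return d * (d + 1) // 2

def locally_accessible_params(n_qutrits: int) -> int:
    """Parameters accessible from local measurements: (single-system dimension)^n."""
    return real_density_dim(1) ** n_qutrits

def ontic_states_in_restricted_model(n_qutrits: int) -> int:
    """Ontic states used by the model inherited from the complex-amplitude theory."""
    return 9 ** n_qutrits

two_system_dim = real_density_dim(2)            # 45
two_system_local = locally_accessible_params(2)  # 36
two_system_ontic = ontic_states_in_restricted_model(2)  # 81
```

For two systems the gap between 45 and 36 witnesses the failure of tomographic locality, and the gap between 81 and 45 witnesses the failure of the dimension bound.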
Suppose that we have any ontological representation $\xi$ of some GPT wherein each GPT system of dimension $d$ is represented by an ontic state space of cardinality $d$. Then the GPT is necessarily tomographically local.
To see this, consider the representation of GPT transformations from this state space to itself. Then, the map $\xi$ is a linear map from the space of transformations to $d \times d$ substochastic matrices, which are $d^2$-dimensional. By empirical adequacy $\xi$ is injective, and so the space of GPT transformations is at most $d^2$-dimensional. But the effect-prepare channels already span $d^2$ dimensions, so there cannot be any channels outside this span. Hence, by Corollary 2.7, the theory is tomographically local.

Outlook
These results can be directly applied to the study of contextuality in specific scenarios and theories. For instance, we have already seen that our dimension bound is a useful tool for obtaining novel proofs of contextuality (e.g., via Hardy's ontological excess baggage theorem [39] or for the 8-state model of Ref. [56]), and for providing novel algorithms for deriving noise-robust noncontextuality inequalities (namely, the algorithm in Ref. [4] but informed by our dimension bound). It remains to be seen whether other algorithms for witnessing nonclassicality, such as those in Ref. [71] or Ref. [70], could be extended within our framework to more general compositional scenarios.
Our formalism is also ideally suited to understanding the information-theoretic advantages afforded by contextual operational theories, such as for computational speedup, since it has the compositional flexibility to describe arbitrary scenarios, such as families of circuits which arise in the gate-based model of computation. In fact, our structure theorem is a major first step in simplifying the proof that contextuality is a necessary resource for the state-injection model of quantum computing [19,72]. Ref. [19] shows that such a proof can proceed by applying our structure theorem to show that the only positive quasiprobabilistic models of the (classically simulable) stabilizer subtheory for odd dimensions are given by Gross's discrete Wigner function [69]; then, the known fact that the injected resource states necessarily have negative representation in this particular model establishes the result in a direct and elegant fashion.
The key limitation of our results is the assumption that the GPT associated to the operational theory under consideration is tomographically local. There are two potential approaches to dealing with this limitation. On the one hand, one could provide an argument that theories which are not tomographically local are undesirable in some principled sense. For example, it seems likely that one can rule them out on the grounds that they violate Leibniz's methodological principle [10]. On the other hand, from a practical perspective, wherein the goal is to experimentally verify nonclassicality in a theory-independent manner, one would instead be motivated to seek experimental evidence that nature truly satisfies tomographic locality, independent of the validity of quantum theory. One possible approach to this end would be to extend the techniques introduced in Ref. [73] to composite systems.

A Context-dependence in representations of GPTs
In the main text, we stated that the notion of an ontological model of a GPT that we have defined cannot be said to be either generalized-contextual or generalized-noncontextual (unlike our notion of an ontological model of an operational theory). We will now elaborate on this point.
Consider the contexts that one may wish to associate to a GPT state. One of the examples which appears in the literature corresponds to different decompositions of the GPT state into mixtures of other GPT states, for example:
$$\sum_i p_i s_i = s = \sum_j q_j s'_j. \tag{116}$$
Now consider any ontological representation map which has the GPT as its domain. In the GPT, all three terms in Eq. (116) are strictly equal, and hence all three map to the same probability distribution over $\Lambda$. As such, there is no possibility for the map to represent an $s$ arising from the LHS mixture differently from how it represents an $s$ arising from the RHS mixture.
A natural question one might consider in light of this is how one should represent ensembles of states ontologically. The ensembles of relevance in the example just given are $\{(p_i, s_i)\}$, $\{(q_j, s'_j)\}$, and $\{(1, s)\}$; all of these are operationally equivalent,
$$\{(p_i, s_i)\} \simeq \{(q_j, s'_j)\} \simeq \{(1, s)\},$$
but not strictly equal. If one defines a new kind of ontological representation map which acts on such objects, then it could take these distinct ensembles to distinct probability distributions over $\Lambda$. One could then meaningfully talk about whether such a representation depended on context or not. However, the notion of ontological representation for a GPT that we have defined herein has as its domain processes within the GPT (such as states), not ensembles of such processes. This is also true for the more general quasistochastic representations of GPTs. As such, applying the notion of generalized contextuality to them is a category mistake, just as it would be a category mistake to ask whether a variable $X$ depends on another variable $Y$ if $Y$ cannot possibly vary [4]. Because standard quasiprobability representations (such as Wigner's or Gross's) are instances of our definition (and in particular, because they take the domain of the representation to be states and effects rather than ensembles of states or ensembles of effects), it is equally meaningless to ask whether they are noncontextual or contextual.
Of course, one could define a map which has as its domain the set of ensembles of GPT processes. For such a map, it would be appropriate to ask whether or not the map is noncontextual. This is similar to what is done in the causal-inferential framework of Ref. [35], where the central objects of study are ensembles of processes corresponding to an agent's knowledge of what process occurred (although with the difference that in this case we consider ensembles of unquotiented processes). In that context, we formalize the resulting notion of an ontological representation, as well as the natural generalization of the notion of 'noncontextuality' that arises for it.
A similar story holds for the notion of context that is relevant for the study of Kochen-Specker contextuality. Consider two measurements, $M_1$ and $M_2$, which we conceptualize as processes with a GPT input and a classical output. Suppose that these each have a particular outcome, labeled $a$ and $b$ respectively, which correspond to the same GPT effect:
$$e_{a|M_1} = e_{b|M_2},$$
where $e_{a|M_1}$ denotes the GPT effect associated to outcome $a$ of $M_1$, and similarly for $e_{b|M_2}$. The fact that the effect associated to getting outcome $a$ in measurement $M_1$ is strictly equal to the effect associated to getting outcome $b$ in measurement $M_2$ implies that any map which has the GPT effect space as its domain must represent the two cases identically. Again, one finds that there is no possibility for a representation map with this specific choice of domain to depend on whether or not the effect was realized using measurement $M_1$ or $M_2$.
But, also as above, one could choose to consider a different kind of ontological representation map in which the domain is no longer the set of GPT processes per se, but something else, which includes, for instance, measurement-outcome pairs. In this particular case, we are interested in pairs $(M_1, a)$ and $(M_2, b)$ which are operationally equivalent, but not strictly equal. If one defines a new kind of ontological representation map which acts on such objects, then it could take distinct such objects to distinct response functions. One could then meaningfully talk about whether such a representation depended on context or not. This is what is typically done (if only implicitly) in the study of Kochen-Specker noncontextuality.
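B Proof of the structure theorem (Theorem 4.1)
We now complete the proof of Theorem 4.1, as sketched in the main text.
Proof. Since we are assuming tomographic locality of the GPT, Corollary 2.7 immediately gives, for any process $T : A \to B$, a decomposition into effect-state channels,
$$T = \sum_{ij} r_{ij}\, P_i \circ E_j,$$
with real coefficients $r_{ij}$, states $P_i$ of $B$, and effects $E_j$ on $A$. Since $M$ is convex-linear and preserves the zero processes$^{14}$, and since the effect-state channels span the vector space, $M$ can be uniquely extended to a linear map $\widehat{M}$. Hence,
$$M(T) = \widehat{M}\Big(\sum_{ij} r_{ij}\, P_i \circ E_j\Big).$$
Now, using the linearity of $\widehat{M}$, we have
$$M(T) = \sum_{ij} r_{ij}\, \widehat{M}(P_i \circ E_j).$$
Noting that in this expression $\widehat{M}$ is only applied to objects in the domain of $M$, on which the two maps act identically (by the fact that the former is the linear extension of the latter), one has
$$M(T) = \sum_{ij} r_{ij}\, M(P_i \circ E_j) = \sum_{ij} r_{ij}\, M(P_i) \circ M(E_j),$$
where the last step follows from the fact that $M$ is diagram-preserving. In summary, we have shown that
$$M(T) = \sum_{ij} r_{ij}\, M(P_i) \circ M(E_j), \tag{124}$$
as claimed in Eq. (70).
$^{14}$ This follows from the fact that one can construct the zero process by composing a state and an effect with the zero scalar as $0_{A \to B} = P \circ 0 \circ E$. Then, by empirical adequacy of $M$, one has $M(0) = 0$, and so diagram-preservation of $M$ then gives $M(0_{A \to B}) = M(P) \circ M(0) \circ M(E) = 0$.
Accepted in Quantum 2024-03-06, click title to verify. Published under CC-BY 4.0.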
Next, we analyse the linear extension of $M$ in the specific case of a state $P_i$. Since the diagram-preserving map $M$ has a unique linear extension which takes the vector space of GPT states $B$ to the vector space $V_B$, and since both of these are in $\mathbf{FVect}_{\mathbb{R}}$, one can uniquely re-interpret the action of $M$ as a process $\chi_B$ within $\mathbf{FVect}_{\mathbb{R}}$:
$$M(P_i) = \chi_B \circ P_i. \tag{126}$$
In particular, we are using the fact that a linear map $L : \mathcal{L}(\mathbb{R}, V) \to \mathcal{L}(\mathbb{R}, V')$ can always be uniquely represented by a linear map $l : V \to V'$ by exploiting the fact that $\mathcal{L}(\mathbb{R}, V) \cong V$. The fact that $\chi_B$ is the unique linear map satisfying Eq. (126) means that there is no possibility for making other choices for the $\chi_A$ appearing in Eq. (69).
Similarly, M on effects $E_j$ has a unique linear extension and takes functionals on GPT states to functionals on $V_A$; in other words, M is the adjoint of a process ϕ within $\mathbf{FVect}_{\mathbb{R}}$: In particular, we are using the fact that a linear map $L : \mathcal{L}(V,\mathbb{R}) \to \mathcal{L}(V',\mathbb{R})$ can always be uniquely represented by a linear map $l : V' \to V$ by exploiting the fact that $\mathcal{L}(V,\mathbb{R}) \cong V^*$ and that $\mathcal{L}(V^*,V'^*) \cong \mathcal{L}(V',V)$. Combining this with Eq. (124), we have All that remains is to show that $\chi_A$ and $\phi_A$ are inverses. Consider the special case that T is the identity; then Eq. (128) becomes Since M is diagram-preserving, it maps identity to identity, and so this becomes Now consider a state P followed by an effect E. This gives a probability, and since M is empirically adequate it must preserve this probability: and since M is diagram-preserving, Combining this with Eqs. (126) and (127) gives Since this holds for all E and P, tomographic locality implies that the E span $A^*$ and the P span A, and we have that Combining this with Eq. (130) gives that χ and ϕ are inverses of each other. Hence, we can write that $\phi_A = \chi_A^{-1}$ and so rewrite Eq. (128) as which completes the proof.
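As an illustration only (it is not part of the proof), the following minimal sketch checks numerically what it means for states to be represented through an invertible linear map χ, effects through its inverse, and transformations by conjugation with χ. Everything concrete here is an assumption made for the example: the GPT is a classical bit written in the hypothetical coordinates (normalisation, bias), and the matrices `chi`, `chi_inv` and the noisy-flip transformation `T` are chosen by hand.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical GPT coordinates for a classical bit: (normalisation, bias).
states = {"P0": [[F(1)], [F(1)]], "P1": [[F(1)], [F(-1)]]}      # column vectors
effects = {"E0": [[F(1, 2), F(1, 2)]], "E1": [[F(1, 2), F(-1, 2)]],
           "unit": [[F(1), F(0)]]}                               # row vectors

# Assumed invertible linear map chi and its inverse (checked below).
chi = [[F(1, 2), F(1, 2)], [F(1, 2), F(-1, 2)]]
chi_inv = [[F(1), F(1)], [F(1), F(-1)]]
identity = [[F(1), F(0)], [F(0), F(1)]]
assert matmul(chi, chi_inv) == identity

# States are pushed forward by chi; effects are pulled back through chi^{-1}.
ontic_states = {k: matmul(chi, v) for k, v in states.items()}
response_fns = {k: matmul(v, chi_inv) for k, v in effects.items()}

# A GPT transformation: flip the bit with probability 1/3 (bias -> bias/3).
T = [[F(1), F(0)], [F(0), F(1, 3)]]
xi_T = matmul(matmul(chi, T), chi_inv)   # conjugation by chi

def prob(effect, state):
    """Probability obtained by composing a row vector with a column vector."""
    return matmul(effect, state)[0][0]

# Probabilities computed in the GPT and in the ontological representation.
gpt_probs = {(e, s): prob(effects[e], matmul(T, states[s]))
             for e in effects for s in states}
ont_probs = {(e, s): prob(response_fns[e], matmul(xi_T, ontic_states[s]))
             for e in effects for s in states}
assert gpt_probs == ont_probs
```

In this toy case the two GPT states become point distributions on two ontic states, the unit effect becomes the all-ones response function, and `xi_T` is a stochastic matrix; the number of ontic states equals the dimension of the GPT vector space.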
C Completing the proof of Proposition 3.2
The key argument required to establish Proposition 3.2 was given just after the proposition itself; here we complete the proof. We first prove that $\xi_{nc} := \xi \circ {\sim}$ is indeed a valid ontological model of an operational theory if ξ is a valid ontological model of a GPT. To do so, we show that each of the three properties (enumerated in Definition 2.9) that $\xi_{nc}$ should satisfy is implied by the corresponding property (enumerated in Definition 2.10) that ξ is assumed to satisfy by virtue of being an ontological model of a GPT.
First, recall that we assumed that all deterministic effects in the operational theory are operationally equivalent. Hence, the map ∼ will take any such deterministic effect to the unique deterministic effect in the GPT, which (by property 1 of Definition 2.10) must be represented by the unit vector 1. Hence, $\xi_{nc}$ represents all deterministic effects in the operational theory appropriately, namely as the unit vector 1.
Second, recall that ∼ preserves the operational predictions of the operational theory; hence, the fact that (by property 2 of Definition 2.10) ξ preserves the operational predictions of the GPT implies that $\xi_{nc} := \xi \circ {\sim}$ preserves the operational predictions of the operational theory.
Third, recall that if, in the operational theory, $P_1$ is a procedure that is a mixture of $P_2$ and $P_3$ with weights ω and 1−ω, then it follows that under ∼, one has
$$\tilde P_1 = \omega\,\tilde P_2 + (1-\omega)\,\tilde P_3,$$
where $\tilde P_i$ denotes the image of $P_i$ under ∼. Hence, the fact that (by property 3 of Definition 2.10) the representations of these three processes under ξ satisfy
$$\xi(\tilde P_1) = \omega\,\xi(\tilde P_2) + (1-\omega)\,\xi(\tilde P_3)$$
implies that the representations of $P_1$, $P_2$, and $P_3$ satisfy
$$\xi_{nc}(P_1) = \omega\,\xi_{nc}(P_2) + (1-\omega)\,\xi_{nc}(P_3).$$
Hence $\xi_{nc}$ satisfies all the properties of an ontological model of an operational theory.

Conversely, we prove that $\xi := \xi_{nc} \circ C$ is a valid ontological model of a GPT if $\xi_{nc}$ is a valid noncontextual ontological model of an operational theory. To do so, we show that each of the three properties (enumerated in Definition 2.10) that ξ should satisfy is implied by the corresponding property (enumerated in Definition 2.9) that $\xi_{nc}$ is assumed to satisfy by virtue of being an ontological model of an operational theory.
First, consider the unique deterministic effect in the GPT. Applying C to this process yields one of the many deterministic effects in the operational theory. Because (by property 1 of Definition 2.9) $\xi_{nc}$ maps every one of these to the unit vector 1, it follows that $\xi := \xi_{nc} \circ C$ maps the unique deterministic effect to the unit vector 1.
Second, recall that the context of a process is irrelevant for the operational predictions it makes, and that consequently, the map C preserves the operational predictions. Given that (by property 2 of Definition 2.9) $\xi_{nc}$ preserves the operational predictions, $\xi := \xi_{nc} \circ C$ also preserves the operational predictions.
Third, consider three processes $\tilde P_1$, $\tilde P_2$, and $\tilde P_3$ (tildes denote GPT processes) such that $\tilde P_1 = \omega\,\tilde P_2 + (1-\omega)\,\tilde P_3$ in the GPT. Under C, one has processes $C(\tilde P_1) = (P_1, c_1)$, $C(\tilde P_2) = (P_2, c_2)$, and $C(\tilde P_3) = (P_3, c_3)$ in the operational theory, where the $c_i$ are arbitrary contexts specified by the map C. The fact that $\tilde P_1 = \omega\,\tilde P_2 + (1-\omega)\,\tilde P_3$ implies that $C(\tilde P_1)$ is operationally equivalent to the effective procedure $P_{mix}$ defined as the mixture of $C(\tilde P_2)$ and $C(\tilde P_3)$ with weights ω and 1−ω, respectively. ($C(\tilde P_1)$ may not actually be this mixture, depending on its context $c_1$, which depends on one's choice of C.) By property 3 of Definition 2.9, $\xi_{nc}$ must satisfy
$$\xi_{nc}(P_{mix}) = \omega\,\xi_{nc}(C(\tilde P_2)) + (1-\omega)\,\xi_{nc}(C(\tilde P_3)).$$
But since $\xi_{nc}$ is a noncontextual model and since $C(\tilde P_1)$ is operationally equivalent to $P_{mix}$, it follows that
$$\xi_{nc}(C(\tilde P_1)) = \omega\,\xi_{nc}(C(\tilde P_2)) + (1-\omega)\,\xi_{nc}(C(\tilde P_3)).$$
Hence we see that $\xi := \xi_{nc} \circ C$ satisfies property 3 of Definition 2.10, as required.
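The first direction of the argument (defining a model of the operational theory as a model of the GPT composed with the quotienting map) can be illustrated with a small sketch. All names and numbers below are hypothetical: the dictionary `quotient` stands in for the map ∼ on four made-up preparation procedures, and `xi` is an assumed linear ontological model of a bit-like GPT. The point is only that a model defined this way cannot depend on the context.

```python
from fractions import Fraction as F

# Hypothetical operational procedures: (label, context) -> GPT state vector.
# The two "Pmix" procedures are operationally equivalent but differ in context.
quotient = {
    ("P0", "c0"): (F(1), F(1)),
    ("P1", "c1"): (F(1), F(-1)),
    ("Pmix", "mix of P0,P1"): (F(1), F(0)),
    ("Pmix", "other recipe"): (F(1), F(0)),
}

def xi(gpt_state):
    """Assumed ontological model of the GPT: a linear map to distributions."""
    n, z = gpt_state
    return ((n + z) / 2, (n - z) / 2)

def xi_nc(procedure):
    """Model of the operational theory, defined as xi after quotienting."""
    return xi(quotient[procedure])

# Operationally equivalent procedures receive the same representation.
a = xi_nc(("Pmix", "mix of P0,P1"))
b = xi_nc(("Pmix", "other recipe"))

# Mixtures are respected: an equal mixture of P0 and P1 is represented
# by the equal mixture of their representations.
half = F(1, 2)
mixture = tuple(half * p + half * q
                for p, q in zip(xi_nc(("P0", "c0")), xi_nc(("P1", "c1"))))
```

Because `xi_nc` only ever sees the GPT vector returned by `quotient`, noncontextuality holds by construction in this sketch.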
If, rather than considering transformations from one GPT system to another, we consider just the states of a single system A, then everything simplifies considerably. The vector space we consider in the domain is simply the vector space spanned by the GPT state space, and the positive cone is then just the standard cone of GPT states. The vector space we consider in the codomain is simply the vector space $\mathbb{R}^{\Lambda_A}$ with positive cone given by the cone of unnormalised probability distributions. Moreover, the linearly extended action of ξ is nothing but the linear map $\chi_A$, so we find that $\chi_A$ must be a positive map in the sense defined above.
Similarly, if we consider the contravariant action of $\chi_A^{-1}$ on the space of GPT effects (that is, by composing the effect onto the outgoing wire of $\chi_A^{-1}$), then we arrive at a similar result. Here we find that the contravariant action of $\chi_A^{-1}$ is a positive linear map from the dual of the GPT vector space ordered by the effect cone to the dual of $\mathbb{R}^{\Lambda_A}$ ordered by the cone of response functions.

E Proofs for preliminaries
E.1 Proof that quotienting is diagram-preserving
In order to see that the quotienting map is diagram-preserving, we must first define what it means for processes in the quotiented theory to be composed. That is, given a suitable pair of equivalence classes $\tilde T$ and $\tilde R$, we must define $\tilde R \circ \tilde T$ (assuming that the relevant type-matching constraint is satisfied) and $\tilde R \otimes \tilde T$. We define these via composition of some choice of representative elements, $r \in \tilde R$ and $t \in \tilde T$, for each equivalence class, as $\tilde R \circ \tilde T := \widetilde{r \circ t}$ and $\tilde R \otimes \tilde T := \widetilde{r \otimes t}$. We now prove that the first of these four conditions (namely, the first equivalence in Eq. (153)) holds: where in the second step we are noting that is an example of a tester τ for any τ′ and r. The argument for the other three conditions is analogous.
This establishes that the notion of composition that we have defined is independent of the choice of representative elements, and so we can simply write

E.2 Proof of Lemma 2.5
We now prove Lemma 2.5, restated here:

Lemma E.1. The operation □ can be uniquely extended to a bilinear map and the operation ⊠ can be uniquely extended to a bilinear map
Proof. Here we show the proof for □. The proof for ⊠ follows similarly.
To begin, note that the vectors $R_T$ with $T : A \to B$ span the vector space $\mathbb{R}^{m_{A\to B}}$, as we have taken $F_{A\to B}$ to be a minimal fiducial set of testers. Consequently, we can always (nonuniquely) write an arbitrary $U \in \mathbb{R}^{m_{B\to C}}$ as $\sum_i u_i R_{T_i}$ for some transformations $T_i : B \to C$, $u_i \in \mathbb{R}$, and can write an arbitrary $V \in \mathbb{R}^{m_{A\to B}}$ as $\sum_j v_j R_{T'_j}$ for some transformations $T'_j : A \to B$, $v_j \in \mathbb{R}$. Hence, we propose the linear extension be defined by
$$U \square V := \sum_{ij} u_i v_j \left(R_{T_i} \square R_{T'_j}\right) = \sum_{ij} u_i v_j R_{T_i \circ T'_j}.$$
For this to be a valid definition, however, it must be the case that this is independent of the chosen decomposition of U and V. We now show that this is indeed the case.
To begin, let us consider two distinct decompositions in the second argument of □.That is, given we want to show that for all T .
To begin, we use linearity of $E_{A\to B}$ (as defined in Section 2.2.1) to give us that Unpacking the definition of K gives us that for all testers τ, Now, define for some arbitrary τ and transformation T. Substituting tester τ′ into Eq. (169), we find that As this holds for all τ, and so, in particular, for our fiducial testers, we therefore have that Finally, using the fact that $R_{T \circ T_i} = R_T \square R_{T_i}$ and similarly that $R_{T \circ T'_j} = R_T \square R_{T'_j}$, we find which is our desired result. One can similarly show linearity in the first argument, namely, Putting these together, we obtain full bilinearity of □, namely as required.
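The independence of the chosen decomposition can be illustrated in a toy setting. The sketch below makes several assumptions that are not part of the text: transformations are 2×2 column-stochastic matrices on a classical bit, the vector $R_T$ is taken to be the flattened matrix of T (the helper `R` is hypothetical), and composition is the matrix product. It exhibits two distinct decompositions of the same vector and checks that composing with a fixed T gives the same result for both.

```python
from fractions import Fraction as F

def compose(t, s):
    """Sequential composition 't after s', as a 2x2 matrix product."""
    return [[sum(t[i][k] * s[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def R(t):
    """Assumed vector representation of a transformation: its flattened matrix."""
    return [x for row in t for x in row]

def combine(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i] of length-4 vectors."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(4)]

I  = [[F(1), F(0)], [F(0), F(1)]]                 # identity
X  = [[F(0), F(1)], [F(1), F(0)]]                 # bit flip
D0 = [[F(1), F(1)], [F(0), F(0)]]                 # reset to outcome 0
D1 = [[F(0), F(0)], [F(1), F(1)]]                 # reset to outcome 1
T  = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]     # a fixed noisy channel

w = [F(1, 2), F(1, 2)]
# Two distinct decompositions of the same vector V.
assert combine(w, [R(I), R(X)]) == combine(w, [R(D0), R(D1)])

# Extending composition with T linearly over either decomposition.
lhs = combine(w, [R(compose(T, I)), R(compose(T, X))])
rhs = combine(w, [R(compose(T, D0)), R(compose(T, D1))])
assert lhs == rhs
```

In this toy model the agreement is just linearity of the matrix product; the proof above establishes the analogous statement without assuming a matrix representation in advance.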

E.3 Proof of Lemma 2.6
We now prove Lemma 2.6, restated here:

Lemma E.2. A GPT is tomographically local if and only if one can decompose the identity process for every system A, denoted 1 A , as
where $M_{1_A}$ is the matrix inverse of the transition matrix $N_{1_A}$ of the identity process, that is, $M_{1_A} := N_{1_A}^{-1}$.
Proof. First, we prove that if a GPT satisfies tomographic locality, then the identity has a decomposition of the form in Eq. (176). We do this by defining a particular process f as a linear expansion into states and effects with the carefully chosen set of coefficients $[M_{1_A}]^j_i$, and then we prove that $f = 1_A$.
Take any minimal spanning set $\{P^A_i\}_i$ of GPT states and spanning set $\{E^A_j\}_j$ of GPT effects, and consider the transition matrix $N_{1_A}$ with entries given by $[N_{1_A}]^j_i := E^A_j \circ P^A_i$. Next, define $M_{1_A} := N_{1_A}^{-1}$. The matrix inverse of $N_{1_A}$ exists, since the rows of $N_{1_A}$ are linearly independent. That is, we show that $\sum_i a_i [N_{1_A}]^j_i = 0$ if and only if $a_i = 0$ for all i. First, note that we have $\sum_i a_i\, E^A_j \circ P^A_i = 0$ for all j, but as the $E^A_j$ span the space of effects and composition is bilinear (see Lemma 2.5), this means that $\sum_i a_i P^A_i = 0$; then, since the $P^A_i$ are linearly independent, we have $a_i = 0$ for all i, as desired. Next we use this inverse to define a process f as A priori, there is no reason why this process must be a physical GPT process; however, it turns out to be the identity process, as we will now show. Consider the expression for some $P^A_k$ and some $E^A_l$ from the minimal spanning sets above. Substituting the expansion of f followed by applying the definition of $M_{1_A}$ and $N_{1_A}$, one has But now it follows from Eq. (179 Hence, it holds that for all $P^A_k$ and $E^A_l$ in the minimal spanning sets above. By the fact that these sets span the state and effect spaces respectively, it follows that for all P and E. Now, in any GPT which satisfies tomographic locality, namely Eq. (49), two channels which give the same statistics on all local inputs and outputs are equal, and hence f is in fact the identity transformation. Hence, the identity transformation has a linear expansion, of the form given by Eq. (176), namely

Next, we prove the converse: if the identity has a linear expansion as in Eq. (176) in a given GPT, then that GPT satisfies tomographic locality. To see this, consider two bipartite processes T and T′ which give rise to the same statistics on all local inputs, so that For any tester τ of the appropriate type, one can write the probability generated by composing with T as simply by inserting the linear expansion of the identity on each system. Similarly, one can write Noting that the RHS of Eq. (188) splits into two disconnected diagrams, and the same holds for the RHS of Eq. (189), it follows from Eq. (187) that Since this is true for any two processes satisfying Eq. (187), the principle of tomographic locality (Eq. (49)) is satisfied.
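The first half of the argument can be checked numerically in a simple case. The sketch below is illustrative and rests on assumptions not made in the text: the system is a qubit written in coordinates (1, x, y, z), the fiducial states and effects are chosen by hand, and the index convention `M[i][j]` (rows labelled by states, columns by effects) is a choice made for this example and need not match the index placement in Eq. (176). It builds the transition matrix, inverts it exactly, and confirms that the resulting expansion into states and effects is the identity matrix.

```python
from fractions import Fraction as F

def inverse(a):
    """Gauss-Jordan inverse of a square matrix with exact rational entries."""
    n = len(a)
    aug = [list(row) + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Assumed fiducial states for a qubit in coordinates (1, x, y, z).
states = [[F(1), F(0), F(0), F(1)],
          [F(1), F(0), F(0), F(-1)],
          [F(1), F(1), F(0), F(0)],
          [F(1), F(0), F(1), F(0)]]
# Assumed fiducial effects: effect j occurs with certainty on state j.
effects = [[c / 2 for c in s] for s in states]

d = len(states)
# Transition matrix: N[j][i] = probability of effect j on state i.
N = [[sum(effects[j][k] * states[i][k] for k in range(d)) for i in range(d)]
     for j in range(d)]
M = inverse(N)   # M[i][j]: rows indexed by states, columns by effects

# f = sum_{i,j} M[i][j] * (state i)(effect j), written as a d x d matrix.
f = [[sum(M[i][j] * states[i][a] * effects[j][b]
          for i in range(d) for j in range(d))
      for b in range(d)] for a in range(d)]
identity = [[F(int(a == b)) for b in range(d)] for a in range(d)]
assert f == identity
```

Exact rational arithmetic is used so that the final comparison with the identity is an equality rather than a floating-point approximation.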

E.4 Proof of Eqs. (57) and (58)
To prove Eq. (57), one can decompose the four identities in the diagram and perform some simple manipulations of the resulting expression.
To prove Eq. (58), one can insert four decompositions of the identity into the following diagram:

Corollary 3.5 (Three equivalent notions of classicality). Let $\mathrm{Op}$ be an operational theory and $\widetilde{\mathrm{Op}}$ the GPT obtained from $\mathrm{Op}$ by quotienting. Then, the following are equivalent:
(i) There exists a noncontextual ontological model of $\mathrm{Op}$, $\xi_{nc} : \mathrm{Op} \to \mathbf{SubStoch}$.
(ii) There exists an ontological model (a.k.a. simplex embedding) of $\widetilde{\mathrm{Op}}$, $\xi : \widetilde{\mathrm{Op}} \to \mathbf{SubStoch}$.
(iii) There exists a positive quasiprobabilistic model of $\widetilde{\mathrm{Op}}$, $\xi_+ : \widetilde{\mathrm{Op}} \to \mathbf{QuasiSubStoch}$.
This generalizes the results of Refs. [2,4,5] from prepare-measure scenarios to arbitrary compositional scenarios.

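For this to be well defined, it must be independent of the choices of representatives, i.e. for any $t_1, t_2 \in \tilde T$ and $r_1, r_2 \in \tilde R$, one has $\widetilde{r_1 \circ t_1} = \widetilde{r_2 \circ t_2}$ and $\widetilde{r_1 \otimes t_1} = \widetilde{r_2 \otimes t_2}$. If this is the case, then the quotienting map is a structure-preserving equivalence relation, or congruence relation, for the process theory. It is straightforward to show that the first equality in Eq. (152) is equivalent to the conditions $r \circ t_1 \sim r \circ t_2$ and $r_1 \circ t \sim r_2 \circ t$. To see the nontrivial direction of this equivalence, consider the special case of these where $r = r_1$ (in the first) and where $t = t_2$ (in the second); then, one has $r_1 \circ t_1 \sim r_1 \circ t_2 \sim r_2 \circ t_2$. Similarly, the second equality in Eq. (152) is equivalent to the conditions $r \otimes t_1 \sim r \otimes t_2$ and $r_1 \otimes t \sim r_2 \otimes t$.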