Correlations constrained by composite measurements

How to understand the set of correlations admissible in nature is an outstanding open problem at the core of the foundations of quantum theory. Here we take a complementary viewpoint to the device-independent approach, and explore the correlations that physical theories may feature when restricted by particular constraints on their measurements. We show that demanding that a theory exhibits a composite measurement imposes a hierarchy of constraints on the structure of its sets of states and effects, which translates to a hierarchy of constraints on the allowed correlations themselves. We moreover focus on the particular case where one demands the existence of a correlated measurement that reads out the parity of local fiducial measurements. By formulating a non-linear optimisation problem, and semidefinite relaxations of it, we explore the consequences of the existence of such a parity reading measurement for violations of Bell inequalities. In particular, we show that in certain situations this assumption has surprisingly strong consequences, namely, that Tsirelson's bound can be recovered.


Introduction
Bell nonclassicality is a well-known phenomenon featured by quantum theory, and attests that correlations observed in nature are not always compatible with a classical common cause shared among the distant wings of an experiment [1]. That is, non-classical common causes are necessary to explain our observational data [2]. Bell's theorem not only teaches us a valuable lesson about the foundational aspects of nature, but also underpins a variety of current technological applications. For example, non-classical correlations enable cryptographic applications, such as key distribution [3][4][5][6][7][8] and randomness generation [9][10][11][12], and provide an information-theoretic advantage in other families of so-called non-local games [13][14][15].
Understanding quantum correlations, and in particular their limitations, is therefore an important open problem within quantum information theory. Research along these lines has recently been carried out within the device-independent formalism, that is, where the only information used to reason about nature is the classical variables that denote measurement choices and their outcomes, together with the observed outcome statistics. Within this paradigm, quantum correlations are studied "from the outside", by exploring the constraints that physical or information-theoretical principles impose on the observed correlations [16][17][18][19][20][21][22]. In the device-independent framework, such proposed constraints are therefore formulated at the level of the correlations themselves.
In this work we take a complementary viewpoint to the problem of characterising quantum correlations, by examining the possible correlations that may arise when constraints are imposed on the underlying physical theory. From this perspective, one asks how various elements of the physical theory constrain or enable particular correlations. The particular objects we are interested in here are the measurements that the theory may feature. Even though in principle one could also impose constraints on the states, our interest in measurements arises from the results of Ref. [23]: if demanding parity constraints on measurement outcomes at the level of the statistics yields such substantial constraints on correlations (see below), what can we expect if we open the box and demand conditions on the measurements themselves? On the one hand, it is well known that the theory known colloquially as 'Boxworld' [24], which was formulated in order to realise arbitrary no-signalling correlations, only features local measurements and wirings thereof. That is, Boxworld does not display entangled measurements. This is in contrast to quantum theory, where entangled measurements are ubiquitous (consider, for instance, the so-called Bell measurements). A natural question then arises: is there any relationship between the types of measurements a theory features and the correlations it may produce? Although seemingly simple at first sight, this question is far from trivial: by enlarging the set of allowed measurements one necessarily needs to shrink the set of allowed states, since the state space featured by a given physical theory is constrained by the dual of the effect space¹. How the set of allowed correlations (measurements on states) changes in consequence is therefore not straightforward.
Progress on this question was made in Ref. [23], where it was shown that demanding the existence of a particular entangled effect constrains the correlations admissible in a bipartite Bell scenario to those realisable by an entangled pair of qubit quantum systems. That is, by demanding that the theory features a particular entangled measurement, it was shown that the set of allowed correlations in the so-called Clauser-Horne-Shimony-Holt (CHSH) scenario [25] is indeed the set of quantum ones.
In this paper we explore what types of constraints the existence of bipartite effects imposes on the possible correlations that a theory may feature. The framework we use to describe the underlying physical theory is that of General Probabilistic Theories (GPTs) [24,26-34]. First, we take a compositional perspective and show how the existence of one (arbitrary) bipartite effect imposes not one but an infinite hierarchy of constraints which must be satisfied by the states and effects of the GPT. This hierarchy of constraints on bipartite states immediately translates to a hierarchy of constraints on the correlations realisable within the theory. Inspired by Ref. [23], we then consider a particular setup where we demand that there exists a measurement in the GPT which can measure the parity of fiducial measurements (or a subset thereof), which we call a (partial) parity reading measurement. We say that the observables which appear in such a partial parity reading measurement are parity-readable observables. We then define an optimisation problem that computes the maximum violation of a Bell inequality by the corresponding GPT, provided that such a parity reading measurement exists. This optimisation problem provides a way to characterise the set of correlations allowed by such a GPT when the parties choose among these parity-readable fiducial measurements in the Bell tests. Solving this optimisation problem is, however, computationally hard, given that the problem itself is polynomial in the optimised variables. We hence present a series of relaxations that upper-bound the solution to the optimisation problem. We finish by applying our techniques to a variety of Bell inequalities, and discussing the necessity of the Local Tomography assumption.
Inspired by our results, we moreover formulate a conjecture:
Conjecture 1. Under the assumption of local tomography, the local observables that are parity-readable satisfy Tsirelson's bound, i.e., they cannot violate Bell inequalities more strongly than quantum mechanics does (with arbitrary measurements).
The inspiration came from the fact that several of our numerical explorations do indeed satisfy this property. Moreover, we have a counterexample demonstrating the necessity of the assumption of local tomography within the conjecture. That is, if we have a GPT which does not satisfy local tomography, then it is possible to create a PR box using observables which are parity-readable. Finally, as we comment in the Discussion section, our work opens the door to further conjecturing that Quantum Theory, within the landscape of possible locally-tomographic physical theories, is the theory that features the best balance between allowed states and effects, in the sense that it yields the largest violation of any Bell inequality by parity-readable fiducial measurements.

Descriptive summary of the results
Suppose that two parties, the ubiquitous Alice and Bob, each have three binary observables, X, Y, and Z, that they can measure on some shared system. Moreover, suppose that there exists some joint measurement that they could perform on their joint system if they were to get together, whose outcome would determine the parity of XX and ZZ. That is, the joint measurement does not necessarily reveal the values that they would have obtained had they measured X and Z individually, but just whether or not their XX and ZZ measurements would have been correlated.² We say that such observables are parity-readable, and illustrate this idea in Fig. 1.
In this manuscript, we consider the impact that parity readability has on the correlations that can be generated in a Bell scenario when measuring parity-readable observables. For example, is it possible to create PR-box correlations via parity-readable observables?
It is clear that, even within the standard quantum mechanical formalism, this imposes a restriction on the correlations which can be observed: not all observables are parity-readable, and some correlations can only be achieved by those that are not. However, what happens if we go beyond quantum mechanics?
We conjecture that, within this landscape of parity-readable observables, quantum theory is always optimal. That is, any correlation that can be generated by parity-readable observables, independently of which underlying (tomographically-complete) physical theory they belong to, can also be generated by parity-readable observables within quantum theory.
If this conjecture is true, then this would be in stark contrast to the landscape of arbitrary observables, in which there are correlations which cannot be realised within our quantum world. It would therefore show, for the first time, a way in which quantum theory is an optimal physical theory for an information theoretic task.
Figure 2: The existence of a measurement that reads the parity of the parity-readable observables implies an infinite hierarchy of positivity constraints. For instance, the probabilities of outcomes should be positive on (a) a single copy of a state, (b) all pairs of products of steered states that can be obtained from the original state, (c) three copies of the state in a triangle network, and (d) a single copy of the state times any two pairs of steered states.
Utilising techniques and insight coming from the field of generalised probabilistic theories, we specify the constraints that the existence of a parity-reading measurement imposes on the possible correlations those observables may generate in a Bell test. These constraints are formulated as a hierarchy of convex optimisation problems which can be tackled using standard numerical methods. We apply this technique to numerically explore various Bell scenarios and Bell inequalities, and the results lead us to formulate the conjecture discussed above. More precisely, we consider scenarios in which each party has at most three observables X, Y and Z, and in which either two or three of these are parity-readable. The Bell inequalities explored include the CHSH [25], AMP [35], and AQ [36] inequalities. Further development of both the numerical methods and analytical convex optimisation techniques is necessary to explore this conjecture further.
It is worth highlighting that the main technique that we develop and apply here relies on demanding that the parity-readable observables yield valid probabilities when applied both to a composite state and to products of steered states that can be generated from it (see Figs. 2a and 2b). These constraints are phrased as positivity conditions on the variables we optimise over. However, to capture the full power of the constraints that these parity-readable observables impose, one needs to take into account infinitely many conditions (examples of which are presented in Figs. 2c and 2d), which are described in the article and deserve further exploration.
3 Warm-up: non-existence of Popescu-Rohrlich correlations for parity measurable X and Z observables.
While the general problem of bounding Bell inequalities for parity-readable observables is a complex one (as we will see further in this paper), one can relatively easily show that the CHSH inequality cannot be violated up to its maximal algebraic bound. Namely, we shall show that parity-readable observables cannot exhibit so-called Popescu-Rohrlich correlations [16].
In this section we shall present such reasoning, as a simple warm-up exercise in anticipation of the rest of the paper. In this warm-up we will take a device-independent approach, in the sense of relying only on the conditional probability distributions for the argument (these black boxes are the only information we leverage for describing the underlying states of the system). In the remainder of the paper we will utilise the language of Generalised Probabilistic Theories (GPTs) and, in particular, we will express the question using a formal diagrammatic language. Full justification of some formulas will be found later in the paper. For self-consistency of the main part of the paper, we will repeat there some definitions used here.

States
The so-called PR-box [16] is defined as the set of conditional probabilities: where (i) specifies the party (i.e., (1) for Alice and (2) for Bob). This is similar to the representation of non-signalling correlations in the CHSH scenario known as Collins-Gisin [37], which shows how nine parameters are enough to fully specify the 16 components of the full conditional probability distribution. Since a PR box has perfect correlations or anticorrelations for each pair of observables, it is natural that the steered states obtained from it are all the states that have a well-defined value for both observables. Thus, for each party, there are four steered states: the state s_1 with X = 0, Z = 0, the state s_2 with X = 0, Z = 1, the state s_3 with X = 1, Z = 0, and the state s_4 with X = 1, Z = 1. These four states are depicted in Fig. 3. We shall present here the four pairs of steered states which we will use in the proof. These are products of all four steered states on Bob's side and a single fixed state on Alice's side, namely the state with X_A = Z_A = 1, as depicted in Fig. 4.
We denote these four states as: In the above matrix notation these are characterised by:
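As a concrete illustration of the statement that steering a PR box yields all four local states with definite values of both observables, the following is a minimal sketch. It assumes the common convention p(ab|xy) = 1/2 when a ⊕ b = x·y (the convention used in the text may differ by a relabelling), and the function names are hypothetical.

```python
import itertools
import numpy as np

def pr_box():
    """PR box p[a, b, x, y]; assumed convention: a XOR b == x AND y."""
    p = np.zeros((2, 2, 2, 2))
    for a, b, x, y in itertools.product(range(2), repeat=4):
        if (a ^ b) == (x & y):
            p[a, b, x, y] = 0.5
    return p

def is_no_signalling(p, tol=1e-12):
    """Check that each party's marginal is independent of the other's setting."""
    alice = p.sum(axis=1)  # indices [a, x, y]
    bob = p.sum(axis=0)    # indices [b, x, y]
    return (np.allclose(alice[:, :, 0], alice[:, :, 1], atol=tol)
            and np.allclose(bob[:, 0, :], bob[:, 1, :], atol=tol))

def steered_state(p, x, a):
    """Bob's conditional distribution q[b, y] given Alice's setting x and outcome a."""
    joint = p[a, :, x, :]                      # indices [b, y]
    return joint / joint.sum(axis=0, keepdims=True)

def steered_values(p):
    """Set of (value for y=0, value for y=1) pairs over all of Alice's (x, a)."""
    values = set()
    for x, a in itertools.product(range(2), repeat=2):
        q = steered_state(p, x, a)
        values.add(tuple(int(np.argmax(q[:, y])) for y in range(2)))
    return values

if __name__ == "__main__":
    p = pr_box()
    print("no-signalling:", is_no_signalling(p))
    print("steered (deterministic) states on Bob's side:", sorted(steered_values(p)))
```

Identifying Bob's setting y = 0 with X and y = 1 with Z (again an assumption of this sketch), the four pairs returned correspond to the four steered states s_1, ..., s_4 listed above.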

Parity Reading Measurement.
Now we consider the measurement that measures the parities of XX and ZZ in a single shot, that is, X and Z are parity-readable observables. Specifically, this parity reading measurement (PRM) outputs a pair of bits: the first one reports the parity of XX and the second one reports the parity of ZZ. This is expressed by the following pair of conditions: where we denote by p_s(rq|PRM) the probability of obtaining the pair of outcomes rq when measuring the PRM on state p_s. Notice that R_XX (R_ZZ) has the interpretation of the probability of XX (ZZ) being correlated (i.e., the probability of Alice and Bob obtaining the same outcomes). In addition, note that the probabilities of the outcomes of the PRM satisfy the normalisation condition: We therefore have a system of three linear equations with four unknown quantities p_s(rq|PRM).
Hence, a solution may be found with one quantity remaining an independent variable, for example the expression: Due to local tomography, the probabilities of the outcomes of any measurement for a particular state can be written as a linear combination of state parameters, p_s. Hence, as C is a linear combination of such probabilities, it can be computed via: where · denotes the Frobenius matrix product, and C is the matrix of, at the moment, unspecified parameters describing the PRM: Combining Eqs. (5), (7), (8) and (9), we obtain the formulas for computing the probabilities of PRM outcomes, expressed in terms of the state parameters, as well as the free parameter C:

3.3 Proof of nonexistence of the PR-box.
We shall now argue that, for any choice of C, the probability of at least one PRM output will necessarily be negative on some of the pairs of steered states. This proves that parity-readable observables cannot feature PR-box correlations, and hence cannot violate the CHSH inequality up to its maximal algebraic bound. We shall first impose positivity of p_s(rq|PRM) for each of the four states in Eq. (3). Note that, for the state s_44, by definition, both XX and ZZ are perfectly correlated, since X_A = X_B = 1 and Z_A = Z_B = 1. Hence, R_XX(s_44) = R_ZZ(s_44) = 1 for the state s_44. An analogous reasoning for the remaining three states yields: Of course, we might alternatively compute these values from the matrix form of the states, and the definition of R_XX, R_ZZ. Inserting them into Eq. (11) for each of the four states yields the following conditions for the positivity of PRM probabilities: We then use the matrix form of our states of Eq. (4) and the form of C of Eq. (10) and rewrite these conditions as: We see that this set of conditions does not have any real solutions. We therefore conclude that there does not exist a Parity Reading Measurement whose outcomes give legitimate probabilities for the above four pairs of steered states. Since the existence of a PR-box allows for such steered states, we conclude that the existence of a Parity Reading Measurement excludes PR-box correlations. The remainder of this work explores this curious property in more detail. In particular: on the one hand, can we go beyond simply ruling out the PR-box and see how PRMs constrain the set of correlations; and, on the other hand, can we go beyond the assumptions involved in this derivation, namely, that of local tomography and that there are simply two observables per party?
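The infeasibility can be checked numerically in a simplified form. The sketch below is not the derivation with the matrix C used in the text; it is a reduced version under two stated assumptions: (i) Bob's local state is fully specified by the vector (1, P(X=0), P(Z=0)), and (ii) with Alice's state fixed to X_A = Z_A = 1, the probability of the PRM outcome "XX correlated and ZZ correlated" is a linear functional of that vector. Since R_XX and R_ZZ are each 0 or 1 on the four product states, positivity forces this probability to be 1 on s_44 and 0 on the other three. All names are hypothetical.

```python
import numpy as np

def local_state(x, z):
    """Deterministic local state with values (x, z), as (1, P(X=0), P(Z=0))."""
    return np.array([1.0, float(x == 0), float(z == 0)])

# Bob's four steered states s_1, ..., s_4 (Alice is fixed to X_A = Z_A = 1).
bob_values = [(0, 0), (0, 1), (1, 0), (1, 1)]
S = np.array([local_state(x, z) for x, z in bob_values])

# Probability forced on the outcome "XX correlated and ZZ correlated":
# 1 only when Bob also has X = 1 and Z = 1, and 0 otherwise.
target_both = np.array([float(x == 1 and z == 1) for x, z in bob_values])

# Sanity check: "XX correlated" alone is a perfectly good linear functional.
target_xx = np.array([float(x == 1) for x, z in bob_values])

def has_linear_solution(A, b, tol=1e-10):
    """True iff A f = b has a solution f (Rouche-Capelli rank criterion)."""
    rank_A = np.linalg.matrix_rank(A, tol=tol)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]), tol=tol)
    return rank_A == rank_Ab

if __name__ == "__main__":
    # The four states are affinely dependent: s_1 - s_2 - s_3 + s_4 = 0.
    print("dependency:", S[0] - S[1] - S[2] + S[3])
    print("'XX correlated' is linear:", has_linear_solution(S, target_xx))
    print("joint outcome is linear:  ", has_linear_solution(S, target_both))
```

The affine dependency s_1 + s_4 = s_2 + s_3 is what obstructs the joint outcome: linearity would require 0 + 1 = 0 + 0. If the local state space had more parameters than assumed here, the four states need not be affinely dependent, which is consistent with the stated role of the assumptions in the derivation.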

Generalised probabilistic theories
In order to explore the previously presented questions, we will work within the framework of generalised probabilistic theories (GPTs) [24,26]. This framework was developed in order to be able to describe essentially arbitrary conceivable theories of nature, taking quantum and classical theory as just two particular points within a broad landscape of potential physical theories. The GPT framework is based on the idea that, ultimately, the way that we characterise physical devices is by the probabilities that they give rise to in experiments. From this simple observation, one can build a rich mathematical structure which any GPT must have.
For simplicity of the presentation, in this manuscript we will focus on a particular class of GPTs, namely, those that satisfy the principle of Local Tomography. A GPT is locally tomographic if any state of a composite system can be uniquely determined by the information obtained from performing local measurements on its constituents (see, e.g., Ref. [26] for a full formal definition). GPTs that satisfy local tomography tend to have very useful properties, and in particular they admit a useful parametrisation of their state and effect vectors, which will come in handy at various stages in this manuscript. The majority of our results, such as the formulation of the hierarchy of constraints, do not, however, require this principle to hold, and we will highlight the instances where the assumption is indeed necessary.
In this paper we take a categorical approach to tomographically local GPTs. This is an intrinsically compositional approach, which allows us to describe arbitrary experimental scenarios. Moreover, the diagrammatic representation in terms of string diagrams, which comes from this approach, provides an intuitive way to reason about these complex situations. We provide a brief technical introduction to the formalism in Appendix B, and refer the reader to Refs. [32][33][34][56][57][58] for more extensive introductions to these tools.

Constraints on states and effects
We can see how the state and effect spaces constrain one another when demanding that scalars are probabilities. For some system V (see the Appendix for details on notation), the states can be thought of as vectors s ∈ Ω_V living inside V, and effects as linear functionals e ∈ E_V living in the dual space V^*. Any pair of an effect and a state must satisfy: The geometric consequences of this for local and composite states are presented in App. C. It was noted in Ref. [59], however, that even if a pair of state and effect spaces satisfy the standard constraints discussed in App. C (i.e., Eqs. (169) and (170)), it is not straightforward that they actually define a valid GPT, at least when the No-restriction hypothesis is not assumed (i.e., when it is not required that Ω = E^*). An important condition that must also be checked, as shown in Theorem 9 of Ref. [59], is that the steered states are also valid states within the theory. That is, any bipartite state s must satisfy: for all e_v ∈ E_V and e_w ∈ E_W. This constraint can be interpreted in several ways:
• as a constraint on the bipartite state space, namely, that a bipartite state must lead to valid steered states,
• as a constraint on the local state spaces, namely, it forces them to include all of these steered states as valid local states,
• as a constraint on the local effect spaces, namely, an effect is only allowed if it leads to valid steered states when composed with any bipartite state.
However, we believe that the constraints of Eq. (16) are probably best viewed not from any of these individual perspectives, but instead just as a compatibility condition between local states and effects, and bipartite states.
Similarly, we can consider bipartite effects e and note that these have a similar compatibility condition together with local states and effects: for all s_v ∈ Ω_V and s_w ∈ Ω_W. One may be inclined to think that these constraints on bipartite states/effects, steered states, and steered effects are sufficient to characterise a valid GPT. However, these are only the tip of the iceberg: a whole plethora of further compatibility constraints lies underneath the surface. For example, consider a normalised bipartite state s and a bipartite effect e. By taking two copies of each, one should be able to wire them as follows and obtain a valid probability: In addition, if one takes two copies of s and one of e, one should be able to wire them as follows and obtain a valid bipartite state: Other types of compatibility constraints include diagrams like the following, which arise due to symmetry in the special case when all the local systems have the same type: One can readily see how these belong to an infinite family of constraints, each featuring the same (but arbitrary) number of normalised bipartite states s and bipartite effects e being connected in this "braided" fashion. Even if the bipartite states consist of local systems of the same type, they need not be symmetric under a swap operation of the local systems. Hence, a different hierarchy of braided-type constraints will arise by requiring consistent probability assignments to diagrams of the form: In Section 6.1 we will see how to formalise these types of hierarchies, and how they can be used to constrain the potential correlations in a GPT.

Correlations in a GPT
To describe correlations in a GPT we must first introduce classical systems. Here we describe a classical system by a classical random variable, which can take values from a set, such as X, Y, A, B. We denote these classical systems by thin gray wires (to distinguish them from GPT wires). Correlations, in this formalism, are then viewed as no-signalling stochastic maps, N : X × Y → A × B, between these random variables. Diagrammatically, we denote these no-signalling boxes as: which must satisfy the no-signalling constraints: This is equivalent to the standard view of correlations [60] as being described by a conditional probability distribution Pr(A, B|X, Y) = {p(ab|xy)}_{a∈A,b∈B,x∈X,y∈Y}, which can be seen by defining: and checking that the no-signalling conditions of Eqs. (23) are equivalent to the standard no-signalling conditions for the conditional probability distribution. To do so it is useful to note that, for example: Then, in order to understand the possible correlations in a GPT, it is useful to describe measurements as transformations from a GPT to a classical system, where the choice of measurement is controlled by another classical system. These controlled measurements must satisfy the constraint: Correlations that can be generated in a Bell experiment are hence of the form: where the local controlled measurements M_A and M_B (for Alice and Bob respectively) are performed on a bipartite system in state s, with local system types V, W in the GPT. These local measurements are controlled on the input classical variable and have an outcome recorded in the output classical variable. In any GPT, such a diagram corresponds to a no-signalling stochastic map: It can then be shown that the constraint on measurements of Eq. (26) immediately implies the relevant no-signalling conditions, for example: A more general version of this proof first appeared in Ref. [61] and was generalised to arbitrary causal structures in Ref. [62].
Bell inequalities [1,60] are then a particular class of linear functionals from this space of stochastic maps to the reals. A linear functional corresponding to a Bell inequality, henceforth denoted by I, can be diagrammatically denoted as: Note that I should not be interpreted as a process within the GPT: I is simply some linear functional, and can lead to negative values. The value of a Bell inequality on a stochastic map N, realised within the GPT as per Eq. (27), is given by: The maximal value of a Bell inequality I achievable by correlations within a given GPT G is therefore given by the following optimisation problem: Notice that the optimisation is carried out over the types of systems V and W present in G, as well as over the local measurements M_A and M_B, and bipartite states s. The solution to this optimisation problem will of course depend on the properties of the GPT being studied. However, we readily see that when maximising over states and measurements from the theory, the compatibility constraints we discussed in the previous section will play a crucial role. Indeed, if the GPT admits some bipartite effect e, then the above-mentioned hierarchy of constraints (see Eqs. (18), (19), (20), and (21)) will restrict the sets of states that the value of I is optimised over. In other words, the existence of bipartite effects e within the GPT will impose a hierarchy of constraints on the correlations that such a GPT may feature. In the next sections we elaborate on this fact with a concrete example.
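To make the notion of a Bell inequality as a linear functional on stochastic maps concrete, the following minimal sketch evaluates the CHSH functional on three behaviours: the best local deterministic strategy, a standard qubit realisation, and the PR box. The coefficient convention I[a, b, x, y] = (−1)^(a ⊕ b ⊕ xy), the choice of state and observables, and all function names are assumptions of this sketch rather than notation from the text.

```python
import itertools
import numpy as np

def chsh_functional():
    """CHSH coefficients I[a, b, x, y] = (-1)^(a XOR b XOR (x AND y))."""
    I = np.zeros((2, 2, 2, 2))
    for a, b, x, y in itertools.product(range(2), repeat=4):
        I[a, b, x, y] = (-1) ** (a ^ b ^ (x & y))
    return I

def bell_value(I, p):
    """Value of the linear functional I on the behaviour p[a, b, x, y]."""
    return float(np.sum(I * p))

def quantum_behaviour(psi, alice_obs, bob_obs):
    """p[a, b, x, y] for +/-1-valued observables measured on the pure state psi."""
    p = np.zeros((2, 2, 2, 2))
    eye = np.eye(2)
    for a, b, x, y in itertools.product(range(2), repeat=4):
        Pa = (eye + (-1) ** a * alice_obs[x]) / 2   # projector for outcome a
        Pb = (eye + (-1) ** b * bob_obs[y]) / 2     # projector for outcome b
        p[a, b, x, y] = np.real(psi.conj() @ np.kron(Pa, Pb) @ psi)
    return p

def local_deterministic_maximum(I):
    """Maximum of I over the 16 local deterministic strategies."""
    best = -np.inf
    for f0, f1, g0, g1 in itertools.product(range(2), repeat=4):
        p = np.zeros((2, 2, 2, 2))
        for x, y in itertools.product(range(2), repeat=2):
            p[(f0, f1)[x], (g0, g1)[y], x, y] = 1.0
        best = max(best, bell_value(I, p))
    return best

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

I_chsh = chsh_functional()
p_quantum = quantum_behaviour(
    phi_plus, [Z, X], [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]
)
p_pr = 0.5 * (I_chsh > 0)   # PR box: uniform on the entries with coefficient +1

if __name__ == "__main__":
    print("local bound :", local_deterministic_maximum(I_chsh))
    print("quantum     :", bell_value(I_chsh, p_quantum))
    print("PR box      :", bell_value(I_chsh, p_pr))
```

The three values reproduce the familiar ordering of the local bound, Tsirelson's bound and the algebraic maximum; the optimisation problem in the text replaces the fixed quantum state and measurements here with an optimisation over the states and measurements of a GPT.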

Parity reading measurement
In this section we will explore the constraints on the correlations that a GPT may feature, given that bipartite effects associated to a particular measurement, which we call a Parity reading measurement (PRM), exist within the GPT. Suppose we have a controlled measurement, M, for a system V, with a setting variable labelled by the set η := {0, ..., n−1} such that |η| = n, and a binary outcome variable labelled by β := {0, 1}. This is diagrammatically denoted by: Recall that, as this is a measurement, it must satisfy: which ensures that the correlations it can generate are no-signalling. Then we can define a measurement P[M] which reads out the parity of such a measurement M as follows: P[M] is a bipartite measurement on V ⊗ V with n binary variables as outputs: such that tracing out all but the i-th outcome gives the parity of the i-th setting for M: We will also be interested in situations in which we have a measurement which can only read the parity of a certain subset of the setting variable ι ⊆ η: where • is the canonical embedding of ι into η³. Using this notation we can succinctly define these partial parity reading measurements by: Note that when ι = η we recover the notion of a PRM. In order to see how the existence of a (partial) PRM constrains the correlations that the GPT may feature, we will first discuss the concepts of a Fiducial Measurement and Fiducial effects.
Let n be the affine dimension of the normalised state space, that is, n = |V| − 1. A fiducial measurement, F, is a controlled measurement with n settings (described by the set η) and binary outcomes (described by the set β): F is called a fiducial measurement if all of the fiducial effects can be obtained from such a measurement. Fiducial effects, in turn, form a (minimal) spanning set for the effect space of the GPT. As an example, consider the case where n = 2: here F will have a binary input system, and the three fiducial effects will be given by where the equality for the third effect comes from the fact that F is a valid controlled measurement and hence satisfies: Coming back to the case of an arbitrary n, notice that the fact that the fiducial effects span the corresponding vector space means that any state s can be uniquely characterised by the vector of probabilities: For example, going back to the case where n = 2, this vector of probabilities will be given by: Now we can briefly state the case we will explore in this section: GPTs that have a PRM for a Fiducial measurement, and where a bipartite Bell experiment is carried out by Alice and Bob performing this controlled fiducial measurement in each wing. A PRM P[F] for the fiducial measurement F will satisfy the following constraints: It is worth mentioning that a PRM P[F] is not necessarily uniquely singled out by these constraints: more than one PRM may qualify as a potential candidate for the role. We denote by ParMeas[F] the set of all PRMs P[F] that satisfy Eq. (45) for the given F.

Examples
Qubits. Let us conclude this discussion with an example from quantum theory. Consider the case of qubit systems, where the affine local dimension is n = 3. A fiducial measurement corresponds to measuring the three Pauli observables X, Y, and Z. A PRM is given by (a suitable post-processing of) the Bell measurement Denoting the four-element set of outcomes of the Bell measurement as B := {0, 1, 2, 3}, and the qubit system by Q_2, we can diagrammatically represent this as: It is easy to see that a post-processing is necessary for this measurement to fit our definition of a PRM: the measurement of Eq. (47) is a four-outcome measurement, whereas a PRM for X, Y, and Z should have three binary outcomes. The required post-processing can be described diagrammatically as: where the white dot first makes three copies of the outcome, and the processes C_X, C_Y and C_Z correspond to the three different equal bipartitions of B. For example: To read the parity of observable ZZ, we hence apply the post-processing {0, 1} → 0 and {2, 3} → 1 depicted in Eq. (49), which quantum mechanically comes from All three required post-processings, and the observables whose parity they read, are presented below:
post-processing parity
With this we conclude the argument for why the measurement given by is a parity reading measurement for the X, Y and Z observables.
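The claim that a coarse-graining of the Bell measurement reads the parities of XX, YY and ZZ can be checked directly. The sketch below assumes the Bell states are labelled φ+, φ−, ψ+, ψ− in the computational basis (the ordering of the outcome set B used in the text is not recoverable here, so this labelling is an assumption), and uses parity bit 0 for correlated and 1 for anticorrelated outcomes. Function and variable names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"X": X, "Y": Y, "Z": Z}

s = 1 / np.sqrt(2)
BELL = {
    "phi+": np.array([s, 0, 0, s], dtype=complex),
    "phi-": np.array([s, 0, 0, -s], dtype=complex),
    "psi+": np.array([0, s, s, 0], dtype=complex),
    "psi-": np.array([0, s, -s, 0], dtype=complex),
}

def parity_bit(state, pauli):
    """0 if the state is a +1 eigenstate of pauli (x) pauli, 1 if a -1 eigenstate."""
    value = np.real(np.vdot(state, np.kron(pauli, pauli) @ state))
    return 0 if value > 0 else 1

# Post-processing table: for each Bell outcome, the bit reported for XX, YY, ZZ.
signature = {
    name: tuple(parity_bit(vec, PAULIS[k]) for k in "XYZ")
    for name, vec in BELL.items()
}

def coarse_graining_reads_parity(label):
    """Check that the Bell effects with bit 0 sum to the 'correlated' projector."""
    pauli = PAULIS[label]
    index = "XYZ".index(label)
    effect = sum(
        np.outer(vec, vec.conj())
        for name, vec in BELL.items()
        if signature[name][index] == 0
    )
    correlated = (np.eye(4) + np.kron(pauli, pauli)) / 2
    return np.allclose(effect, correlated)

if __name__ == "__main__":
    for name, bits in signature.items():
        print(name, dict(zip("XYZ", bits)))
    print({k: coarse_graining_reads_parity(k) for k in "XYZ"})
```

Under these assumptions the singlet ψ− is reported as anticorrelated for all three observables, while each of the other three Bell states is anticorrelated for exactly one of them, so each observable splits the four outcomes into two pairs.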
Mirror quantum correlations. Such correlations have been considered in Ref. [63]. To introduce them, let us first notice a feature of the Bell measurement:
• the effect corresponding to ψ− appears only in the bipartitions of B that give rise to anticorrelations, i.e., a value of 1 for the classical system β,
• the other three effects each appear only once in a bipartition that measures anticorrelations.
The following table summarises this feature, where we specify, for each observable whose parity we want to read, whether each effect belongs to the correlation (0) or anticorrelation (1) bipartitions: We then say that the quantum parity reading measurement has signature In the case of mirror quantum mechanics [63] the table in Eq. (53) does not hold anymore, and instead the following are satisfied: where PT stands for partial transposition⁴. Thus in the mirror quantum case we have signature: This case then corresponds to the following post-processing of the measurement outcomes:
post-processing parity
Classical theory. In the quantum case, the parity reading measurement was entangled. It had to be so, because it measured parities of observables that are not jointly measurable. In classical theory we can consider a three-bit system, described by X, Y, Z, which are now jointly measurable. Then the PRM just amounts to post-processing the joint measurement of all 6 observables (three per party).

P[F] and maximal violations of Bell inequalities for fiducial measurements F
So, how does the existence of a P[F] in the GPT constrain the correlations we may observe in a Bell test? More precisely, what are the constraints on the correlations that a bipartite system of a certain type can produce when there exists a PRM for the fiducial measurements on that system type? In this section we aim at optimising the value of a Bell inequality I when the measurements that the parties perform are given by F on a system of type V . Notice that, in general, the cardinality of the input settings for the Bell test need not coincide with the number of settings for the fiducial measurement F. That is, |X| = |η| = |Y | does not necessarily hold. In this manuscript we will work with the case where |X| ≤ |η| and |Y | ≤ |η|, and hence only some of the settings of the controlled measurement F might be used for the Bell test. In such a case, then, we will focus on the constraints that a partial PRM imposes when its existence is demanded on the settings of F used in the Bell test. For simplicity in the discussion, in this section we will present the case where |X| = |η| = |Y |, but the most general case follows similarly. We will return to the optimisation problems for partial PRM later on in the manuscript.
The optimisation problem that we focus on then reads: where X, Y, A, B are all binary variables, and we are using the shorthand notation: Given a particular GPT G -and, in particular, given a specification of its state and effect spaces -this optimisation problem reduces to a type of cone program which has been explored in recent literature [47,48,64] regarding their relationship to GPTs. That is, there is some convex spanning cone of states K V ⊗V ⊂ V ⊗ V , which s belongs to, and some normalisation constraint on s, u V ⊗ u V (s) = 1 so the above problem can be rewritten as: Here, however, we do not consider a particular GPT G which we optimise over. What we carry out here is an optimisation over the space of GPTs which have the relevant structure -those which admit a PRM for the fiducial measurement. This optimisation problem is much more complex than that of Eq. (60), as we will now explain. In Section 4.1 we elaborated on the types of compatibility constraints between states and effects that a GPT must feature. Here, we will demand that the GPT admits the fiducial local measurement F and a PRM P ∈ ParMeas[F]. By imposing the compatibility constraints motivated in Section 4.1, we hence restrict the possible cones of states K[P] that such GPT could feature. If we have a characterisation of K[P] ⊂ V ⊗ V , then the optimisation problem becomes: This turns out to be a non-linear optimisation problem, as we will show next.

The cone K of bipartite states
The key question here is: how to characterise the cones of states K[P]? Here we will take the types of diagrammatic constraints motivated in Section 4.1, and define a systematic hierarchy of conditions that the existence of P[F] imposes on the cone K. This hierarchy of conditions will be specified in terms of the number of copies of the bipartite state s featured in the diagram. Framing these constraints in the form of a hierarchy is useful because, as we will see, interesting results can be obtained without needing to impose all of the constraints. For example, in our case we will be interested in possible violations of a given Bell inequality, and, we can obtain upper bounds on this by simply working at the second level of the hierarchy.

Hierarchy constraints -Level 1.
Given the fiducial measurement F and the PRM P ∈ ParMeas[F], the normalised bipartite states where by ≥ 0 we mean that every matrix element is non-negative.
Notice that the constraints that come from the deterministic effect u (in particular, the normalisation condition u V ⊗ u V (s) = 1), together with these positivity constraints, ensure that the diagrams in Eqs. (62), (63), and (64) are stochastic maps.
Notice moreover that the constraint of Eq. (64) has the same structure as that of Eq. (63) but applied instead to the swapped state: We see then that there is a certain structure emerging: (i) there are two layers -one corresponding to the state s and one to the measurements F and P -, and (ii) we can vary the order in which the output wires of the state are plugged into the measurements of the second layer. This motivates the definition for the remaining levels of the hierarchy, which relies on the concept of a wiring, which we explain next.

Definition 5.3 (Wiring).
A process which describes how a collection of input systems are connected to a collection of output systems is here referred to as a wiring, and denoted usually by W .
When all the input systems are of the same type (a case we focus on here), wirings reduce to permutations of the systems, e.g.:

Hierarchy constraints -Level k.
Given the fiducial measurement F and the PRM P ∈ ParMeas[F], the normalised bipartite states s ∈ K[P] must satisfy the constraints imposed by Hierarchy Level k ′ for all k ′ < k, as well as the following: First, notice that if the constraints of Level k+1 are satisfied, then the constraints of Level k are also by definition satisfied. Moreover, the constraint that the wirings are distinct ensures that there is no redundancy within a particular level in the hierarchy. In addition, the condition that the wirings are totally connected ensures that the constraints they impose do not reduce to constraints at lower levels in the hierarchy. We discuss the convergence of this hierarchy in App. E. Enumerating and finding a simple description of the distinct totally connected wirings is left as an interesting open problem.
We see that the constraints that each level k imposes are then of two types: (i) one where the process in the second layer is the product of k copies of P -Eq. (68) -, and (ii) one where the process in the second layer is the product of k −1 copies of P and two copies of F -Eq. (67). Note that if we had more copies of F (and so fewer of P) then it would be impossible to have a totally connected wiring, hence we need consider at most two copies of F.

One particular example of a constraint imposed by the hierarchy is
This condition, imposed first in Level 2, can be given the following interpretation: the PRM P must give valid probabilities on products of two steered states, each constructed from s.

The optimisation problems
The hierarchy of constraints presented in the previous subsection ultimately defines some convex cone. To see this, suppose σ 1 and σ 2 satisfy the compatibility constraints of Eqs. (67) and (68) for all k. Then, r 1 σ 1 + r 2 σ 2 will satisfy the constraints for all r 1 , r 2 ∈ R + . The optimisation problem of Eq. (61) is therefore carried out over Σ, the union of the cones This allows us to cast the optimization problem in a deceptively simple form: Optimisation Problem 1.
While this form of the optimisation may appear simple, determining membership of the set Σ is computationally extremely difficult. Indeed, Σ is defined as the union of a (potentially infinite) set of cones, each of which is defined by an infinite hierarchy of constraints. In the remainder of the paper we will see how to relax these constraints to make the optimisation problem computationally tractable, and thereby compute upper bounds to I max . Note that, since the objective function is linear, we can make this a convex optimisation problem by optimising over the convex closure of Σ.
In addition, one may wish to optimise the value of a Bell inequality where the cardinality of the input variables in the Bell test does not coincide with the number of settings in the fiducial measurement, i.e., |X| ≤ |η| and/or |Y | ≤ |η|. In this case, only a subset ι ⊂ η of the control settings are of interest, and the relevant constraint is the existence of a partial PRM for ι. In this case, the Optimisation problem becomes: Optimisation Problem 1 ′ .
where Σ ι is the set of potential states compatible with the existence of a partial PRM P[F] ι for the settings ι ⊆ η.

A relaxation to Optimisation Problem 1
In this section we will specify a particular subset of constraints imposed by the hierarchy that defines K[P]. We will focus on a particular set of minimum requirements to demand of the GPT, which are colloquially stated as: The optimisation problem to solve therefore reads as follows: Optimisation Problem 2.
It is readily seen that Optimisation Problem 2 is a relaxation of Optimisation Problem 1: since it imposes only a subset of the constraints, its solution I R max is an upper bound to the solution I max of the latter.
To be able to implement this sort of optimisation problem on a computer, we must switch from the high level diagrammatic description of the processes in a GPT, to a lower level tensorial representation. The method for doing this is presented in App. D, together with a spelled-out example for a particular scenario.
The constraints that appear in Optimisation Problem 2 can then be recast by means of the tensor representation as constraints on real vectors. This lower level form of Eqs. (62), (63), (69), and (45), will be used when coding the scripts to carry out the numerical calculations of the next section.

Example 1.
Consider the case where n = 2. Here, as we discussed in Section 5, a normalised state s can be fully parameterised as in Eq. (44) by the vector of probabilities: where p s (a|x) is the probability that outcome a is obtained when the fiducial measurement x is performed on a system in state s. In a locally tomographic GPT, a bipartite system can be parameterised as follows: where p (j) s (a|x) denotes the marginal conditional probability of subsystem j, and p s (ab|xy) denotes the joint conditional probabilities. Note that these parameters are also precisely those required to characterise a no-signalling box with binary inputs and outputs. This is not a coincidence: it is precisely the no-signalling box that we obtain when we measure this state with the fiducial measurement F on both systems. Hence, this parameterisation of the bipartite state and the form of F ensure that the observed correlations are no-signalling.
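To make the link between the state parameters and the no-signalling box concrete, here is a small sketch under stated assumptions. It takes as free parameters the marginals p (1) s (0|x), p (2) s (0|y) and the joint entries p s (00|xy), which is one standard choice of 8 numbers for binary inputs and outputs; the exact ordering used in Eq. (75) is not assumed. The function names `box_from_parameters` and `is_valid_box` are hypothetical.

```python
from itertools import product

def box_from_parameters(pA, pB, p00):
    """Rebuild p(ab|xy) from 8 parameters.

    pA[x]     : marginal probability that Alice obtains 0 for setting x
    pB[y]     : marginal probability that Bob obtains 0 for setting y
    p00[x][y] : joint probability of outcomes (0, 0) for settings (x, y)
    Keys of the returned dictionary are tuples (a, b, x, y).
    """
    box = {}
    for x, y in product(range(2), repeat=2):
        box[(0, 0, x, y)] = p00[x][y]
        box[(0, 1, x, y)] = pA[x] - p00[x][y]
        box[(1, 0, x, y)] = pB[y] - p00[x][y]
        box[(1, 1, x, y)] = 1 - pA[x] - pB[y] + p00[x][y]
    return box

def is_valid_box(box, tol=1e-12):
    """Check non-negativity, normalisation and no-signalling of p(ab|xy)."""
    if any(value < -tol for value in box.values()):
        return False
    for x, y in product(range(2), repeat=2):
        total = sum(box[(a, b, x, y)] for a, b in product(range(2), repeat=2))
        if abs(total - 1) > tol:
            return False
    for a, x in product(range(2), repeat=2):
        # Alice's marginal must not depend on Bob's setting y.
        m = [sum(box[(a, b, x, y)] for b in range(2)) for y in range(2)]
        if abs(m[0] - m[1]) > tol:
            return False
    for b, y in product(range(2), repeat=2):
        # Bob's marginal must not depend on Alice's setting x.
        m = [sum(box[(a, b, x, y)] for a in range(2)) for x in range(2)]
        if abs(m[0] - m[1]) > tol:
            return False
    return True

# Example: parameters of a PR box (anticorrelated only for settings (1, 1)).
pr_box = box_from_parameters([0.5, 0.5], [0.5, 0.5], [[0.5, 0.5], [0.5, 0.0]])
# Counter-example: a joint entry larger than the marginals gives a negative entry.
bad_box = box_from_parameters([0.5, 0.5], [0.5, 0.5], [[0.6, 0.5], [0.5, 0.0]])
print(is_valid_box(pr_box), is_valid_box(bad_box))
```

With this parameterisation, normalisation and no-signalling hold by construction, so the only condition that can fail is non-negativity of the entries.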
Using the tensorial notation described in App. D and, in particular, the above parameterisation of the composite state (Eq. (75)), the constraints in Optimisation Problem 2 can be recast as follows.
The first one, i.e., Eq. (62), reads: That is, the outcome statistics of fiducial measurements on a state s are a well-defined no-signalling normalised conditional probability distribution.
where v and w are the indices associated to the two GPT vector spaces V , and q and r are the indices associated to the classical outcomes of the parity reading measurement, that is, each corresponds to one of the parities which is being read. Since s vw is represented by a 9-dimensional probability vector p s , P qr vw may be represented by a 4 × 9 matrix, [P]. The third constraint, i.e., Eq. (69), reads: where the indices x and y are the indices associated to the measurement settings of the two fiducial measurements, and the indices a and b to their outcomes. Notice that the tensors F a xv 1 and F b yw 2 correspond to the definition of the fiducial effects for system of type V . Specifically, we can write that: Hence, this equation can be further written as: where by ≥ 0 we mean that every element of the matrix must be ≥ 0. This equivalent form of Eq. (69) makes it clear to see that it indeed imposes that P[F] is a valid measurement on products of steered states, which are steered by fiducial measurements. We can denote the (subnormalised) steered states explicitly by Then, the condition (69) can be finally written as that is, the parity reading measurement must give valid probabilities on products of steered states.
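The positivity condition on products of steered states can be sketched in code as follows. This is only an illustration under assumptions that are not taken from the paper: the bipartite state is stored as a 3 × 3 array S whose first row and column hold the normalisation and the marginals, the fiducial effect for outcome 0 of setting x is the unit covector at position x + 1, and [P] is a 4 × 9 nested list whose columns are indexed by 3·v + w. The measurement used in the example is a product of fiducial effects, chosen because it is certainly a valid measurement; it is not a PRM.

```python
from itertools import product

UNIT = [1.0, 0.0, 0.0]  # deterministic effect on local vectors (1, p(0|0), p(0|1))

def state_matrix(pA, pB, p00):
    """3x3 array: S[0][0] = 1, S[x+1][0] = pA[x], S[0][y+1] = pB[y], S[x+1][y+1] = p00[x][y]."""
    return [[1.0, pB[0], pB[1]],
            [pA[0], p00[0][0], p00[0][1]],
            [pA[1], p00[1][0], p00[1][1]]]

def fiducial_effect(outcome, setting):
    """Covector of the fiducial effect (outcome | setting)."""
    e = [0.0, 0.0, 0.0]
    e[setting + 1] = 1.0
    return e if outcome == 0 else [u - c for u, c in zip(UNIT, e)]

def steered_alice(S, b, y):
    """Unnormalised state on Alice's side when Bob obtains b for setting y."""
    f = fiducial_effect(b, y)
    return [sum(S[v][w] * f[w] for w in range(3)) for v in range(3)]

def steered_bob(S, a, x):
    """Unnormalised state on Bob's side when Alice obtains a for setting x."""
    f = fiducial_effect(a, x)
    return [sum(f[v] * S[v][w] for v in range(3)) for w in range(3)]

def positive_on_steered_products(P, S, tol=1e-12):
    """True if every row of P is non-negative on all products of steered states."""
    for a, x, b, y in product(range(2), repeat=4):
        sA, sB = steered_alice(S, b, y), steered_bob(S, a, x)
        prod = [sA[v] * sB[w] for v in range(3) for w in range(3)]
        for row in P:
            if sum(c * t for c, t in zip(row, prod)) < -tol:
                return False
    return True

# Example state: PR-box parameters. Example measurement: product of fiducial
# effects for setting 0 on both sides (valid, but not a PRM).
S_pr = state_matrix([0.5, 0.5], [0.5, 0.5], [[0.5, 0.5], [0.5, 0.0]])
P_product = [[fa * fb for fa in fiducial_effect(a, 0) for fb in fiducial_effect(b, 0)]
             for a, b in product(range(2), repeat=2)]
P_negated = [[-c for c in row] for row in P_product]
print(positive_on_steered_products(P_product, S_pr))
print(positive_on_steered_products(P_negated, S_pr))
```

Because each steered state is linear in s, the quantity being tested is quadratic in the state parameters and, once [P] is also a variable, the constraint is polynomial in the optimisation variables.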
The last constraint, given by Eq. (45), can be recast in the n = 2 case as follows: where ⊕ denotes sum mod 2. These tensorial equations indeed correspond to equality constraints between 9-dimensional covectors: which can be straightforwardly verified by noting that: and that ■
Optimisation Problem 2, despite being a relaxation of Optimisation Problem 1, still shares a common feature with the latter: they are both non-linear optimisation problems. Indeed, we can see clearly in the formulation of Optimisation Problem 2 how the constraints feature products of the variables being optimised over. Solutions to such polynomial optimisation problems may be approximated by standard techniques in the literature. Here, we will consider the hierarchy of semidefinite relaxations to polynomial optimisation problems given by Lasserre [65]. Each level of such a hierarchy will give an upper bound to the solution I R max of Optimisation Problem 2.
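As a concrete check of the parity constraint of Example 1, the sketch below writes the "correlated" covector for a pair of equal settings in a 9-dimensional parameterisation and builds a candidate [P] from one free covector. The ordering of the 9 entries (row-major over a 3 × 3 array with the normalisation first) is an assumption for illustration and need not match Eq. (75). The construction uses only the facts that the effects for parity 0 of each pair must sum to the corresponding correlation covector and that all four effects must sum to the unit effect; the names `correlation_covector` and `prm_from_free_covector` are hypothetical.

```python
def state_vector(pA, pB, p00):
    """9-dimensional vector, assumed ordering: index 3*v + w of a 3x3 array."""
    return [1.0, pB[0], pB[1],
            pA[0], p00[0][0], p00[0][1],
            pA[1], p00[1][0], p00[1][1]]

def dot(covector, vector):
    return sum(c * v for c, v in zip(covector, vector))

def correlation_covector(setting):
    """Covector giving p(00|xx) + p(11|xx) = 1 - pA(0|x) - pB(0|x) + 2 p(00|xx)."""
    c = [0.0] * 9
    c[0] = 1.0
    c[3 * (setting + 1)] = -1.0                 # Alice's marginal
    c[setting + 1] = -1.0                       # Bob's marginal
    c[3 * (setting + 1) + setting + 1] = 2.0    # joint entry p(00|xx)
    return c

def prm_from_free_covector(C):
    """Rows P00, P01, P10, P11 fixed by the parity and normalisation equalities."""
    u = [1.0] + [0.0] * 8
    c0, c1 = correlation_covector(0), correlation_covector(1)
    P00 = list(C)
    P01 = [c0[i] - C[i] for i in range(9)]               # P00 + P01 = c0
    P10 = [c1[i] - C[i] for i in range(9)]               # P00 + P10 = c1
    P11 = [u[i] - c0[i] - c1[i] + C[i] for i in range(9)]  # all rows sum to u
    return [P00, P01, P10, P11]

# PR-box parameters: first pair perfectly correlated, second pair anticorrelated.
p_pr = state_vector([0.5, 0.5], [0.5, 0.5], [[0.5, 0.5], [0.5, 0.0]])
P = prm_from_free_covector([0.25] + [0.0] * 8)   # arbitrary illustrative choice of C
probs = [dot(row, p_pr) for row in P]
print(dot(correlation_covector(0), p_pr), dot(correlation_covector(1), p_pr))
print(probs)
```

The equalities alone do not guarantee that the resulting numbers are probabilities; positivity of [P] on the state and on products of steered states has to be imposed separately, which is where the non-linearity enters.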
Optimisation Problem 2 is formulated for the situations where the cardinality of the input variables in the Bell test matches the number of settings in the fiducial measurement, i.e., |X| = |Y | = |η|. However, as we mentioned in Section 5.2, this is not necessarily the case. We will therefore next reformulate Optimisation Problem 2 to encompass the case of partial PRMs. This adjusted version of the optimisation problem will come in handy when exploring quantum correlations, since for example it allows the study of Bell inequalities with two measurement settings per wing (see, e.g., the CHSH scenario in which |X| = |Y | = 2) on qubits (whose affine dimension is 3 rather than 2). In addition, and importantly, this adjusted version of the optimisation problem might allow us to make device-independent studies of the results, since it does not require full knowledge of the dimension of the local systems to impose the constraint of existence of partial PRMs. Namely, we hope that the constraints on the correlations of a set of local observables imposed by the existence of a PRM persist regardless of the dimension of the system.
Notice finally that Optimisation Problem 2 ′ is indeed a relaxation of Optimisation Problem 1 ′ , in the same way that Optimisation Problem 2 is a relaxation of Optimisation Problem 1.

Example 2.
Consider the case where n = 3. Here, as we discussed in Section 5, a normalised state s can be fully parameterised as in Eq. (44) by the vector of probabilities: where p s (a|x) is the probability that outcome a is obtained when the fiducial measurement x is performed on a system on state s. In a locally tomographic GPT, a bipartite system can be parameterised as follows: where p Since s ij is represented by a 16-dimensional probability vector p s , P ι qr vw may be represented by a 4 × 16 matrix. Notice that the fact that P ι is a partial PRM is captured by the fact that it only has two output systems -hence its matrix representation has four rows.
The third constraint, i.e., Eq. (69), reads: Notice that the tensors F a xv 1 and F b yw 2 actually correspond to the definition of the fiducial effects for system of type V . If we represent P ι qr vw by a 4 × 16 matrix [P ι ], hence, this equation can be further written as: where by ≥ 0 we mean that every element of the matrix must be ≥ 0. This equivalent form of Eq. (69) makes it clear to see that it indeed imposes that P ι is a valid measurement on products of steered states, which are steered by fiducial measurements. The last constraint, given by Eq. (45), can be recast in the n = 3 case as follows: where ⊕ denotes sum mod 2. These tensorial equations indeed correspond to equality constraints between 16-dimensional covectors:

■

Approximating I max for various Bell inequalities
In Ref. [23] it was shown that existence of the Bell measurement for two systems, each with three observables (i.e., for |η| = 3) imposes that there are no post-quantum correlations. In our work we want to pose the more general problem of whether parity-readable observables can lead to post-quantum correlations. In order to tackle this question, we defined a hierarchy of constraints that the existence of parity-readable observables imposes on the states and effects of the underlying GPT, and therefore on the sets of correlations that the GPT allows. We explored the boundary of the allowed correlations by tackling the Optimization Problems defined in the previous section. The results of these numerical explorations led us to further formulate a conjecture, re-stated below: Formally, the conjecture, if true, means that for any Bell inequality the Optimization Problem 1 ′ returns at most the quantum bound. Note, however, that we do not always expect OP1 ′ to actually reach the quantum bound, as the maximal quantum value is not necessarily achieved by observables which are parity-readable within quantum theory.
In this section we present numerical computations towards upper-bounding the solution to OP2. Indeed, our numerical explorations focus on the relaxed problems OP2 and OP2 ′ , depending on the cardinalities of β and η. As mentioned in the previous section, we will approximate the solution to OP2 ′ by means of a hierarchy of semidefinite relaxations formulated by Lasserre [65], using mostly the Lasserre hierarchy levels 1+AB and 2. A brief explanation of what these two levels mean is presented below, and we refer the reader to Ref. [65] for a thorough exposition. Throughout the next subsection we also discuss the relation between the numerical results and the conjecture we formulated.
Finally, in this section we provide an analytical proof that GPTs which violate local tomography admit PR-box correlations under the constraints of OP2. That is, OP2 may yield a value of 1/2 for the CHSH inequality (in the notation of Eq. (103)) within non-tomographically local GPTs. This highlights the relevance and impact of the assumption of local tomography. We also discuss how the violation of local tomography by a GPT may impact whether or not its correlations satisfy the conjecture. In particular, we discuss how the result we show does not necessarily imply that non-tomographically local GPTs violate Conjecture 1, since the actual optimisation problem to be solved, OP1 ′ , imposes additional constraints to those appearing in OP2.
Before moving on to presenting the numerical results, let us briefly comment on the so-called Lasserre hierarchy. Each Lasserre hierarchy level is related to the positive semidefiniteness of a matrix (whose definition we will not give here), and the rows and columns of this matrix have particular labels depending on what level we are focusing on. Let Υ be the set that contains the variables we are optimising over plus the element 1. In the first level of the Lasserre hierarchy, the matrix under study has rows and columns labelled by the elements of Υ. In the second level of the hierarchy, however, the matrix under study is of much larger size, and its rows and columns are labelled by the elements of Υ × Υ, where × denotes the Cartesian product of sets. The so-called 1+AB level lies in between the first and the second: the matrix corresponding to 1+AB is a sub-matrix of that of level 2, and the matrix corresponding to level 1 is a sub-matrix of that of 1+AB. In particular, the rows and columns of the matrix corresponding to level 1+AB are labelled by the elements of the set Υ × Υ \ {(υ, υ)|υ ∈ Υ}.
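The three label sets just described can be enumerated directly, which gives a feeling for how quickly the matrices grow. The sketch below follows the description in the text literally (labels are elements of Υ for level 1, and pairs in Υ × Υ for the other two levels); it is a counting aid only, not the construction used by Ncpol2sdpa or TSSOS, and the function name `lasserre_labels` and the variable names are hypothetical.

```python
from itertools import product

def lasserre_labels(variables):
    """Row/column labels for levels 1, 1+AB and 2, as described in the text."""
    upsilon = ["1"] + list(variables)           # the variables plus the element 1
    level_1 = list(upsilon)
    level_2 = list(product(upsilon, repeat=2))  # all ordered pairs in Y x Y
    level_1_ab = [(u, v) for (u, v) in level_2 if u != v]  # drop the pairs (u, u)
    return level_1, level_1_ab, level_2

# Illustrative run with three placeholder variable names.
level_1, level_1_ab, level_2 = lasserre_labels(["v1", "v2", "v3"])
print(len(level_1), len(level_1_ab), len(level_2))
```

For m variables the counts are m + 1, m(m + 1) and (m + 1)², so the intermediate level saves only the diagonal pairs; the larger saving in practice comes from restricting which products of variables are kept.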
The numerical computations from Sec. 7.1 were performed with Python 3.7. The SDP relaxation of polynomial programming was calculated with the package Ncpol2sdpa [66]. The SDP problem was solved using SDPA [67]. All other numerical computations were carried out by a sparsity-adapted SDP relaxation of the polynomial optimization problem modeled with the TSSOS [68] algorithm. For more details on the modeling syntax, we refer the interested reader to the tutorial from [69, Appendix B.2] and the online website https://github.com/wangjie212/TSSOS. Each SDP problem was solved using Mosek [70].

CHSH inequality
In a bipartite Bell scenario featuring two dichotomic measurements per party, the most studied inequality is the Clauser, Horne, Shimony, Holt (CHSH) inequality [25], which, using the notation of Eq. (75), reads: (103) This inequality is bounded from above, and the corresponding classical, quantum, and nonsignalling bounds are: Note that in this section we shall make a slight abuse of notation, and denote Alice and Bob's observables by X, Y, Z, which is not to be confused with the use of X and Y to denote the sets of inputs.
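The three bound values quoted in this section (0.2071 ≈ (√2 − 1)/2 for Tsirelson's bound and 1/2 for the PR box, with a classical bound of 0 as stated for the AMP family) can be reproduced with a functional of Clauser-Horne form written in terms of the state parameters. The precise form of Eq. (103) is not reproduced here; the functional `ch_value` below is an assumption that is consistent with those values, and the quantum strategy uses the standard correlator cos(θ A − θ B ) for a maximally entangled two-qubit state measured in the X-Z plane.

```python
from itertools import product
from math import cos, pi, sqrt

def ch_value(pA, pB, p00):
    """Assumed functional: p(00|00) + p(00|01) + p(00|10) - p(00|11) - pA(0|0) - pB(0|0)."""
    return (p00[0][0] + p00[0][1] + p00[1][0] - p00[1][1]
            - pA[0] - pB[0])

def classical_maximum():
    """Maximum over the 16 deterministic local strategies."""
    best = float("-inf")
    for a0, a1, b0, b1 in product(range(2), repeat=4):
        pA = [1.0 if a == 0 else 0.0 for a in (a0, a1)]
        pB = [1.0 if b == 0 else 0.0 for b in (b0, b1)]
        p00 = [[pA[x] * pB[y] for y in range(2)] for x in range(2)]
        best = max(best, ch_value(pA, pB, p00))
    return best

def quantum_strategy_value():
    """Value for a maximally entangled state with the usual optimal CHSH angles."""
    alice = [0.0, pi / 2]          # measurement angles in the X-Z plane
    bob = [pi / 4, -pi / 4]
    pA = pB = [0.5, 0.5]           # uniform marginals
    # p(00|xy) = (1 + E_xy) / 4 with correlator E_xy = cos(alice[x] - bob[y]).
    p00 = [[(1 + cos(alice[x] - bob[y])) / 4 for y in range(2)] for x in range(2)]
    return ch_value(pA, pB, p00)

print(classical_maximum(), quantum_strategy_value(), (sqrt(2) - 1) / 2)
```

This functional equals the usual CHSH expression divided by 4, minus 1/2, which is why the familiar values 2, 2√2 and 4 become 0, (√2 − 1)/2 and 1/2.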
The numerical results presented in this subsection are summarized in Fig. 6.
Case |ι| = |η| = 2. This is the simplest possible problem, where there are only two observables per party (that is, Alice has two observables X, Z, and the same for Bob) and the PRM measures the two parities XX and ZZ. We approximated the solution of OP2 applied to the CHSH inequality by applying a Lasserre SDP relaxation with hierarchy level 'a bit lower than 1+AB'. We will specify shortly what this means, but will first elaborate on the specific parameterisation we chose for OP2. The state is described by 8 parameters coming from local tomography (see Eq. (75)), which we recall here in a more compact matrix notation:
The (unnormalised) states of Alice steered by Bob, given by Eq. (82), are expressed as:
Bob's states steered by Alice have the same form as in Eq. (106) but exchanging XZ ↔ ZX and (1) ↔ (2). Now, the constraint that the PRM measures parity (given by Eq. (85)) reads
This can be obtained from Eq. (85) as follows. First we have, for example, where · represents the Frobenius inner product of the two matrices. Then using we get (P 00 + P 01 ) · p s = 1 + 2p s (00|00) − p (1) s (0|0) − p (2) s (0|0), which leads to the form above. The requirement that the PRM preserves probability reads:
Thus, the condition that P is a PRM is captured by the following free parameters:
Finally, the constraints that we still need to impose are the positivity of the PRM effects both on the state and on the tensor products of all pairs of steered states that can be obtained from it. We see then that OP2 requires us to optimise over the free parameters (the state s given by Eq. (105), and C) under these positivity constraints. Now, to approximate the solution to OP2, we apply a particular level of the Lasserre hierarchy, which is slightly lower than the previously described 1+AB, and which we will denote by 1+AB * . Let Υ p denote the set of free parameters given by Eq. (105), and Υ C that given by the free parameters in C. Here, Υ = Υ p ∪ Υ C ∪ {1}.
However, the matrix under study in the level we consider here has rows and columns labeled by the elements of (Υ × Υ) \ (Υ p × Υ p ) \ (Υ C × Υ C ) -that is, it is a submatrix of that considered in level 1+AB.
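The label set of the 1+AB * level can be enumerated in the same literal way. In the sketch below the parameter names are placeholders and the function name `labels_1_ab_star` is hypothetical; it only illustrates that pairs of two state parameters and pairs of two PRM parameters are dropped, while mixed pairs and pairs involving the element 1 are kept.

```python
from itertools import product

def labels_1_ab_star(state_params, prm_params):
    """Labels in (Y x Y) minus (Y_p x Y_p) minus (Y_C x Y_C)."""
    upsilon = ["1"] + list(state_params) + list(prm_params)
    sp, cp = set(state_params), set(prm_params)
    return [(u, v) for (u, v) in product(upsilon, repeat=2)
            if not (u in sp and v in sp) and not (u in cp and v in cp)]

# Illustrative run with two placeholder state parameters and three PRM parameters.
labels = labels_1_ab_star(["p1", "p2"], ["c1", "c2", "c3"])
print(len(labels))
```

Keeping the mixed pairs is what matters for OP2, since the positivity constraints involve products of state parameters with PRM parameters.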
The 1+AB * level of the Lasserre hierarchy gives an upper bound to OP2 of ∼ 0.2071, which agrees up to numerical precision with I R max = (√2 − 1)/2. This equality follows from recalling that Tsirelson's bound can be achieved within quantum theory, where the PRM is given by a Bell measurement, and hence yields a lower bound to I R max . In other words, here we recover Tsirelson's bound for the CHSH inequality.
Case |ι| = 2, |η| = 3. Here we still assume that the PRM measures just two parities (i.e., those of XX and ZZ), but now Alice and Bob have one additional observable (i.e., Y). This is an important case, as it allows for the possibility that the constraints imposed by the existence of a PRM are sensitive to the dimension of the local systems. Notice that if the constraints stemming from the existence of a PRM turn out to be independent of the dimensions of the local systems, one can then take a device-independent approach to the problem and only rely on the black-box statistics to make assessments on the possible violations of the Bell inequality.
Our numerical results show that in this case Tsirelson's bound is again not violated. The results are computed exactly as in the previous case with |η| = 2: we upper-bound the value of I RP max , the solution to OP2 ′ , via the 1+AB * level of the Lasserre hierarchy, and the bound agrees up to numerical precision with Tsirelson's bound for the CHSH inequality.
Case |ι| = 3, |η| = 3. Here, both parties have three observables, and the PRM measures the parity of all three of them. In this case, there is no need to run numerics to approximate the solution to I R max . On the one hand, notice that the optimisation to be carried out is the same as that for the case with |ι| = 2 and |η| = 3, with some additional constraints given by the requirement that the PRM reads out the parity of the extra pair of fiducial measurements (since now |ι| = 3

AMP inequalities
In a bipartite Bell scenario featuring two dichotomic measurements per party, a relevant family of inequalities was defined by Acín, Massar, and Pironio (AMP) [35]. These correspond to tilted CHSH inequalities, and have been found to be useful for randomness generation [35]. In the traditional language, the value assigned to the linear functional associated to the inequality reads: where the parameters α and γ satisfy: α ≥ 1, γ ≥ 0, and γ < 2. These inequalities are bounded from above, and their corresponding classical, quantum, and non-signalling bounds when αγ ≤ 2 are: In our notation, that is, in terms of the probabilities, the AMP inequalities are equivalently captured by the following linear functional: whose corresponding classical, quantum, and non-signalling bounds when αγ ≤ 2 are: In this section we have only considered these inequalities for the case |ι| = |η| = 2.
The values of α and γ that we considered are quite varied. On the one hand, we took α from the set {1, 3, 5, 7, 9, 11} and then, for each such α, considered six equally-spaced values for γ (see Sec. A in the Appendix). On the other hand, we wanted to explore the transition between α = 1 and α = 3 more deeply, hence we explored the linear functionals I α,γ also for α taken from the set {1.01, 1.05, 1.1, 1.2, 1.5, 2, 2.2, 2.4, 2.6, 2.8, 2.9, 2.95} whilst keeping γ = 0. The motivation for this will hopefully become clear later on.
For each of the linear functionals I α,γ defined by the above-mentioned values of α and γ, we asked what the value of I max (the solution to the optimisation problem OP1) is. Here we computed an upper bound to I max for each inequality, by applying two relaxations to OP1:
• First, instead of demanding that s ∈ Σ, we only request that the state s belongs to the cone K[P] that satisfies the second level of our hierarchy (see Eqs. (67) and (68)).
• Second, by solving the associated level 3 of the Lasserre hierarchy, we upper bound the solution to the relaxation of OP1 defined in the previous item.
Our numerical calculations show that, in the cases where α ≥ 3, the upper bound for I max is smaller than the inequality's Tsirelson bound (see Fig. 5). Indeed, up to numerical precision I max ≤ 0, where 0 is the classical bound of the inequality. For the case where α = 1, the inequality becomes the CHSH inequality plus an extra term corresponding to the single-party observable A 0 . Beyond the case γ = 0 (which corresponds to the traditional CHSH inequality), other values of γ give an upper bound to I max that is larger than, yet close to, the inequality's Tsirelson bound (see Fig. 5). Finally, for the values of α ∈ {1, 1.01, 1.05, 1.1, 1.2, 1.5, 2, 2.2, 2.4, 2.6, 2.8, 2.9, 2.95, 3, 5, 7, 9, 11} and γ = 0 one observes that the upper bound to I max drops to 0 when α goes from 1 to 3. Reading into the data of Fig. 5, one can notice some additional interesting behaviour. For instance, in a few cases the upper bound to I max is equal to Tsirelson's bound, at least up to numerical precision. In particular, this happens for the cases where αγ = 2 explored in this manuscript. The numerical results presented in this subsection are further summarized in Fig. 6.
The cases in which the PRM bounds the value of the AMP inequality to be smaller than the maximal quantum value (in contrast to CHSH, in which the exact quantum bound was obtained) are likely to be cases in which the quantum bound is achieved by quantum observables for which there does not exist a PRM. This suggests that the observables which allow for a PRM may feature some particular properties, such as being maximally complementary. Understanding the scope of parity-readable observables within quantum theory, and the correlations which they can realise, is therefore an important topic for future work. Now, what does this all mean for our conjecture? No conclusive statement can be drawn from the numerics run for α = 1, γ ≠ 0. However, all other cases are consistent with (and hence support) Conjecture 1.

AQ inequality
In Ref. [36] an inequality was provided which is violated by so-called "almost quantum" correlations [36], but is not violated by any quantumly realisable correlations. Here we refer to this inequality as the AQ inequality. In our notation it is given by
This inequality is bounded from above, and the corresponding classical, quantum, almost-quantum, and non-signalling bounds are: β AQ AQ = 1.0232, and β NS AQ = 3.5347.
Figure 5: (a) Each value on the horizontal axis corresponds to a pair (α, γ), i.e., a different AMP inequality (see Table in Eq. (150)). The vertical axis plots, for each inequality, both the numerical approximation to OP1 (dashed red line) and the Tsirelson's bound of the inequality (solid blue line). (b) The horizontal axis is the same as for the case (a). The vertical axis plots the difference between the numerical approximation to OP1 and the Tsirelson's bound of the inequality. (c) The value of the horizontal axis corresponds to the value of α. The value of γ is always 0. The vertical axis plots, for each inequality, both the numerical approximation to OP1 (dashed red line) and the Tsirelson's bound of the inequality (solid blue line). (d) The horizontal axis is the same as for the case (c). The vertical axis plots the difference between the numerical approximation to OP1 and the Tsirelson's bound of the inequality. In (b) and (c), witnessing a nonnegative value in the plot means that I R max lies below Tsirelson's bound for that particular inequality. In all these figures, we approximate OP1 by first relaxing OP1 and then using the third level of the Lasserre hierarchy (see main text for details).
Let us first consider the case |ι| = 2, |η| = 2. Similarly to the case of the AMP inequalities, we upper-bound I max by the solution to OP1 provided by the second level of the PRM-hierarchy of constraints presented in this paper. This solution is moreover estimated (i.e., upper bounded) by using the third level of the Lasserre hierarchy for polynomial optimisation problems. In this case, we obtain I max ≤ 1.387818418422242. This upper bound to I max is considerably larger than the quantum bound, and hence not much can be concluded. Going to higher levels in the PRM-hierarchy might be the most promising step to take; however, our current computational capabilities cannot handle the number of constraints, and hence we defer this option to future work.
Next, we considered the case of |ι| = 3, |η| = 3. Approximating the solution of OP2 via the 1+AB level of the Lasserre hierarchy gives I R max < 1.7. This number is substantially larger than the quantum bound for the inequality, and hence we are in a similar situation to the case presented before.
In this case, however, one can further explore the specific cases where some extra properties are required of the PRM being optimised over. This is similar to what we discussed in Sec. 5.1. So let us remind ourselves first of what these two specific types of PRM we focus on are. Notice that since |ι| = 3 and |η| = 3 the outcome of a PRM is of the form (±, ±, ±|1, 2, 3), where ± j tells whether the pair of fiducial measurements (j, j) is correlated (+) or anti-correlated (−). A PRM is of 'quantum' type if it assigns non-zero probability only to the outcomes The motivation behind the name is that it is exactly the complement of the 'quantum'-type PRM.
With this in mind, we approximated the solution of OP2 via the second level of the Lasserre hierarchy. The results we obtained are: We see that restricting the optimisation to PRMs that are of 'quantum' type give quite a strong constraint on the possible value of I max , which here happens to be below the quantum bound (even below the classical bound) of the inequality. We believe that demanding that a PRM of 'quantum' type exists somehow forces the fiducial measurements to display some complementarity properties, and hence are not ideal for maximising the value of the linear functional I AQ .
The numerical results presented in this subsection are summarized in Fig. 6.
Figure 6: Bounds for Bell inequalities from parity reading measurement. "Arbitrary PRM" means that we do not restrict it in any way. In particular, for |ι| = 3 it means that the PRM has 8 outcomes.

Necessity of local tomography
In this section we show that, if we give up on the assumption of local tomography in Optimisation Problem 2, then Tsirelson's bound is violated. Moreover, it is violated in an extreme way, namely, PR-box correlations can be achieved. Recall that the PR-box is a no-signaling box that reaches the no-signaling bound, that is, the algebraic maximum, for the CHSH inequality (i.e., in our notation of Eq. (103) it achieves the value of 1/2). The PR-box can be defined by the fact that it exhibits perfect correlations for XX, XZ, and ZX observables, and perfect anticorrelations for ZZ.
We now assume that local tomography does not hold, and, in particular, that the states are described by one extra non-local, "holistic", parameter, which we denote by $w_{NL}$. Our parameterisation of the state, that is, the equivalent of Eq. (75), now takes the form: The parameter $w_{NL}$ is described as a holistic degree of freedom, as products of local observables are independent of its value. In general, however, a PRM will not be simply a product of local observables, and hence it is possible that it will indeed depend on this holistic parameter.
For the remainder of this section it is more convenient to use a more compact matrix notation for bipartite states and effects, given by: In this notation, the state which realises a PR-box looks as follows where $w^{PR}_{NL}$ can be an arbitrary value. By definition of a PRM, the sums $P_{00} + P_{01}$ and $P_{00} + P_{10}$ depend only on local parameters, hence their holistic parameter is zero and we get Preserving probability by PRM reads as Using this, we can write the free parameters for our optimisation problem as follows: Here, $c_{NL}$ corresponds to the holistic parameter.
We can then express our parity reading effects in terms of these matrices $R_0$, $R_1$ and 1 and the free parameters $C$, as: In particular, the nonlocal parameter for PRM effects amounts to Let us now write the requirement of PRM effects to be positive on the PR-box state. First, notice that (as it should be, since the PR-box has perfect XX correlations and perfect ZZ anticorrelations). We thus get We see that the positivity of a PRM effect on the PR-box is equivalent to the following condition:
Let us now explore the conditions that follow from products of steered states. The form of the unnormalized steered state is given by Eq. (82). Thus, the normalized ones arising from the PR-box state (for each party) are given by We see that these define the vertices of the so-called square bit, as can be seen in Fig. 7. The tomographically local degrees of freedom for the products of steered states are $p^{ij}_{s,LT} = s^{(1)}_i \otimes s^{(2)}_j$, $i, j = 1, \ldots, 4$. We then denote: Note that, here, the bracket does not mean scalar product, but rather indicates two groups of parameters: the group of locally tomographic ones, and the group consisting of one nonlocal parameter.
For steered states, the nonlocal parameter $w^{ij}_{NL}$ must be a linear combination of the local parameters: This follows from noting that: i) the way that local states are combined to give product states must be given by a bilinear function from the local vector spaces into the global vector space; ii) the universal property of the tensor product means that this can be written as a linear function from the tensor product space of the local vector spaces into the global vector space; iii) the space of local parameters is simply this tensor product space; iv) this means that the value of the non-local parameter is given by a linear functional on the local parameters (i.e., a linear map from the local parameter space into the reals); v) finally, the Riesz representation theorem means that we can write this as the dot product with some vector $h$ in the local parameter space. We thus have where we recall that $P^{qr}_{NL}$ are numbers (the values of the nonlocal parameter for PRM effects) given by Eq. (134). Using the above equation together with Eq. (133), we obtain where we have denoted with $C_{LT}$ being the locally tomographic part of $C$, and $c_{NL}$ the holistic part of $C$. Now, the positivity of PRM effects on products of steered states means that we require all four terms to be positive for all $i, j = 1, \ldots, 4$. By using Mathematica [71] we find that positivity is satisfied for only one choice of $g$:
To summarise, positivity conditions of the PRM on the PR-box state and products of its steered states reduce to:
positivity on PR-box state: $C_{LT} \cdot p^{PR}_{s,LT} + c_{NL}\, w^{PR}_{NL} = -1$ (145)
positivity on steered states:
To prove our original claim, the idea is to choose values for $C_{LT}$, $c_{NL}$, $h$ and $w^{PR}_{NL}$ such that the above two constraints hold.
Our choice is the following: These values for $C_{LT}$ and $c_{NL}$ fix the PRM to take the form: In addition, our choice of $h$ defines the value of the nonlocal parameter for steered states to be We see then that the PR-box state is consistent with the existence of a PRM that satisfies the constraints of OP2. Since performing fiducial measurements on a PR-box state yields PR-box correlations, this shows that $I^R_{\max} = 1/2$ for the CHSH inequality, as per Eq. (103). With this we conclude the proof of our claim.
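To make the PR-box correlations used in this proof concrete, the following minimal sketch builds the box from its defining property (perfect correlations for XX, XZ, ZX and perfect anticorrelations for ZZ) and evaluates the CHSH combination of correlators. Two caveats: the setting labels (0 for X, 1 for Z) and all function names are our own illustrative assumptions, and the sketch uses the textbook normalisation of CHSH (classical bound 2, Tsirelson's bound 2√2, algebraic maximum 4) rather than the normalisation of Eq. (103), in which the PR-box value is 1/2.

```python
from itertools import product


def pr_box(a, b, x, y):
    """p(ab|xy) for a PR-box; setting 0 = X and setting 1 = Z (assumed labels)."""
    # Outcomes agree unless both parties measure Z (x = y = 1), where they disagree.
    return 0.5 if (a ^ b) == (x & y) else 0.0


def correlator(p, x, y):
    """E_xy = P(a = b | xy) - P(a != b | xy)."""
    return sum((-1) ** (a ^ b) * p(a, b, x, y)
               for a, b in product((0, 1), repeat=2))


def chsh(p):
    """Textbook CHSH combination, with the minus sign on the ZZ term."""
    return (correlator(p, 0, 0) + correlator(p, 0, 1)
            + correlator(p, 1, 0) - correlator(p, 1, 1))


def is_no_signaling(p, tol=1e-12):
    """Check that each party's marginal is independent of the other's setting."""
    for a, x in product((0, 1), repeat=2):
        alice = [sum(p(a, b, x, y) for b in (0, 1)) for y in (0, 1)]
        if abs(alice[0] - alice[1]) > tol:
            return False
    for b, y in product((0, 1), repeat=2):
        bob = [sum(p(a, b, x, y) for a in (0, 1)) for x in (0, 1)]
        if abs(bob[0] - bob[1]) > tol:
            return False
    return True


chsh_value = chsh(pr_box)
print(chsh_value, is_no_signaling(pr_box))
```

The sketch only reproduces the correlations; it says nothing about whether a PRM compatible with them exists, which is the content of the argument above.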
Let us make a final comment on an interesting interpretation for the values of the nonlocal parameter for the pairs of steered states: they count the number of correlations. If both observables have the same value for a given pair of steered states (which happens when Alice and Bob's steered states are the same) then the parameter takes the value 2. When only one of the observables has the same value, then it takes the value 1, and when both observables have the opposite value, then it takes the value 0. This is depicted in Fig. 8.
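The counting rule described above can be tabulated directly. The sketch below is only an illustration of that rule: it assumes that each steered state is labelled by the definite values (±1) it assigns to the two fiducial observables, i.e., by a vertex of the square bit, and it does not derive the values from the vector $h$. The names `VERTICES` and `count_correlations` are hypothetical.

```python
from collections import Counter
from itertools import product

# Square-bit vertices written as (x, z): the definite values of the two
# fiducial observables. This labelling is an assumption made for illustration.
VERTICES = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]


def count_correlations(s_alice, s_bob):
    """Number of observables taking the same value for Alice and Bob."""
    return sum(1 for v_a, v_b in zip(s_alice, s_bob) if v_a == v_b)


# Value of the counting rule for each of the 16 pairs of steered states.
table = {
    (i, j): count_correlations(VERTICES[i], VERTICES[j])
    for i, j in product(range(4), repeat=2)
}
histogram = Counter(table.values())

for i in range(4):
    print([table[(i, j)] for j in range(4)])
```

Under this labelling, the value 2 appears exactly for the four pairs in which Alice's and Bob's steered states coincide, in line with the description in the text.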

Discussion
In this paper we have shown that postulating within a theory the existence of particular bipartite measurements has a surprisingly rich set of consequences for the structure of the theory itself. Indeed, we showed that this leads to an infinite hierarchy of constraints on the possible bipartite states. These conditions translate analogously into constraints on the statistical correlations allowed by the theory. In other words, the maximum violation of any Bell inequality by the correlations among fiducial measurements featured by the theory will be subjected to an infinite hierarchy of constraints.
We further explored the consequences of this rich structure for the particular case where there exists a bipartite measurement that can read out the parity of local fiducial measurements. For the case of tomographically-local GPTs, we found that these constraints on the structure of bipartite states are enough to recover (up to numerical precision) Tsirelson's bound for various inequalities in the CHSH scenario. In addition, we also showed that non-tomographically local GPTs may still reach the maximum algebraic violation of such inequalities (i.e., go beyond Tsirelson's bound) when only the first levels of the hierarchy of constraints are considered. We also noticed that, for inequalities where the maximum quantum violation is not achieved by measuring complementary observables, our technique may also yield values below Tsirelson's bound.
Our initial numerical results led us to formulate a conjecture on the constraints that the existence of a Parity Reading Measurement may yield for tomographically local GPTs:

Conjecture 1. Under the assumption of local tomography, the local observables that are parity-readable satisfy Tsirelson's bound, i.e., they cannot violate Bell inequalities more strongly than quantum mechanics does (with arbitrary measurements).
It is worth mentioning that, after formulating the conjecture, we ran further numerics in other scenarios (all presented in this manuscript) which did not disprove the conjecture.
From looking at Conjecture 1 one can take a step back and further conjecture that quantum theory, among the landscape of GPTs that are locally tomographic, is the theory that displays the necessary balance between its allowed states and effects to feature the following property:

Conjecture 2. Quantum Theory yields the largest violation of any Bell inequality by parity-readable fiducial measurements, within the landscape of possible locally-tomographic physical theories.
Notice that in this conjecture we are comparing correlations obtained from parity-readable measurements in quantum theory vs. in other more generic (yet locally-tomographic) GPTs, and state that quantum theory will always produce correlations that are more non-classical. This is in contrast to Conjecture 1 which compared correlations obtained from arbitrary measurements in quantum theory vs. those that are achieved with parity-readable measurements in an arbitrary tomographically-local GPT. The significance of this is that not all quantum correlations can be achieved with parity-readable measurements alone. However, if measurements beyond parity-readable ones may be used, then it is possible that other GPTs beyond quantum (e.g., Boxworld) can generate correlations that are more non-classical than any that quantum theory may produce. Now, whether Conjecture 2 is true, how to formally express it, and what its consequences are, comprise a topic for future work.
Going beyond the CHSH scenario, or to GPTs with affine local dimension ≥ 3, is a computationally demanding task. Indeed, the complexity of the optimisation problems to be solved rises considerably with the number of settings and dimension. A complete understanding of the reach of the constraints imposed by parity reading measurements requires the further development of analytical and numerical techniques, which are deferred to future work.
Moving forward, one may apply our technique to explore the constraints that entangled measurements beyond parity reading ones may impose. Indeed, Optimisation Problems 2 and 2 ′ may be straightforwardly adapted to study other bipartite measurements. It would be interesting to see if there is a relation between the properties of bipartite entangled measurements and those of the Bell inequalities whose Tsirelson's bound they recover.
More ambitiously, there is the natural question of multi-partite entangled measurements. Would the structure they impose on multi-partite state spaces have special features that we cannot envision from the phenomenology at the bipartite level? We hope such explorations will bring new insight into the structure of states and effect spaces in GPTs, and their non-classical properties.

We view the wires in the above diagram (151), corresponding to objects (i.e. finite dimensional real vector spaces), as representing physical systems. Then points, such as $S : \mathbb{R} \to V \otimes V$, represent physical states; general morphisms, such as $T_1 : V \otimes U \to V \otimes W$ and $T_2 : V \otimes V \to W$, represent physical transformations; and copoints, such as $E : W \to \mathbb{R}$, correspond to physical effects. Closed diagrams, such as: that is, elements of the unit interval, are interpreted as the probability of observing effect $E$ given the system was prepared in state $S$. We denote the unique deterministic effect as: which defines the normalised states $S$ as those satisfying: Note that for a set of effects $\{E_i\}_{i \in I}$ to describe a measurement it must be the case that:
One can then see that (finite dimensional) quantum theory defines such a GPT by noting that the set of Hermitian operators for some Hilbert space H forms a real vector space B(H), and that completely positive trace non-increasing (CPTNI) maps between these spaces are a particular class of linear maps between these vector spaces. The other constraints are simple to verify. Similarly, classical stochastic dynamics can be represented as such a GPT. To see this, note that stochastic dynamics from some (finite) set X to another (finite) set A can be represented as a particular class of linear maps from the finite dimensional vector space $\mathbb{R}^X$ to the finite dimensional vector space $\mathbb{R}^A$. We will work with the representation of GPTs in which this classical GPT is included as a subtheory.
To distinguish it, we will represent the classical systems by thin gray wires, and, for convenience, we will simply label them by the finite set X, A, ..., rather than the vector spaces $\mathbb{R}^X$, $\mathbb{R}^A$, ... . This is convenient because it allows us to explicitly represent measurement outcomes and setting variables within the diagrammatic representation. For example, a controlled measurement of system V with setting variable X and outcome variable A is denoted as: which must satisfy the constraint: The situation where we perform this measurement M on the system V prepared in some normalised state S is denoted by: and is simply a stochastic map from the setting variable X to the outcome variable A. The probabilities of obtaining a particular outcome a ∈ A given a setting x ∈ X can be extracted from this map via: We will also find it useful to use certain processes which live in $\mathrm{Vect}_{\mathbb{R}}$ but which are not part of the subtheory describing the GPT. To visually distinguish these 'non-physical' processes we draw them as shaded objects: (160) Finally, we will define a particular type of linear functionals $I$. The objects these act on are linear maps from one vector space U to a vector space V . We diagrammatically denote them as: Such a linear functional, $I$, maps some linear map L : U → V to a real number by: Note that, as $\mathrm{Vect}_{\mathbb{R}}$ is a compact closed category, it can be readily verified that these linear functionals can always be written as: for some vector space $\zeta_I$, vector $v_I$ and covector $c_I$.
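As a hedged illustration of a controlled measurement composed with a normalised state, the sketch below uses a single square bit. Everything specific here is an assumption made for the example rather than notation from the paper: states are written as vectors (1, ⟨X⟩, ⟨Z⟩), the unit effect as (1, 0, 0), and the fiducial effects of the two settings as the rows of `EFFECTS`. The point is only that the effects of each setting sum to the unit effect, so that the composite of state and measurement is a stochastic map from the setting to the outcome.

```python
# Assumed representation of a square bit: state = (1, <X>, <Z>).
UNIT = (1.0, 0.0, 0.0)

# Assumed fiducial effects, indexed by (setting, outcome).
EFFECTS = {
    ("X", +1): (0.5, 0.5, 0.0),
    ("X", -1): (0.5, -0.5, 0.0),
    ("Z", +1): (0.5, 0.0, 0.5),
    ("Z", -1): (0.5, 0.0, -0.5),
}


def apply(effect, state):
    """Pairing of an effect (covector) with a state (vector)."""
    return sum(e * s for e, s in zip(effect, state))


def stochastic_map(state):
    """p(a|x) obtained by measuring `state` with the controlled measurement."""
    return {
        x: {a: apply(EFFECTS[(x, a)], state) for a in (+1, -1)}
        for x in ("X", "Z")
    }


def effects_sum_to_unit(setting):
    """Measurement constraint: the effects of one setting sum to the unit effect."""
    total = tuple(sum(c) for c in zip(EFFECTS[(setting, +1)], EFFECTS[(setting, -1)]))
    return total == UNIT


state = (1.0, 0.2, -0.4)  # a normalised state inside the square
probs = stochastic_map(state)
print(probs)
```

Each row of `probs` is a probability distribution over outcomes, which is what being "a stochastic map from the setting variable to the outcome variable" amounts to in this toy case.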

C Geometric constraints on state and effect spaces
We define the dual of a set of vectors $\mathcal{V} \subseteq V$ by: If we then denote the set of states by $\Omega_V$ and the set of effects by $E_V$ then the constraint on state-effect pairs implies the pair of constraints: That is, the effect space is constrained by the state space and vice versa. Now, if we consider the special case of bipartite systems V ⊗ W then this means that: Hence, introducing some bipartite effects for the theory (i.e., enlarging $E_{V \otimes W}$) will induce a constraint on the bipartite state space (since $E^*_{V \otimes W}$ will potentially be smaller). This constraint, however, whilst necessary is not sufficient to ensure that we will end up with a valid GPT. Considerations of compositionality and convexity further constrain our state spaces. For example, it follows from compositionality and convexity, that any state of the form: where $s^{(i)}_v \in \Omega_V$ and $s^{(i)}_w \in \Omega_W$, $p_i \in \mathbb{R}_+$, and $\sum_i p_i = 1$, is a valid state for the composite system. This condition, namely that the bipartite state space contains all separable states, means that $\Omega_V \otimes_{\min} \Omega_W \subseteq \Omega_{V \otimes W}$, where the so-called 'min tensor product' is defined as the set of separable states. The same is also true for effects: compositionality and convexity mean that any effect of the form: where $e^{(j)}_v \in E_V$ and $e^{(j)}_w \in E_W$, $q_j \in \mathbb{R}_+$, and $\sum_j q_j = 1$, is a valid effect for the composite system. This means that $E_V \otimes_{\min} E_W \subseteq E_{V \otimes W}$.
In conjunction with condition Eq. (166), we can use this to obtain an upper bound on the state space as follows: That is, the bipartite state space is bounded between the min-tensor product of the local state spaces and the max-tensor product of the duals of the local effect spaces. Similarly for the bipartite effect space we obtain:
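The statement that enlarging the set of effects can only shrink the set of compatible states can be checked on a toy example. The sketch below assumes the standard notion of duality, in which a vector belongs to the dual of a set if it has non-negative pairing with every element of that set; the two-dimensional vectors and the name `in_dual` are purely illustrative and are not taken from the paper.

```python
def in_dual(vector, generators, tol=1e-12):
    """True if `vector` pairs non-negatively with every element of `generators`.

    Assumption: duality is understood as non-negativity of the pairing.
    """
    return all(
        sum(g_k * v_k for g_k, v_k in zip(g, vector)) >= -tol
        for g in generators
    )


# A small set of toy 'effects', and the same set enlarged by one extra effect.
effects_small = [(1.0, 0.0), (0.0, 1.0)]
effects_large = effects_small + [(1.0, -1.0)]

# Candidate 'states' to be tested against each set of effects.
candidates = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

allowed_small = [s for s in candidates if in_dual(s, effects_small)]
allowed_large = [s for s in candidates if in_dual(s, effects_large)]

print(allowed_small)
print(allowed_large)
```

In this toy case the extra effect excludes one candidate that was previously allowed, which mirrors how introducing a new bipartite measurement constrains the bipartite state space.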

D Tensor representation
Essentially this representation boils down to picking a suitable basis (and dual basis) for each vector space. We have already seen, via Eq. (43), how a local state of V can be represented as an (n + 1)-dimensional vector. Next we will see how to extend this to arbitrary processes. The simplest way to do so is to introduce a decomposition of the identity into orthogonal rank-1 projectors for each system. There are actually only three relevant systems (and their composites) in the above problem, the two classical systems, β and η, which decompose as: and the GPT system V which decomposes as: The $e_i$ are physically realisable effects, however, the $v_i$ are simply vectors in V which satisfy $e_i(v_j) = \delta_{ij}$. It is important that we do not demand that the $v_i$ are physically realisable states, as, for any non-classical GPT, there are insufficient perfectly distinguishable states to span the vector space. A remark on notation: in the following sections we will also denote the unit effect for the β systems as $u_\beta$, and their fiducial effects by $\vec{0}$ and $\vec{1}$.
Now, to obtain a tensorial representation of any diagram we simply decompose all of the internal identities in the diagram and attach $e_i$ and $v_j$ to the free inputs and outputs, for example: A bipartite state, such as $s$ in the above diagram, is therefore represented by a two-index tensor. If this bipartite state is a product state, then it is easy to see that this two-index tensor is simply the Kronecker product of the one-index tensors associated to the two components:
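As a small numerical illustration of this product rule, the sketch below builds the two-index tensor of a product state from two one-index tensors and checks that contracting it with a product effect gives the product of the local probabilities. The specific vectors, and the helper names `outer`, `contract` and `dot`, are hypothetical choices made for the example; flattening the two-index tensor row by row gives the usual Kronecker product vector.

```python
def dot(effect, state):
    """Pairing of a one-index effect with a one-index state."""
    return sum(e * s for e, s in zip(effect, state))


def outer(u, w):
    """Two-index tensor T[i][j] = u[i] * w[j] of a product state."""
    return [[u_i * w_j for w_j in w] for u_i in u]


def contract(effect_a, tensor, effect_b):
    """Contract a two-index state tensor with one local effect on each side."""
    return sum(
        effect_a[i] * tensor[i][j] * effect_b[j]
        for i in range(len(effect_a))
        for j in range(len(effect_b))
    )


# Illustrative local states and effects (assumed values, not from the paper).
s_v = (1.0, 0.2, -0.4)
s_w = (1.0, -0.6, 0.0)
e = (0.5, 0.5, 0.0)
f = (0.5, 0.0, 0.5)

s = outer(s_v, s_w)                      # two-index tensor of the product state
kron = [x for row in s for x in row]     # row-major flattening: Kronecker product

joint = contract(e, s, f)
product_of_locals = dot(e, s_v) * dot(f, s_w)
print(joint, product_of_locals)
```

For an entangled (non-product) state no such factorisation of the tensor exists, which is why the bipartite state enters the optimisation as a genuinely two-index object.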
Given a particular linear functional $I^{xy}_{ab}$ we optimise: $\beta := \sup_{s,P} \sum_{abxy} I^{xy}_{ab}\, p(ab|xy)$ subject to the following constraints. Note that in these constraints any sum is implicitly taken over its whole range and there is an implicit ∀ for any index which is not contracted:
• Parity Reading:
• Probabilities for fiducial measurements:
• Hierarchy L1: for all π, where π is a permutation of {1, 2}.
• Hierarchy L2: for all π, where now π is a permutation of {1, 2, 3, 4}.
• Hierarchy L3: for all π, where now π is a permutation of {1, 2, 3, 4, 5, 6}.
Notice that the constraints (182) and (183) just say that the are probability distributions for each fixed x, y. Also, the form of tensor F of Eq.
Here $p^{(1)}$ and $p^{(2)}$ are marginals obtained from p(ab|xy), e.g., $p^{(1)}(a|x) = \sum_b p(ab|xy)$. Note that the latter does not depend on y due to no-signaling of p(ab|xy), which in turn is enforced by the form of Eq. (190) and the definition of F given in Eq. (178).

E Convergence of state cone hierarchy
In this appendix we demonstrate that the hierarchy that we define does indeed converge to the cone K[P]. The cone K[P] is characterised by the condition: s ∈ K[P] if and only if every diagram with only classical inputs and outputs formed from a finite number of processes must be non-negative.
We now show that this condition is equivalent to our hierarchy.
To begin with, note that any diagram in our theory is constructed by wiring together a finite number of each of: i) the bipartite state, s, ii) the controlled fiducial measurement, F, and iii) the parity reading measurement, P. We call these the generating processes. We can therefore classify diagrams by first representing them in terms of the generating processes, and then counting the number of copies, k, of s that appear.
Next, note that if a diagram has only classical inputs and outputs, then any copy of V that appears in the diagram must have a start point and an end point in the diagram. There is only one generating process which can serve as a start point, namely, the bipartite state s, and either F or P can serve as the end point.
If we have k copies of the state s within the diagram, then these must therefore be wired into the measurements F and P. Every such diagram will factorise into totally connected subdiagrams. Note then, that nonnegativity of the full diagram is guaranteed by nonnegativity of the component subdiagrams. That is, to ensure nonnegativity for every diagram (with only classical inputs and outputs) we must only demand nonnegativity of totally connected diagrams (with only classical inputs and outputs).
It is then simple to see that the totally connected diagrams with k copies of s come in two forms. Firstly, those in which there are k copies of P which the states s are wired to, and secondly, those in which there are k − 1 copies of P and two copies of F. If there were more than two copies of F then the diagram would necessarily not be totally connected. Clearly, the first of these is captured by condition (68) and the second by the condition (67) of level k in the hierarchy. Therefore, our hierarchy of constraints fully characterises the cone K[P].