Informationally restricted correlations: a general framework for classical and quantum systems

We introduce new methods and tools to study and characterise classical and quantum correlations emerging from prepare-and-measure experiments with informationally restricted communication. We consider the most general kind of informationally restricted correlations, namely those formed when the sender is allowed to prepare statistical mixtures of mixed states, and show that, contrary to what happens in Bell nonlocality, mixed states can outperform pure ones. We then leverage these tools to derive device-independent witnesses of the information content of quantum communication and witnesses for different quantum information resources, and we demonstrate that these methods open a new avenue for semi-device-independent random number generators.


Introduction
Consider an experiment of the kind illustrated in Fig. 1, where a sender, Alice, selects an input x ∈ {1, . . . , n X }, encodes it into some physical system and transmits it to a receiver, Bob. Bob performs on the incoming system some measurement, represented by an input y ∈ {1, . . . , n Y }, and gets an outcome b ∈ {1, . . . , n B }. This prepare-and-measure experiment is ubiquitous in physics and forms the basis of many communication systems.
The transmission of physical messages between Alice and Bob serves to establish certain correlations between them. These correlations can be fully characterised by the set of probabilities p(b|x, y), which represent how, for a given measurement y performed by Bob, his outcome b depends on Alice's input x. In full generality, we can associate to each input x selected by Alice a quantum state ρ_x and to each measurement y selected by Bob a positive operator-valued measure (POVM) {M_{b|y}}_b, so that we can write
$$ p(b|x,y) = \mathrm{Tr}\left[\rho_x M_{b|y}\right]. \tag{1} $$
The special case where Alice and Bob are manipulating classical systems, instead of quantum ones, can be treated analogously by taking the states and measurements to be diagonal in the same basis:
$$ \rho_x = \sum_m p(m|x)\,|m\rangle\langle m|, \tag{2} $$
$$ M_{b|y} = \sum_m p(b|y,m)\,|m\rangle\langle m|, \tag{3} $$
where the variable m denotes the possible values of Alice's classical message.
In this work, we are interested in characterising what kind of correlations between Alice and Bob, i.e., which set of probabilities p(b|x, y), are possible under the sole restriction of some constraint on the communication capabilities of the classical or quantum systems ρ x emitted by Alice.
To date, the most commonly considered communication constraint in this setting has been a bound on the Hilbert-space dimension d of the emitted quantum systems (corresponding to the number of different possible messages m in the classical case). In the last two decades, a large body of work has investigated the interplay between correlations and dimension in this setting [1][2][3][4][5][6][7][8][9][10][11]. This line of work led, e.g., to the notion of dimension witnesses [5,12] and to semi-device-independent protocols [13], such as randomness generation [14], quantum key distribution [15], and self-testing [16]. Evidently, a quantum or classical d-dimensional system can carry at most log_2 d bits of information and thus a bound on the dimension represents an information constraint. However, the physical dimension does not provide a complete picture of the concept of information. For instance, there are many systems of dimension d' > d that do not carry more than log_2 d bits of information. Furthermore, in practical semi-device-independent protocols, an exact bound on the dimension may be difficult to justify (a fact that has partly motivated other recent approaches [17][18][19][20]). A more satisfying and practically relevant approach may be to constrain the communication in terms of a continuous information measure.
Following [21], we specify here the communication constraint on the physical systems ρ_x received by Bob as an upper bound on the guessing probability of the input X (see footnote 1),
$$ P_g(X|B) \equiv \max_{\{N_x\}} \sum_x q_x\, \mathrm{Tr}\left[\rho_x N_x\right] \le G, \tag{4} $$
where the maximisation is taken over all possible POVMs {N_x}_x on the physical system that Bob receives. This guessing probability represents the optimum average probability with which Bob would correctly guess Alice's input x if he were to perform an ideal POVM on the incoming messages ρ_x, assuming that Alice selects each input with prior probability q_x [22]. The guessing probability P_g(X|B) can take any value from P_g(X|B) = max_x q_x when the states are the same and hence carry no information about Alice's input (in which case Bob's best guessing strategy is to output the most probable input x according to q_x), up to P_g(X|B) = 1 when they are perfectly distinguishable. Different communication restrictions on the messages can be specified by the choice of the bound G, as well as the input probabilities q_x. Equivalently, one can express the communication restriction (4) as an upper bound I(X|B) ≤ α on the information measure
$$ I(X|B) = H_{\min}(X) - H_{\min}(X|B), $$
defined in terms of the min-entropies H_min(X) = −log_2 max_x {q_x} and H_min(X|B) = −log_2 P_g(X|B).
This quantity, expressed in bits, ranges from I(X|B) = 0 when the states carry no information about Alice's input, up to I(X|B) = log_2(n_X) bits, when they are perfectly distinguishable and chosen equiprobably, i.e., q_x = 1/n_X. There exist in principle a number of other information measures that we could consider (see e.g. [23]), but the one we choose has a clear operational meaning and is convenient to work with.
Footnote 1: X and B are random variables.
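As a small numerical illustration of the relation between the guessing-probability bound and the min-entropy information measure, the following Python sketch converts a pair (q_x, P_g) into I(X|B) = H_min(X) − H_min(X|B). The function name `min_entropy_information` and the example priors are ours and purely illustrative.

```python
import math

def min_entropy_information(q, p_guess, tol=1e-12):
    """Return I(X|B) = H_min(X) - H_min(X|B) in bits.

    q       : list of prior probabilities q_x
    p_guess : guessing probability P_g(X|B), expected in [max_x q_x, 1]
    """
    q_max = max(q)
    if p_guess < q_max - tol or p_guess > 1 + tol:
        raise ValueError("P_g must lie between max_x q_x and 1")
    h_prior = -math.log2(q_max)    # H_min(X)
    h_cond = -math.log2(p_guess)   # H_min(X|B)
    return h_prior - h_cond

uniform = [1 / 3, 1 / 3, 1 / 3]
print(min_entropy_information(uniform, 1 / 3))  # identical states: no information
print(min_entropy_information(uniform, 1.0))    # perfectly distinguishable: log2(3) bits
```

A bound G on the guessing probability therefore corresponds to α = log_2(G / max_x q_x) for the chosen prior.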
We emphasise that q_x does not represent the actual prior from which Alice selects her input. Instead, it is part of the assumption on Alice's source. Indeed, we are interested here in constraining the conditional probabilities p(b|x, y), which do not depend on any prior probabilities with which Alice's input x and Bob's input y are selected. To constrain these conditional probabilities p(b|x, y) we make a certain assumption about the source, specifically about the information capacity of the ensemble of states {ρ_x} it prepares. This information capacity can be defined in various ways. The definition we chose here can be thought of as a fictitious game: how well the classical variable x could be correctly identified by Bob if it were encoded by Alice in the state ρ_x and chosen with probability q_x. In the same way that the optimal measurement performed to guess x in this fictitious game is not necessarily the same as the actual measurements taking place in Bob's measurement apparatus and leading to the conditional probabilities p(b|x, y), the prior probabilities q_x need not be the same as the prior probabilities p_x used by Alice to select her input in any actual scenario or protocol involving the conditional probabilities p(b|x, y). In particular, a given scenario, say a DIRNG protocol where Alice selects her input with some fixed probabilities p_x, can be analysed using different choices of q_x; these simply correspond to different assumptions about the source.
Note that one can also completely eliminate q x from the analysis by choosing the uniform prior q x = 1/n X (where n X denotes the number of inputs of Alice). For a bound of the form I(X|B) ≤ α, this corresponds to the strongest assumption on the source in the sense that I(X|B) uni ≤ α implies I(X|B) bias ≤ α for any choice of biased distribution q x , as shown in [24].
Finally, we remark that instead of viewing the bound (4) as characterizing the preparations of Alice, we can alternatively view it as a constraint on the channel relating Alice to Bob. Indeed, ε = 1 − G can be understood as an upper bound on the average error with which a classical message of size n_X can be communicated in one shot through the channel for whatever encoding Alice may choose [25].
We develop here a versatile toolbox for characterising the set of probabilities p(b|x, y) that are possible given arbitrary information constraints P g (X|B) ≤ G (or, equivalently, I(X|B) ≤ α). Our approach is fully general and does not make any assumptions about the states and measurements beyond the information constraint, and in particular no assumptions about their dimension. In the classical case, we provide a characterisation of the set of informationally restricted correlations in terms of linear programming and in the quantum case through a hierarchy of semidefinite programming relaxations. We also show, in analogy with the dimension bounded case, how to apply our methods to construct device-independent witnesses of communication (quantified in terms of our information measure), resource inequalities for classical and quantum systems carrying one bit of information, and semi-device-independent random number generation (RNG) protocols. In particular, we will show concrete examples of high-rate RNG and also demonstrate that data obtained in RNG experiments assuming a 1-qubit bound can be recycled to certify the same amount of randomness under the strictly weaker assumption of a 1-bit information bound.
Our work can be seen as a follow-up to Ref. [21], which originally proposed to replace the dimension bound in semi-device-independent scenarios by the information bound P_g(X|B) ≤ G (or I(X|B) ≤ α) considered here. However, [21] implicitly modelled the correlations established between Alice and Bob as statistical mixtures of correlations p_λ(b|x, y) obtained by measuring pure states:
$$ p(b|x,y) = \sum_\lambda p(\lambda)\, p_\lambda(b|x,y), \tag{7} $$
$$ p_\lambda(b|x,y) = \langle \psi_x^\lambda | M_{b|y}^\lambda | \psi_x^\lambda \rangle. \tag{8} $$
The guessing probability constraining the communication was then defined as the following averaged quantity over the classical shared variable λ:
$$ P_g(X|B) = \sum_\lambda p(\lambda) \max_{\{N_x^\lambda\}} \sum_x q_x\, \langle \psi_x^\lambda | N_x^\lambda | \psi_x^\lambda \rangle. \tag{9} $$
Similarly, in the classical case, the correlations between Alice and Bob were modelled as statistical mixtures of correlations established by sending deterministic messages m for given x and λ.
The sets of such pure-state correlations, in the quantum case, or deterministic correlations, in the classical case, compatible with a given communication constraint P_g(X|B) ≤ G are easily seen to be particular subcases of the more general correlations that we consider here. Indeed, they can be obtained by assuming the states and measurements in (1) to take the following specific forms
$$ \rho_x = \sum_\lambda p(\lambda)\, |\lambda\rangle\langle\lambda| \otimes |\psi_x^\lambda\rangle\langle\psi_x^\lambda|, \tag{11} $$
$$ M_{b|y} = \sum_\lambda |\lambda\rangle\langle\lambda| \otimes M_{b|y}^\lambda \tag{12} $$
in the quantum case, and
$$ \rho_x = \sum_{\lambda,m} p(\lambda)\, p_\lambda(m|x)\, |\lambda\rangle\langle\lambda| \otimes |m\rangle\langle m|, \tag{13} $$
$$ M_{b|y} = \sum_{\lambda,m} p_\lambda(b|y,m)\, |\lambda\rangle\langle\lambda| \otimes |m\rangle\langle m| \tag{14} $$
in the classical case, which recovers both the convex sum (7) and the average guessing probability (9). Interestingly, while in more traditional works on correlations, such as in the study of Bell nonlocality [26], statistical mixtures of pure states (or of deterministic correlations) generate the full set of correlations, they only represent a proper subset of the possible correlations in our information-restricted setting. This is because, given a set of arbitrary states ρ_x satisfying the information constraint P_g(X|B) ≤ G, one can generally not re-interpret them as mixtures of pure states without increasing their distinguishability, hence potentially violating the condition P_g(X|B) ≤ G. The formulation we consider here is fully general and does not make any implicit assumption on the structure of the states appearing in the definition (1). Throughout the paper, we will compare our results to those that would be obtained under the pure-state approach of [21] in order to illustrate the differences between the two formulations.

Basic properties and simple scenarios
In the following, we refer to the prepare-and-measure scenario of Fig. 1, with n_X inputs for Alice, n_Y inputs for Bob, and n_B outputs, as an (n_X, n_Y, n_B)-scenario. Given an information bound specified by a probability distribution q_x and a number G ∈ [max_x {q_x}, 1], we denote by Q the set of quantum correlations compatible with that information bound, i.e., the set of probability distributions p(b|x, y) for which there exist states ρ_x and measurement operators M_{b|y} defined on some Hilbert space of arbitrary dimension d that satisfy the Born rule (1) and the constraint (4). Similarly, C denotes the set of classical correlations, i.e., those satisfying in addition (2)-(3). By plugging this specific form for the states and measurements in (1) and (4), classical correlations can also be defined as those that can be written as
$$ p(b|x,y) = \sum_m p(m|x)\, p(b|y,m) \tag{15} $$
and that satisfy the information constraint
$$ \sum_m \max_x \{ q_x\, p(m|x) \} \le G, \tag{16} $$
since the optimal POVM {N_x} in this case is the one that reads the classical message m and outputs the value x that maximises q_x p(m|x).
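The classical model can be made concrete with a short Python sketch. It evaluates the correlations p(b|x, y) = Σ_m p(m|x) p(b|y, m) and the classical guessing probability Σ_m max_x q_x p(m|x) for given message and response distributions. The function names, the array layout and the numerical example are our own illustrative choices, not taken from the paper.

```python
def classical_correlations(p_msg, p_resp):
    """Return p[x][y][b] = sum_m p(m|x) * p(b|y,m).

    p_msg[x][m]     : probability that Alice sends message m on input x
    p_resp[y][m][b] : probability that Bob outputs b on input y and message m
    """
    n_x, n_m = len(p_msg), len(p_msg[0])
    n_y, n_b = len(p_resp), len(p_resp[0][0])
    return [[[sum(p_msg[x][m] * p_resp[y][m][b] for m in range(n_m))
              for b in range(n_b)]
             for y in range(n_y)]
            for x in range(n_x)]

def classical_guessing_probability(q, p_msg):
    """Return sum_m max_x q_x p(m|x): Bob reads m and guesses the likeliest x."""
    n_x, n_m = len(p_msg), len(p_msg[0])
    return sum(max(q[x] * p_msg[x][m] for x in range(n_x)) for m in range(n_m))

# Illustrative (2, 1, 2) example: input 0 always sends m=0, input 1 sends a fair coin.
q = [0.5, 0.5]
p_msg = [[1.0, 0.0], [0.5, 0.5]]
p_resp = [[[1.0, 0.0], [0.0, 1.0]]]   # single measurement: Bob outputs b = m

print(classical_correlations(p_msg, p_resp))
print(classical_guessing_probability(q, p_msg))
```

In this example the ensemble is neither perfectly distinguishable nor uninformative, so the guessing probability lies strictly between max_x q_x and 1.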
The sets Q and C are easily seen to be convex, using a construction akin to (11) and (12). That is, we can without loss of generality assume that the states sent by Alice and the measurements performed by Bob depend on some shared randomness λ (independent of x).
As a consequence, when writing the correlations explicitly as a convex sum (7), we can without loss of generality assume Bob's measurements to be extremal conditioned on λ: if the measurements of Bob depend on some local randomness, we can always incorporate it instead in the shared randomness λ. In the classical case C, this means that we can without loss of generality assume Bob's classical response p λ (b|y, m) to be deterministic, i.e., such that p λ (b|y, m) ∈ {0, 1}. However, as noted earlier, we cannot without loss of generality assume the states to be pure (or deterministic in the classical case) when conditioned on λ as rewriting a mixed-state as a convex combination of pure states could violate the original guessing probability bound.
The sets Q and C satisfy certain basic inequalities. Obviously, since the p(b|x, y) are probabilities, they must by definition satisfy the positivity and normalisation conditions
$$ p(b|x,y) \ge 0, \quad \forall b, x, y, \tag{17} $$
$$ \sum_b p(b|x,y) = 1, \quad \forall x, y. \tag{18} $$
In addition, since post-processing cannot improve the distinguishability between messages and since all measurements {M_{b|y}}_b of Bob can be viewed as (typically suboptimal) information-extraction POVMs, it holds that
$$ \sum_b \max_x \{ q_x\, p(b|x,y) \} \le G, \quad \forall y, \tag{19} $$
since, as in (16), when Bob gets the result b upon performing the measurement corresponding to input y, his best guess of x is the value that maximises q_x p(b|x, y). This last constraint can explicitly be rewritten as a series of linear inequalities
$$ \sum_b q_{x_b}\, p(b|x_b,y) \le G, \quad \forall y,\ \forall \mathbf{x} = (x_1, \ldots, x_{n_B}). \tag{20} $$
We remark that, though it is harmless to specify them, not all of the inequalities (20) are always relevant, as they may already be implied by normalisation and positivity of the probabilities alone (as well as, potentially, by constraints specific to C and Q). Precisely which ones are redundant depends on the upper bound G chosen. The instances with all the components of x equal (x_1 = x_2 = · · · = x_{n_B}) in particular are always redundant, as the left side of (20) is in these cases always upper bounded by the smallest possible value, max_x {q_x}, of the guessing probability. At the opposite extreme, (19) always becomes entirely redundant for sufficiently high G when Alice's device has more inputs than Bob's has outcomes. Supposing we label Alice's inputs so that q_1 ≥ q_2 ≥ · · · ≥ q_{n_X}, this is because the left side of (19) is also always bounded by
$$ \sum_{x=1}^{n_B} q_x, \tag{21} $$
which is strictly less than one if Alice has more than n_B inputs that are used with nonzero probability. The set of correlations satisfying Eqs. (17), (18), and (19) is a polytope G. The polytope G can be interpreted as the set of correlations attainable under informational restrictions when no assumption is made on the underlying physical theory. Therefore, recalling also that the classical set is contained in the quantum set, we have the inclusions C ⊆ Q ⊆ G.
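Membership in the theory-independent polytope G only requires checking positivity, normalisation and, for each y, the bound Σ_b max_x q_x p(b|x, y) ≤ G. A minimal Python sketch of this test follows; the function names and the layout `p[x][y][b]` are assumptions made for illustration.

```python
def outcome_guessing_value(q, p, y):
    """Return sum_b max_x q_x p(b|x,y) for Bob's setting y, with p[x][y][b]."""
    n_x, n_b = len(p), len(p[0][y])
    return sum(max(q[x] * p[x][y][b] for x in range(n_x)) for b in range(n_b))

def in_theory_independent_polytope(q, p, G, tol=1e-9):
    """Check positivity, normalisation and the guessing bound for every y."""
    n_x, n_y = len(p), len(p[0])
    for x in range(n_x):
        for y in range(n_y):
            row = p[x][y]
            if any(v < -tol for v in row) or abs(sum(row) - 1.0) > tol:
                return False
    return all(outcome_guessing_value(q, p, y) <= G + tol for y in range(n_y))

q = [0.5, 0.5]
perfect = [[[1.0, 0.0]], [[0.0, 1.0]]]    # Bob's outcome reveals x exactly
constant = [[[1.0, 0.0]], [[1.0, 0.0]]]   # Bob's outcome is independent of x

print(in_theory_independent_polytope(q, perfect, 0.75))
print(in_theory_independent_polytope(q, constant, 0.5))
```

Taking the maximum over x inside the sum is equivalent to checking every linear inequality obtained by fixing one guess x_b per outcome b, so no explicit enumeration of those inequalities is needed.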
An important first step in semi-device-independent approaches is to establish that one can distinguish between classical and quantum correlations, i.e., that C ⊂ Q. We show below that in the simplest case of communication experiments with only two inputs for Alice (n_X = 2), the classical, quantum and theory-independent sets are identical (C = Q = G). Notably, this stands in contrast to other established approaches to semi-device-independence [17,18]. Later, we will find that C ⊂ Q is indeed possible when Alice has more than two inputs. In Sections 3 and 4 we describe how to characterise the classical set and the quantum set, respectively, in a general and systematic manner.

C = Q = G when Alice has n X = 2 inputs
We show that for n_X = 2 it holds that C = Q = G by proving that every p(b|x, y) ∈ G admits a classical model. To this end, note that the constraints Eqs. (17)-(19) are decoupled with respect to y. In other words, for each individual value of y, we obtain a separate polytope and the full set of probabilities is just the Cartesian product of the n_Y identical polytopes corresponding to the individual values of y. We derive the vertices of these polytopes in Appendix A. For n_B = 3 (which is representative), up to permutations of Bob's outputs they are the four vertices v_1(y), ..., v_4(y) of Eqs. (22)-(25), where we use a matrix notation
$$ v_j(y) = \begin{pmatrix} p(1|1,y) & p(2|1,y) & \cdots \\ p(1|2,y) & p(2|2,y) & \cdots \end{pmatrix} \tag{26} $$
to summarise the probabilities p(b|x, y) defining each vertex v_j. The vertices for n_B ≠ 3 are trivial variations of those above: for n_B > 3 the vertices are the same except with additional columns of zeros, while for n_B < 3 we simply discard the vertices that have more than n_B columns with nonzero entries in them.
Crucially, all the vertices v_1(y)-v_4(y), including all their permutations, can be generated by performing different measurements on the same two commuting (classical) states, given in Eqs. (27) and (28). Furthermore, any convex mixture of vertices of the kind above, which is to say, any probability p(b|x, y) satisfying the conditions (17)-(19) above, can be generated by performing the corresponding convex mixture of POVMs on Bob's side. We conclude that Eqs. (17), (18), and (19) completely characterise all three sets C, Q and G.

Inequivalence of general correlations and pure-state correlations
Following [21], we denote by Q_pure ⊆ Q the subset of Q consisting of convex combinations of pure-state correlations (8) and by C_det ⊆ C the subset of C consisting of convex combinations of deterministic classical correlations (10). As we show below, already for the simplest communication scenario (n_X = 2), we can distinguish between Q and Q_pure as well as between C and C_det, i.e., Q_pure ⊂ Q and C_det ⊂ C. Note, though, that the relation between Q_pure and C is more complex. We will see in the simple scenario below that Q_pure ⊂ C. But in other scenarios one can have correlations in Q_pure that are outside C, so that the two sets intersect but neither is strictly contained in the other. This justifies looking at the larger quantum set Q, which by definition always satisfies C ⊆ Q and thus can never be outperformed using classical correlations. Before looking at the general n_X = 2 case, let us first consider the exceptional situation in which Alice's inputs are equiprobable (q_1 = q_2 = 1/2). The states (27) and (28) become
$$ \rho_1 = 2(1-G)\,|0\rangle\langle 0| + (2G-1)\,|1\rangle\langle 1|, \tag{32} $$
$$ \rho_2 = 2(1-G)\,|0\rangle\langle 0| + (2G-1)\,|2\rangle\langle 2|. \tag{33} $$
Consider now a deterministic classical strategy with one bit of shared randomness. Specifically, Alice receives either λ = 1 with probability p(1) = 2(1 − G) or λ = 2 with probability p(2) = 2G − 1. If λ = 1 Alice prepares the state |0⟩⟨0|, while if λ = 2 Alice prepares the state |x⟩⟨x| depending on her input x ∈ {1, 2}. This strategy generates the same states as (32) and (33) on average and the average guessing probability is still G. Thus, all correlations in G can be obtained and one finds no difference between the various sets:
$$ C_{\mathrm{det}} = C = Q_{\mathrm{pure}} = Q = G. $$
In contrast, whenever the prior is biased (q_1 ≠ q_2), we find that the pure-state correlations and the general correlations are inequivalent (see Fig. 2). Considering the scenario (n_X, n_Y, n_B) = (2, 1, 2), the correlations can be characterised in terms of the expectation values
$$ E_x = p(1|x) - p(2|x) $$
for x = 1, 2, where we have omitted y due to its fixed value.
In Appendix B, we show that the nontrivial facets of C and Q are
$$ \pm\left( q_1 E_1 - q_2 E_2 \right) \le 2G - 1 \tag{35} $$
in terms of the guessing probability bound G. The facets of C_det are likewise straightforward to derive due to the small number of possible deterministic strategies. We do this in Appendix B and find the nontrivial facets given in Eq. (36), which are expressed in terms of q_max = max(q_1, q_2) and q_min = min(q_1, q_2). Whenever q_1 ≠ q_2 this bounds a strictly smaller set than (35). Finally, we derive the exact boundaries of the set Q_pure in Appendix B. Unlike the classical sets and Q, this set is not a polytope. Aside from the trivial constraints |E_x| ≤ 1, it is bounded by an infinite family of linear inequalities, Eq. (37), for parameters c_1 and c_2 satisfying c_1 + c_2 = 1 in the range q_min ≤ c_1, c_2 ≤ q_max. This set is larger than C_det but smaller than C and Q. Note that at the extreme c_1 = q_1, (37) reduces to (35). Hence, two flat parts of the boundary of Q_pure (see Fig. 2) coincide with the nontrivial facets of Q.
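The equiprobable two-input strategy described above (a common message with weight 2(1 − G), an input-revealing message with weight 2G − 1) can be checked numerically. The Python sketch below builds the resulting message distribution p(m|x) and confirms that its guessing probability equals G. The function names, and the particular response function used for Bob at the end, are illustrative assumptions.

```python
def mixed_strategy_messages(G):
    """p(m|x) over messages m in {0, 1, 2} for inputs x in {1, 2}.

    With probability 2(1-G) both inputs send the common message 0;
    with probability 2G-1 input x sends the revealing message m = x.
    """
    w_common, w_reveal = 2 * (1 - G), 2 * G - 1
    return [[w_common, w_reveal, 0.0],   # input x = 1
            [w_common, 0.0, w_reveal]]   # input x = 2

def guessing_probability(q, p_msg):
    """sum_m max_x q_x p(m|x) for a classical ensemble."""
    n_x, n_m = len(p_msg), len(p_msg[0])
    return sum(max(q[x] * p_msg[x][m] for x in range(n_x)) for m in range(n_m))

G = 0.8
q = [0.5, 0.5]
p_msg = mixed_strategy_messages(G)
print(guessing_probability(q, p_msg))   # equals G up to rounding

# Hypothetical response: Bob outputs b=1 on messages 0 and 1, and b=2 on message 2.
p_b1 = [row[0] + row[1] for row in p_msg]        # p(b=1|x)
expectations = [2 * p - 1 for p in p_b1]         # E_x = p(1|x) - p(2|x)
print(expectations)
```

With this response function Bob's outcome statistics already reach the guessing bound, Σ_b max_x q_x p(b|x) = G, so the resulting point lies on the boundary of the polytope G.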

Characterising classical correlations
In this section, we explain how one can systematically determine the boundaries of the classical set C, which is a polytope; the characterisation of the deterministic set C det was already addressed in [21]. We then apply our method to explicitly derive the boundaries of C in the (3, 2, 2) scenario assuming Alice's inputs are chosen equiprobably, finding that C is strictly larger than C det in this case. This differs from the case with two inputs considered earlier, where C and C det were only found to be different when Alice's inputs are not equiprobable. Finally we also point out how one can, alternatively, generally test by linear programming whether a correlation is in C or not without explicitly needing to determine its boundaries.

General method
The classical set C is, as mentioned above and as we have seen explicitly for n_X = 2 in the previous section, a polytope, and we could in principle characterise it by determining its facets for any given upper bound G on the guessing probability. This direct approach would, however, require us to rederive the facets of C for each value of G that we may be interested in. To avoid this we instead consider a related but different set, which we call C^+, of possible pairs (p(b|x, y), G) of probability distributions p(b|x, y) and guessing probability bounds G that are compatible with (15) and (16), which we repeat here for convenience:
$$ p(b|x,y) = \sum_m p(m|x)\, p(b|y,m), \tag{38} $$
$$ \sum_m \max_x \{ q_x\, p(m|x) \} \le G. \tag{39} $$
Casting the problem in this way allows us to derive the boundaries of the classical set while leaving G as a free variable.
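The set C^+ is clearly convex, as it is easily seen that if (p_1(b|x, y), G_1) and (p_2(b|x, y), G_2) belong to C^+ then so does (q p_1(b|x, y) + (1 − q) p_2(b|x, y), q G_1 + (1 − q) G_2) for any weight 0 ≤ q ≤ 1. To characterise it, it is thus sufficient to characterise its extreme points and take their convex hull.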
As explained at the beginning of Section 2, the extremal points of C have deterministic response probabilities for Bob: p(b|y, m) ∈ {0, 1}. If we fix such a deterministic response for Bob, the probabilities p(b|x, y) are then entirely determined by the probability distribution p(m|x) of Alice's messages. Those are simply constrained by
$$ p(m|x) \ge 0, \qquad \sum_m p(m|x) = 1, \qquad \sum_m \max_x \{ q_x\, p(m|x) \} \le G, $$
which represents a finite set of linear inequalities for the set M^+ of possible pairs (p(m|x), G) of message probabilities and guessing probability bounds. The set M^+ is thus a polyhedron, i.e., an object like a polytope except that it is not necessarily bounded. Explicitly, this is a set P = {p} of points that can be generated from a finite number of vertices v_i ∈ V and conic generators w_j ∈ W, i.e.,
$$ P = \mathrm{conv}(V) + \mathrm{cone}(W), $$
or, more explicitly, the set of points {p} that can be expressed as
$$ p = \sum_i \lambda_i v_i + \sum_j \mu_j w_j, \qquad \lambda_i \ge 0,\ \sum_i \lambda_i = 1,\ \mu_j \ge 0. $$
Provided that the number of possible messages m is limited to a finite number, the vertices and conic generators of M^+ can be determined using software such as PORTA or PANDA [27]. In Appendix C we prove that every pair (p_λ(b|x, y), G_λ) can be constructed with a message of size 2^{n_X} − 1 without loss of generality. In practice, however, the number of necessary messages may be considerably smaller than this: in cases with two or three inputs where we explicitly determined the vertices, we never found that the number of necessary different messages exceeded the number of inputs n_X.
Once the vertices and conic generators p(m|x), G of M + have been obtained, one can generate all extreme points p(b|x, y), G of C + using (38) for each of the finite number of possible deterministic distributions p(b|y, m) for Bob. We thus find that C + is described by a finite number of vertices and conic generators, i.e., it is a polyhedron. Solving the facet enumeration problem, which again can be done in software provided that the problem is not too large, yields a finite number of inequalities that completely characterises the set of points p(b|x, y), G compatible with classical stochastic communication.
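To illustrate the vertex-plus-cone description of a polyhedron used above, the following Python sketch assembles a point from convex weights on vertices and non-negative multipliers on conic generators. The function name and the toy two-component data (a single coordinate followed by a G component) are hypothetical and are not vertices of the actual sets M^+ or C^+.

```python
def polyhedron_point(vertices, rays, weights, multipliers, tol=1e-9):
    """Return sum_i weights[i]*vertices[i] + sum_j multipliers[j]*rays[j].

    weights must be a probability vector (convex part);
    multipliers must be non-negative (conic part).
    """
    if len(weights) != len(vertices) or len(multipliers) != len(rays):
        raise ValueError("one coefficient per generator is required")
    if any(w < -tol for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to one")
    if any(mu < -tol for mu in multipliers):
        raise ValueError("conic multipliers must be non-negative")
    point = [0.0] * len(vertices[0])
    for w, v in zip(weights, vertices):
        point = [p + w * c for p, c in zip(point, v)]
    for mu, r in zip(multipliers, rays):
        point = [p + mu * c for p, c in zip(point, r)]
    return point

# Toy data: each point is (correlator, G); the ray only raises the G component.
toy_vertices = [(1.0, 1.0), (-1.0, 1.0), (0.0, 0.5)]
toy_rays = [(0.0, 1.0)]
print(polyhedron_point(toy_vertices, toy_rays, [0.5, 0.0, 0.5], [0.25]))
```

A ray that leaves the correlations untouched and increases only the bound component expresses the fact that any behaviour compatible with a bound G is also compatible with every larger bound.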

Boundaries of C in the (3,2,2) scenario
We found earlier, in Section 2.2, that the classical stochastic and deterministic sets C and C det are always the same if Alice has two equiprobable inputs. The (3, 2, 2) setting is therefore the smallest in which we could hope to find that C and C det are different even if Alice's inputs are chosen with the same probabilities (q x = 1/3). This is indeed what we find for certain values of the upper bound G that we impose on the guessing probability.
We applied the method described in the previous subsection to find the facets of C^+ in the (3, 2, 2) setting with q_x = 1/3. In terms of the correlators E_xy = p(1|x, y) − p(2|x, y), in addition to the trivial conditions ±E_xy ≤ 1 and G ≥ 1/3, its facets, up to relabellings of the inputs and outputs, are the two classes of inequalities (46) and (47). For comparison, the facets of the deterministic version of the set, which we could call C^+_det, are the inequalities (48)-(50). We see here that C^+ and C^+_det share two nontrivial classes of facets. Of these, (47) and (50), which we can rewrite in the form (51), are instances of the constraints (20) that, as we pointed out in Section 2, apply regardless of the underlying physical theory. They are not always facets of the sets C and C_det with G fixed, due to Alice having more inputs than Bob has outputs in this setting: in particular they become redundant if G is larger than 2/3. The other boundary, (46) and (48), common to the two sets is by contrast a nontrivial facet of both C and C_det for all 1/3 ≤ G ≤ 1.
The only difference between the stochastic and deterministic classical sets is the class of boundaries (49) unique to C det . Eq. (49) nontrivially constrains the correlations for any G < 5/6 but becomes redundant for G ≥ 5/6. This tells us that C and C det coincide if G ≥ 5/6 but that C det is a strictly smaller set than C for G < 5/6 in the (3, 2, 2) setting with equiprobable priors.

Membership testing by linear programming
While knowing the boundaries of C is useful for certain purposes, it is possible to solve the basic problem of testing for membership in C without explicitly needing to derive its boundaries. Given a bound G on the guessing probability, determining whether or not a given behaviour p(b|x, y) is contained in the corresponding classical set C is equivalent to determining whether the pair (p(b|x, y), G) is contained in the set C^+ that we introduced in the previous subsection. This amounts to determining whether p(b|x, y) and G can respectively be expressed as
$$ p(b|x,y) = \sum_\lambda p(\lambda)\, p_\lambda(b|x,y) \tag{52} $$
and bounded (see footnote 5) by
$$ G \ge \sum_\lambda p(\lambda)\, G_\lambda, \tag{53} $$
i.e., by averages of the respective components of vertices (p_λ(b|x, y), G_λ) of C^+. Recalling how we generated the vertices of C^+ from those of M^+ in the previous subsection, we may substitute every vertex probability p_λ(b|x, y) in (52) by
$$ p_\lambda(b|x,y) = \sum_m p_\lambda(m|x)\, p_\lambda(b|y,m), \tag{54} $$
where p_λ(m|x) is a vertex probability of M^+ and p_λ(b|y, m) is a deterministic response function. Furthermore, we may limit the number of messages to an alphabet of size n_M = 2^{n_X} − 1 without loss of generality. This allows us to express the problem above as
$$ p(b|x,y) = \sum_\lambda p(\lambda) \sum_m p_\lambda(m|x)\, p_\lambda(b|y,m), \tag{55} $$
$$ G \ge \sum_\lambda p(\lambda)\, G_\lambda, \tag{56} $$
with p_λ(b|y, m) ∈ {0, 1} and where (p_λ(m|x), G_λ) is a vertex of M^+, for all λ.
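There are a finite number n_K = n_B^{n_M·n_Y} of possible deterministic response functions on Bob's side. Let us denote these p_k(b|y, m), identified by an index k taking one of n_B^{n_M·n_Y} distinct values, and group the remaining terms by k. Defining Λ_k as the set of λs appearing in the problem above for which
$$ p_\lambda(b|y,m) = p_k(b|y,m), \tag{57} $$
we can rewrite our problem as
$$ p(b|x,y) = \sum_k p(k) \sum_m p_k(m|x)\, p_k(b|y,m), \tag{58} $$
$$ G \ge \sum_k p(k)\, G_k, \tag{59} $$
where
$$ p(k) = \sum_{\lambda \in \Lambda_k} p(\lambda), \tag{60} $$
$$ p_k(m|x) = \sum_{\lambda \in \Lambda_k} p(\lambda|k)\, p_\lambda(m|x), \tag{61} $$
$$ G_k = \sum_{\lambda \in \Lambda_k} p(\lambda|k)\, G_\lambda, \tag{62} $$
and p(λ|k) is defined in such a way that p(k)p(λ|k) = p(λ).
Footnote 5: If we follow the exact formulation in the previous subsection then, as we point out in Appendix C, C^+ has one conic generator (p(b|x, y), G) = (0, 1) in addition to its vertices, which can be added to any point in C^+ to increase its guessing probability bound component. Eliminating this conic generator results in (53) being an inequality.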
The reexpression (58) and (59) of our problem is superficially the same as (55) and (56) except that now there is a known finite number of the indices k and m, while the pairs (p_k(m|x), G_k) are no longer necessarily vertices of M^+. The pairs (p_k(m|x), G_k) are still necessarily contained in M^+, however, since M^+ is convex, and thus by definition satisfy the conditions (63) and (64) defining M^+. Using these constraints in place of (61) and (62) and then eliminating the G_k simplifies the problem to
$$ p(b|x,y) = \sum_k p(k) \sum_m p_k(m|x)\, p_k(b|y,m), \qquad \sum_k p(k) \sum_m \max_x \{ q_x\, p_k(m|x) \} \le G, $$
where p(k) and p_k(m|x) are probability distributions.
To turn this into a linear programming problem we combine p(k) and p_k(m|x) into a joint distribution,
$$ p(k,m|x) = p(k)\, p_k(m|x), $$
which satisfies the marginal condition that
$$ \sum_m p(k,m|x) = p(k) \quad \text{for all } x, \tag{72} $$
i.e., that the marginal distribution of k does not depend on x. Recalling that (69) is a shorthand for n_M^{n_B} linear inequalities, determining whether there exist n_K · n_M · n_X weights p(k, m|x) that satisfy Eqs. (68)-(72) for a given behaviour p(b|x, y) is a linear programming feasibility problem.
We remark, finally, that if we drop the marginal constraint (72) and combine (k, m) into a new variable which we rename m, we recover the definition of the classical set C that we started with in Section 2. This confirms that we did not inadvertently relax the problem when we replaced the vertices (p_λ(m|x), G_λ) of M^+ with the conditions (63) and (64) on p_k(m|x). Deriving the linear programming feasibility problem following our characterisation of C^+, however, allows us to put a finite upper limit n_K · n_M · n_X, with n_K = n_B^{n_M·n_Y} and n_M = 2^{n_X} − 1, on the number of weights p(k, m|x) that we need to consider.
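To get a feeling for the size of the feasibility problem, the Python sketch below enumerates Bob's deterministic response functions and counts the weights p(k, m|x). The function names are ours, and the message alphabet size n_M is passed in explicitly rather than hard-coded; the value 3 used in the example corresponds to reading the bound above as 2^{n_X} − 1 with n_X = 2.

```python
from itertools import product

def deterministic_responses(n_b, n_y, n_m):
    """Yield every deterministic response as a nested tuple f with f[y][m] = b."""
    for flat in product(range(n_b), repeat=n_y * n_m):
        yield tuple(tuple(flat[y * n_m + m] for m in range(n_m))
                    for y in range(n_y))

def lp_size(n_x, n_y, n_b, n_m):
    """Return (n_K, number of weights p(k,m|x)) for the feasibility problem."""
    n_k = n_b ** (n_m * n_y)          # number of deterministic response functions
    return n_k, n_k * n_m * n_x

# (2, 1, 2) scenario with an alphabet of n_M = 3 messages.
responses = list(deterministic_responses(n_b=2, n_y=1, n_m=3))
print(len(responses), lp_size(n_x=2, n_y=1, n_b=2, n_m=3))
```

The count grows exponentially with n_M · n_Y, so explicit enumeration is only practical for small scenarios.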

Characterising quantum correlations
In this section, we develop tools for the characterisation of informationally restricted quantum correlations. In Section 4.1, we develop an efficient method for optimising any given linear witness from inside the set of informationally restricted quantum correlations Q. Hence, this method enables lower bounds on quantum correlations. In Section 4.2, we present a hierarchy of semidefinite relaxations of Q (and of Q pure ). This allows us to establish increasingly precise necessary criteria of a given correlation admitting a quantum model. In Section 4.3, we apply these methods to the simplest relevant communication experiment and use it to device-independently quantify the information content of a quantum ensemble. In Section 4.4, we focus on the case of one bit of information and prove several strict resource inequalities involving two-dimensional systems, pure-state informationally restricted systems and general informationally restricted systems, in both the quantum and classical setting.

Lower bounds: alternating convex search method
In many situations arising in the study of quantum correlations, it is possible to use alternating convex searches in order to optimise a linear functional of the quantum correlations (a linear "witness"), such as in the case of Bell inequalities [28,29] or quantum dimension witnesses [30]. Such a search amounts to attempting to solve the full optimisation problem (over both states and measurements) by repeatedly optimising over the states and measurements separately in an alternating manner. The advantage of such an approach is that often each separate optimisation, over states (measurements) for fixed measurements (states), is convex and can be solved by standard methods. While alternating convex search often works well in practice, it is not guaranteed to converge and therefore only offers lower bounds on the optimal quantum correlations.
In order to optimise a linear witness over the set of informationally restricted quantum correlations, one encounters a less straightforward situation. For a fixed set of states, it is clear that the optimisation over the set of measurements can be evaluated as a semidefinite program (SDP). In contrast, for a fixed set of measurements, the optimisation over the set of states is less obvious due to the relevance of the informational restriction. Evidently, the optimisation must be performed under the constraint P g ≤ G which itself involves a maximisation over the extraction POVM {N x } x . We show how this difficulty can be overcome so that lower bounds on informationally restricted quantum correlations can be efficiently computed through alternating implementations of SDPs.
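Consider that we are given a linear witness A, in general written as

$$A = \sum_{x,y,b} c_{xyb}\, p(b|x,y) \tag{73}$$

for some real coefficients c xyb , and asked to maximise it over the set of informationally restricted states ρ x and measurements M b|y . For this purpose, let us define an auxiliary positive semidefinite operator σ with the property that

$$\sigma \geq q_x \rho_x \quad \forall x. \tag{74}$$

This allows us to place the following upper bound on the guessing probability:

$$P_g = \max_{\{N_x\}} \sum_x q_x \mathrm{Tr}[\rho_x N_x] \leq \max_{\{N_x\}} \sum_x \mathrm{Tr}[\sigma N_x] = \mathrm{Tr}[\sigma], \tag{75}$$

where we have used that $\sum_x N_x = \mathbb{1}$. The introduction of σ stems from considering the semidefinite dual of the guessing probability and therefore does not constitute a relaxation of the problem [31]. Its advantage is that it allows us to treat the informational restriction as a tracial constraint enforced through the additional semidefinite constraints in (74). We may therefore cast the maximisation of the linear witness A, for a given bound G on the guessing probability, as the following optimisation problem:

$$\begin{aligned} \max_{\{\rho_x\},\,\sigma,\,\{M_{b|y}\}} \quad & \sum_{x,y,b} c_{xyb}\, \mathrm{Tr}[\rho_x M_{b|y}] \\ \text{s.t.} \quad & \mathrm{Tr}[\sigma] \leq G, \quad \sigma \geq q_x \rho_x \ \ \forall x, \\ & \rho_x \geq 0, \quad \mathrm{Tr}[\rho_x] = 1 \ \ \forall x, \\ & M_{b|y} \geq 0, \quad \sum_b M_{b|y} = \mathbb{1} \ \ \forall y. \end{aligned} \tag{76}$$

If we fix the measurement operators {M b|y }, this problem becomes an SDP for the states {ρ x } and σ.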
Conversely, if we fix the states {ρ x } and σ, it is an SDP for the measurement operators {M b|y }. We can thus alternate these SDPs to obtain a lower bound on the optimal value. Note that it is implicit that these SDPs must be performed in a given Hilbert space dimension, but one may find successively better lower bounds by increasing the dimension. The usefulness of this method is exemplified in Section 4.3. Note that the above approach cannot be applied to the pure-state set Q pure , since the condition ρ x ≥ 0 would have to be replaced by ρ 2 x = ρ x , which is nonlinear. The existence of a practical algorithm for lower-bounding the general quantum set Q is another advantage of our general formulation.
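The role of the auxiliary operator σ can be made concrete in the classical special case, where all states are diagonal in the same basis and ρ x is described by a message distribution p(m|x). The following is a minimal sketch under that assumption only; the function names and the example ensemble are hypothetical. In this case the smallest diagonal σ dominating every q x ρ x has entries max_x q x p(m|x), and its trace equals the guessing probability, so the dual bound Tr[σ] is attained.

```python
def guessing_probability(q, p):
    """P_g = sum_m max_x q[x] * p[x][m] for a classical ensemble p(m|x)."""
    return sum(max(q[x] * p[x][m] for x in range(len(q)))
               for m in range(len(p[0])))


def minimal_sigma(q, p):
    """Diagonal of the smallest sigma with sigma >= q[x] * rho_x for all x."""
    return [max(q[x] * p[x][m] for x in range(len(q)))
            for m in range(len(p[0]))]


def dominates(sigma, q, p, tol=1e-12):
    """Check sigma >= q[x] * rho_x entrywise (all operators are diagonal)."""
    return all(sigma[m] + tol >= q[x] * p[x][m]
               for x in range(len(q)) for m in range(len(sigma)))


# Three equiprobable inputs encoded into a two-valued message.
q = [1 / 3, 1 / 3, 1 / 3]
p = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

sigma = minimal_sigma(q, p)
p_guess = guessing_probability(q, p)
```

Any other feasible diagonal σ, such as [0.5, 0.5] for this ensemble, has a trace at least as large as the guessing probability, which is the sense in which Tr[σ] ≤ G enforces the informational restriction.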

Upper bounds: hierarchy of semidefinite relaxations
The idea used in the previous section, of introducing the auxiliary operator σ, can be further leveraged to systematically obtain increasingly precise upper bounds on the informationally restricted set of quantum correlations. We now present a hierarchy of semidefinite relaxations for the set Q, which is based on the tracial variant [32,33] of the NPA non-commutative polynomial optimisation hierarchy [34,35].
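Let us first slightly rewrite the problem (76) as

$$\begin{aligned} \max \quad & \sum_{x,y,b} c_{xyb}\, \mathrm{Tr}[\rho_x M_{b|y}] \\ \text{s.t.} \quad & \mathrm{Tr}[\sigma] \leq G, \quad \mathrm{Tr}[\rho_x] = 1, \\ & \rho_x - \rho_x^2 \geq 0, \quad \sigma - q_x \rho_x \geq 0, \quad G\mathbb{1} - \sigma \geq 0, \\ & M_{b|y} M_{b'|y} = \delta_{bb'} M_{b|y}, \quad \sum_b M_{b|y} = \mathbb{1}, \end{aligned} \tag{77}$$

where compared with (76), we have replaced the constraint ρ x ≥ 0 by ρ x − ρ 2 x ≥ 0, added the redundant constraint G1 − σ ≥ 0, and assumed, without loss of generality if we do not bound the dimension d of the Hilbert space, that the measurements {M b|y } b are projective. The optimization problems (76) and (77) are entirely equivalent, but the second formulation is better suited for the tracial non-commutative optimization method of [32] 6 , which we now explain how to apply.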
Let w denote a monomial, i.e. a product, of the n X + 1 + n B n Y basic operators ρ 1 , ρ 2 , . . . , ρ n X , σ, and M 1|1 , . . . , M n B |1 , M 1|2 , . . . , M n B |n Y . We refer to the number k of such basic operators in the product w as the degree of this monomial. By convention, the identity operator 1 is the monomial of degree 0.
Let W k denote the set of all monomials of degree at most k and let n(k) denote the number of such monomials. Linear combinations $p = \sum_{w \in W_k} \alpha_w w$ of the monomials then correspond to polynomials of degree k in the basic operators.
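To get a feeling for how quickly the monomial sets grow, the following sketch enumerates words in the basic operators. It assumes a raw count in which no simplification is applied: the operator labels are hypothetical strings, and the reductions implied by the projector identities of the measurements or by cyclicity of the trace are ignored, so the numbers are upper bounds on the number of distinct monomials that remain after such reductions.

```python
from itertools import product


def basic_operators(n_x, n_y, n_b):
    """Labels for rho_1..rho_nX, sigma and the measurement operators M_b|y."""
    ops = [f"rho_{x}" for x in range(1, n_x + 1)] + ["sigma"]
    ops += [f"M_{b}|{y}" for y in range(1, n_y + 1) for b in range(1, n_b + 1)]
    return ops


def monomials(ops, k):
    """All words of length at most k; the empty word stands for the identity."""
    words = []
    for degree in range(k + 1):
        words.extend(product(ops, repeat=degree))
    return words


ops = basic_operators(3, 2, 2)      # n_X + 1 + n_B * n_Y = 8 basic operators
count_k1 = len(monomials(ops, 1))   # identity plus 8 operators
count_k2 = len(monomials(ops, 2))   # 1 + 8 + 64 words
```

For the scenario (n X , n Y , n B ) = (3, 2, 2) the raw count grows geometrically with k, which is one reason why restricting to well-chosen subsets of monomials is useful in practice.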
Let L be a linear functional that assigns to each monomial w in W 2k a real number L(w), and thus assigns to each polynomial $p = \sum_{w \in W_{2k}} \alpha_w w$ of degree 2k the real number $L(p) = \sum_{w \in W_{2k}} \alpha_w L(w)$. Given such a functional L, we define
• the moment matrix Γ k (L), as the matrix of size n(k) whose entries are indexed by monomials u, v ∈ W k and are equal to

$$[\Gamma_k(L)]_{u,v} = L(u^\dagger v),$$

• the localizing matrix Γ k (L; p) associated to a polynomial p of degree two or less, as the matrix of size n(k − 1) whose entries are indexed by monomials u, v ∈ W k−1 and are equal to

$$[\Gamma_k(L; p)]_{u,v} = L(u^\dagger p\, v).$$

Consider now the following problem for k ≥ 1,

$$\begin{aligned} \max_{L} \quad & \sum_{x,y,b} c_{xyb}\, L(\rho_x M_{b|y}) \\ \text{s.t.} \quad & L(\sigma) \leq G, \quad L(\rho_x) = 1, \\ & \Gamma_k(L) \geq 0, \quad \Gamma_k(L; \rho_x - \rho_x^2) \geq 0, \\ & \Gamma_k(L; \sigma - q_x \rho_x) \geq 0, \quad \Gamma_k(L; G\mathbb{1} - \sigma) \geq 0, \\ & L(p) = L(p') \ \text{ if } \ \mathrm{Tr}[p] = \mathrm{Tr}[p'] \ \text{ for any polynomials } p, p' \text{ of degree } 2k, \end{aligned} \tag{78}$$

where in the last condition the identity Tr[p] = Tr[p′] is evaluated by taking into account the polynomial identities $M_{b|y} M_{b'|y} = \delta_{bb'} M_{b|y}$ and $\sum_b M_{b|y} = \mathbb{1}$ satisfied by the measurement operators. This optimization problem is an SDP, since it amounts to optimising n(2k) variables (the values L(w) of the monomials w of degree at most 2k) subject to linear constraints and to the positivity of matrices whose entries are linearly related to these variables.

6 Specifically, the constraints ρ x − ρ 2 x ≥ 0 imply not only that ρ x ≥ 0 but also that ρ x ≤ 1. Together with the constraint σ ≤ G1 this guarantees that the feasible set of (77) satisfies the archimedean assumption and that the entries of the moment matrices stay bounded. Assuming that the measurements are projective instead of general POVMs spares us from introducing localizing matrices associated with them.
Clearly any solution of (77) defines a solution of (78) through L(w) = Tr[w] 7 . Thus the problem (78) represents an SDP relaxation of (77) approximating the set Q from the outside. By increasing the relaxation level k, one obtains a hierarchy of increasingly constraining conditions on Q.
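Note that the above method can also be used to characterise the set of pure-state quantum correlations Q pure by replacing in (77) the positivity constraints ρ x − ρ 2 x ≥ 0 by the polynomial constraints ρ x = ρ 2 x , resulting in the simpler relaxation

$$\begin{aligned} \max_{L} \quad & \sum_{x,y,b} c_{xyb}\, L(\rho_x M_{b|y}) \\ \text{s.t.} \quad & L(\sigma) \leq G, \quad L(\rho_x) = 1, \\ & \Gamma_k(L) \geq 0, \quad \Gamma_k(L; \sigma - q_x \rho_x) \geq 0, \quad \Gamma_k(L; G\mathbb{1} - \sigma) \geq 0, \\ & L(p) = L(p') \ \text{ if } \ \mathrm{Tr}[p] = \mathrm{Tr}[p'] \ \text{ for any polynomials } p, p' \text{ of degree } 2k, \end{aligned} \tag{79}$$

where the last condition is evaluated using, in addition to the polynomial constraints on the measurement operators, the conditions ρ x = ρ 2 x . We remark that by additionally imposing that all operators commute, we can also bound classical correlations via the above SDPs. This can be useful in scenarios that are too large to be efficiently treated with the methods developed in Section 3.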
Finally, let us stress that the SDPs we introduced are only relaxations: convergence to the exact quantum set is not guaranteed in the limit k → ∞. See [32,33,36] for more details about the general properties of the SDP hierarchy for non-commutative tracial optimization.

Device-independent witnessing of the information content of quantum communication
Consider a quantum communication experiment in which we do not know the amount of information communicated from Alice to Bob. Is it possible to determine a lower bound on the amount of information that Alice must send to Bob given only the observed correlations p(b|x, y)? This amounts to the task of device-independently testing the information content of quantum communication. Using the tools of the previous sections, we exemplify such device-independent certification in the simplest relevant communication experiment.
As we have seen in Section 2.1, there can be no quantum advantage when the scenario only features two states. Moreover, no advantage is possible when Bob only has a single input, because his measurement could then be performed already in Alice's lab and the outcomes simply relayed to Bob as classical communication (since performing a measurement cannot increase the guessing probability). Therefore, the simplest relevant scenario in which we expect a quantum advantage is that in which Alice has three states (n X = 3) and Bob has two binary-outcome measurements (n Y = n B = 2). In this scenario, we focus on the linear witness (46) (here labelled A 322 ) corresponding to a facet of the classical polytope under uniform priors (q x = 1/3).
Firstly, we apply the SDP hierarchy for the set Q pure to find upper bounds on A 322 as a function of the guessing probability (information). We implemented the SDP relaxation (79) with k = 3 but, to simplify the numerical optimisation, considered only a subset of all SDP and linear constraints. Specifically, we only imposed the positivity of the submatrix of Γ 3 (L) whose rows and columns are indexed by the monomials and the linear constraints L(p) = L(p′) involving the entries of such matrices. This corresponds to a 98 × 98 moment submatrix Γ and three 25 × 25 localising submatrices Σ x . Evaluating the corresponding SDPs for different informational restrictions, we obtain the red curve illustrated in Fig. 3. Notably, this upper bound is in fact tight, since it coincides with the explicit pure-state quantum strategy reported in Ref. [21] (thus proving its optimality). Similarly, we have also implemented the SDP hierarchy for the general quantum set Q using submatrices of the localising matrices P x 3 (L) based on the same monomial list (81) as for Σ x 3 (L). The obtained bounds on the witness are given by the blue curve in Fig. 3. We observe that for every guessing probability P g ∈ (1/3, 1) \ {2/3} we find a larger bound in the general setting as compared to the pure-state setting. In order to show that this gap is not an artefact of the bounds in the general setting not being tight, we have employed the alternating convex search described in Section 4.1 to construct explicit quantum models. The obtained values of the witness are illustrated by the black curve in Fig. 3. We find that for P g ∈ [1/3, 2/3], the upper and lower bounds in the general setting accurately coincide. In the interval P g ∈ (2/3, 1) a small gap between the upper and lower bound remains.
Nevertheless, our lower bounds exceed the upper bounds for the pure-state setting, thus proving that informationally restricted quantum correlations outperform their pure-state counterparts. It is interesting to note that for the special case of P g = 2/3, which corresponds precisely to I(X|B) = 1 bit of information, there is no discrepancy between Q and Q pure .

Figure 3: The witness A 322 versus the information (in terms of the guessing probability). The plot displays an upper bound (blue) and lower bound (black) on general quantum models, a tight upper bound on pure-state quantum models (red) and a tight upper bound on classical models (green). As the first two curves coincide in the interval P g ∈ [1/3, 2/3], this part of the quantum boundary is fully characterised. However, in the interval P g ∈ (2/3, 1], the quantum boundary is not fully characterised but delimited by the blue and black curves. Axes: information (guessing probability) and A 322 .
We can interpret these results in the context of device-independent tests of information. If the information content of the quantum communication is not known, then we may use the upper bound on the quantum correlations (blue curve) to determine a bound on the minimal amount of information required to explain the observed correlations in a quantum model. For example, Ref. [37] experimentally implemented this communication experiment using both qubit and qutrit ensembles and reported a witness value of A qubit 322 = 3.7815 ± 0.0782 and A qutrit 322 = 4.9303 ± 0.1032 respectively. In order to determine the information content of these ensembles (without assuming their respective dimensions), we use our upper bounds on the quantum correlations. Specifically, when the experimental errors are taken into consideration, we certify a quantum information content of at least I(X|B) = 0.98 ± 0.02 bits for the first ensemble and I(X|B) = 1.54 ± 0.05 bits for the second ensemble. Both these results nearly saturate the maximal possible information content of qubit and qutrit ensembles, namely 1 bit and log 2 3 bits respectively.
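The conversion between a guessing probability and an information content in bits can be sketched as follows. This assumes that the information is quantified as the difference of min-entropies, I(X|B) = H min (X) − H min (X|B) with H min (X|B) = − log 2 P g , which is consistent with the correspondences quoted in the text (for three equiprobable inputs, P g = 2/3 corresponds to one bit and P g = 1 to log 2 3 bits). The function name is hypothetical.

```python
from math import log2


def information_bits(p_guess, prior):
    """I(X|B) = H_min(X) - H_min(X|B), with H_min(X) = -log2(max prior)
    and H_min(X|B) = -log2(p_guess)."""
    return log2(p_guess) - log2(max(prior))


uniform = [1 / 3, 1 / 3, 1 / 3]
one_bit = information_bits(2 / 3, uniform)   # three inputs, P_g = 2/3
max_bits = information_bits(1.0, uniform)    # perfectly distinguishable inputs
no_bits = information_bits(1 / 3, uniform)   # nothing learned beyond the prior
```

With such a conversion, a value of P g read off from the upper-bound curve for an observed witness value translates directly into a certified number of bits.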

Resource inequalities for one bit of information
Consider the information restriction I(X|B) ≤ α with α = log 2 d for some integer d ≥ 2. This is a particularly interesting case since it enables a meaningful comparison of classical and quantum correlations to those that can be obtained from d-dimensional classical and quantum communication. Here, we focus on the simplest case of d = 2 (I X ≤ 1 bit) and consider the comparative relation between classical and quantum correlations respectively when obtained from i) communication of two-level systems, ii) one bit of communication in pure-state models and iii) one bit of communication in general models. Let us denote the set of classical and quantum correlations achievable with two-dimensional communication by C dim and Q dim . It is clear that the following two chains of inclusions must be true:

$$\begin{aligned} C_{\rm dim} &\subseteq C_{\rm det} \subseteq C, \\ Q_{\rm dim} &\subseteq Q_{\rm pure} \subseteq Q. \end{aligned} \tag{82}$$

The first inclusion in each case follows from the fact that every ensemble of classical or quantum two-level systems can be simulated by classical or quantum ensembles of pure two-level systems under shared randomness 8 . The second inclusion on each line follows trivially from the fact that general classical and quantum models admit deterministic and pure-state models respectively as special cases.
It is interesting to determine which of the inclusions (82) are strict, i.e., which classical and quantum resources are fundamentally different. We first focus on the quantum case and prove that all three resources are inequivalent. Notably, Ref. [21] proved that Q dim ⊂ Q using a construction that involved 16 states. The proofs presented here are simpler, as they only require three states, but inherently different as they are based on biasing the prior probabilities.
Consider again the input/output scenario (n X , n Y , n B ) = (3, 2, 2) and once again the witness A 322 . In the previous section, we saw that for I X ≤ 1 bit (P g ≤ 2/3), there was no discrepancy between the general quantum model and the pure-state quantum model. In addition, if we restrict to qubits, the witness A 322 reduces to that introduced in Ref. [5], whose maximum is known to give the same result again. However, consider now that we change the prior distribution of Alice's inputs: instead of being uniform, let us choose it as q 1 = q 2 = 2/5 and q 3 = 1/5. Since H min (X) = log 2 (5) − 1, the guessing probability corresponding to one bit of information is P g = 4/5. What now are the largest possible values of A 322 under qubits, pure-state models with P g ≤ 4/5, and general models with P g ≤ 4/5?
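The value P g = 4/5 can be checked in one line. The worked step below assumes that the information is the difference of min-entropies, I(X|B) = H min (X) − H min (X|B) with H min (X|B) = − log 2 P g , which is the reading consistent with the numbers quoted above.

```latex
\begin{aligned}
H_{\min}(X) &= -\log_2 \max_x q_x = -\log_2 \tfrac{2}{5} = \log_2 5 - 1, \\
P_g &= 2^{\,I(X|B) - H_{\min}(X)} = 2^{\,1 - (\log_2 5 - 1)} = \frac{4}{5}.
\end{aligned}
```

Equivalently, one bit of information doubles the best prior guess, max x q x = 2/5, whereas for uniform priors the same bit corresponds to P g = 2/3.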
Since biasing the prior affects the information constraint but not the dimension of the physical system, it follows that the largest value of A 322 remains unaffected when evaluated over qubits. We have

$$A_{322} \leq 1 + 2\sqrt{2} \approx 3.8284, \tag{83}$$

which is a tight bound. However, in the case of pure-state models and general quantum models, biasing the prior means that Bob already has some knowledge of Alice's input. Thus, we would intuitively expect that the correlations improve as compared to the unbiased case. This intuition can be proven using the tools from the previous sections. Evaluating the respective semidefinite relaxations of the set of quantum correlations, we find that

We use alternating convex search to place a lower bound on the witness in the stochastic case: for qubits we achieve A 322 = 3.8284 (saturating (83)), for qutrits we achieve A 322 = 4.2641 and for ququarts we achieve A 322 = 4.4142. The ququart strategy uses one pure state and two mixed states each with spectra (1/2, 1/2, 0, 0). The lower bound obtained with ququarts is sufficient to outperform pure-state quantum models and conclude that Q pure ⊂ Q. Moreover, in order to also show that Q dim ⊂ Q pure , it is sufficient to note that the following strategy based on pure-state quantum communication outperforms the qubit bound. Let Alice prepare the qutrit states

It is easily checked (e.g. via an SDP) that the guessing probability is P g = 4/5. Then, let Bob perform compatible measurements {|3⟩, |1 + 2⟩} and {|2⟩, |1 + 3⟩}. Then, one finds A 322 = 4, which exceeds the qubit bound.
Let us now consider the same problem with classical resources. Using the tools from Section 3, we can straightforwardly show the tight inequalities which immediately assert that informationally restricted classical correlations are more powerful than dimensionally restricted classical correlations; specifically C dim ⊂ C det . However, it still does not determine whether C det is a strict subset of C for one bit of information. This is left as an open problem.

Semi-device-independent random number generation
In the previous section, we have seen how quantum correlations can be bounded in communication experiments in which the only assumption is a bound on the amount of information that the communication carries. Here, we leverage these methods towards application in semi-device-independent RNG. In a first example, we focus on the facet-defining witness A 322 and compute the certified randomness as a function of the information. This allows us to obtain a nearly optimal RNG rate. In a second example, we consider the case of one bit of information and consider the amount of randomness that can be robustly certified under a conventional qubit assumption as compared to that certified under an information assumption. We show that the correlations used in a standard qubit experiment can be recycled to certify the same amount of randomness when the assumption is relaxed to the strictly weaker information assumption.

Randomness versus information
Let us again consider the witness A 322 in (a general) quantum model. In Section 4.3 we obtained the maximal quantum witness value for any information between zero and one bit, corresponding to a guessing probability P g ∈ [1/3, 2/3]. Here, we evaluate the extractable randomness in the output of Bob associated to such maximal quantum witness values. Specifically, we consider that Alice and Bob decide to extract randomness from the event corresponding to Alice's third input (x = 3) and Bob's first input (y = 1). Then, the certified randomness is given by the min-entropy H min = − log 2 p * , where p * = max{p(1|3, 1), p(2|3, 1)}, compatible with the observed maximal value of A 322 9 . Using the introduced semidefinite relaxations, we can place an upper bound on p * which translates into a lower bound on the certified randomness. The results are illustrated in Fig. 4. These results can also be accurately matched with upper bounds on the randomness obtained via the alternating convex search method (see Section 4.1). Hence, the bound on the certified randomness is tight (up to solver precision). In Fig. 4, we see that by suitably tuning the information in Alice's communication, one can obtain nearly one bit of randomness (which is algebraically maximal for binary-outcome measurements). Specifically, at P g ≈ 0.522 we certify approximately 0.995 bits of randomness. Hence, we conclude that nearly optimal randomness can be certified under the information assumption. Notably, for P g ≈ 2/3, the randomness vanishes. This is due to our choice of setting (x = 3, y = 1). A substantial amount of randomness can be certified for P g ≈ 2

Qubits versus one bit of information
We investigate the comparison between certified randomness under the conventional assumption of qubits and our assumption of informational restriction. This comparison is only meaningful for one bit of information; to which we therefore restrict ourselves. To this end, we focus on a witness that has previously been employed for RNG in dimension bounded systems [38,39], namely a quantum random access code.
In a quantum random access code, Alice receives one of four possible inputs labelled by two bits x = x 1 x 2 ∈ {1, 2} ×2 while Bob has two possible inputs y ∈ {1, 2} and two possible outputs b ∈ {1, 2}. The correlation witness is defined as

We analyse this witness in two scenarios, i) Alice sends qubits to Bob (dimension assumption) and ii) Alice sends at most one bit of information to Bob (information assumption). Naturally, since all qubit ensembles carry at most one bit of information, while many higher dimensional ensembles also carry no more than one bit of information, the information assumption is less restrictive than the dimension assumption. It is well known that the optimal value of A RAC using qubits is 1/√2 [16]. Using the tools from Section 4, we find that A RAC = 1/√2 is also the largest possible value under one bit of information.
Due to the symmetries of the witness A RAC , the choice of event from which randomness is extracted does not influence the amount of randomness certified. We therefore choose the event (x, y) = (1, 1) and employ semidefinite relaxations for informationally restricted quantum correlations to place a lower bound on the randomness as a function of the witness. The results are illustrated in Fig. 5. A nearly optimal value of the witness certifies over 0.2 bits of randomness while also significantly sub-optimal witness values permit a non-zero amount of certified randomness. Then, we consider the same problem under the assumption of qubit communication. To this end, we have used the symmetrised semidefinite relaxation hierarchy of Refs. [8,40]. Up to solver precision, we certify the same amount of randomness as is obtained under the information assumption, i.e. the curve is identical to that displayed in Fig. 5. Moreover, the obtained lower bounds on the randomness are optimal since we can saturate them with an explicit family of quantum models based on qubits. Hence, we conclude that the quantum random access code allows us to certify the same amount of randomness under the strictly weaker assumption of informational restriction as compared to the dimension bounded scenario, while only requiring the experimental realisation of standard qubit strategies.

Conclusion
In this article, we have investigated classical and quantum correlations limited only by the information content of the corresponding classical and quantum communication. This constitutes a departure from conventional dimension bounded communication in favour of an analysis based on entropic quantities. We have presented a complete characterisation of informationally restricted classical correlations in terms of linear programming, thereby generalising the results of [21] based on deterministic communication models. For the set of informationally restricted quantum correlations, we have both developed efficient interior-point search methods and hierarchies of semidefinite relaxations for placing upper bounds on the set. We have applied these tools to device-independently witness the amount of information carried by a classical and quantum ensemble as well as to establish strict resource inequalities for different information resources. Furthermore, we have outlined a new avenue for semi-device-independent quantum information processing based on the information assumption. This was exemplified through the investigation of semi-device-independent random number generation for which we both reported nearly optimal rates and advantages over dimension bounded systems. The results presented in this work provide important tools for analysing informationally restricted classical and quantum correlations.
Our work leaves a number of open problems, some of which we list here. 1) How tight are the bounds obtained through our semidefinite hierarchy for informationally restricted quantum correlations? Can one introduce a semidefinite hierarchy that provably converges to the quantum set? 2) Is there a strict resource inequality for informationally restricted classical correlations for the deterministic versus general communication models? 3) It would be interesting to consider the experimental implementation of semi-device-independent random number generation based on the information assumption. 4) Are there other semi-device-independent protocols that are practical to base on the information assumption? Two obvious candidates to consider for this purpose are quantum key distribution and self-testing.
Finally, we note that the information-restricted approach also can be used as a relaxation method to bound correlations in prepare-and-measure experiments subject to other assumptions, for which methods to bound the set of quantum correlations are not known.
is trivial, Alice's messages depend only on x, and if the guessing probability bound G is anything strictly less than one then the only possibility is that Alice sends the same message in both cases, in which case the resulting probabilities must be the same. Therefore without shared randomness, the correlations set collapses to a line E 1 = E 2 .
In the following we suppose that q 1 > q 2 without loss of generality. If shared randomness is available, then Alice can sometimes send the same message (with associated guessing probability q 1 ) and sometimes send different messages (with guessing probability one) as long as the average guessing probability remains smaller than G.
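The trade-off between the two kinds of rounds can be sketched numerically. The sketch assumes that rounds with identical messages (guessing probability q 1 ) force E 1 = E 2 = ±1 at the extremes, that rounds with distinct messages (guessing probability one) allow any sign pattern (±1, ±1), and that the guessing probability bound is saturated on average. The function names and the example values q 1 = 3/5, G = 4/5 are hypothetical.

```python
from fractions import Fraction


def same_message_weight(q1, G):
    """Weight theta on 'same message' rounds such that
    theta * q1 + (1 - theta) * 1 == G."""
    return (1 - G) / (1 - q1)


def candidate_points(q1, G):
    """Mixtures of one extremal 'same message' point and one extremal
    'distinct messages' point, with weights theta and 1 - theta."""
    theta = same_message_weight(q1, G)
    same = [(1, 1), (-1, -1)]
    free = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    return {tuple(theta * s + (1 - theta) * f for s, f in zip(a, b))
            for a in same for b in free}


q1, G = Fraction(3, 5), Fraction(4, 5)
theta = same_message_weight(q1, G)
points = candidate_points(q1, G)
```

Under these assumptions the non-redundant candidates are (±1, ±1) with equal signs together with points of the form (2θ − 1, 1) and (1, 2θ − 1) and their negatives; the remaining candidates (2θ − 1, 2θ − 1) and its negative lie inside the convex hull of the others.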
(153) The extremal probabilities that Bob can generate overall are combinations of (152) with some probability θ and (153) with some probability 1 − θ. We should use which are chosen such that in order to respect the guessing probability bound of G on average. (We could make these inequalities rather than equalities, but this is unnecessary since (153) includes the two extremal points in (152) and any excess in the value of θ could be absorbed into that.) After eliminating two redundant ones this yields six vertices, (E 1 , E 2 ) = (+µ, +1), (+1, +1), (+1, +µ), (−µ, −1), with All probabilities represented by values (E 1 , E 2 ) in this scenario must be convex combinations of these six vertices. In addition to the trivial conditions |E x | ≤ 1, this implies two facet inequalities,

B.2 Characterisation of Q det
The problem is very similar to a quantum set studied in Section 3.1 in [17]. For pure states, the guessing probability associated to the ensemble E = {q 1 , ψ 1 ; q 2 , ψ 2 } is

$$P_g(X|\mathcal{E}) = \frac{1}{2} + \frac{1}{2}\sqrt{1 - 4 q_1 q_2 |\langle\psi_1|\psi_2\rangle|^2}. \tag{159}$$

Assuming the guessing probability satisfies P g (X|E) ≤ G for some bound G and rearranging for the inner product gives

$$|\langle\psi_1|\psi_2\rangle| \geq \sqrt{\frac{G(1-G)}{q_1 q_2}}. \tag{160}$$

In the following we derive what this implies for a linear combination of correlation terms E x = Tr[Eψ x ] with −1 ≤ E ≤ 1.
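As a numerical sanity check of the relation between the overlap and the guessing probability, the following sketch evaluates the two-state guessing probability for pure states and the smallest overlap compatible with a bound G, obtained by solving P g ≤ G for the inner product. The function names are hypothetical, and the inversion assumes max(q 1 , q 2 ) ≤ G ≤ 1 so that the square roots are real and the overlap does not exceed one.

```python
from math import sqrt


def pure_state_guessing_probability(q1, overlap):
    """P_g = 1/2 + 1/2 * sqrt(1 - 4 q1 q2 |<psi1|psi2>|^2) with q2 = 1 - q1."""
    q2 = 1 - q1
    return 0.5 + 0.5 * sqrt(1 - 4 * q1 * q2 * overlap ** 2)


def minimal_overlap(q1, G):
    """Smallest |<psi1|psi2>| compatible with P_g <= G."""
    q2 = 1 - q1
    return sqrt(G * (1 - G) / (q1 * q2))


orthogonal = pure_state_guessing_probability(0.5, 0.0)  # perfectly distinguishable
identical = pure_state_guessing_probability(0.5, 1.0)   # only the prior helps
s_min = minimal_overlap(0.6, 0.8)
round_trip = pure_state_guessing_probability(0.6, s_min)
```

The round trip returns the bound G, and setting G equal to the larger prior gives an overlap of one, i.e. identical states.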
We remark first that the witness W is trivial if the coefficients c 1 and c 2 are not of the same sign because the positivity constraints E x ≤ 1 alone imply which is trivially attained with E 1 = E 2 = ±1 when the coefficients are of opposite signs. We thus concentrate on the case that c 1 and c 2 are both of the same sign. In the rest of this section we suppose without loss of generality that c 1 and c 2 are nonnegative and that c 1 + c 2 = 1. We also suppose for simplicity that q 1 ≥ q 2 .
Bounding W with c 1 and c 2 taken to have the same sign gives where we substituted c 1 + c 2 = 1 in the last line. Combining this with the bound (160) on |⟨ψ 1 |ψ 2 ⟩| in terms of G gives The inequality (164) gives a tight upper bound on the witness to the left in terms of the guessing probability assuming Alice sends one of two pure states |ψ x ⟩ with probabilities q x . To generalise to allow shared randomness we need to take the convex hull of the right side of (164). Fortunately this is straightforward. The right side of (164) is convex if and concave otherwise; this can be determined by computing the second derivative of the family of functions f Q (x) = 1 − 4Qx(1 − x).

and eliminating the µ mx s gives $\gamma - \sum_x q_x \xi_x = 0$, $\beta + \sum_x u_x (\alpha_{mx} + \xi_x) \geq 0$, ∀m, u x ∈ {0, 1}. (199) The last step would consist of eliminating the ξ x s. Note first that the instance u x = 0 for all x of (199) gives an inequality β ≥ 0 (200) which does not involve any of the variables ξ x . This corresponds to the (unique) conic generator (p(m|x), G) = (0, 1) of the polyhedron M + which, in turn, just expresses a property of M + that was already evident from its definition: we can increase the guessing probability bound component G of any point (p(m|x), G) in M + , by adding any nonnegative multiple of (201) to it, and the resulting point will still be in M + . For the remaining instances of (199), we seek to bound the maximum number of different values of the message index m that can appear in any inequality in the process of eliminating the n X variables ξ x . We rewrite the problem as to make it clear that the initial inequalities (203) all give lower bounds on the ξ x s and the problem can be seen as combining (202) with sums of instances of (203) such that the left side equals $\sum_x q_x \xi_x$.
Eliminating first one of the ξ x s, which consists of combining (202) with all the instances of (203) in which the chosen variable ξ x appears with a nonzero coefficient u x , yields a system of inequalities that each involve only one value of m. The process of eliminating the remaining n X − 1 variables ξ x can then at worst double the number of different values of m appearing in the inequalities at each step. The inequalities we obtain for γ, α mx , and β at the end of this process can thus not involve more than $2^{n_X - 1}$ different values of the index m. With the exception of (200) we can write all of them in the form from which we infer that the vertices of M + are strategies (p(m|x), G) in which no more than $2^{n_X - 1}$ different messages m are used in each strategy, i.e., in a matrix notation At this point we remark that we have not restricted the number of messages m used overall, which is simply whatever number n M of different values of m we allow to appear in the problem from the beginning, since the $2^{n_X - 1}$ messages used in each vertex will generally be different for each vertex. Remember, however, that we are not interested in the communication strategies represented by M + themselves but the extremal correlations $p(b|x, y) = \sum_m p(m|x)\, p(b|y, m)$ that can ultimately be generated with them, which also depend on Bob's extremal responses p(b|y, m), and we can use a symmetry of the setting to reduce the communication strategies we need to consider. In particular, both the sets of extremal communication strategies (p(m|x), G) and of Bob's extremal responses {p(b|y, m)} are symmetric with respect to relabellings of the messages, under which (206) is also invariant. We can hence limit the number of messages n M we need to consider to $2^{n_X - 1}$ for the purpose of generating the extremal points (p(b|x, y), G) of C + , as allowing more messages will only result in more ways of generating the same correlations p(b|x, y) through (206).