No-free-information principle in general probabilistic theories

In quantum theory, the no-information-without-disturbance and no-free-information theorems express that those observables that do not disturb the measurement of another observable and those that can be measured jointly with any other observable must be trivial, i.e., coin tossing observables. We show that in the framework of general probabilistic theories these statements do not hold in general and continue to completely specify these two classes of observables. In this way, we obtain characterizations of the probabilistic theories where these statements hold. As a particular class of state spaces we consider the polygon state spaces, in which we demonstrate our results and show that while the no-information-without-disturbance principle always holds, the validity of the no-free-information principle depends on the parity of the number of vertices of the polygons.

in [4] but never fully investigated in all probabilistic theories. The no-free-information principle seems to have not been investigated at all in any other theory than quantum theory.
Amongst these principles, the no-free-information principle is conceptually the strongest and no-broadcasting the weakest. If the no-free-information principle is valid in some GPT, that is, for every non-trivial observable there exists another observable incompatible with it, then the no-information-without-disturbance principle must also be valid, as a non-disturbing observable would be compatible with every other observable. Furthermore, if the no-information-without-disturbance principle is valid, and hence no non-trivial observable is non-disturbing, then the no-broadcasting principle has to hold; otherwise we could use the broadcasting map to create non-trivial non-disturbing observables.
We will define three classes of observables: the first consists of those observables that always yield a constant outcome independent of the measured state, the second of those observables that can be measured without any disturbance, and the third of those observables that are compatible with any other observable. We will then characterize these classes, enabling us to show that the properties are different in some GPTs. We will also derive a necessary and sufficient criterion for a GPT to have both the no-information-without-disturbance principle and the no-free-information principle valid. Finally, we demonstrate the difference between the three principles by analyzing them in polygon state spaces. The main results of our investigation are summarized in Fig. 1.

II. MOTIVATING EXAMPLE
In this section we will present a simple example to motivate our current investigation. A proper mathematical formulation of the general framework will follow in later sections; in the following example we are going to work with the set $B_h(H)$ of self-adjoint matrices over a finite-dimensional Hilbert space $H$. We denote by $1$ the identity matrix and by $0$ the zero matrix. For $A \in B_h(H)$, we write $A \geq 0$ if $A$ is positive-semidefinite. Let $A, B \in B_h(H)$. If $A \geq 0$ and $\mathrm{Tr}(A) = 1$, then $A$ is a state, and if $0 \leq B \leq 1$, then $B$ is an effect. We refer the reader to [5] for a more thorough treatment of states and effects and their operational meanings in quantum theory.
Imagine that we have an imperfect state preparation device that is meant to prepare qubits in a state $\rho$, but may malfunction and prepare a qutrit in a state $\sigma$. Moreover, we assume that the machine malfunctions with probability $p_e$, so the final state should be a mixture of $\rho$ and $\sigma$ with probabilities $1 - p_e$ and $p_e$, respectively. This means that the machine is going to output a state $\Psi$ that should formally be given as $\Psi = (1 - p_e)\rho + p_e\sigma$. But how does one understand the mixture of the $2 \times 2$ matrix $\rho$ and the $3 \times 3$ matrix $\sigma$? And how does one describe the output state space of such a machine?
Qubits are effectively spin-$\frac{1}{2}$ systems and qutrits spin-1 systems, hence the joint Hilbert space $H$ containing both representations of the group SU(2) is going to be 5-dimensional and divided into two superselection sectors [6] of dimensions 2 and 3, corresponding to the qubit and the qutrit respectively. The output state $\Psi$ is going to be a block-diagonal $5 \times 5$ matrix given as
$$\Psi = \begin{pmatrix} (1 - p_e)\rho & 0 \\ 0 & p_e\sigma \end{pmatrix}.$$
This is hardly a surprise, rather a known property of the superselection sectors. Yet this opens the questions of whether this is the only case when an observable is compatible with every other observable; whether no-information-without-disturbance still holds; and whether an observable does not disturb any other observables if it is compatible with them all.
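The block-diagonal form of the output state can be illustrated with a small numerical sketch. The helper names `block_diag_mixture` and `trace`, the choice of maximally mixed inputs and the value $p_e = 1/10$ are assumptions made only for illustration; they do not come from the text.

```python
from fractions import Fraction as F

def block_diag_mixture(rho, sigma, p_e):
    """Return the block-diagonal matrix diag((1 - p_e) * rho, p_e * sigma)."""
    n, m = len(rho), len(sigma)
    psi = [[F(0)] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            psi[i][j] = (1 - p_e) * rho[i][j]
    for i in range(m):
        for j in range(m):
            psi[n + i][n + j] = p_e * sigma[i][j]
    return psi

def trace(mat):
    return sum(mat[i][i] for i in range(len(mat)))

# Hypothetical inputs: maximally mixed qubit and qutrit states, p_e = 1/10.
rho = [[F(1, 2), F(0)], [F(0), F(1, 2)]]
sigma = [[F(1, 3) if i == j else F(0) for j in range(3)] for i in range(3)]
p_e = F(1, 10)
psi = block_diag_mixture(rho, sigma, p_e)

# Weight carried by the qubit sector (upper-left 2 x 2 block).
qubit_weight = sum(psi[i][i] for i in range(2))
```

In this sketch the total trace stays equal to one, while the trace of each block gives the probability of finding the system in the corresponding sector.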
As we saw in this example, we need to at least describe the set of states containing only block-diagonal matrices. For this reason we will work in the GPT formalism, as it will provide a unified, cleaner and better suited apparatus for our calculations.

III. PRELIMINARIES
In the GPT framework we assume that a state space is convex, as we want to interpret convex combinations as mixtures of states. To describe observables, we will introduce effects as functions that assign probabilities to states.

A. Structure of general probabilistic theories
A state space $S$ is a compact convex subset of an ordered real finite-dimensional vector space $V$ such that $S$ is a compact base for a generating positive cone $V_+ = \{x \in V : x \geq 0\}$. Let $V^*$ denote the dual vector space to $V$, then the effect algebra $E(S) \subset V^*$ is the set of linear functionals $e : V \to \mathbb{R}$ such that $0 \leq e(x) \leq 1$ for every $x \in S$. The zero and the unit effects $o \in E(S)$ and $u \in E(S)$ are the unique effects satisfying $o(x) = 0$ and $u(x) = 1$ for all $x \in S$.
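As a minimal sketch of the definition of an effect, the following checks the condition $0 \leq e(x) \leq 1$ on a polytope state space. The square state space, the coefficient-tuple representation of functionals, and the names `evaluate` and `is_effect` are assumptions chosen for illustration only.

```python
from fractions import Fraction as F

def evaluate(effect, state):
    """Apply a linear functional, stored as a coefficient tuple, to a state."""
    return sum(e * s for e, s in zip(effect, state))

def is_effect(effect, vertices):
    """A linear functional attains its extrema on a polytope at the vertices,
    so it suffices to check 0 <= e(x) <= 1 on the extreme points."""
    return all(0 <= evaluate(effect, v) <= 1 for v in vertices)

# Hypothetical example: a square state space embedded in the plane z = 1.
square = [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (1, -1, 1)]
u = (0, 0, 1)                # unit effect
o = (0, 0, 0)                # zero effect
e = (F(1, 2), 0, F(1, 2))    # e(x, y, 1) = (x + 1) / 2
not_effect = (1, 0, 1)       # takes the value 2 at the vertex (1, 1, 1)
```

Exact rational arithmetic is used so that the boundary cases $e(x) = 0$ and $e(x) = 1$ are not affected by rounding.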
The state space can be expressed as
$$S = \{x \in V_+ : u(x) = 1\},$$
i.e. as an intersection of the positive cone $V_+$ and an affine hyperplane determined by the unit effect $u$ on $V$. Similarly we can define subnormalised states as
$$S^{\leq 1} = \{x \in V_+ : u(x) \leq 1\}.$$
If $\dim(\mathrm{aff}(S)) = d$, we say that the state space $S$ is $d$-dimensional, and then we can choose $V$ such that $\dim(V) = \dim(V^*) = d + 1$. It follows that the effects can be expressed as linear functionals on $V$ such that
$$E(S) = \{e \in V^* : o \leq e \leq u\},$$
where the partial order in the dual space is the dual order defined by the positive dual cone $V^*_+ = \{f \in V^* : f(x) \geq 0 \text{ for all } x \in V_+\}$ of $V_+$. In fact $E(S)$ is then just the intersection of the positive dual cone $V^*_+$ and the set $u - V^*_+$.
We say that a non-zero effect $e \in E(S)$ is indecomposable if a decomposition $e = e_1 + e_2$ for some effects $e_1, e_2 \in E(S)$ is possible only if $e_1$ and $e_2$ are positive scalar multiples of $e$ [7]. The indecomposable effects are exactly the ones that lie on the extreme rays of the positive dual cone $V^*_+$.
When dealing with systems composed of several systems we have to prescribe a procedure for how to construct a joint state space of the composed system. Mathematically, this amounts to specifying a tensor product. We are going to use a tensor product only in cases where the other state space is classical. Therefore, there is a unique choice known as the minimal tensor product [8].
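Definition 1. Let $S_1, S_2$ be state spaces, then their minimal tensor product, denoted as $S_1 \dot{\otimes} S_2$, is given as
$$S_1 \dot{\otimes} S_2 = \mathrm{conv}\left(\{x_1 \otimes x_2 : x_1 \in S_1, x_2 \in S_2\}\right).$$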

B. Observables and channels
In this section we will introduce the main objects of interest to us: observables, channels and compatibility. We will begin with observables and their compatibility, and build our way towards channels.

Definition 2. An observable $A$ with a finite outcome set $\Omega_A$ on a state space $S$ is a mapping $A : x \mapsto A_x$ from the outcome set $\Omega_A$ to the set of effects $E(S)$ such that $\sum_{x \in \Omega_A} A_x = u$. The set of observables on $S$ is denoted by $O(S)$. For each $A \in O(S)$ we refer to $\Omega_A$ as the outcome set of $A$.
Let $A, B \in O(S)$ with respective outcome sets $\Omega_A, \Omega_B$. We say that $B$ is a post-processing of $A$, denoted by $A \to B$, if there is a right-stochastic matrix $\nu$ with elements $\nu_{xy}$, $x \in \Omega_A$, $y \in \Omega_B$, $0 \leq \nu_{xy} \leq 1$, $\sum_{y \in \Omega_B} \nu_{xy} = 1$, such that
$$B_y = \sum_{x \in \Omega_A} \nu_{xy} A_x \quad \text{for all } y \in \Omega_B,$$
in which case we also write $B = \nu \circ A$. The operational interpretation is straightforward: we have $A \to B$ only if we can obtain the probabilities given by $B$ from the probabilities given by $A$. The condition $\sum_{y \in \Omega_B} \nu_{xy} = 1$ follows from $\sum_{y \in \Omega_B} B_y = u$.
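The post-processing relation can be sketched in a few lines, with effects stored as coefficient tuples and the matrix $\nu$ indexed as `nu[x][y]`. The classical trit example and the helper names `is_right_stochastic` and `post_process` are assumptions made for illustration.

```python
from fractions import Fraction as F

def is_right_stochastic(nu):
    """Every row has entries in [0, 1] and sums to one."""
    return all(all(0 <= v <= 1 for v in row) and sum(row) == 1 for row in nu)

def post_process(nu, A):
    """Return B with B_y = sum_x nu[x][y] * A_x; effects are coefficient tuples."""
    dim = len(A[0])
    return [
        tuple(sum(nu[x][y] * A[x][k] for x in range(len(A))) for k in range(dim))
        for y in range(len(nu[0]))
    ]

# Hypothetical example on a classical trit: states are probability vectors in R^3,
# the unit effect is u = (1, 1, 1) and A_x reads out the x-th coordinate.
A = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
nu = [[1, 0], [F(1, 2), F(1, 2)], [0, 1]]
B = post_process(nu, A)

# The effects of B again sum to the unit effect.
total = tuple(sum(col) for col in zip(*B))
```

Here the middle outcome of $A$ is relabelled at random, so $B$ is a coarser, noisier version of $A$ that is still normalised.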
Compatibility of observables and of observables and channels will play a central role in our calculations.
Definition 4. Let $S_1, S_2$ be state spaces. An operation is an affine map $\Psi : S_1 \to S_2^{\leq 1}$. A channel is an affine map $\Phi : S_1 \to S_2$. The set of channels from $S_1$ to $S_2$ is denoted by $C(S_1, S_2)$, and in the special case where $S_1 = S_2 \equiv S$ we denote it by $C(S)$.
In quantum theory we also require channels to be completely positive, but we omit this within GPTs as in general it is problematic to specify what complete positivity means.
Let S be a state space and let A ∈ O(S) with an outcome set Ω A of n elements. We can identify the points of Ω A with the extreme points of a simplex, which allows us to form convex combinations of the points of Ω A . Moreover we will denote this simplex P(Ω A ) and its extreme points δ 1 , . . . , δ n as they correspond to classical measures on Ω A supported on a single point. Now we can see the observable A as a channel A ∶ S → P(Ω A ). Furthermore, a post-processing ν can be seen as a channel mapping the classical state spaces corresponding to outcome sets of observables.
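As mentioned above, similarly to the compatibility of measurements, we can introduce the compatibility of a measurement and a channel. The central role is going to be played by a generalization of the partial trace, which is as follows: let $S_1, S_2$ be state spaces and let $x \in S_1 \dot{\otimes} S_2$, then by definition we have
$$x = \sum_{i=1}^{n} \lambda_i \, y_i \otimes z_i$$
for some $y_i \in S_1$, $z_i \in S_2$ and $\lambda_i \geq 0$ with $\sum_{i=1}^{n} \lambda_i = 1$. We then define the maps $u_1 : S_1 \dot{\otimes} S_2 \to S_2$ and $u_2 : S_1 \dot{\otimes} S_2 \to S_1$ as
$$u_1(x) = \sum_{i=1}^{n} \lambda_i u(y_i) z_i = \sum_{i=1}^{n} \lambda_i z_i, \qquad u_2(x) = \sum_{i=1}^{n} \lambda_i u(z_i) y_i = \sum_{i=1}^{n} \lambda_i y_i.$$
The maps $u_1, u_2$ are direct generalizations of partial traces. The definitions may easily be generalized to entangled states as well, but this is beyond the scope of what we will need in later calculations.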
where ○ denotes the composition of maps.
If the channel Φ were an observable, we would obtain a definition of compatibility of observables which can be shown to be equivalent to Def. 3; see [9]. In a similar fashion one may also formulate the definition of compatibility of channels [10].
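We will start with a simple lemma for the compatibility of an observable and a channel.
Proof. Let $\Omega_A = \{\delta_1, \ldots, \delta_n\}$ denote the outcome space with $n$ points. Moreover, let $b_1, \ldots, b_n$ denote the dual base of affine functions $P(\Omega_A) \to \mathbb{R}$, such that $b_i(\delta_j) = 1$ if $i = j$ and $b_i(\delta_j) = 0$ otherwise. If $\Phi$ and $A$ are compatible, then there exists a channel $\tilde{\Phi} : S \to S \dot{\otimes} P(\Omega_A)$ such that $\Phi = u_2 \circ \tilde{\Phi}$ and $A = u_1 \circ \tilde{\Phi}$. We can write
$$\tilde{\Phi} = \sum_{i=1}^{n} \sum_{j \in J} f_{ij} \otimes \psi_j \otimes \delta_i$$
for some $f_{ij} \in V^*$ and $\psi_j \in V$, with $j$ ranging over a finite index set $J$. Denote $\Phi_i = \sum_{j \in J} f_{ij} \otimes \psi_j$ and notice that the $\Phi_i$ are linear maps $V \to V$.
Since $\tilde{\Phi}$ must be a channel, $b_i \circ \tilde{\Phi}$ must also be a positive map, and since $b_i \circ \tilde{\Phi} = \Phi_i$, we see that the $\Phi_i$ are positive maps. Since $\tilde{\Phi}$ is a joint channel of $\Phi$ and $A$ we must have
$$\sum_{i=1}^{n} \Phi_i = \Phi, \qquad (1)$$
$$u \circ \Phi_i = A_i \quad \text{for all } i \in \{1, \ldots, n\}. \qquad (2)$$
Conversely, if there are operations $\Phi_i$ satisfying (1) and (2), then define $\tilde{\Phi} = \sum_{i=1}^{n} \Phi_i \otimes \delta_i$. Positivity and normalisation of $\tilde{\Phi}$ follow from the positivity of the $\Phi_i$ and (2). The fact that $\tilde{\Phi}$ is a joint channel of $\Phi$ and $A$ follows from (1) and (2).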

IV. FORMULATION OF THE TWO PRINCIPLES
The purpose of measuring an observable is to learn something about the input state via the obtained measurement outcome probability distribution. An observable is called trivial if it cannot provide any information on input states. More precisely, this means that a trivial observable $T$ assigns the same measurement outcome probability distribution to all states, i.e., $T = pu$ for some probability distribution $p$ on $\Omega_T$. Physically speaking, a measurement of a trivial observable can be implemented simply by rolling a die and producing a probability distribution independently of the input state. We denote by $T_1$ the set of all trivial observables, i.e.,
$$T_1 = \{T \in O(S) : T_x = p(x)u \text{ for some probability distribution } p \text{ on } \Omega_T\}.$$
From the simple structure of trivial observables it follows that any such observable is compatible with every other observable. Formally, if $T = pu$ is a trivial observable and $A$ is some other observable, then we can define an observable $J_{T,A}$ with effects $J_{T,A}(x, y) = p(x)A_y$, and we have $\sum_x J_{T,A}(x, y) = A_y$ and $\sum_y J_{T,A}(x, y) = T_x$.
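The joint observable of a trivial observable and an arbitrary observable can be checked numerically. The following is a minimal sketch on a hypothetical square state space; the dichotomic observable $A = \{e, u - e\}$, the distribution $p$ and the helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

def scale(c, effect):
    return tuple(c * e for e in effect)

def add(effects):
    return tuple(sum(col) for col in zip(*effects))

def joint_with_trivial(p, A):
    """J[x][y] = p(x) * A_y, the joint observable of T = p u and A."""
    return [[scale(px, Ay) for Ay in A] for px in p]

# Hypothetical data: effects on a square state space in the plane z = 1.
u = (0, 0, 1)
e = (F(1, 2), 0, F(1, 2))
A = [e, tuple(ui - ei for ui, ei in zip(u, e))]   # dichotomic observable {e, u - e}
p = [F(1, 3), F(2, 3)]                            # distribution defining T = p u

J = joint_with_trivial(p, A)
marginal_A = [add([J[x][y] for x in range(len(p))]) for y in range(len(A))]
marginal_T = [add(J[x]) for x in range(len(p))]
```

Summing over the first index recovers $A$, and summing over the second index recovers $T_x = p(x)u$, which is exactly the pair of marginal conditions stated above.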
Furthermore, a trivial observable is compatible with every channel. Namely, if $T = pu$ is a trivial observable and $\Phi$ is a channel, then we can define operations $\Phi_i : S \to V_+$ as $\Phi_i = p(i)\Phi$ for all $i \in \Omega_T$. Clearly, then $\sum_{i \in \Omega_T} \Phi_i = \Phi$ and $u \circ \Phi_i = p(i)\, u \circ \Phi = p(i)u = T_i$ for all $i \in \Omega_T$, so that by Lemma 1 we conclude that $T$ and $\Phi$ are compatible.
These two features of trivial observables raise natural questions: are there observables other than trivial ones that have these features? If so, what is the structure of such observables? As we have seen in Sec. II, the answer to the first question is affirmative, hence the second question urges an investigation.
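To properly analyze the two mentioned features, we consider them as independent properties that determine a subclass of observables. Hence, for a state space $S$, we define the following subsets of observables:
$$T_2 = \{T \in O(S) : T \text{ is compatible with the identity channel } id\},$$
$$T_3 = \{T \in O(S) : T \text{ is compatible with every } A \in O(S)\}.$$
Since compatibility with $id$ means that $T$ can be measured while every state is left unchanged, we conclude that $T_2$ is the set of observables that can be measured without causing any disturbance.
Now, suppose that $T \in T_2$, so there exist operations $\Phi_i : S \to V_+$ such that $\sum_{i \in \Omega_T} \Phi_i = id$ and $u \circ \Phi_i = T_i$ for all $i \in \Omega_T$. If $A \in O(S)$, we define a joint observable $G$ of $A$ and $T$ by $G_{ij} = A_j \circ \Phi_i$ for all $i \in \Omega_T$ and $j \in \Omega_A$. We then see that
$$\sum_{i \in \Omega_T} G_{ij} = A_j \circ \sum_{i \in \Omega_T} \Phi_i = A_j, \qquad \sum_{j \in \Omega_A} G_{ij} = u \circ \Phi_i = T_i$$
for all $i \in \Omega_T$ and $j \in \Omega_A$. Thus, $A$ and $T$ are compatible, and since $A$ was an arbitrary observable, it follows that $T \in T_3$. We conclude that
$$T_1 \subseteq T_2 \subseteq T_3.$$
These three sets and the previous chain of inclusions allow us to give a simple and concise formulation of the two principles: the no-information-without-disturbance principle means that $T_2 = T_1$, while the no-free-information principle means that $T_3 = T_1$.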

V. CHARACTERIZATION OF T2
The aim of this section is to characterize non-disturbing observables and the structure of the state spaces they may exist on. We will have to introduce additional mathematical results to provide the full description of such state spaces.

A. Direct sum of state spaces
We will introduce a direct sum of state spaces as a generalized description of using only block-diagonal quantum states. Our aim is to mathematically formalize the operational idea of having an ordered pair of weighted states from two different state spaces.
Definition 6. Let $V_1, V_2$ be real finite-dimensional vector spaces and let $S_1 \subset V_1$ and $S_2 \subset V_2$ be state spaces. We define a state space $S_1 \oplus S_2 \subset V_1 \times V_2$ as the set of ordered and weighted pairs of states from $S_1$ and $S_2$, i.e.,
$$S_1 \oplus S_2 = \{(\lambda x_1, (1 - \lambda) x_2) : x_1 \in S_1, x_2 \in S_2, \lambda \in [0, 1]\}.$$
Given state spaces $S_1, \ldots, S_n$ one can define $S_1 \oplus \ldots \oplus S_n$ in a similar fashion as a subset of $V_1 \times \ldots \times V_n$, i.e., one would have
$$S_1 \oplus \ldots \oplus S_n = \left\{(\lambda_1 x_1, \ldots, \lambda_n x_n) : x_i \in S_i, \lambda_i \geq 0, \sum_{i=1}^{n} \lambda_i = 1\right\}.$$
In what follows we will present a few basic results about $S_1 \oplus S_2$. We will limit ourselves to the direct sum of two state spaces for the sake of not drowning in a sea of symbols, but it will be straightforward to see that all of the results hold for any finite direct sum as well.
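A minimal sketch of the direct sum construction follows, for the hypothetical case where $S_1$ is a single point and $S_2$ is a classical bit. The function names, the chosen vectors and the "sector" effect $(u_1, 0)$ are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction as F

def direct_sum_state(lam, x1, x2):
    """Return the weighted pair (lam * x1, (1 - lam) * x2) as one flat tuple."""
    return tuple(lam * c for c in x1) + tuple((1 - lam) * c for c in x2)

def evaluate(effect, state):
    return sum(e * s for e, s in zip(effect, state))

# Hypothetical example: S1 is a single point in V1 = R,
# S2 is a classical bit (a line segment) in V2 = R^2.
point = (1,)
bit0, bit1 = (1, 0), (0, 1)

u = (1, 1, 1)         # unit effect (u1, u2) on the direct sum
sector1 = (1, 0, 0)   # assumed effect (u1, 0): probability of the first summand

lam = F(1, 4)
mixed_bit = (F(1, 2), F(1, 2))
z = direct_sum_state(lam, point, mixed_bit)

# The extreme points of the direct sum: here the three vertices of a triangle.
extremes = [direct_sum_state(1, point, bit0),
            direct_sum_state(0, point, bit0),
            direct_sum_state(0, point, bit1)]
```

In this example the weight $\lambda$ can be read off by the effect `sector1` without knowing anything else about the state, which mirrors the role of the superselection sectors in the motivating example.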

It follows that if
Proof. If $A_1$ ○○ $B_1$ and $A_2$ ○○ $B_2$, then we can form the joint observable as $(J_{A,B})_k = ((J_{A_1,B_1})_k, (J_{A_2,B_2})_k)$ and apply the respective post-processings to the respective observables, hence $A$ ○○ $B$. Note that to make the observables have the same number of outcomes, we can always pad out one with zero effects corresponding to some extra outcomes that never happen.
If $A$ ○○ $B$, then by restricting the state space only to states of the form $(x_1, 0) \in S_1 \oplus S_2$, where $x_1 \in S_1$, it follows that $A_1$ and $B_1$ are compatible, as we can obtain $J_{A_1,B_1}$ from $J_{A,B}$. The compatibility of $A_2$ and $B_2$ follows in the same manner.
This explains our motivating example in Sec. II. One can also prove a similar result for the compatibility of an observable and a channel, but we will leave that for the next section, where we will investigate the conditions for the compatibility of an observable and the identity channel $id : S \to S$, where direct sums of state spaces will play a role.
This last result will help us identify the direct sum structure of a state space.
Proposition 3. Let $S$ be a state space and let $S_1, S_2 \subset S$ be convex and closed, with $\mathrm{conv}(S_1 \cup S_2) = S$, and for every $x \in S$ there are unique
Proof. Let $V_1$ and $V_2$ denote the subspaces of $V$ generated by $S_1$ and $S_2$ respectively. Define map . It follows that we have $P : S \to S_1 \oplus S_2$; moreover, one can easily see that $P$ is an affine isomorphism. It follows that $S$ is affinely isomorphic to $S_1 \oplus S_2$, and the result follows by simply omitting the isomorphism.

B. Compatibility of an observable and the identity channel
We are going to derive conditions for an observable to be compatible with the identity channel id ∶ S → S. Our results will be similar to the results mentioned in [4,11], but we will approach the problem from a different angle and with a different objective in mind.

Lemma 2. An observable
Proof. Assume that an observable A is compatible with id, then due to Lemma 1 we must have operations Φ 1 , . .
To prove our claim we will use the defining property of extreme points. We have Since this holds for every extreme point of S it follows that Φ is a joint channel of A and id.

Proposition 4. An observable $A$ is compatible with $id$ if and only if there is a set of affinely independent extreme points of $S$, denoted $x_j$, $j \in \{1, \ldots, d\}$, such that $S \subset \mathrm{aff}(\{x_1, \ldots, x_d\})$ and for every extreme point $y \in S$, $y = \sum_{j=1}^{d} \alpha_j x_j$, it holds that
Proof. Assume that an observable $A$ is compatible with $id$ and let $x_1, \ldots, x_d \in S$ be a set of affinely independent extreme points such that $S \subset \mathrm{aff}(\{x_1, \ldots, x_d\})$. Let $y \in S$ be an extreme point, then $y = \sum_{j=1}^{d} \alpha_j x_j$, where $\sum_{j=1}^{d} \alpha_j = 1$. According to Lemma 2 there is a channel $\Phi$ such that (4) holds. Plugging in the expression $y = \sum_{j=1}^{d} \alpha_j x_j$ we obtain
Assume that (5) holds for an observable $A$ and define a map $\Phi : S \to S \dot{\otimes} P(\Omega_A)$ given for $j \in \{1, \ldots, d\}$ as and extended by affinity to all of $S$. Let $y \in S$ be an extreme point, then we have where we have used (5) in the third step. By Lemma 2 it follows that $A$ is compatible with $id$.
Note that if S is a simplex, then the set {x 1 , . . . , x d } is unique and contains all extreme points of S, hence the requirement of Prop. 4 is trivially satisfied.
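For a simplex the non-disturbing measurement can be written down explicitly. The sketch below assumes that states of a classical trit are represented as probability vectors and effects as vectors of their values on the vertices; the operations $\Phi_i(x) = (T_i(k)\,x_k)_k$ and all names are illustrative assumptions rather than the construction used in the proofs.

```python
from fractions import Fraction as F

def evaluate(effect, state):
    return sum(e * s for e, s in zip(effect, state))

def operations(T):
    """For each effect T_i return the map Phi_i(x) = (T_i[k] * x[k])_k."""
    return [lambda x, t=t: tuple(tk * xk for tk, xk in zip(t, x)) for t in T]

# Hypothetical observable and state on a classical trit.
u = (1, 1, 1)
T = [(1, F(1, 2), 0), (0, F(1, 2), 1)]
x = (F(1, 5), F(3, 10), F(1, 2))

parts = [phi(x) for phi in operations(T)]

# Summing the unnormalised outputs gives back the input state: sum_i Phi_i = id.
recombined = tuple(sum(col) for col in zip(*parts))
```

The two conditions $\sum_i \Phi_i = id$ and $u \circ \Phi_i = T_i$ can then be verified directly, which is the sense in which every observable on a simplex can be measured without disturbance.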
It is important to note that Prop. 4 provides a condition on the effects A i , not on A as a whole. Therefore it will be interesting to investigate the set of effects that satisfy the condition (5).

Definition 7. We denote by $ET_2$ the set of effects on a state space $S$ that satisfy the condition (5), i.e. $f \in ET_2$ if there is some set $\{x_1, \ldots, x_d\}$ of affinely independent extreme points of $S$ such that $S \subset \mathrm{aff}(\{x_1, \ldots, x_d\})$ and for every extreme point $y \in S$, $y = \sum_{j=1}^{d} \alpha_j x_j$, it holds that
The following is straightforward.
Proof. If $f \in ET_2$ then $\lambda f + (1 - \lambda)\mu u \in ET_2$ follows by Lemma 3. If $\lambda f + (1 - \lambda)\mu u \in ET_2$, then
The result of Prop. 5 is non-trivial. As we will see, there are observables that are compatible with all other observables because they are "noisy enough". But according to Prop. 5 this is not the case for compatibility with the identity channel $id$. Loosely speaking, Prop. 5 together with the next result shows that the structure of $T_2$ is more like that of $T_1$ than that of $T_3$, in the sense that observables in $T_2$ are in some sense classical, as was the case in Sec. II.
Proof. If S is a simplex, then there is only one set {x 1 , . . . , x d } of affinely independent points and we have S = ⊕ d j=1 x j . The claim follows.
Let x 1 , . . . , x d be a set of affinely independent extreme points of S and let y ∈ S be an extreme point, then we have Let z ∈ S, then we have already proved that we have where 0 ≤ λ c ≤ 1, ∑ c∈[0,1] λ c = 1 and y c ∈ S c . Note that y c is not necessarily an extreme point of S. We will show that the decomposition (7) is unique. Assume there is another decomposition It follows that the right-hand side must be an affine combination of x j , j ∈ {1, . . . , n} such that f (x j ) = c ′ . This implies that for c ≠ c ′ we must have λ c y c = λ ′ c y ′ c as otherwise the aforementioned result would be violated. We get It follows that hence the two decompositions of z are the same. The result follows from Prop. 3.
Using Thm. 1 we can easily characterize all two-dimensional state spaces that have observables compatible with the identity channel, i.e. that have information without disturbance. Remember that if a state space is two-dimensional, then dim(V) = 3 where V is the vector space containing the cone V + which has the base S.
are the vector spaces that contain S 1 and S 2 respectively. This implies dim(V 1 ) + dim(V 2 ) = dim(V) = 3 and we can assume that dim(V 1 ) = 1, dim(V 2 ) = 2. This implies that S 1 contains only one point and S 2 is a line segment, i.e. it has two extreme points. It then follows that S must have three extreme points, hence it is a triangle state space, which is a simplex.
In a similar fashion one can show that every three-dimensional state space that has information without disturbance is pyramid shaped, where the base of the pyramid can be any two-dimensional state space.

A. Simulability of observables
Simulation of observables is a method to produce a new observable from a given collection of observables by a classical procedure, that is, by mixing measurement settings and post-processing the outcome data [12][13][14][15]. For a subset $\mathcal{B} \subseteq O(S)$, we denote by $\mathrm{sim}(\mathcal{B})$ the set of observables that can be simulated by using the observables from $\mathcal{B}$, i.e., $A \in \mathrm{sim}(\mathcal{B})$ if there exist a probability distribution $p$, a finite collection of post-processing matrices $\nu^{(i)}$ and observables $B^{(i)} \in \mathcal{B}$ such that
$$A_x = \sum_i p_i \sum_{y \in \Omega_{B^{(i)}}} \nu^{(i)}_{yx} B^{(i)}_y \quad \text{for all } x \in \Omega_A.$$
We will also denote $\mathrm{sim}(B) \equiv \mathrm{sim}(\{B\})$. Clearly, $\mathrm{sim}(B)$ is the set of all post-processings of $B$, i.e., $\mathrm{sim}(B) = \{A \in O(S) : B \to A\}$.
We recall from [14] that an observable $A$ is called simulation irreducible if for any subset $\mathcal{B} \subset O(S)$, we have $A \in \mathrm{sim}(\mathcal{B})$ only if there is $B \in \mathcal{B}$ such that $A \in \mathrm{sim}(B)$ and $B \in \mathrm{sim}(A)$. Thus, a simulation irreducible observable can only be simulated by (essentially) itself. Equivalently, an observable is simulation irreducible if and only if it has indecomposable effects and is post-processing equivalent with an extreme observable. We denote by $O_{\mathrm{irr}}(S)$ the set of simulation irreducible observables. It was shown in [14] that for every observable there exists a finite collection of simulation irreducible observables from which it can be simulated.
It is worth mentioning that simulation irreducible observables are always incompatible, and in fact, a state space is non-classical if and only if there exist at least two inequivalent simulation irreducible observables [14].
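The simulation procedure (mix the measurement settings, then post-process the outcomes) can be sketched as follows. The square state space, the two dichotomic observables, the uniform mixing weights and the function name `simulate` are assumptions for illustration; the post-processing matrices are indexed as `nus[i][y][x]`, with `y` an outcome of the simulating observable and `x` an outcome of the simulated one.

```python
from fractions import Fraction as F

def simulate(p, nus, Bs):
    """Return A with A_x = sum_i p[i] * sum_y nus[i][y][x] * Bs[i][y]."""
    n_out = len(nus[0][0])
    dim = len(Bs[0][0])
    A = []
    for x in range(n_out):
        effect = tuple(
            sum(p[i] * nus[i][y][x] * Bs[i][y][k]
                for i in range(len(Bs)) for y in range(len(Bs[i])))
            for k in range(dim)
        )
        A.append(effect)
    return A

# Hypothetical data: two dichotomic observables on a square state space (z = 1).
u = (0, 0, 1)
e = (F(1, 2), 0, F(1, 2))
f = (0, F(1, 2), F(1, 2))

def complement(g):
    return tuple(ui - gi for ui, gi in zip(u, g))

B1 = [e, complement(e)]
B2 = [f, complement(f)]
identity = [[1, 0], [0, 1]]

# Measure B1 or B2 with equal probability and keep the outcome label.
A = simulate([F(1, 2), F(1, 2)], [identity, identity], [B1, B2])
```

With a single observable and a trivial mixing distribution, `simulate` reduces to an ordinary post-processing.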

B. Intersections of simulation sets
A trivial observable can be simulated by any other observable, and therefore The following stronger statement is less obvious, although not too surprising.
On the other hand, suppose that the inclusion is strict so that (w.l.o.g.) there exists a dichotomic observable T ∈ ⋂ B∈O(S)∖T1 sim(B) such that T ∉ T 1 . This means that the effects T + and T − are not proportional to the unit effect u, so that in particular T + and u are linearly independent.
We take λ, q ∈ (0, 1) and define another dichotomic observable A by where on the second line we have used the fact that T − = u − T + . From the linear independence of u and T + it follows that we must have λ(ν 1 − ν 2 ) = 1, which is a contradiction since 0 < λ < 1 and ν 1 − ν 2 ≤ 1.
If we use the same probability distribution (p i ) i as before, we have that for all z ∈ Ω A xz for all i ∈ {1, . . . , n}, x ∈ Ω B and z ∈ Ω A . Hence, also A ∈ sim(B) so that A and C are compatible.
As was shown in Prop. 7, the observables that are compatible with every other observable are exactly those that can be post-processed from every simulation irreducible observable. However, we note that it is enough to consider only post-processing inequivalent simulation irreducible observables since two observables B and B ′ are post-processing equivalent, B ↔ B ′ , if and only if sim(B) = sim(B ′ ). Thus, when we consider the intersection of the simulation sets of simulation irreducible observables, we only need to select some representative for each post-processing equivalence class.
The natural choice for the representative is to take the extreme observable with pairwise linearly independent effects: it has linearly independent indecomposable effects with the minimal number of outcomes in the respective post-processing equivalence class. It was shown in [14] that such an extreme observable exists in every equivalence class of simulation irreducible observables. We denote the set of extreme simulation irreducible observables by O ext irr (S) so that

Corollary 3. An observable A ∈ O(S) on a state space S is included in T 3 if and only if
Proof. Let first A ∈ T 3 . By Prop. 7 for all B ∈ O ext irr (S) there exists a post-processing ν B such that A = ν B ○ B, i.e., for all y ∈ Ω A . Since ν B xy ≥ 0 for all x ∈ Ω B , y ∈ Ω A for all B ∈ O ext irr (S), we have that for all B ∈ O ext irr (S) for all y ∈ Ω A , which proves the necessity part of the claim.
Since each B ∈ O ext irr (S), we have that each B consists of linearly independent effects B x [14], so that ∑ y∈Ω A µ B xy = 1 for all x ∈ Ω B . Thus, we can define post-processings µ B for each B ∈ O ext irr (S) with elements µ B xy so that A ∈ sim(B) for all B ∈ O ext irr (S).

C. Example showing that T2 ≠ T3
We will present an example of a two-dimensional state space S, such that there is an observable where the z-coordinate is used to identify S with a base of a cone. Let be a simplex, then we have S ⊂ S 3 as shown in Fig. 2. Let us define functionals x, y, u given as The points are shown in Fig. 3. According to Prop. 16 from appendix A there are 4 indecomposable effects corresponding to the 4 maximal faces of S. They are ξ 1 , ξ 2 , ξ 3 and u − ξ 3 , where It was shown in [14, Corollary 1] that simulation irreducible observables must consist of indecomposable effects. We are going to find all simulation irreducible observables on S as we know that A ∈ T 3 if and only if A is simulable by every simulation irreducible observable; see Prop. 7.
Assume that there would be a simulation irreducible observable with the effects α 1 ξ 1 , α 2 ξ 2 , α 3 ξ 3 and α ′ 3 (u − ξ 3 ), where α 1 , α 2 , α 3 , α ′ 3 ∈ R, then we must have Since the effects of simulation irreducible observables must be linearly independent, we know that at least one of the coefficients must be equal to zero.
We are going to use Prop. 5 to see that A ∉ T 2 . Assume that A ∈ T 2 , then A 1 ∈ ET 2 , which by Prop. 5 implies also u − ξ 3 ∈ ET 2 as u − ξ 3 = 2x. This would imply that B is compatible with every other observable, but it is straightforward to see that B is incompatible with C: they are the only two simulation irreducible observables, and if they were compatible, then all of the observables on S would be compatible. This would in turn imply that S is a simplex [16], which it clearly is not.
An insight into how we obtained this example is provided by the simplex S 3 : ξ 1 , ξ 2 and x are effects on the simplex S 3 so that the compatibility of A and C follows. Moreover, the fact that u − ξ 3 = 2x ≥ x gives the compatibility of A and B.

VII. STATE SPACES SATISFYING T1 = T2 = T3
Next we will consider conditions under which the no-information-without-disturbance principle (T 2 = T 1 ) and the no-free-information principle (T 3 = T 1 ) hold and when they do not. First we note that, as was mentioned earlier, in general we have that T 1 ⊆ T 2 ⊆ T 3 so that if the no-free-information principle holds, and therefore we have that T 3 = T 1 , it follows that also T 2 = T 1 so that the no-information-without-disturbance principle must hold as well.

A. Conditions for T1 = T3
With the help of Prop. 7 we can show the following.
Proposition 8. The following conditions are equivalent:
iii) ⇒ ii): It is clear that $\mathrm{cone}(u) \subseteq \bigcap_{B \in O_{\mathrm{irr}}(S)} \mathrm{cone}(\{B_x\}_x)$. Now take $g \in \bigcap_{B \in O_{\mathrm{irr}}(S)} \mathrm{cone}(\{B_x\}_x)$ so that for all $B \in O_{\mathrm{irr}}(S)$ there exist positive real numbers (α B
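ii) ⇒ i): As noted before, we always have $T_1 \subseteq T_3$, so that it suffices to show that $T_3 \subseteq T_1$. Thus, take $A \in T_3$. By Prop. 7, $A \in \mathrm{sim}(B)$ for all $B \in O_{\mathrm{irr}}(S)$, so that for each $B \in O_{\mathrm{irr}}(S)$ there exists a post-processing $\nu^B$ such that $A_y = \sum_{x \in \Omega_B} \nu^B_{xy} B_x$ for all $y \in \Omega_A$. Since all the post-processing elements are non-negative for each $B \in O_{\mathrm{irr}}(S)$, we have that $A_y \in \mathrm{cone}(\{B_x\}_{x \in \Omega_B})$ for all $y \in \Omega_A$ and $B \in O_{\mathrm{irr}}(S)$. Thus,
$$A_y \in \bigcap_{B \in O_{\mathrm{irr}}(S)} \mathrm{cone}(\{B_x\}_x) = \mathrm{cone}(u)$$
for all $y \in \Omega_A$, from which it follows that $A \in T_1$.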
Proposition 9. Let $S$ be a $d$-dimensional state space. If $|O^{\mathrm{ext}}_{\mathrm{irr}}(S)| < \infty$ and all the extreme simulation irreducible observables have $d + 1$ outcomes, then $T_1 \neq T_3$.
Proof. Since S is d-dimensional (i.e. dim(aff (S)) = d), the effect space is contained in a d + 1-dimensional vector space. Suppose that, on the contrary T 1 = T 3 . From Prop. 8 it follows then that Since dim(V * ) = d + 1 and each extreme simulation irreducible observable consists of d + 1 linearly independent effects, it follows that cone ({B x } x ) has a non-empty interior, denoted by int which is a contradiction.

Proposition 10. If there exist at least two post-processing inequivalent dichotomic simulation irreducible observables on $S$, then $T_1 = T_2 = T_3$.
Proof. By the assumption there exist two dichotomic observables $E, F \in O_{\mathrm{irr}}(S)$ such that $E \not\leftrightarrow F$. Take $A \in T_3$ so that by Prop. 7 we have that $A \in \mathrm{sim}(E)$ and $A \in \mathrm{sim}(F)$. From Prop. 11 in [14] it follows that $A_x \in \mathrm{conv}(\{E_+, E_-, o, u\})$ and $A_x \in \mathrm{conv}(\{F_+, F_-, o, u\})$ for all $x \in \Omega_A$. Since $E$ and $F$ are inequivalent, it follows that the set {u,
With the previous proposition we can show that the no-free-information principle holds in any point-symmetric state space, i.e., in a state space $S$ where there exists a state $s_0$ such that for all $s \in S$ we have that
$$s' := 2s_0 - s \in S.$$
This means that for each state $s$ there exists another state $s'$ such that $s_0$ is an equal mixture of $s$ and $s'$, i.e., $s_0 = \frac{1}{2}(s + s')$. Point-symmetric state spaces include the classical bit, the qubit and polygon state spaces with an even number of vertices.
One can show that the effect space structure is also symmetric for symmetric state spaces. Firstly, all the non-trivial extreme effects are seen to lie on a single affine hyperplane. Namely, if $e \in E(S)$ is an extreme effect, $e \neq o, u$, there exists a (pure) state $s \in S$ such that $e(s) = 0$ [7]. For $s$, there exists another state $s'$ such that $s_0 = \frac{1}{2}(s + s')$, so that $e(s_0) = \frac{1}{2}e(s')$. Similarly there exists a (pure) state $t \in S$ such that $e(t) = 1$ [7]. For $t$, we can find $t'$ such that $e(s_0) = \frac{1}{2}(e(t) + e(t')) = \frac{1}{2}(1 + e(t'))$. Combining these two expressions for $e(s_0)$ we find that $e(s') = 1 + e(t')$, from which it follows that $e(t') = 0$ and $e(s') = 1$, so that $e(s_0) = \frac{1}{2}$ for all non-trivial extreme effects $e$. Thus, all the non-trivial extreme effects lie on an affine hyperplane determined by the state $s_0$.
Secondly, we see that all the non-trivial extreme effects must actually be indecomposable. If $e \in E(S)$ is an extreme effect, $e \neq o, u$, then we can find some decomposition into indecomposable extreme effects, $e = \sum_{i=1}^{r} \alpha_i e_i$. Since all non-trivial extreme effects give probability $\frac{1}{2}$ on the state $s_0$, we have that $1 = 2e(s_0) = \sum_{i=1}^{r} \alpha_i$. Since $e$ is extreme, it follows that $r = 1$, so that $e$ is indecomposable.
Thirdly, the convex hull of all the extreme indecomposable effects (that lie on an affine hyperplane) is also point-symmetric: if $e \in E(S)$ is a non-trivial extreme effect, then $e' := u - e$ is also a non-trivial extreme effect, so that $e_0 := \frac{1}{2}u = \frac{1}{2}(e + e')$ acts as the inversion point of the set.
[14] that E and F are inequivalent dichotomic simulation irreducible observables. The claim follows from Prop. 10.
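The symmetry of the effect space can be checked on a small example. The sketch below assumes a square state space with vertices $(\pm 1, \pm 1, 1)$ and inversion point $s_0 = (0, 0, 1)$, whose four non-trivial extreme effects are $(1 \pm x)/2$ and $(1 \pm y)/2$; these concrete choices and all names are assumptions made for illustration.

```python
from fractions import Fraction as F

H = F(1, 2)

def evaluate(effect, state):
    return sum(e * s for e, s in zip(effect, state))

# Hypothetical point-symmetric state space: a square in the plane z = 1.
square = [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (1, -1, 1)]
s0 = (0, 0, 1)
u = (0, 0, 1)

# Its four non-trivial extreme effects (1 +- x)/2 and (1 +- y)/2.
extreme_effects = [(H, 0, H), (-H, 0, H), (0, H, H), (0, -H, H)]

def complement(e):
    """Return u - e."""
    return tuple(ui - ei for ui, ei in zip(u, e))

def mirror(s):
    """Return s' = 2 * s0 - s."""
    return tuple(2 * c0 - c for c0, c in zip(s0, s))

values_at_s0 = [evaluate(e, s0) for e in extreme_effects]
```

All four effects take the value $\frac{1}{2}$ on $s_0$, and the set is closed under $e \mapsto u - e$, in line with the three observations above.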

B. Alternative characterization of T1
Finally, we show that a seemingly different formulation of "free information" does not lead to a new concept. Consider T ∈ T 3 and take an observable A ∈ O(S) such that A ∉ T 1 . Since T is compatible with A, there exists a joint observable J A,T from which both A and T can be post-processed. Since A is non-trivial and T is compatible with every other observable, we can ask whether measuring the joint observable J A,T actually gives us any more information than just measuring A. One way to consider this is to ask whether A is actually post-processing equivalent to J A,T , so that both can be obtained from each other by classically manipulating their outcomes. If this is the case, there is no "free information" to be gained from measuring the joint observable. Thus, we consider one more set of observables: We can show the following.
Proof. Since T 1 ⊆ T 4 it suffices to show that T 4 ⊆ T 1 . Thus, take T ∈ T 4 so that for all A ∈ O(S) ∖ T 1 we have that A is post-processing equivalent with at least one of their joint observables J A,T . Thus, {A, T} ⊆ sim(J A,T ) and since A ↔ J A,T it follows that T ∈ sim(A) for all A ∈ O(S) ∖ T 1 . From Prop. 6 it follows that T ∈ T 1 .

A. Characterization of polygons
A regular polygon with $n$ vertices in $\mathbb{R}^2$, or $n$-gon, is the convex hull of $n$ points $\{\vec{x}_k\}_{k=1}^{n}$ such that $\|\vec{x}_k\| = \|\vec{x}_j\|$ and $\vec{x}_k \cdot \vec{x}_{k+1} = \|\vec{x}_k\|^2 \cos\left(\frac{2\pi}{n}\right)$ for all $j, k = 1, \ldots, n$. As a state space $S_n$, we consider the polygon to be embedded in $\mathbb{R}^3$ on the plane $z = 1$. Thus, we follow the notation of [17] and define the extreme points of $S_n$ as $s_k = \left(r_n \cos\left(\frac{2k\pi}{n}\right), r_n \sin\left(\frac{2k\pi}{n}\right), 1\right)^T$ for $k = 1, \ldots, n$, where we have defined $r_n = \sec\left(\frac{\pi}{n}\right)$. As the polygons are two-dimensional, the effects can also be represented as elements of $\mathbb{R}^3$. Hence, we can express each $e \in E(S_n)$ as a vector $e = (e_x, e_y, e_z)^T \in \mathbb{R}^3$. With this identification we have that $e(s) = e \cdot s$ for all $e \in E(S_n)$ and $s \in S_n$, where $\cdot$ is the Euclidean dot product. Clearly, we now have the zero effect $o = (0, 0, 0)^T$ and the unit effect $u = (0, 0, 1)^T$.
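As a numerical illustration of the definitions above, the following minimal sketch builds the extreme states of $S_n$ and checks the two defining conditions of a regular $n$-gon. The function names (`r_n`, `polygon_states`, `dot`) are hypothetical helpers introduced only for this example.

```python
import math

def r_n(n):
    """Scaling factor r_n = sec(pi/n) of the n-gon state space."""
    return 1.0 / math.cos(math.pi / n)

def polygon_states(n):
    """Extreme states s_k = (r_n cos(2k*pi/n), r_n sin(2k*pi/n), 1), k = 1..n."""
    r = r_n(n)
    return [(r * math.cos(2 * k * math.pi / n),
             r * math.sin(2 * k * math.pi / n),
             1.0) for k in range(1, n + 1)]

def dot(a, b):
    """Euclidean dot product; an effect e acts on a state s as e(s) = e . s."""
    return sum(x * y for x, y in zip(a, b))

unit_effect = (0.0, 0.0, 1.0)  # u
zero_effect = (0.0, 0.0, 0.0)  # o

if __name__ == "__main__":
    n = 5
    S = polygon_states(n)
    for k in range(n):
        a, b = S[k][:2], S[(k + 1) % n][:2]
        # equal norms and a fixed angle 2*pi/n between neighbouring vertices
        print(k + 1, math.hypot(*a), dot(a, b) / r_n(n) ** 2)
```

The planar parts of the states all have norm $r_n$, and the unit effect gives probability one on every state, as it should.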
Depending on the parity of $n$, the state space may or may not have reflective point symmetry around the middle point $s_0 = (0, 0, 1)^T$. As a result of this, the effect space $E(S_n)$ has a different structure for odd and even $n$. For even $n$, we find that the effect space $E(S_n)$ has $n$ non-trivial extreme points, so that $E(S_n) = \mathrm{conv}(\{o, u, e_1, \ldots, e_n\})$. All the non-trivial extreme effects lie on a single (hyper)plane determined by those points $e$ such that $e(s_0) = \frac{1}{2}$.
In the case of odd $n$, the effect space has $2n$ non-trivial extreme effects $g_k$ and $f_k$ for $k = 1, \ldots, n$. Now $E(S_n) = \mathrm{conv}(\{o, u, g_1, \ldots, g_n, f_1, \ldots, f_n\})$ and the non-trivial extreme effects lie on two different planes determined by all those points $g$ and $f$ such that $g(s_0) = \sigma_n := \frac{1}{1 + r_n}$ and $f(s_0) = 1 - \sigma_n = \frac{r_n}{1 + r_n}$. The even and odd polygon state spaces and their respective effect spaces are depicted in Figure 4.
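The structure of the effect spaces can be checked numerically. The sketch below uses an explicit parametrization of the non-trivial extreme effects that is an assumption of this example (it is chosen to be consistent with the stated values $e(s_0) = \frac{1}{2}$, $g(s_0) = \sigma_n$ and $f(s_0) = 1 - \sigma_n$, and with $f_k = u - g_k$); the function names are hypothetical.

```python
import math

def states(n):
    """Extreme states of S_n with r_n = sec(pi/n)."""
    r = 1.0 / math.cos(math.pi / n)
    return [(r * math.cos(2 * k * math.pi / n),
             r * math.sin(2 * k * math.pi / n), 1.0) for k in range(1, n + 1)]

def extreme_effects(n):
    """Non-trivial extreme effects of E(S_n) in an ASSUMED parametrization.

    even n: e_k = 1/2 (cos((2k-1)pi/n), sin((2k-1)pi/n), 1)
    odd n:  g_k = sigma_n (cos(2k pi/n), sin(2k pi/n), 1) and f_k = u - g_k
    """
    ks = range(1, n + 1)
    if n % 2 == 0:
        return [(0.5 * math.cos((2 * k - 1) * math.pi / n),
                 0.5 * math.sin((2 * k - 1) * math.pi / n), 0.5) for k in ks]
    sigma = 1.0 / (1.0 + 1.0 / math.cos(math.pi / n))
    g = [(sigma * math.cos(2 * k * math.pi / n),
          sigma * math.sin(2 * k * math.pi / n), sigma) for k in ks]
    f = [(-gx, -gy, 1.0 - gz) for gx, gy, gz in g]
    return g + f

def evaluate(e, s):
    """Probability e(s) = e . s."""
    return sum(x * y for x, y in zip(e, s))

if __name__ == "__main__":
    for n in (4, 5):
        for e in extreme_effects(n):
            values = [evaluate(e, s) for s in states(n)]
            # each extreme effect should reach both 0 and 1 on the pure states
            print(n, round(min(values), 12), round(max(values), 12))
```

With this parametrization every listed effect takes values in $[0, 1]$ on $S_n$ and attains both $0$ and $1$ on some pure states.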
In order to give a simple characterization of polygons, let us define functions $\eta^n_e : \mathbb{R}^2 \to \mathbb{R}$ and $\eta^n_o : \mathbb{R}^2 \to \mathbb{R}$ by $\eta^n_e(\vec{x}) = \max_{k \in \{1, \ldots, n\}} r_n \left[\cos\left(\frac{2\pi k}{n}\right) x + \sin\left(\frac{2\pi k}{n}\right) y\right]$ and $\eta^n_o(\vec{x}) = \eta^n_e(R_{\pi/n} \vec{x})$, where $\vec{x} = (x, y)^T$ and $R_{\pi/n}$ is the rotation matrix with a rotation angle $\frac{\pi}{n}$ around the origin in $\mathbb{R}^2$. We use the notation $\eta^n_{e/o}$ when we consider properties that hold for both $\eta^n_e$ and $\eta^n_o$. We see that both $\eta^n_e(\vec{x})$ and $\eta^n_o(\vec{x})$ can be expressed as $r_n$ times a maximum of inner products of $\vec{x}$ with a collection of unit vectors $\vec{b}$. Thus, both $\eta^n_e$ and $\eta^n_o$ are polyhedral convex functions [18]. It is straightforward to see that $\eta^n_{e/o}$ satisfy the following properties for all $\vec{x}, \vec{y} \in \mathbb{R}^2$: i) $\eta^n_{e/o}(\vec{x}) \geq 0$, ii) $\eta^n_{e/o}(\vec{x}) = 0$ if and only if $\vec{x} = \vec{0}$, iii) $\eta^n_{e/o}(\vec{x} + \vec{y}) \leq \eta^n_{e/o}(\vec{x}) + \eta^n_{e/o}(\vec{y})$. Additionally, the following is satisfied for all $\vec{x} \in \mathbb{R}^2$: iv) $\eta^n_{e/o}(\alpha \vec{x}) = \alpha \eta^n_{e/o}(\vec{x})$ for all $\alpha \geq 0$. Thus, both $\eta^n_e$ and $\eta^n_o$ almost satisfy the requirements of a norm; the only missing property is reflective point symmetry, i.e., $\eta^n_{e/o}(-\vec{x}) = \eta^n_{e/o}(\vec{x})$ for all $\vec{x} \in \mathbb{R}^2$. For even $n$, however, it is easy to confirm that both $\eta^n_e$ and $\eta^n_o$ are point-symmetric, so that they are norms on $\mathbb{R}^2$. Similarly, for odd $n$ it is easy to see that the point symmetry does not hold.
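The norm-like properties listed above can be probed numerically. The following sketch assumes that $\eta^n_o$ is obtained from $\eta^n_e$ by first rotating the argument by $\frac{\pi}{n}$; the function names and the sample points are choices made for this example only.

```python
import math

def eta_e(n, x, y):
    """eta^n_e(x, y) = max_k r_n [cos(2 pi k / n) x + sin(2 pi k / n) y]."""
    r = 1.0 / math.cos(math.pi / n)
    return r * max(math.cos(2 * math.pi * k / n) * x +
                   math.sin(2 * math.pi * k / n) * y
                   for k in range(1, n + 1))

def eta_o(n, x, y):
    """eta^n_o, ASSUMED here to be eta^n_e evaluated on the vector rotated by pi/n."""
    c, s = math.cos(math.pi / n), math.sin(math.pi / n)
    return eta_e(n, c * x - s * y, s * x + c * y)

if __name__ == "__main__":
    # Point symmetry holds for even n but fails for odd n.
    for n in (5, 6):
        print(n, eta_e(n, 1.0, 0.0), eta_e(n, -1.0, 0.0))
```

For $n = 6$ the two printed values agree, whereas for $n = 5$ they differ, which reflects the missing point symmetry for odd $n$.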
Even though for general $n$ the functions $\eta^n_{e/o}$ do not define a norm on $\mathbb{R}^2$, we can still use them to define polygons of different sizes. As continuous polyhedral convex functions, $\eta^n_e$ and $\eta^n_o$ have closed polyhedral level sets $B^n_{e/o}(r) = \{\vec{x} \in \mathbb{R}^2 \mid \eta^n_{e/o}(\vec{x}) \leq r\}$, which we will show to give rise to the polygons.
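First of all, we see that the level sets $B^n_{e/o}(r)$ are bounded, so that they actually describe polytopes. When we express $\vec{x} \in \mathbb{R}^2$ in its polar form $\vec{x} = (x, y)^T = \|\vec{x}\| (\cos\theta, \sin\theta)^T$, we have $\eta^n_e(\vec{x}) = r_n \|\vec{x}\| \max_{k \in \{1, \ldots, n\}} \cos\left(\frac{2\pi k}{n} - \theta\right)$. Considering $\eta^n_e$ first, we see that the angles $\frac{2k\pi}{n}$ are an angle $\frac{2\pi}{n}$ apart from each other for consecutive $k$, and that maximizing the cosine amounts to minimizing the angle $\left|\frac{2\pi k}{n} - \theta\right|$ (modulo $2\pi$). Hence, for the $k' \in \{1, \ldots, n\}$ which minimizes the angle we have $\left|\frac{2\pi k'}{n} - \theta\right| \leq \frac{\pi}{n}$, so that $\cos\left(\frac{2\pi k'}{n} - \theta\right) \geq \cos\left(\frac{\pi}{n}\right)$. The same arguments hold for $\eta^n_o$ as well, so if $\vec{x} \in B^n_{e/o}(r)$ for some $r > 0$, then $\|\vec{x}\| = r_n \cos\left(\frac{\pi}{n}\right) \|\vec{x}\| \leq \eta^n_{e/o}(\vec{x}) \leq r$. Hence, the level sets $B^n_{e/o}(r)$ are compact (convex) polytopes for all $r > 0$. Furthermore, each $B^n_{e/o}(r)$ has at most $n$ extreme points, since it is an intersection of $n$ closed half-spaces in $\mathbb{R}^2$.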
The functions $\eta^n_e$ and $\eta^n_o$ are connected through Eq. (28), which holds for all $\vec{x} \in \mathbb{R}^2$ and $r \geq 0$; this can be seen using the expressions from (25) and (26). Let us consider the specific level set $B^n_o(r_n)$. For each $k \in \{1, \ldots, n\}$, we define $\vec{s}_k = \left(r_n \cos\left(\frac{2k\pi}{n}\right), r_n \sin\left(\frac{2k\pi}{n}\right)\right)^T$ so that $s_k = (\vec{s}_k, 1)^T$. It is easy to see that $\eta^n_o(\vec{s}_k) = r_n$, so that $\vec{s}_k \in B^n_o(r_n)$ for all $k = 1, \ldots, n$. Furthermore, we have that $\|\vec{s}_k\| = r_n$ for all $k$, so that each $\vec{s}_k$ lies on a circle of radius $r_n$ centered at the origin. This shows that $\vec{s}_k$ is extreme in $B^n_o(r_n)$ for all $k = 1, \ldots, n$, since a non-trivial convex decomposition of $\vec{s}_k$ would contradict the fact that $\|\vec{x}\| \leq r_n$ for all $\vec{x} \in B^n_o(r_n)$. This, combined with the fact that $B^n_o(r_n)$ has at most $n$ extreme points, shows that the extreme points of $B^n_o(r_n)$ are exactly the vectors $\vec{s}_k$, $k = 1, \ldots, n$. Hence, $s = (\vec{s}, 1)^T \in S_n$ if and only if $\vec{s} \in B^n_o(r_n)$. By similar arguments, we see that $B^n_e(r)$ is also a regular polygon whose extreme points are rotated and scaled from $\vec{s}_k$. For example, in the case of even $n$, the effects lying on the hyperplane that contains all the non-trivial extreme effects can be characterized in terms of $B^n_e(r)$; namely, $e = (\vec{e}, \frac{1}{2})^T \in E(S_n)$ if and only if $\vec{e} \in B^n_e(\frac{1}{2})$. Furthermore, for even $n$ we have that $\mathrm{conv}(\{e_1, \ldots, e_n\}) = \{(\vec{e}, \frac{1}{2})^T \in \mathbb{R}^3 \mid \eta^n_e(\vec{e}) \leq \frac{1}{2}\}$ and similarly for odd $n$ that $\mathrm{conv}(\{g_1, \ldots, g_n\}) = \{(\vec{g}, \sigma_n)^T \in \mathbb{R}^3 \mid \eta^n_o(\vec{g}) \leq \sigma_n\}$. In both cases, the above sets serve as compact bases for the positive dual cones in $\mathbb{R}^3$.
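The characterization of the state space as the level set $B^n_o(r_n)$ gives a simple membership test. The sketch below writes $\eta^n_o$ directly as a maximum over the directions $\frac{(2k-1)\pi}{n}$, which is an assumed form of this example (it is equivalent to rotating the argument of $\eta^n_e$ by $\frac{\pi}{n}$); the name `in_state_space` and the tolerance are hypothetical.

```python
import math

def r_n(n):
    return 1.0 / math.cos(math.pi / n)

def eta_o(n, x, y):
    """eta^n_o as r_n times a maximum over unit vectors at angles (2k-1) pi / n
    (ASSUMED explicit form, equivalent to eta^n_e composed with a pi/n rotation)."""
    return r_n(n) * max(math.cos((2 * k - 1) * math.pi / n) * x +
                        math.sin((2 * k - 1) * math.pi / n) * y
                        for k in range(1, n + 1))

def in_state_space(n, x, y, tol=1e-12):
    """True if (x, y, 1) lies in S_n, i.e. if (x, y) is in the level set B^n_o(r_n)."""
    return eta_o(n, x, y) <= r_n(n) + tol

if __name__ == "__main__":
    n = 5
    r = r_n(n)
    vertex = (r * math.cos(2 * math.pi / n), r * math.sin(2 * math.pi / n))
    print(in_state_space(n, *vertex))                         # a pure state
    print(in_state_space(n, 0.0, 0.0))                        # the middle point s_0
    print(in_state_space(n, 1.01 * vertex[0], 1.01 * vertex[1]))  # just outside
```

Points on the circle of radius $r_n$ that are not vertices fall outside the level set, in line with the argument that the $\vec{s}_k$ are exactly its extreme points.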

B. Characterization of T2
The analysis of $T_2$ on polygon state spaces is straightforward. If $n = 3$, then the state space is a simplex and $T_2 = O(S_3)$. In all other cases we have $T_1 = T_2$ as a result of Cor. 2.

C. Characterization of T3
The post-processing equivalence classes of simulation irreducible observables on polygon state spaces were characterized in [14], where it was found that for an $n$-gon state space there exist $m$ dichotomic and $\frac{1}{3} m(m-1)(m-2)$ trichotomic extreme simulation irreducible observables when $n = 2m$ for some $m \in \mathbb{N}$ (even polygons), and $\frac{1}{6} m(m+1)(2m+1)$ trichotomic extreme simulation irreducible observables when $n = 2m + 1$ for some $m \in \mathbb{N}$ (odd polygons).
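For quick reference, the counting formulas quoted above can be evaluated as follows. This is only a sketch of the arithmetic; the function name and the dictionary keys are hypothetical.

```python
def count_extreme_sim_irreducible(n):
    """Numbers of dichotomic and trichotomic extreme simulation irreducible
    observables on the n-gon state space, using the formulas quoted from [14]."""
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    if n % 2 == 0:
        m = n // 2
        # m(m-1)(m-2) is a product of three consecutive integers, so it is divisible by 3
        return {"dichotomic": m, "trichotomic": m * (m - 1) * (m - 2) // 3}
    m = (n - 1) // 2
    # m(m+1)(2m+1)/6 is the sum of the first m squares, hence an integer
    return {"dichotomic": 0, "trichotomic": m * (m + 1) * (2 * m + 1) // 6}

if __name__ == "__main__":
    for n in range(3, 10):
        print(n, count_extreme_sim_irreducible(n))
```

For the triangle ($n = 3$) this gives a single trichotomic observable, and for the square ($n = 4$) two dichotomic ones and no trichotomic ones.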
For even polygons with $n = 2m$ where $m \geq 2$, there exist at least two inequivalent dichotomic simulation irreducible observables, so by Prop. 10 the set $T_3$ coincides with the set of trivial observables.
For odd polygon state spaces we see that the extreme simulation irreducible observables have the same number of outcomes as the dimension of the effect space, so given that there are a finite number of them, it follows from Prop. 9 that T 3 ≠ T 1 . We continue to give a characterization of T 3 for the odd polygon state spaces.
Let $S_n$ be an odd polygon state space so that $n = 2m + 1$ for some $m \in \mathbb{N}$. There are $q_m := \frac{1}{6} m(m+1)(2m+1)$ extreme simulation irreducible observables, and their cones generate the cones of all the simulation irreducible observables. By using some enumeration $B^{(1)}, \ldots, B^{(q_m)}$ for these observables, we have that $O^{\mathrm{ext}}_{\mathrm{irr}}(S_n) = \{B^{(i)}\}_{i=1}^{q_m}$, so that for an observable $A \in O(S_n)$ we have $A \in T_3$ if and only if $A_x \in \bigcap_{i=1}^{q_m} \mathrm{cone}(\{B^{(i)}_1, B^{(i)}_2, B^{(i)}_3\})$ for all $x \in \Omega_A$. We can show that there are certain extreme simulation irreducible observables that are enough to characterize the above intersection. Let $B \in O^{\mathrm{ext}}_{\mathrm{irr}}(S_n)$. Since for all $k \in \{1, 2, 3\}$ the effects $B_k$ are indecomposable, for each $k \in \{1, 2, 3\}$ there exist $0 < c_k \leq 1$ and an effect $g_{i_k} \in \{g_1, \ldots, g_{2m+1}\}$ such that $B_k = c_k g_{i_k}$. We see that we only need to consider the case when $i_k \in \{j, j + m, j + m + 1\}$ for all $k \in \{1, 2, 3\}$ for some $j \in \{1, \ldots, 2m + 1\}$, where the addition of the indices is taken modulo $2m + 1$.
The complete proof of the proposition can be found in the appendix, but one can easily convince oneself by looking at Fig. 5, which shows the case of the heptagon effect space. For each $B \in O^{\mathrm{ext}}_{\mathrm{irr}}(S_n)$ we can consider the base of the cone $\mathrm{cone}(\{B_1, B_2, B_3\})$ on the plane containing the indecomposable extreme effects $\{g_i\}_{i=1}^{n}$, where the base takes the form of a triangle that contains the middle point $\sigma_n u$. We can see that in order to characterize the intersection of such cones, it is enough to consider the intersection of their respective bases or, equivalently, of the triangles containing $\sigma_n u$. On the left of Fig. 5, the bases (coloured blue and red) of two extreme simulation irreducible observables are shown together with the whole effect space. The right panel depicts the triangles (formed by dashed lines) of all the bases on the plane, with the blue and red bases from the left panel also shown. We see that the base of the intersection of the cones (darker blue area) is characterized by triangles with vertices $g_i$, $g_{i+m}$ and $g_{i+m+1}$ (like the blue triangle), so that their intersection is always contained in the intersection of the other triangles (like the red triangle).
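The picture of triangles containing the middle point can be explored numerically. The sketch below enumerates all triangles with vertices among the $g_i$ that contain the centroid and compares their number with $q_m$; it also checks that the triangles with vertices $g_j$, $g_{j+m}$, $g_{j+m+1}$ are among them. That the extreme simulation irreducible observables correspond one-to-one to such triangles is an assumption of this example, as are the function names and the zero-based indexing.

```python
import math
from itertools import combinations

def polygon_vertices(n):
    """Directions of g_0..g_{n-1} in the plane of indecomposable effects (unit circle;
    the common scale sigma_n does not affect containment of the centroid)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

def cross(p, q):
    """z-component of the cross product of two planar vectors."""
    return p[0] * q[1] - p[1] * q[0]

def contains_centroid(a, b, c):
    """True if the origin lies strictly inside the triangle conv({a, b, c})."""
    d = (cross(a, b), cross(b, c), cross(c, a))
    return all(v > 0 for v in d) or all(v < 0 for v in d)

def centroid_triangles(n):
    """All index triples (i < j < k) whose triangle contains the centroid."""
    v = polygon_vertices(n)
    return [t for t in combinations(range(n), 3)
            if contains_centroid(v[t[0]], v[t[1]], v[t[2]])]

def special_triangles(n):
    """Index triples {j, j+m, j+m+1} (mod n) for odd n = 2m + 1."""
    m = (n - 1) // 2
    return [tuple(sorted((j % n, (j + m) % n, (j + m + 1) % n))) for j in range(n)]

if __name__ == "__main__":
    for m in range(1, 5):
        n = 2 * m + 1
        print(n, len(centroid_triangles(n)), m * (m + 1) * (2 * m + 1) // 6)
```

For odd $n$ no two vertices are antipodal, so the centroid never lies on an edge of a triangle and the strict inequalities in `contains_centroid` are unambiguous.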
We proceed by finding the base of the cone $\bigcap_{i=1}^{2m+1} \mathrm{cone}(\{g_i, g_{i+m}, g_{i+m+1}\})$ by identifying the extreme points of the base $\bigcap_{i=1}^{2m+1} \mathrm{conv}(\{g_i, g_{i+m}, g_{i+m+1}\})$. Let us denote $C_m := \bigcap_{i=1}^{2m+1} \mathrm{conv}(\{g_i, g_{i+m}, g_{i+m+1}\})$ and $L_i := \mathrm{conv}(\{g_i, g_{i+m}\})$. We will approach the problem as follows: at first, we will identify that $C_m$ must be a polygon itself by looking at its relation with the line segments $L_i$. Then we will find the form of the extreme points of $C_m$, and in the end we will identify them. During the calculations we will work only in the 2-dimensional vector space given by $\mathrm{aff}(\{g_1, \ldots, g_{2m+1}\})$, with the centroid taken as the origin $0$. It is very useful to realize that the $L_i$ generate hyperplanes in $\mathbb{R}^2$ and that $C_m$ is an intersection of the halfspaces that correspond to the hyperplanes $L_i$ and contain the point $0$. It follows that we must have $L_i \cap C_m \neq \emptyset$ for all $i \in \{1, \ldots, 2m + 1\}$; otherwise there would be hyperplanes separating $L_i$ and $C_m$, which contradicts $C_m$ being given as an intersection of halfspaces corresponding to the $L_i$. Since there are only $2m + 1$ different line segments $L_i$, it follows that $C_m$ must have exactly $2m + 1$ edges, and from the symmetry it also follows that $C_m$ must be a polygon. Now the only thing we need to do is to identify the extreme points of $C_m$.
Since the line segments $L_i$ must intersect $C_m$, it follows that the extreme points of $C_m$ must correspond to the intersections of these line segments. Let us denote $x_{i,j} := L_i \cap L_{i+j}$, where $j \in \{1, \ldots, m\}$ and where, if $i + j \geq 2m + 1$, we take $(i + j) \bmod (2m + 1)$. Also note that considering $j \geq m + 1$ would be redundant. The next key step is to characterize the relation of $x_{i,j}$ and $C_m$. We can show that the extreme points of $C_m$ are exactly the points $x_{i,1}$ (Lemma 4). Again, the complete proof of the lemma can be found in the appendix, but one can easily convince oneself by looking at Fig. 6, where the points $\{x_{i,j}\}_{j=1}^{m}$ are depicted for a fixed $i$ in the case of a heptagon (left) and nonagon (right) state space.
We are almost ready to move on to the complete characterization of $T_3$ in odd polygon theories in terms of the previously defined $\eta^n_{e/o}$ functions. We will still make a few remarks on the inner polygons $C_m$. Let $n = 2m + 1$. We will consider separately, although analogously, the cases of even and odd $m$. This is because of the orientation of the inner polygon $C_m$ with respect to the outer polygon $\mathrm{conv}(\{g_1, \ldots, g_n\})$. To show the difference between even and odd $m$, let us consider the intersection point of the boundary of the outer polygon and the half-line through an extreme point $x_{i,1}$ of the inner polygon emanating from the centroid $(0, 0, \sigma_n)^T$. If this intersection point is also an extreme point of the outer polygon, then the inner and outer polygons are similarly oriented; otherwise they are differently oriented. As $x_{i,1}$ lies on both $L_i$ and $L_{i+1}$, it is clear that the half-line through $x_{i,1}$ that emanates from the centroid meets the boundary of the outer polygon at one of the line segments $\mathrm{conv}(\{g_{i+1}, g_{i+2}\}), \ldots, \mathrm{conv}(\{g_{i+m-1}, g_{i+m}\})$.
For even m, i.e., for m = 2l for some l ∈ N, there exists an even number 2(l−1) of vertices g j between the vertices g i+1 and g i+m so that there is an odd number of such edges. From the symmetry it follows that for even m, the intersection point must lie in the middle of the midmost edge conv ({g i+l , g i+l+1 }). Thus, for even m, the inner polygon C m is differently oriented with respect to the outer polygon conv ({g 1 , . . . , g n }).
By contrast, for odd m, i.e., for m = 2l + 1 for some l ∈ N, there exists an even number of such edges, which together with the symmetry of the situation tells us that now the intersection point is exactly one of the vertices of the outer polygon, namely g i+l+1 . Thus, for odd m, the inner polygon is similarly oriented to the outer polygon. The orientations of the inner polygon for odd and even m are depicted in Fig. 6.
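The orientation argument can be checked numerically by intersecting neighbouring chords of the outer polygon. The sketch below assumes that the line segments are $L_i = \mathrm{conv}(\{g_i, g_{i+m}\})$ and that the $g_k$ have planar parts $\sigma_n (\cos\frac{2\pi k}{n}, \sin\frac{2\pi k}{n})$; indices are zero-based and all function names are hypothetical.

```python
import math

def outer_polygon(n):
    """Planar parts of g_0..g_{n-1} (ASSUMED form: radius sigma_n, angles 2 pi k / n)."""
    r = 1.0 / math.cos(math.pi / n)
    sigma = 1.0 / (1.0 + r)
    return [(sigma * math.cos(2 * math.pi * k / n),
             sigma * math.sin(2 * math.pi * k / n)) for k in range(n)]

def line_intersection(p1, p2, q1, q2):
    """Intersection of the line through p1, p2 with the line through q1, q2."""
    dx1, dy1 = p2[0] - p1[0], p2[1] - p1[1]
    dx2, dy2 = q2[0] - q1[0], q2[1] - q1[1]
    det = dx1 * dy2 - dy1 * dx2
    t = ((q1[0] - p1[0]) * dy2 - (q1[1] - p1[1]) * dx2) / det
    return (p1[0] + t * dx1, p1[1] + t * dy1)

def inner_vertex(n, i):
    """x_{i,1}: intersection of the chords L_i and L_{i+1} for odd n = 2m + 1."""
    m = (n - 1) // 2
    g = outer_polygon(n)
    return line_intersection(g[i % n], g[(i + m) % n],
                             g[(i + 1) % n], g[(i + 1 + m) % n])

def inner_circumradius(n):
    """sigma_n r_n sin(pi / 2n), the value appearing in the proof of Prop. 13."""
    r = 1.0 / math.cos(math.pi / n)
    return r / (1.0 + r) * math.sin(math.pi / (2 * n))

def similarly_oriented(n, tol=1e-9):
    """True if x_{0,1} points towards a vertex of the outer polygon."""
    x, y = inner_vertex(n, 0)
    rho = math.hypot(x, y)
    return any(abs(x / rho - math.cos(2 * math.pi * k / n)) < tol and
               abs(y / rho - math.sin(2 * math.pi * k / n)) < tol
               for k in range(n))

if __name__ == "__main__":
    for n in (5, 7, 9, 11):
        print(n, math.hypot(*inner_vertex(n, 0)), inner_circumradius(n),
              similarly_oriented(n))
```

In this sketch the inner vertices have norm $\sigma_n r_n \sin\frac{\pi}{2n}$, and the inner polygon points towards outer vertices exactly when $m$ is odd.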
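As we saw in the beginning of the section, the orientation of the polygon can also be characterized with the $\eta^n_{e/o}$ functions.

Proposition 13.
Let $A \in O(S_n)$ be an observable on an odd polygon state space $S_n$, $n = 2m + 1$, with effects $A_x = \alpha_x (\vec{a}_x, \sigma_n)^T$ for all $x \in \Omega_A$. Then $A \in T_3$ if and only if for all $x \in \Omega_A$ we have $\eta^n_e(\vec{a}_x) \leq \sigma_n r_n \sin\left(\frac{\pi}{2n}\right)$ if $m = 2l$ for some $l \in \mathbb{N}$, or $\eta^n_o(\vec{a}_x) \leq \sigma_n r_n \sin\left(\frac{\pi}{2n}\right)$ if $m = 2l + 1$ for some $l \in \mathbb{N}$.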
Proof. By Prop. 6 it follows that $A \in T_3$ if and only if $(\vec{a}_x, \sigma_n)^T \in C_n$ for all $x \in \Omega_A$, and from Lemma 4 we know that the $x_{i,1} = (\vec{x}_{i,1}, \sigma_n)^T$ are the extreme points of $C_n$. Thus, if we show that $\|\vec{x}_{i,1}\| = \eta^n_{e/o}(\vec{x}_{i,1}) = \sigma_n r_n \sin\left(\frac{\pi}{2n}\right)$, it follows that $C_n = \{(\vec{x}, \sigma_n)^T \in \mathbb{R}^3 \mid \eta^n_{e/o}(\vec{x}) \leq \sigma_n r_n \sin\left(\frac{\pi}{2n}\right)\}$, which will prove the claim.
However, for $m = 2l + 1$ for some $l \in \mathbb{N}$ we have that the maximum in Eq. (32) is attained for $j = k + l + 1$ and similarly the maximum in Eq. (33) is attained for $j \in \{k + l, k + l + 1\}$, so that for this case we have $\|\vec{x}_{k,1}\| = \eta^n_o(\vec{x}_{k,1}) = \sigma_n r_n \sin\left(\frac{\pi}{2n}\right)$.

When describing noisy observables, the noise is most commonly added externally to an observable, but the noise content describes the amount of noise that an observable already has intrinsically. Usually the noise set is taken to be the set of trivial observables $T_1$.
Examining Prop. 13 more closely, the set $T_3$ seems to be quite noisy in the sense that the effects of observables in $T_3$ lie close to the trivial effects on the line segment $\mathrm{conv}(\{o, u\})$. Our aim is to make this remark quantitative by showing that an observable that is compatible with every other observable must have a rather high noise content with respect to the trivial observables. We also show that an observable with a high enough noise content is indeed compatible with every other observable on odd polygon state spaces.
For the noise set $N = T_1$, the noise content of an observable $A \in O(S)$ takes a rather simple form [9]: $w(A; T_1) = \sum_{x \in \Omega_A} \min_{s \in S} A_x(s)$, and furthermore, if the state space is a polytope (as is the case for polygons), we have that $w(A; T_1) = \sum_{x \in \Omega_A} \min_{s \in S^{\mathrm{ext}}} A_x(s)$, where $S^{\mathrm{ext}}$ denotes the set of extreme points of $S$. We start by making a connection between $\min_{s \in S} A_x(s)$ and $\eta^n_o(\vec{a}_x)$. As before, for each effect $A_x$ there exists $\alpha_x > 0$ such that $A_x = \alpha_x a_x$ for some $a_x = (\vec{a}_x, \sigma_n)^T$, where $\vec{a}_x \in \mathbb{R}^2$. Since $a_x \in \mathrm{conv}(\{g_1, \ldots, g_n\})$ for all $x \in \Omega_A$, for all $x \in \Omega_A$ there exists $\lambda_x \in [0, 1]$ such that $a_x = \lambda_x h_x + (1 - \lambda_x) \sigma_n u$ for some $h_x \in \partial\mathrm{conv}(\{g_1, \ldots, g_n\}) = \{(\vec{g}, \sigma_n)^T \in \mathrm{conv}(\{g_1, \ldots, g_n\}) \mid \eta^n_o(\vec{g}) = \sigma_n\}$. We note that since $h_x$ lies on the boundary of the convex hull of the indecomposable effects, for all $x \in \Omega_A$ there exists $i_x \in \{1, \ldots, n\}$ such that $h_x \in \mathrm{conv}(\{g_{i_x}, g_{i_x+1}\})$. Since $g_{i_x}$ and $g_{i_x+1}$ are indecomposable, by Prop. 16 they give zero on some maximal faces $G_{i_x}$ and $G_{i_x+1}$ of $S_n$. Furthermore, it is easy to see that these must be adjacent maximal faces, so that there exists an extreme state $s_{i_x} \in S_n$ such that $h_x(s_{i_x}) = 0$. Thus, $\min_{s \in S^{\mathrm{ext}}} a_x(s) = a_x(s_{i_x}) = (1 - \lambda_x)\sigma_n = \sigma_n - \eta^n_o(\vec{a}_x)$ for all $x \in \Omega_A$, where we have used that $\eta^n_o(\vec{a}_x) = \lambda_x \sigma_n$. Thus, $\min_{s \in S^{\mathrm{ext}}} A_x(s) = \alpha_x [\sigma_n - \eta^n_o(\vec{a}_x)]$ for all $x \in \Omega_A$. We can now show the following.

Proposition 14.
Let $A \in O(S_n)$ be an observable on an odd polygon state space $S_n$ with effects $A_x = \alpha_x (\vec{a}_x, \sigma_n)^T$ for all $x \in \Omega_A$. If $A \in T_3$, then $w(A; T_1) \geq 1 - r_n \sin\left(\frac{\pi}{2n}\right)$ (36) if $n = 4l + 3$ for some $l \in \mathbb{N}$, or $w(A; T_1) \geq 1 - r_n^2 \sin\left(\frac{\pi}{2n}\right)$ (37) if $n = 4l + 1$ for some $l \in \mathbb{N}$.
Proof. As was established above, we have that $\min_{s \in S^{\mathrm{ext}}} A_x(s) = \alpha_x (\sigma_n - \eta^n_o(\vec{a}_x))$. For $n = 4l + 3$, we have from Prop. 13 that $\eta^n_o(\vec{a}_x) \leq r_n \sigma_n \sin\left(\frac{\pi}{2n}\right)$ for all $x \in \Omega_A$, so that $w(A; T_1) = \sum_{x \in \Omega_A} \alpha_x \left(\sigma_n - \eta^n_o(\vec{a}_x)\right) \geq \sigma_n \left(1 - r_n \sin\left(\frac{\pi}{2n}\right)\right) \sum_{x \in \Omega_A} \alpha_x = 1 - r_n \sin\left(\frac{\pi}{2n}\right)$, where in the last step we have used the fact that $\sum_{x \in \Omega_A} \alpha_x = \frac{1}{\sigma_n}$, which follows from the normalization of $A$. For $n = 4l + 1$, we have from Prop. 13 that $\eta^n_e(\vec{a}_x) \leq r_n \sigma_n \sin\left(\frac{\pi}{2n}\right)$ for all $x \in \Omega_A$. From Eq. (28) we get that $\eta^n_o(\vec{a}_x) \leq r_n \eta^n_e(\vec{a}_x)$ for all $x \in \Omega_A$, so that a calculation similar to the one above gives $w(A; T_1) \geq 1 - r_n^2 \sin\left(\frac{\pi}{2n}\right)$.

The lower bounds of the noise content from the previous proposition for the first few polygons are presented in Table I. We see that for $n = 3$, i.e., when the state space is classical, Eq. (36) gives the trivial lower bound zero, but already for the pentagon ($n = 5$) Eq. (37) shows that the noise content of an observable in $T_3$ must be more than $\frac{1}{2}$. We see that as the number of vertices of the polygons increases, so does the lower bound on the noise content of observables in $T_3$ for both Eq. (36) and Eq. (37). In the limit $n \to \infty$ the right-hand sides (R.H.S.) of both equations tend to 1, so that the observables in $T_3$ become trivial. Indeed, as the number of vertices approaches infinity, the state space becomes shaped like a disc, which is a point-symmetric state space, so that by Cor. 4 we have $T_1 = T_3$.
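The lower bounds (36) and (37) can be evaluated directly, and their tightness can be probed with a test observable. The sketch below assumes that the noise content is computed as the sum over outcomes of the minimal probability on the extreme states, as used in the proof above. The observable constructed in `inner_polygon_observable` is a hypothetical example whose effects are placed at the vertices of the inner polygon $C_m$ (assuming the characterization of Prop. 13, it lies on the boundary of $T_3$); all function names are introduced for this illustration only.

```python
import math

def r_n(n):
    return 1.0 / math.cos(math.pi / n)

def noise_lower_bound(n):
    """Right-hand side of Eq. (36) for n = 4l + 3 and of Eq. (37) for n = 4l + 1."""
    if n < 3 or n % 2 == 0:
        raise ValueError("the bounds are stated for odd polygons only")
    s = math.sin(math.pi / (2 * n))
    return 1.0 - r_n(n) * s if n % 4 == 3 else 1.0 - r_n(n) ** 2 * s

def states(n):
    r = r_n(n)
    return [(r * math.cos(2 * k * math.pi / n),
             r * math.sin(2 * k * math.pi / n), 1.0) for k in range(1, n + 1)]

def noise_content(n, effects):
    """w(A; T_1) as the sum over outcomes of the minimum of A_x over extreme states."""
    S = states(n)
    return sum(min(e[0] * s[0] + e[1] * s[1] + e[2] * s[2] for s in S)
               for e in effects)

def inner_polygon_observable(n):
    """HYPOTHETICAL n-outcome observable with effects alpha (a_k, sigma_n), where the
    a_k are the vertices of the inner polygon (radius sigma_n r_n sin(pi / 2n))."""
    m = (n - 1) // 2
    sigma = 1.0 / (1.0 + r_n(n))
    rho = sigma * r_n(n) * math.sin(math.pi / (2 * n))
    alpha = 1.0 / (n * sigma)  # normalization: the effects must sum to u
    return [(alpha * rho * math.cos((2 * k + m + 1) * math.pi / n),
             alpha * rho * math.sin((2 * k + m + 1) * math.pi / n),
             alpha * sigma) for k in range(n)]

if __name__ == "__main__":
    for n in (3, 5, 7, 9, 11):
        A = inner_polygon_observable(n)
        print(n, noise_lower_bound(n), noise_content(n, A))
```

For this particular observable the computed noise content coincides with the lower bound, which suggests that the bounds cannot be improved in general.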
From the other point of view, we can ask if sufficiently noisy observables are necessarily compatible with every other observable. For that, let us consider the binarizations of an observable $A \in O(S_n)$, i.e., binary observables $\hat{A}^{(x)}$ with effects $A_x$ and $u - A_x$ for all $x \in \Omega_A$. We can now show that observables that have a high enough noise content are indeed included in $T_3$.

Proposition 15.
Let $A \in O(S_n)$ be an observable on an odd polygon state space $S_n$ with effects $A_x = \alpha_x (\vec{a}_x, \sigma_n)^T$ for all $x \in \Omega_A$. If the condition of Eq. (39) holds, then $A$ is compatible with every other observable on $S_n$.
Proof. From the previous expression for the noise contents of the binarizations $\hat{A}^{(x)}$ of $A$, and by using Eq. (28), we obtain a bound on $w(\hat{A}^{(x)}; T_1)$ in terms of $\eta^n_{e/o}(\vec{a}_x)$. Since $T_1$ is closed under post-processing and since $\hat{A}^{(x)}$ is clearly a post-processing of $A$ for each $x \in \Omega_A$, we have by the basic properties of the noise content [9] that $w(\hat{A}^{(x)}; T_1) \geq w(A; T_1)$ for all $x \in \Omega_A$. Thus, by rearranging the previous expression we obtain an upper bound for $\eta^n_{e/o}(\vec{a}_x)$ that holds for all $x \in \Omega_A$, where we have used that $\left(1 + \frac{1}{r_n}\right)^{-1} = \sigma_n r_n$. Now, if Eq. (39) holds, from Prop. 13 it then follows that $A \in T_3$.