Information and disturbance in operational probabilistic theories

Any measurement is intended to provide information on a system, namely knowledge about its state. However, we learn from quantum theory that it is generally impossible to extract information without disturbing the state of the system or its correlations with other systems. In this paper we address the interplay between information and disturbance for a general operational probabilistic theory. The traditional notion of disturbance considers the fate of the system state after the measurement. However, leaving the system state untouched guarantees that correlations are also preserved only in the presence of local discriminability. Here we provide the definition of disturbance that is appropriate for a general theory. We then prove a condition equivalent to no-information without disturbance: atomicity of the identity, namely the impossibility of achieving the trivial evolution (the identity) as the coarse-graining of a set of nontrivial ones. We prove a general theorem showing that information that can be retrieved without disturbance corresponds to perfectly repeatable and discriminating tests. As a consequence we prove a structure theorem for operational probabilistic theories, showing that the set of states of any system decomposes as a direct sum of perfectly discriminable sets, and that such a decomposition is preserved under system composition. Besides proving that no-information without disturbance is implied by the purification postulate, we show via concrete examples that the converse is not true. Finally we show that no-information without disturbance and local discriminability are independent.


I. INTRODUCTION
The possibility that gathering information on a physical system may affect the state of the system itself was introduced by Heisenberg in his famous gedanken experiment [1], which became the first paradigm of quantum mechanics. The issue raised by Heisenberg spawned a vast literature up to the present day (see [2,3] for recent reviews), with a variety of quantifications of "information" and "disturbance" and corresponding tradeoff relations [4-7]. All these results are quantitative accounts of a core issue in quantum theory, the no-information without disturbance theorem [8,9]. The proofs of the theorem rely on the mathematical structure of quantum theory, and thus do not emphasise the logical relation between no-information without disturbance and other quantum features, such as local discriminability (the possibility of discriminating multipartite states via only local measurements) or purification (every mixed state can be obtained as the marginal state of a pure state).
The framework used here for exploring the relation between information and disturbance is that of operational probabilistic theories (OPTs) [9-11]. In this setting a rigorous formulation of the notions of system, process, and their compositions is given, which constitutes the grammar for the probabilistic description of an experiment. Quantum theory and classical theory are two instances of OPTs.
For some probabilistic theories which can be reframed as OPTs, the definitions of information and disturbance have been investigated in the presence of local discriminability, purification, and causality [12-15]. For OPTs satisfying those three axioms the no-information without disturbance theorem has been proved in Refs. [9,10]. In the present paper we point out a weakness in the existing notion of disturbance, which is ubiquitous in all past approaches. Indeed, the conventional definition of disturbance asserts that an experiment does not disturb the system if and only if its overall effect is to leave its state unchanged, disregarding the effects of the experiment on the environment. Whilst this captures the meaning of disturbance within quantum theory, we cannot consistently apply the same notion in theories that violate local discriminability. A significant case is that of the Fermionic theory [16-18] where, due to the parity superselection rule, an operation that does not disturb a set of Fermionic systems could still affect their correlations with other systems. This issue can be cured by asking a non-disturbing experiment to preserve not only the system state, but also its purifications [9,10]. This extension of the notion of disturbance is general enough to capture the operational meaning of disturbance for Fermionic systems. However, it is still unsatisfactory, since it cannot be used to describe disturbance in models that do not enjoy purification, e.g. classical information theory.
Here we will define non-disturbing operations only by referring to the OPT framework, thus providing a notion that holds also for theories that do not satisfy local discriminability, purification, or causality, and even for theories whose sets of states are not convex. Given a system, and an operation on it, the fate of any possible dilation of the states of the system is taken into account, where by dilation we mean any state of a larger system whose marginal is the dilated state [19]. We then prove a necessary and sufficient condition for a theory to satisfy no-information without disturbance. The condition is the impossibility of realizing the identity transformation as a nontrivial coarse-graining of a set of operations. Technically speaking, the above condition amounts to atomicity of the identity. Moreover, since a theory might satisfy no-information without disturbance only when restricted to some collections of states, we will provide a weaker necessary and sufficient condition for this case.
Similarly to the Heisenberg uncertainty relations, no-information without disturbance has been considered a characteristic quantum trait. Instead, as we will see here, this feature can be exhibited in the absence of most of the principles of quantum theory [9], and it is ubiquitous among OPTs. Moreover, the most general case is that of an OPT where some information can be extracted without disturbance, in which case this information has all the features of classical information. On the other hand, the only kind of system that allows extracting any information without disturbance is the classical system. This observation provides an alternative way of characterising classical systems with respect to Ref. [20].
In Section II we review the framework of operational probabilistic theories and some relevant features that characterize quantum theory within this scenario. In Section III we generalize the notion of equality upon input to general OPTs, including the cases in which local discriminability does not hold. In Section IV, after introducing the definition of information and disturbance, we present the main results of this paper: i) the atomicity of the identity evolution as a necessary and sufficient condition for no-information without disturbance; ii) other equivalent necessary and sufficient conditions in terms of properties of reversible evolutions of the theory; iii) other conditions that are only sufficient (e.g. purification); iv) a structure theorem for theories where some information can be extracted without disturbance. We prove that the information that can be extracted without disturbance is "classical", in the sense that the measurement is a repeatable reading of shareable information. Moreover, the classical theory of information is the only OPT with local discriminability in which all the information can be extracted without disturbance. In Section V we deepen the relation between no-information without disturbance and other characteristic properties of quantum theory. We show that no-information without disturbance can be satisfied by theories without purification and independently of local discriminability, providing counterexamples based on some of the conditions mentioned above. We end with the conclusions in Section VI.

II. THE FRAMEWORK
In this section we review the framework of operational probabilistic theories (OPTs); we refer to [9-11] for further details.
The primitives of an operational theory are the notions of test, event, and system. A test $\{\mathcal{A}_i\}_{i\in X}$ is the collection of events $\mathcal{A}_i$, where $i$ labels the element of the outcome space $X$. In the quantum case $\mathcal{A}_i$ is the $i$th quantum operation of the quantum instrument $\{\mathcal{A}_i\}_{i\in X}$. The notion of test bridges the experiment with the theory, with $i \in X$ denoting the objective outcome, and $\mathcal{A}_i$ the mathematical description of the corresponding event. The notion of system, here denoted by capital Roman letters A, B, ..., rules connections of tests. An input and an output label are associated to any test (event). We represent a test $\mathcal{A}_X := \{\mathcal{A}_i\}_{i\in X}$ and its building events $\mathcal{A}_i$ by box diagrams with an input wire and an output wire carrying the system labels, with the rule that an output wire can be connected only to an input wire with the same label. Thus, given two tests $\mathcal{A}_X$ and $\mathcal{B}_Y$ we can define their sequential composition $(\mathcal{B}\mathcal{A})_{X\times Y}$ as the collection of events $\mathcal{B}_j\mathcal{A}_i$ for $i \in X$ and $j \in Y$. A singleton test is a test containing a single event. We call such an event deterministic.
For every system A there exists a unique singleton test $\{\mathcal{I}_A\}$ such that $\mathcal{I}_B\mathcal{A} = \mathcal{A}\mathcal{I}_A = \mathcal{A}$ for every event $\mathcal{A}$ with input A and output B, and we call $\mathcal{I}_A$ the identity of system A. Besides sequential composition of tests and events, a theory is specified by the rule for composing them in parallel. For every pair of systems (A, B) we can form the composite system C := AB, on which we can perform tests $(\mathcal{C}\otimes\mathcal{D})_{X\times Y}$ with events $\mathcal{C}_i\otimes\mathcal{D}_j$ in parallel composition, represented by boxes drawn side by side, and satisfying the condition $(\mathcal{C}_i\otimes\mathcal{D}_j)(\mathcal{A}_h\otimes\mathcal{B}_k) = (\mathcal{C}_i\mathcal{A}_h)\otimes(\mathcal{D}_j\mathcal{B}_k)$. Notice that we use the tensor product symbol $\otimes$ for the parallel composition rule. Indeed, for the quantum and the classical OPT the parallel composition is the usual tensor product of linear maps. However, for a general OPT, the parallel composition may not coincide with a tensor product. There exists a special system type I, the trivial system, such that AI = IA = A for every system A. The tests with input system I and output A are called preparation-tests of A, while the tests with input system A and output I are called observation-tests of A. Preparation-events of A are graphically denoted as boxes without the input wire (in formula, as round kets $|\rho)_A$), and observation-events as boxes with no output wire (in formula, as round bras $(c|_A$). For example, one can have events of the following kind: [diagram]. We will always use Greek letters to denote preparation-tests $\{\rho_i\}_{i\in X}$ and Latin letters to denote observation-tests $\{c_j\}_{j\in X}$ (we will not specify the system when it is clear from the context).
An arbitrary test obtained by parallel and sequential composition of box diagrams is called a circuit. A circuit is closed if its overall input and output systems are trivial: it starts with a preparation-test and ends with an observation-test. An operational theory is probabilistic if to any closed circuit of tests there corresponds a probability distribution for the joint test. Such a theory is an OPT. For example, the application of an observation-event $c_i$ after the preparation-event $\rho_j$ corresponds to the closed circuit $(c_i|\rho_j)_A$, which denotes the probability of the outcome $(i, j)$ of the observation-test $c_X$ after the preparation-test $\rho_Y$ of system A, i.e. $(c_i|\rho_j)_A := \Pr(i, j)$.
For a more complex example, consider the test [circuit diagram]. Summarising: by a closed circuit made of events we denote their joint probability upon the connection specified by the circuit graph, with nodes being the test boxes, and links being the system wires.
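The rule that a closed circuit yields a joint probability can be illustrated with a minimal Python sketch. It assumes the classical OPT of a bit, where states are sub-normalised probability vectors, effects are vectors with entries in [0, 1], and the pairing $(c_i|\rho_j)$ is the dot product. The names `pair`, `rho`, `c` and all numerical values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

def pair(effect, state):
    """Pairing (c|rho): probability of a closed circuit in the classical OPT."""
    return sum(c * r for c, r in zip(effect, state))

# Preparation-test of a classical bit: two sub-normalised states whose
# sum is a normalised probability vector (illustrative values).
rho = [[F(1, 2), F(0)], [F(1, 4), F(1, 4)]]

# Observation-test: two effects that sum to the deterministic effect (1, 1).
c = [[F(1), F(1, 3)], [F(0), F(2, 3)]]

# Joint probabilities Pr(i, j) = (c_i | rho_j) of the closed circuit.
joint = [[pair(ci, rj) for rj in rho] for ci in c]

# The joint test of a closed circuit is a probability distribution.
total = sum(sum(row) for row in joint)
```

Here `joint[i][j]` plays the role of $(c_i|\rho_j)_A$, and `total` evaluates to one because the preparation-test and the observation-test are both complete tests.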
Given a system A of a probabilistic theory we can quotient the set of preparation-events of A by the equivalence relation $|\rho)_A \sim |\sigma)_A \Leftrightarrow (c|\rho)_A = (c|\sigma)_A$ for every observation-event $c$. Similarly we can quotient observation-events. The equivalence classes of preparation-events and observation-events of A will be denoted by the same symbols as their elements, $|\rho)_A$ and $(c|_A$, respectively, and will be called state and effect for system A. For every system A, we will denote by $\mathrm{St}(A)$ and $\mathrm{Eff}(A)$ the sets of states and effects, respectively. States and effects are real-valued functionals on each other, and can be naturally embedded in reciprocally dual real vector spaces, $\mathrm{St}_\mathbb{R}(A)$ and $\mathrm{Eff}_\mathbb{R}(A)$, whose dimension $\dim(A)$ is assumed to be finite.
In Appendix A it is proved that an event $\mathcal{A}$ with input system A and output system B induces a linear map from $\mathrm{St}_\mathbb{R}(AC)$ to $\mathrm{St}_\mathbb{R}(BC)$ for each ancillary system C. The collection of all these maps is called a transformation from A to B. More explicitly, given two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$, one has $\mathcal{A} = \mathcal{A}'$ if and only if $(a|_{BC}\,\mathcal{A}\,|\Psi)_{AC} = (a|_{BC}\,\mathcal{A}'\,|\Psi)_{AC}$ for every C, every $\Psi \in \mathrm{St}(AC)$, and every $a \in \mathrm{Eff}(BC)$, namely they give the same probabilities within every possible closed circuit. Notice that, using the fact that two states are equal if and only if they give the same probability when paired with every effect, the above condition amounts to stating that $\mathcal{A} = \mathcal{A}'$ if and only if
$\mathcal{A}\,|\Psi)_{AC} = \mathcal{A}'\,|\Psi)_{AC}$ (1)
for every C, and every $\Psi \in \mathrm{St}(AC)$.
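In the following, the symbol $\mathcal{A}$ and the box diagram with input wire A and output wire B will be used to represent the transformation corresponding to the event $\mathcal{A}$. The set of transformations from A to B will be denoted by $\mathrm{Transf}(A, B)$, with linear span $\mathrm{Transf}_\mathbb{R}(A, B)$. It is now obvious that a linear map $\mathcal{A} \in \mathrm{Transf}_\mathbb{R}(A, B)$ is admissible if it locally preserves the set of states $\mathrm{St}(AC)$, namely $\mathcal{A}\otimes\mathcal{I}_C(\mathrm{St}(AC)) \subseteq \mathrm{St}(BC)$, for every system C. In the following we will write $\mathcal{A}\,|\Psi)_{AC}$ instead of $\mathcal{A}\otimes\mathcal{I}_C\,|\Psi)_{AC}$, with $\Psi \in \mathrm{St}(AC)$ and $\mathcal{A} \in \mathrm{Transf}(A, B)$, when the domains are clear from the context.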
An operational probabilistic theory is now defined as a collection of systems and transformations with the above rules for parallel and sequential composition and with a probability associated to any closed circuit [21].
We introduce now the notions of refinement of an event and atomic event.
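Definition 1 (Refinement of an event). A refinement of an event $\mathcal{C} \in \mathrm{Transf}(A, B)$ is given by a collection of events $\{\mathcal{D}_i\}_{i\in X}$ from A to B, such that there exists a test $\{\mathcal{D}_i\}_{i\in Y}$ with $X \subseteq Y$ and $\mathcal{C} = \sum_{i\in X}\mathcal{D}_i$. We say that a refinement $\{\mathcal{D}_i\}_{i\in X}$ of $\mathcal{C}$ is trivial if $\mathcal{D}_i = \lambda_i\mathcal{C}$, $\lambda_i \in [0, 1]$, for every $i \in X$. Conversely, $\mathcal{C}$ is called the coarse-graining of the events $\{\mathcal{D}_i\}_{i\in X}$.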
In the following we will often refer to a refinement of $\mathcal{C}$ simply as $\mathcal{C} = \sum_{i\in X}\mathcal{D}_i$, without specifying the test including the events $\mathcal{D}_i$.
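Definition 2 (Refining event). Given two events $\mathcal{C}, \mathcal{D} \in \mathrm{Transf}(A, B)$ we say that $\mathcal{D}$ refines $\mathcal{C}$, and write $\mathcal{D} \prec \mathcal{C}$, if there exists a refinement $\{\mathcal{D}_i\}_{i\in X}$ of $\mathcal{C}$ such that $\mathcal{D} \in \{\mathcal{D}_i\}_{i\in X}$.
Definition 3 (Non redundant test). We call a test $\{\mathcal{A}_i\}_{i\in X}$ non redundant when for every pair $i, j \in X$ with $i \neq j$ one has $\mathcal{A}_i \neq \lambda\mathcal{A}_j$ for every $\lambda > 0$.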
Notice that a test that is redundant can be interpreted as a non redundant test followed by a conditional coin tossing. As a consequence a redundant test always gives some spurious information, unrelated to the input state. From a redundant test one can obtain a maximal non redundant one by taking the test made of the coarse-grainings of all the sets of proportional elements.
Definition 4 (Refinement set). Given an event $\mathcal{C} \in \mathrm{Transf}(A, B)$ we define its refinement set $\mathrm{Ref}_\mathcal{C}$ as the set of all events that refine $\mathcal{C}$.
Definition 5 (Atomic and refinable events). An event $\mathcal{C}$ is atomic if it admits only trivial refinements, namely $\mathcal{D} \prec \mathcal{C}$ implies $\mathcal{D} = \lambda\mathcal{C}$, $\lambda \in [0, 1]$. Conversely, we say that an event is refinable whenever it is not atomic.
In the special case of states, the word pure is used as a synonym of atomic, with a pure state describing an event that provides maximal knowledge about the system's preparation. This means that the knowledge provided by a pure state cannot be further refined. As usual, a state that is not pure will be called mixed.
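The coarse-graining of proportional events that turns a redundant test into a non redundant one can be sketched in a few lines of Python. The sketch assumes classical events represented as flattened matrices with non-negative entries, so that two nonzero events are proportional with $\lambda > 0$ exactly when they coincide after dividing by their entry sums. The function name `coarse_grain_proportional` and the example test are hypothetical.

```python
from fractions import Fraction as F

def coarse_grain_proportional(test):
    """Sum all mutually proportional events of a test into a single event.

    Each event is a tuple of non-negative numbers (a flattened matrix).
    Events are grouped by their direction, obtained by dividing by the
    entry sum; a zero event is used as its own key.
    """
    groups = {}
    for event in test:
        total = sum(event)
        key = tuple(x / total for x in event) if total != 0 else event
        previous = groups.get(key)
        if previous is None:
            groups[key] = event
        else:
            groups[key] = tuple(a + b for a, b in zip(previous, event))
    return list(groups.values())

# A redundant test on a classical bit: outcome "0" of the reading of the
# bit is split by a biased coin into two proportional events.
redundant_test = [
    (F(1, 3), F(0), F(0), F(0)),   # (1/3) * diag(1, 0), flattened row by row
    (F(2, 3), F(0), F(0), F(0)),   # (2/3) * diag(1, 0)
    (F(0), F(0), F(0), F(1)),      # diag(0, 1)
]

reduced = coarse_grain_proportional(redundant_test)
```

In this example `reduced` contains only the two events diag(1, 0) and diag(0, 1): the coin outcome, which carried no information on the input state, has been coarse-grained away.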
A relevant notion is that of internal state: The geometric interpretation of "internal" is the intuitive one, namely an internal state cannot belong to the boundary of the set of states St(A).
The last definition we introduce is that of state dilation.
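Definition 7 (Dilation of a state ρ). We say that $\Psi \in \mathrm{St}(AB)$ is a dilation of $\rho \in \mathrm{St}(A)$ if $|\rho)_A = (e|_B\,|\Psi)_{AB}$ for some deterministic effect $e \in \mathrm{Eff}(B)$. We denote by $D_\rho$ the set of all dilations of the state ρ. More generally, given a collection of states $X \subseteq \mathrm{St}(A)$ we define $D_X := \bigcup_{\rho\in X} D_\rho$.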
We remark that, given $\sigma \in X$, every state of the form $\sigma \otimes \rho$ belongs to $D_X$.
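As a small illustration of dilations, the following Python sketch uses the classical OPT, taken here as an assumed example: a bipartite state is a joint probability table, and the unique deterministic effect on B is the sum over the index of B. Both a product state $\sigma \otimes \rho$ and a correlated state are then dilations of the same $\sigma$. The function names and numerical values are hypothetical.

```python
from fractions import Fraction as F

def product_state(sigma, rho):
    """Parallel composition sigma ⊗ rho of two classical states (joint table)."""
    return [[s * r for r in rho] for s in sigma]

def marginal_on_A(joint):
    """Apply the deterministic effect on B (sum over the B index)."""
    return [sum(row) for row in joint]

sigma = [F(1, 3), F(2, 3)]
rho = [F(1, 4), F(3, 4)]

# Two different dilations of sigma: an uncorrelated and a correlated one.
product = product_state(sigma, rho)
correlated = [[F(1, 3), F(0)], [F(0), F(2, 3)]]

same_marginal = marginal_on_A(product) == marginal_on_A(correlated) == sigma
```

The two joint states differ, yet they have the same marginal on A, which is why a notion of disturbance based on dilations has to look at the whole set $D_\sigma$ rather than at $\sigma$ alone.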
Notice that there is generally more than one deterministic effect for the same system, unlike in quantum theory, where the partial trace over the Hilbert space of the system is the only way to discard it. Instead, in a theory with more deterministic effects for the same system B, the marginal state of system A generally depends on the effect used to discard the system B. In the following we will call marginal of a state with deterministic effect $e$ the specific marginal obtained by applying the effect $e \in \mathrm{Eff}(B)$.
Proof. Consider a system B and a state $\Psi \in \mathrm{St}(AB)$: and $e \in \mathrm{Eff}(B)$ deterministic.

A. Relevant classes of OPTs
A frequently highlighted property within the wider scenario of OPTs is that of multipartite state discrimination via local measurements:
Definition 8 (Local discriminability). It is possible to discriminate between any pair of states of composite systems using only local measurements. Mathematically, given two joint states $\Psi, \Psi' \in \mathrm{St}(AB)$ with $\Psi \neq \Psi'$, there exist two effects $a \in \mathrm{Eff}(A)$ and $b \in \mathrm{Eff}(B)$ such that $\big((a|_A\otimes(b|_B\big)\,|\Psi)_{AB} \neq \big((a|_A\otimes(b|_B\big)\,|\Psi')_{AB}$.
Two relevant consequences of local discriminability are: i) the local characterization of transformations, stating that the local behaviour of a transformation is sufficient to fully characterize the transformation itself; ii) the atomicity of parallel composition. Here we report those two features for the convenience of the reader.

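Proposition 1 (Local characterization of transformations). If local discriminability holds, then for any two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$, the condition $\mathcal{A}\,|\rho)_A = \mathcal{A}'\,|\rho)_A$ for every $\rho \in \mathrm{St}(A)$ implies that $\mathcal{A} = \mathcal{A}'$.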
See Ref. [10] for the proof.

Proposition 2 (Atomicity of parallel composition). If an OPT satisfies local discriminability then the parallel composition is atomic.
For the proof of the above proposition see Ref. [22]. We observe that an OPT with local discriminability allows for tomography of multipartite states using only local measurements. In an OPT with local discriminability, the linear space of effects of a composite system is the tensor product of the linear spaces of effects of the component systems, namely $\mathrm{Eff}_\mathbb{R}(AB) \equiv \mathrm{Eff}_\mathbb{R}(A)\otimes\mathrm{Eff}_\mathbb{R}(B)$. Thus, any bipartite effect $c \in \mathrm{Eff}(AB)$ can be written as a linear combination of product effects, and every probability $(c|\rho)_{AB}$, for $\rho \in \mathrm{St}(AB)$, can be computed as a linear combination of the probabilities $\big((a|_A\otimes(b|_B\big)\,|\rho)_{AB}$ arising from a finite set of product effects. The same holds for the linear space of states, and in an OPT with local discriminability the parallel composition of two states (effects) can be understood as a tensor product. Finally, the relation $\dim(AB) = \dim(A)\dim(B)$ between the linear dimensions of the sets of states/effects holds, whereas for theories without local discriminability one has $\dim(AB) > \dim(A)\dim(B)$.
Recently it has been shown that relevant physical theories, such as the Fermionic theory [16], can be described in the OPT framework by relaxing the property of local discriminability [17,18]. The most general scenario for OPTs that exhibit a finite degree of holism is that of OPTs with n-local discriminability for some $n \in \mathbb{N}$ [23]:
Definition 9 (n-local discriminability). A theory satisfies n-local discriminability if whenever two states $\rho$ and $\rho'$ are different, there exists an n-local effect $b$ such that $(b|\rho) \neq (b|\rho')$. We say that an effect is n-local if it can be written as a conic combination of tensor products of effects that are at most n-partite.
Two notable examples are indeed Fermionic quantum theory and real quantum computation [17,18,23] that are both 2-local tomographic.
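The failure of local discriminability in real quantum theory can be checked on a standard two-rebit example, sketched below in Python. The sketch assumes that rebit states and effects are real symmetric matrices; the two states $(I \pm Y\otimes Y)/4$ (with $Y\otimes Y$ a real matrix) are not taken from the text and are used only as an illustration. All product effects give the same statistics on them, while a bipartite effect tells them apart, and the dimension count gives $10 > 3\cdot 3$.

```python
from fractions import Fraction

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def tr_prod(a, b):
    """Trace of the matrix product a b, i.e. the pairing of effect a with state b."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
I4 = kron(I2, I2)
# Y ⊗ Y is a real symmetric matrix even though Y itself is imaginary.
YY = [[0, 0, 0, -1], [0, 0, 1, 0], [0, 1, 0, 0], [-1, 0, 0, 0]]

def combine(sign, denominator):
    """Return (I4 + sign * YY) / denominator with exact rational entries."""
    return [[Fraction(I4[r][c] + sign * YY[r][c], denominator)
             for c in range(4)] for r in range(4)]

rho_plus, rho_minus = combine(+1, 4), combine(-1, 4)

# I, X, Z span the real symmetric 2x2 matrices, hence all local rebit effects.
local_basis = [I2, X, Z]
local_stats_equal = all(
    tr_prod(kron(s, t), rho_plus) == tr_prod(kron(s, t), rho_minus)
    for s in local_basis for t in local_basis
)

# A bipartite (2-local) effect, the projector (I + Y⊗Y)/2, discriminates them.
joint_effect = combine(+1, 2)
p_plus = tr_prod(joint_effect, rho_plus)
p_minus = tr_prod(joint_effect, rho_minus)

def dim_real(n):
    """Linear dimension of real symmetric n x n matrices."""
    return n * (n + 1) // 2

def dim_complex(n):
    """Real linear dimension of Hermitian n x n matrices."""
    return n * n
```

In this sketch `local_stats_equal` holds while `p_plus` and `p_minus` differ, and `dim_real(4)` exceeds `dim_real(2) ** 2`, whereas for complex quantum theory `dim_complex(4)` equals `dim_complex(2) ** 2`.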
Another relevant class of OPTs is that of theories with purification [10,24]. As a result of this paper we will show (Proposition 7) that the set of OPTs with purification is strictly smaller than the set of OPTs that satisfy no-information without disturbance. Moreover, we will see that a weak version of purification, which does not require the uniqueness (as in quantum theory) but just the existence of a purification for each state of the theory, is enough to imply no-information without disturbance. Accordingly, we define the following class of OPTs.
Definition 10 (Purification). We say that an OPT satisfies purification if for every system A and for every state $\rho \in \mathrm{St}(A)$, there exists a system B and a pure state $\Psi \in \mathrm{St}(AB)$ that is a dilation of ρ.
Remark 1 (Equivalent definition of purification). One can provide an equivalent definition of purification based on the notion of dilation of Definition 7. Indeed, an OPT satisfies purification if for every system A and for every state $\rho \in \mathrm{St}(A)$, the dilation set $D_\rho$ contains a pure state.
As already noticed, the above definition does not require the purification of a state to be unique up to reversible transformations on the purifying system, as it is for example in quantum theory.
The last relevant class of OPTs that we point out is that of causal theories:
Definition 11 (Causal OPTs). The probability of preparation events in a closed circuit is independent of the choice of observations. Mathematically, if $\{\rho_i\}_{i\in X} \subset \mathrm{St}(A)$ is a preparation-test, then the conditional probability of the preparation $\rho_i$ given the choice of the observation-test $\{a_j\}_{j\in Y}$ is the marginal $\Pr(i|\{a_j\}) := \sum_{j\in Y}(a_j|\rho_i)_A$. In a causal theory the marginal probability $\Pr(i|\{a_j\})$ is independent of the choice of the observation-test $\{a_j\}$: if $\{a_j\}_{j\in Y}$ and $\{b_k\}_{k\in Z}$ are two different observation-tests, then one has $\Pr(i|\{a_j\}) = \Pr(i|\{b_k\})$.
The present notion of causality is simply Einstein causality expressed in the language of OPTs. As proved in Ref. [10], causality is equivalent to the existence of a unique deterministic effect $e_A$. We call the effect $e_A$ the deterministic effect for system A. By definition, in non-causal theories the deterministic effect cannot be unique.
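The causality condition can be checked explicitly in a simple case. The Python sketch below assumes the classical OPT of a bit, where the unique deterministic effect is the vector (1, 1) and every observation-test sums to it, so that the marginal probability of a preparation does not depend on the chosen observation-test. The names and numerical values are hypothetical.

```python
from fractions import Fraction as F

def pair(effect, state):
    """Pairing (a|rho) in the classical OPT."""
    return sum(a * r for a, r in zip(effect, state))

def marginal_probability(state, observation_test):
    """Pr(i | {a_j}) = sum_j (a_j | rho_i) for a fixed preparation rho_i."""
    return sum(pair(a, state) for a in observation_test)

# One sub-normalised preparation-event rho_i of a classical bit.
rho_i = [F(1, 4), F(1, 2)]

# Two different observation-tests; each sums to the deterministic effect (1, 1).
test_a = [[F(1), F(0)], [F(0), F(1)]]
test_b = [[F(1, 2), F(1, 3)], [F(1, 2), F(2, 3)]]

prob_a = marginal_probability(rho_i, test_a)
prob_b = marginal_probability(rho_i, test_b)
```

Both `prob_a` and `prob_b` coincide with the pairing of `rho_i` with the deterministic effect, which is the content of the equivalence between causality and uniqueness of the deterministic effect in this example.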

III. OPERATIONAL IDENTITIES BETWEEN TRANSFORMATIONS
As expressed in Eq. (1), two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$ of an OPT are said to be operationally identical if for every system C and for every state $\Psi \in \mathrm{St}(AC)$ one has $\mathcal{A}\,|\Psi)_{AC} = \mathcal{A}'\,|\Psi)_{AC}$. However, two nonidentical maps $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$ could behave in the same way when their action is restricted to a relevant subclass of states.
The notion of identical transformations upon input of a state $\rho \in \mathrm{St}(A)$ has been introduced in the literature (see Refs. [9,25] and references therein):
Definition 12 (Identical transformations upon input of ρ). We say that two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, B)$ are equal upon input of $\rho \in \mathrm{St}(A)$, and write $\mathcal{A} =_\rho \mathcal{A}'$, if $\mathcal{A}\,|\sigma)_A = \mathcal{A}'\,|\sigma)_A$ for every $\sigma$ in the refinement set of $\rho$.
According to this definition, even if $\mathcal{A} =_\rho \mathcal{A}'$, the maps $\mathcal{A}$ and $\mathcal{A}'$ could still act differently on dilations of ρ, namely it could be $\mathcal{A}\,|\Psi)_{AC} \neq \mathcal{A}'\,|\Psi)_{AC}$ for some $\Psi \in \mathrm{St}(AC)$ with $\Psi \in D_\rho$. In this case the difference between $\mathcal{A}$ and $\mathcal{A}'$ would go undetected if only their action on system A is considered. As proved in Corollary 1, the local action of a map is sufficient to determine the map itself if the OPT satisfies local discriminability. However, for theories without local discriminability the local action of a transformation might not be sufficient to characterize it. For this reason we introduce the notion of identical transformations upon input of dilations of a state ρ.
Proof. This immediately follows by observing that if χ is an internal state of system A then $D_\chi$ coincides with the set of all states $\mathrm{St}(AC)$ for any possible system C.
In Definition 5 we introduced the notion of atomic events. Based on the present notion of identical transformations upon input of $D_\rho$ we can provide a weaker version of atomicity for transformations.
Definition 14 (Atomic and refinable transformation upon input of $D_\rho$). A transformation $\mathcal{A} \in \mathrm{Transf}(A, B)$ is atomic upon input of $D_\rho$, with $\rho \in \mathrm{St}(A)$, if all its refinements are trivial upon input of $D_\rho$, namely $\mathcal{B} \prec \mathcal{A}$ implies $\mathcal{B} =_{D_\rho} \lambda\mathcal{A}$, $\lambda \in [0, 1]$. Conversely, we say that an event is refinable upon input of $D_\rho$ whenever it is not atomic upon input of $D_\rho$.
Here we show that the two definitions 12 and 13 coincide for causal OPTs with local discriminability.For this purpose we first need the following lemma.
Proof. By definition it is (e|
We simply say that a state $\Psi \in \mathrm{St}(AB)$ is faithful for system A if, given two arbitrary transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(A, C)$, the condition $\mathcal{A}\,|\Psi)_{AB} = \mathcal{A}'\,|\Psi)_{AB}$ implies that $\mathcal{A} = \mathcal{A}'$. Clearly a state $\Psi \in \mathrm{St}(AB)$ that is faithful upon input of $D_\chi$, with χ an internal state, is faithful for system A.

IV. INFORMATION AND DISTURBANCE
Within the general scenario of operational probabilistic theories, and without further assumptions on the structure of the theory, we aim at defining the notions of non-disturbing and no-information tests. These notions have already been investigated for theories that satisfy local discriminability or purification. We start by highlighting the weakness of previous approaches in cases where the above hypotheses do not hold. The disturbance and the information produced by a test on a physical system A are commonly defined in relation to measurements and states of the system A only, disregarding the action of the same test on enlarged systems AB.
A test $\{\mathcal{A}_i\}_{i\in X}$ on system A is said to be non-disturbing upon input of $\rho \in \mathrm{St}(A)$ if for every σ in the refinement set of ρ one has $\sum_i \mathcal{A}_i\,|\sigma)_A = |\sigma)_A$, namely if $\sum_i \mathcal{A}_i =_\rho \mathcal{I}_A$ according to Definition 12. However, this definition is not operationally consistent if applied to theories without local discriminability. A physically relevant example is that of the Fermionic theory [16] which, due to the parity superselection rule, is not local-tomographic [17,18] (it is 2-local tomographic according to Definition 9). We can see via a simple example that, for a Fermionic system A, a test $\{\mathcal{A}_i\}_{i\in X}$ such that $\sum_i \mathcal{A}_i =_\rho \mathcal{I}_A$ can still disturb the states of a composite system AB.
The parity superselection rule on a system $N_F$ of N Fermions forbids any state corresponding to a superposition of vectors belonging to $F^e_N$ and $F^o_N$, representing Fock vector spaces with total even and odd occupation numbers, respectively. As a consequence the set of states $\mathrm{St}(N_F)$ splits into the direct sum of two spaces, containing the states with even and odd parity, respectively. It is now convenient to make use of the projectors onto the well-defined parity subspaces, $P_e$ for the even space and $P_o$ for the odd one. Notice that, since $P_eP_o = P_oP_e = 0$, any Fermionic state ρ will be of the form $\rho = P_e\rho P_e + P_o\rho P_o$. Consequently the parity test $\{P_e\cdot P_e, P_o\cdot P_o\}$ leaves every state ρ unchanged. Intuitively, this seems to suggest that parity can be measured without disturbing. Indeed, this view is in agreement with the notion of disturbance that has been considered in the literature so far.
Consider now a mixed state $\rho \in \mathrm{St}(N_F)$, with $\rho = p_e\rho_e + p_o\rho_o$, where $\rho_e$ and $\rho_o$ are an even and an odd pure state, respectively, and $p_e + p_o = 1$. For example, consider the states $\rho_e = |00\rangle\langle 00|$ and $\rho_o = |01\rangle\langle 01|$, and $p_e = p_o = 1/2$, so that $\rho = \frac{1}{2}(|00\rangle\langle 00| + |01\rangle\langle 01|)$. Since Fermionic theory allows for purification [17], we can always find a state $\Psi \in \mathrm{St}(M_F)$, with $M > N$, that purifies ρ. Since Ψ is pure, it has a definite parity, say even. In our example one can choose $|\Psi\rangle = \frac{1}{\sqrt{2}}(|000\rangle + |011\rangle)$. Therefore, the local test on the system $N_F$ that measures the parity of the system will not disturb the states of $N_F$, but will decohere the state Ψ to a mixed state, thus introducing a disturbance. For example, in our case $(P_e\otimes I)\Psi(P_e\otimes I) + (P_o\otimes I)\Psi(P_o\otimes I) = \frac{1}{2}(|000\rangle\langle 000| + |011\rangle\langle 011|)$.
In order to avoid the above issue, and to introduce a definition of non-disturbing test that works also for theories without local discriminability, one could say that a test $\{\mathcal{A}_i\}_{i\in X}$ on system A is non-disturbing upon input of $\rho \in \mathrm{St}(A)$ if for every σ in the refinement set of ρ and every purification $\Psi_{AB} \in \mathrm{St}(AB)$ of σ one has $\sum_i \mathcal{A}_i\,|\Psi)_{AB} = |\Psi)_{AB}$. This route, which has been proposed in Refs. [9,10], captures the operational meaning of disturbance also for Fermionic systems. However, the definition of Refs. [9,10] requires purification, and thus cannot be used in theories without purification, e.g. the cases of PR boxes, or the classical theory of information.
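The Fermionic parity argument can be verified numerically. The Python sketch below assumes the occupation-number representation, in which the parity projectors of the first modes are diagonal, and takes as an illustrative assumption the two-mode state $\rho = \frac{1}{2}(|00\rangle\langle 00| + |01\rangle\langle 01|)$ together with one even-parity purification on three modes, $(|000\rangle + |011\rangle)/\sqrt{2}$. The helper names are hypothetical.

```python
from fractions import Fraction

def local_parity(index, n_modes, n_local):
    """Parity (0 even, 1 odd) of the occupation of the first n_local modes."""
    bits = format(index, f"0{n_modes}b")
    return bits[:n_local].count("1") % 2

def parity_test(state, n_modes, n_local):
    """Coarse-grained parity test: sum over s of (P_s ⊗ I) state (P_s ⊗ I).

    The projectors are diagonal, so the test keeps exactly those matrix
    elements whose row and column have the same local parity.
    """
    dim = 2 ** n_modes
    return [[state[r][c]
             if local_parity(r, n_modes, n_local) == local_parity(c, n_modes, n_local)
             else Fraction(0)
             for c in range(dim)] for r in range(dim)]

def trace_out_last_mode(state, n_modes):
    """Marginal on the first n_modes - 1 modes (last mode = least significant bit)."""
    dim = 2 ** (n_modes - 1)
    return [[state[2 * r][2 * c] + state[2 * r + 1][2 * c + 1]
             for c in range(dim)] for r in range(dim)]

half = Fraction(1, 2)

# rho = 1/2 (|00><00| + |01><01|) on two modes.
rho = [[Fraction(0)] * 4 for _ in range(4)]
rho[0b00][0b00] = half
rho[0b01][0b01] = half

# Density matrix of the assumed purification (|000> + |011>)/sqrt(2).
psi = [[Fraction(0)] * 8 for _ in range(8)]
for r in (0b000, 0b011):
    for c in (0b000, 0b011):
        psi[r][c] = half

rho_after = parity_test(rho, n_modes=2, n_local=2)
psi_after = parity_test(psi, n_modes=3, n_local=2)
```

In this sketch `rho_after` equals `rho`, while `psi_after` has lost the coherences between $|000\rangle$ and $|011\rangle$: the parity test is invisible on the system alone but disturbs its dilation.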
Based on the above motivations our proposal is to define the disturbance (and the information) produced by a test in terms of its action on dilations.This leads to notions of information and disturbance that are completely general and thus do not depend on local discriminability or purification.This will allow us to prove a no-information-without-disturbance theorem for a very large class of OPTs.

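Definition 17 (Non-disturbing test upon input of $D_\rho$).
We say that a test $\{\mathcal{A}_i\}_{i\in X}$ on system A is non-disturbing upon input of $D_\rho$, with $\rho \in \mathrm{St}(A)$, if $\sum_{i\in X}\mathcal{A}_i =_{D_\rho} \mathcal{I}_A$. Notice that, according to the above definition, the test $\{\mathcal{A}_i\}_{i\in X}$ is non-disturbing upon input of $D_\rho$ if for every σ in the refinement set of ρ and for every dilation $\Psi_{AB} \in \mathrm{St}(AB)$ of σ one has $\sum_i \mathcal{A}_i\,|\Psi)_{AB} = |\Psi)_{AB}$. This definition of disturbance thus stresses the effect of a transformation on correlations with remote systems. We say that a test $\{\mathcal{A}_i\}_{i\in X}$ is non-disturbing if $\sum_i \mathcal{A}_i = \mathcal{I}_A$, namely the test is operationally identical to the identity transformation of system A (it preserves any state $\Psi_{AB}$ for every ancillary system B). In particular, this is the case if ρ in Definition 17 is internal.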
Clearly, a test on system A is disturbing upon input of $D_\rho$, with $\rho \in \mathrm{St}(A)$, whenever it is not non-disturbing, namely there exist a σ in the refinement set of ρ and a dilation $\Psi_{AB} \in \mathrm{St}(AB)$ of σ such that $\sum_i \mathcal{A}_i\,|\Psi)_{AB} \neq |\Psi)_{AB}$.
Remark 2. We could have defined a non-disturbing test from A to C upon input of $D_\rho$ as follows where
In the same spirit we can establish whether a test provides information according to the following definition:
Definition 18 (No-information test upon input of $D_\rho$). We say that a test $\{\mathcal{A}_i\}_{i\in X}$ with events $\mathcal{A}_i \in \mathrm{Transf}(A, C)$ does not provide information on system A upon input of $D_\rho$, with $\rho \in \mathrm{St}(A)$, if for every choice of deterministic effect $e \in \mathrm{Eff}(CB)$ there exists a deterministic effect $f \in \mathrm{Eff}(AB)$ such that
$(e|_{CB}\,\mathcal{A}_i\,|\Psi)_{AB} = p_i(e)\,(f|_{AB}\,|\Psi)_{AB}$ (9)
for every $i \in X$, every σ in the refinement set of ρ, and every dilation $\Psi \in \mathrm{St}(AB)$ of σ.
Again, we say that a test $\{\mathcal{A}_i\}_{i\in X}$ is a no-information test if for any choice of deterministic effect $e_{CB}$ there exists a deterministic effect $f_{AB}$ such that $(e|_{CB}\,\mathcal{A}_i = p_i(e)\,(f|_{AB}$ for every $i \in X$, namely the occurrence probability of each outcome $i \in X$ does not depend on the state of the system. This is the case, e.g., if the test $\{\mathcal{A}_i\}_{i\in X}$ does not provide information upon input of $D_\rho$, with ρ an internal state of the system. Naturally, a test $\{\mathcal{A}_i\}_{i\in X}$ provides information upon input of $D_\rho$, with $\rho \in \mathrm{St}(A)$, whenever there exist states σ and σ' in the refinement set of ρ and $\Psi \in D_\sigma$, $\Psi' \in D_{\sigma'}$, with $\Psi, \Psi' \in \mathrm{St}(AB)$, such that for some deterministic effect $e_{CB}$.
Remark 3. Notice that in Eq. (9) the probability of the transformation $\mathcal{A}_i$, for every $i \in X$, generally depends on the deterministic effect $(e|_{CB}$, which accounts for non-causal theories. In the more general case in which the deterministic effect $(f|_{AB}$ on the right-hand side of Eq. (9) also depends on $i \in X$, the test $\{\mathcal{A}_i\}_{i\in X}$ would provide information on the system state (this will happen only for probabilistic states).

A. No-information without disturbance
In this section we state the condition of no-information without disturbance and introduce criteria for it to be satisfied by an OPT.
Definition 19 (No-information without disturbance upon input of $D_\rho$). Consider an OPT and a state $\rho \in \mathrm{St}(A)$, with A a system of the theory. Then the OPT satisfies no-information without disturbance upon input of $D_\rho$ if every test $\{\mathcal{A}_i\}_{i\in X} \subseteq \mathrm{Transf}(A)$ that is non-disturbing upon input of $D_\rho$ does not provide information upon input of $D_\rho$.
Definition 20 (OPT with no-information without disturbance).We say that an OPT satisfies no-information without disturbance if it satisfies no-information without disturbance upon input of D ρ for every ρ ∈ St(A), for every system A.
We now prove a necessary and sufficient condition for a theory to satisfy no-information without disturbance.
Theorem 1. An OPT satisfies no-information without disturbance if and only if the identity transformation is atomic for every system of the theory.
Proof. We start by proving that if an OPT satisfies no-information without disturbance then the identity transformation is atomic. Consider a system A of the theory, and a refinement $\{\mathcal{A}_i\}_{i\in X}$ (with $\mathcal{A}_i \in \mathrm{Transf}(A)$ for every $i \in X$) of the identity map, $\mathcal{I}_A = \sum_i \mathcal{A}_i$, for system A. The test $\{\mathcal{A}_i\}_{i\in X}$ is clearly non-disturbing, therefore by hypothesis it is a no-information test, namely for every deterministic effect $e \in \mathrm{Eff}(AB)$ there exists a deterministic effect $f \in \mathrm{Eff}(AB)$ such that for every $i \in X$ one has $(e|_{AB}\,\mathcal{A}_i = p_i(e)\,(f|_{AB}$. Summing both sides of the last equation over the index $i \in X$, and recalling that $\sum_{i\in X} p_i(e) = 1$, we find that $e = f$. Therefore, the no-information condition is
$(e|_{AB}\,\mathcal{A}_i = p_i(e)\,(e|_{AB}$ (11)
for every deterministic effect $e \in \mathrm{Eff}(AB)$. Consider now an arbitrary pair of pure states $\Psi_1, \Psi_2 \in \mathrm{St}(AB)$. Since $\sum_i \mathcal{A}_i\,|\Psi_k)_{AB} = |\Psi_k)_{AB}$ and $\Psi_k$ is pure, one has
$\mathcal{A}_i\,|\Psi_k)_{AB} = \lambda_i(\Psi_k)\,|\Psi_k)_{AB}$, $k = 1, 2$, (12)
where the coefficients $\lambda_i(\Psi_k)$ generally depend on the state $\Psi_k$. However, for each pure state $\Psi_k$ there exists a deterministic effect $e_k \in \mathrm{Eff}(AB)$ such that $(e_k|\Psi_k) \neq 0$.
Upon applying the deterministic effect e := 1 2 (e 1 + e 2 ) on both sides of Eq. ( 12), we get Now, applying both sides of Eq. ( 11) to Ψ k , we get and comparing the last two identities, considering that (e| Ψ k ) AB = 0, we obtain Since this holds true for every pair of pure states Ψ k , we conclude that λ i (Ψ) is independent of Ψ.Moreover, by the same argument p i (e) ≡ p i is independent of e.
Notice that we implicitly assumed that the probabilities $p_i$ do not depend on the choice of the system $\mathrm{B}$. This can actually be proven, as shown in Appendix B. The converse implication, namely that if in an OPT the identity transformation is atomic then a non-disturbing test cannot provide information, is trivial.
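As a hedged illustration of Theorem 1, the following minimal Python sketch takes quantum theory as a concrete OPT and a single qubit as the system. It is not part of the original proof: the helper names (`apply`, `close`, and so on), the chosen states, and the value of `p` are assumptions made for illustration, and only the local state is checked, whereas the definition of disturbance used in this paper also covers correlations with other systems. The sketch compares a projective test, whose coarse-graining alters the state, with a test whose Kraus operators are proportional to the identity, which leaves the state untouched but has state-independent outcome probabilities.

```python
# Illustration only: quantum theory taken as a concrete OPT (2x2 density matrices).

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply(kraus, rho):
    """Quantum operation with a single Kraus operator: rho -> K rho K^dagger."""
    return matmul(matmul(kraus, rho), dagger(kraus))

def add(a, b):
    """Entrywise sum, used to coarse-grain the outcomes of a test."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

plus = [[0.5, 0.5], [0.5, 0.5]]  # density matrix of |+>
zero = [[1.0, 0.0], [0.0, 0.0]]  # density matrix of |0>

# Test 1: projective measurement in the computational basis.
P0 = [[1.0, 0.0], [0.0, 0.0]]
P1 = [[0.0, 0.0], [0.0, 1.0]]
coarse_projective = add(apply(P0, plus), apply(P1, plus))
projective_disturbs = not close(coarse_projective, plus)

# Test 2: both Kraus operators proportional to the identity.
p = 0.3  # hypothetical outcome probability, chosen arbitrarily
K0 = [[p ** 0.5, 0.0], [0.0, p ** 0.5]]
K1 = [[(1 - p) ** 0.5, 0.0], [0.0, (1 - p) ** 0.5]]
coarse_trivial = add(apply(K0, plus), apply(K1, plus))
identity_test_nondisturbing = close(coarse_trivial, plus)

# The probability of outcome 0 in Test 2 is p for both input states.
prob_plus = trace(apply(K0, plus))
prob_zero = trace(apply(K0, zero))
```

In this sketch the informative test is the disturbing one, while the non-disturbing test yields the same probability `p` on both inputs, which is the behaviour expected when the identity is atomic.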
The above theorem straightforwardly generalises upon input of D ρ as follows.
Theorem 2. Consider an OPT and a state $\rho \in \mathrm{St}(\mathrm{A})$. Then the OPT satisfies no-information without disturbance upon input of $D_\rho$ if and only if the identity $\mathcal{I}_{\mathrm{A}}$ is atomic upon input of $D_\rho$.
The proof can be obtained from that of Theorem 1 by referring to Definitions 17 and 18, and substituting the pertaining equalities with those upon input of $D_\rho$.
Proposition 4. An OPT satisfies no-information without disturbance if and only if for every system there exists an atomic transformation which is either left- or right-reversible.
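Proof. We start by proving that a theory with an atomic reversible transformation for each system satisfies no-information without disturbance. Let $\mathcal{R} \in \mathrm{Transf}(\mathrm{A}, \mathrm{C})$ be atomic and left-reversible (the right-reversible case is analogous). Then consider a refinement $\mathcal{I}_{\mathrm{A}} = \sum_{i\in X} \mathcal{A}_i$, with $\mathcal{A}_i \in \mathrm{Transf}(\mathrm{A})$ for $i \in X$, of the identity transformation. By definition of the identity map one has $\mathcal{R}\mathcal{I}_{\mathrm{A}} = \sum_{i\in X} \mathcal{R}\mathcal{A}_i = \mathcal{R}$, and due to the atomicity of $\mathcal{R}$ it must be $\mathcal{R}\mathcal{A}_i \propto \mathcal{R}$ for every $i \in X$. Since $\mathcal{R}$ is left-reversible (namely there exists $\mathcal{W} \in \mathrm{Transf}(\mathrm{C}, \mathrm{A})$ such that $\mathcal{W}\mathcal{R} = \mathcal{I}_{\mathrm{A}}$) it follows that $\mathcal{A}_i \propto \mathcal{I}_{\mathrm{A}}$ for every $i \in X$, which proves the atomicity of $\mathcal{I}_{\mathrm{A}}$.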
The other implication, that in a theory that satisfies no-information without disturbance for every system there exists an atomic transformation which is either left- or right-reversible, is trivial. Indeed, in a theory that satisfies no-information without disturbance the identity, which is both right- and left-reversible, is atomic as proved in Theorem 1.
Proposition 5.An OPT satisfies no-information without disturbance if and only if for every system every reversible transformation is atomic.
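Proof. We prove that if the theory satisfies no-information without disturbance, then every reversible transformation is atomic. Indeed, let $\mathcal{R} \in \mathrm{Transf}(\mathrm{A})$ be reversible, and suppose that $\mathcal{R} = \sum_{i\in X} \mathcal{R}_i$ for some test $\{\mathcal{R}_i\}_{i\in X}$. Then $\sum_{i\in X} \mathcal{R}_i\mathcal{R}^{-1} = \mathcal{I}_{\mathrm{A}}$, and by Theorem 1 one has that $\mathcal{R}_i\mathcal{R}^{-1} = p_i\,\mathcal{I}_{\mathrm{A}}$. Finally, multiplying by $\mathcal{R}$ on the right, we conclude that $\mathcal{R}_i = p_i\,\mathcal{R}$, namely the refinements of $\mathcal{R}$ must be trivial. For the converse, it is sufficient to observe that the identity is reversible.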

B. Information without disturbance
In this section we provide the general structure of the state space of any theory where some information can be extracted from a system without introducing disturbance. Such information is "classical" in the sense that the measurement is the reading of information that is repeatable and sharable. In particular, for the classical OPT the whole information encoded on a system can be read in this way. The proofs of the above statements are based on the following theorem.
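Theorem 3. The non redundant atomic refinement of the identity is unique for every system. Moreover, given the non redundant atomic refinement $\{\mathcal{A}_i\}_{i\in X} \subseteq \mathrm{Transf}(\mathrm{A})$ of the identity, one has
$$\mathcal{A}_i\mathcal{A}_j = \delta_{ij}\,\mathcal{A}_i.$$
Proof. Suppose that the identity transformation of system $\mathrm{A}$ allows for two non redundant atomic refinements $\mathcal{I}_{\mathrm{A}} = \sum_{i\in X} \mathcal{A}_i$ and $\mathcal{I}_{\mathrm{A}} = \sum_{j\in Y} \mathcal{B}_j$. Since $\sum_{i} \mathcal{A}_i\mathcal{B}_j = \mathcal{B}_j$, from the atomicity of the transformations $\mathcal{B}_j$ we get
$$\mathcal{A}_i\mathcal{B}_j = c_{ij}\,\mathcal{B}_j, \qquad c_{ij} \geq 0, \quad \sum_{i\in X} c_{ij} = 1.$$
By non redundancy, for fixed $j$ there is only one value $i = i(j)$ such that $c_{ij} > 0$, and normalisation gives $c_{i(j)j} = 1$. By a similar argument, writing $\mathcal{A}_i\mathcal{B}_j = d_{ij}\,\mathcal{A}_i$ (which follows from $\sum_{j} \mathcal{A}_i\mathcal{B}_j = \mathcal{A}_i$ and the atomicity of $\mathcal{A}_i$), for a fixed $i$ there is $j(i)$ such that $d_{ij(i)} = 1$. Then one has $\mathcal{B}_j = \mathcal{A}_{i(j)}$. This proves uniqueness of the non redundant atomic refinement of the identity.
By the same argument as before, for the non redundant atomic refinement of the identity one has
$$\mathcal{A}_i\mathcal{A}_j = c_{ij}\,\mathcal{A}_j = d_{ij}\,\mathcal{A}_i.$$
By atomicity and non redundancy one must have $c_{ij} = d_{ij} = \delta_{ij}$.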
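The orthogonality relation among the elements of the atomic refinement of the identity can be checked numerically in the simplest case where the identity is not atomic, namely a classical system. The sketch below is an illustration under stated assumptions, not the paper's construction: states are modelled as probability vectors, transformations as matrices with non-negative entries, the deterministic effect as the sum of the vector entries, and the dimension `N` and the test state are arbitrary choices.

```python
# Illustration only: a classical system with N pure states.

N = 3  # number of pure states (arbitrary choice for the example)

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

identity = [[1 if r == c else 0 for c in range(N)] for r in range(N)]
zero = [[0] * N for _ in range(N)]

# A[i] keeps the i-th pure state and annihilates all the others.
A = [[[1 if (r == i and c == i) else 0 for c in range(N)] for r in range(N)]
     for i in range(N)]

# Coarse-graining the test {A[i]} gives back the identity (non-disturbing test).
total = zero
for a_i in A:
    total = add(total, a_i)
sums_to_identity = (total == identity)

# Orthogonality relation: A[i] A[j] = delta_ij A[i].
orthogonal = all(
    matmul(A[i], A[j]) == (A[i] if i == j else zero)
    for i in range(N) for j in range(N)
)

# Outcome probabilities on a mixed state reproduce its weights, so the test
# reads the classical information without altering the state.
state = [0.5, 0.3, 0.2]
outcome_probs = [sum(matvec(a_i, state)) for a_i in A]
```

Here `outcome_probs` coincides with the weights of `state`, so, unlike the atomic-identity case, a non-disturbing test does provide information.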
The above theorem has as a consequence the following structure theorem for OPTs.
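Corollary 1. For any pair of systems $\mathrm{A}$, $\mathrm{B}$ of an OPT one has the following decomposition of the set of states of $\mathrm{AB}$:
$$\mathrm{St}(\mathrm{AB}) = \bigoplus_{(i,j)\in X\times Y} \mathrm{St}_{ij}(\mathrm{AB}), \qquad (17)$$
where, for non redundant atomic decompositions $\{\mathcal{A}_i\}_{i\in X}$, $\{\mathcal{B}_j\}_{j\in Y}$ of the identities $\mathcal{I}_{\mathrm{A}}$ and $\mathcal{I}_{\mathrm{B}}$, one has
$$(\mathcal{A}_i \otimes \mathcal{B}_j)\,|\Psi_{i'j'}) = \delta_{ii'}\,\delta_{jj'}\,|\Psi_{i'j'}) \qquad (18)$$
for all $\Psi_{i'j'} \in \mathrm{St}_{i'j'}(\mathrm{AB})$.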
Remark 4. Notice that from Eq. (17) it trivially follows that for any system the block decomposition
$$\mathrm{St}(\mathrm{A}) = \bigoplus_{i\in X} \mathrm{St}_i(\mathrm{A})$$
holds. However, Eq. (17) contains the additional information that the decomposition holds in that specific form also for composite systems. This is not a straightforward consequence of the decomposition of local states, as witnessed by the fermionic case. Indeed, the state in Eq. (4) does not have definite parity for the two subsystems corresponding to two fermions on the left and one on the right, hence the state space cannot be of the form in Eq. (17).
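Remark 5. For a theory without atomicity of parallel composition there is also the possibility that the refinement $\mathcal{A}_i \otimes \mathcal{B}_j$ in Eq. (18) of $\mathcal{I}_{\mathrm{AB}}$ is not atomic.
In such a case one has $\mathrm{St}(\mathrm{AB}) = \bigoplus_{k\in Z} \mathrm{St}_k(\mathrm{AB})$, and $\mathrm{St}_{ij}(\mathrm{AB}) = \bigoplus_{k\in Z_{ij}} \mathrm{St}_k(\mathrm{AB})$, for some partition $Z_{ij}$ of $Z$.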
In the following Corollary we formalise the fact that a theory where any information can be extracted without disturbance must have classical sets of states.
Corollary 2. If an OPT is such that it does not satisfy no-information without disturbance upon input of $D_\rho$, for every state $\rho \in \mathrm{St}(\mathrm{A})$ of every system $\mathrm{A}$, then every system of the theory is classical.
Proof. The proof follows from the simple fact that all blocks $\mathrm{St}_{ij}(\mathrm{AB})$ in Corollary 1 are one-dimensional, i.e. all pure states are jointly perfectly discriminable, namely all the systems of the theory are classical.
Remark 6. We recall that a system is classical when all its pure states are jointly perfectly discriminable. As a consequence the set of states of a classical system is a simplex. A special case of a theory whose systems are all classical is the usual classical information theory. However, even when all systems are classical, the theory can be non classical because the system composition does not satisfy local discriminability (see Ref. [26]).
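As a worked instance of the block structure in the fully classical case, consider two classical bits $\mathrm{A}$ and $\mathrm{B}$ in usual classical information theory. The symbols $\sigma_i$, $\tau_j$ for the pure states of $\mathrm{A}$ and $\mathrm{B}$, and the choice of $\mathcal{A}_i$ ($\mathcal{B}_j$) as the map that keeps $\sigma_i$ ($\tau_j$) and annihilates the other pure state, are assumptions introduced here for illustration only. Under these assumptions the decomposition would read:

```latex
\mathrm{St}(\mathrm{AB}) = \bigoplus_{i,j\in\{0,1\}} \mathrm{St}_{ij}(\mathrm{AB}),
\qquad
\mathrm{St}_{ij}(\mathrm{AB}) = \{\lambda\,\sigma_i\otimes\tau_j \;:\; 0\le\lambda\le 1\},
\\[4pt]
(\mathcal{A}_i\otimes\mathcal{B}_j)\,|\sigma_{i'}\otimes\tau_{j'})
 = \delta_{ii'}\,\delta_{jj'}\,|\sigma_{i'}\otimes\tau_{j'}),
\qquad
\sum_{i,j\in\{0,1\}} \mathcal{A}_i\otimes\mathcal{B}_j = \mathcal{I}_{\mathrm{AB}}.
```

Each of the four blocks is one-dimensional, and the four pure states $\sigma_i\otimes\tau_j$ are jointly perfectly discriminable, so the normalised states of $\mathrm{AB}$ form a simplex with four vertices.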

C. Sufficient conditions for no-information without disturbance
In this section we prove some further conditions for no-information without disturbance that are only sufficient. The first condition is expressed in the following proposition and in its corollary.
Proposition 6. An OPT satisfies no-information without disturbance upon input of $D_\rho$, with $\rho \in \mathrm{St}(\mathrm{A})$, if there exists a pure state $\Psi \in D_{\mathrm{Ref}_\rho}$ that is faithful upon input of $D_\rho$.
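Proof. Given a system $\mathrm{A}$ and a state $\rho \in \mathrm{St}(\mathrm{A})$, let $\Psi \in D_{\mathrm{Ref}_\rho}$ be pure and faithful upon input of $D_\rho$ (see Definition 16). Now let the test $\{\mathcal{A}_i\}_{i\in X} \subseteq \mathrm{Transf}(\mathrm{A})$ be non-disturbing upon input of $D_\rho$. Then $\sum_{i\in X} \mathcal{A}_i\,|\Psi) = |\Psi)$, and since $\Psi$ is pure, there exists a set of probabilities $\{p_i\}_{i\in X}$ such that $\mathcal{A}_i\,|\Psi) = p_i\,|\Psi)$. However, due to the faithfulness of $\Psi$, the map $\mathcal{A} \mapsto \mathcal{A}\,|\Psi)$ is injective upon input of $D_\rho$, and we conclude that $\mathcal{A}_i =_{D_\rho} p_i\,\mathcal{I}_{\mathrm{A}}$, and, by definition, the test $\{\mathcal{A}_i\}_{i\in X} \subseteq \mathrm{Transf}(\mathrm{A})$ does not extract information upon input of $D_\rho$.
Corollary 3. An OPT satisfies no-information without disturbance if for every system $\mathrm{A}$ there exists a pure faithful state.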
In the next proposition we show that the OPTs that satisfy purification are a subset of the OPTs with no-information without disturbance. In the following we will see that it is actually a proper subset (see also Fig. 1).
Proposition 7. An OPT with purification satisfies no-information without disturbance.
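Proof. Given an OPT with purification, suppose that it violates no-information without disturbance, namely there exists a system $\mathrm{A}$ such that $\mathcal{I}_{\mathrm{A}}$ is not atomic. Then let $\mathcal{I}_{\mathrm{A}} = \sum_{i\in X} \mathcal{A}_i$ for some atomic non redundant test $\{\mathcal{A}_i\}_{i\in X}$, and consider a state $\rho = \sum_{i\in X} p_i\,\sigma_i \in \mathrm{St}(\mathrm{A})$, with each $\sigma_i$ in the range of $\mathcal{A}_i$ and $\{p_i\}_{i\in X}$ a probability distribution with $p_i > 0$ for all $i$. Then by Theorem 3 we have $\mathcal{A}_i\,|\sigma_j) = \delta_{ij}\,|\sigma_i)$. Since the theory allows for purification, let $\Psi \in \mathrm{St}(\mathrm{AB})$ be a purification of $\rho$, namely $(e|_{\mathrm{B}}\,|\Psi)_{\mathrm{AB}} = |\rho)_{\mathrm{A}}$ for deterministic effect $e \in \mathrm{Eff}(\mathrm{B})$. Now, on one hand, since the test $\{\mathcal{A}_i\}_{i\in X}$ refines the identity, one has $|\Psi)_{\mathrm{AB}} = \sum_{i\in X} \mathcal{A}_i\,|\Psi)_{\mathrm{AB}}$, and $\Psi$ being pure it must be $\mathcal{A}_i\,|\Psi)_{\mathrm{AB}} = q_i\,|\Psi)_{\mathrm{AB}}$, with $\{q_i\}_{i\in X}$ a probability distribution. On the other hand, for every $i \neq j$ the marginals with deterministic effect $e \in \mathrm{Eff}(\mathrm{B})$ of $\mathcal{A}_i\,|\Psi)_{\mathrm{AB}}$ and $\mathcal{A}_j\,|\Psi)_{\mathrm{AB}}$ are perfectly discriminable. But this contradicts the fact that $\mathcal{A}_i\,|\Psi)_{\mathrm{AB}}$ and $\mathcal{A}_j\,|\Psi)_{\mathrm{AB}}$ are both proportional to $\Psi$.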

V. OUTLOOK ON NO-INFORMATION WITHOUT DISTURBANCE
In this last section we analyse the relation between no-information without disturbance and other properties of operational probabilistic theories. Here we focus on local discriminability and purification that, being typical quantum features, are commonly associated with no-information without disturbance. Via concrete examples, we show that no-information without disturbance can actually be satisfied independently of the above two properties.
Proposition 8. The PR-boxes theory satisfies no-information without disturbance.
Proof. This can be proved in several ways. For example, we show that any system of the theory allows for a reversible atomic transformation and then use Proposition 4. The fact that any system has a reversible atomic transformation follows from the following three points.
I) The reversible transformations of the elementary system $\mathrm{A}$ of the theory are atomic [29] (the convex set of normalized states of $\mathrm{A}$ is represented by a square, and the set of reversible transformations of $\mathrm{A}$ coincides with the set of symmetries of a square, the dihedral group of order eight $D_8$, containing four rotations and four reflections).
II) From Refs. [33,34] we know that the set of reversible maps of the $N$-partite system $\mathrm{A}^{\otimes N}$ is generated by local reversible operations plus permutations of the systems. Accordingly, the system $\mathrm{A}^{\otimes N}$ allows for a multipartite reversible transformation $\mathcal{U}_1 \otimes \mathcal{U}_2 \otimes \cdots \otimes \mathcal{U}_N$ made of local reversible transformations $\mathcal{U}_i$, $i = 1, \ldots, N$.
III) Since PR-boxes satisfy local discriminability, the chosen multipartite transformation is atomic due to the atomicity of parallel composition (see Proposition 2).
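The group-theoretic statement used in point I can be checked with a short Python sketch. This is only an illustration: placing the pure states of the elementary system at the vertices $(\pm 1, \pm 1)$ and generating the group from a 90-degree rotation and one reflection are coordinate choices assumed here, and the sketch verifies the group structure only, not the atomicity of the corresponding transformations.

```python
# Illustration only: symmetries of a square as 2x2 integer matrices.

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def act(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

IDENTITY = ((1, 0), (0, 1))
ROT90 = ((0, -1), (1, 0))    # rotation by 90 degrees
REFLECT = ((1, 0), (0, -1))  # reflection about the horizontal axis

# Close {ROT90, REFLECT} under multiplication, starting from the identity.
group = {IDENTITY}
frontier = [IDENTITY]
while frontier:
    g = frontier.pop()
    for h in (ROT90, REFLECT):
        gh = matmul(g, h)
        if gh not in group:
            group.add(gh)
            frontier.append(gh)

# Assumed coordinates for the four pure states (vertices of the square).
vertices = {(1, 1), (-1, 1), (-1, -1), (1, -1)}

rotations = [g for g in group if det(g) == 1]
reflections = [g for g in group if det(g) == -1]

# Every group element maps the set of vertices onto itself.
preserves_square = all({act(g, v) for v in vertices} == vertices for g in group)
```

The generated set has eight elements, four with determinant $+1$ (rotations) and four with determinant $-1$ (reflections), each permuting the vertices of the square.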
The last proposition leads to the following relevant corollary.
Corollary 4. No-information without disturbance does not imply purification.
Proof.PR-boxes theory with minimal tensor product [14], which satisfies no-information without disturbance (see the above Proposition 8), indeed does not satisfy purification, since every pure state is a tensor product of local pure states.
Turning to the case of local discriminability, we now show that it is independent of no-information without disturbance. Indeed, there are theories that satisfy the former and not the latter (e.g. classical theory), and, vice versa, theories that satisfy the latter but not the former, e.g. the fermionic [16][17][18] and the real [17,18,23] quantum theories. That these two theories satisfy no-information without disturbance follows from Proposition 7, considering that both satisfy purification. Therefore we have the following corollary.
Corollary 5. No-information without disturbance and local discriminability are independent.
Finally, we observe that, as a consequence of Corollary 2 and the subsequent remark, the classical theory of information is the only theory with local discriminability in which all the information can be extracted without disturbance. However, in the absence of local discriminability, it is still possible to have other theories where all the information can be extracted without disturbance. This has been proved in Ref. [24], where the authors describe an OPT whose systems of any dimension are classical (and hence violate no-information without disturbance), but with a parallel composition that differs from the usual classical one, leading to a violation of local discriminability, more precisely to a 2-local theory according to Definition 9.

VI. CONCLUSIONS
We have analysed the interplay between information and disturbance for a general operational probabilistic theory, considering the effect of measurements also on entanglement with the environment, differently from the traditional approach focused only on the measured system.Indeed, the two resulting notions of disturbance coincide only in special cases, such as quantum theory, as well as every theory that satisfies local discriminability.Our approach is universal for any OPT, including theories without causality or purification.In this setting we proved that the atomicity of the identity transformation is an equivalent condition for no-information without disturbance.
We have characterized the structure of theories where the identity is not atomic, showing that in this case the information that can be extracted without disturbance is "classical", in the sense that it is sharable and repeatable.On the other hand, the typical situation for a general OPT entails information whose extraction requires disturbance-the only exception being classical theory.
While no-information without disturbance is a consequence of purification, the converse is not true, as pointed out by counter-examples. Similarly we have shown that no-information without disturbance and local discriminability are independent properties of the theory.
Our results are expected to have immediate applicability to secure key-distribution. Indeed, a physical theory including a system (or even just a set of states of a system) that satisfies no-information without disturbance can guarantee a private and reliable channel for distributing messages. The idea of studying secure key-distribution in a framework more general than the classical and the quantum ones has been proposed in Refs. [12,14], and the present generalisation of no-information without disturbance to arbitrary OPTs is a first step in proving that secure key-distribution is possible in any non-classical theory.

Lemma 1. Given a state $\rho \in \mathrm{St}(\mathrm{A})$ one has $\mathrm{Ref}_{D_\rho} \subseteq D_{\mathrm{Ref}_\rho}$, where $\mathrm{Ref}_{D_\rho}$ denotes the union of the refinements of any state in $D_\rho$.

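Definition 13 (Identical transformations upon input of $D_\rho$). Given a state $\rho \in \mathrm{St}(\mathrm{A})$, we say that two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$ are equal upon input of $D_\rho$, and write $\mathcal{A} =_{D_\rho} \mathcal{A}'$, if $\mathcal{A}\,|\Psi)_{\mathrm{AC}} = \mathcal{A}'\,|\Psi)_{\mathrm{AC}}$ for every $\Psi \in D_{\mathrm{Ref}_\rho}$.
An immediate consequence of the above definition is the following lemma.
Lemma 2 (Identical transformations). If two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$ are equal upon input of $D_\chi$, with $\chi \in \mathrm{St}(\mathrm{A})$ internal, then the two transformations are identical and we simply write $\mathcal{A} = \mathcal{A}'$.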

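with $e$ the unique deterministic effect of $\mathrm{B}$. Given an arbitrary effect $b \in \mathrm{Eff}(\mathrm{B})$, we can always consider the observation test $\{b, e - b\}$. Since $(b|_{\mathrm{B}}\,|\Psi)_{\mathrm{AB}},\ (e - b|_{\mathrm{B}}\,|\Psi)_{\mathrm{AB}} \in \mathrm{Ref}_\rho$, we have that $\{(b|_{\mathrm{B}}\,|\Psi)_{\mathrm{AB}} \mid b \in \mathrm{Eff}(\mathrm{B})\} \subseteq \mathrm{Ref}_\rho$.
Proposition 3. In a causal OPT with local discriminability, given two transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(\mathrm{A}, \mathrm{B})$, the two conditions $\mathcal{A} =_\rho \mathcal{A}'$ and $\mathcal{A} =_{D_\rho} \mathcal{A}'$ are equivalent.
Proof. We first prove that $\mathcal{A} =_\rho \mathcal{A}' \Rightarrow \mathcal{A} =_{D_\rho} \mathcal{A}'$. Consider an arbitrary $\Psi \in D_{\mathrm{Ref}_\rho}$, with $\rho \in \mathrm{St}(\mathrm{A})$. Let, for example, $\Psi \in \mathrm{St}(\mathrm{AC})$. By hypothesis $\mathcal{A} =_\rho \mathcal{A}'$, namely $\mathcal{A}\,|\sigma)_{\mathrm{A}} = \mathcal{A}'\,|\sigma)_{\mathrm{A}}$ for every $\sigma \in \mathrm{Ref}_\rho$. Then, due to Lemma 3, for all $b \in \mathrm{Eff}(\mathrm{B})$ and all $c \in \mathrm{Eff}(\mathrm{C})$ one has $(b|_{\mathrm{B}}(c|_{\mathrm{C}}\,(\mathcal{A} \otimes \mathcal{I}_{\mathrm{C}})\,|\Psi)_{\mathrm{AC}} = (b|_{\mathrm{B}}(c|_{\mathrm{C}}\,(\mathcal{A}' \otimes \mathcal{I}_{\mathrm{C}})\,|\Psi)_{\mathrm{AC}}$, and by local discriminability we conclude that $(\mathcal{A} \otimes \mathcal{I}_{\mathrm{C}})\,|\Psi)_{\mathrm{AC}} = (\mathcal{A}' \otimes \mathcal{I}_{\mathrm{C}})\,|\Psi)_{\mathrm{AC}}$. Since this holds true for every $\Psi \in D_{\mathrm{Ref}_\rho}$, we conclude that $\mathcal{A} =_{D_\rho} \mathcal{A}'$. The converse implication $\mathcal{A} =_{D_\rho} \mathcal{A}' \Rightarrow \mathcal{A} =_\rho \mathcal{A}'$ is trivial.
Based on the above identities of transformations we can also provide the notion of faithful state upon input of $\rho$ and faithful state upon input of $D_\rho$ (which again coincide for OPTs with local discriminability).
Definition 15 (Faithful state upon input of $\rho$). A state $\Psi \in \mathrm{St}(\mathrm{AB})$ is faithful upon input of $\rho \in \mathrm{St}(\mathrm{A})$ if, given two arbitrary transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(\mathrm{A}, \mathrm{C})$, the condition $\mathcal{A}\,|\Psi)_{\mathrm{AB}} = \mathcal{A}'\,|\Psi)_{\mathrm{AB}}$ implies that $\mathcal{A} =_\rho \mathcal{A}'$.
Definition 16 (Faithful state upon input of $D_\rho$). Given a system $\mathrm{A}$ and a state $\rho \in \mathrm{St}(\mathrm{A})$, we say that a state $\Psi \in \mathrm{St}(\mathrm{AB})$ is faithful upon input of $D_\rho$ if, given two arbitrary transformations $\mathcal{A}, \mathcal{A}' \in \mathrm{Transf}(\mathrm{A}, \mathrm{C})$, the condition $\mathcal{A}\,|\Psi)_{\mathrm{AB}} = \mathcal{A}'\,|\Psi)_{\mathrm{AB}}$ implies that $\mathcal{A} =_{D_\rho} \mathcal{A}'$.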

Figure 1. Comparing OPTs that satisfy no-information without disturbance (grey set), local discriminability (red set) and purification (blue set). Quantum theory (QT) lies at the intersection of the three sets. The set of OPTs with purification is a proper subset of OPTs with no-information without disturbance. An example of OPT that satisfies no-information without disturbance but violates purification is the PR-boxes theory with minimal tensor product (PR). Moreover, PR-boxes satisfy local discriminability, providing a non-trivial intersection between local discriminability and no-information without disturbance in the absence of purification. We observe that no-information without disturbance is independent of local discriminability and vice versa. Indeed classical theory (CL) satisfies only local discriminability while Fermionic quantum theory (FQT) and real quantum theory (RQT) satisfy only no-information without disturbance. Finally, it has been shown in Ref. [24] that there exist OPTs without local discriminability that have all systems classical, thus retaining the possibility of extracting all the information without disturbance. An example is the bilocal classical theory (BCT) of the same Ref. [24], which satisfies 2-local discriminability (see Definition 9).
C) is a left-reversible transformation, namely there exists another transformation $\mathcal{W} \in \mathrm{Transf}(\mathrm{C}, \mathrm{A})$ such that $\mathcal{W}\mathcal{R} = \mathcal{I}_{\mathrm{A}}$. However, the classification of non-disturbing tests according to this definition is trivially provided by the classification according to Definition 17. Indeed, the most general non-disturbing test from $\mathrm{A} \to \mathrm{C}$ is the sequence of tests of the form $\{\mathcal{A}_i\mathcal{R}\}_{i\in X}$, with $\{\mathcal{A}_i\}_{i\in X}$ non-disturbing according to Definition 17, and $\mathcal{R} \in \mathrm{Transf}(\mathrm{A}, \mathrm{C})$ left-reversible.