Analysing causal structures in generalised probabilistic theories

Causal structures give us a way to understand the origin of observed correlations. These were developed for classical scenarios, but quantum mechanical experiments necessitate their generalisation. Here we study causal structures in a broad range of theories, which include both quantum and classical theory as special cases. We propose a method for analysing differences between such theories based on the so-called measurement entropy. We apply this method to several causal structures, deriving new relations that separate classical, quantum and more general theories within these causal structures. The constraints we derive for the most general theories are in a sense minimal requirements of any causal explanation in these scenarios. In addition, we make several technical contributions that give insight for the entropic analysis of quantum causal structures. In particular, we prove that for any causal structure and for any generalised probabilistic theory, the set of achievable entropy vectors forms a convex cone.

Causal structures provide a means for analysing the features of different theories by allowing us to phrase communication and cryptographic protocols in terms of the dependencies among the involved systems. They help us predict the success of players engaged in a protocol when restricted according to different theories, for example, in random access coding and the related principle of Information Causality [3,4].
The differences between the observable correlations that can be achieved with classical and quantum resources within a given causal structure have been extensively analysed, starting with the derivation of several classical constraints and their quantum violations [5,6], and progressing to a systematic analysis [7][8][9][10][11]. Less work has been dedicated to understanding the limitations of quantum systems [9,12,13] and of the behaviour of theories beyond quantum mechanics. For the latter, there have been analyses of the implications of the no-signalling principle on causal structures [14,15]. More generally, understanding the differences of generalised probabilistic theories (GPTs) with respect to different tasks may inform the search for principles that single out quantum mechanics.
In this work, we introduce a technique for deriving constraints on the observable correlations that are achievable in different causal structures according to different GPTs, with the aim of moving towards a systematic analysis of the differences between such theories. Our approach is based on measurement entropy [16,17] and inspired by entropic approaches to analysing causal structures involving classical and quantum resources [8,9,[18][19][20][21][22]. That such a generalisation is possible was not at all clear, since work regarding the definition of entropy in GPTs showed that there is no entropy measure that retains the relevant properties of the von Neumann entropy [16,17]. In particular, the additivity of entropy under the composition of different systems is not retained by the proposed measures, which in previous entropic approaches for analysing causal structures was crucial for encoding causal constraints.
One of the key points that allows us to overcome these issues is to explicitly include the conditional entropy in the analysis. Nevertheless, since our final results are stated in terms of the Shannon entropy they can be directly compared to those obtained with previous entropic techniques.
We apply our method to various causal structures, generating a series of entropic constraints that exclude certain causal explanations of observed correlations when restricted to arbitrary GPTs. This allows us to compare different causal structures with respect to the correlations they allow in different theories (in particular, we compare classical, quantum and arbitrary GPTs). In some cases we find the same entropic characterisation regardless of the theory (here known quantum inequalities also apply to GPTs), while in others we can show an entropic separation. For instance, we apply our technique to Information Causality [3], a candidate principle for singling out quantum theory, showing that our method improves upon that of [15], yielding the stronger inequalities of [23]. Although the maximally nonlocal GPT, box-world, does not satisfy the notion of Information Causality, we identify minimal notions of causation that are satisfied.
In addition to providing a method for analysing causal structures with GPT resources, we make technical contributions by showing that any set of achievable entropy vectors for the observed variables in a causal structure involving quantum or other generalised probabilistic resources is a convex cone. Previously this had only been shown for the entropy vectors of classical resources [8,20,22,24]. This insight allows for easy comparison of the entropic sets within different theories, and in some cases enables us to prove that a given characterisation is complete by showing that all extremal points are achievable. We also give some insights into the entropic analysis of quantum causal structures.

Preliminaries
For every system A in a GPT, there is an associated state space S_A, a compact convex subset of a real vector space V, and an associated space of effects, F_A. An effect e ∈ F_A is a linear map S_A → [0, 1] (thus, e is a vector in the dual space to V). There is a special effect, u_A ∈ F_A, called the unit effect, with the property that u_A(s) = 1 for all s ∈ S_A. A measurement M is a collection of effects whose sum is the unit effect, i.e., we can write M = {e_x ∈ F_A : Σ_x e_x = u_A}. We use E_A to represent the set of allowed measurements on A. The interpretation of e_x(s) is the probability of outcome x when M is performed on a system in state s.
Consider two measurements on A: M = {e_x}_{x∈R_M} and N = {f_y}_{y∈R_N}. If there exists a map F : R_M → R_N such that

Σ_{x∈R_M : F(x)=y} e_x = f_y  for all y ∈ R_N,  (1)

we say that M is a refinement of N (equivalently, N is a coarse-graining of M).1 A refinement is trivial if for all x ∈ R_M, e_x = c(x) f_{F(x)} for some c(x) ∈ R_{>0}. The subset of fine-grained measurements, E*_A, consists of those for which there are no non-trivial refinements. Throughout this article we restrict to GPTs in which there is at least one finite-outcome fine-grained measurement (in classical and quantum theory this is a restriction to finite-dimensional systems).
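As a minimal illustration of these definitions, the following sketch encodes a classical system as a toy GPT: states are probability vectors, effects are dual vectors, and a coarse-graining is built via a map F as in Eq. (1). The encoding and all names here are our own, not the paper's.

```python
import numpy as np

# Toy encoding (our own, not from the paper): a classical system whose
# states are probability vectors and whose effects e satisfy
# 0 <= e(s) <= 1 for all states s.

def apply_effect(effect, state):
    """Probability of the outcome represented by `effect` on `state`."""
    return float(np.dot(effect, state))

# Fine-grained measurement M on a classical 3-level system: one effect
# per vertex of the simplex.  Their sum is the unit effect u = (1, 1, 1).
M = [np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([0., 0., 1.])]
u = sum(M)

# Coarse-graining N of M via F : {0,1,2} -> {0,1} with F(0)=0, F(1)=F(2)=1,
# i.e. f_y is the sum of the e_x with F(x) = y, as in Eq. (1).
F = {0: 0, 1: 1, 2: 1}
N = [sum(M[x] for x in F if F[x] == y) for y in (0, 1)]

s = np.array([0.5, 0.3, 0.2])            # a state in the simplex
probs_M = [apply_effect(e, s) for e in M]
probs_N = [apply_effect(f, s) for f in N]
```

Both outcome distributions are normalised, and the coarse-grained probabilities are sums of the fine-grained ones, matching Eq. (1).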
Transformations of systems are represented by linear maps between state spaces, T : S_A → S_B, and the set of such transformations is denoted T_{A→B}. The set T_{A→A} contains the identity transformation, I_A, and is closed under composition. Furthermore, a transformation followed by a measurement is a valid measurement.
Two systems A and B can be thought of as parts of a single joint system AB. We do not specify precisely what the joint state space is, but a minimal requirement is that if s_A ∈ S_A and s_B ∈ S_B then s_A ⊗ s_B ∈ S_AB. States that can be written as s_A ⊗ s_B are called product states and convex combinations thereof are separable. Analogously, product effects e_A ⊗ e_B are required to be valid effects on AB. This implies that we have non-signalling theories: suppose {e^x_a}_x ∈ E_A and {e^y_b}_y ∈ E_B are measurements for a = 1, ..., n_a and b = 1, ..., n_b; then, for example, Σ_x (e^x_a ⊗ e^y_b)(s_AB) is independent of a. We also assume that there are well-defined reduced states: ∀s_AB ∈ S_AB ∃s_A ∈ S_A s.t. ∀e ∈ F_A, e(s_A) = (e ⊗ u_B)(s_AB). The post-measurement state on A after a measurement on B with outcome x is the state s_{A|x} satisfying

e(s_{A|x}) = (e ⊗ e^x)(s_AB) / (u_A ⊗ e^x)(s_AB)  for all e ∈ F_A.  (2)

If the system A is classical, then S_A is a simplex and (up to relabelling) there is only one fine-grained measurement that is not a trivial refinement of another fine-grained measurement. We call this a standard classical measurement. Note that classical systems can be represented in any GPT and composing them maintains separability.
Box world [25] is the GPT in which the joint state space of several systems is in one-to-one correspondence with the set of no-signalling distributions amongst those systems; its state space is in this sense the largest possible within the framework.

Measurement entropy and its properties
The approach to analysing causal structures that we use in this work is based on measurement entropy. In this section we introduce this and outline some of its properties.
Measurement entropy was first introduced in [17,23]; we follow the exposition of [23] here. The measurement entropy, H_+, is the minimal Shannon entropy of the outcome distribution after a fine-grained measurement, i.e., for s_A ∈ S_A,

H_+(A) = min_{{e_x} ∈ E*_A} H({e_x(s_A)}_x).  (3)

Several ways to define the conditional measurement entropy have been proposed [17,23], of which we use the following [23]. For any state s_AB ∈ S_AB with reduced state s_B ∈ S_B, the conditional measurement entropy is

H_+(A|B) = inf_{{f_y}} Σ_y f_y(s_B) H_+(A|y),  (4)

where the infimum is over measurements on B and H_+(A|y) is the entropy of the state on A after a measurement on B with outcome y, s_{A|y}. For classical systems these entropies coincide with the Shannon entropy, H.
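For a box-world "gbit" the minimisation in (3) is easy to sketch: assuming the standard square state space whose fine-grained measurements are the two binary axis measurements, H_+ is the smaller of the two outcome entropies. The function names and the reduction to two measurements are our own illustrative assumptions.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Sketch of H_+ for a box-world gbit (square state space), under the
# standard assumption that its fine-grained measurements are the two
# binary "axis" measurements.  A state is then summarised by the outcome
# probabilities px and py of those two measurements.
def measurement_entropy_gbit(px, py):
    return min(shannon([px, 1 - px]), shannon([py, 1 - py]))
```

A vertex of the square (both measurements deterministic) has H_+ = 0, while the maximally mixed state (both uniform) has H_+ = 1; whenever either measurement is deterministic the minimum vanishes.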
The measurement entropy satisfies a list of properties that are useful to this work. Some of these have previously been derived in [17,23], others are new to this work. In the remainder of this section, S A , S AB etc. refer to state spaces within an arbitrary GPT. For the proofs of the first two properties we refer to [23].
Property 2 (Reduction to Shannon entropy [17,23]). Let A and B be classical systems and s AB ∈ S AB , then H + (A) = H(A) and H + (A|B) = H(A|B).
Proof (of Property 3). If {f_j} form a measurement on C, then {g_j} form a measurement on B, where g_j : s_B → f_j(T(s_B)). The claimed inequality follows.

Proof (of Property 4). If A and B are independent, after any measurement {e_j} ∈ E_B on B the post-measurement state on A is s_A, independent of the outcome of the measurement. Therefore H_+(A|B) = H_+(A).

Property 5 (Classical subsystem inequalities). For a joint state s_ABC ∈ S_ABC with classical subsystems A and B, H_+(AB|C) ≥ H_+(A|C).

Proof. For any measurement on the classical pair AB, the Shannon entropy of the joint outcome distribution is at least that of the induced outcomes on A alone. Applying this to a sequence of measurements that converge to H_+(AB|C) establishes the claimed result.
Property 6 (Subadditivity [17,23]). In GPTs for which M ∈ E*_A and N ∈ E*_B imply M ⊗ N ∈ E*_AB (which holds for locally tomographic theories, including box-world), H_+(AB) ≤ H_+(A) + H_+(B). We refer to [23] or Appendix A for a proof of Property 6. Note further that Property 6 is the only one of our properties that does not hold in arbitrary GPTs.
Property 7 (Lemma C2 of [23]). For s ABC ∈ S ABC where A and B are classical systems, We refer to [23] or Appendix A for a proof of Property 7.
Property 8. For s_ABC ∈ S_ABC where A and B are classical systems, the chain rule also holds as an inequality in the other direction; if C is also classical then this holds with equality. We prove this property in Appendix A. Note that Properties 7 and 8 are both relaxations of the chain rule, H(A|BC) = H(AB|C) − H(B|C), that holds for the Shannon and von Neumann entropies.

Entropy vector method for causal structures in GPTs
A causal structure is a set of nodes arranged in a directed acyclic graph, some of which are labelled observed. Each observed node has a corresponding random variable, while the other, unobserved nodes correspond to resources from a GPT. For a causal structure C we use C C , C Q , C B or C G depending on whether the resources are classical, quantum, box-world systems or from some unspecified GPT respectively. For each unobserved node we associate a subsystem with each of its outgoing edges. An example for this is displayed in Figure 1.
A directed edge from a node A in a causal structure to a node Z means that A is a parent of Z; a directed path from A to Z means that A is an ancestor of Z. For an unobserved node A, all subsystems associated with its outgoing edges are considered parents/ancestors of each of its children/descendants. Given a causal structure, a coexisting set of systems [9,22] is one for which a joint state can be defined. In general, no coexisting set includes all nodes, since there is no joint state of a system and the output obtained from a measurement on it (unless the system is classical).
Our method to generate new inequalities for causal structures with GPT resources begins by considering an entropy vector whose components are the entropies and conditional entropies of all coexisting sets. Conditional entropies composed entirely of classical subsystems are excluded because they are linear combinations of unconditional entropies (e.g., H(X|Y) = H(XY) − H(Y)). We then impose a system of linear (in)equalities that are necessary for a vector to be realisable as an entropy vector in a causal structure. These inequalities are constructed using the properties of the measurement entropy explained earlier and strong subadditivity in the cases where the measurement entropy reduces to the Shannon entropy. In the case of locally tomographic GPTs, such as box-world, there is one additional property (Property 6) that can be imposed, although it does not hold in all GPTs. Further constraints come from the causal structure: two sets of nodes are independent if they do not share any ancestors in the causal structure. In general, there may be further independencies among the observed variables (see Theorem 22(i) of [14] and Appendix D). This system of inequalities constrains a polyhedral cone, which can be projected to a marginal cone that contains no components involving unobserved systems. The projection is performed with a Fourier-Motzkin elimination algorithm [26]. An example that illustrates this procedure in detail is provided at the beginning of Section 5.1.
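A single step of the projection can be sketched as follows: given a system of inequalities A·v ≤ b, Fourier-Motzkin elimination removes one variable (e.g., an entropy component of an unobserved system) by combining rows with positive and negative coefficients. This is a generic textbook sketch, not the paper's implementation.

```python
import itertools
import numpy as np

# One Fourier-Motzkin elimination step (generic sketch): given A @ v <= b,
# eliminate variable j and return an equivalent system in the remaining
# variables.
def fm_eliminate(A, b, j):
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    pos = [i for i in range(len(A)) if A[i, j] > 0]
    neg = [i for i in range(len(A)) if A[i, j] < 0]
    zero = [i for i in range(len(A)) if A[i, j] == 0]
    rows, rhs = [A[i] for i in zero], [b[i] for i in zero]
    for p, n in itertools.product(pos, neg):
        # Scale the two rows so the coefficients of v_j cancel, then add.
        lam, mu = -A[n, j], A[p, j]
        rows.append(lam * A[p] + mu * A[n])
        rhs.append(lam * b[p] + mu * b[n])
    A_new = np.delete(np.array(rows).reshape(-1, A.shape[1]), j, axis=1)
    return A_new, np.array(rhs)
```

For instance, eliminating y from {x − y ≤ 0, y ≤ 2} yields x ≤ 2; in the entropic setting the same combination step is iterated over all unobserved components, which is the source of the poor scaling mentioned below.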
When computing all entropy inequalities for the marginal scenarios of interest is computationally impractical, or even impossible with the computational resources at hand, due to the scaling of Fourier-Motzkin elimination [27], we can still derive valid entropy inequalities by marginalising subsets of all valid inequalities.
Furthermore, given a particular observed distribution that we suspect to be incompatible with a causal structure (either for classical theory, quantum theory or boxworld), it is not necessary to go through the marginalisation procedure discussed here. Instead we can look for a certificate of incompatibility using a linear program. This program can be set up by computing the entropy vector for the distribution in question and then adding this as a list of equalities (one for each of its components) to the list of valid entropy inequalities for the causal structure. If the resulting system of linear (in)equalities is infeasible, then the distribution in question is certified as incompatible with the causal structure within the theory under consideration.
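The first step of such a certificate, computing the entropy vector of the distribution in question, can be sketched as follows. All names here are our own; for constraints that still involve unobserved components, the resulting equalities would then be handed to a linear-programming feasibility check as described above.

```python
import itertools
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Sketch: the entropy vector of a distribution over n observed variables,
# one component H(S) per nonempty subset S.  Feeding these values in as
# equalities alongside the valid inequalities of the causal structure
# gives the linear program described in the text.
def entropy_vector(p):
    """p: array with one axis per random variable."""
    p = np.asarray(p, dtype=float)
    n = p.ndim
    h = {}
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            other = tuple(i for i in range(n) if i not in S)
            h[S] = shannon(p.sum(axis=other).ravel())
    return h
```

For two perfectly correlated uniform bits this gives H(X) = H(Y) = H(XY) = 1, so e.g. the subadditivity component H(X) + H(Y) − H(XY) = 1 ≥ 0 is satisfied.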
For some causal structures, the entropic constraints derived using Property 6 are also valid for GPTs that are not locally tomographic, which is the content of the following proposition. This proposition follows from the insight presented in the proof of Lemma 2 in [28] that any joint measurement on a classical and a GPT system can be written as a measurement on the classical system followed by an outcome-dependent measurement on the other (see also Lemma 4 in Appendix A).
Proof. Since each node has at most one unobserved parent, by Lemma 4 at each node we can assume a standard classical measurement on the classical subsystems followed by a measurement on the GPT subsystem that depends on the result. Note that the same argument does not hold if there are multiple unobserved parents at a single node. This is because some joint measurements cannot be expressed as a measurement on one system followed by a measurement on the other conditioned on the first (cf. Lemma 4), for example a measurement in the Bell basis in quantum mechanics.
The method presented in this section recovers previous entropic approaches for describing classical [8,20] and quantum [9] causal structures as special cases. In the classical case, the measurement entropy and its conditional version coincide with the Shannon entropy and all variables in a causal structure (observed and unobserved ones) coexist. In this case, our method is equivalent to that of [8,20]. 2 For the quantum case, the recovery of the method proposed in [9] from ours is less obvious and is explained in Section 4.
When considering different causal structures, convexity of the sets of achievable entropy vectors of the observed variables is useful for their comparison: for instance, it allows us to prove that the achievable entropies in one case are contained in those of another by considering only the extreme points. The following theorem (proven in Appendix B) establishes convexity in general and is therefore an important structural insight.

Causal structures and post-selection
For a causal structure involving a parentless observed node X that takes values 1, 2, . . . , n, we can also analyse an adapted causal structure where each descendant of X is split into n variables and X is dropped, e.g., a descendant Y is split into Y|X=1, . . . , Y|X=n. The resulting causal structure is said to be post-selected on X (see Figure 5 for an example, and [9,22] for further details of this procedure). For some causal structures, C, post-selection is necessary for deriving entropy inequalities that distinguish between C_C, C_Q and C_B [9,18,29]. When post-selecting on parentless nodes, convexity of the set of achievable entropy vectors of the observed variables also holds by the following corollary of Theorem 1, which is also proven in Appendix B.

Corollary 1. For any causal structure C_G in which one or more nodes have been split into alternatives by post-selecting on parentless observed nodes, the closure of the set of achievable entropy vectors for the coexisting observed variables is a convex cone.
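The splitting step itself is just conditioning: each alternative Y|X=x is the distribution of Y given that value of X. A minimal sketch (our own names, classical variables only for simplicity):

```python
import numpy as np

# Sketch of post-selection on a parentless observed X: split a descendant
# Y into the conditional variables Y|X=x by conditioning the joint
# distribution on each value of X.
def postselect(p_xy):
    """p_xy[x, y] -> list of conditional distributions p(y | X=x)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)
    return [p_xy[x] / p_x[x] for x in range(p_xy.shape[0]) if p_x[x] > 0]
```

For non-classical resources the alternatives cannot in general be realised jointly, which is why they are treated as separate nodes of which only one is generated (cf. Figure 5(b)).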

Quantum causal structures
For quantum causal structures there is an entropy vector method in which the components are the unconditional von Neumann entropies, H, of all coexisting sets [9]. Conditional entropies are not explicitly included, but, because H(A|B) = H(AB) − H(B), relations involving conditional entropy can still be encoded. It is natural to ask whether the technique introduced earlier in this paper could yield different results to the existing entropic approach to quantum causal structures. In this section, we consider this question.
Following the approach outlined earlier in the paper, we consider including all entropy inequalities from [9] (see Appendix D for a full description of the method employed in [9]). These are automatically part of our approach, since the unconditional measurement entropy (for which we include all inequalities valid in the theory at hand) coincides with the von Neumann entropy in the quantum case. In addition, we also take into account inequalities for conditional measurement entropies, which are always positive and can differ from the conditional von Neumann entropy. Thus, our approach could lead to more restrictive inequalities than the previous quantum one. With the following proposition we show that the previous method for quantum causal structures [9] can be refined in a way that makes the additional variables corresponding to conditional measurement entropies superfluous.

Proposition 2.
Consider a causal structure C Q and suppose that, in addition to any causal constraints, we use positivity of unconditional entropies, strong subadditivity and additionally impose positivity of conditional entropies for all combinations of variables that occur in a coexisting set. If we then eliminate all variables corresponding to unobserved systems, the resulting entropic inequalities for the observed variables are all valid.
As a result, all valid inequalities for conditional measurement entropy can be imposed for the conditional von Neumann entropy instead, and can hence be encoded as linear constraints on the (unconditional) von Neumann entropy. In other words, although the conditional von Neumann entropy can be negative for some quantum states, by constraining it to be positive and eliminating unobserved variables, we obtain valid entropy inequalities for the observed variables in C Q . This was not used in previous entropic analyses of quantum causal structures [9].
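The fact that the conditional von Neumann entropy can be negative is easy to verify numerically; the following sketch (our own helper names) computes H(A|B) = H(AB) − H(B) for a maximally entangled two-qubit state.

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy (in bits) of a density matrix."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

# For |phi+> = (|00> + |11>)/sqrt(2): H(AB) = 0 but H(B) = 1, so
# H(A|B) = H(AB) - H(B) = -1.  Constraining such conditional entropies
# to be positive is therefore a genuine extra constraint, valid for the
# observed variables only after eliminating unobserved systems.
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho_ab = np.outer(phi, phi)
# Partial trace over A: reshape to indices (a, b, a', b') and trace a=a'.
rho_b = np.trace(rho_ab.reshape(2, 2, 2, 2), axis1=0, axis2=2)
cond = vn_entropy(rho_ab) - vn_entropy(rho_b)
```

Here `cond` evaluates to −1, in contrast to the conditional measurement entropy, which is positive by construction.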
Previous quantum methods [9,22] instead analysed quantum causal structures by considering the von Neumann entropy of coexisting sets and imposing positivity of the entropy, strong subadditivity, as well as the weak monotonicity constraints, i.e., that for any state ρ_XYZ, H(X|Y) + H(X|Z) ≥ 0 [30]. Weak monotonicity constraints are not needed in the statement of Proposition 2 because they are implied by the positivity of conditional entropies. See Appendix C for the full proof of Proposition 2 and for a complete account of previous quantum methods.
This also gives an important insight into the entropy vector method: the difference between the inequalities that result from using the entropy vector method in the classical [8,20] and quantum [9] cases is entirely due to the fact that in the quantum case not all variables coexist and does not arise from the different properties of the Shannon and von Neumann entropy (see also Appendix C for further discussion).
For most causal structures of interest we can prove that our refined entropy vector method does not allow us to find any tighter entropy inequalities than those of [9] (see Lemma 8 in Appendix D). Nonetheless, the possibility of using positivity of conditional entropy instead of weak monotonicity simplifies the quantum method even in these cases. We give an overview of our examples before presenting these cases in detail.
First, in Section 5.1, we consider the case without post-selection and give three types of example: • A causal structure in which the actual entropic cones for classical, quantum and GPT resources are provably the same is given in Section 5.1.1.
• A causal structure in which our methods lead to the same outer approximations to the entropy cones in the classical, quantum and GPTs cases is given in Section 5.1.2.
• Two causal structures for which our methods yield the same outer approximations of the entropy cones in the classical and quantum cases, but a different outer approximation for GPTs, are given in Sections 5.1.3 and 5.1.4. In these cases we are not aware of any GPT correlations that violate the classical/quantum inequalities.
Note that we already know of a case (the triangle causal structure) where we have different outer approximations of the entropy cones in the classical and quantum cases [10]. We are currently unaware of a case without post-selection in which the outer approximations of the entropy cones between two theories are different and in which we know of correlations in one theory that violate those in another. In other words, it remains possible that the gaps we find in the outer approximations are a symptom of the method used, rather than features of the actual entropy cones.

Then, in Section 5.2 we move to the use of post-selection and show that: • For the bilocality causal structure there are different entropy cones for classical, quantum and locally tomographic GPTs (Section 5.2.1). We also identify distributions that certify that the true entropy cones are indeed different for all three cases.
• We can use our methods to obtain entropic inequalities for the Information Causality scenario in arbitrary GPTs (Section 5.2.2).

Analysis of causal structures without post-selection
In this section we illustrate that our method for GPTs can recover the classically valid entropy inequalities for some causal structures (the first two examples), and that for other causal structures we recover different inequalities (the last two examples).

Figure 1: The instrumental causal structure. The nodes labelled X, Y and Z correspond to observations, modelled as random variables. The node A labels a resource system with subsystems A_Y and A_Z associated to its outgoing edges.

Instrumental Scenario
In the classical literature, one well-studied causal structure is the instrumental causal structure [31] of Figure 1. The quantum version of this causal structure has also been studied [13,32].
In this case, the entropy vector includes components corresponding to the entropies and conditional entropies of the coexisting sets, such as H_+(A_Y). For these we impose all entropy inequalities of Properties 1 to 8 as well as the independencies of the subsystems of A and X. Eliminating all other variables in order to obtain inequalities that only involve the components for the observed variables X, Y and Z, we obtain the Shannon inequalities (positivity of entropy and conditional entropy, and strong subadditivity) for three variables and a further inequality, (5), which together form a polyhedral cone Γ. Valid entropy vectors for distributions compatible with the instrumental scenario for a system A of some locally tomographic GPT are necessarily within Γ. For this causal structure, it is known that being in Γ is necessary and sufficient for being in the closure of the set of valid entropy vectors when A is classical or quantum [10,14,33]. Since classical systems are a special case of systems in a GPT, it follows that membership of Γ is also sufficient for locally tomographic GPTs. We have hence found all valid entropy inequalities in this scenario for such theories.
According to Proposition 1, (5) holds in any (not necessarily locally tomographic) GPT.3 Thus, in the instrumental causal structure, Γ completely characterises the set of achievable entropy vectors independently of whether system A is classical, quantum or any GPT system.

Causal structure of Figure 2

Applied to the causal structure of Figure 2, our method leads to the following entropy inequalities for the observed variables when A and B are taken to be box-world systems: and the Shannon inequalities. In [14] the same inequalities were derived for classical A and B. It follows from Proposition 2 that these constraints also hold in the quantum case. Thus, for all three theories we obtain the same outer approximation of the respective entropy cones.4 Violation of any of these inequalities excludes this causal structure as a possible explanation of the observed correlations, irrespective of the nature of the pre-shared resources.
In this example, these outer approximations are not tight: there are further valid entropy inequalities for the classical systems C, D, E and F (so-called non-Shannon inequalities) that lead to tighter approximations, e.g. the following inequality (from [24]):

Causal structure of Figure 3(a)
With resources A and B that are allowed in box-world we obtain the Shannon inequalities and

Classical and quantum resources A and B lead to slightly tighter inequalities, namely the Shannon inequalities and

The question of whether or not there exist box-world distributions that violate one of these inequalities remains open.5

Causal structure of Figure 3(b)
With resources A and B that are allowed according to the theory of box-world we obtain the Shannon inequalities and

Classical and quantum resources lead to slightly tighter inequalities, namely the Shannon inequalities, Equation (14), Inequality (15) and

Note that the classical case was treated in [14].

5 In general, the methods used here lead to outer approximations to the entropy cone of a causal structure. In some cases we can show these to be tight, see e.g. [29,34], but not always. It could thus be the case that while there is a gap between the outer approximations of these cones, there is still no gap between the true cones.

In general, such comparisons are interesting for analysing the nature of causation in different theories. In a sense the box-world inequalities can be thought of as minimal requirements for a theory with a reasonable notion of causation. Developing a systematic understanding of this may hint at ways to find a physical principle that singles out quantum correlations in general scenarios.

Analysis of causal structures with post-selection
In this section we apply our method to post-selected causal structures and show how this allows us to distinguish the correlations obtained in different GPTs. We give two examples of this.6

Bilocality
The bilocal causal structure, first analysed in [36,37], characterises the situation we encounter in scenarios where we rely on entanglement swapping [38], see Figure 4(a).
The entropy vector method provides us with a convenient means to compare the observable correlations when the sources L_1 and L_2 come from different theories. Applying the entropy vector method to the bilocal causal structure with sources from a locally tomographic GPT, apart from the Shannon inequalities we find only

up to symmetry (exchanging X_0 and X_1 as well as Z_0 and Z_1).

6 These constraints can be useful even in the classical case, where there is already the inflation technique for deriving incompatibility constraints [11,35], if the cardinality of the variables is high, for instance.
Applying the entropy vector method to the bilocal causal structure in the quantum case we find the Shannon inequalities as well as

up to symmetry. [There are 4 instances of (19) (obtained by exchanging X_0 and X_1 as well as Z_0 and Z_1), 16 of (20) (obtained by exchanging X_0 and X_1, Y_0 and Y_1, or Z_0 and Z_1 and by exchanging the roles of X and Z) and 8 of (21) (obtained by exchanging X_0 and X_1, Y_0 and Y_1, or Z_0 and Z_1).] The equality constraints are found in both the quantum and GPT cases, while the quantum description is tighter, and in this case the gap is provably significant. An example of a box-world distribution that violates (21) is obtained by taking the systems L_1 and L_2 to be PR-boxes, where A ∈ {0, 1} is the uniform input and X ∈ {0, 1} the output on the left, and analogously for C and Z on the right. B ∈ {0, 1} is with probability 1/2 input into the first box and with probability 1/2 input into the second, the respective output serves as an input for the other box, and Y is equal to the outputs of the two boxes. This distribution also violates the classical inequalities below and is, to the best of our knowledge, the first manifestation of such a violation with a GPT correlation that is proven to be achievable in the bilocal causal structure. [Note that when the violation of the classical inequalities of [20] was discussed, a tripartite box was considered directly (and without proof that it can be generated from a GPT in the bilocal causal structure).7] For classical sources L_1 and L_2, we obtain a convex cone constrained by the Shannon inequalities and 53 other independent classes of linear inequalities. We list one representative of each of these 53 classes in Appendix E. Note that the (in)equalities obtained in the quantum case (cf. (19)-(21)) are also present.
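The PR-box wiring described above can be sketched as follows, under our reading of the construction (uniform inputs, and a PR-box modelled via shared randomness r with left output r and right output r XOR of the product of the inputs); the function name and encoding are our own.

```python
import itertools

# Sketch of the box-world distribution for the bilocal causal structure:
# L1 and L2 are PR-boxes; A, C are uniform inputs with outputs X, Z; the
# middle party feeds B into one of the two boxes (chosen uniformly, bit d)
# and the resulting output into the other box; Y is the pair of outputs.
def bilocal_boxworld_dist():
    p = {}
    for a, b, c, d, r1, r2 in itertools.product((0, 1), repeat=6):
        w = 1 / 64  # uniform a, b, c, order d and box randomness r1, r2
        if d == 0:                   # B enters box 1 (shared with L1) first
            x = r1                   # box 1 wings: (X node, middle)
            o1 = r1 ^ (a & b)
            o2 = r2                  # box 2 wings: (middle, Z node)
            z = r2 ^ (o1 & c)
        else:                        # B enters box 2 first
            o2 = r2
            z = r2 ^ (b & c)
            x = r1
            o1 = r1 ^ (a & o2)
        key = (a, b, c, x, (o1, o2), z)
        p[key] = p.get(key, 0.0) + w
    return p
```

The resulting dictionary is a normalised joint distribution over inputs and outputs, from which the entropy vector can be computed and tested against (21) and the classical inequalities.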
A quantum distribution that leads to an entropy vector outside this cone, and which is thus not achievable with classical resources in this causal structure, is, for instance, the following. Suppose L_1 and L_2 each share a singlet state, and at the nodes the output is generated according to projective measurements Π_θ in the basis {cos θ|0⟩ + sin θ|1⟩, sin θ|0⟩ − cos θ|1⟩}. The angles are chosen as follows: for X_0, θ = x; for X_1, θ = 3x; for Z_0, θ = 0; for Z_1, θ = 2x. For Y_0 we take θ = 0 on the half shared with L_1 and θ = (2y_0 + 1)x on the other; for Y_1 we take θ = 2x on the half shared with L_1 and θ = (2y_0 + 1)x on the other system, where in both cases the output consists of the outcomes of both measurements, x = 0.1, and y_0 denotes the outcome of the first measurement. This distribution violates several of our inequalities, for instance one inequality from class 31 in Appendix E. Note also that for the entropic inequalities derived in [20] for the bilocal causal structure with classical sources, no quantum violations are known, so, as far as we know, the characterisation reported here is the first entropic one that is provably able to resolve this gap.

Figure 4: Bilocal causal structure. (a) The nodes labelled A, B, C, X, Y and Z correspond to observations, modelled as random variables. The nodes L_1 and L_2 label resource systems, which in this case we take to be quantum. (b) Post-selected version of the bilocal causal structure, where X_0 stands for X|A=0 and similarly for X_1 as well as the corresponding Y and Z.
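The quantum distribution just described can be reproduced numerically as follows, under our reading of the construction (inputs A, B, C taken uniform, which is an assumption, and the second measurement at the Y node using the outcome-dependent angle (2y_0 + 1)x). All function names are our own.

```python
import itertools
import numpy as np

def basis(t):
    """Measurement basis {cos t|0> + sin t|1>, sin t|0> - cos t|1>}."""
    return np.array([[np.cos(t), np.sin(t)],
                     [np.sin(t), -np.cos(t)]])

def singlet_probs(t1, t2):
    """Outcome distribution for measuring the halves of a singlet."""
    psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
    B1, B2 = basis(t1), basis(t2)
    return np.array([[np.abs(np.kron(B1[i], B2[j]) @ psi) ** 2
                      for j in (0, 1)] for i in (0, 1)])

def bilocal_quantum_dist(x=0.1):
    p = {}
    for a, b, c in itertools.product((0, 1), repeat=3):
        th_X = (x, 3 * x)[a]                 # angle at the X node
        th_Z = (0.0, 2 * x)[c]               # angle at the Z node
        th_Y1 = (0.0, 2 * x)[b]              # first angle at the Y node
        p1 = singlet_probs(th_X, th_Y1)      # outcomes (X, y0) on L1
        for xo, y0 in itertools.product((0, 1), repeat=2):
            th_Y2 = (2 * y0 + 1) * x         # outcome-dependent angle
            p2 = singlet_probs(th_Y2, th_Z)  # outcomes (y1, Z) on L2
            for y1, zo in itertools.product((0, 1), repeat=2):
                key = (a, b, c, xo, (y0, y1), zo)
                p[key] = p.get(key, 0.0) + p1[xo, y0] * p2[y1, zo] / 8
    return p
```

From this distribution one can compute the entropy vector and check it against the 53 classes of classical inequalities listed in Appendix E.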

Information Causality
Information causality [3] is a candidate principle for singling out quantum theory. Roughly speaking, the principle states that sending n bits of classical information from one party to another cannot give the recipient access to more than n bits of previously unknown information, regardless of any pre-shared resources the parties may have. The associated causal structure is shown in Figure 5, and information causality is obeyed if I(X 1 : Y |R=1 ) + I(X 2 : Y |R=2 ) ≤ H(Z). This relation is known to hold in both classical and quantum theory, while it is violated in boxworld. 8 Using the technique of the present paper, we find that the relations (23) and (24) hold for underlying box-world systems. These are valid in any GPT (see Proposition 1) for the entropy vectors H containing the entropies of all 23 subsets of the coexisting variables. Thus, although information causality does not hold in general, some minimal notion of causation remains (beyond no-signalling).
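The boxworld violation can be verified explicitly with van Dam's PR-box protocol (a standard construction, not spelled out in the text): Alice inputs a = X 1 ⊕ X 2 into her end of a PR-box, obtains p a , and sends Z = X 1 ⊕ p a ; Bob inputs b = R − 1, obtains q b with p a ⊕ q b = a · b, and guesses Y = Z ⊕ q b . The sketch below, our own illustration, computes the relevant quantities by exact enumeration.

```python
from itertools import product
from collections import defaultdict
from math import log2

def H(d):
    return -sum(q * log2(q) for q in d.values() if q > 0)

def mutual_information(joint):
    """I(U:V) for a joint distribution over pairs (u, v)."""
    pu, pv = defaultdict(float), defaultdict(float)
    for (u, v), q in joint.items():
        pu[u] += q
        pv[v] += q
    return H(pu) + H(pv) - H(joint)

joint = {0: defaultdict(float), 1: defaultdict(float)}  # keyed by Bob's input b
pz = defaultdict(float)
for x1, x2, pa in product((0, 1), repeat=3):
    w = 1 / 8  # X1, X2 uniform; PR-box output pa uniform
    a = x1 ^ x2
    z = x1 ^ pa           # Alice's one-bit message
    pz[z] += w
    for b in (0, 1):
        y = z ^ (pa ^ (a & b))  # Bob's guess, using the PR correlation
        target = x1 if b == 0 else x2
        joint[b][(target, y)] += w

i1 = mutual_information(joint[0])  # I(X1 : Y | R=1)
i2 = mutual_information(joint[1])  # I(X2 : Y | R=2)
print(i1 + i2, H(pz))  # 2.0 versus H(Z) = 1.0: information causality fails
```

Both mutual informations evaluate to 1 bit while only a single bit is communicated, which is the maximal violation of the information causality relation.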
We remark that the information causality scenario in boxworld was also considered in [15], but in a slightly different way. There the relations I(X 1 : Y |R=1 ) ≤ H(Z) and I(X 2 : Y |R=2 ) ≤ H(Z) were postulated. We are able to recover these with our approach in the following way.
If, instead of considering all joint entropies of coexisting variables, only restricted entropy vectors H R containing a subset of the components are considered, the inequalities I(X 1 : Y |R=1 ) ≤ H(Z) and I(X 2 : Y |R=2 ) ≤ H(Z) emerge. Since all extremal vertices of the entropy cone of vectors H R are achievable (as was shown in [15]) and, according to Corollary 1, the (closure of the) set of achievable entropy vectors H R is convex, this is the true entropy cone for this restricted marginal scenario. However, we do not see a clear motivation for excluding the additional observed entropies.
The two inequalities (23) and (24) were already derived in [23] for box-world, but here they emerge systematically from our method (and our results imply that they hold for any GPT). They are the only inequalities (other than the Shannon inequalities) that follow from our method as it was introduced above. However, further inequalities hold for this scenario, for instance the non-Shannon inequality of [24].

Figure 5: Causal structure of the Information Causality scenario. (a) Alice holds two pieces of information, X 1 and X 2 , and is allowed to send a message Z to Bob. Bob then has to make a guess of either X 1 or X 2 , depending on the request of a referee, R = 1 or R = 2. A is a pre-shared resource the parties may use. (b) We divide Y into two variables, Y |R=1 and Y |R=2 , depending on the question R. While for classical A Bob can always compute the values of both Y |R=1 and Y |R=2 , more generally these have to be understood as alternatives, of which only one is generated.

Because a complete set of non-Shannon inequalities is not known, we do not have a complete characterisation of the entropy cone of vectors H for this scenario (which may require infinitely many linear inequalities).
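Non-Shannon inequalities can be checked numerically on explicit distributions. The sketch below, our own illustration, verifies the Zhang-Yeung inequality, the first known non-Shannon inequality, on random four-variable distributions; that this is the inequality referred to above is an assumption on our part.

```python
import random
from itertools import product
from math import log2

def entropy(p, idx):
    """Shannon entropy of the marginal on the coordinates in idx."""
    marg = {}
    for outcome, q in p.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + q
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def zhang_yeung_slack(p):
    """Slack of the Zhang-Yeung inequality
    I(A;B) + I(A;CD) + 3 I(C;D|A) + I(C;D|B) - 2 I(C;D) >= 0
    for a distribution p over outcomes (a, b, c, d)."""
    A, B, C, D = 0, 1, 2, 3
    h = lambda *s: entropy(p, s)
    i_ab = h(A) + h(B) - h(A, B)
    i_a_cd = h(A) + h(C, D) - h(A, C, D)
    i_cd_a = h(A, C) + h(A, D) - h(A, C, D) - h(A)
    i_cd_b = h(B, C) + h(B, D) - h(B, C, D) - h(B)
    i_cd = h(C) + h(D) - h(C, D)
    return i_ab + i_a_cd + 3 * i_cd_a + i_cd_b - 2 * i_cd

random.seed(0)
for _ in range(100):
    weights = [random.random() for _ in range(16)]
    total = sum(weights)
    p = {o: w / total for o, w in zip(product((0, 1), repeat=4), weights)}
    # The inequality is valid for every distribution, but is not implied
    # by the Shannon (polymatroid) inequalities.
    assert zhang_yeung_slack(p) >= -1e-9
```

Such spot checks cannot replace a proof, but they are a quick sanity test when new candidate inequalities for a marginal scenario are derived.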

Limitations and directions to overcome them
As is the case for previous entropic methods [8,9,20], there are causal structures for which this method does not imply any entropic constraints on the observed variables (beyond the Shannon inequalities), an example being the triangle causal structure [7, 9-11, 14, 19, 41]. 9 Furthermore, all known strategies that certify incompatibility of entropy vectors relying on GPT resources with classical and quantum scenarios rely on post-selection (cf. Figure 5). If post-selection is necessary for this, then for some causal structures (such as the triangle causal structure) current entropic techniques cannot certify this distinction. This is not a severe limitation, since most experimentally interesting causal structures involve measurement settings, which we can post-select on. Considering entropy vectors rather than the corresponding joint probability distributions gives a computational advantage and provides constraints that are valid independently of the dimension of the involved resources. However, this advantage comes with restricted precision (see for instance [29]). In particular, there are distributions between observed variables that are realisable with box-world resources but not with classical or quantum systems, but which the method cannot certify as such. While we found that our method strictly improves on previous entropic methods [15], another promising research avenue is to generalise the inflation technique to GPTs. This line of research was started in [11], where it was pointed out that certain inflations are valid for any GPT and hence some general constraints can be derived with it. How these constraints relate to the ones found with our method is left for future work. One key difference is that the method of [11] does not distinguish which GPT describes the latent systems (e.g. whether they are quantum or box-world systems). Considering the latent variables explicitly allows us to make this distinction and to certify that different sets of correlations are produced within different GPTs (e.g. quantum theory and box-world).

9 However, the triangle causal structure with inputs can be treated with our method.

A Further details regarding the inequalities for the measurement entropy in GPTs
Before getting to the additional properties, we need a few lemmas. The first is the concavity of H + , proven in [17,23], which follows from the concavity of the Shannon entropy.
The next lemma says that the infimum in the definition of conditional measurement entropy can be restricted to fine-grained measurements.
where α = f n−1 (s B )/(f n−1 (s B ) + f n (s B )). Using concavity, we find that for this coarse-graining the measurement on Bob's system cannot decrease the expected measurement entropy on Alice's system conditioned on the result. Since all coarse-grainings can be formed by a sequence of such combinations, it follows that the infimum over Bob's measurements can be restricted to fine-grained measurements.
It is also worth noting the following.
Proof. Consider the case in which one of the effects in {e x } n x=1 is split into two to form {f y } n+1 y=1 with f y = e y for y = 1, . . . , n − 1 and f n + f n+1 = e n . We have H + A |n+1 = H + A |n , and hence in this case the claim follows from f n + f n+1 = e n . Since any trivial refinement can be formed by combining such splittings, the result generalises to all trivial refinements.
The following lemma is in essence a restatement of part of the proof of Lemma 2 from [28].

Lemma 4. Let A be classical and B be a system from an arbitrary GPT.

If we take N A = {f x } and N x B = {e r x } r , then this is equivalent to measuring M, where we have used that x s x A f x is the identity transformation on the classical system A.
We will in particular rely on the following corollary of this lemma.

Corollary 2. Let A be classical and B and C be systems from an arbitrary GPT.

Proof of Property 7. Using Corollary 2, the measurement on BC in H + (A|BC) can be decomposed into a standard measurement on B, yielding y, followed by a fine-grained measurement on C depending on the value of y obtained, i.e.,

For some fixed set {h z y } z ∈ E * C , the right-hand side (without the inf) can be bounded using the fact that there is a standard measurement achieving the infimum for classical systems in H + (B), together with the subadditivity of the Shannon entropy. The result follows because we can choose {h z y } z ∈ E * C such that the left-hand side is arbitrarily close to H + (A|BC).

Proof of Property 8. We start from the definition of H + (AB|C), and use Corollary 2 to give
In the last inequality we use that a measurement on C followed by a fine-grained measurement on A is a joint measurement on s AC , so the infimum over all joint measurements cannot be larger than this term.
If C is classical then we can drop inf {f z }∈E C and take C to always be measured with a standard classical measurement. This gives equality in both inequalities in the above proof.

B Proof of Theorem 1 and Corollary 1
The proof relies on the following lemmas.

Lemma 5. For any causal structure C G in which all observed variables have a common observed ancestor, the topological closure of the set of achievable entropy vectors of the observed variables is a convex cone.

Proof. We first prove convexity, and then show that the set forms a cone.
Let C G have n observed variables and m unobserved ones. Let H 1 and H 2 be two achievable entropy vectors for the n observed variables in C G . In the following, we show that for any 0 ≤ p ≤ 1, there is a sequence of entropy vectors H k within C G , such that lim k→∞ H k = pH 1 + (1 − p)H 2 .
For i = 1, 2, suppose that H i is generated by using states {Y i j } m j=1 for the m unobserved nodes and that the observed random variables are {X i j } n j=1 . The strategy for achieving the convex combination is as follows. The common observed variable is taken to be X 1 (k) = (A, Z), where A ∈ {0, 1, 2} and where X k denotes k i.i.d. copies of a random variable X. Each unobserved node j is prepared in the product of k copies of Y 1 j and k copies of Y 2 j . Each of the other observed nodes then behaves as follows. The children of X 1 have access to A. If A = 0 they output X j (k) = (0, 0). If A = 1 they perform the operation that would have led to H 1 k times independently, acting on the first k subsystems of any GPT resources they have access to. They then output X j (k) = (1, (X 1 j ) k ). If A = 2, the procedure is the same except that the operation that would have led to H 2 is repeated k times acting on the second k subsystems of any GPT resources, and the output is X j (k) = (2, (X 2 j ) k ). Note that the first part of the output is equal to A, so, in this way, the value of A is transferred to all descendants. An analogous strategy is then used for subsequent generations.
For any subset S of the observed random variables {X j (k)} n j=1 , the entropy H k (S) equals pH 1 (S) + (1 − p)H 2 (S) up to a correction term, where H k refers to the entropy in the new strategy and H 1 and H 2 refer to the entropies in the original strategies (i.e., according to H 1 or H 2 ). Noting that this correction tends to 0 as k tends to ∞, we have lim k→∞ H k = pH 1 + (1 − p)H 2 .
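The rate of convergence in this time-sharing argument can be made explicit in a toy example. The sketch below is our own illustration with assumed strategies (strategy 1 outputs a uniform bit, so H 1 = 1; strategy 2 a constant, so H 2 = 0) and a binary flag A: by the chain rule, H(A, Z) = H(A) + k(pH 1 + (1 − p)H 2 ), so the normalised entropy differs from the convex combination by H(A)/k.

```python
from math import log2

def h2(p):
    """Binary entropy."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Assumed toy strategies: strategy 1 yields H1 = 1 per copy (a uniform bit),
# strategy 2 yields H2 = 0 (a constant output).
p, H1, H2 = 0.3, 1.0, 0.0
target = p * H1 + (1 - p) * H2
for k in (1, 10, 100, 1000):
    # Chain rule: H(A, Z) = H(A) + k * (p*H1 + (1-p)*H2), where A flags
    # which strategy is applied to the k i.i.d. copies.
    Hk = h2(p) + k * target
    print(k, Hk / k - target)  # the gap h2(p)/k vanishes as k grows
```

The O(1) overhead of revealing the flag A is shared over k copies, which is exactly why the convex combination is only reached in the limit.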
If H 1 and H 2 are themselves only achievable as limits of entropy vectors the above argument can be followed for each vector in the corresponding sequences tending to H 1 and H 2 respectively and thus also holds for H 1 and H 2 . This shows that the closure of the set of entropy vectors is convex.
The next lemma extends this beyond the case where there is a common observed ancestor.

Lemma 6. For any causal structure C G the topological closure of the set of achievable entropy vectors of the observed variables is convex.
Proof. If all observed variables in C G have a common observed ancestor, the statement follows by Lemma 5. Otherwise, there are 1 < l ≤ n observed nodes without any observed ancestors, which we label X 1 , . . . , X l (all other observed nodes, X l+1 , . . . , X n , are descendants of at least one of these nodes). We construct a larger causal structure C′ by introducing an observed parent node A i for each X i with i = 1, . . . , l, where A i has no direct link to any variable except for X i . Note that a distribution over the observed variables X 1 , . . . , X n is compatible with C G if and only if it is the marginal of a distribution over X 1 , . . . , X n , A 1 , . . . , A l that is compatible with C′. Now let C′′ be another causal structure that is constructed from C′ by adding a directed link from A 1 to all other A i with 2 ≤ i ≤ l. A distribution over X 1 , . . . , X n , A 1 , . . . , A l is compatible with C′ if and only if it is compatible with C′′ and, at the same time, obeys I(A 1 : A i ) = 0 for all 2 ≤ i ≤ l.
The "if" condition follows because any distribution compatible with C′ obeys I(A 1 : A i ) = 0 for all 2 ≤ i ≤ l and can be realised in C′′ without using the additional causal links. For the "only if", we use that I(A 1 : A i ) = 0 holds if and only if p(a i a 1 ) = p(a i )p(a 1 ), 10 so that any distribution compatible with C′′ that obeys I(A 1 : A i ) = 0 for all 2 ≤ i ≤ l can be written as p(x 1 , . . . , x n , a 1 , . . . , a l ) = p(x l+1 , . . . , x n |x 1 , . . . , x l )p(x 1 |a 1 ) . . . p(x l |a l )p(a 1 ) . . . p(a l ) , with the right-hand side compatible with C′. Hence, a distribution over X 1 , . . . , X n is compatible with C G if and only if it is the marginal of a distribution over X 1 , . . . , X n , A 1 , . . . , A l that is compatible with C′′ and obeys I(A 1 : A i ) = 0 for all 2 ≤ i ≤ l. The closure of the set of entropy vectors of the observed variables that are compatible with C′′ (without any additional constraints) is convex by Lemma 5. The closure of the set of achievable entropy vectors in C G is the closure of the set of achievable entropy vectors of C′′ restricted by the linear equalities I(A 1 : A i ) = 0 for all 2 ≤ i ≤ l and projected to the marginals involving only X 1 , . . . , X n . Because these operations preserve convexity, the closure of the set of achievable entropy vectors of C G is convex.
The main theorem of this section now follows as a corollary.
Proof of Theorem 1. Convexity of the set of achievable entropy vectors follows by Lemma 6. That it is a cone follows because if H is an achievable entropy vector, then kH for k ∈ N is achievable by taking k independent copies of all systems in the strategy achieving H. Furthermore, in any causal structure C G , H = 0 is achievable by taking all observed variables to be 0 with probability 1. Hence, by taking convex combinations, if H is achievable, so is λH for any λ ∈ R ≥0 . Corollary 1 then follows in a similar way.
Proof of Corollary 1. Consider first postselecting on one of the parentless variables in C G , and suppose that this variable has k possible values. Let X 1 , . . . , X n be the set of all observed descendants of the postselected variable and Y 1 , . . . , Y m be the set of all other observed nodes. In other words, for a fixed distribution P Y 1 ,...,Y m of all other observed nodes, we consider the k different post-selected distributions over X 1 , . . . , X n . For these distributions we define an entropy vector with k(2 n+m − 1) components by concatenating the entropy vectors of each of them. Since the marginal distributions over Y 1 , . . . , Y m agree for each of the k alternatives, the repeated components can be removed from the vector. If H 1 and H 2 are two such achievable entropy vectors, then for any 0 ≤ p ≤ 1, pH 1 + (1 − p)H 2 is also such an achievable entropy vector. This follows by applying the technique used to prove Lemma 6 separately to the causal structure including only one of the k alternatives (i.e., the causal structure formed from the post-selected causal structure by removing all nodes associated with other alternatives). This strategy leads to the same distribution on Y 1 , . . . , Y m for each alternative and hence the overall entropy vector has the same entropies for all subsets of {Y 1 , . . . , Y m }. We can then postselect on further parentless variables in a similar way.

C Proof of Proposition 2
We use X 1 , . . . , X n for the observed variables and Y 1 , . . . , Y m for the unobserved nodes in C. For each unobserved node Y i we use Y j i with 1 ≤ j ≤ k i for the subsystems associated with the k i outgoing edges, sometimes using Y = {Y j i } m, k i i=1,j=1 and X = X 1 , . . . , X n as a shorthand. For any unobserved node Y i with 1 ≤ i ≤ m and for any 1 ≤ j ≤ k i , we show how to modify Y j i to Ỹ j i such that, if Ỹ j i is shared along the j-th outgoing edge instead of Y j i , the same distributions among the observed variables are obtained. This construction of Ỹ j i will make all conditional entropies of unobserved systems positive. If |a⟩⟨a| is a system that is uncorrelated with any other system, then Ỹ j i can be used to produce the same observed distributions as Y j i , since α j i may be ignored when processing the unobserved systems Ỹ j i to obtain the observed variables. Furthermore, due to the independence of the α j i and by weak monotonicity, for any Ỹ j i and any set of variables S coexisting with Ỹ j i , we have H(Ỹ j i |S) ≥ 0, where the last equality follows by construction. (Note that for an observed variable X and a set S coexisting with X, the analogous relation H(X|S) ≥ 0 already holds.) We now show that for any two coexisting sets S, T ⊂ U with S ∩ T = ∅, where U is a maximal coexisting set, the conditional entropy H(S|T ) is positive. First of all, by strong subadditivity, H(S|T ) ≥ H(S|U \ S). Positivity of H(S|U \ S) can be shown inductively in the cardinality of S. For cardinality 1 this is implied by (27) and (26). Assuming that this holds for any set with cardinality q, the following shows that it also holds for any set with cardinality q + 1. Let S ⊆ U be a subset of a maximal coexisting set U with cardinality q + 1, and let S 1 ⊆ S be a one-element subset. Writing S = S 1 S̄ 1 with S̄ 1 = S \ S 1 , we obtain an expression for H(S|U \ S) that is at least 0 by the inductive hypothesis. It then follows from (27) that H(S|T ) ≥ 0 for all T ⊆ (U \ S).

D Remarks on the quantum entropy vector method
This appendix gives additional information regarding the role of Proposition 2 for the quantum entropy vector method introduced in [9] (see also [22] for a review). For completeness, we first briefly introduce the details of this method. The quantum entropy vector method is based on the von Neumann entropy. For any joint state of coexisting systems associated with some of the nodes (and edges) of a causal structure, a joint entropy can be defined, where the notion of coexisting sets is the one discussed for the measurement entropy in the main text. However, the quantum method does not take the conditional entropies as separate variables (these would be redundant). Note that whenever there is no entanglement between two subsystems X and Y of a state ρ XY , the stronger monotonicity statement H(X|Y ) ≥ 0 holds. Since it is always possible to purify an unobserved quantum state ρ A 1 ···A n , we can impose the following.
• Purification for unobserved systems: For an unobserved system in state ρ A 1 ···A n we can take H(A 1 · · · A n ) = 0 and, for any subsystem S ⊂ {A 1 , . . . , A n }, H(S) = H({A 1 , . . . , A n } \ S). 12

Among the variables of different coexisting sets, data processing inequalities hold.

• Data Processing: Let ρ XY be the joint state of two sets of coexisting nodes X and Y and let E be a completely positive trace preserving map taking Y to Z such that (I ⊗ E)(ρ XY ) = ρ XZ ; then H(X|Y ) ≤ H(X|Z). 13

In addition, the causal structure will in general imply independence constraints among observed as well as among unobserved systems. These are based on the notion of d-separation: for three pairwise disjoint sets of variables X, Y and Z, X and Y are d-separated by Z if Z blocks every path from any node in X to any node in Y . A path is blocked by Z if it contains i → z → j or i ← z → j for nodes i, j on the path and a node z ∈ Z, or if it contains i → k ← j, where neither k nor any descendant of k is in Z. Note that it is possible that Z = ∅.

• Independences (following Theorem 22 (i) from [14]): For three pairwise disjoint sets of observed variables X, Y and Z, if X and Y are d-separated by Z, then H(X|Z) = H(X|Y Z). (Note that Z = ∅ is allowed.)

We show with the following lemma that weak monotonicity constraints are not relevant in this approach when considering causal structures where none of the unobserved nodes have any parents. These are the scenarios that are usually considered in the literature.
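The d-separation criterion can be checked mechanically on small graphs. The sketch below is our own illustration (not part of the method of [9] or [14]); for colliders it uses the standard convention that a path is unblocked when the collider or one of its descendants is conditioned on.

```python
def descendants(dag, node):
    """All strict descendants of node; dag maps a node to its set of children."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        for c in dag.get(n, ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def undirected_paths(dag, x, y):
    """Simple paths between x and y in the skeleton, with edge directions."""
    nbrs = {}
    for u, children in dag.items():
        for v in children:
            nbrs.setdefault(u, set()).add((v, '->'))
            nbrs.setdefault(v, set()).add((u, '<-'))
    def walk(path, dirs):
        if path[-1] == y:
            yield path, dirs
            return
        for v, d in nbrs.get(path[-1], ()):
            if v not in path:
                yield from walk(path + [v], dirs + [d])
    yield from walk([x], [])

def d_separated(dag, x, y, z):
    """True if every path between x and y is blocked by the set z."""
    z = set(z)
    for path, dirs in undirected_paths(dag, x, y):
        blocked = False
        for i in range(1, len(path) - 1):
            is_collider = dirs[i - 1] == '->' and dirs[i] == '<-'
            if is_collider:
                # A collider blocks unless it or one of its descendants is in z.
                if not ({path[i]} | descendants(dag, path[i])) & z:
                    blocked = True
                    break
            elif path[i] in z:  # chain or fork node in the conditioning set
                blocked = True
                break
        if not blocked:
            return False
    return True

# Bilocal-style example: L1 -> {X, Y} and L2 -> {Y, Z}.
dag = {'L1': {'X', 'Y'}, 'L2': {'Y', 'Z'}}
print(d_separated(dag, 'X', 'Z', []))     # True: the collider at Y blocks
print(d_separated(dag, 'X', 'Z', ['Y']))  # False: conditioning on Y unblocks
print(d_separated(dag, 'X', 'Y', []))     # False: fork X <- L1 -> Y
```

The path-enumeration approach is exponential in general, but entirely adequate for the small causal structures considered here.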

Lemma 7.
For any causal structure C Q in which the unobserved quantum nodes do not have any parents, all weak monotonicity inequalities are implied by the other inequalities, i.e., for any two coexisting sets S 1 , S 2 with S 1 ∩ S 2 = ∅, the corresponding weak monotonicity inequality is redundant.
Proof. Let A denote the collection of subsystems of all unobserved nodes and let S be the maximal coexisting set that includes all unobserved systems A. For the coexisting sets S 1 , S 2 with S 1 ∩ S 2 = ∅ we distinguish three cases.

Case 1: S 1 , S 2 ⊆ S. We use the purification of unobserved systems to rewrite, 14 where for the first equality we use the purification and for the second line the purity of ρ A . The last two terms in (29) are positive because they are classical conditional entropies (none of the sets contain elements of A). The sum of the remaining four terms is then positive by strong subadditivity. Case 2: S 1 ∩ S 2 ⊆ S and either S 1 ⊆ S or S 2 ⊆ S, or both. In this case we use data-processing to give the bound, where T 1 , T 2 ⊆ S are the sets of variables that are processed to S 1 \ S 2 and S 2 \ S 1 respectively. Since S 1 ∩ S 2 ⊆ S, we have S 1 ∩ S 2 = T 1 ∩ T 2 , and so positivity of the remaining expression follows using Case 1. Case 3: S 1 ∩ S 2 ⊄ S. In this case we can find R 1 ⊆ S and R 2 with R 2 ∩ S = ∅ such that S 1 ∩ S 2 = R 1 ∪ R 2 , and rewrite accordingly. The second and fourth terms are positive since R 2 is classical. The first and third terms correspond to a weak monotonicity inequality like that considered in Case 2, and so their sum is also positive.
By Proposition 2, instead of purifying the unobserved systems and dropping weak monotonicity, we could alternatively replace all weak monotonicity constraints by monotonicity (doing so prevents us from purifying the unobserved quantum systems). The question then arises as to the implications of each for deriving new entropy inequalities for the observed variables. The following lemma shows that the quantum approach outlined in this section (which takes the purification of unobserved systems into account) leads to entropy inequalities that are at least as tight as those obtained by considering monotonicity instead.
Constraints on the observed variables are usually derived starting from:

(1) The Shannon constraints for the observed variables.
(2) All independences among observed and unobserved variables that C implies.
(4) Positivity of all entropies.

For (8) this yields (40) and (41), and for (8′), (42) and (43). In the following we show that neither (40) and (41) nor (42) and (43) imply any inequalities for the observed variables other than the ones that follow without them.
For (8), the only remaining inequalities containing H(A 1 A 2 ) are (35) and (39), both of which have H(A 1 A 2 ) as a lower bound on other entropies, as well as (40) and (41), where H(A 1 A 2 ) is an upper bound. After eliminating H(A 1 A 2 ), we hence obtain three inequalities, where the first two immediately follow from positivity of the entropy and the third is implied by (38) and monotonicity for cq-states. If we were not to impose the inequalities (40) and (41), the variable elimination would lead to two inequalities, the first of which is implied by positivity and the second by (38) and monotonicity for cq-states. We now show that, after the elimination of H(A 2 ) and H(A 1 ), the additional inequalities obtained from (40) and (41) are also redundant.

E Inequalities for the bilocal causal structure with classical variables

In the following we give one representative of each of the 53 types of entropy inequalities for the bilocal causal structure. The remaining inequalities can be generated by using the symmetries under exchange of X 0 and X 1 , Y 0 and Y 1 , as well as Z 0 and Z 1 , and the exchange of the pair of variables (X 0 , X 1 ) with (Z 0 , Z 1 ). We list them in terms of the coefficients of the entropies in the inequalities, such that each row, understood as a vector v, imposes an inequality v · H ≥ 0 on the entropy vectors H.