Distance-based resource quantification for sets of quantum measurements

The advantage that quantum systems provide for certain quantum information processing tasks over their classical counterparts can be quantified within the general framework of resource theories. Certain distance functions between quantum states have successfully been used to quantify resources like entanglement and coherence. Perhaps surprisingly, such a distance-based approach has not been adopted to study resources of quantum measurements, where other geometric quantifiers are used instead. Here, we define distance functions between sets of quantum measurements and show that they naturally induce resource monotones for convex resource theories of measurements. By focusing on a distance based on the diamond norm, we establish a hierarchy of measurement resources and derive analytical bounds on the incompatibility of any set of measurements. We show that these bounds are tight for certain projective measurements based on mutually unbiased bases and identify scenarios where different measurement resources attain the same value when quantified by our resource monotone. Our results provide a general framework to compare distance-based resources for sets of measurements and allow us to obtain limitations on Bell-type experiments.


Introduction
It is arguably one of the most remarkable features of quantum theory that certain quantum systems exhibit behaviors without any classical analog. While these quantum phenomena were at first regarded merely as a strange feature of nature that raised many philosophical questions [1][2][3], it was later realized that they can actually be used as a resource in real-world applications such as computation [4], sensing [5], or cryptography [6]. To understand the potential of these upcoming applications, it is essential to characterize the advantages quantum technologies can provide over classical information processing technologies and to identify the physical phenomena that enable them. To achieve this advantage, properties of both quantum states and measurements are relevant. Together, quantum states and measurements give rise to purely quantum phenomena that cannot be explained by classical physics, such as entanglement [7] and its detection [8], EPR steering [9][10][11], and Bell nonlocality [12,13].
The latter two are similar in the sense that both can be seen as resources that require one out of several judiciously chosen quantum measurements to be performed on a resourceful quantum state in each round of an experiment. In particular, it is well-known that entangled states and incompatible measurements are necessary to witness steering or nonlocality [10]. Moreover, if appropriate quantifiers are chosen, the amount of incompatibility and entanglement provide upper bounds for the possible amount of steering and nonlocality [14][15][16]. Therefore, entanglement and incompatibility can be thought of as a resource for quantum advantages. In general, states and measurements may possess various other resources responsible for quantum advantages as well [17][18][19][20][21][22][23][24][25].
Quantum resource theories (QRTs) [26] allow us to identify, study, and quantify quantum resources for certain quantum information processing tasks in a general framework. Moreover, this allows us to identify similarities among different resources, adapt concepts and quantification methods [27][28][29][30] from one to another, and establish relations between different resources [14,16,17,31]. Any QRT aims to answer at least the following three questions: (i) Which objects (e.g. states or measurements) are resources for a certain task, and which ones are free, i.e., do not provide any advantage? (ii) Which transformations are free, i.e., cannot create resources from free objects? (iii) How can we quantify the amount of the resource? A standard approach to quantify quantum resources, illustrated in Figure 1, asks how far away a given resource is from the set of free objects, as measured by some distance-based function.
While the distance-based approach has been employed successfully for resources of quantum states like entanglement [7] and coherence [18] and for correlations like steering [49] and nonlocality [50], it is largely unexplored for quantum measurements, let alone for sets of measurements. The most likely reason is that strategies for discriminating single measurements on the basis of distances between them have been proposed only recently [51,52]. Robustness-based and weight-based quantifiers have been used as mathematical tools to circumvent this problem. Later, it was realized that these quantifiers are also linked to operational advantages [24,27,30,34,41,47,48,53,54,55,56,57]. However, the question about the existence of an operationally meaningful distance between sets of quantum measurements remained open. The interest in detailed studies on distance-based resource quantification goes beyond the question of their existence. Having access to an additional formalism for resource quantification offers new tools to study measurement resources and allows us to better understand their operational significance.
In this work, we answer the question of whether distance-based resource quantification for sets of measurements is possible in the affirmative. Hence, we extend the class of geometric quantifiers for convex QRTs for sets of measurements (so-called assemblages) by introducing distance-based resource quantifiers for any convex resource theory of measurement assemblages.
First, we discuss the necessary properties that any distance between sets of measurements has to fulfil. Then, we show that every such distance induces a resource monotone. We propose one particular quantifier, which is based on the diamond norm [58] between different measure-and-prepare channels and is specially tailored to Bell-type experiments, as it captures the idea that only one particular measurement out of a given collection is applied at a time in a round-by-round protocol. Based on this quantifier, we establish a hierarchy of measurement resources, including recently introduced steering [49] and nonlocality monotones [50]. See Table 1 for an overview of the resources and the quantities we analyze in this work. We show that our quantifier can be computed efficiently by means of a semidefinite program (SDP), which we use to obtain analytical upper and lower bounds on the incompatibility (i.e., the non-joint measurability) [21] for any set of measurements. Finally, we show that these bounds are tight for special instances of projective measurements based on mutually unbiased bases (MUB) [59], which also play a special role in cases where different measurement resources attain the same value when quantified with our proposed quantifier.

Resource | Monotone | Free objects | Free operations | Computation | Type
Informativeness | • $\mathrm{IF}_\diamond(\mathbf{M}_p)$ (20) | $F_{a|x} = q(a|x)\mathbb{1}_d$ (19) | Unital maps $\Lambda^\dagger$ (a), simulations $\xi$ [24] | SDP (87), (88) | Average
Coherence | • $C_\diamond(\mathbf{M}_p)$ (22) | $F_{a|x} = \sum_i \alpha_{i|(a,x)} |i\rangle\langle i|$ (21) | SI-operations $\Lambda^\dagger_{\mathrm{SIO}}$ (b) [25], simulations $\xi$ (c) | SDP (90), (91) | Average
Incompatibility | • $I_\diamond(\mathbf{M}_p)$ (24) | $F_{a|x} = \sum_\lambda v(a|x,\lambda) G_\lambda$ (23) | Unital maps $\Lambda^\dagger$ [35], simulations $\xi$ [23] | SDP (76), (77) | Set
Steering | $S(\vec{\sigma}_p)$ (27), [49] | $\tau_{a|x} = \sum_\lambda v(a|x,\lambda)\sigma_\lambda$ (26) | (Restricted) 1W-LOCC [49,60] | SDP (66), (73) | Set
Nonlocality | $N(\mathbf{q}_p)$ (29), [50] | $t(a,b|x,y) = \sum_\lambda \pi(\lambda) v_A(a|x,\lambda) v_B(b|y,\lambda)$ | WCCPI [61] | Linear (62), (65) | Set

(a) Even though it is not discussed in [24], it follows directly from the definition of unitality that no quantum channel $\Lambda^\dagger$ can create informativeness from uninformative measurements.
(b) SIO stands for strictly incoherent operations.
(c) Even though it is not discussed in [25], it follows directly from the definition of the classical simulations $\xi$ that they cannot create coherence, as linear combinations of diagonal matrices are diagonal.

Table 1: Overview of the resources analyzed in this work. The different resources are presented in terms of the monotones we consider, the respective free objects, and the set of free operations associated with the considered QRT. The monotones we introduce in this work are marked with a bullet point •. Furthermore, we state which kind of optimization computes the respective monotone and whether the resource is a genuine property of a set of objects or an average over single-object properties. The free operations for steering and nonlocality are listed here for completeness; we refer to the references in the table for more details.

Distance-based resource quantification
Consider the canonical example of the trace distance [62]. The trace distance between two quantum states $\rho, \tau \in \mathcal{S}(\mathcal{H})$, where $\mathcal{S}(\mathcal{H})$ is the set of density matrices acting on a Hilbert space $\mathcal{H} \cong \mathbb{C}^d$ of finite dimension $d$, is given by
$$D_1(\rho,\tau) = \frac{1}{2}\|\rho-\tau\|_1, \tag{1}$$
where $\|X\|_1 = \mathrm{Tr}[\sqrt{X^\dagger X}]$ is the trace norm of $X$. The trace distance is a useful tool to distinguish $\rho$ and $\tau$, as it fulfils all necessary properties of a metric between quantum states. Consider $\rho, \tau, \chi \in \mathcal{S}(\mathcal{H})$ and any completely positive and trace-preserving (CPT) map $\Lambda$, also known as a quantum channel. It holds that
$$\begin{aligned}
D_1(\rho,\tau) = 0 &\iff \rho = \tau, \\
D_1(\rho,\tau) &= D_1(\tau,\rho), \\
D_1(\rho,\tau) &\le D_1(\rho,\chi) + D_1(\chi,\tau), \\
D_1(\Lambda(\rho),\Lambda(\tau)) &\le D_1(\rho,\tau),
\end{aligned} \tag{2}$$
i.e., $D_1(\rho,\tau)$ is a faithful and symmetric function that obeys the triangle inequality and monotonicity (i.e., it does not increase) under arbitrary CPT maps $\Lambda$. In addition to these minimal requirements, it is well known that $D_1(\rho,\tau)$ has an operational interpretation in terms of the optimal probability to distinguish $\rho$ and $\tau$ in a single-shot experiment [62]. That is, the optimal guessing probability is given by $p^{(\rho,\tau)}_{1,\mathrm{guess}} = \frac{1}{2}(1 + D_1(\rho,\tau))$. These properties make the trace distance a viable tool to quantify (convex) resources.
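The trace distance and the associated single-shot guessing probability are easy to evaluate numerically. The following is a minimal sketch, assuming NumPy is available; the function names `trace_distance` and `guessing_probability` and the chosen example states are illustrative and not part of the original work.

```python
import numpy as np

def trace_distance(rho, tau):
    """D_1(rho, tau) = 0.5 * ||rho - tau||_1 for Hermitian inputs."""
    # For a Hermitian matrix the trace norm is the sum of |eigenvalues|.
    eigenvalues = np.linalg.eigvalsh(rho - tau)
    return 0.5 * float(np.sum(np.abs(eigenvalues)))

def guessing_probability(rho, tau):
    """Optimal single-shot probability to distinguish rho from tau."""
    return 0.5 * (1.0 + trace_distance(rho, tau))

# Illustrative example: the qubit states |0><0| and |+><+|.
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(ket0, ket0.conj())
tau = np.outer(ket_plus, ket_plus.conj())

d1 = trace_distance(rho, tau)
p_guess = guessing_probability(rho, tau)
print(d1, p_guess)
```

For this pair of states the difference has eigenvalues $\pm 1/\sqrt{2}$, so the distance is $1/\sqrt{2}$ and the guessing probability is $(1 + 1/\sqrt{2})/2$.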
Let us consider the prime example of a resource, the entanglement of a bipartite state $\rho \in \mathcal{S}(\mathcal{H} \otimes \mathcal{H})$. One can quantify the entanglement of $\rho$ by its distance to the set $\mathrm{Sep}(\mathcal{H} \otimes \mathcal{H})$ of separable quantum states [42], given as
$$E_1(\rho) = \min_{\tau \in \mathrm{Sep}(\mathcal{H} \otimes \mathcal{H})} D_1(\rho,\tau). \tag{3}$$
It is now readily verified that $E_1(\rho)$ is a non-negative, convex function with $E_1(\rho) = 0 \iff \rho \in \mathrm{Sep}(\mathcal{H} \otimes \mathcal{H})$ that obeys the monotonicity $E_1(\rho) \ge E_1(\Lambda_{\mathrm{LOCC}}(\rho))$ under any local operations and classical communication (LOCC) [7] map $\Lambda_{\mathrm{LOCC}}$. That is, $E_1$ captures the fact that LOCC maps cannot create entanglement. The monotonicity of resources under these so-called free operations is a crucial property of any resource theory and also reflects that these types of operations cannot create resources from free objects [26]. In the following, we use these insights on distances and resources of quantum states to define distance-based resource monotones for sets of quantum measurements. We describe a quantum measurement most generally by a positive operator-valued measure (POVM), i.e., a finite set $\{M_a\}_a$ of effect operators $0 \le M_a \le \mathbb{1}_d$ acting on a finite $d$-dimensional Hilbert space $\mathcal{H}$ such that $\sum_a M_a = \mathbb{1}_d$, where $\mathbb{1}_d$ is the identity operator on $\mathcal{H} \cong \mathbb{C}^d$. Note that $X \le Y \iff Y - X \ge 0$ for any Hermitian operators $X, Y$ means that the operator $Y - X$ is positive semidefinite. A set of POVMs with outcomes $a$ for different settings $x$ is known as a measurement assemblage $\mathbf{M} = \{M_{a|x}\}_{a,x}$. In the following, we omit the set indices and simply write $\mathbf{M} = \{M_{a|x}\}$ when there is no risk of confusion. If we talk about a specific element of the assemblage $\mathbf{M}$, for instance the POVM corresponding to setting $x$, we write $M_x = \{M_{a|x}\}_a$. Here, we consider assemblages with $m$ measurement settings and $o$ outcomes in each setting, i.e., $x = 1, \cdots, m$ and $a = 0, \cdots, o-1$. The outcome statistics of a measurement on any state $\rho$ are given by the Born rule
$$p(a|x) = \mathrm{Tr}[M_{a|x}\rho].$$
A measurement assemblage can be converted to another assemblage by two different processes.
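The defining POVM conditions and the Born rule can be checked directly. The sketch below assumes NumPy; the helper names `is_povm` and `outcome_statistics`, as well as the choice of a two-setting qubit assemblage (projective measurements in the Z and X bases), are illustrative assumptions.

```python
import numpy as np

def is_povm(effects, tol=1e-9):
    """Check 0 <= M_a and sum_a M_a = identity for a list of effects."""
    d = effects[0].shape[0]
    positive = all(np.linalg.eigvalsh(E).min() >= -tol for E in effects)
    complete = np.allclose(sum(effects), np.eye(d), atol=tol)
    return positive and complete

def outcome_statistics(assemblage, rho):
    """Born rule p(a|x) = Tr[M_{a|x} rho] for every setting x and outcome a."""
    return [[float(np.real(np.trace(E @ rho))) for E in povm]
            for povm in assemblage]

def projector(v):
    return np.outer(v, v.conj())

# Illustrative assemblage with m = 2 settings and o = 2 outcomes.
z_basis = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]
x_basis = [np.array([1, 1], dtype=complex) / np.sqrt(2),
           np.array([1, -1], dtype=complex) / np.sqrt(2)]
assemblage = [[projector(v) for v in z_basis],
              [projector(v) for v in x_basis]]

rho = projector(z_basis[0])          # the state |0><0|
stats = outcome_statistics(assemblage, rho)
print(stats)
```

On the state $|0\rangle\langle 0|$, the Z measurement is deterministic while the X measurement yields both outcomes with probability 1/2.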
First, as any quantum state $\rho$ can be transformed via any CPT map $\Lambda$ to another state $\Lambda(\rho)$, it follows from $\mathrm{Tr}[M_{a|x}\Lambda(\rho)] = \mathrm{Tr}[\Lambda^\dagger(M_{a|x})\rho]$ that an assemblage $\mathbf{M}$ can be transformed via the Hilbert-Schmidt adjoint (unital) map $\Lambda^\dagger$ to another assemblage $\Lambda^\dagger(\mathbf{M})$. Second, classical simulation maps (via mixtures and classical post-processing) $\mathbf{M}' = \xi(\mathbf{M})$ with
$$M'_{b|y} = \sum_x p(x|y) \sum_a q(b|y,x,a)\, M_{a|x}$$
can be used to simulate [23] the assemblage $\mathbf{M}'$ from $\mathbf{M}$ via the conditional probabilities $p(x|y)$ for all $y$ and $q(b|y,x,a)$ for all $y, x, a$. Note that, as $p(x) = \sum_y q(y)\,p(x|y)$, one also obtains the allowed probabilities $q(y)$ to perform setting $y$. See also [63] for an approach to simulability that combines quantum pre-processing and classical post-processing.
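The classical simulation map is a plain tensor contraction over settings and outcomes. The following sketch assumes NumPy; the function name `simulate` and the array layout (index order `[x, a]` for effects, `[y, x]` for the mixing probabilities, `[y, x, a, b]` for the post-processing) are conventions chosen here for illustration only.

```python
import numpy as np

def simulate(assemblage, p_x_given_y, q_b_given_yxa):
    """Return M'_{b|y} = sum_x p(x|y) sum_a q(b|y,x,a) M_{a|x}.

    assemblage[x, a]         : effect M_{a|x}, shape (m, o, d, d)
    p_x_given_y[y, x]        : mixing probabilities p(x|y)
    q_b_given_yxa[y, x, a, b]: post-processing probabilities q(b|y,x,a)
    """
    return np.einsum("yx,yxab,xaij->ybij",
                     p_x_given_y, q_b_given_yxa, assemblage)

# Illustrative input: projective Z and X measurements on a qubit.
z_effects = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
x_effects = [0.5 * np.array([[1.0, 1.0], [1.0, 1.0]]),
             0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])]
assemblage = np.array([z_effects, x_effects])

# One simulated setting y that mixes x uniformly and keeps the outcome label.
p_x_given_y = np.array([[0.5, 0.5]])
q_b_given_yxa = np.zeros((1, 2, 2, 2))
q_b_given_yxa[0, :, :, :] = np.eye(2)   # q(b|y,x,a) = delta_{a,b}

simulated = simulate(assemblage, p_x_given_y, q_b_given_yxa)
print(simulated[0, 0])
```

Because the conditional probabilities are normalized, the simulated effects again sum to the identity for every new setting.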
We use the probability distribution p = {p(x)} to capture the fact that typically only one quantum measurement can be performed at a time and it is also natural to assume that the likelihood of the settings x influences the capabilities of M in experiments. Note that we consider only the case p(x) > 0 ∀ x, as measurements that are never performed can be discarded trivially. We define a distance between sets of measurements weighted with the distribution p as follows.

Definition 1.
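Let $\mathbf{M}$ be a measurement assemblage containing $m$ POVMs and let $p$ be a probability distribution with $p(x) > 0 \ \forall\, x = 1, \cdots, m$. We call the tuple $\mathbf{M}_p := (\mathbf{M}, p)$ a weighted measurement assemblage (WMA). Let $\mathbf{M}_p$, $\mathbf{N}_p$, and $\mathbf{K}_p$ be any WMAs. Further, let $\Lambda^\dagger$ be any completely positive (CP) unital map and $\xi$ any classical simulation map. Any non-negative function $D(\mathbf{M}_p, \mathbf{N}_p)$ that fulfils the conditions
$$\begin{aligned}
D(\mathbf{M}_p,\mathbf{N}_p) = 0 &\iff \mathbf{M} = \mathbf{N}, \\
D(\mathbf{M}_p,\mathbf{N}_p) &= D(\mathbf{N}_p,\mathbf{M}_p), \\
D(\mathbf{M}_p,\mathbf{N}_p) &\le D(\mathbf{M}_p,\mathbf{K}_p) + D(\mathbf{K}_p,\mathbf{N}_p), \\
D(\Lambda^\dagger(\mathbf{M})_p,\Lambda^\dagger(\mathbf{N})_p) &\le D(\mathbf{M}_p,\mathbf{N}_p), \\
D(\xi(\mathbf{M})_q,\xi(\mathbf{N})_q) &\le D(\mathbf{M}_p,\mathbf{N}_p),
\end{aligned}$$
where $q$ is related to $p$ via $p(x) = \sum_y q(y)\,p(x|y)$, is a distance between $\mathbf{M}_p$ and $\mathbf{N}_p$.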
Note that all conditions are in direct correspondence to the conditions in Eq. (2) for quantum states. Any distance that fulfills the conditions in Definition 1 can be used to define a faithful resource monotone for convex QRTs of measurement assemblages.

Definition 2.
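Let $\mathcal{F}$ be a convex and compact set of measurement assemblages, $\mathbb{F}$ the (maximal) set of free quantum maps $\Lambda^\dagger$ such that $\Lambda^\dagger(\mathbf{F}) \in \mathcal{F}$ for any $\mathbf{F} \in \mathcal{F}$, and let $\mathbb{S}$ be the set of classical simulations $\xi$ such that $\xi(\mathbf{F}) \in \mathcal{F}$ for any $\mathbf{F} \in \mathcal{F}$. The tuple $\mathcal{Q} := (\mathcal{F}, \mathbb{F}, \mathbb{S})$ is called a QRT of measurement assemblages.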
We want to emphasize that all of our considerations hold for the maximal set F of free quantum maps. Therefore, they also hold for any subset of free operations. In some situations, not considering the maximal set of free operations might be physically more motivated, as it is the case for LOCC in the resource theory of entanglement.
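With these definitions we obtain the following lemma, showing that every (jointly convex) distance between measurement assemblages induces a faithful (convex) resource monotone.

Lemma 1. Let $\mathcal{Q} = (\mathcal{F}, \mathbb{F}, \mathbb{S})$ be any QRT of WMAs $\mathbf{M}_p$ and $D(\mathbf{M}_p, \mathbf{F}_p)$ a (jointly convex) distance function. The distance of $\mathbf{M}$ to the set $\mathcal{F}$, weighted with the probability distribution $p$ and given by
$$R(\mathbf{M}_p) := \min_{\mathbf{F} \in \mathcal{F}} D(\mathbf{M}_p, \mathbf{F}_p),$$
is a faithful (convex) resource monotone.

Proof. The proof relies mainly on the conditions in Definition 1. The non-negativity and faithfulness (i.e., $R(\mathbf{M}_p) = 0 \iff \mathbf{M} \in \mathcal{F}$) follow directly, and the monotonicity conditions follow from
$$R(\Lambda^\dagger(\mathbf{M})_p) = \min_{\mathbf{F} \in \mathcal{F}} D(\Lambda^\dagger(\mathbf{M})_p, \mathbf{F}_p) \le D(\Lambda^\dagger(\mathbf{M})_p, \Lambda^\dagger(\mathbf{F}^*)_p) \le D(\mathbf{M}_p, \mathbf{F}^*_p) = R(\mathbf{M}_p),$$
where $\mathbf{F}^*$ is the closest free assemblage to $\mathbf{M}$ and where we used the monotonicity of the distance and the fact that free operations $\Lambda^\dagger \in \mathbb{F}$ map free assemblages to free assemblages. An analogous calculation follows for the simulations $\xi \in \mathbb{S}$. In addition, if $D(\mathbf{M}_p, \mathbf{F}_p)$ is jointly convex, i.e., obeys
$$D(\eta \mathbf{M}^{(1)}_p + (1-\eta)\mathbf{M}^{(2)}_p,\ \eta \mathbf{N}^{(1)}_p + (1-\eta)\mathbf{N}^{(2)}_p) \le \eta\, D(\mathbf{M}^{(1)}_p, \mathbf{N}^{(1)}_p) + (1-\eta)\, D(\mathbf{M}^{(2)}_p, \mathbf{N}^{(2)}_p)$$
for any $\eta \in [0,1]$ and any WMAs $\mathbf{M}^{(1)}_p, \mathbf{M}^{(2)}_p, \mathbf{N}^{(1)}_p, \mathbf{N}^{(2)}_p$, it is
$$\begin{aligned}
R(\eta \mathbf{M}^{(1)}_p + (1-\eta)\mathbf{M}^{(2)}_p) &\le D(\eta \mathbf{M}^{(1)}_p + (1-\eta)\mathbf{M}^{(2)}_p,\ \eta \mathbf{F}^{(1)*}_p + (1-\eta)\mathbf{F}^{(2)*}_p) \\
&\le \eta\, D(\mathbf{M}^{(1)}_p, \mathbf{F}^{(1)*}_p) + (1-\eta)\, D(\mathbf{M}^{(2)}_p, \mathbf{F}^{(2)*}_p) \\
&= \eta\, R(\mathbf{M}^{(1)}_p) + (1-\eta)\, R(\mathbf{M}^{(2)}_p),
\end{aligned}$$
where we used that $\mathbf{F}^{(1)*}, \mathbf{F}^{(2)*} \in \mathcal{F}$ are the closest free assemblages to $\mathbf{M}^{(1)}$ and $\mathbf{M}^{(2)}$, respectively. Furthermore, we used that $\eta \mathbf{F}^{(1)*} + (1-\eta)\mathbf{F}^{(2)*}$ is free as well, by the convexity of $\mathcal{F}$. Note that the arguments used here are similar to those for distance-based resource monotones of quantum states.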
In the following, we propose a specific distance on which we focus in the remainder of this work (see, however, the appendix for alternatives). More specifically, we associate to any POVM $M_x = \{M_{a|x}\}_a$ a measure-and-prepare channel defined by
$$\Lambda_{M_x}(\rho) = \sum_a \mathrm{Tr}[M_{a|x}\rho]\, |a\rangle\langle a|, \tag{11}$$
where the register states $|a\rangle$ form an orthonormal basis $\{|a\rangle\}_{0 \le a \le o-1}$. Note that the channel $\Lambda_{M_x}$ can equivalently be described by its Choi-Jamiołkowski matrix (see, e.g., [64]). The Choi-Jamiołkowski matrix of a quantum channel is obtained by applying the channel to the first subsystem of the (unnormalized) maximally entangled state $|\Phi^+\rangle = \sum_{i=0}^{d-1} |ii\rangle$. More precisely, the Choi-Jamiołkowski matrix of a measure-and-prepare channel as described in Eq. (11) is given by
$$J(M_x) = \sum_a |a\rangle\langle a| \otimes M_{a|x}^T, \tag{12}$$
where the transpose is taken with respect to the computational basis. We denote the diamond distance between two quantum channels $\Lambda_1, \Lambda_2$ by
$$D_\diamond(\Lambda_1, \Lambda_2) = \frac{1}{2}\|\Lambda_1 - \Lambda_2\|_\diamond = \max_{\rho} \frac{1}{2}\big\|((\Lambda_1 - \Lambda_2) \otimes \mathbb{1})\rho\big\|_1. \tag{13}$$
Due to the connection to the trace distance, the diamond distance determines the optimal single-shot probability $p^{(\Lambda_1,\Lambda_2)}_{\diamond,\mathrm{guess}} = \frac{1}{2}(1 + D_\diamond(\Lambda_1, \Lambda_2))$ to distinguish between $\Lambda_1$ and $\Lambda_2$. We want to make use of this operational relevance in the following by designing a distance between measurement assemblages that has a similar operational interpretation. Based on the diamond distance, we propose the distance $D_\diamond(\mathbf{M}_p, \mathbf{N}_p)$ between the WMAs, defined as
$$D_\diamond(\mathbf{M}_p, \mathbf{N}_p) = \sum_x p(x)\, D_\diamond(\Lambda_{M_x}, \Lambda_{N_x}), \tag{14}$$
and its induced resource monotone
$$R_\diamond(\mathbf{M}_p) = \min_{\mathbf{F} \in \mathcal{F}} D_\diamond(\mathbf{M}_p, \mathbf{F}_p). \tag{15}$$
Note that the diamond distance between measure-and-prepare channels has also been introduced in the context of single POVM discrimination [51,52]. To prove that $R_\diamond(\mathbf{M}_p)$ is a convex resource monotone, we need to show that $D_\diamond(\mathbf{M}_p, \mathbf{N}_p)$ is a distance function according to the conditions in Definition 1 and that it is a jointly convex function.

Proof. The proof relies mostly on the properties of the diamond distance. It is possible to rewrite
$$D_\diamond(\mathbf{M}_p, \mathbf{N}_p) = \frac{1}{2}\sum_x p(x) \max_{\rho} \sum_a \big\|\sigma_{a|x}(\rho) - \tau_{a|x}(\rho)\big\|_1, \tag{16}$$
where we have introduced $\sigma_{a|x}(\rho) = \mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})\rho]$ and $\tau_{a|x}(\rho) = \mathrm{Tr}_1[(N_{a|x} \otimes \mathbb{1})\rho]$. Note that $\mathrm{Tr}_1[\cdot]$ denotes the trace with respect to the first subsystem and that, here and in the following, we omit the Hilbert space on which $\rho$ acts. All conditions in Definition 1 and the joint convexity can now be verified by direct computation. See Appendix A for all details.
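To make the measure-and-prepare picture concrete, the sketch below builds the Choi-Jamiołkowski matrix $\sum_a |a\rangle\langle a| \otimes M_{a|x}^T$ and evaluates a simple lower bound on the weighted diamond distance. The lower bound comes from fixing the input state to the normalized maximally entangled state instead of maximizing over all inputs, for which the conditional states become $M_{a|x}^T/d$. This is an illustrative assumption made here to avoid an SDP solver; it is not the exact quantifier. The function names are hypothetical and NumPy is assumed.

```python
import numpy as np

def choi_measure_and_prepare(povm):
    """J(M_x) = sum_a |a><a| (x) M_{a|x}^T for a list of effects."""
    o, d = len(povm), povm[0].shape[0]
    J = np.zeros((o * d, o * d), dtype=complex)
    for a, effect in enumerate(povm):
        register = np.zeros((o, o))
        register[a, a] = 1.0
        J += np.kron(register, effect.T)
    return J

def trace_norm(X):
    return float(np.sum(np.linalg.svd(X, compute_uv=False)))

def distance_lower_bound(M, N, p):
    """Lower bound on sum_x p(x) D_diamond(Lambda_Mx, Lambda_Nx).

    Uses the maximally entangled input state only, which gives
    (1 / 2d) * sum_a ||M_{a|x} - N_{a|x}||_1 for each setting x.
    """
    d = M[0][0].shape[0]
    total = 0.0
    for weight, povm_m, povm_n in zip(p, M, N):
        total += weight * sum(trace_norm(Ma - Na)
                              for Ma, Na in zip(povm_m, povm_n)) / (2 * d)
    return total

# Illustrative comparison: a projective Z measurement versus an X measurement.
z_povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
x_povm = [0.5 * np.array([[1.0, 1.0], [1.0, 1.0]]),
          0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])]

J_z = choi_measure_and_prepare(z_povm)
bound = distance_lower_bound([z_povm], [x_povm], p=[1.0])
print(bound)
```

Tracing out the register of the Choi matrix returns $\sum_a M_{a|x}^T = \mathbb{1}_d$, which is a quick sanity check on the construction.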
Note that it follows directly from its definition that $R_\diamond(\mathbf{M}_p)$ is upper bounded by $R_\diamond(\mathbf{M}_p) \le 1$, and that it fulfills the continuity condition
$$|R_\diamond(\mathbf{M}_p) - R_\diamond(\mathbf{N}_p)| \le D_\diamond(\mathbf{M}_p, \mathbf{N}_p) \tag{17}$$
due to the triangle inequality for the diamond norm. Moreover, it can be rewritten as
$$R_\diamond(\mathbf{M}_p) = \min_{\mathbf{F} \in \mathcal{F}}\ 2\sum_x p(x)\, p^{(M,F)}_{\diamond,\mathrm{guess}}(x) - 1, \tag{18}$$
with $p^{(M,F)}_{\diamond,\mathrm{guess}}(x) = \frac{1}{2}(1 + D_\diamond(\Lambda_{M_x}, \Lambda_{F_x}))$, which is, up to normalization, the average optimal probability to distinguish the resources $\mathbf{M}$ from the free measurements $\mathbf{F}$ in a single-shot experiment. Hence, Eq. (18) reveals the desired operational significance of $R_\diamond(\mathbf{M}_p)$ in terms of an average single-shot distinguishability. See also Figure 2 for an illustration of the operational meaning of $R_\diamond(\mathbf{M}_p)$.

Figure 2: Illustration of the idea to use the diamond distance as a resource monotone. Quantum measurements $M_x$, $F_x$ are associated with quantum channels $\Lambda_{M_x}$, $\Lambda_{F_x}$. These are distinguished by applying the channels to an optimal quantum state $\rho$ and afterwards performing an ideal dichotomic measurement to distinguish between the outputs of the channels $\Lambda_{M_x}$ and $\Lambda_{F_x}$. The probability $p^{(M,F)}_{\diamond,\mathrm{guess}}(x)$ tells us how distinguishable the resourceful measurement $M_x$ is from the free measurement $F_x$.
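The relation between the weighted distance and the average guessing probability is simple arithmetic, illustrated by the short sketch below. The per-setting distances used here are hypothetical placeholder numbers standing in for the diamond distances to a fixed free assemblage; they are not values from the original work.

```python
def average_guessing_probability(p, distances):
    """sum_x p(x) * p_guess(x) with p_guess(x) = (1 + D_x) / 2."""
    return sum(w * 0.5 * (1.0 + dist) for w, dist in zip(p, distances))

def weighted_distance(p, distances):
    """sum_x p(x) * D_x."""
    return sum(w * dist for w, dist in zip(p, distances))

# Hypothetical per-setting diamond distances to a fixed free assemblage.
p = [0.5, 0.5]
distances = [0.2, 0.4]

avg_guess = average_guessing_probability(p, distances)
from_guessing = 2.0 * avg_guess - 1.0
print(avg_guess, from_guessing, weighted_distance(p, distances))
```

Both routes give the same number because the weights $p(x)$ sum to one, which is exactly the normalization step behind the guessing-probability form of the monotone.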

Hierarchy of measurement resources
One main goal while studying QRTs is to obtain relations between different resources. In particular, we want to understand how one resource limits another quantitatively. This will show one strength of a geometric quantifier, as it can be defined for various resource theories and the discussion often reduces to an analysis of the free sets F . In the following, we will establish a hierarchy of measurement resources based on the newly introduced quantifier R (M p ). We start by introducing the different resources.
The most basic resource of an assemblage is its informativeness [24]. The informativeness of a WMA quantifies how valuable it is to actually perform measurements compared to randomly guessing the outcomes in an experiment. An assemblage $\mathbf{M}$ is called uninformative (UI) if
$$M_{a|x} = q(a|x)\, \mathbb{1}_d \quad \forall\, a, x, \tag{19}$$
where $\{q(a|x)\}$ are some probability distributions of $a$ conditioned on setting $x$. These measurements are UI, as their measurement result does not depend on the quantum state. We denote the set of UI assemblages by $\mathcal{F}_{\mathrm{UI}}$ and introduce the informativeness monotone
$$\mathrm{IF}_\diamond(\mathbf{M}_p) = \min_{\mathbf{F} \in \mathcal{F}_{\mathrm{UI}}} D_\diamond(\mathbf{M}_p, \mathbf{F}_p). \tag{20}$$
Note that measurement informativeness was initially introduced only for a single POVM and studied in terms of the generalized robustness [24]. We have extended the notion here by considering the average informativeness of $\mathbf{M}_p$. A resource that is the foundation for the distinction between classical and quantum systems is the coherence of measurements [25]. An assemblage $\mathbf{M}$ is incoherent (in some predefined orthonormal basis $\{|i\rangle\}$) if
$$M_{a|x} = \sum_i \alpha_{i|(a,x)}\, |i\rangle\langle i| \quad \forall\, a, x, \tag{21}$$
where $\alpha_{i|(a,x)} = \langle i|M_{a|x}|i\rangle$. These measurements cannot distinguish quantum states $\rho$ from their fully dephased versions $\Delta(\rho) = \sum_i |i\rangle\langle i|\rho|i\rangle\langle i|$; hence they cannot detect coherence. We denote the set of incoherent assemblages by $\mathcal{F}_{\mathrm{IC}}$ and introduce the coherence monotone
$$C_\diamond(\mathbf{M}_p) = \min_{\mathbf{F} \in \mathcal{F}_{\mathrm{IC}}} D_\diamond(\mathbf{M}_p, \mathbf{F}_p). \tag{22}$$
Similarly to the informativeness, the coherence of measurements was initially introduced for a single POVM, and we have extended it here by considering the average coherence of $\mathbf{M}_p$. See also [28] for a different approach to coherence of measurement assemblages. The incompatibility of measurements is probably the best-known example of a QRT for measurements and has been studied extensively in recent years [21,35,36,53,65,66]. Contrary to classical physics, different quantum measurements may be incompatible, i.e., they cannot be performed simultaneously and one cannot access their joint measurement statistics, as famously illustrated by the Heisenberg-Robertson uncertainty relation [3].
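The two defining properties of the free sets above can be checked numerically: an uninformative POVM yields state-independent statistics, and an incoherent POVM cannot tell a state from its fully dephased version. The sketch assumes NumPy; the concrete effects and probabilities are arbitrary illustrative choices.

```python
import numpy as np

def dephase(rho):
    """Fully dephasing map: keeps only the diagonal in the fixed basis."""
    return np.diag(np.diag(rho))

def statistics(povm, rho):
    """Born rule p(a) = Tr[M_a rho] for a single POVM."""
    return [float(np.real(np.trace(E @ rho))) for E in povm]

plus = np.full((2, 2), 0.5)          # |+><+|, maximally coherent
zero = np.diag([1.0, 0.0])           # |0><0|

# Uninformative POVM: effects proportional to the identity.
uninformative = [0.3 * np.eye(2), 0.7 * np.eye(2)]
# Incoherent POVM: effects diagonal in the computational basis.
incoherent = [np.diag([0.8, 0.3]), np.diag([0.2, 0.7])]

ui_on_plus = statistics(uninformative, plus)
ui_on_zero = statistics(uninformative, zero)
ic_on_plus = statistics(incoherent, plus)
ic_on_dephased = statistics(incoherent, dephase(plus))
print(ui_on_plus, ic_on_plus)
```

The incoherent POVM still distinguishes $|0\rangle\langle 0|$ from $|+\rangle\langle +|$, so it is informative, which illustrates that the uninformative assemblages form a strict subset of the incoherent ones.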
Initially interpreted as a drawback, this phenomenon lies at the heart of Bell-type experiments, as incompatibility is a necessary prerequisite to witness steering and nonlocality. An assemblage $\mathbf{M}$ is called compatible or jointly measurable (JM) if the statistics of $\mathbf{M}$ can be simulated by a single measurement via some POVM $\{G_\lambda\}$ and classical post-processing via the deterministic probability distributions $\{v(a|x,\lambda)\}$ such that
$$M_{a|x} = \sum_\lambda v(a|x,\lambda)\, G_\lambda \quad \forall\, a, x, \tag{23}$$
and is called incompatible otherwise. Note that using deterministic post-processings $\{v(a|x,\lambda)\}$ (which represent the vertices of the corresponding probability polytope) is not a restriction, as all randomness from non-deterministic distributions can be absorbed into $G_\lambda$. We denote the set of JM assemblages by $\mathcal{F}_{\mathrm{JM}}$ and introduce the incompatibility monotone
$$I_\diamond(\mathbf{M}_p) = \min_{\mathbf{F} \in \mathcal{F}_{\mathrm{JM}}} D_\diamond(\mathbf{M}_p, \mathbf{F}_p). \tag{24}$$
It is important to note that the incompatibility in Eq. (24) is not the average of single-POVM properties, as incompatibility is always a property of sets of measurements. Therefore, the incompatibility $I_\diamond(\mathbf{M}_p)$ is qualitatively different from the coherence or informativeness. Similar to entanglement, incompatibility can be witnessed in a Bell-type experiment, as both are necessary resources for steering and nonlocality. Consider the WMA $\mathbf{M}_p$ and any bipartite quantum state $\rho$ shared by two parties, Alice and Bob. By performing the measurements $\mathbf{M}_p$ on her share of the state, Alice prepares the conditional states
$$\sigma_{a|x} = \mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})\rho] \tag{25}$$
for Bob. Here $p(a|x) = \mathrm{Tr}[\sigma_{a|x}]$ is the probability to obtain $\sigma_{a|x}$. We denote the obtained state assemblage by $\vec{\sigma} = \{\sigma_{a|x}\}$ and its weighted version by $\vec{\sigma}_p = (\vec{\sigma}, p)$.
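Joint measurability can be illustrated by constructing an assemblage from a parent POVM with deterministic post-processing, where each hidden label $\lambda = (a_1, \ldots, a_m)$ fixes one outcome per setting. The sketch below uses the well-known parent POVM for noisy qubit $\sigma_z$ and $\sigma_x$ measurements at visibility $1/\sqrt{2}$; the function name `marginals_from_parent` and the dictionary layout are illustrative assumptions, and NumPy is assumed.

```python
import itertools
import numpy as np

def marginals_from_parent(parent, m, o):
    """M_{a|x} = sum_lambda v(a|x,lambda) G_lambda with v = delta(a, lambda[x]).

    parent maps each tuple lambda = (a_1, ..., a_m) to the effect G_lambda.
    """
    d = next(iter(parent.values())).shape[0]
    M = np.zeros((m, o, d, d), dtype=complex)
    for lam, G in parent.items():
        for x in range(m):
            M[x, lam[x]] += G
    return M

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
eta = 1 / np.sqrt(2)   # visibility at which the parent below is still positive

# Parent POVM G_{(a,b)} = (1/4) [1 + eta ((-1)^a sz + (-1)^b sx)].
parent = {
    (a, b): 0.25 * (np.eye(2) + eta * ((-1) ** a * sz + (-1) ** b * sx))
    for a, b in itertools.product(range(2), repeat=2)
}

M = marginals_from_parent(parent, m=2, o=2)
min_parent_eigenvalue = min(np.linalg.eigvalsh(G).min() for G in parent.values())
print(min_parent_eigenvalue)
```

The resulting marginals are the noisy measurements $\frac{1}{2}(\mathbb{1} \pm \eta\sigma_z)$ and $\frac{1}{2}(\mathbb{1} \pm \eta\sigma_x)$, so this pair is jointly measurable by construction.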
To make sure that Alice performs incompatible measurements on an entangled state, she can demonstrate steering. A state assemblage $\vec{\sigma}$ is said to be steerable if it cannot be obtained from a local hidden-state (LHS) model given by
$$\tau_{a|x} = \sum_\lambda v(a|x,\lambda)\, \sigma_\lambda \quad \forall\, a, x, \tag{26}$$
where the $\sigma_\lambda$ are operators that satisfy $\sigma_\lambda \ge 0 \ \forall \lambda$ and $\mathrm{Tr}[\sum_\lambda \sigma_\lambda] = 1$. Otherwise we say $\vec{\sigma}$ is unsteerable, which we denote by $\vec{\sigma} \in \mathrm{LHS}$. Steering can also be quantified, and we use the distance-based monotone introduced by Ku et al. [49],
$$S(\vec{\sigma}_p) = \min_{\vec{\tau} \in \mathrm{LHS}} \frac{1}{2}\sum_{a,x} p(x)\, \|\sigma_{a|x} - \tau_{a|x}\|_1. \tag{27}$$
Note that originally an additional consistency constraint $\sum_a \tau_{a|x} = \sum_a \sigma_{a|x}$ was introduced [49]. However, we do not require this constraint here, as consistency constraints are sometimes introduced for mathematical convenience, which provides no advantage in our considerations. For more details on consistent quantifiers, see also [14].
Consider now that both parties, Alice and Bob, want to prove that they perform incompatible measurements on an entangled state. Let $\mathbf{M}_{p_A}$ and $\mathbf{N}_{p_B}$ be the WMAs of Alice and Bob, respectively, and let $\rho$ be their shared quantum state. Alice and Bob obtain the probability distribution $\mathbf{q} = \{q(a,b|x,y)\}$ with
$$q(a,b|x,y) = \mathrm{Tr}[(M_{a|x} \otimes N_{b|y})\rho].$$
Further, $p(x,y)$ is the probability to choose setting $x$ for Alice and $y$ for Bob, and we introduce the tuple $\mathbf{q}_p = (\mathbf{q}, p)$. To assure themselves that they share an entangled state and perform incompatible measurements, they can check whether they can demonstrate nonlocality. A probability distribution $\mathbf{q}$ is local if it can be obtained from a local hidden-variable (LHV) model given by
$$t(a,b|x,y) = \sum_\lambda \pi(\lambda)\, v_A(a|x,\lambda)\, v_B(b|y,\lambda),$$
where $\pi(\lambda)$ is the probability distribution of the hidden variable $\lambda$ and $\{v_A(a|x,\lambda)\}$ and $\{v_B(b|y,\lambda)\}$ are deterministic probability distributions of Alice and Bob, respectively. In this case we write $\mathbf{q} \in \mathrm{LHV}$, and we say $\mathbf{q}$ is nonlocal otherwise. To quantify the nonlocality, we use the distance-based resource monotone for nonlocality introduced by Brito et al. [50],
$$N(\mathbf{q}_p) = \min_{\mathbf{t} \in \mathrm{LHV}} \frac{1}{2}\sum_{a,b,x,y} p(x,y)\, |q(a,b|x,y) - t(a,b|x,y)|. \tag{29}$$
Having introduced all these different notions of quantum resources, we can now establish relations among them.

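Theorem 2. Let $\mathbf{M}_{p_A}$, $\mathbf{N}_{p_B}$ be any WMAs and $\rho$ any bipartite quantum state of appropriate dimensions. Let $\vec{\sigma}_{p_A}$ be a state assemblage obtained via $\sigma_{a|x} = \mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})\rho]$ and $\mathbf{q}_p$ a probability distribution obtained via $q(a,b|x,y) = \mathrm{Tr}[(M_{a|x} \otimes N_{b|y})\rho]$.
The following sequence of inequalities holds:
$$\mathrm{IF}_\diamond(\mathbf{M}_{p_A}) \ge C_\diamond(\mathbf{M}_{p_A}) \ge I_\diamond(\mathbf{M}_{p_A}) \ge S(\vec{\sigma}_{p_A}) \ge N(\mathbf{q}_p). \tag{30}$$
Proof. The inequalities $\mathrm{IF}_\diamond(\mathbf{M}_{p_A}) \ge C_\diamond(\mathbf{M}_{p_A}) \ge I_\diamond(\mathbf{M}_{p_A})$ follow from the nested structure of the sets of free assemblages. More formally, $\mathcal{F}_{\mathrm{UI}} \subset \mathcal{F}_{\mathrm{IC}} \subset \mathcal{F}_{\mathrm{JM}}$, which can be seen by realizing that POVM effects that are proportional to the identity are also incoherent (in any basis) and that incoherent POVMs commute pairwise and are therefore jointly measurable [28]. Since we are minimizing the distance with respect to these sets, the inequalities hold. To prove that $I_\diamond(\mathbf{M}_{p_A}) \ge S(\vec{\sigma}_{p_A})$ holds, we use that incompatibility is necessary for steering. This allows us to use $\vec{\tau} = \{\tau_{a|x} = \mathrm{Tr}_1[(F^*_{a|x} \otimes \mathbb{1})\rho]\}$ as an unsteerable assemblage for any state $\rho$, as the closest JM measurements $\mathbf{F}^*$ (with respect to the assemblage $\mathbf{M}$) cannot lead to steerable assemblages. It follows that
$$\begin{aligned}
S(\vec{\sigma}_{p_A}) &\le \frac{1}{2}\sum_{a,x} p(x)\, \|\sigma_{a|x} - \tau_{a|x}\|_1 \\
&= \frac{1}{2}\sum_{a,x} p(x)\, \big\|\mathrm{Tr}_1[((M_{a|x} - F^*_{a|x}) \otimes \mathbb{1})\rho]\big\|_1 \\
&\le \frac{1}{2}\sum_x p(x) \max_{\rho'} \sum_a \big\|\mathrm{Tr}_1[((M_{a|x} - F^*_{a|x}) \otimes \mathbb{1})\rho']\big\|_1 \\
&= I_\diamond(\mathbf{M}_{p_A}),
\end{aligned}$$
where we used the representation of $I_\diamond(\mathbf{M}_{p_A})$ according to Eq. (16) in the last line. We employ a similar approach to show that $S(\vec{\sigma}_{p_A}) \ge N(\mathbf{q}_p)$.

Note that hierarchies related to that in Eq. (30) have also been established, at least partly, for weight- and robustness-based resource quantifiers [14,28]. The connection between incompatibility, steering, and nonlocality has been studied extensively by Cavalcanti et al. [14] for weight- and robustness-based quantifiers, while Designolle et al. [28] discussed the relation between coherence and incompatibility and, for a single POVM, between the informativeness and the coherence in terms of the generalized robustness.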
The hierarchy (30) in Theorem 2 gives insight into how resources like the incompatibility quantitatively limit steering and nonlocal correlations. Conversely, every detection of these quantum correlations gives a lower bound on the measurement resources. In particular, the violation of every appropriately normalized steering or Bell inequality, in the nonlocal game formulation [67,68], lower bounds these measurement resources. We show in Appendix C that $S(\vec{\sigma}_{p_A})$ is the maximal possible steering inequality violation, given by
$$S(\vec{\sigma}_{p_A}) = \max_{\{G_{a|x}\}} \sum_{a,x} p(x)\, \mathrm{Tr}[G_{a|x}\sigma_{a|x}] - \ell,$$
where $\ell = \max_{\vec{\tau} \in \mathrm{LHS}} \sum_{a,x} p(x)\, \mathrm{Tr}[G_{a|x}\tau_{a|x}]$ is the classical bound obeyed by all unsteerable assemblages $\vec{\tau} \in \mathrm{LHS}$ and the $G_{a|x}$ are positive semidefinite matrices such that $\|G_{a|x}\|_\infty \le 1$, where $\|\cdot\|_\infty$ is the spectral norm. Moreover, the nonlocality $N(\mathbf{q}_p)$ can be reformulated as the violation of a Bell inequality, given by
$$N(\mathbf{q}_p) = \max_{\{C_{ab|xy}\}} \sum_{a,b,x,y} p(x,y)\, C_{ab|xy}\, q(a,b|x,y) - \ell,$$
where $\ell = \max_{\mathbf{t} \in \mathrm{LHV}} \sum_{a,b,x,y} p(x,y)\, C_{ab|xy}\, t(a,b|x,y)$ is the local bound obeyed by all local correlations $\mathbf{t} \in \mathrm{LHV}$ and the $C_{ab|xy}$ are Bell coefficients such that $0 \le C_{ab|xy} \le 1$.
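As an illustration of a Bell inequality in the nonlocal game formulation, the sketch below computes the local bound of the CHSH game by enumerating deterministic strategies, which suffices because the objective is linear in the correlations. The CHSH coefficients and uniform setting distribution are an illustrative choice made here, not an example taken from the original work; the quantum value $\cos^2(\pi/8)$ is the well-known Tsirelson value and is inserted as a constant rather than derived.

```python
import itertools
import math

def chsh_coefficient(a, b, x, y):
    """Bell coefficient C_{ab|xy} in {0, 1}: the game is won if a XOR b = x AND y."""
    return 1.0 if (a ^ b) == (x & y) else 0.0

def local_bound(p_xy):
    """Maximize sum p(x,y) C_{ab|xy} t(a,b|x,y) over deterministic local strategies."""
    best = 0.0
    for fa in itertools.product(range(2), repeat=2):       # Alice answers a = fa[x]
        for fb in itertools.product(range(2), repeat=2):   # Bob answers b = fb[y]
            value = sum(p_xy[x][y] * chsh_coefficient(fa[x], fb[y], x, y)
                        for x in range(2) for y in range(2))
            best = max(best, value)
    return best

uniform = [[0.25, 0.25], [0.25, 0.25]]
ell = local_bound(uniform)

# Known maximal quantum winning probability of the CHSH game (Tsirelson).
quantum_value = math.cos(math.pi / 8) ** 2
violation = quantum_value - ell
print(ell, violation)
```

Since any fixed choice of coefficients can only undershoot the maximum over all Bell coefficients, a violation obtained this way is a lower bound on the distance-based nonlocality of the corresponding correlations, and through the hierarchy also on the measurement resources.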
It is worth highlighting that the hierarchy (30) is reminiscent of the resource hierarchy for quantum states formulated by Streltsov et al. [17]. For quantum states, it holds that
$$P(\rho) \ge C(\rho) \ge D(\rho) \ge E(\rho),$$
where $P(\rho)$, $C(\rho)$, $D(\rho)$, and $E(\rho)$ denote the quantum state's purity, coherence with respect to product bases, discord, and entanglement, respectively, using the same geometric quantifier. Comparing both hierarchies, it becomes clear that the informativeness of measurements is in some sense the analogue of a state's purity, as both quantify the deviation from their respective uninformative element. We also observe that coherence is an important resource for states as well as measurements, which allows for more complex phenomena such as entanglement and incompatibility. Incompatibility and entanglement play a similar role in their respective hierarchies, as both are the smallest known resource that is necessary for steering and nonlocality. Interestingly, incompatibility and entanglement also share similarities in their respective resource-breaking maps [69]. Moreover, we show in Appendix D that the entanglement $E_1(\rho)$ as defined in Eq. (3) also upper bounds the steerability, $S(\vec{\sigma}_{p_A}) \le E_1(\rho)$. This leads to the conclusion that the nonlocality $N(\mathbf{q}_p)$ and the steerability $S(\vec{\sigma}_{p_A})$ are upper bounded by the smallest of the resources used to obtain $\mathbf{q}_p$ and $\vec{\sigma}_{p_A}$, respectively.
We visualize our results in Figure 3.

SDP formulations
To study the hierarchy from Theorem 2 and the resources in more detail, an efficient method to numerically compute the respective resource quantifiers is needed. This can be done by formulating the quantifiers in terms of an SDP, which also allows us to study the quantifiers analytically by exploiting duality theory. The computation of the general quantifier $R_\diamond(\mathbf{M}_p)$ from Eq. (15) can be stated as the following optimization problem:

Primal problem (general):
$$\begin{aligned}
\text{given:} \quad & \mathbf{M}_p \\
\underset{\{Z_x\},\, \mathbf{F}}{\text{minimize}} \quad & \sum_x p(x)\, \big\|\mathrm{Tr}_1[Z_x]\big\|_\infty \\
\text{subject to:} \quad & Z_x \ge J(M_x) - J(F_x) \quad \forall x, \\
& Z_x \ge 0 \quad \forall x, \qquad \mathbf{F} \in \mathcal{F},
\end{aligned} \tag{39}$$
where the $Z_x$ are positive semidefinite matrices, $J(M_x)$ is the Choi-Jamiołkowski matrix (see Eq. (12)) associated to setting $x$ of the assemblage $\mathbf{M}$, and $\mathbf{F}$ are the elements of the set of free assemblages $\mathcal{F}$. The formulation of the optimization in Eq. (39) mainly relies on the SDP formulation of the diamond distance due to Watrous [72]. This compact representation of $R_\diamond(\mathbf{M}_p)$ can be brought into an explicit SDP formulation whenever the set $\mathcal{F}$ admits an SDP formulation, as we show in Appendix E for the resources considered in this work. Every SDP comes with a dual formulation which, under some mild conditions (Slater's condition, see, e.g., [73]), returns the same optimal value as the primal problem. This condition is always satisfied for the SDP (39). Hence, $R_\diamond(\mathbf{M}_p)$ can also be written as the optimal value of the optimization problem:

Dual problem (general):
$$\begin{aligned}
\text{given:} \quad & \mathbf{M}_p \\
\underset{\{C_{a|x}\},\, \{\rho_x\}}{\text{maximize}} \quad & \sum_{a,x} p(x)\, \mathrm{Tr}[C_{a|x} M_{a|x}] - \max_{\mathbf{F} \in \mathcal{F}} \sum_{a,x} p(x)\, \mathrm{Tr}[C_{a|x} F_{a|x}] \\
\text{subject to:} \quad & 0 \le C_{a|x} \le \rho_x \quad \forall a, x, \\
& \rho_x \ge 0, \quad \mathrm{Tr}[\rho_x] = 1 \quad \forall x,
\end{aligned} \tag{40}$$
where the $C_{a|x}$, $\rho_x$ are positive semidefinite matrices and $\mathbf{F}$ are the elements of the set of free assemblages $\mathcal{F}$. Since $R_\diamond(\mathbf{M}_p)$ can be formulated as an SDP, it is efficiently computable (in the Hilbert space dimension $d$) and one can resort to standard toolboxes for its computation [74][75][76][77]. We want to remark that it is also possible to use a variation of the SDP (40) to obtain the optimal setting distribution $p$ instead of fixing one in advance. This can be seen by introducing $\tilde{C}_{a|x} = p(x)\, C_{a|x}$ and adjusting the constraints accordingly.
See Appendix F for an example where optimizing over $p$ leads to an advantage over the uniform distribution for the incompatibility $I_\diamond(\mathbf{M}_p)$, even when only two measurement settings are considered.
Even though SDPs are mainly used for numerical optimization, the underlying structure of an SDP also offers a method to obtain analytical upper and lower bounds or even exact analytical expressions for $R_\diamond(\mathbf{M}_p)$, depending on the complexity of the considered resource. More precisely, every feasible (but possibly sub-optimal) solution of the primal problem corresponds to an upper bound on $R_\diamond(\mathbf{M}_p)$, while every feasible solution of the dual problem results in a lower bound. If we find feasible solutions of the primal and dual that result in the same value, we can conclude that this value is exactly $R_\diamond(\mathbf{M}_p)$. We make use of this approach to derive bounds on the incompatibility $I_\diamond(\mathbf{M}_p)$ for any assemblage $\mathbf{M}$ weighted with a uniform distribution $p$ in Theorem 3 and to identify cases in which the hierarchy in Theorem 2 is tight in Section 5.
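The feasible-solution argument can be tried out without an SDP solver. The sketch below assumes that the primal objective has Watrous's form $\sum_x p(x)\,\|\mathrm{Tr}_1[Z_x]\|_\infty$ with the constraint $Z_x \ge \sum_a |a\rangle\langle a| \otimes (M_{a|x} - F_{a|x})^T$; this form is an assumption on our side. It then uses the block-wise positive part as a feasible $Z_x$, so the returned number upper bounds the distance to the chosen free assemblage and therefore the monotone itself. The function names are hypothetical, and the example (projective qubit Z and X measurements against their noisy versions at visibility $1/\sqrt{2}$, a standard jointly measurable pair) is an illustrative choice.

```python
import numpy as np

def positive_part(X):
    """Positive part of a Hermitian matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(X)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.conj().T

def primal_upper_bound(M, F, p):
    """Value of the feasible point Z_x = sum_a |a><a| (x) ((M_a - F_a)^T)_+.

    Tracing out the register gives sum_a ((M_a - F_a)^T)_+, whose largest
    eigenvalue is the spectral norm entering the assumed primal objective.
    """
    total = 0.0
    for weight, povm_m, povm_f in zip(p, M, F):
        reduced = sum(positive_part((Ma - Fa).T)
                      for Ma, Fa in zip(povm_m, povm_f))
        total += weight * float(np.linalg.eigvalsh(reduced).max())
    return total

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])

def noisy_pauli_povm(pauli, eta):
    return [0.5 * (np.eye(2) + eta * pauli), 0.5 * (np.eye(2) - eta * pauli)]

eta = 1 / np.sqrt(2)
M = [noisy_pauli_povm(sz, 1.0), noisy_pauli_povm(sx, 1.0)]   # projective
F = [noisy_pauli_povm(sz, eta), noisy_pauli_povm(sx, eta)]   # jointly measurable

upper = primal_upper_bound(M, F, p=[0.5, 0.5])
print(upper)
```

For this example the feasible point evaluates to $(1 - 1/\sqrt{2})/2$; whether this particular free assemblage is the closest one is not claimed here.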
Proof. The proof relies on finding feasible solutions of the primal problem (upper bound) and the dual problem (lower bound) in Eq. (39) and Eq. (40) for the specific set of JM measurements $\mathcal{F}_{\mathrm{JM}}$. For the primal, we choose a feasible pair $Z_x$, $\mathbf{F}$, where $\eta \in [0,1]$ is the largest number such that $\mathbf{F}$ obtained from
$$F_{a|x} = \eta\, M_{a|x} + (1-\eta)\, \mathrm{Tr}[M_{a|x}]\, \frac{\mathbb{1}_d}{d} \tag{43}$$
is JM. The coefficient $\eta$ is known as the depolarizing robustness of the assemblage $\mathbf{M}$. Now, by design, $\mathbf{F}$ is JM and $Z_x \ge 0$. The remaining constraint, $Z_x \ge \sum_a |a\rangle\langle a| \otimes (M_{a|x} - F_{a|x})^T$, can be verified by direct computation. It follows that $I_\diamond(\mathbf{M}_p)$ is upper bounded in terms of $\eta$, where we used that $p$ is uniformly distributed. Finally, the upper bound follows from [36], which provides a lower bound on the depolarizing robustness that therefore always leads to jointly measurable measurements for general measurement assemblages $\mathbf{M}$ with $m$ measurements of dimension $d$.
To obtain the lower bound from the dual problem, we rewrite the objective function by introducing a matrix $L$. Note that such an $L$ always exists, which can be verified by multiplying both sides of the inequality with the POVM effect $G_\lambda$ before summing over all $\lambda$ and taking the trace. We choose a feasible point of the dual with $L = l\,\mathbb{1}$, where $l$ is a free parameter. Clearly, in this way all constraints are satisfied for some appropriately chosen parameter $l$, which still needs to be determined. We obtain a lower bound on the incompatibility that depends on $l$. This means that, to find a valid $l$, we need an $l$ such that the condition in Eq. (44) holds, where $T := \| \sum_{a,x} v^*(a|x, \lambda) M_{a|x} \|_\infty$ and $\{v^*(a|x, \lambda)\}_{a,x}$ are the deterministic probability distributions that maximize the right-hand side of Eq. (44), or equivalently the spectral norm in the definition of $T$.
Using the results in [78], an explicit valid value of $l$ follows.

While the bounds in Theorem 3 look complicated, we highlight that they become much simpler in the case of rank-1 projective measurements and especially for measurements based on MUB. Two orthonormal bases $\{|v_a\rangle\}_{0 \leq a \leq d-1}$ and $\{|w_b\rangle\}_{0 \leq b \leq d-1}$ are called mutually unbiased if
$$|\langle v_a | w_b \rangle|^2 = \frac{1}{d} \quad \forall\, a, b. \tag{45}$$
The set of projectors onto the vectors of a basis forms a measurement $M = \{M_a = |v_a\rangle\langle v_a|\}$. An MUB measurement assemblage is a set of measurements where the condition (45) holds for any two projections from different bases. MUB measurement assemblages find many applications in quantum information [59] and are natural candidates for highly incompatible measurements, as studied in [36,66,79]. It is known that in every dimension $d \geq 2$ there exist at least $m = p^r + 1$ MUB, where $p^r$ is the smallest prime-power factor of $d$ [80]. While it is in general an open problem how many MUB exist in a given dimension $d$, explicit constructions for $m = d + 1$ MUB are known when $d$ is a prime power.
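The unbiasedness condition $|\langle v_a|w_b\rangle|^2 = 1/d$ can be checked numerically for a concrete pair of bases. The following is a minimal sketch using only the Python standard library; the choice of the computational and the Fourier basis as the example pair, as well as all function names, are illustrative assumptions and not part of the construction used in the text.

```python
import cmath

def computational_basis(d):
    """Return the d computational basis vectors as lists of complex numbers."""
    return [[complex(k == a) for k in range(d)] for a in range(d)]

def fourier_basis(d):
    """Return the d vectors |f_a> = d^(-1/2) * sum_k omega^(a k) |k>."""
    omega = cmath.exp(2j * cmath.pi / d)
    return [[omega ** (a * k) / d ** 0.5 for k in range(d)] for a in range(d)]

def overlap_squared(v, w):
    """Return |<v|w>|^2 for two vectors given as lists."""
    return abs(sum(x.conjugate() * y for x, y in zip(v, w))) ** 2

def are_mutually_unbiased(basis_v, basis_w, tol=1e-9):
    """Check |<v_a|w_b>|^2 = 1/d for all pairs of vectors from the two bases."""
    d = len(basis_v)
    return all(abs(overlap_squared(v, w) - 1 / d) < tol
               for v in basis_v for w in basis_w)
```

For instance, `are_mutually_unbiased(computational_basis(5), fourier_basis(5))` evaluates to `True`, while comparing a basis with itself fails the test.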
Regarding the simplifications for rank-1 projective measurements, we observe from the proof of Theorem 3 that the bounds can be expressed in terms of $T := \| \sum_{a,x} v^*(a|x, \lambda) M_{a|x} \|_\infty$ and the depolarizing robustness $\eta$ defined via Eq. (43). Using the overlap relation in Eq. (45) and the same lower bound on $\eta$ as in Theorem 3, explicit upper and lower bounds follow for a uniformly weighted MUB measurement assemblage.

As an explicit example, we consider the Heisenberg-Weyl operators $\hat{X} = \sum_k |k+1\rangle\langle k|$ (with addition modulo $d$) and $\hat{Z} = \sum_k \omega^k |k\rangle\langle k|$, where $\{|k\rangle\}_{0 \leq k \leq d-1}$ is the computational basis and $\omega = \exp\left(\frac{2\pi i}{d}\right)$ is a root of unity. In prime dimensions $d$, the eigenbases of the $d + 1$ operators $\hat{X}, \hat{Z}, \hat{X}\hat{Z}, \hat{X}\hat{Z}^2, \cdots, \hat{X}\hat{Z}^{d-1}$ are mutually unbiased [81]. We use these eigenbases to form sets of projective POVMs. Note that it matters which subset of eigenbases we choose … $= 0.3685$. This shows that different MUB are operationally inequivalent, which has also been demonstrated for the depolarizing robustness [66]. For the values in Table 2, we used the assignment of MUB according to the WMA $\mathcal{M}^{(1)}_p$, i.e., we take the first $m$ eigenbases. As one can see in Table 2, the upper and lower bounds combined give a good idea of how incompatible this implementation of MUB is in practical scenarios. The lower bound can be tightened significantly by using the bound from Corollary 2 directly. Note that this requires an optimization over all $N_{\det} = o^m$ deterministic assignments $\{v(a|x, \lambda)\}$, where $o$ is the number of measurement outcomes for each of the $m$ settings. Surprisingly, the tightened lower bound coincides with the numerical values for the incompatibility $I_\diamond(\mathcal{M}_p)$ for all $m, d$ in Table 2 up to the fourth digit. While we were not able to show that the lower bound from Corollary 2 is tight for MUB measurement assemblages in general, we are able to identify important cases where this is indeed the case.
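The optimization over deterministic assignments mentioned above runs over $o^m$ candidates, which can be enumerated explicitly for small $o$ and $m$. The sketch below illustrates one possible encoding, in which each hidden variable $\lambda$ is a tuple fixing one outcome per setting; the function names and this encoding are assumptions made for illustration only.

```python
from itertools import product

def deterministic_assignments(num_outcomes, num_settings):
    """List all tuples lam = (a_0, ..., a_{m-1}) that fix one outcome per setting."""
    return list(product(range(num_outcomes), repeat=num_settings))

def v(a, x, lam):
    """Deterministic distribution v(a|x, lam): 1 if lam assigns outcome a to setting x."""
    return 1 if lam[x] == a else 0

# Example: o = 3 outcomes and m = 2 settings give 3**2 assignments.
assignments = deterministic_assignments(3, 2)
```

Since the number of assignments grows exponentially in $m$, such a brute-force enumeration is only practical for small assemblages.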
More specifically, it was shown by Designolle et al. [66] that $\eta = \frac{dT - m}{dm - m}$ is the depolarizing robustness for the standard construction of MUB measurement assemblages in prime-power dimensions given in [82] for $m = 2$, $m = d$, and $m = d + 1$ measurements. It is important to highlight that the construction used above, based on the Heisenberg-Weyl operators, is an equivalent reformulation of this construction for prime dimensions [81]. From Eq. (47), the incompatibility in these cases follows directly.

To conclude this section, we want to emphasize that analogous discussions to obtain bounds on the resource quantifier $R_\diamond(\mathcal{M}_p)$ can be made for any QRT with a free set $\mathcal{F}$ that can be described by SDP constraints. For instance, we show in Appendix G that the informativeness $IF_\diamond(\mathcal{M}_p)$ of rank-1 projective measurements is given by $IF_\diamond(\mathcal{M}_p) = 1 - \frac{1}{d}$ for any probability distribution $p$.
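As a small numerical illustration of the relation $\eta = (dT - m)/(dm - m)$, consider $m = 2$ mutually unbiased bases. The sketch below assumes the standard fact that the largest eigenvalue of a sum of two rank-1 projectors $|v\rangle\langle v| + |w\rangle\langle w|$ is $1 + |\langle v|w\rangle|$, so that $T = 1 + 1/\sqrt{d}$ for two MUB; this value of $T$ and the function names are assumptions introduced here, not quantities quoted from the text.

```python
import math

def depolarizing_robustness(d, m, T):
    """Evaluate eta = (d*T - m) / (d*m - m) for dimension d and m measurements."""
    return (d * T - m) / (d * m - m)

def spectral_norm_two_mub_projectors(d):
    """T for m = 2 MUB: largest eigenvalue of |v><v| + |w><w| with |<v|w>|^2 = 1/d."""
    return 1 + 1 / math.sqrt(d)

# Qubit example: two MUB (e.g. the Z and X bases) in dimension d = 2.
eta_qubit = depolarizing_robustness(2, 2, spectral_norm_two_mub_projectors(2))
```

For $d = 2$ this reproduces the familiar noise threshold $1/\sqrt{2}$ for a pair of mutually unbiased qubit measurements.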
Note that since the set $\mathcal{F}_{\mathrm{UI}}$ of UI assemblages (see Eq. (19)) has a much simpler structure than the set of JM measurements, it is also easier to obtain exact expressions.

Table 2: … (39) and (40), marked in blue, and the third number is the lower bound on the incompatibility. The bounds are obtained from Eq. (48). Note that the lower bound is tight for $m = 2$ measurements. Furthermore, it is shown in the text that the incompatibilities for $m = 2$, $m = d$, and $m = d + 1$ measurements can be obtained analytically.

Tightness of the Hierarchy
It is particularly interesting to study the optimal conversion of one resource to another, i.e., to study for which measurements (and states) the bounds in Eq. (30) are tight. Obviously, for UI measurements it holds that $IF_\diamond(\mathcal{M}_p) = 0$ and all bounds are trivially tight. We study nontrivial cases of resource equivalences where $IF_\diamond(\mathcal{M}_p) = C_\diamond(\mathcal{M}_p)$ and $I_\diamond(\mathcal{M}_p) = S(\vec{\sigma}_p)$ hold. We start with the latter. Incompatibility and steerability are known to be deeply connected, and equivalences have been reported for robustness and weight-based quantifiers [14,31]. We consider again the situation of uniformly distributed measurements, i.e., $p(x) = 1/m$. Let $\rho = |\Phi^+\rangle\langle\Phi^+|$ be the maximally entangled state with $|\Phi^+\rangle = \frac{1}{\sqrt{d}} \sum_i |ii\rangle$. It is readily verified that
$$\sigma_{a|x} = \mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})\, |\Phi^+\rangle\langle\Phi^+|] = \frac{1}{d} M_{a|x}^T, \tag{50}$$
where the transposition is with respect to the computational basis. Using the state assemblage $\vec{\sigma} = \{\sigma_{a|x}\}$ obtained via Eq. (50) is the standard approach to map incompatibility problems to steering problems and also proves to be useful here. In section 4, we showed that for the construction of MUB in [82] and $m = 2$, $m = d$, and $m = d + 1$ the incompatibility can be obtained analytically. Using the state assemblage $\vec{\sigma}$ obtained from Eq. (50), it is possible to show that the steerability attains the same value. To show this, we employ the steering inequality formulation of $S(\vec{\sigma}_p)$ as discussed in Eq. (32), which we repeat here for convenience: … [82], we conjecture that the equivalence between incompatibility and steerability holds for general constructions of MUB and $2 \leq m \leq d + 1$.
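The identity $\mathrm{Tr}_1[(M \otimes \mathbb{1})|\Phi^+\rangle\langle\Phi^+|] = M^T/d$ used to map measurements to steering assemblages can be verified entry by entry. The sketch below is a plain-Python illustration with matrices stored as nested lists; the helper names and the test matrix are assumptions chosen for this example (the identity is linear in $M$, so $M$ need not be a POVM effect for the check).

```python
def steering_element(M):
    """Return Tr_1[(M (x) 1) |Phi+><Phi+|] as a d x d nested list."""
    d = len(M)

    def phi_plus_entry(k, j, i, l):
        # <k j|Phi+><Phi+|i l> = delta_{kj} * delta_{il} / d
        return 1 / d if (k == j and i == l) else 0.0

    # sigma[j][l] = sum_{i,k} M[i][k] * <k j|Phi+><Phi+|i l>  (trace over subsystem 1)
    return [[sum(M[i][k] * phi_plus_entry(k, j, i, l)
                 for i in range(d) for k in range(d))
             for l in range(d)]
            for j in range(d)]

def transpose_over_d(M):
    """Return M^T / d as a nested list."""
    d = len(M)
    return [[M[l][j] / d for l in range(d)] for j in range(d)]

# Illustrative 2 x 2 matrix with distinct off-diagonal entries.
M_example = [[1, 2j], [3, 4]]
```

Comparing `steering_element(M_example)` with `transpose_over_d(M_example)` confirms that the off-diagonal entries are swapped, as expected from the transposition.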
We searched numerically for other cases with an equality between incompatibility and steerability. However, apart from the case of generic qubit projective measurements we were not able to identify any other scenarios. Note that this finding deviates from the observations for consistent weight and robustness quantifiers studied by Cavalcanti et al. [14], where an equivalence between incompatibility and steerability was found for all assemblages. This difference is not artificial, as it remains even if we include the consistency constraint for the steerability below Eq. (27).
The second equivalence of resources we want to discuss is that between the informativeness and the coherence of assemblages. More precisely, we discuss when $IF_\diamond(\mathcal{M}_p) = C_\diamond(\mathcal{M}_p)$ holds. Interestingly, this equivalence is achieved by WMAs $\mathcal{M}_p$ that are mutually unbiased to the set of projective measurements onto the incoherent basis $\{|i\rangle\langle i|\}$. To see this, we note first that $C_\diamond(\mathcal{M}_p) \leq IF_\diamond(\mathcal{M}_p)$ by the hierarchy in Eq. (30). To show that this bound can be achieved, we use the dual formulation of $C_\diamond(\mathcal{M}_p)$. More specifically, $C_\diamond(\mathcal{M}_p)$ is given by the dual SDP stated in Appendix G, where the $\ell_{x,i}$ are scalars such that …

The fact that measurements that are mutually unbiased to the incoherent basis maximize the coherence is very similar to the situation for quantum states [17]. There, for a fixed spectrum, the coherence is maximized by states that have an eigendecomposition in a basis that is mutually unbiased with respect to the incoherent basis. Note that the measurements within $\mathcal{M}$ do not need to form an MUB measurement assemblage themselves, as long as they are mutually unbiased to the incoherent basis. Indeed, we show in Appendix G that the CGLMP measurements defined via Eq. (37) and Eq. (38) also maximize the coherence in the sense that $IF_\diamond(\mathcal{M}_p) = C_\diamond(\mathcal{M}_p) = 1 - \frac{1}{d}$. Note that it is known that the maximal coherence of a single POVM in terms of the generalized robustness can be achieved by measurements in the Fourier basis of the incoherent basis [47].
Let us briefly comment on the other two inequalities of the hierarchy in Eq. (30), namely $C_\diamond(\mathcal{M}_p) \geq I_\diamond(\mathcal{M}_p)$ and $S(\vec{\sigma}_{p_A}) \geq N(q_p)$. While the relation between steering and nonlocality is notoriously hard to study, even for two-qubit states, the connection between coherence and incompatibility has only recently gained some attention [28,83]. Our numerical search suggests that both bounds are strict inequalities in non-trivial scenarios. However, future research is needed to come to a conclusion.

Conclusion and Outlook
Quantifying quantum advantages plays an important role in modern quantum information theory, particularly in the framework of QRTs. The quantification of measurement resources has developed historically in a different direction than the quantification of state resources, as it was unclear how a distance-based approach can be applied to measurements and especially sets of measurements in a meaningful way. Instead, weight-based and especially robustness-based quantifiers were predominantly used for quantifying measurement resources so far.
In the present work, we have solved this problem by introducing the general notion of distance-based resource quantification for sets of measurements. We have studied which prerequisites are necessary for a function to be a proper distance between measurement assemblages and showed that every such distance induces a resource monotone for any convex QRT. We have proposed one particular quantifier, based on the diamond norm, with a clear operational meaning in terms of the optimal single-shot distinguishability of different measurement assemblages.
On the basis of this particular quantifier, we have established a hierarchy of measurement resources in Theorem 2 and showed that recently introduced steering [49] and nonlocality quantifiers [50] fit naturally into this hierarchy. Furthermore, we have shown that our quantifier can be studied numerically and analytically in terms of SDPs. We have used this insight to establish analytical upper and lower bounds on the incompatibility of any measurement assemblage in Theorem 3. Notably, by focusing on rank-1 projective measurements, we have shown that the bounds on the incompatibility in Corollary 2 are tight for particular MUB measurement assemblages, which play a special role in the established measurement hierarchy. More precisely, we showed in section 5 that the incompatibility of MUB measurement assemblages attains the same value as the steerability of the state assemblages obtained from performing these measurements on one part of a maximally entangled state. Furthermore, we showed that measurements that are mutually unbiased to the incoherent basis maximize the coherence among all rank-1 projective measurements.
It would be interesting to see which insights can be obtained when distance-based quantifiers like the one presented here are studied for other resource theories like projective-simulability [22] or the QRT of imaginarity [20] applied to measurements. Furthermore, distance-based quantifiers should also be compared to possible entropic resource quantifiers of measurement assemblages. So far, entropic quantifiers have only been considered very recently [84] for a single POVM. With the definition of a distance between measurement assemblages, it is also possible to study the continuity of functions of measurement resources, which could be of independent interest for robust self-testing [85] or measurement tomography [86].

A Proof of Theorem 1
Here, we show that the function $D_\diamond(\mathcal{M}_p, \mathcal{N}_p)$ defined in Eq. (14) is a jointly convex distance function between the two WMAs $\mathcal{M}_p$ and $\mathcal{N}_p$. In the following, we use that a measure-and-prepare channel (see Eq. (11)) corresponding to setting $x$ of the assemblage $\mathcal{M}$ applied to the first subsystem of a bipartite state $\rho$ is given by
$$(\Lambda_{M_x} \otimes \mathrm{id})(\rho) = \sum_a |a\rangle\langle a| \otimes \mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})\rho].$$

Proof. We start by writing $D_\diamond(\mathcal{M}_p, \mathcal{N}_p)$ in a more convenient form. More precisely, we use in the following that the triangle inequality for the trace norm $\|\cdot\|_1$ results in an equality, since the terms with different $a$ within the sum over the outcomes $a$ are supported on different subspaces. Furthermore, we use the multiplicativity of the trace norm under tensor products and the fact that $\| |a\rangle\langle a| \|_1 = 1 \ \forall\, a$. It follows that … For the monotonicity under classical simulations, we obtain a chain of (in)equalities, where we used the following properties. In the first line, we used the definition of $D_\diamond(\xi(\mathcal{M}_p)_q, \xi(\mathcal{N}_p)_q)$ by introducing the assemblages $\mathcal{M}_q$ and $\mathcal{N}_q$ with measurement outcomes $b$ for the settings $y$, associated with the probability distribution $q$.
In the second line, we used that $M_{b|y} = \sum_x p(x|y) \sum_a q(b|y, x, a) M_{a|x}$ and the analogous expression for $N_{b|y}$. In the third line, we used the triangle inequality. In the fourth line, we performed the sum over $b$. In the fifth line, we interchanged the maximization with the sum over $x$, which leads to more degrees of freedom since we can now choose a different $\rho$ for each $x$. Finally, in the sixth line, we used that $\sum_y q(y) p(x|y) = p(x)$, which leads exactly to the definition of $D_\diamond(\mathcal{M}_p, \mathcal{N}_p)$, from which the monotonicity under classical simulations $\xi(\mathcal{M}_p)_q$ follows.
The joint-convexity of D (M p , N p ) can be seen by first considering the joint-convexity of the diamond distance D (Λ Mx , Λ Nx ) between measure and prepare channels Λ Mx and Λ Nx . The joint-convexity of a norm induced distance follows from the triangle inequality and the absolute homogeneity of the norm, which implies its convexity. It now follows that which concludes the proof.
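The classical simulations used in the proof above act on an assemblage as $M_{b|y} = \sum_x p(x|y) \sum_a q(b|y, x, a) M_{a|x}$. The following plain-Python sketch applies such a simulation to a small qubit example; the data layout (nested lists indexed as `assemblage[x][a]`, `p_x_given_y[y][x]`, `q_b_given_yxa[y][x][a][b]`) and the example measurements are assumptions made for illustration.

```python
def simulate(assemblage, p_x_given_y, q_b_given_yxa):
    """Apply M_{b|y} = sum_x p(x|y) sum_a q(b|y,x,a) M_{a|x} to an assemblage."""
    d = len(assemblage[0][0])
    simulated = []
    for y, p_row in enumerate(p_x_given_y):
        num_b = len(q_b_given_yxa[y][0][0])
        povm = [[[0.0] * d for _ in range(d)] for _ in range(num_b)]
        for x, weight in enumerate(p_row):           # pre-processing p(x|y)
            for a, effect in enumerate(assemblage[x]):
                for b in range(num_b):               # post-processing q(b|y,x,a)
                    c = weight * q_b_given_yxa[y][x][a][b]
                    for i in range(d):
                        for j in range(d):
                            povm[b][i][j] += c * effect[i][j]
        simulated.append(povm)
    return simulated

# Qubit example: mix the Z- and X-basis measurements with equal weight.
Z = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
X = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]
relabel = [[1, 0], [0, 1]]                           # q(b|y,x,a) = delta_{ab}
mixed = simulate([Z, X], [[0.5, 0.5]], [[relabel, relabel]])
```

The simulated effects again sum to the identity for every new setting $y$, as required for a valid POVM.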

B Steerability as upper bound to nonlocality
Here, we show that the steerability of a state assemblage σ p A upper bounds the nonlocality of any probability distribution q p obtained from it. This completes the proof of Theorem 2.

Lemma 2. Let σ p
Proof. Let $\vec{\tau}^*$ be the closest LHS assemblage to $\vec{\sigma}$ with respect to the quantifier $S(\vec{\sigma}_{p_A})$. We use the fact that unsteerable assemblages always lead to local probability distributions. It follows that … where we used in the first line the definition in Eq. (29) of the nonlocality $N(q_p)$ and the fact that any measurement on the closest LHS assemblage $\vec{\tau}^*$ (with respect to $\vec{\sigma}$) results in a local probability distribution. In the second line, we used a basic property of the absolute value and the fact that we can always decompose the difference of two Hermitian matrices as $\sigma_{a|x} - \tau^*_{a|x} = T_{a|x} - S_{a|x}$, where $T_{a|x}$ and $S_{a|x}$ are positive operators with orthogonal support. It follows that $|\sigma_{a|x} - \tau^*_{a|x}| = T_{a|x} + S_{a|x}$, where $|X| = \sqrt{X^\dagger X}$. Finally, we used in the third line that $\sum_b N_{b|y} = \mathbb{1}_d \ \forall\, y$, that $\sum_y p(x, y) = p_A(x)$, and the definition of $S(\vec{\sigma}_{p_A})$. Therefore, it follows that the steerability is an upper bound to the nonlocality.

C Dual formulation of steerability and nonlocality
Here, we show that the steerability $S(\vec{\sigma}_{p_A})$ and the nonlocality $N(q_p)$ can be understood as an optimal steering inequality violation and an optimal Bell inequality violation, respectively.

Theorem 4. Let $S(\vec{\sigma}_{p_A})$ be the steerability of the state assemblage $\vec{\sigma}_{p_A}$. The steerability $S(\vec{\sigma}_{p_A})$ can be reformulated as the violation of an optimized steering inequality given by Eq. (60).

Proof. The proof relies on the dual formulations of $S(\vec{\sigma}_{p_A})$, which can be written in terms of an SDP, and of $N(q_p)$, which can be written as a linear program. We start with the nonlocality $N(q_p)$ by stating the optimization for an optimal Bell inequality violation given the distribution $q$ and by showing that it is dual to $N(q_p)$. Note that all of the following optimization problems require the knowledge of the deterministic probability distributions of the corresponding problem, which for the nonlocality $N(q_p)$ are denoted by $\{v_A(a|x, \lambda) v_B(b|y, \lambda)\}$. Since these are fixed for a given problem and are trivially accessible, we will not treat them as input variables.
Dual problem (nonlocality): (62)
given: $q_p$
maximize over $C_{ab|xy}, \ell$: $\sum_{a,b,x,y} p(x, y)\, C_{ab|xy}\, q(a, b|x, y) - \ell$
subject to: $\ell \geq \sum_{a,b,x,y} p(x, y)\, C_{ab|xy}\, v_A(a|x, \lambda)\, v_B(b|y, \lambda) \ \forall\, \lambda$, …

where $C_{ab|xy}$ are the Bell coefficients of the Bell inequality and $\ell$ is the local bound. Note that $\ell = \max_{t \in \mathrm{LHV}} \sum_{a,b,x,y} p(x, y)\, C_{ab|xy}\, t(a, b|x, y)$ follows directly from the first constraint. Remember that $t$ admits an LHV decomposition according to Eq. (28). The equality can be seen by multiplying the constraints $\ell \geq \sum_{a,b,x,y} p(x, y)\, C_{ab|xy}\, v_A(a|x, \lambda)\, v_B(b|y, \lambda) \ \forall\, \lambda$ with the probabilities $\pi(\lambda)$ before summing all the constraints together. This leads to the bound $\ell \geq \max_{t \in \mathrm{LHV}} \sum_{a,b,x,y} p(x, y)\, C_{ab|xy}\, t(a, b|x, y)$, where $t(a, b|x, y) = \sum_\lambda \pi(\lambda)\, v_A(a|x, \lambda)\, v_B(b|y, \lambda)$. The equality follows from the fact that we maximize the objective function. Now, we show that the optimal value of the optimization in Eq. (62) is equal to $N(q_p)$ by deriving the primal program. Note that this generally requires dealing with inequality constraints, which can be done by generalizing the method of Lagrange multipliers to the Karush-Kuhn-Tucker conditions (see, e.g., [73]). However, since we are interested in dual formulations of convex optimization problems, we can rely on simpler but less general conditions for the equivalence of the primal and the dual problem, which we come back to below. We start by stating the Lagrangian $\mathcal{L}$ of the problem, … where we introduced the Lagrange parameters $\pi(\lambda)$, $A_{ab|xy}$, and $B_{ab|xy}$ to make the constraints explicit. Note that $\sum_{a,b,x,y} p(x, y)\, C_{ab|xy}\, q(a, b|x, y) - \ell \leq \mathcal{L}$ for any feasible $\{C_{ab|xy}\}, \ell$ in Eq. (62), as long as all the $\pi(\lambda)$, $A_{ab|xy}$, and $B_{ab|xy}$ are non-negative coefficients. We obtain the dual function (which is here actually the dual function of the dual problem) by taking the supremum of the Lagrangian over the given (here the dual) variables. The dual function is unbounded from above, unless certain constraints (the primal constraints) are met.
This can, for instance, be seen by realizing that unless $(-1 + \sum_\lambda \pi(\lambda)) = 0$, it is always possible to make the term $\ell\,(-1 + \sum_\lambda \pi(\lambda))$ arbitrarily large, since $\ell$ is now treated as an unconstrained variable. We obtain the primal program by minimizing the dual function under these constraints. The primal program is given by

Primal problem (nonlocality): (65)
given: $q_p$
minimize over $A_{ab|xy}, B_{ab|xy}, \pi(\lambda)$: $\sum_{a,b,x,y} A_{ab|xy}$
subject to: …

Now, by definition of the $\ell_1$-distance between two normalized probability distributions, the optimal value of Eq. (65) is exactly $N(q_p)$ (see, for instance, Remark 4.3 in [87]). Since we are dealing with a linear program, there is no duality gap between the primal and the dual formulation. Hence, $N(q_p)$ describes the maximal Bell violation possible with the probability distribution $q_p$. Next, we need to show that $S(\vec{\sigma}_{p_A})$ corresponds to the optimal steering inequality violation. The procedure is the same as before for the nonlocality. However, this time we start from the primal problem.
Primal problem (steerability): (66) …

First, we need to rewrite the trace norm explicitly in SDP form. We use the following formulation of the trace norm (see, e.g., [88]):
$$\|X\|_1 = \min_{U, W} \left\{ \tfrac{1}{2}\left(\mathrm{Tr}[U] + \mathrm{Tr}[W]\right) : \begin{pmatrix} U & X \\ X^\dagger & W \end{pmatrix} \geq 0 \right\}.$$
This leads to the primal problem in explicit SDP form,

Primal problem (steerability): (67) …

where we introduced the Hermitian matrices $U_{a|x}$, $W_{a|x}$. This allows us to state the Lagrangian $\mathcal{L}$. Note that $S(\vec{\sigma}_{p_A}) \geq \mathcal{L}$ for any feasible point of the dual problem in Eq. (66). Analogous to the optimization for the nonlocality before, we can now formulate the dual function $G(\{H_{a|x}\}, \ell) = \inf_{\sigma_\lambda \geq 0,\, U_{a|x},\, W_{a|x}} \mathcal{L}$. By identifying the conditions (the dual constraints) that make the dual function bounded, we obtain the dual program

Dual problem (steerability): (71) …

whose objective function contains the terms $\mathrm{Tr}[\sigma_{a|x}\, 2 H^{12}_{a|x}]$ and $\ell$. We can rewrite the dual problem in a more convenient form. By identifying the SDP formulation of the spectral norm (see, e.g., [88])
$$\|Z\|_\infty = \min_t \left\{ t : \begin{pmatrix} t\,\mathbb{1} & Z \\ Z^\dagger & t\,\mathbb{1} \end{pmatrix} \geq 0 \right\}$$
and substituting $\ell = -\tilde{\ell}$ and …, we arrive at

Dual problem (steerability): (72) …

To finally arrive at the dual formulation equivalent to the statement in Eq. (60), we shift the variables $G_{a|x}$ by $\frac{1}{2}$. This leads to

Dual problem (steerability): (73) …

Note that it follows again directly that the classical bound $\ell$ is the maximum over all assemblages $\vec{\tau}$ that admit an LHS model as defined in Eq. (26). As the last step of the proof, we note that we can always find a strictly feasible point in the SDP corresponding to Eq. (73) by choosing the $G_{a|x}$ proportional to the identity and sufficiently large. Hence, there is no duality gap due to Slater's theorem (see, e.g., [73]). Therefore, $S(\vec{\sigma}_{p_A})$ can be written as an optimized steering inequality, which concludes the proof.
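The local bound $\ell$ appearing in the nonlocality dual problem is the maximum of the Bell expression over deterministic local strategies, so for small scenarios it can be obtained by brute-force enumeration. The sketch below does this for fixed coefficients; the CHSH-type coefficients $C_{ab|xy} = (-1)^{a+b+xy}$ with uniform $p(x, y) = 1/4$ are an illustrative assumption and are not claimed to be feasible for the constraints of the SDP in the text.

```python
from itertools import product

def local_bound(coeffs, p, n_a, n_b, n_x, n_y):
    """Maximize sum_{x,y} p(x,y) C_{a_x b_y|xy} over deterministic local strategies.

    coeffs[a][b][x][y] holds the Bell coefficients, p[x][y] the input distribution.
    """
    best = float("-inf")
    for alice in product(range(n_a), repeat=n_x):     # alice[x]: outcome for setting x
        for bob in product(range(n_b), repeat=n_y):   # bob[y]: outcome for setting y
            value = sum(p[x][y] * coeffs[alice[x]][bob[y]][x][y]
                        for x in range(n_x) for y in range(n_y))
            best = max(best, value)
    return best

# Illustrative CHSH-type coefficients, indexed as chsh[a][b][x][y].
chsh = [[[[(-1) ** (a + b + x * y) for y in range(2)] for x in range(2)]
         for b in range(2)] for a in range(2)]
uniform = [[0.25, 0.25], [0.25, 0.25]]
chsh_local_bound = local_bound(chsh, uniform, 2, 2, 2, 2)
```

With this normalization the familiar CHSH bound of 2 appears as $\ell = 2/4 = 0.5$, because the input distribution $p(x, y)$ is absorbed into the Bell expression.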

D Entanglement as upper bound for the steerability
Here, we show that the geometric entanglement $E_1(\rho)$ defined in Eq. (3), where $\mathrm{Sep}(\mathcal{H} \otimes \mathcal{H})$ with $\mathcal{H} \cong \mathbb{C}^d$ is the set of separable states, upper bounds the steerability $S(\vec{\sigma}_p)$ of the assemblage $\vec{\sigma}_p$. More precisely, when $\vec{\sigma}_p$ is obtained by performing $d$-dimensional measurements from any WMA $\mathcal{M}_p$ on any state $\rho \in \mathcal{S}(\mathcal{H} \otimes \mathcal{H})$ via $\sigma_{a|x} = \mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})\rho]$, we show that $S(\vec{\sigma}_p) \leq E_1(\rho)$. Let $\rho^*_S$ be the closest separable state with respect to the given state $\rho$. It holds that … where we used in the first line the definition of the steerability $S(\vec{\sigma}_p)$ and the fact that separable states $\rho_S$ cannot lead to steering. In the second line, we used the variational characterization of the trace norm by introducing the optimization variables $\{O_{a|x}\}_{a,x}$. In the third line, we used basic properties of the trace and the partial trace. Next, we used in the fourth line that we can interchange the sum over $a$ and the maximization. In the sixth line, we upper bounded the trace by its absolute value. In the seventh line, we used the Hölder inequality. Finally, in the last line we used the fact that $-\sum_a (M_{a|x} \otimes \mathbb{1}) \leq \sum_a (M_{a|x} \otimes O_{a|x}) \leq \sum_a (M_{a|x} \otimes \mathbb{1})$ in the positive semidefinite sense. This lets us find the upper bound $\| \sum_a M_{a|x} \otimes \mathbb{1} \|_\infty = 1$, due to the completeness relation of the POVMs. Therefore, the entanglement $E_1(\rho)$ limits the steerability, $S(\vec{\sigma}_p) \leq E_1(\rho)$.

E SDP formulations of incompatibility
Here, we give detailed information about the SDP formulations in Eq. (39) and Eq. (40). As an example, we explicitly derive the primal and the dual formulation of the incompatibility quantifier I (M p ). More specifically, we show that the incompatibility I (M p ) is the optimal value of the following two SDPs.
Primal problem (incompatibility): …

Dual problem (incompatibility): (77) …

The formulation of the primal problem heavily relies on the SDP formulation of the diamond norm due to Watrous [72], see also [89]. Let us recall that the Choi-Jamiołkowski matrix of a measure-and-prepare channel (see Eq. (12)) corresponding to one POVM $M_x = \{M_{a|x}\}_a$ is given by
$$J(M_x) = \sum_a |a\rangle\langle a| \otimes M_{a|x}^T,$$
where the transpose is with respect to the computational basis. The diamond distance between the quantum channels $\Lambda_{M_x}$ and $\Lambda_{F_x}$ can now be computed as
$$D_\diamond(\Lambda_{M_x}, \Lambda_{F_x}) = \min_{Z_x} \left\{ \| \mathrm{Tr}_1[Z_x] \|_\infty : Z_x \geq J(M_x) - J(F_x),\ Z_x \geq 0 \right\}.$$
Using this form of the diamond norm, the primal problem in Eq. (39) follows by summing over the settings $x$ weighted with the probabilities $p(x)$ and by explicitly minimizing over the Choi-Jamiołkowski matrices $J(F_x)$, where $F_x$ is the POVM corresponding to setting $x$ of the free assemblage $F$. To arrive at the specific SDP for the incompatibility in Eq. (76) from the general formulation in Eq. (39), we first note that the spectral norm $\| \mathrm{Tr}_1[Z_x] \|_\infty$ of a positive semidefinite matrix $\mathrm{Tr}_1[Z_x]$ can be written as the minimal value $a_x$ such that $a_x \mathbb{1} \geq \mathrm{Tr}_1[Z_x]$ holds. Next, we write out the Choi-Jamiołkowski matrices corresponding to the channels $\Lambda_{M_x}$ and $\Lambda_{F_x}$ in terms of the POVM elements $M_{a|x}$ and $F_{a|x}$. Finally, we constrain the $F_{a|x}$ explicitly to be JM, i.e., we require that $F_{a|x} = \sum_\lambda v(a|x, \lambda)\, G_\lambda$ holds for some POVM $\{G_\lambda\}$.

To derive the dual formulation in Eq. (77), we formulate the Lagrangian of the primal problem by incorporating the constraints explicitly. The Lagrangian is given by … which is formally the dual program to the primal formulation of Eq. … is already included in the optimization, we can simply ignore it. Second, we rewrite the first term of the objective function … This shows that only the block-diagonal entries of $C_x$ are important. Note that the same observation holds for the constraints that $C_x$ is involved in. It is therefore no loss of generality to assume that $C_x$ is block diagonal.
We denote the blocks of $C_x$ by $C_{a|x}$. With this, we arrive at

Dual problem (incompatibility): …

Next, we note that it is always possible (without loss of optimality) to choose … which is equivalent to the SDP (77). Note that by virtue of the first constraint in Eq. (84), the term involving $L$ is bounded and the bound holds with equality. The bound can be seen by multiplying all the inequalities $L \geq \sum_{a,x} p(x)\, v(a|x, \lambda)\, C_{a|x} \ \forall\, \lambda$ with $G_\lambda$, then summing over all $\lambda$ and taking the trace. The equality follows from the fact that a strict inequality would contradict the maximization of the objective function.
As a final step in the proof, we need to show that there is no duality gap between the primal and the dual program. This follows from Slater's theorem (see, e.g., [73]), since it is always possible to find a strictly feasible point in either the primal or the dual problem. This can be seen directly for the dual program in Eq. (77), as we can choose all $C_{a|x}$ to be proportional to the identity and adjust the $\rho_x$ and $L$ accordingly.
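The Choi-Jamiołkowski matrix of a measure-and-prepare channel, with the block structure $\sum_a |a\rangle\langle a| \otimes M_{a|x}^T$ that appears in the constraint on $Z_x$, can be assembled directly from the POVM elements. The following plain-Python sketch builds this block-diagonal matrix and traces out the first (outcome) system; the function names and the qubit $Y$-basis example are assumptions for illustration.

```python
def choi_matrix(povm):
    """Build J = sum_a |a><a| (x) M_a^T as an (o*d) x (o*d) nested list."""
    o, d = len(povm), len(povm[0])
    J = [[0.0] * (o * d) for _ in range(o * d)]
    for a, effect in enumerate(povm):
        for i in range(d):
            for j in range(d):
                # Block a on the diagonal holds the transpose of the effect M_a.
                J[a * d + i][a * d + j] = effect[j][i]
    return J

def trace_out_first(J, o, d):
    """Partial trace over the first (outcome) system: sum of the diagonal blocks."""
    return [[sum(J[a * d + i][a * d + j] for a in range(o)) for j in range(d)]
            for i in range(d)]

# Qubit example: projective measurement in the eigenbasis of Pauli Y.
Y_povm = [[[0.5, -0.5j], [0.5j, 0.5]], [[0.5, 0.5j], [-0.5j, 0.5]]]
J_Y = choi_matrix(Y_povm)
reduced = trace_out_first(J_Y, 2, 2)
```

Tracing out the outcome system returns $\sum_a M_a^T = \mathbb{1}$, which reflects the completeness of the POVM, i.e., the trace-preservation of the channel.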

F Optimal input distribution
Here, we give an example where an optimization over the input distribution $p = \{p(x)\}$ for the settings $x$ is relevant to optimize the available resources. In particular, we show that for the incompatibility $I_\diamond(\mathcal{M}_p)$ (see Eq. (24)) of a measurement assemblage with only two settings, the optimal incompatibility is not always achieved for the uniform distribution $p(1) = p(2) = \frac{1}{2}$. The idea is to introduce noise in only one of the measurement settings, here for $x = 2$. Let us consider an MUB measurement assemblage $\mathcal{N}$ containing $m = 2$ POVMs, constructed in the same way as the assemblages considered in Table 2. From the MUB measurement assemblage $\mathcal{N}$ we obtain the measurement assemblage $\mathcal{M}$ via Eq. (86), where $\mu \in [0, 1]$ is a depolarizing noise parameter for the second measurement. In the following, we analyze how to choose the probability distribution $p = \{p(x)\}$ such that the incompatibility $I_\diamond(\mathcal{M}_p)$ is maximized for the given assemblage $\mathcal{M}$. As mentioned in section 4, the SDP (40) can be rewritten such that it includes a maximization over the input distribution $p = \{p(x)\}$. We illustrate our results in Figure 4 for the optimal setting probabilities $p(x)$ of the assemblage $\mathcal{M}$ in dimension $d$ with noise parameter $\mu$. As one can see, even for only two measurements, strong biases towards one setting can be necessary in order to maximize the incompatibility $I_\diamond(\mathcal{M}_p)$. We want to remark that except for the noise-free case, i.e., $\mu = 1$, the optimized input distribution leads to a strictly larger incompatibility than the uniform distribution. Note that in this particular example, the advantage is weak, as can be seen in Figure 5. However, an optimization over the distribution $p$ can lead to a strong increase in incompatibility for $m \geq 3$, by essentially neglecting weakly incompatible subsets of measurements. On a qualitative basis, this effect can be explained in terms of the informativeness $IF_\diamond(\mathcal{M}_p)$ (see Eq. (20)). For large noise (e.g., $\mu = 0.1$) the distribution is strongly biased towards the noise-free, and hence more informative, measurement.
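To make the noise model of this example concrete, the sketch below applies depolarizing noise to a single POVM. Since the defining equation is not reproduced here, the form $\mu N_a + (1 - \mu)\,\mathrm{Tr}[N_a]\,\mathbb{1}/d$ is an assumption based on the standard depolarizing map, with $\mu = 1$ corresponding to the noise-free case; the function name and the qubit example are likewise illustrative.

```python
def depolarize(povm, mu):
    """Return the effects mu * N_a + (1 - mu) * Tr[N_a] * 1/d (assumed noise model)."""
    d = len(povm[0])
    noisy = []
    for effect in povm:
        tr = sum(effect[i][i] for i in range(d))
        noisy.append([[mu * effect[i][j] + ((1 - mu) * tr / d if i == j else 0.0)
                       for j in range(d)]
                      for i in range(d)])
    return noisy

# Qubit example: the second setting is strongly depolarized (mu = 0.1).
Z_povm = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
noisy_Z = depolarize(Z_povm, 0.1)
```

For $\mu = 0.1$ the effects are close to $\mathbb{1}/2$ and therefore nearly uninformative, in line with the bias of the optimal distribution towards the noise-free setting described above.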

G SDP formulation of informativeness and coherence
Figure 4: The optimal input probability $p(x = 1)$ for the first (noise-free) measurement setting of the assemblage $\mathcal{M}$ in Eq. (86) depending on the dimension $d$ and the noise parameter $\mu$. The plot shows the optimal probability $p(x = 1)$ which maximizes the incompatibility $I_\diamond(\mathcal{M}_p)$. It can be seen that a uniform distribution is only optimal in the absence of noise (i.e., $\mu = 1$). Especially for high noise regimes (e.g., $\mu = 0.1$) a strong bias towards the noise-free measurement can be seen. However, this strong bias decreases with increasing dimension $d$.

While the calculations in Appendix E are specific to the quantifier $I_\diamond(\mathcal{M}_p)$ and the QRT of incompatibility, analogous considerations can be made for any resource that has a free set $\mathcal{F}$ that admits a formulation as an SDP. In order not to repeat almost the same calculation as above, we simply state the corresponding SDP formulations for the coherence and the informativeness in the following. We start with the latter. The informativeness $IF_\diamond(\mathcal{M}_p)$ is given as the optimal value of the following SDPs.
Figure 5: Comparison of the incompatibility $I_\diamond(\mathcal{M}_p)$ between the optimal input distribution (dashed lines) and the uniform distribution (solid lines) depending on the noise parameter $\mu$ for a given dimension $d$. It can be seen that the optimized input distribution outperforms the uniform distribution for the measurement assemblage described in Eq. (86). In the low-noise regime ($\mu$ close to 1) the solid and the dashed lines approach each other, as the uniform distribution is optimal for $\mu = 1$.

Primal problem (informativeness): (87)
given: $\mathcal{M}_p$
minimize over $a_x, Z_x, q(a|x)$: $\sum_x p(x)\, a_x$
subject to: …

Dual problem (informativeness): …

The optimization variables of the primal problem are the positive coefficients $a_x$, the positive semidefinite matrices $Z_x$, and the probabilities $q(a|x)$. The optimization variables of the dual problem are the positive semidefinite matrices $C_{a|x}$, $\rho_x$, and the scalars $\ell_x$. Note that it follows directly from the first constraint of the dual that … This can be seen by realizing that … where we multiplied both sides with the conditional probabilities $q(a|x)$. We identify $q(a|x)\mathbb{1} = F_{a|x}$ due to the definition of the UI measurements in Eq. (19). Finally, we sum both sides over $a$ and $x$. The equality follows from the fact that we maximize the objective function. As for the incompatibility, the SDP formulations of the informativeness $IF_\diamond(\mathcal{M}_p)$ allow us to gain additional insight into the informativeness of a WMA $\mathcal{M}_p$. In particular, the informativeness of any set of rank-1 projective measurements is given by $IF_\diamond(\mathcal{M}_p) = 1 - \frac{1}{d}$ …

Coherence. Finally, the coherence $C_\diamond(\mathcal{M}_p)$ can also be computed by SDPs. In particular, $C_\diamond(\mathcal{M}_p)$ is the optimal value of the following two SDPs.
Primal problem (coherence): (90) …

Dual problem (coherence): …

The optimization variables of the primal problem are the positive coefficients $a_x$, the positive semidefinite matrices $Z_x$, and the coefficients $\alpha_{i|(a,x)}$. The optimization variables of the dual problem are the positive semidefinite matrices $C_{a|x}$, $\rho_x$, and the scalars $\ell_{x,i}$. With the same reasoning as for the previous resources, the corresponding bound on these scalars can be seen directly. We use these insights about the informativeness $IF_\diamond(\mathcal{M}_p)$ and the coherence $C_\diamond(\mathcal{M}_p)$ to identify non-trivial cases for which $IF_\diamond(\mathcal{M}_p) = C_\diamond(\mathcal{M}_p)$ holds in the following. We start by considering assemblages $\mathcal{M}_p$ where every POVM is a rank-1 projective measurement that is mutually unbiased to the incoherent basis. More formally, the overlap relation in Eq. (45) has to hold between every measurement basis and the incoherent basis. Note that this holds true for appropriately chosen MUB measurement assemblages $\mathcal{M}$ that are also mutually unbiased to the incoherent basis. However, it is actually not necessary that the measurements within the assemblage are MUB themselves. All that is needed is that Eq. …

H More distances
In the main text, we defined general distances between measurement assemblages in Definition 1. However, so far we only focused on one particular distance. Here, we introduce more examples of distances for measurements and discuss their basic properties. We start by introducing the Schatten $p$-norm functions $D_p(\mathcal{M}_p, \mathcal{N}_p)$ for $p \in [1, \infty]$, which are defined in terms of the Schatten $p$-norm $\|X\|_p = (\mathrm{Tr}[|X|^p])^{1/p}$ of $X$. Note that the cases $p = 1$ and $p = \infty$ correspond to the trace norm and the spectral norm, respectively. While the functions $D_p(\mathcal{M}_p, \mathcal{N}_p)$ will generally not fulfil the monotonicity under Hilbert-Schmidt adjoint channels $\Lambda^\dagger$ according to Definition 1, we will show in the following that the $p = \infty$ case corresponds to a proper distance. Note that for $p = 1$, the monotonicity under quantum channels $\Lambda^\dagger$ is not fulfilled, which can be seen by considering trivial extensions of the form $\Lambda^\dagger(M_{a|x}) = \mathbb{1} \otimes M_{a|x}$. Nevertheless, we also define the induced functions $R_p(\mathcal{M}_p)$ in Eq. (95). We formulate the following theorem to show that $D_\infty(\mathcal{M}_p, \mathcal{N}_p)$ is a distance between measurement assemblages.

For the monotonicity under quantum channels, we consider a more general data-processing type inequality for the $\infty$-distance between two POVM elements. Namely, for $\Lambda^\dagger(M_{a|x})$ and $\Lambda^\dagger(N_{a|x})$, where $\Lambda^\dagger$ is a unital completely positive map, it follows that the distance does not increase. Here we used the dual representation of the Schatten-$\infty$ norm and the fact that the maximum is always achieved for a density matrix $\rho$ (more specifically, the projector onto the eigenvector corresponding to the eigenvalue of largest absolute value of $\Lambda^\dagger(M_{a|x}) - \Lambda^\dagger(N_{a|x})$). Furthermore, we used that the adjoint of the unital completely positive map $\Lambda^\dagger$ is a CPT map $\Lambda$, which maps density matrices onto density matrices and therefore shrinks the state space one optimizes over.
The monotonicity under classical simulations $\xi(\mathcal{M}_{\mathbf{p}})_{\mathbf{q}}$ follows by a direct computation in four steps, which uses the following properties.
In the first step, we use the definition of $D_\infty(\xi(\mathcal{M}_{\mathbf{p}})_{\mathbf{q}}, \xi(\mathcal{N}_{\mathbf{p}})_{\mathbf{q}})$ by introducing the assemblages $\mathcal{M}_{\mathbf{q}} = \xi(\mathcal{M}_{\mathbf{p}})_{\mathbf{q}}$ and $\mathcal{N}_{\mathbf{q}} = \xi(\mathcal{N}_{\mathbf{p}})_{\mathbf{q}}$, and insert $M_{b|y} = \sum_x p(x|y) \sum_a q(b|y,x,a) M_{a|x}$ and $N_{b|y} = \sum_x p(x|y) \sum_a q(b|y,x,a) N_{a|x}$ directly. In the second step, we use the triangle inequality. In the third step, we perform the sum over $b$. Finally, in the fourth step, we use that $\sum_y q(y) p(x|y) = p(x)$, which leads exactly to the definition of $D_\infty(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}})$, from which the monotonicity under classical simulations $\xi$ follows. Therefore, $D_\infty(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}})$ is a distance between measurement assemblages according to Definition 1.
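The four steps described above can be written out explicitly. The following is a sketch of the computation, assuming that $D_\infty$ has the form $D_\infty(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}}) = \frac{1}{2}\sum_x p(x) \sum_a \|M_{a|x} - N_{a|x}\|_\infty$; the prefactor $\frac{1}{2}$ is an assumption made here for concreteness and does not affect the argument.

```latex
\begin{align}
D_\infty\big(\xi(\mathcal{M}_{\mathbf{p}})_{\mathbf{q}}, \xi(\mathcal{N}_{\mathbf{p}})_{\mathbf{q}}\big)
&= \frac{1}{2}\sum_{y} q(y) \sum_{b} \Big\| \sum_{x} p(x|y) \sum_{a} q(b|y,x,a)\,
   \big(M_{a|x} - N_{a|x}\big) \Big\|_\infty \\
&\le \frac{1}{2}\sum_{y} q(y) \sum_{b} \sum_{x} p(x|y) \sum_{a} q(b|y,x,a)\,
   \big\| M_{a|x} - N_{a|x} \big\|_\infty \\
&= \frac{1}{2}\sum_{y} q(y) \sum_{x} p(x|y) \sum_{a} \big\| M_{a|x} - N_{a|x} \big\|_\infty \\
&= \frac{1}{2}\sum_{x} p(x) \sum_{a} \big\| M_{a|x} - N_{a|x} \big\|_\infty
 = D_\infty(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}}).
\end{align}
```

The third step relies on the normalization $\sum_b q(b|y,x,a) = 1$ of the post-processing, and the second step additionally uses that all weights $p(x|y)$ and $q(b|y,x,a)$ are non-negative.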
The proof of the joint convexity of $D_\infty(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}})$ follows exactly the same lines as the proof of the joint convexity of $D_\diamond(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}})$ in Theorem 1 and can be adapted from there.
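Even though they are not resource monotones in general, the functions $R_p(\mathcal{M}_{\mathbf{p}})$ in Eq. (95) can be used to bound the resource quantifier $R_\diamond(\mathcal{M}_{\mathbf{p}})$ defined in Eq. (15). More specifically, we derive in the following the bounds on the diamond-distance-based quantifier $R_\diamond(\mathcal{M}_{\mathbf{p}})$ given by

$$\frac{1}{d} R_1(\mathcal{M}_{\mathbf{p}}) \le R_\diamond(\mathcal{M}_{\mathbf{p}}) \le R_\infty(\mathcal{M}_{\mathbf{p}}),$$

where $R_1$ and $R_\infty$ denote the cases $p = 1$ and $p = \infty$ of Eq. (95) and $d$ is the dimension of the Hilbert space $\mathcal{H}$ on which the POVMs from $\mathcal{M}$ act. Note that due to the monotonicity of Schatten norms, it holds that $\|X\|_p \le \|X\|_{p'}$ for $p \ge p'$, which relates the functions $R_p$ for different values of $p$. The lower bound follows in four steps. First, we use the definition of $R_\diamond(\mathcal{M}_{\mathbf{p}})$. Second, we use that $\rho = |\Phi^+\rangle\langle\Phi^+|$ is a feasible point of the maximization over quantum states within the diamond norm. Third, we use that $\mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})|\Phi^+\rangle\langle\Phi^+|] = \frac{1}{d} M_{a|x}^T$, where the transposition is with respect to the computational basis. Finally, we use that a transposition does not change the singular values.

The monotone $R_\infty(\mathcal{M}_{\mathbf{p}})$ in particular is not only a valuable tool to bound the diamond distance quantifier $R_\diamond(\mathcal{M}_{\mathbf{p}})$ but is also interesting in itself. More specifically, we show in the following that $R_\infty(\mathcal{M}_{\mathbf{p}})$ also obeys a measurement hierarchy similar to that in Eq. (30). Let $\mathrm{IF}_\infty(\mathcal{M}_{\mathbf{p}})$, $\mathrm{C}_\infty(\mathcal{M}_{\mathbf{p}})$, and $\mathrm{I}_\infty(\mathcal{M}_{\mathbf{p}})$ be the informativeness, coherence, and incompatibility of $\mathcal{M}_{\mathbf{p}}$ as measured by $R_\infty(\mathcal{M}_{\mathbf{p}})$ in Eq. (95) with respect to the free sets $\mathsf{F}_{\mathrm{UI}}$, $\mathsf{F}_{\mathrm{IC}}$, and $\mathsf{F}_{\mathrm{JM}}$. It follows directly from $\mathsf{F}_{\mathrm{UI}} \subset \mathsf{F}_{\mathrm{IC}} \subset \mathsf{F}_{\mathrm{JM}}$ that the hierarchy

$$\mathrm{IF}_\infty(\mathcal{M}_{\mathbf{p}}) \ge \mathrm{C}_\infty(\mathcal{M}_{\mathbf{p}}) \ge \mathrm{I}_\infty(\mathcal{M}_{\mathbf{p}})$$

holds, since minimizing over a larger free set cannot increase the distance. Moreover, it can be shown that $\mathrm{I}_\infty(\mathcal{M}_{\mathbf{p}_A}) \ge S(\sigma_{\mathbf{p}_A})$ (remember that we already showed that $S(\sigma_{\mathbf{p}_A}) \ge N(\mathbf{q}_{\mathbf{p}})$). This follows from a direct computation for any quantum state $\rho$ of appropriate dimension and the closest JM assemblage $\mathcal{F}^*$ to $\mathcal{M}$ (with respect to the monotone $\mathrm{I}_\infty(\mathcal{M}_{\mathbf{p}})$), which uses three facts. First, JM measurements always lead to unsteerable assemblages. Second, the trace norm is non-increasing under partial traces. Third, the Hölder inequality holds. It therefore follows that the hierarchy

$$\mathrm{IF}_\infty(\mathcal{M}_{\mathbf{p}_A}) \ge \mathrm{C}_\infty(\mathcal{M}_{\mathbf{p}_A}) \ge \mathrm{I}_\infty(\mathcal{M}_{\mathbf{p}_A}) \ge S(\sigma_{\mathbf{p}_A}) \ge N(\mathbf{q}_{\mathbf{p}})$$

holds.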
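The identity $\mathrm{Tr}_1[(M_{a|x} \otimes \mathbb{1})|\Phi^+\rangle\langle\Phi^+|] = \frac{1}{d} M_{a|x}^T$ used for the lower bound can be verified numerically. The following is a minimal sketch in plain Python for a qubit; the helper names and the example effect `M` are illustrative assumptions, and $|\Phi^+\rangle = d^{-1/2} \sum_i |i\rangle|i\rangle$ is taken to be the normalized maximally entangled state in the computational basis.

```python
import math

def kron(A, B):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(A), len(B)
    return [[A[r // m][c // m] * B[r % m][c % m] for c in range(n * m)]
            for r in range(n * m)]

def matmul(A, B):
    """Matrix product of two square matrices of equal size."""
    size = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

def partial_trace_first(A, d):
    """Trace out the first d-dimensional factor of a (d*d) x (d*d) matrix."""
    return [[sum(A[i * d + j][i * d + l] for i in range(d)) for l in range(d)]
            for j in range(d)]

def max_entangled_projector(d):
    """|Phi+><Phi+| with |Phi+> = d^{-1/2} sum_i |i>|i> (real amplitudes)."""
    phi = [1 / math.sqrt(d) if idx // d == idx % d else 0.0
           for idx in range(d * d)]
    return [[phi[r] * phi[c] for c in range(d * d)] for r in range(d * d)]

d = 2
identity = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
# Assumed example effect: Hermitian with eigenvalues 0.2 and 0.8.
M = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]

reduced = partial_trace_first(
    matmul(kron(M, identity), max_entangled_projector(d)), d
)
expected = [[M[c][r] / d for c in range(d)] for r in range(d)]  # M^T / d

max_deviation = max(abs(reduced[r][c] - expected[r][c])
                    for r in range(d) for c in range(d))
print(max_deviation)
```

The example effect is chosen with complex off-diagonal entries so that the transpose is distinguishable from both `M` itself and its Hermitian conjugate.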
Another distance for measurement assemblages that can be considered is based on the $\ell_1$-distance between probability distributions. More specifically, we consider the induced $\ell_1$-distance $D_{\ell_1}(\mathcal{M}_{\mathbf{p}}, \mathcal{N}_{\mathbf{p}})$ between two WMAs. It is a jointly convex distance function, which induces the distance-based convex monotone

$$R_{\ell_1}(\mathcal{M}_{\mathbf{p}}) = \min_{\mathcal{F} \in \mathsf{F}} D_{\ell_1}(\mathcal{M}_{\mathbf{p}}, \mathcal{F}_{\mathbf{p}}). \tag{106}$$

Note that while it follows directly that $R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$ naturally induces a hierarchy between the informativeness, the coherence, and the incompatibility of a WMA $\mathcal{M}_{\mathbf{p}}$, it is not clear whether there exist steering or nonlocality monotones that are in natural correspondence to it. Note further that in the context of coherence of single POVMs, this kind of statistical measure has also been defined by Baek et al. [25].
Even though it is not clear whether a complete hierarchy of measurement resources holds, the quantifier $R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$ is important, as it can be seen as the limiting case of the quantifier $R_\diamond(\mathcal{M}_{\mathbf{p}})$ when the maximization is performed only over product states. More formally, it holds that $R_\diamond(\mathcal{M}_{\mathbf{p}}) \ge R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$, which follows in three steps. First, we maximize only over the set of product states. Second, we use the definition of the partial trace. Finally, we use that states $\rho_B$ are normalized in the trace norm, $\|\rho_B\|_1 = 1$, and identify the result with the definition of the induced $\ell_1$-distance quantifier $R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$. In Appendix I, we show that the quantifiers $R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$ and $R_\infty(\mathcal{M}_{\mathbf{p}})$ coincide with $R_\diamond(\mathcal{M}_{\mathbf{p}})$ in the case of dichotomic measurement assemblages.
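The restriction to product states can be sketched as follows. This is a hedged reconstruction rather than the original derivation: it assumes that the diamond distance between the measure-and-prepare channels of two POVMs takes the form $\max_{\rho} \frac{1}{2} \sum_a \|\mathrm{Tr}_A[((M_{a|x} - F_{a|x}) \otimes \mathbb{1})\rho]\|_1$ and that the induced $\ell_1$-distance is $\sum_x p(x) \max_{\rho_A} \frac{1}{2} \sum_a |\mathrm{Tr}[(M_{a|x} - F_{a|x})\rho_A]|$. Both forms, including the prefactor $\frac{1}{2}$, are assumptions made for illustration.

```latex
\begin{align}
R_\diamond(\mathcal{M}_{\mathbf{p}})
&\ge \min_{\mathcal{F} \in \mathsf{F}} \sum_x p(x) \max_{\rho_A \otimes \rho_B}
   \frac{1}{2} \sum_a \Big\| \mathrm{Tr}_A\big[\big((M_{a|x} - F_{a|x}) \otimes \mathbb{1}\big)
   (\rho_A \otimes \rho_B)\big] \Big\|_1 \\
&= \min_{\mathcal{F} \in \mathsf{F}} \sum_x p(x) \max_{\rho_A, \rho_B}
   \frac{1}{2} \sum_a \big| \mathrm{Tr}\big[(M_{a|x} - F_{a|x})\rho_A\big] \big| \, \|\rho_B\|_1 \\
&= \min_{\mathcal{F} \in \mathsf{F}} \sum_x p(x) \max_{\rho_A}
   \frac{1}{2} \sum_a \big| \mathrm{Tr}\big[(M_{a|x} - F_{a|x})\rho_A\big] \big|
 = R_{\ell_1}(\mathcal{M}_{\mathbf{p}}).
\end{align}
```

Under these assumptions, the only inequality in the chain is the restriction of the feasible set from all bipartite states to product states.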

I Dichotomic measurements
Here, we show an additional property of $R_\diamond(\mathcal{M}_{\mathbf{p}})$ that is useful when we consider measurement assemblages $\mathcal{M}$ with only two outcomes for each setting $x$. We show that in this special case, the diamond distance quantifier $R_\diamond(\mathcal{M}_{\mathbf{p}})$ is equal to $R_\infty(\mathcal{M}_{\mathbf{p}})$ and $R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$. Consider the WMAs $\{M_{1|x}, \mathbb{1} - M_{1|x}\}_x$ and $\{F_{1|x}, \mathbb{1} - F_{1|x}\}_x$. Remember that we already showed that $R_\diamond(\mathcal{M}_{\mathbf{p}}) \le R_\infty(\mathcal{M}_{\mathbf{p}})$, so we only need to show that in the case of dichotomic measurements it also holds that $R_\diamond(\mathcal{M}_{\mathbf{p}}) \ge R_\infty(\mathcal{M}_{\mathbf{p}}) = R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$. This follows directly from the bound $R_\diamond(\mathcal{M}_{\mathbf{p}}) \ge R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$ and the fact that for dichotomic measurements both outcomes contribute equally to $R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$, because $(\mathbb{1} - M_{1|x}) - (\mathbb{1} - F_{1|x}) = -(M_{1|x} - F_{1|x})$. The same holds true for $R_\infty(\mathcal{M}_{\mathbf{p}})$. Alternatively, it is also enough to see that the same $\rho_A$ is optimal for both outcomes, which leads to the conclusion that $R_\infty(\mathcal{M}_{\mathbf{p}}) = R_{\ell_1}(\mathcal{M}_{\mathbf{p}})$. Note that the above result shows that entanglement does not offer an advantage in distinguishing two measure-and-prepare channels for dichotomic measurements by means of the diamond norm.
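The equality of the $\ell_1$- and $\infty$-type terms for a two-outcome measurement can be illustrated numerically for a single setting $x$. The following is a minimal sketch in plain Python for a qubit. It assumes the per-setting forms $\max_{\rho} \frac{1}{2} \sum_a |\mathrm{Tr}[(M_{a} - F_{a})\rho]|$ and $\frac{1}{2} \sum_a \|M_{a} - F_{a}\|_\infty$, with the prefactor $\frac{1}{2}$ as an assumption; the example effects, the helper names, and the grid search over pure states are illustrative choices. The effect `F1` is an arbitrary comparison point, not the closest free measurement.

```python
import cmath
import math

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def sub(A, B):
    """Entrywise difference of two 2x2 matrices."""
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def spectral_norm(X):
    """Largest absolute eigenvalue of a 2x2 Hermitian matrix."""
    a, c, b = X[0][0].real, X[1][1].real, X[0][1]
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + abs(b) ** 2)
    return max(abs(mean + radius), abs(mean - radius))

def expectation(X, theta, phi):
    """<psi|X|psi> for |psi> = (cos(theta/2), e^{i phi} sin(theta/2))."""
    psi = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]
    return sum(psi[i].conjugate() * X[i][j] * psi[j]
               for i in range(2) for j in range(2)).real

def l1_term(differences, steps=200):
    """Grid search for max over pure states of (1/2) sum_a |<psi|X_a|psi>|."""
    best = 0.0
    for s in range(steps + 1):
        theta = math.pi * s / steps
        for t in range(steps):
            phi = 2 * math.pi * t / steps
            value = 0.5 * sum(abs(expectation(X, theta, phi)) for X in differences)
            best = max(best, value)
    return best

# Assumed dichotomic example: first effects of the two measurements.
M1 = [[0.5, 0.5], [0.5, 0.5]]
F1 = [[0.6, 0.1j], [-0.1j, 0.4]]
# Second effects are 1 - M1 and 1 - F1, so the second difference is -(M1 - F1).
differences = [sub(M1, F1), sub(sub(IDENTITY, M1), sub(IDENTITY, F1))]

l1_value = l1_term(differences)
inf_value = 0.5 * sum(spectral_norm(X) for X in differences)
print(l1_value, inf_value)  # agree up to the grid resolution
```

Since the objective is convex in $\rho$, restricting the search to pure states loses nothing; the grid only approximates the maximum from below, so the two values agree up to the grid resolution.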