Only Classical Parameterised States have Optimal Measurements under Least Squares Loss

Measurements of quantum states form a key component in quantum-information processing. It is therefore an important task to compare measurements and furthermore decide if a measurement strategy is optimal. Entropic quantities, such as the quantum Fisher information, capture asymptotic optimality but not optimality with finite resources. We introduce a framework that allows one to conclusively establish if a measurement is optimal in the non-asymptotic regime. Our method relies on the fundamental property of expected errors of estimators, known as risk, and it does not involve optimisation over entropic quantities. The framework applies to finite sample sizes and lack of prior knowledge, as well as to the asymptotic and Bayesian settings. We prove a no-go theorem that shows that only classical states admit optimal measurements under the most common choice of error measurement: least squares. We further consider the less restrictive notion of an approximately optimal measurement and give sufficient conditions for such measurements to exist. Finally, we generalise the notion of when an estimator is inadmissible (i.e. strictly worse than an alternative), and provide two sufficient conditions for a measurement to be inadmissible.


Introduction
Extracting information from systems requires performing a measurement. The outcome of the measurement is then used to infer information about some property of the system of interest. Parameter estimation is the well-established study of different strategies of measurement and the subsequent inference of some number of parameters of a system. In particular, when the underlying system is quantum, understanding the non-classical features of quantum measurement becomes of paramount importance and forms the central object of study in quantum metrology. The asymptotic regime of such problems, corresponding to an infinite number of measurements, has been extensively studied [1,2]. This regime is also known as local estimation, as one only considers infinitesimal changes in parameter values. The quantum Fisher information [1,3] has emerged as the standard metric of asymptotic information for a parametrised quantum state. Its operational significance is emphasised by the quantum Cramér-Rao bound [1,3]: its inverse lower-bounds the variance of locally unbiased estimators. However, asymptotically optimal measurements, as characterised by the quantum Fisher information, may be sub-optimal in finite-resource settings (see below). Additionally, in practice, parameter estimation is a global problem; one must consider parameters across the whole parameter space. Global estimation is often studied within the Bayesian regime, where one has some a priori belief about the distribution of the parameter. Optimal measurement choices in the Bayesian regime are well understood [4,5]; however, their optimality relies on an accurate choice of prior. To the best of our knowledge, a thorough analysis of measurement optimality in the most general setting of global parameter estimation without prior knowledge is missing.
In this paper, we provide such an analysis. Our methodology relies on the fundamental property of expected errors of estimators, known as risk, defined below. We establish a fundamental result of quantum-information processing: only classical parametrised states [1], those that are a classical mixture of some parameter-independent pure states, admit optimal measurements. We go on to propose three ways in which a measurement can be approximately optimal, and we prove that if a state is close to being classical, there exists an approximately optimal measurement. Finally, we consider when one measurement dominates (is strictly better than) another measurement. We present two "bad" classes of measurements: those that extract no information from a system and those that can be fine-grained. We show that these measurements are always dominated by suitable alternative measurements, which we construct.

Technical Background
We begin by reviewing classical parameter estimation [6]. As an example, suppose one wishes to conduct an experiment to estimate q ∈ [0, 1], some unknown proportion of a population satisfying some property. We sample some subset of n members of the population and observe a realisation of a binomially distributed random variable X ∼ B(n, q). We do not, a priori, know X's distribution, only that it will be binomially distributed according to the parameter q. Based on a realisation x of X, one gives an estimate of q, most commonly x/n. In the general setting, one wishes to estimate some parameter θ, which could be any of the elements of the parameter space Θ. We observe the outcome of some random variable X, whose distribution is unknown to us, but is contained in the set {P_θ : θ ∈ Θ}. An estimator of θ is then a function θ̂(x), where θ̂(x) encodes our estimate of the underlying parameter θ in the case that X takes the value x.
To discuss the merit of estimators, one introduces a loss function L, where L(θ₁, θ₂) quantifies how bad a guess of θ₁ is if the true parameter were θ₂. Common examples include least squares, L(θ₁, θ₂) = ||θ₁ − θ₂||², the Kullback-Leibler divergence [7] and the Bhattacharyya distance [8]. The loss function induces a risk function for an estimator:

R(θ̂, θ) = E_θ[L(θ̂(X), θ)],   (1)

where the expectation is taken over X ∼ P_θ. R(θ̂, θ) measures how badly one expects an estimator to perform for a fixed value of the underlying parameter θ. An estimator is "good" at the point θ if it has a low risk there. We say that θ̂ dominates θ̃ if θ̂ is always at least as good as θ̃ and sometimes better, i.e.,

∀θ ∈ Θ, R(θ̂, θ) ≤ R(θ̃, θ), with strict inequality for some θ.   (2)
If θ̂ is dominated by some estimator θ̃, one says that it is inadmissible; otherwise it is said to be admissible. Admissibility is a very weak condition; constant estimators (always guessing θ₀ regardless of the outcome of the experiment) are often admissible [9], since they have zero risk at θ₀. However, admissibility is certainly desirable: otherwise, by definition, one could choose a strictly better estimator.
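To make the binomial example concrete, the risk under least-squares loss can be computed exactly. The sketch below (with an illustrative sample size n = 20, not taken from the text) compares the standard estimator x/n with a constant estimator that always guesses 0.5; neither dominates the other:

```python
from math import comb

def risk(estimator, q, n):
    # Exact least-squares risk R(qhat, q) = E_q[(qhat(X) - q)^2] for X ~ B(n, q).
    return sum(comb(n, x) * q**x * (1 - q)**(n - x) * (estimator(x) - q)**2
               for x in range(n + 1))

n = 20
mle = lambda x: x / n    # the usual estimator x/n
const = lambda x: 0.5    # constant estimator: always guess 0.5

# The constant estimator has zero risk at q = 0.5, so x/n does not dominate it;
# away from 0.5 the comparison reverses, so neither dominates the other.
for q in (0.1, 0.5, 0.9):
    print(q, risk(mle, q, n), risk(const, q, n))
```

For x/n the computed risk agrees with the familiar closed form q(1 − q)/n.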
In the quantum setting [10,11], our parameter is encoded in some quantum state ρ(θ). Often, one may have n copies of the same state σ(θ), in which case ρ(θ) = σ(θ)^⊗n. We choose some (parameter-independent) generalised measurement M = {M_k}, where each M_k is a linear operator satisfying M_k ≥ 0 and Σ_k M_k = 1. This gives rise to a probability distribution parametrised by θ:

p_θ(k) = Tr(M_k ρ(θ)).   (3)

Thus, after measurement, the quantum experiment reduces to a classical parameter-estimation problem, and one can pick an estimator θ̂_M(k) using the probability distribution in Eq. (3). Hence, the objects of interest in quantum parameter estimation are pairs (M, θ̂_M), where M is a generalised measurement and θ̂_M is an estimator based on the outcome of M. Note that an experimenter may choose a POVM without any particular estimator θ̂_M in mind; however, when the experimental data is eventually used to estimate the parameter, the way in which the data is processed exactly corresponds to picking an estimator. Often, when it is clear from context, we will omit the M label of the estimator. We can compare the performance of estimators based on different measurements. In particular, for two pairs (M, θ̂_M) and (F, θ̂_F) we write θ̂_M ≤ θ̂_F to mean that θ̂_M is always at least as good as θ̂_F, in close analogy to equation (2). Before we introduce our notion of measurement optimality, we briefly discuss how the commonly used notion, based on the (quantum) Fisher information F(θ)¹, is, in general, unsatisfactory. F(θ) is an entropic quantity that characterises the performance of estimation protocols in the asymptotic limit: for n copies of a state, the asymptotic risk is lower-bounded by Tr(F(θ)⁻¹)/n.
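The reduction from a quantum model to a classical one via Eq. (3) is easy to illustrate numerically. A minimal sketch (the relative-phase qubit and X-basis measurement here are illustrative choices, not mandated by the text):

```python
import numpy as np

def outcome_probs(povm, rho):
    # Born rule: p(k) = Tr(M_k rho); the POVM turns the quantum model
    # into a classical parametrised distribution over outcomes k.
    return np.real([np.trace(M @ rho) for M in povm])

theta = 0.7
# Illustrative state: (|0> + e^{i theta}|1>)/sqrt(2), measured in the X basis.
psi = np.array([1.0, np.exp(1j * theta)]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
povm = [np.outer(plus, plus.conj()), np.outer(minus, minus.conj())]

p = outcome_probs(povm, rho)
# Analytically, p(+) = (1 + cos theta)/2 for this choice of state.
print(p, (1 + np.cos(theta)) / 2)
```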

Optimal Measurements
Below, we extend the aforementioned framework for estimator comparisons to include quantum measurements. Here, we define what we argue is the only possible such extension, unless further modelling assumptions are made. We begin by extending the notion of admissibility of estimators to measurements. We say that a generalised measurement M is at least as good as a generalised measurement F, written M ⪯ F, if for every estimator θ̂_F based on F there exists an estimator θ̂_M based on M such that θ̂_M ≤ θ̂_F. If M ⪯ F, then if one swaps an experiment's measurement from F to M, then whatever estimation strategy θ̂_F was in use before, one can pick a new one, θ̂_M, that is at least as good as θ̂_F regardless of the true underlying parameter. If, however, M ⋠ F, then one cannot do so: by swapping to M, there is at least one estimation strategy θ̂_F such that, whatever new strategy θ̂_M one picks, there are some values of the underlying parameter where one would expect to do worse. Thus, our definition is the natural and only way one can define M as being at least as good as F without further assumptions on the estimation problem. We say that M dominates F, written M ≺ F, if M ⪯ F but F ⋠ M. In analogy with the classical case, we say that a measurement M is admissible if no other measurement dominates it. Clearly, as in the classical case, admissibility of a measurement is strongly desirable. There is then a natural definition of an optimal measurement: we say that M is optimal if M ⪯ F for all other possible measurements F.

¹ For the "single parameter" case Θ ⊆ R, the quantum Fisher information F(θ) is defined as Tr(ρ(θ)Λ(θ)²), where Λ(θ) is the symmetric logarithmic derivative, defined implicitly by the equation 2∂_θρ(θ) = Λ(θ)ρ(θ) + ρ(θ)Λ(θ). For the "multiparameter" case of Θ ⊆ R^N with N > 1 there are multiple definitions of the quantum Fisher information; for a review see [1].
This is the strongest and most general definition of optimality one could make-we emphasise that any other definition must make some additional modelling assumptions (e.g., some a priori belief about the unknown parameter). Nonetheless, and somewhat surprisingly, we show that optimal measurements exist for a family of parameter estimation problems.
Consequently, by swapping θ̂_F to θ̂_M we can only decrease the risk, and thus M ⪯ F.
Note that the parameter may be a vector (N > 1), so that Theorem 3.1 holds in the multi-parameter setting. States whose eigenbasis may be expressed in a parameter-independent way are called classical parametrised states [1]. More precisely, a parametrised quantum state ρ(θ) is called classical if there exists a θ-independent eigenbasis {|i⟩} and probabilities {p_i(θ)} such that ρ(θ) = Σ_i p_i(θ)|i⟩⟨i|. For example, classical states are relevant for estimating temperature, where the system is in a thermal (Gibbs) state ρ(β) = e^{−βH}/Tr(e^{−βH}), which is diagonal in the β-independent energy eigenbasis of H. Theorem 3.1 shows that measuring the energy of the system is always an optimal strategy, regardless of the ensuing choice of estimator.
As a second example, we consider estimating the strength of depolarising noise acting on a fixed quantum state. Suppose that depolarising noise acts on some d-dimensional pure state |ψ⟩ with probability p ∈ [0, 1], producing the mixed state ρ(p) = (1 − p)|ψ⟩⟨ψ| + p·1/d. This family is diagonal in any orthonormal basis containing |ψ⟩, and Theorem 3.1 shows that measuring in any such basis is optimal for estimating p.
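As a quick numerical check of this example (the dimension and the random |ψ⟩ below are arbitrary, illustrative choices), one can verify that the outcome probabilities of a basis containing |ψ⟩ depend on p exactly as the classical model predicts:

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
v = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = v / np.linalg.norm(v)

def depolarised(p):
    # rho(p) = (1 - p)|psi><psi| + p * I/d
    return (1 - p) * np.outer(psi, psi.conj()) + p * np.eye(d) / d

# Any orthonormal basis containing |psi> diagonalises rho(p) for every p:
# the outcome probabilities are <psi|rho|psi> = 1 - p + p/d for |psi> and
# p/d for each of the d - 1 orthogonal basis vectors.
for p in (0.0, 0.3, 0.9):
    rho = depolarised(p)
    print(np.real(psi.conj() @ rho @ psi), 1 - p + p / d)
```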
It is natural to ask whether there exist any parametrised quantum states that are not classical but still admit an optimal measurement. We give a partial converse to Theorem 3.1 in the case that the loss function is least squares. The proof of this theorem is given in Appendix B; here we provide a sketch.
Proof Sketch. We assume the existence of an optimal measurement M and deduce that ρ(θ) must be classical. We first restrict our analysis to the single-parameter (N = 1) case. We fix θ₁, θ₂ ∈ Θ and consider the restricted parameter space {θ₁, θ₂} ⊆ Θ. We then consider the Bayesian scenario of being given the state ρ(θ₁) with probability p and ρ(θ₂) with probability 1 − p. We show that if an optimal measurement exists for this Bayesian parameter-estimation problem, then ρ(θ₁) and ρ(θ₂) commute. However, we also show that M must be optimal for any such Bayesian problem. Thus, we deduce that if an optimal measurement exists on Θ, all of {ρ(θ) : θ ∈ Θ} must commute. We use this to simultaneously diagonalise the {ρ(θ)}, and thus we deduce that ρ is classical. By restricting to lines in higher-dimensional parameter spaces, we generalise the single-parameter case to the multiparameter case.
Thus, for ρ(θ) non-classical, there is no optimal measurement strategy. Whatever measurement M one picks, there is always a measurement-estimator pair (F, θ̂_F) that one cannot guarantee to do at least as well as. Some trade-off or additional modelling assumption is required to decide on the best measurement strategy.
To develop physical intuition for Theorems 3.1 and 3.2, it is useful to consider the distinguishability of states. For quantum states σ and ν, the optimal measurement to distinguish them [10] is a measurement with respect to the eigenbasis of σ − ν. If ρ(θ) is a classical parametrised state, then for any θ₁, θ₂ ∈ Θ the eigenbasis of ρ(θ₁) − ρ(θ₂) is constant, and thus the same measurement is optimal for distinguishing all possible pairs of states. Since it is pairwise optimal, it is intuitive that this measurement is then "globally" optimal for parameter estimation. For non-classical states, different measurements are better at distinguishing different parts of the parameter space. This effect is best demonstrated by the "most non-classical" parametrised state: a pure state.
Consider the pure state |ψ(θ)⟩ = (|0⟩ + e^{iθ}|1⟩)/√2 with θ ∈ [0, 2π), and the measurement M defined by measuring with respect to the basis {|+⟩, |−⟩}. M saturates the quantum Fisher information (for all θ) [11] and is thus a good candidate for an optimal measurement. It is easy to check that the outcome probability distribution is

p_M(±|θ) = (1 ± cos θ)/2.   (10)

Consider a different measurement F, defined by measuring with respect to the basis {(|0⟩ + e^{iπ/4}|1⟩)/√2, (|0⟩ − e^{iπ/4}|1⟩)/√2}, i.e. a basis containing |ψ(π/4)⟩. Measuring with respect to F gives rise to the probability distribution

p_F(1|θ) = cos²((θ − π/4)/2), p_F(2|θ) = sin²((θ − π/4)/2).

Define an estimator θ̂_F by θ̂_F(1) = π/4, θ̂_F(2) = π/2. Note in particular that θ̂_F has zero risk at π/4, and thus we have a non-constant estimator with zero risk at π/4. If M were optimal, there would exist some estimator θ̂_M such that θ̂_M ≤ θ̂_F. But then we need θ̂_M to have zero risk at π/4, and from equation (10) (both outcomes of M occur with non-zero probability at θ = π/4) this is only possible with the constant estimator θ̂_M(+) = θ̂_M(−) = π/4. However, in this case R(θ̂_F, π/2) < R(θ̂_M, π/2), and we deduce that M ⋠ F. Thus M is certainly not optimal. Note that M and F both have maximal Fisher information, and thus both measurements have the same asymptotic, local estimation ability. However, within global estimation they are not interchangeable; M cannot discriminate between [0, π) and [π, 2π), whereas F cannot discriminate between [π/4, 5π/4) and [5π/4, 9π/4). Thus, given a large number of copies, it would be a poor strategy to measure all of the states with M, or all of them with F, as one could not resolve the corresponding reflection ambiguity in θ. Thus, in practice, one would use a collection of different measurements; see for example [15].
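The risk comparison in this example can be checked numerically. The sketch below assumes the state (|0⟩ + e^{iθ}|1⟩)/√2 and takes F to measure in a basis containing |ψ(π/4)⟩; these choices are our reading of the example and should be treated as assumptions:

```python
import numpy as np

def p_M(theta):
    # Outcome probabilities for M = {|+>, |->} on (|0> + e^{i theta}|1>)/sqrt(2).
    return np.array([(1 + np.cos(theta)) / 2, (1 - np.cos(theta)) / 2])

def p_F(theta):
    # F measures in the basis containing |psi(pi/4)> and its orthogonal complement.
    return np.array([np.cos((theta - np.pi / 4) / 2) ** 2,
                     np.sin((theta - np.pi / 4) / 2) ** 2])

def risk(probs, estimates, theta):
    # Least-squares risk of a two-outcome estimator at the point theta.
    return float(np.dot(probs(theta), (np.array(estimates) - theta) ** 2))

# F's estimator (pi/4, pi/2) has zero risk at pi/4 ...
print(risk(p_F, [np.pi / 4, np.pi / 2], np.pi / 4))
# ... so any competing estimator for M must be the constant pi/4,
# which F's estimator then beats at theta = pi/2:
print(risk(p_M, [np.pi / 4, np.pi / 4], np.pi / 2))
print(risk(p_F, [np.pi / 4, np.pi / 2], np.pi / 2))
```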
The above argument generalises to other pure states. Suppose one has some (generically multiparameter) parametrised pure state |ψ(θ)⟩. Then for any fixed θ*, letting F_θ* be a measurement in some basis containing |ψ(θ*)⟩, one can clearly construct an estimator with zero risk at θ*. However, in general, the only estimators based on F_θ* with zero risk at some θ ≠ θ* will be constant. Thus we expect none of the {F_θ* | θ* ∈ Θ} to be optimal, and thus expect no optimal measurement for |ψ(θ)⟩.

Approximately Optimal Measurements
Since the previous section shows that optimal measurements (under least-squares loss) only exist for the very restrictive class of classical parametrised states, it is natural to ask whether measurements can be approximately optimal. There are three clear candidates for defining when a measurement is approximately optimal:

(i) A measurement M is ε-additively optimal if, for any other measurement-estimator pair (F, θ̂_F), there exists an estimator θ̂_M such that, for all θ ∈ Θ, R(θ̂_M, θ) ≤ R(θ̂_F, θ) + ε.

(ii) A measurement M is η-multiplicatively optimal if, for any other measurement-estimator pair (F, θ̂_F), there exists an estimator θ̂_M such that, for all θ ∈ Θ, R(θ̂_M, θ) ≤ η R(θ̂_F, θ).

(iii) A measurement M is δ-locally optimal if, for any other measurement-estimator pair (F, θ̂_F), there exists an estimator θ̂_M and a subset S ⊆ Θ of volume (measure) at most δ such that, for all θ ∈ Θ \ S, R(θ̂_M, θ) ≤ R(θ̂_F, θ).

We can combine the third definition with either of the first two. For example, a measurement is ε-additively and δ-locally optimal if, for any other measurement-estimator pair (F, θ̂_F), there exists an estimator θ̂_M and a subset S ⊆ Θ of volume at most δ such that, for all θ ∈ Θ \ S, R(θ̂_M, θ) ≤ R(θ̂_F, θ) + ε.

For each of the three definitions of approximate optimality, there is a corresponding notion of "closeness" of quantum states such that if ρ(θ) is close to being classical, there is an approximately optimal measurement: approximately classical implies an approximately optimal measurement. We briefly review each of the results here, the first two of which are proved in Appendix C; the proofs consist of a series of technical inequalities. We start with additive optimality, considering closeness in trace norm: for any matrix A, define its trace norm as ||A||₁ = Tr√(A†A). The first result states, informally, that if there is a classical parametrised state σ(θ) for which sup_{θ∈Θ} ||ρ(θ) − σ(θ)||₁ is small, then ρ(θ) admits an ε-additively optimal measurement, with ε controlled by this trace-norm distance. For the multiplicative error, we need a different notion of closeness, namely the maximum relative entropy [16]: for quantum states ρ and σ, we define D_max(ρ||σ) = log min{λ : ρ ≤ λσ}. The multiplicative result then states, informally, that if σ(θ) is classical and D_max(ρ(θ)||σ(θ)) and D_max(σ(θ)||ρ(θ)) are uniformly small, then ρ(θ) admits an η-multiplicatively optimal measurement with η close to one. The final closeness result is for local optimality. For a subset of real vectors Y ⊆ R^p, denote its volume (Lebesgue measure) by |Y|. If ρ restricted to some Γ ⊆ Θ with |Θ \ Γ| ≤ δ is classical, then ρ admits a δ-locally optimal measurement. Proof.
Take the optimal measurement for ρ|_Γ, which exists by Theorem 3.1 since ρ restricted to Γ is classical. This clearly has the desired property.
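The two notions of closeness used above are straightforward to compute for explicit states. A small illustrative sketch (the diagonal qubit states are arbitrary, and the base of the logarithm in D_max is a convention):

```python
import numpy as np

def trace_norm(A):
    # ||A||_1 = Tr sqrt(A^dag A) = sum of singular values of A.
    return np.sum(np.linalg.svd(A, compute_uv=False))

def d_max(rho, sigma):
    # Maximum relative entropy: log of the smallest lambda with rho <= lambda * sigma,
    # i.e. log of the largest eigenvalue of sigma^{-1/2} rho sigma^{-1/2}.
    # (Base-2 logarithm chosen here as a convention.)
    lam, U = np.linalg.eigh(sigma)
    inv_sqrt = U @ np.diag(lam ** -0.5) @ U.conj().T
    return np.log2(np.max(np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt)))

rho = np.diag([0.6, 0.4])
sigma = np.diag([0.5, 0.5])
print(trace_norm(rho - sigma))   # trace-norm distance
print(d_max(rho, sigma))         # maximum relative entropy
```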
Note that it is not expected that being close to a classical state is necessary for an approximately optimal measurement to exist. For example, pure states |ψ(θ)⟩ where the quantum Fisher information can be saturated are expected to have an approximately optimal measurement for a large number of copies |Ψ(θ)⟩ = |ψ(θ)⟩^⊗n, n ≫ 1. However, a large number of copies of a pure state is still a pure state, and thus such states are far from being classical.

Admissibility of Measurements
Our definition of when a measurement is better than another (M ≺ F) naturally gave rise to a definition of admissibility: a measurement M is admissible if no other measurement dominates it. In this section we investigate two classes of intuitively "bad" measurements and demonstrate their inadmissibility under some mild assumptions. To state these assumptions, we must introduce a specific class of loss functions, the Bregman divergences [17]. Specifically, for a convex set Θ ⊆ R^p and any real-valued, continuously differentiable, strictly convex function g(θ), the Bregman divergence associated to g is given by

L_g(θ₁, θ₂) = g(θ₁) − g(θ₂) − ⟨∇g(θ₂), θ₁ − θ₂⟩.

A Bregman divergence is any loss function that can be written in such a fashion. For example, least squares is a Bregman divergence with g(θ) = ||θ||², and the Kullback-Leibler divergence is a Bregman divergence with g(θ) = Σ_i θ_i log(θ_i). Bregman divergences may be viewed as generalisations of the least-squares loss function. Appendix D reviews two of their properties that we require for our results.

The first class of measurements we discuss are those that can be fine-grained to extract more information from a system. Suppose that one has some parameter estimation problem ρ(θ) and uses some measurement F for which one knows the post-measurement states. That is to say, F = {F_i†F_i} and, in the case of outcome i, the post-measurement state is ρ_i(θ) = F_iρ(θ)F_i†/p_i(θ), where p_i(θ) = Tr(F_i†F_iρ(θ)). Intuitively, if some ρ_i still depends on θ, then one can, in the case of outcome i, perform another measurement on the system to extract more information about the unknown parameter, refining our measurement. Recall that ρ might be many copies of some density matrix, ρ = σ^⊗n, so that outcome i could be the result of many measurements of individual density matrices. More precisely, we say that F is refineable if there exists an outcome (WLOG the first) as well as θ₁, θ₂ ∈ Θ such that ρ₁(θ₁) ≠ ρ₁(θ₂) and p₁(θ₁), p₁(θ₂) > 0.³
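The Bregman-divergence formula is easy to instantiate. A sketch verifying the two examples named above (the specific points and distributions are illustrative):

```python
import numpy as np

def bregman(g, grad_g, x, y):
    # D_g(x, y) = g(x) - g(y) - <grad g(y), x - y>
    return g(x) - g(y) - np.dot(grad_g(y), x - y)

# g(x) = ||x||^2 recovers least squares:
sq = lambda x: np.dot(x, x)
sq_grad = lambda x: 2 * x
x, y = np.array([1.0, 2.0]), np.array([0.0, 4.0])
print(bregman(sq, sq_grad, x, y), np.dot(x - y, x - y))

# g(p) = sum_i p_i log p_i recovers the KL divergence on the simplex:
negent = lambda p: np.sum(p * np.log(p))
negent_grad = lambda p: np.log(p) + 1
p, q = np.array([0.2, 0.8]), np.array([0.5, 0.5])
kl = np.sum(p * np.log(p / q))
print(bregman(negent, negent_grad, p, q), kl)
```

The "+1" term in the gradient drops out on the simplex because the components of p − q sum to zero.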
In Appendix D, we prove the following lemma (Lemma 5.1), which shows that such measurements are indeed inadmissible. Proof Sketch. Constructing an M such that M ⪯ F is relatively straightforward: in the case of the first outcome, our state still depends on the parameter θ, so one can measure it again. We are always free to ignore the result of this subsequent measurement, and thus it is clear that M ⪯ F. The proof that F ⋠ M is somewhat technical and involves the framework of (classical) Bayesian estimation [6], which is outlined in Appendix A.
The second case of inadmissibility that we consider is when a measurement does not extract information from a system. For example, one could perform a measurement with a single outcome, or measure in a basis mutually unbiased with respect to the basis in which the parameter is encoded. Formally, this corresponds to the case where the measurement outcome probabilities are independent of the underlying parameter. Again, in Appendix D, we prove the following lemma: Lemma 5.2. Let Θ ⊆ R^N, L be some Bregman divergence loss function on Θ, and ρ(θ) be some non-constant parametrised state. Suppose that F = {F_k} is a measurement whose outcome probabilities are independent of θ. Then F is inadmissible.
Proof Sketch. There are two main ideas in this proof. First, we show that the only admissible estimators for F are the constant estimators (described above), and thus any measurement M satisfies M ⪯ F. Second, we find an M dominating F using the Bayesian framework.
Note that both lemmas show that the identity measurement (i.e. doing nothing and just guessing) is admissible iff the state ρ is constant. Thus, no matter how weakly a system varies with the parameter, there is always a measurement that extracts useful information from it.

Conclusion
In conclusion, we have defined the most general way in which the performance of quantum measurements for parameter estimation can be compared. As we have argued, any other comparison would require additional modelling assumptions. Remarkably, we demonstrated a class of parametrised states, the classical ones, that admit optimal measurements even at this level of generality. Under stricter assumptions, we have shown that these are the only such states. Since only a very restrictive class of parametrised states admits optimal measurements, we proposed several criteria under which a measurement may be considered approximately optimal. An interesting further direction of research is characterising when a state has an approximately optimal measurement; we have given a selection of sufficient conditions but, as argued, they are not necessary. In particular, understanding approximately optimal measurements in the asymptotic regime could provide a new perspective on well-established ideas, such as the quantum Fisher information. Finally, we have demonstrated two inadmissible classes of measurements: those that can be refined and those that do not extract information from the quantum state of interest. Another possible direction of further research would be to characterise the admissible estimators for a given parametrised quantum state.

A Bayesian estimation
In this section we introduce Bayesian estimation, which we will need for the proofs of the results from the main text. In a Bayesian setting, one has some a priori belief about the underlying parameter [9]. For example, one may believe that it is likely to be centred around some value, and correspondingly very unlikely to be far away from it. Formally, this corresponds to a probability distribution π(θ) on the parameter space. A common choice is a normally distributed prior N(µ, Σ), encoding an expected value µ of the parameter along with a decay of probability away from µ governed by Σ. Since one now has a probability distribution on the parameter space, one can consider the risk of an estimator averaged across different possible values of the parameter, rather than at a specific parameter value. That is, one defines the Bayes risk corresponding to π as

R_π(θ̂) = ∫_Θ R(θ̂, θ)π(θ) dθ.

An estimator θ̂_B is said to be Bayes if it minimises the Bayes risk; that is, for any other estimator θ̂, R_π(θ̂_B) ≤ R_π(θ̂).
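As a concrete classical instance, consider the binomial example from the main text with a Beta prior (an illustrative choice): under least-squares loss the Bayes estimator is the posterior mean, which is available in closed form and can be checked by numerical integration:

```python
import numpy as np

def bayes_estimate(x, n, a=1.0, b=1.0):
    # With a Beta(a, b) prior on q and X ~ B(n, q), the posterior after seeing
    # x successes is Beta(a + x, b + n - x); under least-squares loss the Bayes
    # estimator is the posterior mean (a + x) / (a + b + n).
    return (a + x) / (a + b + n)

def bayes_estimate_numeric(x, n, a=1.0, b=1.0):
    # Sanity check: compute the posterior mean by direct numerical integration.
    q = np.linspace(1e-6, 1 - 1e-6, 20001)
    post = q**(a + x - 1) * (1 - q)**(b + n - x - 1)   # unnormalised posterior
    return float(np.sum(q * post) / np.sum(post))

print(bayes_estimate(7, 10), bayes_estimate_numeric(7, 10))
```

Note how the Bayes estimate shrinks the raw proportion x/n towards the prior mean, the characteristic effect of the prior.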
In the quantum case, where one has a state ρ(θ), one must minimise the Bayes risk over estimators based on a fixed measurement, and then additionally minimise this across all possible measurements. Thus one searches for a measurement-estimator pair that minimises the Bayes risk. In such a case, both the measurement and the estimator are described as Bayes.
In the single-parameter case (Θ ⊆ R), we also introduce some notation: for any measurement-estimator pair (F, θ̂_F), we define Λ = Σ_i θ̂_F(i)F_i and Λ₂ = Σ_i θ̂_F(i)²F_i, where the F_i are the measurement operators of F. For any prior π(θ) on Θ, we define the two operators ρ̄ = ∫ ρ(θ)π(θ) dθ and B = ∫ θρ(θ)π(θ) dθ.

B Proof of Theorem 3.2

In this section, we prove that only classical parametrised quantum states have optimal measurements. We specialise to the case that our parameter can be expressed as a real vector, Θ ⊆ R^N, and that we use least-squares loss, L(θ₁, θ₂) = ||θ₁ − θ₂||². We now present the proof of Theorem 3.2 as a series of lemmas. We first restrict to the single-parameter (N = 1) case. We fix θ₁, θ₂ ∈ Θ, consider the restricted parameter space {θ₁, θ₂} ⊆ Θ and show that ρ(θ₁) and ρ(θ₂) commute. This is achieved by considering the Bayesian measurements associated with all possible priors on {θ₁, θ₂}. This then shows that if an optimal measurement exists on Θ, all of {ρ(θ) : θ ∈ Θ} must commute. We use this to simultaneously diagonalise the {ρ(θ)} and thus deduce that ρ is classical. By restricting to lines in higher-dimensional parameter spaces, we use the single-parameter case to prove the multiparameter case.

Proof. Note that this lemma, and most of this proof, is presented in [4] as a sufficient condition; we prove that the conditions are also necessary. Fix some measurement-estimator pair (F, θ̂_F). Note that we may write the Bayes risk in terms of Λ and Λ₂ (in the case of least-squares loss):

R_π(θ̂_F) = Tr(Λ₂ρ̄) − 2Tr(ΛB) + ∫ θ²π(θ) dθ.

We note that

Σ_i (θ̂_F(i) − Λ)†F_i(θ̂_F(i) − Λ) = Λ₂ − Λ² ≥ 0,

and thus Tr(Λ₂ρ̄) ≥ Tr(Λ²ρ̄). Since we want to minimise the Bayes risk, we now check when this inequality is saturated. Note that, for any positive semi-definite operator A, Tr(Aρ̄) = Tr(√ρ̄ A √ρ̄) = 0 iff A = 0, where we used the assumption that ρ̄ is full rank, and hence √ρ̄ is invertible. Thus we saturate the inequality iff Λ₂ − Λ² = 0.
But since this is a sum of positive semi-definite operators, this can only happen if, for each i, (θ̂_F(i) − Λ)†F_i(θ̂_F(i) − Λ) = 0. If θ̂_F(i) is not an eigenvalue of Λ, then F_i = 0; otherwise, F_i must only have support on the θ̂_F(i)-eigenspace of Λ. Since Λ = Σ_i F_iθ̂_F(i), condition (i) follows. On saturation of the inequality, we see that finding a Bayes measurement reduces to minimising the quantity Tr(Λ²ρ̄) − 2Tr(ΛB) over Hermitian operators Λ. By differentiating with respect to Λ and setting the derivative to zero, we see that there is a unique minimum and that it satisfies the equation Λρ̄ + ρ̄Λ = 2B, which is condition (ii).
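The stationarity condition Λρ̄ + ρ̄Λ = 2B, as we read it from the surrounding derivation, is a Sylvester-type equation and can be solved in the eigenbasis of ρ̄. A sketch with hypothetical matrices (not taken from the paper):

```python
import numpy as np

def solve_stationarity(rho_bar, B):
    # Solve Lambda @ rho_bar + rho_bar @ Lambda = 2 B for Hermitian Lambda
    # by expanding in the eigenbasis of rho_bar (assumed full rank):
    # in that basis, Lambda_ij = 2 B_ij / (lam_i + lam_j).
    lam, U = np.linalg.eigh(rho_bar)
    B_tilde = U.conj().T @ B @ U
    Lam_tilde = 2 * B_tilde / (lam[:, None] + lam[None, :])
    return U @ Lam_tilde @ U.conj().T

# Toy check with hypothetical 2x2 data:
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho_bar = A @ A.conj().T + np.eye(2)      # Hermitian and full rank
Bm = rng.normal(size=(2, 2))
Bm = (Bm + Bm.T) / 2                      # Hermitian right-hand side
Lam = solve_stationarity(rho_bar, Bm)
print(np.allclose(Lam @ rho_bar + rho_bar @ Lam, 2 * Bm))
```

Because ρ̄ is positive definite, the denominators λ_i + λ_j never vanish, which is why the solution exists and is unique.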

Lemma B.3. Suppose ρ(θ) is some single-parameter state (Θ ⊆ R) with an optimal measurement M under least-squares loss. Fix distinct θ₁, θ₂ ∈ Θ. If ρ(θ₁) and ρ(θ₂) both have full rank, then there exist simultaneously diagonalisable Hermitian maps Γ and Γ′ satisfying Γρ(θ₂) + ρ(θ₂)Γ = ρ(θ₁) and Γ′ρ(θ₁) + ρ(θ₁)Γ′ = ρ(θ₂).

Proof. For notational convenience, let ρ_i = ρ(θ_i) for i = 1, 2. For p ∈ (0, 1), fix a prior on Θ where we are given ρ₁ with probability p and ρ₂ with probability 1 − p. Note that for any p ∈ (0, 1), ρ̄ = pρ₁ + (1 − p)ρ₂ has full rank. By Lemma B.1, we see that M must be Bayes for any value of p; let θ̂_M^p be a Bayes estimator for a fixed value of p. Then Λ = Σ_i M_iθ̂_M^p(i), and M must satisfy condition (i) of Lemma B.2. Then, as a function of p, we may expand Λ = Σ_k µ_k(p)|k⟩⟨k| in an eigenbasis that does not depend on p. By condition (ii) of Lemma B.2, for any value of p ∈ (0, 1) we know that Λ satisfies

Λρ̄ + ρ̄Λ = 2B = 2(pθ₁ρ₁ + (1 − p)θ₂ρ₂). (34)

As well as ρ̄ being invertible, its smallest eigenvalue λ_min is bounded below by the minimum of the eigenvalues of ρ₁ and ρ₂. Consider equation (34) in the limit p → 0. Let Γ be defined implicitly as the solution to the equation

Γρ₂ + ρ₂Γ = ρ₁. (35)

Note that since ρ₂ is invertible, by expanding equation (35) in the eigenbasis of ρ₂, we see that Γ is well defined and is furthermore Hermitian. For some "remainder" matrix R, we expand Λ as Λ = θ₂1 + 2p(θ₁ − θ₂)Γ + R. Substituting this ansatz into equation (34), we see that R must satisfy the equation

Rρ̄ + ρ̄R = 2p²(θ₁ − θ₂)(ρ₁ − Γρ₁ − ρ₁Γ). (36)

Expanding in an eigenbasis of ρ̄ and recalling that all the eigenvalues of ρ̄ are bounded below by λ_min (which is independent of p), we see that R = O(p²). Consider an eigenstate |ψ⟩ of Λ with eigenvalue µ(p); then note that

2p(θ₁ − θ₂)Γ|ψ⟩ = (µ(p) − θ₂)|ψ⟩ − R|ψ⟩. (37)

Dividing by 2p(θ₁ − θ₂) and taking the limit as p → 0 (using R = O(p²)), we see that |ψ⟩ must be an eigenstate of Γ. But, by symmetry, in the limit p → 1 we may define Γ′ by the equation

Γ′ρ₁ + ρ₁Γ′ = ρ₂, (38)

and |ψ⟩ must also be an eigenstate of Γ′. Thus Γ and Γ′ are simultaneously diagonalisable in Λ's eigenbasis.
Lemma B.4. Let A and B be positive-definite matrices, and suppose that there exist simultaneously diagonalisable Hermitian maps Γ and Γ′ satisfying

ΓB + BΓ = A, Γ′A + AΓ′ = B. (39)

Then A and B commute.
Proof. Let Γ and Γ′ be simultaneously diagonalisable in the orthonormal basis {|i⟩}_{i=1}^n. Let Γ|i⟩ = λ_i|i⟩, Γ′|i⟩ = µ_i|i⟩, ⟨i|A|j⟩ = A_ij and ⟨i|B|j⟩ = B_ij, for i, j = 1, ..., n. By positive definiteness, note that A_ii, B_ii > 0 for any i. Rewriting equations (39) in components of this basis, we reach the set of equations

(λ_i + λ_j)B_ij = A_ij, (µ_i + µ_j)A_ij = B_ij. (40)

Setting i = j, we deduce that

4λ_iµ_i = 1. (41)

Note that equation (40) shows A_ij = 0 ⇔ B_ij = 0. Suppose A_ij ≠ 0; then multiplying the two individual equations in (40) and dividing by A_ijB_ij, we see that

(λ_i + λ_j)(µ_i + µ_j) = 1. (42)

Expanding this using equation (41) gives λ_i/λ_j + λ_j/λ_i = 2. But x + 1/x = 2 iff x = 1, and thus we deduce

λ_i = λ_j whenever A_ij ≠ 0. (43)

Consider a graph G on n nodes, with an edge between i and j iff A_ij ≠ 0. If G is connected, then for any nodes i, j in G there is a path between them. Assuming connectedness, applying the result of equation (43) along such paths, and noting from equation (40) that A_ii = 2λ_iB_ii, we deduce that A_iiB_jj = B_iiA_jj for every i, j = 1, ..., n. But then, summing over j, we deduce that A_ii = (Tr(A)/Tr(B))B_ii. Substituting this into equations (40) and (41), we see that A and B are proportional and thus commute. If G is not connected, then we can apply the above procedure to each connected component of G. The result is that A and B are block diagonal, with corresponding blocks proportional, and thus A and B commute.
Proof. Again, for notational convenience, let ρ i = ρ(θ i ) for i = 1, 2. Redefine our parameter space Θ = {θ 1 , θ 2 }, but still allow estimators to take any real value. Note, by definition, that M must also be optimal for this parameter estimation problem. Suppose that U = ker(ρ 2 ) ∩ ker(ρ 1 ) = 0. Let U ⊥ be the orthogonal compliment of U and let Π U ⊥ be the orthogonal projection matrix onto U ⊥ . Then note that replacing M with the measurement does not chance any of the measurement probabilities when measuring ρ 1 or ρ 2 . Thus we may WLOG restrict to U ⊥ , which still has an optimal measurement for this restricted parameter estimation problem. Suppose that ρ 2 is not full rank, i.e. it has some nontrivial kernel K ≤ H. Take the measurement E of the two orthogonal projectors Π K and Π K ⊥ , along with an estimatorθ E (K) = θ 1 ,θ E (K ⊥ ) = θ 2 so that Since M is optimal, we have that M E. Then, in particular, there must be an estimatorθ M satisfying θ M ≤θ E . In particular,θ M must have zero risk at θ 2 and thus if Tr(ρ 2 M i ) = 0, then we must haveθ M (i) = θ 2 . Let I = {i | Tr(ρ 2 M i ) = 0}, then, by the above, θ M must satisfy with equality iff. for every i ∈ I,θ M (i) = θ 1 . WLOG Assume that this holds, so that we saturate equation (46). If Tr(ρ 2 M i ) = 0, then, as M i ≥ 0, every eigenvector of M i with non-zero eigenvalue must lie in K and thus i∈I M i ≤ Π K . Thus, R(θ M , θ 1 ) ≥ R(θ E , θ 2 ) with equality iff. ρ 1 Π K − i∈I M i = 0. Since ρ 1 does not kill any vector in K (as we restricted to U ⊥ ) we get equality iff. i∈I M i = Π K . Thus for i / ∈ I and |k ∈ K, M i |k = 0. Now, for p ∈ (0, 1) fix a prior on Θ where we are given ρ 1 with probability p and ρ 2 with probability 1 − p.
Taking Λ corresponding to M by Lemma B.1, we see that Λ decomposes as Λ = Λ_K ⊕ Λ_{K⊥} with respect to H = K ⊕ K^⊥. In order for our estimator to be Bayes, we must always guess θ₁ in the case of a K outcome, i.e. Λ_K = θ₁Π_K. Take the inner product of condition (ii) of Lemma B.2 with |k⟩ ∈ K and |ℓ⟩ ∈ K^⊥. This gives

⟨ℓ|Λ_{K⊥}ρ₁|k⟩ = θ₁⟨ℓ|ρ₁|k⟩. (47)

Note that as p → 0, we must have Λ_{K⊥} → θ₂Π_{K⊥}, as our estimates must approach θ₂ for our estimator to be Bayes. Thus the only way for equation (47) to be satisfied for all p ∈ (0, 1) is for ⟨ℓ|ρ₁|k⟩ = 0. Thus ρ₁ preserves K, and we may decompose it as

ρ₁ = ρ₁|_K ⊕ ρ₁|_{K⊥}. (48)

This essentially reduces the Hilbert space to K^⊥, on which ρ₂ has full rank. We may then repeat the above argument to assume that ρ₁ also has full rank (note that, as we restricted to U^⊥, ker(ρ₁) ≤ K^⊥). By Lemmas B.3 and B.4, ρ₁ and ρ₂ commute on their joint support. But as they preserve each other's kernels and are Hermitian, they must, therefore, fully commute.
Lemma B.6. Suppose ρ(θ) is some single-parameter state (Θ ⊆ R). If there is an optimal measurement M under least-squares loss, then ρ(θ) is a classical parametrised state.
We remark that the condition of convexity in Theorem 3.2 can be weakened. By Lemma B.2, we see that the optimal measurement must be a fine-graining of the projection onto ρ's joint eigenspaces. But then for any two distinct sets on which ρ is classical, we see that it must be classical on the union of these sets too.
The result then holds for Θ a disjoint union of convex sets too. Moreover, if Θ is open, around any θ ∈ Θ we can fit a convex set in which ρ must be classical. But then, by the same reasoning as before, we see that ρ must be classical on the whole of Θ. We do not prove this result in full detail, as most parameter spaces of interest are convex.

C Approximately Classical Implies an Approximately Optimal Measurement
In this section we provide the remaining two proofs of the "close to classical implies close to optimal" results from Section 4. It will be useful to slightly extend our notation for risk functions to include a label for the state under consideration, i.e. we write R_ρ(θ̂_M, θ). We start with the first (additive) result.

Proof. Since σ(θ) is classical, we can fix some optimal measurement M for it. Fix some measurement-estimator pair (F, θ̂_F) and θ ∈ Θ. Then note that

|R_ρ(θ̂_F, θ) − R_σ(θ̂_F, θ)| ≤ Σ_i L(θ̂_F(i), θ)|Tr(F_i(ρ(θ) − σ(θ)))|. (51)

Fix some F_i ≥ 0. Diagonalising ρ(θ) − σ(θ) = Σ_j λ_j|j⟩⟨j|, note that

|Tr(F_i(ρ(θ) − σ(θ)))| ≤ Σ_j |λ_j|⟨j|F_i|j⟩ = Tr(F_i|ρ(θ) − σ(θ)|). (53)

Substituting inequality (53) into (51) and using Σ_i F_i = 1, we see that the two risks differ by at most a constant (a bound on the loss) times ||ρ(θ) − σ(θ)||₁. (54) Since M is optimal on σ(θ), there exists θ̂_M such that for all θ, R_σ(θ̂_M, θ) ≤ R_σ(θ̂_F, θ). Then applying (54) twice (once for each state), the additive result follows.

For the multiplicative error, we will make use of the following property of the maximum relative entropy (Lemma C.1): if D_max(ρ||σ) ≤ log η, then ρ ≤ ησ, and hence Tr(F_iρ) ≤ ηTr(F_iσ) for any F_i ≥ 0.

Proof. Fix some measurement F and note that

R_ρ(θ̂_F, θ) = Σ_i L(θ̂_F(i), θ)Tr(F_iρ(θ)) ≤ η Σ_i L(θ̂_F(i), θ)Tr(F_iσ(θ)) = ηR_σ(θ̂_F, θ), (60)

where we have used Lemma C.1. But then, fixing an optimal measurement for σ and using inequality (60) twice, the result follows.

D Admissibility of Measurements
The aim of this section is to prove that the two classes of measurements discussed in Section 5 are inadmissible.
To begin, we must prove a series of technical results about Bregman divergences and Bayesian estimation (see Appendix A). There are two properties of Bregman divergences that we will need, stated below in Lemma D.1. We will not prove them, instead referring the reader to [17].
Lemma D.1. Let L a Bregman divergence. Then (i) L is strictly convex in its first argument.
(ii) For any prior π on Θ, the Bayes estimator θ̂_B is unique and is given by the posterior mean, θ̂_B(x) = E_π[θ | outcome x].
Next we prove some technical lemmas to do with Bayesian estimation.
The second of these conditions ensures that outcome one is possible at θ₁ and θ₂; otherwise, the post-measurement state, as defined above, is not well-defined. We can now prove the first inadmissibility result from the main text.
Lemma 5.1. Let Θ ⊆ R^N, L be some Bregman divergence loss function on Θ, and ρ(θ) be some non-constant parametrised state. Suppose that F = {F_i†F_i} is a refineable measurement; then F is inadmissible.