Operational meanings of a generalized conditional expectation in quantum metrology

A unifying formalism of generalized conditional expectations (GCEs) for quantum mechanics has recently emerged, but its physical implications regarding the retrodiction of a quantum observable remain controversial. To address the controversy, here I offer operational meanings for a version of the GCEs in the context of quantum parameter estimation. When a quantum sensor is corrupted by decoherence, the GCE is found to relate the operator-valued optimal estimators before and after the decoherence. Furthermore, the error increase, or regret, caused by the decoherence is shown to be equal to a divergence between the two estimators. The real weak value, as a special case of the GCE, plays the same role in suboptimal estimation: its divergence from the optimal estimator is precisely the regret for not using the optimal measurement. As an application, I show that the GCE enables the use of dynamic programming for designing a controller that minimizes the estimation error. For the frequentist setting, I show that the GCE leads to a quantum Rao-Blackwell theorem, which has significant implications for quantum metrology and, in particular, thermal-light sensing. These results give the GCE and the associated divergence a natural, useful, and incontrovertible role in quantum decision and control theory.


Introduction
The conditional expectation is an essential concept in classical probability and statistics [1]. Given some observed data in an experiment, the conditional expectation of a hidden random variable is the best approximation of the hidden variable in a least-squares sense and thus plays a central role in Bayesian estimation theory [1,2]. Another important application is in the Rao-Blackwell theorem [3,4], which exploits the variance-reduction property of the conditional expectation to improve an estimator and has found widespread use in classical statistics [5,6,7].
Many attempts have been made over the past few decades to generalize the concept of conditional expectation for quantum mechanics [8,9,10,11,12,13,14,15,16,17,18]. Umegaki's version for von Neumann algebra may be the earliest [8]. His axiomatic definition is so restrictive, however, that his conditional expectation does not exist in many situations [19,12]; this existence problem has led Holevo to remark that "conditional expectations play a less important part in quantum than in classical probability" [19]. In quantum estimation theory, Personick [9] and Belavkin and Grishanin [10] proposed an operator-valued estimator that is optimal for Bayesian parameter estimation and can also be regarded as a quantum conditional expectation. On the other hand, Accardi and Cecchini proposed yet another conditional expectation for von Neumann algebra [11], which became instrumental in Petz's work on quantum sufficient channels [12]. Many other investigations of quantum conditional expectations can be found in the literature on weak values [13,14], quantum filtering [20,21,22], quantum retrodiction [23,24], and quantum smoothing [25,26,27,28,29,30,16,17,18]. In recent years, it has been recognized [16,17,18] that many of these quantum conditional expectations can be unified under a mathematical formalism of generalized conditional expectations (GCEs) [15]. The GCE formalism can also be rigorously connected to the concepts of quantum states over time and generalized Bayes rules [31,32], as shown by Parzygnat and Fullwood [33].
Despite the mathematical progress, the GCEs have provoked fierce debates regarding their physical meaning and usefulness, especially when it comes to the weak values [34,35,36,37,38,39,40]. The debates center on two issues: whether it makes any sense to estimate the value of a quantum observable in the past (retrodiction) and whether the GCEs offer any use in quantum metrology, where quantum sensors are used to estimate classical parameters. This work addresses both issues by demonstrating how a certain version of the GCEs, of which the real weak value is a special case, can play fundamental roles in quantum parameter estimation in both Bayesian and frequentist settings.
When a quantum sensor suffers from decoherence, I show that the GCE relates the two Personick estimators before and after the decoherence. Moreover, the error increase due to the decoherence, henceforth called the regret, is shown to be equal to a divergence measure between the two estimators. By regarding a suboptimal measurement as a decoherence process, I show that the weak value is a special case of the GCE and that its divergence from the Personick estimator is precisely the regret due to the measurement suboptimality. For the frequentist setting, I also propose a quantum Rao-Blackwell theorem based on the GCE.
These fundamental results lead to many significant consequences in quantum metrology. To wit, the Markovian nature of the GCE is shown to enable the use of dynamic programming [41] for optimizing a measurement protocol, while Corollaries 1-6 in this work reveal the monotonicity of the Bayesian error, the optimality of von Neumann measurements in Bayesian and frequentist settings, the optimality of symmetric estimators for symmetric states, the optimality of direct-sum estimators for direct-sum states, and the optimality of photon counting for certain thermal states. A key feature of these optimality results is that they are direct statements about the mean-square errors and are valid for both biased and unbiased estimators, unlike many results based on Cramér-Rao-type bounds, which require heavy assumptions about the estimators and the density operators.
This paper is organized as follows. To set the stage and make the paper self-contained, Sec. 2 reviews the concept of GCEs, emphasizing their significance in minimizing a divergence quantity between two operators at different times [18]. Section 3 presents some fundamental properties of the GCEs that are key to their applications to quantum metrology, including a chain rule (Theorem 1) that gives the GCEs a Markovian property for a sequence of channels and a Pythagorean theorem (Theorem 2) that gives the divergence an additive property. Sections 4 and 5 present the core results of this work, namely, the applications of a version of the GCEs to quantum parameter estimation. This GCE follows a particular operator ordering based on the Jordan product and is shown to play a natural role in quantum estimation theory.
Section 4 studies the role of the GCE in Bayesian quantum parameter estimation, a topic that has received renewed interest in recent years [42,43,44]. Within Sec. 4, Sec. 4.1 presents the general relations between the Personick estimators for a sensor under decoherence, Sec. 4.2 shows how they enable the use of dynamic programming in quantum sensor measurement design, and Sec. 4.3 discusses the special case of the real weak value.
Section 5 switches to the frequentist setting and presents the quantum Rao-Blackwell theorem, Theorem 3, in Sec. 5.1. Sections 5.2-5.4 present some significant consequences of the quantum theorem for quantum metrology, while Sec. 5.5 discusses an application of the theorem to thermal-light sensing.
Section 6 is the conclusion, listing some open problems. Appendix A discusses the complementary concept of quantum prediction. Appendix B reviews the classical conditional expectation to give the quantum formalism a more familiar context. Appendix C gives an explicit formula for the GCE for Gaussian systems. Appendix D defines the von Neumann measurement. Appendix E presents the dynamic-programming algorithm. Appendix F justifies the name of Theorem 3 by deriving the classical Rao-Blackwell theorem from it. Appendix G discusses the differences and relations between the Bayesian and frequentist settings. Appendix H compares this work with some prior works. Appendix I offers an alternative derivation of the quantum U-statistics, first introduced by Guţă and Butucea [45], using Theorem 3. Appendix J contains the more technical proofs.

Review of generalized conditional expectations
This section follows Ref. [18] and Chap. 6 in Ref. [15]. Let O(H) be the space of bounded operators on a Hilbert space H and ρ ∈ O(H) be a density operator. Define an inner product between two operators A, B ∈ O(H) and a norm as
$$\langle B, A\rangle_\rho \equiv \operatorname{tr} B^\dagger E_\rho A, \qquad \|A\|_\rho \equiv \sqrt{\langle A, A\rangle_\rho}, \tag{2.1}$$
where $E_\rho: O(H)\to O(H)$ is a linear, self-adjoint, and positive-semidefinite map with respect to the Hilbert-Schmidt inner product
$$\langle B, A\rangle_{\mathrm{HS}} \equiv \operatorname{tr} B^\dagger A. \tag{2.2}$$
The weighted inner product ⟨·, ·⟩_ρ is a generalization of the inner product between two random variables in classical probability theory [1]. Some desirable properties of E are given by Eqs. (2.3)-(2.6),¹ where A is any operator on H, U is any unitary operator on H, $\rho_j$ is any density operator on $H_j$, $H_j$ is any Hilbert space, $A_j$ is any operator on $H_j$, $I_j$ is the identity operator on $H_j$, ρ in Eq. (2.6) is any density operator on $H_1\otimes H_2$, and $\operatorname{tr}_j$ denotes the partial trace with respect to $H_j$. Examples of E that satisfy Eqs. (2.3)-(2.6) are given by Eqs. (2.7)-(2.9); they include the Jordan product
$$E_\rho A = \frac{\rho A + A\rho}{2} \tag{2.7}$$
and the root product
$$E_\rho A = \rho^{1/2} A \rho^{1/2}. \tag{2.9}$$
In the following, I fix E to be a map that satisfies Eqs. (2.3)-(2.6).
¹ Equation (2.5) without the commutation condition is proposed in Ref. [15] and repeated in Ref. [18], but it turns out to be false for many operator products, including the Jordan product, as pointed out by Ref. [33]. The results in Ref. [18] remain correct if the commutation condition in Eq. (2.5) is imposed.
Let L²(ρ) be the completion of O(H) with respect to the norm ∥·∥_ρ, such that it becomes a weighted Hilbert space for the operators. Each element of L²(ρ) is then an equivalence class of operators with zero distance between them. If H is infinite-dimensional, O(H) may not be complete and L²(ρ) may include unbounded operators as well [46]. The infinite-dimensional case is much more complicated to treat with rigor, so I consider only finite-dimensional Hilbert spaces in the following for simplicity and assume that the results still hold for the infinite-dimensional problems studied later in Appendix C and Sec. 5.5.
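To make the weighted inner product concrete for the Jordan product, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not part of the formalism: operators are real symmetric 2×2 matrices stored as nested lists (so B† reduces to the transpose), and the matrices `rho`, `A`, and `B` are hypothetical. The script checks that ⟨I, A⟩_ρ reduces to the mean tr ρA, that the inner product is symmetric for real symmetric operators, and that the Jordan product maps the identity to ρ.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, A):
    """Jordan product E_rho(A) = (rho A + A rho) / 2."""
    RA, AR = matmul(rho, A), matmul(A, rho)
    return [[0.5 * (RA[i][j] + AR[i][j]) for j in range(len(A))] for i in range(len(A))]

def inner(B, A, rho):
    """Weighted inner product <B, A>_rho = tr(B^dagger E_rho(A)) for real matrices."""
    return tr(matmul(transpose(B), jordan(rho, A)))

rho = [[0.7, 0.1], [0.1, 0.3]]   # hypothetical density operator
A = [[1.0, 0.2], [0.2, -0.5]]    # hypothetical observables
B = [[0.0, 1.0], [1.0, 0.0]]
I = [[1.0, 0.0], [0.0, 1.0]]

print(inner(I, A, rho), tr(matmul(rho, A)))  # <I, A>_rho equals the mean tr(rho A)
print(inner(B, A, rho), inner(A, B, rho))    # symmetric for real symmetric A, B
print(inner(A, A, rho) >= 0)                 # squared norm tr(rho A^2) is nonnegative
print(jordan(rho, I))                        # E_rho(I) = rho
```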

Definition 1.
Let σ be a density operator on $H_1$ and $F: O(H_1)\to O(H_2)$ be a completely positive, trace-preserving (CPTP) map that models a quantum channel. Then the divergence between an operator A ∈ L²(σ) and another operator B ∈ L²(Fσ) is defined as
$$D_{\sigma,F}(A,B) \equiv \|A\|_\sigma^2 + \|B\|_{F\sigma}^2 - 2\operatorname{Re}\langle F^*B, A\rangle_\sigma, \tag{2.10}$$
where Re denotes the real part and $F^*$ denotes the Hilbert-Schmidt adjoint of F.
This divergence, introduced in Ref. [18], can be related to the more usual definition of distance in a larger Hilbert space by considering the Stinespring representation
$$F\sigma = \operatorname{tr}_{10}\, U(\sigma\otimes\tau)U^\dagger, \tag{2.11}$$
where τ is a density operator on $H_2\otimes H_0$, $H_0$ is some auxiliary Hilbert space, U is a unitary operator on $H_1\otimes H_2\otimes H_0$ that models the evolution from time t to time T ≥ t, and $\operatorname{tr}_{10}$ is the partial trace over $H_1$ and $H_0$. Let ρ = σ ⊗ τ and define the Heisenberg pictures of A and B as
$$A_t \equiv A\otimes I_2\otimes I_0, \qquad B_T \equiv U^\dagger(I_1\otimes B\otimes I_0)U. \tag{2.12}$$
Then it can be shown that $D_{\sigma,F}(A,B)$ is related to the squared distance $\|A_t - B_T\|_\rho^2$ in L²(ρ) by Eq. (2.13).

Definition 2. The GCE of A is the operator in L²(Fσ) that minimizes the divergence:
$$F_*A \equiv \arg\min_{B\in L^2(F\sigma)} D_{\sigma,F}(A,B). \tag{2.14}$$
More explicitly, $F_*A$ is an equivalence class of operators that satisfy
$$\langle c, F_*A\rangle_{F\sigma} = \langle F^*c, A\rangle_\sigma \quad \text{for all } c\in L^2(F\sigma), \tag{2.15}$$
or, equivalently,
$$E_{F\sigma}F_*A = F E_\sigma A. \tag{2.16}$$
Equation (2.15) can be derived by assuming the ansatz $B = F_*A + \epsilon c$ with ε ∈ ℝ and c ∈ L²(Fσ), and minimizing D with respect to ε. Given an A, the existence and uniqueness of $F_*A$ as an element of L²(Fσ) can be proved by viewing Eq. (2.15) as a linear functional of c and applying the Riesz representation theorem [47]. Equation (2.16) can also be derived independently from a state-over-time formalism [33]. With the GCE, the minimum divergence becomes
$$D_{\sigma,F}(A, F_*A) = \|A\|_\sigma^2 - \|F_*A\|_{F\sigma}^2. \tag{2.17}$$
Note that the GCE map $F_*$ depends implicitly on the E map and the prior state σ; the choice of E and σ should be clear from the context in the following and, when necessary, σ is stated explicitly in the superscript as $F^\sigma_*$. Note also that Chap. 6 in Ref. [15] writes $F^\sigma_*$ as $F_{\sigma,x}$, where x denotes the E map being used, while Ref. [18] writes $F^\sigma_*$ as $F_\sigma$. Appendix A presents more formulas concerning $F_*$ and $F^*$ that justify the new notations, while Appendix B presents a brief and elementary review of the classical conditional expectation to give the quantum formalism a more familiar context.
Some examples are in order. Consider the unitary channel
$$F\sigma = U\sigma U^\dagger, \tag{2.18}$$
where U is a unitary operator on $H_1$. A solution to any GCE is
$$F_*A = UAU^\dagger. \tag{2.19}$$
Equation (2.19) is called the Heisenberg representation in quantum computing [48], and the GCEs can be regarded as generalizations of the Heisenberg representation for open systems.
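The following Python sketch computes a GCE numerically for the Jordan product. It assumes real Kraus operators and real symmetric 2×2 matrices, and it obtains $F_*A$ by solving the linear equation $E_{F\sigma}(X) = F(E_\sigma A)$, which is what the stationarity of $D_{\sigma,F}(A,B)$ under $B \to B + \epsilon c$ reduces to when Fσ is positive definite. The amplitude-damping channel, the rotation angle, and all matrices are hypothetical choices. The script compares the minimum divergence with $\|A\|_\sigma^2 - \|F_*A\|_{F\sigma}^2$ and checks that, for a unitary channel, the solver returns $UAU^\dagger$.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def lin(*terms):
    """Linear combination sum_k c_k * M_k of equally sized matrices."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, A):
    """Jordan product E_rho(A) = (rho A + A rho) / 2."""
    return lin((0.5, matmul(rho, A)), (0.5, matmul(A, rho)))

def inner(B, A, rho):
    """Weighted inner product <B, A>_rho = tr(B^T E_rho(A)) for real matrices."""
    return tr(matmul(transpose(B), jordan(rho, A)))

def inv_jordan(rho, Y):
    """Solve (rho X + X rho)/2 = Y for X by Gauss-Jordan elimination on vec(X)."""
    n, m = len(rho), len(rho) ** 2
    aug = [[0.5 * ((rho[i][k] if l == j else 0.0) + (rho[l][j] if k == i else 0.0))
            for k in range(n) for l in range(n)] + [Y[i][j]]
           for i in range(n) for j in range(n)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [[aug[i * n + j][m] / aug[i * n + j][i * n + j] for j in range(n)]
            for i in range(n)]

def channel(kraus):
    """(F, F^*): a CPTP map with real Kraus operators and its Hilbert-Schmidt adjoint."""
    return (lambda r: lin(*[(1.0, matmul(matmul(K, r), transpose(K))) for K in kraus]),
            lambda C: lin(*[(1.0, matmul(matmul(transpose(K), C), K)) for K in kraus]))

def gce(A, sigma, F):
    """F_* A with prior sigma: the solution X of E_{F sigma}(X) = F(E_sigma(A))."""
    return inv_jordan(F[0](sigma), F[0](jordan(sigma, A)))

def divergence(A, B, sigma, F):
    """D_{sigma,F}(A, B) = ||A||^2_sigma + ||B||^2_{F sigma} - 2 <F^* B, A>_sigma."""
    return inner(A, A, sigma) + inner(B, B, F[0](sigma)) - 2 * inner(F[1](B), A, sigma)

g = 0.3  # hypothetical amplitude-damping strength
F = channel([[[1.0, 0.0], [0.0, math.sqrt(1 - g)]], [[0.0, math.sqrt(g)], [0.0, 0.0]]])
sigma = [[0.6, 0.2], [0.2, 0.4]]
A = [[1.0, 0.5], [0.5, -1.0]]
FA = gce(A, sigma, F)
d_min = divergence(A, FA, sigma, F)
print(d_min, inner(A, A, sigma) - inner(FA, FA, F[0](sigma)))  # should agree

# Unitary channel: the GCE should reduce to U A U^T.
th = 0.4
U = [[math.cos(th), -math.sin(th)], [math.sin(th), math.cos(th)]]
print(gce(A, sigma, channel([U])))
print(matmul(matmul(U, A), transpose(U)))
```

Perturbing `FA` in any direction should only increase the divergence, which is a cheap way to confirm that the linear solve really returns the minimizer.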
With the root product given by Eq. (2.9), the GCE becomes the Accardi-Cecchini GCE [11,12], and its Hilbert-Schmidt adjoint $(F_*)^*$ is known as the Petz recovery map, which is useful in quantum information theory [49]. Appendix C presents another example where σ is a Gaussian state, F is a Gaussian channel [50], and A is a quadrature operator. In that case, the GCE in terms of the Jordan product given by Eq. (2.7) and the associated divergence turn out to have the same formulas as the classical conditional expectation and its mean-square error for the usual linear Gaussian model [51].
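Theorem 1 (Chain rule). Let $F: O(H_1)\to O(H_2)$ and $G: O(H_2)\to O(H_3)$ be two CPTP maps. Then the GCE of the composite map GF is given by
$$(GF)^\sigma_* = G^{F\sigma}_*F^\sigma_*, \tag{3.1}$$
which can be abbreviated as
$$(GF)_* = G_*F_*. \tag{3.2}$$
In other words, the GCE for a chain of CPTP maps is given by a chain of the GCEs associated with the individual CPTP maps.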
Theorem 2 (Pythagorean theorem). Given the two CPTP maps F and G, the minimum divergences obey
$$D_{\sigma,GF}(A,(GF)_*A) = D_{\sigma,F}(A,F_*A) + D_{F\sigma,G}(F_*A, G_*F_*A). \tag{3.3}$$
Proof. Use Eq. (2.17) and Theorem 1.
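Theorems 1 and 2 can be checked numerically in a toy model. The Python sketch below assumes the Jordan product, real symmetric 2×2 matrices, and hypothetical amplitude-damping (F) and Z-dephasing (G) channels. It computes $(GF)_*A$ directly with prior σ, compares it with $G_*F_*A$ computed in two stages with the intermediate prior Fσ, and then compares the two sides of the Pythagorean relation $D_{\sigma,GF}(A,(GF)_*A) = D_{\sigma,F}(A,F_*A) + D_{F\sigma,G}(F_*A,G_*F_*A)$.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def lin(*terms):
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, A):
    return lin((0.5, matmul(rho, A)), (0.5, matmul(A, rho)))

def inner(B, A, rho):
    return tr(matmul(transpose(B), jordan(rho, A)))

def inv_jordan(rho, Y):
    """Solve (rho X + X rho)/2 = Y for X by Gauss-Jordan elimination on vec(X)."""
    n, m = len(rho), len(rho) ** 2
    aug = [[0.5 * ((rho[i][k] if l == j else 0.0) + (rho[l][j] if k == i else 0.0))
            for k in range(n) for l in range(n)] + [Y[i][j]]
           for i in range(n) for j in range(n)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [[aug[i * n + j][m] / aug[i * n + j][i * n + j] for j in range(n)]
            for i in range(n)]

def channel(kraus):
    return (lambda r: lin(*[(1.0, matmul(matmul(K, r), transpose(K))) for K in kraus]),
            lambda C: lin(*[(1.0, matmul(matmul(transpose(K), C), K)) for K in kraus]))

def gce(A, sigma, F):
    return inv_jordan(F[0](sigma), F[0](jordan(sigma, A)))

def divergence(A, B, sigma, F):
    return inner(A, A, sigma) + inner(B, B, F[0](sigma)) - 2 * inner(F[1](B), A, sigma)

def compose(G, F):
    """The composite channel GF and its adjoint F^* G^*."""
    return (lambda r: G[0](F[0](r)), lambda C: F[1](G[1](C)))

g, q = 0.3, 0.25  # hypothetical damping and dephasing strengths
F = channel([[[1.0, 0.0], [0.0, math.sqrt(1 - g)]], [[0.0, math.sqrt(g)], [0.0, 0.0]]])
G = channel([[[math.sqrt(1 - q), 0.0], [0.0, math.sqrt(1 - q)]],
             [[math.sqrt(q), 0.0], [0.0, -math.sqrt(q)]]])
sigma = [[0.6, 0.2], [0.2, 0.4]]
A = [[1.0, 0.5], [0.5, -1.0]]

FA = gce(A, sigma, F)
direct = gce(A, sigma, compose(G, F))   # (GF)_* A with prior sigma
chained = gce(FA, F[0](sigma), G)        # G_* (F_* A) with prior F sigma
lhs = divergence(A, direct, sigma, compose(G, F))
rhs = divergence(A, FA, sigma, F) + divergence(FA, chained, F[0](sigma), G)
print(direct, chained)
print(lhs, rhs)
```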
Figure 1 offers some diagrams that illustrate the theorems.
Before moving on, I list two more properties of the GCEs; their physical significance for generalizing the Rao-Blackwell theorem [5] will be explained in Sec. 5.

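Lemma 1 (Law of total expectation). For any A ∈ L²(σ),
$$\operatorname{tr}(F\sigma)(F_*A) = \operatorname{tr}\sigma A. \tag{3.4}$$
Lemma 2. Let a be any complex number. Then
$$\|A - aI\|_\sigma^2 = \|F_*A - aI\|_{F\sigma}^2 + D_{\sigma,F}(A, F_*A). \tag{3.5}$$
See Appendix J for the proofs of Lemmas 1 and 2.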
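A quick numerical sanity check of the two lemmas, under the same toy assumptions (Jordan product, real symmetric 2×2 matrices, a hypothetical amplitude-damping channel). Because the matrices are real, the sketch only tries real values of a, whereas Lemma 2 allows complex a. It checks that $\operatorname{tr}(F\sigma)(F_*A) = \operatorname{tr}\sigma A$, that $\|A - aI\|_\sigma^2 - \|F_*A - aI\|_{F\sigma}^2$ equals $D_{\sigma,F}(A,F_*A)$ for several a, and that, with a set to the mean, the generalized variance does not increase.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def lin(*terms):
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, A):
    return lin((0.5, matmul(rho, A)), (0.5, matmul(A, rho)))

def inner(B, A, rho):
    return tr(matmul(transpose(B), jordan(rho, A)))

def inv_jordan(rho, Y):
    """Solve (rho X + X rho)/2 = Y for X by Gauss-Jordan elimination on vec(X)."""
    n, m = len(rho), len(rho) ** 2
    aug = [[0.5 * ((rho[i][k] if l == j else 0.0) + (rho[l][j] if k == i else 0.0))
            for k in range(n) for l in range(n)] + [Y[i][j]]
           for i in range(n) for j in range(n)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [[aug[i * n + j][m] / aug[i * n + j][i * n + j] for j in range(n)]
            for i in range(n)]

def channel(kraus):
    return (lambda r: lin(*[(1.0, matmul(matmul(K, r), transpose(K))) for K in kraus]),
            lambda C: lin(*[(1.0, matmul(matmul(transpose(K), C), K)) for K in kraus]))

def gce(A, sigma, F):
    return inv_jordan(F[0](sigma), F[0](jordan(sigma, A)))

def divergence(A, B, sigma, F):
    return inner(A, A, sigma) + inner(B, B, F[0](sigma)) - 2 * inner(F[1](B), A, sigma)

g = 0.3  # hypothetical amplitude-damping strength
F = channel([[[1.0, 0.0], [0.0, math.sqrt(1 - g)]], [[0.0, math.sqrt(g)], [0.0, 0.0]]])
sigma = [[0.6, 0.2], [0.2, 0.4]]
A = [[1.0, 0.5], [0.5, -1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]
FA, Fs = gce(A, sigma, F), F[0](sigma)
D = divergence(A, FA, sigma, F)
mean = tr(matmul(sigma, A))

def centered_sq_norm(X, a, rho):
    """||X - a I||^2_rho for a real scalar a."""
    Y = lin((1.0, X), (-a, I))
    return inner(Y, Y, rho)

def lemma2_gap(a):
    """||A - aI||^2_sigma - ||F_* A - aI||^2_{F sigma} - D, which Lemma 2 says vanishes."""
    return centered_sq_norm(A, a, sigma) - centered_sq_norm(FA, a, Fs) - D

print(tr(matmul(Fs, FA)), mean)                                          # Lemma 1
print([lemma2_gap(a) for a in (-1.0, 0.0, 0.7, 2.5)])                    # Lemma 2
print(centered_sq_norm(FA, mean, Fs), centered_sq_norm(A, mean, sigma))  # variances
```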
A map $F_*$ that satisfies Lemma 1 is also called a coarse graining [12]. Whereas Petz's definition requires a coarse graining to be completely positive, the GCEs here need not be. If a in Lemma 2 is set as the mean $\bar a \equiv \operatorname{tr}\sigma A$ given by Eq. (3.4), then Lemma 2 says that the generalized variance of $F_*A$ given by $\|F_*A - \bar aI\|_{F\sigma}^2$ cannot exceed the generalized variance $\|A - \bar aI\|_\sigma^2$ of A, the difference being $D_{\sigma,F}(A,F_*A) \ge 0$.
² I follow Ref. [11] to call this property a chain rule. Note that Ref. [15] calls it associativity, while Refs. [33,52] …
The mathematics of GCEs would be uncontroversial if not for its physical implication: by defining a divergence between two operators at different times, a retrodiction of a hidden quantum observable A can be given a risk measure and therefore a meaning in the spirit of decision theory [2]. In other words, after a channel F is applied, one can seek an observable B that is the closest to A if the divergence is regarded as a squared distance, and $F_*A$ is the answer. It remains an open and reasonable question, however, why the divergence between two operators is an important quantity. If $A_t$ at time t does not commute with $B_T$ at a later time in the Heisenberg picture, where $A_t$ and $B_T$ are defined by Eqs. (2.12), then Belavkin's nondemolition principle for their simultaneous measurability is violated [20,21,39], no classical observer can access the precise values of both, and the divergence does not seem to have any obvious meaning to the classical world. Notably, Gough claims in Ref. [39] that a retrodiction that violates the nondemolition principle is "misapplying Bayes theorem," "not possible," and "unwarranted." Reference [38], the preprint version of Ref. [39], goes even further in claiming that someone who does not follow the principle may obtain "wholly meaningless" answers and is "in a state of sin." In Ref. [40], James also claims that the principle should be observed for a quantum conditional expectation to make sense. To show that a retrodiction can make sense beyond the nondemolition principle, the next sections offer natural scenarios in quantum metrology that give operational meanings to a GCE and the associated divergence.
4 Bayesian quantum parameter estimation

General results
Consider the typical setup of Bayesian quantum parameter estimation [9] depicted in Fig. 2(a). Let X be a hidden classical random parameter with a countable parameter space $\mathcal X$ and a prior probability distribution $P_X: \mathcal X\to[0,1]$. A quantum sensor is coupled to X, such that its density operator conditioned on X = x is $\rho_x\in O(H_2)$. A classical observer measures the quantum sensor, as modeled by a positive operator-valued measure (POVM) $M: \Sigma_{\mathcal Y}\to O(H_2)$ on a Borel space $(\mathcal Y, \Sigma_{\mathcal Y})$, where $\Sigma_{\mathcal Y}$ is the Borel sigma-algebra of $\mathcal Y$ [46]. The observer uses the outcome $y\in\mathcal Y$ to estimate the value of a real random variable $a: \mathcal X\to\mathbb R$. The problem can be framed in the GCE formalism by writing
$$\sigma = \sum_{x\in\mathcal X}P_X(x)|x\rangle\langle x|, \qquad A = \sum_{x\in\mathcal X}a(x)|x\rangle\langle x|, \tag{4.1}$$
$$F\xi = \sum_{x\in\mathcal X}\langle x|\xi|x\rangle\rho_x \quad \text{for any } \xi\in O(H_1), \tag{4.2}$$
where $\{|x\rangle : x\in\mathcal X\}$ is an orthonormal basis of $H_1$ and the classical random variable a(X) is framed as the hidden observable A discussed in Secs. 2 and 3. F here is called a classical-quantum channel and has a natural generalization in the infinite-dimensional case [50]. In the following, I consider only Hermitian operators (observables) and assume E to be the Jordan product given by Eq. (2.7), such that all the operator Hilbert spaces are real, the equalities in Eqs. (2.6) and (2.13) hold, and the GCE is in fact a projection in the larger Hilbert space [15].
Suppose that a von Neumann measurement of an observable B on $H_2$ is performed, as defined in Appendix D, and the outcome is used as the estimator. I call such a B an operator-valued estimator. The mean-square estimation error averaged over the prior is given by
$$\sum_{x\in\mathcal X}P_X(x)\int_{\mathbb R}[y - a(x)]^2\operatorname{tr}\rho_x\Pi(dy) = \sum_{x\in\mathcal X}P_X(x)\operatorname{tr}\rho_x[B - a(x)]^2, \tag{4.3}$$
where Π is the projection-valued measure of B. Equation (4.3) is precisely the divergence $D_{\sigma,F}(A,B)$ in Definition 1. According to the seminal work of Personick [9], the optimal operator-valued estimator is the GCE $F_*A$, and the minimum error, hereafter called the Bayesian error, is $D_{\sigma,F}(A,F_*A)$. It can also be shown that the von Neumann measurement of $F_*A$ remains optimal even if POVMs are considered (see Sec. VIII 1(d) in Ref. [53], Appendix A in Ref. [43], or Corollary 2 below). Now suppose that a complication occurs in the experiment, as depicted by Fig. 2(b): before the measurement can be performed, the sensor is further corrupted by decoherence, as modeled by another CPTP map G. The error of an operator-valued estimator B′ is now
$$\sum_{x\in\mathcal X}P_X(x)\int_{\mathbb R}[y - a(x)]^2\operatorname{tr}(G\rho_x)\Pi'(dy) = \sum_{x\in\mathcal X}P_X(x)\operatorname{tr}(G\rho_x)[B' - a(x)]^2, \tag{4.4}$$
where Π′ is the projection-valued measure of B′. The Personick estimator after G is then $B' = (GF)_*A$, and the Bayesian error becomes $D_{\sigma,GF}(A,(GF)_*A)$. A fundamental fact is as follows.
Corollary 1 (Monotonicity). Decoherence cannot decrease the Bayesian error:
$$D_{\sigma,GF}(A,(GF)_*A) \ge D_{\sigma,F}(A,F_*A). \tag{4.5}$$
Proof.Use Theorem 2 and the nonnegativity of D.
The scenario so far is standard and uncontroversial, as A is effectively a classical random variable. Mathematically, $A_t$ and $(F_*A)_T$ in the Heisenberg picture commute (see Sec. IV F in Ref. [18]) and thus satisfy the nondemolition principle; so do $A_t$ and $[(GF)_*A]_T$. Physically, the principle implies that another classical observer can, in theory, access the precise value of A in each trial, the estimates can be compared with the true values by the classical observers after the trials, and D is their expected error. The monotonicity given by Corollary 1 is a noteworthy result, but unsurprising.
More can be said about the error increase, hereafter called the regret (to borrow a term from decision theory [2]). First, the chain rule in Theorem 1 gives an operational meaning to the GCE $G_*$ as the map that relates the intermediate Personick estimator $F_*A$ to the final one, $(GF)_*A = G_*F_*A$. In other words, the final Personick estimator is equivalent to a retrodiction of the intermediate $F_*A$, which is a quantum observable. Second, the Pythagorean theorem in Theorem 2 means that the regret caused by the decoherence is precisely the divergence between the intermediate and final estimators:
$$D_{\sigma,GF}(A,(GF)_*A) - D_{\sigma,F}(A,F_*A) = D_{F\sigma,G}(F_*A, G_*F_*A). \tag{4.6}$$
The two divergences on the left-hand side have a firm decision-theoretic meaning as estimation errors because A is classical. It follows that, even though the divergence on the right-hand side is between two quantum observables, it also has a firm decision-theoretic meaning as the regret for being unable to perform the optimal measurement and having to suffer from the decoherence. As the regret concerns the performances of the two estimators in separate experiments, the estimators need not obey the nondemolition principle, which is a condition on two observables to be simultaneously measurable in the same experiment. I stress that the regret is not a contrived concept invented here solely to give an operational meaning to the divergence; its classical version is an established concept in information theory and Bayesian learning [54,55,56,57].

Dynamic programming
When the decoherence is modeled by a chain of CPTP maps $G = F_N\cdots F_2$, the final error is the sum of all the incremental regrets along the way, viz.,
$$D_{\sigma,F_N\cdots F_1}\big(A,(F_N\cdots F_1)_*A\big) = \sum_{n=1}^{N}D_n, \tag{4.7}$$
$$D_n \equiv D_{\sigma_n,F_n}(A_n, A_{n+1}), \tag{4.8}$$
$$\sigma_1 \equiv \sigma, \qquad A_1 \equiv A, \tag{4.9}$$
$$\sigma_{n+1} = F_n\sigma_n, \tag{4.10}$$
$$A_{n+1} = F_{n*}A_n, \tag{4.11}$$
where $F_{n*}$ is the GCE with respect to the prior $\sigma_n$ and $F_1 = F$ for the parameter estimation problem, so even the error at the first step $D_1 = D_{\sigma,F}(A,F_*A)$ can be regarded as a regret. Every $D_n$, bar $D_1$, is a divergence between a quantum observable $A_n$ and its estimator $A_{n+1}$ that may not commute in the Heisenberg picture. Suppose that the experimenter can choose the channels $(F_1,\dots,F_N)$ from a set of options and would like to find the optimal choice that minimizes the final error. One example is the use of a programmable photonic circuit [58] to measure light for sensing or imaging. The Markovian nature of Eqs. (4.10) and (4.11) and the additive nature of the final error given by Eq. (4.7), both of which originate from Theorems 1 and 2, are precisely the conditions that make this optimal control problem amenable to dynamic programming [41], an algorithm that can reduce the computational complexity substantially [59]. To be specific, let the system state (in the context of control theory) at time n be $s_n \equiv (\sigma_n, A_n)$. Then Eqs. (4.7)-(4.11) imply that the state dynamics and the final error can be expressed as
$$s_{n+1} = f(s_n, F_n), \tag{4.12}$$
$$D_{\sigma,F_N\cdots F_1}\big(A,(F_N\cdots F_1)_*A\big) = \sum_{n=1}^{N}g(s_n, F_n), \tag{4.13}$$
in terms of some functions f and g. Equations (4.12) and (4.13) are now in the form of a Markov decision process that is amenable to dynamic programming for computing the optimal maps $(F_1,\dots,F_N)$ among the set of options to minimize the final error [41]; Appendix E describes the algorithm for the reader's information. As dynamic programming is a cornerstone of control theory, there exists a plethora of exact or approximate methods to implement it, such as neural networks under the guise of reinforcement learning [60].
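To show how the Markovian structure turns into an algorithm, here is a small Python sketch of backward dynamic programming over the state $s_n = (\sigma_n, A_n)$. Everything about the setup is hypothetical: three candidate qubit channels (Z-dephasing, X-dephasing, amplitude damping), N = 3 stages, and an arbitrary starting state and observable. The stage cost g is the incremental regret $\|A_n\|_{\sigma_n}^2 - \|A_{n+1}\|_{\sigma_{n+1}}^2$, and the update f maps $(\sigma_n, A_n)$ to $(F_n\sigma_n, F_{n*}A_n)$ with the Jordan-product GCE. Memoizing on rounded states is a simplification for this toy; a continuous state space generally calls for the approximate methods mentioned above. The result is compared with exhaustive search.

```python
import math
from itertools import product

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def lin(*terms):
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, A):
    return lin((0.5, matmul(rho, A)), (0.5, matmul(A, rho)))

def inner(B, A, rho):
    return tr(matmul(transpose(B), jordan(rho, A)))

def inv_jordan(rho, Y):
    n, m = len(rho), len(rho) ** 2
    aug = [[0.5 * ((rho[i][k] if l == j else 0.0) + (rho[l][j] if k == i else 0.0))
            for k in range(n) for l in range(n)] + [Y[i][j]]
           for i in range(n) for j in range(n)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [[aug[i * n + j][m] / aug[i * n + j][i * n + j] for j in range(n)]
            for i in range(n)]

def channel(kraus):
    return (lambda r: lin(*[(1.0, matmul(matmul(K, r), transpose(K))) for K in kraus]),
            lambda C: lin(*[(1.0, matmul(matmul(transpose(K), C), K)) for K in kraus]))

def gce(A, sigma, F):
    return inv_jordan(F[0](sigma), F[0](jordan(sigma, A)))

def f(state, F):
    """State update s_{n+1} = (F sigma_n, F_* A_n)."""
    sigma, A = state
    return F[0](sigma), gce(A, sigma, F)

def g(state, F):
    """Incremental regret ||A_n||^2_{sigma_n} - ||A_{n+1}||^2_{sigma_{n+1}}."""
    sigma, A = state
    sigma_next, A_next = f(state, F)
    return inner(A, A, sigma) - inner(A_next, A_next, sigma_next)

def freeze(state):
    return tuple(round(v, 12) for M in state for row in M for v in row)

def dp(n, state, options, N, memo):
    """Bellman recursion J_n(s) = min_F [g(s, F) + J_{n+1}(f(s, F))] with J_N = 0."""
    if n == N:
        return 0.0, ()
    key = (n, freeze(state))
    if key not in memo:
        best = None
        for name, F in options.items():
            tail_cost, tail_plan = dp(n + 1, f(state, F), options, N, memo)
            cost = g(state, F) + tail_cost
            if best is None or cost < best[0]:
                best = (cost, (name,) + tail_plan)
        memo[key] = best
    return memo[key]

def brute_force(state, options, N):
    """Exhaustive search over all option sequences, for comparison."""
    costs = []
    for names in product(options, repeat=N):
        s, cost = state, 0.0
        for name in names:
            cost += g(s, options[name])
            s = f(s, options[name])
        costs.append(cost)
    return min(costs)

def dephase(q, pauli):
    return channel([lin((math.sqrt(1 - q), I2)), lin((math.sqrt(q), pauli))])

I2, Z, X = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, -1.0]], [[0.0, 1.0], [1.0, 0.0]]
options = {  # hypothetical menu of controllable channels
    "dephase_z": dephase(0.2, Z),
    "dephase_x": dephase(0.1, X),
    "damp": channel([[[1.0, 0.0], [0.0, math.sqrt(0.85)]], [[0.0, math.sqrt(0.15)], [0.0, 0.0]]]),
}
s1 = ([[0.6, 0.2], [0.2, 0.4]], [[1.0, 0.5], [0.5, -1.0]])  # hypothetical (sigma, A)
cost, plan = dp(0, s1, options, 3, {})
print(plan, cost, brute_force(s1, options, 3))
```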

Weak value
To elaborate on the operational meaning of the weak value, which is a GCE of a quantum observable given a prior state and a measurement outcome [16,17,14,18], let us return to the scenario depicted by Fig. 2(a). Suppose, for mathematical simplicity, that the outcome space $\mathcal Y$ is countable. The measurement can be framed as a G map given by
$$G\rho = \sum_{y\in\mathcal Y}|y\rangle\langle y|\operatorname{tr}M(y)\rho, \tag{4.14}$$
where $\{|y\rangle : y\in\mathcal Y\}$ is an orthonormal basis of $H_3$ and M is the POVM of a measurement that may not be optimal. An estimator $b: \mathcal Y\to\mathbb R$ as a function of the measurement outcome can be framed as the observable
$$B = \sum_{y\in\mathcal Y}b(y)|y\rangle\langle y|. \tag{4.15}$$
The GCE then leads to the optimal estimator
$$(GF)_*A = G_*F_*A = \sum_{y\in\mathcal Y}|y\rangle\langle y|\frac{\operatorname{tr}M(y)E_{F\sigma}(F_*A)}{\operatorname{tr}M(y)F\sigma}, \tag{4.16}$$
which, for the Jordan product, becomes
$$(GF)_*A = \sum_{y\in\mathcal Y}|y\rangle\langle y|\frac{\operatorname{Re}\operatorname{tr}M(y)(F\sigma)(F_*A)}{\operatorname{tr}M(y)F\sigma}. \tag{4.17}$$
The coefficient of |y⟩⟨y| in Eq. (4.17) is precisely the definition of the real weak value of $F_*A$ given a prior state Fσ and a measurement outcome y (see, for example, Eq. (3.13) in Ref. [61], Eq. (10) in Ref. [62], or Eq. (5) in Ref. [14]). Moreover, the divergence between the ideal $F_*A$ and the B associated with the weak value is precisely the regret caused by the suboptimality of the measurement M, as per Theorem 2. Hence, regardless of how anomalous the weak value may seem, it does have an operational role in parameter estimation, and its divergence from the ideal $F_*A$ has a concrete decision-theoretic meaning as the regret for not using the optimal measurement.
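The weak-value estimator and its regret can be checked numerically. The Python sketch below uses a hypothetical three-point prior, a hypothetical qubit sensor, and a hypothetical two-outcome POVM M that is not optimal. It computes the real weak value $\operatorname{tr}M(y)E_{F\sigma}(F_*A)/\operatorname{tr}M(y)F\sigma$ for each outcome, compares it with the GCE returned by a generic Jordan-product solver, and checks that the extra error relative to the Bayesian error equals $D_{F\sigma,G}(F_*A, B)$, where B carries the weak values on its diagonal.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def lin(*terms):
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, A):
    return lin((0.5, matmul(rho, A)), (0.5, matmul(A, rho)))

def inner(B, A, rho):
    return tr(matmul(transpose(B), jordan(rho, A)))

def inv_jordan(rho, Y):
    n, m = len(rho), len(rho) ** 2
    aug = [[0.5 * ((rho[i][k] if l == j else 0.0) + (rho[l][j] if k == i else 0.0))
            for k in range(n) for l in range(n)] + [Y[i][j]]
           for i in range(n) for j in range(n)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [[aug[i * n + j][m] / aug[i * n + j][i * n + j] for j in range(n)]
            for i in range(n)]

def gce(A, sigma, F):
    return inv_jordan(F[0](sigma), F[0](jordan(sigma, A)))

def divergence(A, B, sigma, F):
    return inner(A, A, sigma) + inner(B, B, F[0](sigma)) - 2 * inner(F[1](B), A, sigma)

def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

P, a = [0.5, 0.3, 0.2], [-1.0, 0.0, 1.0]   # hypothetical prior and a(x)
rhos = [[[0.9, 0.05], [0.05, 0.1]], [[0.5, 0.3], [0.3, 0.5]], [[0.15, 0.1], [0.1, 0.85]]]
sigma, A = diag(P), diag(a)
F = (lambda xi: lin(*[(xi[x][x], rhos[x]) for x in range(3)]),
     lambda C: diag([tr(matmul(r, C)) for r in rhos]))
M = [[[0.8, 0.1], [0.1, 0.3]], [[0.2, -0.1], [-0.1, 0.7]]]  # hypothetical POVM
G = (lambda r: diag([tr(matmul(My, r)) for My in M]),       # quantum-classical channel
     lambda C: lin(*[(C[y][y], M[y]) for y in range(len(M))]))

FA, Fs = gce(A, sigma, F), F[0](sigma)                       # Personick estimator, prior
weak = [tr(matmul(My, jordan(Fs, FA))) / tr(matmul(My, Fs)) for My in M]
via_gce = gce(FA, Fs, G)                                     # G_* F_* A from the solver

def error(b):
    """Bayesian error of the estimator b(y) applied to the outcome of POVM M."""
    return sum(P[x] * tr(matmul(rhos[x], M[y])) * (a[x] - b[y]) ** 2
               for x in range(3) for y in range(len(M)))

bayes = divergence(A, FA, sigma, F)
regret = divergence(FA, diag(weak), Fs, G)
print(weak, [via_gce[0][0], via_gce[1][1]])
print(error(weak) - bayes, regret)
print(error(weak), error([1.0, -1.0]))
```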
The preceding discussion also serves as a rough proof of the following corollary, which is proved by different methods in Sec. VIII 1(d) of Ref. [53] and Appendix A of Ref. [43].

Corollary 2. No POVM can improve upon the Bayesian error $D_{\sigma,F}(A,F_*A)$ achieved by a von Neumann measurement of $F_*A$.
Corollary 2 may be regarded as a consequence of monotonicity, since any measurement with a countable set of outcomes can be framed as a CPTP G map given by Eq. (4.14), and by Corollary 1, the error cannot decrease.A POVM with a more general outcome space can still be framed as a quantum-classical channel; see, for example, Theorem 2 in Ref. [63], but it requires a mathematical framework far more complex than what is necessary for this work.An easier proof for general POVMs, to be presented in Appendix J, is to use a later result in Sec. 5.
Note that the optimality of the weak value here does not contradict Ref. [34], which shows that weak-value amplification, a procedure that involves postselection (i.e., discarding some of the outcomes), is suboptimal for metrology. Here, the weak value given by Eq. (4.17) is used directly as an estimator with any measurement outcome, and no postselection is involved. Note also that the optimality is in the specific context of finding the best estimator after a given measurement; it does not mean that any measurement method that is heuristically inspired by the weak-value concept, such as weak-value amplification, can be optimal. In fact, by virtue of Corollary 2, such methods can never outperform the optimal von Neumann measurement.
5 A quantum Rao-Blackwell theorem

General result
In classical frequentist statistics, the Rao-Blackwell theorem is among the most useful applications of the conditional expectation [5,6,7]. Here I outline a quantum generalization. Suppose that the quantum sensor is modeled by a family of density operators $\{\rho_x : x\in\mathcal X\}$, where the unknown parameter x is now deterministic and there is no longer any need to assume a countable parameter space $\mathcal X$. A parameter of interest $a: \mathcal X\to\mathbb R$ is to be estimated by a Hermitian operator-valued estimator $B\in L^2(\rho_x)$, which need not be unbiased or optimal in any sense. The local mean-square error (MSE) upon a von Neumann measurement of B, as a function of $x\in\mathcal X$ and without being averaged over any prior, is given by
$$\mathrm{MSE}_x = \int_{\mathbb R}[y - a(x)]^2\operatorname{tr}\rho_x\Pi(dy) = \operatorname{tr}\rho_x[B - a(x)]^2 = \|B - a(x)\|_{\rho_x}^2, \tag{5.1}$$
where Π is the projection-valued measure of B and the Jordan product is again assumed for E. Without the unbiasedness condition $\operatorname{tr}\rho_xB = a(x)$, the quantum Cramér-Rao bounds commonly used in quantum metrology [53,15] do not apply to $\mathrm{MSE}_x$. There is no longer any simple optimality criterion for an estimator in this general setting, but one can still construct a partial order of preference between estimators. Following classical statistics (see p. 48 in Ref. [5]), I say that an operator-valued estimator B′ dominates another estimator B if the error $\mathrm{MSE}'_x$ of the former never exceeds the error $\mathrm{MSE}_x$ of the latter for any $x\in\mathcal X$ and is strictly lower for some x. I call an estimator admissible if it is dominated by none.
In classical frequentist statistics, there can be many admissible estimators for one problem with no clear winner among them, and it may be hard to even prove that a given estimator is admissible.The Rao-Blackwell theorem is then a valuable tool for improving estimators or for proofs regarding admissibility.A quantum version of the theorem can be similarly useful for quantum estimation problems.
To state the quantum theorem, suppose that the quantum sensor goes through a channel modeled by a CPTP map $G: O(H_2)\to O(H_3)$ and the GCE $G_*B$ in terms of $\rho_x$ and the Jordan product is used as an estimator. The error becomes
$$\mathrm{MSE}'_x = \operatorname{tr}(G\rho_x)[G_*B - a(x)]^2 = \|G_*B - a(x)\|_{G\rho_x}^2. \tag{5.2}$$
Lemma 2 can now be used to prove the following.
Theorem 3 (Quantum Rao-Blackwell theorem). Let $\{\rho_x : x\in\mathcal X\}$ be a family of density operators, $a: \mathcal X\to\mathbb R$ be an unknown parameter, B be a Hermitian operator-valued estimator, and $\mathrm{MSE}_x$ be the local error at $x\in\mathcal X$. If a channel G is applied and the GCE $G_*B$ in terms of $\rho_x$ and the Jordan product does not depend on x, then the error $\mathrm{MSE}'_x$ of $G_*B$ as an estimator is lower by the amount
$$\mathrm{MSE}_x - \mathrm{MSE}'_x = D_{\rho_x,G}(B, G_*B) \ge 0. \tag{5.3}$$
Proof. Subtract Eq. (5.2) from Eq. (5.1) and apply Lemma 2.
For G to be realizable and $G_*B$ to be a valid estimator, neither can depend on the unknown x. When there are many operator solutions to $G_*B$ that satisfy Definition 2, any of the solutions can serve as the estimator in Theorem 3 as long as it does not depend on x.
To demonstrate that Theorem 3 is indeed a quantum generalization of the Rao-Blackwell theorem, Appendix F derives the classical theorem from Theorem 3. As also shown in Appendix F, a parameter-independent conditional expectation in classical statistics can be obtained by conditioning on a sufficient statistic. The conditional expectation can then be used to improve an estimator in a process called Rao-Blackwellization [5]. Roughly speaking, Rao-Blackwellization works by averaging the estimator with respect to unnecessary parts of the data, thereby reducing its variance. A quantum Rao-Blackwellization, enabled by Theorem 3, can be similarly useful for improving a quantum measurement if one can find a channel G that satisfies the constant-GCE condition and gives a large divergence between B and $G_*B$. As long as $D_{\rho_x,G}(B,G_*B) > 0$ for some x, the Rao-Blackwell estimator $G_*B$ dominates the original estimator B. The improvement stems from two basic facts about the GCE: $G_*B$ maintains the same bias as that of B by virtue of Lemma 1, while the variance of $G_*B$ cannot exceed that of B by virtue of Lemma 2. In short, the quantum Rao-Blackwell theorem works in the same way as the classical case, by averaging the estimator with respect to unnecessary degrees of freedom via the GCE and thereby reducing its variance.
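For comparison with the quantum version, classical Rao-Blackwellization can be sketched in a few lines of Python. The example is a standard textbook one chosen for illustration, not taken from this paper: n i.i.d. Bernoulli(p) samples, the crude estimator δ = X₁, and the sufficient statistic S = ΣX_i. Because every sequence with the same sum has the same probability whatever p is, the conditional expectation E[δ | S] can be computed by plain averaging and equals S/n. Exact enumeration then shows the mean-square error dropping from p(1 − p) to p(1 − p)/n.

```python
from itertools import product

def rao_blackwellize(delta, stat, n):
    """Tabulate E[delta(X) | stat(X) = s] for n i.i.d. Bernoulli samples.

    Valid when stat is the sum: sequences with equal sums are equally likely
    for every p, so the conditional expectation is a plain average over them.
    """
    groups = {}
    for xs in product((0, 1), repeat=n):
        groups.setdefault(stat(xs), []).append(delta(xs))
    return {s: sum(vals) / len(vals) for s, vals in groups.items()}

def mse(estimator, p, n):
    """Exact mean-square error of estimator(xs) for the parameter p."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        total += p ** k * (1 - p) ** (n - k) * (estimator(xs) - p) ** 2
    return total

n = 4

def crude(xs):
    return float(xs[0])       # uses only the first sample

table = rao_blackwellize(crude, sum, n)

def improved(xs):
    return table[sum(xs)]     # E[X_1 | S], which equals S / n

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, mse(crude, p, n), mse(improved, p, n))
```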
For readers who wonder how a channel can increase the error in the Bayesian setting because of monotonicity but reduce the error in the frequentist setting because of the Rao-Blackwell theorem, Appendix G offers a clarification.
It is noteworthy that Sinha also proposed some quantum Rao-Blackwell theorems recently [64], although his versions impose stringent conditions on the commutativity of the operators.Another relevant prior work is Ref. [65] by Luczak, which studies a concept of sufficiency in von Neumann algebra for minimum-variance unbiased estimation but also makes some stringent assumptions.These prior works, while seminal and mathematically impressive, have questionable relevance to quantum metrology and are discussed in more detail in Appendix H.
Given the close relation between the Rao-Blackwell theorem and the concept of sufficient statistics in the classical case, it is natural to wonder if a similar relation exists between the quantum Rao-Blackwell theorem here and the concept of sufficient channels defined by Petz [12]. One equivalent condition for a channel G to be sufficient in Petz's definition is that the Accardi-Cecchini GCE $G_*$ in terms of the root product given by Eq. (2.9) does not depend on x. The GCE here, on the other hand, is in terms of the Jordan product so that it can be related to the parameter estimation error. The relation between Petz's sufficiency and the constant-GCE condition desired here is thus nontrivial.
A trivial example that makes any GCE constant and the channel sufficient in any sense is the unitary channel given by Eqs.(2.18) and (2.19), as long as the unitary operator there does not depend on x.Applying Theorem 3 to the unitary channel gives no error reduction, however.In the following, I offer more useful examples that both satisfy Petz's sufficiency and give the desired constant GCE condition.

A sufficient channel for tensor-product states
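Lemma 3. Let
$$\rho_x = \sigma_x\otimes\tau, \qquad G\rho = \operatorname{tr}_0\rho, \tag{5.4}$$
where $\sigma_x$ is a density operator on $H_1$ and τ is an auxiliary density operator on $H_0$. A solution to any GCE is
$$G_*B = \operatorname{tr}_0[(I_1\otimes\tau)B], \tag{5.5}$$
which does not depend on x if τ does not.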
See Appendix J for the proof.
A sufficient channel may be understood intuitively as a channel that retains all information in the quantum sensor about x.Then it makes sense that the channel in Lemma 3 is sufficient, as it simply amounts to discarding an independent ancilla that carries no information about x.A significant implication of the lemma is a more general version of Corollary 2 for the local error as follows.
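Corollary 3. Suppose that a measurement, modeled by a projection-valued measure Π′ on $H_1\otimes H_0$, is performed on the sensor together with an independent ancilla, and an estimator b is applied to the outcome y, resulting in the local error
$$\mathrm{MSE}_x = \int[b(y) - a(x)]^2\operatorname{tr}(\sigma_x\otimes\tau)\Pi'(dy),$$
where $\sigma_x$ and τ are defined in Lemma 3. Then a von Neumann measurement on $H_1$ alone, independent of x, achieves an error $\mathrm{MSE}'_x \le \mathrm{MSE}_x$ for all $x\in\mathcal X$.
Proof. As shown in Appendix D, the measurement and the data processing by b can be framed as a von Neumann measurement of $B = \int b(y)\Pi'(dy)$ on the larger Hilbert space, such that its error $\mathrm{MSE}_x$ with respect to $\rho_x = \sigma_x\otimes\tau$ is given by Eq. (5.1). Now assume the channel in Lemma 3. A solution to $G_*B$ is given by Eq. (5.5), which does not depend on x. It follows from Theorem 3 that the error $\mathrm{MSE}'_x$ achieved by a von Neumann measurement of $G_*B$ is at least as good as $\mathrm{MSE}_x$ for all $x\in\mathcal X$.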
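Lemma 3 and the quantum Rao-Blackwell step can be checked on a two-qubit toy model. In the Python sketch below, the sensor state σ_x, the ancilla state τ, the target a(x) = x, and the joint estimator B are all hypothetical, and all matrices are real symmetric. The script forms $G_*B = \operatorname{tr}_0[(I_1\otimes\tau)B]$, compares $\mathrm{MSE}_x$ of B on $\sigma_x\otimes\tau$ with $\mathrm{MSE}'_x$ of $G_*B$ on σ_x, and checks that their difference equals the divergence $D_{\rho_x,G}(B,G_*B)$ computed from Definition 1 with the adjoint $G^*C = C\otimes I_0$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lin(*terms):
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def tr(A):
    return sum(A[i][i] for i in range(len(A)))

def kron(A, B):
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def ptrace0(M, d1, d0):
    """Partial trace over the second tensor factor (dimension d0)."""
    return [[sum(M[i * d0 + s][j * d0 + s] for s in range(d0)) for j in range(d1)]
            for i in range(d1)]

def jordan(rho, A):
    return lin((0.5, matmul(rho, A)), (0.5, matmul(A, rho)))

def inner(B, A, rho):
    """<B, A>_rho = tr(B E_rho(A)), assuming B is real symmetric."""
    return tr(matmul(B, jordan(rho, A)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
X, Z = [[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, -1.0]]
tau = [[0.7, 0.2], [0.2, 0.3]]  # hypothetical ancilla state, independent of x
B = lin((0.8, kron(Z, I2)), (0.3, kron(Z, Z)), (0.2, kron(X, X)), (0.1, kron(I2, X)))
GB = ptrace0(matmul(kron(I2, tau), B), 2, 2)  # G_* B = tr_0[(I x tau) B]

def sigma(x):
    """Hypothetical sensor state with <Z> = x."""
    return [[(1 + x) / 2, 0.15], [0.15, (1 - x) / 2]]

def errors(x):
    """(MSE_x of B, MSE'_x of G_* B, D_{rho_x,G}(B, G_* B)) with a(x) = x."""
    s = sigma(x)
    rho = kron(s, tau)
    dB = lin((1.0, B), (-x, kron(I2, I2)))
    dGB = lin((1.0, GB), (-x, I2))
    mse = tr(matmul(rho, matmul(dB, dB)))
    mse_rb = tr(matmul(s, matmul(dGB, dGB)))
    D = inner(B, B, rho) + inner(GB, GB, s) - 2 * inner(kron(GB, I2), B, rho)
    return mse, mse_rb, D

for x in (-0.8, -0.4, 0.0, 0.4, 0.8):
    print(x, errors(x))
```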
Note that Corollary 3 is more general than Corollary 2, since the former applies to the local errors for all parameter values, not just the average errors in the Bayesian case.A proof of Corollary 2 using Corollary 3 is presented in Appendix J.
The corollaries imply that, in seeking an admissible estimator for a real scalar parameter under a mean-square-error criterion, it is sufficient to consider only von Neumann measurements, and randomization via an independent ancilla is not helpful in either the Bayesian or the frequentist setting. For example, optical amplification has been suggested to improve astronomical measurements [66], but since optical amplification must involve an independent ancilla [67,68], Corollary 3 implies that there always exists a von Neumann measurement that performs at least as well.
The corollaries are reminiscent of a well-known result that a von Neumann measurement of the so-called symmetric logarithmic derivative (SLD) operator can saturate the quantum Cramér-Rao bound (see Sec. 6.4 in Ref. [15]). Note, however, that the bound assumes unbiased estimators and the differentiability of ρ_x, while the SLD measurement may depend on the unknown parameter and thus be unrealizable. The corollaries here, on the other hand, are more general and conclusive, as they apply to arbitrary estimators and arbitrary families of density operators, while the von Neumann measurements they offer are all parameter-independent.
Of course, one is often forced to use an ancilla in practice, such as the optical probe in atomic metrology [69] or optomechanics [70]. Then the divergence offers a measure of regret in both the Bayesian and the frequentist settings through Theorems 2 and 3. Consider atomic metrology as an example [69]. Let ρ_x be the parameter-dependent density operator of the atoms on H_1 and τ be the initial state of an optical probe on H_0. Then the state after the optical probing can be expressed as U(ρ_x ⊗ τ)U†, where U is a unitary operator on H_1 ⊗ H_0 that models the atom-light interaction. If an optical measurement, modeled by a projection-valued measure Π_0 on H_0, is performed and the estimator in terms of the outcome y is b(y), then the B observable in Lemma 3 and Corollary 3 can be expressed as
$$B = U^\dagger\left[I \otimes \int b(y)\,\Pi_0(dy)\right]U,$$
and MSE_x is the error of this indirect measurement of the atoms. The GCE G_*B, on the other hand, is an atomic observable on H_1, and MSE′_x is the error of the direct atomic measurement of G_*B. The error difference MSE_x − MSE′_x, which Theorem 3 expresses as a divergence between B and G_*B, can be regarded as the regret due to the indirectness of the optical measurement. The Bayesian setting can be studied similarly.
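The regret in this setting can be made concrete with a toy model, assumed purely for illustration: a single atomic qubit with σ_x = diag((1 + x)/2, (1 − x)/2) and a(x) = x, a probe qubit prepared in the noisy state τ = diag(1 − ε, ε), a CNOT as the interaction U, and a Z measurement of the probe with b(y) = ±1. The sketch computes MSE_x for the indirect observable B = U†(I ⊗ Z)U and MSE′_x for tr_0[(I ⊗ τ)B]; in this model the regret works out to 4ε(1 − ε) for every x.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def kron(A, B):
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def ptrace_0(M, d1, d0):
    # partial trace over the probe factor H_0 of H_1 ⊗ H_0
    return [[sum(M[i * d0 + k][j * d0 + k] for k in range(d0)) for j in range(d1)]
            for i in range(d1)]

def mse(rho, B, a):
    # local error tr[rho (B - a I)^2] of a von Neumann measurement of B
    D = add(B, scale(-a, identity(len(B))))
    return trace(matmul(rho, matmul(D, D)))

Z = [[1.0, 0.0], [0.0, -1.0]]
CNOT = [[1.0, 0.0, 0.0, 0.0],      # control: atom (H_1), target: probe (H_0)
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0]]
eps = 0.1
tau = [[1 - eps, 0.0], [0.0, eps]]                               # noisy probe state
B = matmul(transpose(CNOT), matmul(kron(identity(2), Z), CNOT))  # U^† (I ⊗ Z) U for real U
GB = ptrace_0(matmul(kron(identity(2), tau), B), 2, 2)           # tr_0[(I ⊗ τ) B]

regrets = []
for x in (-0.5, 0.3, 0.9):
    sigma = [[(1 + x) / 2, 0.0], [0.0, (1 - x) / 2]]   # atomic state with tr(sigma Z) = x
    indirect = mse(kron(sigma, tau), B, x)             # MSE_x of the optical estimator
    direct = mse(sigma, GB, x)                         # MSE'_x of the atomic observable
    regrets.append(indirect - direct)
    print(f"x={x:+.1f}  MSE={indirect:.4f}  MSE'={direct:.4f}  regret={indirect - direct:.4f}")
```

Here B reduces to Z ⊗ Z and the GCE to (1 − 2ε)Z, so the probe noise both biases the indirect estimator and costs a fixed amount of error that the direct atomic measurement avoids.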

A sufficient channel for symmetric states
Let {U_z : z ∈ Z} be a set of unitary operators on H_2, and suppose that ρ_x is invariant to all of them, viz.,
$$U_z \rho_x U_z^\dagger = \rho_x \quad \text{for all } z \in Z \text{ and } x \in X. \tag{5.8}$$
Examples include the symmetric states that are invariant to any permutation of a tensor-powered Hilbert space, to be discussed later, and optical states with random phases that are invariant to any phase modulation. ρ_x is also invariant to the random unitary channel
$$G\rho = \int U_z \rho U_z^\dagger\, \mu(dz) \tag{5.9}$$
for any probability measure µ on (Z, Σ_Z). G is then a sufficient channel in Petz's sense, since another equivalent condition for Petz's sufficiency is the existence of an x-independent CPTP map that recovers ρ_x from Gρ_x [12]. It is straightforward to compute the GCEs.
Lemma 4. Given Eqs. (5.8) and (5.9), a solution to any GCE is
$$G_* B = \int U_z B U_z^\dagger\, \mu(dz), \tag{5.10}$$
which does not depend on x if {U_z} and µ do not.
See Appendix J for the proof.
Corollary 4. Given a family of states that are invariant to a set of unitaries {U z }, any estimator B ∈ L 2 (ρ x ), and the resulting local error MSE x , there exists an averaged estimator given by Eq. (5.10) that performs at least as well as B for all x ∈ X .
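For a minimal illustration of Corollary 4, assume the simplest symmetry group {I, Z} acting on a qubit with uniform µ, a phase-randomized family ρ_x = diag((1 + x)/2, (1 − x)/2) that is invariant under Z conjugation, a(x) = x, and the hypothetical estimator B = Z + cX. The sketch assumes the averaged estimator takes the twirled form Σ_z µ(z) U_z B U_z† and checks both the Jordan-product GCE condition and the error reduction; here the twirl removes the X component and lowers the error by exactly c².

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, B):
    # assumed Jordan-product E map: (rho B + B rho) / 2
    return scale(0.5, add(matmul(rho, B), matmul(B, rho)))

def mse(rho, B, a):
    # tr[rho (B - a I)^2] for a von Neumann measurement of B
    D = [[B[i][j] - (a if i == j else 0.0) for j in range(len(B))] for i in range(len(B))]
    return trace(matmul(rho, matmul(D, D)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0, 0.0], [0.0, -1.0]]
group, mu = [I2, Z], [0.5, 0.5]        # uniform probability measure on {I, Z}

def twirl(M):
    # sum_z mu(z) U_z M U_z^dagger (the unitaries here are real)
    out = scale(0.0, M)
    for U, w in zip(group, mu):
        out = add(out, scale(w, matmul(U, matmul(M, transpose(U)))))
    return out

c = 0.7
B = add(Z, scale(c, X))                # hypothetical estimator with an X component
RB = twirl(B)                          # averaged estimator; equals Z here
gaps, gce_errors = [], []
for x in (-0.6, 0.0, 0.4):
    rho = [[(1 + x) / 2, 0.0], [0.0, (1 - x) / 2]]   # invariant: twirl(rho) == rho
    lhs, rhs = twirl(jordan(rho, B)), jordan(twirl(rho), RB)
    gce_errors.append(max(abs(u - v) for ru, rv in zip(lhs, rhs) for u, v in zip(ru, rv)))
    gaps.append(mse(rho, B, x) - mse(rho, RB, x))
print(gaps)
```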
If Z is a group and {U_z} is a projective unitary representation of the group that satisfies U_{z′}U_z = ω(z′, z)U_{z′z} for a complex scalar ω with |ω| = 1 [46], then the left Haar measure μ̃ on the group [1] plays a special role, as the GCE with respect to it, written as
$$\tilde G_* B = \int U_z B U_z^\dagger\, \tilde\mu(dz), \tag{5.11}$$
is invariant to any subsequent GCE for any random unitary channel, in the sense that
$$G_* \tilde G_* B = \tilde G_* B \tag{5.12}$$
for any µ. The left Haar measure is thus the ultimate choice that gives the highest error reduction in the context of Corollary 4. For a concrete example, let H_2 = H_1^{⊗n}, let π ∈ S_n be a permutation of (1, …, n), and let S_n be the permutation group. Define each unitary by [71]
$$U_\pi\left(|\psi_1\rangle \otimes \cdots \otimes |\psi_n\rangle\right) = |\psi_{\pi^{-1}(1)}\rangle \otimes \cdots \otimes |\psi_{\pi^{-1}(n)}\rangle \tag{5.13}$$
for any {|ψ_j⟩ ∈ H_1 : j = 1, …, n}. An operator invariant to all the permutation unitaries is called symmetric. Physically, a symmetric density operator corresponds to n indistinguishable systems. A common example is ρ_x = σ_x^{⊗n}, where σ_x is a density operator on H_1. The Haar measure is simply μ̃(π) = 1/n!, and the corresponding GCE is
$$\tilde G_* B = \frac{1}{n!}\sum_{\pi \in S_n} U_\pi B U_\pi^\dagger, \tag{5.14}$$
which is a symmetrization. Furthermore, if one assumes which lowers the variance of B by a factor of n if ρ_x = σ_x^{⊗n}. The derivation of the classical U-statistics by Rao-Blackwellization is well known [72], and Corollary 4 is indeed the appropriate quantum generalization.
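In the commuting special case where σ_x and the single-system operator C are diagonal, symmetrizing C ⊗ I^{⊗(n−1)} over S_n reduces to averaging a classical function over all n! reorderings of a basis string. The exact-arithmetic sketch below assumes C = Z, σ = diag(1/3, 2/3), and n = 4 (all illustrative choices) and confirms that the symmetrized estimator keeps the mean and divides the variance by n.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

n = 4
p = Fraction(1, 3)                 # sigma = diag(p, 1 - p) in the Z basis

def prob(bits):
    # probability of a computational-basis string under sigma^{⊗n}
    out = Fraction(1)
    for s in bits:
        out *= p if s == 0 else 1 - p
    return out

def z(s):
    return 1 if s == 0 else -1

def B(bits):
    # eigenvalue of Z ⊗ I ⊗ ... ⊗ I: only the first system is read out
    return z(bits[0])

def symmetrized(bits):
    # (1/n!) sum over pi of U_pi B U_pi^†; for diagonal B this averages over reorderings
    total = sum(B(tuple(bits[i] for i in perm)) for perm in permutations(range(n)))
    return Fraction(total, factorial(n))

def mean_var(f):
    states = list(product((0, 1), repeat=n))
    mean = sum(prob(s) * f(s) for s in states)
    var = sum(prob(s) * (f(s) - mean) ** 2 for s in states)
    return mean, var

mean_B, var_B = mean_var(B)
mean_S, var_S = mean_var(symmetrized)
print(mean_B, var_B, mean_S, var_S)
```

The symmetrized value is simply the sample mean of the n readouts, which is the m = 1 case of a U-statistic.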

A sufficient channel for direct-sum states
Suppose now that {ρ_x : x ∈ X} is a family of density operators on a direct sum of Hilbert spaces given by
$$H = \bigoplus_{n \in N} H_n, \tag{5.17}$$
and each ρ_x is given by the direct sum
$$\rho_x = \bigoplus_{n \in N} \sigma_x^{(n)}, \tag{5.18}$$
where each σ_x^{(n)} is a positive-semidefinite operator on H_n. A prominent example in optics is the multimode thermal state, which will be discussed in Sec. 5.5. Let Π_n : H → H_n be the projection operator onto H_n. Suppose that the Hilbert-space decomposition given by Eq. (5.17) is parameter-independent, such that all {Π_n : n ∈ N} do not depend on x. Then the channel is sufficient in Petz's sense. To compute the GCEs with respect to Eqs. (5.18) and (5.19), I impose two more properties on the E map given by for any , and any density operator on H_1 ⊕ H_2 in the form of σ^{(1)} ⊕ σ^{(2)}. These properties are satisfied at least by the products given by Eqs. (2.7)-(2.9). Then the GCE has the following solution.
Lemma 5. Given Eqs. (5.18) and (5.19) and assuming a GCE in terms of an E map that satisfies Eqs. (5.20) and (5.21), a solution to the GCE is which does not depend on x if the projectors {Π_n} do not.
See Appendix J for the proof.The quantum Rao-Blackwell theorem can now be applied to Eqs. (5.18) and (5.19) to prove the following.
Corollary 5. Assume that the Hilbert space can be decomposed as Eq. (5.17) and the projectors {Π_n : H → H_n} do not depend on the unknown parameter x. Given a density-operator family in the form of a direct sum as per Eq. (5.18), any estimator B ∈ L²(ρ_x), and the resulting local error MSE_x, there exists an estimator G_*B given by Eq. (5.22), also in the form of a direct sum, that performs at least as well as B for all x ∈ X. Proof. Use Lemma 5 and Theorem 3.
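To illustrate Corollary 5 numerically, the sketch below takes a three-dimensional H split into direct-sum components of dimension 1 and 2, a random block-diagonal ρ, and a random real symmetric B, and assumes the block-diagonal (pinching) form Σ_n Π_n B Π_n as the candidate Rao-Blackwell estimator, with each Π_n treated as an orthogonal projector on H. It checks the Jordan-product GCE condition and that the error never increases for a few values of a(x); the dimensions and the seed are arbitrary.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def jordan(rho, B):
    # assumed Jordan-product E map: (rho B + B rho) / 2
    return scale(0.5, add(matmul(rho, B), matmul(B, rho)))

def mse(rho, B, a):
    D = [[B[i][j] - (a if i == j else 0.0) for j in range(len(B))] for i in range(len(B))]
    return trace(matmul(rho, matmul(D, D)))

def random_density(n, rng):
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    P = matmul(A, transpose(A))
    return scale(1.0 / trace(P), P)

def random_symmetric(n, rng):
    A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    return scale(0.5, add(A, transpose(A)))

blocks = [[0], [1, 2]]                 # index sets of the direct-sum components

def projector(idx, n=3):
    return [[1.0 if (i == j and i in idx) else 0.0 for j in range(n)] for i in range(n)]

projectors = [projector(b) for b in blocks]

def pinch(M):
    # sum_n Pi_n M Pi_n: keeps the diagonal blocks and drops the off-diagonal ones
    out = scale(0.0, M)
    for P in projectors:
        out = add(out, matmul(P, matmul(M, P)))
    return out

rng = random.Random(3)
rho = pinch(random_density(3, rng))    # a block-diagonal (direct-sum) density operator
B = random_symmetric(3, rng)           # a generic estimator that mixes the blocks
RB = pinch(B)                          # candidate Rao-Blackwell estimator

lhs, rhs = pinch(jordan(rho, B)), jordan(rho, RB)   # G E_rho(B) vs E_{G rho}(G_* B)
gce_error = max(abs(u - v) for ru, rv in zip(lhs, rhs) for u, v in zip(ru, rv))
improvements = [mse(rho, B, a) - mse(rho, RB, a) for a in (-1.0, 0.0, 0.5, 2.0)]
print(gce_error, improvements)
```

The improvement is the same for every a, because the pinched estimator has the same mean and only loses the off-diagonal blocks of B, whose contribution to tr ρB² is nonnegative.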
An example in optics is now in order.

Thermal-light sensing
A multimode thermal optical state can be expressed as [73] where |α⟩ is a coherent state, â_j is the annihilation operator for the jth mode, d^{2J}α ≡ ∏_{j=1}^{J} d(Re α_j) d(Im α_j), and Γ_x is the positive-definite mutual coherence matrix. In thermal-light sensing and imaging problems [74,75,76,77,78,79], Γ_x is assumed to depend on the unknown parameter x.
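Let H_n be the n-photon Hilbert space. Define a pure Fock state with photon numbers m = (m_1, …, m_J) ∈ ℕ_0^J as
$$|\mathbf m\rangle = \prod_{j=1}^{J} \frac{(\hat a_j^\dagger)^{m_j}}{\sqrt{m_j!}}\,|0\rangle,$$
where |0⟩ denotes the vacuum state. Let ∥m∥ ≡ Σ_j m_j be the total photon number. Then {|m⟩ : ∥m∥ = n} is an orthonormal basis of H_n. In terms of the Fock basis, each matrix element of ρ_x is given by (5.27) The Gaussian moment theorem (see Eq. (1.6-33) in Ref. [73]) implies that
$$\langle \mathbf m|\rho_x|\mathbf m'\rangle = 0 \quad \text{if } \|\mathbf m\| \neq \|\mathbf m'\|, \tag{5.28}$$
meaning that ρ_x can be decomposed in the direct-sum form as
$$\rho_x = \bigoplus_{n=0}^{\infty} \sigma_x^{(n)}, \tag{5.29}$$
where each σ_x^{(n)} is an operator on H_n. Then tr σ_x^{(n)} is the probability of having n photons in total and σ_x^{(n)}/tr σ_x^{(n)} is the conditional n-photon state. The projectors can be written as
$$\Pi_n = \sum_{\mathbf m : \|\mathbf m\| = n} |\mathbf m\rangle\langle\mathbf m|. \tag{5.31}$$
Ignoring the mathematical complications due to the infinite-dimensional Hilbert space, Corollary 5 can now be applied to Eq. (5.29).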
If an estimator is constructed from a photon-counting measurement with respect to any set of optical modes, it can be expressed in a Fock basis, which commutes with all the projectors {Π_n}. It follows that the estimator is already in the direct-sum form given by Eq. (5.22) and Corollary 5 offers no improvement. On the other hand, notice that Eq. (5.22) must commute with each projector Π_n, viz.,
$$[G_* B, \Pi_n] = 0 \quad \text{for all } n. \tag{5.32}$$
If an estimator does not commute with all {Π_n}, such as one obtained from homodyne detection, then the estimator does not have the direct-sum form and has the potential to be improved by the quantum Rao-Blackwellization.
To introduce a more specific example, diagonalize Γ_x in terms of a diagonal matrix D_x and a unitary matrix V_x as
$$\Gamma_x = V_x D_x V_x^\dagger, \qquad (D_x)_{jk} = \lambda_{j,x}\,\delta_{jk}, \tag{5.33}$$
where δ_jk is the Kronecker delta and each λ_{j,x} is an eigenvalue of Γ_x. I call {λ_{j,x} : j = 1, …, J} the spectrum of the thermal state. With the change of variable
$$\beta = V_x^\dagger \alpha, \tag{5.34}$$
define also a unitary operator Û_x by such that |α⟩ = |V_xβ⟩ = Û_x|β⟩. ρ_x can then be expressed as
$$\rho_x = \sum_{\mathbf m} p_x(\mathbf m)\, \hat U_x |\mathbf m\rangle\langle\mathbf m| \hat U_x^\dagger, \tag{5.36}$$
where
$$p_x(\mathbf m) = \prod_{j=1}^{J} \frac{\lambda_{j,x}^{m_j}}{(1+\lambda_{j,x})^{m_j+1}} \tag{5.37}$$
is separable into a product of Bose-Einstein distributions and Û_x|m⟩ is a Fock state with respect to the optical modes defined by Eq. (5.35). I call these optical modes the eigenmodes of the thermal state. Now suppose that only the spectrum {λ_{j,x}} depends on the unknown parameter x, while V, Û, and thus {ĝ_j} do not, meaning that the eigenmodes are fixed. This assumption applies to the thermometry problem studied in Ref. [75] but not to the stellar-interferometry problem studied in Ref. [74] or the subdiffraction-imaging problem studied in Refs. [76,77,78,79], because the eigenmodes in the latter two cases vary with x. With fixed eigenmodes, I can define a more fine-grained x-independent projector as
$$\Pi_{\mathbf m} = \hat U |\mathbf m\rangle\langle\mathbf m| \hat U^\dagger. \tag{5.39}$$
In this example, the family of density operators given by Eq. (5.36) and the G_*B given by Eq. (5.40) happen to commute with one another, but the original estimator B need not commute with the others, unlike Sinha's assumption in Ref. [64]; see Appendix H for a brief discussion of his theory. Take homodyne detection for example. An estimator constructed from homodyne detection can be framed as
$$B = b(\mathbf q), \tag{5.41}$$
where q is a vectoral quadrature operator that is a linear function of {â_j}. Equation (5.41) does not commute with Eq. (5.39) in general, but Corollary 6 still applies to it.
To demonstrate the possible improvement through an even more specific example, suppose that the spectrum is flat and a(x) = λ_{j,x} = x, the mean photon number per mode, is the parameter of interest. With homodyne detection, Eq. (5.41) is an unbiased estimator of x if (5.42) The Rao-Blackwell estimator given by Eq. (5.40), on the other hand, can be expressed as With the thermal state, it is straightforward to show that which are plotted in Fig. 3, demonstrating that the Rao-Blackwell estimator dominates. Corollary 6 is reminiscent of the optimality of photon counting for thermometry proved in Ref. [75] in terms of a quantum Cramér-Rao bound. Corollary 6 is more general because it applies directly to the local mean-square error of any biased or unbiased estimator and allows the parametrization of the spectrum {λ_{j,x}} and the parameter of interest a(x) to be general. The superiority of photon counting over homodyne detection for random displacement models has also been noted in many other contexts [80,81,82,83], although those works, like Ref. [75], also rely on the quantum Cramér-Rao bound.
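The qualitative content of Fig. 3 can be reproduced under explicit assumptions, which are mine and not necessarily the exact forms of Eqs. (5.42)-(5.44): each of the J modes is an independent thermal mode with mean photon number x, homodyne detection measures one quadrature per mode with vacuum variance 1/2 and uses the unbiased per-mode estimate q² − 1/2, and photon counting uses the photon number n. Averaging over J modes divides both errors by J, so the J-normalized errors are the per-mode values computed below, which should approach 2(x + 1/2)² and x(x + 1), respectively.

```python
import math

def photon_counting_mse(x, n_max=5000):
    # per-mode MSE of the photon-number estimate for a thermal mode with mean x,
    # using the Bose-Einstein distribution p(n) = x^n / (1 + x)^(n + 1)
    r = x / (1.0 + x)
    p = 1.0 / (1.0 + x)              # p(0)
    m1 = m2 = 0.0
    for n in range(n_max + 1):
        m1 += n * p
        m2 += n * n * p
        p *= r
    return m2 - m1 * m1              # unbiased, so the MSE equals the variance

def homodyne_mse(x, points_per_sd=200, n_sd=12.0):
    # per-mode MSE of q^2 - 1/2 for one quadrature with vacuum variance 1/2;
    # for a thermal mode, q is Gaussian with zero mean and variance x + 1/2
    v = x + 0.5
    sd = math.sqrt(v)
    h = sd / points_per_sd
    k_max = int(n_sd * points_per_sd)
    total = 0.0
    for k in range(-k_max, k_max + 1):   # Riemann sum over q in [-n_sd*sd, n_sd*sd]
        q = k * h
        density = math.exp(-q * q / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        total += (q * q - 0.5 - x) ** 2 * density * h
    return total

xs = [0.01, 0.1, 1.0, 10.0]
rows = [(x, homodyne_mse(x), photon_counting_mse(x)) for x in xs]
for x, hom, pc in rows:
    # J times the MSE of the J-mode average equals these per-mode values
    print(f"x={x:6.2f}  J*MSE_homodyne={hom:10.5f}  J*MSE_counting={pc:10.5f}")
```

Under these assumptions the gap 2(x + 1/2)² − x(x + 1) = x² + x + 1/2 is positive for every x, consistent with the domination shown in Fig. 3.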
If the eigenmodes vary with x, as in the problems of stellar interferometry [74] and subdiffraction imaging [76,77,78,79], then Eq. (5.40) may not be a valid estimator, because x is unknown and the measurement may not be realizable. It is an interesting open question whether the quantum Rao-Blackwell theorem can offer any insight into those problems as well, beyond the optimality of the direct-sum form in Corollary 5. I speculate on two potential directions of future research:
1. Even if the G_* map is not constant in general, G_*B for a particular B may happen to be constant and thus still be a valid estimator.
2. Even if G_*B varies with x, Eq. (5.3) remains valid and can be used as a lower bound on MSE_x, in which case MSE_x ≥ MSE′_x is an oracle inequality. An estimator that approximates G_*B, via an adaptive protocol for example [22], may still enjoy an error close to MSE′_x.

Conclusion
This work cements the Jordan-product GCE and the associated divergence as essential concepts in quantum metrology.In the Bayesian setting, the GCE is found to relate the optimal estimators for a sequence of channels.In the frequentist setting, the GCE is found to give a quantum Rao-Blackwell theorem, which can improve a quantum estimator in the same manner as the classical version does and reveal the optimal forms of the estimators in common scenarios.In both settings, the divergence is found to play a significant role in determining the gap between the estimation errors before and after a channel is applied.Given these operational meanings, even the purists [39,40] can no longer dismiss the GCE and the divergence as pointless concepts.
For the more open-minded, the concepts have unveiled a new suite of methods for the study of decoherence and the design of better measurements in quantum metrology. Many open problems remain. First, it should be possible to generalize the theory here rigorously to infinite-dimensional Hilbert spaces. Second, it may be possible to generalize the quantum Rao-Blackwell theorem here to other convex loss functions beyond the square loss, in the same manner as the classical version [5] or Sinha's versions [64]. Third, there may be a deeper relation between Petz's sufficiency and the constant-GCE condition desired here, beyond the specific examples in this work. Fourth, there should be no shortage of further interesting examples and applications of the theory here for quantum metrology. Last but not least, the strategy of using quantum metrology to give operational meanings to GCEs may be generalizable to other versions of GCEs and other metrological tasks, such as multiparameter estimation, thus expanding the fundamental role of GCEs in both quantum metrology and quantum probability theory.

Acknowledgment
I acknowledge helpful discussions with Arthur Parzygnat. This research is supported by the National Research Foundation, Singapore, under its Quantum Engineering Programme (QEP-P7).

A Quantum prediction
In analogy with Eq. (2.14), F^*, the Hilbert-Schmidt adjoint of the CPTP map F that models a channel, obeys the following interesting formula for any E: In other words, F^* is the optimal prediction in the same way F_* is the optimal retrodiction. The notations F^* and F_* also coincide with those of the pullback and the pushforward in differential geometry [84], respectively, and indeed F^* and F_* behave like those operations.
With the adjoint relation between F^* and F_* given by Eq. (2.15) in terms of the weighted inner product, F^* can be shown to obey in analogy with Eq. (2.16), while the minimum divergence in Eq. (A.1) can be expressed as in analogy with Eq. (2.17). The divergence given by Eq. (2.10) can also be rewritten in a time-symmetric form as Together with Eqs. (2.14)-(2.17), these formulas complete a satisfying time-symmetric theory of quantum inference.

B Classical conditional expectation
The classical concepts in this Appendix are all special cases of the quantum concepts in Sec. 2 and Appendix A; see Table 1 for the correspondence. Let X and Y be classical random variables with countable sample spaces X and Y, respectively. Suppose that X is generated at an earlier time than Y. Let P_X(x) be the probability distribution of X and P_{Y|X}(y|x) be the probability distribution of Y conditioned on X = x. The unconditional distribution of Y is given by
$$P_Y(y) = \sum_x P_{Y|X}(y|x)\, P_X(x), \tag{B.1}$$
which can be expressed as P_Y = FP_X. The divergence between two complex random variables a(X) and b(Y) can be defined as where the inner product and the norm are defined in Table 1 and F^* is the adjoint of F with respect to the unweighted inner product in Table 1, such that
$$(F^* b)(x) = \sum_y b(y)\, P_{Y|X}(y|x). \tag{B.3}$$
In other words, F^*b is the conditional expectation of b(Y) given X.
Let L²(P) be the Hilbert space of random variables in terms of the inner product ⟨·, ·⟩_P defined in Table 1. The conditional expectation F^*b is the optimal predictor of b(Y) as a function of X, viz.,

Similarly, the conditional expectation of a(X) given Y is the optimal retrodictor of a(X) as a function of Y, viz., where {|x⟩ : x ∈ X} is an orthonormal basis of H_1 and {|y⟩ : y ∈ Y} is an orthonormal basis of H_2.
Since the joint distribution P_XY(x, y) = P_{Y|X}(y|x)P_X(x) is another probability distribution, Eq. (B.2) can be written as the squared norm ∥a − b∥²_{P_XY}, and since L²(P_X) and L²(P_Y) are subspaces of L²(P_XY), the Hilbert projection theorem implies that the conditional expectation is a projection, viz., ∥a − b∥ where Π[u|V] is the projection of a Hilbert-space element u into a subspace V. In the quantum case, F^* and F_* are also projections into appropriate Hilbert spaces if the equality condition in Eq. (2.13) holds.

Because X is assumed to be generated earlier than Y and experiments follow an arrow of time, it is more straightforward to obtain P_{Y|X} by experiments than P_{X|Y}. This is the reason why one commonly assumes that P_{Y|X} rather than P_{X|Y} is known in an inference problem, and why the prediction formula in terms of P_{Y|X} looks simpler than the retrodiction formula in terms of the Bayes theorem. Probability theory in itself is agnostic to the arrow of time, and if P_{X|Y} were given instead, the retrodiction formula would look simpler and the prediction formula would require the Bayes theorem. The same arrow of time is commonly assumed in quantum theory, where the F map models the evolution of a quantum system forward in time. It is therefore unsurprising that the prediction map F^* in terms of F also looks simpler than the retrodiction map F_* in terms of Eq. (2.16), which is a generalization of the Bayes theorem. If the map (F_*)^* is given instead, then the retrodiction map F_* is simply its Hilbert-Schmidt adjoint, while the prediction map F^* is given by the complicated Eq. (A.2).
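The projection and adjoint statements of this appendix can be checked in exact rational arithmetic on a hypothetical two-outcome example (the distributions and the functions a, b, c, d below are arbitrary choices). The sketch computes F^*b by summing over P_{Y|X}, computes F_*a through the Bayes theorem, and verifies the Pythagorean identities implied by the Hilbert projection theorem together with the adjoint relation under the unweighted pairing.

```python
from fractions import Fraction as Fr

P_X = {0: Fr(1, 3), 1: Fr(2, 3)}
P_YgX = {0: {0: Fr(3, 4), 1: Fr(1, 4)},      # P_YgX[x][y] = P_{Y|X}(y|x)
         1: {0: Fr(1, 5), 1: Fr(4, 5)}}
a = {0: Fr(-1), 1: Fr(2)}                    # a(X), a hidden quantity
b = {0: Fr(5), 1: Fr(-3)}                    # b(Y), a function of the observation

P_Y = {y: sum(P_YgX[x][y] * P_X[x] for x in P_X) for y in (0, 1)}   # P_Y = F P_X

def predict(b):
    # F^* b: conditional expectation of b(Y) given X = x
    return {x: sum(b[y] * P_YgX[x][y] for y in P_Y) for x in P_X}

def retrodict(a):
    # F_* a: conditional expectation of a(X) given Y = y, via the Bayes theorem
    return {y: sum(a[x] * P_YgX[x][y] * P_X[x] for x in P_X) / P_Y[y] for y in P_Y}

def sq_dist(f, g):
    # ||f(X) - g(Y)||^2 in L^2(P_XY)
    return sum((f[x] - g[y]) ** 2 * P_YgX[x][y] * P_X[x] for x in P_X for y in P_Y)

Fa, Fb = retrodict(a), predict(b)
c = {0: Fr(1, 2), 1: Fr(7)}                  # an arbitrary competing function of Y
d = {0: Fr(-2), 1: Fr(3, 2)}                 # an arbitrary competing function of X

# Pythagorean identities from the Hilbert projection theorem
retro_ok = sq_dist(a, c) == sq_dist(a, Fa) + sum((Fa[y] - c[y]) ** 2 * P_Y[y] for y in P_Y)
pred_ok = sq_dist(d, b) == sq_dist(Fb, b) + sum((d[x] - Fb[x]) ** 2 * P_X[x] for x in P_X)
# adjoint relation under the unweighted pairing: sum_y b(y) P_Y(y) = sum_x (F^* b)(x) P_X(x)
adjoint_ok = sum(b[y] * P_Y[y] for y in P_Y) == sum(Fb[x] * P_X[x] for x in P_X)
print(retro_ok, pred_ok, adjoint_ok)
```

Because the cross terms vanish identically, any competitor c(Y) or d(X) can only add a nonnegative term, which is the optimality of the retrodictor and the predictor.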

C GCE for Gaussian systems
I first briefly review the theory of quantum Gaussian systems, following Chap. 12 in Ref. [50]. Let H_1 be the Hilbert space for s bosonic modes. On H_1, define the canonical observables as and the Weyl operator as where ⊤ denotes the transpose. If σ is a Gaussian state, its characteristic function can be expressed as where m ∈ ℝ^{2s} is the mean vector and Σ ∈ ℝ^{2s×2s} is the covariance matrix of the Gaussian state. Σ is symmetric and positive-semidefinite and must obey an uncertainty relation that need not concern us here. Similar to the preceding definitions, let H_2 be the Hilbert space for t bosonic modes and define Q̃ and W̃(ζ) as the canonical observables and the Weyl operator on H_2, respectively. If F : O(H_1) → O(H_2) is a CPTP map that models a Gaussian channel, it can be defined by where F ∈ ℝ^{2t×2s} is a transition matrix, l ∈ ℝ^{2t} is the mean displacement introduced by the channel, and R ∈ ℝ^{2t×2t} is the channel covariance matrix. F and R must obey a certain matrix inequality for the map to be CPTP, but again the inequality need not concern us here. With the Gaussian input state and the Gaussian channel, the output state remains Gaussian and its characteristic function is given by An explicit GCE formula can now be presented.

Proposition 1. Assume a Gaussian state defined by Eq. (C.4), a Gaussian channel defined by Eqs. (C.5) and (C.6), a quadrature operator given by
and the E map given by the Jordan product in Eq. (2.7).A solution to the GCE is while the divergence is See Appendix J for the proof.
It is interesting to note that Eqs. (C.12)-(C.14) are identical to the formulas for the classical conditional expectation E(A|Y) and its mean-square error, where Y = FX + Z and X ∼ N(m, Σ) and Z ∼ N(l, R) are independent normal random variables [51]. Here, the canonical observables Q and Q̃ play the roles of X and Y, respectively. When F is a measurement map, similar formulas have been derived in Refs. [25,26,27,18] and may be useful for studying waveform estimation [42] beyond the stationary assumption.
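As a classical cross-check, the sketch below takes the scalar case of the linear Gaussian model Y = fX + Z, with hypothetical parameter values, and compares the standard conditional-mean formula E(X|Y) = m + K(Y − fm − l), K = s²f/(f²s² + r), and its error s²r/(f²s² + r) against a Monte Carlo simulation. It does not implement the multimode formulas (C.12)-(C.14) themselves, only the one-dimensional classical analogue that they are said to match.

```python
import random

# hypothetical one-dimensional model: X ~ N(m, s2), Z ~ N(l, r), Y = f X + Z
m, s2, f, l, r = 1.0, 2.0, 0.8, -0.5, 0.5
gain = s2 * f / (f * f * s2 + r)
mse_theory = s2 * r / (f * f * s2 + r)      # posterior variance of X given Y

def conditional_mean(y):
    # E(X | Y = y) for the linear Gaussian model
    return m + gain * (y - f * m - l)

rng = random.Random(2024)
n = 200_000
sq_err_cond = sq_err_naive = 0.0
for _ in range(n):
    x = rng.gauss(m, s2 ** 0.5)
    y = f * x + rng.gauss(l, r ** 0.5)
    sq_err_cond += (conditional_mean(y) - x) ** 2
    sq_err_naive += ((y - l) / f - x) ** 2   # unbiased inversion that ignores the prior
mse_cond, mse_naive = sq_err_cond / n, sq_err_naive / n
print(f"theory {mse_theory:.4f}  simulated {mse_cond:.4f}  naive {mse_naive:.4f}")
```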

D von Neumann measurement
Let B be a self-adjoint operator on a Hilbert space H. Then B admits the spectral representation [85,46]
$$B = \int_{\mathbb R} b\, \Pi(db) \tag{D.1}$$
in terms of a projection-valued measure Π on the Borel space (ℝ, Σ_ℝ). A projection-valued measure, also called an orthogonal resolution of identity, is a POVM on (Y, Σ_Y) with the additional properties that Π(C) for any C ∈ Σ_Y is a projection operator and

E The dynamic-programming algorithm
The dynamic-programming algorithm is based on the so-called principle of optimality for Markov decision processes. For the process given by Eqs. (4.12) and (4.13), the principle states that, if (F̂_1, …, F̂_N) are the optimal "controls" that minimize the total "cost" given by Eq. (4.13), then (F̂_k, …, F̂_N) must also be the optimal controls that minimize the "cost-to-go" at time k, defined as
$$J_k = \sum_{j=k}^{N} g(s_j, F_j).$$
Based on this principle, the algorithm starts by choosing the final control F_N that minimizes the cost-to-go J_N = g(s_N, F_N). Let this minimum be
$$\hat J_N(s_N) \equiv \min_{F_N} g(s_N, F_N).$$
The optimal F_N obtained this way is a function F̂_N(s_N) of the state s_N. Next, write Ĵ_N(s_N) = Ĵ_N[f(s_{N−1}, F_{N−1})] using Eq. (4.12) and find the control F_{N−1} that minimizes the cost-to-go at time k = N − 1, given by
$$\hat J_{N-1}(s_{N-1}) \equiv \min_{F_{N-1}} \left\{ g(s_{N-1}, F_{N-1}) + \hat J_N[f(s_{N-1}, F_{N-1})] \right\}.$$
The optimal F_{N−1} is again a function F̂_{N−1}(s_{N−1}) of the state s_{N−1}. This procedure continues with
$$\hat J_k(s_k) \equiv \min_{F_k} \left\{ g(s_k, F_k) + \hat J_{k+1}[f(s_k, F_k)] \right\}$$
for k = N − 2, N − 3, … until k = 1, when the complete optimal control law (F̂_1(s_1), …, F̂_N(s_N)) has been found and Ĵ_1(s_1) is the minimum total cost.
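The backward recursion can be sketched on a toy deterministic problem; the state space, the dynamics f, and the stage cost g below are hypothetical, chosen only so that the dynamic-programming answer can be compared with an exhaustive search over all control sequences (for deterministic dynamics and a fixed initial state, the minimum over open-loop sequences equals the minimum over feedback laws).

```python
from itertools import product

STATES = range(4)
CONTROLS = range(3)
N = 4                                  # number of stages

def f(s, u):
    # hypothetical deterministic dynamics s_{k+1} = f(s_k, F_k)
    return (s + u) % 4

def g(s, u):
    # hypothetical stage cost
    return (s - 2) ** 2 + u

def dynamic_programming():
    # backward recursion: J_hat_k(s) = min_u { g(s, u) + J_hat_{k+1}(f(s, u)) }
    J_next = {s: 0 for s in STATES}    # nothing is paid after the final stage
    policy = []
    for _ in range(N):
        J_k, law_k = {}, {}
        for s in STATES:
            best_u = min(CONTROLS, key=lambda u: g(s, u) + J_next[f(s, u)])
            law_k[s] = best_u
            J_k[s] = g(s, best_u) + J_next[f(s, best_u)]
        policy.insert(0, law_k)        # policy[k] is the control law at stage k + 1
        J_next = J_k
    return J_next, policy

def brute_force(s1):
    # exhaustive search over all control sequences (F_1, ..., F_N)
    best = None
    for controls in product(CONTROLS, repeat=N):
        s, cost = s1, 0
        for u in controls:
            cost += g(s, u)
            s = f(s, u)
        best = cost if best is None else min(best, cost)
    return best

J1, policy = dynamic_programming()
print({s: (J1[s], brute_force(s)) for s in STATES})
```

The recursion visits each (stage, state, control) triple once, whereas the exhaustive search grows exponentially with N, which is the practical reason for using dynamic programming in the controller design.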

F Classical Rao-Blackwell theorem
To derive the classical Rao-Blackwell theorem from the quantum theorem given by Theorem 3, assume the diagonal forms a solution of which is which is nonnegative and coincides with a form of the Rao-Blackwell theorem (see, for example, Problem 1.7.9 on p. 73 in Ref. [5]).
Z is called a sufficient statistic for the estimation problem if the distribution P_{Y|Z,X}(y|z, x) given by Eq. (F.10) does not depend on x [5]. Then Eq. (F.9) also does not depend on x for any b(y) and is always a valid estimator.
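The classical statement can be checked on the textbook Bernoulli example, chosen here for illustration: n independent trials with success probability θ = x, the crude estimator b(Y) = Y_1, and the sufficient statistic Z = Y_1 + ⋯ + Y_n. Using Eqs. (F.9) and (F.10) with P_{Z|Y}(z|y) = 1 when z = Σ_j y_j and 0 otherwise, the sketch confirms in exact arithmetic that c(z, x) = z/n does not depend on x and that the error gap matches Eq. (F.12).

```python
from fractions import Fraction
from itertools import product
from math import comb

n = 3

def P_Y_given_theta(y, theta):
    k = sum(y)
    return theta ** k * (1 - theta) ** (n - k)

def P_Z_given_theta(z, theta):
    return comb(n, z) * theta ** z * (1 - theta) ** (n - z)

def b(y):
    return y[0]                        # crude estimator: the first trial only

def c(z, theta):
    # Eqs. (F.9)-(F.10) with P_{Z|Y}(z|y) = 1 if z == sum(y) else 0
    return sum(Fraction(int(sum(y) == z)) * P_Y_given_theta(y, theta) * b(y)
               for y in product((0, 1), repeat=n)) / P_Z_given_theta(z, theta)

results = {}
for theta in (Fraction(1, 4), Fraction(2, 3)):
    ys = list(product((0, 1), repeat=n))
    cz = {z: c(z, theta) for z in range(n + 1)}
    mse = sum((b(y) - theta) ** 2 * P_Y_given_theta(y, theta) for y in ys)      # MSE_x of b(Y)
    mse_rb = sum((cz[z] - theta) ** 2 * P_Z_given_theta(z, theta) for z in range(n + 1))
    gap = (sum(b(y) ** 2 * P_Y_given_theta(y, theta) for y in ys)
           - sum(cz[z] ** 2 * P_Z_given_theta(z, theta) for z in range(n + 1)))  # Eq. (F.12)
    results[theta] = (cz, mse, mse_rb, gap)
    print(theta, cz, mse, mse_rb, gap)
```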

G Comparison of the Bayesian and frequentist settings
Both the monotonicity of the Bayesian error given by Corollary 1 and the error reduction due to the quantum Rao-Blackwell theorem in Theorem 3 are unsurprising results given their classical origins, but they may be confusing in that they seem to say opposite things about the effect of a channel.I offer a clarification here.
First, note that the Bayesian setting concerns the "global" error D_{σ,F}(A, B) only, whereas the frequentist setting concerns the local error MSE_x as a function of the unknown parameter x. The global error is a cruder measure because it is only an average of the local error given by assuming Eqs. (4.3) and (5.1). Second, the Bayesian results in Sec. 4, and Corollary 1 in particular, concern only the estimators F_*A and G_*F_*A that are optimal with respect to the global error. Theorem 3 in the frequentist setting, on the other hand, is about the local errors of an estimator B and its Rao-Blackwellization G_*B, with no special assumptions about the original estimator B. The theorem also says nothing about whether the Rao-Blackwell estimator is optimal in the global sense, only that it is at least as good as the original.
Third, note that the Personick estimators considered in the Bayesian setting do not depend on the unknown parameter and are naturally realizable, and Corollary 1 applies to any channel.In the frequentist setting, the channel and the Rao-Blackwell estimator must be parameter-independent for the measurement to be realizable, so there is a stringent requirement on the G channel for the improvement to be realizable, let alone significant.
In practice, the classical Rao-Blackwellization is typically used to improve an initial estimator design that is not expected to be optimal or even good in any sense; the derivation of the U-statistics [72] is a representative example.If the initial estimator is already optimal in the Bayesian sense, then the Rao-Blackwellization cannot offer any improvement almost everywhere with respect to the prior P X .
See Appendix J for the proof.

H Comparison with some prior works
Although Sinha's formalism in Ref. [64] is applicable to infinite dimensions and any convex loss function, he makes strong assumptions about the commutativity of all the operators involved.
which is independent of x.
See Appendix J for the proof. Equation (H.6) is used implicitly in Sinha's Rao-Blackwell theorems and Theorem 3 here also applies to it. Physically, Proposition 2 means that, when the density operators are given by Eq. (H.4) and the original estimator is given by Eq. (H.2), one can measure the sufficient-statistic observable S given by Eq. (H.1), and the Rao-Blackwell estimator as a function of the outcome s is (tr σ_s B_s)/tr σ_s.
Another relevant prior work is Ref. [65] by Luczak, which studies a concept of sufficiency in von Neumann algebra for minimum-variance unbiased estimation in Sec. 5 of Ref. [65].His Theorem 5.1 states that a subalgebra with a special property called completeness is sufficient for the estimation if and only if there exists a constant GCE in terms of the Jordan product that projects onto the subalgebra.He makes no commutativity assumptions like Sinha's, but the completeness assumption is unfortunately rather restrictive, as is well known in classical statistics [5] and recognized by Luczak himself [65].Even in classical statistics, completeness is difficult to check, and not many models are known to satisfy it.It is unclear what quantum models beyond the known classical cases can satisfy the property.Theorem 3 here, on the other hand, does not require the unbiasedness and completeness assumptions.
Lastly, it is worth mentioning that Refs.[86,87] by Shmaya and Chefles concern a quantum generalization of another Blackwell theorem, which, to my knowledge, has no relation to the Rao-Blackwell theorem, apart from Blackwell's name being attached to both.

I Quantum U-statistics
The goal here is to compute the GCE given by Eq. (5.14) for an operator in the form of Eq. (5.15). Define the permutation matrix π on a column vector as and the symmetrization map given by Eq. (5.11) becomes which boils down to a symmetrization of B(u). In general, a symmetric operator on H_1^{⊗m} is defined by Given any operator on H_1^{⊗m}, a symmetric version can be obtained by applying the symmetrization map.
Define a projection matrix Π j : U n → U dim j by where j = (j 1 , . . ., j m ) ∈ J m is a vector of indices with 1 ≤ m ≤ n and J m is the set of m-permutations of {1, . . ., n} (ordered sampling without replacement).Define also {j} for a j ∈ J m as the vector of indices sorted in ascending order and define the set of all such vectors as which is equivalent to the set of m-combinations of {1, . . ., n} (unordered sampling without replacement).
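The index sets J_m and K_m, and the fact that each element of K_m is obtained from exactly m! elements of J_m by sorting, can be enumerated with itertools. As a classical illustration (an assumption here, not the quantum formula of Proposition 3), the sketch also averages the symmetric kernel h(y_1, y_2) = (y_1 − y_2)²/2 over both index sets, which gives the same U-statistic, namely the unbiased sample variance. Indices are 0-based in the code instead of {1, …, n}.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial

data = [Fraction(2), Fraction(3), Fraction(7), Fraction(10)]
n, m = len(data), 2

def kernel(y1, y2):
    # symmetric degree-2 kernel whose mean is the variance of a single sample
    return (y1 - y2) ** 2 / 2

J_m = list(permutations(range(n), m))        # ordered sampling without replacement
K_m = list(combinations(range(n), m))        # unordered sampling without replacement

# sorting each j in J_m gives {j}; every combination arises from exactly m! permutations
counts = {k: 0 for k in K_m}
for j in J_m:
    counts[tuple(sorted(j))] += 1

u_perm = sum(kernel(*(data[i] for i in j)) for j in J_m) / len(J_m)
u_comb = sum(kernel(*(data[i] for i in k)) for k in K_m) / len(K_m)
print(len(J_m), len(K_m), set(counts.values()), u_perm, u_comb)
```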
A formula for the symmetrization can now be presented.

Proposition 3. Suppose that B ∈ O(H_1^{⊗n}) can be decomposed as
$$B = C \otimes C', \tag{I.13}$$
where C ∈ O(H_1^{⊗m}) applies to the first m Hilbert subspaces in H_1^{⊗n} and C′ ∈ O(H_1^{⊗(n−m)}) applies to the rest. Assume that both C and C′ are symmetric. Then the symmetrized B is given by where {E(u)} is an orthonormal basis of O(H_1^{⊗n}) given by Eq. (I.2), C(v) and C′(w) are the components of C and C′ with respect to the same basis, the projection matrix Π is defined by Eq. (I.11), K_m is the m-combinations of {1, …, n}, and for each k, k′ is defined as the rest of the indices in {1, …, n}. Proposition 3 gives the quantum U-statistics in Ref. [45] if Eq. (5.15), a special case of Eq. (I.13) with

See
, is assumed. The classical U-statistics [5,72]

where Eqs. (2.4) and (5.8) have been used. The interchange of E_{ρ_x} and the Bochner integral is valid because E_{ρ_x} is a linear map on a finite-dimensional operator space (more assumptions would be needed for infinite-dimensional operator spaces; see Corollary 2 on p. 134 in Ref. [88]). By Eq. (2.16), Eq. (J.18) is equal to E_{Gρ_x} G_*B = E_{ρ_x} G_*B, resulting in a solution to the GCE given by Eq. (5.10).
Proof of Lemma 5. Assuming Eq. (5.22) and using Eq.(5.20), one obtains which is equal to by virtue of Eq. (5.21).It follows that Eq. (5.22) is a solution to the GCE, as per Eq.(2.16).
Note that Lemmas 1-5 apply to classes of GCEs and not just the Jordan version. Note also that the GCEs for any sequence of the channels can be computed by chaining the individual GCEs in a manner reminiscent of calculus. With similar steps and the ansatz

Equation (G.3) can be proved by contradiction: assume that there exists an x ∈ X with P_X(x) > 0 such that MSE_x > MSE′_x.
Since MSE_x ≥ MSE′_x for all x by Theorem 3, the assumption would imply Σ_x P_X(x) MSE_x > Σ_x P_X(x) MSE′_x, which contradicts Eq. (G.2). It follows that the assumption cannot hold and one must have Eq. (G.3).

The n! summands in Eq. (I.9) with respect to π can now be divided into subsets indexed by Eq. (I.12).

Proof of
call it compositionality.

Figure 1: (a) A diagram depicting the map of a density operator σ through the CPTP maps F and then G. (b) A diagram depicting the map of an observable A through the GCE (GF) * , or equivalently through the two GCEs F * and then G * , as per Theorem 1. (c) A diagram depicting the root divergences between the operators as lengths of the sides of a right triangle, as per Theorem 2. The subscripts of D are omitted for brevity.

Figure 2: Some scenarios of Bayesian quantum parameter estimation.See the main text for the definitions of the symbols.

Corollary 3. Given any POVM M : Σ_Y → O(H_2), any estimator b : Y → ℝ, and the resulting local error MSE_x, there exists a von Neumann measurement that can perform at least as well for all x ∈ X. Proof. Write the Naimark extension of the POVM as tr

then Eq. (5.14) leads to the quantum U-statistics introduced by Guţă and Butucea [45], as shown in Appendix I. The U-statistic is an unbiased estimator of a(x) = tr ρ_x B = tr ρ_x G_*B. The simplest example is when m

Figure 3: Comparison of the mean-square error obtained by homodyne detection (MSE_x) and that by photon counting (MSE′_x) in estimating the mean photon number per mode x of a thermal state. The plot is in log-log scale, both axes are dimensionless, and the errors are normalized with respect to J, the number of optical modes. The improvement can be regarded as a result of the quantum Rao-Blackwellization.

$$(F_* a)(y) = \sum_x a(x)\, P_{X|Y}(x|y), \tag{B.7}$$
$$P_{X|Y}(x|y) = \frac{P_{Y|X}(y|x)\, P_X(x)}{P_Y(y)}. \tag{B.8}$$
Equation (B.8) is, of course, the Bayes theorem. It is straightforward to show that the formulas presented thus far are special cases of the quantum formulas in Sec. 2 and Appendix A; this is done by assuming the diagonal forms
$$\sigma = \sum_x P_X(x)\, |x\rangle\langle x|,$$
$$F\sigma = \sum_{x,y} P_{Y|X}(y|x)\, \langle x|\sigma|x\rangle\, |y\rangle\langle y|, \tag{B.11}$$
$$B = \sum_y b(y)\, |y\rangle\langle y|, \tag{B.12}$$

2 Pb∥ 2 P
XY = Π[b|L 2 (P X )], (B.14) F * a = arg min b∈L2(P Y ) ∥a − XY = Π[a|L 2 (P Y )], (B.15) 46].The projectionvalued measure in Eq. (D.1) is unique to B in the sense that no other projection-valued measure can satisfy Eq. (D.1) for a given B. A von Neumann measurement of B is defined as a measurement with the projection-valued measure Π of B as the POVM.If H is finite-dimensional, B has a finite number of eigenvalues, and the spectral representation is much simpler and can be written asB = b∈Λ B bΠ(b), (D.2)whereΛ B ⊂ Ris the set of B's eigenvalues and {Π(b) : b ∈ Λ B } are projection operators that obey Π(b)Π(c) = δ bc Π(b) and b Π(b) = I.The set of outcomes can then be restricted to Λ B .Consider now a converse situation, where the POVM is given by a projection-valued measure Π ′ on (Y, Σ Y ) and each outcome y ∈ Y is processed by a function b : Y → R to produce a final outcome b(y).The measurement Π ′ together with the data processing by b(y) can be interpreted as a von Neumann measurement of B = b(y)Π ′ (dy).(D.3)This is because Eq. (D.3) can be expressed in the spectral representation given by Eq. (D.1), with Π(C) = Π ′ [b −1 (C)], b −1 (C) ≡ {y : b(y) ∈ C} , (D.4) and the probability tr Π ′ [b −1 (C)]ρ of each event C ∈ Σ R generated by (Π ′ , b) coincides with the probability tr Π(C)ρ from a von Neumann measurement of B. If Y is countable, then B = y b(y)Π ′ (y) can be expressed as Eq.(D.2), with Λ B = {b(y) : y ∈ Y} , Π(c) = y:b(y)=c Π ′ (y).(D.5) z c(z, x) |z⟩ ⟨z| , (F.8) c(z, x) = y P Y |Z,X (y|z, x)b(y), (F.9)P Y |Z,X (y|z, x) = P Z|Y (z|y)P Y |X (y|x) P Z|X (z|x) .(F.10) c(z, x) is the expectation of b(Y ) conditioned on Z = z and X = x.If c(z, x) = c(z) does not depend on x,then it is a valid estimator as a function of the statistic Z, and its error as per Eq.(5.2) becomesMSE ′ x = z [c(z) − a(x)] 2 P Z|X (z|x).(F.11)It follows from Theorem 3 and Eq.(2.17) that the difference between Eqs. 
(F.6) and (F.11) is
$$\mathrm{MSE}_x - \mathrm{MSE}'_x = \sum_y [b(y)]^2\,P_{Y|X}(y|x) - \sum_z [c(z)]^2\,P_{Z|X}(z|x), \tag{F.12}$$
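The classical identity (F.12) can be checked on a textbook case. The sketch below assumes two Bernoulli trials $Y = (Y_1, Y_2)$ with success probability $a(x) = p$, the naive estimator $b(y) = y_1$, and the statistic $Z = Y_1 + Y_2$; these choices, and the helper name `rao_blackwell`, are illustrative and not from the source. It also assumes that Eq. (F.6) has the form $\mathrm{MSE}_x = \sum_y [b(y) - a(x)]^2 P_{Y|X}(y|x)$. Because Z is sufficient here, $c(z, x)$ from Eq. (F.9) comes out as $z/2$ for every p, so the x-independence condition above holds.

```python
import itertools
import numpy as np

# Hypothetical example: two Bernoulli trials Y = (Y1, Y2) with success
# probability p = a(x); the statistic Z = Y1 + Y2 is computed from Y alone.
ys = list(itertools.product([0, 1], repeat=2))   # outcomes y
zs = [0, 1, 2]                                   # outcomes z
b = np.array([y[0] for y in ys], dtype=float)    # naive estimator b(y) = y1

# P_{Z|Y}(z|y): deterministic map z = y1 + y2, which does not involve x
P_Z_given_Y = np.array([[1.0 if sum(y) == z else 0.0 for z in zs] for y in ys])


def rao_blackwell(p):
    """Return c(z, x), MSE_x, MSE'_x and the right-hand side of Eq. (F.12)."""
    P_Y_given_x = np.array([p ** sum(y) * (1 - p) ** (2 - sum(y)) for y in ys])
    P_Z_given_x = P_Y_given_x @ P_Z_given_Y
    # Eq. (F.10): P_{Y|Z,X}(y|z,x) = P_{Z|Y}(z|y) P_{Y|X}(y|x) / P_{Z|X}(z|x)
    P_Y_given_Zx = P_Z_given_Y * P_Y_given_x[:, None] / P_Z_given_x[None, :]
    # Eq. (F.9): c(z, x) = sum_y P_{Y|Z,X}(y|z,x) b(y)
    c = b @ P_Y_given_Zx
    mse = float(np.sum((b - p) ** 2 * P_Y_given_x))      # MSE_x of b(y)
    mse_rb = float(np.sum((c - p) ** 2 * P_Z_given_x))   # Eq. (F.11)
    rhs = float(np.sum(b ** 2 * P_Y_given_x) - np.sum(c ** 2 * P_Z_given_x))  # Eq. (F.12)
    return c, mse, mse_rb, rhs


for p in [0.1, 0.3, 0.5, 0.8]:
    c, mse, mse_rb, rhs = rao_blackwell(p)
    print(p, c, mse, mse_rb, mse - mse_rb, rhs)
```

For this example $\mathrm{MSE}_x = p(1 - p)$ and $\mathrm{MSE}'_x = p(1 - p)/2$, so conditioning on the sufficient statistic halves the error, and the difference matches the right-hand side of Eq. (F.12).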

Corollary 7.
Assume the Bayesian problem specified by Eqs. (4.1) and (4.2), and let $B = F_\sigma^* A$ be the Personick estimator. If another CPTP map $G$ is applied and both $G$ and the Rao-Blackwell estimator $G_{\rho_x}^* B$ in Theorem 3 do not depend on x, then $G_{\rho_x}^* B$ is a solution to the final Personick estimator $G_{F\sigma}^* B$. Moreover, both the Bayesian error and the local error remain the same after the channel $G$, in the sense of

… and $C' \in O(\mathcal{H}_1^{\otimes(n-m)})$ … See Appendix J for the proof. Each $(C \otimes C')_k$ in Eqs. (I.14) is an application of C on the m Hilbert subspaces in $\mathcal{H}_1^{\otimes n}$ indexed by $k = (k_1, \ldots, k_m)$ and an application of C′ on the other n − m Hilbert subspaces. If C is not symmetric, it can be symmetrized before Proposition 3 is used, because the left Haar measure is also the right Haar measure for the permutation group, which makes the symmetrization map invariant to any prior permutation as well. One is therefore free to symmetrize C in $C \otimes C'$ first, before the total symmetrization in Proposition 3. The same goes for C′. If B is in the general form $\bigotimes_n C_n$, Proposition 3 can be applied recursively to produce a generalized multinomial form of Eqs. (I.14).
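A small numerical sketch of the symmetrization argument follows. It assumes that the symmetrization map is the uniform (Haar) average $(1/n!)\sum_\pi U_\pi(\cdot)U_\pi^\dagger$ over permutation operators $U_\pi$ on $\mathcal{H}_1^{\otimes n}$; the helpers `perm_operator`, `symmetrize`, and `random_observable` are hypothetical names, and the sizes $n = 3$, $m = 2$, $\dim \mathcal{H}_1 = 2$ are arbitrary. It checks that symmetrizing C inside $C \otimes C'$ before the total symmetrization leaves the result unchanged.

```python
import itertools
import numpy as np


def perm_operator(perm, d):
    """Permutation unitary on (C^d)^{⊗n}: tensor factor i is moved to slot perm[i]."""
    n = len(perm)
    shape = (d,) * n
    U = np.zeros((d ** n, d ** n))
    for idx in itertools.product(range(d), repeat=n):
        new = [0] * n
        for i, target in enumerate(perm):
            new[target] = idx[i]
        U[np.ravel_multi_index(tuple(new), shape), np.ravel_multi_index(idx, shape)] = 1.0
    return U


def symmetrize(X, n, d):
    """Uniform (Haar) average of U_pi X U_pi^dagger over all permutations of n factors."""
    perms = list(itertools.permutations(range(n)))
    total = np.zeros_like(X)
    for perm in perms:
        U = perm_operator(perm, d)
        total += U @ X @ U.T          # U is real orthogonal, so U.T is its adjoint
    return total / len(perms)


rng = np.random.default_rng(1)


def random_observable(dim):
    """Random real symmetric (hence Hermitian) matrix, a stand-in observable."""
    M = rng.normal(size=(dim, dim))
    return (M + M.T) / 2


d, n, m = 2, 3, 2
C = random_observable(d ** m)             # C acts on m = 2 copies of H_1
Cp = random_observable(d ** (n - m))      # C' acts on the remaining n - m = 1 copy
X = np.kron(C, Cp)                        # C ⊗ C' on H_1^{⊗3}

S_total = symmetrize(X, n, d)
# Symmetrize C first, then apply the total symmetrization.
S_presym = symmetrize(np.kron(symmetrize(C, m, d), Cp), n, d)
print(np.allclose(S_total, S_presym))
```

The equality holds because each term of the partial average is a permuted copy of $C \otimes C'$, and the total average is invariant under any prior permutation.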
where $\{|y\rangle : y \in \mathcal{Y}\}$ is an orthonormal basis of $\mathcal{H}_2$; $\{|z\rangle : z \in \mathcal{Z}\}$ is an orthonormal basis of $\mathcal{H}_3$; X, Y, and Z are classical variables; and $P_{O|O'}$ is the probability distribution of O conditioned on a value of O′. For the estimation problem, X is the hidden parameter fixed at $X = x$, Y is the observation, and Z is a statistic generated from Y without knowledge of X, such that $P_{Z|Y,X}(z|y, x) = P_{Z|Y}(z|y)$.
$\ldots^* B$ becomes … where $B_s \in O(\Pi_s \mathcal{H}_2)$ is equal to $\Pi_s B \Pi_s$ on the subspace $\Pi_s \mathcal{H}_2$. Given a family of density operators $\{\rho_x : x \in \mathcal{X}\}$, Sinha defines S to be sufficient if there exist a positive function $\varphi : \Lambda_S \times \mathcal{X} \to \mathbb{R}_+$ and a parameter-independent positive-semidefinite operator $\sigma_s \in O(\Pi_s \mathcal{H}_2)$ such that, for any B that commutes with S, …

The use of a GCE in this work, on the other hand, makes Theorem 3 a more natural generalization of the classical theorem. Because the conditional expectation is a standard and crucial step in classical Rao-Blackwellization, the GCE can be similarly instrumental in the quantum case, as demonstrated by the corollaries and examples in this paper. It is straightforward to show that a measurement of Sinha's sufficient-statistic observable is a special case of a sufficient channel.
$$\ldots^* B = \sum_s \operatorname{tr} \Pi_s \ldots$$