Integral formula for quantum relative entropy implies data processing inequality

Integral representations of quantum relative entropy, and of the directional second and higher order derivatives of von Neumann entropy, are established, and used to give simple proofs of fundamental, known data processing inequalities: the Holevo bound on the quantity of information transmitted by a quantum communication channel, and, much more generally, the monotonicity of quantum relative entropy under trace-preserving positive linear maps; complete positivity of the map need not be assumed. The latter result was first proved by Müller-Hermes and Reeb, based on work of Beigi. For a simple application of such monotonicities, we consider any 'divergence' that is non-increasing under quantum measurements, such as the concavity of von Neumann entropy, or various known quantum divergences. An elegant argument due to Hiai, Ohya, and Tsukada is used to show that the infimum of such a 'divergence' on pairs of quantum states with prescribed trace distance is the same as the corresponding infimum on pairs of binary classical states. Applications of the new integral formulae to the general probabilistic model of information theory, and a related integral formula for the classical Rényi divergence, are also discussed.


Introduction
Half a century ago, Alexander Holevo proved his famous inequality: the quantity of information transmitted by a quantum communication channel using a given ensemble of quantum states is bounded from above by the extent to which von Neumann entropy is concave on the ensemble. One of the main ingredients in Holevo's proof is an explicit, closed formula for the directional second derivative S′′ of the von Neumann entropy.
Since that time, the Holevo bound has become an important building block of the vast theory of quantum information. Generalizations and alternative proofs, often using advanced methods of that theory, have been given. Almost simultaneously with Holevo's work, Elliott Lieb and Mary Beth Ruskai [11] established the strong subadditivity of von Neumann entropy, which quickly led to Göran Lindblad's proof [12] of monotonicity of quantum relative entropy under completely positive trace-preserving linear maps, a generalization of Holevo's inequality. Much later, a further generalization was proved by A. Müller-Hermes and D. Reeb [13], based on work of S. Beigi [1]: quantum relative entropy cannot increase under a trace-preserving positive linear map; complete positivity of the map need not be assumed.
In this paper, we return to more classical methods of analysis and linear algebra. In Section 5, we prove the alternative formula −S(ρ + tσ)′′|_{t=0} = 2 ∫_{−∞}^{∞} dt/|t|^3 tr_−(ρ + tσ) for the directional second derivative of von Neumann entropy, which then leads to similar formulas for directional derivatives of higher order. Note that tr_− stands for the sum of absolute values of negative eigenvalues. Before that, in Section 4, we establish a similar formula for the quantum relative entropy. The simplest form of this formula is D(ρ∥σ) = ∫_{−∞}^{∞} dt/(|t|(t − 1)^2) tr_−((1 − t)ρ + tσ), (1) which holds for any two quantum states. We then show in Section 6 that our formulae lead to the above mentioned monotonicity of quantum relative entropy and to the Holevo inequality in a very simple way, and a characterization of the case of equality in these inequalities can be deduced from our new proof. In a recent preprint [7], it is pointed out that the tr_− appearing in (1) is closely related to optimal error probabilities in quantum state discrimination, and, as a consequence, (1) leads to a new characterization of recoverability of quantum states with respect to a quantum channel in terms of quantities related to quantum hypothesis testing.
In Section 7, we consider any 'divergence-like' quantity that is non-increasing under quantum measurements, such as the concavity of von Neumann entropy, sandwiched Rényi divergences, or the more general optimized quantum f-divergences. We use a simple construction due to Hiai, Ohya, and Tsukada [3] from the year 1981 to show that the infimum of such a 'divergence' on pairs with prescribed trace distance is the same for (arbitrary dimensional) quantum states as for binary classical states. For example, the Holevo inequality is used to obtain a new, tight lower bound on the concavity of von Neumann entropy, improving on a lower bound given by Isaac Kim in 2014.
In Section 8, we discuss how formula (1) can be applied in the general probabilistic framework of information theory, yielding, in particular, an extension of the Holevo inequality to that framework. In Section 9, we provide an integral representation of the classical Rényi relative entropy and discuss its possible relevance to quantum information theory and to general probabilistic theory.

Notations, terminology, and basic trace inequalities
We write log for the natural logarithm. Partial derivatives will be denoted by putting the corresponding variable in the subscript. A prime (′) means differentiation with respect to t.
The set of n-square matrices with complex entries is written M_n(C). The identity matrix is 1. A complex matrix A is psdh if it is positive semi-definite Hermitian, written A ≥ 0. We write A ≥ B to mean that A − B ≥ 0. For a Hermitian matrix A, we write A = A_+ − A_−, where A_± ≥ 0 and A_+A_− = A_−A_+ = 0. We define |A| = A_+ + A_−. We write tr_± A = tr A_± and ∥A∥_1 = tr |A| for the sum of absolute values of positive/negative eigenvalues and all eigenvalues of A, respectively. Recall that tr_+ A = max{tr PA : P^2 = P = P^*}.
The maximum is attained if and only if PA_+ = A_+ and PA_− = 0. (2) If A ≥ B, then PAP ≥ PBP and therefore tr PA ≥ tr PB for all projections P, whence tr_+ A ≥ tr_+ B. (3) There is equality here if and only if there exists a projection P satisfying (2) and tr PA = tr PB. In this case, tr PA_+ = tr PA + tr PA_− = tr PA = tr PB, but im

Eigenvalues of matrix pencils
It will be useful to study the negative real eigenvalues of the linear matrix pencil A(t) = (1 − t)ρ + tσ, (5) where A(0) = ρ ≥ 0 and A(1) = σ is Hermitian. A priori, we allow t ∈ C here. Thus, in general, A(t) is not Hermitian, but some real eigenvalues could occur for some non-real t. However, we show that negative real eigenvalues of A(t) can only occur for real t.

Lemma 1.
If A(t)e = −re for a unit vector e and a positive real number r, then t((ρ − σ)e, e) > 0, and therefore t ≠ 0 is real.
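The proof of Lemma 1 is not reproduced in this text. A minimal sketch of one way to see the claim, assuming the pencil is A(t) = (1 − t)ρ + tσ as in Theorem 6 and writing (·, ·) for the inner product, is to pair the eigenvalue equation with the unit vector e:

```latex
\begin{align*}
-r = (A(t)e, e) &= (\rho e, e) - t\,((\rho - \sigma)e, e), \\
t\,((\rho - \sigma)e, e) &= r + (\rho e, e) \;\ge\; r \;>\; 0 .
\end{align*}
```

Since ρ − σ is Hermitian, the number ((ρ − σ)e, e) is real, and it is nonzero by the last line; hence t is a quotient of two nonzero real numbers.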
Consider the two-parameter family of matrices X(t, r) = A(t) + r1. (6) Its partial derivatives are X_t = σ − ρ and X_r = 1.
Define the bivariate polynomial f(t, r) = det X(t, r). (7) For its partial logarithmic derivatives, we have, by Jacobi's formula, (log f)_t = tr X^{−1}X_t and (log f)_r = tr X^{−1}X_r whenever X is invertible, i.e., whenever f ≠ 0. In this case, (log f)_t = tr X^{−1}(σ − ρ) and (log f)_r = tr X^{−1}. We have f(t, r) = 0 if and only if −r is an eigenvalue of A(t). In this case, the ratio of partial derivatives is given by
Lemma 2. Let t be real. If A(t)e = −re for a unit vector e, then f_t(t, r) = ((σ − ρ)e, e) f_r(t, r).
These two lemmas imply that any negative simple eigenvalue of A(t) gets more negative as t moves farther away from zero. More precisely, we have
Corollary 3. If f(t, r) = 0 and r > 0, then
(a) t ≠ 0 is real, and t f_r(t, r) f_t(t, r) ≤ 0.
(b) The following are equivalent: f_t(t, r) = 0; f_r(t, r) = 0; −r is a multiple eigenvalue of A(t).
We have f(t, r) = f_r(t, r) = 0 if and only if −r is a multiple eigenvalue of A(t). As a polynomial in r, f has a discriminant whose value is a polynomial in t. The discriminant is zero for a given value of t if and only if the matrix A(t) has a multiple eigenvalue. This happens either for finitely many t or for all t.
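The behaviour described in this section can be observed numerically. The following is a minimal sketch, assuming A(t) = (1 − t)ρ + tσ and using hypothetical 2-square real symmetric example matrices stored as triples (a, b, c) for [[a, b], [b, c]]; it is an illustration, not part of the proofs.

```python
import math

def min_eig(m):
    """Smallest eigenvalue of the real symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    return (a + c) / 2 - math.hypot((a - c) / 2, b)

def pencil(rho, sigma, t):
    """A(t) = (1 - t) rho + t sigma, applied entrywise to (a, b, c) triples."""
    return tuple((1 - t) * x + t * y for x, y in zip(rho, sigma))

# Hypothetical example data: two positive definite matrices of trace 1.
rho = (0.7, 0.2, 0.3)
sigma = (0.4, -0.1, 0.6)

# On [0, 1] the pencil is a convex combination of psdh matrices, hence psdh.
inside = [min_eig(pencil(rho, sigma, k / 10)) for k in range(11)]

# Walk away from t = 0 in the negative direction and keep the negative eigenvalues.
ts = [-k / 2 for k in range(1, 21)]
negative = [(t, min_eig(pencil(rho, sigma, t))) for t in ts]
negative = [(t, lam) for t, lam in negative if lam < 0]

for t, lam in negative[:4]:
    print(f"t = {t:5.1f}   smallest eigenvalue = {lam:.4f}")
```

In this example no negative eigenvalue appears for t in [0, 1], and once the smallest eigenvalue has become negative it keeps decreasing as t moves away from zero, in line with Corollary 3.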

Quantum relative entropy
Let ρ and σ be psdh matrices. The notations (5), (6), and (7) introduced in Section 3 will be used. We wish to prove an integral formula for the quantum relative entropy D(ρ∥σ). In the first part of the proof, we join the pair (ρ, σ) to infinity in the direction of the identity matrix. This is done in the following two lemmas, and yields an integral formula in terms of certain logarithmic derivatives of the function f(t, r). We will then convert this into an integral formula in terms of tr_− A(t) by applying the Residue Theorem and by changing the integration variable from r to t.
Lemma 5. (a) For all r > 0, we have
Proof. (a) On the left hand side, we differentiate the product in the argument of tr to get Apply the identity to arrive at the middle expression in Statement (a) of the Lemma. By tr log = log det, the first two terms are clearly equal to the first two terms of the last expression in (a). By (8), the third terms are also equal. (b) When r → 0, the first two terms are O(log(1/r)), and the third term When r → ∞, the third term is ∼ tr(σ − ρ)/r, and the sum of the first two terms is tr log
From Lemmas 4 and 5(a), we have Now we integrate by parts. If im ρ ⊆ im σ, then, by Lemma 5(b), the result is simply where h = (log f)_r = f_r/f is a rational function of t and r, holomorphic unless f = 0, so certainly holomorphic at (t, r) when r > 0 and t ∈ [0, 1]. Therefore, as a rational function of t, the function h/t has residue h(0, r) at 0 and is holomorphic at 1, the function h/(t − 1) has residue h(1, r) at 1 and is holomorphic at 0, and the function h/(t − 1)^2 has residue h′(1, r) at 1 and is holomorphic at 0.
Let g(t) = 1/(t(t − 1)^2). For a fixed r > 0, observe that gh, as a rational function of t in the complex plane, is holomorphic except where t = 0, t = 1, or f = 0. The latter case occurs if and only if −r is an eigenvalue of A(t). If −r is a simple eigenvalue of A(t), then f_t(t, r) ≠ 0 by Corollary 3, and the residue of gh = g f_r/f at t is g f_r/f_t because g f_r is holomorphic at t and f(t, r) = 0. The bivariate polynomial f, when viewed as a polynomial in t, has a leading coefficient that is a univariate polynomial in r. For all but finitely many values of r, this leading coefficient is nonzero, and then we have whence gh = O(|g|) = O(|t|^{−3}), so the contour integrals on circles |t| = T tend to zero as T → ∞. By the Residue Theorem, the sum of all residues of gh is zero. Let us assume that not all A(t) have multiple eigenvalues. In this case, only finitely many A(t) have multiple eigenvalues, therefore only finitely many numbers −r occur as multiple eigenvalues of some A(t). These finitely many values of r, together with the ones for which (11) fails, do not influence the integral in (10). The right hand side of (10) therefore becomes By Lemma 1, only real numbers t can satisfy the condition f(t, r) = 0 required in the summation. Therefore, we can think of (12) as an integral on the portion of the real algebraic plane curve f(t, r) = 0 that lies in the upper half-plane r > 0.
The finitely many singular points that the curve may have, corresponding to −r being a multiple eigenvalue of A(t), do not influence the integral. We only need to study the integral along the smooth arcs of the curve. Each smooth arc is parametrized by the variable r, but we want to reparametrize it using the variable t. For a simple negative eigenvalue −r of A(t), Corollary 3 implies that f_r(t, r) ≠ 0 and |dr/dt| = |f_t/f_r| = −(sgn t) f_t/f_r as we move along the curve. Hence, by the rule for change of variables, the integral is rewritten as The change of variables is justified since the integrand is positive at every smooth point. The last sum that has appeared is tr_− A(t). We arrive at the main result of this paper.
Theorem 6. Let ρ, σ ∈ M_n(C) be psdh matrices. Then where A(t) = (1 − t)ρ + tσ, and tr_± stands for the sum of absolute values of positive and negative eigenvalues, respectively.
Proof. First equality: Both sides are +∞ unless im ρ ⊆ im σ, which we henceforth assume. Restricting our attention to the image of σ, we may assume that σ is positive definite to begin with. Since both sides are then continuous, we may change σ a little bit so that it has no multiple eigenvalues. The preceding discussion then applies and the first equality is proved.
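As a sanity check, the integral formula can be verified by hand in the scalar case n = 1. The sketch below assumes that the first equality of Theorem 6 reads D(ρ∥σ) = tr(ρ − σ) + ∫ dt/(|t|(t − 1)^2) tr_− A(t), which is the form consistent with (1), and takes ρ = p and σ = q with q > p > 0 (the symbols p, q, t_0, s are introduced here for illustration only). Then A(t) = p + t(q − p) is negative exactly when t < −t_0, where t_0 = p/(q − p).

```latex
\begin{align*}
\int_{-\infty}^{\infty} \frac{\operatorname{tr}_- A(t)}{|t|\,(t-1)^2}\,dt
 &= (q-p)\int_{t_0}^{\infty} \frac{s - t_0}{s\,(s+1)^2}\,ds
 \qquad (s = -t) \\
 &= (q-p)\left[\, t_0 \log\frac{s+1}{s} - \frac{1+t_0}{s+1} \right]_{s=t_0}^{\infty} \\
 &= (q-p)\left( 1 - t_0 \log\frac{1+t_0}{t_0} \right)
  = (q - p) - p \log\frac{q}{p}, \\
\operatorname{tr}(\rho-\sigma) + \int_{-\infty}^{\infty} \frac{\operatorname{tr}_- A(t)}{|t|\,(t-1)^2}\,dt
 &= (p - q) + (q - p) + p\log\frac{p}{q} = p \log\frac{p}{q} = D(p\,\|\,q).
\end{align*}
```

The case q < p is analogous, with the negative part of A(t) now supported on t > 1.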
From Theorem 6 and fact (4), we immediately recover the well-known fact that D(ρ∥σ) is a convex function of the pair (ρ, σ), and it is nonnegative whenever tr ρ ≥ tr σ.
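The integral formula can also be tested numerically on non-commuting matrices. The sketch below is an illustration under stated assumptions: it takes the first equality of Theorem 6 in the form D(ρ∥σ) = tr(ρ − σ) + ∫ dt/(|t|(t − 1)^2) tr_− A(t), uses hypothetical positive definite 2-square real symmetric matrices stored as triples (a, b, c), and evaluates the integral by a midpoint rule after the substitutions t = 1 − 1/u and t = 1/u (the part 0 ≤ t ≤ 1 contributes nothing because A(t) is psdh there).

```python
import math

def eigs(m):
    """Eigenvalues (ascending) of the real symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mean - rad, mean + rad

def tr_minus(m):
    """Sum of absolute values of the negative eigenvalues."""
    return sum(max(0.0, -lam) for lam in eigs(m))

def pencil(rho, sigma, t):
    """A(t) = (1 - t) rho + t sigma."""
    return tuple((1 - t) * x + t * y for x, y in zip(rho, sigma))

def tr_rho_log(rho, m):
    """tr(rho log m) for positive definite m, writing log m = alpha*m + beta*1."""
    lo, hi = eigs(m)
    trace = rho[0] + rho[2]
    if hi - lo < 1e-12:
        return math.log(lo) * trace
    alpha = (math.log(hi) - math.log(lo)) / (hi - lo)
    beta = math.log(lo) - alpha * lo
    tr_prod = rho[0] * m[0] + 2 * rho[1] * m[1] + rho[2] * m[2]
    return alpha * tr_prod + beta * trace

def relative_entropy(rho, sigma):
    """Umegaki relative entropy tr rho (log rho - log sigma)."""
    return tr_rho_log(rho, rho) - tr_rho_log(rho, sigma)

def midpoint(fn, n=20000):
    """Midpoint rule on (0, 1); the endpoints are never evaluated."""
    h = 1.0 / n
    return h * sum(fn((k + 0.5) * h) for k in range(n))

def integral_formula(rho, sigma):
    # t = 1 - 1/u covers t < 0; there dt / (|t| (t-1)^2) = u du / (1 - u).
    left = midpoint(lambda u: tr_minus(pencil(rho, sigma, 1 - 1 / u)) * u / (1 - u))
    # t = 1/u covers t > 1; there dt / (|t| (t-1)^2) = u du / (1 - u)^2.
    right = midpoint(lambda u: tr_minus(pencil(rho, sigma, 1 / u)) * u / (1 - u) ** 2)
    trace_term = (rho[0] + rho[2]) - (sigma[0] + sigma[2])
    return trace_term + left + right

# Hypothetical example data.
rho = (0.7, 0.2, 0.3)
sigma = (0.4, -0.1, 0.6)        # trace 1
sigma_big = (0.9, -0.1, 0.5)    # trace 1.4, to exercise the term tr(rho - sigma)

for s in (sigma, sigma_big):
    print(relative_entropy(rho, s), integral_formula(rho, s))
```

Positive definiteness is used only to keep the integrand identically zero near t = 0 and t = 1, which makes the simple midpoint rule adequate.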

Higher order derivatives of von Neumann entropy
In this section, ρ is a psdh matrix and σ is a Hermitian matrix with im σ ⊆ im ρ. We wish to find an integral formula for S(ρ + tσ)^{(m)}|_{t=0} when m ≥ 2. When m = 2, tr ρ = 1, and tr σ = 0, an explicit formula for this quantity, in terms of the spectral decomposition of ρ, has been given by A. S. Holevo in his seminal paper [6, Lemma 4]. The fact that our integral formula yields the same value seems non-obvious. The proof given below does not rely on Holevo's explicit formula.
Theorem 7. (a) For all m ≥ 2, we have −S(ρ + tσ)^{(m)}|_{t=0} = m! ∫_{−∞}^{∞} dt/(|t| t^m) tr_−(ρ + tσ), (14) where tr_− stands for the sum of absolute values of negative eigenvalues.
(b) When m ≥ 2 is even, the quantity (14) is nonnegative and convex as a function of the pair (ρ, σ).
Proof. (a) Case m = 2: We have In the parentheses here, we have By Theorem 6, we have as t → 0 by Lebesgue's Dominated Convergence Theorem. Case m = 2 follows.
If the statement holds for m, then for |u| < T, where T > 0 is such that ρ ± Tσ ≥ 0 for both signs.
When |u| < |t|, we have d/du [1/(|t − u|(t − u)^m)] = (m + 1)/(|t − u|(t − u)^{m+1}). (b) From (4), we see that tr_−(ρ + tσ) is nonnegative and convex as a function of the pair (ρ, σ). For m even, |t|t^m ≥ 0 for all t.
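The case m = 2 lends itself to a quick numerical experiment. The sketch below assumes the formula in the form −S(ρ + tσ)′′|_{t=0} = 2 ∫ dt/|t|^3 tr_−(ρ + tσ), uses hypothetical 2-square real symmetric matrices stored as triples (a, b, c), and compares a central finite difference with a midpoint rule. The substitution t = ±1/u, which turns the integrand into tr_−(uρ ± σ) du, is a convenience introduced here.

```python
import math

def eigs(m):
    """Eigenvalues of the real symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mean - rad, mean + rad

def tr_minus(m):
    """Sum of absolute values of the negative eigenvalues."""
    return sum(max(0.0, -lam) for lam in eigs(m))

def entropy(m):
    """von Neumann entropy S = -tr m log m of a positive definite 2x2 matrix."""
    return -sum(lam * math.log(lam) for lam in eigs(m))

def shifted(rho, sigma, t):
    """rho + t sigma."""
    return tuple(x + t * y for x, y in zip(rho, sigma))

# Hypothetical example data: rho positive definite, sigma Hermitian.
rho = (0.7, 0.2, 0.3)
sigma = (0.3, -0.1, -0.2)

# Central finite difference for the second derivative of S(rho + t sigma) at t = 0.
step = 1e-3
second_derivative = (entropy(shifted(rho, sigma, step)) - 2 * entropy(rho)
                     + entropy(shifted(rho, sigma, -step))) / step ** 2

def integrand(u):
    """tr_-(u rho + sigma) + tr_-(u rho - sigma), i.e. both signs of t = +-1/u."""
    plus = tuple(u * x + y for x, y in zip(rho, sigma))
    minus = tuple(u * x - y for x, y in zip(rho, sigma))
    return tr_minus(plus) + tr_minus(minus)

# The integrand vanishes once u rho dominates |sigma|; U = 5 suffices for this data.
U, n = 5.0, 20000
h = U / n
integral = h * sum(integrand((k + 0.5) * h) for k in range(n))

print(-second_derivative, 2 * integral)
```

The two printed numbers should agree up to the discretization errors of the finite difference and of the midpoint rule.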

Data processing inequalities
Let E : M_n(C) → M_{n′}(C) be a trace-nonincreasing positive linear map. Positivity means that psdh matrices are mapped to psdh matrices (and therefore Hermitian matrices are mapped to Hermitian matrices). Trace-nonincreasing means that tr EA ≤ tr A for all A ≥ 0. An important example is given by a positive operator valued measure, or partition of unity: psdh matrices E_1, …, E_k summing to 1, which give rise to a completely positive, trace-preserving linear map, the quantum measurement E : A ↦ diag(tr E_1A, …, tr E_kA).

Lemma 8. (a) For any trace-nonincreasing positive linear map E and any Hermitian matrix A, we have tr_± EA ≤ tr_± A.
(b) Equality holds in the statement (a) if and only if tr EA_± = tr A_± and EA_+ EA_− = 0.
(c) For a quantum measurement, the condition of equality in (a) is that there be no i with tr E_iA_± > 0 for both signs.
Remark 9. The condition EA_+ EA_− = 0 appearing in (b) is equivalent to each of the following: (EA)_+ = EA_+, (EA)_− = EA_−, ∥EA∥_1 = tr E|A|. When tr E|A| = tr |A|, it is also equivalent to ∥EA∥_1 = ∥A∥_1.
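Lemma 8(a) is easy to observe on a small example. The sketch below assumes that a quantum measurement sends A to the list of numbers tr E_iA (read as a diagonal matrix), and uses a hypothetical two-outcome partition of unity and a hypothetical Hermitian matrix A, all 2-square real symmetric and stored as triples (a, b, c).

```python
import math

def tr_pm(m):
    """Return (tr_+, tr_-) of the real symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    lams = (mean - rad, mean + rad)
    return (sum(max(0.0, lam) for lam in lams),
            sum(max(0.0, -lam) for lam in lams))

def tr_prod(x, y):
    """tr(xy) for symmetric 2x2 matrices stored as (a, b, c)."""
    return x[0] * y[0] + 2 * x[1] * y[1] + x[2] * y[2]

# Hypothetical partition of unity: both matrices are psdh and they sum to 1.
povm = [(0.6, 0.2, 0.3), (0.4, -0.2, 0.7)]
# Hypothetical Hermitian matrix with one positive and one negative eigenvalue.
A = (0.5, 0.8, -0.3)

outcomes = [tr_prod(E, A) for E in povm]      # diagonal entries of EA
plus_in, minus_in = tr_pm(A)
plus_out = sum(max(0.0, x) for x in outcomes)
minus_out = sum(max(0.0, -x) for x in outcomes)

print("tr_+:", plus_out, "<=", plus_in)
print("tr_-:", minus_out, "<=", minus_in)
```

Here both inequalities are strict, which is consistent with (c): each of the two outcomes has tr E_iA_± > 0 for both signs.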

Quantum relative entropy
From Theorem 6 and Lemma 8, we recover (for the finite-dimensional case) the data processing inequality D(Eρ∥Eσ) ≤ D(ρ∥σ) (15) for any trace-nonincreasing positive linear map E and any psdh matrices ρ and σ such that tr Eρ = tr ρ. Note that complete positivity of E is not assumed. The inequality (15), in this generality, was first proved by A. Müller-Hermes and D. Reeb [13]. They also covered the infinite-dimensional case. Their approach was based on the work of S. Beigi [1] establishing the data processing inequality for sandwiched Rényi divergences, with respect to quantum channels (completely positive trace-preserving linear maps).
Proposition 10. Let ρ and σ be psdh matrices with im ρ ⊆ im σ. Let E be a trace-nonincreasing positive linear map such that tr Eρ = tr ρ.
(a) Equality holds in (15) if and only if tr_+ E(ρ − tσ) = tr_+(ρ − tσ) (16) holds for all t > 0. This is equivalent to saying that every linear combination A of ρ and σ has EA_+ EA_− = 0, and tr E(ρ − tσ)_+ = tr(ρ − tσ)_+ holds for all t > 0.
(b) For a quantum measurement, the condition of equality is that for all i and for all linear combinations A of ρ and σ, we should have tr E_iA_+ = 0 or tr E_iA_− = 0.
Proof. (a) First claim: From Theorem 6, we see that equality holds in (15) if and only if every affine combination A = A(t) = (1 − t)ρ + tσ of ρ and σ has tr_± EA(t) = tr_± A(t) whenever ±t < 0. This with the + or the − sign is equivalent to (16) for 0 < t < 1 and t > 1, respectively. For t = 1, it follows by continuity.
The second claim is clear from Lemma 8(b). (b) Since quantum measurements are trace-preserving, the claim is immediate from (a).
In a recent preprint [7] discussing the relevance of Theorem 6 to the sufficiency of quantum channels and to hypothesis testing, it is pointed out that when E is trace-preserving, the condition for equality in (15) is (16), and that it can be reformulated as ∥E(ρ − tσ)∥_1 = ∥ρ − tσ∥_1 for all t ≥ 0. When E is completely positive and trace-preserving, it was known [2, 17] that this preservation of the trace norm of linear combinations is equivalent to the recoverability of ρ and σ with respect to E. When E is 2-positive and trace-preserving, it was known [4, 14, 15] that the preservation of the quantum relative entropy is equivalent to the recoverability.

The Holevo bound
In [6], A. S. Holevo used his explicit formula for S′′ to prove his celebrated upper bound on the quantity of information transmitted by a quantum communication channel. We shall now show how Theorem 7 quickly leads to a generalization of the same bound, which, however, also follows from (15).
Let E be a trace-nonincreasing positive linear map. From Theorem 7 and Lemma 8, we see that −S(Eρ + tEσ)^{(m)}|_{t=0} ≤ −S(ρ + tσ)^{(m)}|_{t=0} for any psdh matrix ρ, any Hermitian matrix σ satisfying im σ ⊆ im ρ, and any even m ≥ 2. We have equality if and only if every combination A = ρ + tσ has EA_+ EA_− = 0 and tr EA_− = tr A_−. For a quantum measurement E, the condition of equality is that for all i and t we should have tr E_i(ρ + tσ)_+ = 0 or tr E_i(ρ + tσ)_− = 0. In particular, S − S ∘ E is a concave function on psdh matrices.
For psdh matrices ρ_1, …, ρ_l and nonnegative weights q_1, …, q_l summing to 1, let ρ = Σ_{j=1}^{l} q_jρ_j. Define the Holevo quantity χ(ρ_1, …, ρ_l; q_1, …, q_l) = S(ρ) − Σ_{j=1}^{l} q_jS(ρ_j) = Σ_{j=1}^{l} q_jD(ρ_j∥ρ). (17) From Theorem 7(b), or from Theorem 6 together with the fact (4), we recover the well-known fact that the Holevo quantity is nonnegative and convex as a function of (ρ_1, …, ρ_l). By Jensen's inequality, for any psdh matrices ρ_1, …, ρ_l, and any weights q_1, …, q_l > 0 summing to 1, we have χ(Eρ_1, …, Eρ_l; q_1, …, q_l) ≤ χ(ρ_1, …, ρ_l; q_1, …, q_l), (18) with equality if and only if EA_+ EA_− = 0 and tr EA_− = tr A_− for every affine combination A of ρ_1, …, ρ_l. In words: the Holevo quantity is non-increasing under trace-nonincreasing positive linear maps. Note that complete positivity of the map need not be assumed.
When E is a quantum measurement, and each ρ_j has trace 1, (18) is Holevo's inequality. The left hand side is the mutual information between the random input j (whose distribution is given by the probabilities q_j) and the measurement output i (whose conditional distribution is given by the conditional probabilities tr E_iρ_j once j has occurred). We have equality in (18) if and only if for all i and all affine combinations A of the ρ_j, we have tr E_iA_+ = 0 or tr E_iA_− = 0.
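Holevo's inequality can be illustrated on a qubit ensemble. The sketch below assumes the Holevo quantity in its usual form S(ρ) − Σ q_jS(ρ_j) and computes the mutual information on the measured side as the same expression applied to the outcome distributions (tr E_1ρ_j, tr E_2ρ_j). The states, weights and partition of unity are hypothetical example data, stored as triples (a, b, c) for [[a, b], [b, c]].

```python
import math

def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def shannon(p):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(xlogx(x) for x in p)

def von_neumann(m):
    """von Neumann entropy of the real symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return shannon((mean - rad, mean + rad))

def mix(items, weights):
    """Weighted average of equally long tuples."""
    return tuple(sum(w * x[i] for w, x in zip(weights, items))
                 for i in range(len(items[0])))

def tr_prod(x, y):
    """tr(xy) for symmetric 2x2 matrices stored as (a, b, c)."""
    return x[0] * y[0] + 2 * x[1] * y[1] + x[2] * y[2]

# Hypothetical ensemble of two qubit states with weights q = (0.4, 0.6).
states = [(0.9, 0.1, 0.1), (0.2, -0.3, 0.8)]
weights = [0.4, 0.6]
# Hypothetical two-outcome measurement (psdh matrices summing to 1).
povm = [(0.6, 0.2, 0.3), (0.4, -0.2, 0.7)]

chi_quantum = von_neumann(mix(states, weights)) - sum(
    q * von_neumann(s) for q, s in zip(weights, states))

# Conditional outcome distributions (tr E_1 rho_j, tr E_2 rho_j).
outputs = [tuple(tr_prod(E, s) for E in povm) for s in states]
chi_measured = shannon(mix(outputs, weights)) - sum(
    q * shannon(p) for q, p in zip(weights, outputs))

print("mutual information:", chi_measured)
print("Holevo quantity:   ", chi_quantum)
```

For this data the mutual information is strictly smaller than the Holevo quantity; the equality condition above fails because the measurement operators do not separate the positive and negative parts of ρ_1 − ρ_2.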

Lower bounds on generalized divergences
This section is only loosely related to the rest of the paper. A simple application of the data processing inequality is presented.
The most basic metric on quantum states is given by the trace distance. It is therefore desirable to find inequalities that compare other measures of dissimilarity of quantum states, i.e. quantum divergences, to the trace distance. Among the many divergences commonly studied, the quantum Jensen-Shannon divergence χ(ρ_0, ρ_1; 1/2, 1/2), which is the Holevo quantity (concavity of the von Neumann entropy) with equal weights 1/2 and 1/2, has special interest because its square root is a metric [20].
It was shown by F. Hiai, M. Ohya, and M. Tsukada [3] that the minimum of the quantum relative entropy for two quantum states with prescribed trace distance is attained on binary classical states. In this section, we shall use their method to prove the analogous result for any quantity that depends on two quantum states and is non-increasing under quantum measurements. As examples of such quantities, we have already discussed in Section 6 the quantum relative entropy and the concavity of von Neumann entropy, but the sandwiched Rényi divergence with parameter α > 1 is also non-increasing, not just under quantum measurements, but under quantum channels (completely positive trace-preserving maps), as was shown by S. Beigi [1]. More generally, M. M. Wilde [21] proved the same for optimized quantum f-divergences. An alternative proof (with respect to trace-preserving positive linear maps satisfying a certain Schwarz-type inequality) was given by H. Li [10].
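The details of the construction are not reproduced in this text. One ingredient that is presumably involved, and that is stated here as an assumption rather than as the authors' exact argument, is the two-outcome measurement {P, 1 − P} with P the projection onto the positive part of ρ_0 − ρ_1: it maps the two quantum states to binary classical states without changing the 1-norm of their difference. A minimal sketch for hypothetical qubit states ρ_0 ≠ ρ_1, stored as triples (a, b, c) for [[a, b], [b, c]]:

```python
import math

def trace_norm(m):
    """Sum of absolute values of the eigenvalues of [[a, b], [b, c]]."""
    a, b, c = m
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return abs(mean - rad) + abs(mean + rad)

def tr_prod(x, y):
    """tr(xy) for symmetric 2x2 matrices stored as (a, b, c)."""
    return x[0] * y[0] + 2 * x[1] * y[1] + x[2] * y[2]

# Hypothetical qubit states (trace 1, positive definite, distinct).
rho0 = (0.9, 0.1, 0.1)
rho1 = (0.2, -0.3, 0.8)

# The difference is traceless, so its eigenvalues are +radius and -radius.
diff = tuple(x - y for x, y in zip(rho0, rho1))
radius = math.hypot(diff[0], diff[1])

# Projection onto the eigenvector with eigenvalue +radius: P = (1 + diff/radius)/2.
P = ((1 + diff[0] / radius) / 2, diff[1] / (2 * radius), (1 + diff[2] / radius) / 2)

# Binary classical states (t_j, 1 - t_j) produced by the measurement {P, 1 - P}.
t0, t1 = tr_prod(P, rho0), tr_prod(P, rho1)
classical_distance = abs(t0 - t1) + abs((1 - t0) - (1 - t1))
quantum_distance = trace_norm(diff)

print(classical_distance, quantum_distance)
```

The 1-norm of the difference is used here without any normalizing factor, so the comparison does not depend on which convention for the trace distance is adopted. Combined with monotonicity under measurements, this shows that the value of such a quantity on the quantum pair is at least its value on a binary classical pair at the same distance.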

We arrive at
Theorem 12. For any nonnegative q_0 and q_1 summing to 1, we have χ(ρ_0, ρ_1; q_0, q_1) ≥ min{I(t_0, t_1; q_0, q_1) : 0
This theorem and the possibility of equality in (19) tell us that for the Holevo quantity, or 'quantum entropy concavity' χ(ρ_0, ρ_1; q_0, q_1), the largest lower bound that depends only on ρ_1 − ρ_0 and q_1 is the 'minimal classical binary entropy concavity', i.e., the minimum in Theorem 12. It does not seem possible to compute this minimum exactly. There are various ways to get weaker but more explicit lower bounds. A simple way is to use the convexity and symmetry of −h′′(x) = 1/x + 1/(1 − x) to prove that the minimum is with equality if and only if q_1 = 1/2 or ρ_0 = ρ_1. Note that h(1/2) = log 2. For ρ_0 ≠ ρ_1 and q_0q_1 > 0, this weaker lower bound on χ(ρ_0, ρ_1; q_0, q_1) is still strictly greater than the previously known lower bound due to I. H. Kim [8]. This is because −h′′ is minimal, with value 4, only at 1/2. For lower bounds depending on other parameters of ρ_0 and ρ_1, and also for upper bounds, see [9].
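The displayed bounds are not reproduced in this text, so the following sketch does not attempt to restate them. It only illustrates the mechanism, under explicit assumptions: I(t_0, t_1; q_0, q_1) is taken to be the binary entropy concavity h(q_0t_0 + q_1t_1) − q_0h(t_0) − q_1h(t_1), the constraint is modelled as t_1 − t_0 = δ for a hypothetical value δ = 0.4, and the minimum is approximated by a grid search. It is compared with the elementary bound 2q_0q_1δ^2, which follows from −h′′ ≥ 4 by Jensen's inequality applied to h(x) + 2x^2.

```python
import math

def h(x):
    """Binary entropy with natural logarithm; h(0) = h(1) = 0."""
    return -sum(p * math.log(p) for p in (x, 1 - x) if p > 0)

def concavity(t0, t1, q0, q1):
    """Classical binary entropy concavity h(q0 t0 + q1 t1) - q0 h(t0) - q1 h(t1)."""
    return h(q0 * t0 + q1 * t1) - q0 * h(t0) - q1 * h(t1)

delta = 0.4                                   # hypothetical prescribed gap t1 - t0
grid = [k / 1000 for k in range(0, 601)]      # values of t0 with t0 + delta <= 1

def grid_min(q1):
    """Grid approximation of the minimal concavity for weights (1 - q1, q1)."""
    return min(concavity(t0, t0 + delta, 1 - q1, q1) for t0 in grid)

# For equal weights the minimum sits at the symmetric pair ((1-delta)/2, (1+delta)/2).
symmetric_value = math.log(2) - h((1 - delta) / 2)

for q1 in (0.2, 0.5, 0.8):
    elementary = 2 * (1 - q1) * q1 * delta ** 2
    print(f"q1 = {q1}: grid minimum = {grid_min(q1):.6f}, 2 q0 q1 delta^2 = {elementary:.6f}")
```

For q_1 = 1/2 the grid minimum coincides with log 2 − h((1 − δ)/2), as the symmetry and convexity of −h′′ predict, and in all three cases it exceeds the elementary quadratic bound.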

General probabilistic theory
In this vast generalization of quantum theory [16], the set of density matrices is replaced by a more general state space, i.e., a convex body K in a finite dimensional real affine space A. We may assume that K spans A as an affine space. Then points of A, called virtual states, play the role of Hermitian matrices with trace 1. The vectors of the underlying vector space V = A − A play the role of traceless Hermitian matrices. Let E be the set of effects, i.e., affine functions e : A → R such that 0 ≤ e ≤ 1 on K. For a virtual state A ∈ A, we define tr_+ A = max{e(A) : e ∈ E} and tr_− A = −min{e(A) : e ∈ E}. Clearly, tr_+ − tr_− = 1. For a state ρ ∈ K, we have tr_+ ρ = 1 and tr_− ρ = 0. For two (virtual) states ρ_0 and ρ_1, we define their trace distance to be The role of quantum measurement is played by a general measurement, or partition of unity, i.e., a sequence e_1, …, e_k ∈ E of effects such that e_1 + ··· + e_k = 1 (the constant 1 function). More generally, the role of a positive trace-preserving map is played by an affine map E : A → A′ such that E(K) ⊆ K′, where K′ is another state space spanning another affine space A′. Since E is determined by its restriction to K, we will refer to it simply as an affine map E : K → K′. When E is a general measurement, K′ is the set of k-ary classical states, i.e., the simplex with k vertices. The role of Lemma 8(a) is played by
Lemma 13. For any affine map E : K → K′, we have tr_± EA ≤ tr_± A for any virtual state A ∈ A.
Proof. For any effect e′ ∈ E′, we have a corresponding effect e = e′ ∘ E ∈ E such that e(A) = e′(EA).
For two states ρ, σ ∈ K, we define the general relative entropy D(ρ∥σ) to be the integral (1). From Lemma 13, we get
Theorem 14. The data processing inequality (15) holds for any affine map E : K → K′.
For states ρ_1, …, ρ_l ∈ K and nonnegative weights q_1, …, q_l summing to 1, we define the general Holevo quantity χ(ρ_1, …, ρ_l; q_1, …, q_l) to be the last sum in (17). From the previous Theorem, we get
Theorem 15. The inequality (18) holds for any affine map E : K → K′; in particular, for any general measurement.
This is an extension of Holevo's inequality to general probabilistic theory. As an example, let K = {ρ ∈ R^d : |ρ| ≤ 1} be the unit ball. The role of von Neumann entropy is played by the function S(ρ) = h((1 − |ρ|)/2), where h is the binary entropy function (20). For any given σ in the interior of the ball K, the sum S(ρ) + D(ρ∥σ) is an affine function of ρ. Indeed, for any linear subspace W ≤ R^d of dimension three (or less) containing σ, we may identify K ∩ W with the Bloch ball of 2-square density matrices (or a central section of it), and then, for all ρ ∈ K ∩ W, the above sum becomes −tr ρ log σ. Therefore, for any states ρ_1, …, ρ_l ∈ K and nonnegative weights q_1, …, q_l summing to 1, we have S(ρ) = S(ρ) + D(ρ∥ρ) = Σ_{j=1}^{l} q_j(S(ρ_j) + D(ρ_j∥ρ)), where ρ = Σ_{j=1}^{l} q_jρ_j. This means that the identity (17) holds in this setting, providing an explicit form of the Holevo quantity χ in terms of the entropy function S, just as in quantum information theory.
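The unit ball example can be explored numerically. The sketch below rests on two assumptions that are not spelled out in the text: the general relative entropy is the integral D(ρ∥σ) = ∫ dt/(|t|(t − 1)^2) tr_−((1 − t)ρ + tσ), and for the unit ball tr_− A = max(0, (|A| − 1)/2), which is what the Bloch ball identification gives (eigenvalues (1 ± |A|)/2). The points ρ_1, ρ_2, σ are hypothetical interior points of the disc, and the closed form −tr ρ log σ is written in Bloch vector terms.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def h(x):
    """Binary entropy with natural logarithm."""
    return -sum(p * math.log(p) for p in (x, 1 - x) if p > 0)

def entropy(rho):
    """S(rho) = h((1 - |rho|)/2) on the unit ball."""
    return h((1 - norm(rho)) / 2)

def tr_minus(a):
    """Assumed form of tr_- for a virtual state of the unit ball."""
    return max(0.0, (norm(a) - 1) / 2)

def affine(rho, sigma, t):
    """The virtual state (1 - t) rho + t sigma."""
    return tuple((1 - t) * x + t * y for x, y in zip(rho, sigma))

def midpoint(fn, n=20000):
    """Midpoint rule on (0, 1)."""
    step = 1.0 / n
    return step * sum(fn((k + 0.5) * step) for k in range(n))

def relative_entropy(rho, sigma):
    # t = 1 - 1/u covers t < 0 and t = 1/u covers t > 1; the weights are the
    # images of dt / (|t| (t - 1)^2) under these substitutions.
    left = midpoint(lambda u: tr_minus(affine(rho, sigma, 1 - 1 / u)) * u / (1 - u))
    right = midpoint(lambda u: tr_minus(affine(rho, sigma, 1 / u)) * u / (1 - u) ** 2)
    return left + right

def closed_form(rho, sigma):
    """-tr rho log sigma for qubit states with Bloch vectors rho and sigma != 0."""
    s = norm(sigma)
    overlap = sum(x * y for x, y in zip(rho, sigma)) / s
    return -((1 + overlap) / 2 * math.log((1 + s) / 2)
             + (1 - overlap) / 2 * math.log((1 - s) / 2))

def total(rho, sigma):
    return entropy(rho) + relative_entropy(rho, sigma)

# Hypothetical interior points of the unit disc.
sigma = (0.2, -0.5)
rho1, rho2 = (0.3, 0.4), (-0.6, 0.1)
rho_mid = tuple((x + y) / 2 for x, y in zip(rho1, rho2))

f1, f2, f_mid = total(rho1, sigma), total(rho2, sigma), total(rho_mid, sigma)
print(f_mid, (f1 + f2) / 2, closed_form(rho_mid, sigma))
```

The three printed numbers should agree up to quadrature error, illustrating both the affinity of S(ρ) + D(ρ∥σ) in ρ and its identification with −tr ρ log σ.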
For general state spaces, however, χ seems unlikely to have a computable closed form, and therefore Theorem 15 will be more difficult to apply than in the quantum case.However, exploring the properties of D and χ in this framework might be an interesting topic of future research.
Therefore, we have
Theorem 16. If the trace distance is given, then any general divergence satisfying the data processing inequality for two-part general measurements (for example, the general relative entropy, or the general Holevo quantity for two states and two given weights) will have the same infimum on pairs of general states as on pairs of binary classical states.
Due to Lemma 8(a) and formula (23), the data processing inequality ∆_α(Eρ∥Eσ) ≤ ∆_α(ρ∥σ) holds for trace-nonincreasing positive linear maps E such that Eρ is again a density matrix. Exploring the further properties of ∆_α, and relating it to existing quantum generalizations of Rényi divergence, might be an interesting topic for future research.
Note that (22) allows us to define a notion of Rényi divergence in the general probabilistic setting of Section 8. For any states ρ, σ ∈ K, we put Similarly to Proposition 17, it is easy to check that the expression under the logarithm is nonnegative even if α < 1. Indeed, we have tr_− A(t) ≤ |t| for t < 0 and tr_− A(t) ≤ t − 1 for t > 1.
Proof.This is clear from Lemma 13.
Conversely, if this last equality holds, then the projection P onto im A_+ satisfies (2) and also 0 ≤ tr P(A − B) ≤ tr P(A_+ − B) = 0, so there is equality in (3).
Recall the well-known fact that tr_± are nonnegative convex functions. (4) Indeed, from (3), the inequality tr_+(A + B) ≤ tr_+(A_+ + B_+) = tr(A_+ + B_+) = tr A_+ + tr B_+ = tr_+ A + tr_+ B holds for any self-adjoint A and B, and the identity tr_+ tA = t tr_+ A holds for any t ≥ 0 and self-adjoint A, implying convexity of tr_+. Since tr_−(A) = tr_+(−A), the function tr_− is also convex.
A density matrix, or quantum state, is a psdh matrix with trace 1. The von Neumann entropy of a psdh matrix ρ (of arbitrary trace) is S(ρ) = −tr ρ log ρ. The quantum relative entropy (Umegaki [19]) of two psdh matrices ρ and σ (of arbitrary trace) is D(ρ∥σ) = tr ρ(log ρ − log σ) if im ρ ⊆ im σ, and D(ρ∥σ) = +∞ otherwise.

Proof.
It suffices to treat the + case because passing from A to −A interchanges tr_+ and tr_− as well as A_+ and A_−. (a) We have A ≤ A_+ and therefore EA ≤ EA_+. From (3), we get that tr_+ EA ≤ tr_+ EA_+ = tr EA_+ ≤ tr A_+ = tr_+ A. (b) Since (EA_+)_+ = EA_+ and EA_+ − EA = EA_−, the claim follows from the condition for equality in (3).