An information-theoretic treatment of quantum dichotomies

Given two pairs of quantum states, we want to decide if there exists a quantum channel that transforms one pair into the other. The theory of quantum statistical comparison and quantum relative majorization provides necessary and sufficient conditions for such a transformation to exist, but such conditions are typically difficult to check in practice. Here, by building upon work by Matsumoto, we relax the problem by allowing for small errors in one of the transformations. In this way, a simple sufficient condition can be formulated in terms of one-shot relative entropies of the two pairs. In the asymptotic setting where we consider sequences of state pairs, under some mild convergence conditions, this implies that the quantum relative entropy is the only relevant quantity deciding when a pairwise state transformation is possible. More precisely, if the relative entropy of the initial state pair is strictly larger than the relative entropy of the target state pair, then a transformation with exponentially vanishing error is possible. On the other hand, if the relative entropy of the target state pair is strictly larger, then any such transformation will have an error converging exponentially to one. As an immediate consequence, we show that the rate at which pairs of states can be transformed into each other is given by the ratio of their relative entropies. We discuss applications to the resource theories of athermality and coherence.


Introduction
Various preorders and partial orders have been the subject of extensive study both in mathematical statistics [22,5,30,52,2,51,18] and in information theory [47,27,17]. An example of paramount importance is that provided by the majorization preorder [32]: a probability distribution $p_1$ is said to majorize another distribution $p_2$, in formula $p_1 \succ p_2$, whenever there exists a bistochastic transformation $T$ such that $T p_1 = p_2$.
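As a minimal illustration of the majorization preorder, the following sketch checks $p_1 \succ p_2$ through the standard partial-sum criterion (the decreasingly ordered partial sums of $p_1$ dominate those of $p_2$), which is known to be equivalent to the existence of a bistochastic $T$ with $T p_1 = p_2$. The function names `majorizes` and `apply_matrix` and the example matrix are illustrative choices, not notation from this paper.

```python
def majorizes(p1, p2, tol=1e-12):
    """Return True if p1 majorizes p2 (probability vectors of equal length)."""
    a = sorted(p1, reverse=True)
    b = sorted(p2, reverse=True)
    sum_a = sum_b = 0.0
    for x, y in zip(a, b):
        sum_a += x
        sum_b += y
        # every ordered partial sum of p1 must dominate that of p2
        if sum_a < sum_b - tol:
            return False
    return True


def apply_matrix(T, p):
    """Apply the matrix T (list of rows) to the column vector p."""
    return [sum(T[i][j] * p[j] for j in range(len(p))) for i in range(len(T))]


# A bistochastic matrix (rows and columns sum to one) that mixes the
# first two outcomes and leaves the third untouched.
T = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]

p1 = [0.7, 0.2, 0.1]
p2 = apply_matrix(T, p1)
```

Here `majorizes(p1, p2)` holds because `p2` was obtained from `p1` by a bistochastic map, while the converse relation fails.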
The remainder of the paper is structured as follows. After introducing the relevant one-shot divergences and their properties in Section 2, we establish our main technical results in Section 3. First, in Section 3.1 we derive sufficient conditions in terms of one-shot divergences for exact pairwise state transformations. In Section 3.2 we relax this to allow for an error on one of the states, and again find sufficient conditions in terms of smoothed one-shot entropies. This then allows us to derive our main results in Section 3.3, Theorems 3.4 and 3.6, which together show that the relative entropy fully characterises when pairwise transformations are possible asymptotically. Section 4 then takes an information-theoretic approach to the problem by studying the maximal rate at which independent copies of states can be transformed into each other in Theorem 4.1. We finally discuss applications of our results to the resource theories of athermality and coherence in Section 5 and end with a conjecture characterizing the second-order asymptotic behaviour of resource transformations in Section 6.

Notation
Let $\mathcal{S}(\mathbb{C}^d)$ denote the set of quantum states on a $d$-dimensional Hilbert space $\mathbb{C}^d$. A quantum channel $\mathcal{E} : \mathcal{S}(\mathbb{C}^d) \to \mathcal{S}(\mathbb{C}^{d'})$ is a linear map that is completely positive and trace-preserving (CPTP). We denote by $\le$ the Löwner partial order, i.e., for two Hermitian matrices $X$ and $Y$ the relation $X \ge Y$ means that $X - Y$ is positive semi-definite, and the relation $X \gg Y$ means that the support of $Y$ is contained in the support of $X$. Throughout this paper we denote by $\log$ the logarithm to base 2.

Some divergences and their properties
In this work we use several different non-commutative divergences. We will introduce here only the measures and properties that are needed for this work; an interested reader may consult [48] for a more comprehensive discussion with references to all the original papers. For $\rho, \sigma \in \mathcal{S}(\mathbb{C}^d)$ the relative entropy is given by $D(\rho\|\sigma) := \operatorname{tr} \rho(\log \rho - \log \sigma)$ if $\sigma \gg \rho$, and $+\infty$ otherwise. To simplify the exposition in the following we assume throughout that the states $\sigma$ always have full support and are thus invertible, avoiding such infinities.
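For commuting states the relative entropy reduces to the classical Kullback-Leibler divergence of the eigenvalue distributions. The sketch below evaluates it in bits under that commuting assumption; the function name `relative_entropy` and the example numbers are illustrative only.

```python
import math


def relative_entropy(p, q):
    """D(p||q) in bits for commuting states given as eigenvalue lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the convention 0 log 0 = 0
        if qi == 0.0:
            return math.inf  # support of p is not contained in that of q
        total += pi * (math.log2(pi) - math.log2(qi))
    return total


rho = [0.5, 0.5]
sigma = [0.25, 0.75]
d = relative_entropy(rho, sigma)
```

For this pair the value is $1 - \frac{1}{2}\log 3 \approx 0.2075$ bits, and the function returns $+\infty$ as soon as the support condition is violated.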
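Another non-commutative family of Rényi divergences is the sandwiched quantum Rényi divergence [38,54],
$$\widetilde{D}_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1} \log \operatorname{tr} \Big( \sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}} \Big)^{\alpha}.$$
For $\alpha = \frac{1}{2}$ the sandwiched quantum Rényi divergence becomes $\widetilde{D}_{1/2}(\rho\|\sigma) = -\log F(\rho, \sigma)$. In the limit $\alpha \to \infty$ the sandwiched quantum Rényi divergence converges to the so-called max-divergence [46,20], i.e.,
$$D_{\max}(\rho\|\sigma) := \lim_{\alpha\to\infty} \widetilde{D}_\alpha(\rho\|\sigma) = \inf\{\lambda \in \mathbb{R} : \rho \le 2^{\lambda}\sigma\}.$$
Both non-commutative families of Rényi divergences introduced above satisfy many desirable properties: they are monotonically increasing in $\alpha$, they satisfy the data-processing inequality (i.e., they are non-increasing when the same CPTP map is applied to both states), and in the limit $\alpha \to 1$ they both converge to the relative entropy.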
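The properties listed above can be checked numerically in the commuting case, where both quantum Rényi families coincide with the classical Rényi divergence $\frac{1}{\alpha-1}\log\sum_i p_i^\alpha q_i^{1-\alpha}$. The following is a small sketch under that commuting assumption, with $q$ of full support; the function names are illustrative.

```python
import math


def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence of order alpha != 1, in bits."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0.0)
    return math.log2(s) / (alpha - 1.0)


def max_divergence(p, q):
    """D_max(p||q) = log of the largest likelihood ratio p_i / q_i."""
    return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0.0))


p = [0.5, 0.5]
q = [0.25, 0.75]
alphas = [0.5, 0.9, 0.999, 1.001, 2.0, 50.0]
values = [renyi_divergence(p, q, a) for a in alphas]
```

The list `values` is non-decreasing in $\alpha$, its entries near $\alpha = 1$ approximate the relative entropy of the pair, and the entry for large $\alpha$ approaches `max_divergence(p, q)` from below.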
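Smooth variants of the max and min-divergence will be useful to treat problems with finite errors. The $\varepsilon$-smooth max-divergence is defined as
$$D^{\varepsilon,\Delta}_{\max}(\rho\|\sigma) := \inf_{\tilde\rho \in B^{\Delta}_{\varepsilon}(\rho)} D_{\max}(\tilde\rho\|\sigma),$$
where $B^{\Delta}_{\varepsilon}(\rho) := \{\tilde\rho \in \mathcal{S}(\mathbb{C}^d) : \Delta(\rho, \tilde\rho) \le \varepsilon\}$ for $\varepsilon \in (0, 1)$. We will use the smooth max-divergence for both trace distance and purified distance, but note that in contrast to some other works the optimization here only goes over close quantum states, not sub-normalized states. The smooth variant of $D_{\min}$ is the so-called hypothesis testing divergence. For any $\varepsilon \in (0, 1)$ it is defined as
$$D^{\varepsilon}_{h}(\rho\|\sigma) := -\log \inf\{\operatorname{tr} Q\sigma : 0 \le Q \le 1,\ \operatorname{tr} Q\rho \ge 1 - \varepsilon\}.$$
In the limit $\varepsilon \to 0$ we recover $D_{\min}(\rho\|\sigma)$. Finally, we note that both of these divergences satisfy the data-processing inequality, namely
$$D^{\varepsilon,\Delta}_{\max}(\rho\|\sigma) \ge D^{\varepsilon,\Delta}_{\max}(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)) \quad\text{and}\quad D^{\varepsilon}_{h}(\rho\|\sigma) \ge D^{\varepsilon}_{h}(\mathcal{E}(\rho)\|\mathcal{E}(\sigma))$$
for any CPTP map $\mathcal{E}$. The smooth max-divergence and the hypothesis testing divergence are closely related, as shown in [3, Theorem 4].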
Note that we can interchange the Rényi relative entropies in both inequalities since $\bar{D}_\alpha(\rho\|\sigma) \ge \widetilde{D}_\alpha(\rho\|\sigma)$ for all states due to the Araki-Lieb-Thirring inequality. Inequality (9) follows immediately from [4, Proposition 3.2]. Inequality (10) is shown in [3, Theorem 3], tightening earlier bounds that were established as part of the fully quantum asymptotic equipartition property (QAEP) [49]. The QAEP states that, for all $\varepsilon \in (0, 1)$, the regularized smooth entropies converge to the relative entropy,
$$\lim_{n\to\infty} \frac{1}{n} D^{\varepsilon,\Delta}_{\max}(\rho^{\otimes n}\|\sigma^{\otimes n}) = D(\rho\|\sigma).$$
The analogous statement for $D^{\varepsilon}_{h}$ is immediate from quantum Stein's lemma and its converse [23,40].
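To make the hypothesis testing divergence concrete, the following sketch evaluates it for commuting states, assuming the common convention $D^{\varepsilon}_{h}(\rho\|\sigma) = -\log\min\{\operatorname{tr} Q\sigma : 0 \le Q \le 1,\ \operatorname{tr} Q\rho \ge 1-\varepsilon\}$ and a full-support second argument. In this classical case the optimal test is the Neyman-Pearson test, which accepts outcomes in decreasing order of likelihood ratio. The function name and the example are illustrative, and `eps = 0` is allowed only to exhibit the limiting value.

```python
import math


def hypothesis_testing_divergence(p, q, eps):
    """-log2 of the minimal q-weight of a test accepting p with prob >= 1 - eps."""
    # Neyman-Pearson: accept outcomes in decreasing order of p_i / q_i.
    order = sorted((i for i in range(len(p)) if p[i] > 0.0),
                   key=lambda i: p[i] / q[i], reverse=True)
    needed = 1.0 - eps  # p-mass that the test still has to accept
    beta = 0.0          # q-mass accepted so far (type-II error)
    for i in order:
        if needed <= 0.0:
            break
        t = min(1.0, needed / p[i])  # fraction of outcome i that is accepted
        beta += t * q[i]
        needed -= t * p[i]
    return math.inf if beta == 0.0 else -math.log2(beta)


p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
d_zero = hypothesis_testing_divergence(p, q, 0.0)
d_half = hypothesis_testing_divergence(p, q, 0.5)
```

At `eps = 0` the test must accept the whole support of `p`, so the value equals minus the logarithm of the `q`-weight of that support, which is the min-divergence in the commuting case; a larger `eps` can only increase the value.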

Conditions for pairwise state transformation
In this section we derive sufficient conditions for the existence of a channel that transforms $(\rho_1, \sigma_1)$ to $(\rho_2, \sigma_2)$, where the first state is transformed either exactly or approximately, and the second state always has to be transformed exactly.

Conditions for exact state transformation
We start by considering the case of exact transformations. For what follows, we can restrict ourselves to a very special class of transformations, namely, test-and-prepare channels of the form
$$\mathcal{E}(\cdot) = \gamma_1 \operatorname{tr}(E\,\cdot) + \gamma_2 \operatorname{tr}\big((1 - E)\,\cdot\big),$$
where the $\gamma_i$'s are density matrices and $0 \le E \le 1$. Hence test-and-prepare channels constitute a subset of measure-and-prepare channels, in which the measurement is a simple binary test. Ref. [9] provides a complete characterization of this case. The following lemma can be obtained as a consequence of the results in [9], but we provide an independent proof here for the sake of the reader (see also Ref. [37]).
Proof. Since the implication (i) =⇒ (ii) is just the data-processing inequality, we only need to prove the reverse implication (ii) =⇒ (i).
By assumption, $\rho_1$ and $\sigma_1$ commute. Hence, we can see them as classical binary probability distributions, namely, $\rho_1 \leftrightarrow p_1 = (p, 1-p)$ and $\sigma_1 \leftrightarrow q_1 = (q, 1-q)$. Moreover, we can assume that $\frac{p}{q} \ge \frac{1-p}{1-q}$; otherwise, the first step is to map $(p, 1-p)$ and $(q, 1-q)$ into $(1-p, p)$ and $(1-q, q)$, respectively. Notice that this condition is Let us now define $M := \frac{p}{q}$ and $m := \frac{1-p}{1-q}$ and notice that log and there is nothing to prove.
From the above, we obtain a sufficient condition for the existence of a transformation, more precisely a test-and-prepare channel, for arbitrary pairs of states.
Proof. It suffices to show the statement under the first condition; the second then follows by symmetry. Let us consider the measurement channel where $\Pi$ is the projector onto the support of $\rho_1$. Then, the binary classical probability distributions obtained from the pair ( We only need to show that, if Eq. (14) holds, then $p_1$ and $q_1$ satisfy condition (ii) in Lemma 3.1. That is indeed the case, since, on the one hand, . On the other hand, because
We note that condition (14) is very strong in general. To see this, it is enough to consider two commuting states $\rho_1 = \rho_2$ and $\sigma_1 = \sigma_2$: clearly, a transformation exists (i.e., the identity channel), but (14) fails for almost all distributions due to the strict monotonicity of the Rényi divergence in $\alpha$. More explicitly, consider $\rho_1 = \rho_2 = \operatorname{diag}(p, 1-p)$ and $\sigma_1 = \sigma_2 = \operatorname{diag}(q, 1-q)$ for $p, q \in (0, 1)$ and $p \neq q$. The identity channel $\mathcal{I}$ satisfies $\mathcal{I}(\rho_1) = \rho_2$ and $\mathcal{I}(\sigma_1) = \sigma_2$; however, $0 = D_{\min}(\rho_1\|\sigma_1) < D_{\max}(\rho_2\|\sigma_2)$ and $0 = D_{\min}(\sigma_1\|\rho_1) < D_{\max}(\sigma_2\|\rho_2)$, as the max-relative entropy vanishes if and only if the two arguments coincide.
This fact can be understood as an indication that the restriction to test-and-prepare channels is indeed extremely limiting in the one-shot zero-error scenario. However, when errors are allowed, test-and-prepare channels already provide a non-trivial toolbox, as we show in what follows.
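The counterexample above is easy to reproduce numerically. The sketch below assumes the commuting-case formulas $D_{\min}(p\|q) = -\log\sum_{i : p_i > 0} q_i$ and $D_{\max}(p\|q) = \log\max_i p_i/q_i$; the function names and the concrete values of $p$ and $q$ are illustrative.

```python
import math


def min_divergence(p, q):
    """D_min(p||q): minus log2 of the q-weight of the support of p."""
    return -math.log2(sum(qi for pi, qi in zip(p, q) if pi > 0.0))


def max_divergence(p, q):
    """D_max(p||q): log2 of the largest likelihood ratio p_i / q_i."""
    return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0.0))


# rho_1 = rho_2 = diag(p, 1 - p) and sigma_1 = sigma_2 = diag(q, 1 - q)
rho = [0.75, 0.25]
sigma = [0.25, 0.75]

first_condition = min_divergence(rho, sigma) >= max_divergence(rho, sigma)
second_condition = min_divergence(sigma, rho) >= max_divergence(sigma, rho)
```

Both flags evaluate to `False`: the min-divergences vanish because the states have full support, whereas the max-divergences equal $\log 3$, even though the identity channel trivially performs the transformation.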

Sufficient condition for approximate state transformation
In this section we are interested in approximate state transformation, i.e., a transformation from $(\rho_1, \sigma_1)$ to $(\rho_2, \sigma_2)$ where we allow for a (small) error in the transformation $\rho_1 \to \rho_2$, while the transformation $\sigma_1 \to \sigma_2$ is required to be exact.
then there exists a test-and-prepare quantum channel $\mathcal{E}$ : By definition of the smooth max-relative entropy, there furthermore exists a state $\tilde\rho_2 \in B^{\Delta}_{\varepsilon}(\rho)$ such that Consider now the mapping We start by proving that $\mathcal{E}$ is a quantum channel, i.e., a trace-preserving completely positive map. That $\mathcal{E}$ is trace-preserving is straightforward to see, since $\sigma_2$ and $\tilde\rho_2$ are density operators. We note that because $0 \le Q^* \le 1$ it suffices to show that $\sigma_2 \ge \tilde\rho_2 \operatorname{tr}(\sigma_1 Q^*)$ in order to prove that $\mathcal{E}$ is completely positive. By definition of the smooth max-relative entropy and by using (15) and (16) we find By definition of the max-relative entropy we thus have We have seen that $\mathcal{E}$ is indeed a quantum channel. It thus remains to show that $\mathcal{E}(\sigma_1) = \sigma_2$ and $\Delta(\mathcal{E}(\rho_1), \rho_2) \le \varepsilon_1 + \varepsilon_2$. The first property is straightforward to verify. The second property requires some more work. Define $\hat\rho_2 := \mathcal{E}(\rho_1)$. The triangle inequality immediately yields and it thus remains to bound the second term. Let us first consider the case where $\Delta(\rho, \tau) = T(\rho, \tau) = \frac{1}{2}\|\rho - \tau\|_1$ is the trace distance. Substituting the expression in (19), we find where we used (16) and (20) in the final step. Using (20) and (21) we find where $\Pi_+(X)$ denotes the projector onto the positive support of $X$. Combining this with (22) finally gives $\Delta(\mathcal{E}(\rho_1), \rho_2) \le \varepsilon_1 + \varepsilon_2$, concluding the proof for the trace distance.
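The structure of this construction can be illustrated on commuting states with exact rational arithmetic. The sketch below is not the channel of the proof verbatim: it assumes a test-and-prepare map that prepares the target $\rho_2$ itself (no smoothing) when the test accepts and the state $(\sigma_2 - \rho_2 \operatorname{tr}(\sigma_1 Q))/(1 - \operatorname{tr}(\sigma_1 Q))$ when it rejects, a form chosen to be consistent with the positivity requirement $\sigma_2 \ge \rho_2 \operatorname{tr}(\sigma_1 Q)$ and with $\mathcal{E}(\sigma_1) = \sigma_2$ stated above. The test `Q`, the states and the function name are hypothetical.

```python
from fractions import Fraction as F


def apply_test_and_prepare(x, Q, gamma1, gamma2):
    """Map a diagonal state x to gamma1 * tr(Q x) + gamma2 * tr((1 - Q) x)."""
    accept = sum(qi * xi for qi, xi in zip(Q, x))
    reject = sum(x) - accept
    return [g1 * accept + g2 * reject for g1, g2 in zip(gamma1, gamma2)]


rho1, sigma1 = [F(9, 10), F(1, 10)], [F(1, 10), F(9, 10)]
rho2, sigma2 = [F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]
Q = [F(1), F(0)]  # binary test: accept only the first outcome

beta = sum(qi * si for qi, si in zip(Q, sigma1))  # tr(sigma1 Q)
gamma1 = rho2                                     # prepared on "accept"
# prepared on "reject"; a valid state because sigma2 >= rho2 * beta entrywise
gamma2 = [(s - r * beta) / (1 - beta) for s, r in zip(sigma2, rho2)]

out_sigma = apply_test_and_prepare(sigma1, Q, gamma1, gamma2)
out_rho = apply_test_and_prepare(rho1, Q, gamma1, gamma2)
# trace distance between the output for rho1 and the target rho2
error = sum(abs(a - b) for a, b in zip(out_rho, rho2)) / 2
```

In this toy instance the second state is mapped exactly, `out_sigma == sigma2`, while the first is reached only approximately, with `error` equal to the probability that the test rejects $\rho_1$ times the distance between the two prepared states.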

Conditions for asymptotic state transformation
In the following we will consider an asymptotic setting given by four sequences of states $\{\rho_1^n\}_n$, $\{\sigma_1^n\}_n$, $\{\rho_2^n\}_n$ and $\{\sigma_2^n\}_n$ for $n \in \mathbb{N}$. We assume no specific structure for these states and the underlying Hilbert spaces, i.e., in general we have $\rho_1^n, \sigma_1^n \in \mathcal{S}(\mathbb{C}^{d_1^n})$ and $\rho_2^n, \sigma_2^n \in \mathcal{S}(\mathbb{C}^{d_2^n})$ for arbitrary dimensions $\{d_1^n\}_n$ and $\{d_2^n\}_n$. The only requirement that we impose is that, for $i \in \{1, 2\}$, the limits
• If $\lambda_1 > \lambda_2$ we show the existence of a sequence of channels that transform $(\rho_1^n, \sigma_1^n)$ to $(\rho_2^n, \sigma_2^n)$ where the transformation $\rho_1^n \to \rho_2^n$ has an error that is vanishing exponentially as $n \to \infty$ and the transformation $\sigma_1^n \to \sigma_2^n$ is exact.
• On the other hand, if $\lambda_1 < \lambda_2$ we show that any transformation for which $\sigma_1^n \to \sigma_2^n$ is exact leads to an error exponentially approaching one as $n \to \infty$ in the transformation $\rho_1^n \to \rho_2^n$.
We note that the case where $\lambda_1 = \lambda_2$ is left as an open question. For example, in the case of four states $\rho_1$, $\rho_2$, $\sigma_1$, and $\sigma_2$ such that $D(\rho_1\|\sigma_1) = D(\rho_2\|\sigma_2)$ it is unknown if there exists a sequence of channels that take $\sigma_1^{\otimes n}$ to $\sigma_2^{\otimes n}$ for each $n$ and that take $\rho_1^{\otimes n}$ to $\rho_2^{\otimes n}$ up to asymptotically vanishing error. Our main technical results are formally presented in the next two theorems.
Remark 3.5. The above theorem extends [35, Theorem 2.7] in that we are also able to show the exponential decay of the error in the transformation.
Proof. We show the statement for trace distance, and the statement for purified distance then follows by the Fuchs-van de Graaf inequality.
By the assumption of the theorem and the continuity guaranteed by the condition in (29), there exist $\delta > 0$ and $\kappa > 0$ such that Hence, by their definition as limits, there exists an $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, Let us now set $\varepsilon_n = \frac{1}{2} 2^{-\gamma n}$ for some $\gamma > 0$ to be determined later. From Lemma 3.3 we learn that the maps $\mathcal{E}_n$ with the desired properties exist if Indeed, Proposition 2.2 together with (33) imply where in the penultimate step we use that $1 - \varepsilon_n^2 \ge \varepsilon_n$ since $\varepsilon_n \le \frac{1}{2}$, and in the last step we assumed $\gamma n \ge 3$. We conclude that the choice $\gamma = \frac{\kappa\delta}{8}$ ensures that (34) holds.
Theorem 3.6 (Exponential strong converse). Let $\{\rho_1^n\}_n$, $\{\sigma_1^n\}_n$, $\{\rho_2^n\}_n$ and $\{\sigma_2^n\}_n$ for $n \in \mathbb{N}$ be sequences satisfying the condition in (29) and furthermore Then there exists $\gamma > 0$ such that for all sequences of quantum channels $\{\mathcal{E}_n\}_{n\in\mathbb{N}}$ that satisfy $\mathcal{E}_n(\sigma_1^n) = \sigma_2^n$ there exists an $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$ we have
Proof. We show the statement for purified distance, and the statement for trace distance then follows by the Fuchs-van de Graaf inequality. We again start by observing that the assumption of the theorem and the continuity guaranteed by the condition in (29) imply the existence of $\delta > 0$ and $\kappa > 0$ such that $\widetilde{D}_{1+\delta}(\{\rho_1^n\}_n\|\{\sigma_1^n\}_n) \le \bar{D}_{1-\delta}(\{\rho_2^n\}_n\|\{\sigma_2^n\}_n) - \kappa$, and thus, there exists an $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, we have It thus suffices to prove that (40) implies the desired property for all sequences of quantum channels. In the following we prove the contrapositive. Suppose that for all $\gamma > 0$, there exists a family of channels $\{\mathcal{E}_n\}_{n\in\mathbb{N}}$ such that for some $n \ge n_0$ we have both $\mathcal{E}_n(\sigma_1^n) = \sigma_2^n$ and $P(\mathcal{E}_n(\rho_1^n), \rho_2^n) \le 1 - 2^{-\gamma n}$. Let us then fix a $\gamma$, to be determined later, and set $\varepsilon_n = \frac{1}{2} 2^{-\gamma n}$. By Proposition 2.2 we have where the penultimate step uses the data-processing inequality for the smooth max-divergence and the fact that $1 - \varepsilon_n^2 \ge \varepsilon_n$ since $\varepsilon_n \le \frac{1}{2}$. The final step follows from the definition of the smooth max-relative entropy as an optimization over a ball of close states and the triangle inequality of the purified distance.
It is worth noting that we made no attempts to characterize the exact error exponents and strong converse exponents here. This is because finding the exact error exponent for this problem is still open even in simple commutative cases where, for example, σ 1 and σ 2 are proportional to the identity and thus commute with ρ 1 and ρ 2 .
Our goal is to understand the asymptotics of this quantity for $n \to \infty$ when $\varepsilon$ is constant, or at least not approaching 0 or 1 too quickly as $n$ increases. Let us for now focus on the case where $\varepsilon$ is constant. We can determine the first-order asymptotics of $R^{\varepsilon,\Delta}_{\rho_1,\sigma_1\to\rho_2,\sigma_2}(n)$, which turns out to be independent of the metric $\Delta \in \{T, P\}$ and of $\varepsilon$.
Theorem 4.1. Let $\rho_1, \sigma_1 \in \mathcal{S}(\mathbb{C}^{d_1})$ and $\rho_2, \sigma_2 \in \mathcal{S}(\mathbb{C}^{d_2})$ be two pairs of states. For all $\varepsilon \in (0, 1)$, the pairwise state transformation rate satisfies
$$\lim_{n\to\infty} R^{\varepsilon,\Delta}_{\rho_1,\sigma_1\to\rho_2,\sigma_2}(n) = \frac{D(\rho_1\|\sigma_1)}{D(\rho_2\|\sigma_2)}.$$
Remark 4.2. Another viewpoint on this question can be taken by fixing a rate $R$ and asking how the minimal achievable error $\varepsilon$ behaves as a function of $n$. Strong qualitative statements for when we transform above and below the critical rate $D(\rho_1\|\sigma_1)/D(\rho_2\|\sigma_2)$ are immediate from Theorems 3.4 and 3.6. Namely, for transformations below the critical rate the error will drop exponentially as $n \to \infty$, and for transformations above the critical rate the error will approach one exponentially as $n \to \infty$.
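As a worked instance of the critical rate $D(\rho_1\|\sigma_1)/D(\rho_2\|\sigma_2)$, the following sketch evaluates it for commuting states, where the relative entropy is the classical Kullback-Leibler divergence. The function names and the chosen distributions are illustrative assumptions, and the second arguments are taken to have full support.

```python
import math


def relative_entropy(p, q):
    """D(p||q) in bits for commuting states (q assumed to have full support)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def critical_rate(rho1, sigma1, rho2, sigma2):
    """Ratio of relative entropies of the initial and the target pair."""
    return relative_entropy(rho1, sigma1) / relative_entropy(rho2, sigma2)


rate = critical_rate([0.75, 0.25], [0.25, 0.75],   # initial pair
                     [0.50, 0.50], [0.25, 0.75])   # target pair
```

Here the initial pair is more distinguishable than the target pair, so the ratio exceeds one (roughly 3.8): asymptotically, each copy of the initial pair can be converted into several copies of the target pair.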

Applications to resource theories
The above results have some immediate consequences in resource theories: in what follows we consider in particular the resource theory of athermality and the resource theory of coherence. In concurrent work, Wang and Wilde independently derived Theorem 4.1, interpreting it in terms of a newly established resource theory of "asymmetric distinguishability" [53]. Our perspective is different insofar as we interpret Theorems 3.4, 3.6 and 4.1 as building blocks that have applications in different resource theories.
Let us first consider the resource theory of athermality. There, we are given a Hamiltonian $E$, an inverse temperature $\beta$ and a Gibbs state $\gamma = \frac{1}{Z} e^{-\beta E}$, where $Z$ is the normalization factor. One then asks whether there exists a quantum channel that has the Gibbs state $\gamma$ of a quantum system as a fixed state and transforms $\rho_1$ to $\rho_2$. Theorem 3.4 reveals that there exists a sequence of Gibbs-preserving maps from $\rho_1^{\otimes n}$ to $\rho_2^{\otimes n}$, with exponentially vanishing error as $n \to \infty$, if where we used that the Helmholtz free energy $F_H = U - TH$, where $U$ is the internal energy, $T$ is the temperature and $H$ is the entropy, that is, Furthermore, as shown in Theorem 3.6, no such sequence of Gibbs-preserving maps can exist if the inequality in (58) is strictly reversed. In fact, any sequence of Gibbs-preserving maps would incur an error approaching one exponentially fast as $n \to \infty$.
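The link between relative entropy to the Gibbs state and the Helmholtz free energy can be checked numerically. The sketch below assumes a state diagonal in the energy eigenbasis, units with $k_B = 1$ so that $T = 1/\beta$, and entropies in nats inside the free energy; under these assumptions the standard identity $D(\rho\|\gamma) = \beta\,(F_H(\rho) - F_H(\gamma))$ holds in nats and is converted to bits by dividing by $\ln 2$. All function names and numerical values are illustrative.

```python
import math


def gibbs_state(energies, beta):
    """Return the Gibbs distribution and the partition function Z."""
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights], Z


def free_energy(p, energies, beta):
    """F = U - T*H with T = 1/beta and H the Shannon entropy in nats."""
    U = sum(pi * e for pi, e in zip(p, energies))
    H = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return U - H / beta


def relative_entropy_bits(p, q):
    """D(p||q) in bits for commuting states (q assumed to have full support)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


energies, beta = [0.0, 1.0], 2.0
gamma, Z = gibbs_state(energies, beta)
rho = [0.3, 0.7]

lhs = relative_entropy_bits(rho, gamma)
rhs = beta * (free_energy(rho, energies, beta)
              - free_energy(gamma, energies, beta)) / math.log(2)
```

The two quantities `lhs` and `rhs` agree up to floating-point error, so comparing relative entropies to the Gibbs state is the same as comparing free energies at fixed temperature.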
Another resource theory in which Theorem 4.1 above plays a role is the resource theory of coherence, in particular, the resource theory of coherence based on dephasing-covariant incoherent operations (DIO) [12,33]. A DIO operation $\mathcal{E}$ is such that its action commutes with the completely dephasing channel $\operatorname{diag}$, that is, $\mathcal{E} \circ \operatorname{diag} = \operatorname{diag} \circ \mathcal{E}$. In this framework, the rate at which coherence can be distilled from an initial resource state $\rho$ is defined, as usual, as the optimal rate $R$ at which the transformation $\rho^{\otimes n} \to |+\rangle\langle+|^{\otimes nR}$, where $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ is one unit of coherence, can be achieved with asymptotically vanishing error. Such a rate is known to be equal to $D(\rho\|\operatorname{diag}(\rho))$ [43,11]. Recently, a relaxation of the DIO paradigm has been proposed and motivated [44]: instead of requiring that the transformation $\mathcal{E}$ and the completely dephasing channel $\operatorname{diag}$ commute on all states, the commutation relation is enforced only on the initial resource state $\rho$.
In other words, one considers the $\rho$-DIO condition $\mathcal{E}(\operatorname{diag}(\rho)) = \operatorname{diag}(\mathcal{E}(\rho))$. The existence of a $\rho$-DIO channel transforming $\rho$ into $\sigma$ can then be easily reformulated as the existence of a channel $\mathcal{E}$ achieving the following mapping of quantum dichotomies: $(\rho, \operatorname{diag}(\rho)) \to (\sigma, \operatorname{diag}(\sigma))$. Once formulated in this form, our Theorem 4.1 implies that the rate at which coherence can be distilled from $\rho$ by means of $\rho$-DIO operations is given by $D(\rho\|\operatorname{diag}(\rho))$. The asymptotic distillation rate under $\rho$-DIO has been independently computed in Ref. [44]. Beyond that, Theorem 3.6 establishes an exponential strong converse which implies that if we try to distill at a rate exceeding $D(\rho\|\operatorname{diag}(\rho))$ then the error will go to one exponentially fast. Interestingly, since the asymptotic distillation rate is the same for both DIO and $\rho$-DIO operations, and since $\rho$-DIO operations constitute a larger set than DIO operations, we have that the above-mentioned exponential strong converse property holds for DIO distillation too.
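The distillation rate $D(\rho\|\operatorname{diag}(\rho))$ can be evaluated in closed form for a qubit. The sketch below assumes a real qubit state $\rho = \begin{pmatrix} a & c \\ c & 1-a \end{pmatrix}$ with $c^2 \le a(1-a)$ and uses the standard identity $D(\rho\|\operatorname{diag}(\rho)) = H(\operatorname{diag}(\rho)) - H(\rho)$, where $H$ denotes the von Neumann entropy in bits. The function names are illustrative.

```python
import math


def entropy(values):
    """Shannon entropy in bits of a list of eigenvalues."""
    return -sum(v * math.log2(v) for v in values if v > 0.0)


def relative_entropy_of_coherence(a, c):
    """D(rho || diag(rho)) in bits for rho = [[a, c], [c, 1 - a]], c real."""
    # eigenvalues of a 2x2 real symmetric matrix with unit trace
    radius = math.sqrt((a - 0.5) ** 2 + c ** 2)
    eigenvalues = [0.5 + radius, 0.5 - radius]
    return entropy([a, 1.0 - a]) - entropy(eigenvalues)


plus_state = relative_entropy_of_coherence(0.5, 0.5)    # rho = |+><+|
incoherent = relative_entropy_of_coherence(0.25, 0.0)   # rho = diag(1/4, 3/4)
```

The maximally coherent state $|+\rangle$ yields exactly one bit, matching its role as the unit of coherence, while a diagonal state yields zero.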
This form is of interest since it shows a resonance behaviour when ν = 1, where the contribution in the second-order term turns positive even for arbitrarily small ε. This means that there exist pairs of states that can be transformed into each other without loss due to finite size effects (up to second order). This effect has been observed both analytically and numerically in the commutative case [28], and its applications to fully quantum resource theories remain to be explored.
The limit expression in (66) was shown to hold for the case where $\sigma_1$ and $\sigma_2$ are both proportional to the identity in the work of Kumagai and Hayashi [29], and the above conjecture thus constitutes a natural fully quantum generalization of their result. Building on that and an embedding technique from quantum thermodynamics, the equality was also shown for general $\sigma_1$ and $\sigma_2$ as long as $\rho_1$ and $\rho_2$ commute with $\sigma_1$ and $\sigma_2$, respectively [15]. The same special case can also be solved in the moderate error regime by adapting the results in [16].
Finally, note that since we are now concerned with higher-order contributions that are a function of the error threshold $\varepsilon$, the limit does in general depend on how exactly we measure the error. It is thus not obvious what an appropriate conjecture for the trace distance would look like, for example.