On contraction coefficients, partial orders and approximation of capacities for quantum channels

The data processing inequality is the most basic requirement for any meaningful measure of information. It states that distinguishability measures between states cannot increase when a quantum channel is applied, and it is the centerpiece of many results in information theory. Moreover, it justifies the operational interpretation of most entropic quantities. In this work, we revisit the notion of contraction coefficients of quantum channels, which provide sharper and specialized versions of the data processing inequality. Closely related to data processing are partial orders on quantum channels. First, we discuss several quantum extensions of the well-known less noisy ordering and relate them to contraction coefficients. We further define approximate versions of the partial orders and show how they can give strengthened and conceptually simple proofs of several results on approximating capacities. Moreover, we investigate the relation to other partial orders in the literature and their properties, particularly with regard to tensorization. We then examine the relation between contraction coefficients and other properties of quantum channels, such as hypercontractivity. Next, we extend the framework of contraction coefficients to general f-divergences and prove several structural results. Finally, we consider two important classes of quantum channels, namely Weyl-covariant and bosonic Gaussian channels. For those, we determine new contraction coefficients and relations for various partial orders.


Introduction
One of the arguably most fundamental concepts in (quantum) information theory is that of data processing. Many relevant quantities are monotone under the application of a quantum channel. This is what allows us to assign them an operational meaning in terms of distinguishability and, in turn, makes them useful in assessing physical properties. A commonly considered quantity is the relative entropy, for which, given an arbitrary channel $\mathcal N$, data processing manifests as
$$D(\mathcal N(\rho)\,\|\,\mathcal N(\sigma)) \le D(\rho\,\|\,\sigma) \tag{1.1}$$
for any two states $\rho, \sigma$. This mathematical statement also has an operational interpretation: the relative entropy gives the optimal rate at which one can asymptotically discriminate two quantum states in an asymmetric setting [36,66]. It is then immediately apparent that applying a quantum channel to the states can never make the discrimination task easier; therefore, the relative entropy cannot increase.
Since the relative entropy acts as a parent quantity for several other entropic quantities, they also inherit its data processing property. An important example is the mutual information: given a bipartite quantum state $\rho_{A'A}$ and a quantum channel $\mathcal N: A \to B$, we always have
$$I(A':B)_{(\mathrm{id}_{A'}\otimes\mathcal N)(\rho)} \le I(A':A)_\rho. \tag{1.2}$$
Although data processing is a potent tool, it is often not sufficient to know that a quantity decreases when a channel is applied; one also needs to quantify by how much. When fixing the reference state $\sigma$ and varying $\rho$, in many cases it is possible to show that $D(\mathcal N(\rho)\|\mathcal N(\sigma))$ is strictly smaller than $D(\rho\|\sigma)$ unless $\rho = \sigma$. A natural approach is then to consider the contraction coefficient
$$\eta_{\mathrm{Re}}(\mathcal N,\sigma) := \sup_{\rho:\ 0<D(\rho\|\sigma)<\infty} \frac{D(\mathcal N(\rho)\|\mathcal N(\sigma))}{D(\rho\|\sigma)}, \tag{1.3}$$
and we say that $\mathcal N$ satisfies a strong data processing inequality at input state $\sigma$ when $\eta_{\mathrm{Re}}(\mathcal N,\sigma) < 1$. These contraction coefficients have already found many applications in information theory [72,75] and were studied in the quantum setting in [51,37].
The contraction coefficient (1.3) has been extensively studied in the classical setting. In their pioneering work, Ahlswede and Gács [1] discovered a deep relationship between η Re (N , σ) and several other quantities, among them the maximal correlation and the hypercontractive properties of the channel. Subsequently, the notion of entropic contraction coefficients was extended to general f-divergences and to other relevant measures of distinguishability between two probability distributions, such as the Dobrushin contraction coefficient (for the trace distance) or that of the χ²-divergence [17,14,59,16,30] (see also the more recent contributions [74,72,71,55]).
Another closely related question is how to compare channels based on how much they contract a certain quantity. For this purpose, comparing contraction coefficients is of only limited use. A more direct approach is to define partial orders on the set of channels based on how much the channels contract each possible input state individually. While many such orders are known, the most commonly used ones, especially in the classical setting, are the less noisy and the more capable orders [47].
In the quantum setting, Watanabe introduced quantum generalizations of the less noisy and more capable orders [85]. A noteworthy difference from the classical case is that the definitions in [85] include a regularization, in the sense that the property has to hold not just for N but for N ⊗n for all n. This choice proved useful in the context of capacities and was therefore operationally motivated. However, in many cases it can also be useful to look at the case n = 1, see e.g. [80]. We go a step further and consider two additional quantum generalizations: a third version with an additional reference system, which turns out to have desirable properties such as tensorization, and a fourth in which all systems are fully quantum.
In this work we revisit the concepts of contraction coefficients and partial orders in quantum information theory. Most importantly we show how both contraction coefficients and the less noisy order can be understood in very similar terms as either relative entropy or mutual information based tools. From this starting point, we discuss properties such as tensorization, bounds and the relation to other partial orders in the spirit of [55,74].
We then discuss the application of partial orders in the context of quantum capacities, revisiting the recently introduced concept of approximate degradability and the resulting capacity bounds [78] and introducing approximate versions of the different less noisy and more capable definitions. This leads to weaker requirements for approximating the quantum, private and classical capacities. Moreover, as corollaries we get the approximate degradability capacity bounds via conceptually straightforward proofs.
The main drawback of the entropy based partial orders discussed here is that compared to (approximate) degradability, they are rather complicated to verify. It can be nontrivial to check whether two quantum channels fulfill the conditions of the partial order of choice. To this end, we investigate alternative characterizations and conditions, providing some progress towards this question. We also draw connections to the notions of hypercontractivity and other functional inequalities.
Finally, we discuss both generalizations and special cases of the discussed concepts. First, we investigate generalizations of contraction coefficients and the less noisy order to f -divergences [67,Chapter 7]. Then we present several results for the setting where the channels are from the special classes of either Weyl-covariant channels or bosonic Gaussian channels. In particular we find examples of Weyl-covariant channels that satisfy the less noisy ordering without one being the degraded version of the other. We end this paper with a derivation of precise contraction coefficients for the most important classes of quantum Gaussian channels, namely the attenuator, the amplifier and the additive noise channels. We also consider the case of tensor products of the latter, under the constrained minimum output entropy conjecture.

Some Notations:
In what follows, we denote by H a finite dimensional Hilbert space. B(H) is the set of linear operators acting on H. The set of quantum states is denoted by D(H), and that of full-rank states by D(H)_+. Given two quantum systems H, K, the set of quantum channels N : B(H) → B(K), that is, of completely positive, trace preserving superoperators, is denoted by CPTP(H, K). For a quantum channel N, we denote the channel in the Heisenberg picture by N*. The set of probability mass functions over sets of cardinality m ∈ ℕ is denoted by P_m. We also use standard quantum information conventions, denoting quantum systems by capital letters A, B and classical ones by U and X.
Furthermore, for two states ρ, σ we define their relative entropy as
$$D(\rho\|\sigma) = \operatorname{Tr}[\rho(\log\rho - \log\sigma)] \tag{1.4}$$
if the support of ρ is contained in that of σ. If this condition is not satisfied, it is defined to be +∞. We define the von Neumann entropy of a quantum state $\rho_A$ as
$$H(A)_\rho := -\operatorname{Tr}[\rho_A \log \rho_A]. \tag{1.5}$$
For other standard definitions, such as most entropic quantities and channel capacities, we refer to [81,88].
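To make definition (1.4) and its +∞ convention concrete, the following is a minimal NumPy sketch that evaluates D(ρ‖σ) and the von Neumann entropy by eigendecomposition. It is our own illustration, not code from the cited works; it assumes the natural logarithm (changing the base only rescales all quantities by a constant) and a numerical tolerance `tol` for deciding supports, and the helper names are chosen here for exposition.

```python
import numpy as np


def von_neumann_entropy(rho: np.ndarray, tol: float = 1e-12) -> float:
    """H(rho) = -Tr[rho log rho] (natural log); zero eigenvalues contribute 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]
    return float(-np.sum(evals * np.log(evals)))


def relative_entropy(rho: np.ndarray, sigma: np.ndarray, tol: float = 1e-12) -> float:
    """D(rho||sigma) = Tr[rho (log rho - log sigma)], or +inf if supp(rho) is not in supp(sigma)."""
    s_vals, s_vecs = np.linalg.eigh(sigma)
    kernel = s_vecs[:, s_vals <= tol]  # orthonormal basis of ker(sigma)
    if kernel.shape[1] > 0:
        # Weight of rho on ker(sigma); positive weight means the support condition fails.
        weight = np.real(np.trace(kernel.conj().T @ rho @ kernel))
        if weight > tol:
            return float("inf")
    # Tr[rho log rho] equals -H(rho).
    neg_entropy = -von_neumann_entropy(rho, tol)
    # Tr[rho log sigma], with log sigma taken on the support of sigma only.
    keep = s_vals > tol
    log_sigma = (s_vecs[:, keep] * np.log(s_vals[keep])) @ s_vecs[:, keep].conj().T
    cross = np.real(np.trace(rho @ log_sigma))
    return float(neg_entropy - cross)
```

For instance, `relative_entropy(np.diag([1.0, 0.0]), np.eye(2) / 2)` gives ln 2, while swapping the arguments gives `inf` because the maximally mixed state is not supported on span{|0⟩}.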

Contraction coefficients and partial orders of quantum channels
In this section we will first review the main partial orders used throughout the manuscript: Degradable, less noisy and more capable. In the case of the latter two, we introduce several variants that are all valid generalizations of the corresponding classical formulations. Next, we review contraction coefficients and in particular, the one based on the relative entropy for which we then show an alternative formulation in terms of mutual information. This provides us with a theme throughout the rest of the manuscript. Finally, we briefly discuss some connections between contraction coefficients and partial orders.

Channel preorders in quantum information theory, and related tasks
Channel preorders can be understood as ways of quantifying a channel's noisiness as compared to another one. However, the notion of noise varies according to the tasks for which a given channel is being used. For this reason, a zoo of preorders can be found already in the classical setting [55]. We recall some of the main preorders one can find in the literature, together with their relations. We start with the most common partial order: Degradability.
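Degradation preorder: [18,6] A channel $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal K'_{B'})$ is said to be a degraded version of a channel $\mathcal M \in \mathrm{CPTP}(\mathcal H_A, \mathcal K_B)$ with the same input space, denoted $\mathcal M \succeq_{\mathrm{deg}} \mathcal N$, if the following equivalent conditions hold:
• The channels are related as $\mathcal N = \Theta\circ\mathcal M$ for some channel $\Theta \in \mathrm{CPTP}(\mathcal K, \mathcal K')$.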
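• For any c-q state $\rho_{UAR}$, $H_{\min}(U|BR)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \le H_{\min}(U|B'R)_{(\mathrm{id}\otimes\mathcal N)(\rho)}$.
The first condition is the well-known standard definition of degradability, while the equivalence with the second was recently shown in [8]. We recall that the min conditional entropy is defined as
$$H_{\min}(A|B)_\rho := -\log \inf\{\operatorname{Tr}[\tau_B] : \tau_B \ge 0,\ \rho_{AB} \le \mathbb 1_A\otimes\tau_B\}.$$
While the first condition is often more useful due to data processing, the second makes the close resemblance to the remaining partial orders evident.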
We will continue by defining what can be considered the basic version of the less noisy and more capable partial orders.
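Less noisy, more capable: [47,85] A channel $\mathcal M \in \mathrm{CPTP}(\mathcal H_A, \mathcal K_B)$ is said to be less noisy than a channel $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal K'_{B'})$ with the same input space, denoted $\mathcal M \succeq_{\mathrm{l.n.}} \mathcal N$, if for all classical random variables U and any c-q state $\rho_{UA}$
$$I(U:B)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \ge I(U:B')_{(\mathrm{id}\otimes\mathcal N)(\rho)}.$$
Moreover, $\mathcal M$ is said to be more capable than $\mathcal N$, denoted $\mathcal M \succeq_{\mathrm{m.c.}} \mathcal N$, if for any ensemble $\{p(x), |\psi_x\rangle_A\}$ and the corresponding c-q state $\rho_{XA} = \sum_x p(x)|x\rangle\langle x|\otimes|\psi_x\rangle\langle\psi_x|$,
$$I(X:B)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \ge I(X:B')_{(\mathrm{id}\otimes\mathcal N)(\rho)}.$$
We recall that the mutual information $I(A:B)_\rho$ of a bipartite state $\rho_{AB}$ is defined as
$$I(A:B)_\rho := H(A)_\rho + H(B)_\rho - H(AB)_\rho,$$
where $H(C)_\rho := H(\rho_C) = -\operatorname{Tr}[\rho_C \ln \rho_C]$ denotes the von Neumann entropy of a state $\rho$ on a subsystem C. In the classical setting, these definitions are well motivated by their applications to wiretap and broadcast channels [47,20,57]. In the quantum setting, the less noisy and more capable relations determine properties of, and relations between, the quantum and private capacities [85]. Furthermore, it is immediately clear that
$$\mathcal M \succeq_{\mathrm{m.c.}} \mathcal N \ \Rightarrow\ \chi(\mathcal M) \ge \chi(\mathcal N), \tag{2.4}$$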
where $\chi(\mathcal N)$ is the Holevo quantity of the channel $\mathcal N$, defined as
$$\chi(\mathcal N) := \max_{\{p(x),\,\psi_x\}} I(X:B)_{(\mathrm{id}\otimes\mathcal N)(\rho_{XA})}.$$
Note that we stated the condition here in its most common form via the mutual information. It can, however, easily be written in terms of the conditional entropy $H(A|B)_\rho := H(AB)_\rho - H(B)_\rho$ as
$$H(U|B)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \le H(U|B')_{(\mathrm{id}\otimes\mathcal N)(\rho)}, \tag{2.5}$$
and similarly for the more capable relation, which bears a noticeable similarity to the alternative definition of degradability. The lack of a reference system here is crucial, and we will discuss this in more detail later. Indeed, one can also find a different partial order in the literature, called less ambiguous, that is defined as a mix of the above characteristics: min-entropy, but no reference system (see e.g. [9]).
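For a c-q input $\rho_{UA} = \sum_u p(u)|u\rangle\langle u|\otimes\rho_u$, the quantity in the less noisy condition reduces to $I(U:B) = H(\sum_u p(u)\mathcal M(\rho_u)) - \sum_u p(u) H(\mathcal M(\rho_u))$. The minimal sketch below (our own illustration, not the authors' code) uses this to test the condition on randomly drawn c-q states. A sampling test of this kind can only refute $\mathcal M \succeq_{\mathrm{l.n.}} \mathcal N$, never certify it. The depolarizing parameterization $\rho \mapsto (1-p)\rho + p\operatorname{Tr}[\rho]\,\mathbb 1/d$ and all function names are assumptions made for this example.

```python
import numpy as np


def entropy(rho: np.ndarray) -> float:
    """von Neumann entropy with natural log."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))


def depolarizing(p: float):
    """Hypothetical parameterization: rho -> (1 - p) rho + p Tr[rho] I/d."""
    def channel(rho: np.ndarray) -> np.ndarray:
        d = rho.shape[0]
        return (1 - p) * rho + p * np.trace(rho) * np.eye(d) / d
    return channel


def cq_mutual_information(probs, states, channel) -> float:
    """I(U:B) of sum_u p(u)|u><u| ⊗ channel(rho_u) = H(avg output) - sum_u p(u) H(output_u)."""
    outs = [channel(s) for s in states]
    avg = sum(p * o for p, o in zip(probs, outs))
    return entropy(avg) - sum(p * entropy(o) for p, o in zip(probs, outs))


def random_state(d: int, rng: np.random.Generator) -> np.ndarray:
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def less_noisy_on_samples(M, N, d: int = 2, trials: int = 200, seed: int = 0) -> bool:
    """Check I(U:B)_M >= I(U:B')_N on random c-q states; False means a violation was found."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        k = int(rng.integers(2, 5))  # alphabet size of the classical variable U
        probs = rng.dirichlet(np.ones(k))
        states = [random_state(d, rng) for _ in range(k)]
        if cq_mutual_information(probs, states, M) < cq_mutual_information(probs, states, N) - 1e-10:
            return False
    return True
```

With this parameterization, composing two depolarizing channels gives again a depolarizing channel, so `depolarizing(0.4)` is a degraded version of `depolarizing(0.1)` and no violation should be found in that direction, whereas the reverse comparison fails on generic samples.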
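As expected in the quantum setting, the notions of less noisy and more capable in general need to be regularized in order to lead to an operational interpretation [85]:
Regularized less noisy, more capable: With the previous notations, the channel $\mathcal M$ is said to be regularized less noisy than $\mathcal N$, denoted $\mathcal M \succeq^{\mathrm{reg}}_{\mathrm{l.n.}} \mathcal N$, if for any $n \in \mathbb N$, any classical register U and any c-q state $\rho_{UA^n}$,
$$I(U:B^n)_{(\mathrm{id}\otimes\mathcal M^{\otimes n})(\rho)} \ge I(U:B'^n)_{(\mathrm{id}\otimes\mathcal N^{\otimes n})(\rho)}.$$
Moreover, $\mathcal M$ is regularized more capable than $\mathcal N$, denoted $\mathcal M \succeq^{\mathrm{reg}}_{\mathrm{m.c.}} \mathcal N$, if for any $n \in \mathbb N$, any ensemble $\{p(x), |\psi_x\rangle_{A^n}\}$ over $\mathcal H_A^{\otimes n}$ and the corresponding c-q state $\rho_{XA^n}$,
$$I(X:B^n)_{(\mathrm{id}\otimes\mathcal M^{\otimes n})(\rho)} \ge I(X:B'^n)_{(\mathrm{id}\otimes\mathcal N^{\otimes n})(\rho)}.$$
In general, the regularized versions of the partial orders give strictly stronger conditions than the unregularized ones. We will discuss the problem of tensorization in more detail in Section 3. As for the standard definition, we can directly observe that
$$\mathcal M \succeq^{\mathrm{reg}}_{\mathrm{m.c.}} \mathcal N \ \Rightarrow\ C(\mathcal M) \ge C(\mathcal N),$$
where $C(\mathcal N)$ is the classical capacity of the channel $\mathcal N$. An alternative quantum generalization is to provide the channel input with an additional quantum reference system. We will later see that this also leads to an order that obeys additional desirable properties such as tensorization.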
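Completely less noisy, more capable: With the previous notations, the channel $\mathcal M$ is said to be completely less noisy than $\mathcal N$, denoted $\mathcal M \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N$, if for any classical register U and any c-q state $\rho_{UAR}$,
$$I(U:BR)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \ge I(U:B'R)_{(\mathrm{id}\otimes\mathcal N)(\rho)}.$$
Moreover, $\mathcal M$ is completely more capable than $\mathcal N$, denoted $\mathcal M \succeq^{\mathrm c}_{\mathrm{m.c.}} \mathcal N$, if for any ensemble $\{p(x), |\psi_x\rangle_{AR}\}$ over $\mathcal H_{AR}$ and the corresponding c-q state $\rho_{XAR} = \sum_x p(x)|x\rangle\langle x|\otimes|\psi_x\rangle\langle\psi_x|$,
$$I(X:BR)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \ge I(X:B'R)_{(\mathrm{id}\otimes\mathcal N)(\rho)}.$$
Note that there is no known bound on the dimension of the system R, which makes this condition generally even harder to check than the standard variants. This is a common problem in quantum information theory, see e.g. [5,41,15,22,90].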
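At this point it should briefly be noted that, in the classical less noisy setting, allowing a classical reference system does not change the order itself. Moreover, one can easily see that even in the quantum setting a classical reference system would not help.
Lemma 2.1. If one restricts the reference system in the completely less noisy ordering to be classical, the order becomes equivalent to the usual less noisy ordering, meaning that $I(U:BR)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \ge I(U:B'R)_{(\mathrm{id}\otimes\mathcal N)(\rho)}$ holds for all c-q states $\rho_{UAR}$ with classical R if and only if $\mathcal M \succeq_{\mathrm{l.n.}} \mathcal N$.
Proof. The "⇒" direction is obvious (choose a trivial R). The "⇐" direction follows by noticing that
$$I(U:BR) \ge I(U:B'R) \iff I(U:B|R) \ge I(U:B'|R) \iff \sum_r p(r)\, I(U:B)_{\rho^r} \ge \sum_r p(r)\, I(U:B')_{\rho^r},$$
where the left-hand sides are evaluated after applying $\mathcal M$, the right-hand sides after applying $\mathcal N$, and $\rho^r$ denotes the state conditioned on $R = r$. The first equivalence follows by subtracting $I(U:R)_\rho$ on both sides and the second because R is assumed to be classical. Now the proof follows by the definition of the less noisy ordering.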
As quantum conditioning is significantly more complex, the same argument does not hold for quantum reference systems [5,39].
All previously discussed definitions of quantum extensions of the classical less noisy order in this work have in common that the system correlated to the input is classical. However, one could envision a fully quantum extension of the less noisy order where this restriction is lifted.
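Fully quantum less noisy, more capable: A channel $\mathcal M \in \mathrm{CPTP}(\mathcal H_A, \mathcal K_B)$ is said to be fully quantum less noisy than a channel $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal K'_{B'})$ with the same input space, denoted $\mathcal M \succeq^{\mathrm{fq}}_{\mathrm{l.n.}} \mathcal N$, if for all quantum states $\rho_{A'A}$
$$I(A':B)_{(\mathrm{id}\otimes\mathcal M)(\rho)} \ge I(A':B')_{(\mathrm{id}\otimes\mathcal N)(\rho)}.$$
Moreover, $\mathcal M$ is said to be fully quantum more capable than $\mathcal N$, denoted $\mathcal M \succeq^{\mathrm{fq}}_{\mathrm{m.c.}} \mathcal N$, if the same inequality holds for all pure quantum states $\Psi_{A'A}$. As before, we can directly observe that
$$\mathcal M \succeq^{\mathrm{fq}}_{\mathrm{m.c.}} \mathcal N \ \Rightarrow\ C_E(\mathcal M) \ge C_E(\mathcal N),$$
where $C_E(\mathcal N)$ is the entanglement-assisted classical capacity of the channel $\mathcal N$.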
We remark that the fully quantum less noisy order has previously been defined in [19], where it was called informationally degradable. Furthermore, in line with the previous discussion, when the mutual information is replaced by the min-entropy, this partial order becomes the more coherent order discussed e.g. in [9].
The relations between the different partial orders are summarized in Figure 1 and we defer their proofs to Section 3. As customary we also define the anti-orders, i.e. anti-degradable, anti-less noisy and so on, as above but with the roles of N and M interchanged. This will become relevant when we discuss capacities where we often consider M to be the complementary channel of N .
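Before ending this section, we state some elementary facts about the channels dominated by a fixed channel $\mathcal M$. To that end, given a channel $\mathcal M$, define the less noisy domination region and the degradation region
$$\mathcal L_{\mathcal M} := \{\mathcal N : \mathcal M \succeq_{\mathrm{l.n.}} \mathcal N\}, \qquad \mathcal D_{\mathcal M} := \{\mathcal N : \mathcal M \succeq_{\mathrm{deg}} \mathcal N\}.$$
Lemma 2.2. The sets $\mathcal L_{\mathcal M}$ and $\mathcal D_{\mathcal M}$ are non-empty, closed and convex, and they are invariant under post-processing, i.e. if $\mathcal N$ belongs to one of them, then so does $\Theta\circ\mathcal N$ for every channel $\Theta$.
Proof. Non-emptiness is obvious since $\mathcal M \in \mathcal D_{\mathcal M}$ and $\mathcal M \in \mathcal L_{\mathcal M}$. Closedness of the sets is proved as follows. First, as we show later in Proposition 6.2 (see also Proposition 2.3), the condition $\mathcal M \succeq_{\mathrm{l.n.}} \mathcal N$ is equivalent to
$$D(\mathcal N(\rho)\|\mathcal N(\sigma)) \le D(\mathcal M(\rho)\|\mathcal M(\sigma)) \quad \text{for all states } \rho, \sigma. \tag{2.11}$$
Thus, we work with this equivalent characterization of the less noisy ordering to show closedness. Given two states $\rho, \sigma$, consider a sequence of channels $\mathcal N_k \in \mathcal L_{\mathcal M}$ such that $\mathcal N_k \to \mathcal N$ (say with respect to the 2 → 2 norm). Then also $\mathcal N_k(\rho) \to \mathcal N(\rho)$ and $\mathcal N_k(\sigma) \to \mathcal N(\sigma)$ (say in Hilbert-Schmidt norm). Therefore
$$D(\mathcal N(\rho)\|\mathcal N(\sigma)) \le \liminf_{k\to\infty} D(\mathcal N_k(\rho)\|\mathcal N_k(\sigma)) \le D(\mathcal M(\rho)\|\mathcal M(\sigma)),$$
where the first inequality arises from the lower semicontinuity of Umegaki's relative entropy, and the second holds because $\mathcal N_k \in \mathcal L_{\mathcal M}$ for all k. Hence the set $\mathcal L_{\mathcal M}(\rho,\sigma) := \{\mathcal N : D(\mathcal N(\rho)\|\mathcal N(\sigma)) \le D(\mathcal M(\rho)\|\mathcal M(\sigma))\}$ is closed. Since $\mathcal L_{\mathcal M} = \bigcap_{\rho,\sigma}\mathcal L_{\mathcal M}(\rho,\sigma)$, $\mathcal L_{\mathcal M}$ itself is closed as an intersection of closed sets. Closedness of $\mathcal D_{\mathcal M}$ follows similarly: consider a sequence of channels $\mathcal N_k \in \mathcal D_{\mathcal M}$ such that $\mathcal N_k \to \mathcal N$. Since for each k, $\mathcal N_k = \Lambda_k\circ\mathcal M$ for some $\Lambda_k$ in the compact set of quantum channels, sequential compactness yields a subsequence $\Lambda_{k_m} \to \Lambda$. Hence $\mathcal N \in \mathcal D_{\mathcal M}$, since $\mathcal N_{k_m} = \Lambda_{k_m}\circ\mathcal M \to \Lambda\circ\mathcal M$, proving closedness of $\mathcal D_{\mathcal M}$. Convexity of $\mathcal L_{\mathcal M}$ follows from the joint convexity of the relative entropy, and convexity of $\mathcal D_{\mathcal M}$ from the linearity of the defining condition for degradability. Invariance under post-processing follows from the data processing inequality for $\mathcal L_{\mathcal M}$ and directly from the definition for $\mathcal D_{\mathcal M}$.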

Contraction coefficients and the strong data processing inequality
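The central objects of this section are contraction coefficients and strong data processing constants. Given a fixed state $\sigma \in \mathcal D(\mathcal H_A)$ and a quantum channel $\mathcal N: \mathcal B(\mathcal H_A) \to \mathcal B(\mathcal K_B)$, define the relative entropy strong data processing inequality (SDPI) constants [1,37]:
$$\eta_{\mathrm{Re}}(\mathcal N,\sigma) := \sup_{\rho:\ 0<D(\rho\|\sigma)<\infty} \frac{D(\mathcal N(\rho)\|\mathcal N(\sigma))}{D(\rho\|\sigma)}, \tag{2.12}$$
and the contraction coefficient is then $\eta_{\mathrm{Re}}(\mathcal N) := \sup_{\sigma\in\mathcal D(\mathcal H_A)} \eta_{\mathrm{Re}}(\mathcal N,\sigma)$.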
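Every state ρ in the supremum defining the SDPI constant gives a feasible ratio, so sampling states yields a lower bound on $\eta_{\mathrm{Re}}(\mathcal N,\sigma)$. The sketch below is our own illustration under stated assumptions: it samples random full-rank states (so no support issues arise), uses a qubit depolarizing channel with a hypothetical parameter as the test channel, and uses helper names introduced only for this example.

```python
import numpy as np


def _logm_pd(a: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a positive definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T


def rel_ent_full_rank(rho: np.ndarray, sigma: np.ndarray) -> float:
    """D(rho||sigma), assuming both arguments are full rank."""
    return float(np.real(np.trace(rho @ (_logm_pd(rho) - _logm_pd(sigma)))))


def random_state(d: int, rng: np.random.Generator) -> np.ndarray:
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def sdpi_lower_bound(channel, sigma: np.ndarray, trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo lower bound on eta_Re(channel, sigma): largest sampled ratio in (2.12)."""
    rng = np.random.default_rng(seed)
    d = sigma.shape[0]
    best = 0.0
    for _ in range(trials):
        rho = random_state(d, rng)
        denom = rel_ent_full_rank(rho, sigma)
        if denom > 1e-9:  # skip (near-)coinciding pairs
            best = max(best, rel_ent_full_rank(channel(rho), channel(sigma)) / denom)
    return best
```

The returned value can only underestimate the true constant; a sharper estimate would require an optimizer over ρ rather than random search. For the identity channel every ratio equals one, which serves as a simple sanity check.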
It turns out that these coefficients and the notions of (regularized/complete) less noisiness can be understood on a very similar footing, allowing us to treat them with the same tools by expressing the partial orders in terms of relative entropies and the contraction coefficients in terms of mutual information. The key argument here is the following proposition, which is a quantum generalization of [85, Proposition 4]: Proposition 2.3. Given two channels $\mathcal M \in \mathrm{CPTP}(\mathcal H_A, \mathcal K_B)$ and $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal K'_{B'})$, $n \in \mathbb N$, $\sigma_A \in \mathcal D(\mathcal H_A)$ and $\eta \ge 0$, the following are equivalent: where U is an arbitrary classical system,
Proof. We postpone the proof to Section 6, where a more general equivalence between f-divergences and the corresponding mutual information quantities is proven in Proposition 6.2.
The next result is another direct consequence of the above proposition and provides an alternative way to express $\eta_{\mathrm{Re}}(\mathcal N,\sigma)$ in terms of the mutual information: for every $\sigma \in \mathcal D(\mathcal H_A)$,
$$\eta_{\mathrm{Re}}(\mathcal N,\sigma) = \sup_{\rho_{UA}:\ \rho_A = \sigma} \frac{I(U:B)_{(\mathrm{id}\otimes\mathcal N)(\rho)}}{I(U:A)_\rho}. \tag{2.14}$$
Proof. The proof follows directly from Proposition 2.3.
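The above result suggests the introduction of the following so-called less noisy domination factor [55]: given two channels $\mathcal M \in \mathrm{CPTP}(\mathcal H_A, \mathcal K_B)$ and $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal K'_{B'})$, and a state $\sigma \in \mathcal D(\mathcal H_A)$,
$$\eta_{\mathrm{Re}}(\mathcal M,\mathcal N,\sigma) := \sup_{\rho_{UA}:\ \rho_A = \sigma} \frac{I(U:B')_{(\mathrm{id}\otimes\mathcal N)(\rho)}}{I(U:B)_{(\mathrm{id}\otimes\mathcal M)(\rho)}},$$
and $\eta_{\mathrm{Re}}(\mathcal M,\mathcal N) := \sup_\sigma \eta_{\mathrm{Re}}(\mathcal M,\mathcal N,\sigma)$. In the case $\mathcal M = \mathrm{id}$, we retrieve the usual SDPI constants of $\mathcal N$. In particular, from Proposition 2.3 we directly have $\mathcal M \succeq_{\mathrm{l.n.}} \mathcal N$ if and only if $\eta_{\mathrm{Re}}(\mathcal M,\mathcal N) \le 1$. One can of course also define complete and regularized versions of these coefficients. However, it is easily seen that the complete contraction coefficient $\eta^{\mathrm c}_{\mathrm{Re}}(\mathcal N)$ is equal to 1 for any channel $\mathcal N$, which severely limits its usefulness.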
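Finally, we end this section by pointing out a relation between the less noisy order with respect to erasure channels and contraction coefficients, which serves as an example of the close relation between the two concepts and will reappear throughout this work. Denote by
$$\mathcal E_\varepsilon(\rho) := (1-\varepsilon)\,\rho + \varepsilon\operatorname{Tr}[\rho]\,|e\rangle\langle e|$$
the erasure channel with erasure probability $\varepsilon \in [0,1]$, where $|e\rangle \perp \mathcal H_A$. It is immediately clear from the above discussion that $\eta_{\mathrm{Re}}(\mathcal E_\varepsilon,\sigma) = \eta_{\mathrm{Re}}(\mathcal E_\varepsilon) = 1-\varepsilon$ for every $\sigma$.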
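The following is a direct consequence of Proposition 2.3: let $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal K'_{B'})$ and $\eta_e \in [0, 1]$. Then the following are equivalent: (1) $\eta_{\mathrm{Re}}(\mathcal N) \le \eta_e$; (2) $\mathcal E_{1-\eta_e} \succeq_{\mathrm{l.n.}} \mathcal N$.
Proof. First, a simple calculation gives (see [88, Proposition 21.6.1])
$$I(U:B)_{(\mathrm{id}\otimes\mathcal E_\varepsilon)(\rho)} = (1-\varepsilon)\, I(U:A)_\rho$$
for every c-q state $\rho_{UA}$. The result then follows directly from Proposition 2.3.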
Thus, we conclude that a bound on the relative entropy contraction coefficient is equivalent to a less noisy relation for the corresponding channel when compared to the erasure channel. In the next section we will discuss further properties of the concepts introduced in this section.
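The value $1-\varepsilon$ for the erasure channel can be checked numerically: the output is block diagonal, $(1-\varepsilon)\rho \oplus \varepsilon$, so the erasure flag contributes nothing and $D(\mathcal E_\varepsilon(\rho)\|\mathcal E_\varepsilon(\sigma)) = (1-\varepsilon)D(\rho\|\sigma)$ for every pair of states. The sketch below is our own illustration; it assumes the parameterization $\mathcal E_\varepsilon(\rho) = (1-\varepsilon)\rho + \varepsilon\operatorname{Tr}[\rho]|e\rangle\langle e|$ with $|e\rangle$ appended as an extra basis vector, and full-rank inputs.

```python
import numpy as np


def erasure(rho: np.ndarray, eps: float) -> np.ndarray:
    """E_eps(rho) = (1 - eps) rho ⊕ eps Tr[rho] |e><e|, with |e> an extra basis vector."""
    d = rho.shape[0]
    out = np.zeros((d + 1, d + 1), dtype=complex)
    out[:d, :d] = (1 - eps) * rho
    out[d, d] = eps * np.trace(rho)
    return out


def _logm_pd(a: np.ndarray) -> np.ndarray:
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T


def rel_ent_full_rank(rho: np.ndarray, sigma: np.ndarray) -> float:
    """D(rho||sigma) for full-rank arguments."""
    return float(np.real(np.trace(rho @ (_logm_pd(rho) - _logm_pd(sigma)))))


def erasure_ratio(rho: np.ndarray, sigma: np.ndarray, eps: float) -> float:
    """D(E(rho)||E(sigma)) / D(rho||sigma); equals 1 - eps for every pair of distinct states."""
    return rel_ent_full_rank(erasure(rho, eps), erasure(sigma, eps)) / rel_ent_full_rank(rho, sigma)
```

Because the ratio does not depend on the pair $(\rho,\sigma)$, the supremum in (2.12) equals $1-\varepsilon$ for every reference state, consistent with the equivalence just stated.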

Properties and Bounds
In this section we discuss properties of partial orders and contraction coefficients. First we will discuss tensorization properties of different partial orders and their relation with each other, then we discuss bounds on contraction coefficients and finally we illustrate our results with several simple examples.

Tensorization and relationships of partial orders
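In the classical setting, it is a well-known fact that the notion of less noisiness tensorizes (see e.g. [72, Proposition 16], [77, Proposition 5]): given channels $\mathcal M_1 \succeq_{\mathrm{l.n.}} \mathcal N_1$ and $\mathcal M_2 \succeq_{\mathrm{l.n.}} \mathcal N_2$, we have $\mathcal M_1\otimes\mathcal M_2 \succeq_{\mathrm{l.n.}} \mathcal N_1\otimes\mathcal N_2$. In particular, as expected, the notions of less noisy and regularized less noisy are equivalent. This is false in general in the quantum case, as we discuss in more detail further below. Often, tensorization properties can be recovered when allowing for an additional reference system. The same is true for the less noisy ordering, and we show in the following that the completely less noisy ordering tensorizes.
Lemma 3.1. If $\mathcal M_1 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_1$ and $\mathcal M_2 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_2$, then $\mathcal M_1\otimes\mathcal M_2 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_1\otimes\mathcal N_2$.
Proof. Assume that $\mathcal M_1 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_1$ and $\mathcal M_2 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_2$. Then, for any two tripartite states $\rho_{A_1A_2R}$, $\sigma_{A_1A_2R}$,
$$D((\mathcal N_1\otimes\mathcal N_2)(\rho)\|(\mathcal N_1\otimes\mathcal N_2)(\sigma)) \le D((\mathcal M_1\otimes\mathcal N_2)(\rho)\|(\mathcal M_1\otimes\mathcal N_2)(\sigma)) \le D((\mathcal M_1\otimes\mathcal M_2)(\rho)\|(\mathcal M_1\otimes\mathcal M_2)(\sigma)),$$
where we successively used the relative entropy form of $\mathcal M_1 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_1$ with reference system $RB_2'$ and of $\mathcal M_2 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal N_2$ with reference system $RB_1$.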
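The previous result shows in particular that the notion of complete less noisiness is more stringent than that of (regularized) less noisiness, but potentially less stringent than that of degradation. We now further investigate the implications between the different partial orders. The relations are also summarized in Figure 1.
Proposition 3.2. For any two channels $\mathcal N$ and $\mathcal M$ with the same input space,
$$\mathcal N \succeq_{\mathrm{deg}} \mathcal M \Rightarrow \mathcal N \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal M \Rightarrow \mathcal N \succeq^{\mathrm{reg}}_{\mathrm{l.n.}} \mathcal M \Rightarrow \mathcal N \succeq_{\mathrm{l.n.}} \mathcal M, \tag{3.1}$$
$$\mathcal N \succeq_{\mathrm{deg}} \mathcal M \Rightarrow \mathcal N \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal M \Rightarrow \mathcal N \succeq^{\mathrm{fq}}_{\mathrm{l.n.}} \mathcal M \Rightarrow \mathcal N \succeq_{\mathrm{l.n.}} \mathcal M. \tag{3.2}$$
Proof. We first prove Equation (3.1): the last implication is obvious and was already discussed in Section 2.1. The first implication follows from the data processing inequality: assume that $\mathcal N \succeq_{\mathrm{deg}} \mathcal M$.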
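Then, by definition, there exists a channel $\Theta$ such that $\mathcal M = \Theta\circ\mathcal N$, which also implies that for any reference system R, $\mathrm{id}_R\otimes\mathcal M = (\mathrm{id}_R\otimes\Theta)\circ(\mathrm{id}_R\otimes\mathcal N)$. Then for any two states $\rho_{AR}$, $\sigma_{AR}$,
$$D((\mathrm{id}\otimes\mathcal M)(\rho)\|(\mathrm{id}\otimes\mathcal M)(\sigma)) \le D((\mathrm{id}\otimes\mathcal N)(\rho)\|(\mathrm{id}\otimes\mathcal N)(\sigma)),$$
where the inequality follows from the data processing inequality for the channel $\mathrm{id}\otimes\Theta$. Therefore, $\mathcal N \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal M$. Finally, $\mathcal N \succeq^{\mathrm{reg}}_{\mathrm{l.n.}} \mathcal M$ is then a simple consequence of Lemma 3.1 and the characterization of the regularized less noisy order in Proposition 2.3. For Equation (3.2), the first implication was already shown and the last is obvious because classical-quantum states are a special case of general quantum states. It remains to show the middle one. Using Proposition 2.3, we can express the condition $\mathcal N \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal M$ as one on relative entropies involving two arbitrary states with a reference system. Now note that, for a state $\rho_{AB}$, we have
$$I(A:B)_\rho = D(\rho_{AB}\,\|\,\rho_A\otimes\rho_B).$$
Using this to rewrite the condition for $\mathcal N \succeq^{\mathrm{fq}}_{\mathrm{l.n.}} \mathcal M$, with $A'$ as the reference system and the pair of states $\rho_{A'A}$ and $\rho_{A'}\otimes\rho_A$, it becomes clear that it is indeed a special case of the completely less noisy relation.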
It remains an interesting open problem whether any relation between the regularized and the fully quantum less noisy ordering can be found.
Sometimes, further relationships between partial orders can be uncovered when one restricts to the commonly used special case in which $\mathcal M = \mathcal N^c$ is the complementary channel of $\mathcal N$. One such example is given by the following proposition.

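Proposition 3.3. Given a channel $\mathcal N \in \mathrm{CPTP}(\mathcal H_A, \mathcal H_B)$ and its complementary channel $\mathcal N^c \in \mathrm{CPTP}(\mathcal H_A, \mathcal H_E)$, we have
$$\mathcal N \succeq^{\mathrm{fq}}_{\mathrm{m.c.}} \mathcal N^c \iff \mathcal N \succeq_{\mathrm{m.c.}} \mathcal N^c.$$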
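Proof. Note that the coherent information of a pure state $\Psi_{ABE}$ can be written as a difference of two mutual informations with respect to a classical-quantum state $\rho_{XBE}$ (see e.g. [88, Theorem 13.6.1]):
$$I(A\rangle B)_\Psi = H(B)_\Psi - H(E)_\Psi = I(X:B)_\rho - I(X:E)_\rho,$$
where $\rho_{XBE} = \sum_x p(x)|x\rangle\langle x|\otimes V\psi_x V^\dagger$ for any pure-state ensemble $\{p(x), \psi_x\}$ with $\sum_x p(x)\psi_x = \Psi_A$, and V is a Stinespring isometry of $\mathcal N$. Moreover, since $\Psi_{ABE}$ is pure, $I(A:B)_\Psi - I(A:E)_\Psi = 2\,(H(B)_\Psi - H(E)_\Psi)$. We can use this in the following chain of equivalences,
$$I(A:B)_\Psi \ge I(A:E)_\Psi \iff H(B)_\Psi \ge H(E)_\Psi \iff I(X:B)_\rho \ge I(X:E)_\rho,$$
which holds for all pure input states; therefore, the leftmost condition corresponds to the fully quantum more capable condition and the rightmost one to the usual more capable condition.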
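The step $I(X:B)_\rho - I(X:E)_\rho = H(B) - H(E)$ in the proof above rests on the fact that each $V|\psi_x\rangle$ is pure on $BE$, so that $H(B)_{\psi_x} = H(E)_{\psi_x}$ and the conditional terms cancel. The following minimal sketch (our own numerical check, with a randomly drawn isometry and ensemble; dimensions and seeds are arbitrary choices) verifies this identity.

```python
import numpy as np


def entropy(rho: np.ndarray) -> float:
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))


def random_isometry(d_in: int, d_out: int, rng: np.random.Generator) -> np.ndarray:
    """Random isometry V: C^d_in -> C^d_out (orthonormal columns) via a QR decomposition."""
    g = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
    q, _ = np.linalg.qr(g)
    return q


def marginals(psi: np.ndarray, d_b: int, d_e: int):
    """Reduced states on B and E of a pure vector psi in C^(d_b * d_e)."""
    m = psi.reshape(d_b, d_e)
    return m @ m.conj().T, m.T @ m.conj()


def coherent_info_identity(d_a=2, d_b=2, d_e=3, k=4, seed=3):
    """Return (I(X:B) - I(X:E), H(B) - H(E)) for a random isometry and pure-state ensemble."""
    rng = np.random.default_rng(seed)
    V = random_isometry(d_a, d_b * d_e, rng)
    probs = rng.dirichlet(np.ones(k))
    vecs = rng.normal(size=(k, d_a)) + 1j * rng.normal(size=(k, d_a))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # pure input states psi_x
    outs = [marginals(V @ v, d_b, d_e) for v in vecs]
    avg_b = sum(p * b for p, (b, _) in zip(probs, outs))
    avg_e = sum(p * e for p, (_, e) in zip(probs, outs))
    i_xb = entropy(avg_b) - sum(p * entropy(b) for p, (b, _) in zip(probs, outs))
    i_xe = entropy(avg_e) - sum(p * entropy(e) for p, (_, e) in zip(probs, outs))
    return i_xb - i_xe, entropy(avg_b) - entropy(avg_e)
```

Here `avg_b` and `avg_e` are the outputs of $\mathcal N$ and $\mathcal N^c$ on the average input state, so the second returned value is the coherent information of that state.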
It is currently unclear whether something similar holds for an arbitrary second channel M.
We will now briefly discuss whether the less noisy ordering tensorizes. It does in the classical case [72, Proposition 16], [77, Proposition 5]. In the quantum setting, tensorization does not hold.
Namely, we can find $\mathcal M_1, \mathcal M_2, \mathcal N_1, \mathcal N_2$ such that $\mathcal M_1 \succeq_{\mathrm{l.n.}} \mathcal N_1$ and $\mathcal M_2 \succeq_{\mathrm{l.n.}} \mathcal N_2$, but $\mathcal M_1\otimes\mathcal M_2 \not\succeq_{\mathrm{l.n.}} \mathcal N_1\otimes\mathcal N_2$. This follows directly from superactivation of the private capacity. If $P^{(1)}(\mathcal N_{A\to BE}) = 0$, it follows that the channel to Eve is less noisy than that to Bob. On the other hand, if $P^{(1)}(\mathcal N^{\otimes 2}_{A\to BE}) > 0$, this order does not carry over to the two-copy case. Note that the question of additivity of the private information has recently been considered in [80], showing that additivity still holds when the sender is quantum but the receivers are classical. However, when either one of the receivers is quantum, additivity is violated, even when the sender is classical. Since the counterexamples in [80] exhibit the desired superactivation feature, the results directly carry over to the tensorization problem for the less noisy order.
It should be noted that all the examples above are based on wiretap channels that are not necessarily isometric. It would be interesting to find similar examples in the isometric case, i.e. where the channel to Eve is the complement of that to Bob. From the same argument, it can easily be seen that anti-less noisy channels have P (1) (N ) = 0, however anti-regularized less noisy is needed to ensure that P (N ) = 0. Another consequence is that the less noisy ordering and the completely less noisy ordering (and therefore the degradable ordering) cannot be equivalent. An interesting open problem is whether the regularized or complete less noisy ordering could still be equivalent to the degradable ordering.
From the above it also follows that when checking for the regularized less noisy order one has to check n > 1. On the other hand, one can show that it is sufficient to check the condition in the asymptotic limit.

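Lemma 3.4. For all channels $\mathcal M$ and $\mathcal N$ with the same input space, the following are equivalent: (1) $\mathcal M \succeq^{\mathrm{reg}}_{\mathrm{l.n.}} \mathcal N$; (2) for infinitely many $n \in \mathbb N$, $I(U:B^n)_{(\mathrm{id}\otimes\mathcal M^{\otimes n})(\rho)} \ge I(U:B'^n)_{(\mathrm{id}\otimes\mathcal N^{\otimes n})(\rho)}$ holds for all c-q states $\rho_{UA^n}$.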
Proof. Since the regularized less noisy order requires the mutual information condition to hold for all n, the direction (1) ⇒ (2) is obvious. The other direction follows by noting that if the condition holds for some n, it also holds for n − 1: if the condition holds for all states $\rho_{UA^n}$, it in particular holds for those of the form $\rho_{UA^n} = \rho_{UA^{n-1}}\otimes\rho_{A_n}$, for which we have
$$I(U:B^n)_{(\mathrm{id}\otimes\mathcal M^{\otimes n})(\rho)} = I(U:B^{n-1})_{(\mathrm{id}\otimes\mathcal M^{\otimes n-1})(\rho)},$$
and similarly for $\mathcal N$. This directly implies the result for n − 1, and therefore the claim follows by starting from an arbitrarily large n for which the condition holds and stepwise reducing n.
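We could also ask for a weaker requirement than tensorization. We call a partial order ⪰ tensor stable if for all channels the following holds: if $\mathcal N_1 \succeq \mathcal N_2$, then $\mathcal N_1\otimes\mathcal M \succeq \mathcal N_2\otimes\mathcal M$ for all $\mathcal M$. Every partial order that tensorizes is automatically tensor stable; however, orders that do not tensorize can potentially still possess this property. We will show that for the less noisy ordering, while the property holds for some channels, in general the order is not tensor stable. Again, an example based on erasure channels serves as a primer.
Lemma 3.5. Fix any two erasure channels $\mathcal E_1$ and $\mathcal E_2$ with erasure probabilities $\varepsilon_1$ and $\varepsilon_2$, respectively. For any channel $\mathcal M$ we have
$$\mathcal E_1 \succeq_{\mathrm{l.n.}} \mathcal E_2 \ \Rightarrow\ \mathcal E_1\otimes\mathcal M \succeq_{\mathrm{l.n.}} \mathcal E_2\otimes\mathcal M. \tag{3.10}$$
Proof. First note that $\mathcal E_1 \succeq_{\mathrm{l.n.}} \mathcal E_2$ is clearly equivalent to $\varepsilon_1 \le \varepsilon_2$. Fix a c-q state $\rho_{UA_1A_2}$ and write $\omega := (\mathrm{id}\otimes\mathcal M)(\rho)$ on $UA_1B$. We are using
$$I(U:B_1B)_{(\mathrm{id}\otimes\mathcal E_\varepsilon)(\omega)} = (1-\varepsilon)\, I(U:A_1B)_\omega + \varepsilon\, I(U:B)_\omega,$$
so that
$$I(U:B_1B)_{(\mathrm{id}\otimes\mathcal E_2)(\omega)} - I(U:B_1B)_{(\mathrm{id}\otimes\mathcal E_1)(\omega)} = (\varepsilon_1-\varepsilon_2)\big(I(U:A_1B)_\omega - I(U:B)_\omega\big) \le 0,$$
which holds since the first factor in the product is non-positive by assumption and the second is non-negative by the data processing inequality. This concludes the proof.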
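A direct implication is that, for two erasure channels (choosing $\mathcal M = \mathrm{id}_R$ in Lemma 3.5),
$$\mathcal E_1 \succeq_{\mathrm{l.n.}} \mathcal E_2 \iff \mathcal E_1 \succeq^{\mathrm c}_{\mathrm{l.n.}} \mathcal E_2. \tag{3.14}$$
The same would generalize to all channels if the less noisy order were tensor stable in general. In that case, the less noisy and the completely less noisy orderings would be equivalent, which we know they are not. Therefore, the less noisy ordering cannot be tensor stable in general.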

Bounds on contraction coefficients
In this section, we discuss several properties and bounds of SDPI constants and contraction coefficients. We start with a simple bound for concatenated channels.
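Lemma 3.6. For any channels $\mathcal N_1 \in \mathrm{CPTP}(\mathcal H_A, \mathcal K_B)$, $\mathcal N_2 \in \mathrm{CPTP}(\mathcal K_B, \mathcal K_C)$ and any state $\sigma$, we have
$$\eta_{\mathrm{Re}}(\mathcal N_2\circ\mathcal N_1,\sigma) \le \eta_{\mathrm{Re}}(\mathcal N_2,\mathcal N_1(\sigma))\,\eta_{\mathrm{Re}}(\mathcal N_1,\sigma), \qquad \eta_{\mathrm{Re}}(\mathcal N_2\circ\mathcal N_1) \le \eta_{\mathrm{Re}}(\mathcal N_2)\,\eta_{\mathrm{Re}}(\mathcal N_1).$$
Proof. For every state $\rho$,
$$D(\mathcal N_2\circ\mathcal N_1(\rho)\|\mathcal N_2\circ\mathcal N_1(\sigma)) \le \eta_{\mathrm{Re}}(\mathcal N_2,\mathcal N_1(\sigma))\,D(\mathcal N_1(\rho)\|\mathcal N_1(\sigma)) \le \eta_{\mathrm{Re}}(\mathcal N_2,\mathcal N_1(\sigma))\,\eta_{\mathrm{Re}}(\mathcal N_1,\sigma)\,D(\rho\|\sigma), \tag{3.17}$$
from which the first claim follows by taking the supremum over $\rho$. The second then follows by taking the supremum over $\sigma$ on both sides.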
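A common approach to bounding capacities is to consider flagged channels. This approach also translates to contraction coefficients. Let $\mathcal N = \sum_i \lambda_i\mathcal N_i$ for some positive numbers $\lambda_i$ and completely positive maps $\mathcal N_i$, and define the flagged channel
$$\mathcal N_F(\rho) := \sum_i \lambda_i\,\mathcal N_i(\rho)\otimes|i\rangle\langle i|_F.$$
Lemma 3.7. If the $\mathcal N_i$ are quantum channels, then
$$\eta_{\mathrm{Re}}(\mathcal N) \le \eta_{\mathrm{Re}}(\mathcal N_F) \le \sum_i \lambda_i\,\eta_{\mathrm{Re}}(\mathcal N_i).$$
In fact, the two bounds above hold for any contraction coefficient based on a divergence satisfying the data processing inequality. The first inequality also extends to the case of non-trace preserving $\mathcal N_i$.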
Proof. The first inequality follows from data processing. The second from the properties of flagged channels and splitting the supremum.
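The "properties of flagged channels" used in the proof amount to the identity $D(\mathcal N_F(\rho)\|\mathcal N_F(\sigma)) = \sum_i \lambda_i D(\mathcal N_i(\rho)\|\mathcal N_i(\sigma))$ for trace preserving $\mathcal N_i$ with orthogonal flags, while the first inequality is data processing under tracing out the flag. The sketch below checks both numerically; it is our own illustration, and the three component channels and weights are hypothetical choices.

```python
import numpy as np


def _logm_pd(a: np.ndarray) -> np.ndarray:
    """Matrix log of a positive definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T


def rel_ent_full_rank(rho: np.ndarray, sigma: np.ndarray) -> float:
    """D(rho||sigma) for full-rank arguments."""
    return float(np.real(np.trace(rho @ (_logm_pd(rho) - _logm_pd(sigma)))))


def block_diag(blocks):
    """Direct sum of square blocks, i.e. sum_i X_i ⊗ |i><i|_F with orthogonal flags."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=complex)
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out


def random_state(d: int, rng: np.random.Generator) -> np.ndarray:
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T
    return m / np.trace(m).real


Z = np.diag([1.0, -1.0]).astype(complex)
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Hypothetical trace-preserving components N_i and weights lambda_i (summing to one)
CHANNELS = [
    lambda r: 0.8 * r + 0.2 * np.trace(r) * np.eye(2) / 2,  # depolarizing
    lambda r: HAD @ r @ HAD.conj().T,                        # Hadamard unitary
    lambda r: 0.7 * r + 0.3 * (Z @ r @ Z),                   # dephasing
]
LAMBDAS = [0.5, 0.3, 0.2]


def mixture(r):
    """N(r) = sum_i lambda_i N_i(r)."""
    return sum(lam * N(r) for lam, N in zip(LAMBDAS, CHANNELS))


def flagged(r):
    """N_F(r) = sum_i lambda_i N_i(r) ⊗ |i><i|_F."""
    return block_diag([lam * N(r) for lam, N in zip(LAMBDAS, CHANNELS)])
```

Dividing the per-component terms by $D(\rho\|\sigma)$ and bounding each ratio by its own supremum gives the second inequality of Lemma 3.7.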
Note that, in general, the flags do not have to be orthogonal and one could define an extension leading to a potentially tighter bound This approach has recently been employed to find very tight bounds on quantum capacities [32,84].
We will now discuss whether contraction coefficients tensorize and give some simple examples to the contrary. We start with the contraction coefficient $\eta_{\mathrm{Re}}(\mathcal M)$. Our counterexample is based on the erasure channel. As mentioned before, we have $\eta_{\mathrm{Re}}(\mathcal E) = 1-\varepsilon$. Our goal is to show that $\eta_{\mathrm{Re}}(\mathcal E) \ne \eta_{\mathrm{Re}}(\mathcal E\otimes\mathcal E)$. First, we show that the contraction coefficient cannot decrease under tensorization.
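Lemma 3.8. The following holds for any two quantum channels $\mathcal M_1, \mathcal M_2$:
$$\eta_{\mathrm{Re}}(\mathcal M_1\otimes\mathcal M_2) \ge \max\{\eta_{\mathrm{Re}}(\mathcal M_1), \eta_{\mathrm{Re}}(\mathcal M_2)\}.$$
Proof. For any states $\rho_1, \sigma_1$ and any state $\tau_2$, additivity of the relative entropy gives
$$\frac{D((\mathcal M_1\otimes\mathcal M_2)(\rho_1\otimes\tau_2)\|(\mathcal M_1\otimes\mathcal M_2)(\sigma_1\otimes\tau_2))}{D(\rho_1\otimes\tau_2\|\sigma_1\otimes\tau_2)} = \frac{D(\mathcal M_1(\rho_1)\|\mathcal M_1(\sigma_1))}{D(\rho_1\|\sigma_1)},$$
so that $\eta_{\mathrm{Re}}(\mathcal M_1\otimes\mathcal M_2) \ge \eta_{\mathrm{Re}}(\mathcal M_1)$, and $\eta_{\mathrm{Re}}(\mathcal M_1\otimes\mathcal M_2) \ge \eta_{\mathrm{Re}}(\mathcal M_2)$ follows by symmetry.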
Upper bounds are more involved, but one can give a nice one when one channel is an erasure channel.

Lemma 3.9. For every quantum channel M, we have
Proof. The main ingredient is the following equality where the first inequality follows from applying data-processing twice and the second by taking the supremum over all ρ U A 1 .
Now the second inequality follows by noting that and considering the two cases This allows us to calculate the contraction coefficient η Re (E ⊗n ).

Lemma 3.10.
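Given the erasure channel $\mathcal E$ with parameter $\varepsilon$, we have
$$\eta_{\mathrm{Re}}(\mathcal E^{\otimes n}) = 1-\varepsilon^n.$$
Proof. The ≤ direction follows from Lemma 3.9. Set $\mathcal M = \mathcal E^{\otimes n-1}$ and let f(n) denote the resulting upper bound on $\eta_{\mathrm{Re}}(\mathcal E^{\otimes n})$, obtained by applying Lemma 3.9 recursively starting from $f(1) = \eta_{\mathrm{Re}}(\mathcal E) = 1-\varepsilon$. It can easily be shown by induction that $f(n) = 1-\varepsilon^n$, and therefore $\eta_{\mathrm{Re}}(\mathcal E^{\otimes n}) \le 1-\varepsilon^n$.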
For the opposite direction we have to show that this bound is indeed achievable. Consider the following state It can be checked that where the only non-trivial one is Equation 3.36, which follows from observing that the only overlap in support between the two output states of the erasure channels corresponds to the state being completely deleted. From this the desired result follows immediately.
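The induction step in the proof above can be made explicit under one assumption that we state loudly: we read the first bound of Lemma 3.9 as $\eta_{\mathrm{Re}}(\mathcal E\otimes\mathcal M) \le (1-\varepsilon) + \varepsilon\,\eta_{\mathrm{Re}}(\mathcal M)$, which gives the recursion $f(n) = (1-\varepsilon) + \varepsilon f(n-1)$. The sketch below iterates this hypothetical recursion, checks it against the closed form $1-\varepsilon^n$, and computes the gap $\eta_{\mathrm{Re}}(\mathcal E^{\otimes n}) - \eta_{\mathrm{Re}}(\mathcal E) = \varepsilon - \varepsilon^n$, which is positive for $0 < \varepsilon < 1$ and $n \ge 2$ and hence witnesses the failure of tensorization.

```python
def recursive_bound(n: int, eps: float) -> float:
    """f(n) under the assumed recursion f(1) = 1 - eps, f(n) = (1 - eps) + eps * f(n - 1)."""
    value = 1.0 - eps  # f(1) = eta_Re(E) = 1 - eps
    for _ in range(n - 1):
        value = (1.0 - eps) + eps * value
    return value


def tensorization_gap(n: int, eps: float) -> float:
    """eta_Re(E^{⊗n}) - eta_Re(E) = (1 - eps**n) - (1 - eps) = eps - eps**n."""
    return (1.0 - eps ** n) - (1.0 - eps)
```

If $f(n-1) = 1-\varepsilon^{n-1}$, the recursion gives $(1-\varepsilon) + \varepsilon(1-\varepsilon^{n-1}) = 1-\varepsilon^n$, which is exactly the induction used in the proof.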
This completes our argument that the contraction coefficient does not tensorize.
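However, $\eta_{\mathrm{Re}}(\mathcal M,\sigma)$ might still tensorize. Indeed, this is the case for erasure channels. Consider two erasure channels $\mathcal E_1$ and $\mathcal E_2$ with erasure probabilities $\varepsilon_1$ and $\varepsilon_2$, respectively. By Lemma 3.9 we immediately get an upper bound on $\eta_{\mathrm{Re}}(\mathcal E_1\otimes\mathcal E_2)$. In contrast, in the restricted case of a fixed product reference state, tensorization holds, as can be seen in the following lemma.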
Lemma 3.11. For the erasure channels defined above and any two states $\sigma_1$ and $\sigma_2$, we have
$$\eta_{\mathrm{Re}}(\mathcal E_1\otimes\mathcal E_2,\sigma_1\otimes\sigma_2) = \max\{\eta_{\mathrm{Re}}(\mathcal E_1,\sigma_1), \eta_{\mathrm{Re}}(\mathcal E_2,\sigma_2)\} = 1-\min\{\varepsilon_1,\varepsilon_2\}.$$
Proof. As before, we can use Equation (3.27); however, this time we apply it twice and get The second main ingredient is the following argument: where the first two equalities follow by the chain rule, the third because $\rho_{A_1A_2} = \sigma_1\otimes\sigma_2$, and the inequality by data processing. Now we have to distinguish two cases. First, consider $\varepsilon_1 \ge \varepsilon_2$. We have where the first inequality follows by using the previously derived argument, the equality is simple rearranging, and the final inequality is data processing together with the assumption that $(\varepsilon_1-\varepsilon_2) \ge 0$. It follows directly that in this case The second part follows directly by assuming $\varepsilon_2 \ge \varepsilon_1$ and exchanging the roles of $A_1$ and $A_2$ in the previous derivation. Putting both cases together concludes the proof.
Whether SDPI constants based on the relative entropy with respect to a fixed tensor product state tensorize in general remains an important open problem.

Additional examples
First, we recall an example from [37] for qubit channels. The original theorem applies to several contraction coefficients; however, we state it here for the relative entropy. Recall that any unital qubit channel M can be represented by a 3 × 3 real matrix T describing how the channel acts on the Pauli matrices σ. This representation is usually referred to as the Bloch sphere representation. We then have: Examples to which this result applies include the depolarizing, dephasing and bit-flip channels. We get, We can combine this with results from the previous section to test our bounds on products of channels. Let E_{1/2} be an erasure channel with erasure probability ε = 1/2. Using the first upper bound in Lemma 3.9 and lower bounding by fixing the input state as that from Equation (3.33), we get These simple bounds already restrict the possible values significantly to the green area in Figure 2, where we also plot the looser second bound from Lemma 3.9 and the lower bound from Lemma 3.8 for comparison.
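The Bloch sphere representation can be computed entrywise as T_ij = Tr(σ_i M(σ_j))/2. The NumPy sketch below does this for the three example channels, assuming the common parametrizations ρ ↦ (1 − p)ρ + p I/2 (depolarizing) and ρ ↦ (1 − p)ρ + p PρP with P = σ_z (dephasing) or P = σ_x (bit flip); the parametrizations used in [37] are not reproduced in this copy and may differ.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]


def bloch_matrix(channel):
    """T[i, j] = Tr(sigma_i channel(sigma_j)) / 2 for a unital qubit channel."""
    return np.array([[0.5 * np.trace(Pi @ channel(Pj)).real for Pj in PAULIS] for Pi in PAULIS])


def depolarizing(p):
    return lambda rho: (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2


def pauli_flip(p, P):
    """rho -> (1 - p) rho + p P rho P; P = sigma_z gives dephasing, P = sigma_x a bit flip."""
    return lambda rho: (1 - p) * rho + p * P @ rho @ P


p = 0.2
T_dep = bloch_matrix(depolarizing(p))
T_deph = bloch_matrix(pauli_flip(p, PAULIS[2]))
T_flip = bloch_matrix(pauli_flip(p, PAULIS[0]))
print(np.round(T_dep, 3), np.round(T_deph, 3), np.round(T_flip, 3), sep="\n")
```

With these conventions T = (1 − p)I for the depolarizing channel, T = diag(1 − 2p, 1 − 2p, 1) for dephasing and T = diag(1, 1 − 2p, 1 − 2p) for the bit flip.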
Let us consider one more example. A channel that has recently proven useful in investigating the properties of quantum capacities is the dephrasure channel [50], We can easily get a lower bound on the contraction coefficient by picking the input state in Equation 3.33. An upper bound can be found using either Lemma 3.6 or Lemma 3.7 (for the latter, note that the dephrasure channel can directly be written as a flag channel). Together we get, Finally, a simple observation is that a replacer channel R_τ that always outputs a state τ is the noisiest channel, and an isometric channel V(ρ) = VρV† for some isometry V is the least noisy channel: Lemma 3.13. For any channel N, replacer channel R_τ and isometric channel V, we have Proof. Note that by additivity, faithfulness and data processing of the relative entropy. The proof of the first two lines follows easily from there. The third line is obvious by choosing N ∘ V^{−1} in the first step and R_τ in the second step as the degrading maps.
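The two extreme cases in Lemma 3.13 can be sanity-checked numerically: a replacer channel destroys all relative entropy, while an isometric channel preserves it exactly. The NumPy sketch below is illustrative only; the dimensions, the random seed and the helper names are assumptions, and the isometry V: C^2 → C^4 is drawn from a QR decomposition.

```python
import numpy as np


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def rel_entropy(rho, sigma, tol=1e-12):
    """Umegaki D(rho||sigma) in nats; returns inf if supp(rho) is not contained in supp(sigma)."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > tol]
    mu, v = np.linalg.eigh(sigma)
    rho_diag = np.real(np.einsum("ij,ik,kj->j", v.conj(), rho, v))
    supp = mu > tol
    if np.any(rho_diag[~supp] > tol):
        return np.inf
    return float(np.sum(lam * np.log(lam)) - np.sum(rho_diag[supp] * np.log(mu[supp])))


rng = np.random.default_rng(1)
rho, sigma, tau = (random_state(2, rng) for _ in range(3))

# Replacer channel R_tau: both inputs are mapped to tau, so nothing can be distinguished.
d_replacer = rel_entropy(tau, tau)

# Isometric channel rho -> V rho V^dagger, with V: C^2 -> C^4 having orthonormal columns.
V, _ = np.linalg.qr(rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2)))
d_iso = rel_entropy(V @ rho @ V.conj().T, V @ sigma @ V.conj().T)
d_in = rel_entropy(rho, sigma)
print(d_replacer, d_iso, d_in)
```

The support check matters here because VσV† is rank deficient on C^4; the relative entropy is evaluated on its support only.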

Approximate partial orders and capacities
In this section, we explore approximations of the partial orders introduced in Section 2. These are powerful tools which can be used for instance in approximating capacities of a quantum channel in terms of the ones of another channel when the latter are simpler to compute.

Approximate partial orders
We recall that the diamond norm distance between two quantum channels M, N ∈ CPTP(H_A, H_B) is defined as Previously, Sutter et al. [78] defined ε-approximate degradation as follows.
Now, one can also define approximate versions of other partial orders, as we will do in the following.

Definition 4.2.
A channel N is said to be ε-completely less noisy than M (denoted N ⪰^ε_{c,l.n.} M) if One defines similarly the approximate versions of the different more capable orders introduced in Section 2. The ε-anti orders are defined in the same way by exchanging N and M.
Following the same arguments as in the exact case, see Proposition 3.2, we have immediately that We would of course hope that ε-approximate degradability implies the above orders. This is indeed the case, as we will see in a moment. To relate the two partial orders we will make use of the continuity bounds introduced in [2] and [92]. In what follows, h(ε) := −ε log ε − (1 − ε) log(1 − ε), for ε ∈ [0, 1], denotes the binary entropy. This allows us to directly show the following relation. Proof. We assume that ‖M − Θ ∘ N‖_⋄ ≤ ε and start with an input state ρ_{UAR}. Define σ_{UBR} = ((Θ ∘ N) ⊗ id)(ρ_{UAR}) and τ_{UBR} = (M ⊗ id)(ρ_{UAR}). First, observe the following: where the first inequality follows by data processing to remove the degrading map. The first equality follows by the definition of the mutual information, the second by adding a zero twice and the fact that τ_{UR} = σ_{UR}, and the last inequality by applying Lemma 4.4 twice. The second claim follows from (4.5).
In the same manner, one can go via the fully quantum less noisy order. Therefore, Proof. We assume that ‖M − Θ ∘ N‖_⋄ ≤ ε and start with an input state ρ_{A'A}, where the channels act on A'. Define σ_{BA} = ((Θ ∘ N) ⊗ id_A)(ρ_{A'A}) and τ_{BA} = (M ⊗ id_A)(ρ_{A'A}). First, observe the following: where the first inequality follows by data processing to remove the degrading map, the first equality is by definition of the mutual information, the second by adding a zero once and the fact that τ_A = σ_A, and the last inequality by applying Lemma 4.3 and Lemma 4.4. The second claim follows from (4.5).
Clearly, the diamond norm is symmetric under exchanging the arguments. This motivates the following definition.  Proof. This follows easily by Lemma 4.5 and the symmetry of the diamond norm.
As a special case, the non-complete version (n = 1) is of course also implied. Furthermore, it is easy to see that ε̂-completely less noisy implies ε̂-completely more capable. Again, the same holds for the non-complete case. In a similar way, we can directly derive a few other lemmas. Proof. We again use the same technique as in the proof of Lemma 4.5. However, we need one additional step. Since we only want to show that a less-noisy-type ordering is implied, we can restrict the input state to be classical-quantum. It needs to be shown that going from σ E n does not change the mutual information, where C is the conjugation map. We have that, Now, since ((C^{⊗n} ∘ N^{⊗n})(ρ))^T = N^{⊗n}(ρ), it follows that the mutual information remains unchanged.

Approximating capacities of a quantum channel
Next, we consider a quantum channel N with associated complementary channel N^c. We want to investigate properties of the private capacity P and the quantum capacity Q. Let us start with the following.
Theorem 4.11. Let N be a quantum channel.
(iv) If N is ε-fully quantum less noisy and ε-regularized more capable, then P^{(1)} Proof. The proof of the first part is similar to that of [85, Proposition 1]. The second part is simply the same proof for arbitrary n and taking the limit, see also [  Proof. Follows directly from Theorem 4.11 and Lemma 4.6.
We can furthermore bound the quantum and private capacity of almost anti-less-noisy or anti-more-capable channels. The proof of the next two bounds follows immediately from the definitions of the private information and the less noisy partial order.
Again, this immediately implies the previously known results about degradable channels (and adds analogous ones for conjugate degradable channels). Corollary 4.14. Let N be either ε-anti-degradable or ε-conjugate anti-degradable. Then we have This should be compared to [78, Theorem 3.8].
In a similar fashion we can also reproduce the results in [49]. If N is ε-fully quantum more capable than M, we have Proof. The first two statements essentially follow by definition of the involved quantities. The third statement follows by chaining them together and applying additivity: The final statement again follows trivially from the definitions.
Combining the previous results we easily get the following corollary. Proof. This follows along the lines of Equation (4.26), noting that the final inequality only requires the unregularized ε-less noisy property.
This should be compared to [49, Corollary II.7]. Moreover, we note that, given channels N and M, it is possible to efficiently compute the minimal ε such that N is an ε-degraded version of M [49,78].

Characterizations of strong data processing
As we discussed in Sec. 2.2, it is possible to show that a channel N is less noisy than another channel M by considering their respective contraction coefficients. Thus, contraction coefficients can be used to determine whether two channels are related in the order. In this section we gather characterizations and new properties of contraction coefficients that are of interest beyond the study of partial orders. Indeed, SDPIs are now also becoming a standard tool to understand the computational power of noisy quantum devices [33,83,82,26], and for that application tensorization is also key.

SDPI via hypercontractivity
Let us prove some characterizations of strong data processing inequalities (SDPIs), i.e., of the relative entropy contraction coefficient. Obtaining characterizations of SDPIs has been the focus of recent activity in quantum information theory. For instance, in [7] the authors show that for a channel M : B(H_A) → B(H_B):

D(M(ρ)‖M(σ)) ≤ η D(ρ‖σ) ∀ρ ∈ D(H_A)
and are equivalent. Here we will focus on a hypercontractive approach. In short, hypercontractive inequalities are bounds on various p → q norms of quantum channels N. As we will see, these bounds are closely related to entropic inequalities for the channel. Moreover, reducing an entropic inequality to a norm inequality often makes tensorization easier to obtain.
In the classical setting, results connecting hypercontractivity of a Markov kernel and SDPIs go back to Ahlswede and Gács [1] and have also been the focus of the more recent paper [73, Theorem 6]. In order to state our results, let us define the σ-weighted p-quasi-norms for σ ∈ D_+(H) and X ∈ B(H) as ‖X‖_{p,σ} := (Tr|σ^{1/(2p)} X σ^{1/(2p)}|^p)^{1/p}. For p ≥ 1 these quantities are norms, while for p < 1 they are only quasi-norms. They are closely related to the sandwiched Rényi divergences D_p of [63,91], which can be defined as D_p(ρ‖σ) := (1/(p − 1)) log Tr[(σ^{(1−p)/(2p)} ρ σ^{(1−p)/(2p)})^p] for p ∈ (0, 1) ∪ (1, ∞), and are known to satisfy a data processing inequality for p ∈ (1/2, 1) ∪ (1, ∞) [3]. Moreover, as p → 1 they converge to the usual relative entropy, and they are monotone increasing in the parameter p [63]. The connection between these norms and Rényi divergences will allow us to obtain several conditions that imply bounds on the relative entropy contraction coefficients and the less noisy ordering, based on the Petz recovery map of the channel. For a quantum channel M and a reference state σ, define Γ^p_σ(X) = σ Our techniques will mostly be based on studying how the (p, σ)-norms contract under the Petz recovery map for different values of p and reference states σ. Such statements are intimately related to strong data processing inequalities. Indeed, as shown in [61], the data processing inequality for the Rényi divergences is equivalent to the statement that ‖M*‖_{(p,σ)→(p,M(σ))} ≤ 1 for all choices of input state σ and channels M. The basic idea of this section is that if the inequality
‖M*‖_{(p,σ)→(q,M(σ))} ≤ 1 (5.1)
holds for q > p, then we also obtain a strong data processing inequality for M and some Rényi entropy. Note that this statement is stronger than the one with p = q, as the norms are monotone increasing in p. Denote by p′, q′ the Hölder conjugates of p, q. One can show by duality of norms that Eq. (5.1) is in fact equivalent to for all states ρ.
As we will see in Proposition 5.4, the expression above is equivalent to an SDPI for Rényi entropies.
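To make the σ-weighted norms and their link to the sandwiched Rényi divergences concrete, the NumPy sketch below assumes the standard conventions ‖X‖_{p,σ} = (Tr|σ^{1/(2p)}Xσ^{1/(2p)}|^p)^{1/p} and D_p(ρ‖σ) = (1/(p − 1)) log Tr[(σ^{(1−p)/(2p)}ρσ^{(1−p)/(2p)})^p]. With these, D_p(ρ‖σ) = (p/(p − 1)) log ‖σ^{−1/2}ρσ^{−1/2}‖_{p,σ}. The code checks this identity, the monotonicity in p and the limit p → 1 on random full-rank states; it is an illustration, not code from the paper.

```python
import numpy as np


def mpow(a, t):
    """Real power of a positive definite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * w**t) @ v.conj().T


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def weighted_norm(x, sigma, p):
    """||X||_{p,sigma} = (Tr|sigma^{1/2p} X sigma^{1/2p}|^p)^{1/p}, from singular values."""
    s = mpow(sigma, 1 / (2 * p))
    sv = np.linalg.svd(s @ x @ s, compute_uv=False)
    return float(np.sum(sv**p) ** (1 / p))


def sandwiched_renyi(rho, sigma, p):
    """D_p(rho||sigma) = log Tr[(sigma^{(1-p)/2p} rho sigma^{(1-p)/2p})^p] / (p - 1), in nats."""
    s = mpow(sigma, (1 - p) / (2 * p))
    w = np.clip(np.linalg.eigvalsh(s @ rho @ s), 0, None)
    return float(np.log(np.sum(w**p)) / (p - 1))


def umegaki(rho, sigma):
    """D(rho||sigma) = Tr[rho (log rho - log sigma)] for full-rank states."""
    def logm(a):
        w, v = np.linalg.eigh(a)
        return (v * np.log(w)) @ v.conj().T
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)


rng = np.random.default_rng(2)
rho, sigma = random_state(3, rng), random_state(3, rng)
x = mpow(sigma, -0.5) @ rho @ mpow(sigma, -0.5)

p = 2.0
via_norm = p / (p - 1) * np.log(weighted_norm(x, sigma, p))
direct = sandwiched_renyi(rho, sigma, p)
values = [sandwiched_renyi(rho, sigma, q) for q in (1.0001, 1.5, 2.0, 3.0)]
print(via_norm, direct)
print(values, umegaki(rho, sigma))
```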
The following characterization of a strong DPI then follows: Proof. Let us start by showing that (5.4) implies (5.3). Note that by setting X = Γ^{−1}(ρ), (5.4) gives: Taking the logarithm and rewriting the equation above in terms of sandwiched Rényi divergences gives and Eq. (5.9) follows by taking the limit τ → 0. Let us show the other direction. It follows from Lemma 5.1 and a Taylor expansion that for all τ small enough. The claim follows from the fact that it suffices to restrict to positive operators when computing p → q norms of completely positive maps [69]. Finally, Eq. (5.5) follows from a duality argument. Indeed, it is easy to show that Eq. (5.4) is equivalent to Eq. (5.2). Taking the logarithm, we obtain the inequality in terms of the Petz recovery map.
Thus, we see that an SDPI is equivalent to a nontrivial p → q inequality for the Petz recovery map, for all p slightly larger than 1. We also immediately obtain a contractive characterization of the less noisy order: Proof. By rewriting the inequality in Eq. (5.6) in terms of the Rényi divergences and taking the limit τ → 0, we see that they imply that for all ρ, σ. By Proposition 2.3, this is equivalent to N ⪰_{l.n.} M. To show the other direction we may again resort to a Taylor expansion.
It is straightforward to adapt the results above to the completely less noisy ordering by suitably adapting the involved channels and states.
However, it might be difficult to compute the (1 + τ)-norms analytically, and it is often more convenient to work with integer values of p. Thus, we will show that any nontrivial p → q inequality gives rise to an SDPI, possibly with an error term.

Proposition 5.4.
Suppose that for some 1 < p ≤ q < ∞, C ≥ 1 and two channels M, N , where we assume that N is invertible, we have and Proof. Combining Eq. (5.7), Eq. (5.8) and the Stein-Weiss interpolation theorem (see e.g. [61] for a quantum information friendly exposition) yields that: for 0 ≤ θ ≤ 1, which can be rewritten as: for all X. Solving for p θ and q θ we see that Taking the logarithm of Eq. (5.11) and rewriting in terms of Rényi entropies we see that A close inspection of the expressions involved in the formula above shows that: The claim then follows by taking the limit θ → 1 in Eq. (5.12) and noting that p θ → 1 and q θ → 1.
For any quantum channel M we have, by the Russo-Dye theorem, that and, thus, Eq. (5.8) is always satisfied for N = id. That is, if we wish to prove an SDPI for a quantum channel with an additive error term, a p → q inequality suffices.
The case of 2 → 2 norms in the proposition above is particularly interesting, as it leads to entropic inequalities that tensorize: Corollary 5.5. Suppose that N is invertible as a linear map. Then for some σ > 0, implies that for all states ρ
D_2(M^{⊗n}(ρ)‖M(σ)^{⊗n}) ≤ D_2(N^{⊗n}(ρ)‖N(σ)^{⊗n}) + 2n log(C) . (5.14)
We can further massage this to for all operators X. To see this, set Thus, we conclude that for all X we have: Picking X = σ^{−1/2}ρσ^{−1/2} and taking the logarithm yields the claim for n = 1. The claim for n > 1 follows from noting that the condition in Eq. (5.13) tensorizes. Indeed, note that for N^{⊗n} and M^{⊗n} we have: The last equality follows from the fact that the 2 → 2 norm is just the operator norm of the map, which is multiplicative under tensor products.
Note that the condition in Eq. (5.13) is equivalent to the operator norm of Γ being bounded by C. In particular, this means that it can be verified efficiently given the channels and a target state σ. Thus, we have identified a condition that implies a strong data processing inequality with the feature that it can both be verified efficiently and tensorizes.
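The tensorization step rests on two facts: the 2 → 2 norm of a linear map is the largest singular value of its matrix in an orthonormal basis of the operator space, and this quantity is multiplicative under tensor products. The NumPy sketch below illustrates both for the unweighted Hilbert-Schmidt inner product; the σ-weighted case reduces to it after passing to a basis that is orthonormal for the weighted inner product, which we do not implement. The random maps S1, S2 and the helper names are hypothetical.

```python
import numpy as np


def superop(fn, dim):
    """Matrix of a linear map on dim x dim matrices in the matrix-unit basis (row-major vec)."""
    cols = []
    for k in range(dim * dim):
        e = np.zeros(dim * dim, dtype=complex)
        e[k] = 1.0
        cols.append(fn(e.reshape(dim, dim)).reshape(-1))
    return np.array(cols).T


def tensor_map(S1, S2, d1, d2):
    """Return the map S1 (x) S2 acting on (d1*d2) x (d1*d2) matrices."""
    def fn(x):
        # Regroup X[(i1,i2),(j1,j2)] as a d1^2 x d2^2 array indexed by ((i1,j1),(i2,j2)).
        t = x.reshape(d1, d2, d1, d2).transpose(0, 2, 1, 3).reshape(d1 * d1, d2 * d2)
        y = S1 @ t @ S2.T
        return y.reshape(d1, d1, d2, d2).transpose(0, 2, 1, 3).reshape(d1 * d2, d1 * d2)
    return fn


def op_norm(S):
    """2 -> 2 norm w.r.t. the Hilbert-Schmidt inner product: the largest singular value."""
    return np.linalg.norm(S, 2)


rng = np.random.default_rng(3)
d1, d2 = 2, 3
S1 = rng.normal(size=(d1**2, d1**2)) + 1j * rng.normal(size=(d1**2, d1**2))
S2 = rng.normal(size=(d2**2, d2**2)) + 1j * rng.normal(size=(d2**2, d2**2))

S12 = superop(tensor_map(S1, S2, d1, d2), d1 * d2)
print(op_norm(S12), op_norm(S1) * op_norm(S2))
```

The matrix S12 equals the Kronecker product of S1 and S2 up to a permutation of basis vectors, so its singular values are the pairwise products of those of S1 and S2.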
This statement can also be used to obtain contraction coefficients that tensorize: Then for all states ρ Proof. By Corollary 5.5 we have that Eq. (5.16) implies that In [60] the authors show that which completes the proof.
The statement above can be used to bound capacities and entropic quantities of a quantum channel efficiently. For instance, if the channel M on a d-dimensional space is doubly stochastic, we immediately see that if M • N −1 p 2→2 ≤ 1, then By noting that D_2(M^{⊗n}(ρ)‖I/d^n) ≥ n log d − S(M^{⊗n}(ρ)), we immediately obtain that the minimum output entropy of M^{⊗n} is lower bounded by Inspecting the expression M • N −1 p 2→2 more closely, we see that: Thus, we see that M
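The entropy bound uses two elementary facts: D(ω‖I/d) = log d − S(ω) for any state ω on a d-dimensional space (applied to the d^n-dimensional output of M^{⊗n}), and D_2 ≥ D, since the sandwiched Rényi divergences increase with the order. For the maximally mixed reference one has D_2(ω‖I/d) = log(d Tr ω²). A short NumPy check on a random state (natural logarithms; the helper names are illustrative):

```python
import numpy as np


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def logm_pd(m):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.conj().T


def von_neumann_entropy(rho):
    """S(rho) = -Tr[rho ln rho] in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))


def d2_to_maximally_mixed(rho):
    """Sandwiched D_2(rho || I/d) = ln(d Tr[rho^2]) in nats."""
    d = rho.shape[0]
    return float(np.log(d * np.trace(rho @ rho).real))


rng = np.random.default_rng(4)
d = 4
rho = random_state(d, rng)
S = von_neumann_entropy(rho)
D_direct = float(np.trace(rho @ (logm_pd(rho) - logm_pd(np.eye(d) / d))).real)
D2 = d2_to_maximally_mixed(rho)
print(D_direct, np.log(d) - S, D2)
```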

Generalized contraction coefficients and partial orders
In this section we discuss extensions of the above concepts to other divergences, in particular f-divergences and χ²-divergences. Note that both partial orders and contraction coefficients can be naturally defined in terms of any divergence by simply replacing the relative entropy with the desired divergence.
Interestingly, in the classical setting, contraction coefficients (and, respectively, partial orders) based on many different divergences have actually been shown to coincide; see e.g. [55]. In the quantum case this remains a mostly open problem; some of what is known is summarized in the following section.

f-divergences
Here, we extend the discussions of the previous sections to the setting of arbitrary f-divergences [67, Chapter 7]. The two most commonly used ones are the standard f-divergence: given ρ, σ ∈ D(H) with σ full-rank,
D_f(ρ‖σ) := Tr[σ^{1/2} f(Δ_{ρ,σ})(σ^{1/2})],
where Δ_{ρ,σ}(X) := ρXσ^{−1} is the so-called relative modular operator between the states σ and ρ, which can be interpreted as a non-commutative generalization of the Radon-Nikodym derivative between two probability mass functions. Similarly, the maximal f-divergence is defined as
D̂_f(ρ‖σ) := Tr[σ f(σ^{−1/2}ρσ^{−1/2})].
Note that, often, only functions that obey f(1) = 0 are considered valid for f-divergences, as this ensures that D_f(ρ‖ρ) = 0. However, this excludes, e.g., Rényi divergences.
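Both f-divergences can be evaluated from eigendecompositions. The NumPy sketch below assumes the common conventions D_f(ρ‖σ) = Tr[σ^{1/2} f(Δ_{ρ,σ})(σ^{1/2})] = Σ_{i,j} μ_j f(λ_i/μ_j)|⟨a_i|b_j⟩|², where ρ = Σ_i λ_i|a_i⟩⟨a_i| and σ = Σ_j μ_j|b_j⟩⟨b_j|, and D̂_f(ρ‖σ) = Tr[σ f(σ^{−1/2}ρσ^{−1/2})]. For f(x) = x log x the first reduces to the Umegaki relative entropy and the second to the Belavkin-Staszewski relative entropy, which is never smaller. Full-rank states are assumed, and the helper names are our own.

```python
import numpy as np


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def standard_f_divergence(rho, sigma, f):
    """sum_{i,j} mu_j f(lambda_i / mu_j) |<a_i|b_j>|^2 (Petz-type convention, full-rank states)."""
    lam, a = np.linalg.eigh(rho)
    mu, b = np.linalg.eigh(sigma)
    overlaps = np.abs(a.conj().T @ b) ** 2  # overlaps[i, j] = |<a_i|b_j>|^2
    return float(np.sum(mu[None, :] * f(lam[:, None] / mu[None, :]) * overlaps))


def maximal_f_divergence(rho, sigma, f):
    """Tr[sigma^{1/2} f(sigma^{-1/2} rho sigma^{-1/2}) sigma^{1/2}] for full-rank sigma."""
    mu, b = np.linalg.eigh(sigma)
    s_half = (b * np.sqrt(mu)) @ b.conj().T
    s_mhalf = (b / np.sqrt(mu)) @ b.conj().T
    w, u = np.linalg.eigh(s_mhalf @ rho @ s_mhalf)
    return float(np.trace(s_half @ ((u * f(w)) @ u.conj().T) @ s_half).real)


def umegaki(rho, sigma):
    def logm(m):
        w, v = np.linalg.eigh(m)
        return (v * np.log(w)) @ v.conj().T
    return float(np.trace(rho @ (logm(rho) - logm(sigma))).real)


def x_log_x(x):
    return x * np.log(x)


rng = np.random.default_rng(5)
rho, sigma = random_state(3, rng), random_state(3, rng)
d_standard = standard_f_divergence(rho, sigma, x_log_x)
d_maximal = maximal_f_divergence(rho, sigma, x_log_x)
print(d_standard, umegaki(rho, sigma), d_maximal)
```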
Of course, we can define contraction coefficients η_{D_f}(N) and η_{D̂_f}(N) as for other divergences. Next, we want to consider mutual-information-like quantities based on f-divergences. Consider the following definitions, (6.3) In what follows, we write I_f, resp. D_f, for either I_f or Î_f, resp. D_f or D̂_f. Note that, just like for the relative entropy, the following holds. Consider a classical-quantum state ρ_{UB} and its marginal ρ_B; we have As a direct consequence, for a quantum channel N, we have resulting in Before proving the equivalence in Proposition 6.2, we recall this very simple technical lemma: Lemma 6.1. Let A be a self-adjoint operator on a finite-dimensional Hilbert space, and let f be a function on R_+ that is differentiable at 1. Then Proof. Simply write the eigenvalue decomposition of A := Σ_i a_i |i⟩⟨i|, so that The next proposition is a direct generalization of Theorem 5. (i) For all c-q states ρ_{UA} with marginal σ_A, where U is an arbitrary classical system, where σ_A = Σ_u P_U(u) ρ^u_A is defined to be the marginal of ρ_{AU}. Then the result follows directly from (ii). Next, we prove that (i)⇒(ii): without loss of generality we assume that σ_A is full-rank. Then, for any ρ_A ∈ D(H_A) and 0 ≤ λ ≤ ε, where ε is small enough to ensure that σ_A − ερ_A ≥ 0, we let U ≡ U_λ be the binary random variable with distribution P_{U_λ}(0) = λ, P_{U_λ}(1) = 1 − λ, and conditional states ρ^0 Since The result follows after computing the latter derivative: We will be done as soon as we can show that the derivative above is equal to 0. By linearity of the channels M and N, it is enough to show that We first focus on the case of the maximal f-divergences: we have where the second identity comes from Lemma 6.1. It is then straightforward to evaluate the derivative above and verify that it vanishes at λ = 0, which gives what we needed to prove. The case of the standard f-divergence can be treated similarly: where in (1) we used Lemma 6.1 once again.
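The statement of Lemma 6.1 is not reproduced in this copy. The eigenvalue argument in its proof yields, for instance, the identity d/dλ Tr f(I + λA)|_{λ=0} = f′(1) Tr(A) for self-adjoint A and f differentiable at 1; we assume this is the kind of statement used above and check it with a central finite difference. The test functions, step size and helper name are illustrative choices.

```python
import numpy as np


def trace_f(f, A, lam):
    """Tr f(I + lam A) for self-adjoint A, computed from the eigenvalues of A."""
    a = np.linalg.eigvalsh(A)
    return float(np.sum(f(1 + lam * a)))


rng = np.random.default_rng(6)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (g + g.conj().T) / 2
h = 1e-6

# (f, f'(1)) pairs: f(x) = x ln x has f'(1) = 1, f(x) = x^2 has f'(1) = 2.
cases = [(lambda x: x * np.log(x), 1.0), (lambda x: x**2, 2.0)]
results = []
for f, df1 in cases:
    numeric = (trace_f(f, A, h) - trace_f(f, A, -h)) / (2 * h)  # central difference
    results.append((numeric, df1 * np.trace(A).real))
print(results)
```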

Remark 6.3.
The previous result can be easily extended to the case of optimized quantum f-divergences as defined in [89].
Remark 6.4. Note that we only require f to be differentiable at 1. Thus, Proposition 6.2 holds for a larger class of functionals than f-divergences.
As a result, we can express the contraction coefficient based on f-divergences in terms of the corresponding f-mutual information, and define partial orders based on f-mutual informations that can be equivalently formulated in terms of f-divergences. We will denote these by ⪰_{D_f,l.n.}.
Let us end this section by discussing our favorite example: the erasure channel. One can easily find the following.

Lemma 6.5. For any function
(6.14) which immediately implies that, for all functions f and g with the above property, we have Proof. By direct calculation.
This implies that we can easily extend our previous Proposition 2.5.
Proof. Analogous to that of Proposition 2.5.
And the same holds for the maximal f -divergence.

Spectral approaches to quantum less-noisy pre-orders
In Theorem 1 of [55], it was shown in the classical setting that the less noisy pre-order M ⪰_{l.n.} Ψ is equivalent to the following condition: for any probability mass functions p, q where the χ² divergence is defined as χ²(p, q) := Σ_x (p(x) − q(x))²/q(x). This characterization is powerful, as it allows for a spectral analysis of the less noisy pre-order [55]. The proof of this fact relies on two observations: to prove that less noisiness implies (6.16), it uses the characterization of less noisy seen in Section 2.2, namely that M ⪰_{l.n.} N is equivalent to for any two probability mass functions p and q, together with the well-known fact that the χ² divergence locally approximates the relative entropy: On the other hand, the relative entropy can be rewritten in terms of the following integral [55] (see also [58,64,65] for slightly different related expressions): The characterization of less noisiness in terms of Equation (6.16) follows from the joint use of Equations (6.18) and (6.19). This result implies as a special case the following result [14] (see also refinements, e.g. Theorem 3.6 of [74], and [74,1] for the link to the contraction coefficient for the maximal correlation):
η_Re(M) = η_{χ²}(M) := sup_{p,q: 0 < χ²(p,q) < ∞} χ²(M(p), M(q)) / χ²(p, q) . (6.20)
The χ² quantity is known classically to locally approximate any f-divergence [10,13]. This is no longer the case in the non-commutative world, where a complete characterization of χ² quantities, also known as Fisher information metrics, was found [52,79]. First of all, we define the following set of functions: Next, for ρ, σ ∈ D(H) and k ∈ K, define the quantum χ²-divergence when supp(ρ) ⊆ supp(σ), and infinity otherwise. In the former case, the inversion Ω^k_σ is defined as Here, we also restrict ourselves to a subclass of f-divergences: consider the class G of continuous operator convex functions g from R_+ to R that satisfy g(1) = 0.
These functions can all be expressed in terms of the following integral representation: where a, b, c > 0 and the integral of the positive measure dν(s) on (0, ∞) is bounded. Then, for any k ∈ K, there exists g ∈ G such that [52,79]: The function k is related to g by Hence, like its classical restriction, the quantum χ 2 divergence can be understood as a local approximation of the quantum relative entropy. However, Equation (6.19) is not known to hold in general in the quantum case. Rather, the following integral representation of the quantum relative entropy is known and has been recently used to get tightenings of the data processing inequality in [12] as a special case of the integral representation (6.21):

(6.24)
It can be easily shown that the integrands in Equation (6.24) and Equation (6.19) coincide when ρ and σ commute and can be identified with p and q. However, the integrand (6.24) cannot be identified in general with an adequately normalized χ 2 divergence. In particular, Equation (6.20) is not known to hold in general in the quantum setting, despite some recent progress in that direction [51,37]. We also refer to the more recent paper [11] where the strong data processing for the χ 2 divergence was shown to tensorize in the case where the state σ is taken to be a tensor product.
For the reasons mentioned above, it seems unlikely that the equivalence between the less noisy order and the ordering of χ² divergences holds in the quantum setting. However, the following result is a direct consequence of the above discussion: (i) For any two quantum states ρ, σ ∈ D(H_A) and any t ≥ 0: (ii) For all g ∈ G and any two states ρ, σ ∈ D(H_A): (iii) For any k ∈ K and any two quantum states ρ, σ ∈ D(H_A), Moreover, if (ii) holds only for a given g ∈ G, then (iii) holds for k ∈ K satisfying (6.23). Finally, obvious extensions of the above chain of implications hold true in the case of regularized and complete less noisy pre-orders, as well as for the related contraction coefficients.
Proof. This is a direct consequence of the expression (6.21) as well as the relation (6.22).
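As a classical sanity check of (6.20), consider a binary symmetric channel with crossover probability δ, for which both contraction coefficients equal (1 − 2δ)². The NumPy grid search below approximates the two suprema over pairs of Bernoulli inputs; the grid, the value δ = 0.1 and the function names are illustrative choices.

```python
import numpy as np


def bsc(p, delta):
    """P(Y = 1) at the output of a binary symmetric channel when P(X = 1) = p."""
    return p * (1 - delta) + (1 - p) * delta


def chi2(p, q):
    """chi^2 divergence between Bernoulli(p) and Bernoulli(q)."""
    return (p - q) ** 2 / (q * (1 - q))


def kl(p, q):
    """Relative entropy (nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


delta = 0.1
grid = np.linspace(0.01, 0.99, 197)
P, Q = np.meshgrid(grid, grid)
off_diag = ~np.isclose(P, Q)  # exclude p = q, where both ratios are 0/0

eta_chi2 = np.max(chi2(bsc(P, delta), bsc(Q, delta))[off_diag] / chi2(P, Q)[off_diag])
eta_kl = np.max(kl(bsc(P, delta), bsc(Q, delta))[off_diag] / kl(P, Q)[off_diag])
print(eta_chi2, eta_kl, (1 - 2 * delta) ** 2)
```

The χ² ratio attains (1 − 2δ)² exactly at q = 1/2, while the relative entropy ratio only approaches it as p and q both tend to 1/2, which is why the grid value is slightly smaller.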
Finally, we will once more mention the erasure channel as an illustrating example.
Lemma 6.8. Let E : A → B be the quantum erasure channel of parameter ε, and let ρ_A and σ_A be quantum states. Then for any k ∈ K we have that Therefore, This means that, at least for the erasure channel, the contraction coefficients for the relative entropy, f-divergences and χ²_k-divergences coincide.

Functional inequalities
Another standard way of obtaining strong data processing is through functional inequalities [73], which we describe in more detail below. This approach is particularly effective whenever we have a semigroup (P_t)_{t≥0} converging to a unique full-rank state σ; such semigroups are called primitive. In [55] the authors relate such functional inequalities to the less noisy ordering in the classical setting, especially when comparing to symmetric channels. Here we recover their results in the quantum setting, in particular Proposition 12 of [55], while also revisiting the connections between functional and entropic inequalities.
We start by defining logarithmic Sobolev inequalities and showing their consequences. Let N be a primitive channel whose unique invariant state is σ, and denote by N* the corresponding completely positive, unital dual map. Corresponding to the map N*, we construct a quantum Markov semigroup (QMS) of completely positive unital maps P_t = e^{−t(id−N*)} for all t ≥ 0. In other words, the resulting semigroup (P_t)_{t≥0} has S := N* − id as its generator. The semigroup (P_t)_{t≥0} is by construction also primitive, and σ is its unique invariant state. We can then define the scalar product ⟨X, Y⟩_σ := Tr[σ^{1/2} X† σ^{1/2} Y], which induces the ‖·‖_{2,σ} norm. Next we define the Dirichlet form E_S corresponding to S as follows: for all X, Y ∈ B_sa(H), Next, we define the 2-entropy as follows: for any X ∈ B_sa(H), where here the relative entropy D(·‖·) has been extended to non-normalized positive operators, D(A‖B) = Tr(A(ln A − ln B)), whenever supp(A) ⊂ supp(B). The QMS (P_t)_{t≥0} satisfies the logarithmic Sobolev inequality (LSI) if there exists α > 0 such that for every X ∈ B_sa(H)
α Ent_{2,σ}(X) ≤ E_S(X, X) . (7.1)
The largest α such that Equation (7.1) holds is called the logarithmic Sobolev constant of (P t ) t≥0 . Likewise, the logarithmic Sobolev inequality for the discrete-time Markov chain with constant α > 0 states that for every X ∈ B sa (H), is sometimes referred to as the discrete Dirichlet form. Logarithmic Sobolev inequalities are particularly useful for reversible semigroups, i.e. semigroups such that their Petz recovery map with respect to the invariant state is the semigroup itself.
In this case, it is known that an LSI implies the following hypercontractive inequality: By the results of Proposition 5.4, we know that this can be used to obtain strong data processing inequalities involving Rényi divergences. Indeed, we immediately obtain: Nevertheless, the inequality above is somewhat unsatisfactory, as it does not give complete contraction even as t → ∞. This can easily be remedied by resorting to reversibility. As the semigroup is reversible, Eq. (7.2) also implies by duality. By Proposition 5.4, we conclude that: Exploiting the fact that 2e^{−2αt}/(1 + e^{−2αt}) ≤ e^{−αt}, we then get an exponential decay of the relative entropy from the logarithmic Sobolev inequality. Although this result is by no means new [31,68,46], we believe that the approach presented here showcases transparently how an LSI implies several different entropic inequalities. Computing the LSI constant is, in general, an arduous task. For the depolarizing semigroup in dimension d, it is known that: It is possible to obtain estimates of LSI constants from (7.4) for any doubly stochastic channel through so-called comparison techniques. The same is true for generalized depolarizing channels.
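The elementary inequality 2e^{−2αt}/(1 + e^{−2αt}) ≤ e^{−αt} used for the exponential decay can be verified in one line. Writing x := e^{−αt} ∈ (0, 1] for α, t ≥ 0:

```latex
\begin{aligned}
\frac{2e^{-2\alpha t}}{1+e^{-2\alpha t}} \le e^{-\alpha t}
  &\iff 2x^{2} \le x\,(1+x^{2}) && \text{(multiply by } 1+x^{2} > 0\text{)}\\
  &\iff 2x \le 1+x^{2}          && \text{(divide by } x > 0\text{)}\\
  &\iff 0 \le (1-x)^{2},
\end{aligned}
```

which always holds, with equality only when αt = 0.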
Indeed, an easy way of estimating the LSI constant of a semigroup, given the LSI constant of another QMS, is via a comparison of their Dirichlet forms: assume that S_1 and S_2 have the same unique invariant state σ; then: In the next theorem, we show that the information-theoretic notion of less noisy domination is a sufficient condition for the comparison of Dirichlet forms. First, we recall the following lemma from [55]: Lemma 7.1. Given a positive semi-definite real matrix A and a normal real matrix B, From this we get: or equivalently for all states ρ.
(ii) Assume that M is positive semi-definite on (B_sa(H), ⟨·,·⟩_σ) and that N*N̂* = N̂*N*. If M ⪰_{l.n.} N, then for all X ∈ B_sa(H) (iii) Assume that N = N_{p,σ}, with N_{p,σ} the generalized depolarizing channel with depolarizing probability p to a state σ > 0, and N_p ⪰_{l.n.} M for some channel M with σ as its stationary state. Then: Proof. (i) We use the implication (i) ⇒ (ii) of Proposition 6.7 in order to get that M ⪰_{l.n.} N implies that, for any two states ρ_1, ρ_2, This implies, taking ρ_1 := ρ ≡ σ^{1/2}Xσ^{1/2} and ρ_2 = σ, that Note that the adjoint of M* with respect to the weighted scalar product is M̂*. Thus, we can rewrite this last condition as follows: The previously proved result implies in particular that the maps M* and N*, seen as matrices acting on the real vector space of self-adjoint operators equipped with the σ-KMS inner product ⟨·,·⟩_σ, satisfy N̂*N* ≤_PSD M̂*M*. From Lemma 7.1, we find that for the depolarizing channel. Thus, the assumptions of (ii) are fulfilled. The claim then follows by a simple manipulation.
Note that p E_{id−N*_{1,σ}}(X, X) is nothing but the variance of the observable X with respect to the state σ. Thus, a less-noisy comparison between channels with the same invariant state immediately gives a comparison of Dirichlet forms, and hence a bound on one logarithmic Sobolev constant if the other is known. Unfortunately, such techniques do not yield sharp inequalities for the SDPI constant of channels. This is because it is known that for generalized depolarizing channels the optimal SDPI constant [62] is given by: with q s min (σ) the smallest eigenvalue of σ and where D_2 is the binary relative entropy. Moreover, for qubit depolarizing channels we have α(I/2) = 1, and we know that N^{⊗n}_{p,σ} satisfies an SDPI with constant (1 − p) [4]. This is to be contrasted with the LSI estimate, which decays with the local dimension.
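For the replacer channel N_{1,σ}(ρ) = Tr(ρ)σ, whose dual is X ↦ Tr(σX) I, the Dirichlet form reduces to a variance. The NumPy sketch below assumes the KMS inner product ⟨X, Y⟩_σ = Tr[σ^{1/2}X†σ^{1/2}Y] and the sign convention E(X, X) = ⟨X, (id − N*)(X)⟩_σ; it checks E(X, X) = ⟨X, X⟩_σ − (Tr σX)² ≥ 0 for a random self-adjoint X. The helper names are hypothetical.

```python
import numpy as np


def mpow(a, t):
    w, v = np.linalg.eigh(a)
    return (v * w**t) @ v.conj().T


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def kms_inner(x, y, sigma):
    """<X, Y>_sigma = Tr[sigma^{1/2} X^dagger sigma^{1/2} Y] (KMS inner product, assumed)."""
    s = mpow(sigma, 0.5)
    return np.trace(s @ x.conj().T @ s @ y)


def replacer_dual(x, sigma):
    """Dual (Heisenberg picture) of rho -> Tr(rho) sigma, i.e. X -> Tr(sigma X) I."""
    return np.trace(sigma @ x) * np.eye(x.shape[0])


def dirichlet_form(x, sigma):
    """E(X, X) = <X, (id - N*)(X)>_sigma with N* the dual of the replacer channel."""
    return kms_inner(x, x - replacer_dual(x, sigma), sigma).real


rng = np.random.default_rng(7)
d = 3
sigma = random_state(d, rng)
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X = (g + g.conj().T) / 2  # a random self-adjoint observable

form = dirichlet_form(X, sigma)
variance = kms_inner(X, X, sigma).real - np.trace(sigma @ X).real ** 2
print(form, variance)
```

Nonnegativity is the Cauchy-Schwarz inequality |⟨I, X⟩_σ|² ≤ ⟨I, I⟩_σ⟨X, X⟩_σ, since ⟨I, X⟩_σ = Tr σX and ⟨I, I⟩_σ = 1.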
Special classes of channels

Weyl-covariant channels
In this section, we provide new comparison bounds for Weyl-covariant channels [87,23,42,43] (see also [86]). We consider the finite group G := Z_n × Z_n with the following projective representation on H := C^n: Next, define the discrete Fourier transform F to be the following unitary matrix on C^n for every X ∈ B(H) and (a, b) ∈ Z_n × Z_n. In particular, any additive noise channel of the form Proof. We simply need to prove the forward implication. By definition, there exists a channel N such that. Therefore, we found that there exists a function C : Z_n × Z_n → C such that for all (a, b) ∈ Z_n × Z_n, N(W_{a,b}) = C(a, b) W_{a,b}. We conclude by an appeal to Theorem 4.14 in [86].
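The projective representation and the Fourier matrix are not reproduced in this copy. Assuming the usual shift and clock conventions, X|j⟩ = |j + 1 mod n⟩, Z|j⟩ = ω^j|j⟩ with ω = e^{2πi/n}, W_{a,b} = X^a Z^b and F_{jk} = ω^{jk}/√n, the NumPy sketch below checks the Weyl relation ZX = ωXZ, that F diagonalizes the shift (F†XF = Z†), and that an additive noise channel Y ↦ Σ_{c,d} p(c,d) W_{c,d} Y W_{c,d}† has every W_{a,b} as an eigenvector, which is the property N(W_{a,b}) = C(a, b)W_{a,b} used in the proof. The noise distribution is random and purely hypothetical.

```python
import numpy as np


def weyl_operators(n):
    """Shift X|j> = |j+1 mod n>, clock Z|j> = w^j |j>, and W[a, b] = X^a Z^b (assumed convention)."""
    w = np.exp(2j * np.pi / n)
    X = np.roll(np.eye(n), 1, axis=0)
    Z = np.diag(w ** np.arange(n))
    W = {(a, b): np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)
         for a in range(n) for b in range(n)}
    return w, X, Z, W


n = 3
w, X, Z, W = weyl_operators(n)
k = np.arange(n)
F = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)  # F[j, k] = w^{jk} / sqrt(n)

rng = np.random.default_rng(8)
probs = rng.random((n, n))
probs /= probs.sum()  # hypothetical noise distribution p(c, d)


def additive_noise(Y):
    return sum(probs[c, d] * W[c, d] @ Y @ W[c, d].conj().T for c in range(n) for d in range(n))


def eigenvalue(a, b):
    """C(a, b) = sum_{c,d} p(c, d) w^{ad - bc}, since W_{c,d} W_{a,b} W_{c,d}^dagger = w^{ad - bc} W_{a,b}."""
    return sum(probs[c, d] * w ** (a * d - b * c) for c in range(n) for d in range(n))


weyl_relation = np.allclose(Z @ X, w * X @ Z)
fourier_diagonalizes_shift = np.allclose(F.conj().T @ X @ F, Z.conj().T)
weyl_are_eigenvectors = all(
    np.allclose(additive_noise(W[a, b]), eigenvalue(a, b) * W[a, b])
    for a in range(n) for b in range(n))
print(weyl_relation, fourier_diagonalizes_shift, weyl_are_eigenvectors)
```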
In Section 2.1, we introduced the notion of a less noisy domination region. Restricting ourselves to comparisons with additive noise channels, we can also define the so-called additive less noisy domination region of a channel M acting on system H as

Gaussian channels
In this section, we consider the setting of Gaussian quantum channels [44]: an n-mode quantum system is modeled by the Hilbert space L²(R^n) ≅ L²(R)^{⊗n} of complex-valued, square-integrable functions on R^n. On this space, we define the so-called creation a†_j and annihilation a_j operators, j ∈ [n], by means of their commutation relations where for each j ∈ [n], a_j and a†_j act nontrivially on the j-th copy of the system. The number operator is defined as and corresponds to the energy of the system. In the case n = 1, it is known to have the following spectral decomposition onto the Fock basis {|k⟩}_{k∈ℕ}: Next, we denote by |z⟩ the coherent state of parameter z ∈ C. These are the eigenvectors of the one-mode annihilation operator, a|z⟩ = z|z⟩. In what follows, for a vector z ∈ C^n ≡ R^{2n}, the tensor product of coherent states ⊗_j |z_j⟩ will be denoted by |z⟩. The Weyl displacement operators D(z), z ∈ C^n, are unitary operators that map the n-mode vacuum state |0⟩ to the coherent state |z⟩ = D(z)|0⟩. In terms of the creation and annihilation operators, they are defined as Next, an n-mode quantum Gaussian state is proportional to the exponential of a quadratic polynomial in the creation and annihilation operators. Among these states, thermal Gaussian states play an important role. They are defined by n-fold products of the following geometric probability distribution for the energy: The average energy of σ(E) is equal to Tr(σ(E)N^{(1)}) = E, whereas its entropy is equal to g(E) := H(σ(E)) = (E + 1) ln(E + 1) − E ln(E) . (8.8) Quantum Gaussian channels are those quantum channels N : D(L²(R^n)) → D(L²(R^m)) that preserve the set of quantum Gaussian states. The most important classes of such channels are the beam-splitter, the squeezer, quantum Gaussian attenuators and quantum Gaussian amplifiers.
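The first two are unitary quantum analogs of the operation of linearly mixing random variables. A two-mode beam-splitter $U^{bs}_\lambda$ of transmissivity $0 < \lambda < 1$, acting on $L^2(\mathbb{R}^2)$, is defined by the following action on the ladder operators:
$$U^{bs\,\dagger}_\lambda\, a_1\, U^{bs}_\lambda = \sqrt{\lambda}\,a_1 + \sqrt{1-\lambda}\,a_2 , \qquad U^{bs\,\dagger}_\lambda\, a_2\, U^{bs}_\lambda = -\sqrt{1-\lambda}\,a_1 + \sqrt{\lambda}\,a_2 .$$
This is a simple example of a passive Gaussian channel, in the sense that it preserves the total energy: $[U^{bs}_\lambda, N^{(2)}] = 0$. On the other hand, the two-mode squeezing $U^{sq}_\kappa$ of parameter $\kappa \ge 1$ increases the energy of the input. It is defined similarly to the beam-splitter:
$$U^{sq\,\dagger}_\kappa\, a_1\, U^{sq}_\kappa = \sqrt{\kappa}\,a_1 + \sqrt{\kappa-1}\,a_2^\dagger , \qquad U^{sq\,\dagger}_\kappa\, a_2\, U^{sq}_\kappa = \sqrt{\kappa-1}\,a_1^\dagger + \sqrt{\kappa}\,a_2 .$$
In both cases, it is standard to interpret the first factor of the tensor product as the system and the second factor as the environment. Next, for $\lambda \ge 0$, we denote by $B_\lambda : \mathcal{D}(L^2(\mathbb{R}^2)) \to \mathcal{D}(L^2(\mathbb{R}))$ the channel that applies the beam-splitter (if $\lambda \le 1$) or the squeezer (if $\lambda \ge 1$) and returns the reduced state of the system. The quantum Gaussian attenuator $\mathcal{E}_{\lambda,E} : \mathcal{D}(L^2(\mathbb{R})) \to \mathcal{D}(L^2(\mathbb{R}))$ of parameters $0 \le \lambda \le 1$ and $E \ge 0$ is the channel $\rho \mapsto B_\lambda(\rho \otimes \sigma(E))$. Similarly, the quantum Gaussian amplifier $\mathcal{A}_{\kappa,E} : \mathcal{D}(L^2(\mathbb{R})) \to \mathcal{D}(L^2(\mathbb{R}))$ of parameters $\kappa \ge 1$ and $E \ge 0$ is the channel $\rho \mapsto B_\kappa(\rho \otimes \sigma(E))$.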
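To see how these two unitaries produce the energy relations for the attenuator and the amplifier, one can track the system mode in the Heisenberg picture. The sketch below assumes the standard convention $a_1 \mapsto \sqrt{\lambda}\,a_1 + \sqrt{1-\lambda}\,a_2$ for the beam-splitter and $a_1 \mapsto \sqrt{\kappa}\,a_1 + \sqrt{\kappa-1}\,a_2^\dagger$ for the squeezer. It checks that the transformed system mode still satisfies $[a_1', a_1'^\dagger] = \mathbb{1}$ and returns the mean photon number of $B_\lambda(\rho\otimes\sigma(E))$ for an input of mean photon number $n$. The function names are hypothetical.

```python
import math

def bogoliubov_coeffs(lam):
    """Coefficients (u, v, w) of the assumed system-mode map
    a1 -> u*a1 + v*a2 + w*a2^dagger realizing B_lambda.

    lam <= 1: beam-splitter of transmissivity lam (w = 0);
    lam >= 1: two-mode squeezer of parameter lam (v = 0).
    """
    if lam < 0:
        raise ValueError("lambda must be nonnegative")
    if lam <= 1:
        return math.sqrt(lam), math.sqrt(1 - lam), 0.0
    return math.sqrt(lam), 0.0, math.sqrt(lam - 1)

def ccr_value(lam):
    """[a1', a1'^dagger] = u^2 + v^2 - w^2; equals 1 for a valid unitary."""
    u, v, w = bogoliubov_coeffs(lam)
    return u**2 + v**2 - w**2

def output_energy(lam, n_in, E):
    """Mean photon number of B_lambda(rho (x) sigma(E)) for an input rho with
    mean photon number n_in. Uses <a2^dag a2> = E and <a2 a2^dag> = E + 1 in
    sigma(E); cross terms vanish because the input is a product state and
    sigma(E) has <a2> = <a2^2> = 0."""
    u, v, w = bogoliubov_coeffs(lam)
    return u**2 * n_in + v**2 * E + w**2 * (E + 1)

print(output_energy(0.3, 4.0, 1.5), output_energy(2.0, 4.0, 1.5))
```

This reproduces $\lambda n + (1-\lambda)E$ for the attenuator and $\kappa n + (\kappa-1)(E+1)$ for the amplifier. At $\lambda = 1$ both branches reduce to the identity channel, which is why a single family $B_\lambda$, $\lambda \ge 0$, can cover both cases.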
Another important Gaussian channel considered in the literature is the Gaussian additive noise channel $\mathcal{N}_E$, $E > 0$. It is given by a convex combination of displacement operators with respect to a Gaussian probability measure:
$$\mathcal{N}_E(\rho) := \frac{1}{\pi E}\int_{\mathbb{C}} e^{-|z|^2/E}\, D(z)\,\rho\, D(z)^\dagger \, d^2 z .$$
In fact, the Gaussian additive noise channel can be seen as a large-particle-number limit of the attenuator channel (see [34]).
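The Heisenberg-picture identity $\mathcal{N}_E^\dagger(N^{(1)}) = N^{(1)} + E\,\mathbb{1}$, used in the proof of Theorem 8.8, can be checked directly. Here we assume the normalization $\mathcal{N}_E(\rho) = \frac{1}{\pi E}\int_{\mathbb{C}} e^{-|z|^2/E} D(z)\rho D(z)^\dagger\, d^2z$ with $E > 0$, and we use $D(z)^\dagger a D(z) = a + z$ for the one-mode displacement $D(z) = \exp(z a^\dagger - \bar z a)$:
$$\mathcal{N}_E^\dagger(N^{(1)}) = \frac{1}{\pi E}\int_{\mathbb{C}} e^{-|z|^2/E}\,\big(N^{(1)} + z\,a^\dagger + \bar z\,a + |z|^2\,\mathbb{1}\big)\, d^2 z = N^{(1)} + E\,\mathbb{1} .$$
This holds because the Gaussian weight is normalized, has mean zero, and satisfies $\frac{1}{\pi E}\int_{\mathbb{C}} |z|^2 e^{-|z|^2/E}\, d^2 z = \frac{2}{E}\int_0^\infty r^3 e^{-r^2/E}\,dr = E$. In particular, the additive noise channel raises the mean photon number by exactly $E$.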
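All the channels mentioned above are typical examples of gauge-covariant (also called phase-covariant) Gaussian channels [44]: a channel $\mathcal{M} : \mathcal{D}(L^2(\mathbb{R}^n)) \to \mathcal{D}(L^2(\mathbb{R}^n))$ is gauge-covariant if
$$\mathcal{M}\big(e^{-i\theta N^{(n)}}\,\rho\, e^{i\theta N^{(n)}}\big) = e^{-i\theta N^{(n)}}\,\mathcal{M}(\rho)\, e^{i\theta N^{(n)}} \qquad \text{for all } \theta \in \mathbb{R} \text{ and all } \rho ,$$
where $\theta \mapsto e^{-i\theta N^{(n)}}$ is the gauge transformation corresponding to the n-mode harmonic oscillator. In fact, every phase-covariant one-mode quantum Gaussian channel can be expressed as a quantum-limited amplifier composed with a quantum-limited attenuator, i.e., as $\mathcal{A}_{\kappa,0} \circ \mathcal{E}_{\lambda,0}$ for suitable $\lambda$ and $\kappa$.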
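The following conjecture, stated in [35], was recently proved in the one-mode case [29,28], as well as for the range of parameters for which the channels become entanglement-breaking [24], that is:
(i) for any quantum Gaussian attenuator $\mathcal{E}_{\lambda,E}$ with $E \ge \frac{\lambda}{1-\lambda}$;
(ii) for any quantum Gaussian amplifier $\mathcal{A}_{\kappa,E}$ with $E \ge \frac{1}{\kappa-1}$;
(iii) for any quantum Gaussian additive noise channel $\mathcal{N}_E$ with $E \ge 1$.

Conjecture 8.7 (Constrained minimum output entropy conjecture for phase-covariant quantum channels). For any $n \in \mathbb{N}$ and any $\rho \in \mathcal{D}(L^2(\mathbb{R}^n))$ with finite entropy, let $\sigma$ be the one-mode thermal Gaussian state with entropy $H(\rho)/n$. Then, for any $E \ge 0$, $0 \le \lambda \le 1$ and $\kappa \ge 1$:
$$H(\mathcal{E}^{\otimes n}_{\lambda,E}(\rho)) \ge H(\mathcal{E}^{\otimes n}_{\lambda,E}(\sigma^{\otimes n})) = n\,g\Big(\lambda\, g^{-1}\Big(\frac{H(\rho)}{n}\Big) + (1-\lambda)E\Big) , \qquad (8.10)$$
$$H(\mathcal{A}^{\otimes n}_{\kappa,E}(\rho)) \ge H(\mathcal{A}^{\otimes n}_{\kappa,E}(\sigma^{\otimes n})) = n\,g\Big(\kappa\, g^{-1}\Big(\frac{H(\rho)}{n}\Big) + (\kappa-1)(E+1)\Big) , \qquad (8.11)$$
$$H(\mathcal{N}^{\otimes n}_{E}(\rho)) \ge H(\mathcal{N}^{\otimes n}_{E}(\sigma^{\otimes n})) = n\,g\Big(g^{-1}\Big(\frac{H(\rho)}{n}\Big) + E\Big) . \qquad (8.12)$$

In the next result, we provide an exact expression for the contraction coefficients of these Gaussian channels and their tensor products under the condition that Conjecture 8.7 holds. More precisely, let $\sigma_j \in \mathcal{D}(L^2(\mathbb{R}))$, $j \in [n]$, be one-mode Gaussian states, let $\mathcal{G}$ be a one-mode Gaussian quantum channel, and define the contraction coefficient for the relative entropy as
$$\eta_{Re}\Big(\bigotimes_{j=1}^{n}\sigma_j, \mathcal{G}^{\otimes n}\Big) := \sup_{\substack{\rho \in \mathcal{D}(L^2(\mathbb{R}^n))\\ 0 < D(\rho\|\otimes_j\sigma_j) < \infty}} \frac{D\big(\mathcal{G}^{\otimes n}(\rho)\,\big\|\,\otimes_j \mathcal{G}(\sigma_j)\big)}{D\big(\rho\,\big\|\,\otimes_j\sigma_j\big)} . \qquad (8.13)$$
Here, we define the relative entropy following Lindblad [54,53]: given any two positive, trace-class operators $A$ and $B$, if $\{|a_i\rangle\}$, resp. $\{|b_j\rangle\}$, is a complete orthonormal set of eigenvectors of $A$ with corresponding eigenvalues $\{a_i\}$, resp. of $B$ with eigenvalues $\{b_j\}$, then
$$D(A\|B) := \sum_{i,j} |\langle a_i|b_j\rangle|^2\,\big(a_i\ln a_i - a_i\ln b_j + b_j - a_i\big),$$
where the sum is taken to be $+\infty$ if the series diverges. Furthermore, we define the energy-constrained relative entropy contraction coefficient as follows: for $p > 0$.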
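The right-hand sides of (8.10) to (8.12) only require evaluating $g$ and its inverse. The inverse has no closed form, but it is easy to compute because $g$ is strictly increasing on $[0,\infty)$. The following sketch uses bisection for $g^{-1}$ (an arbitrary choice; any monotone root finder would do). For the attenuator it uses the thermal output energy $\lambda e + (1-\lambda)E$, where $e = g^{-1}(H(\rho)/n)$. The function names are hypothetical.

```python
import math

def g(E):
    """g(E) = (E+1) ln(E+1) - E ln E: entropy (nats) of sigma(E); g(0) = 0."""
    if E <= 0:
        return 0.0
    return (E + 1) * math.log(E + 1) - E * math.log(E)

def g_inv(s, tol=1e-12):
    """Inverse of the strictly increasing map g on [0, inf), by bisection."""
    if s <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while g(hi) < s:  # g grows without bound, so doubling terminates
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conjectured_bounds(H, n, lam, kappa, E):
    """Right-hand sides of (8.10)-(8.12) for an n-mode input of entropy H."""
    e_in = g_inv(H / n)  # energy of the one-mode thermal state with entropy H/n
    return {
        "attenuator": n * g(lam * e_in + (1 - lam) * E),
        "amplifier": n * g(kappa * e_in + (kappa - 1) * (E + 1)),
        "additive": n * g(e_in + E),
    }

print(conjectured_bounds(H=2.0, n=2, lam=0.6, kappa=1.5, E=0.7))
```

As a consistency check, with $\lambda = \kappa = 1$ the attenuator and the amplifier act as the identity, so both bounds reduce to $H(\rho)$.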
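Before proving Theorem 8.8, we state and prove a couple of technical lemmas. The first one allows us to restrict the optimization (8.13) to finite-rank input states (see Lemma 21 in [56] for a classical analogue).

Lemma 8.9 (Finite-rank state characterization of $\eta_{Re}$). Let $\mathcal{T}$ be the set of finite-rank n-mode quantum states $\rho \in \mathcal{D}(L^2(\mathbb{R}^n))$ supported on $\mathcal{H}_0 := \operatorname{lin}\{|k_1\rangle \otimes \cdots \otimes |k_n\rangle : k_j \in \mathbb{N}\}$. Then for any one-mode bosonic channel $\mathcal{N}$ and any thermal Gaussian states $\sigma_1, \dots, \sigma_n$:
$$\eta_{Re}\Big(\bigotimes_{j=1}^{n}\sigma_j, \mathcal{N}^{\otimes n}\Big) = \sup_{\rho\in\mathcal{T}} \frac{D\big(\mathcal{N}^{\otimes n}(\rho)\,\big\|\,\otimes_j\mathcal{N}(\sigma_j)\big)}{D\big(\rho\,\big\|\,\otimes_j\sigma_j\big)} . \qquad (8.15)$$

Remark 8.10. Note that the optimization in (8.15) does not need to assume that $D(\rho\|\otimes_j\sigma_j) > 0$: this holds automatically, since $\otimes_j\sigma_j$ is faithful and hence cannot be equal to any finite-rank state.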
The rest of the proof relies on a diagonalization argument that was already used in Lemma 21 in [56]: suppose that $\{\rho_m\}_{m\in\mathbb{N}}$ is a sequence of quantum states that satisfies $0 < D(\rho_m\|\otimes_j\sigma_j) < \infty$ for all $m \in \mathbb{N}$ and achieves the supremum in (8. Since the supremum in (8.15) is over a smaller set than the one in (8.13), the above inequality is actually an equality, and the result follows.
Using once again the fundamental theorem of calculus and an obvious rescaling, we reduce the problem to that of proving that, for all $u \le \alpha$, the function
$$\eta \mapsto \frac{\ln\frac{\eta u + 1}{\eta u}}{\ln\frac{\eta\alpha + 1}{\eta\alpha}}$$
is increasing. By differentiating this function, it is enough to show that the function $x \mapsto (x+1)\ln\frac{x+1}{x}$ is decreasing on $\mathbb{R}_+$. One last differentiation reduces the problem to the basic inequality $\ln(1+t) \le t$ on $\mathbb{R}_+$.
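Both monotonicity claims can be sanity-checked numerically before following the differentiation argument. The sketch below evaluates $\varphi(x) = (x+1)\ln\frac{x+1}{x}$ and the ratio of logarithms on geometric grids. The grids and the values of $u$ and $\alpha$ are arbitrary choices, and such a check is no substitute for the proof.

```python
import math

def phi(x):
    """phi(x) = (x + 1) ln((x + 1) / x), claimed to be decreasing on (0, inf)."""
    return (x + 1) * math.log1p(1 / x)

def ratio(eta, u, alpha):
    """eta -> ln((eta*u + 1)/(eta*u)) / ln((eta*alpha + 1)/(eta*alpha))."""
    return math.log1p(1 / (eta * u)) / math.log1p(1 / (eta * alpha))

# Geometric grid on (0, inf): consecutive points differ by a factor 1.1.
xs = [0.01 * 1.1**k for k in range(120)]
phi_decreasing = all(phi(a) >= phi(b) for a, b in zip(xs, xs[1:]))

u, alpha = 0.4, 3.0  # any 0 < u <= alpha
etas = [0.05 * 1.1**k for k in range(100)]
ratio_increasing = all(
    ratio(a, u, alpha) <= ratio(b, u, alpha) for a, b in zip(etas, etas[1:])
)
print(phi_decreasing, ratio_increasing)
```

The link between the two checks is the sign of the derivative of the ratio: it is nonnegative exactly when $\varphi(\eta u) \ge \varphi(\eta\alpha)$, which follows from $\eta u \le \eta\alpha$ and the monotonicity of $\varphi$.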
We are now ready to prove our main result.
Proof of Theorem 8.8.
Step 1: Upper bounding $\eta_{Re}$. By Lemma 8.9, we can restrict the optimization to finite-rank input states $\rho \in \mathcal{D}(L^2(\mathbb{R}^n))$ supported on $\mathcal{H}_0 = \operatorname{lin}\{|k_1\rangle \otimes \cdots \otimes |k_n\rangle : k_j \in \mathbb{N}\}$, so that $\operatorname{Tr}(\rho N^{(n)}) < \infty$. Such states also have finite von Neumann entropy. Moreover, the energy at the output of any Gaussian quantum channel is then also finite [76]. Then given any sequence $\{\sigma(E_j)\}$ of thermal Gaussian states, and denoting by $q_j := \operatorname{Tr}\big[(\rho_j - \sigma(E_j))N^{(1)}\big]$ the difference in energy between $\rho_j = \operatorname{Tr}_{\{j\}^c}(\rho)$ and $\sigma(E_j)$, Let us now consider the attenuator channel $\mathcal{E}_{\lambda,E}$. Since $\mathcal{E}^\dagger_{\lambda,E}(N^{(1)}) = \lambda N^{(1)} + (1-\lambda)E\,\mathbb{1}$, we have Hence, for $\eta_{Re}\big(\otimes_j\sigma(E_j), \mathcal{E}^{\otimes n}_{\lambda,E}\big)$ to be upper bounded by the right-hand side of (8.14), it suffices to prove that for any $j \in [n]$: By the constrained minimum output entropy conjecture (8.10), we can further simplify the problem to that of proving, for any β := g −1 H(ρ) . (8.24) The result directly follows from (8.20) for $\alpha := E_j$ and $\nu := E$. The case of the amplifier $\mathcal{A}_{\kappa,E}$ follows exactly the same proof, since $\mathcal{A}^\dagger_{\kappa,E}(N^{(1)}) = \kappa N^{(1)} + (\kappa-1)(E+1)\,\mathbb{1}$. Now, we turn our attention to the additive noise channel $\mathcal{N}_E$: here, since $\mathcal{N}^\dagger_E(N^{(1)}) = N^{(1)} + E\,\mathbb{1}$, (8.23) has to be replaced by Therefore, as previously, by (8.12) it is enough to prove that for any $j \in [n]$, denoting by β := g −1 H(ρ) Then, invoking (8.19) for $\alpha := E_j$ and $\nu := E$ allows us to conclude.
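The energy bookkeeping in Step 1 can be made concrete through an elementary identity. Since $-\ln\sigma(E) = \ln(E+1)\,\mathbb{1} + \ln(1+1/E)\,N^{(1)}$, any one-mode state $\rho$ with finite energy and entropy satisfies $D(\rho\|\sigma(E)) = g(E) + q\ln(1+1/E) - H(\rho)$, where $q = \operatorname{Tr}[(\rho-\sigma(E))N^{(1)}]$. For a product thermal reference $\otimes_j\sigma(E_j)$, the same computation gives $\sum_j[g(E_j) + q_j\ln(1+1/E_j)] - H(\rho)$. The NumPy sketch below checks the one-mode identity on a random finite-rank state. The state, its support size and the helper names are illustrative assumptions, not part of the proof.

```python
import numpy as np

def g(E):
    """Entropy (nats) of the one-mode thermal state sigma(E), for E > 0."""
    return (E + 1) * np.log(E + 1) - E * np.log(E)

def random_finite_rank_state(K, seed=0):
    """Random density matrix supported on span{|0>, ..., |K-1>}."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(K, K)) + 1j * rng.normal(size=(K, K))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-15]  # drop numerical zeros (0 ln 0 = 0)
    return float(-np.sum(evals * np.log(evals)))

def rel_entropy_direct(rho, E):
    """D(rho || sigma(E)) = -H(rho) - Tr[rho ln sigma(E)].

    sigma(E) is diagonal in the Fock basis with log-weights
    k ln(E/(E+1)) - ln(E+1), so only the diagonal of rho enters the trace.
    """
    k = np.arange(rho.shape[0])
    log_sigma = k * np.log(E / (E + 1)) - np.log(E + 1)
    return -von_neumann_entropy(rho) - float(np.sum(np.diag(rho).real * log_sigma))

def rel_entropy_via_energy(rho, E):
    """g(E) + q ln(1 + 1/E) - H(rho), with q = <N>_rho - E."""
    k = np.arange(rho.shape[0])
    q = float(np.sum(k * np.diag(rho).real)) - E
    return g(E) + q * np.log1p(1 / E) - von_neumann_entropy(rho)

rho = random_finite_rank_state(6, seed=1)
E = 0.8
print(rel_entropy_direct(rho, E), rel_entropy_via_energy(rho, E))
```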
Step 2: $\eta_{Re} \ge \eta^{(p)}_{Re}$. This follows directly from the definitions of the contraction coefficients.
The lower bound follows after setting $E' = E_1 - \delta$ and $|z|^2 = \delta$, and letting $\delta \to 0$. Since $\operatorname{Tr}(\sigma(E')_z N^{(1)}) = E_1$ for all $\delta > 0$, the result follows. The cases of the amplifier and of the attenuator follow from exactly the same computations. The other two cases follow by the same argument.
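For completeness, the energy of the displaced thermal state used in this step can be computed directly. We assume, as the notation suggests, that $\sigma(E')_z := D(z)\,\sigma(E')\,D(z)^\dagger$ with the one-mode displacement $D(z) = \exp(z a^\dagger - \bar z a)$, so that $D(z)^\dagger a D(z) = a + z$:
$$\operatorname{Tr}\big(\sigma(E')_z N^{(1)}\big) = \operatorname{Tr}\big(\sigma(E')\,D(z)^\dagger a^\dagger a\,D(z)\big) = \operatorname{Tr}\big(\sigma(E')\,(N^{(1)} + z\,a^\dagger + \bar z\,a + |z|^2\,\mathbb{1})\big) = E' + |z|^2 .$$
The cross terms vanish because $\sigma(E')$ is diagonal in the Fock basis, so $\operatorname{Tr}(\sigma(E')a) = \operatorname{Tr}(\sigma(E')a^\dagger) = 0$. With $E' = E_1 - \delta$ and $|z|^2 = \delta$, the energy therefore equals $E_1$ for every $\delta > 0$.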

Remark 8.12.
In the case of the Gaussian attenuator channel $\mathcal{E}_{\lambda,E}$, we find that the contraction coefficient $\eta_{Re}(\sigma(E)^{\otimes n}, \mathcal{E}^{\otimes n}_{\lambda,E}) = \lambda$. This is also a consequence of the conditional entropy power inequality, as previously observed in [27,25].

Remark 8.13. In the classical setting, it is well known that the contraction coefficient $\eta(G) = 1$ for the classical additive white Gaussian noise channel, even when restricting to inputs of finite energy. In order to get nontrivial contraction, the following non-linear strong data processing inequality was proposed in [71,21]: for the channel G : where the supremum is over all joint distributions $P_{UX}$ with constrained input energy on system $X$. In [21], it was shown that such curves are always strictly less than $t$ for all $t > 0$. A similar analysis in the quantum setting will be the subject of future work.

Conclusions
In this work, we discussed the relationship between the concept of contraction coefficients and the less noisy partial order. The equivalence between their respective relative entropy and mutual information formulations gave us a new point of view on their properties and brought our knowledge of the quantum setting closer to what is currently available in the classical case. Given the central position data processing takes in quantum information theory, we expect our results to lay the groundwork for further results and for applications to many information processing tasks. Indeed, the techniques developed here have already found applications in obtaining state-of-the-art bounds on the performance of noisy quantum devices [26,82], in novel capacity bounds [38] and in the study of quantum differential privacy [40].
Nevertheless, many open questions remain. Most crucially, it is not yet clear whether the relations between the different partial orders are actually strict. For example, we know that Moreover, the fully quantum less noisy order and the regularized less noisy order sit somewhat in parallel in the hierarchy of orders; compare Figure 1. In the exact case, both have been used, e.g., to give a condition under which the quantum capacity becomes single-letter. It would have interesting applications to find a direct relationship between the two orders, that is, to establish one or both of the following: