Certifying optimality for convex quantum channel optimization problems

We identify necessary and sufficient conditions for a quantum channel to be optimal for any convex optimization problem in which the optimization is taken over the set of all quantum channels of a fixed size. Optimality conditions for convex optimization problems over the set of all quantum measurements of a given system having a fixed number of measurement outcomes are obtained as a special case. In the case of linear objective functions for measurement optimization problems, our conditions reduce to the well-known Holevo-Yuen-Kennedy-Lax measurement optimality conditions. We illustrate how our conditions can be applied to various state transformation problems having non-linear objective functions based on the fidelity, trace distance, and quantum relative entropy.


Introduction
Several problems and settings that arise in quantum information theory can be expressed as optimization problems in which a real-valued function, defined for a class of quantum channels or measurements, is either minimized or maximized. The problem of minimum error quantum state discrimination [BC09], in which a quantum state randomly selected from a known ensemble of states is to be identified with the smallest possible probability of error by means of a measurement, provides a well-known example. This problem is naturally expressed as the optimization of a real-valued linear function defined on the set of all measurements with a fixed number of outcomes. Other examples arise in the study of quantum cloning [SIGA05] and the closely related notion of quantum money [AFG + 12], where one is generally interested in knowing how well an optimally selected quantum channel can transform a single copy of a given state into multiple copies of the same state, with respect to a number of different figures of merit. Another example can be found in quantum complexity theory, in which two-message quantum interactive proof systems [JUW09] are naturally analyzed as optimization problems in which the objective function describes the probability that a given verifier accepts, and where the optimization is over all quantum channels of a fixed size, which describe the possible actions of a prover.
Concerning the optimization of linear functions defined on the set of all measurements with a fixed number of outcomes, necessary and sufficient conditions for optimality were identified by Holevo [Hol73b,Hol73a] and Yuen, Kennedy, and Lax [YKL70,YKL75]. These conditions, which are described explicitly later in this paper, are relatively easy to check; the problem of actually finding or approximating an optimal measurement, while efficiently solvable through the use of semidefinite programming [JvF02,Ip03,EMV03], is in general a more computationally involved task. These optimality conditions can be easily extended to obtain optimality conditions for real-valued linear functions defined on the set of all quantum channels transforming one quantum system to another.
We prove a generalization of these results to convex optimization problems whose objective functions are not necessarily linear. To be more precise, we consider optimization problems of the form
$$\begin{aligned}\text{minimize:}\quad & f(\Phi)\\ \text{subject to:}\quad & \Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y}),\end{aligned}\tag{1}$$
where
$$f:\mathrm{C}(\mathcal{X},\mathcal{Y})\rightarrow\mathbb{R}\cup\{\infty\}\tag{2}$$
is a convex function. Here, $\mathrm{C}(\mathcal{X},\mathcal{Y})$ denotes the set of all channels (i.e., completely positive and trace-preserving linear maps) from an input system to an output system having associated complex Euclidean spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively. A channel $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ is said to be optimal for the problem (1) if $f(\Phi)\leq f(\Psi)$ for all $\Psi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$. In this paper we do not consider the difficulty of finding or approximating an optimal channel $\Phi$ for a given function $f$; instead, we focus only on the task of verifying that a given channel $\Phi$ is indeed optimal. The optimality conditions we obtain can be easily checked for differentiable functions $f$, and can also be used to verify optimality of channels for some non-differentiable functions. We stress that our optimality conditions are not generic optimality conditions that hold for all convex optimization problems, but rather rely on a specific structure that arises when the optimization is over all quantum channels of a fixed size. There are, of course, situations in which one would prefer a method to find an optimal channel $\Phi$ for a chosen function $f$, as opposed to simply verifying the optimality of a given $\Phi$, but the task of verifying optimality nevertheless has value for multiple reasons. For instance, one might hypothesize that a particular channel $\Phi$ is optimal based upon an intuition concerning the function $f$, or upon a heuristic method, making the task of verifying optimality especially important.
The computational task of finding an optimal solution for a chosen function f might also be expensive, time-consuming, or delegated to an untrusted computer, but once this task has been performed the optimality of the solution can be verified, allowing anyone who performs the verification to trust in the optimality of the solution. Finally, there are situations in which the function f could be indeterminate in some respect, eliminating the possibility that a numerical computation could reveal an optimal solution, but potentially allowing for an optimal solution to be expressed and checked analytically. (The results of Bacon, Childs, and van Dam [BCvD05] on hidden subgroup algorithms for certain groups provide a striking example of this potential.) Our optimality conditions are applicable to convex optimization problems in which the optimization is over all measurements having a fixed number of outcomes, as opposed to being over all channels of a fixed size. This is done through a standard correspondence between measurements and quantum-to-classical channels, described in the section following this one. We observe that Holevo [Hol73b,Hol73a] also derived optimality conditions for optimizations over measurements having differentiable (but not necessarily convex) objective functions. Holevo proved that these conditions are necessary for optimality, but did not prove they are sufficient (as they are not sufficient in general). When one restricts their attention to convex objective functions, our optimality conditions are equivalent to a set of intermediate conditions identified by Holevo, but not to the final set of conditions he identified.
We provide a few examples of how our optimality conditions can be applied to interesting categories of optimization problems. As a simple warm-up, we first explain how our conditions imply the Holevo-Yuen-Kennedy-Lax conditions for the optimality of measurements for minimum error state discrimination, which are easily extended to channel optimization problems having linear objective functions. We then discuss optimization problems relating to state transformations having objective functions based on fidelity, trace distance, and quantum relative entropy.

Background and notation
This section summarizes some concepts from convex analysis and optimization theory, narrowly focused on their applications to this paper. Further information on these topics can be found in [Roc70], [BL06], [BV04], and [MN13], for instance. We assume the reader is familiar with quantum information theory, which is covered in the books [NC00], [Wil13], and [Wat18], among others. We will, however, summarize the notion of the Choi operator of a channel, clarify its basic connection to the sorts of optimization problems we consider, and discuss the correspondence between measurements and quantum-to-classical channels, as these topics are essential to an explanation of our results. It will also be helpful to begin the section by establishing some basic notation and terminology concerning linear algebra.

Linear algebra notation and terminology
When we refer to a complex Euclidean space, we mean $\mathbb{C}^n$ for some positive integer $n$, or more generally the complex vector space consisting of vectors indexed by an arbitrary finite set in place of the index set $\{1,\ldots,n\}$. The elementary unit vectors of the space $\mathbb{C}^n$ are denoted $e_1,\ldots,e_n$. Complex Euclidean spaces will be denoted by capital calligraphic letters such as $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$.
For a complex Euclidean space $\mathcal{X}$, the space of linear operators on $\mathcal{X}$ is denoted $\mathrm{L}(\mathcal{X})$, and the identity operator on $\mathcal{X}$ is denoted $\mathbb{1}_{\mathcal{X}}$. For indices $j,k\in\{1,\ldots,n\}$, the operator $E_{j,k}\in\mathrm{L}(\mathbb{C}^n)$ is defined as $E_{j,k}=e_j e_k^{*}$. Equivalently, with respect to the basis $\{e_1,\ldots,e_n\}$, the matrix representation of $E_{j,k}$ has a 1 in the $(j,k)$ entry, with all other entries 0.
The real vector space of Hermitian operators acting on a complex Euclidean space $\mathcal{X}$ is denoted $\mathrm{Herm}(\mathcal{X})$; the cone of positive semidefinite operators acting on $\mathcal{X}$ is denoted $\mathrm{Pos}(\mathcal{X})$; the set of positive definite operators acting on $\mathcal{X}$ is denoted $\mathrm{Pd}(\mathcal{X})$; and the set of density operators (i.e., positive semidefinite operators having unit trace) is denoted $\mathrm{D}(\mathcal{X})$. The Hilbert-Schmidt inner product of two Hermitian operators $X,Y\in\mathrm{Herm}(\mathcal{X})$ is given by
$$\langle X,Y\rangle=\operatorname{Tr}(XY).\tag{3}$$
For a subspace $\mathcal{V}\subseteq\mathcal{X}$ of a complex Euclidean space $\mathcal{X}$, we write $\Pi_{\mathcal{V}}\in\mathrm{Pos}(\mathcal{X})$ to denote the (orthogonal) projection operator that projects onto the subspace $\mathcal{V}$. Finally, whenever we refer to the inverse of a positive semidefinite operator $X\in\mathrm{Pos}(\mathcal{X})$, it should be understood that we are referring to the Moore-Penrose pseudoinverse of $X$ (i.e., the operator that acts as the inverse of $X$ on the image of $X$ and as zero on the kernel of $X$).

Convex functions taking real or infinite values
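Let $\mathcal{X}$ be a complex Euclidean space and let
$$f:\mathrm{Herm}(\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}\tag{4}$$
be a function mapping each Hermitian operator to either a real number or to positive infinity. The domain of $f$ is defined as
$$\operatorname{dom}(f)=\{X\in\mathrm{Herm}(\mathcal{X}):f(X)<\infty\}.$$
For any function of the form $f:\mathcal{C}\rightarrow\mathbb{R}$ defined only on a subset $\mathcal{C}\subseteq\mathrm{Herm}(\mathcal{X})$, one may naturally extend $f$ to a function of the form (4) by defining $f(X)=\infty$ whenever $X\notin\mathcal{C}$.
A function $f$ of the form (4) is proper if $\operatorname{dom}(f)\neq\varnothing$.
The indicator function of a set $\mathcal{C}\subseteq\mathrm{Herm}(\mathcal{X})$ is the function $I_{\mathcal{C}}:\mathrm{Herm}(\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}$ defined as
$$I_{\mathcal{C}}(X)=\begin{cases}0 & \text{if } X\in\mathcal{C},\\ \infty & \text{if } X\notin\mathcal{C},\end{cases}$$
for all $X\in\mathrm{Herm}(\mathcal{X})$. It is evident that $\operatorname{dom}(I_{\mathcal{C}})=\mathcal{C}$ for every set $\mathcal{C}\subseteq\mathrm{Herm}(\mathcal{X})$, and if $\mathcal{C}$ is a convex set then $I_{\mathcal{C}}$ is a convex function.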

Subdifferentials
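Let $f$ be a proper function of the form (4) and let $X\in\operatorname{dom}(f)$. The subdifferential of $f$ at $X$ is the set defined as
$$\partial f(X)=\bigl\{H\in\mathrm{Herm}(\mathcal{X}):f(Y)\geq f(X)+\langle H,Y-X\rangle\ \text{for all}\ Y\in\mathrm{Herm}(\mathcal{X})\bigr\}.$$
A key property of the subdifferential of a proper function, which follows trivially from the definition of the subdifferential, is its relation to global minima: $X\in\operatorname{dom}(f)$ is a global minimizer of $f$ if and only if $0\in\partial f(X)$. Two additional properties of subdifferentials that are relevant to this paper are the following:
1. If $f$ is convex and differentiable at $X$, then $\partial f(X)=\{\nabla f(X)\}$.
2. If $\|\cdot\|$ is any norm on $\mathrm{Herm}(\mathcal{X})$ and $f(X)=\|X\|$ for all $X\in\mathrm{Herm}(\mathcal{X})$, then
$$\partial f(X)=\bigl\{Y\in\mathrm{Herm}(\mathcal{X}):\|Y\|_{*}\leq 1,\ \langle Y,X\rangle=\|X\|\bigr\},$$
where $\|Y\|_{*}=\sup\{\langle Y,X\rangle:\|X\|\leq 1\}$ is the dual norm to $\|\cdot\|$.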
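Finally, we will make use of the following theorem, which presents a variant of the chain rule for subdifferentials. In the statement of this theorem, relint denotes the relative interior of a set, and for a linear map $\Lambda:\mathrm{Herm}(\mathcal{Y})\rightarrow\mathrm{Herm}(\mathcal{X})$, the map
$$\Lambda^{*}:\mathrm{Herm}(\mathcal{X})\rightarrow\mathrm{Herm}(\mathcal{Y})$$
denotes the adjoint map to $\Lambda$, which is the uniquely determined linear map that satisfies
$$\langle\Lambda^{*}(X),Y\rangle=\langle X,\Lambda(Y)\rangle$$
for all $X\in\mathrm{Herm}(\mathcal{X})$ and $Y\in\mathrm{Herm}(\mathcal{Y})$.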
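Theorem 1. Let $\mathcal{X}$ and $\mathcal{Y}$ be complex Euclidean spaces, let $f:\mathrm{Herm}(\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}$ be a convex function, and let $g:\mathrm{Herm}(\mathcal{Y})\rightarrow\mathrm{Herm}(\mathcal{X})$ be an affine linear map, meaning that
$$g(Y)=\Lambda(Y)+A$$
for all $Y\in\mathrm{Herm}(\mathcal{Y})$, for some choice of a linear map $\Lambda:\mathrm{Herm}(\mathcal{Y})\rightarrow\mathrm{Herm}(\mathcal{X})$ and an operator $A\in\mathrm{Herm}(\mathcal{X})$. If $\operatorname{im}(g)\cap\operatorname{relint}(\operatorname{dom}(f))\neq\varnothing$, then
$$\partial(f\circ g)(Y)=\Lambda^{*}\bigl(\partial f(g(Y))\bigr)$$
for every $Y\in\mathrm{Herm}(\mathcal{Y})$.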

Convex optimization problems
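Convex optimization problems in quantum information theory often have the following general form, for some choice of a convex function $f:\mathrm{Herm}(\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}$ and a convex set $\mathcal{C}\subseteq\mathrm{Herm}(\mathcal{X})$:
$$\begin{aligned}\text{minimize:}\quad & f(X)\\ \text{subject to:}\quad & X\in\mathcal{C}.\end{aligned}\tag{16}$$
An operator $X\in\mathcal{C}\cap\operatorname{dom}(f)$ is said to be optimal for the optimization problem (16) if $f(X)\leq f(Y)$ for all $Y\in\mathcal{C}$. As we will see below, convenient conditions for optimality can be given if it is the case that
$$\operatorname{relint}(\operatorname{dom}(f))\cap\operatorname{relint}(\mathcal{C})\neq\varnothing.\tag{17}$$
A (constrained) convex optimization problem of the form (16) can be considered as an unconstrained convex optimization problem by minimizing $f(X)+I_{\mathcal{C}}(X)$ over all Hermitian operators $X\in\mathrm{Herm}(\mathcal{X})$. The domain of the function $f+I_{\mathcal{C}}$ is given by
$$\operatorname{dom}(f+I_{\mathcal{C}})=\operatorname{dom}(f)\cap\mathcal{C}.$$
As was mentioned previously, an operator $X\in\operatorname{dom}(f)\cap\mathcal{C}$ is a global minimizer of $f(X)+I_{\mathcal{C}}(X)$ if and only if $0\in\partial(f+I_{\mathcal{C}})(X)$. If the condition in (17) holds and both $f$ and $\mathcal{C}$ are convex, a subdifferential sum rule implies that
$$\partial(f+I_{\mathcal{C}})(X)=\partial f(X)+\partial I_{\mathcal{C}}(X)$$
for all $X\in\operatorname{dom}(f)\cap\mathcal{C}$. For this reason, a characterization of the subdifferential set $\partial I_{\mathcal{C}}(X)$ for a convex set $\mathcal{C}$ can be useful in identifying necessary and sufficient conditions for optimality.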

Optimizing over Choi operators of channels
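For complex Euclidean spaces $\mathcal{X}$ and $\mathcal{Y}$, the set of completely positive, trace-preserving linear maps (i.e., channels) from $\mathrm{L}(\mathcal{X})$ to $\mathrm{L}(\mathcal{Y})$ is denoted $\mathrm{C}(\mathcal{X},\mathcal{Y})$. The Choi representation of a linear map $\Phi:\mathrm{L}(\mathcal{X})\rightarrow\mathrm{L}(\mathcal{Y})$ is the operator
$$J(\Phi)=\sum_{j,k=1}^{n}\Phi(E_{j,k})\otimes E_{j,k}\in\mathrm{L}(\mathcal{Y}\otimes\mathcal{X}),$$
assuming $\mathcal{X}=\mathbb{C}^n$. (An analogous definition is used for index sets other than $\{1,\ldots,n\}$.) Through this representation, the set of channels is isomorphic to the set
$$J(\mathrm{C}(\mathcal{X},\mathcal{Y}))=\bigl\{X\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X}):\operatorname{Tr}_{\mathcal{Y}}(X)=\mathbb{1}_{\mathcal{X}}\bigr\}.$$
This set is convex, and it is helpful to observe that its relative interior is
$$\operatorname{relint}(J(\mathrm{C}(\mathcal{X},\mathcal{Y})))=\bigl\{X\in\mathrm{Pd}(\mathcal{Y}\otimes\mathcal{X}):\operatorname{Tr}_{\mathcal{Y}}(X)=\mathbb{1}_{\mathcal{X}}\bigr\}.$$
The action of a map $\Phi$ can be recovered from its Choi representation by the relation
$$\Phi(X)=\operatorname{Tr}_{\mathcal{X}}\bigl(J(\Phi)(\mathbb{1}_{\mathcal{Y}}\otimes X^{\mathsf{T}})\bigr)$$
for all operators $X\in\mathrm{L}(\mathcal{X})$, where $X^{\mathsf{T}}$ denotes the transpose of $X$.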
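To make the Choi representation concrete, the following NumPy sketch builds $J(\Phi)=\sum_{j,k}\Phi(E_{j,k})\otimes E_{j,k}$ on $\mathcal{Y}\otimes\mathcal{X}$ from a function implementing $\Phi$, checks that $\operatorname{Tr}_{\mathcal{Y}}(J(\Phi))=\mathbb{1}_{\mathcal{X}}$ for a channel, and recovers $\Phi(X)$ through the relation stated above. It assumes dense arrays in which `np.kron(A, B)` represents $A\otimes B$ (so $\mathcal{Y}$ is the first tensor factor); the helper names and the amplitude-damping example channel are illustrative choices, not taken from the text.

```python
import numpy as np


def matrix_unit(n, j, k):
    """E_{j,k} = e_j e_k^* in L(C^n), with 0-based indices."""
    E = np.zeros((n, n), dtype=complex)
    E[j, k] = 1.0
    return E


def choi(channel, dim_x):
    """J(Phi) = sum_{j,k} Phi(E_{j,k}) (x) E_{j,k}, an operator on Y (x) X."""
    return sum(
        np.kron(channel(matrix_unit(dim_x, j, k)), matrix_unit(dim_x, j, k))
        for j in range(dim_x)
        for k in range(dim_x)
    )


def partial_trace(M, dim_a, dim_b, keep):
    """Partial trace of M on A (x) B; keep='A' traces out B, keep='B' traces out A."""
    T = M.reshape(dim_a, dim_b, dim_a, dim_b)
    if keep == "A":
        return np.einsum("ijkj->ik", T)
    return np.einsum("ijik->jk", T)


def apply_from_choi(J, X, dim_y, dim_x):
    """Recover Phi(X) = Tr_X( J(Phi) (1_Y (x) X^T) )."""
    return partial_trace(J @ np.kron(np.eye(dim_y), X.T), dim_y, dim_x, keep="A")


# Hypothetical example channel: qubit amplitude damping with parameter gamma.
gamma = 0.3
K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])


def amplitude_damping(X):
    return K0 @ X @ K0.conj().T + K1 @ X @ K1.conj().T


J = choi(amplitude_damping, 2)
# Trace preservation shows up as Tr_Y J(Phi) = 1_X.
print(np.allclose(partial_trace(J, 2, 2, keep="B"), np.eye(2)))
```

Complete positivity of the example channel can be checked in the same way, as positive semidefiniteness of `J`.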
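An optimization problem of the form
$$\begin{aligned}\text{minimize:}\quad & g(\Phi)\\ \text{subject to:}\quad & \Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y}),\end{aligned}\tag{24}$$
where
$$g:\mathrm{C}(\mathcal{X},\mathcal{Y})\rightarrow\mathbb{R}\cup\{\infty\}$$
is a given function, can equivalently be expressed as
$$\begin{aligned}\text{minimize:}\quad & f(X)\\ \text{subject to:}\quad & X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y})),\end{aligned}\tag{26}$$
where
$$f:\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}$$
is defined as $f(J(\Phi))=g(\Phi)$ for all $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$. Although the formulations (24) and (26) are equivalent, it will be convenient for us to focus primarily on the formulation (26). The results we obtain can, however, easily be adapted to the formulation (24).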

Subdifferentials of the indicator function of the set of Choi operators of channels
The following proposition provides a characterization of the subdifferential of the indicator function of J(C(X , Y )) at every point X ∈ J(C(X , Y )).
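Proposition 2. For complex Euclidean spaces $\mathcal{X}$ and $\mathcal{Y}$, and for $X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y}))$, it holds that
$$\partial I_{J(\mathrm{C}(\mathcal{X},\mathcal{Y}))}(X)=\bigl\{-Y-\mathbb{1}_{\mathcal{Y}}\otimes Z:\ Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X}),\ YX=0,\ Z\in\mathrm{Herm}(\mathcal{X})\bigr\}.$$
Proof. Define $\mathcal{K}=\{X\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X}):\operatorname{Tr}_{\mathcal{Y}}(X)=\mathbb{1}_{\mathcal{X}}\}$, so that
$$J(\mathrm{C}(\mathcal{X},\mathcal{Y}))=\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})\cap\mathcal{K},$$
and therefore
$$I_{J(\mathrm{C}(\mathcal{X},\mathcal{Y}))}=I_{\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})}+I_{\mathcal{K}}.$$
As $\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})$ and $\mathcal{K}$ are both convex and
$$\operatorname{relint}(\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X}))\cap\operatorname{relint}(\mathcal{K})\neq\varnothing$$
(the operator $\mathbb{1}_{\mathcal{Y}\otimes\mathcal{X}}/\dim(\mathcal{Y})$ lies in both sets), one has that
$$\partial I_{J(\mathrm{C}(\mathcal{X},\mathcal{Y}))}(X)=\partial I_{\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})}(X)+\partial I_{\mathcal{K}}(X)=\bigl\{-Y:\ Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X}),\ YX=0\bigr\}+\bigl\{\mathbb{1}_{\mathcal{Y}}\otimes Z:\ Z\in\mathrm{Herm}(\mathcal{X})\bigr\},$$
which implies the proposition.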

The Lagrange dual problem for channel optimization
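Consider a channel optimization problem expressed in the following form that is equivalent to (26):
$$\begin{aligned}\text{minimize:}\quad & f(X)\\ \text{subject to:}\quad & \operatorname{Tr}_{\mathcal{Y}}(X)=\mathbb{1}_{\mathcal{X}},\\ & X\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X}).\end{aligned}\tag{33}$$
One may then formulate the associated Lagrange dual problem:
$$\begin{aligned}\text{maximize:}\quad & g(Y,Z)\\ \text{subject to:}\quad & Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X}),\\ & Z\in\mathrm{Herm}(\mathcal{X}),\end{aligned}\tag{34}$$
where
$$g(Y,Z)=\inf_{X\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})}\Bigl(f(X)-\langle Y,X\rangle-\bigl\langle Z,\operatorname{Tr}_{\mathcal{Y}}(X)-\mathbb{1}_{\mathcal{X}}\bigr\rangle\Bigr)\tag{35}$$
for all $Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})$ and $Z\in\mathrm{Herm}(\mathcal{X})$. The optimal value of the Lagrange dual problem (34) is a lower bound for the optimal value of the original problem (33) (even if the function $f$ is not convex), which is a property known as weak duality. Slater's theorem implies that if $f$ is convex, and there exists a positive definite operator $X\in\mathrm{Pd}(\mathcal{Y}\otimes\mathcal{X})$ such that $\operatorname{Tr}_{\mathcal{Y}}(X)=\mathbb{1}_{\mathcal{X}}$ and $X\in\operatorname{relint}(\operatorname{dom}(f))$, then the problems (33) and (34) have the same optimal value.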
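Weak duality can be seen in a single chain of inequalities. The derivation below is a sketch assuming the dual objective $g$ is the Lagrangian infimum written in its first line; it fixes any $X$ that is feasible for the primal problem and any $Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})$ and $Z\in\mathrm{Herm}(\mathcal{X})$, and uses only $\operatorname{Tr}_{\mathcal{Y}}(X)=\mathbb{1}_{\mathcal{X}}$ and the fact that $\langle Y,X\rangle\geq 0$ for positive semidefinite $X$ and $Y$:

```latex
\begin{aligned}
g(Y,Z) &= \inf_{W\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})}
  \Bigl( f(W) - \langle Y, W\rangle
  - \bigl\langle Z, \operatorname{Tr}_{\mathcal{Y}}(W) - \mathbb{1}_{\mathcal{X}}\bigr\rangle \Bigr)\\
&\leq f(X) - \langle Y, X\rangle
  - \bigl\langle Z, \operatorname{Tr}_{\mathcal{Y}}(X) - \mathbb{1}_{\mathcal{X}}\bigr\rangle\\
&= f(X) - \langle Y, X\rangle\\
&\leq f(X).
\end{aligned}
```

Taking the supremum over dual feasible pairs $(Y,Z)$ and the infimum over feasible $X$ gives weak duality; convexity of $f$ is not used at any step.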

Quantum measurements as channels
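Optimizations over the set of all measurements having a fixed number of outcomes can be expressed as optimizations over channels, as we now explain. Consider first a measurement on a complex Euclidean space $\mathcal{X}$ that has $m$ possible measurement outcomes, and is described by measurement operators $\{P_1,\ldots,P_m\}\subseteq\mathrm{Pos}(\mathcal{X})$. Letting $\mathcal{Y}=\mathbb{C}^m$, this measurement can be represented by the channel $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ defined as
$$\Phi(X)=\sum_{k=1}^{m}\langle P_k,X\rangle\,E_{k,k}\tag{36}$$
for all $X\in\mathrm{L}(\mathcal{X})$. Any channel expressible in this form is called a quantum-to-classical channel. An equivalent condition to a channel taking the form (36) is that its Choi operator takes the form
$$J(\Phi)=\sum_{k=1}^{m}E_{k,k}\otimes P_k^{\mathsf{T}}.\tag{37}$$
Next, for each $k\in\{1,\ldots,m\}$, define a linear map $\Xi_k:\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})\rightarrow\mathrm{Herm}(\mathcal{X})$ as
$$\Xi_k(X)=\bigl((e_k^{*}\otimes\mathbb{1}_{\mathcal{X}})\,X\,(e_k\otimes\mathbb{1}_{\mathcal{X}})\bigr)^{\mathsf{T}},\tag{38}$$
and observe that one has
$$P_k=\Xi_k(J(\Phi))\tag{39}$$
for the quantum-to-classical channel $\Phi$ given by (36). In words, for a measurement represented by operators $\{P_1,\ldots,P_m\}$, the linear map $\Xi_k$ allows for the recovery of the operator $P_k$ from the Choi operator $J(\Phi)$ of the quantum-to-classical channel associated with that measurement. A function $g(P_1,\ldots,P_m)$ of these measurement operators can therefore be expressed as a function of the Choi operator $J(\Phi)$, namely
$$f(X)=g\bigl(\Xi_1(X),\ldots,\Xi_m(X)\bigr).\tag{40}$$
Note that if $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ is an arbitrary (i.e., not necessarily quantum-to-classical) channel, then the operators $P_1,\ldots,P_m$ defined by (39) will still necessarily satisfy $P_1,\ldots,P_m\in\mathrm{Pos}(\mathcal{X})$ and $P_1+\cdots+P_m=\mathbb{1}_{\mathcal{X}}$, and therefore represent a valid measurement. (In general, the same measurement is given by a continuum of channels $\Phi$, including the quantum-to-classical channel described earlier.) An optimization problem of the form
$$\begin{aligned}\text{minimize:}\quad & g(P_1,\ldots,P_m)\\ \text{subject to:}\quad & P_1,\ldots,P_m\in\mathrm{Pos}(\mathcal{X}),\\ & P_1+\cdots+P_m=\mathbb{1}_{\mathcal{X}},\end{aligned}$$
can therefore be expressed equivalently as follows:
$$\begin{aligned}\text{minimize:}\quad & f(X)\\ \text{subject to:}\quad & X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y})),\end{aligned}$$
for $f$ defined from $g$ as in (40).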
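The correspondence between a measurement and the Choi operator of its quantum-to-classical channel can be checked numerically. This sketch assumes the Choi form $\sum_k E_{k,k}\otimes P_k^{\mathsf{T}}$ and the recovery map $\Xi_k(X)=((e_k^{*}\otimes\mathbb{1}_{\mathcal{X}})X(e_k\otimes\mathbb{1}_{\mathcal{X}}))^{\mathsf{T}}$, which in matrix terms is the transpose of the $k$-th diagonal block; the function names and the example measurement are hypothetical. A measurement with complex entries is used so that the transpose actually matters.

```python
import numpy as np


def qc_choi(povm):
    """Choi operator sum_k E_{k,k} (x) P_k^T of the quantum-to-classical channel."""
    m = len(povm)
    return sum(np.kron(np.diag(np.eye(m)[k]), P.T) for k, P in enumerate(povm))


def xi(J, k, n):
    """Xi_k(J): the transposed k-th diagonal n x n block of J (Y is the first factor)."""
    return J[k * n:(k + 1) * n, k * n:(k + 1) * n].T


# Hypothetical two-outcome qubit measurement with complex entries.
psi = np.array([1.0, 1.0j]) / np.sqrt(2)
P0 = np.outer(psi, psi.conj())
povm = [P0, np.eye(2) - P0]
J = qc_choi(povm)
recovered = [xi(J, k, 2) for k in range(len(povm))]

# For a channel that is not quantum-to-classical the blocks still form a measurement,
# e.g. the identity channel, whose Choi operator is sum_{j,k} E_{j,k} (x) E_{j,k}.
omega = np.eye(2).reshape(-1)
J_id = np.outer(omega, omega)
from_identity = [xi(J_id, k, 2) for k in range(2)]
```

Here `recovered` reproduces `povm`, and `from_identity` is the computational-basis measurement, illustrating that distinct channels can yield the same measurement.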

Optimality conditions for convex channel optimization
In this section, we present our main general result regarding optimality conditions for convex optimization problems over quantum channels. As suggested in the previous section, it is convenient to associate quantum channels with their Choi representations, and to consider optimization problems of the form
$$\begin{aligned}\text{minimize:}\quad & f(X)\\ \text{subject to:}\quad & X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y}))\end{aligned}\tag{43}$$
for various choices of convex functions
$$f:\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}.$$
Optimality conditions for such problems can be translated to optimality conditions for problems of the form (1), as will be illustrated in the section following this one.
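Theorem 3. Let $\mathcal{X}$ and $\mathcal{Y}$ be complex Euclidean spaces, let $f:\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}$ be a convex function satisfying
$$\operatorname{relint}(\operatorname{dom}(f))\cap\operatorname{relint}(J(\mathrm{C}(\mathcal{X},\mathcal{Y})))\neq\varnothing,$$
and let $X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y}))$ be the Choi representation of a channel such that $X\in\operatorname{dom}(f)$. The following statements are equivalent:
1. The operator $X$ is optimal for the optimization problem (43).
2. There exists an operator $H\in\partial f(X)$ satisfying
$$\operatorname{Tr}_{\mathcal{Y}}(HX)\in\mathrm{Herm}(\mathcal{X})\quad\text{and}\quad H\geq\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(HX).\tag{46}$$
Proof. An operator $X$ is optimal for the problem (43) if and only if
$$0\in\partial f(X)+\partial I_{J(\mathrm{C}(\mathcal{X},\mathcal{Y}))}(X).$$
By the characterization of the subdifferential of the indicator function $I_{J(\mathrm{C}(\mathcal{X},\mathcal{Y}))}$ given by Proposition 2, it follows that $X$ is optimal for the optimization problem (43) if and only if there exist operators $Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})$, $Z\in\mathrm{Herm}(\mathcal{X})$, and $H\in\partial f(X)$ satisfying
$$YX=0\quad\text{and}\quad H-Y-\mathbb{1}_{\mathcal{Y}}\otimes Z=0.\tag{48}$$
Assume first that statement 2 holds, and define $Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})$ and $Z\in\mathrm{Herm}(\mathcal{X})$ as
$$Z=\operatorname{Tr}_{\mathcal{Y}}(HX)\quad\text{and}\quad Y=H-\mathbb{1}_{\mathcal{Y}}\otimes Z.$$
As $X$ and $Y$ are both positive semidefinite operators and
$$\langle Y,X\rangle=\langle H,X\rangle-\bigl\langle Z,\operatorname{Tr}_{\mathcal{Y}}(X)\bigr\rangle=\operatorname{Tr}(HX)-\operatorname{Tr}(Z)=0,$$
it follows that $YX=0$. The conditions in (48) are therefore satisfied, implying that statement 1 holds. Now assume that statement 1 holds, so that there exist operators $Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{X})$, $Z\in\mathrm{Herm}(\mathcal{X})$, and $H\in\partial f(X)$ such that $YX=0$ and $H-Y-\mathbb{1}_{\mathcal{Y}}\otimes Z=0$. It follows that $HX=(\mathbb{1}_{\mathcal{Y}}\otimes Z)X$ and therefore
$$\operatorname{Tr}_{\mathcal{Y}}(HX)=\operatorname{Tr}_{\mathcal{Y}}\bigl((\mathbb{1}_{\mathcal{Y}}\otimes Z)X\bigr)=Z\operatorname{Tr}_{\mathcal{Y}}(X)=Z,$$
which is a Hermitian operator. Moreover, one has $H-\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(HX)=Y$, which is positive semidefinite by assumption. The conditions in (46) are therefore satisfied, which implies that statement 2 holds, completing the proof.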
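Statement 2 of Theorem 3 is straightforward to test numerically once a candidate $H\in\partial f(X)$ is available. The sketch below is one way to do so, assuming dense NumPy arrays on $\mathcal{Y}\otimes\mathcal{X}$ with $\mathcal{Y}$ as the first tensor factor; the tolerance `tol` and the function names are hypothetical choices, and finding a suitable $H$ (for instance $\nabla f(X)$ when $f$ is differentiable) is left to the caller.

```python
import numpy as np


def partial_trace_y(M, dim_y, dim_x):
    """Tr_Y(M) for an operator M on Y (x) X, with Y the first tensor factor."""
    return np.einsum("ijik->jk", M.reshape(dim_y, dim_x, dim_y, dim_x))


def certify(H, X, dim_y, dim_x, tol=1e-9):
    """Test statement 2 of Theorem 3 for a candidate subgradient H at X.

    Returns True if Tr_Y(HX) is Hermitian and H - 1_Y (x) Tr_Y(HX) is positive
    semidefinite, both up to the tolerance tol.
    """
    Z = partial_trace_y(H @ X, dim_y, dim_x)
    if not np.allclose(Z, Z.conj().T, atol=tol):
        return False
    Z = (Z + Z.conj().T) / 2  # discard rounding noise in the anti-Hermitian part
    gap = H - np.kron(np.eye(dim_y), Z)
    return bool(np.linalg.eigvalsh((gap + gap.conj().T) / 2).min() >= -tol)


# Toy linear objective f(X) = <H, X> with dim(X) = 1: channels are then just density
# operators on Y, and a minimizer is the projector onto a minimum eigenvector of H.
H = np.diag([3.0, 1.0, 2.0])
X_opt = np.diag([0.0, 1.0, 0.0])
print(certify(H, X_opt, 3, 1))            # True
print(certify(H, np.eye(3) / 3, 3, 1))    # False
```

In the second call $\operatorname{Tr}_{\mathcal{Y}}(HX)=2$, and $H-2\cdot\mathbb{1}$ has the negative eigenvalue $-1$, so the maximally mixed operator is correctly rejected.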
To make use of Theorem 3, one requires that the relative interior of the domain of the objective function contains at least one point in the relative interior of the set of channels. This requirement is precisely Slater's condition for this optimization problem, which guarantees strong duality with the corresponding dual problem (34). This regularity condition is automatically satisfied if f is continuous at at least one point in the set of Choi representations of channels. In particular, if f is differentiable at X then one may take the operator H in Theorem 3 to be H = ∇ f (X).
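Corollary 4. Let $f:\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})\rightarrow\mathbb{R}\cup\{\infty\}$ be a convex function, let $X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y}))$ be the Choi representation of a channel, and assume that $f$ is differentiable at $X$. The operator $X$ is an optimal solution to the optimization problem (43) if and only if
$$\operatorname{Tr}_{\mathcal{Y}}(\nabla f(X)X)\in\mathrm{Herm}(\mathcal{X})\quad\text{and}\quad\nabla f(X)\geq\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(\nabla f(X)X).$$
Proof. As $f$ is differentiable at $X$, one has that $\partial f(X)=\{\nabla f(X)\}$. Furthermore, $f$ must be finite in some neighborhood of $X$, and every neighborhood of $X$ contains points of $\operatorname{relint}(J(\mathrm{C}(\mathcal{X},\mathcal{Y})))$, so $\operatorname{relint}(\operatorname{dom}(f))\cap\operatorname{relint}(J(\mathrm{C}(\mathcal{X},\mathcal{Y})))\neq\varnothing$ must hold. The result now follows from Theorem 3.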
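As an illustration of Corollary 4, consider the hypothetical objective $f(X)=\operatorname{Tr}(X^2)$, which is convex and differentiable with $\nabla f(X)=2X$. The sketch below applies the two conditions of the corollary to the completely depolarizing channel, whose Choi operator is $\mathbb{1}_{\mathcal{Y}\otimes\mathcal{X}}/\dim(\mathcal{Y})$, and to the identity channel. The example objective and the helper names are illustrative assumptions, not taken from the text.

```python
import numpy as np


def partial_trace_y(M, dim_y, dim_x):
    """Tr_Y(M) for an operator M on Y (x) X, with Y the first tensor factor."""
    return np.einsum("ijik->jk", M.reshape(dim_y, dim_x, dim_y, dim_x))


def corollary4_holds(grad_f, X, dim_y, dim_x, tol=1e-9):
    """Check Tr_Y(grad f(X) X) is Hermitian and grad f(X) >= 1_Y (x) Tr_Y(grad f(X) X)."""
    G = grad_f(X)
    Z = partial_trace_y(G @ X, dim_y, dim_x)
    if not np.allclose(Z, Z.conj().T, atol=tol):
        return False
    gap = G - np.kron(np.eye(dim_y), (Z + Z.conj().T) / 2)
    return bool(np.linalg.eigvalsh((gap + gap.conj().T) / 2).min() >= -tol)


def grad_purity(X):
    """Gradient of f(X) = Tr(X^2) with respect to the Hilbert-Schmidt inner product."""
    return 2 * X


d = 2
J_depolarizing = np.eye(d * d) / d      # Choi operator of X -> Tr(X) 1_Y / d
omega = np.eye(d).reshape(-1)           # the vector sum_j e_j (x) e_j
J_identity = np.outer(omega, omega)     # Choi operator of the identity channel

print(corollary4_holds(grad_purity, J_depolarizing, d, d))  # True
print(corollary4_holds(grad_purity, J_identity, d, d))      # False
```

The outcome agrees with a direct argument: every Choi operator of a channel has trace $\dim(\mathcal{X})$, so $\operatorname{Tr}(X^2)\geq\dim(\mathcal{X})/\dim(\mathcal{Y})$, with equality exactly at $X=\mathbb{1}_{\mathcal{Y}\otimes\mathcal{X}}/\dim(\mathcal{Y})$.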
We remark that the optimality conditions represented by this corollary appear to be a special feature of problems involving an optimization over channels. In essence, in differentiable convex quantum channel optimization problems, the optimal dual solutions $Z=\operatorname{Tr}_{\mathcal{Y}}(\nabla f(X)X)$ and $Y=\nabla f(X)-\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(\nabla f(X)X)$ are uniquely determined by an optimal primal solution $X$.
It is natural to ask if there is an approximate version of the implication that statement 2 implies statement 1 in Theorem 3. That is, if the requirements (46) hold approximately for some $H\in\partial f(X)$, then is $X$ necessarily close to being optimal? The following theorem demonstrates that this is indeed the case, which allows for bounds to be placed on the optimal value of such optimization problems when the conditions in (46) cannot be verified exactly.
Theorem 5. Let f : Herm(Y ⊗ X ) → R ∪ {∞} be a function, let Φ ∈ C(X , Y ) be a channel such that J(Φ) ∈ dom( f ), and let H ∈ ∂ f (J(Φ)). It is the case that where Proof. Because the spectral norm is a continuous function, a compactness argument implies that there must exist a positive semidefinite operator P ∈ Pos(Y ⊗ X ) for which We will let such a P be fixed for the remainder of the proof. Define a Hermitian operator and define so that $\|A\|_\infty=\varepsilon$, and therefore It is the case that and therefore Y ∈ Pos(Y ⊗ X ). Now consider the Lagrange dual problem (34) associated with the minimization of f over all channels in C(X , Y ). As Z is Hermitian and Y is positive semidefinite, (Y, Z) is a dual feasible solution to this dual problem. As H ∈ ∂ f (J(Φ)), one has that for every X ∈ Herm(Y ⊗ X ). Therefore, when the dual objective function g is evaluated at (Y, Z), we find that The theorem follows by weak duality.
We note that the previous theorem does not require the function f to be convex, or for the associated optimization problem to satisfy the conditions of Slater's theorem. However, having knowledge of the subdifferential ∂ f (X) at an operator X requires knowledge of the global behavior of the function f when f is not convex, so the usefulness of Theorem 5 may be limited when f is not convex.

Applications
In this section we apply the optimality conditions given by Theorem 3 to a few categories of examples, including the simple case of channel optimization problems having linear objective functions and three variants of problems involving quantum state transformations. In these examples we will make use of various facts concerning differentiation for functions mapping between spaces of Hermitian operators; a short discussion of this topic, along with a lemma that is needed for one of the examples, can be found in the section following this one.

Linear objective functions
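We will begin by considering the simple case in which the objective function $f$ in Theorem 3 is linear. In this situation, the optimization problem (43) may be rewritten as
$$\begin{aligned}\text{minimize:}\quad & \langle H,X\rangle\\ \text{subject to:}\quad & X\in J(\mathrm{C}(\mathcal{X},\mathcal{Y}))\end{aligned}\tag{63}$$
for some choice of a Hermitian operator $H\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})$. The subdifferential of the function $f(X)=\langle H,X\rangle$ is given by $\partial f(X)=\{H\}$ for all $X\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{X})$. By Theorem 3 it follows that the Choi operator $X=J(\Phi)$ of a channel $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ is optimal for the problem (63) if and only if
$$\operatorname{Tr}_{\mathcal{Y}}(HX)\in\mathrm{Herm}(\mathcal{X})\quad\text{and}\quad H\geq\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(HX).\tag{64}$$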
We observe that this optimality criterion can alternatively be obtained through semidefinite programming duality and complementary slackness. (See, for instance, Exercise 3.5 of [Wat18], observing that the inequality is reversed in that exercise because the optimization problem is expressed as a maximization rather than a minimization.) The problem of minimum error state discrimination, which was mentioned in the introduction, is a special case in which the function f in the optimization problem (43) is linear. Consider an ensemble of states, which represents the random selection of one of a finite number of quantum states according to a given probability distribution. Formally speaking, an ensemble is described by a collection {ρ 1 , . . . , ρ n } ⊆ D(X ) of density operators together with a probability vector p = (p 1 , . . . , p n ). The problem of minimum-error state discrimination seeks a measurement on the system represented by the space X that identifies, with the minimum possible probability of error, a state chosen randomly according to this ensemble.
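For a given choice of a measurement, represented by operators $\{P_1,\ldots,P_n\}\subseteq\mathrm{Pos}(\mathcal{X})$, the error probability incurred by this measurement can be expressed as
$$1-\sum_{k=1}^{n}p_k\langle P_k,\rho_k\rangle=\sum_{k=1}^{n}\langle P_k,\rho-p_k\rho_k\rangle,\tag{65}$$
where
$$\rho=\sum_{k=1}^{n}p_k\rho_k$$
is the average state of the ensemble. A minimization of the error probability (65) over all measurements $\{P_1,\ldots,P_n\}$ can be represented as an optimization of the form (63) by letting $\mathcal{Y}=\mathbb{C}^n$ and setting
$$H=\sum_{k=1}^{n}E_{k,k}\otimes(\rho-p_k\rho_k)^{\mathsf{T}}.$$
The measurement described by $\{P_1,\ldots,P_n\}$, which may alternatively be represented by a quantum-to-classical channel whose Choi operator is
$$X=\sum_{k=1}^{n}E_{k,k}\otimes P_k^{\mathsf{T}},$$
is optimal for the minimization of the error probability (65) if and only if the conditions (64) hold. These conditions can be simplified by first calculating that
$$\operatorname{Tr}_{\mathcal{Y}}(HX)=\sum_{k=1}^{n}(\rho-p_k\rho_k)^{\mathsf{T}}P_k^{\mathsf{T}}=\Bigl(\rho-\sum_{k=1}^{n}p_kP_k\rho_k\Bigr)^{\mathsf{T}},$$
then observing that this operator is Hermitian if and only if the operator $\sum_{k=1}^{n}p_kP_k\rho_k$ is Hermitian, and finally noting that the inequality $H\geq\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(HX)$ is equivalent to
$$(\rho-p_j\rho_j)^{\mathsf{T}}\geq\Bigl(\rho-\sum_{k=1}^{n}p_kP_k\rho_k\Bigr)^{\mathsf{T}}\quad(\text{for all } j=1,\ldots,n),$$
which may alternatively be expressed as
$$\sum_{k=1}^{n}p_kP_k\rho_k\geq p_j\rho_j\quad(\text{for all } j=1,\ldots,n).$$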
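A direct numerical check of these two conditions, Hermiticity of $\sum_k p_kP_k\rho_k$ and domination of every $p_j\rho_j$, might look as follows. The function names, the tolerance, and the two-state example are assumptions made for illustration; the optimal candidate is the standard Helstrom measurement, which projects onto the positive eigenspace of $p_1\rho_1-p_2\rho_2$.

```python
import numpy as np


def hykl_optimal(probs, states, povm, tol=1e-9):
    """Return True if sum_k p_k P_k rho_k is Hermitian and dominates every p_j rho_j."""
    M = sum(p * P @ rho for p, P, rho in zip(probs, povm, states))
    if not np.allclose(M, M.conj().T, atol=tol):
        return False
    M = (M + M.conj().T) / 2  # remove rounding noise before the eigenvalue test
    return all(np.linalg.eigvalsh(M - p * rho).min() >= -tol for p, rho in zip(probs, states))


def helstrom_povm(p1, rho1, p2, rho2):
    """Projector onto the positive eigenspace of p1 rho1 - p2 rho2, and its complement."""
    w, V = np.linalg.eigh(p1 * rho1 - p2 * rho2)
    pos = V[:, w > 0]
    P1 = pos @ pos.conj().T
    return [P1, np.eye(len(w)) - P1]


ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(plus, plus)]
probs = [0.5, 0.5]

print(hykl_optimal(probs, states, helstrom_povm(0.5, states[0], 0.5, states[1])))  # True
print(hykl_optimal(probs, states, [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]))      # False
```

The computational-basis measurement fails already at the Hermiticity test: for these states $\sum_k p_kP_k\rho_k$ has entries $0$ and $1/4$ in mirrored off-diagonal positions.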

State transformation
The second category of channel optimization problems we consider involves channels that transform one state into another in such a way that the distance to a target state is minimized. One may consider any number of specific measures of distance in such a problem; we will analyze measures based on the fidelity, trace distance, and quantum relative entropy. For each of these measures, we will consider the situation in which two bipartite states, $\rho\in\mathrm{D}(\mathcal{X}\otimes\mathcal{Z})$ and $\sigma\in\mathrm{D}(\mathcal{Y}\otimes\mathcal{Z})$, for complex Euclidean spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, are given. The optimization problem to be considered is to minimize the distance (or maximize the similarity) between the states $(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)$ and $\sigma$, with respect to the measure under consideration, over all possible channels $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$.
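When analyzing optimality conditions for these problems, it will be helpful to refer to the evaluation map corresponding to the operator $\rho\in\mathrm{D}(\mathcal{X}\otimes\mathcal{Z})$. This is the uniquely determined completely positive map $\Psi_\rho\in\mathrm{CP}(\mathcal{X},\mathcal{Z})$ that satisfies
$$(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)=(\mathbb{1}_{\mathrm{L}(\mathcal{Y})}\otimes\Psi_\rho)(J(\Phi))$$
for every complex Euclidean space $\mathcal{Y}$ and every channel $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$. The relationship between the map $\Psi_\rho$ and the state $\rho$ is closely related to the Choi representation of maps: assuming for the moment that $\mathcal{X}=\mathbb{C}^n$, one has that
$$\rho=\sum_{j,k=1}^{n}E_{j,k}\otimes\Psi_\rho(E_{j,k}).$$
That is, up to swapping the tensor factors corresponding to $\mathcal{X}$ and $\mathcal{Z}$, the state $\rho$ is the Choi operator of the map $\Psi_\rho$.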
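The defining property of the evaluation map can be checked numerically. This sketch assumes the identity $(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)=(\mathbb{1}_{\mathrm{L}(\mathcal{Y})}\otimes\Psi_\rho)(J(\Phi))$ together with the explicit formula $\Psi_\rho(X)=\operatorname{Tr}_{\mathcal{X}}\bigl(\rho(X^{\mathsf{T}}\otimes\mathbb{1}_{\mathcal{Z}})\bigr)$, which is not stated in the text but is consistent with $\rho$ being, up to swapping tensor factors, the Choi operator of $\Psi_\rho$ (applied to $E_{j,k}$ it returns the $(j,k)$ block of $\rho$). The random state, the dephasing channel, and all helper names are hypothetical.

```python
import numpy as np


def matrix_unit(n, j, k):
    E = np.zeros((n, n), dtype=complex)
    E[j, k] = 1.0
    return E


def apply_to_first(phi, M, d1, d2):
    """(phi (x) 1)(M) for M on C^d1 (x) C^d2, applied block by block."""
    T = M.reshape(d1, d2, d1, d2)
    return sum(np.kron(phi(T[:, a, :, b]), matrix_unit(d2, a, b))
               for a in range(d2) for b in range(d2))


def apply_to_second(phi, M, d1, d2):
    """(1 (x) phi)(M) for M on C^d1 (x) C^d2, applied block by block."""
    T = M.reshape(d1, d2, d1, d2)
    return sum(np.kron(matrix_unit(d1, a, b), phi(T[a, :, b, :]))
               for a in range(d1) for b in range(d1))


def evaluation_map(rho, dim_x, dim_z):
    """Psi_rho(X) = Tr_X( rho (X^T (x) 1_Z) ) for rho on X (x) Z."""
    def _psi(X):
        T = (rho @ np.kron(X.T, np.eye(dim_z))).reshape(dim_x, dim_z, dim_x, dim_z)
        return np.einsum("ijik->jk", T)
    return _psi


def choi(channel, n):
    return sum(np.kron(channel(matrix_unit(n, j, k)), matrix_unit(n, j, k))
               for j in range(n) for k in range(n))


def dephase(X):
    """Completely dephasing qubit channel (an illustrative choice of Phi)."""
    return np.diag(np.diag(X))


rng = np.random.default_rng(1)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho /= np.trace(rho)                                   # random state on X (x) Z

psi = evaluation_map(rho, 2, 2)
lhs = apply_to_first(dephase, rho, 2, 2)               # (Phi (x) 1_{L(Z)})(rho)
rhs = apply_to_second(psi, choi(dephase, 2), 2, 2)     # (1_{L(Y)} (x) Psi_rho)(J(Phi))
print(np.allclose(lhs, rhs))
```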

Objective functions based on the fidelity
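For positive semidefinite operators $P,Q\in\mathrm{Pos}(\mathcal{X})$, one defines the fidelity between $P$ and $Q$ as $F(P,Q)=\bigl\|\sqrt{P}\sqrt{Q}\bigr\|_1$. The first variant of the optimal state transformation problem we will consider is as follows:
$$\begin{aligned}\text{maximize:}\quad & F\bigl(\sigma,(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)\bigr)\\ \text{subject to:}\quad & \Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y}).\end{aligned}\tag{76}$$
The fidelity function is jointly concave, and therefore is concave in each of its arguments, from which it follows that this problem is a convex optimization problem.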
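The fidelity as defined here (the root fidelity $\|\sqrt{P}\sqrt{Q}\|_1$, not its square) can be evaluated as the sum of the singular values of $\sqrt{P}\sqrt{Q}$. The sketch below assumes dense NumPy arrays and computes matrix square roots through an eigendecomposition; the function names are illustrative.

```python
import numpy as np


def psd_sqrt(P):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh((P + P.conj().T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


def fidelity(P, Q):
    """F(P, Q) = || sqrt(P) sqrt(Q) ||_1, computed as a sum of singular values."""
    return float(np.linalg.svd(psd_sqrt(P) @ psd_sqrt(Q), compute_uv=False).sum())


ket0, ket1 = np.eye(2)
plus = (ket0 + ket1) / np.sqrt(2)
print(fidelity(np.outer(ket0, ket0), np.outer(plus, plus)))  # |<0|+>| = 1/sqrt(2)
```

For pure states the value reduces to the absolute overlap of the state vectors, and $F(\rho,\rho)=\operatorname{Tr}(\rho)=1$ for any density operator.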
The following theorem establishes optimality conditions for a channel $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ in the optimization problem (76) under the assumption that the operator $\operatorname{Tr}_{\mathcal{X}}(\rho)$ is positive definite. We note that the theorem statement does not actually require $\rho$ and $\sigma$ to have unit trace (they can be arbitrary positive semidefinite operators), but we nevertheless use the letters $\rho$ and $\sigma$ to make the connection to the optimization problem (76) clear.
Theorem 6. Let ρ ∈ Pos(X ⊗ Z ) and σ ∈ Pos(Y ⊗ Z ) be positive semidefinite operators, for complex Euclidean spaces X , Y, and Z, and assume that Tr X (ρ) is a positive definite operator. A channel Φ ∈ C(X , Y ) is optimal for the optimization problem (76) if and only if the following two conditions are met:

The operator
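(As per the convention mentioned in Section 2, the inverse in (77) refers to the Moore-Penrose pseudoinverse in case $\sigma$ does not have full rank.)
Remark 7. The theorem assumes that $\operatorname{Tr}_{\mathcal{X}}(\rho)$ has full rank, as this assumption allows for a cleaner theorem statement. We note in particular that this assumption is equivalent to the condition that the image of $\Psi_\rho$, regarded as a linear map from $\mathrm{Herm}(\mathcal{X})$ to $\mathrm{Herm}(\mathcal{Z})$, contains a positive definite operator (indeed, $\Psi_\rho(\mathbb{1}_{\mathcal{X}})=\operatorname{Tr}_{\mathcal{X}}(\rho)$). It is, however, straightforward to apply the theorem to a situation in which $\operatorname{Tr}_{\mathcal{X}}(\rho)$ does not have full rank. Specifically, for an arbitrary choice of $\rho$ and $\sigma$, one may take $B\in\mathrm{L}(\mathcal{V},\mathcal{Z})$, where $\mathcal{V}$ is a complex Euclidean space of dimension $\operatorname{rank}(\operatorname{Tr}_{\mathcal{X}}(\rho))$, to be an isometry for which $BB^{*}$ is the projection onto the image of $\operatorname{Tr}_{\mathcal{X}}(\rho)$, and then observe that by replacing $\rho$ and $\sigma$ with $(\mathbb{1}_{\mathcal{X}}\otimes B^{*})\rho(\mathbb{1}_{\mathcal{X}}\otimes B)$ and $(\mathbb{1}_{\mathcal{Y}}\otimes B^{*})\sigma(\mathbb{1}_{\mathcal{Y}}\otimes B)$, respectively, an equivalent problem is obtained that satisfies the assumptions of the theorem.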
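Proof of Theorem 6. Let $r=\operatorname{rank}(\sigma)$, let $\mathcal{W}=\mathbb{C}^r$, and let $A\in\mathrm{L}(\mathcal{W},\mathcal{Y}\otimes\mathcal{Z})$ be any isometry for which $AA^{*}=\Pi_{\operatorname{im}(\sigma)}$ (the projection onto the image of $\sigma$). Define a function for all $Y\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{Z})$, and observe that $g(Y)=-F(\sigma,Y)$ for every $Y\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{Z})$. For a given operator $Y\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{Z})$ satisfying $A^{*}YA\in\mathrm{Pos}(\mathcal{W})$, there are two cases for the subdifferential $\partial g(Y)$.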
Case 1: A * YA is positive definite. In this case g is differentiable at Y, and which follows from Lemma 11, stated and proved in the section following this one. It therefore follows that Case 2: A * YA is not positive definite. In this case, ∂g(Y) = ∅, which also follows from Lemma 11.
Next, define and observe that for every channel Φ ∈ C(X , Y ). When Λ is regarded as a linear map from Herm(Y ⊗ X ) to Herm(Y ⊗ Z ), one has im(Λ) = Herm(Y ⊗ Z ) by virtue of the assumption that Tr X (ρ) is positive definite. It follows that and therefore, by Theorem 1, for every X ∈ Herm(Y ⊗ X ).
The theorem now follows from Theorem 3. In greater detail, if $\Phi$ is optimal for the problem (76), then it must hold that there exists an operator $H\in\partial(g\circ\Lambda)(J(\Phi))$ such that $\operatorname{Tr}_{\mathcal{Y}}(H J(\Phi))\in\mathrm{Herm}(\mathcal{X})$ and $H\geq\mathbb{1}_{\mathcal{Y}}\otimes\operatorname{Tr}_{\mathcal{Y}}(H J(\Phi))$.
Case 1 described above must therefore hold when $Y=\Lambda(J(\Phi))$, for otherwise the subdifferential $\partial(g\circ\Lambda)(J(\Phi))$ would be empty. It follows that from which the second condition in the statement of the theorem follows. Conversely, if the two conditions in the statement of the theorem hold, then it follows that $H\in\partial(g\circ\Lambda)(J(\Phi))$, and moreover and one concludes by Theorem 3 that $\Phi$ is optimal for the optimization problem (76).
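The second variant of the optimal state transformation problem we consider is based on the trace distance:
$$\begin{aligned}\text{minimize:}\quad & \bigl\|\sigma-(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)\bigr\|_1\\ \text{subject to:}\quad & \Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y}).\end{aligned}\tag{97}$$
The objective function of this optimization problem is convex (with respect to $\Phi$), so we may use Theorem 3 to obtain optimality conditions for this problem. In this case, however, the optimality conditions given by this theorem may not be efficiently checkable. We do obtain an efficiently checkable condition that is sufficient for optimality, and we conjecture that this condition is also necessary for optimality.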
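Theorem 8. Let $\rho\in\mathrm{Pos}(\mathcal{X}\otimes\mathcal{Z})$ and $\sigma\in\mathrm{Pos}(\mathcal{Y}\otimes\mathcal{Z})$ be positive semidefinite operators, for complex Euclidean spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, and let $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ be a channel. The channel $\Phi$ is optimal for the state transformation problem in (97) if and only if there exists an operator $Y\in\mathrm{Herm}(\mathcal{Y}\otimes\mathcal{Z})$ with $\|Y\|_\infty=1$ such that the following conditions are satisfied: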

The operator
The proof of Theorem 8 follows directly from Theorem 3 and the rules of subdifferentiation presented in Section 2. Indeed, operators of the form in (98) are precisely the elements of the subdifferential of the objective function in (97) at $J(\Phi)$. As a generalization, one can replace the trace norm $\|\cdot\|_1$ in the statement of Theorem 8 with any other norm on operators, and replace $\|\cdot\|_\infty$ with the corresponding dual norm.
In the event that the operator $\sigma-(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)$ arising in Theorem 8 has no zero eigenvalues, there is a unique choice of $Y$ for which the first condition of the theorem holds. Specifically, if
$$\sigma-(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)=\sum_{k}\lambda_k\,u_ku_k^{*}$$
is a spectral decomposition where each $\lambda_k$ is nonzero, then the unique operator $Y$ satisfying condition 1 in Theorem 8 is given by
$$Y=\sum_{k}\operatorname{sign}(\lambda_k)\,u_ku_k^{*}.\tag{101}$$
In this case it is sufficient for the second condition to be checked for this unique choice of $Y$, yielding an efficiently checkable optimality criterion. However, if $\sigma-(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)$ has one or more zero eigenvalues, then the first condition holds for a continuum of choices of $Y$, and from Theorem 8 we conclude only that the optimality of $\Phi$ is equivalent to the existence of at least one such choice of $Y$ for which the second condition in the theorem holds. It is reasonable, though, to view the operator $Y$ defined by (101), where now it is to be understood that $\operatorname{sign}(0)=0$, as a natural selection of an operator through which optimality may be verified. We conjecture, based on numerical evidence, that this choice yields an efficiently checkable necessary and sufficient optimality condition.
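The operator $Y=\sum_k\operatorname{sign}(\lambda_k)u_ku_k^{*}$ described above is easy to form numerically. The sketch below builds it with the convention $\operatorname{sign}(0)=0$, treating eigenvalues within an assumed tolerance `tol` as zero, and confirms the two properties that place it in the subdifferential of the trace norm at $\Delta$, namely $\langle Y,\Delta\rangle=\|\Delta\|_1$ and $\|Y\|_\infty\leq 1$. Here $\Delta$ stands in for $\sigma-(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)$; the example operator and the function name are hypothetical.

```python
import numpy as np


def sign_operator(Delta, tol=1e-12):
    """Y = sum_k sign(lambda_k) u_k u_k^* for Hermitian Delta, with sign(0) = 0.

    Eigenvalues with absolute value at most tol are treated as zero.
    """
    w, U = np.linalg.eigh((Delta + Delta.conj().T) / 2)
    s = np.where(w > tol, 1.0, np.where(w < -tol, -1.0, 0.0))
    return (U * s) @ U.conj().T


# Hypothetical difference of two pure qubit states, playing the role of Delta.
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
Delta = np.outer(ket0, ket0) - np.outer(plus, plus)
Y = sign_operator(Delta)
trace_norm = np.abs(np.linalg.eigvalsh(Delta)).sum()
print(np.isclose(np.trace(Y @ Delta).real, trace_norm))  # <Y, Delta> = ||Delta||_1
```

When $\Delta$ has zero eigenvalues, the block of $Y$ on the kernel could be any contraction; this function fixes it to zero, matching the $\operatorname{sign}(0)=0$ choice in the text.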
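Conjecture 9. Let $\rho\in\mathrm{D}(\mathcal{X}\otimes\mathcal{Z})$ and $\sigma\in\mathrm{D}(\mathcal{Y}\otimes\mathcal{Z})$ be density operators, for complex Euclidean spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, and let $\Phi\in\mathrm{C}(\mathcal{X},\mathcal{Y})$ be a channel. Let
$$\sigma-(\Phi\otimes\mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)=\sum_{k}\lambda_k\,u_ku_k^{*}$$
be a spectral decomposition, and define
$$Y=\sum_{k}\operatorname{sign}(\lambda_k)\,u_ku_k^{*},$$
where $\operatorname{sign}(\alpha)=1$ and $\operatorname{sign}(-\alpha)=-1$ for all $\alpha>0$ and $\operatorname{sign}(0)=0$. The channel $\Phi$ is optimal for the state transformation problem in (97) if and only if the operator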

Relative entropy
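Finally, we consider a variant of the optimal state transformation problem based on the quantum relative entropy. For positive semidefinite operators $P, Q \in \mathrm{Pos}(\mathcal{X})$, the quantum relative entropy of $P$ with respect to $Q$ is defined as
$$D(P \,\|\, Q) = \begin{cases} \operatorname{Tr}(P \log P) - \operatorname{Tr}(P \log Q) & \text{if } \operatorname{im}(P) \subseteq \operatorname{im}(Q),\\ \infty & \text{otherwise.} \end{cases}$$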
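As a concrete reading of this definition, here is a small NumPy sketch of $D(P\,\|\,Q)$ for positive semidefinite matrices. The function name `relative_entropy`, the numerical tolerance, and the use of the natural logarithm are our assumptions (the definition does not fix a base; changing it only rescales $D$). The support condition $\operatorname{im}(P) \subseteq \operatorname{im}(Q)$ is tested by checking that $P$ vanishes on the kernel of $Q$, which is equivalent for positive semidefinite $P$.

```python
import numpy as np

def relative_entropy(P, Q, tol=1e-12):
    # D(P||Q) = Tr(P log P) - Tr(P log Q) if im(P) ⊆ im(Q), and +inf otherwise.
    # Natural logarithm assumed.
    p_vals = np.linalg.eigvalsh(P)
    q_vals, q_vecs = np.linalg.eigh(Q)
    supp = q_vals > tol
    # For P >= 0, im(P) ⊆ im(Q) iff v^* P v = 0 for every v in ker(Q).
    V0 = q_vecs[:, ~supp]
    if V0.size and np.linalg.norm(V0.conj().T @ P @ V0) > tol:
        return np.inf
    # Tr(P log P): sum of p log p over positive eigenvalues (0 log 0 = 0).
    p = p_vals[p_vals > tol]
    tr_plogp = float(np.sum(p * np.log(p)))
    # Tr(P log Q) restricted to im(Q): sum_k log(q_k) <v_k, P v_k>.
    V = q_vecs[:, supp]
    weights = np.real(np.einsum("ik,ij,jk->k", V.conj(), P, V))
    tr_plogq = float(np.sum(np.log(q_vals[supp]) * weights))
    return tr_plogp - tr_plogq

print(relative_entropy(np.diag([1.0, 0.0]), np.eye(2) / 2))   # log 2
print(relative_entropy(np.eye(2) / 2, np.diag([1.0, 0.0])))   # inf
```

The second call returns infinity because the maximally mixed state is not supported inside the image of $\operatorname{diag}(1, 0)$.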
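The specific variant of the problem to be considered is
$$\begin{aligned} &\text{minimize} && D\bigl(\sigma \,\|\, (\Phi \otimes \mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)\bigr)\\ &\text{subject to} && \Phi \in \mathrm{C}(\mathcal{X}, \mathcal{Y}). \end{aligned} \tag{106}$$
The relative entropy is jointly convex, and hence convex in its second argument; since the mapping $\Phi \mapsto (\Phi \otimes \mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)$ is linear, the problem above is a convex optimization problem.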
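Theorem 10. Let $\rho \in \mathrm{Pos}(\mathcal{X} \otimes \mathcal{Z})$ and $\sigma \in \mathrm{Pos}(\mathcal{Y} \otimes \mathcal{Z})$ be positive semidefinite operators, for complex Euclidean spaces $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, and assume that $\operatorname{Tr}_{\mathcal{X}}(\rho)$ is a positive definite operator. A channel $\Phi \in \mathrm{C}(\mathcal{X}, \mathcal{Y})$ is optimal for the optimization problem (106) if and only if the following two conditions are met: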

The operator
Here, $\Pi$ denotes the projection onto the image of $\sigma$, and $D\log(P)$ denotes the derivative of the logarithm function at the operator $P$ (as described in (156) at the end of Section 5).
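Proof. If the first condition does not hold for a given channel $\Phi$, then the objective function in (106) takes an infinite value. However, by the assumption that $\operatorname{Tr}_{\mathcal{X}}(\rho)$ is positive definite, the completely depolarizing channel $\Omega(X) = \operatorname{Tr}(X)\, \mathbb{1}_{\mathcal{Y}} / \dim(\mathcal{Y})$ satisfies $(\Omega \otimes \mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho) = \mathbb{1}_{\mathcal{Y}} / \dim(\mathcal{Y}) \otimes \operatorname{Tr}_{\mathcal{X}}(\rho)$, which is positive definite, so this channel yields a finite value for the same objective function, implying that $\Phi$ is not optimal. If $\Phi$ is optimal, the first condition must therefore hold. It remains to prove that if $\Phi$ satisfies the first condition, then $\Phi$ is optimal if and only if the second condition holds. Let $r$ be the rank of $\sigma$, let $\mathcal{W} = \mathbb{C}^r$, and let $A \in \mathrm{L}(\mathcal{W}, \mathcal{Y} \otimes \mathcal{Z})$ be an isometry that satisfies $AA^{*} = \Pi_{\operatorname{im}(\sigma)}$. Define a linear map $\Xi : \mathrm{Herm}(\mathcal{Y} \otimes \mathcal{X}) \to \mathrm{Herm}(\mathcal{W})$ as for all $X \in \mathrm{Herm}(\mathcal{Y} \otimes \mathcal{X})$. For every channel $\Phi \in \mathrm{C}(\mathcal{X}, \mathcal{Y})$ it is the case that With this observation in mind, define a function $f : \mathrm{Herm}(\mathcal{Y} \otimes \mathcal{X}) \to \mathbb{R} \cup \{\infty\}$ as so that a given channel $\Phi \in \mathrm{C}(\mathcal{X}, \mathcal{Y})$ is optimal for the problem (106) if and only if $J(\Phi)$ is optimal for the problem
$$\begin{aligned} &\text{minimize} && f(X)\\ &\text{subject to} && X \in J(\mathrm{C}(\mathcal{X}, \mathcal{Y})). \end{aligned}$$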
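The function $f$ is differentiable at every operator $X \in \mathrm{Herm}(\mathcal{Y} \otimes \mathcal{X})$ in its domain, with its gradient being
$$\nabla f(X) = -\,\Xi^{*}\bigl(D\log(\Xi(X))(A^{*}\sigma A)\bigr).$$
As $f$ is differentiable at every point in its domain, it follows that
$$\partial f(J(\Phi)) = \bigl\{\nabla f(J(\Phi))\bigr\}$$
for every $\Phi \in \mathrm{C}(\mathcal{X}, \mathcal{Y})$ for which $\operatorname{im}(\sigma) \subseteq \operatorname{im}\bigl((\Phi \otimes \mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho)\bigr)$. For a given channel $\Phi$ satisfying this inclusion, it therefore follows from Theorem 3 that $\Phi$ is optimal if and only if the operator $\nabla f(J(\Phi))$ satisfies the optimality condition of Theorem 3, which is precisely the second condition in the statement of Theorem 10.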
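The finiteness step at the start of the proof only needs some channel whose output on $\rho$ has full support. One natural choice, assumed here for illustration, is the completely depolarizing channel $\Omega(X) = \operatorname{Tr}(X)\,\mathbb{1}_{\mathcal{Y}}/\dim(\mathcal{Y})$, for which $(\Omega \otimes \mathbb{1}_{\mathrm{L}(\mathcal{Z})})(\rho) = (\mathbb{1}_{\mathcal{Y}}/\dim(\mathcal{Y})) \otimes \operatorname{Tr}_{\mathcal{X}}(\rho)$. The NumPy sketch below (dimensions and helper names are hypothetical) applies $\Omega \otimes \mathbb{1}$ by taking the partial trace over $\mathcal{X}$ and confirms that the result is positive definite whenever $\operatorname{Tr}_{\mathcal{X}}(\rho)$ is, so that $\operatorname{im}(\sigma) \subseteq \operatorname{im}((\Omega \otimes \mathbb{1})(\rho))$ holds for every $\sigma$.

```python
import numpy as np

dX, dY, dZ = 2, 3, 2                       # hypothetical dimensions
rng = np.random.default_rng(6)
G = rng.standard_normal((dX * dZ, dX * dZ)) + 1j * rng.standard_normal((dX * dZ, dX * dZ))
rho = G @ G.conj().T                       # random element of Pos(X ⊗ Z), X factor first

def partial_trace_X(rho, dX, dZ):
    # Tr_X(rho) for rho on X ⊗ Z: view indices as (x, z, x', z') and sum over x = x'.
    return np.einsum("ijik->jk", rho.reshape(dX, dZ, dX, dZ))

def depolarize_on_X(rho, dX, dY, dZ):
    # (Omega ⊗ 1_L(Z))(rho) = (1_Y / dY) ⊗ Tr_X(rho), an operator on Y ⊗ Z.
    return np.kron(np.eye(dY) / dY, partial_trace_X(rho, dX, dZ))

rho_Z = partial_trace_X(rho, dX, dZ)
out = depolarize_on_X(rho, dX, dY, dZ)
print(np.linalg.eigvalsh(rho_Z).min() > 0, np.linalg.eigvalsh(out).min() > 0)
```

The eigenvalues of the output are those of $\operatorname{Tr}_{\mathcal{X}}(\rho)$ divided by $\dim(\mathcal{Y})$, each repeated $\dim(\mathcal{Y})$ times, which is why positive definiteness transfers.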
Note that the relative entropy can be approximated through the use of semidefinite programming [FF18,FSP18], so the optimization problem (106) can be efficiently approximated on a computer.

Gradients and subdifferentials of functions on matrices
Results in the previous section have required the computation of gradients and subdifferentials for various functions mapping Hermitian operators to the real numbers. In this section we provide details on these computations.

Definitions and basic results
It is appropriate to begin with some basic definitions. Throughout this discussion, X , Y, and Z are arbitrary complex Euclidean spaces.
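Suppose that $f : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{Y})$ is a partial function, meaning that it may be defined only on some subset of inputs $X \in \mathrm{Herm}(\mathcal{X})$. The function $f$ is (Fréchet) differentiable at $X \in \mathrm{Herm}(\mathcal{X})$ if it is continuous in a neighborhood of $X$ and there exists a linear map $\Phi : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{Y})$ for which the equation
$$\lim_{Z \to 0} \frac{\|f(X + Z) - f(X) - \Phi(Z)\|}{\|Z\|} = 0$$
is satisfied. If such a map exists, it must be unique, and we denote it by $Df(X)$. Whenever $f$ is differentiable at $X$, it must be the case that
$$Df(X)(Z) = \lim_{t \to 0} \frac{f(X + tZ) - f(X)}{t}$$
for all choices of $Z \in \mathrm{Herm}(\mathcal{X})$. In the special case that $\mathcal{Y} = \mathbb{C}$, which is equivalent to $f$ taking the form $f : \mathrm{Herm}(\mathcal{X}) \to \mathbb{R}$, one has that
$$Df(X)(Z) = \langle \nabla f(X), Z \rangle$$
for all $Z \in \mathrm{Herm}(\mathcal{X})$, assuming $f$ is differentiable at $X$. The chain rule for differentiation states that if $f : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{Y})$ and $g : \mathrm{Herm}(\mathcal{Y}) \to \mathrm{Herm}(\mathcal{Z})$, $f$ is differentiable at $X$, and $g$ is differentiable at $Y = f(X)$, then
$$D(g \circ f)(X)(Z) = Dg(Y)\bigl(Df(X)(Z)\bigr)$$
for all $Z \in \mathrm{Herm}(\mathcal{X})$.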
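The identity $Df(X)(Z) = \langle \nabla f(X), Z \rangle$ suggests a simple numerical sanity check for any claimed gradient: compare a finite-difference directional derivative with the inner product. The sketch below, assuming NumPy and using the illustrative function $f(X) = \operatorname{Tr}(X^3)$, whose gradient is $3X^2$, does exactly that; the helper names are hypothetical.

```python
import numpy as np

def random_herm(n, rng):
    # Random Hermitian matrix (G + G^*)/2 with Gaussian complex entries.
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

def hs_inner(A, B):
    # Hilbert-Schmidt inner product <A, B> = Tr(A^* B), real for Hermitian A, B.
    return np.trace(A.conj().T @ B).real

def directional_derivative(f, X, Z, t=1e-6):
    # Central-difference approximation to lim_{t -> 0} (f(X + tZ) - f(X)) / t.
    return (f(X + t * Z) - f(X - t * Z)) / (2 * t)

f = lambda X: np.trace(X @ X @ X).real   # illustrative f(X) = Tr(X^3)
grad_f = lambda X: 3 * X @ X             # its gradient, 3 X^2

rng = np.random.default_rng(1)
X, Z = random_herm(3, rng), random_herm(3, rng)
approx = directional_derivative(f, X, Z)
exact = hs_inner(grad_f(X), Z)
print(approx, exact)
```

A central difference is used because its truncation error is of order $t^2$ rather than $t$.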

Affine linear functions
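A function $f : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{Y})$ is affine linear if there exists a linear map $\Phi : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{Y})$ and an operator $Y \in \mathrm{Herm}(\mathcal{Y})$ such that
$$f(X) = \Phi(X) + Y$$
for all $X \in \mathrm{Herm}(\mathcal{X})$. Every such function is differentiable at every $X \in \mathrm{Herm}(\mathcal{X})$, with its derivative given by $Df(X) = \Phi$.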
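For an arbitrary function $g : \mathrm{Herm}(\mathcal{Y}) \to \mathrm{Herm}(\mathcal{Z})$, one therefore finds that
$$D(g \circ f)(X) = Dg(f(X)) \circ \Phi,$$
provided that $g$ is differentiable at $f(X)$, and if $g$ takes the form $g : \mathrm{Herm}(\mathcal{Y}) \to \mathbb{R}$, then it is the case that $\nabla(g \circ f)(X) = \Phi^{*}\bigl(\nabla g(f(X))\bigr)$.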
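The rule $\nabla(g \circ f)(X) = \Phi^{*}(\nabla g(f(X)))$ can be checked in the same way. In this NumPy sketch, the affine map is taken to be $f(X) = AXA^{*} + B$ for an arbitrary complex matrix $A$ and Hermitian $B$, whose linear part $\Phi(X) = AXA^{*}$ has adjoint $\Phi^{*}(Y) = A^{*}YA$, and $g(Y) = \operatorname{Tr}(Y^2)$ has gradient $2Y$; these particular choices are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2

def random_herm(k):
    G = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return (G + G.conj().T) / 2

A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = random_herm(m)

Phi = lambda X: A @ X @ A.conj().T       # linear part: Herm(C^n) -> Herm(C^m)
Phi_adj = lambda Y: A.conj().T @ Y @ A   # adjoint: <Phi(X), Y> = <X, Phi_adj(Y)>
f = lambda X: Phi(X) + B                 # affine linear map
g = lambda Y: np.trace(Y @ Y).real       # g(Y) = Tr(Y^2)
grad_g = lambda Y: 2 * Y                 # gradient of g

X, Z = random_herm(n), random_herm(n)
chain = Phi_adj(grad_g(f(X)))            # Phi^*(grad g(f(X)))
t = 1e-6
fd = (g(f(X + t * Z)) - g(f(X - t * Z))) / (2 * t)
print(fd, np.trace(chain @ Z).real)
```

Since $g \circ f$ is quadratic here, the central difference agrees with the chain-rule value up to rounding error.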

Real-valued functions extended to Hermitian operators
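If $f : \mathbb{R} \to \mathbb{R}$ is a function, then it may be extended to a function of the form $g : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{X})$ in a standard way: for any choice of $X \in \mathrm{Herm}(\mathcal{X})$, one considers a spectral decomposition
$$X = \sum_{k=1}^{n} \lambda_k x_k x_k^{*} \tag{135}$$
of $X$, where $n = \dim(\mathcal{X})$ and $\{x_1, \dots, x_n\}$ is an orthonormal basis, then defines
$$g(X) = \sum_{k=1}^{n} f(\lambda_k)\, x_k x_k^{*}.$$
(It is typical that this extended function is given the same name as the original function on the real numbers, but for the sake of clarity we have introduced a distinct name for the extended function.) Naturally, if $f$ is defined only on a subset of $\mathbb{R}$, then $g$ is defined for all $X$ whose eigenvalues are contained in the domain of $f$. The function $g$ is differentiable at every Hermitian operator whose eigenvalues are points at which $f$ is differentiable. The derivative of $g$ can be described explicitly by first defining a function
$$h(\alpha, \beta) = \begin{cases} \dfrac{f(\alpha) - f(\beta)}{\alpha - \beta} & \text{if } \alpha \neq \beta,\\ f'(\alpha) & \text{if } \alpha = \beta \end{cases}$$
for every pair of points $(\alpha, \beta)$ for which $f$ is differentiable at both $\alpha$ and $\beta$. (The function $h$ is sometimes called the first divided difference of $f$, although this terminology is sometimes limited to the case that $\alpha \neq \beta$.) In terms of this function, the derivative of $g$ at an operator $X$ having a spectral decomposition (135) is
$$Dg(X)(Z) = \sum_{j=1}^{n} \sum_{k=1}^{n} h(\lambda_j, \lambda_k)\, x_j x_j^{*} Z x_k x_k^{*} \tag{138}$$
for every $Z \in \mathrm{Herm}(\mathcal{X})$.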
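Formula (138) translates directly into code: in the eigenbasis of $X$, the derivative is an entrywise (Hadamard) product of the matrix of divided differences with $Z$ expressed in that basis. The sketch below assumes NumPy, takes $f = \exp$ purely as an example, and compares the result with a central finite difference; the function names are illustrative.

```python
import numpy as np

def divided_difference(f, df, a, b, tol=1e-12):
    # h(a, b) = (f(a) - f(b)) / (a - b) if a != b, and f'(a) if a == b.
    return df(a) if abs(a - b) < tol else (f(a) - f(b)) / (a - b)

def matrix_function(f, X):
    # g(X) = sum_k f(lambda_k) x_k x_k^* for Hermitian X.
    lam, U = np.linalg.eigh(X)
    return (U * f(lam)) @ U.conj().T

def frechet_derivative(f, df, X, Z):
    # Dg(X)(Z) = sum_{j,k} h(lambda_j, lambda_k) x_j x_j^* Z x_k x_k^*,
    # i.e. U (H ∘ (U^* Z U)) U^* with ∘ the entrywise product.
    lam, U = np.linalg.eigh(X)
    H = np.array([[divided_difference(f, df, a, b) for b in lam] for a in lam])
    return U @ (H * (U.conj().T @ Z @ U)) @ U.conj().T

rng = np.random.default_rng(3)
G1 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
G2 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X, Z = (G1 + G1.conj().T) / 2, (G2 + G2.conj().T) / 2
t = 1e-6
fd = (matrix_function(np.exp, X + t * Z) - matrix_function(np.exp, X - t * Z)) / (2 * t)
D = frechet_derivative(np.exp, np.exp, X, Z)
print(np.max(np.abs(D - fd)))
```

When all eigenvalues coincide (for instance $X = \mathbb{1}$), every divided difference equals $f'$ at that point, and the derivative reduces to $f'(1)\,Z$.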

Gradients of functions involving the fidelity
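Let $\mathcal{Y}$ be a complex Euclidean space, and consider the function $f : \mathrm{Herm}(\mathcal{Y}) \to \mathbb{R} \cup \{\infty\}$ defined as
$$f(Y) = \begin{cases} -\operatorname{Tr}\bigl(\sqrt{Y}\bigr) & \text{if } Y \in \mathrm{Pos}(\mathcal{Y}),\\ \infty & \text{otherwise.} \end{cases} \tag{139}$$
This function is differentiable at every positive definite operator $Y \in \mathrm{Pd}(\mathcal{Y})$, with its gradient being
$$\nabla f(Y) = -\frac{1}{2}\, Y^{-1/2}. \tag{140}$$
One way to verify this expression is to first consider the function $g(Y) = \sqrt{Y}$, defined for every positive semidefinite operator $Y \in \mathrm{Pos}(\mathcal{Y})$, and to use the formula (138) to conclude that
$$Dg(Y)(Z) = \sum_{j=1}^{m} \sum_{k=1}^{m} \frac{1}{\sqrt{\lambda_j} + \sqrt{\lambda_k}}\, y_j y_j^{*} Z y_k y_k^{*},$$
provided that $Y$ is positive definite and has spectral decomposition
$$Y = \sum_{k=1}^{m} \lambda_k y_k y_k^{*}.$$
The equation (140) follows from the chain rule: on $\mathrm{Pd}(\mathcal{Y})$ one has $f = -\operatorname{Tr} \circ g$, and $\operatorname{Tr}\bigl(Dg(Y)(Z)\bigr) = \sum_{k} \frac{1}{2\sqrt{\lambda_k}}\, y_k^{*} Z y_k = \bigl\langle \tfrac{1}{2} Y^{-1/2}, Z \bigr\rangle$.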
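A quick numerical check of the gradient $-\tfrac{1}{2} Y^{-1/2}$, under our reading that $f(Y) = -\operatorname{Tr}(\sqrt{Y})$ on positive semidefinite operators, is sketched below in NumPy. The helper names are hypothetical, and the test point is a randomly generated positive definite $Y \geq \mathbb{1}$, so that small perturbations stay inside $\mathrm{Pd}(\mathcal{Y})$.

```python
import numpy as np

def pd_power(Y, p):
    # Y^p for positive definite Hermitian Y via its spectral decomposition.
    lam, U = np.linalg.eigh(Y)
    return (U * lam ** p) @ U.conj().T

f = lambda Y: -np.trace(pd_power(Y, 0.5)).real   # f(Y) = -Tr(sqrt(Y)) on Pd(Y)
grad_f = lambda Y: -0.5 * pd_power(Y, -0.5)       # claimed gradient

rng = np.random.default_rng(7)
G = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Y = G @ G.conj().T + np.eye(3)                    # positive definite, Y >= 1
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Z = (H + H.conj().T) / 2                          # Hermitian direction
t = 1e-6
fd = (f(Y + t * Z) - f(Y - t * Z)) / (2 * t)
print(fd, np.trace(grad_f(Y) @ Z).real)
```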
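Lemma 11. Let $\mathcal{X}$ be a complex Euclidean space, let $P \in \mathrm{Pd}(\mathcal{X})$ be a positive definite operator, and define a function $g : \mathrm{Herm}(\mathcal{X}) \to \mathbb{R} \cup \{\infty\}$ as
$$g(X) = \begin{cases} -\mathrm{F}(P, X) & \text{if } X \in \mathrm{Pos}(\mathcal{X}),\\ \infty & \text{otherwise.} \end{cases}$$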
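For every $X \in \mathrm{Pd}(\mathcal{X})$, the function $g$ is differentiable at $X$, and
$$\nabla g(X) = -\frac{1}{2} \sqrt{P} \bigl(\sqrt{P} X \sqrt{P}\bigr)^{-1/2} \sqrt{P}.$$
For every operator $X \in \mathrm{Pos}(\mathcal{X})$ that is not positive definite, it is the case that $\partial g(X) = \varnothing$.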
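Proof. Define a linear map $\Lambda : \mathrm{Herm}(\mathcal{X}) \to \mathrm{Herm}(\mathcal{X})$ as
$$\Lambda(X) = \sqrt{P} X \sqrt{P}$$
for all $X \in \mathrm{Herm}(\mathcal{X})$. It is the case that $g = f \circ \Lambda$, where $f$ is as defined in (139) above (with $\mathcal{Y} = \mathcal{X}$), because $\mathrm{F}(P, X) = \operatorname{Tr}\sqrt{\sqrt{P} X \sqrt{P}}$ and, as $P$ is positive definite, $\Lambda(X)$ is positive semidefinite if and only if $X$ is. By the chain rule for differentiation, and since $\Lambda^{*} = \Lambda$, one has
$$\nabla g(X) = \Lambda^{*}\bigl(\nabla f(\Lambda(X))\bigr) = -\frac{1}{2} \sqrt{P} \bigl(\sqrt{P} X \sqrt{P}\bigr)^{-1/2} \sqrt{P},$$
provided that $\sqrt{P} X \sqrt{P} \in \mathrm{Pd}(\mathcal{X})$, which is equivalent to $X \in \mathrm{Pd}(\mathcal{X})$. Now suppose that $X \in \mathrm{Pos}(\mathcal{X})$ is not positive definite, and let $\Delta$ be the projection onto the kernel of $\sqrt{P} X \sqrt{P}$, which is nonzero by the assumption that $X$ is not positive definite. Consider the operator
$$Y = X + \lambda P^{-1/2} \Delta P^{-1/2} \tag{148}$$
for an arbitrary choice of $\lambda > 0$. Because $\Delta$ annihilates $\sqrt{P} X \sqrt{P}$ from both sides, one has $\sqrt{P} Y \sqrt{P} = \sqrt{P} X \sqrt{P} + \lambda \Delta$ and $\sqrt{\sqrt{P} Y \sqrt{P}} = \sqrt{\sqrt{P} X \sqrt{P}} + \sqrt{\lambda}\, \Delta$, so that
$$g(Y) - g(X) = \operatorname{Tr}\sqrt{\sqrt{P} X \sqrt{P}} - \operatorname{Tr}\sqrt{\sqrt{P} Y \sqrt{P}} = -\sqrt{\lambda}\, \operatorname{Tr}(\Delta).$$
Thus, if there were to exist an element $Z \in \partial g(X)$, one would have
$$g(Y) \geq g(X) + \langle Z, Y - X \rangle$$
for all $Y \in \operatorname{dom}(g)$, including the operator (148) for every $\lambda > 0$. It would then follow that
$$\lambda \bigl\langle Z, P^{-1/2} \Delta P^{-1/2} \bigr\rangle \leq -\sqrt{\lambda}\, \operatorname{Tr}(\Delta),$$
or equivalently
$$\bigl\langle Z, P^{-1/2} \Delta P^{-1/2} \bigr\rangle \leq -\frac{\operatorname{Tr}(\Delta)}{\sqrt{\lambda}}$$
for every $\lambda > 0$, which is impossible given that the left-hand side is a finite value independent of $\lambda$ and the right-hand side approaches $-\infty$ as $\lambda$ approaches 0.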
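Both parts of Lemma 11 can be observed numerically. The NumPy sketch below (helper names hypothetical) first compares the gradient formula $-\tfrac{1}{2}\sqrt{P}(\sqrt{P}X\sqrt{P})^{-1/2}\sqrt{P}$ with a finite difference at a random positive definite $X$. It then takes the singular operator $X = \operatorname{diag}(1, 0)$ with $P = \operatorname{diag}(2, 1)$ and evaluates $g(Y) - g(X)$ along the direction $P^{-1/2}\Delta P^{-1/2}$ used in the proof, where it equals $-\sqrt{\lambda}\operatorname{Tr}(\Delta)$; dividing by $\lambda$ gives difference quotients that diverge to $-\infty$ as $\lambda \to 0$.

```python
import numpy as np

def mpow(M, p):
    # M^p via the spectral decomposition of the Hermitian part of M; tiny negative
    # eigenvalues from rounding are clipped to zero. Assumes M is positive
    # semidefinite (positive definite when p < 0).
    M = (M + M.conj().T) / 2
    lam, U = np.linalg.eigh(M)
    return (U * np.clip(lam, 0.0, None) ** p) @ U.conj().T

def fidelity(P, X):
    # F(P, X) = Tr sqrt(sqrt(P) X sqrt(P))
    sP = mpow(P, 0.5)
    return np.trace(mpow(sP @ X @ sP, 0.5)).real

def grad_g(P, X):
    # Gradient of g(X) = -F(P, X) at positive definite X, as in Lemma 11.
    sP = mpow(P, 0.5)
    return -0.5 * sP @ mpow(sP @ X @ sP, -0.5) @ sP

# Part 1: gradient check at a random positive definite X.
rng = np.random.default_rng(4)
def rand_pd(n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + np.eye(n)
P1, X1 = rand_pd(3), rand_pd(3)
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Z = (H + H.conj().T) / 2
t = 1e-6
fd = (-fidelity(P1, X1 + t * Z) + fidelity(P1, X1 - t * Z)) / (2 * t)
print(fd, np.trace(grad_g(P1, X1) @ Z).real)

# Part 2: singular X and the perturbation direction from the proof.
P = np.diag([2.0, 1.0])
X = np.diag([1.0, 0.0])                  # positive semidefinite, not definite
lam_M, U_M = np.linalg.eigh(mpow(P, 0.5) @ X @ mpow(P, 0.5))
K = U_M[:, lam_M < 1e-12]
Delta = K @ K.conj().T                   # projection onto ker(sqrt(P) X sqrt(P))
E = mpow(P, -0.5) @ Delta @ mpow(P, -0.5)
for mu in [1e-2, 1e-4, 1e-6]:
    diff = fidelity(P, X) - fidelity(P, X + mu * E)   # g(Y) - g(X)
    print(mu, diff / mu)                 # equals -1/sqrt(mu) here, since Tr(Delta) = 1
```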

Gradients of functions involving the quantum relative entropy
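Let $\mathcal{Y}$ be a complex Euclidean space, let $P \in \mathrm{Pd}(\mathcal{Y})$ be a positive definite operator, and consider the function $f : \mathrm{Herm}(\mathcal{Y}) \to \mathbb{R} \cup \{\infty\}$ defined as
$$f(Y) = \begin{cases} \operatorname{Tr}(P \log P) - \operatorname{Tr}(P \log Y) & \text{if } Y \in \mathrm{Pd}(\mathcal{Y}),\\ \infty & \text{otherwise.} \end{cases}$$
This function is differentiable at every positive definite operator $Y \in \mathrm{Pd}(\mathcal{Y})$, with its gradient being $\nabla f(Y) = -D\log(Y)(P)$, where $D\log(Y)$ is the derivative of the logarithm function at $Y$. If
$$Y = \sum_{k=1}^{m} \lambda_k y_k y_k^{*}$$
is the spectral decomposition of $Y$, then by means of the expression (138) this gradient can be described explicitly as
$$\nabla f(Y) = -\sum_{j=1}^{m} \sum_{k=1}^{m} h_{\log}(\lambda_j, \lambda_k)\, y_j y_j^{*} P y_k y_k^{*}, \qquad h_{\log}(\alpha, \beta) = \begin{cases} \dfrac{\log \alpha - \log \beta}{\alpha - \beta} & \text{if } \alpha \neq \beta,\\ 1/\alpha & \text{if } \alpha = \beta, \end{cases}$$
where $h_{\log}$ is the first divided difference of the logarithm.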
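The explicit expression for $-D\log(Y)(P)$ can be implemented with the divided differences of the logarithm. The following NumPy sketch (function names hypothetical) builds it in the eigenbasis of $Y$ and compares it against a finite-difference derivative of $Y \mapsto -\operatorname{Tr}(P \log Y)$, which differs from the relative entropy $D(P\,\|\,Y)$ only by the constant $\operatorname{Tr}(P \log P)$ and therefore has the same gradient.

```python
import numpy as np

def log_herm(Y):
    # Matrix logarithm of a positive definite Hermitian matrix.
    lam, U = np.linalg.eigh(Y)
    return (U * np.log(lam)) @ U.conj().T

def grad_neg_tr_plog(P, Y, tol=1e-12):
    # -D log(Y)(P) = -sum_{j,k} h(l_j, l_k) y_j y_j^* P y_k y_k^*, where
    # h(a, b) = (log a - log b)/(a - b) for a != b and h(a, a) = 1/a.
    lam, U = np.linalg.eigh(Y)
    a, b = lam[:, None], lam[None, :]
    close = np.abs(a - b) < tol
    denom = np.where(close, 1.0, a - b)          # avoid dividing by zero
    H = np.where(close, 1.0 / a, (np.log(a) - np.log(b)) / denom)
    return -U @ (H * (U.conj().T @ P @ U)) @ U.conj().T

rng = np.random.default_rng(5)
def rand_pd(n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + np.eye(n)
P, Y = rand_pd(3), rand_pd(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Z = (A + A.conj().T) / 2
f = lambda M: -np.trace(P @ log_herm(M)).real
t = 1e-6
fd = (f(Y + t * Z) - f(Y - t * Z)) / (2 * t)
print(fd, np.trace(grad_neg_tr_plog(P, Y) @ Z).real)
```

At $Y = \mathbb{1}$ every divided difference equals 1, so the gradient reduces to $-P$, which gives a convenient spot check.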