Quantum Wasserstein distance based on an optimization over separable states

We define the quantum Wasserstein distance such that the optimization of the coupling is carried out over bipartite separable states rather than over general bipartite quantum states, and we examine its properties. Surprisingly, we find that the self-distance is related to the quantum Fisher information. We present a transport map corresponding to an optimal bipartite separable state. We discuss how the quantum Wasserstein distance introduced is connected to criteria detecting quantum entanglement. We define variance-like quantities that can be obtained from the quantum Wasserstein distance by replacing the minimization over quantum states by a maximization. We extend our results to a family of generalized quantum Fisher information quantities. Dedicated to the memory of Dénes Petz on the occasion of his 70th birthday.


Introduction
A classical Wasserstein distance is a metric between probability distributions µ and ν, induced by the problem of optimal mass transportation [1,2]. It reflects the minimal effort that is required in order to morph the mass distribution of µ into the mass distribution of ν. Methods based on the theory of optimal transport and the advantageous properties of Wasserstein metrics have achieved great success in several important fields of pure mathematics, including probability theory [3,4], the theory of (stochastic) partial differential equations [5,6], variational problems [7,8] and the geometry of metric spaces [9,10,11,12]. In recent years, there have also been many results concerning the description of isometries of Wasserstein spaces [13,14,15,16,17,18,19]. Moreover, optimal transport and Wasserstein metrics are also used as tools in applied mathematics. In particular, there are applications to image and signal processing [20,21,22], medical imaging [23,24] and machine learning [25,26,27,28,29,30].

Géza Tóth: toth@alumni.nd.edu; József Pitrik: pitrik@math.bme.hu
The non-commutative generalization, the so-called quantum optimal transport, has been at the center of attention, as it has led to the definition of several new and very useful notions in quantum physics. The first, semi-classical approach of Życzkowski and Slomczynski was motivated by applications in quantum chaos [31,32,33]. The method of Biane and Voiculescu is related to free probability [34], while the one of Carlen, Maas, Datta, and Rouzé [35,36,37,38,39] is based on a dynamical interpretation. Caglioti, Golse, Mouhot, and Paul presented an approach based on a static interpretation [40,41,42,43,44,45]. Finally, De Palma and Trevisan used quantum channels [46], De Palma, Marvian, Trevisan, and Lloyd defined the quantum earth mover's distance, i.e., the quantum Wasserstein distance of order 1 [47], and Bistron, Cole, Eckstein, Friedland and Życzkowski formulated a quantum Wasserstein distance based on an antisymmetric cost function [48,49,50].
One of the key results of quantum optimal transport is the definition of the quantum Wasserstein distance [31,32,33,40,41,42,43,46,47,44,45,51,52]. It has the often desirable feature that it is not necessarily maximal for two mutually orthogonal quantum states, which is beneficial, for instance, when performing learning on quantum data [53]. Some of the properties of the new quantities are puzzling, yet they point to profound relations between seemingly unrelated fields of quantum physics. For instance, the quantum Wasserstein distance of order 2 of a quantum state from itself can be nonzero, while in the classical case the self-distance is always zero. In particular, as we have mentioned, the quantum Wasserstein distance has been defined based on a quantum channel formalism [46], and it has been shown that the square of the self-distance is given by the Wigner-Yanase skew information of the quantum state [54]. At this point the question arises: Can other fields of quantum physics help to interpret these results? For example, is it possible to relate the findings above to entanglement theory [55,56,57] such that we obtain new and meaningful relations naturally? Before presenting our results, let us summarize the definitions of the quantum Wasserstein distance.
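Definition 1. The square of the distance between two quantum states described by the density matrices ϱ and σ is given by De Palma and Trevisan as [46]

$$D_{\rm DPT}(\varrho,\sigma)^2=\min_{\varrho_{12}}\sum_{n=1}^{N}{\rm Tr}\left[(H_n^T\otimes\mathbb{1}-\mathbb{1}\otimes H_n)^2\varrho_{12}\right]\quad{\rm s.t.}\quad\varrho_{12}\in\mathcal{D},\ {\rm Tr}_2(\varrho_{12})=\varrho^T,\ {\rm Tr}_1(\varrho_{12})=\sigma, \tag{1}$$

where $A^T$ denotes the matrix transpose of $A$, and $H_1, H_2, \ldots, H_N$ are Hermitian operators, while $\mathcal{D}$ is the set of density matrices, i.e., Hermitian matrices fulfilling

$$\varrho\ge0,\qquad{\rm Tr}(\varrho)=1. \tag{2}$$

In this approach, there is a bipartite density matrix $\varrho_{12}$, called a coupling, corresponding to any transport map between ϱ and σ, and, vice versa, there is a transport map corresponding to any coupling [46]. Moreover, it has been shown that for the self-distance of a state

$$D_{\rm DPT}(\varrho,\varrho)^2=2\sum_{n=1}^{N}I_\varrho(H_n) \tag{3}$$

holds [46], where the Wigner-Yanase skew information is defined as [54]

$$I_\varrho(H)={\rm Tr}(H^2\varrho)-{\rm Tr}(H\sqrt{\varrho}H\sqrt{\varrho}). \tag{4}$$

This profound result connects two seemingly very different notions of quantum physics, as mentioned above.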
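A minimal Python/NumPy sketch of Eq. (3) for N = 1 follows. It does not solve the minimization in Eq. (1). It only evaluates the cost of one particular coupling, the pure bipartite state built from $(\sqrt{\varrho})^T$ (a purification of ϱ), which has the marginals $\varrho^T$ and ϱ required by Eq. (1) for σ = ϱ, and whose cost equals $2I_\varrho(H)$. The random state, the random H and all helper names are illustrative assumptions.

```python
import numpy as np

def random_density_matrix(d, seed):
    # Illustrative full-rank state from the Ginibre ensemble.
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_hermitian(d, seed):
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def psd_sqrt(a):
    # Square root of a positive semidefinite Hermitian matrix.
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def marginals(rho12, d):
    # Returns (Tr_2 rho12, Tr_1 rho12) for an operator on C^d (x) C^d.
    r = rho12.reshape(d, d, d, d)
    return np.einsum('ijkj->ik', r), np.einsum('ijil->jl', r)

def skew_information(rho, h):
    # Wigner-Yanase skew information, Eq. (4).
    s = psd_sqrt(rho)
    return np.trace(h @ h @ rho).real - np.trace(h @ s @ h @ s).real

d = 3
rho = random_density_matrix(d, seed=1)
h = random_hermitian(d, seed=2)
eye = np.eye(d)

# Coupling |v> = sum_ij [(sqrt rho)^T]_ij |i>|j>, a purification of rho.
v = psd_sqrt(rho).T.reshape(-1)
rho12 = np.outer(v, v.conj())
tr2, tr1 = marginals(rho12, d)   # expected: rho^T and rho, as in Eq. (1) with sigma = rho

# Cost operator of Eq. (1) for N = 1: (H^T (x) 1 - 1 (x) H)^2.
a = np.kron(h.T, eye) - np.kron(eye, h)
cost = (v.conj() @ a @ a @ v).real

print(f"cost = {cost:.6f}, 2*I = {2 * skew_information(rho, h):.6f}")
```

Since this is only one feasible coupling, the check shows that the value $2I_\varrho(H)$ is attainable; that it is also the minimum is the result of Ref. [46] quoted above.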
The Wasserstein distance has also been defined in a slightly different way.
Definition 2. Golse, Mouhot, Paul and Caglioti defined the square of the distance as [40,45,43,41,42,44]

$$D_{\rm GMPC}(\varrho,\sigma)^2=\min_{\varrho_{12}}\sum_{n=1}^{N}{\rm Tr}\left[(H_n\otimes\mathbb{1}-\mathbb{1}\otimes H_n)^2\varrho_{12}\right]\quad{\rm s.t.}\quad\varrho_{12}\in\mathcal{D},\ {\rm Tr}_2(\varrho_{12})=\varrho,\ {\rm Tr}_1(\varrho_{12})=\sigma. \tag{5}$$

In this paper, we will obtain new quantities by restricting the optimization to separable states in the above definitions. We will show that, in this case, the square of the self-distance equals the quantum Fisher information times a constant, whereas in Eq. (3) it is related to the Wigner-Yanase skew information. The quantum Fisher information is a central quantity in quantum estimation theory and quantum metrology, a field concerned with metrological tasks in which the quantumness of the system plays an essential role [58,59,60,61]. Recent findings show that the quantum Fisher information is the convex roof of the variance, apart from a constant factor [62,63]. This result has made it possible, for instance, to derive novel uncertainty relations [64,65], and it will also be used in this article.
The paper is organized as follows. In Sec. 2, we summarize basic facts connected to quantum metrology. In Sec. 3, we summarize entanglement theory. In Sec. 4, we show how to transform the optimization over decompositions of the density matrix into an optimization of an expectation value over separable states. In Sec. 5, we show some applications of these ideas to the Wasserstein distance, and we define a novel type of Wasserstein distance based on an optimization over separable states. In Sec. 6, we define variance-like quantities from the Wasserstein distance. In Sec. 7, we discuss how such a Wasserstein distance and the variance-like quantity mentioned above can be used to construct entanglement criteria. In Sec. 8, we introduce further quantities similar to the formulas giving the Wasserstein distance, which, however, involve the variance of two-body quantities rather than the second moment. In Sec. 9, we consider an optimization over various subsets of the quantum states. In Sec. 10, we extend our ideas to various generalized quantum Fisher information quantities.

Quantum metrology
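Before discussing our main results, we review some of the fundamental relations of quantum metrology. A basic metrological task is estimating the small angle θ in a unitary dynamics

$$\varrho_\theta=e^{-iH\theta}\varrho\,e^{+iH\theta}, \tag{6}$$

where H is the Hamiltonian. The precision is limited by the Cramér-Rao bound as [66,67,68,69,70,58,60,61,71,72,59,73]

$$(\Delta\theta)^2\ge\frac{1}{\nu F_Q[\varrho,H]}, \tag{7}$$

where the factor 1/ν in Eq. (7) is the statistical improvement when performing independent measurements on ν identical copies of the probe state, and the quantum Fisher information is defined by the formula [66,67,68,69,70]

$$F_Q[\varrho,H]=2\sum_{k,l}\frac{(\lambda_k-\lambda_l)^2}{\lambda_k+\lambda_l}|\langle k|H|l\rangle|^2, \tag{8}$$

where the sum runs over the terms with $\lambda_k+\lambda_l>0$, and the density matrix has the eigendecomposition

$$\varrho=\sum_k\lambda_k|k\rangle\langle k|. \tag{9}$$

The quantum Fisher information is bounded from above by the variance and from below by the Wigner-Yanase skew information as

$$4I_\varrho(H)\le F_Q[\varrho,H]\le4(\Delta H)^2_\varrho, \tag{10}$$

where both inequalities become equalities if ϱ is pure [68]. For clarity, we add that the variance is defined as

$$(\Delta H)^2_\varrho=\langle H^2\rangle_\varrho-\langle H\rangle_\varrho^2, \tag{11}$$

where the expectation value is calculated as

$$\langle H\rangle_\varrho={\rm Tr}(H\varrho). \tag{12}$$

The quantum Fisher information is four times the convex roof of the variance [62,63,71]

$$F_Q[\varrho,H]=4\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k(\Delta H)^2_{\Psi_k}, \tag{13}$$

where the optimization is carried out over pure-state decompositions of the type

$$\varrho=\sum_kp_k|\Psi_k\rangle\langle\Psi_k|. \tag{14}$$

Here, $p_k\ge0$ and $\sum_kp_k=1$ hold, and the pure states $|\Psi_k\rangle$ are not assumed to be orthogonal to each other.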
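The chain of inequalities in Eq. (10) can be checked numerically. The following is a minimal NumPy sketch of Eqs. (4), (8), (11) and (12); the random states and observables are illustrative, and the cutoff `tol` for discarding terms with $\lambda_k+\lambda_l\approx0$ is our own numerical choice.

```python
import numpy as np

def random_state(d, rank, seed):
    # Random density matrix of the given rank; rank = 1 gives a pure state.
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_hermitian(d, seed):
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def qfi(rho, h, tol=1e-12):
    # Eq. (8), summing only over pairs with lambda_k + lambda_l > tol.
    lam, vecs = np.linalg.eigh(rho)
    h_kl = vecs.conj().T @ h @ vecs
    f = 0.0
    for k in range(len(lam)):
        for l in range(len(lam)):
            s = lam[k] + lam[l]
            if s > tol:
                f += 2 * (lam[k] - lam[l]) ** 2 / s * abs(h_kl[k, l]) ** 2
    return f

def variance(rho, h):
    # Eqs. (11) and (12).
    return np.trace(h @ h @ rho).real - np.trace(h @ rho).real ** 2

def skew_information(rho, h):
    # Eq. (4).
    s = psd_sqrt(rho)
    return np.trace(h @ h @ rho).real - np.trace(h @ s @ h @ s).real

h = random_hermitian(4, seed=3)
for rank in (1, 2, 4):
    rho = random_state(4, rank, seed=10 + rank)
    # Eq. (10): 4 I <= F_Q <= 4 (Delta H)^2, with equalities for rank 1.
    print(rank, 4 * skew_information(rho, h), qfi(rho, h), 4 * variance(rho, h))
```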
Apart from the quantum Fisher information, the variance can also be given as a roof [62,63]

$$(\Delta H)^2_\varrho=\max_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k(\Delta H)^2_{\Psi_k}, \tag{15}$$

where the optimization is again over decompositions of the form of Eq. (14). Note that convex and concave roofs of more complicated expressions can also be computed. For instance, the convex and concave roofs of the sum of several variances over the decompositions of ϱ given in Eq. (14) fulfill

$$\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k\sum_{n=1}^N(\Delta H_n)^2_{\Psi_k}\ge\frac14\sum_{n=1}^NF_Q[\varrho,H_n] \tag{16}$$

and

$$\max_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k\sum_{n=1}^N(\Delta H_n)^2_{\Psi_k}\le\sum_{n=1}^N(\Delta H_n)^2_\varrho, \tag{17}$$

respectively. They play a role in the derivation of entanglement conditions [64], and will also appear in our results about the quantum Wasserstein distance. We add that there is an equality in Eq. (17) for N = 2 [74,75]. Finally, we note that the quantum Fisher information can also be given with a minimization over purifications, which has been used, for instance, to study quantum metrology in noisy systems [76,77,78]. The relation of this finding to the expression in Eq. (13) is discussed in Ref. [79].

Entanglement theory
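Next, we review entanglement theory [55,56,57]. A bipartite quantum state is separable if it can be given as [80]

$$\varrho_{12}=\sum_kp_k|\Psi_k\rangle\langle\Psi_k|\otimes|\Phi_k\rangle\langle\Phi_k|, \tag{18}$$

where $p_k$ are probabilities, and $|\Psi_k\rangle$ and $|\Phi_k\rangle$ are pure quantum states. We will denote the set of separable states by $\mathcal{S}$. A mixture of two separable states is also separable, thus the set $\mathcal{S}$ is convex. If a quantum state cannot be written as in Eq. (18), then it is called entangled. A relevant subset of separable states is the set of symmetric separable states, which can be given as [81,82]

$$\varrho_{12}=\sum_kp_k|\Psi_k\rangle\langle\Psi_k|\otimes|\Psi_k\rangle\langle\Psi_k|. \tag{19}$$

We will denote the set of symmetric separable states by $\mathcal{S}'$. A mixture of two symmetric separable states is also symmetric and separable, thus the set $\mathcal{S}'$ is also convex. Clearly,

$$\mathcal{S}'\subset\mathcal{S}. \tag{20}$$

For such states, the expectation value

$$\langle P_s\rangle=1 \tag{21}$$

holds, where $P_s$ is the projector onto the symmetric subspace, spanned by the basis vectors $|nn\rangle$ and $(|mn\rangle+|nm\rangle)/\sqrt2$ for $m<n$, and

$$\langle F_{12}\rangle=1, \tag{22}$$

where $F_{12}$ is the flip operator, for which

$$F_{12}|m\rangle|n\rangle=|n\rangle|m\rangle \tag{23}$$

holds for all m, n.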
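The flip operator and the symmetric projector can be built explicitly. The following NumPy sketch uses the standard identity $P_s=(\mathbb{1}+F_{12})/2$ and checks that a random state of the form of Eq. (19) gives $\langle P_s\rangle=\langle F_{12}\rangle=1$; the dimension, the number of components and the random components are illustrative choices.

```python
import numpy as np

def flip_operator(d):
    # F12 |m>|n> = |n>|m> on C^d (x) C^d; the basis index of |m>|n> is m*d + n.
    f = np.zeros((d * d, d * d))
    for m in range(d):
        for n in range(d):
            f[n * d + m, m * d + n] = 1.0
    return f

def random_pure_state(d, rng):
    psi = rng.normal(size=d) + 1j * rng.normal(size=d)
    return psi / np.linalg.norm(psi)

d = 3
rng = np.random.default_rng(7)
f12 = flip_operator(d)
p_s = (np.eye(d * d) + f12) / 2      # projector onto the symmetric subspace

# A symmetric separable state of the form of Eq. (19) with four random components.
probs = rng.dirichlet(np.ones(4))
rho12 = np.zeros((d * d, d * d), dtype=complex)
for p in probs:
    psi = random_pure_state(d, rng)
    proj = np.outer(psi, psi.conj())
    rho12 += p * np.kron(proj, proj)

print(np.trace(p_s @ rho12).real, np.trace(f12 @ rho12).real)   # both 1 up to rounding
```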
The set of quantum states with a positive partial transpose (PPT), $\mathcal{P}$, consists of the states for which

$$\varrho_{12}^{T_k}\ge0 \tag{24}$$

holds for k = 1, 2, where $T_k$ denotes partial transposition with respect to the kth subsystem. $\mathcal{P}$ is clearly a convex set. Moreover, all separable states given in Eq. (18) fulfill Eq. (24), thus

$$\mathcal{S}\subset\mathcal{P} \tag{25}$$

holds. For two qubits and for a qubit-qutrit system, i.e., for 2 × 2 and 2 × 3 systems, the set of PPT states equals the set of separable states [83,84,85].
It is generally very difficult to decide whether a state is separable, while it is very simple to decide whether the condition given in Eq. (24) is fulfilled. Such conditions can even be part of semidefinite programs used to solve various optimization problems (e.g., Ref. [86]). Thus, the set of PPT quantum states $\mathcal{P}$ is often used instead of the set of separable states. Since $\mathcal{P}=\mathcal{S}$ for small systems, optimization problems over separable states can be solved exactly in those cases. For larger systems, by optimizing over states in $\mathcal{P}$ instead of states in $\mathcal{S}$, we get lower or upper bounds.
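The PPT condition of Eq. (24) only needs an eigenvalue computation. The NumPy sketch below implements the partial transpose by reshaping, and tests it on a Bell state, a product state and the two-qubit isotropic state $p|\Phi^+\rangle\langle\Phi^+|+(1-p)\mathbb{1}/4$, which is PPT (and hence separable for 2 × 2) exactly for $p\le1/3$. It also checks the identity ${\rm Tr}(X^{T_1}Y)={\rm Tr}(XY^{T_1})$, used later in the proof of Observation 3. The helper names and the tolerance are our own choices.

```python
import numpy as np

def partial_transpose(x, d1, d2, k):
    # Partial transpose with respect to subsystem k (1 or 2), cf. Eq. (24).
    r = x.reshape(d1, d2, d1, d2)
    r = r.transpose(2, 1, 0, 3) if k == 1 else r.transpose(0, 3, 2, 1)
    return r.reshape(d1 * d2, d1 * d2)

def is_ppt(rho12, d1, d2, tol=1e-12):
    # Checks Eq. (24) for k = 1, 2.
    return all(np.linalg.eigvalsh(partial_transpose(rho12, d1, d2, k)).min() >= -tol
               for k in (1, 2))

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
phi_plus = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
bell = np.outer(phi_plus, phi_plus)

def isotropic(p):
    # p |Phi+><Phi+| + (1 - p) 1/4; PPT if and only if p <= 1/3.
    return p * bell + (1 - p) * np.eye(4) / 4

product = np.kron(np.diag([0.7, 0.3]), np.diag([0.2, 0.8]))
for name, state in [("Bell", bell), ("product", product),
                    ("p=0.30", isotropic(0.30)), ("p=0.35", isotropic(0.35))]:
    print(name, is_ppt(state, 2, 2))
```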
Finally, we will define the set of symmetric PPT states $\mathcal{P}'$. States in this set are symmetric, thus Eqs. (21) and (22) hold. For the various sets mentioned above, we have the relations

$$\mathcal{S}'\subset\mathcal{S}\subset\mathcal{P},\qquad\mathcal{S}'\subset\mathcal{P}'\subset\mathcal{P}, \tag{26}$$

while for 2 × 2 and 2 × 3 systems, we have $\mathcal{P}'=\mathcal{S}'$.

Convex roofs and concave roofs also play a central role in entanglement theory. The entanglement of formation is defined as the convex roof, over pure components, of the von Neumann entropy of the reduced state [87,88]

$$E_F(\varrho)=\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_kE(|\Psi_k\rangle), \tag{27}$$

where the optimization is over the decompositions given in Eq. (14), and the entanglement entropy of the pure components is given as

$$E(|\Psi\rangle)=S\left({\rm Tr}_2|\Psi\rangle\langle\Psi|\right), \tag{28}$$

where S is the von Neumann entropy. $E_F(\varrho)$ can be interpreted as the minimal average entanglement of the pure states needed to create the state. On the other hand, the entanglement of assistance is obtained as a concave roof [89,90]

$$E_A(\varrho)=\max_{\{p_k,|\Psi_k\rangle\}}\sum_kp_kE(|\Psi_k\rangle). \tag{29}$$

The above quantities correspond to the following scenario. Let us assume that the bipartite quantum state ϱ of parties A and B is realized as the reduced state of a pure state of parties A, B and C. Let us assume that party C makes a von Neumann measurement, which leaves A and B in a state $|\Psi_k\rangle$, and sends the measurement result k to A and B. After repeating this on many copies of the tripartite state, the average entanglement of A and B will be

$$\sum_kp_kE(|\Psi_k\rangle), \tag{30}$$

where $p_k$ is the probability of outcome k, and Eq. (14) holds for the states $|\Psi_k\rangle$. If party C wants to help parties A and B to obtain a large average entanglement, then Eq. (30) can reach the entanglement of assistance given in Eq. (29). On the other hand, the average entanglement is always larger than or equal to the entanglement of formation given in Eq. (27).
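The gap between the convex roof in Eq. (27) and the concave roof in Eq. (29) already shows up for two qubits. In the NumPy sketch below, the separable state $(|00\rangle\langle00|+|11\rangle\langle11|)/2$ is decomposed once into product states and once into $|\Phi^\pm\rangle=(|00\rangle\pm|11\rangle)/\sqrt2$. The average of Eq. (30) is 0 for the first decomposition and 1 bit for the second, so here $E_F=0$ and $E_A=1$. The second value cannot be exceeded, since by concavity of the entropy the average cannot exceed the 1-bit entropy of the reduced state. Entropies are in bits, which is our choice of units.

```python
import numpy as np

def entanglement_entropy(psi, d1, d2):
    # Eq. (28) in bits, computed from the Schmidt coefficients of |Psi>.
    schmidt = np.linalg.svd(psi.reshape(d1, d2), compute_uv=False)
    p = schmidt ** 2
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def average_entanglement(probs, states, d1, d2):
    # Eq. (30) for one decomposition of the form of Eq. (14).
    return sum(p * entanglement_entropy(s, d1, d2) for p, s in zip(probs, states))

def mixture(probs, states):
    return sum(p * np.outer(s, s.conj()) for p, s in zip(probs, states))

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
k00, k11 = np.kron(ket0, ket0), np.kron(ket1, ket1)
phi_p = (k00 + k11) / np.sqrt(2)
phi_m = (k00 - k11) / np.sqrt(2)

# Two decompositions of the same state (|00><00| + |11><11|)/2.
dec_product = ([0.5, 0.5], [k00, k11])
dec_bell = ([0.5, 0.5], [phi_p, phi_m])
avg_prod = average_entanglement(*dec_product, 2, 2)
avg_bell = average_entanglement(*dec_bell, 2, 2)
print(avg_prod, avg_bell)
```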
Next, we will introduce an entanglement condition based on the sum of several variances [91,92,93,94], which will be used later in the article. Let us consider a full set of traceless observables $\{G_n\}_{n=1}^{d^2-1}$ for d-dimensional systems fulfilling

$${\rm Tr}(G_mG_n)=2\delta_{mn}. \tag{31}$$

Any traceless Hermitian observable can be obtained as a linear combination of the $G_n$. In other words, the $G_n$ are the SU(d) generators. It is known that for pure states (see, e.g., Ref. [94])

$$\sum_{n=1}^{d^2-1}(\Delta G_n)^2_\Psi=2(d-1) \tag{32}$$

holds. Due to the concavity of the variance, it follows that for mixed states we have [94]

$$\sum_{n=1}^{d^2-1}(\Delta G_n)^2_\varrho\ge2(d-1). \tag{33}$$

Let us now consider a d × d system. For a product state $|\Psi\rangle\otimes|\Phi\rangle$, we obtain [91,94]

$$\sum_{n=1}^{d^2-1}\left[\Delta(G_n\otimes\mathbb{1}-\mathbb{1}\otimes G_n^T)\right]^2=\sum_{n=1}^{d^2-1}(\Delta G_n)^2_\Psi+\sum_{n=1}^{d^2-1}(\Delta G_n^T)^2_\Phi=4(d-1), \tag{34}$$

where in the last step we used that Eq. (32) holds for pure states. We also used the fact that if $\{G_n\}_{n=1}^{d^2-1}$ is a full set of observables with the properties mentioned above, then $\{G_n^T\}_{n=1}^{d^2-1}$ is also a full set of such observables. Then, for the bipartite separable states given in Eq. (18) [91,92,93,94]

$$\sum_{n=1}^{d^2-1}\left[\Delta(G_n\otimes\mathbb{1}-\mathbb{1}\otimes G_n^T)\right]^2\ge4(d-1) \tag{35}$$

holds due to the concavity of the variance. Any state that violates the inequality in Eq. (35) is entangled. The left-hand side of Eq. (35) is zero for the maximally entangled state

$$|\Psi_{\rm me}\rangle=\frac{1}{\sqrt d}\sum_{k=1}^{d}|k\rangle|k\rangle. \tag{36}$$

Thus, we say that the criterion given in Eq. (35) detects entangled states in the vicinity of the state given in Eq. (36).
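The criterion can be evaluated with the generalized Gell-Mann matrices, which satisfy the normalization ${\rm Tr}(G_mG_n)=2\delta_{mn}$ assumed here for Eq. (31). The NumPy sketch below uses the combination $G_n\otimes\mathbb{1}-\mathbb{1}\otimes G_n^T$. This is one sign convention under which the left-hand side vanishes for the state in Eq. (36), and the choice is our assumption. The sketch checks the pure-state value $2(d-1)$, the product-state value $4(d-1)$, a random separable mixture, and the maximally entangled state.

```python
import numpy as np

def gell_mann(d):
    # Generalized Gell-Mann matrices: traceless, Hermitian, Tr(G_m G_n) = 2 delta_mn.
    mats = []
    for j in range(d):
        for k in range(j + 1, d):
            s = np.zeros((d, d), dtype=complex)
            s[j, k] = s[k, j] = 1.0
            a = np.zeros((d, d), dtype=complex)
            a[j, k], a[k, j] = -1j, 1j
            mats += [s, a]
    for l in range(1, d):
        diag = np.zeros(d)
        diag[:l], diag[l] = 1.0, -l
        mats.append(np.sqrt(2.0 / (l * (l + 1))) * np.diag(diag).astype(complex))
    return mats

def variance(rho, op):
    return np.trace(rho @ op @ op).real - np.trace(rho @ op).real ** 2

def lhs(rho12, d):
    # Left-hand side of the criterion with G_n (x) 1 - 1 (x) G_n^T (assumed convention).
    eye = np.eye(d)
    return sum(variance(rho12, np.kron(g, eye) - np.kron(eye, g.T)) for g in gell_mann(d))

def random_pure(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

d = 3
rng = np.random.default_rng(5)
psi, phi = random_pure(d, rng), random_pure(d, rng)
single = sum(variance(np.outer(psi, psi.conj()), g) for g in gell_mann(d))  # 2(d-1)
prod = np.kron(np.outer(psi, psi.conj()), np.outer(phi, phi.conj()))
me = np.eye(d).reshape(-1) / np.sqrt(d)          # sum_k |kk> / sqrt(d)

sep = np.zeros((d * d, d * d), dtype=complex)    # random separable mixture
for w in rng.dirichlet(np.ones(3)):
    a, b = random_pure(d, rng), random_pure(d, rng)
    sep += w * np.kron(np.outer(a, a.conj()), np.outer(b, b.conj()))

print(single, lhs(prod, d), lhs(sep, d), lhs(np.outer(me, me), d))
```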
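Let us consider the d = 2 case concretely. Let us choose $\{G_n\}_{n=1}^3=\{\sigma_x,\sigma_y,\sigma_z\}$. Equation (35) can be rewritten as

$$\left[\Delta(\sigma_x\otimes\mathbb{1}-\mathbb{1}\otimes\sigma_x)\right]^2+\left[\Delta(\sigma_y\otimes\mathbb{1}+\mathbb{1}\otimes\sigma_y)\right]^2+\left[\Delta(\sigma_z\otimes\mathbb{1}-\mathbb{1}\otimes\sigma_z)\right]^2\ge4, \tag{37}$$

where $\sigma_l$ are the Pauli spin matrices defined as

$$\sigma_x=\begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad\sigma_y=\begin{pmatrix}0&-i\\i&0\end{pmatrix},\qquad\sigma_z=\begin{pmatrix}1&0\\0&-1\end{pmatrix}. \tag{38}$$

The left-hand side of the inequality given in Eq. (37) is zero for the state

$$|\Phi^+\rangle=\frac{1}{\sqrt2}(|00\rangle+|11\rangle). \tag{39}$$

We can have a similar construction with only three operators for any system size. Let us consider the usual angular momentum operators $j_x$, $j_y$ and $j_z$, acting on a d-dimensional system, fulfilling

$$[j_x,j_y]=ij_z,\quad[j_y,j_z]=ij_x,\quad[j_z,j_x]=ij_y,\qquad j_x^2+j_y^2+j_z^2=j(j+1)\mathbb{1}, \tag{40}$$

where d = 2j+1. In particular, let us define the matrix [95]

$$j_z={\rm diag}(j,j-1,\ldots,-j). \tag{41}$$

Then, we need the ladder operators

$$(j_+)_{mn}=\sqrt{j(j+1)-(j-n+1)(j-n+2)}\,\delta_{m+1,n},\qquad j_-=j_+^\dagger, \tag{42}$$

where $1\le m,n\le d$ and $\delta_{kl}$ is the well-known Kronecker delta. Then, we define the x and y components as

$$j_x=\frac{j_++j_-}{2},\qquad j_y=\frac{j_+-j_-}{2i}. \tag{43}$$

After clarifying the operators used, we need the uncertainty relation

$$(\Delta j_x)^2+(\Delta j_y)^2+(\Delta j_z)^2\ge j, \tag{44}$$

which is true for all quantum states. Based on Eq. (44), it can be proved that for separable states we have [96,97,98,94]

$$\left[\Delta(j_x\otimes\mathbb{1}-\mathbb{1}\otimes j_x)\right]^2+\left[\Delta(j_y\otimes\mathbb{1}+\mathbb{1}\otimes j_y)\right]^2+\left[\Delta(j_z\otimes\mathbb{1}-\mathbb{1}\otimes j_z)\right]^2\ge2j, \tag{45}$$

cf. Eq. (37). The left-hand side of the inequality given in Eq. (45) is zero for the maximally entangled state given in Eq. (36). It is easy to see that Eq. (45) is a tight inequality for separable states, since the state $|+j,+j\rangle$ saturates it.

We now present a simple expression for which the maximum for general states is larger than the maximum for separable quantum states. We know that for separable states the bound in Eq. (46) holds [99,97], while the maximum for quantum states is 8, and it is attained by the singlet state

$$|\Psi^-\rangle=\frac{1}{\sqrt2}(|01\rangle-|10\rangle). \tag{47}$$

We add that the product state $|01\rangle_x$ saturates the inequality in Eq. (46), where $|.\rangle_x$ denotes a state given in the x-basis. That is, for a qubit, the basis states in the x-basis are

$$|0\rangle_x=\frac{1}{\sqrt2}(|0\rangle+|1\rangle),\qquad|1\rangle_x=\frac{1}{\sqrt2}(|0\rangle-|1\rangle). \tag{48}$$

Let us now determine a complementary relation. We will find the minimum for separable states of the left-hand side of Eq. (46). We need to know that Eq. (49) holds for separable states. This can be seen as follows [91,92]. We need to know that Eq. (50) holds for two-qubit quantum states. For product states $|\Psi\rangle\otimes|\Phi\rangle$, the left-hand side of Eq. (49) equals the sum of single-system variances, which can be bounded from below as in Eq. (51). The bound for separable states given in Eq. (18) is the same as the bound for product states, since the variance is a concave function of the state. The left-hand side of Eq. (49) is zero for the state in Eq. (52). The product state $|11\rangle_x$ saturates the inequality given in Eq. (49), and for that state the relations in Eq. (53) hold. Thus, Eq. (54) holds for separable states, and it is even a tight inequality for separable states, cf. Eq. (46).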
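The angular momentum construction can be checked numerically. The NumPy sketch below builds $j_x$, $j_y$, $j_z$ in the basis $m=j,j-1,\ldots,-j$ (the spin is called `s` in the code to avoid a clash with Python's `1j`). It verifies the commutator and Casimir relations, the single-system bound of Eq. (44) on a random mixed state, the value $2j$ for $|+j,+j\rangle$, and the value 0 for the state of Eq. (36). The two-party operators are formed as $o\otimes\mathbb{1}-\mathbb{1}\otimes o^T$. This is our assumed reading of the sign pattern in Eq. (45), since $j_y^T=-j_y$ in this basis.

```python
import numpy as np

def spin_matrices(s):
    # j_x, j_y, j_z for spin s in the basis m = s, s-1, ..., -s (d = 2s + 1).
    d = int(round(2 * s + 1))
    m = s - np.arange(d)
    jz = np.diag(m).astype(complex)
    jp = np.zeros((d, d), dtype=complex)
    for k in range(1, d):
        # j_+ |m> = sqrt(s(s+1) - m(m+1)) |m+1>; |m+1> sits at basis index k-1.
        jp[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    jm = jp.conj().T
    return (jp + jm) / 2, (jp - jm) / (2 * 1j), jz

def variance(rho, op):
    return np.trace(rho @ op @ op).real - np.trace(rho @ op).real ** 2

def two_party_lhs(rho12, ops):
    # Sum of variances of o (x) 1 - 1 (x) o^T; for j_y this equals j_y (x) 1 + 1 (x) j_y.
    d = ops[0].shape[0]
    eye = np.eye(d)
    return sum(variance(rho12, np.kron(o, eye) - np.kron(eye, o.T)) for o in ops)

s = 1.5
ops = spin_matrices(s)
jx, jy, jz = ops
d = jz.shape[0]

top = np.zeros(d)
top[0] = 1.0                                                 # |m = +s>
prod_top = np.kron(np.outer(top, top), np.outer(top, top))  # |+s, +s>
me = np.eye(d).reshape(-1) / np.sqrt(d)                      # state of Eq. (36)

rng = np.random.default_rng(9)
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = g @ g.conj().T
rho /= np.trace(rho).real

single = sum(variance(rho, o) for o in ops)                  # Eq. (44): at least s
print(single, two_party_lhs(prod_top, ops), two_party_lhs(np.outer(me, me), ops))
```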

Optimization over the two-copy space
In this section, we first review the formalism that maps the optimization over the decompositions of the density matrix to an optimization of an operator expectation value over bipartite symmetric separable quantum states with given marginals [100]. Then, we show that the same result is obtained if the optimization is carried out over general separable quantum states rather than symmetric separable states.
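We start by writing the variance of a pure state $|\Psi\rangle$ as an operator expectation value acting on two copies as

$$(\Delta H)^2_\Psi=\langle\Psi|\otimes\langle\Psi|\,\mathcal{O}\,|\Psi\rangle\otimes|\Psi\rangle, \tag{55}$$

where we define the operator

$$\mathcal{O}=H^2\otimes\mathbb{1}-H\otimes H. \tag{56}$$

Based on Eq. (55), the expression of the quantum Fisher information given in Eq. (13) can be rewritten as [62,100]

$$F_Q[\varrho,H]=4\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k{\rm Tr}\left[\mathcal{O}\,(|\Psi_k\rangle\langle\Psi_k|\otimes|\Psi_k\rangle\langle\Psi_k|)\right]. \tag{57}$$

The sum can be moved into the trace, and we obtain

$$F_Q[\varrho,H]=4\min_{\{p_k,|\Psi_k\rangle\}}{\rm Tr}\left[\mathcal{O}\left(\sum_kp_k|\Psi_k\rangle\langle\Psi_k|\otimes|\Psi_k\rangle\langle\Psi_k|\right)\right]. \tag{58}$$

On the right-hand side of Eq. (58), inside the parentheses, we can recognize a mixed state living in the two-copy space

$$\varrho_{12}=\sum_kp_k|\Psi_k\rangle\langle\Psi_k|\otimes|\Psi_k\rangle\langle\Psi_k|. \tag{59}$$

States given in Eq. (59) are symmetric separable states, thus $\varrho_{12}\in\mathcal{S}'$. We can rewrite the optimization in Eq. (58) as an optimization over the symmetric separable states given in Eq. (59) as

$$F_Q[\varrho,H]=4\min_{\varrho_{12}\in\mathcal{S}'}{\rm Tr}(\mathcal{O}\varrho_{12})\quad{\rm s.t.}\quad{\rm Tr}_2(\varrho_{12})=\varrho. \tag{60}$$

Due to the optimization over symmetric separable states, ${\rm Tr}_1(\varrho_{12})=\varrho$ is fulfilled without adding it as an explicit constraint. Equation (60) contains an optimization over symmetric separable states, which cannot be computed directly. However, we can consider an optimization over a larger set, the set of symmetric PPT states, which leads to an expression that can be obtained numerically using semidefinite programming [101]

$$4\min_{\varrho_{12}\in\mathcal{P}'}{\rm Tr}(\mathcal{O}\varrho_{12})\quad{\rm s.t.}\quad{\rm Tr}_2(\varrho_{12})=\varrho. \tag{61}$$

In general, the relation

$$4\min_{\varrho_{12}\in\mathcal{P}'}{\rm Tr}(\mathcal{O}\varrho_{12})\le F_Q[\varrho,H] \tag{62}$$

holds, with the constraint of Eq. (61), while for two qubits we have an equality, since for that system size $\mathcal{S}'=\mathcal{P}'$, as we discussed in Sec. 3.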
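The two-copy construction can be checked directly. The NumPy sketch below assumes the operator $\mathcal{O}=H^2\otimes\mathbb{1}-H\otimes H$ for Eq. (56). It verifies Eq. (55) for a random pure state, builds the symmetric separable state of Eq. (59) from one random decomposition, and confirms that its marginals equal ϱ. It also checks that its objective value lies between $F_Q[\varrho,H]$ (the minimum in Eq. (60)) and $4(\Delta H)^2_\varrho$ (the maximum allowed by Eq. (15)). The objectives of Eqs. (60) and (63) coincide on symmetric states. All random inputs are illustrative.

```python
import numpy as np

def random_pure(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def qfi(rho, h, tol=1e-12):
    # Eq. (8).
    lam, vecs = np.linalg.eigh(rho)
    h_kl = vecs.conj().T @ h @ vecs
    return sum(2 * (lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * abs(h_kl[k, l]) ** 2
               for k in range(len(lam)) for l in range(len(lam)) if lam[k] + lam[l] > tol)

def variance(rho, h):
    return np.trace(h @ h @ rho).real - np.trace(h @ rho).real ** 2

def marginals(rho12, d):
    r = rho12.reshape(d, d, d, d)
    return np.einsum('ijkj->ik', r), np.einsum('ijil->jl', r)

d = 3
rng = np.random.default_rng(11)
h = random_hermitian(d, rng)
eye = np.eye(d)
op_o = np.kron(h @ h, eye) - np.kron(h, h)     # assumed form of the operator in Eq. (56)
a = np.kron(h, eye) - np.kron(eye, h)
op_sym = a @ a                                  # permutationally symmetric operator of Eq. (63)

# Eq. (55): variance of a pure state as a two-copy expectation value.
psi = random_pure(d, rng)
pp = np.kron(psi, psi)
var_two_copy = (pp.conj() @ op_o @ pp).real

# One decomposition of rho and the corresponding state of Eq. (59).
probs = rng.dirichlet(np.ones(5))
psis = [random_pure(d, rng) for _ in probs]
rho = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, psis))
rho12 = sum(p * np.kron(np.outer(v, v.conj()), np.outer(v, v.conj()))
            for p, v in zip(probs, psis))

value_o = 4 * np.trace(op_o @ rho12).real       # objective of Eq. (60)
value_sym = 2 * np.trace(op_sym @ rho12).real   # objective of Eq. (63)
print(var_two_copy, variance(np.outer(psi, psi.conj()), h))
print(qfi(rho, h), value_o, value_sym, 4 * variance(rho, h))
```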
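Let us now change the operator to be optimized, making it permutationally symmetric. Since ${\rm Tr}[(H^2\otimes\mathbb{1})\varrho_{12}]={\rm Tr}[(\mathbb{1}\otimes H^2)\varrho_{12}]$ holds for symmetric states, Eq. (60) can be rewritten as

$$F_Q[\varrho,H]=2\min_{\varrho_{12}\in\mathcal{S}'}{\rm Tr}\left[(H\otimes\mathbb{1}-\mathbb{1}\otimes H)^2\varrho_{12}\right]\quad{\rm s.t.}\quad{\rm Tr}_2(\varrho_{12})=\varrho. \tag{63}$$

We will now show that the expression in Eq. (63) remains true if we change the set over which we optimize to the set of separable states.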
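Observation 1. The quantum Fisher information can be obtained as an optimization over separable states as

$$F_Q[\varrho,H]=2\min_{\varrho_{12}\in\mathcal{S}}{\rm Tr}\left[(H\otimes\mathbb{1}-\mathbb{1}\otimes H)^2\varrho_{12}\right]\quad{\rm s.t.}\quad{\rm Tr}_1(\varrho_{12})={\rm Tr}_2(\varrho_{12})=\varrho. \tag{64}$$

Proof. Using

$$(H\otimes\mathbb{1}-\mathbb{1}\otimes H)^2=H^2\otimes\mathbb{1}+\mathbb{1}\otimes H^2-2H\otimes H, \tag{65}$$

the right-hand side of Eq. (64) can be reformulated as

$$4\langle H^2\rangle_\varrho-4\max_{\varrho_{12}\in\mathcal{S}}{\rm Tr}[(H\otimes H)\varrho_{12}]\quad{\rm s.t.}\quad{\rm Tr}_1(\varrho_{12})={\rm Tr}_2(\varrho_{12})=\varrho. \tag{66}$$

Due to the permutational invariance of $H\otimes H$ and the fact that both marginals must be equal to ϱ, if a separable state given in Eq. (18) maximizes the correlation $\langle H\otimes H\rangle$, then the separable state

$$\sum_kp_k|\Phi_k\rangle\langle\Phi_k|\otimes|\Psi_k\rangle\langle\Psi_k| \tag{67}$$

also maximizes the correlation. Then, the mixture of the above two separable states also maximizes the correlation with the given marginals, and this mixture can always be written in the form of Eq. (68), where $\pi_k$ for k = 1, 2, ..., M is a permutation of 1, 2, ..., M for some M. Based on our arguments, it is sufficient to look for the separable state that maximizes $\langle H\otimes H\rangle$ in the form of Eq. (68), with $\tilde p_k\ge0$ and $\sum_k\tilde p_k=1$. Then, the correlation can be given as in Eq. (69), where we define the expectation values in Eq. (70). Based on the Cauchy-Schwarz inequality, we obtain Eq. (71). Thus, when we maximize ${\rm Tr}[(H\otimes H)\varrho_{12}]$ over separable states with the constraints on the marginals, the maximum is attained by a symmetric separable state given in Eq. (59). In this case, both inequalities in Eq. (71) are saturated. ■

Based on Eq. (15), we can also obtain the variance as the result of an optimization over a two-copy space.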
Observation 2. The variance can also be obtained as an optimization over symmetric separable states as

$$(\Delta H)^2_\varrho=\frac12\max_{\varrho_{12}\in\mathcal{S}'}{\rm Tr}\left[(H\otimes\mathbb{1}-\mathbb{1}\otimes H)^2\varrho_{12}\right]\quad{\rm s.t.}\quad{\rm Tr}_2(\varrho_{12})=\varrho. \tag{72}$$

Example 1. Note that if we replace the set of symmetric separable states by the set of separable states in Eq. (72), then we get a different quantity, which in some cases can be larger than $(\Delta H)^2$. Let us see a concrete example. If $\varrho=\mathbb{1}/2$, then among symmetric separable states the maximum is 1, and it is attained by the state in Eq. (73), where the Bell state $|\Psi^+\rangle$ is given as

$$|\Psi^+\rangle=\frac{1}{\sqrt2}(|01\rangle+|10\rangle). \tag{74}$$

The state given in Eq. (73) can be decomposed into a mixture of symmetric product states as in Eq. (75), where the symmetric product state is defined in Eq. (76). Among separable states, the maximum is 2, and it is attained by the state in Eq. (77). Note that $\varrho_{12}$ given in Eq. (77) is not symmetric.
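The values quoted in Example 1 can be reproduced numerically. The sketch below assumes H = σ_z, which the text does not state explicitly; with this choice $(\Delta H)^2=1$ for $\varrho=\mathbb{1}/2$. The symmetric state is an equal mixture of $|\psi_\phi\rangle|\psi_\phi\rangle$ for four equatorial qubit states, and it equals $(|00\rangle\langle00|+|11\rangle\langle11|)/4+|\Psi^+\rangle\langle\Psi^+|/2$. It is one symmetric separable state attaining 1, and we do not claim that it is literally the state of Eqs. (73)-(76). The non-symmetric state attaining 2 is likewise our illustrative choice.

```python
import numpy as np

def proj(v):
    return np.outer(v, v.conj())

def marginals(rho12):
    # (Tr_2 rho12, Tr_1 rho12) for two qubits.
    r = rho12.reshape(2, 2, 2, 2)
    return np.einsum('ijkj->ik', r), np.einsum('ijil->jl', r)

sz = np.diag([1.0, -1.0])
eye = np.eye(2)
a = np.kron(sz, eye) - np.kron(eye, sz)
cost_op = a @ a

def value(rho12):
    # Objective of Eq. (72) with H = sigma_z (assumed).
    return np.trace(cost_op @ rho12).real / 2

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
psi_plus = (np.kron(ket0, ket1) + np.kron(ket1, ket0)) / np.sqrt(2)   # Eq. (74)

# Symmetric separable: equal mixture of |psi_phi>|psi_phi> for four equatorial states.
sym = np.zeros((4, 4), dtype=complex)
for phi in (0, np.pi / 2, np.pi, 3 * np.pi / 2):
    psi = np.array([1.0, np.exp(1j * phi)]) / np.sqrt(2)
    sym += proj(np.kron(psi, psi)) / 4

# The same state written with the Bell state |Psi+>.
target = (proj(np.kron(ket0, ket0)) + proj(np.kron(ket1, ket1))) / 4 + proj(psi_plus) / 2

# Separable but not symmetric.
nonsym = (proj(np.kron(ket0, ket1)) + proj(np.kron(ket1, ket0))) / 2

print(value(sym), value(nonsym))
```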

Quantum Wasserstein distance based on a separable coupling
Next, we will connect our result in Observation 1 to results available in the literature mentioned in the introduction.We will consider the quantum Wasserstein distance such that the optimization takes place over separable states rather than over general bipartite quantum states.We will consider such modifications of D GMPC (ϱ, σ) 2 and D DPT (ϱ, σ) 2 , and examine their properties.Our first finding concerning the GMPC distance is the following.
Definition 3. Modifying the definition in Eq. (5), we can define a new type of GMPC distance such that we restrict the optimization to separable states as

$$D_{\rm GMPC,sep}(\varrho,\sigma)^2=\min_{\varrho_{12}\in\mathcal{S}}\sum_{n=1}^N{\rm Tr}\left[(H_n\otimes\mathbb{1}-\mathbb{1}\otimes H_n)^2\varrho_{12}\right]\quad{\rm s.t.}\quad{\rm Tr}_2(\varrho_{12})=\varrho,\ {\rm Tr}_1(\varrho_{12})=\sigma. \tag{78}$$

We can immediately see two relevant properties of the newly defined distance. Observation 1 showed that for N = 1 and a given $H_1$

$$D_{\rm GMPC,sep}(\varrho,\varrho)^2=\frac12F_Q[\varrho,H_1] \tag{79}$$

holds. Moreover, based on Eqs. (5) and (78), we can immediately see that

$$D_{\rm GMPC}(\varrho,\sigma)^2\le D_{\rm GMPC,sep}(\varrho,\sigma)^2, \tag{80}$$

since on the left-hand side of Eq. (80) there is a minimization over a larger set of quantum states than on the right-hand side. Based on Eqs. (79) and (80), it follows that for the self-distance for N = 1 and a given $H_1$ we obtain

$$D_{\rm GMPC}(\varrho,\varrho)^2\le\frac12F_Q[\varrho,H_1]. \tag{81}$$

Now, our goal is to give an optimal transport map (plan) corresponding to an optimal coupling of $D_{\rm GMPC,sep}(\varrho,\sigma)^2$. Note that there can be several optimal couplings. We look for a CPTP map Φ corresponding to an optimal separable coupling $\varrho_{12}$. Let us assume that an optimal separable coupling equals the separable state given in Eq. (18). Based on Definition 3, it has the following marginals. We obtain ϱ as in Eq. (14), and for σ

$$\sigma=\sum_kp_k|\Phi_k\rangle\langle\Phi_k| \tag{82}$$

holds. Let us consider the map [46]

$$\Phi(X)=\sum_kK_kXK_k^\dagger, \tag{83}$$

where the Kraus operators are given in Eq. (84), and $\varrho^{-1}$ is the inverse of ϱ on its support. For our particular transport problem, we make the choice in Eq. (85). It is clear that Φ is completely positive, and it is trace preserving due to Eq. (86). Since the map transforms ϱ to σ, i.e., $\Phi(\varrho)=\sigma$, the CPTP map Φ given in Eq. (83) is the optimal transport map we were looking for. Let us see some properties of the map we have just found. If the $|\Psi_k\rangle$ are pairwise orthogonal to each other, then

$$K_k=|\Phi_k\rangle\langle\Psi_k| \tag{88}$$

holds. In this case, the map can be realized by a von Neumann measurement in the basis given by $\{|\Psi_k\rangle\}$, followed by a unitary that transforms $|\Psi_k\rangle$ to $|\Phi_k\rangle$. It is instructive to look at the action of the map on the state in Eq. (89). Then, we obtain the optimal coupling $\varrho_{12}$ as in Eq. (90). Hence, for every map given by Eqs. (83), (84) and (85), there is a corresponding coupling. In summary, if we restrict the optimization to separable states, then, when computing $D_{\rm GMPC,sep}(\varrho,\sigma)^2$ for any ϱ and σ, there is a transport map corresponding to every optimal coupling. Note that this was not the case for $D_{\rm GMPC}(\varrho,\sigma)^2$.
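The transport map can be sketched numerically. In the code below, we assume the Kraus operators $K_k=\sqrt{p_k}\,|\Phi_k\rangle\langle\Psi_k|\varrho^{-1/2}$, with $\varrho^{-1/2}$ taken on the support of ϱ. This form is consistent with the text, since it reduces to $|\Phi_k\rangle\langle\Psi_k|$ for orthogonal $|\Psi_k\rangle$, but whether it is literally Eqs. (84)-(85) is our assumption. The sketch builds a random separable coupling of the form of Eq. (18) and checks completeness (for full-rank ϱ), $\Phi(\varrho)=\sigma$, and the orthogonal special case.

```python
import numpy as np

def random_pure_state(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def inv_sqrt_on_support(rho, tol=1e-12):
    # rho^{-1/2} restricted to the support of rho (zero on its kernel).
    w, v = np.linalg.eigh(rho)
    inv = np.array([1.0 / np.sqrt(x) if x > tol else 0.0 for x in w])
    return (v * inv) @ v.conj().T

def kraus_from_coupling(probs, psis, phis):
    # Assumed form K_k = sqrt(p_k) |Phi_k><Psi_k| rho^{-1/2}, rho = sum_k p_k |Psi_k><Psi_k|.
    rho = sum(p * np.outer(x, x.conj()) for p, x in zip(probs, psis))
    r = inv_sqrt_on_support(rho)
    return [np.sqrt(p) * np.outer(y, x.conj()) @ r for p, x, y in zip(probs, psis, phis)]

def apply_channel(kraus, x):
    return sum(k @ x @ k.conj().T for k in kraus)

d, m = 3, 5
rng = np.random.default_rng(3)
probs = rng.dirichlet(np.ones(m))
psis = [random_pure_state(d, rng) for _ in range(m)]
phis = [random_pure_state(d, rng) for _ in range(m)]
rho = sum(p * np.outer(x, x.conj()) for p, x in zip(probs, psis))
sigma = sum(p * np.outer(y, y.conj()) for p, y in zip(probs, phis))

kraus = kraus_from_coupling(probs, psis, phis)
completeness = sum(k.conj().T @ k for k in kraus)   # identity if rho has full rank
print(np.allclose(completeness, np.eye(d)), np.allclose(apply_channel(kraus, rho), sigma))

# Orthogonal |Psi_k>: the Kraus operators reduce to |Phi_k><Psi_k|.
basis = list(np.eye(d, dtype=complex))
p_orth = np.array([0.5, 0.3, 0.2])
k_orth = kraus_from_coupling(p_orth, basis, phis[:d])
```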
Let us now define another distance based on an optimization over separable states.

Definition 4.
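Based on the definition of $D_{\rm DPT}(\varrho,\sigma)^2$ given in Eq. (1), we can also define

$$D_{\rm DPT,sep}(\varrho,\sigma)^2=\min_{\varrho_{12}\in\mathcal{S}}\sum_{n=1}^N{\rm Tr}\left[(H_n^T\otimes\mathbb{1}-\mathbb{1}\otimes H_n)^2\varrho_{12}\right]\quad{\rm s.t.}\quad{\rm Tr}_2(\varrho_{12})=\varrho^T,\ {\rm Tr}_1(\varrho_{12})=\sigma. \tag{91}$$

As an important property of $D_{\rm DPT,sep}(\varrho,\sigma)^2$, we can see that

$$D_{\rm DPT}(\varrho,\sigma)^2\le D_{\rm DPT,sep}(\varrho,\sigma)^2 \tag{92}$$

holds, since on the left-hand side of Eq. (92) there is a minimization over a larger set of quantum states than on the right-hand side. Let us now see what kind of map corresponds to an optimal separable coupling $\varrho_{12}$ when we calculate $D_{\rm DPT,sep}(\varrho,\sigma)^2$. Let us assume that an optimal separable coupling is of the form

$$\varrho_{12}=\sum_kp_k\left(|\Psi_k\rangle\langle\Psi_k|\right)^T\otimes|\Phi_k\rangle\langle\Phi_k|. \tag{93}$$

Based on Definition 4, we obtain ϱ as in Eq. (14), and σ as in Eq. (82). It turns out that the map we need is just Φ defined in Eq. (83). It is instructive to look at the action of the map on the state in Eq. (94), where $\varrho_0$ is defined in Eq. (89). Then, we obtain the optimal coupling $\varrho_{12}$ as in Eq. (95). Hence, for every map given by Eqs. (83), (84) and (85), there is a corresponding coupling.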
It is instructive to relate our results to those of Ref. [46]. In Ref. [46], the role of $\varrho_0^{T_1}$ is played by the purification. Moreover, if we compute the self-distance, and thus ϱ = σ, the map given in Eq. (83) is not the identity map, whereas in Ref. [46] the map was the identity map in that case.
Interestingly, the two distance measures based on separable couplings, defined with and without the transpose, respectively, are equal to each other.
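Observation 3. The two quantum Wasserstein distance measures are equal to each other:

$$D_{\rm DPT,sep}(\varrho,\sigma)^2=D_{\rm GMPC,sep}(\varrho,\sigma)^2. \tag{96}$$

Proof. We need to know that for two matrices X and Y acting on a bipartite system

$${\rm Tr}(X^{T_k}Y)={\rm Tr}(XY^{T_k}) \tag{97}$$

holds for k = 1, 2. Then, Eq. (98) follows from Eq. (78). Then, we arrive at Eq. (91) by noticing that

$$\varrho_{12}\in\mathcal{S} \tag{99}$$

holds if and only if

$$\varrho_{12}^{T_1}\in\mathcal{S} \tag{100}$$

holds. ■

It is instructive to obtain a quantum state that maximizes $D_{\rm DPT,sep}(\varrho,\varrho)^2$ and $D_{\rm DPT}(\varrho,\varrho)^2$ for N = 1 and a given $H_1$ as follows. We know that

$$D_{\rm DPT}(\varrho,\varrho)^2\le D_{\rm DPT,sep}(\varrho,\varrho)^2\le2(\Delta H_1)^2_\varrho\le\frac12(h_{\max}-h_{\min})^2 \tag{101}$$

holds. The first two inequalities are based on Eq. (10). The third one can be obtained as follows. Simple algebra shows that for the state

$$|\Psi\rangle=\frac{1}{\sqrt2}\left(|h_{\min}\rangle+|h_{\max}\rangle\right), \tag{102}$$

where $|h_{\min}\rangle$ ($|h_{\max}\rangle$) is the eigenstate of $H_1$ with the minimal eigenvalue $h_{\min}$ (maximal eigenvalue $h_{\max}$), the variance $(\Delta H_1)^2$ is maximal, and all inequalities of Eq. (101) are saturated. Hence, the maximal self-distance is achieved by $\varrho=|\Psi\rangle\langle\Psi|$ with $|\Psi\rangle$ given in Eq. (102).

Let us calculate $D_{\rm GMPC,sep}(\varrho,\sigma)^2$ and the other quantum Wasserstein distance measures for some concrete examples.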
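The chain in Eq. (101) can be checked numerically. The NumPy sketch below uses $D_{\rm DPT}(\varrho,\varrho)^2=2I_\varrho(H_1)$ from Eq. (3) and $D_{\rm DPT,sep}(\varrho,\varrho)^2=F_Q[\varrho,H_1]/2$ from Observations 1 and 3, as read above. It evaluates the four quantities for a random mixed state, where they are non-decreasing, and for the pure superposition of Eq. (102), where they all coincide. The random H and ϱ are illustrative.

```python
import numpy as np

def random_state(d, rank, seed):
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_hermitian(d, seed):
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def qfi(rho, h, tol=1e-12):
    # Eq. (8).
    lam, vecs = np.linalg.eigh(rho)
    h_kl = vecs.conj().T @ h @ vecs
    return sum(2 * (lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * abs(h_kl[k, l]) ** 2
               for k in range(len(lam)) for l in range(len(lam)) if lam[k] + lam[l] > tol)

def variance(rho, h):
    return np.trace(h @ h @ rho).real - np.trace(h @ rho).real ** 2

def skew_information(rho, h):
    s = psd_sqrt(rho)
    return np.trace(h @ h @ rho).real - np.trace(h @ s @ h @ s).real

d = 4
h = random_hermitian(d, seed=4)
w, v = np.linalg.eigh(h)                  # eigenvalues in ascending order
h_min, h_max = w[0], w[-1]

def chain(rho):
    # The four quantities of Eq. (101), from left to right.
    return [2 * skew_information(rho, h), qfi(rho, h) / 2,
            2 * variance(rho, h), (h_max - h_min) ** 2 / 2]

mixed = random_state(d, d, seed=5)
psi = (v[:, 0] + v[:, -1]) / np.sqrt(2)   # the state of Eq. (102)
best = np.outer(psi, psi.conj())
print(chain(mixed))
print(chain(best))                        # all four entries coincide up to rounding
```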
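Example 2. Let us consider the case when $\varrho=|\Psi\rangle\langle\Psi|$ is a pure state of any dimension and σ is an arbitrary density matrix of the same dimension. Then, when computing the various Wasserstein distance measures between ϱ and σ, the state $\varrho_{12}$ in the optimization is constrained to be the tensor product of the two density matrices. Hence, for the distance from a pure state,

$$D_{\rm GMPC,sep}(\varrho,\sigma)^2=D_{\rm GMPC}(\varrho,\sigma)^2=\sum_{n=1}^N{\rm Tr}\left[(H_n\otimes\mathbb{1}-\mathbb{1}\otimes H_n)^2\,\varrho\otimes\sigma\right] \tag{104}$$

holds. The last expression in Eq. (104) can also be written as

$$\sum_{n=1}^N\left[(\Delta H_n)^2_\Psi+(\Delta H_n)^2_\sigma+\left(\langle H_n\rangle_\Psi-\langle H_n\rangle_\sigma\right)^2\right]. \tag{105}$$

Let us take N = 1 and $H_1=\sigma_z$. Then, when computing $D_{\rm GMPC,sep}(\varrho,\sigma)^2$, the optimum is attained by the separable state given in Eq. (107), and the self-distance follows from it. If $H_1$ is a different operator, then the state $\varrho_{12}$ corresponding to the minimum will be different.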
It can be obtained from the state given in Eq. (107) with local unitaries. For instance, for $H_1=\sigma_x$, the optimum is attained by the correspondingly rotated separable state.

Example 4. Let us consider the two-qubit states $\varrho=\sigma=D_p$, where $D_p$ is a diagonal state depending on the parameter p. Let us take N = 1 and $H_1=\sigma_z$. When computing $D_{\rm GMPC,sep}(\varrho,\sigma)^2$, the optimum is reached by a bipartite separable state, and the self-distance is zero for all p.

Example 5. Let us consider the two single-qubit mixed states ϱ and $\sigma_\phi$, where ϕ is a real parameter. For N = 1 and $H_1=\sigma_z$, we plotted $D_{\rm DPT,sep}(\varrho,\sigma_\phi)^2$ and $D_{\rm DPT}(\varrho,\sigma_\phi)^2$ in Fig. 2. The details of the numerical calculations are in Appendix A.
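For ϕ = 0, $\sigma_\phi=\varrho$; hence, in this case the two types of distance equal the corresponding types of self-distance of ϱ. That is, based on Eq. (3), we have

$$D_{\rm DPT}(\varrho,\sigma_0)^2=2I_\varrho(\sigma_z), \tag{116}$$

and based on Eq. (79) and taking into account Observation 3, we have

$$D_{\rm DPT,sep}(\varrho,\sigma_0)^2=\frac12F_Q[\varrho,\sigma_z]. \tag{117}$$

Here, we used the formula giving the quantum Fisher information of pure states mixed with white noise [102,103,71,104]

$$F_Q\left[p|\Psi\rangle\langle\Psi|+(1-p)\frac{\mathbb{1}}{d},H\right]=\frac{4p^2}{p+2(1-p)/d}(\Delta H)^2_\Psi, \tag{118}$$

where d is the dimension of the system. For ϕ = π/2, Eq. (119) holds. Numerics show that Eq. (120) holds, while for smaller ϕ an entangled $\varrho_{12}$ is cheaper than a separable one.

We can relate $D_{\rm GMPC,sep}(\varrho,\sigma)^2$ to the quantum Fisher information.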
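The white-noise formula quoted above can be checked against the general expression of Eq. (8). In the NumPy sketch below, the dimensions, noise levels, random pure states and random observables are illustrative choices.

```python
import numpy as np

def random_pure(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def qfi(rho, h, tol=1e-12):
    # Eq. (8).
    lam, vecs = np.linalg.eigh(rho)
    h_kl = vecs.conj().T @ h @ vecs
    return sum(2 * (lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * abs(h_kl[k, l]) ** 2
               for k in range(len(lam)) for l in range(len(lam)) if lam[k] + lam[l] > tol)

def variance(rho, h):
    return np.trace(h @ h @ rho).real - np.trace(h @ rho).real ** 2

def noisy_pure(psi, p):
    # p |Psi><Psi| + (1 - p) 1/d
    d = len(psi)
    return p * np.outer(psi, psi.conj()) + (1 - p) * np.eye(d) / d

def qfi_white_noise(psi, h, p):
    # Closed-form expression for the quantum Fisher information of the noisy pure state.
    d = len(psi)
    return 4 * p ** 2 / (p + 2 * (1 - p) / d) * variance(np.outer(psi, psi.conj()), h)

rng = np.random.default_rng(8)
checks = []
for d in (2, 3, 5):
    psi, h = random_pure(d, rng), random_hermitian(d, rng)
    for p in (0.2, 0.6, 0.95):
        checks.append((qfi(noisy_pure(psi, p), h), qfi_white_noise(psi, h, p)))
print(checks)
```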
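Observation 4. For the modified GMPC distance defined in Eq. (78), the inequality

$$D_{\rm GMPC,sep}(\varrho,\sigma)^2\ge\frac14\sum_{n=1}^N\left(F_Q[\varrho,H_n]+F_Q[\sigma,H_n]\right) \tag{121}$$

holds, and for ϱ = σ and N = 1 we have equality in Eq. (121).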
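Proof. We can rewrite the optimization problem in Eq. (78) for separable states as

$$\begin{aligned}D_{\rm GMPC,sep}(\varrho,\sigma)^2&=\min_{\{p_k,|\Psi_k\rangle,|\Phi_k\rangle\}}\sum_kp_k\sum_{n=1}^N{\rm Tr}\left[(H_n\otimes\mathbb{1}-\mathbb{1}\otimes H_n)^2|\Psi_k\rangle\langle\Psi_k|\otimes|\Phi_k\rangle\langle\Phi_k|\right]\\&=\min_{\{p_k,|\Psi_k\rangle,|\Phi_k\rangle\}}\sum_kp_k\sum_{n=1}^N\left[(\Delta H_n)^2_{\Psi_k}+(\Delta H_n)^2_{\Phi_k}+\left(\langle H_n\rangle_{\Psi_k}-\langle H_n\rangle_{\Phi_k}\right)^2\right]\\&\ge\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k\sum_{n=1}^N(\Delta H_n)^2_{\Psi_k}+\min_{\{q_k,|\Phi_k\rangle\}}\sum_kq_k\sum_{n=1}^N(\Delta H_n)^2_{\Phi_k}\\&\ge\sum_{n=1}^N\left[\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k(\Delta H_n)^2_{\Psi_k}+\min_{\{q_k,|\Phi_k\rangle\}}\sum_kq_k(\Delta H_n)^2_{\Phi_k}\right],\end{aligned} \tag{122}$$

where for the first optimization the decomposition of $\varrho_{12}$ is given as in Eq. (18), and we have the conditions

$$\sum_kp_k|\Psi_k\rangle\langle\Psi_k|=\varrho,\qquad\sum_kp_k|\Phi_k\rangle\langle\Phi_k|=\sigma. \tag{123}$$

For the second optimization, the condition on ϱ is given in Eq. (14). For the third optimization, the condition is

$$\sigma=\sum_kq_k|\Phi_k\rangle\langle\Phi_k|, \tag{124}$$

where $q_k\ge0$ and $\sum_kq_k=1$ hold for the probabilities. The fourth and fifth optimizations are similar to the second and the third ones; however, the order of the sum and the minimization is exchanged. From Eq. (122), the statement follows using the formula giving the quantum Fisher information as the convex roof of the variance in Eq. (13). ■

In Eq. (122), in the second and third lines, we can see that, when computing $D_{\rm GMPC,sep}(\varrho,\sigma)^2$, the quantity to be minimized is a weighted sum containing the variances of $H_n$ for $|\Psi_k\rangle$ and $|\Phi_k\rangle$.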
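Based on Example 2 and the proof of Observation 4, we find that the GMPC distance based on separable couplings can be obtained with an optimization over separable decompositions as

$$D_{\rm GMPC,sep}(\varrho,\sigma)^2=\min_{\{p_k,|\Psi_k\rangle,|\Phi_k\rangle\}}\sum_kp_k\,D_{\rm GMPC}\left(|\Psi_k\rangle\langle\Psi_k|,|\Phi_k\rangle\langle\Phi_k|\right)^2,$$

where for the optimization the decomposition of $\varrho_{12}$ is given as in Eq. (18), and we have the conditions on the marginals given in Eq. (123). This way, we computed the distance for mixed states using the formula for the distance between pure states and an optimization. An analogous expression also holds for $D_{\rm DPT,sep}(\varrho,\sigma)^2$.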
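The decomposition formula above and the bound of Observation 4 can be illustrated for one fixed, not necessarily optimal, separable coupling. The NumPy sketch below draws random $p_k$, $|\Psi_k\rangle$ and $|\Phi_k\rangle$ and defines ϱ and σ through Eq. (123). It checks that the cost of the coupling equals the weighted sum of pure-state distances, written as in our reading of Eqs. (104)-(105), and that this cost is at least the right-hand side of Eq. (121) for N = 1. All random inputs and helper names are illustrative.

```python
import numpy as np

def random_pure(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def random_hermitian(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def proj(v):
    return np.outer(v, v.conj())

def qfi(rho, h, tol=1e-12):
    # Eq. (8).
    lam, vecs = np.linalg.eigh(rho)
    h_kl = vecs.conj().T @ h @ vecs
    return sum(2 * (lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * abs(h_kl[k, l]) ** 2
               for k in range(len(lam)) for l in range(len(lam)) if lam[k] + lam[l] > tol)

def variance(rho, h):
    return np.trace(h @ h @ rho).real - np.trace(h @ rho).real ** 2

def marginals(rho12, d):
    r = rho12.reshape(d, d, d, d)
    return np.einsum('ijkj->ik', r), np.einsum('ijil->jl', r)

def pure_distance_sq(psi, phi, h):
    # Pure-state distance for N = 1: two variances plus the squared difference of means.
    mean_diff = np.trace(h @ proj(psi)).real - np.trace(h @ proj(phi)).real
    return variance(proj(psi), h) + variance(proj(phi), h) + mean_diff ** 2

d, m = 3, 4
rng = np.random.default_rng(12)
h = random_hermitian(d, rng)
eye = np.eye(d)
a = np.kron(h, eye) - np.kron(eye, h)
cost_op = a @ a                                 # cost operator of Eq. (78), N = 1

probs = rng.dirichlet(np.ones(m))
psis = [random_pure(d, rng) for _ in range(m)]
phis = [random_pure(d, rng) for _ in range(m)]
rho = sum(p * proj(x) for p, x in zip(probs, psis))
sigma = sum(p * proj(y) for p, y in zip(probs, phis))
rho12 = sum(p * np.kron(proj(x), proj(y)) for p, x, y in zip(probs, psis, phis))

cost = np.trace(cost_op @ rho12).real
weighted = sum(p * pure_distance_sq(x, y, h) for p, x, y in zip(probs, psis, phis))
bound = (qfi(rho, h) + qfi(sigma, h)) / 4       # right-hand side of Eq. (121), N = 1
print(cost, weighted, bound)
```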
Let us now see the consequences of the above observations for the self-distance, which, unlike in the classical case, can be nonzero. It is obtained as

$$D_{\rm GMPC,sep}(\varrho,\varrho)^2=2\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k\sum_{n=1}^N(\Delta H_n)^2_{\Psi_k}, \tag{127}$$

where the condition for the minimization is given in Eq. (14). Then, based on Eq. (80), it follows that the self-distance for the GMPC distance is bounded from above as

$$D_{\rm GMPC}(\varrho,\varrho)^2\le2\min_{\{p_k,|\Psi_k\rangle\}}\sum_kp_k\sum_{n=1}^N(\Delta H_n)^2_{\Psi_k}. \tag{128}$$

Let us examine some properties of $D_{\rm GMPC,sep}(\varrho,\varrho)^2$. From Eq. (127), it follows that for N = 1 and a given $H_1$, Eq. (129) holds if and only if Eq. (130) holds. After studying the self-distance, let us now look for similar relations for the distance between two different states. For N = 1 and a given $H_1$, we find the corresponding relations for $D_{\rm GMPC,sep}(\varrho,\sigma)^2$ in Eqs. (131) and (132). Let us look for a relation between the distance and the self-distance. From the first inequality in Eq. (122), it follows that for any ϱ and σ

$$D_{\rm GMPC,sep}(\varrho,\sigma)^2\ge\frac12\left[D_{\rm GMPC,sep}(\varrho,\varrho)^2+D_{\rm GMPC,sep}(\sigma,\sigma)^2\right]. \tag{133}$$

For a local dimension d > 2, the optimization over separable states is difficult to carry out numerically. Thus, it is reasonable to define $D_{\rm DPT,PPT}(\varrho,\sigma)^2$ and $D_{\rm GMPC,PPT}(\varrho,\sigma)^2$, which require an optimization over PPT states rather than separable states. It is possible to prove that the two new quantities are equal to each other.
Observation 5. The two quantum Wasserstein distance measures are equal to each other:

$$D_{\rm DPT,PPT}(\varrho,\sigma)^2=D_{\rm GMPC,PPT}(\varrho,\sigma)^2. \tag{134}$$

Proof. We have to follow ideas similar to the ones in the proof of Observation 3. In particular, we need to use that $\varrho_{12}$ is a PPT quantum state if and only if $\varrho_{12}^{T_1}$ is a PPT quantum state. ■

In order to define further Wasserstein distance measures based on an optimization over other supersets of separable states, we need to know the separability criterion based on symmetric extensions [105,106,107]. A given bipartite state $\varrho_{AB}$ is said to have an n : m symmetric extension if it can be written as the reduced state of a multipartite state $\varrho_{A_1\ldots A_nB_1\ldots B_m}$, which is symmetric under $A_k\leftrightarrow A_l$ for all $k\ne l$ and under $B_{k'}\leftrightarrow B_{l'}$ for all $k'\ne l'$. If we also require that the state is PPT for all bipartitions, then the state has a PPT symmetric extension. The requirement of having a PPT symmetric extension for n = 1 and m = 1 is equivalent to the PPT condition, while for n > 1 or m > 1 the condition is stronger. Bipartite separable states have such extensions for arbitrarily large n and m, while the lack of such an extension for some n and m signals the presence of entanglement. In particular, a state is separable if and only if it has an extension for every n and for m = 1.
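Let us define $D_{{\rm DPT,PPT}n}(\varrho,\sigma)^2$ and $D_{{\rm GMPC,PPT}n}(\varrho,\sigma)^2$ based on an optimization over quantum states with a PPT symmetric extension for a given n and for m = 1. Since a state with such an extension for n′ also has one for every n < n′, the set of allowed couplings shrinks as n grows, and

$$D_{{\rm GMPC,PPT}n}(\varrho,\sigma)^2\le D_{{\rm GMPC,PPT}n'}(\varrho,\sigma)^2\le D_{\rm GMPC,sep}(\varrho,\sigma)^2 \tag{135}$$

holds if n′ > n. Based on Observations 3 and 5, we can also see that

$$D_{{\rm DPT,PPT}n}(\varrho,\sigma)^2=D_{{\rm GMPC,PPT}n}(\varrho,\sigma)^2 \tag{136}$$

holds for every n.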

Variance-like quantities
When we compare the expression in Eq. (63) defining the quantum Fisher information with the expression in Eq. (72) defining the variance, we can see that the main difference is that the minimization is replaced by a maximization. We have also seen a similar relation between the entanglement of formation given in Eq. (27) and the entanglement of assistance given in Eq. (29). Based on this observation, we can define variance-like quantities from the various forms of the quantum Wasserstein distance by replacing the minimization by a maximization. Such a variance-like quantity can be interpreted in the framework of transport problems as follows. The quantum Wasserstein distance determines the smallest possible cost for the transport problem by a minimization. The variance-like quantities presented in this section determine the largest possible cost for the transport problem. Knowing the largest possible cost is useful when judging how close the cost of a given transport plan is to the optimal cost.
Let us now define the first variance-like quantity.
Definition 5. From the GMPC distance with an optimization restricted to separable states given in Eq. (78), we obtain the variance-like quantity $V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$ of Eq. (137), in which the minimization is replaced by a maximization. Clearly, the inequality $V_{\mathrm{GMPC,sep}}(\varrho,\sigma) \ge D_{\mathrm{GMPC,sep}}(\varrho,\sigma)^2$ of Eq. (138) holds, since on the left-hand side we maximize over a set of quantum states while on the right-hand side we minimize over the same set. We can analogously define $V_{\mathrm{DPT,sep}}(\varrho,\sigma)$, $V_{\mathrm{GMPC}}(\varrho,\sigma)$, and $V_{\mathrm{DPT}}(\varrho,\sigma)$ by modifying the definitions of $D_{\mathrm{DPT,sep}}(\varrho,\sigma)^2$, $D_{\mathrm{GMPC}}(\varrho,\sigma)^2$, and $D_{\mathrm{DPT}}(\varrho,\sigma)^2$, respectively.
In all these cases, a relation analogous to the one in Eq. (138) holds.
The value of $V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$ is related to the variance.
Observation 6. The GMPC variance defined in Eq. (137) is bounded from below as in Eq. (139).
Proof. We can rewrite the expression to be computed for $V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$ as in Eq. (140), where $\varrho_{12}$ has marginals $\varrho$ and $\sigma$, as given in Eq. (123), and $C_{\varrho_{12}}$ is defined in Eq. (141). Since the product state $\varrho \otimes \sigma$ is separable and fulfills the conditions on the marginals, it is a feasible coupling, which gives Eq. (142). Substituting Eq. (142) into Eq. (140) and using $(\langle H_n\rangle_\varrho - \langle H_n\rangle_\sigma)^2 \ge 0$, we obtain the statement.

■
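The key step of the proof, namely that the product coupling $\varrho\otimes\sigma$ is feasible and that its cost splits into the two variances plus a squared difference of the mean values, can be checked numerically. A minimal sketch, assuming the GMPC cost for $N = 1$ is normalized as $\tfrac{1}{2}\mathrm{Tr}[(H\otimes 1 - 1\otimes H)^2\varrho_{12}]$ (the prefactor is our assumption); the states and the observable are arbitrary illustrative choices:

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Kronecker product; row (i, k) is stored as i * len(b) + k."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def expval(rho, op):
    return trace(matmul(rho, op)).real


def variance(rho, op):
    return expval(rho, matmul(op, op)) - expval(rho, op) ** 2


def gmpc_cost(rho12, h):
    """(1/2) Tr[(H x 1 - 1 x H)^2 rho12] for a single observable H (assumed normalization)."""
    d = len(h)
    x = [[u - v for u, v in zip(ru, rv)]
         for ru, rv in zip(kron(h, identity(d)), kron(identity(d), h))]
    return 0.5 * trace(matmul(matmul(x, x), rho12)).real


rho = [[0.7, 0.2], [0.2, 0.3]]
sigma = [[0.4, 0.1j], [-0.1j, 0.6]]
h = [[1.0, 0.5], [0.5, -1.0]]  # hypothetical observable H_1

product_value = gmpc_cost(kron(rho, sigma), h)
split = 0.5 * (variance(rho, h) + variance(sigma, h)
               + (expval(rho, h) - expval(sigma, h)) ** 2)
lower_bound = 0.5 * (variance(rho, h) + variance(sigma, h))
print(product_value, split, lower_bound)
```

Dropping the non-negative squared difference of the mean values, as in the proof, leaves the bound in the last column.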
We can even obtain an upper bound.
Observation 7. The GMPC variance defined in Eq. (137) is also bounded from above.
Proof. We can bound the expression to be computed for $V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$ from above, where $\varrho_{12}$ has marginals $\varrho$ and $\sigma$, as given in Eq. (123). ■
Let us see some concrete examples.
Example 6. Interestingly, the quantity considered can be larger than, smaller than, or equal to its counterpart, c.f. Eq. (133). We consider $N = 1$ and $H_1 = \sigma_z$. The three possibilities are realized by the following states. For the first state, the former quantity is larger than the latter; for the second state, the two are equal; finally, for the third state, the former is smaller than the latter. Let us now look at larger systems.
For systems with local dimension $d = 3$, $N = 1$, $H_1 = \mathrm{diag}(-1, 0, 1)$, and a suitable pair of states, the former quantity is again smaller than the latter. Note that in all examples where $\varrho$, $\sigma$, and $H_1$ were all diagonal, there is an optimal diagonal $\varrho_{12}$, essentially corresponding to the classical case.
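When $\varrho$, $\sigma$, and $H_1$ are all diagonal, the remark above reduces the problem to classical transport between the two eigenvalue distributions, with cost $(h_i-h_j)^2/2$ for a diagonal coupling (again assuming the prefactor $1/2$). For such a one-dimensional quadratic cost, the minimum is attained by the monotone (comonotone) coupling and the maximum by the antitone (countermonotone) coupling, a classical rearrangement argument. The sketch below uses the spectrum of $H_1 = \mathrm{diag}(-1,0,1)$ with hypothetical probabilities and compares both extremes with the product coupling:

```python
def northwest_corner(xs, p, ys, q, tol=1e-12):
    """Greedy coupling of weights p on points xs with weights q on ys, in the given order."""
    i = j = 0
    pi, qj = p[0], q[0]
    plan = []
    while i < len(p) and j < len(q):
        m = min(pi, qj)
        if m > tol:
            plan.append((xs[i], ys[j], m))
        pi -= m
        qj -= m
        if pi <= tol:
            i += 1
            if i < len(p):
                pi = p[i]
        if qj <= tol:
            j += 1
            if j < len(q):
                qj = q[j]
    return plan


def half_square_cost(plan):
    return 0.5 * sum(m * (x - y) ** 2 for x, y, m in plan)


def classical_min_max(h, p, q):
    """Min and max of sum_ij pi_ij (h_i - h_j)^2 / 2 over couplings with marginals p and q."""
    order = sorted(range(len(h)), key=lambda k: h[k])
    xs = [h[k] for k in order]
    ps = [p[k] for k in order]
    qs = [q[k] for k in order]
    low = half_square_cost(northwest_corner(xs, ps, xs, qs))  # comonotone
    high = half_square_cost(northwest_corner(xs, ps, xs[::-1], qs[::-1]))  # countermonotone
    return low, high


h = [-1.0, 0.0, 1.0]  # spectrum of H_1 = diag(-1, 0, 1)
p = [0.2, 0.3, 0.5]  # hypothetical eigenvalues of rho
q = [0.5, 0.3, 0.2]  # hypothetical eigenvalues of sigma
low, high = classical_min_max(h, p, q)
product = 0.5 * sum(p[a] * q[b] * (h[a] - h[b]) ** 2 for a in range(3) for b in range(3))
print(low, product, high)
```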
The two different variance-like quantities, defined with the transpose and without it, respectively, are equal to each other.
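Observation 8. The two types of quantum Wasserstein variance are equal to each other, $V_{\mathrm{DPT,sep}}(\varrho,\sigma) = V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$.
Proof. The proof is analogous to that of Observation 3. ■
Let us calculate $V_{\mathrm{DPT,sep}}(\varrho,\sigma)$ for some concrete examples.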
Example 7. Let us consider the case when $\varrho = |\Psi\rangle\langle\Psi|$ is a pure state of any dimension and $\sigma$ is an arbitrary density matrix of the same dimension. Then, when computing the various types of quantum Wasserstein distance and quantum Wasserstein variance between $\varrho$ and $\sigma$, the state $\varrho_{12}$ in the optimization is constrained to be the tensor product of the two density matrices. Since this is the only feasible coupling, the maximum and the minimum coincide, and it follows that $V_{\mathrm{DPT}}(|\Psi\rangle\langle\Psi|,\sigma) = D_{\mathrm{DPT}}(|\Psi\rangle\langle\Psi|,\sigma)^2$; analogous equations hold for $V_{\mathrm{DPT,sep}}(|\Psi\rangle\langle\Psi|,\sigma)$, $V_{\mathrm{GMPC,sep}}(|\Psi\rangle\langle\Psi|,\sigma)$, and $V_{\mathrm{GMPC}}(|\Psi\rangle\langle\Psi|,\sigma)$, where the Wasserstein distance measures for this case are given in Eq. (104).
Example 8. For $\varrho = \sigma = |\Psi\rangle\langle\Psi|$ and $N = 1$, the result for a given $H_1$ follows directly from Example 7.
Example 9. Let us consider the single-qubit states given in Eq. (106), with $N = 1$ and $H_1 = \sigma_z$. Then, when computing $V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$, the optimum is attained by the separable state given in Eq. (155) [c.f. Eq. (107)] (see also Example 1). If $H_1$ is a different operator, then the state $\varrho_{12}$ corresponding to the maximum will be different. It can be obtained from the state given in Eq. (155) with local unitaries. For instance, for $H_1 = \sigma_x$, the optimum is reached by the analogous separable state written in terms of the states $|\cdot\rangle_x$ of the $x$ basis.
7 Quantum Wasserstein distance and entanglement criteria
Example 5 highlighted that in certain cases the minimum over separable states is larger than the minimum over general states. In such cases, entanglement can help to decrease the quantum Wasserstein distance. In this section, we analyze the relation between the quantum Wasserstein distance and entanglement conditions on the optimal couplings $\varrho_{12}$. First, we make the following simple observation.
Observation 9. If $D_{\mathrm{DPT,sep}}(\varrho,\sigma)^2 > D_{\mathrm{DPT}}(\varrho,\sigma)^2$ holds, then all $\varrho_{12}$ states that minimize the cost for a given $\varrho$ and $\sigma$ when computing $D_{\mathrm{DPT}}(\varrho,\sigma)^2$ are entangled. In short, all optimal $\varrho_{12}$ states are entangled.
The situation is analogous if $D_{\mathrm{GMPC,sep}}(\varrho,\sigma)^2 > D_{\mathrm{GMPC}}(\varrho,\sigma)^2$. Then, all optimal $\varrho_{12}$ states for a given $\varrho$ and $\sigma$, when computing $D_{\mathrm{GMPC}}(\varrho,\sigma)^2$, are entangled. Thus, the quantum Wasserstein distance can be used as a criterion detecting entanglement in the optimal $\varrho_{12}$ states. We can also formulate such conditions with the self-distance. Based on Eqs. (3), (79), and (96), we find that for $N = 1$ the condition that the separable self-distance exceeds $D_{\mathrm{DPT}}(\varrho,\varrho)^2$ is equivalent to a relation between the quantum Fisher information and the Wigner-Yanase skew information, given in Eq. (161). Thus, the condition in Eq. (161) implies that the optimal $\varrho_{12}$ states obtained when computing $D_{\mathrm{DPT}}(\varrho,\varrho)^2$ are all entangled.
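The quantities in this condition can be evaluated from the spectral decomposition $\varrho = \sum_k \lambda_k |k\rangle\langle k|$. The sketch below assumes the standard formulas $F_Q[\varrho,H] = 2\sum_{k,l} \frac{(\lambda_k-\lambda_l)^2}{\lambda_k+\lambda_l}|H_{kl}|^2$ and $I_\varrho(H) = \mathrm{Tr}(\varrho H^2) - \mathrm{Tr}(\sqrt{\varrho}H\sqrt{\varrho}H) = \tfrac12\sum_{k,l}(\sqrt{\lambda_k}-\sqrt{\lambda_l})^2|H_{kl}|^2$, and it assumes that, in the normalization where the self-distances are $I_\varrho(H)$ and $F_Q[\varrho,H]/4$, the condition reads $F_Q[\varrho,H]/4 > I_\varrho(H)$. For $\varrho = \mathrm{diag}(0.9, 0.1)$ and $H = \sigma_x$ one finds $I_\varrho(\sigma_x) = 0.4 < F_Q/4 = 0.64$, so under these assumptions all optimal couplings are entangled, whereas for a pure state the two quantities coincide.

```python
import math


def wy_skew(lam, h):
    """Wigner-Yanase skew information in the eigenbasis of rho."""
    n = len(lam)
    return 0.5 * sum((math.sqrt(lam[k]) - math.sqrt(lam[l])) ** 2 * abs(h[k][l]) ** 2
                     for k in range(n) for l in range(n))


def qfi(lam, h):
    """Quantum Fisher information in the eigenbasis of rho (terms with lam_k + lam_l = 0 vanish)."""
    n = len(lam)
    return 2.0 * sum((lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * abs(h[k][l]) ** 2
                     for k in range(n) for l in range(n) if lam[k] + lam[l] > 0)


def variance(lam, h):
    """Ordinary variance Tr(rho H^2) - Tr(rho H)^2 in the eigenbasis of rho."""
    n = len(lam)
    second = sum(lam[k] * abs(h[k][l]) ** 2 for k in range(n) for l in range(n))
    first = sum(lam[k] * h[k][k].real for k in range(n))
    return second - first ** 2


sigma_x = [[0.0, 1.0], [1.0, 0.0]]
for lam in ([0.9, 0.1], [1.0, 0.0]):
    i_wy, fq4, var = wy_skew(lam, sigma_x), qfi(lam, sigma_x) / 4, variance(lam, sigma_x)
    print(lam, i_wy, fq4, var, "separable self-distance larger:", fq4 > i_wy + 1e-12)
```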
Let us now use entanglement criteria to construct relations for the quantum Wasserstein distance that can verify that the coupling $\varrho_{12}$ is entangled. If the inequality given in Eq. (35) holds for separable states, so does the inequality in Eq. (162), since the left-hand side of Eq. (162) is never smaller than the left-hand side of Eq. (35). Any state that violates the inequality in Eq. (162) is entangled. Eq. (162) is a tight inequality for separable states, which can be shown as follows. Based on Eq. (34), we see that for pure product states of the appropriate form, the corresponding relation for the second moments holds, since for such a state the required relation holds for all $n$.
Next, we will define a quantum Wasserstein distance related to the entanglement condition in Eq. (162).

Observation 10. Let us consider $d$-dimensional systems with $H_n = G_n$ (166). If the corresponding inequality holds, then all optimal $\varrho_{12}$ states are entangled. Here, for clarity, we give explicitly the observables used to define the distance in the superscript.
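Clearly, since we optimize over separable states when calculating $D_{\mathrm{DPT,sep}}(\varrho,\sigma)^2$, the separable bound applies to it. Thus, independently of what $\varrho$ and $\sigma$ are, their distance $D^{\{G_n\}}_{\mathrm{DPT,sep}}(\varrho,\sigma)^2$ cannot be smaller than a bound. This is true even if $\varrho = \sigma$.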
Let us now use another entanglement condition to construct relations for the quantum Wasserstein distance that can verify that the coupling is entangled. We know that the inequality given in Eq. (45) holds for separable states and is tight. Based on this, we can obtain the following bounds on the quantum Wasserstein distance.
Observation 11. Let us choose the set of operators as $\{H_n\} = \{j_x, j_y, j_z\}$. Then, if the corresponding inequality holds, all optimal $\varrho_{12}$ states are entangled. Clearly, since we optimize over separable states when calculating $D_{\mathrm{DPT,sep}}(\varrho,\sigma)^2$, the separable bound applies to it. Thus, again, independently of what $\varrho$ and $\sigma$ are, their distance $D^{\{j_x,j_y,j_z\}}_{\mathrm{DPT,sep}}(\varrho,\sigma)^2$ cannot be smaller than a bound. This is true even if $\varrho = \sigma$.
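So far, we have studied the relation of the quantum Wasserstein distance to entanglement. Next, let us consider the relation of $V_{\mathrm{DPT}}(\varrho,\sigma)$ and $V_{\mathrm{GMPC}}(\varrho,\sigma)$ to entanglement.
Observation 12. If $V_{\mathrm{DPT,sep}}(\varrho,\sigma) < V_{\mathrm{DPT}}(\varrho,\sigma)$ holds, then all optimal $\varrho_{12}$ states for a given $\varrho$ and $\sigma$, when computing $V_{\mathrm{DPT}}(\varrho,\sigma)$, are entangled. The situation is analogous if $V_{\mathrm{GMPC,sep}}(\varrho,\sigma) < V_{\mathrm{GMPC}}(\varrho,\sigma)$. Next, we will determine a set of $H_n$ operators that can be used efficiently to detect entanglement with the quantum Wasserstein variance. For that, we need to know that for separable states the inequality given in Eq. (46) holds.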
Observation 13. Let us consider an example with $d = 2$ and a suitable choice of the operators $H_n$. If the corresponding inequality holds, then all optimal $\varrho_{12}$ states are entangled.
Clearly, since we optimize over separable states when calculating $V_{\mathrm{DPT,sep}}(\varrho,\sigma)$, the separable bound applies to it. Let us now see a complementary relation for $D_{\mathrm{GMPC}}(\varrho,\sigma)^2$ and $D_{\mathrm{GMPC,sep}}(\varrho,\sigma)^2$. It uses the same $H_n$ operators that appear in Observation 13. We need to know that for separable states the inequality given in Eq. (54) holds.
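Observation 14. Let us consider $d = 2$ and the same operators $H_n$ as in Observation 13. If the corresponding inequality holds, then all optimal $\varrho_{12}$ states are entangled. Clearly, since we optimize over separable states when calculating $D_{\mathrm{DPT,sep}}(\varrho,\sigma)^2$, the separable bound applies to it. Statements analogous to those of Observations 13 and 14 can be formulated, with identical bounds, for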

Optimization of the variance over the two-copy space
In this section, we examine the quantity that we obtain after replacing the second moment by a variance in the optimization in the definition of $D_{\mathrm{GMPC,sep}}(\varrho,\sigma)^2$ given in Eq. (78) and in the definition of $V_{\mathrm{GMPC,sep}}(\varrho,\sigma)$ given in Eq. (137). Analogous ideas also work for the other types of quantum Wasserstein distance and quantum Wasserstein variance defined before. We will show that such quantities have interesting properties.
Definition 6. After replacing the second moment by a variance in the optimization in the definition of $D_{\mathrm{GMPC,sep}}(\varrho,\sigma)^2$ given in Eq. (78), we obtain the modified GMPC distance defined in Eq. (178).
Let us see some properties of the quantity we have just introduced. In general, the relation in Eq. (179) holds for mixed states, which yields a corresponding inequality between the modified and the original quantities. Due to the relation in Eq. (179), an analogous relation holds for the self-distance, where $D_{\mathrm{GMPC,sep}}(\varrho,\sigma)^2$ is given in Eq. (78).
We can write the expression to be optimized in terms of $\varrho_{12}$, where $\varrho_{12}$ has marginals $\varrho$ and $\sigma$, as given in Eq. (123), and $C_{\varrho_{12}}$ is defined as in Eq. (141). For pure states, $\varrho = |\Psi\rangle\langle\Psi|$ and $\sigma = |\Phi\rangle\langle\Phi|$, we have $C_{\varrho_{12}} = 0$, and the expression simplifies accordingly. We can present lower bounds on the distance.
Observation 15. The modified GMPC distance defined in Eq. (178) is bounded from below as in Eq. (184), while for $\varrho = \sigma$ and $N = 1$ equality holds in Eq. (184).
Proof.We can rewrite the optimization problem in Eq. (178) based on Eqs.(122) and (179) as where we used in the first inequality that for real holds due to the fact that f (x) = x 2 is convex.In Eq. (185), in the first optimization the decomposition of ϱ 12 is given as Eq. ( 18), and we have the conditions given in Eq. (123).For the second optimization, the condition is Eq.(14).For the third optimization, the condition is given in Eq. (124), where for the probabilities q k ≥ 0 and k q k = 1 hold.The fourth and fifth optimization are similar to second and the third one, however, the order of the sum and the minimization is exchanged.From Eq. (185), the statement follows using the formula that obtains the quantum Fisher information with a convex roof of the variance given in Eq. (13).■ We can define another variance-like quantity.Definition 7. Analogously, we can define the quantity that we obtain after replacing the second moment by a variance in the optimization in the definition of V GMPC,sep (ϱ, σ) 2 in Eq. (137), as Let us see now some properties of ṼGMPC,sep (ϱ, σ).In general, holds.

Optimization over other subsets of physical states
So far, we have considered the Wasserstein distance based on an optimization over all bipartite physical states, separable states, PPT states, and states with a PPT symmetric extension, the latter considered in Sec. 5. In this section, we examine other convex sets of quantum states, and we also discuss some relevant couplings. By optimizing over a convex set different from the ones considered so far, we can obtain a Wasserstein distance with a different self-distance.
Let us consider the set of couplings for which the quantum discord is zero [108,109,110]. If we assume that the conditions on the marginals hold, then an element of the set is of the form given in Eq. (193), where the eigendecomposition of $\varrho$ is given in Eq. (9). Clearly, such states form a convex set. They are called classical-quantum, since in subsystem 1 we have a mixture of states that are pairwise orthogonal to each other [110]. We will denote the set of such states by $C_1$. For such couplings, the map given in Eq. (83) needs only a von Neumann measurement.
Another possibility is the set of quantum-classical states, which we denote by $C_2$. Such states are mixtures of products of density matrices on subsystem 1 with the eigenstates of $\sigma$ on subsystem 2, where the weights are the eigenvalues of $\sigma$ and the density matrices on subsystem 1 average to $\varrho$. Since both $C_1$ and $C_2$ consist of separable states, a minimization over $C_1$ or $C_2$ cannot lead to a smaller value than a minimization over separable couplings. An optimization over $C_1$ or $C_2$ can be carried out efficiently using semidefinite programming for any system size. We can also consider the set of classical-classical states, which are the members of both $C_1$ and $C_2$.
Let us now consider the case of the GMPC self-distance, when $\varrho = \sigma$. Then, a relevant coupling that is an element of both $C_1$ and $C_2$ is given in Eq. (198), where the eigendecomposition of $\varrho$ is given in Eq. (9). For this coupling and $N = 1$, the cost can be evaluated in closed form. So far, we have been obtaining results for the GMPC distance; analogous statements hold for the DPT distance.
Another relevant case is the product-state coupling. Then, we can define the distances given in Eqs. (5) and (1) restricted to product states, c.f. Eq. (105). For the self-distance, a corresponding relation holds. In Table 1, we summarize the self-distances obtained for the Wasserstein distance considering an optimization over various subsets of the bipartite quantum states for $N = 1$. For the quantities in the table, the inequality given in Eq. (10) holds. As expected, a minimization over a larger set never leads to a larger value and often leads to a smaller one.
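The ordering summarized in Table 1 can be illustrated with the two explicit couplings discussed above. In the minimal sketch below, we assume the GMPC cost $\tfrac12\mathrm{Tr}[(H\otimes 1-1\otimes H)^2\varrho_{12}]$, for which the separable optimum of the self-distance is $F_Q[\varrho,H]/4$, and we use the illustrative choice $\varrho = \mathrm{diag}(0.9,0.1)$, $H = (\sigma_x+\sigma_z)/\sqrt2$. The eigenbasis coupling $\sum_k \lambda_k |k\rangle\langle k|\otimes|k\rangle\langle k|$ then gives $\sum_k\lambda_k \mathrm{var}_{|k\rangle}(H)$, and the product coupling $\varrho\otimes\varrho$ gives $\mathrm{var}_\varrho(H)$. Both couplings are separable, so their costs cannot fall below $F_Q/4$, which in turn is not smaller than the Wigner-Yanase skew information $I_\varrho(H)$.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def gmpc_cost(rho12, h):
    """(1/2) Tr[(H x 1 - 1 x H)^2 rho12], assumed normalization."""
    d = len(h)
    x = [[u - v for u, v in zip(ru, rv)]
         for ru, rv in zip(kron(h, identity(d)), kron(identity(d), h))]
    x2r = matmul(matmul(x, x), rho12)
    return 0.5 * sum(x2r[i][i] for i in range(d * d))


lam = [0.9, 0.1]
s = 1.0 / math.sqrt(2.0)
h = [[s, s], [s, -s]]  # (sigma_x + sigma_z)/sqrt(2), written in the eigenbasis of rho
rho = [[lam[0], 0.0], [0.0, lam[1]]]

# Classical-classical coupling sum_k lam_k |k><k| x |k><k| and the product coupling.
eigen_coupling = [[0.0] * 4 for _ in range(4)]
for k in range(2):
    eigen_coupling[k * 2 + k][k * 2 + k] = lam[k]
eigen_cost = gmpc_cost(eigen_coupling, h)
product_cost = gmpc_cost(kron(rho, rho), h)

# Closed forms used for comparison.
wy = 0.5 * sum((math.sqrt(lam[k]) - math.sqrt(lam[l])) ** 2 * h[k][l] ** 2
               for k in range(2) for l in range(2))
fq_over_4 = 0.5 * sum((lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * h[k][l] ** 2
                      for k in range(2) for l in range(2))
avg_eigen_var = sum(lam[k] * (sum(h[k][l] ** 2 for l in range(2)) - h[k][k] ** 2)
                    for k in range(2))
mean = sum(lam[k] * h[k][k] for k in range(2))
var = sum(lam[k] * h[k][l] ** 2 for k in range(2) for l in range(2)) - mean ** 2
print(wy, fq_over_4, eigen_cost, product_cost)
```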
We can consider an optimization over other convex sets of states. For instance, a convex set can be characterized by constraints of a given form or by linear constraints involving operators $A_k$ and $B_k$ and constants $c_k$ and $d_k$. Other possibilities are the convex set of states with negativity not larger than a given bound [86], and the convex set of states not violating certain entanglement conditions. These conditions can be incorporated into the numerical optimization. We can also consider the convex set of states with a local hidden variable model [55,56,57].
10 Alternative definition of the Wasserstein distance such that the self-distance equals various generalized quantum Fisher information quantities
In this section, we modify the definition of the Wasserstein distance such that the self-distance equals a quantity different from the ones considered so far. In particular, we would like it to equal various generalized quantum Fisher information quantities. The Wigner-Yanase skew information and the quantum Fisher information are two members of this family. The basic idea of Refs. [111,112] is that for each standard matrix monotone function $f:\mathbb{R}^+ \to \mathbb{R}^+$, a generalized variance and a corresponding quantum Fisher information are defined. The term standard means that $f$ must satisfy $f(1) = 1$ (205a) and $f(x) = x f(1/x)$ (205b). For a review on generalized variances, generalized quantum Fisher information quantities, and covariances, see Ref. [113]. Moreover, it is also useful to define the mean based on $f$ as $m_f(a,b) = b f(a/b)$ and use it instead of $f$. The normalization condition given in Eq. (205a) corresponds to the condition $m_f(a,a) = a$ for the means, and the requirement given in Eq. (205b) corresponds to the symmetry $m_f(a,b) = m_f(b,a)$. A list of generalized quantum Fisher information quantities generated by various well-known means $m_f(a,b)$ can be found in Refs. [111,112].
Using the normalization suggested by Ref. [62], we arrive at a family of generalized quantum Fisher information quantities $F^f_Q[\varrho,H]$ such that (i) for pure states, we have $F^f_Q[|\Psi\rangle\langle\Psi|,H] = 4\,\mathrm{var}_\Psi(H)$, and (ii) for mixed states, $F^f_Q[\varrho,H]$ is convex in the state. The generalized variance $\mathrm{var}^f_\varrho(H)$ fulfills the following two requirements: (i) for pure states, the generalized variance equals the usual variance, and (ii) for mixed states, $\mathrm{var}^f_\varrho(H)$ is concave in the state. A family of generalized quantum Fisher information quantities and generalized variances fulfilling the above requirements is given in Eq. (211) [62], where the matrix elements of $H$ in the eigenbasis of $\varrho$ are denoted as $H_{kl} = \langle k|H|l\rangle$. The quantity $F^f_Q[\varrho,H]/4$ has been called metric-adjusted skew information [114].
Note that the definition of the variance in Eq. (211b) requires that $m_f(1,0) \equiv f(0)$ is nonzero. Such functions $f$ are called regular [115].
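The usual quantum Fisher information and the usual variance correspond to $f_{\max}(x) = (1+x)/2$ and the arithmetic mean $m_{f_{\max}}(a,b) = (a+b)/2$. Note that $f_{\max}(x)$ is the largest among the standard matrix monotone functions. The Wigner-Yanase skew information corresponds to $f_{\mathrm{WY}}(x) = \left(\frac{1+\sqrt{x}}{2}\right)^2$ and the mean $m_{f_{\mathrm{WY}}}(a,b) = \left(\frac{\sqrt{a}+\sqrt{b}}{2}\right)^2$. Note that, due to the chosen normalization, we get four times the usual Wigner-Yanase skew information. We now show a method to express the various generalized quantum Fisher information quantities in terms of each other.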
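Before doing so, the normalization described above can be checked numerically. The sketch below assumes that Eq. (211a) has the spectral form $F^f_Q[\varrho,H] = 2f(0)\sum_{k,l}\frac{(\lambda_k-\lambda_l)^2}{m_f(\lambda_k,\lambda_l)}|H_{kl}|^2$; this is our reading, chosen to be consistent with the statements in the text. With $f_{\max}(x) = (1+x)/2$ it gives the usual quantum Fisher information, with $f_{\mathrm{WY}}(x) = ((1+\sqrt{x})/2)^2$ it gives four times the Wigner-Yanase skew information, and for pure states it gives $4\,\mathrm{var}_\Psi(H)$. The function names are hypothetical.

```python
import math


def f_max(x):
    return (1.0 + x) / 2.0


def f_wy(x):
    return ((1.0 + math.sqrt(x)) / 2.0) ** 2


def mean(f, a, b):
    """m_f(a, b) = b f(a/b); for b = 0 the limit a f(0) follows from f(x) = x f(1/x)."""
    return a * f(0.0) if b == 0 else b * f(a / b)


def generalized_qfi(f, lam, h):
    """Assumed spectral form of F^f_Q[rho, H] in the eigenbasis of rho."""
    n = len(lam)
    total = sum((lam[k] - lam[l]) ** 2 / mean(f, lam[k], lam[l]) * abs(h[k][l]) ** 2
                for k in range(n) for l in range(n) if lam[k] != lam[l])
    return 2.0 * f(0.0) * total


# Standardness: f(1) = 1 and f(x) = x f(1/x).
for f in (f_max, f_wy):
    assert abs(f(1.0) - 1.0) < 1e-15
    assert all(abs(f(x) - x * f(1.0 / x)) < 1e-12 for x in (0.1, 0.5, 2.0, 7.0))

lam = [0.5, 0.3, 0.2]
h = [[1.0, 0.5, 0.2], [0.5, -1.0, 0.3], [0.2, 0.3, 0.0]]
usual_qfi = 2.0 * sum((lam[k] - lam[l]) ** 2 / (lam[k] + lam[l]) * h[k][l] ** 2
                      for k in range(3) for l in range(3))
wy = 0.5 * sum((math.sqrt(lam[k]) - math.sqrt(lam[l])) ** 2 * h[k][l] ** 2
               for k in range(3) for l in range(3))
print(generalized_qfi(f_max, lam, h), usual_qfi)
print(generalized_qfi(f_wy, lam, h), 4.0 * wy)
```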
Observation 16. Let us define, for a given $f$, a matrix in the eigenbasis of $\varrho$. Then, any generalized quantum Fisher information can be expressed through another one as in Eq. (219), where $\circ$ denotes the element-wise or Hadamard product, $(A \circ B)_{kl} = A_{kl}B_{kl}$, the indices referring to the eigenvectors $|k\rangle$ and $|l\rangle$ of the density matrix, and where the matrix elements play the role of the coefficients converting one type of quantum Fisher information into another one.
Proof. The statement can be verified by direct comparison of the definitions of the quantum Fisher information given in Eqs. (211a) and (219). ■
It is interesting to compute the matrix needed for $f_1 = f_{\max}$ and $f_2 = f_{\mathrm{WY}}$. In the basis of the eigenvectors of $\varrho$, we obtain Eq. (222). Note that in Eq. (222), the denominator of the fraction with $\lambda_k$ and $\lambda_l$ is positive if $\lambda_k \neq \lambda_l$. With the matrix in Eq. (222), we can use the Wigner-Yanase skew information to obtain the quantum Fisher information. Next, we compute the matrix needed for $f_1 = f_{\mathrm{WY}}$ and $f_2 = f_{\max}$, given, in the basis of the eigenvectors of $\varrho$, in Eq. (224). Then, we can also obtain the Wigner-Yanase skew information from the quantum Fisher information. Next, we show how the various generalized quantum Fisher information quantities can be expressed as a convex roof over the decompositions of the density matrix.
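Before turning to the convex roof, Observation 16 can be illustrated numerically for $f_1 = f_{\max}$ and $f_2 = f_{\mathrm{WY}}$. In this sketch we assume that the conversion matrix has elements $Q_{kl} = (c^{f_1}_{kl}/c^{f_2}_{kl})^{1/2}$ for $\lambda_k \neq \lambda_l$, where $c^f_{kl}$ is the coefficient of $|H_{kl}|^2$ in the spectral formula of $F^f_Q$. For this pair the assumption gives $Q_{kl} = (\sqrt{\lambda_k}+\sqrt{\lambda_l})/\sqrt{\lambda_k+\lambda_l}$, whose denominator is positive whenever $\lambda_k\neq\lambda_l$, in line with the remark on Eq. (222); this form is our reconstruction rather than a quotation of Eq. (222).

```python
import math


def c_max(a, b):
    """Coefficient of |H_kl|^2 in the usual quantum Fisher information."""
    return 2.0 * (a - b) ** 2 / (a + b)


def c_wy(a, b):
    """Coefficient of |H_kl|^2 in F_Q^{f_WY}, i.e. four times the Wigner-Yanase skew information."""
    return 2.0 * (math.sqrt(a) - math.sqrt(b)) ** 2


def fisher(coeff, lam, h):
    n = len(lam)
    return sum(coeff(lam[k], lam[l]) * abs(h[k][l]) ** 2
               for k in range(n) for l in range(n) if lam[k] != lam[l])


def conversion(num, den, lam):
    """Q_kl = sqrt(num_kl / den_kl); set to 1 where lam_k = lam_l (both coefficients vanish)."""
    n = len(lam)
    return [[math.sqrt(num(lam[k], lam[l]) / den(lam[k], lam[l])) if lam[k] != lam[l] else 1.0
             for l in range(n)] for k in range(n)]


def hadamard(a, b):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


lam = [0.5, 0.3, 0.2]
h = [[1.0, 0.5, 0.2], [0.5, -1.0, 0.3], [0.2, 0.3, 0.0]]

q_max_wy = conversion(c_max, c_wy, lam)  # F_Q[rho, H] = F^{WY}[rho, Q o H]
q_wy_max = conversion(c_wy, c_max, lam)  # F^{WY}[rho, H] = F_Q[rho, Q o H]
print(fisher(c_max, lam, h), fisher(c_wy, lam, hadamard(q_max_wy, h)))
print(fisher(c_wy, lam, h), fisher(c_max, lam, hadamard(q_wy_max, h)))
```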
Observation 17. The various generalized quantum Fisher information quantities can be expressed as a convex roof, where the optimization is over the pure-state decompositions given in Eq. (14), and where the required matrix is defined in the basis of the eigenvectors of $\varrho$.
Proof. It follows from Observation 16 defining $(X_f)_{kl}$ and the definition of the quantum Fisher information with the convex roof of the variance given in Eq. (13).
■ Next, we show how the various generalized quantum Fisher information quantities can be expressed as an optimization in the two-copy space.
Observation 18. The generalized quantum Fisher information can be obtained as an optimization over separable states, where $Q_{f_{\mathrm{WY}},f_{\max}}$ is defined in Eq. (224).
Observation 19. We can obtain the various quantum Fisher information quantities as an optimization over general quantum states, rather than over separable states, where the required matrix is defined in the basis of the eigenvectors of $\varrho$.
Proof. We use the definition of $D_{\mathrm{DPT}}(\varrho,\sigma)^2$ given in Eq. (1) and the equation relating it to the Wigner-Yanase skew information given in Eq. (3), together with Eq. (219). ■
Let us again calculate a concrete example. For instance, to obtain the quantum Fisher information as an optimization over general quantum states, we should use the matrix $Q_{f_{\max},f_{\mathrm{WY}}}$ defined in Eq. (222).
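Observation 19 relies on Eq. (3), which relates the DPT self-distance over general states to the Wigner-Yanase skew information. As a minimal check at the level of a single coupling, assuming the DPT cost for $N = 1$ is $\tfrac12\mathrm{Tr}[(H^T\otimes 1 - 1\otimes H)^2\varrho_{12}]$, the sketch below evaluates it for the pure coupling $|\Psi\rangle = \sum_k \sqrt{\lambda_k}|k\rangle|k\rangle$ built in the eigenbasis of $\varrho$, whose marginals are $\varrho^T$ and $\varrho$, and compares the result with $I_\varrho(H)$. It does not establish optimality; the observable is an arbitrary complex Hermitian example, chosen so that $H^T \neq H$.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


lam = [0.7, 0.3]
h = [[0.5, 0.4 - 0.2j], [0.4 + 0.2j, -0.5]]  # hypothetical Hermitian H in the eigenbasis of rho
d = len(lam)

# |Psi> = sum_k sqrt(lam_k) |k>|k>; the amplitudes are real, so no complex conjugation is needed.
psi = [0.0] * (d * d)
for k in range(d):
    psi[k * d + k] = math.sqrt(lam[k])
rho12 = [[psi[i] * psi[j] for j in range(d * d)] for i in range(d * d)]

# DPT cost operator H^T x 1 - 1 x H (assumed normalization 1/2).
x = [[u - v for u, v in zip(ru, rv)]
     for ru, rv in zip(kron(transpose(h), identity(d)), kron(identity(d), h))]
x2r = matmul(matmul(x, x), rho12)
dpt_cost = 0.5 * sum(x2r[i][i] for i in range(d * d)).real

wy = 0.5 * sum((math.sqrt(lam[k]) - math.sqrt(lam[l])) ** 2 * abs(h[k][l]) ** 2
               for k in range(d) for l in range(d))
print(dpt_cost, wy)
```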
Based on these results, we can define various Wasserstein distance measures for which the self-distance equals various quantum Fisher information quantities; by Observation 19, the corresponding relation for the self-distance holds. It would be interesting to examine the properties of the quantities defined in Definitions 8 and 9.

Conclusions
We discussed how to define the quantum Wasserstein distance as an optimization over bipartite separable states rather than over general quantum states. With such a definition, the self-distance becomes related to the quantum Fisher information. We also introduced variance-like quantities in which the minimization used in the definition of the quantum Wasserstein distance is replaced by a maximization, and examined their properties. We discussed the relation of our findings to entanglement criteria. We also examined the quantity obtained by optimizing the variance rather than the second moment in the usual expression of the quantum Wasserstein distance. Finally, we extended our results to the various generalized quantum Fisher information quantities. The details of the numerical calculations are discussed in Appendix A.

Table 1: Self-distance obtained for the Wasserstein distance considering an optimization over various subsets of the bipartite quantum states for $N = 1$.
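Proof of Observation 18. In Observation 18, the optimization is subject to $\varrho_{12} \in S$ and $\mathrm{Tr}_2(\varrho_{12}) = \varrho$, together with the remaining marginal condition. The statement follows from Observation 1, which defines the quantum Fisher information with an optimization over bipartite separable quantum states, and from Observation 16, which defines $(X_f)_{kl}$. ■ Let us calculate a concrete example. For instance, to obtain four times the Wigner-Yanase skew information as an optimization over separable states, we should use the matrix $Q_{f_{\mathrm{WY}},f_{\max}}$.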