Efficient Computation of the Quantum Rate-Distortion Function

The quantum rate-distortion function plays a fundamental role in quantum information theory; however, there is currently no practical algorithm which can efficiently compute this function to high accuracy for moderate channel dimensions. In this paper, we show how symmetry reduction can significantly simplify common instances of the entanglement-assisted quantum rate-distortion problem. This allows us to better understand the properties of the quantum channels which obtain the optimal rate-distortion trade-off, while also allowing for more efficient computation of the quantum rate-distortion function regardless of the numerical algorithm being used. Additionally, we propose an inexact variant of the mirror descent algorithm to compute the quantum rate-distortion function with provable sublinear convergence rates. We show how this mirror descent algorithm is related to Blahut-Arimoto and expectation-maximization methods previously used to solve similar problems in information theory. Using these techniques, we present the first numerical experiments to compute a multi-qubit quantum rate-distortion function, and show that our proposed algorithm solves to higher accuracy and in less time than existing methods.


Introduction
The rate-distortion function quantifies how much a signal can be compressed by a lossy channel while remaining under a maximum allowable distortion of the signal. This important tool in information theory was originally formulated for the classical setting in Shannon's seminal work [1], and has recently been extended to the quantum setting [2,3]. More concretely, in the quantum setting, the signal can be represented by an n × n positive semidefinite Hermitian matrix ρ ∈ H^n_+ with unit trace, and the distortion can be quantified using a matrix ∆ ∈ H^{mn}_+. The corresponding entanglement-assisted quantum rate-distortion function R( · ; ρ, ∆) : R_+ → R is then given by

R(D; ρ, ∆) = min { I(σ; ρ) : σ ∈ D^{mn}, tr_B(σ) = ρ, ⟨∆, σ⟩ ≤ D }.   (1)

Here, D ≥ 0 represents the maximum allowable distortion of the signal, tr_B : H^{mn} → H^n is a linear operator called the partial trace (see Section 2.2), and I( · ; ρ) is a reparameterization of the quantum mutual information. We show in Appendix C that our algorithm obtains linear convergence for common instances of the quantum rate-distortion problem. Alternatively, we can formulate the quantum rate-distortion problem as an instance of a quantum relative entropy program (see, e.g., [16,17]), which we can solve using existing software which supports this problem class, such as CvxQuad [18], Hypatia [19], or DDS [20,21]. However, these methods do not scale well to large problem dimensions. CvxQuad uses large semidefinite approximations of matrix logarithms, while Hypatia and DDS use second-order interior point algorithms. Therefore, they are limited to computing the rate-distortion function of small quantum channels. In contrast, the mirror descent algorithm is a first-order method which is better suited to scaling to large problem dimensions.
Contributions Our main contributions are threefold.
• First, we show that for certain common distortion measures, the corresponding rate-distortion problems possess symmetries that allow us to better understand the structure of quantum states and channels which obtain the optimal rate-distortion trade-off by using a symmetry reduction technique. For the quantum rate-distortion problem with entanglement fidelity distortion measure and maximally mixed input state, we show in Theorem 4.16 that there are sufficient symmetries which allow us to directly obtain an explicit solution to the rate-distortion function. For some other common quantum rate-distortion instances, we show how symmetry can be exploited to significantly reduce the dimensionality of the optimization problem. This results in improvements to both computational complexity and memory requirements, and is independent of the optimization algorithm used.
• Second, we show that the expectation-maximization approach of [14] can be interpreted as a mirror descent algorithm, and that we can rederive the algorithm and its analysis in an elementary way using tools from convex optimization.
• Third, unlike [14], we present several concrete strategies for how to solve the mirror descent subproblems while retaining convergence guarantees for the quantum rate-distortion problem. We do this primarily by introducing an inexact method, based on [22], which improves on [14] by allowing the tolerance at which we compute each iterate to converge to zero with each subproblem, and by using an explicitly computable error criterion. This results in a more efficient algorithm which converges sublinearly to the solution, rather than only converging to a neighborhood around the solution.
Our main algorithm is presented in Algorithm 3, which we provide an implementation of with symmetry reduction techniques at https://github.com/kerry-he/efficient-qrd.
We present numerical experiments on quantum channels of up to 9 qubits to demonstrate the scalability of our approach. To the best of our knowledge, no other works have presented numerical experiments computing the quantum rate-distortion function for more than a single-qubit channel (e.g., [23], which estimates the curve using a sampling-based method). We note that unlike [6,14], we do not directly account for the distortion constraint. However, it is relatively straightforward to extend most of our analysis and algorithm to account for it.
Throughout the paper, we predominantly study the quantum rate-distortion function using the entanglement fidelity distortion measure. This is generally considered the canonical distortion measure, first used by the foundational works [24,2,3] (see also [25,26]), as it represents a natural way of measuring the dissimilarity between two quantum states, and possesses nice properties which make it useful when analyzing the quantum rate-distortion function. We note that although the symmetry reduction in Section 4 depends on the choice of distortion measure (we discuss how symmetry reduction can be applied to other common distortion measures in Section 4.4), the inexact mirror descent algorithm and its convergence analysis introduced in Section 5 are independent of this choice. This means the inexact mirror descent method can be applied to any distortion measure; however, each iteration may be more expensive to compute if we cannot apply symmetry reduction.

Outline
The remainder of the paper is structured as follows. In Section 2, we summarize the notation we use, as well as preliminaries about quantum information theory and mirror descent. In Section 3, we formalize the quantum rate-distortion problem, and explain how we can reformulate the optimization problem to make it easier to solve and analyze. In Section 4, we recall symmetry reduction and show how it can be applied to common quantum rate-distortion instances. In Section 5, we introduce the inexact mirror descent algorithm we propose for solving the rate-distortion problem. In Section 6, we present experimental results comparing our proposed algorithm to existing methods. Finally, we end with concluding remarks in Section 7. We provide some auxiliary results in the appendix, including how our results, analysis, and techniques can be applied to the classical rate-distortion problem in Appendix A, local linear convergence rates in Appendix C, and derivations for bounds on the optimality gap of mirror descent in Appendix D.

Notation
We denote arbitrary finite-dimensional inner product spaces by V and V′, the set of n × n Hermitian matrices as H^n with inner product ⟨X, Y⟩ = tr[XY†], the n-dimensional nonnegative orthant and n × n positive semidefinite cone as R^n_+ and H^n_+ respectively, and their respective interiors as R^n_{++} and H^n_{++}. For X, Y ∈ H^n, we use X ⪰ Y to mean X − Y ∈ H^n_+, and X ≻ Y to mean X − Y ∈ H^n_{++}. We denote the extended reals by R̄ = R ∪ {∞}, and the set of positive integers as N. We use dom to denote the domain of a function, int and relint to denote the interior and relative interior of a set, respectively, ker to denote the kernel of a linear operator, dim_C and dim_R to denote the dimension of a complex and real vector space respectively, I to denote the identity matrix, and ⊗ to denote the Kronecker product between two matrices. For a complex matrix X ∈ C^{n×m}, we use X̄ to denote its entry-wise complex conjugate, and X† to denote its conjugate transpose. For a linear operator A : V → V′, we denote its adjoint by A† : V′ → V. We use {e_i} to denote the standard basis for the vector space C^n.

Quantum information theory
Here, we introduce some preliminary concepts relating to quantum information theory. We refer the reader to, e.g., [27,28,29] for standard references on these topics.
A quantum state is a positive semidefinite Hermitian matrix with unit trace, and we denote the set of n × n quantum states by D^n. These matrices are also commonly referred to as density matrices. Quantum channels are completely positive trace preserving (CPTP) (see, e.g., [28, Definitions 4.4.2 and 4.4.3]) linear operators which map n × n quantum states to m × m quantum states. We denote the set of these CPTP linear operators as Φ_{m,n}. It turns out that any CPTP linear map N : H^n → H^m in Φ_{m,n} can be written in the form

N(X) = Σ_{i=1}^k V_i X V_i†,   (2)

where k ≤ mn and the V_i ∈ C^{m×n} (for i = 1, . . ., k) are matrices satisfying Σ_{i=1}^k V_i† V_i = I [28, Theorem 4.4.1]. See also Definition B.1 for an alternate characterization of these linear maps.
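As a concrete illustration of the Kraus form (2), the following Python/NumPy sketch (our own example; the amplitude-damping channel used here is a standard textbook channel, not one taken from this paper) applies a channel to a state and checks the trace-preserving condition Σᵢ Vᵢ†Vᵢ = I:

```python
import numpy as np

def apply_channel(kraus_ops, rho):
    """Apply a CPTP map N(X) = sum_i V_i X V_i^dagger given its Kraus operators."""
    return sum(V @ rho @ V.conj().T for V in kraus_ops)

# Example channel (assumption, for illustration): amplitude damping with decay gamma.
gamma = 0.3
V0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]])
V1 = np.array([[0, np.sqrt(gamma)], [0, 0]])
kraus = [V0, V1]

# Trace-preserving condition sum_i V_i^dagger V_i = I.
completeness = sum(V.conj().T @ V for V in kraus)
assert np.allclose(completeness, np.eye(2))

# Any density matrix keeps unit trace under a CPTP map.
rho = np.array([[0.25, 0.1], [0.1, 0.75]], dtype=complex)
sigma = apply_channel(kraus, rho)
assert np.isclose(np.trace(sigma).real, 1.0)
```

The same `apply_channel` helper works for any Kraus family satisfying the completeness condition, with k ≤ mn operators of shape m × n.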

Composite systems
An n-dimensional quantum system is represented mathematically as an n-dimensional complex vector space C^n, which we can associate with the set of density matrices D^n which act on this space. A bipartite system is obtained by taking the tensor product between two quantum systems, e.g., the tensor product between an n-dimensional system and an m-dimensional system produces an nm-dimensional system, which we can associate with the set of density matrices D^{mn} which act on this joint system. To aid in keeping track of these composite systems, we will denote quantum systems with letters, e.g., A, B, and the tensor product of these two spaces as, e.g., AB. We will use subscripts to clarify that a density matrix acts on a particular system. For instance, we write ρ_A to indicate that the quantum state acts on system A.
To make the concept of composite quantum systems more concrete, we note that the Kronecker product between any two density matrices ρ_A ∈ D^m and ρ_B ∈ D^n on systems A and B respectively will produce a density matrix ρ_A ⊗ ρ_B ∈ D^{mn} on system AB. This operation is analogous to forming the joint distribution of two independent probability distributions. We also consider the "marginalization" operation, called the partial trace, which is a linear operator which maps density matrices on the joint system AB to density matrices on the individual systems A or B. This is analogous to finding the marginal distribution from a joint distribution. We denote the partial traces as tr_A : H^{mn} → H^n and tr_B : H^{mn} → H^m, and define them as the adjoints of the Kronecker product with the identity matrix, i.e., tr_A†(X) = I ⊗ X and tr_B†(X) = X ⊗ I. More concretely, if we represent a bipartite density matrix ρ_AB ∈ D^{mn} on system AB as a block matrix consisting of an m × m array of n × n blocks X_ij, then tr_A(ρ_AB) = Σ_{i=1}^m X_ii, and tr_B(ρ_AB) is the m × m matrix whose (i, j)-th entry is tr[X_ij]. We will also use the tensor product between linear operators. In particular, for an operator N ∈ Φ_{m,n} given by (2), we define (N ⊗ I) ∈ Φ_{mp,np}, where I is of size p × p, as (N ⊗ I)(X) = Σ_{i=1}^k (V_i ⊗ I) X (V_i ⊗ I)†.

Purification Consider a quantum state ρ_A ∈ D^n with spectral decomposition ρ_A = Σ_{i=1}^n λ_i v_i v_i† defined on an n-dimensional system A, and a system R which is isomorphic to A.
A purification of ρ_A is a complex unit vector ψ ∈ C^{n²} defined on the joint system AR which satisfies tr_R(ψψ†) = ρ_A. The purification is not unique. However, in this paper we will consider the particular choice of purification

ψ = Σ_{i=1}^n √λ_i v_i ⊗ v_i,   (3)

and refer to this as the purification of ρ_A. With some further abuse of terminology, we will also refer to the quantum state associated with (3), i.e., ρ_AR := ψψ† ∈ D^{n²}, as the purification of ρ_A. Note that this choice of purification satisfies ρ_R := tr_A(ρ_AR) = ρ_A.

Entropy For a function f : R → R and Hermitian matrix X ∈ H^n with spectral decomposition X = Σ_{i=1}^n λ_i v_i v_i†, with some overloading of function notation we define f(X) := Σ_{i=1}^n f(λ_i) v_i v_i†. We define the von Neumann entropy as S(X) = −tr[X log X], and the quantum relative entropy as S(X ∥ Y) = tr[X log X − X log Y]. For notational convenience, we use the natural base e throughout the paper for logarithms and exponentials. However, all computational experiments are performed in base 2 to obtain results measured in bits, i.e., the standard unit of measurement for information. This is done by rescaling relevant quantities by a factor of log_2(e).
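The partial trace, purification, and entropy definitions above can be sketched numerically as follows (our own Python/NumPy illustration; the helper names are ours, and the purification follows the choice (3) of ψ = Σᵢ √λᵢ vᵢ ⊗ vᵢ stated in the text):

```python
import numpy as np

def partial_trace_B(rho_AB, m, n):
    # tr_B: view rho_AB as an m x m array of n x n blocks and take tr of each block
    blocks = rho_AB.reshape(m, n, m, n)
    return np.trace(blocks, axis1=1, axis2=3)

def partial_trace_A(rho_AB, m, n):
    # tr_A: sum of the diagonal blocks X_ii
    blocks = rho_AB.reshape(m, n, m, n)
    return np.trace(blocks, axis1=0, axis2=2)

def purification(rho_A):
    # rho_AR = psi psi^dagger with psi = sum_i sqrt(lam_i) v_i (x) v_i
    lam, V = np.linalg.eigh(rho_A)
    psi = sum(np.sqrt(max(l, 0.0)) * np.kron(V[:, [i]], V[:, [i]])
              for i, l in enumerate(lam))
    return psi @ psi.conj().T

def entropy(X):
    # von Neumann entropy S(X) = -tr[X log X], dropping zero eigenvalues
    lam = np.linalg.eigvalsh(X)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log(lam)))

def relative_entropy(X, Y):
    # quantum relative entropy S(X || Y) = tr[X log X - X log Y]
    lx, Vx = np.linalg.eigh(X)
    ly, Vy = np.linalg.eigh(Y)
    logX = Vx @ np.diag(np.log(np.maximum(lx, 1e-300))) @ Vx.conj().T
    logY = Vy @ np.diag(np.log(np.maximum(ly, 1e-300))) @ Vy.conj().T
    return float(np.trace(X @ (logX - logY)).real)

rho_A = np.diag([0.5, 0.3, 0.2])
rho_AR = purification(rho_A)
# Both marginals of this purification recover rho_A, as claimed in the text.
assert np.allclose(partial_trace_B(rho_AR, 3, 3), rho_A)  # tr_R
assert np.allclose(partial_trace_A(rho_AR, 3, 3), rho_A)  # rho_R = rho_A
assert abs(entropy(rho_AR)) < 1e-9                        # purifications are pure
assert np.isclose(entropy(np.eye(4) / 4), np.log(4))      # maximally mixed state
assert abs(relative_entropy(rho_A, rho_A)) < 1e-9
```

The reshape-based partial traces mirror the block-matrix description above and are convenient for the moderate dimensions considered in this paper.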

Mirror descent
We will focus on constrained convex optimization problems of the form

min_x  f(x)  subject to  x ∈ X,   (6)

where f : V → R̄ is a convex function and X ⊆ V is a closed convex set. Throughout the rest of the paper, we define C := X ∩ dom f as the intersection of the implicit and explicit constraints of (6).
Before introducing the mirror descent algorithm, we first introduce the class of Legendre functions, which are known to satisfy certain nice properties. We adopt the terminology in [30] for common terms relating to convex functions; in particular, a convex function φ : V → R̄ is Legendre if it is essentially smooth and strictly convex on int dom φ. Legendre functions φ satisfy the following properties.

(i) The convex conjugate φ* of φ is also Legendre.
(ii) The gradient ∇φ : int dom φ → int dom φ* is continuous and invertible, where (∇φ)^{−1} = ∇φ*.

Now consider using a Legendre function φ to construct a local approximation of f at y ∈ int dom φ ∩ int dom f for some scaling factor t > 0:

min_{x ∈ X}  f(y) + ⟨∇f(y), x − y⟩ + (1/t) D_φ(x, y),   (9)

where D_φ(x, y) := φ(x) − φ(y) − ⟨∇φ(y), x − y⟩ is the Bregman divergence associated with φ. If y is close to the optimal solution x* of (6), and if φ is, in a sense, similar to f, then we expect the solution to (9) to be a good approximation of (6). More formally, we recall from [11,12,13] the following definitions for relative smoothness and strong convexity to describe structural similarities between f and φ.
In particular, if f is L-smooth relative to φ on C, then the local approximation used in (9) for t = 1/L is a majorization of f, and therefore gives an upper bound on the optimal value f(x*). To solve (6), mirror descent uses the approximation (9) to construct the sequence of iterates

x_{k+1} = argmin_{x ∈ X} { ⟨∇f(x_k), x − x_k⟩ + (1/t_k) D_φ(x, x_k) },   (10)

for step sizes t_k > 0. This is a practical algorithm when (9) is easier to solve compared to the original problem (6). When f satisfies certain smoothness and strong convexity properties relative to φ, mirror descent has the following convergence properties. Let φ be Legendre, and {x_k} be the sequence of iterates generated by mirror descent (10) to solve (6).
We will make the following assumptions throughout the rest of the paper to ensure the mirror descent algorithm is well defined and to simplify some of the analysis.
Assumption A. Consider the convex optimization problem (6) and mirror descent rule (10).
(iv) C is compact and non-empty.

Under Assumptions A(i)–A(iv), the mirror descent iterate (10) always exists and is uniquely defined [11, Lemma 2]. Assumption A(v) may seem restrictive; however, it is satisfied by many important examples (see [13, Example 2.1]), including the ones we later use to compute the classical and quantum rate-distortion functions. The motivation behind Assumption A(iii) is that this allows the Bregman divergence to act as a natural barrier for the domain of the objective function, allowing these implicit constraints to be more easily handled. In particular, Legendre functions satisfy the following property.

Fact 2.5 ([32, Lemma 2.2]). Consider an essentially smooth convex function h : V → R̄. Then ∂h(x) = ∅ for all x ∉ int dom h; consequently, any minimizer of h lies in int dom h.
Therefore, as the sum of an essentially smooth convex function and an affine function is still clearly essentially smooth and convex, Fact 2.5 tells us that the mirror descent iterates (10) always lie in int dom φ.
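To make the iteration (10) concrete, the following Python sketch (our own toy illustration, not the paper's implementation) runs mirror descent with the negative-entropy kernel φ(x) = Σᵢ xᵢ log xᵢ on the probability simplex, for which the Bregman divergence is the Kullback-Leibler divergence and the update has a closed multiplicative form; the quadratic objective and its data are hypothetical:

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, t=1.0, iters=3000):
    """Mirror descent with phi(x) = sum_i x_i log x_i (negative entropy).
    The update argmin_x { <grad f(x_k), x> + D_phi(x, x_k)/t : x in simplex }
    reduces to x_{k+1} proportional to x_k * exp(-t * grad f(x_k))."""
    x = x0.copy()
    for _ in range(iters):
        x = x * np.exp(-t * grad_f(x))
        x /= x.sum()  # renormalize onto the simplex
    return x

# Toy objective: f(x) = 0.5 * ||x - p||^2 minimized over the simplex.
# Since p already lies on the simplex, the minimizer is x* = p.
p = np.array([0.5, 0.3, 0.2])
x = mirror_descent_simplex(lambda x: x - p, np.ones(3) / 3)
assert np.max(np.abs(x - p)) < 1e-6
```

Note that the iterates stay strictly positive automatically: the entropy kernel acts as the barrier for the implicit constraint x > 0, exactly the role Assumption A(iii) and Fact 2.5 describe.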
Dual of mirror descent subproblem It will also be useful to consider the Lagrangian L(x, ν; y) of the problem (9) for some y ∈ int dom φ, and the corresponding dual function g(ν; y), which under Assumption A(v) has the property dom g( · ; y) = V′. When y = x_k, these are the Lagrangian and dual function for the mirror descent subproblem (10) at the k-th mirror descent iteration. We will use L_k(x, ν) := L(x, ν; x_k) and g_k(ν) := g(ν; x_k) to simplify notation. An alternative characterization of the dual is g(ν; y) = L(x̂(ν), ν; y), where x̂(ν) is the minimizer of the Lagrangian over x, which can be used to recover a primal optimal solution from a dual optimal solution assuming (∇φ)^{−1} is easily computable. Using Fact 2.5, it follows that x̂(ν) ∈ int dom φ. We can show that the dual problem

max_ν  g(ν; y),   (14)

is well-posed under certain conditions using the following proposition. We begin with a standard definition from convex analysis before introducing the main result.

Problem definition
Consider an n-dimensional input system A, an m-dimensional output system B, and a quantum state ρ_A ∈ D^n with purification ρ_AR ∈ D^{n²}. Let ∆ ∈ H^{mn}_+ be a distortion matrix and D ≥ 0 be a maximum allowable distortion. The entanglement-assisted quantum rate-distortion function [2,3] is

R_q(D; ρ_A, ∆) = min { I_q(N; ρ_A) : N ∈ Φ_{m,n}, ⟨∆, (N ⊗ I)(ρ_AR)⟩ ≤ D },   (15)

where

I_q(N; ρ_A) = S(N(ρ_A)) + S(ρ_A) − S((N ⊗ I)(ρ_AR))   (16)

is the quantum mutual information. It follows from joint convexity of quantum relative entropy [28, Corollary 11.9.2] that (15) is a convex optimization problem.
A standard distortion measure used for the quantum rate-distortion function is the entanglement fidelity (see, e.g., [24,2,3,25,26]). This corresponds to the case where A and B are isomorphic, and the distortion matrix is ∆ = I − ρ_AR. This distortion measure possesses symmetries which we can take advantage of in Section 4. We also briefly discuss the symmetries of other common distortion measures in Section 4.4.

Remark 3.1. Without loss of generality, we will always assume that the input state of the quantum channel is full-rank, i.e., ρ_A ≻ 0. This is because it is possible to construct equivalent optimization problems of a lower input dimension to avoid these degeneracies by discarding parts of the channel and distortion measures which correspond to degenerate inputs which have no effect on the problem. See Appendix B.2 for more details.

Problem reformulation
In this section, we summarize standard reformulations of the quantum rate-distortion problems that are more convenient for our purposes.

Change of variables
We will make a change of variables where we optimize over the bipartite state σ_BR ∈ D^{mn}, where σ_BR = (N ⊗ I)(ρ_AR). For distortion D ≥ 0, input state ρ_A ∈ D^n, and distortion matrix ∆ ∈ H^{mn}_+, this gives us

R_q(D; ρ_A, ∆) = min { I_q(σ_BR; ρ_A) : σ_BR ∈ D^{mn}, tr_B(σ_BR) = ρ_A, ⟨∆, σ_BR⟩ ≤ D },   (17a)

where with some slight abuse of notation we redefine the quantum mutual information as

I_q(σ_BR; ρ_A) = S(tr_R(σ_BR)) + S(ρ_A) − S(σ_BR).   (17b)

Note that the apparent inconsistency in the quantum systems implied by the subscripts of (17b) is due to the fact that, as discussed in Section 2.2, A and R are isomorphic systems, and that we have defined the purification in a way such that ρ_R := tr_A(ρ_AR) = ρ_A.
Proposition 3.2. The quantum rate-distortion problems (15) and (17) are equivalent, in the sense that the optimal values of the two problems are equal.
As we made a linear change of variables, the reformulated problem is still convex. We refer the reader to [6, Section 6.2] for the gradient of the reformulated quantum mutual information.
Distortion constraint For rate-distortion algorithms, we are usually interested in characterizing the entire rate-distortion curve rather than computing the rate for a single distortion. Therefore, it is standard to parameterize the distortion constraint with its corresponding dual variable κ ≥ 0 instead of D (see, e.g., [4]). In this case, the quantum rate-distortion problem for distortion dual variable κ ≥ 0, input state ρ_A ∈ D^n, and distortion matrix ∆ ∈ H^{mn}_+ becomes

min_{σ_BR ∈ D^{mn}}  I_q(σ_BR; ρ_A) + κ⟨∆, σ_BR⟩   (QRD.a)
subject to  tr_B(σ_BR) = ρ_A.   (QRD.b)

Note that as the distortion constraints are linear, the convexity properties of the objective function do not change with this reformulation. Additionally, we have omitted some additive constant terms in the objective to simplify the expression. Once a solution σ*_BR to (QRD) is found for a given κ ≥ 0, it is straightforward to recover the corresponding optimal rate R_q(D; ρ_A, ∆) = I_q(σ*_BR; ρ_A) for distortion D = ⟨∆, σ*_BR⟩.

Symmetry reduction
A practical issue with solving the quantum rate-distortion problem (QRD) is that the problem dimension scales quartically with the quantum system dimension. We will show that we can restrict this problem to a significantly smaller subspace by taking advantage of symmetry properties of the rate-distortion problem for certain distortion matrices. We will also show that for certain highly symmetric problems, we can obtain explicit expressions for the rate-distortion function.

Background
We begin by introducing some preliminaries on group and representation theory. We refer the reader to [34,35] as references for these basic facts. Throughout this section we will denote the general linear group as GL(V), which is the group of invertible linear transforms from V to V with the group operation being composition.
Definition 4.2 (Isomorphic representations). Let (V, π) and (V′, τ) be representations of a group G. If there exists an invertible linear map T : V → V′ such that T π(g) = τ(g) T for all g ∈ G, then π and τ are isomorphic, denoted π ≅ τ.

Definition 4.3 (Fixed-point subspace). The fixed-point subspace V_{π|V} associated with a representation (V, π) of a group G is

V_{π|V} := {x ∈ V : π(g)(x) = x for all g ∈ G}.

When the vector space V is clear from context, we use the abbreviated notation V_π := V_{π|V}.
Definition 4.4 (Group average). For a representation (V, π) of a compact group G, the group average is

P_π := ∫_G π(g) dθ(g),

where θ is the normalized Haar measure on G (see, e.g., [36]). The group average is the projector onto the fixed-point subspace V_π.
Lemma 4.5. Consider a representation (V, π) of a group G. If a convex optimization problem of the form (6) is invariant under π, meaning

f(π(g)(x)) = f(x) for all g ∈ G and x ∈ dom f,   (20a)
π(g)(x) ∈ C for all g ∈ G and x ∈ C,   (20b)

then there is an optimal point for (6) in V_π.
Proof. Let x* be a solution to the convex optimization problem and f* be the optimal value. We can show that

f* ≤ f(P_π(x*)) ≤ ∫_G f(π(g)(x*)) dθ(g) = f*.

The first inequality holds as P_π(x*) ∈ C from (20b), and as by definition f* ≤ f(x) for all x ∈ C. The second inequality uses convexity of f, and the equality uses (20a). Therefore, equality must hold throughout. This implies the projection of the solution onto the fixed-point subspace, P_π(x*), must also be a solution, as desired.
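For a finite group, the Haar integral in Definition 4.4 is simply a uniform average, and the projection property used in the proof above can be checked numerically. Below is a small sketch (our own illustration) using the symmetric group S_3 acting on symmetric matrices by permutation conjugation, whose fixed-point subspace is span{I, J}, with J the all-ones matrix:

```python
import itertools
import numpy as np

def perm_matrix(perm):
    """Permutation matrix P with P e_j = e_{perm[j]}."""
    n = len(perm)
    P = np.zeros((n, n))
    P[perm, range(n)] = 1.0
    return P

def group_average(X, group):
    # P_pi(X) = (1/|G|) sum_g pi(g)(X), with pi(g)(X) = g X g^T here
    return sum(g @ X @ g.T for g in group) / len(group)

n = 3
group = [perm_matrix(p) for p in itertools.permutations(range(n))]
rng = np.random.default_rng(0)
X = rng.standard_normal((n, n))
X = (X + X.T) / 2  # symmetric test matrix

avg = group_average(X, group)
# The group average is a projector: applying it twice changes nothing ...
assert np.allclose(group_average(avg, group), avg)
# ... and its image lies in the fixed-point subspace span{I, J}.
a = avg[0, 0] - avg[0, 1]
b = avg[0, 1]
assert np.allclose(avg, a * np.eye(n) + b * np.ones((n, n)))
```

The same averaging argument is what Lemma 4.5 exploits: any solution can be projected onto the fixed-point subspace without increasing the objective.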
We will also introduce characters of compact groups and well-known results associated with them. For these facts, we will assume that V is a complex vector space, as some of these results do not hold in general for real vector spaces.
Lemma 4.9. The dimension (over C) of the fixed-point subspace V_π is

dim_C(V_π) = ∫_G χ_π(g) dθ(g),

where χ_π is the character of π.

Proof. This uses [34, Theorem 4.11(i)] combined with properties of the vectorization map vec.

We next consider the analogue of Lemma 4.9 for real representations.
Corollary 4.10. Let G be a compact group, let (C^n, τ) be a complex representation of G, and let (H^n, π) be the real representation of G defined by π(g)(X) = τ(g) X τ(g)†. Then

dim_R(V_{π|H^n}) = dim_C(V_{π|C^{n×n}}) = ∫_G |χ_τ(g)|² dθ(g).

Proof. We first recognize that by treating C^{n×n} as a real vector space, (C^{n×n}, π) decomposes into two isomorphic real subrepresentations given by the restriction to Hermitian and skew-Hermitian matrices, respectively. The decompositions of these isomorphic representations into irreducibles are, therefore, identical. It follows that the fixed-point subspaces in each have the same real dimension. Therefore, given that the fixed-point subspace V_{π|C^{n×n}} has a dimension over the reals of dim_R(V_{π|C^{n×n}}) = 2 dim_C(V_{π|C^{n×n}}), restricting to the Hermitian space and using Lemma 4.7 leaves us with the desired real dimension for V_{π|H^n}.

Entanglement fidelity distortion
We now restrict our attention to the quantum rate-distortion problem with the entanglement fidelity distortion matrix ∆ = I−ρ AR , where ρ AR is the purification of the input state ρ A .We will show that for this choice of distortion matrix, there are significant symmetries in the quantum rate-distortion problem (QRD) which can be used to drastically reduce the problem dimension.We will first introduce the following lemma that will be useful to prove the main result.
Lemma 4.11. Consider an m-dimensional system A, an n-dimensional system B, and a density matrix ρ_AB ∈ D^{mn} defined on the joint system AB. Let T_A ∈ C^{p×m} be an arbitrary linear operator on A, and U_B ∈ C^{n×n} be an arbitrary unitary operator on B. Then

tr_B[(T_A ⊗ U_B) ρ_AB (T_A ⊗ U_B)†] = T_A tr_B(ρ_AB) T_A†.

Proof. Recall that the partial trace is defined as the adjoint of the Kronecker product with the identity matrix. Therefore, for all σ_A ∈ H^p the following equalities must hold:

⟨σ_A, tr_B[(T_A ⊗ U_B) ρ_AB (T_A ⊗ U_B)†]⟩ = ⟨(T_A† σ_A T_A) ⊗ (U_B† U_B), ρ_AB⟩ = ⟨T_A† σ_A T_A, tr_B(ρ_AB)⟩ = ⟨σ_A, T_A tr_B(ρ_AB) T_A†⟩,

where the second equality uses U_B† U_B = I_B. This produces the desired result.
We now find a group action that leaves the quantum rate-distortion problem invariant, which we can use to characterize the corresponding fixed-point subspace in which a solution must lie. For simplicity of notation we assume the input state ρ_A is diagonal for the following lemma.

Lemma 4.12. Consider a diagonal input state ρ_A ∈ D^n. Let G be a subgroup of the group of n × n unitary matrices such that gρ_A g† = ρ_A for all g ∈ G, and let (H^{n²}, π) be the representation of G defined by

π(g)(X) = (g ⊗ ḡ) X (g ⊗ ḡ)†.   (23)

Then the quantum rate-distortion problem (QRD) parameterized by input state ρ_A and distortion matrix ∆ = I − ρ_AR, where ρ_AR is the purification of ρ_A, is invariant under π.
Proof. We will show that the objective function and each constraint of problem (QRD) are invariant under π. First, we recognize that since ρ_A is Hermitian and diagonal, ρ_A = ρ̄_A, and therefore I_q(π(g)(σ_BR); ρ_A) = I_q(σ_BR; ρ_A), using Lemma 4.11 and gρ_A g† = ρ_A together with invariance of quantum relative entropy under unitary transforms. For the linear component of the objective arising from the distortion constraint, let us first assume ρ_A can be expressed as

ρ_A = Σ_{I=1}^r λ_I Σ_{i ∈ I_I} e_i e_i†,

i.e., ρ_A has r distinct eigenvalues, and I_I is the set of indices corresponding to eigenvalue λ_I for all I = 1, . . ., r. For any I = 1, . . ., r and i ∈ I_I, we have ρ_A (g† e_i) = λ_I (g† e_i). This implies that g† e_i must be some vector in the subspace spanned by {e_i}_{i ∈ I_I}, and because g† is unitary, that {g† e_i}_{i ∈ I_I} forms an orthonormal basis of this subspace. Therefore

Σ_{i ∈ I_I} (g† e_i)(g† e_i)† = Σ_{i ∈ I_I} e_i e_i†,

i.e., g† P_I g = P_I, where P_I is the orthogonal projector onto the eigenspace corresponding to eigenvalue λ_I. Using this, we can show π(g†)(ρ_AR) = ρ_AR. Combining this with the fact that ⟨∆, π(g)(σ_BR)⟩ = ⟨π(g†)(∆), σ_BR⟩ and that π(g†)(I) = I shows that the linear component of the objective is invariant under π.
Next, we will show invariance of the feasible set. First, note that the set of density matrices is invariant under congruence by unitary transforms. Second, consider an arbitrary σ_BR which satisfies (QRD.b). Then we can show that

tr_B(π(g)(σ_BR)) = ḡ tr_B(σ_BR) ḡ† = ḡ ρ_A ḡ† = ρ_A,

where the first equality uses Lemma 4.11, and the last equality uses the fact that ρ_A is real and diagonal together with gρ_A g† = ρ_A. This shows that the constraint set is invariant under π, and concludes the proof.
The general idea motivating the following theorem is that we want to find an appropriate subgroup G that satisfies the condition gρ_A g† = ρ_A, which is sufficiently large so that we can restrict to as small of a fixed-point subspace as possible. In the following theorem, we characterize such a subgroup and its corresponding fixed-point subspace for an arbitrary purification.

Theorem 4.13. Consider the quantum rate-distortion problem (QRD) with input state ρ_A = Σ_{i=1}^n λ_i v_i v_i† ∈ D^n and distortion matrix ∆ = I − ρ_AR, where ρ_AR is the purification of ρ_A. A solution to this problem exists in the subspace

V_π = { Σ_{i,j=1}^n X_ij (v_i v_j†) ⊗ (v_i v_j†) + Σ_{i≠j} y_ij (v_i v_i†) ⊗ (v_j v_j†) : X ∈ H^n, y_ij ∈ R }.   (24)
Proof. We first consider the special case when ρ_A is diagonal, i.e., v_i = e_i for all i = 1, . . ., n. For any of these diagonal inputs, we can show that the unitary subgroup G consisting of elements of the form

g = Σ_{i=1}^n z_i e_i e_i†,  where z_i ∈ C and |z_i| = 1 for all i = 1, . . ., n,   (25)

satisfies gρ_A g† = ρ_A for all g ∈ G, and therefore Lemma 4.12 tells us that the quantum rate-distortion problem is invariant under representation (23) of this subgroup. It follows from Lemma 4.5 that a solution must live in the fixed-point subspace of this representation.
To characterise this subspace, let us first consider the representation (C^{n²}, τ) with τ(g) = g ⊗ ḡ. We can show that τ can be decomposed into the subrepresentations (V_ij, τ_ij) with V_ij = span{e_i ⊗ e_j} and τ_ij(g) = z_i z̄_j, where we follow the notation of (25) to define z_i. We recognize that the subrepresentations τ_ii for all i = 1, . . ., n are just the identity map on a one-dimensional subspace, and are therefore all isomorphic to each other. If i ≠ j, then the τ_ij are mutually non-isomorphic irreducible representations. Therefore, using Corollary 4.10 we can compute the dimension over the reals of the fixed-point subspace as

dim_R(V_π) = ∫_G | Σ_{i,j=1}^n χ_ij(g) |² dθ(g) = 2n² − n,

where χ_ij is the character of subrepresentation τ_ij, and the first and second equalities use Lemmas 4.7 and 4.8 respectively. Verifying that the proposed subspace (24) is invariant under representation (23) of group G and is of the correct dimension gives the desired result for any diagonal input ρ_A. This result generalizes to non-diagonal inputs by using the fact that we can use a change-of-basis to show that the quantum rate-distortion problem for any non-diagonal input state is equivalent to a quantum rate-distortion problem with a diagonal input state. See Appendix B.1 for more details.
We also introduce the following property of this subspace, which can be shown from a straightforward calculation and will be useful in some following proofs.

Corollary 4.14. Let σ_BR ∈ V_π as defined in (24). The partial traces of σ_BR with respect to R and B satisfy

tr_R(σ_BR) = Σ_{i=1}^n x_i v_i v_i†  and  tr_B(σ_BR) = Σ_{i=1}^n y_i v_i v_i†,

for some x, y ∈ R^n (i.e., they are simultaneously diagonalizable with the input state corresponding to the fixed-point subspace).
The implications of this symmetry reduction are as follows. Originally, the primal variable σ_BR was defined on the vector space H^{n²}, which has a dimension over the reals of n⁴. By restricting to the fixed-point subspace V_π as defined in (24), this dimension over the reals reduces to 2n² − n. Moreover, following from Corollary 4.14, the partial trace constraint (QRD.b) is reduced from representing n² equations to just n equations. This will be particularly useful for the dualized algorithm we propose in Section 5.
We also recognize that the subspace (24) is isomorphic to a block-diagonal matrix with n² − n blocks of size 1 × 1 and one block of size n × n. Without loss of generality, we can change the basis of the data of the problem instance to be in the standard basis, i.e., so that ρ_A is diagonal, meaning that in practice we can implement the subspace using sparse matrix representations. The block-diagonal structure also allows us to perform eigendecompositions, and therefore matrix logarithms and exponentials, very efficiently.
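The block structure described above can be checked numerically. The sketch below (our own illustration) builds a matrix with one dense Hermitian block on span{e_i ⊗ e_i} plus real singleton entries on the remaining diagonal positions, verifies that it is fixed by the conjugation action σ ↦ (g ⊗ ḡ)σ(g ⊗ ḡ)† for diagonal phase unitaries g (which we assume belong to the symmetry group, consistent with the diagonal-input case), and tallies the 2n² − n real dimensions:

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)

# One dense n x n Hermitian block on span{e_i (x) e_i} ...
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (B + B.conj().T) / 2
sigma = np.zeros((n * n, n * n), dtype=complex)
diag_idx = [i * n + i for i in range(n)]
sigma[np.ix_(diag_idx, diag_idx)] = B
# ... plus real 1 x 1 blocks on the remaining diagonal entries.
for k in range(n * n):
    if k not in diag_idx:
        sigma[k, k] = rng.standard_normal()

# Invariance under pi(g)(X) = (g (x) conj(g)) X (g (x) conj(g))^dagger
# for a random diagonal phase unitary g.
theta = rng.uniform(0, 2 * np.pi, n)
g = np.diag(np.exp(1j * theta))
U = np.kron(g, g.conj())
assert np.allclose(U @ sigma @ U.conj().T, sigma)

# Real dimension count: n^2 (Hermitian block) + (n^2 - n) singletons = 2n^2 - n.
assert n**2 + (n**2 - n) == 2 * n**2 - n
```

In practice this means eigendecompositions of σ_BR cost one n × n decomposition plus trivial scalar work, instead of an n² × n² decomposition.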

Uniform input state
Let us continue to consider the distortion matrix ∆ = I − ρ_AR. An interesting special case is when the input state is the maximally mixed state ρ_A = I/n. It turns out that for this input state, there exist additional symmetries which further simplify the problem, and which ultimately allow us to obtain explicit expressions for the rate-distortion function.
Corollary 4.15. Consider the quantum rate-distortion problem (QRD) with input state ρ_A = I/n and distortion matrix ∆ = I − ρ_AR. A solution to this problem exists in the two-dimensional subspace

V_π = {aI + bρ_AR : a, b ∈ R}.   (26)

Proof. As ρ_A is a multiple of the identity matrix, gρ_A g† = ρ_A is satisfied for all g ∈ U(n), where U(n) is the entire group of n × n unitary matrices. Therefore, Lemma 4.12 tells us that the quantum rate-distortion problem is invariant under the representation (23) of U(n). It follows from Lemma 4.5 that a solution must live in the fixed-point subspace of this representation. Using Corollary 4.10, the dimension over the reals of the fixed-point subspace V_π is given by

dim_R(V_π) = ∫_{U(n)} |tr(g)|⁴ dθ(g) = 2,

where the last equality comes from [37, Theorem 2.1(a)]. It is straightforward to verify that the proposed two-dimensional subspace (26) is invariant under π, which concludes the proof.
Given this result, we are able to obtain an explicit expression for the quantum rate-distortion function. In particular, we recover and generalize the result from [3, Theorem 10] to maximally mixed inputs of any dimension.

Theorem 4.16. The quantum rate-distortion function for the maximally mixed input state ρ_A = I/n and entanglement fidelity distortion matrix ∆ = I − ρ_AR is

R_q(D; I/n, I − ρ_AR) = 2 log(n) − h(D) − D log(n² − 1)

for 0 ≤ D ≤ 1 − 1/n², and zero for all other D ≥ 0, where h(x) := −x log(x) − (1 − x) log(1 − x) denotes the binary entropy function.
A channel that attains this rate for 0 ≤ D ≤ 1 − 1/n² is the depolarizing channel

N_D(X) = (1 − p) X + p tr[X] I/n,  where p = Dn²/(n² − 1).

Proof. We will first find the solution to the modified quantum rate-distortion problem (QRD) for the specified problem parameters, then use this solution to recover solutions to the original problem (15). From Corollary 4.15, we can restrict our attention to quantum states of the form σ_BR = aI + bρ_AR where a, b ∈ R. Moreover, we have the relationship b = 1 − n²a as σ_BR must have unit trace, and so the rate-distortion problem reduces to a one-dimensional problem. Therefore, the solution to (QRD) can be found by finding when the gradient of a univariate objective is zero. Doing this, we obtain the optimal point σ*_BR(κ) of (QRD) for a given κ ≥ 0, from which we can easily compute the corresponding rate and distortion. It is straightforward to verify that σ*_BR(κ) is a valid solution as it is a feasible point of (QRD) for all κ ≥ 0. Reparameterizing R_q to be in terms of D instead of κ gives the desired rate-distortion function. Similarly, expressing σ*_BR in terms of D instead of κ, we find

σ*_BR = (1 − Dn²/(n² − 1)) ρ_AR + (D/(n² − 1)) I.

As seen in Appendix B, we can recover the Choi matrix from this solution state as C_N = nσ*_BR. Noting that C_N = I/n corresponds to the quantum channel that maps all states to the maximally mixed state, and C_N = nρ_AR corresponds to the identity channel, identifies the desired quantum channel.
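As a numerical sanity check (our own; the closed-form expression below is our algebraic simplification of the entropies of the depolarizing-channel output state), the state σ_BR = (1 − p)ρ_AR + (p/n²)I has one eigenvalue 1 − D and n² − 1 eigenvalues D/(n² − 1), and both of its marginals are I/n, so its mutual information can be evaluated directly and compared against 2 log n − h(D) − D log(n² − 1):

```python
import numpy as np

def h(x):
    """Binary entropy in nats."""
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

n = 3
for D in [0.1, 0.4, 0.8]:  # all within 0 <= D <= 1 - 1/n^2
    # Eigenvalues of sigma_BR = (1-p) rho_AR + (p/n^2) I with p = D n^2/(n^2-1):
    lam = np.array([1 - D] + [D / (n**2 - 1)] * (n**2 - 1))
    assert np.isclose(lam.sum(), 1.0)
    S_BR = -np.sum(lam * np.log(lam))
    # I_q = S(B) + S(R) - S(BR) = 2 log n - S_BR, since both marginals are I/n.
    rate = 2 * np.log(n) - S_BR
    closed_form = 2 * np.log(n) - h(D) - D * np.log(n**2 - 1)
    assert np.isclose(rate, closed_form)
```

The agreement holds for every D in the valid range because S_BR expands exactly to h(D) + D log(n² − 1).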

Other quantum distortion matrices
As noted in [3], there are several other common choices for the quantum distortion matrix ∆ which have been studied in the literature. As with the entanglement fidelity distortion, we can use symmetry reduction to gain some additional insight for these other common distortion matrices. Throughout this section, we will assume that ρ A = n i=1 λ i v i v † i , and denote {b i } as an orthonormal basis for system B.
We will omit proofs for these results, as they can be shown using an almost identical method as used in Section 4.2. However, we note that for the quantum rate-distortion problem with input state ρ A ∈ D n and distortion matrix ∆ ∈ H mn + to be invariant under a representation (H mn , π) of a group G, it suffices that where such that tr B (g)ρ A = ρ A tr B (g) and g∆g † = ∆ for all g ∈ G, and where we use G ≤ G ′ to denote that G is a subgroup of G ′ . Note that we have one additional requirement that must be satisfied as compared to Lemma 4.12, as the distortion matrix ∆ is no longer necessarily related to the input states.
Example 4.17. Consider the classical-classical distortion matrix where δ ∈ R m×n + . Consider the representation (H mn , π cc ) with group homomorphism (30) of the group It can be shown that (QRD) with the classical-classical distortion matrix is invariant under this representation. Therefore, a solution must lie in the corresponding fixed-point subspace As expected (and commented on in [3]), the problem completely reduces down to diagonal matrices, and is therefore isomorphic to the classical rate-distortion problem (see Appendix A).
Example 4.18. Consider the quantum-classical distortion matrix studied in [23], where ∆ i ∈ H n + for all i = 1, . . ., m. Consider the representation (H mn , π qc ) with group homomorphism (30) of the group It can be shown that (QRD) with the quantum-classical distortion matrix is invariant under this representation. Therefore, a solution must lie in the corresponding fixed-point subspace which can be interpreted as a block-diagonal structure by changing the basis of {b i } to be in the standard basis.
Example 4.19. Consider the classical-quantum distortion matrix for a system similar to that studied in [38], where ∆ j ∈ H m + for all j = 1, . . ., n. Consider the representation (H mn , π cq ) with group homomorphism (30) of the group It can be shown that (QRD) with the classical-quantum distortion matrix is invariant under this representation. Moreover, the corresponding fixed-point subspace is which can be interpreted as a block-diagonal structure by changing the basis of {v i } to be in the standard basis.

Efficient algorithm
We propose mirror descent to compute the quantum rate-distortion function. This is justified by the following relative smoothness properties of mutual information. Therefore, following from Fact 2.4, the sequence generated by mirror descent with unit step size t k = 1 and kernel function −S to compute (QRD) will converge sublinearly to the optimal value. In the special case where we consider the classical-quantum distortion matrix (37) considered in Example 4.19, and if, additionally, ∆ j = ∆ ′ for all j = 1, . . ., m, then the mirror descent updates with unit step size are explicitly given by Unfortunately, for a general distortion matrix, it is not obvious how a similar efficiently computable update rule can be obtained. Rather than solve the mirror descent subproblem (10) directly, it is more convenient to solve the dual problem, which using (12) we can show is max where and ν ∈ H n is the dual variable corresponding to the partial trace constraint (QRD.b).
Note that we have ignored the unit trace constraint tr[σ BR ] = 1, as this is already implied by the partial trace constraint. This dual problem (41) is simpler to solve than the original primal subproblem, as it is both unconstrained and has a significantly smaller problem dimension of n 2 compared to the original dimension of n 2 m 2 . Once a solution ν k+1 to (41) is found, the primal solution can be recovered using (13) as The overall algorithm is summarized in Algorithm 1.
Algorithm 1 Quantum rate-distortion algorithm end for Output: Approximate solution σ k+1 BR .
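To make the structure of the mirror descent update (10) concrete, the following sketch applies the same scheme with a negative-entropy kernel to a toy classical problem on the probability simplex. This illustrates only the shape of the entropic update; it is not the quantum algorithm, and the toy linear objective is our own choice for illustration:

```python
import numpy as np

def entropic_mirror_descent(grad_f, x0, t=1.0, iters=300):
    """Mirror descent with negative-entropy kernel on the probability simplex.

    Each step solves min_x <grad_f(x_k), x> + (1/t) * KL(x || x_k) over the
    simplex, which has the closed form x_{k+1} ∝ x_k * exp(-t * grad_f(x_k)).
    """
    x = x0.copy()
    for _ in range(iters):
        x = x * np.exp(-t * grad_f(x))
        x /= x.sum()  # re-normalize onto the simplex
    return x

# Toy problem: minimize <c, x> over the simplex; the optimum is the vertex
# corresponding to the smallest entry of c (here index 1).
c = np.array([0.3, 0.1, 0.7])
x = entropic_mirror_descent(lambda x: c, np.full(3, 1 / 3))
```

For a linear objective the iterates concentrate exponentially fast on the minimizing vertex, which makes the multiplicative structure of the update easy to see.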
There are a number of standard algorithms available which can be used to solve the unconstrained convex optimization problem (41). We will consider methods of the form where G i = I corresponds to gradient ascent, and G i = [∇ 2 g k q (ν i )] −1 corresponds to Newton's method. We also take t i > 0 to be a step size obtained from a backtracking procedure [39, Algorithm 9.2] which guarantees that the objective converges monotonically.
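A backtracking procedure in the style of [39, Algorithm 9.2] can be sketched as follows; the test function, parameter values, and the ascent framing are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def backtracking_ascent_step(g, grad, nu, d, t0=1000.0, alpha=0.1, beta=0.1):
    """One ascent step with Armijo-style backtracking for maximizing g.

    Shrinks the step t by a factor beta until the sufficient-increase condition
    g(nu + t*d) >= g(nu) + alpha * t * <grad(nu), d> holds.
    """
    t = t0
    g0 = g(nu)
    slope = np.dot(grad(nu), d)
    while g(nu + t * d) < g0 + alpha * t * slope:
        t *= beta
    return nu + t * d

# Maximize the concave quadratic g(nu) = -||nu - 1||^2 via gradient ascent.
g = lambda nu: -np.sum((nu - 1.0) ** 2)
grad = lambda nu: -2.0 * (nu - 1.0)
nu = np.zeros(3)
for _ in range(100):
    nu = backtracking_ascent_step(g, grad, nu, grad(nu))
```

Because the accepted step satisfies a sufficient-increase condition, the objective values are monotonically non-decreasing along the iterates.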

Proposition 5.2. The dual problem (41) satisfies the following properties:
(i) There exists a unique solution ν * ∈ H n to the dual problem; and (ii) solving the problem using gradient descent or damped Newton's method with backtracking will produce a sequence of iterates {ν i } which converges to ν * .
Proof. To show part (i), we first show the dual function (42) is coercive and strictly convex by using Proposition 2.7, recognizing that the tensor product X → I ⊗ X is injective, and using Remark 3.1 to assume ρ A ≻ 0 is full-rank. Existence and uniqueness of a solution then follows from [33, Propositions 3.1.1 and 3.2.1]. To prove part (ii), we use the fact that the superlevel sets of the dual function are compact, and that the dual function is strongly concave and has Lipschitz gradient and Hessian over compact sets. These properties allow us to use standard convergence results for the two algorithms. We leave the details of the proof of part (ii) to Appendix E.2.
Note that to use either of these descent methods, we need the gradient and Hessian of g k q . The gradient can easily be found using [6, Corollary B.5] as To find the Hessian, let us define the diagonalization We use [40, Theorem 6.6.30] to obtain the directional derivative of where ⊙ represents the Hadamard or element-wise product, and f [1] (Λ) is the first divided differences matrix of Λ corresponding to f (x) = e x , i.e., the matrix whose (i, j)-th entry is given by f [1] (λ i , λ j ) where Equation (47) defines the Hessian as a linear map, which we can use to construct the Hessian matrix by expressing it in coordinates with respect to a choice of basis for Hermitian matrices.
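As a sanity check of the divided-differences formula behind (47), the following sketch implements the directional derivative of the matrix exponential at a Hermitian point and compares it against a central finite difference; the helper names and test matrices are our own:

```python
import numpy as np

def dexp(X, H):
    """Directional derivative of the matrix exponential at Hermitian X along H,
    via the divided-differences formula:
        D exp(X)[H] = U (f^[1](Λ) ⊙ (U^† H U)) U^†,
    where X = U Λ U^† and f^[1] is the first divided difference of exp."""
    lam, U = np.linalg.eigh(X)
    L = lam[:, None] - lam[None, :]
    E = np.exp(lam)
    # f^[1](λ_i, λ_j) = (e^{λ_i} - e^{λ_j}) / (λ_i - λ_j), and e^{λ_i} if λ_i = λ_j
    with np.errstate(divide="ignore", invalid="ignore"):
        F = np.where(np.abs(L) > 1e-12,
                     (E[:, None] - E[None, :]) / L,
                     np.exp(0.5 * (lam[:, None] + lam[None, :])))
    return U @ (F * (U.conj().T @ H @ U)) @ U.conj().T

def expm_herm(X):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    lam, U = np.linalg.eigh(X)
    return (U * np.exp(lam)) @ U.conj().T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); X = (A + A.T) / 2  # random symmetric point
B = rng.standard_normal((4, 4)); H = (B + B.T) / 2  # random symmetric direction
t = 1e-5
fd = (expm_herm(X + t * H) - expm_herm(X - t * H)) / (2 * t)
```

The Hadamard-product structure is what makes the Hessian cheap to apply: each directional derivative costs one eigendecomposition plus a few dense multiplications.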
Remark 5.3. We note that Algorithm 1 is very similar to [14, Algorithm 14], which was derived using an expectation-maximization approach. In fact, if we also dualize the redundant unit trace constraint, then solve for the corresponding dual variable using the KKT conditions, we obtain the alternate dual function By recognizing the identity log(A ⊗ B) = log(A) ⊗ I + I ⊗ log(B), we see that (49) is identical to the function being minimized in the subproblems of [14, Algorithm 14] up to a simple translation. The advantage of the dual function (42), which we minimize in Algorithm 1, is that in (49) the trace expression is composed with a logarithm, so (49) is no longer strictly convex nor coercive. We can confirm this by showing that (49) evaluated at tI is constant for all t ∈ R. Therefore, we can no longer use the same arguments as in Proposition 5.2 to prove that gradient descent or damped Newton's method with backtracking will converge to the solution, as without coercivity the superlevel sets of (49) are no longer compact. Additionally, our experimental results in Section 6.2 suggest that (49) is a more difficult function to minimize.
Remark 5.4. Up to this point, we have established theoretical guarantees that mirror descent applied to the quantum rate-distortion problem will result in sublinear convergence to the solution. However, for rate-distortion problems using the entanglement fidelity distortion, empirical convergence results show that mirror descent converges to the solutions at a linear rate (see Figure 1 in Appendix C). Moreover, this linear rate appears to become faster as the parameter κ increases. One standard way to prove global linear convergence of mirror descent is by using a relative strong convexity argument (see Fact 2.4 and, e.g., [6]). However, we can show that quantum mutual information is not relatively strongly convex on the set of density matrices (see Appendix C).
Instead, it is possible to gain some insight into this linear rate by studying the relative strong convexity properties of the quantum mutual information in a neighborhood of the solution to the rate-distortion problems. In particular, using a priori knowledge about the solutions to particular instances of the rate-distortion problem analyzed in Section 4.3, we can characterize the local relative strong convexity parameters of quantum mutual information around the solutions, and use them to better understand the linear convergence behavior of mirror descent as a function of κ. We refer the interested reader to Appendix C for a more in-depth discussion of these linear rates.

Inexact mirror descent
As we use a numerical method to solve the mirror descent subproblem (10) in Algorithm 1, it is important to account for the numerical errors that will inevitably arise. Existing works [41, 42, 22] have studied the convergence properties of proximal gradient methods where the update steps are numerically computed to some finite tolerance. Of particular interest is that we do not always need to solve the subproblem to high accuracy to guarantee convergence to the optimal value, which can significantly improve computation speeds. We first present a generalized algorithm, then show how to apply it to the quantum rate-distortion problem. We remind the reader that Assumption A is made throughout this section.

General algorithm
To solve equality constrained problems of the form (6) using an inexact method, we propose Algorithm 2. This algorithm is adapted from [22], which is an inexact Bregman proximal point method for arbitrary convex constraints, whereas our algorithm is an inexact mirror descent method for linear equality constraints.
The algorithm works as follows. We first warm-start the dual variable ν from the previous iteration (see Step 0 of Algorithm 2), based on the assumption that consecutive subproblems will have similar solutions. A primal variable x ∈ int dom φ is recovered from this dual variable using (13) (see (50) in Algorithm 2). However, as the dual variable is not necessarily dual optimal, the primal variable is not necessarily primal feasible. Therefore, we define a pseudo-projection proj C : int dom φ → relint C which is continuous, surjective and idempotent, and compute a feasible primal variable as x̄ = proj C (x) ∈ C (see (51) in Algorithm 2). Finally, an error criterion based on [22] is used to check whether the subproblem has been solved to a sufficient accuracy (see Step 2 of Algorithm 2). If not, then ν is updated in an ascent direction for g k (using, e.g., gradient ascent or Newton's method), and the process repeats until the desired accuracy has been reached.

for i = 0, 1, 2, . . . do
Step 1: Recover primal variable from dual variable, i.e., where proj C : int dom φ → relint C is continuous, surjective and idempotent.
Step 2: if the subproblem has been solved to sufficient accuracy, i.e., satisfies We now present the convergence properties of Algorithm 2. We first show that the exit criterion will always eventually hold. Proposition 5.5. Consider the problem (9) for some y ∈ relint C and Legendre function φ. Consider the sequence {ν i } ∈ V ′ , and let {x i } and {x̄ i } be the corresponding sequences generated by (50) and (51) respectively, where x k = y. If {ν i } converges to the dual optimum ν * of the dual subproblem (14), then D φ (x̄ i ∥x i ) → 0.
Proof. Let us denote x * as the primal optimum of (9). From Fact 2.5, we have {x i } ∈ int dom φ and x * ∈ int dom φ. Using Fact 2.2(ii), we can show that (∇φ) −1 is continuous, and so x i is a continuous function of ν i from (50). Therefore, convergence of {ν i } to ν * implies that {x i } converges to a solution of inf x L(x, ν * ; y), which from strict convexity of φ must be the unique primal optimum x * [43, Theorem 12.13]. From continuity of the pseudo-projection operator on int dom φ and primal feasibility of x * , it follows that x̄ i = proj C (x i ) must also converge to x * . The desired result then follows from continuity of D φ on int dom φ × int dom φ, and the property D φ (x∥x) = 0 for all x ∈ int dom φ.

Remark 5.6. For the quantum rate-distortion problem, convergence of the iterates {ν i } to the optimal solution of the subproblem is guaranteed by Proposition 5.2 if we use gradient ascent or Newton's method with a backtracking line search.
We now move on to proving convergence rates of the inexact mirror descent iterates. We begin with a preliminary result which takes advantage of the fact that we restrict our attention to problems with only linear equality constraints.
Proof. As x̄ k+1 is primal feasible, it follows that N A (x̄ k+1 ) is equal to the image of A † . Rearranging (50) and then substituting this in gives for all ν k ∈ V ′ , from which (53) follows.
The following main convergence result can be seen as a modification of [22, Theorem 3.2]. Theorem 5.8. Consider Algorithm 2 to solve the convex optimization problem (6). Let φ be Legendre, let f * represent the optimal value of this problem, and let x * be any corresponding optimal point. (i) If f is L-smooth relative to φ and t k = 1/L, then the sequence {(x k , x̄ k )} satisfies where (ii) If f is L-smooth relative to φ, µ-strongly convex relative to φ, and t k = 1/L, then the sequence {(x k , x̄ k )} satisfies where Proof. From L-smoothness of f , we have By subtracting f (u) from both sides of the inequality for any u ∈ C and using µ-strong convexity of f , we obtain Using Lemma 5.7 and the definition of the normal cone N A (x̄ k+1 ) of {x ∈ V : A(x) = b} at x̄ k+1 , we can show that for any u ∈ C, where the equality uses the definition of the Bregman divergence (8), and the second inequality uses (52). Combining this with (57) and letting u = x * gives To show part (i), let µ = 0, i.e., we make no assumption of relative strong convexity.
From convexity of f , we find that as the sum telescopes, which recovers the desired result. To show part (ii), we recognize that f * ≤ f (x) for all x ∈ C, and therefore from (58) we obtain Recursively applying this inequality recovers the desired result.
Remark 5.9. If ε k = ε is constant for all k, then (54) implies ergodic sublinear convergence to an Lε-optimal solution. If f is µ-strongly convex relative to φ, then (55) implies linear convergence to an Lε/µ-Bregman ball around the solution. If {ε k } is summable (i.e., lim k→∞ k i=0 ε i < ∞), then (54) implies ergodic sublinear convergence to the optimal solution. If f is µ-strongly convex relative to φ and {ε k } converges to zero linearly, then (55) implies linear convergence to the solution. See [42, Proposition 3] for a more in-depth discussion of how the linear rate varies depending on how quickly {ε k } converges. Notably, the best linear convergence behavior occurs when {ε k } converges at a similar rate to that of the function values.

Application to quantum rate-distortion
To use Algorithm 2 for the quantum rate-distortion problem, all that remains is to propose a suitable pseudo-projection (51). We propose the following operator proj C (σ BR ) = P σ BR P † (60a) for It follows from Lemma 4.11, where U Y = I B , that the proposed pseudo-projection maps positive definite matrices to positive definite matrices satisfying (QRD.b). Idempotence follows by recognizing that for any σ BR ∈ H mn ++ satisfying (QRD.b), i.e., tr B (σ BR ) = ρ A , the pseudo-projection satisfies proj C (σ BR ) = σ BR , because the matrix P from (60b) satisfies Continuity of the pseudo-projection follows from the fact that the matrix inverse and square root are continuous on the positive definite cone.
Therefore, (60a) is a suitable pseudo-projection for the quantum rate-distortion problem. Note that the standard Euclidean projection is not suitable, as it maps variables to the boundary of the domain, where the gradient of quantum mutual information is not well-defined. Finally, we summarize the inexact mirror descent algorithm applied to the quantum rate-distortion problem in Algorithm 3, together with its corresponding convergence guarantees.
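The explicit matrix P of (60b) is not reproduced above; one natural choice consistent with the stated properties (feasibility, idempotence, continuity) is P = I B ⊗ M with M = ρ A 1/2 (tr B σ BR ) −1/2 , assuming system B is the first tensor factor. The following sketch implements this assumed choice and verifies feasibility and idempotence numerically:

```python
import numpy as np

def ptrace_B(sigma, m, n):
    """Partial trace over the first (m-dimensional) factor of an (mn x mn) matrix."""
    return np.trace(sigma.reshape(m, n, m, n), axis1=0, axis2=2)

def sqrtm_psd(X):
    """Principal square root of a positive semidefinite Hermitian matrix."""
    lam, U = np.linalg.eigh(X)
    return (U * np.sqrt(np.clip(lam, 0, None))) @ U.conj().T

def pseudo_project(sigma, rho_A, m, n):
    """Map a positive definite sigma_BR to a positive definite matrix with
    tr_B(sigma) = rho_A, via the congruence sigma -> P sigma P^†."""
    M = sqrtm_psd(rho_A) @ np.linalg.inv(sqrtm_psd(ptrace_B(sigma, m, n)))
    P = np.kron(np.eye(m), M)
    return P @ sigma @ P.conj().T

# Random positive definite test data (m = 2, n = 3); both are our own choices.
rng = np.random.default_rng(1)
m, n = 2, 3
A = rng.standard_normal((m * n, m * n)); sigma = A @ A.T + np.eye(m * n)
B = rng.standard_normal((n, n)); rho_A = B @ B.T + np.eye(n)
rho_A /= np.trace(rho_A)
proj = pseudo_project(sigma, rho_A, m, n)
```

Since tr B ((I ⊗ M)σ(I ⊗ M) † ) = M tr B (σ)M † , this choice maps any positive definite σ onto the affine slice tr B (σ) = ρ A , and reduces to the identity when σ is already feasible.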

Numerical experiments
We present numerical experiments to demonstrate the computational performance of Algorithm 3 in solving the quantum rate-distortion problem (QRD). Input states ρ A are randomly generated using QETLAB [44], and the entanglement fidelity was used to measure distortion by letting ∆ = I − ρ AR , where ρ AR is the purification of ρ A . All experiments were run in MATLAB on an Intel i5-11700 CPU with 32 GB of RAM.
For all implementations of Algorithm 3, unless otherwise stated, we use an error schedule of where ξ = 0.9 and ε −1 = 10 −2 . Based on the discussion in Remarks 5.4 and 5.9, the first term in the minimum aims to match the linear rate of the error sequence to that of the mirror descent iterations, the second term ensures that the error schedule decays linearly, the third term ensures the sequence is monotonic, and the last term accounts for machine precision. For brevity, we will use MD-N to refer to when Newton's method with backtracking is used to compute the inexact mirror descent iterates, and MD-GD to refer to when gradient descent with backtracking is used. We terminate MD-N once the change in objective value drops below 10 −15 , and MD-GD at 10 −8 , as the sublinear rate of gradient descent makes computing the iterates to higher accuracy impractical. All algorithms are also terminated if the computation time exceeds 3600 s. We use "timeout" to refer to situations where an algorithm is unable to perform a single iteration within this timeframe, and "out of memory" to refer to when there is insufficient RAM to run the algorithm. All backtracking line searches use the implementation described in [39, Algorithm 9.2], where α = 0.1 and β = 0.1. For gradient descent methods, we start backtracking from t 0 = 1000, while for Newton's method we use t 0 = 1. We report upper bounds on absolute optimality gaps measured in bits, which, unless otherwise stated, are computed by finding a lower bound on the optimal value using MD-N and the method described in Appendix D. Where relevant, methods are initialized using σ 0 BR = ρ A ⊗ ρ A and ν 0 = − log(ρ A ).

Comparison between exact and inexact methods
We first present some preliminary results comparing exact and inexact computation of the mirror descent iterates. We compare between variations of Algorithm 3 where the iterates are computed using Newton's method with backtracking exactly, i.e., close to machine precision with ε k = 10 −15 for all k, and inexactly using (62). We also compare against iterates computed using gradient descent with backtracking inexactly using (62), and to a constant error of ε k = 10 −8 for all k, as proposed in [14, Algorithm 4]. All methods use symmetry reduction to reduce the problem dimension. We summarize the results in Table 1. We see that using Newton's method to compute the mirror descent iterates yields identical accuracies between exact and inexact implementations, despite the inexact method solving up to 6 times faster. This difference also seems more pronounced for lower values of κ. Similarly, both inexact methods using gradient descent solve to similar accuracies, while the proposed decaying error solves approximately 3 times faster compared to solving to a constant tolerance.

Table 1: Comparison between various strategies for choosing the tolerance to which mirror descent iterates are computed, when computing the quantum rate-distortion function with entanglement fidelity distortion at channel dimensions n and distortion dual variables κ. We report the time elapsed (s) and upper bounds on the absolute optimality gap (bits) of each method. All methods use symmetry reduction to reduce the problem dimension.

Comparison with other algorithms
We compare Algorithm 3 to existing algorithms which have been used to compute the quantum rate-distortion function. These include the backtracking primal-dual hybrid gradient (PDHG) algorithm from [6, Algorithm 3], the expectation-maximization (EM) algorithm from [14, Algorithm 14], and CvxQuad [17, 18] with the SDPT3 [45] solver. We show results with and without symmetry reduction for all of these methods. For PDHG, we use the same backtracking parameters as used in the experiments of [6]. As noted in Remark 5.3, the EM algorithm is very similar to our mirror descent algorithm, except that it solves a slight variation of the dual problem. Therefore, to allow for a fair comparison, we use the same inexact method as Algorithm 2 with gradient descent to compute these subproblems, rather than solving to a constant inexact tolerance as originally proposed in [14], which we showed was computationally slower in Section 6.1. Both the PDHG and EM methods were terminated once the change in objective value dropped below 10 −8 . The default settings for CvxQuad were used, and we report the optimality gap returned by the solver.
Remark 6.1. Just as our proposed algorithm shares similarities with EM, the inexact mirror descent method in Algorithm 1 also shares similarities with PDHG. In particular, if we use gradient descent to compute the inexact mirror descent iterates, then, like PDHG, we are alternating between primal mirror descent iterates and dual gradient ascent iterates. The main difference is that we may perform multiple dual steps for every primal step we take.
We summarize the results without symmetry reduction in Table 2a, and the results with symmetry reduction in Table 2b. We see that symmetry reduction results in a significant improvement in both computation time and memory requirements. We emphasize that without symmetry reduction, the quantum rate-distortion problem for a 9-qubit channel (n = 512) requires optimizing over a 70 billion dimensional variable. Symmetry reduction reduces the optimization variable to a dimension of just 523,776.
Overall, we see that our proposed method is able to obtain high-accuracy solutions relatively quickly. In particular, our method generally outperforms all existing benchmarks in both computation time and accuracy simultaneously. We note that it is not entirely clear whether Newton's method or gradient descent is better for solving our mirror descent subproblems. At high problem dimensions, gradient descent solves significantly faster to similar accuracy solutions. At low to medium problem dimensions, however, Newton's method seems to solve faster for larger values of κ, while gradient descent solves faster for smaller κ. However, Newton's method allows us to solve the mirror descent iterates "exactly", which in turn allows us to efficiently compute an optimality gap using the method from Appendix D.

Table 2: Comparison between various algorithms for computing the quantum rate-distortion function with entanglement fidelity distortion, at channel dimensions n and distortion dual variables κ. We report the time elapsed (seconds) and upper bounds on the absolute optimality gap (bits) of each method. We use "> 3600.00" to refer to when the algorithm has not solved to the desired tolerance within an hour, and report the optimality gap of the last completed iteration at this time. We use "Timeout" to refer to when the algorithm has not performed any iterations within an hour.

Concluding remarks
We have presented an inexact mirror descent algorithm which is able to efficiently compute the quantum rate-distortion function with provable sublinear convergence rates. Additionally, we have shown how we can exploit symmetries that exist in common quantum rate-distortion settings to significantly reduce the dimensions of the problems we need to solve. It would be interesting to see if our techniques and analysis can be extended to other similar problems, such as the quantum rate-distortion function for mixed states [26] and the quantum information bottleneck function [38, 46]. In this work, we have focused on a reformulation of the quantum rate-distortion problem where we dualize the distortion inequality constraint, as is typically done for these constrained channel capacity-type problems [4]. However, by converting the distortion inequality to an equality constraint using [14, Lemma 21], it is relatively straightforward to extend most of our algorithm and analysis to directly account for the distortion constraint. The only limitation is that there is no obvious pseudo-projection (51) for this new problem. Therefore, future work to find an efficient pseudo-projection operator would allow us to use our inexact mirror descent algorithm to directly solve the unsimplified rate-distortion problem. Alternatively, we could perform exact computations of the mirror descent steps, which, although slower, circumvents the need for a pseudo-projection. Computation of the mirror descent subproblems is an important part of inexact mirror descent which significantly impacts how effective the overall algorithm is. In Proposition 5.2, we established qualitative properties of the dual function which allow us to guarantee global convergence when maximizing it using standard techniques. However, we lack a more quantitative understanding of these functions which may give us a better understanding of convergence rates. A more in-depth analysis of the dual function may allow us to understand how to best solve the mirror descent subproblems.

A Classical rate-distortion
In this section, we study the classical rate-distortion function, showing how symmetry reduction and mirror descent can be applied to this problem. The purpose of this is to provide a point of comparison for the concepts we use to study the quantum rate-distortion function, and to show how many of our results still apply in the classical setting.

A.1 Notation
We define n-dimensional classical states as n-dimensional probability distributions, i.e., Classical channels are linear functions that map n-dimensional classical states to m-dimensional classical states, and are mathematically represented as m × n column stochastic matrices Probabilities defined on the Cartesian product of two sample spaces are represented by joint probability matrices of the form With some overloading of function notation, we define the Shannon entropy H : R n + → R as and the Kullback-Leibler (KL) divergence Throughout this section, we will use 1 to denote the all-ones column vector.

A.2 Problem definition
For a classical channel with n inputs and m outputs, consider a probability distribution p ∈ P n , a distortion matrix δ ∈ R m×n + where δ ij represents the distortion of producing the i-th output from the j-th input, and a maximum allowable total distortion D ≥ 0. The corresponding classical rate-distortion function [1] is where is known as the classical mutual information. It follows from joint convexity of the KL divergence [47, Theorem 2.7.2] that (65) is a convex optimization problem. When there are the same number of inputs and outputs, i.e., m = n, a common distortion measure for the classical rate-distortion function is the Hamming distance [47], which corresponds to δ = 11 ⊤ − I (i.e., the square matrix with zeros on the diagonal and ones everywhere else).
Remark A.1. Like the quantum analog (see Remark 3.1), without loss of generality we will always assume that the input distribution of the classical channel is non-degenerate, i.e., p > 0. See Appendix B.2 for more details.

Change of variables
We will make a change of variables where we optimize over the joint distribution P ∈ P m×n , where P ij = p j Q ij is the joint probability of obtaining the i-th output from the j-th input. For distortion D ≥ 0, input distribution p ∈ P n , and distortion matrix δ ∈ R m×n where, with some slight abuse of notation, we redefine classical mutual information as Proposition A.2. The classical rate-distortion problems (65) and (67) are equivalent, in the sense that the optimal values of the two problems are equal.

A.3 Symmetry reduction
To show how we can achieve a similar result as Theorem 4.16 in the classical scenario, consider the classical rate-distortion problem with Hamming distortion δ = 11 ⊤ − I and a uniform input distribution p = 1/n. The analytic expression for this setting is well known (see, e.g., [47, Exercise 10.5]), and is typically derived by lower bounding the mutual information using Fano's inequality, then showing achievability of this lower bound. Here, we show how this result can also be derived by using a similar symmetry reduction to the one we used to derive the quantum rate-distortion function.
Corollary A.3. Let G be a compact group, let (R n , τ ) be a real representation of G, and let (R n×n , π) be the real representation of G defined by π(g)(X) = τ (g)Xτ (g) † . Then the dimension (over R) of the fixed-point subspace is given by dim R (V π ) = ⟨χ τ , χ τ ⟩.
Proof. This follows from a similar argument to the proof of Corollary 4.10. We first extend (R n , τ ) to a complex representation in the natural way. We then note that (C n×n , π) decomposes into isomorphic subrepresentations given by the restriction to the real and imaginary subspaces.
where the second and third equalities use Lemmas 4.7 and 4.8, respectively. It is straightforward to verify that the proposed two-dimensional subspace is invariant under π, which concludes the proof.
Theorem A.6. The classical rate-distortion function with uniform input distribution p j = 1/n and Hamming distortion δ = 11 ⊤ − I is given by A channel which attains this rate for 0 ≤ D ≤ 1 − 1/n is given by Proof. This follows from a similar proof to that of Theorem 4.16.
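The displayed expression in Theorem A.6 does not survive extraction above; for reference, it is the well-known Hamming rate-distortion formula (cf. [47, Exercise 10.5]), which in this notation reads:

```latex
R(D) =
\begin{cases}
  \log n - H_b(D) - D\log(n-1), & 0 \le D \le 1 - 1/n,\\[2pt]
  0, & D > 1 - 1/n,
\end{cases}
\qquad
H_b(D) = -D\log D - (1-D)\log(1-D),
```

with an attaining channel Q = (1 − D)I + (D/(n − 1))(11 ⊤ − I), i.e., keep the input with probability 1 − D and otherwise output one of the remaining n − 1 symbols uniformly at random.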

A.4 Mirror descent
Like the quantum case, we propose mirror descent to compute the classical rate-distortion function.This is justified by the following relative smoothness properties of mutual information.
Therefore, following from Fact 2.4, mirror descent with a unit step size t k = 1 and kernel function −H is guaranteed to converge sublinearly to the optimal value. The mirror descent update (10) for the classical rate-distortion problem can be expressed by the following analytic expression where q k i = n j=1 P k ij is the marginal probability of obtaining the i-th output. This iteration is precisely the Blahut-Arimoto algorithm [4].
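A minimal sketch of this Blahut-Arimoto iteration for the Hamming-distortion, uniform-input setting, checked against the closed-form distortion D(κ) = (n − 1)e −κ /(1 + (n − 1)e −κ ) implied by the symmetric fixed point; the dimension and value of κ below are arbitrary illustrative choices:

```python
import numpy as np

def blahut_arimoto(p, delta, kappa, iters=500):
    """Blahut-Arimoto iteration for the classical rate-distortion function.

    Alternates the output marginal update q <- Q p with the channel update
    Q_ij ∝ q_i exp(-kappa * delta_ij), normalized over the outputs i."""
    m, _ = delta.shape
    q = np.full(m, 1.0 / m)
    for _ in range(iters):
        Q = q[:, None] * np.exp(-kappa * delta)
        Q /= Q.sum(axis=0, keepdims=True)  # columns are conditionals Q(.|j)
        q = Q @ p
    return Q, q

n, kappa = 4, 1.5
p = np.full(n, 1.0 / n)              # uniform input distribution
delta = np.ones((n, n)) - np.eye(n)  # Hamming distortion
Q, q = blahut_arimoto(p, delta, kappa)
D = float(np.sum(delta * Q * p[None, :]))  # achieved average distortion
```

For this symmetric instance the uniform output marginal is a fixed point, so the iteration lands exactly on the parametric solution in κ.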
As with the quantum rate-distortion problem with the entanglement fidelity distortion measure, empirical results show that using the mirror descent iterates (73) to compute the classical rate-distortion function with Hamming distortion results in a linear convergence rate. More details can be found in Appendix C.

B Equivalencies between rate-distortion problems
To prove that two optimization problems are equivalent, it suffices to show that we can use a feasible point of one optimization problem to obtain a feasible point of the other problem with the same objective value, and vice versa.
To show equivalencies between quantum rate-distortion problems, we will first introduce the Choi representation of a quantum channel, which will be a more convenient mathematical representation than (2) for analyzing the quantum rate-distortion problem. Definition B.1 (Choi operator). Consider a linear map N : H n → H m which maps Hermitian matrices on an n-dimensional input system A to Hermitian matrices on an m-dimensional output system B. The Choi matrix representation C N ∈ H mn of N , which acts on the bipartite system BA, is defined as where {v i } is any orthonormal basis for A.
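A small numerical sketch of Definition B.1, using the standard basis for A and a depolarizing channel as an illustrative example. For a trace-preserving map, tracing the Choi matrix over the output system B should return the identity on A:

```python
import numpy as np

def choi(channel, n, m):
    """Choi matrix C_N = sum_ij N(v_i v_j^†) ⊗ v_i v_j^† on system B ⊗ A,
    using the standard basis {v_i} for the n-dimensional input system A."""
    C = np.zeros((m * n, m * n), dtype=complex)
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n), dtype=complex)
            E[i, j] = 1.0  # v_i v_j^† in the standard basis
            C += np.kron(channel(E), E)
    return C

def ptrace_B(C, m, n):
    """Partial trace over the first (m-dimensional) factor."""
    return np.trace(C.reshape(m, n, m, n), axis1=0, axis2=2)

n = 3
p = 0.25  # illustrative depolarizing parameter
depolarizing = lambda X: (1 - p) * X + p * np.trace(X) * np.eye(n) / n
C = choi(depolarizing, n, n)
```

The checks tr B (C N ) = I A and tr[C N ] = n are exactly the trace-preservation conditions in this representation.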

B.1 Basis independence
We will first show that the quantum rate-distortion problem is independent of the eigenvectors of the input state ρ A by showing that we can construct an equivalent optimization problem by performing a suitable unitary transform on systems A and R. For a given spectral decomposition where {u i } is any orthonormal basis for A, we can define a new problem with input state For any feasible channel N of the original problem (15), we can construct a feasible channel N ′ for the new problem with the same objective value by letting Equivalency in the reverse direction follows from an identical argument. We note that a similar argument can be used to show that the rate-distortion problem is independent of the choice of purification, even when R is chosen not to be isomorphic to A.

B.2 Degenerate and non-degenerate inputs
Here, we show that we can always assume that the input states p and ρ_A are non-degenerate (i.e., p > 0 and ρ_A ≻ 0), as it is possible to construct equivalent optimization problems of a lower input dimension to avoid any degeneracies. For the classical rate-distortion function, we can construct a new problem where we simply drop the j-th element of p and the j-th column of δ for all j corresponding to p_j = 0. Clearly, given any feasible channel Q of the original problem (65), we can construct a feasible channel Q′ for the reduced problem with the same objective value by dropping the corresponding columns of Q. In the reverse direction, we just need to populate the columns that were removed with any valid probability distribution.
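A small numerical sanity check of this classical reduction, using an illustrative mutual information helper of our own, confirms that dropping the inputs with p_j = 0 leaves the objective value unchanged.

```python
import numpy as np

# Dropping inputs with p_j = 0 does not change classical mutual information
# (the helper below is our own illustrative implementation).
def mutual_info(Q, p):
    q = Q @ p                                # output marginal
    return float(np.sum(p * Q * np.log(Q / q[:, None])))

p = np.array([0.5, 0.0, 0.5])                # degenerate input distribution
Q = np.array([[0.9, 0.2, 0.1],               # columns are conditional
              [0.1, 0.8, 0.9]])              # distributions Q(i|j)
keep = p > 0
p_red, Q_red = p[keep], Q[:, keep]           # reduced problem
```

The zero-probability column contributes nothing to either the output marginal or the objective, so both instances evaluate identically.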
For the quantum rate-distortion function, let us assume {v_i} for i = 1, ..., n′ are the eigenvectors corresponding to non-zero eigenvalues of the (possibly rank-deficient) matrix ρ_A. Similar to the argument for basis independence, we can define a new n′-dimensional system A′ with orthonormal basis {u_i} and an isometry V = Σ_{i=1}^{n′} v_i u_i^†. We use this to define a new problem with full-rank input state V^† ρ_A V, purification (V ⊗ V)^† ρ_AR (V ⊗ V), and distortion matrix (I ⊗ V)^† ∆ (I ⊗ V). For any feasible channel N of the original problem (15), we can construct a feasible channel N′ for the reduced problem with the same objective value by letting N′(X) = N(V X V^†). In the reverse direction, we have N(X) = N′(V^† X V) + Σ_{i=n′+1}^{n} (v_i^† X v_i) ρ_i, where ρ_i ∈ D^m is any valid density matrix for all i = n′ + 1, ..., n.

B.3 Optimizing over channel and joint output state
Now, given non-degeneracy of p, it is fairly straightforward to show that we can map between feasible channels Q and states P which obtain the same objective values between (65) and (67) using P_ij = Q_ij p_j and Q_ij = P_ij / p_j for all i = 1, ..., m, j = 1, ..., n. Note that this is well defined as p_j > 0 for all j = 1, ..., n. Similarly, for the quantum setting, given that ρ_A has spectral decomposition ρ_A = Σ_{i=1}^{n} λ_i v_i v_i^†, without loss of generality we can represent σ_BR and N in terms of this eigenbasis, and map between feasible channels N and states σ_BR which obtain the same objective values between (15) and (17). Again, this is well-defined as λ_i > 0 for all i = 1, ..., n from the assumption that ρ_A is full-rank. Note that the first relationship is equivalent to σ_BR = (N ⊗ I)(ρ_AR).
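In the classical case, this mapping can be verified directly: scaling the columns of a channel by the input distribution produces a joint distribution with the correct marginal, and division by p_j recovers the channel.

```python
import numpy as np

# Direct check of the classical mapping between a channel Q and a joint
# distribution P with fixed column marginal p (requires p_j > 0).
p = np.array([0.2, 0.3, 0.5])
Q = np.array([[0.7, 0.4, 0.1],
              [0.3, 0.6, 0.9]])       # column-stochastic channel

P = Q * p                             # P_ij = Q_ij p_j (joint distribution)
```

The column sums of P recover p, and entrywise division by p recovers Q.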

C Linear convergence rates
In Figure 1, we show the empirical convergence behaviors of exact mirror descent on the classical and quantum rate-distortion problems for the Hamming distortion δ = 11^⊤ − I and entanglement fidelity distortion ∆ = I − ρ_AR respectively, for various values of κ and randomly generated inputs. We observe linear convergence for all examples, and note that this linear rate appears to become faster as κ increases. These observations were also shared across multiple randomly generated input states. From Fact 2.4, this suggests that classical and quantum mutual information might be strongly convex relative to Shannon and von Neumann entropy respectively. However, simple counterexamples show that this is false. For a continuously differentiable function f to be strongly convex relative to a Legendre function φ on a set X, there must be a µ > 0 that satisfies

    ⟨∇f(x) − ∇f(y), x − y⟩ ≥ µ(D_φ(x∥y) + D_φ(y∥x)),

for all x, y ∈ relint X (see, e.g., [12, Proposition 1.1(b)]). For classical mutual information with input distribution p ∈ P^n, we can show for X, Y ∈ P^{m×n} with columns X_j = p_j x and Y_j = p_j y for any x, y ∈ P^m that the left-hand side vanishes while the right-hand side is strictly positive whenever x ≠ y. Similarly, for the quantum case with input state ρ_A ∈ D^n, an analogous counterexample holds for states σ_BR of product form by using additivity of quantum relative entropy [28, Property 11.2.1]. Therefore, we conclude that classical and quantum mutual information are not relatively strongly convex on their respective domains. Instead, it is possible to gain insights into this linear convergence behavior through the local convexity properties of mutual information at the solutions to the rate-distortion problems.
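The classical counterexample can be checked numerically. The gradient formula below is our own derivation for mutual information over joint distributions with fixed column sums; with columns X_j = p_j x and Y_j = p_j y, the inner product on the left-hand side vanishes while the symmetrized Bregman divergence of negative Shannon entropy stays positive.

```python
import numpy as np

# Numerical version of the counterexample (helper names are ours).
def grad_mutual_info(P):
    q = P.sum(axis=1, keepdims=True)          # output marginal
    return np.log(P / q)                      # gradient up to additive constants

def breg_neg_entropy(X, Y):                   # D_phi(X||Y) for phi = sum x log x
    return float(np.sum(X * np.log(X / Y)))

p = np.array([0.3, 0.7])
x, y = np.array([0.2, 0.8]), np.array([0.6, 0.4])
X, Y = np.outer(x, p), np.outer(y, p)         # columns X_j = p_j x, Y_j = p_j y
lhs = float(np.sum((grad_mutual_info(X) - grad_mutual_info(Y)) * (X - Y)))
rhs = breg_neg_entropy(X, Y) + breg_neg_entropy(Y, X)
```

Since the relative strong convexity inequality would force µ ≤ lhs / rhs = 0, no positive µ exists.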

C.1 Preliminaries
We will first introduce the notion of local relative strong convexity.
Definition C.1. Let f and φ be twice continuously differentiable at some x, and let φ be Legendre. Then f is locally µ-strongly convex relative to φ at x if there exists a µ > 0 such that D²f(x)[v, v] ≥ µ D²φ(x)[v, v] for all v ∈ V.

We also introduce the following additional assumptions on the kernel function φ (Assumption B). The first two are standard assumptions used to prove convergence of the mirror descent iterates to the solution, rather than just convergence of the function values. The third is:

(iii) The function φ is strongly convex.
We now present some results for local linear convergence.
Proposition C.2. Consider problem (6) where x* is any corresponding optimal solution, f and φ are twice continuously differentiable at x*, and φ is Legendre and satisfies Assumption B. If f is locally µ-strongly convex relative to φ at x*, then the following results hold.
(i) The optimal solution x* is unique.
(ii) Let {x^k} be the sequence of iterates generated by using the mirror descent iterates (10) to solve (6). If f is L-smooth relative to φ and t_k = 1/L for all k, then there exists a µ′ ∈ (0, µ) and N ∈ N such that the sequence {x^k} satisfies the linear convergence bound of Fact 2.4 with rate parameter µ′ for all k > N.
Proof. As µ > 0 and φ satisfies Assumption B(iii), meaning ∇²φ(x) ≻ 0 for all x ∈ int dom φ, there exists a µ′ ∈ (0, µ) such that D²f(x*)[v, v] > µ′ D²φ(x*)[v, v] for all v ≠ 0. As f and φ are both twice continuously differentiable, there then exists an ε > 0 such that f is locally µ′-strongly convex relative to φ at every x satisfying ∥x − x*∥ < ε. For part (i), it follows from Assumption B(iii) that f is strictly convex within this neighborhood, in which x* must therefore be the unique solution. As the set of solutions to (6) must be convex, x* must be the unique solution on the entire feasible set.
For part (ii), we first note that given L-relative smoothness of f and Assumption B, the mirror descent iterates {x^k} converge to the unique solution x* (this follows from a similar argument as, e.g., [11, Theorem 2(ii)]). Therefore, there exists an N ∈ N such that ∥x^k − x*∥ < ε for all k > N. Linear convergence within this region follows from Fact 2.4.

Remark C.3. The kernel functions φ = −H and φ = −S satisfy Assumption B(i) and Assumption B(ii). Additionally, they satisfy Assumption B(iii) on any bounded set (notably, they are 1-strongly convex with respect to the 1- and trace-norm over P^n and D^n respectively [6, Proposition 3.12]).

C.2 Analysis of rate-distortion problems
If we can characterize the local relative strong convexity parameters of mutual information at the solutions to the rate-distortion problems, then Proposition C.2 allows us to predict the convergence behavior of mirror descent in a neighborhood of the solution. Knowing these local relative strong convexity parameters a priori can be difficult for arbitrary parameterizations of the rate-distortion problems. However, in Section 4.3, we derived explicit expressions for the optimal solutions of the classical rate-distortion problem for uniform input distribution and Hamming distortion (Theorem A.6), and the quantum rate-distortion problem for maximally mixed input and entanglement fidelity distortion (Theorem 4.16). Therefore, we can compute the local relative strong convexity parameters of mutual information at these solutions by solving what amounts to a generalized eigenvalue problem. If we consider (6) where V = R^n, this amounts to finding the smallest µ that satisfies

    (U^† ∇²f(x*) U) v = µ (U^† ∇²φ(x*) U) v

for some v ∈ R^m \ {0}, where m = dim_R(ker A) and U ∈ R^{n×m} is an isometry with columns forming an orthonormal basis of ker A, such that U U^† is a projection onto this nullspace. Figure 2 shows these values of the local relative strong convexity parameter µ for various values of distortion dual variables κ and problem dimensions n for both of these specific classical and quantum rate-distortion problems. Notably, we see for both the classical and quantum scenarios that µ increases monotonically from 0 to 1 as κ increases from 0 to infinity. Combining these observations with Lemma 5.1 and Proposition C.2, we conclude that mirror descent applied to these particular instances of rate-distortion problems will converge linearly in a neighborhood of the solution. Moreover, this linear rate becomes arbitrarily fast as κ tends to infinity.
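A hypothetical numerical sketch of this generalized eigenvalue computation is below: µ is the smallest generalized eigenvalue of the Hessians restricted to ker A. The Hessians and the constraint used here are stand-ins for illustration, not the paper's mutual information and entropy Hessians.

```python
import numpy as np

# Smallest generalized eigenvalue of (U^T Hf U) v = mu (U^T Hphi U) v,
# with U spanning ker A (all matrices below are illustrative stand-ins).
def local_mu(Hf, Hphi, A):
    _, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))
    U = Vt[r:].T                              # orthonormal basis of ker A
    Af, Ap = U.T @ Hf @ U, U.T @ Hphi @ U
    return float(np.min(np.real(np.linalg.eigvals(np.linalg.solve(Ap, Af)))))

Hf = np.diag([2.0, 3.0, 5.0])                 # stand-in for the f Hessian
Hphi = np.eye(3)                              # stand-in for the phi Hessian
A = np.array([[1.0, 1.0, 1.0]])               # trace-like constraint
mu = local_mu(Hf, Hphi, A)
```

By Cauchy interlacing, restricting diag(2, 3, 5) to a two-dimensional subspace places the smallest restricted eigenvalue in [2, 3].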
As seen in Figure 1, these observations reflect our empirical findings for the classical rate-distortion problem with Hamming distortion and the quantum rate-distortion problem with entanglement fidelity distortion, not only for uniform input distributions and maximally mixed input states, but for arbitrary inputs. We remark that it is possible to rigorously prove that the lim_{κ→∞} µ(κ) = 1 property seen in Figure 2 holds for any input state.

D Frank-Wolfe lower bound

D.1 Preliminaries
We will first remind the reader about Frank-Wolfe type lower bounds for convex optimization problems over compact sets, then show how these can be applied to the classical and quantum rate-distortion problems. Consider (6) where C is a compact set. As f is convex, any linear approximation will always lower bound the function. That is, for any x, y ∈ C,

    f(y) ≥ f(x) + ⟨∇f(x), y − x⟩. (76)

By taking the infimum of the right-hand side over y ∈ C, we obtain a value which lower bounds the optimal value f*. As C is compact, this lower bound will always exist, and therefore we define

    f_lb(x) = f(x) + inf_{y∈C} ⟨∇f(x), y − x⟩, (77)

along with the estimated optimality gap

    e(x) = f(x) − f_lb(x) = sup_{y∈C} ⟨∇f(x), x − y⟩, (78)

which upper bounds the true optimality gap, i.e., e(x) ≥ f(x) − f*. We also show that this error bound converges to zero as x converges to an optimal point.
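When C is the probability simplex, the linear minimization has a closed form and the gap can be computed in one line. The sketch below, with our own function names, uses negative Shannon entropy as the objective, which is minimized at the uniform distribution.

```python
import numpy as np

# Frank-Wolfe gap e(x) = max_{y in simplex} <grad f(x), x - y>
# = <g, x> - min_i g_i, since the maximum is attained at a vertex.
def fw_gap(grad, x):
    return float(grad @ x - grad.min())

f = lambda x: float(np.sum(x * np.log(x)))    # negative entropy, f* = -log n
grad_f = lambda x: np.log(x) + 1.0

n = 4
x_star = np.full(n, 1.0 / n)                  # optimal (uniform) point
x = np.array([0.4, 0.3, 0.2, 0.1])            # a non-optimal feasible point
```

The gap vanishes at the optimum and upper bounds the true optimality gap at any feasible point.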
Proposition D.1. Consider the constrained convex optimization problem (6) where C is a compact set. At an optimal point x*, the estimated optimality gap equals zero, e(x*) = 0. Moreover, if f is continuously differentiable, for any sequence {x^k} that converges to x*, the sequence {e(x^k)} will converge to zero.
Proof. The first-order optimality condition for the constrained minimization problem (6) is

    0 ∈ ∇f(x*) + N_C(x*), (79)

where N_C is the normal cone of C (see, e.g., [33]). It follows from the definition of the normal cone and the optimality condition (79) that for all x ∈ C,

    ⟨∇f(x*), x* − x⟩ ≤ 0. (80)

Noting that the estimated optimality gap e(x*) maximizes the left-hand side over x ∈ C and is non-negative shows that e(x*) = 0, as desired.
To show convergence of the optimality gap to zero, we recognize that e(x) is the sum of a continuous function and the support function of a compact set. As support functions on bounded sets are continuous [48, Proposition 8.6], from continuous differentiability of f and the fact that compositions of continuous functions are continuous, we conclude that e(x) is a continuous function, from which the desired result follows.
Over certain sets, there are well-known ways to compute the minimization required in (77), see, e.g., [49].

D.2 Application to rate-distortion problems
We now show how the Frank-Wolfe lower bound can be applied to both the classical and quantum rate-distortion problems. We remark that, as discussed in [6], the lower bounds derived for Blahut-Arimoto algorithms applied to classical [4] and quantum [50, 51] channel capacities are identical to the Frank-Wolfe lower bound (78).

Classical rate-distortion
For the classical rate-distortion problem (CRD), let us consider the set P′ = {P ∈ P^{m×n} : Σ_{i=1}^m P_ij = p_j, j = 1, ..., n}, where p ∈ P^n. Over this set, we find

    inf_{P′∈P′} ⟨∇f(P), P′⟩ = Σ_{j=1}^n p_j min_i ∂_ij f(P), (81)

where ∂_ij f is the partial derivative of f with respect to the (i, j)-th coordinate, by noting the minimization is separable into minimizing each column of P over a scaled probability simplex.
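This separable minimization can be sketched in a few lines; each column of the linear objective is minimized over a scaled simplex, so the infimum places each column's mass p_j on its smallest gradient entry (names below are illustrative).

```python
import numpy as np

# Separable minimization (81): min over P' with column sums p of <G, P'>
# equals sum_j p_j * min_i G_ij, attained at a vertex in each column.
def linear_min_over_Pprime(G, p):
    return float(np.sum(p * G.min(axis=0)))

G = np.array([[0.2, 1.0],                     # stand-in gradient entries
              [0.5, 0.1]])
p = np.array([0.6, 0.4])
val = linear_min_over_Pprime(G, p)
```

For this example the minimum is 0.6 · 0.2 + 0.4 · 0.1.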
Quantum rate-distortion
For the quantum rate-distortion problem (QRD), consider the set D′ = {σ_BR ∈ D^{mn} : tr_B(σ_BR) = σ}, where σ ∈ D^n has spectral decomposition σ = Σ_{i=1}^n λ_i v_i v_i^†. Unfortunately, it is not obvious how to compute the lower bound (77) efficiently over this set. However, if the gradient is of the form

    ∇f(σ_BR) = Σ_{i,j} g_ij (b_i b_i^† ⊗ v_j v_j^†), (82)

where g_ij ∈ R for all i, j and {b_i} is any orthonormal basis for B, then a symmetry reduction argument shows that the left-hand side of (83) is invariant under the same representation as in Example 4.17, and therefore the optimal value is attained by an element of the subspace (33). Therefore, the minimization is separable in the same way as (81), with p_j replaced by λ_j and ∂_ij f replaced by g_ij. Although (82) seems like a very limiting assumption, we can show it holds for the quantum rate-distortion problem setting we are most interested in, i.e., where distortion is measured using the entanglement fidelity and iterates are generated using Algorithm 1.
To see this, consider the gradients that arise when distortion is measured using the entanglement fidelity and iterates are generated using Algorithm 1; these can be verified to take the form (82). This property of the gradients no longer holds in general if we use the inexact method in Algorithm 3. However, we can still use this lower bounding technique whenever we solve a step exactly, which we can choose to do at the last iteration before terminating the algorithm to obtain an optimality gap for the algorithm's final output.

E Background proofs

E.1 Proof of Proposition 2.7
We first present a simple result about how strict convexity and coercivity are preserved by injective linear maps.

Lemma E.1. Consider a function f : V → R and an affine operator A : V → V′.
We will also briefly remind the reader of the following fairly standard results, which establish Lipschitz continuity and strong convexity over compact sets.

Lemma E.3. Consider a continuously differentiable function f : U → V where U is an open subset of an inner product space V′, and a convex compact set X ⊂ U. Then f is Lipschitz on X, meaning that there exists an L ≥ 0 such that ∥f(x) − f(y)∥ ≤ L∥x − y∥ for all x, y ∈ X.
Proof. This is a straightforward consequence of the mean value theorem [53, Theorem X.4.5] and convexity of X.
Lemma E.4. Consider a twice continuously differentiable function f : U → R where U is an open subset of an inner product space V, and a convex compact set X ⊂ U. If f has a positive definite Hessian, then f is strongly convex on X, meaning that there exists a µ > 0 such that D²f(x)[v, v] ≥ µ for all x ∈ X and v ∈ V satisfying ∥v∥ = 1.
Proof. Given twice continuous differentiability of f, we know D²f(x)[v, v] is a continuous function in both x and v. The desired result then follows by taking the infimum of D²f(x)[v, v] over the product of the compact sets X and {v ∈ V : ∥v∥ = 1}.
We now present the main proof.
Proof of Proposition 5.2(ii). Given that gradient descent and Newton's method with backtracking both produce a monotonic sequence of function values, for some initialization ν_0 ∈ H^n we can constrain our analysis to the superlevel set S = {ν ∈ H^n : g^k_q(ν) ≥ g^k_q(ν_0)}, which is compact from coercivity of −g^k_q(ν). Over this compact set, from Lemmas E.3 and E.4 we establish that g^k_q(ν) is strongly concave, and has Lipschitz gradient and Hessian. These properties are sufficient to establish global convergence of the function values to the optimal value using gradient descent and Newton's method with backtracking [39, Sections 9.3.1 and 9.5.3]. Convergence of the iterates to the unique solution follows from strong convexity and Lipschitz continuity of the gradient (see, e.g., the discussion in [39, Section 9.1.2]).
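An illustrative sketch of the kind of scheme this proof invokes is below: gradient ascent with Armijo backtracking on a smooth, strongly concave function. This is not the paper's dual function g^k_q; the concave quadratic and all names are our own.

```python
import numpy as np

# Gradient ascent with Armijo backtracking line search on a strongly
# concave function (illustrative stand-in, not the paper's dual objective).
def gradient_ascent_backtracking(g, grad, x0, alpha=0.25, beta=0.5, iters=50):
    x = x0
    for _ in range(iters):
        d = grad(x)                            # ascent direction
        t = 1.0
        while g(x + t * d) < g(x) + alpha * t * (d @ d):
            t *= beta                          # backtrack until sufficient increase
        x = x + t * d
    return x

c = np.array([1.0, 2.0])
g = lambda x: float(-0.5 * x @ x + x @ c)      # concave quadratic, maximizer c
grad = lambda x: -x + c
x_hat = gradient_ascent_backtracking(g, grad, np.zeros(2))
```

For this quadratic, the iterates converge to the unique maximizer c, consistent with the strong concavity argument above.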

Corollary 4.15. A solution to (QRD) parameterized by the maximally mixed input state ρ_A = I/n and distortion matrix ∆ = I − ρ_AR, where ρ_AR is the purification of ρ_A, exists in the subspace (33).

Given a distortion matrix δ ∈ R^{m×n}_+, this gives us

    R_c(D; p, δ) = min_{P ∈ P^{m×n}} I_c(P; p)  subject to  Σ_{i,j} δ_ij P_ij ≤ D,  Σ_{i=1}^m P_ij = p_j for all j.

Like the quantum case, we will reparameterize the distortion constraint with the dual variable κ ≥ 0 instead of D. The classical rate-distortion problem for distortion dual variable κ ≥ 0, input distribution p ∈ P^n, and distortion matrix δ ∈ R^{m×n}_+ then becomes

    R̃_c(κ; p, δ) = min_{P ∈ P^{m×n}}  I_c(P; p) + κ Σ_{i,j} δ_ij P_ij
    subject to  Σ_{i=1}^m P_ij = p_j, j = 1, ..., n. (CRD.b)

Lemma A.4. Let us define the group S_n as the group of permutations on {1, ..., n}, and a representation (R^{n×n}, π) of S_n as

    π(g)(X) = P_g X P_g^⊤,  ∀X ∈ R^{n×n},

where P_g is the matrix representation of permutation g. Problem (CRD) for uniform input distribution p = 1/n and Hamming distortion δ = 11^⊤ − I is invariant under π.

Proof. Classical mutual information and the constraint (CRD.b) are invariant under column-wise and row-wise permutations. Moreover, as conjugation by permutation matrices always maps diagonal elements to diagonals, and off-diagonal elements to off-diagonals, δ is also invariant under π.

Corollary A.5. A solution to (CRD) for input distribution p = 1/n and Hamming distortion δ = 11^⊤ − I exists in the subspace

    V_π = {aI + b11^⊤ : a, b ∈ R}. (70)

Proof. It follows from Lemmas 4.5 and A.4 that a solution must live in the fixed-point subspace of the representation (R^{n×n}, π) defined in Lemma A.4. It is well known that the matrix representation (C^n, τ) of the permutation group S_n (i.e., τ(g) = P_g) can be decomposed as the direct sum of two irreducible subrepresentations V = {1} and W = {x ∈ C^n : ⟨1, x⟩ = 0}. Therefore, using Corollary A.3, we can show that the dimension over the reals of the corresponding fixed-point subspace is dim_R V_π = 2.

Figure 1: Convergence behaviors of computing the classical and quantum rate-distortion functions using mirror descent for Hamming distortion and entanglement fidelity distortion, respectively, for randomly generated inputs of dimension n = 32.

Assumption B. The kernel function φ : V → R satisfies the following.
(i) The right partial level sets {y ∈ int dom φ : D_φ(x∥y) ≤ α} are bounded for all x ∈ dom φ and α ∈ R.
(ii) If a sequence {y^k} ⊂ int dom φ converges to x ∈ dom φ, then D_φ(x∥y^k) → 0.

Figure 2: Local relative strong convexity parameters µ at the optimal point of the (a) classical rate-distortion problem with uniform input distribution and Hamming distortion, and (b) quantum rate-distortion problem with maximally mixed input state and entanglement fidelity distortion, at different problem dimensions n and distortion dual variables κ.
Corollary 5.11. Consider the quantum rate-distortion problem (QRD) for some input state ρ_A ∈ D^n and distortion matrix ∆ ∈ H^{mn}_+. Let Î* denote its optimal value, and σ*_BR be any corresponding optimal point. The sequence {(σ^k_BR, σ̃^k_BR)} generated by Algorithm 3 satisfies a sublinear convergence bound.

A linear operator N is completely positive if and only if C_N ⪰ 0 [29, Theorem 2.22], and is trace preserving if and only if tr_B(C_N) = I_A [29, Theorem 2.26]. Applying a linear operator to an input state ρ_A ∈ D^n is equivalent to N(ρ_A) = tr_A(C_N(I_B ⊗ ρ_A)) [29, Equation (2.66)].