Robust Interior Point Method for Quantum Key Distribution Rate Computation

Security proof methods for quantum key distribution, QKD, that are based on the numerical key rate calculation problem are powerful in principle. However, the practicality of these methods is limited by computational resources and by the efficiency and accuracy of the underlying algorithms for convex optimization. We derive a stable reformulation of the convex nonlinear semidefinite programming, SDP, model for the key rate calculation problem. We use this to develop an efficient, accurate algorithm. The stable reformulation is based on novel forms of facial reduction, FR, for both the linear constraints and the nonlinear quantum relative entropy objective function. This allows for a Gauss-Newton type interior-point approach that avoids the need for perturbations to obtain strict feasibility, a technique currently used in the literature. The result is high-accuracy solutions, with theoretically proven lower bounds for the original QKD problem obtained from the FR stable reformulation. This provides novel contributions to FR for general SDP. We report on empirical results that dramatically improve on speed and accuracy, as well as on the solution of previously intractable problems.

1 Introduction

Quantum key distribution, QKD (see e.g., [1,2] for reviews), is a secure communication method that distributes a secret key between two honest parties (traditionally known as Alice and Bob), even in the presence of an eavesdropper (traditionally called Eve). These keys can be used for secure communication, authentication, secure multi-party computation, and other cryptographic applications. Moreover, QKD is a quantum-resistant (quantum-safe) key establishment protocol. The need for quantum-resistant cryptography is widely recognized, given that the threat of quantum computers to current cryptosystems may become a reality in the future. In contrast to post-quantum cryptography, which is also believed to be quantum-resistant, one can actually prove the information-theoretic security of QKD protocols, an attractive and important feature. Moreover, in these security proofs, one need only assume that Eve obeys the laws of quantum mechanics. Such an all-powerful Eve can be assumed to have access to unlimited computational power, including not-yet-available quantum computers. The core of a security proof of any QKD protocol is the calculation of the secret key rate, which is the number of secret key bits obtained per exchanged quantum signal. As one needs to take into account all possible eavesdropping attacks by an all-powerful Eve, an analytical calculation of the key rate is extremely challenging. Such analytical calculations are generally limited to protocols with high symmetry, like the BB84 protocol [3]. Moreover, one often has to invoke inequalities to obtain an analytical lower bound on the true key rate. Those inequalities can significantly weaken the key rate bound in many practical parameter regimes of a QKD protocol. Fortunately, the key rate problem can be formulated as the optimal value of a convex minimization problem. In general, the size of the problem makes it intractable to find analytical solutions.
Therefore, we resort to numerical approaches. The numerical calculation of the key rate can be done by finding a provably tight lower bound for this convex optimization problem. This is true for both the asymptotic [4,5] and finite-size regimes [6]. In principle, any (device-dependent) QKD protocol can be analyzed in this numerical framework, including measurement-device-independent protocols, and both discrete-variable and continuous-variable protocols. If we are able to solve the problem numerically without losing tightness, we can potentially provide tighter key rates, even in situations where valid analytical bounds are known. This is highly relevant for practical implementations of QKD protocols, as it may enable us to gain key rates without modifying hardware. This paper continues with the convex optimization approach, and contributes novel robust strategies for the numerical calculation of key rates that exhibit strong practical performance.

Convex Optimization
The secret key rate convex optimization calculations can be typically performed after introducing suitable tools to reduce the dimension of the problem, e.g., the squashing models [7], or the dimension reduction method [8]. These tools are necessary as we are interested in QKD protocols that have quantum optical implementations, and thus work with infinite-dimensional Hilbert spaces corresponding to optical modes. In reality, the success of this security proof method is often limited by computational resources, as well as the efficiency and accuracy of underlying algorithms. For a specific protocol, it is also possible to further reduce the dimension of the key rate calculation problem by using properties of the protocol. For example, as done in [7,9], one can without loss of generality consider only block-diagonal density matrices in the feasible set due to the block-diagonal structure of measurement operators. Then by utilizing properties of the objective function, we can split the key rate calculation problem into several convex optimization problems for individual blocks. Our goal here is to develop a tool that works for general protocols. In applying our method, one can always choose the problem formulation after applying all applicable aforementioned methods to reduce the dimension as the starting point. Furthermore, it should be noted that the finite-size key rate problem involves a variation of the asymptotic key rate formulation. For the simplicity of our discussion, we focus on the asymptotic formulation in this paper.
The work in [5] provides a reliable framework to compute the key rate using a two-step routine. In the first step, one efficiently finds a near-optimal feasible point of the optimization problem; see (2.2) for the explicit problem formulation. In the second step, one obtains a reliable lower bound from this feasible point by a linearization and duality argument. In terms of numerical computation, the bottleneck of this approach for large QKD problems is the first step, as it involves semidefinite optimization with a nonlinear objective function. In particular, the work in [5] proposes an algorithm based on the Frank-Wolfe method for the first step. However, this method can converge slowly in practice. We note that Faybusovich and Zhou [10] also provide a more efficient algorithm, based on a long-step path-following interior-point method, for the QKD key rate calculation problem. However, their discussion is restricted to real symmetric matrices, while for QKD key rate calculations it is important to handle Hermitian matrices. Although it might be possible to extend the algorithm in [10] to deal with Hermitian matrices, this extension has not yet been done, and thus we cannot directly compare our algorithms with theirs. In addition, the problem formulations used in [5,10] do not guarantee positive definiteness of the matrices involved in the objective function. Therefore, they perturb current feasible solutions by adding a small multiple of the identity matrix. This perturbation is not required by the new method in this paper, due to our regularization using facial reduction, FR.¹ In this paper, we derive a stable reformulation of the convex semidefinite programming, SDP, model for the key rate calculation problem of QKD. We use this to derive efficient, accurate algorithms for the problem, in particular for finding provable lower bounds for the problem.
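As a concrete illustration of the second step, the following numpy sketch computes a linearized lower bound for a toy problem: the feasible set is reduced to the density matrices {ρ ⪰ 0, Tr(ρ) = 1} (so the linearized SDP has the closed-form minimum λ_min of the gradient), and the objective is a simple stand-in, D(ρ‖Z(ρ)) with Z the diagonal pinching. None of this is protocol data; it only illustrates the linearization-and-duality idea.

```python
import numpy as np

def logm_psd(X, tol=1e-300):
    # matrix log via spectral decomposition; eigenvalues clipped away from 0
    w, U = np.linalg.eigh(X)
    return (U * np.log(np.clip(w, tol, None))) @ U.conj().T

def pinch(X):
    # toy Z: pinching to the diagonal (a completely positive projection)
    return np.diag(np.diag(X))

def f(rho):
    # toy objective D(rho || Z(rho)) = Tr(rho log rho) - Tr(rho log Z(rho))
    return np.trace(rho @ (logm_psd(rho) - logm_psd(pinch(rho)))).real

def grad_f(rho):
    # gradient; the identity terms cancel since Z is self-adjoint with Z(I) = I
    return logm_psd(rho) - pinch(logm_psd(pinch(rho)))

def linearized_lower_bound(rho_hat):
    # step 2, specialized: min over density matrices of <W, rho> = lambda_min(W)
    W = grad_f(rho_hat)
    return f(rho_hat) - np.trace(W @ rho_hat).real + np.linalg.eigvalsh(W)[0]

rho_hat = np.array([[0.6, 0.2], [0.2, 0.4]])   # a feasible (unit-trace) point
lb = linearized_lower_bound(rho_hat)
assert lb <= f(rho_hat) + 1e-12                # a lower bound never exceeds f
```

By convexity, f(ρ*) ≥ f(ρ̂) + ⟨∇f(ρ̂), ρ* − ρ̂⟩ for the true optimum ρ*, so minimizing the linearization over the feasible set yields a valid lower bound on the optimal value.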
The stable reformulation is based on a novel facial reduction, FR, approach. We exploit the Kronecker structure and do FR first for the linear constraints to guarantee a positive definite, strictly feasible solution. Second we exploit the properties of the completely positive maps and do FR on the nonlinear, quantum relative entropy objective function, to guarantee positive definiteness of its arguments. This allows for a Gauss-Newton type interior-point approach that avoids the need for perturbations to obtain positive definiteness, a technique currently used in the literature. The result is high accuracy solutions with provable lower (and upper) bounds for the convex minimization problem. We note that the convex minimization problem is designed to provide a lower bound for the key rate.

Outline and Main Results
In Section 2 we present the preliminary notations and convex analysis tools that we need. In particular, we include details about the linear maps and adjoints and basic facial reduction, FR, needed for our algorithms; see Sections 2.4 and 2.5.
The details of, and need for, facial reduction, FR, are discussed in Section 3. We perform FR due to the loss of strict feasibility of the linear constraints for some classes of instances, as well as the rank deficiency of the images of the linear maps that constitute the nonlinear objective function. The FR guarantees that the reformulated model satisfies strict feasibility of the linear constraints, and that the objective function is evaluated at positive definite density matrices. A partial FR, stemming from the singularity encountered in the reduced density operator constraint, is discussed in Section 3.2.1. A second type of FR, performed on the completely positive mappings of the objective function, is discussed in Section 3.2.2. These FR steps result in a much simplified reformulated model (3.19), for which strict feasibility holds and the objective function arguments remain positive definite. This allows for efficient, accurate evaluation of the objective function using a spectral decomposition, avoiding the less accurate matrix log function evaluation. In addition, we discuss the first- and second-order differentiability of the objective function in Appendix A.3.

¹ Following the original submission of this paper, we obtained access to the code in [10]. We observed that the code requires a Slater point, as it uses the analytic center as the starting point of the algorithm. Therefore, we could not use it for many of our instances where strict feasibility fails. Moreover, the random problems solved in that paper have positive definite (strictly feasible) optimal solutions that are relatively close to the analytic center. We hope to fully test the code in [10] on our facially reduced problems in a follow-up paper. Hopefully, we can combine the information on efficient evaluations of the Hessian in [10] with our FR techniques and improve both algorithmic approaches.
In Section 4 we begin with the optimality conditions and a projected Gauss-Newton, GN, interior point method. This uses the modified objective function that is well defined for positive definite density matrices, ρ ≻ 0. We use the stable GN search direction for the primal-dual interior-point, p-d i-p, method. This avoids unstable backsolve steps for the search direction. We also use a sparsity-preserving nullspace representation for primal feasibility in Appendix B.3.2. This provides exact primal feasibility steps during the algorithm. Optimal diagonal preconditioning for the linear system is presented in Appendix B.3.3.
Our upper and lower bounding techniques are given in Section 4.5. In particular, we provide novel, theoretically based lower bounding techniques for the FR and original problems in Corollaries 4.8 and 4.10, respectively.² Applications to the security analysis of selected QKD protocols are given in Section 5. This includes comparisons with other codes, as well as solutions of problems that could not be solved previously. We include the lower bounds and illustrate their strength by reporting the relative gaps between lower and upper bounds; and we compare with the analytical optimal values when it is possible to do so.
We provide concluding remarks in Section 6. Technical proofs, further references and results, appear in Appendices A and B. The details for six protocol examples used in our tests are given in Appendix C.
A reader whose main interest is to use the code released with this work to perform QKD security analysis can safely skip Sections 2.3 to 2.5 after reading Section 2.1. One may refer to Section 2.2 for notation if necessary. The main results of Sections 3 and 4 are summarized at the beginning of those sections. The final model of the key rate problem after regularization via FR is given in (3.19). The projected Gauss-Newton interior-point algorithm for this model is presented in Algorithm 1, with a less technical explanation given in Section 4.4. For reliable lower bounds for the original key rate problem, one may be interested in Remark 4.9 and Corollary 4.10. One can then safely skip the details in the rest of Sections 3 and 4 and proceed to Section 5 to compare the numerical performance of our method with other existing approaches.

Preliminaries
We now continue with the terminology and preliminary background for the paper, as well as presenting the notation. Sections 2.3 to 2.5 contain the convex optimization background material needed to understand the details of our problem reformulation and our projected Gauss-Newton interior-point algorithm. We also include pointers to additional convex optimization background that appears in Appendix A.

QKD key rate calculation background
The asymptotic key rate R_∞ is given by the Devetak-Winter formula [11], which can be written in the following form [5]:

R_∞ = min_{ρ ∈ S} D(G(ρ)‖Z(G(ρ))) − p_pass δ_EC,    (2.1)

where D(δ‖σ) =: f(δ, σ) = Tr(δ(log δ − log σ)) is the quantum relative entropy, p_pass is the probability that a given signal is used for the key generation rounds, and δ_EC is the cost of error correction per round. The last two parameters are directly determined by observed data and thus are not a part of the optimization. Thus, the essential task of the quantum key distribution rate computation is to solve the following nonlinear convex semidefinite program:

min_ρ D(G(ρ)‖Z(G(ρ)))  s.t.  Γ(ρ) = γ, ρ ⪰ 0,    (2.2)

where Γ : H^n → R^m is a linear map defined by Γ(ρ) = (Tr(Γ_i ρ))_{i=1}^m; H^n is the linear space of Hermitian matrices over the reals; and γ ∈ R^m. In this problem, {Γ_i} is a set of Hermitian matrices corresponding to physical observables. The data pairs Γ_i, γ_i are known observation statistics that include the Tr(ρ) = 1 constraint. The maps G and Z are linear, completely positive maps that are specified according to the description of a QKD protocol. In general, G is trace-nonincreasing, while Z is trace-preserving and its Kraus operators are a resolution of identity. The maps are usually represented via the so-called operator-sum (or Kraus operator) representation.
(More details on these representations are given below as needed; see also Definition 3.1.) In other words, the optimization problem is of the form in (2.3), where the objective function f is the quantum relative entropy function as shown in (2.1), and the constraint set is a spectrahedron, i.e., the intersection of an affine manifold and the positive semidefinite cone. The affine manifold is defined by the linear equality constraints in (2.3). These constraints are divided into two sets, the observational and the reduced density operator constraint sets, giving the feasible set S_O ∩ S_R. The set of states ρ satisfying the observational constraints is given by

S_O = { ρ ∈ H^n_+ : Tr((P_A^s ⊗ P_B^t) ρ) = γ_{s,t}, for all observed index pairs (s,t) },    (2.4)
where we let n_A, n_B be the sizes of P_A^s ∈ H^{n_A}, P_B^t ∈ H^{n_B}, respectively; and we denote the Kronecker product by ⊗. We set n = n_A n_B, which is the size of ρ.
The set of states ρ satisfying the constraints with respect to the reduced density operator, ρ A , is

S_R = { ρ ∈ H^n_+ : Tr((Θ_j ⊗ 1_B) ρ) = θ_j, j = 1, …, n_A² },    (2.5)
where θ_j = ⟨Θ_j, ρ_A⟩ and {Θ_j} forms an orthonormal basis for the real vector space of Hermitian matrices on system A. This implicitly defines the linear map and the constraint Tr_B(ρ) = ρ_A. Here, 1_B ∈ H^{n_B} denotes the identity matrix.
Here, we may assume that Γ_1 = I and γ_1 = 1 to guarantee that we restrict our variables to density matrices, i.e., positive semidefinite with unit trace.
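The operator-sum action of the maps G and Z described above can be sketched as follows; the Kraus operators here are illustrative placeholders, not those of any actual protocol.

```python
import numpy as np

def apply_kraus(ops, rho):
    # operator-sum action: rho -> sum_i K_i rho K_i^dagger
    return sum(K @ rho @ K.conj().T for K in ops)

# Z: two orthogonal projections resolving the identity on C^4 (placeholder)
Z_ops = [np.diag([1., 1., 0., 0.]), np.diag([0., 0., 1., 1.])]
assert np.allclose(sum(Z_ops), np.eye(4))

# a trace-nonincreasing G: a single scaled isometry-like Kraus operator
V = np.zeros((4, 2)); V[0, 0] = 1.0; V[2, 1] = 1.0
G_ops = [0.9 * V]                 # K^dagger K = 0.81 I, so the trace shrinks

rho = np.array([[0.7, 0.1], [0.1, 0.3]])
delta = apply_kraus(G_ops, rho)   # G(rho)
sigma = apply_kraus(Z_ops, delta) # Z(G(rho)): off-diagonal blocks removed
assert np.trace(delta).real <= np.trace(rho).real + 1e-12   # trace-nonincreasing
assert np.isclose(np.trace(sigma).real, np.trace(delta).real)  # trace-preserving
```

Since the projections Z_j sum to the identity, Tr(Σ_j Z_j δ Z_j) = Tr(δ) always holds, matching the trace-preserving property of Z.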

Notations
We use C^{n×n} to denote the space of n-by-n complex matrices, and H^n to denote the subset of n-by-n Hermitian matrices; we use H when the dimension is clear. We use S^n, S for the subspaces of H^n of real symmetric matrices. Given a matrix X ∈ C^{n×n}, we use ℜ(X) and ℑ(X) to denote the real and imaginary parts of X, respectively. We use H^n_+, S^n_+ (H^n_{++}, S^n_{++}, resp.) to denote the positive semidefinite cone (positive definite cone, resp.); and again we leave out the dimension when it is clear. We use the partial order notations X ⪰ 0, X ≻ 0 for positive semidefinite and positive definite, respectively. We let R^n denote the usual vector space of real n-coordinates; P_C(X) denotes the projection of X onto the closed convex set C. For a matrix X, we use range(X) and null(X) to denote the range and the nullspace of X, respectively. We let BlkDiag(A_1, A_2, …, A_k) denote the block diagonal matrix with diagonal blocks A_i.

Real Inner Product Space C n×n
In general, H^n is not a subspace of C^{n×n} unless we treat both as vector spaces over R. To do this we define a real inner product on C^{n×n} that sums the standard inner products of the real and imaginary parts:

⟨X, Y⟩ := ⟨ℜ(X), ℜ(Y)⟩ + ⟨ℑ(X), ℑ(Y)⟩ = ℜ Tr(Y† X).    (2.6)

We note that, over the reals, dim(H^n) = n² and dim(C^{n×n}) = 2n². The induced norm is the Frobenius norm, ‖X‖²_F = ⟨X, X⟩ = Tr(X† X), where ·† denotes the conjugate transpose.

Linear Transformations and Adjoints
Given a linear map L : D → R, we call the unique linear map L† : R → D the adjoint of L if it satisfies

⟨L(x), y⟩_R = ⟨x, L†(y)⟩_D, ∀x ∈ D, ∀y ∈ R.

Often in our study, we use vectorized computations instead of using complex matrices directly. In order to relieve the computational burden, we use isomorphic and isometric realizations of matrices, ignoring the redundant entries. We consider H^n as a vector space of dimension n² over the reals. We define Hvec(H) ∈ R^{n²} by stacking diag(H) followed by √2 times the strict upper triangular parts of ℜ(H) and ℑ(H), both columnwise. We note that for the real symmetric matrices S^n, we can use the first triangular number, t(n) = n(n+1)/2, of elements in Hvec, and we denote this by svec(S) ∈ R^{t(n)}, with adjoint sMat. We use various linear maps in an SDP framework. For given Γ_i ∈ H^n, i = 1, …, m, define

Γ(ρ) = (⟨Γ_i, ρ⟩)_{i=1}^m ∈ R^m, with adjoint Γ†(y) = Σ_{i=1}^m y_i Γ_i.

The matrix representation A of Γ satisfies Γ(H) = A h with h = Hvec(H), i.e., the i-th row of A is g_i^T for g_i = Hvec(Γ_i), ∀i. Specialized adjoints for matrix multiplication are given in Appendix A.1.
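A minimal numpy sketch of the Hvec isometry just described; the particular ordering of the off-diagonal entries does not affect the isometry property:

```python
import numpy as np

def Hvec(H):
    # stack diag(H), then sqrt(2) times the strict upper triangular parts of
    # Re(H) and Im(H); a linear isometry from Hermitian H to R^(n^2)
    n = H.shape[0]
    iu = np.triu_indices(n, k=1)
    return np.concatenate([np.diag(H).real,
                           np.sqrt(2) * H.real[iu],
                           np.sqrt(2) * H.imag[iu]])

def inner(X, Y):
    # real inner product <X, Y> = Re Tr(Y^dagger X) on C^{n x n}
    return np.trace(Y.conj().T @ X).real

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = (A + A.conj().T) / 2                     # Hermitian test matrices
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Y = (B + B.conj().T) / 2
assert np.isclose(inner(X, Y), Hvec(X) @ Hvec(Y))   # isometry on H^n
```

The factor √2 on the off-diagonal entries is exactly what makes the Euclidean dot product of the vectorizations reproduce the trace inner product.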

Cones, Faces, and Facial Reduction, FR
The facial structure of the semidefinite cone is well understood. We outline some of the concepts we need for facial reduction and exposing vectors (see e.g., [13]). We recall that a convex cone K is defined by: λK ⊆ K, ∀λ ≥ 0, and K + K ⊆ K; i.e., it is a cone, and so contains all its rays, and it is a convex set. For a set S ⊆ H we denote the dual cone S† = {φ ∈ H : ⟨φ, s⟩ ≥ 0, ∀s ∈ S}.
A convex subset F ⊆ K is a face of K, denoted F ⊴ K, if x, y ∈ K and ½(x + y) ∈ F imply x, y ∈ F. Equivalently, for a general convex set K and a convex subset F ⊆ K, F is a face if every segment [x, y] ⊆ K whose relative interior intersects F satisfies [x, y] ⊆ F, where [x, y] denotes the line segment joining x, y.
Faces of the positive semidefinite cone are characterized by the range or nullspace of any element in the relative interior of the faces. In fact, the following characterizations in Lemma 2.2 hold.
Lemma 2.2. Let 0 ≠ Y ∈ H^n_+ and let Y = [P Q] BlkDiag(D, 0) [P Q]† be the orthogonal spectral decomposition with D ∈ H^r_{++}. Then, for a face F ⊴ H^n_+, the following are equivalent: 1. Y ∈ relint(F); 2. F = face(Y); 3. F = P H^r_+ P†; 4. F = H^n_+ ∩ (QQ†)^⊥. The matrix P, in Item 3 of Lemma 2.2, allows us to represent any matrix Y ∈ F ⊴ H^n_+ as a compact spectral decomposition, i.e., Y = P D P† with diagonal D ∈ H^r_+. This compact representation leads to a reduction in the variable dimension. The matrix QQ†, in Item 4 of Lemma 2.2, is called an exposing vector for the face F. Exposing vectors come into play throughout Section 3. Definition 2.3 (minimal face). Let K be a closed convex cone and let X ⊆ K. Then face(X) ⊴ K is the minimal face, the intersection of all faces of K that contain X.
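Both the compact representation Y = P D P† and an exposing vector QQ† can be obtained from a single Hermitian eigendecomposition; a small numpy sketch:

```python
import numpy as np

def face_representation(Y, tol=1e-10):
    # given PSD Y, return (P, Q): range(P) = range(Y) gives the compact
    # representation Y = P D P^dagger, and Q Q^dagger exposes face(Y)
    w, U = np.linalg.eigh(Y)
    P = U[:, w > tol]          # eigenvectors for positive eigenvalues
    Q = U[:, w <= tol]         # eigenvectors for (numerically) zero eigenvalues
    return P, Q

Y = np.diag([2.0, 1.0, 0.0])               # rank-2 PSD matrix
P, Q = face_representation(Y)
D = P.conj().T @ Y @ P                     # compact r x r block, r = 2
assert np.allclose(P @ D @ P.conj().T, Y)  # Y = P D P^dagger
W = Q @ Q.conj().T                         # exposing vector: W >= 0, <W, Y> = 0
assert np.isclose(np.trace(W @ Y).real, 0.0)
```

The dimension reduction is visible directly: the 3×3 variable Y is carried by the 2×2 block D once P is fixed.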
Facial reduction is a process of identifying the minimal face of H^n_+ containing the spectrahedron {ρ : Γ(ρ) = γ} ∩ H^n_+. Lemma 2.4 lies at the heart of the facial reduction process. Essentially, either there exists a strictly feasible ρ, or the alternative holds: there exists a linear combination of the Γ_i that is positive semidefinite and has zero expectation. In Lemma 2.4, the matrix Γ†(y) is an exposing vector for the face containing the constraint set in (2.3).

Problem Formulations and Facial Reduction
The original problem formulation is given in (2.2). Without loss of generality, we can assume that the feasible set, a spectrahedron, is nonempty. This is because our problem is related to a physical scenario, and we can trivially set the key rate to zero when the feasible set is empty. Note that the Hermitian (positive semidefinite, density) matrix ρ is the only variable in the optimization problem (2.2). Motivated by the fact that the mappings G, Z∘G are positive semidefinite preserving but possibly not positive definite preserving, we rewrite (2.2) as follows:³

min f(δ, σ)  s.t.  Γ(ρ) = γ, G(ρ) = δ, Z(δ) = σ, ρ ⪰ 0, δ ⪰ 0, σ ⪰ 0.    (3.1)

Due to the structure of the linear mapping G, the matrix δ is often singular in (3.1). Therefore, strict feasibility fails in (3.1). This indicates that the objective function, the quantum relative entropy function, is evaluated at singular matrices in both (2.2) and (3.1), creating theoretical and numerical difficulties. In fact, the domain of the problem that guarantees finiteness of the objective function requires restrictions on the ranges of the linear mappings. By moving back and forth between equivalent formulations of the types in (2.2) and (3.1), we derive a regularized model that simplifies type (2.2), and in which positive definiteness is preserved. In particular, the regularization allows for an efficient interior point method even though the objective function is not differentiable on the boundary of the semidefinite cone. This allows for efficient algorithmic developments. In addition, this enables us to accurately solve previously intractable problems.
We now present the details on various formulations of QKD from (2.2) and (3.1). We show that facial reduction allows for regularization of both the constraints and the objective function, i.e., this means that we have positive definite feasible points, and a proper domain for the objective function with positive definite matrices. This obviates the need for adding perturbations of the identity. We include results about FR for positive transformations and show that highly accurate FR can be done in these cases.
In particular, we provide a regularized reformulation of our problem, see (3.19), and the regularization statement that guarantees positive definiteness, see (3.20).

Properties of Objective Function and Mappings G, Z
The quantum relative entropy function D : H^k_+ × H^k_+ → R ∪ {+∞} is defined as

D(δ‖σ) = Tr(δ(log δ − log σ)), if range(δ) ⊆ range(σ), and +∞ otherwise.    (3.2)

The map G : H^n → H^k is given in Kraus (operator-sum) form,

G(ρ) = Σ_i K_i ρ K_i†.    (3.3)

Typically we have k > n, with k a multiple of n; thus we can have G(ρ) rank deficient for all ρ ≻ 0.

Definition 3.2. The self-adjoint (projection) linear map Z : H^k → H^k is defined by

Z(δ) = Σ_j Z_j δ Z_j,

where the Kraus operators {Z_j} are orthogonal projections forming a spectral resolution of I, i.e., Z_j = Z_j† = Z_j², Z_j Z_l = 0 for j ≠ l, and Σ_j Z_j = I.

Proposition 3.3 below states some interesting properties of the operator Z; see also [15, Appendix C, (C1)].

Proposition 3.3. The linear map Z in Definition 3.2 is an orthogonal projection on H^k.
Proof. First we show that the matrices of Z satisfy the projection properties in Definition 3.2. We then have Z = Z² = Z†; thus, Z is an orthogonal projection. The equality (3.5) holds by the property of the map Z that it removes the off-diagonal blocks in its image.
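The projection properties in Proposition 3.3 are easy to verify numerically; the sketch below uses a generic two-block pinching map as an illustrative choice of Z, not protocol data:

```python
import numpy as np

# Z built from orthogonal projections that resolve the identity on H^4
Zs = [np.diag([1., 1., 0., 0.]), np.diag([0., 0., 1., 1.])]

def Z(delta):
    # pinching map: removes the off-diagonal blocks
    return sum(Zj @ delta @ Zj for Zj in Zs)

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
delta = (A + A.conj().T) / 2
# idempotent: Z(Z(delta)) = Z(delta)
assert np.allclose(Z(Z(delta)), Z(delta))
# self-adjoint w.r.t. <X, Y> = Re Tr(Y^dagger X): <Z(X), Y> = <X, Z(Y)>
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
eta = (B + B.conj().T) / 2
lhs = np.trace(eta.conj().T @ Z(delta)).real
rhs = np.trace(Z(eta).conj().T @ delta).real
assert np.isclose(lhs, rhs)
```

Idempotence plus self-adjointness is exactly the characterization of an orthogonal projection used in the proof.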
Using (3.2), Lemma 3.4 below shows that the objective value of the model (2.2) is finite on the feasible set. This also provides insight into the usefulness of the FR on the variable σ performed below.
Remark 3.5. In general, the mapping G in (3.3) does not preserve positive definiteness. Therefore the objective function f (ρ), see (A.9) below, may need to evaluate Tr(δ log δ) and Tr(δ log σ) with both δ = G(ρ) and σ = Z • G(ρ) always singular. Although the objective function f is well-defined at singular points δ, σ, the gradient of f at singular points δ, σ is not well-defined. Our approach using FR within an interior point method avoids these numerical difficulties.

Reformulation via Facial Reduction (FR)
Using Proposition 3.3, we can now reformulate the objective function in the key rate optimization problem (3.1): since Z is an orthogonal projection and σ = Z(δ), we have Tr(δ log σ) = Tr(σ log σ), and we obtain the following equivalent model:

min Tr(δ log δ) − Tr(σ log σ)  s.t.  Γ(ρ) = γ, G(ρ) = δ, Z(δ) = σ, ρ ⪰ 0, δ ⪰ 0, σ ⪰ 0.    (3.8)

The new objective function is key to our analysis, as it simplifies the expressions for the gradient and Hessian. Next, we derive facial reduction based on the constraints in (3.8).

Partial FR on the Reduced Density Operator Constraint
Consider the spectrahedron S R defined by the reduced density operator constraint in (2.5). We now simplify the problem via FR by using only (2.5) in the case that ρ A ∈ H n A is singular. We now see in Theorem 3.6 that we can do this explicitly using the spectral decomposition of ρ A ; see also [16,Sec. II]. Therefore, this step is extremely accurate. Using the structure arising from the reduced density operator constraint, we obtain partial FR on the constraint set in Theorem 3.6.
Theorem 3.6. Let P ∈ C^{n_A×r} satisfy range(P) = range(ρ_A) and P†P = 1_r, with r < n_A, and let V = P ⊗ 1_B.
Then the spectrahedron S_R in (2.5) has the property that

S_R = { ρ = V R V† : R ∈ H^{r n_B}_+, Tr_B(V R V†) = ρ_A }.    (3.9)

Proof. Let [P Q] be a unitary matrix such that range(P) = range(ρ_A) and range(Q) = null(ρ_A). Set W = QQ† ⪰ 0; then ⟨W ⊗ 1_B, ρ⟩ = ⟨W, Tr_B(ρ)⟩ = ⟨W, ρ_A⟩ = 0, where 1_B ∈ H^{n_B} is the identity matrix of size n_B, and we use (2.5) to guarantee that Tr_B(ρ) = ρ_A. Therefore, W ⊗ 1_B ⪰ 0 is an exposing vector for the spectrahedron S_R in (2.5), and we can write ρ = V R V† with V = P ⊗ 1_B for any ρ ∈ S_R. This yields the equivalent representation (3.9) with a smaller positive semidefinite constraint.⁴

We emphasize that facial reduction is not only powerful in reducing the variable dimension, but also in reducing the number of constraints. Indeed, if ρ_A is not full rank, then at least one of the constraints in (2.5) becomes redundant and can be discarded; see [17,18]. In this case, it is equivalent to the matrix ρ_A becoming smaller in dimension. (Our empirical observations show that many of the other observational constraints Γ_i(ρ) = γ_i also become redundant and can be discarded.)
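A small numpy sketch of the partial FR in Theorem 3.6, with toy data in place of protocol data: from a singular ρ_A we build V = P ⊗ 1_B and recover any consistent ρ from the smaller variable R = V†ρV.

```python
import numpy as np

def partial_fr(rho_A, n_B, tol=1e-10):
    # partial facial reduction from a singular reduced state rho_A:
    # V = kron(P, 1_B) with range(P) = range(rho_A)
    w, U = np.linalg.eigh(rho_A)
    P = U[:, w > tol]                          # range(P) = range(rho_A)
    return np.kron(P, np.eye(n_B))

# toy data: rho_A singular (rank 1 of 2); rho consistent with Tr_B(rho) = rho_A
n_A, n_B = 2, 2
rho_A = np.diag([1.0, 0.0])
rho = np.kron(rho_A, np.diag([0.5, 0.5]))      # a state with this reduction
V = partial_fr(rho_A, n_B)                     # shape (4, 2): dimension shrinks
R = V.conj().T @ rho @ V
assert np.allclose(V @ R @ V.conj().T, rho)    # rho = V R V^dagger
```

The 4×4 variable ρ is carried by the 2×2 variable R, illustrating the dimension reduction that the exposing vector QQ† ⊗ 1_B forces.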

FR on the Constraints Originating from G, Z
Our motivation is that the domain of the objective function may be restricted to the boundary of the semidefinite cone, i.e., the matrices G(ρ), Z(G(ρ)) are singular by the definition of G. We would like to guarantee that we have a well-formulated problem with strictly feasible points in the domain of the objective function so that the derivatives are well-defined. This guarantees basic numerical stability. This is done by considering the constraints in the equivalent formulation in (3.1).
We first note a useful equivalent form for the entropy function: if V has orthonormal columns and δ ∈ H^k_+ satisfies range(δ) ⊆ range(V), then Tr(δ log δ) = Tr((V†δV) log(V†δV)).

Proof. We obtain a unitary matrix U = [V P] by completing the basis; then U†δU = BlkDiag(V†δV, 0), so δ and V†δV have the same nonzero spectrum.
We use the following simple result to obtain the exposing vectors of the minimal face in the problem analytically.
Then the minimal face satisfies face(A(C)) = V H^r_+ V†. Proof. First, note that the properties of the mapping imply that A(C) ⊆ H^k_+. Nontrivial exposing vectors 0 ≠ W ∈ H^k_+ of A(C) can be characterized by the null space of the adjoint operator A†: where the third equivalence follows from int(C) ≠ ∅, and the fourth equivalence follows from the properties of sums of mappings of a semidefinite matrix. The choice of V follows from choosing a maximal rank exposing vector and constructing V using Lemma 2.2. We emphasize that the minimal face in Lemma 3.8 means that V has a minimum number of columns, as, without loss of generality, we choose it to be of full column rank. In other words, this is the greatest reduction in the dimension of the image. In addition, the exposing vectors of A(C) are characterized by the positive semidefinite matrices in the null space of A†. This implies that FR can be done in one step.
We describe how to apply Lemma 3.8 to obtain the matrices V_ρ, V_δ, V_σ of the minimal faces of (H^n_+, H^k_+, H^k_+) containing the feasible region of (3.8). By Lemma 2.2, we may write the corresponding compact representations, and we define the linear maps G_V and Z_V accordingly. As above, this can also be seen by looking at the image of I and the relative interior of the range of Z_V. We note, by Lemma 3.4, that range(V_σ) ⊇ range(V_δ). Note that we assume that an exposing vector of maximal rank for the original constraint set on ρ is obtained in the first step. Without loss of generality, we can assume that the columns of V_ρ, V_δ, V_σ are orthonormal. This makes the subsequent computation easier.
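Since the exposing vectors are characterized by the image of the map, the FR can be computed in one step from the image of the identity (a relative-interior point of the PSD cone), as noted above. A sketch with a hypothetical rank-deficient map G standing in for the protocol maps:

```python
import numpy as np

def fr_of_image(apply_map, n, tol=1e-10):
    # one-step FR for the image of a completely positive map: evaluate the map
    # at the identity and take an orthonormal basis V of the resulting range
    Y = apply_map(np.eye(n))
    w, U = np.linalg.eigh(Y)
    return U[:, w > tol]

# hypothetical trace-nonincreasing map G with rank-deficient image in H^4
K = np.zeros((4, 2)); K[0, 0] = 1.0; K[3, 1] = 1.0
G = lambda rho: K @ rho @ K.conj().T
V = fr_of_image(G, 2)                     # columns span the image's range
rho = np.array([[0.6, 0.2], [0.2, 0.4]])
delta = G(rho)
# the facially reduced variable V^dagger delta V carries all of delta
assert np.allclose(V @ (V.conj().T @ delta @ V) @ V.conj().T, delta)
```

Any δ = G(ρ) with ρ ⪰ 0 is supported on range(G(I)), so compressing by V loses nothing; this is the single-step FR that the lemma guarantees.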
After facial reduction, many of the linear equality constraints in (3.13) end up being redundant. We may delete the redundant constraints and keep a well-conditioned set of equality constraints. In the next section, we show that the removal of the redundant constraints can be performed by rotating the constraints.

Reduction on the Constraints
Recall that our primal problem after FR is given in (3.13). Moreover, by the work above, we can assume that Γ_V is surjective. In Theorem 3.10 and Theorem 3.11 below, we show that we can simplify the last two equality constraints in (3.13) by an appropriate rotation. Proof. Let P be such that U = [V_δ P] is unitary. Rotating the first equality in (3.14) using U† and applying the orthogonality of V_δ, the left-hand side simplifies. From facial reduction, it holds that range(V_δ) = range(G_V), and thus P†G_V = 0; the right-hand side then simplifies accordingly. Proof. Using the unitary matrix U = [V_σ P] from the proof of Theorem 3.10, we obtain the statement.
With Theorems 3.10 and 3.11, we reduce the number of linear constraints in (3.13) as below.
We emphasize that the images of Z_V and G_V in (3.13) are both in H^k, but the images of Z_UV and G_UV in (3.18) are in H^{k_σ} and H^{k_δ}, respectively, with k_δ ≤ k_σ ≤ k. The facial reduction performed on the variables δ, σ may yield k_δ < k_σ. Hence, the two trace terms in the objective function in (3.18) cannot in general be consolidated into one trace term.
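The rotation idea behind Theorems 3.10 and 3.11 can be sketched as follows: complete the orthonormal columns of V_δ to a unitary U = [V_δ P]; then any matrix supported on range(V_δ) becomes block diagonal under conjugation by U, and the rows corresponding to P carry no information. The data below are illustrative placeholders.

```python
import numpy as np

# V: orthonormal columns standing in for V_delta
V = np.zeros((4, 2)); V[0, 0] = 1.0; V[3, 1] = 1.0
Q, _ = np.linalg.qr(np.hstack([V, np.eye(4)]))       # U = [V P] up to signs
U = Q[:, :4]
P = U[:, 2:]
assert np.allclose(np.abs(V.conj().T @ U[:, :2]), np.eye(2), atol=1e-12)
assert np.allclose(V.conj().T @ P, 0)                # P spans the complement

# a matrix supported on range(V): after rotation, only an r x r block survives
delta = V @ np.array([[0.6, 0.2], [0.2, 0.4]]) @ V.conj().T
T = U.conj().T @ delta @ U
assert np.allclose(T[2:, :], 0) and np.allclose(T[:, 2:], 0)
```

The vanishing blocks are exactly the constraints that become redundant and can be deleted after the rotation.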

Final Model for QKD key rate calculation
In this section we present the main result: the final model that we work with, and its derivatives. We eliminate some of the variables in the model (3.18) to obtain a simplified formulation. Define Z := Z_UV ∘ G_UV and G := G_UV. We substitute R_σ = Z(R_ρ) and R_δ = G(R_ρ) back into the objective function in (3.18). For simplicity, and by abuse of notation, we rename the reduced variable R_ρ as ρ. We obtain the final model for the QKD key rate calculation problem:

min { f(ρ) := Tr(G(ρ) log G(ρ)) − Tr(Z(ρ) log Z(ρ)) : Γ_V(ρ) = γ_V, ρ ⪰ 0 }.    (3.19)

The final model is essentially in the same form as the original model (2.2); see also Proposition 3.3.
Note that the final model has a smaller number of variables than the original problem (2.2). Moreover, the objective function f, with the modified linear maps G, Z, is well-defined and analytic for ρ ∈ H^{n_ρ}_{++}. Some derivative background is given in Appendix A.3. We conclude this section by presenting the derivative formulae for the gradient and Hessian. The simple formulae in Theorem 3.13 are a direct application of Lemma A.3. Throughout Section 4 we work with these derivatives.
The Hessian in the direction ∆ρ is then given by the corresponding formula in Theorem 3.13. Given a real-valued convex function f, a subgradient φ of f at x is a vector satisfying f(y) ≥ f(x) + ⟨φ, y − x⟩, ∀y. The set of subgradients of f at x is called the subdifferential, denoted by ∂f(x). For a differentiable function f, the subdifferential is a singleton, ∂f(x) = {∇f(x)}; see e.g., [19].
Theorem 3.14. Let f be as defined in (3.19). Then the subdifferential ∂f(ρ) is characterized as follows. Proof. The result follows from the characterization of the subdifferential as containing the convex hull of all limits of gradients; see e.g., [19, Theorem 25.6].
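For a self-contained check of derivative formulae of this kind, the sketch below verifies the gradient ∇f(ρ) = G†(log G(ρ) + I) − Z†(log Z(ρ) + I) against a central finite difference, for the toy choice G = id and Z = diagonal pinching (so the identity terms cancel). This is an illustration under those assumptions, not the paper's Theorem 3.13 itself.

```python
import numpy as np

def logm_pd(X):
    # matrix log of a positive definite matrix via eigendecomposition
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.conj().T

def pinch(X):
    # toy self-adjoint Z with Z(I) = I
    return np.diag(np.diag(X))

def f(rho):
    # f(rho) = Tr(G(rho) log G(rho)) - Tr(Z(rho) log Z(rho)), with G = id here
    sigma = pinch(rho)
    return (np.trace(rho @ logm_pd(rho)) - np.trace(sigma @ logm_pd(sigma))).real

def grad_f(rho):
    # grad f = G*(log G(rho) + I) - Z*(log Z(rho) + I); the identity terms
    # cancel because G*(I) = Z*(I) = I for this toy choice
    return logm_pd(rho) - pinch(logm_pd(pinch(rho)))

rho = np.array([[0.6, 0.2], [0.2, 0.4]])
Delta = np.array([[0.1, -0.05], [-0.05, -0.1]])   # Hermitian direction
t = 1e-6
fd = (f(rho + t * Delta) - f(rho - t * Delta)) / (2 * t)
assert np.isclose(fd, np.trace(grad_f(rho) @ Delta).real, atol=1e-6)
```

The agreement between the analytic directional derivative ⟨∇f(ρ), ∆ρ⟩ and the finite difference is the standard sanity check used before trusting gradients inside an interior-point iteration.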

Optimality Conditions, Bounding, GN Interior Point Method
Arguably, the most popular approach for solving SDP problems is applying a path-following interior point method to perturbed optimality conditions using Newton's method. However, in general, numerical difficulties and instability arise in two ways. First, the optimality conditions for SDP problems are overdetermined, and a symmetrization is needed in order to apply a standard Newton method. Second, block Gaussian elimination is applied to the linearized Newton system to solve efficiently for the Newton direction; this elimination is typically done without regard to partial pivoting, allowing roundoff error to build up. To avoid these instabilities, we apply a projected Gauss-Newton, GN, interior point approach, Section 4.4, to solve the perturbed optimality conditions for our model (3.19).
In this section, we begin by presenting the optimality conditions for the model (3.19); then the GN search direction is introduced in Section 4.2; the projected versions are discussed in Section 4.3; we present the algorithm itself in Section 4.4. We finish this section with bounding strategies in Section 4.5. The important provable lower bound is presented in Section 4.5.3.

Optimality Conditions and Duality
We first obtain perturbed optimality conditions for (3.19) with positive barrier parameters. This is most often done by using a barrier function and adding terms such as µ ρ log det(ρ) to the Lagrangian. After differentiation we obtain µ ρ ρ −1 that we equate with the dual variable Z ρ . After multiplying through by ρ we obtain the perturbed complementarity equations, e.g., Z ρ ρ−µ ρ I = 0.
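The barrier calculation above can be checked numerically on random data: the gradient of µ log det(ρ) is µρ⁻¹, and the resulting dual variable Z = µρ⁻¹ satisfies the perturbed complementarity equation Zρ − µI = 0 exactly. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.1
A = rng.standard_normal((3, 3))
rho = A @ A.T + 3 * np.eye(3)        # positive definite

# Finite-difference gradient of mu * log det(rho): should equal mu * inv(rho).
grad_fd = np.zeros((3, 3))
h = 1e-6
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3)); E[i, j] = 1.0
        grad_fd[i, j] = mu * (np.linalg.slogdet(rho + h * E)[1]
                              - np.linalg.slogdet(rho - h * E)[1]) / (2 * h)
Z = mu * np.linalg.inv(rho)
err_grad = np.max(np.abs(grad_fd - Z))

# Perturbed complementarity: Z rho - mu I = 0 holds exactly for this Z.
err_comp = np.max(np.abs(Z @ rho - mu * np.eye(3)))
```
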
Theorem 4.1. The following holds for problem (3.19).

2. The Lagrangian dual of (3.19) is

and strong duality holds for (3.19), i.e., d* = p* and d* is attained for some (y, Z).

3. The primal-dual pair (ρ, (y, Z)), with ∂f(ρ) ≠ ∅, is optimal if, and only if,

Proof. The proof is given in Appendix A.5.

Perturbed Optimality Conditions
Many interior-point based algorithms solve the optimality conditions for (3.19) by solving a sequence of perturbed problems while driving the perturbation parameter µ ↓ 0. The parameter µ gives a measure of the duality gap. In this section, we present the perturbed optimality conditions for QKD. Note that the optimality conditions in (4.1) assume the existence of the subdifferential. This assumption is not required for the perturbed optimality conditions, as we can use existing gradients.
Theorem 4.2. The barrier function for (3.19) with barrier parameter µ > 0 is
With Z = µρ −1 scaled to Zρ − µI = 0, we obtain the perturbed optimality conditions for (3.19) at ρ, Z ≻ 0, y:
In fact, for each µ > 0 there is a unique primal-dual solution ρ µ , y µ , Z µ satisfying (4.2). This defines the central path as µ ↓ 0. Moreover,
Proof. The optimality conditions (4.2) follow from the necessary and sufficient optimality conditions of the convex problem. [20,21] and Theorem 3.14 together give the last claim.
Theorem 4.2 above yields an interior point path-following method, i.e., for each µ ↓ 0 we solve the perturbed optimality conditions
The question is how to do this efficiently. The nonlinear system is overdetermined as
Therefore, we cannot apply Newton's method directly to the nonlinear system (4.3), since the linearization does not yield a square system.

Gauss-Newton Search Direction
To avoid the instability introduced by the symmetrizations required to apply Newton's method to the overdetermined optimality conditions, we use a Gauss-Newton approach. That is, to solve the optimality conditions (4.3), we consider the equivalent nonlinear least squares problem
The Gauss-Newton direction, d GN , is the least squares solution of the linearization
where F ′ µ denotes the Jacobian of F µ .

Remark 4.3.
The Gauss-Newton method is a popular method for solving nonlinear least squares problems. It is arguably the method of choice for overdetermined problems such as the one we have here; it is sometimes called Newton's method for overdetermined systems, e.g., [22], see also the classical book [23]. In particular, it is very successful in cases where the residual at optimality is small; in our case the residual is zero at optimality. Other symmetrization schemes are discussed in e.g., [24].

Lemma 4.4.
Under a full rank assumption on F µ (ρ, y, Z), we get
Moreover, if ∇g(ρ, y, Z) ≠ 0, then d GN is a descent direction for g.
Proof. The gradient of g is, omitting the variables, and the Gauss-Newton direction is the least squares solution of the linearization F µ d GN = −F µ , i.e., under a full rank assumption, we get the solution from the normal equations as We see that the inner product with the gradient is indeed negative, hence a descent direction.
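Lemma 4.4's descent property can be illustrated on a generic overdetermined residual, a toy stand-in for F_µ (the functions below are invented for illustration, not the QKD system): the Gauss-Newton direction from the least squares linearization has negative inner product with the gradient of g = ½‖F‖².

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy overdetermined residual F(x) with Jacobian J (more equations than
# unknowns), standing in for F_mu; g(x) = 0.5 * ||F(x)||^2.
def F(x):
    return np.array([x[0]**2 + x[1] - 1.0,
                     x[0] - x[1]**2,
                     np.sin(x[0]) + x[1]])

def J(x):
    return np.array([[2 * x[0], 1.0],
                     [1.0, -2 * x[1]],
                     [np.cos(x[0]), 1.0]])

x = rng.standard_normal(2)
Jx, Fx = J(x), F(x)
grad_g = Jx.T @ Fx                                  # gradient of g
d_gn, *_ = np.linalg.lstsq(Jx, -Fx, rcond=None)     # GN direction

# <grad_g, d_gn> = -||J d_gn||^2 < 0: a descent direction.
inner = grad_g @ d_gn
```
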
We now give an explicit representation of the linearized system for (4.3). We define the (right/left matrix multiplication) linear maps Then the linearization of (4.3) is We emphasize that the last term is in C nρ×nρ and the system is overdetermined.

Projected Gauss-Newton Directions
The GN direction in (4.4) solves a relatively large overdetermined linear least squares system and does not explicitly exploit the zero blocks. We now proceed to take advantage of the special structure of the linear system by using projection and block elimination.

First Projected Gauss-Newton Direction
Given the system (4.4), we can make a substitution for ∆Z using the first block equation This leaves the two blocks of equations where the superscript in F pc µ stands for the primal and complementary slackness constraints. The adjoint equation follows: In addition, we can evaluate the condition number of the system using F pc µ † F pc µ . Note that we include the adjoints as they are needed for matrix free methods that exploit sparsity.

Second Projected Gauss-Newton Direction
We can further reduce the size of the linear system by making further variable substitutions.
Recall that in Section 4.3.1 we solve the system with a variable in H nρ × R m V , i.e., with n_ρ^2 + m_V unknowns. In this section, we make an additional substitution using the second block equation in (4.4) and reduce the number of unknowns to n_ρ^2 .

Theorem 4.5.
Let ρ̂ ∈ H nρ be a feasible point for Γ V (·) = γ V . Let N † : R^(n_ρ^2 − m_V) → H nρ be an injective linear map in adjoint form so that, again by abuse of notation and redefining the primal residual, we have the nullspace representation:
Then the second projected GN direction, d GN = (∆v, ∆y), is found from the least squares solution of
Proof. Using the new primal feasibility representation, the perturbed optimality conditions in (4.3) become:
After linearizing the system (4.7), we use the following to find the GN search direction:
From the first block equation we have
From the second block equation, we have
Substituting ∆Z and ∆ρ into Z∆ρ + ∆Zρ gives
Rearranging the terms, the third block equation becomes
The matrix representation of (4.6) is presented in Appendix B.2. It is easy to see that the adjoint satisfies
After solving the system (4.6), we make back substitutions to recover the original variables. In other words, once we have (∆v, ∆y) from solving (4.6), we obtain (∆ρ, ∆y, ∆Z) using the original system:

Corollary 4.6. 1. If a step length of one is taken (α = 1), then the new primal residual is exact, i.e.,

2. Suppose that exact primal feasibility is achieved. Then the primal residual is 0 throughout the remaining iterations, regardless of the step length.
Proof. If a step length of one is taken for the update ρ + ← ρ + ∆ρ = ρ + F p µ + N † (∆v), then the new primal residual is
In other words, as for Newton's method, a step length of one implies that the new residuals for the linear equations are zero.
We can now change the line search to maintain ρ + = ρ̂ + N † (v + α∆v) ≻ 0 and preserve exact primal feasibility. Assume that F p µ = 0. Then
where the last equality follows from the exact feasibility assumption.
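The nullspace representation underlying this line search can be sketched as follows, with generic vectorized constraints Ax = b standing in for Γ_V(ρ) = γ_V: any step along a nullspace basis preserves the linear constraints exactly, for every step length α.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)

# Hypothetical linear constraints in vectorized form: A x = b with
# A (m x n), m < n, standing in for Gamma_V(.) = gamma_V.
m, n = 3, 8
A = rng.standard_normal((m, n))
x_hat = np.linalg.lstsq(A, np.ones(m), rcond=None)[0]   # feasible point
b = A @ x_hat

N = null_space(A)            # columns span null(A): the map "N^dagger"

# Any update x = x_hat + N v stays exactly feasible, for every step
# length alpha -- the linear constraints impose no line-search limit.
v = rng.standard_normal(N.shape[1])
residuals = [np.linalg.norm(A @ (x_hat + alpha * (N @ v)) - b)
             for alpha in (0.1, 1.0, 10.0)]
```
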

Projected Gauss-Newton Primal-Dual Interior Point Algorithm
We now present the pseudocode for the Gauss-Newton primal-dual interior point method in Algorithm 1.
The algorithm is a sequence of steps that finds the least squares solution of the overdetermined linear system (4.6), while decreasing the perturbation parameter µ ↓ 0 and maintaining the positive definiteness of ρ, Z. Algorithm 1 is summarized as follows: 1. At each iteration, we find the projected GN direction described in Section 4.3.2; see Algorithm 1, lines 2, 3 and 4.
2. We then choose a step length that maintains strict feasibility. Whenever the step length is one, we attain primal feasibility for all future iterations; see Algorithm 1, line 5.
3. We decrease the perturbation parameter µ appropriately and proceed to the next iteration; see Algorithm 1, line 7.
4. The algorithm stops when the relative duality gap reaches a prescribed tolerance or we reach the prescribed maximum number of iterations; see Appendix B.3 for implementation details on stopping criteria, preconditioning, etc.

Dual and Bounding
We first look at upper bounds found from feasible solutions in Proposition 4.7. Then we use the dual program to provide provable lower bounds for the FR problem (2.2), thus providing lower bounds for the original problem up to the accuracy of the FR.

Upper Bounds
A trivial upper bound is obtained as soon as we have a primal feasible solution ρ̂ by evaluating the objective function. Our algorithm is a primal-dual infeasible interior point approach. Therefore, we typically have approximate linear feasibility Γ V (ρ) ≈ γ V , though we do maintain positive definiteness ρ ≻ 0 throughout the iterations. Therefore, once we are close to feasibility, we can project onto the affine manifold and, one hopes, maintain positive definiteness; i.e., we apply iterative refinement by finding the projection
where Γ V −1 denotes a generalized inverse. If ρ ≻ 0, then p* ≤ f(ρ).
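The projection step can be sketched as follows, with a generic affine constraint Ax = b standing in for Γ_V(ρ) = γ_V and the Moore-Penrose pseudoinverse playing the role of the generalized inverse:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for Gamma_V(rho) = gamma_V in vectorized form: A x = b.
m, n = 4, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# A nearly feasible iterate, as produced by an infeasible interior
# point method close to convergence.
x = np.linalg.pinv(A) @ b + 1e-3 * rng.standard_normal(n)

# Projection onto the affine manifold {x : A x = b} using the
# generalized (Moore-Penrose) inverse:
x_proj = x - np.linalg.pinv(A) @ (A @ x - b)

feas_before = np.linalg.norm(A @ x - b)
feas_after = np.linalg.norm(A @ x_proj - b)
```
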
In our numerical experiments below we see that we obtain valid upper bounds starting in the early iterations and, as we use a Newton type method, we maintain exact primal feasibility throughout the iterations resulting in a zero primal residual, and no further need for the projection. As discussed above, we take a step length of one as soon as possible. This means that exact primal feasibility holds for the remaining iterations and we keep improving the upper bound at each iteration.

Lower Bounds for FR Problem
Facial reduction for the affine constraint means that the corresponding feasible set of the original problem lies within the minimal face V ρ H nρ + V ρ † of the semidefinite cone. Since we maintain positive definiteness for ρ, Z during the iterations, we can obtain a lower bound using weak duality. Note that ρ ≻ 0 implies that the gradient ∇f(ρ) exists.

Corollary 4.8 (lower bound for FR (3.19)). Consider the problem (3.19). Let ρ̂, ŷ be a primal-dual iterate with ρ̂ ≻ 0. Let Z̄ = ∇f(ρ̂) + Γ V † (ŷ). If Z̄ ≻ 0, then a lower bound for problem (3.19) is

Proof. Consider the dual problem
We now have dual feasibility
Since we have dual feasibility, weak duality in Theorem 4.1, Item 2, as stated in the dual problem above, yields the result.

Remark 4.9.
We note that the lower bound in Corollary 4.8 is a simplification of the approach in [5], where, after a near-optimal solution is found, a dual problem of a linearized problem is solved using CVX in MATLAB; a strong duality theorem is then assumed to hold and is applied along with a linearization of the objective function. Here we do not assume strong duality, though it does hold for the facially reduced problem; we apply weak duality to get a theoretically guaranteed lower bound. We emphasize that this holds within the margin of error of the FR. Recall that we started with the problem in (2.3). If we only apply the accurate FR based on spectral decompositions, then the lower bound from Corollary 4.8 is accurate and theoretically valid up to the accuracy of the spectral decompositions. 7 In fact, in our numerics, we can obtain tiny gaps of order 10^−13 when requested, and we have never encountered a case where the lower bound is greater than the upper bound. Thus the bound applies to our original problem as well. Greater care must be taken if we have to apply FR to the entire constraint Γ(ρ) = γ. The complexity of SDP feasibility is still not known; therefore, the user should be aware of the difficulties if the full FR is done.
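The weak-duality bounding pattern of Corollary 4.8 can be illustrated on a toy convex problem (an entropy-like objective with linear equality constraints and nonnegativity, not the QKD objective): given a strictly positive primal iterate and any dual vector y for which the induced dual slack z is nonnegative, evaluating the Lagrangian gives a valid lower bound on every primal feasible value.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)

# Toy analogue of the weak-duality lower bound: minimize
# f(x) = sum(x log x) subject to A x = b, x >= 0.  For a strictly
# positive iterate x_hat and any y, set z = grad f(x_hat) - A^T y.
# If z >= 0, then
#   L(x_hat, y, z) = f(x_hat) - y^T (A x_hat - b) - z^T x_hat
# is a valid lower bound on the optimal value (weak duality, since
# x_hat is the unconstrained minimizer of the convex Lagrangian).
f = lambda x: float(np.sum(x * np.log(x)))
grad_f = lambda x: np.log(x) + 1.0

A = np.abs(rng.standard_normal((2, 5)))
x_hat = np.full(5, 0.5)            # strictly positive iterate
b = A @ x_hat                      # x_hat is feasible by construction

y = np.zeros(2)                    # any y making z >= 0 works
z = grad_f(x_hat) - A.T @ y        # here z = log(0.5) + 1 > 0
lower = f(x_hat) - y @ (A @ x_hat - b) - z @ x_hat

# Every primal feasible point yields an objective value >= lower.
N = null_space(A)
vals = []
for _ in range(50):
    x = x_hat + 0.05 * (N @ rng.standard_normal(N.shape[1]))
    if np.all(x > 0):
        vals.append(f(x))
```
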
A corresponding result for a lower bound for the original problem is given in Corollary 4.10.

Lower Bounds for the Original Problem
We can also obtain a lower bound for the case where FR is performed with some error. Recall that we assume that the original problem (2.3) is feasible. We follow the same arguments as in Section 4.5.2 but apply them to the original problem. All that changes is that we have to add a small perturbation to the optimum V ρ R V ρ † from the FR problem in order to ensure a positive definite ρ for differentiability. The exposing vector from the FR process provides an intuitive choice for the perturbation.
Let the orthogonal spectral decomposition be
Let 0 ⪯ η ≈ W be the (approximate) exposing vector obtained as the nearest rank-r positive semidefinite matrix to W.

7 Note that the condition number of the spectral decomposition of Hermitian matrices is 1; see e.g., [25].
Corollary 4.10. Let R̂, ŷ be a primal-dual iterate for the FR problem, with R̂ ≻ 0. Add a small perturbation matrix Φ ≻ 0 to guarantee that the approximate optimal solution

Without loss of generality, let ŷ be a dual variable for (2.3), adding zeros to extend the given vector if needed. Set Z̄ φ as in (4.9). If Z̄ φ ≻ 0, then a lower bound for the original problem (2.3) is

Proof. By abuse of notation, we let f, L be the objective function and Lagrangian for (2.3). Consider the dual problem d* = max_{y, Z⪰0} min_{ρ∈H^n} L(ρ, y) − ⟨Z, ρ⟩. We now have dual feasibility
Since we have dual feasibility, weak duality in Theorem 4.1, Item 2, as stated in the dual problem above, yields the result.

Remark 4.11. We note that (4.9) with Z̄ φ is dual feasibility (stationarity of the Lagrangian) for an optimal ρ̄ φ . Therefore, by continuity arguments, we expect Z̄ φ ≻ 0 to hold as well.
In addition, for implementation we need to be able to evaluate ∇f (ρ φ ). Therefore, we need to form the positive definite preserving maps G, Z, but without performing FR on the feasible set. That we can do this accurately using a spectral decomposition follows from Lemma 3.8.
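The nearest rank-r positive semidefinite approximation used above for the approximate exposing vector can be sketched via an eigenvalue truncation; the noisy W below is hypothetical illustrative data.

```python
import numpy as np

rng = np.random.default_rng(5)

def nearest_rank_r_psd(W, r):
    """Nearest (Frobenius norm) rank-<=r positive semidefinite matrix
    to a symmetric W: keep the r largest eigenvalues, clipped at zero."""
    lam, U = np.linalg.eigh(W)        # eigenvalues in ascending order
    lam_trunc = np.zeros_like(lam)
    lam_trunc[-r:] = np.clip(lam[-r:], 0.0, None)
    return (U * lam_trunc) @ U.T      # U diag(lam_trunc) U^T

# Hypothetical noisy exposing vector: a rank-2 PSD matrix plus noise.
U = np.linalg.qr(rng.standard_normal((5, 2)))[0]
W = U @ np.diag([2.0, 1.0]) @ U.T + 1e-6 * rng.standard_normal((5, 5))
W = (W + W.T) / 2                     # symmetrize

eta = nearest_rank_r_psd(W, 2)
eigs = np.linalg.eigvalsh(eta)
```
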

Numerical Testing
We compare our algorithm to other algorithms on six QKD protocols, including four variants of the Bennett-Brassard 1984 (BB84) protocol, twin-field QKD and discrete-modulated continuous-variable QKD. In Appendix C we include the descriptions of the protocol examples that we use to generate instances for the numerical tests. We note that while it is possible to simplify the optimization problem of some protocols using protocol-specific properties, as discussed in Section 1.1, we have not performed those protocol-specific simplifications, since we aim to demonstrate the generality of our method for a wide class of protocols and, in particular, its ability to handle problems with considerably larger sizes.
We continue with the tests in Sections 5.1 to 5.3. This includes security analysis of some selected QKD protocols and comparative performances among different algorithms. In particular, in Section 5.1, we compare the results obtained by our algorithm with the analytical results for selected test examples where tight analytical results can be obtained. In Section 5.2, we present results where it is quite challenging for the previous method in [5] to produce tight lower bounds. In particular, we consider the discrete-modulated continuous-variable QKD protocol and compare results obtained in [26]. In Section 5.3, we compare performances among different algorithms in terms of accuracy and running time.

Comparison between the Algorithmic Lower Bound and the Theoretical Key Rate
We compare results for instances for which tight analytical key rate expressions exist, to demonstrate that our Gauss-Newton method can achieve high accuracy with respect to the analytical key rates. There are known analytical expressions for the entanglement-based BB84, prepare-and-measure BB84 and measurement-device-independent BB84 variants mentioned in Appendix C. We take measurement-device-independent BB84 as an example, since it involves the largest problem size among these three examples and is therefore the most numerically challenging. In Figure 5.1, we present instances with different choices of parameters for data generation. The instances are tested on a desktop computer running Ubuntu 18.04.4 LTS, MATLAB version 2019a, an Intel Xeon CPU E5-2630 v3 @ 2.40GHz × 32 and 125.8 gigabytes of memory. We set the tolerance to 10^−12 for the Gauss-Newton method.
In Figure 5.1, the numerical lower bounds from the Gauss-Newton method are close to the analytical results to at least 12 decimals, and in many cases they agree up to 15 decimals. (Figure 5.1 plots the key rate per signal, Gauss-Newton versus theory, for p z = 0.5, 0.7, 0.9, 0.99.) As noted in Appendix C.5, analytical results are also known when the channel noise parameter ξ is set to zero, since in this case one may argue that the optimal eavesdropping attack is the generalized beam-splitting attack. This means the feasible set contains essentially a single ρ, up to unitaries. Since our objective function is unitarily invariant, one can analytically evaluate the key rate expression. In Figure 5.2, we compare the results from the Gauss-Newton method with the analytical key rate expressions for different choices of distances L (see Appendix C.5 for the description of the instances of this protocol example). These instances were run on the same machine as in Figure 5.1. We set the tolerance to 10^−9 for the Gauss-Newton method.

Solving Numerically Challenging Instances
We show results where the Frank-Wolfe method without FR has difficulty providing tight lower bounds for certain instances. In Figure 5.2, we plot results obtained previously in [26, Figure 2]. In Figure 5.3, we show another example demonstrating the advantages of our method. These instances were run on the same machine as in Figure 5.2. For this discrete-phase-randomized BB84 protocol with 5 discrete global phases (see Appendix C.6 for more details), the previous Frank-Wolfe method was unable to find nontrivial lower bounds. This is because the previous method can only achieve an accuracy of around 10^−3 for this problem, due to the problem size. This is insufficient to produce nontrivial lower bounds for many instances, since the key rates are on the order of 10^−3 or lower, as shown in Figure 5.3. On the other hand, due to the high accuracy of our method, we can obtain meaningful key rates. The high accuracy achieved by our method enables us to perform security analysis for protocols that involve previously numerically challenging problems. Like the discrete-phase-randomized BB84 protocol, these protocols involve more signal states, which lead to higher-dimensional problems.

Comparative Performance
In this section we examine the comparative performance of three algorithms: the Gauss-Newton method, the Frank-Wolfe method and cvxquad. The Gauss-Newton method refers to the algorithm developed throughout this paper. The Frank-Wolfe method refers to the algorithm developed in [5], and cvxquad is developed in [27]. We use Table 5.1 to present detailed reports on selected instances. More numerics are reported in Tables C.1 to C.6 in Appendix C.7.
The instances are tested with MATLAB version 2021a on a Dell PowerEdge R640 with two Intel Xeon Gold 6244 8-core 3.6 GHz (Cascade Lake) processors and 192 gigabytes of memory. For the instances corresponding to the DMCV protocol we used the tolerance 10^−9, and the tolerance 10^−12 was used for the remaining instances. The maximum number of iterations was set to 80 for the Gauss-Newton method. In Table 5.1, Problem Data refers to the data used to generate the instances. Gauss-Newton refers to the Gauss-Newton method. Frank-Wolfe refers to the Frank-Wolfe algorithm used in [5], and we use 'with FR' ('w/o FR', resp.) to indicate that the model is solved with (without, resp.) FR. The header cvxquad with FR refers to the algorithm provided by [27] with the FR reformulation. If a certain algorithm fails to give a reasonable answer within a reasonable amount of time, we place a flag in the gap column, followed by the time taken to obtain the error message. We use 'n/a' to indicate the instances for which cvxquad is not applicable due to the size differences in the images under G and Z caused by FR.
The following provides details for the remaining headers in Table 5.1. We now discuss the formula (5.1). The best upper bound from the Gauss-Newton algorithm is used for 'bestub' in (5.1) for all instances. The Gauss-Newton algorithm computes the lower bounds as presented in Corollary 4.8. The Frank-Wolfe algorithm presented in [5] obtains the lower bound by a linearization technique near the optimum. As presented in [27], cvxquad uses semidefinite approximations of the matrix logarithm. The lower bounds from cvxquad are often larger than the theoretical optimal values; this observation indicates that the lower bounds from cvxquad are not reliable. Therefore, we adopt the lower bound strategy used in [5] for cvxquad.
We now discuss the results in Table 5.1. Comparing the gap and time columns among the different methods, we see that the Gauss-Newton method outperforms the other algorithms in both accuracy and running time. For example, comparing Gauss-Newton and Frank-Wolfe with FR, the gaps and running times from Gauss-Newton are competitive; there are three instances for which Gauss-Newton took longer. We emphasize that the gap values for Gauss-Newton illustrate much higher accuracy.
We now illustrate that the reformulation strategy via FR contributes to the superior algorithmic performance. Comparing the columns Frank-Wolfe with FR and Frank-Wolfe w/o FR in Table 5.1, the FR reformulation contributes not only to tighter gaps but also to significantly reduced running times. We now consider the column corresponding to cvxquad with FR in Table 5.1. We see that the algorithm fails (marked with a flag) on some instances due to memory shortage. Facial reduction indeed contributes to the reduction of the problem sizes; for example, we reduced the problem sizes in Table 5.1.

Summary
In this paper we have presented a robust numerical method for finding provably tight lower bounds for the convex optimization problem that determines the key rate of a QKD protocol. Our empirical evidence illustrates consistent, significant improvements in solution time and accuracy over previous methods. In particular, we solve many problems close to machine accuracy and provide theoretically provable, accurate lower bounds. This includes previously unsolved problems.
(See e.g., Table 5.1.) This paper used novel convex optimization techniques applied specifically to the QKD problem. This includes reformulations of the convex optimization problem that determines the key rate. The result is a regularized problem that avoids the need for previously used perturbations and the possible instabilities that result. Below, we give a summary of the contributions for the model reformulation and for the algorithm.

Summary of the Model Reformulation
We have reformulated, simplified, and stabilized the model for QKD key rate calculation through the sequence

Summary of Algorithm 1
Our algorithm, Algorithm 1, is based on a standard primal-dual interior-point approach applied to the FR-stabilized model. However, it differs in several ways.
1. We modify the primal feasibility to use a nullspace representation. Therefore, both primal and dual feasibility have a similar representation.
2. We use a projected Gauss-Newton search direction to account for the overdetermined least squares problem arising from the optimality conditions. This means we project the Gauss-Newton direction after substituting using the primal-dual linear feasibility equations.
3. We exploit the exact feasibility of the linear constraints after a step length of one for the Gauss-Newton method. Therefore, we attempt to take a primal and/or dual step length of one as soon as possible; exact feasibility then results.
4. We use a modified form of the dual to obtain a lower bound that is used, along with an upper bound from the objective function, to stop the algorithm when the duality gap is provably small. Our lower bound is a provable lower bound for the original problem, as it uses a feasible dual point to evaluate the dual value. This is needed, since an approximate primal optimal value is not exactly optimal.

Future Plans
There are still many improvements that can be made. Exact primal feasibility was quickly obtained and maintained throughout the iterations. However, accurate dual feasibility was difficult to maintain. This is most likely due to rounding errors in the numerical computation of the Hessian H f (ρ) when ρ is near the boundary of the positive semidefinite cone. This approximation can be improved by including a quasi-Newton approach, since we have accurate gradient evaluations. We maintain high accuracy results even in cases where the Jacobian was not full rank at the optimum; this appears to be due to the special data structures, and more theoretical analysis at the optimum can be done. In this work, we considered a model with the linear constraints Γ(ρ) = γ restricted to be equalities. While this model covers many interesting QKD protocols, there are scenarios where inequality constraints are needed, e.g., when using the flag-state squasher [7] to reduce the dimension. Moreover, there can be additional matrix inequality constraints when the dimension reduction method [8] is applied. It is interesting and important to address possible numerical instabilities introduced by those inequality constraints, as was done in this paper for the equality constraints.

Code Availability
The code developed in this work is currently available at this link. 8 It will be integrated into the open-source QKD security software project, which can be accessed via the link. 9
We now consider the first two terms in X,
This completes the proof (the induction steps are clear).

A.3 Derivatives for Quantum Relative Entropy under Positive Definite Assumptions
We can reformulate the quantum relative entropy function defined in the key rate optimization (2.2) as
Here, the linear map Z is added to the second term in (A.9) above, and the equality follows from Proposition 3.3.
In this section, we review the gradient (Fréchet derivative) and the image of the Hessian for the reformulated relative entropy function f defined in (A.9). We obtain the derivatives of f under the assumption that the matrix logarithm acts on positive definite matrices; this assumption is needed for differentiability. Note that the difficulty arising from the singularity is handled by using perturbations in [5,10]. This emphasizes the need for the regularization, as otherwise f in (A.9) is never differentiable. We avoid using perturbations in this paper by applying FR.
We now use the chain rule to derive the first- and second-order derivatives of the composition of a linear map and the entropy function. Then the gradient of g at ρ is
and the Hessian of g at ρ acting on ∆ρ is
where log′ denotes the Fréchet derivative of the matrix logarithm.
Proof. We first work on the first-order derivative.
Note that we used the fact that the directional derivative of the matrix logarithm at ρ in the direction ∆ρ is:
Similarly, the Hessian of g at ρ acting on ∆ρ can be obtained as follows.
Under the assumption that G(ρ) 0, we can use Lemma 3.4 and show that Z(G(ρ)) 0. Using Lemma A.3 and (3.5), we obtain the first and the second order derivatives of the objective function f in (A.9).
Corollary A.4. Suppose that ρ ∈ H n + and G(ρ) ≻ 0. Then the gradient of f at ρ is
The Hessian at ρ ∈ H n + acting on the direction ∆ρ ∈ H n is

A.4 Proof of Theorem 3.6

Proof. We provide an alternative, self-contained proof. We note that the key is finding an exposing vector for S R , i.e., Z Γ ⪰ 0 such that ⟨Z Γ , ρ⟩ = 0, ∀ρ ∈ S R ; see e.g., [13]. The standard theorem of the alternative for strict feasibility, Lemma 2.4, yields the following equalities for Z Γ :
It is equivalent to look at the smaller problem and find y so that
Since the reduced density operator constraint requires that θ j = Tr(ρ A Θ j ), we get
i.e., the exposing vector Z Θ = QR Θ Q † , for some R Θ . Conversely, we can set Z Θ = QQ † , R Θ = I, by the basis property of the Θ i ; i.e., the basis property means we can always find an appropriate y so that Σ j y j Θ j = QQ † . We get that rank Z Θ = n A − r. Therefore,
is an exposing vector, as desired, i.e., we have
Thus we get the conclusion that ρ = V RV † , as desired.
A.5 Proof of Theorem 4.1

Proof. The dual in Item 2 is obtained from a standard min-max argument; see [28, Chapter 5].
That strong duality holds comes from our regularization process, i.e., the existence of a Slater point; see [29,Chapter 8]. Item 3 is the standard optimality conditions for convex programming, where the dual feasibility 0 ∈ ∂f (ρ) + Γ † V (y) − Z holds from Theorem 3.14.

B Implementation Details
In this section we look at simplifications for evaluations of the objective function and its derivatives.

B.1 Matrix Representations of Derivatives
We now include matrix representations for the derivatives. Let A, B, C be given compatible matrices. If X is Hermitian, then the linear system AXB = C can be written as
Note that M^T is the transpose of M, i.e., without conjugation. Let g : R → R be a continuously differentiable function. The first divided difference h^[1](λ, µ) of g at λ, µ ∈ R is defined as
If D is a diagonal matrix with diagonal entries λ 1 , . . . , λ n , then we define h^[1](D) to be the symmetric n × n matrix with entries h^[1](λ i , λ j ).
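The matrix representation of AXB = C alluded to above is, for real matrices, presumably the standard identity vec(AXB) = (Bᵀ ⊗ A) vec(X), with the conjugate-free transpose as noted; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 2))

vec = lambda M: M.reshape(-1, order="F")     # stack columns

# vec(A X B) = (B^T kron A) vec(X); note B^T, no conjugation.
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
err = np.max(np.abs(lhs - rhs))
```
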

Lemma B.1. Let A : H s → H t be a linear map, ρ, ∆ρ ∈ H s , A(ρ) ∈ H t ++ , and f (ρ) = Tr(A(ρ) log A(ρ)). Let A(ρ) = U DU † be the spectral decomposition. Then the gradient of f at ρ and the Hessian of f at ρ in the direction ∆ρ are given by
and
where h^[1](D) is the first divided difference matrix of the logarithm function g(x) = ln x; see (B.1) and the paragraph below it.
In actual computation, it is more convenient to express the gradient and Hessian in matrix form. Let A be the matrix representation of the linear map A. The Hessian in matrix form is
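The divided-difference formula of Lemma B.1 can be checked numerically. The sketch below (generic random data, not the QKD maps) implements the Daleckii-Krein expression for the Fréchet derivative of the matrix logarithm, Dlog(A)[E] = U (h^[1](D) ∘ (UᵀEU)) Uᵀ, and compares it with a finite difference of `logm`:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(7)

def first_divided_difference_log(d):
    """Matrix with entries (log(d_i) - log(d_j)) / (d_i - d_j) and
    1/d_i on the diagonal (the limit as d_j -> d_i)."""
    di, dj = np.meshgrid(d, d, indexing="ij")
    with np.errstate(divide="ignore", invalid="ignore"):
        H = (np.log(di) - np.log(dj)) / (di - dj)
    idx = np.arange(len(d))
    H[idx, idx] = 1.0 / d
    return H

M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)               # positive definite
E = rng.standard_normal((4, 4)); E = (E + E.T) / 2

d, U = np.linalg.eigh(A)
H = first_divided_difference_log(d)
# Daleckii-Krein: entrywise (Hadamard) product in the eigenbasis.
dlog = U @ (H * (U.T @ E @ U)) @ U.T

# Compare against a central finite difference of logm.
h = 1e-6
fd = np.real((logm(A + h * E) - logm(A - h * E)) / (2 * h))
err = np.max(np.abs(dlog - fd))
```
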

B.2 Matrix Representation of the Second Projected Gauss-Newton System
We present the matrix representation of (4.6). Let N i be a basis element of null(Γ V ); then N † (w) has the representation Σ i w i N i . The LHS of (4.6) becomes
Applying Cvec 10 to the terms related to ∆v, we have the following matrix representation:
Similarly, applying Cvec to the terms related to ∆y, we have the following matrix representation:
The RHS of (4.6) becomes Cvec(F c µ + ZF p µ + F d µ + ∇ 2 f (ρ)F p µ ρ). Thus, d GN is obtained by solving the system

B.3 Implementation Heuristics
We now discuss the implementation details. This involves preprocessing for a nullspace representation and preconditioning. The details follow.

10 The operator vec maps a real matrix to a column vector by stacking the columns on top of one another. Cvec is a generalization of vec for complex matrices: it maps a complex matrix M to the column vector (vec(Re(M)); vec(Im(M))).
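A direct implementation of the Cvec operator as just defined, with a check that it preserves the real trace inner product Re Tr(A†B) = Cvec(A) · Cvec(B):

```python
import numpy as np

def cvec(M):
    """Cvec: map a complex matrix to a real column vector by stacking
    vec(Re(M)) on top of vec(Im(M)) (columns stacked, Fortran order)."""
    return np.concatenate([M.real.reshape(-1, order="F"),
                           M.imag.reshape(-1, order="F")])

rng = np.random.default_rng(8)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
v = cvec(M)

# Cvec is a real-linear isometry: real trace inner products are
# preserved, Re tr(A^dagger B) = cvec(A) . cvec(B).
A = M + M.conj().T            # Hermitian
B = 1j * (M - M.conj().T)     # Hermitian
ip_matrix = np.real(np.trace(A.conj().T @ B))
ip_vector = cvec(A) @ cvec(B)
```
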

B.3.1 Stopping Criteria
We terminate the algorithm when the optimality conditions (4.3) are approximately satisfied. Denote the residual in Theorem 4.5 by
In other words, for a predefined tolerance, we terminate the algorithm when relstopgap falls below the tolerance. If the algorithm computes lower and upper bounds of the optimal value throughout its execution, we may also terminate when the gap between the lower and upper bounds is within the tolerance. Finally, a common way to terminate an algorithm is to impose restrictions on the running time, e.g., an upper bound on the number of iterations or on the physical running time.
B.3.2 Nullspace Representation

We now obtain the nullspace representation:
The permutation of rows and columns is done in order to obtain a simple, near-triangular, well-conditioned B so that B −1 E can be computed simply, maintaining sparsity if possible. The permutation of the rows does not affect the problem and we can ignore it; however, the permutation of the columns cannot be ignored. We get the following
By abuse of notation, the Gauss-Newton direction d GN ∈ R^(n_ρ^2) can now be found from:

B.3.3 Preconditioning
The overdetermined linear system in (B.5) can be ill-conditioned. We use diagonal preconditioning, i.e., we let d i = ‖F c µ (e i )‖ for unit vectors e i , and then column precondition using
This diagonal preconditioning has been shown to be optimal among diagonal preconditioners for the so-called Ω-condition number [30]. It performs exceptionally well in our tests.
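A small illustration of column-norm diagonal preconditioning on generic random data (not the actual system (B.5)): the columns are scaled to unit norm, which here removes the artificial bad scaling.

```python
import numpy as np

rng = np.random.default_rng(9)

# Badly column-scaled overdetermined system, standing in for (B.5).
A = rng.standard_normal((40, 8)) @ np.diag(10.0 ** np.arange(8))

# Diagonal (column-norm) preconditioning: d_i = ||A e_i||, then scale
# each column by 1/d_i.
d = np.linalg.norm(A, axis=0)
A_pre = A / d

cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(A_pre)
```
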

B.3.4 Step Lengths
The GN method is based on a linearization that suggests a step length of one. However, long step methods are known to be more efficient in practice for interior point methods for linear SDPs.
Typically step lengths are found using backtracking to ensure primal-dual positive definiteness of ρ, Z.
In our case we do not have a linear objective, and we typically experience Maratos-type situations, i.e., fast convergence for primal feasibility but slow or no convergence for dual feasibility. However, we do have the gradient and Hessian of the objective function, and can therefore minimize the quadratic model of the objective function along the search direction ∆ρ; we begin the backtracking with this step. Moreover, we take a step length of one as soon as possible, and only afterwards do we allow step lengths larger than one. This means that exact primal feasibility holds for all further steps. This happens relatively early in our numerical tests.
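The quadratic-model step and the positive-definiteness backtracking can be sketched as below. The scalar formula is the exact minimizer of q(α) = α g·∆ρ + (α²/2) ∆ρ·H[∆ρ]; the shrink factor and the eigenvalue test are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def initial_step(grad_dir, hess_dir):
    """Minimizer of the quadratic model along the search direction.
    grad_dir = <grad f, drho>, hess_dir = <drho, Hess f [drho]>."""
    return -grad_dir / hess_dir if hess_dir > 0 else 1.0

def backtrack_pd(rho, drho, alpha, shrink=0.8):
    """Shrink alpha until rho + alpha*drho remains positive definite."""
    def is_pd(M):
        return np.all(np.linalg.eigvalsh((M + M.conj().T) / 2) > 0)
    while not is_pd(rho + alpha * drho):
        alpha *= shrink
    return alpha
```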

C Descriptions and Further Numerics of the Protocols
We briefly describe six QKD protocols where we compare our algorithm to other algorithms. We also describe how the data (γ in (2.2)) is generated. In addition, we remark on the level of numerical difficulty for each example. We consider four variants of the Bennett-Brassard 1984 (BB84) protocol [3] including single-photon based variants: entanglement-based (Appendix C.1), prepare-and-measure (Appendix C.2), measurement-device-independent [31] (Appendix C.3) and a coherent-state based variant with discrete global phase randomization [32] (Appendix C.6). We also consider the single-photon version of the twin-field QKD [33] (Appendix C.4). Another interesting protocol in our numerical tests is the quadrature phase-shift keying scheme of discrete-modulated continuous-variable QKD with heterodyne detection (Appendix C.5), see [26, Protocol 2]. These protocols correspond to numerical problems with levels of difficulty ranging from easy to difficult. In the descriptions below, we use the Dirac notation for quantum states, which are vectors in the underlying Hilbert space. We skip the description of some common classical postprocessing steps in a QKD protocol, like error correction and privacy amplification, since they are unimportant for our discussion here. We note that the descriptions of the linear maps G and Z follow directly from the protocol description via the simplification procedure explained in [26, Appendix A]. We omit those detailed descriptions here and note that explicit expressions for some of the protocols can also be found in [6, Appendix D].

C.1 Entanglement-Based BB84
We consider this protocol with a single-photon source and restrict our discussion to the qubit space. In the quantum communication phase, Alice and Bob each receive one half of a bipartite state, which is supposed to be a two-qubit maximally entangled state before Eve's tampering. Each party measures in the Z basis with probability p_z, or in the X basis with probability 1 − p_z. In the classical communication phase, they announce their basis choices for each round and perform sifting to keep those rounds where they both chose the same basis. In the end, they generate keys from both the Z and X bases.
In the simulation, we assume both bases have the same error rate, e_z = e_x = Q. In particular, Γ (in (2.2)) contains the Z-basis and X-basis error-rate constraints, as well as one coarse-grained constraint for each mismatched-basis-choice scenario; the latter ensures that Alice and Bob get completely uncorrelated outcomes in that case. In other words, the data γ are determined by Q. This test example is supposed to be numerically easy, since it involves the smallest possible size of ρ for QKD, namely four. Moreover, there is no reduced density operator constraint on ρ_A for this example. In Table 5.1, instances of this test example are labeled as ebBB84(p_z, Q) for different values of p_z and Q.
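One common way to realize a two-qubit state with equal Z- and X-basis error rate Q is a Werner-type mixture of the maximally entangled state |Φ+⟩ with the maximally mixed state; the sketch below is an illustrative simulation model of this kind, not necessarily the exact data-generation procedure used in the paper.

```python
import numpy as np

def ebBB84_rho(Q):
    """Two-qubit Werner-type state with Z- and X-basis error rate Q:
    rho = v |Phi+><Phi+| + (1 - v) I/4 with v = 1 - 2Q."""
    phi = np.zeros(4)
    phi[0] = phi[3] = 1 / np.sqrt(2)     # |Phi+> = (|00> + |11>)/sqrt(2)
    v = 1 - 2 * Q
    return v * np.outer(phi, phi) + (1 - v) * np.eye(4) / 4

def z_error_rate(rho):
    """Probability Alice's and Bob's Z outcomes disagree,
    i.e., the |01><01| and |10><10| diagonal entries."""
    return float(np.real(rho[1, 1] + rho[2, 2]))
```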

C.2 Prepare-and-Measure BB84
Another protocol example in our numerical tests is the prepare-and-measure version of BB84 with a single-photon source. In the quantum communication phase, Alice chooses the Z basis with probability p_z or the X basis with probability 1 − p_z. When she chooses the Z basis, she sends either |0⟩ or |1⟩ at random, where |0⟩ and |1⟩ are eigenstates of the Pauli σ_Z operator. When she chooses the X basis, she sends either |+⟩ or |−⟩ at random, where |±⟩ = (1/√2)(|0⟩ ± |1⟩). After Alice sends the state of her choice to Bob, Bob chooses to measure in the Z basis with probability p_z or the X basis with probability 1 − p_z. The rest of the protocol is exactly the same as the entanglement-based BB84 protocol described in Appendix C.1. We call {|0⟩, |1⟩, |+⟩, |−⟩} the BB84 states.
For the security analysis, we use the source-replacement scheme [16] to convert the protocol to an equivalent entanglement-based scheme. The main differences between this example and the one in Appendix C.1 are: (1) the dimension of Alice's system is four due to the source-replacement scheme, while it is two for the entanglement-based BB84; (2) there is a reduced density operator constraint on ρ_A, which is of size 4 and translates to 16 linear constraints. In this test example, the size of ρ is 8 and the size of G(ρ) is 32 before FR.
The data simulation is done in a similar way as that in the entanglement-based BB84 protocol, i.e., e x = e z = Q. In Table 5.1, instances of this test example are labeled as pmBB84(p z , Q).

C.3 Measurement-Device-Independent BB84
In the measurement-device-independent variant of BB84 with single-photon sources, Alice and Bob each prepare one of the four BB84 states (with probability p_z of choosing the Z basis). They both send their states to an untrusted third party, Charlie, for measurement. Ideally, Charlie performs a Bell-state measurement and announces the outcome. We consider a setup where Charlie only uses linear optics, and thus can only distinguish two of the four Bell states. In this protocol, Charlie announces either a successful Bell-state measurement or a failure; in the successful case, he also announces the Bell state. Therefore, there are three possible announcement outcomes. After the announcement, Alice and Bob perform basis sifting and discard rounds linked to unsuccessful events. They then generate keys from rounds where they both chose the Z basis and Charlie's announcement was one of the successful events.
As described in [4], for this measurement-device-independent type of protocol, the optimization variable ρ involves three parties, ρ_ABC. Here, the registers AB together play the role of A in the reduced density operator constraint set (2.5). The dimension of Alice's system is 4, and so is Bob's. The register C is a classical register that stores the announcement outcome; with three possible announcement outcomes, it is three-dimensional. In the data simulation, we assume that each qubit sent to Charlie goes through a depolarizing channel with depolarizing probability p.
In the numerical tests, we label instances of this protocol example as mdiBB84(p z , p). The size of ρ is 48 and that of G(ρ) is 96 before FR.

C.4 Twin-Field QKD
Like the previous protocol, this protocol also uses the measurement-device-independent setup. The exact protocol description can be found in [34, Protocol 1]. In this protocol, Alice and Bob each prepare a state |φ_q⟩_Aa = √q |0⟩_A |0⟩_a + √(1 − q) |1⟩_A |1⟩_a (respectively |φ_q⟩_Bb) with 0 ≤ q ≤ 1, where the register A is a qubit system and the register a is an optical mode with vacuum state |0⟩_a and single-photon state |1⟩_a. After they send their states to the intermediate station, Charlie at that station is supposed to perform single-photon interference of the two signal pulses and then announce the measurement outcome for each of the two detectors: click or no-click. Then Alice and Bob each perform the X-basis measurement on their local qubits with probability p_x, or the Z-basis measurement with probability 1 − p_x. They generate keys from rounds where they both choose the X basis and where Charlie announces a successful measurement outcome, that is, exactly one of the two detectors clicks.
In the simulation, we consider a lossy channel, with the transmittance 10 −0.02L , for the distance L in kilometers between Alice and Bob. We consider the symmetric scenario where Charlie is at an equal distance away from Alice and Bob. We also consider detector imperfections: each detector at Charlie's side has detector efficiency η d = 14.5% and dark count probability p d = 10 −8 . In instances of this protocol, data is generated as a function of: q that appears in the states |φ q Aa and |φ q Bb ; the total distance L in kilometers between Alice and Bob; and the probability of choosing X basis p x . The instances of this test example are labeled as TFQKD(q, L, p x ).

C.5 Discrete-Modulated Continuous-Variable QKD
The exact protocol description can be found in [26, Protocol 2]. We use the same simulation method described in [26, Equation (30)] to generate the data γ. In this protocol, Alice sends Bob one of four coherent states |αe^{iθ_j}⟩, where θ_j = jπ/2 for j = 0, 1, 2, 3. Bob performs the heterodyne measurement, i.e., he measures both the X- and P-quadratures after splitting the signal into two halves with a 50/50 beamsplitter. The first and second moments of the X- and P-quadratures are used to constrain ρ. The data simulation uses a phase-invariant Gaussian channel with transmittance η_t and excess noise ξ to generate those values. We use the same photon-number cutoff assumption used there to truncate the infinite-dimensional Hilbert space. For the calculation, it is typically sufficient to choose N_c ≥ 10 to minimize the effects of errors due to the truncation. For simplicity, we assume the detector at Bob's side is ideal. The channel transmittance η_t is related to the transmission distance L between Alice and Bob by η_t = 10^{−0.02L}.
Let N_c be the integer cutoff photon number. Before FR, the sizes of ρ and G(ρ) are 4(N_c + 1) and 16(N_c + 1), respectively. In Table 5.1, we label instances of this example as DMCV(N_c, L, ξ, α). When the noise ξ = 0, this problem can be solved analytically via physical arguments; detailed instructions for the analytical calculation can be found in [26, Appendix C]. We use this special case to demonstrate that our interior-point method can reproduce the analytical results to high precision.
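The photon-number cutoff amounts to keeping only the first N_c + 1 Fock coefficients of each coherent state. The following sketch constructs a truncated coherent state |α⟩ in the Fock basis; it illustrates why N_c ≥ 10 suffices for the small amplitudes used in DMCV protocols.

```python
import numpy as np
from math import factorial

def coherent_state_truncated(alpha, Nc):
    """Fock-basis coefficients <n|alpha> = e^{-|alpha|^2/2} alpha^n/sqrt(n!)
    truncated at photon number Nc. The vector is not renormalized, so
    1 - ||psi||^2 measures the truncation error directly."""
    coeffs = np.array([alpha**n / np.sqrt(factorial(n))
                       for n in range(Nc + 1)])
    return np.exp(-abs(alpha)**2 / 2) * coeffs
```

For amplitudes around |α| ≈ 0.5, the squared-norm deficit at N_c = 10 is far below numerical precision, consistent with the remark above that the truncation error is negligible.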

C.6 Discrete-Phase-Randomized BB84
We consider the phase-encoding BB84 protocol with c (a parameter) discrete global phases evenly spaced in [0, 2π) [32]. A detailed protocol description can also be found in [6, Sec. IV D]. In particular, each of the four BB84 states is realized by a two-mode coherent state |αe^{iθ}⟩_r |αe^{i(θ+φ_A)}⟩_s, where the first mode is the phase reference mode and the second mode encodes the private information. Here θ is the discretely randomized global phase, i.e., θ ∈ {2πℓ/c : ℓ = 0, ..., c − 1}. The relative phase for encoding is φ_A ∈ {jπ/2 : j = 0, 1, 2, 3}, where {0, π} corresponds to the Z basis and {π/2, 3π/2} corresponds to the X basis. The data simulation is done in exactly the same way as in [6, Section IV D], where we consider detector imperfections and a lossy channel with a misalignment error due to phase drift.
We remark that instances of this test example become more challenging as the number of discrete phases c increases, since the size of ρ is 12c and the size of G(ρ) is 48c before FR. In all instances of this protocol, we choose p_z = 0.5, and the data are simulated with detector efficiency η_d = 0.045, dark count probability p_d = 8.5 × 10^{−7} and a relative phase drift of 11°. The final key rate values are presented taking the error correction efficiency as 1.16. In the numerical tests, we label instances of this protocol as dprBB84(c, α, L).

Finally, we remark on the behavior of the Frank-Wolfe method when used without FR. Table 5.1 shows that two instances of the TFQKD protocol have vastly different running times; similar situations without FR also occur in the other tables presented here. The Frank-Wolfe method used and described in [5] adopts a two-step procedure. The running time of this method depends mainly on the number of iterations, since each iteration solves a linear SDP using CVX. There are two stopping conditions: the first depends on the gradient information; the second is the maximum number of iterations, currently set at 300. For the instances with short running times, termination typically occurred due to the gradient condition; the other instances stopped after reaching the maximum number of iterations. A likely reason for early termination is the perturbation approach used to preserve positive definiteness of the matrices involved in the objective function. We emphasize that early termination still leads to reliable (but pessimistically lower) key rates; more explanations can be found in [5]. As discussed in this paper, a general motivation for FR is to avoid the numerical instability associated with such perturbation approaches, i.e., without FR, we typically see erratic behavior due to the ill-posedness of the problems.