Quantum algorithms for Second-Order Cone Programming and Support Vector Machines

Second order cone programs (SOCPs) are a class of structured convex optimization problems that generalize linear programs. We present a quantum algorithm for SOCPs based on a quantum variant of the interior point method. Our algorithm outputs a classical solution to the SOCP with objective value $\epsilon$ close to the optimal in time $\widetilde{O} \left( n\sqrt{r} \frac{\zeta \kappa}{\delta^2} \log \left(1/\epsilon\right) \right)$ where $r$ is the rank and $n$ the dimension of the SOCP, $\delta$ bounds the distance from strict feasibility for the intermediate solutions, $\zeta$ is a parameter bounded by $\sqrt{n}$, and $\kappa$ is an upper bound on the condition number of matrices arising in the classical interior point method for SOCPs. We present applications to the support vector machine (SVM) problem in machine learning that reduces to SOCPs. We provide experimental evidence that the quantum algorithm achieves an asymptotic speedup over classical SVM algorithms with a running time $\widetilde{O}(n^{2.557})$ for random SVM instances. The best known classical algorithms for such instances have complexity $\widetilde{O} \left( n^{\omega+0.5}\log(1/\epsilon) \right)$, where $\omega$ is the matrix multiplication exponent that has a theoretical value of around $2.373$, but is closer to $3$ in practice.


Introduction
Convex optimization is one of the central areas of study in computer science and mathematical optimization. The reason for the great importance of convex optimization is twofold. Firstly, starting with the seminal works of Khachiyan [25] and Karmarkar [18], efficient algorithms have been developed for a large family of convex optimization problems over the last few decades. Secondly, convex optimization has many real world applications and many optimization problems that arise in practice can be reduced to convex optimization [8].
There are three main classes of structured convex optimization problems: linear programs (LP), semidefinite programs (SDP), and second-order conic programs (SOCP). The fastest (classical) algorithms for these problems belong to the family of interior-point methods (IPM). Interior point methods are iterative algorithms where the main computation in each step is the solution of a system of linear equations whose size depends on the dimension of the optimization problem. The size of structured optimization problems that can be solved in practice is therefore limited by the efficiency of linear system solvers -on a single computer, most open-source and commercial solvers can handle dense problems with up to tens of thousands of constraints and variables, or sparse problems with the same number of nonzero entries [30,31].
In recent years, there has been a tremendous interest in quantum linear algebra algorithms following the breakthrough algorithm of Harrow, Hassidim and Lloyd [16]. Quantum computers are known to offer significant, even exponential speedups [16] for problems related to linear algebra and machine learning, including applications to principal components analysis [28], clustering [28,19], classification [27,20], least squares regression [21,9] and recommendation systems [22].
This raises the natural question of whether quantum linear algebra can be used to speed up interior point algorithms for convex optimization. The recent work [23] proposed a quantum interior point method for LPs and SDPs and developed a framework in which the classical linear system solvers in interior point methods can be replaced by quantum linear algebra algorithms. In this work, we extend the results of [23] and develop a quantum interior point method for second order cone programs (SOCPs).
Second order cone programs are a family of structured optimization problems that have complexity intermediate between LPs and SDPs. They offer an algorithmic advantage over SDPs as the linear systems that arise in the Interior Point Method for SOCPs are of smaller size than those for general SDPs. In fact, the classical complexity for SOCP algorithms is close to that for LP algorithms. Our results indicate that this remains true in the quantum setting, that is the quantum linear systems arising in the interior point method for SOCPs are easier for quantum algorithms than the linear systems for general SDPs considered in [23].
SOCPs are also interesting from a theoretical perspective, as the interior point method for LPs, SOCPs and SDPs can be analyzed in a unified manner using the machinery of Euclidean Jordan algebras [14]. An important contribution of our work is to present a similar unified analysis for quantum interior point methods, which is equivalent to analyzing the classical interior point method where the linear systems are only solved approximately with an 2 norm error. Our analysis is not purely Jordan-algebraic like that of [14], however it can still be adapted to analyze quantum interior point methods for LPs and SDPs.
The main advantage of SOCPs from the applications perspective is their high expressive power. Many problems that are traditionally formulated as SDPs can in fact be reduced to SOCPs, an extensive list of such problems can be found in [2]. An important special case of SOCPs is the convex quadratic programming (QP) problem, which in turn includes the support vector machine (SVM) problem in machine learning [11] and the portfolio optimization problem in computational finance [29,8] as special cases. The SVM and portfolio optimization problems are important in practice and have recently been proposed as applications for quantum computers [24,34,35].
However, these applications consider the special cases of the 2 -(or least-squares-)SVM [38] and the unconstrained portfolio optimization problem, that reduce to a single system of linear equations. The 1 -regularized SVM algorithm is widely used in machine learning as it finds a robust classifier that maximizes the error margin. Similarly, the constrained portfolio optimization is widely applicable in computational finance as it is able to find optimal portfolios subject to complex budget constraints. The ( 1 -)SVM and constrained portfolio optimization problems reduce to SOCPs, our algorithm can therefore be regarded as the first specialized quantum algorithm for these problems.
We provide experimental evidence to demonstrate that our quantum SVM algorithm indeed achieves an asymptotic speedup. For suitably constructed 'random' SVM instances, our experiments indicate that the quantum SVM algorithm has running time O(n 2.557 ) which is an asymptotic speedup over classical SVM algorithms, which have complexity Ω(n 3 ) for such instances.
The significance of these experimental results is two-fold. First, they demonstrate that quantum interior point methods can achieve significant speedups for optimization problems arising in practice, this was not clear apriori from the results in [19] where the running time depends on the condition number of intermediate matrices arising in the interior point method for which it is hard to prove worst case upper bounds. It suggests that one may hope to obtain significant polynomial speedups for other real-world optimization problems using quantum opti-mization methods. Second, support vector machines are one of the most well studied classifiers in machine learning, it is therefore significant to have an asymptotically faster quantum SVM algorithm. Moreover, our SVM algorithm has the same inputs and outputs as that for classical SVMs and is therefore directly comparable to them. This can lead to further developments at the interface of quantum computing and machine learning, for example that of quantum kernel methods.

Our Results
In order to state our results more precisely, we first introduce second order cone programs (SOCPs). An SOCP is an optimization problem over the product of Lorentz cones L k which are defined as, The SOCP in the standard form is the following optimization problem: The inputs for the problem are the r constraint matrices The running time for SOCP algorithms is stated in terms of the parameters r and n := i∈[r] n i . The rank r determines the number of SOCP constraints while n is the dimension of the solution vector. The best known classical algorithm for SOCP is based on the interior-point method [6], and has complexity O √ rn ω log(n/ ) , where ω the matrix multiplication exponent (approximately equal to 2.373 in theory [26], and equal to 3 in practical implementations), and solutions obtained are within of the optimal value. It is worth noting that very recently an optimal O n ω log(n/ ) algorithm has been found for linear programs [10], which are a special case of SOCPs.
Our main result is a quantum interior point method for SOCPs with the following running time and approximation guarantees.
Then, there is a quantum Algorithm that outputs a solution x i ∈ L n i that achieves an objective value that is within of the optimal value in time, Here ζ ≤ √ n is a factor that appears in quantum linear system solvers and κ is an upper bound on the condition number of the matrices arising in the interior point method for SOCPs. Similarly, the parameter δ is a lower bound on the distance of the intermediate iterates to the boundary of the respective cones.
Let us make some observations about the running time of the quantum algorithm. Unlike the classical algorithm that has a logarithmic dependence on the precision, the quantum algorithm scales as O(1/δ 2 ). This is due to the use of quantum tomography to recover classical data, the quantum algorithm is better suited to recover low accuracy solutions. Since the factor ζ is bounded by √ n (and is much less in practice), the quantum algorithm with a worst case running time of O n 1.5 √ r · κ δ 2 can achieve a significant speedup over the best known classical algorithms if the condition number κ is bounded, and the required precision is not too high. Furthermore, theoretical work on optimization [12] suggests that κ = O(1/δ), thus the quantum algorithm can be viewed as having worst case running time O n 1.5 √ r δ 3 . We simulated our algorithm on random SVM instances in order to assess its performance and running time. It turns out that for random SVMs trained using the interior point method, the best classification accuracy and generalization errors are achieved for a modest value of ≈ 0.1 over a large range of problem sizes. For this optimal value of = 0.1 for these random instances, it turns out that the quantity κζ/δ 2 that determines the running time grows only slightly faster than O(n). The estimated running time for the quantum SVM algorithm is O(n 2.557 ), where the exponent was obtained by fitting a curve to running times for a large number of random SVMs with varying size. Further, a standard statistical analysis shows that the 95% confidence interval for the exponent is [2.503, 2.610], supporting the claim that our quantum SVM algorithm achieves an asymptotic speedup over classical SVM algorithms.

Techniques
Our quantum interior-point method for SOCP is based both on the techniques from [23] as well as classical SOCP IPMs from [32,2]. In general, structured optimization problems (LPs, SOCPs and SDPs) can be viewed as having the following form: where K is some "efficiently representable" cone, ·, · the corresponding inner product, and A is a linear operator on K. In order to be "efficiently representable", the cone K is considered to be a direct product K = K 1 × · · · × K r of r basic cones. A basic cone is often taken to be of one of the following three types: , the cone of nonnegative real numbers, If all cones K i , i ∈ [r] are of the same type, the optimization problem is a linear program (LP), second-order cone program (SOCP), and semidefinite program (SDP), respectively. In particular, we get the following three problems corresponding to the case of LPs, SOCPs and SDPs respectively.
Here, the notation x = [x 1 ; . . . ; x r ] means that the vector x is obtained by (vertically) concatenating the vectors x i -we say that x is a block-vector with blocks x i , i ∈ [r]. It is well-known [8] that every LP can be expressed as an SOCP, and every SOCP can be expressed as an SDP. Thus, SOCPs are a sort of "middle ground" between LPs and SDPs. The fastest known algorithms for these problems are based on a family of algorithms known as interior point methods, which are used for both commercial [3] and open-source solvers [7,13,39]). Interior point algorithms can be viewed as an application of Newton's method, the main computation in each iteration is spent on the solution linear systems that are obtained by linearizing certain non-linear optimality constraints. This suggests that quantum linear algebra algorithms [16] can be used to obtain speedups for optimization using interior point methods.
However, solving a quantum linear system differs from a classical solution as it is not possible to write down all n coordinates of the solution vector x ∈ R n for linear system Ax = b in time o(n). Instead, these algorithms encode vectors as quantum states, so that where we write |i for the joint log 2 (n) -qubit state corresponding to log 2 (n) -bit binary expansion of i. Then, the solution they output is a quantum state |φ close to A −1 b . In order to obtain a classical solution, we perform tomography on |φ , and obtain a classical vector x that is close to |φ , so that finally we have a guarantee x − x ≤ x for some > 0. A quantum interior point method for LPs and SDPs based on quantum linear system solvers and an efficient tomography algorithm was given in [23].
The main technical contribution of this paper is the analysis of an approximate IPM algorithm for SOCP, which assumes that all linear systems are solved up to a relative error . Although a similar analysis has been done by [23] for LPs and SDPs, we feel that our analysis of the SOCP case is especially interesting, since it uses Euclidean Jordan algebras to underscore the similarities between SOCPs on one hand, and LPs and SDPs on the other. Apart from [23], our analysis is inspired by the analysis of a classical SOCP IPM from [32], and uses [2] as a dictionary for translating concepts between the algebra of Hermitian matrices and the Jordan algebra of the second-order cone.
In the classical case, it has been shown in [14] that Euclidean Jordan algebras capture all important features of conic programs, and thus general IPM can be expressed in purely Jordanalgebraic terms. Here, we do not strive for such generality, and focus on the case when the cone under consideration is a product of second-order cones. Still, by comparing this analysis with classical [6] and quantum [23] analyses of LP and SDP interior point methods, we see that only minimal changes are required cover the LP and SDP cases. Indeed, we point out the appropriate analogies throughout the paper.
The rest of the paper is organized as follows. First, in Section 2, we introduce the necessary background on Jordan algebras and quantum linear algebra algorithms. Then, in Section 3 we present second-order conic programming, and a classical IPM for solving it. The main technical results are contained in Section 4, where we present our quantum IPM for SOCP, and analyze its runtime and convergence guarantees. In Section 5, we use our algorithm to train a Support Vector Machine (SVM) binary classifier. We present some numerical results that demonstrate the performance of our algorithm when applied to real-world data. Finally, in Section 5.2, we present some concluding remarks as well as possible directions for future research.

Preliminaries
The goal of this section is to introduce the technical framework necessary to follow the results in Sections 3 and 4. In particular, the definitions of the Jordan product • and the matrix representation Arw(x) (and their block extensions) are necessary for understanding the algorithm itself, whereas the rest is used only for the analysis in Section 4. On the quantum linear algebra side, we give precise meaning to the statement "linear systems can be solved quantumly in polylogarithmic time", and present the performance and correctness guarantees of the relevant algorithms.

Euclidean Jordan algebras
Jordan algebras were originally developed to formalize the notion of an algebra of observables in quantum mechanics [17,1], but they have since been applied to many other areas, most interestingly to provide a unified theory of IPMs for representable symmetric cones [36]. In this report, we will not strive for such generality, and will instead focus on SOCPs and the Lorentz cone. Still, most of the results in this section have obvious counterparts in the algebra of Hermitian matrices, to the point that the corresponding claim can be obtained using wordby-word translation.
The main object under consideration is the n-dimensional Lorentz cone, defined for n ≥ 0 as We think of the elements of L n as being "positive" (just like positive semidefinite matrices), since for n = 0, L 0 = R + is exactly the set of nonnegative real numbers. The Jordan product of two vectors (x 0 , x) ∈ R n+1 and (y 0 , y) ∈ R n+1 is defined as , and has the identity element e = 1 0 n , where 0 n is the column vector of n zeros. This product is the analogue to the matrix product X · Y , however, it is commutative and non-associative. In the special case of x • · · · • x, the order of operations does not matter, so we can unambiguously define , and we even have Arw(e) = I, as well as an alternative definition of x • y: x • y = Arw(x)y = Arw(x) Arw(y)e.
What makes the structure above particularly interesting is the fact that for any vector, we can define its spectral decomposition in a way the agrees with our intuition from the algebra of Hermitian matrices: We do this by noting that for all x, we have so we can define the two eigenvalues and eigenvectors of x as Thus, using the notation from (5), we can rewrite (4) as x = λ 1 c 1 +λ 2 c 2 . The set of eigenvectors {c 1 , c 2 } is called the Jordan frame of x, and satisfies several properties: Proposition 1 (Properties of Jordan frames). Let x ∈ R n+1 and let {c 1 , c 2 } be its Jordan frame. Then, the following holds: On the other hand, just like a given matrix is positive (semi)definite if and only if all of its eigenvalues are positive (nonnegative), a similar result holds for L n and int L n (the Lorentz cone and its interior): Then, the following holds: 1. x ∈ L n if and only if λ 1 ≥ 0 and λ 2 ≥ 0. 2.
x ∈ int L n if and only if λ 1 > 0 and λ 2 > 0. Now, using this decomposition, we can define arbitrary real powers x p for p ∈ R as x p := λ p 1 c 1 + λ p 2 c 2 , and in particular the "inverse" and the "square root" Moreover, we can also define some operator norms, namely the Frobenius and the spectral one: It is worth noting that both these norms and the powers x p can be computed in exactly the same way for matrices, given their eigenvalue decompositions. Finally, the analysis in Section 4 requires an analogue to the operation Y → XY X. It turns out that for this we need another matrix representation (quadratic representation) Q x , defined as Now, the matrix-vector product Q x y will behave as the quantity XY X. To simplify the notation, we also define the matrix T x := Q x 1/2 . The definitions that we introduced so far are suitable for dealing with a single constraint x ∈ L n . For dealing with multiple constraints x 1 ∈ L n 1 , . . . , x r ∈ L nr , we need to deal with block-vectors x = (x 1 ; x 2 ; . . . ; x r ) and y = (y 1 ; y 2 ; . . . ; y r ). We call the number of blocks r the rank of the vector (thus, up to now, we were only considering rank-1 vectors). Now, we extend all our definitions to rank-r vectors.
2. The matrix representations Arw(x) and Q x are the block-diagonal matrices containing the representations of the blocks: 3.
x has 2r eigenvalues (with multiplicities) -the union of the eigenvalues of the blocks x i . The eigenvectors of x corresponding to block i contain the eigenvectors of x i as block i, and are zero everywhere else. 4. The identity element is e = (e 1 ; . . . ; e r ), where e i 's are the identity elements for the corresponding blocks.
Thus, all things defined using eigenvalues can also be defined for rank-r vectors: 1. The norms are extended as F and x 2 := max i x i 2 , and 2. Powers are computed blockwise as x p := (x p 1 ; . . . ; x p r ) whenever the corresponding blocks are defined.

Quantum linear algebra
As it was touched upon in the introduction, in order to "solve linear systems in polylogarithmic time", we need to change our computational model, and restate the problem in a way that makes this possible. Namely, given a matrix A and a vector b, we construct the quantum state A −1 b (using the notation from (3)). Once we have the state A −1 b , we can use it in further computations, sample from the corresponding discrete distribution (this forms the basis of the quantum recommendation system algorithm [22]), or perform tomography on it to recover the underlying classical vector. Here, we clarify how we prepare classical vectors and matrices for use within a quantum algorithm, as well as how we get classical data back, once the quantum computation is done.
In this section, we assume that we want to obtain a (classical) vector x ∈ R n that satisfies We also assume that A is symmetric, since otherwise we can work with its symmetrized version sym(A) = 0 A A T 0 . The quantum linear system solvers from [9,15] require access to an efficient block encoding of A, which is defined as follows: Furthermore, U needs to be implemented efficiently, i.e. using an -qubit quantum circuit of depth (poly)logarithmic in n. Such a circuit would allow us to efficiently create states |A i corresponding to rows or columns of A. Moreover, we need to be able to construct such a data structure efficiently from the classical description of A. It turns out that we are able to fulfill both of these requirements using a data structure built on top of QRAM [22].
Theorem 2 (Block encodings using QRAM [22,21]). There exist QRAM data structures for storing vectors v i ∈ R n , i ∈ [m] and matrices A ∈ R n×n such that with access to these data structures one can do the following: In other words, the unitary |i |0 → |i |v i can be implemented efficiently.

A (ζ(A), 2 log n) unitary block encoding for
Moreover, this block encoding can be constructed in a single pass over the matrix A, and it can be updated in O(log 2 n) time per entry.
Here |i is the notation for the log(m) qubit state corresponding to the binary expansion of i. The QRAM can be thought of as the quantum analogue to RAM, i.e. an array [b (1) , . . . , b (m) ] of w-bit bitstrings, whose elements we can access in poly-logarithmic time given their address (position in the array). More precisely, QRAM is just an efficient implementation of the unitary transformation |i |0 ⊗w → |i b Nevertheless, from now on, we will also refer to storing vectors and matrices in QRAM, meaning that we use the data structure from Theorem 2. Note that this is the same quantum oracle model that has been used to solve SDPs in [23] and [4]. Once we have these block encodings, we may use them to perform linear algebra:

For
Finally, in order to recover classical information from the outputs of a linear system solver, we require an efficient procedure for quantum state tomography. The tomography procedure is linear in the dimension of the quantum state.
Theorem 4 (Efficient vector state tomography, [23]). There exists an algorithm that given a procedure for constructing |x (i.e. a unitary mapping U : |0 → |x in time T U ) and precision δ > 0 produces an estimate Of course, repeating this algorithm O(1) times allows us to increase the success probability to at least 1 − 1/ poly(n). Putting Theorems 2, 3 and 4 together, assuming that A and b are already in QRAM, we obtain that the complexity for getting a classical solution of the linear system Ax = b with error δ is O n · κζ δ 2 . For well-conditioned matrices, this presents a significant improvement over O(n ω ) (or, in practice, O(n 3 )) needed for solving linear systems classically, especially when n is large and the desired precision is not too high.

Second-order conic programming
We introduce the SOCP in this section, and describe the basic classical IPM that we use as in order to develop its quantum counterpart. Formally, a SOCP in its standard form is the following optimization problem: Concatenating the vectors (x 1 ; . . . ; x r ) =: x, (c 1 ; . . . ; c r ) =: c, and concatenating the matrices [A (1) · · · A (r) ] =: A horizontally, we can write the SOCP more compactly as an optimization problem over the product of the corresponding Lorentz cones, L := L n 1 × · · · × L nr : The problem (8) is the SOCP primal, and (9) is its corresponding dual. A solution (x, y, s) satisfying the constraints of both (8) and (9) is feasible, and if in addition it satisfies x ∈ int L and s ∈ int L, it is strictly feasible. If at least one constraint of (8) or (9), is violated, the solution is infeasible. Our goal is to work only with strictly feasible solutions, as they are less likely to become infeasible if corrupted by noise. As it is often the case when describing IPMs, we assume that there exists a strictly feasible solution, since there are well-known methods for reducing a feasible but not strictly feasible problem to one with a strictly feasible solution [8].
Just like the gradient of a function vanishes at its (unconstrained) local optimum, similar optimality conditions exist for SOCP as well -at the optimal solution (x * , y * , s * ), the Karush-Kuhn-Tucker (KKT) conditions are satisfied: x * T s * = 0 (complementary slackness).
Since SOCPs are convex, these conditions are both necessary and sufficient for optimality. Convex duality theory tells us that weak duality holds, i.e. that at a feasible solution (x, y, s) the dual objective is bounded by the primal one: c T x ≥ b T y. The difference between these two values is called the duality gap, it is denoted by µ, and usually normalized by a factor of 1 r : Therefore, at the optimum, by (10), the duality gap is 0, and thus strong duality holds. Furthermore, note that together with x * , s * ∈ L, the complementary slackness is equivalent to x * • s * = 0. Unfortunately, there is no fast and easy method for solving (10) directly (indeed, the complementary slackness condition prevents it from being strictly feasible). What makes an IPM work is that it solves a series of strictly feasible problems of increasing difficulty, so that their solutions converge to a solution of (10). These problems are designed so that at the beginning, "the only thing that matters" is that they are strictly feasible no matter the objective value, and as the algorithm progresses, the importance of the objective increases, while the importance of strict feasibility shrinks. This is done by defining a sequence of barrier problems for all ν > 0. Their name comes from the barrier function L(z), whose purpose is to approach ∞ as z approaches the boundary of the cone L. For a single Lorentz cone L n , it is defined as and for the product cone L = L n 1 × · · · × L nr , it is the sum of the blockwise barriers The primal-dual pair (11) also has KKT optimality conditions similar to (10): x • s = νe.
We refer to the set of solutions of (12) for ν ≥ 0 as the central path. The goal of all IPMs is to trace the central path in the direction ν → 0. Although this can not be done exactly, it turns out that good theoretical guarantees can be proved as long as the iterates stay sufficiently close to the central path. We define the distance from the central path as d(x, s, ν) = T x s − νe F , so the corresponding η-neighborhood is given by This set can be thought of as the set of all points that are η-close to the central path at parameter ν. Intuitively, the elements of N η (ν) have duality gap close to ν, so decreasing the central path parameter by a certain factor is equivalent to decreasing the duality gap by the same factor.
In the classical short-step IPM [32], we move along the central path by iteratively decreasing the duality gap by a factor of σ := 1− χ √ r (for some constant χ > 0) at each step. More precisely, in each step we apply to our current iterate (x, y, s) one round of Newton's method for solving the system (12) with ν = σµ. By linearizing the last constraint of (12), we obtain the following linear system (Newton system) for computing the update (∆x, ∆y, ∆s): Since the system above is just a linearization of the true central path condition, we do not expect to follow the central path exactly. Nevertheless, it can be shown that the solutions generated by the interior point method remain in an η-neighborhood of the central path, for some constant η > 0. The classical interior point method for SOCP is summarized in Algorithm 1.

Algorithm 1 The interior point method for SOCPs
Require: Matrix A and vectors b, c in memory, precision > 0.

Repeat the following steps for
(a) Solve the Newton system (13) to get ∆x, ∆y, ∆s.

Output (x, y, s).
It can be shown that this algorithm halves the duality gap every O( √ r) iterations, so indeed, after O( √ r log(µ 0 / )) it will converge to a (feasible) solution with duality gap at most (given that the initial duality gap was µ 0 ).

Algorithm 2 The quantum interior point method for SOCPs
Require: Matrix A and vectors b, c in QRAM, parameters T, δ > 0.
2. Repeat the following steps for T iterations.
(a) Compute the vector σµe − x • s classically and store it in QRAM.

Update solution
(d) Update x ← x + ∆x, s ← s + ∆s and store in QRAM. Update µ ← 1 r x T s.

A quantum interior-point method
Having introduced the classical IPM for SOCP, we can finally introduce our quantum IPM (Algorithm 2). In a way, it is similar to the SDP solver from [23] since we apply a quantum linear system solver to get the solutions of the Newton system (13) as quantum states and then perform tomography to recover the solutions. However, it differs from the SDP solver as the construction of block encodings for the Newton matrix for the SOCP case is much simpler than that for the general SDP case. A procedure for constructing these block encodings is presented in Appendix A. We now state our main result about a single iteration of an approximate IPM, from which the runtime and convergence results follow as consequences: Theorem 5. Let χ = η = 0.01 and ξ = 0.001 be positive constants and let (x, y, s) be solutions of (8) and (9) with µ = 1 r x T s and d(x, s, µ) ≤ ηµ. Then, the Newton system (13) has a unique solution (∆x, ∆y, ∆s). Let ∆x, ∆s be approximate solutions of (13) that satisfy where T x and T x are the square roots of the quadratic representation matrices in equation (7). If we let x next := x + ∆x and s next := s + ∆s, the following holds: 1. The updated solution is strictly feasible, i.e. x next ∈ int L and s next ∈ int L.
2. The updated solution satisfies d(x next , s next , µ) ≤ ηµ and 1 r x T next s next = µ for µ = σµ, σ = 1 − α √ r and a constant 0 < α ≤ χ. (13) is the same as in the classical case, we can reuse Theorem 1 from [32] for the uniqueness part of Theorem 5. Therefore, we just need to prove the two parts about strict feasibility and improving the duality gap.

Technical results
Before proving Theorem 5, however, we need several technical results that will be used throughout its proof in section 4.2. We start with a few general facts about vectors, matrices, and their relationship with the Jordan product •.
2. The spectral norm is less than the Frobenius norm:

If
A is a matrix with minimum and maximum singular values σ min and σ max respectively, then the norm Ax is bounded as σ min x ≤ Ax ≤ σ max x .
5. The following submultiplicativity property holds: The proofs of these statements are analogous to those for the corresponding statements for matrices. Nevertheless there are few remarks on how these statements are different from the matrix case. First, the vector spectral norm · 2 is not actually a norm, since there exist nonzero vectors outside L which have zero norm. It is, however, still bounded by the Frobenius norm (just like in the matrix case), which is in fact a proper norm. Secondly, the minimum eigenvalue bound also holds for matrix spectral norms, with the exact same statement. Finally, the last property is reminiscent of the matrix submultiplicativity property A · B F ≤ A 2 B F .
We now proceed with several well-known properties of the quadratic representation Q x and T x = Q x 1/2 . Claim 2 (Properties of Q x ). [2] Let x ∈ int L. Then, the following holds: 1. Q x e = x 2 , and thus T x e = x.
x , and more generally Q x p = Q p x for all p ∈ R.
4. Q x preserves L, i.e. Q x (L) = L and Q x (int L) = int L As a consequence of the last part of this claim, we also have Q x 2 ≤ x 2 F as well as T x 2 ≤ x F . Finally, we state a lemma about various properties that hold in a neighborhood of the central path.
Lemma 1 (Properties of the central path). Let ν > 0 be arbitrary and let x, s ∈ int L. Then, x and s satisfy the following properties: 1. For all ν > 0, the duality gap and distance from the central path are related as 2. The distance from the central path is symmetric in its arguments i.e. d(x, s, ν) = d(s, x, ν).

Let
A proof of Lemma 1 is included in Appendix B for completeness. The first and the third parts of this lemma have nice intuitive interpretations: The former asserts that the duality gap is roughly equal to the central path parameter, and the latter would be trivially true if • was associative and we had x • s = µe.

A single IPM iteration
The results above allow us to introduce commutative scalings, a method commonly used for proving IPM convergence. Our analysis is inspired by the general case analysis from [6], the derived SDP analysis from [23], and uses some technical results from the SOCP analysis in [32]. The proof of Theorem 5 consists of three main steps: 1. Rescaling x and s so that they share the same Jordan frame.
2. Bounding the norms of ∆x and ∆s, and proving that x + ∆x and s + ∆s are still strictly feasible (in the sense of belonging to int L).
3. Proving that the new solution (x + ∆x, y + ∆y, s + ∆s) is in the η-neighborhood of the central path, and the duality gap/central path parameter have decreased by a factor of 1 − α/ √ n, where α is constant.

Rescaling x and s
As in the case of SDPs, the first step of the proof uses the symmetries of the Lorentz cone to perform a commutative scaling, that is to reduce the analysis to the case when x and s share the same Jordan frame. Although • is commutative by definition, two vectors sharing a Jordan frame are akin to two matrices sharing a system of eigenvectors, and thus commuting (some authors [2] say that the vectors operator commute in this case). The easiest way to achieve this is to scale by T x = Q x 1/2 and µ −1 , i.e. to change our variables as x → x := T −1 x x = e and s → s := µ −1 T x s.
Note that for convenience, we have also rescaled the duality gap to 1. Recall also that in the matrix case, the equivalent of this scaling was X → X −1/2 XX −1/2 = I and S → µ −1 X 1/2 SX 1/2 . We use the notation z to denote the appropriately-scaled vector z, so that we have For approximate quantities (e.g. the ones obtained using tomography, or any other approximate linear system solver), we use the notation · , so that the increments become ∆x and ∆s, and their scaled counterparts are ∆x := T −1 x ∆x and ∆s := µ −1 T x ∆s. Finally, we denote the scaled version of the next iterate as x next := e + ∆x and s next := s + ∆s . Now, we see that the statement of Theorem 5 implies the following bounds on ∆x − ∆x F and ∆s − ∆s Throughout the analysis, we will make use of several constants: η > 0 is the distance from the central path, i.e. we ensure that our iterates stay in the η-neighborhood N η of the central path. The constant σ = 1 − χ/ √ r is the factor by which we aim to decrease our duality gap, for some constant χ > 0. Finally constant ξ > 0 is the approximation error for the scaled increments ∆x , ∆s . Having this notation in mind, we can state several facts about the relation between the duality gap and the central path distance for the original and scaled vectors.
Moreover, if x, s are some vectors scaled using the same scaling, the duality gap between their unscaled counterparts can be recovered as µ r x T s. At this point, we claim that it suffices to prove the two parts of Theorem 5 in the scaled case. Namely, assuming that x next ∈ int L and s next ∈ int L, by construction and Claim 2, we get x next = T x x next and s next = T x −1 s next and thus x next , s next ∈ int L. On the other hand, if µd(x next , s next , σ) ≤ ηµ, then d(x next , s next , µ) ≤ ηµ follows by Claim 3. Similarly, from 1 r x T next s next = σ, we also get 1 r x T next s next = µ. We conclude this part with two technical results from [32], that use the auxiliary matrix R xs defined as R xs := T x Arw(x) −1 Arw(s)T x . These results are useful for the later parts of the proof of Theorem 5.

Claim 4 ([32], Lemma 3).
Let η be the distance from the central path, and let ν > 0 be arbitrary. Then, R xs is bounded as R xs − νI ≤ 3ην. Lemma 5,proof). Let µ be the duality gap. Then, the scaled increment ∆s is

Maintaining strict feasibility
The main tool for showing that strict feasibility is conserved is the following bound on the increments ∆x and ∆s : Lemma 6). Let η be the distance from the central path and let µ be the duality gap. Then, we have the following bounds for the scaled direction: Moreover, if we substitute σ with its actual value 1 − χ/ √ r, we get Θ = √ 2η 2 +4χ 2 1−3η , which we can make arbitrarily small by tuning the constants. Now, we can immediately use this result to prove x next , s next ∈ int L. Lemma 3. Let η = χ = 0.01 and ξ = 0.001. Then, x next and s next are strictly feasible, i.e. x next , s next ∈ int L.

Maintaining closeness to central path
Finally, we move on to the most technical part of the proof of Theorem 5, where we prove that x next , s next is still close to the central path, and the duality gap has decreased by a constant factor. We split this into two lemmas.
Proof. By Claim 3, the distance of the next iterate from the central path is d(x next , s next , σ) = T x next s next − σe F , and we can bound it from above as So, it is enough to bound z F := s next − σ · x −1 next F from above, since We split z as and we bound z 1 F , z 2 F , and z 3 F separately.
1. By the triangle inequality, z 1 F ≤ s + ∆s − σe + ∆x F + 2ξ. Furthermore, after substituting ∆s from Claim 5, we get Using the bound for µI − R xs from Claim 4 as well as the bound for ∆x F from Lemma 2, we obtain where we used the bound from Lemma 2 again.

F
. For this, we use the submul-tiplicativity of · F from Claim 1: . Now, we have the bound (e + ∆x ) −1 2 ≤ 1 1− ∆x F and similarly (e + ∆x ) −1 Using this, we can bound z 3 F : If we let λ i be the eigenvalues of ∆x , then by Lemma 2, we have Combining all bound from above, we obtain Finally, if we plug in χ = 0.01, η = 0.01, ξ = 0.001, we get Now, we prove that the duality gap decreases.
Lemma 5. For the same constants, the updated solution satisfies 1 r x T next s next = 1 − α √ r for α = 0.005.
Proof. Since x next and s next are scaled quantities, the duality gap between their unscaled counterparts is µ r x T next s next . Applying Lemma 1 (and Claim 3) with ν = σµ and x next , s next , we obtain the upper bound µx T next s next ≤ rσµ + r 2 µd(x next , s next , σ), which in turn implies By instantiating Lemma 1 for α = χ, from its proof, we obtain d(x next , s next , σ) ≤ 0.005σ, and Therefore, the final α for this Lemma is 0.005.

Running time and convergence
By Theorem 5, each iteration of Algorithm 2 decreases the duality gap by a factor of 1 − α/ √ r, it can be shown that we need O( √ r) iterations to decrease the duality gap by a factor of 2. Therefore, to achieve a duality gap of , Algorithm 2 requires iterations, which is the same as the bound we had for the classical IPM in Algorithm 1. Here, n is the dimension of x, or, equivalently, n = r + r i=1 n i . By contrast, whereas each iteration of Algorithm 1 is some linear algebra with complexity O(n ω ) (or O(n 3 ) in practice), in Algorithm 2 we need to solve the Newton system to a precision dependent on the norms of T x −1 and T s −1 . Thus, to bound the running time of the algorithm (since the runtime of Theorem 4 depends on the desired precision), we need to bound T x −1 and T s −1 . Indeed, by Claim 2, we get If the tomography precision for iteration i is chosen to be at least (i.e. smaller than) then the premises of Theorem 5 are satisfied. The tomography precision for the entire algorithm can therefore be chosen to be δ := min i δ i . Note that these minimum eigenvalues are related to how close the current iterate is to the boundary of L -as long as x i , s i are not "too close" to the boundary of L, their minimal eigenvalues should not be "too small". There are two more parameters that impact the runtime of Theorem 4: the condition number of the Newton matrix κ i and the matrix parameter ζ i of the QRAM encoding in iteration i. For both of these quantities we define their global versions as κ = max i κ i and ζ = max i ζ i . Finally, we can state the theorem about the entire running time of Algorithm 2: Theorem 6. Let (8) be a SOCP with A ∈ R m×n , m ≤ n, and L = L n 1 × · · · × L nr . Then, Algorithm 2 achieves duality gap in time This complexity can be easily interpreted as product of the number of iterations and the cost of n-dimensional vector tomography with error δ. So, improving the complexity of the tomography algorithm would improve the running time of Algorithm 2 as well.
Note that up to now, we cared mostly about strict (conic) feasibility of x and s. Now, we address the fact that the linear constraints Ax = b and A T y + s = c are not exactly satisfied during the execution of the algorithm. Luckily, it turns out that this error is not accumulated, but is instead determined just by the final tomography precision: Theorem 7. Let (8) be a SOCP as in Theorem 6. Then, after T iterations, the (linear) infeasibility of the final iterate x, y, s is bounded as Proof. Let (x T , y T , s T ) be the T -th iterate. Then, the following holds for Ax T − b: On the other hand, the Newton system at iteration T has the constraint A∆x T = b − Ax T −1 , which we can further recursively transform as, Substituting this into equation (15), we get Similarly, using the constraint A T ∆y T + ∆s T = c − s T −1 − A T y T −1 we obtain that Finally, we can bound the norms of these two quantities, 5 Quantum support-vector machines

Reducing SVMs to SOCP
In this section we present applications of our quantum SOCP algorithm to the support vector machine (SVM) problem in machine learning. Given a set of vectors X = {x (i) ∈ R n | i ∈ [m]} (training examples) and their labels y (i) ∈ {−1, 1}, the objective of the SVM training process is to find the "best" hyperplane that separates training examples with label 1 from those with label −1. More precisely, when there exists a hyperplane separating the examples with label 1 from those with label −1 (i.e. X is linearly separable), the SVM solution is the separating hyperplane that maximizes the distance from the closest point with either label -this is the maximum margin hyperplane. It has been shown [11] that if we parameterize the hyperplane with its normal vector w and displacement (or bias) b, we can obtain the maximum margin hyperplane by solving the following optimization problem: This problem is known as the hard-margin SVM (training) problem. On the other hand, if X is not linearly separable, clearly for some i, the constraints y (i) (w T x (i) + b) ≥ 1 will be violated no matter which hyperplane we choose. So, we define the soft-margin SVM problem by modifying (16) to allow a small ξ i ≥ 0 violation in each of these constraints. Geometrically, this means that the points on the correct side of the hyperplane have ξ = 0, whereas the others will have ξ i ≥ 0. Thus, in order to minimize the number of misclassified points, we add ξ 1 = i ξ i to the objective of (16), to get the following: Here, the constant C > 0 is just a hyperparameter that quantifies the tradeoff between maximizing the margin and minimizing the constraint violations. Finally, it is worth noting that there exists another way of removing the assumption of linear separability from (16): the 2 -SVM (or least-squares SVM, LS-SVM), introduced by [38]. Namely, we can treat the SVM problem as a least-squares regression problem with binary targets {−1, 1}, by trying to solve for y (i) (w T x (i) + b) = 1, and minimizing the squared 2-norm of the residuals: Since this is a least-squares problem, the optimal w, b and ξ can be obtained by solving a linear system. In [35], a quantum algorithm for LS-SVM is presented, which uses quantum linear algebra to construct and solve this system. Unfortunately, replacing the 1 -norm with 2 in the objective of (18) leads to the loss of a key property of ( 1 -)SVM -weight sparsity [37]. Finally, we are going to reduce the SVM problem (17) to SOCP. In order to do that, we define an auxiliary vector t = (t + 1; t; w), where t ∈ R -this allows us to "compute" w 2 using the constraint t ∈ L n+1 since t ∈ L n+1 ⇔ (t + 1) 2 ≥ t 2 + w 2 ⇔ 2t + 1 ≥ w 2 .
Thus, minimizing w 2 is equivalent to minimizing t. Also, we note that we can restrict our bias b to be nonnegative without any loss in generality, since the case b < 0 can be equivalently described by a bias −b > 0 and weights −w. Using these transformations, we can restate (17) as the following SOCP: Here, we use the notation X ∈ R n×m for the matrix whose columns are the training examples x (i) , and y ∈ R m for the vector of labels. This problem has O(n + m) variables, and O(m) conic constraints (i.e. its rank is r = O(m)). Therefore, in the interesting case of m = Θ(n), it can be solved in O( √ n) iterations. More precisely, if we consider both the primal and the dual, in total they have 3m + 2n + 7 scalar variables and 2m + 4 conic constraints. Additionally, it turns out that it is possible to find an initial strictly-feasible primal-dual solution of this SOCP without introducing more slack variables, which will be useful for the numerical experiments later on.
It is also worth noting that if we wanted to solve the LS-SVM problem (18), we could have also reduced it to a SOCP in a similar manner. In fact, this would have resulted in just O(1) conic constraints, so an IPM would converge to a solution in O(1) iterations, which is comparable with the result from [35].

Experimental Results
We next present some experimental results to assess the running time parameters and the performance of our algorithm for random instances of SVM. A random SVM instance is parametrized by two integers n and m, specifying the dimension and the number of training examples, respectively. Given these parameters, we construct the SVM instance as follows: (a) Evolution of the duality gap µ and the tomography precision δ for a random instance of SVM(50, 100, 0.2).  From Figure 1a we see that in terms of convergence, the quantum IPM converges as fast as its classical counterpart, i.e. the duality gap µ decreases exponentially for both the cases. Since from (14) we know that the running time of the tomography algorithm has an inverse square dependence on δ = min λ min (x), λ min (s) , it is worth noting that both of these parameters asymptotically seem to be O(µ). So, in the running time (6), we can think of δ and as quantities of the same order.
In any case, with the number of iterations being the same for Algorithms 1 and 2, it is worth comparing their per-iteration and total costs. In particular, since from Figure 1a we see that the highest tomography precision is required at the end, and since we know that the condition number κ of the Newton system behaves roughly as the inverse of the duality gap [12,Chapter 2] (this can also be seen on Figure 1b), the cost of the last iteration is an upper bound for the cost of all previous iterations.
We next analyze the evolution of the SVM classification accuracy and the generalization error with the duality gap µ for the optimization algorithm. This leads us to the somewhat surprising conclusion that the optimal classification accuracy and generalization error are in fact achieved for a modest value of ∼ 0.1. Further, this value of does not depend on the size of the instances, we found it to be constant for random SVM instances with varying sizes.
We conclude from Figure 2 that for training random SVM instances, it is enough to let Algorithm 2 run until it reaches duality gap µ ≤ = 0.1. So, in spite of Algorithm 2 having running time that scales as −3 , here we already achieve the best possible performance from the end product (a trained SVM) at the modest value of = 0.1.
Finally, we estimate the computational cost needed to achieve these results. In particular, on Figure 3 we show how the complexity from Theorem 6 grows with the problem size. In Figure 3 we considered the a sample of 1200 instances of SVM(n, 2n, 0.2), where n is sampled uniformly from [2 2 , 2 10 ], and the problem size on the x-axis is 8n + 7, the dimension of the Newton system.
By finding the least-squares fit of the power law y = ax b through the observed values of the quantity n 1.5 κζ δ 2 , we obtain the exponent b = 2.557, and its 95% confidence interval [2.503, 2.610] (this interval is computed in the standard way using Student's t-distribution, as described in [33]). Thus, we can say that for random SVM(n, 2n, 0.2)-instances, and fixed = 0.1, the The experiments also suggest that in practice "everything works out" much more easily than in theory. An example of this is the fact that both Algorithm 1 and 2 converge in the same number of iterations, which suggests that there is no loss in the speed of convergence (i.e. in practice, σ ∼ σ).

Conclusions
In this work we presented an approximate interior-point method for second-order conic programming, that converges in the same number of iterations as its exact counterpart (e.g. from [32,6]), namely O( √ r log(µ 0 / )). By using a quantum algorithm for solving linear systems, the per-iteration complexity has been reduced from O(n 3 ) to O nκζ δ 2 log κζ δ , which opens the possibility of a significant speedup for large n. The presence of this speedup is further corroborated with numerical experiments, using random SVM instances as model problems. It also may be possible to further improve the running time for quantum interior point methods using the recently developed stochastic variants of the classical IPMs [10]. We also note that the running time parameters for a quantum SVM algorithm using the best known multiplicative weights based quantum SDP solver [4] were computed in [5], where it was found that while there are no quantum speedup for general SVMs, in the m = O(n) regime the quantum algorithm has running time O( √ m log 10 m

5
). This result also demonstrates the potential for quantum speedups for real world optimization problems.
The experimental results demonstrate the potential for quantum optimization to provide significant speedups over the state of the art classical algorithms. We are confident that, just like numerical linear algebra packages have evolved from naive implementations to their highlyoptimized and robust state today, the quantum building blocks will also evolve -bringing more practical speedups for real-world problems, in the near future. We can therefore bound the duality gap x T s as follows, The second step used the Cauchy-Schwarz inequality while the third follows from the definition d(x, s, ν) 2 = 2r i=1 (λ i − ν) 2 . The proof of the lower bound is similar, but starts instead with the inequality 1 2 For part 2, it suffices to prove that T x s and T s x have the eigenvalues. This follows from part 2 of Theorem 10 in [2].