Faster quantum and classical SDP approximations for quadratic binary optimization

We give a quantum speedup for solving the canonical semidefinite programming relaxation for binary quadratic optimization. This class of relaxations for combinatorial optimization has so far eluded quantum speedups. Our methods combine ideas from quantum Gibbs sampling and matrix exponent updates. A de-quantization of the algorithm also leads to a faster classical solver. For generic instances, our quantum solver gives a nearly quadratic speedup over state-of-the-art algorithms. Such instances include approximating the ground state of spin glasses and MaxCut on Erdős–Rényi graphs. We also provide an efficient randomized rounding procedure that converts approximately optimal SDP solutions into approximations of the original quadratic optimization problem.


Introduction
Quadratic optimization problems with binary constraints are an important class of optimization problems. Given a (real-valued) symmetric n × n matrix A, the task is to compute

maximize ⟨x|A|x⟩ subject to x ∈ {±1}^n. (MaxQP) (1)

This problem arises naturally in many applications across various scientific disciplines, e.g. image compression [52], latent semantic indexing [36], community detection [48], correlation clustering [17,46] and structured principal component analysis, see e.g. [38,37] and references therein. Mathematically, MaxQPs (1) are closely related to computing the ∞ → 1 norm of matrices. This norm, in turn, closely relates to the cut norm (replace x ∈ {±1}^n by x ∈ {0, 1}^n), as both norms can only differ by a constant factor. These norms are an important concept in theoretical computer science [24,3,2], since problems such as identifying the largest cut in a graph (MaxCut) can be naturally formulated as instances of these norms. This connection highlights that optimal solutions of (1) are NP-hard to compute in the worst case. Despite their intrinsic hardness, quadratic optimization problems do admit a canonical semidefinite programming (SDP) relaxation [27]:

maximize tr(AX) subject to diag(X) = 1, X ≥ 0. (MaxQP SDP) (2)

Here, X ≥ 0 indicates that the n × n matrix X is positive semidefinite (psd), i.e. ⟨y|X|y⟩ ≥ 0 for all y ∈ R^n. SDPs comprise a rich class of convex optimization problems that can be solved efficiently under mild assumptions, e.g. by using interior-point methods [12].
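For concreteness, problem (1) can be solved exactly by exhaustive search in very small dimensions; the relaxation (2) exists precisely because this enumeration scales as 2^n. A minimal illustrative sketch (the function name and example matrix are ours, not from the paper):

```python
import itertools
import numpy as np

def maxqp_brute_force(A):
    """Exhaustively solve max_{x in {+-1}^n} <x|A|x> for small n.

    Illustration only: MaxQP is NP-hard in general, which is why the
    paper studies the SDP relaxation (2) instead of this 2^n search.
    """
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(signs)
        val = x @ A @ x
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Tiny example with both positive and negative entries.
A = np.array([[0.0, 1.0, -1.0],
              [1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
val, x = maxqp_brute_force(A)
```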
Perhaps surprisingly, the optimal value of the MaxQP SDP relaxation often provides a constant factor approximation to the optimal value of the original quadratic problem. However, the associated optimal matrix X is typically not in one-to-one correspondence with an optimal feasible point x ∈ {±1}^n of the original problem (1). Several randomized rounding procedures have been devised to overcome this drawback since the pioneering work of [27]. These transform X into a random binary vector x ∈ {±1}^n that achieves ⟨x|A|x⟩ ≥ γ max_{x∈{±1}^n} ⟨x|A|x⟩ in expectation for some constant γ. Explicit values of γ are known for instance for the case of A being the adjacency matrix of a graph [27] or positive semidefinite [2].
Although tractable in a theoretical sense, the runtime associated with general-purpose SDP solvers quickly becomes prohibitively expensive in both memory and time. This practical bottleneck has spurred considerable attention in the theoretical computer science community over the past decades [7,16,11,61]. (Meta) algorithms, like matrix multiplicative weights (MMW) [7], solve the MaxQP SDP (2) up to additive error ε‖A‖_ℓ1 in runtime O((n/ε)^{2.5} s), where s denotes the column sparsity of A. Further improvements are possible if the problem description A has additional structure, such as A being the adjacency matrix of a graph [6].
Very recently, a line of works pointed out that quantum computers can solve certain SDPs even faster [13,64,63,14,33]. From a high-level perspective, the existing approaches fall into two categories: those that obtain quantum speedups by capitalizing on the fact that quantum computers can prepare certain quantum Gibbs states faster [13,64,63,14] and those that capitalize on the fact that quantum computers can solve certain linear algebra problems faster [33]. For the works that focus on quantum Gibbs states, we further distinguish those that follow a primal-dual approach [13,64,63,14] and those that follow a primal-only approach [63]. For both the linear algebra and the primal-dual approach the current runtime guarantees depend on problem-specific parameters. These parameters scale particularly poorly for most combinatorial optimization problems, including the MaxQP SDP, and negate any potential advantage. Moreover, other closely related primal-only approaches [63] do not yield a quantum speedup straight away because they treat the constraints in a black-box manner.
In this work, we tackle this challenge and overcome the shortcomings of existing quantum SDP solvers by considering the following further relaxation of problem (2):

find X subject to tr(AX)/‖A‖ ≥ λ, ∑_i |⟨i|X|i⟩ − 1/n| ≤ ε, tr(X) = 1, X ≥ 0. (renormalized, relaxed, feasibility MaxQP SDP) (3)

Here we introduced two additional parameters λ and ε. The λ parameter comes from a standard trick to reduce the problem in (2) to a sequence of feasibility problems, as we will explain later. The parameter ε encodes a further relaxation of the constraints of Eq. (2). Let us first discuss the case where ε = 0, that is, we have the normalized diagonal constraints ⟨i|X|i⟩ = 1/n. This renormalization of the original problem pinpoints connections to quantum mechanics: every feasible point X obeys tr(X) = 1 and X ≥ 0, implying that it describes the state ρ of an n-dimensional quantum system. In turn, such quantum states can be represented approximately by a renormalized matrix exponential ρ = exp(−H)/tr(exp(−H)), the Gibbs state associated with Hamiltonian H. We capitalize on this correspondence by devising a meta-algorithm, Hamiltonian Updates (HU), that is inspired by matrix exponentiated gradient updates [62], see also [43,14,30] for similar approaches. Another key insight is that the diagonal constraints also have a clear quantum mechanical interpretation: the feasible states are those that are indistinguishable from the uniform or maximally mixed state when measured in the computational basis.
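The correspondence between matrix exponents and quantum states is easy to verify numerically. A minimal sketch, using SciPy's dense expm purely for illustration:

```python
import numpy as np
from scipy.linalg import expm

def gibbs_state(H):
    """Return rho = exp(-H)/tr(exp(-H)), the Gibbs state of Hamiltonian H.

    Any symmetric H yields a valid quantum state: positive semidefinite
    with unit trace, as used in the Hamiltonian Updates parametrization.
    """
    M = expm(-H)
    return M / np.trace(M)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
H = (B + B.T) / 2  # symmetrize to get a valid Hamiltonian
rho = gibbs_state(H)
```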
This interpretation points the way to another key component for obtaining speedups for the MaxQP SDP: by setting ε > 0 we further relax the problem and optimize over all states that are approximately indistinguishable from the maximally mixed state when measured in the computational basis. This further relaxation will allow us to overcome shortcomings of previous solvers when dealing with SDPs of this form, as it bundles up the n linear constraints in Eq. (3) into one. As we ultimately want to solve Eq. (2) and not (3), a significant part of the technical contribution of this work is to show that this further relaxation is mild. Indeed, we will be able to convert a solution to (3) into one to (2) by only slightly changing the objective value. This will allow us to solve the relaxed problem up to an additive error ε by only imposing a relaxation parameter ε^4. As is the case with this and many other related algorithms to solve SDPs, ensuring that we only require a dimension-independent ε^4 precision for the constraints is essential to guarantee speedups. Note that to obtain the same level of precision as in the formulation given in (3) would require enforcing that each diagonal constraint is satisfied up to an error of order O(εn^{-1}).
Although originally designed to exploit the fact that quantum architectures can sometimes create Gibbs states efficiently and inspired by interpreting the problem from the point of view of quantum mechanics, it turns out that this approach also produces faster classical algorithms.
To state our results, we instantiate standard computer science notation. The symbol O(·) describes limiting function behavior, while Õ(·) hides poly-logarithmic factors in the problem dimension and polynomial dependencies on the inverse accuracy 1/ε. We are working with the adjacency list oracle model, where individual entries and locations of nonzero entries of the problem description A can be queried at unit cost. We refer to Section 3.4 for a more detailed discussion.
Theorem I (informal). Hamiltonian Updates solves the renormalized MaxQP SDP up to additive error εn‖A‖ in classical runtime Õ(min{n²s, n^ω}) and in quantum runtime Õ(n^{1.5}s^{0.5+o(1)}). Here ω is the matrix multiplication exponent. With some abuse of terminology, the word "solves" is used with slightly different meanings for the classical and quantum algorithms in the statement above. For the classical algorithm we can indeed output a feasible solution of the MaxQP SDP that is εn‖A‖-close to the optimal target value. In the quantum case, the output is in the form of a quantum state ρ such that nρ is O(εn)-close in trace distance to a feasible point and whose value is εn‖A‖-close, which we will call approximately feasible. We emphasize that the quantum algorithm also outputs a classical description of a solution that is approximately feasible in a sense that will be made precise below. The polynomial dependency on the inverse accuracy is rather high (e.g. (1/ε)^12 for the classical algorithm). We expect future work to be able to improve this.
Already the classical runtime improves upon the best known existing results, and we refer to Section 2.5 for a detailed comparison. Access to a quantum computer would increase this gap further. However, it is important to point out that Theorem I has an approximation error of order εn‖A‖. In contrast, MMW [7], the fastest existing algorithm, incurs an error proportional to ε‖A‖_ℓ1, where ‖A‖_ℓ1 = ∑_{i,j} |A_{i,j}|, making a straightforward comparison more difficult. Importantly, the scaling of our algorithm is favorable for generic problem instances and spin glass models, see Section 2.5.
The quantum algorithm outputs a classical description of a Hamiltonian H⋆ that encodes an approximately optimal, approximately feasible solution ρ⋆ = exp(−H⋆)/tr(exp(−H⋆)) of the renormalized MaxQP SDP (3). This classical output can subsequently be used for randomized rounding for the ∞ → 1 norm of a matrix A, ‖A‖_{∞→1} = max_{x,y∈{±1}^n} ⟨x|A|y⟩.
Theorem II (Rounding). Suppose that H⋆ encodes an approximately optimal solution of the renormalized MaxQP SDP (3) with accuracy ε^4 for the associated target matrix, where A is an n × n real matrix with at most s nonzero entries per column (column sparsity). Then, there is a classical Õ(ns)-time randomized rounding procedure that converts H⋆ into binary vectors x̃, ỹ ∈ {±1}^n that obey E⟨x̃|A|ỹ⟩ ≥ γ‖A‖_{∞→1} − O(εn‖A‖). This result recovers the randomized rounding guarantees of [2] in the limit of perfect accuracy (ε = 0). However, for ε > 0 the error scales with εn‖A‖. In turn, randomized rounding only provides a multiplicative approximation if ‖A‖_{∞→1} is of the same order. This result on the randomized rounding also relies on a detailed analysis of the stability of the rounding procedure with respect to approximate solutions of the problem.

Detailed summary of results
We present Hamiltonian Updates, a meta-algorithm for solving convex optimization problems over the set of quantum states based on quantum Gibbs sampling, in a more general setting, as we expect it to find applications to other problems. Throughout this work, ‖·‖_tr and ‖·‖ denote the trace (Schatten-1) and operator (Schatten-∞) norms, respectively.

Convex optimization and feasibility problems
SDPs over the set of quantum states are a special instance of a more general class of convex optimization problems. For a bounded, concave function f from the set of symmetric matrices to the real numbers and closed convex sets C_1, …, C_m, consider

maximize f(X) subject to X ∈ C_i for all i ∈ [m], tr(X) = 1, X ≥ 0. (CPopt) (4)

The constraint tr(X) = 1 enforces normalization, while X ≥ 0 is the defining structure constraint of semidefinite programming. Together, they restrict X to the set of n-dimensional quantum states S_n = {X : tr(X) = 1, X ≥ 0}. We will now specialize to the case f(X) = tr(AX) for a symmetric matrix A, as this is our main case of interest, but remark that it is simple to generalize the discussion that follows to more general classes. The trace normalization constraint implies fundamental bounds on the optimal value: |tr(AX)| ≤ ‖A‖ ‖X‖_tr = ‖A‖, according to matrix Hölder [10, Ex. IV.2.12]. Binary search over potential optimal values λ ∈ [−‖A‖, ‖A‖] allows for reducing the convex optimization problem to a sequence of feasibility problems:

find X subject to tr(AX) ≥ λ, X ∈ C_i for all i ∈ [m], tr(X) = 1, X ≥ 0. (CPfeas(λ)) (5)

The convergence of binary search is exponential. This ensures that the overhead is benign: a total of log(‖A‖/ε) queries of CPfeas(λ) suffice to determine the optimal solution of CPopt (4) up to accuracy ε. In summary:

Fact 2.1. Binary search reduces the task of solving convex optimization problems (4) to the task of solving convex feasibility problems (5).
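Fact 2.1 is the standard bisection argument. A sketch, where `feasible` stands in for any monotone CPfeas(λ) oracle (names and constants are ours):

```python
def binary_search_opt(feasible, lo, hi, eps):
    """Reduce CPopt to CPfeas(lambda) via binary search (Fact 2.1).

    `feasible(lam)` is a user-supplied oracle for CPfeas(lam); after
    O(log((hi - lo)/eps)) queries the returned value is within eps of
    the largest feasible lambda.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid  # lam is achievable: search the upper half
        else:
            hi = mid  # lam is infeasible: search the lower half
    return lo
```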

Meta-algorithm for approximately solving convex feasibility problems
We adapt a meta-algorithm developed by Tsuda, Rätsch and Warmuth [62], see also [43,6,30,14] for similar ideas and [15] for an overview of these techniques. All these algorithms, including the variation presented here, can be seen as instances of mirror descent with the mirror map given by the von Neumann entropy, with adaptations tailored to the problem at hand. We believe our variation provides a path for also obtaining quantum speedups for nonlinear convex optimization, so we state it in more detail.
For our algorithm, we require subroutines that allow for testing ε-closeness (in trace norm) to each convex set C_i.

Definition 2.1 (ε-separation oracle). Let C ⊂ S_n be a closed, convex subset of quantum states and C* ⊂ {X = X† ∈ C^{n×n} : ‖X‖ ≤ 1} be a closed, convex subset of observables of operator norm at most 1. For ε > 0, an ε-separation oracle (with respect to C*) is a subroutine that either accepts a state ρ (in the sense that observables from C* cannot distinguish ρ from elements of C), or provides a hyperplane P ∈ C* that separates ρ from the convex set by a margin of at least ε/2, i.e. tr(Pρ) > tr(PY) + ε/2 for all Y ∈ C.

We note that the oracle is well-defined in the sense that if min_{Y∈C} max_{P∈C*} tr(P(ρ − Y)) > ε, then there exists P ∈ C* such that tr(P(ρ − Y)) > ε for all Y ∈ C. Indeed, by Sion's min-max theorem [57] we have min_{Y∈C} max_{P∈C*} tr(P(ρ − Y)) = max_{P∈C*} min_{Y∈C} tr(P(ρ − Y)). This implies that there even exists a P that separates the state ρ from the set C by ε. Nonetheless, we instantiate the weaker requirement with only ε/2 separation. This will be vital to ensure that the algorithm can tolerate errors and/or approximations in the samples from ρ.
By allowing for fine-tuning of C* we are able to reduce the number of closeness conditions we need to test. Hamiltonian Updates (HU) is a general meta-algorithm for approximately solving convex feasibility problems (5) (CPfeas). The task is to find a state ρ that is ε-close to each convex set C_i with respect to observables in some C*_i and also obeys ρ ∈ S_n (ρ ≥ 0 and tr(ρ) = 1). A change of variables takes care of positive semidefiniteness and normalization: replace X in problem (5) by a Gibbs state ρ_H = exp(−H)/tr(exp(−H)). At each iteration, we query the ε-separation oracles. If they all accept, the current iterate is ε-close to feasible in the sense that there is a matrix in each C_i whose expectation values with respect to observables in C*_i are close to those of the accepted state, and we are done. Otherwise, we update the matrix exponent to penalize infeasible directions: H → H + (ε/16)P, where P is a separating hyperplane that witnesses infeasibility. This process is visualized in Figure 1 and we refer to Algorithm 1 for a detailed description.
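A dense-matrix sketch of the update loop may help fix ideas. The oracle interface, the step size ε/16 and the O(log(n)/ε²) iteration budget follow the description above; the concrete constant in the budget is illustrative, not the one proved in the paper:

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian_updates(oracles, n, eps, max_iter=None):
    """Dense-matrix sketch of Hamiltonian Updates (Algorithm 1).

    Each oracle maps a state rho either to None (accept) or to a
    separating hyperplane P with operator norm at most 1.
    """
    if max_iter is None:
        # O(log(n)/eps^2) iterations; the constant 64 is illustrative
        max_iter = int(np.ceil(64 * np.log(max(n, 2)) / eps**2))
    H = np.zeros((n, n))
    for _ in range(max_iter):
        M = expm(-H)
        rho = M / np.trace(M)
        P = None
        for oracle in oracles:
            P = oracle(rho)
            if P is not None:
                break
        if P is None:
            return rho  # all oracles accept: eps-close to feasible
        H = H + (eps / 16.0) * P  # penalize the infeasible direction
    return None  # budget exhausted: certify approximate infeasibility
```

For instance, a single half-space oracle tr(Cρ) ≥ λ drives the iterate toward the feasible region, as in the objective constraint of the MaxQP SDP.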

As is also the case for the aforementioned variations of the algorithm, the proof follows from establishing sufficiently large step-wise progress in quantum relative entropy. The quantum relative entropy between any feasible state and the initial state ρ_0 = n^{-1}I (maximally mixed state) is bounded by log(n). Therefore, the algorithm must terminate after sufficiently many iterations; otherwise, the problem is infeasible. We refer to Section 3.1 for details. Note that, unlike related previous quantum solvers [14,64], our algorithm only considers the primal problem and bears a strong resemblance to the primal-only approach of [63], although we differ in the way we implement the constraints. Theorem 2.1 has important consequences: the runtime of approximately solving quantum feasibility problems is dominated by the cost of implementing the m separation oracles O_i and the cost associated with matrix exponentiation. This reduces the task of efficiently solving convex feasibility problems to the quest of efficiently identifying separating hyperplanes and developing fast routines for computing Gibbs states. The latter point already hints at a genuine quantum advantage: quantum architectures can efficiently prepare (certain) Gibbs states [19,23,32,54,60,65,64].
It should be stressed that the approximate feasibility guarantee in (6) is not very strong, and a careful choice of the C_i, C*_i and a careful analysis of the continuity of the problem is usually required to ensure that it gives a good approximation to CPopt (4).

Classical and quantum solvers for the renormalized MaxQP SDP
Let us now formulate the renormalized, relaxed problem in Eq. (3) in this framework and discuss the appropriate oracles. For fixed λ ∈ [−1, 1] the (feasibility) MaxQP SDP is equivalent to a quantum feasibility problem over the sets

A_λ = {X : tr(A‖A‖^{-1}X) ≥ λ} and D_n = {X : ⟨i|X|i⟩ = 1/n for all i ∈ [n]}.

The set A_λ corresponds to a half-space, while D_n is an affine subspace with codimension n. Let us now see that the convergence promises of Thm. 2.1 indeed carry over to the renormalized, relaxed, feasibility MaxQP SDP, see Eq. (3). For the oracle associated with A_λ, the guarantee of Thm. 2.1 combined with the defining half-space condition for A_λ asserts tr(A‖A‖^{-1}ρ) ≥ λ − ε. We can analyze the oracle for D_n in a similar fashion and obtain ∑_i |⟨i|ρ|i⟩ − 1/n| ≤ ε. Thus, we indeed obtain Eq. (3) from this formulation up to an error ε for the target value.
It will be important to ensure that both quantum and classical algorithms work while only having access to approximations of the current iterate. The simple structure of both sets readily suggests two separation oracles that take this into account. For A_λ: given an estimate ã of tr(A‖A‖^{-1}ρ) that is accurate up to ε/4, check ã ≥ λ − 3ε/4 and output the separating hyperplane P = −A/‖A‖ if this is not the case. For D_n: given estimates p(i) of the diagonal entries ⟨i|ρ|i⟩ that are accurate up to ε/4 in ℓ1 norm, check ∑_i |p(i) − 1/n| ≤ 3ε/4 and output the separating hyperplane P of Eq. (9) if this is not the case.
Note that the oracles are only defined for quantum states as inputs. Let us briefly check that they satisfy Definition 2.1. For O_{A_λ} we have that if ã ≥ λ − 3ε/4, then tr(A‖A‖^{-1}ρ) ≥ λ − ε, as desired. The other case is similar.
For the oracle O_{D_n}, let us first assume that ∑_i |p(i) − 1/n| ≥ 3ε/4. Clearly, P defined in Eq. (9) is diagonal and of operator norm at most 1. For ease of notation let f : [n] → {−1, 1} be −1 if p(i) < 1/n and +1 otherwise. By construction we have for any Y ∈ D_n ∩ S_n:

tr(P(ρ − Y)) = ∑_i f(i)(⟨i|ρ|i⟩ − 1/n) ≥ ∑_i |p(i) − 1/n| − ∑_i |p(i) − ⟨i|ρ|i⟩| ≥ 3ε/4 − ε/4 = ε/2,

where in the first inequality we used Hölder's inequality. On the other hand, if ∑_i |p(i) − 1/n| ≤ 3ε/4, then an application of the triangle inequality shows that ∑_i |⟨i|ρ|i⟩ − 1/n| ≤ ε. Thus, we conclude that both oracles are correct.
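Both oracles admit a direct classical implementation when ρ is available as a dense matrix. In the sketch below ã and the p(i) are computed exactly rather than estimated, so this illustrates only the accept/separate logic:

```python
import numpy as np

def oracle_objective(rho, A, lam, eps):
    """eps-separation oracle for the half-space tr(A rho / ||A||) >= lam.

    `a_tilde` plays the role of the (possibly noisy) estimate from the
    text; here it is computed exactly for illustration.
    """
    A_norm = np.linalg.norm(A, 2)  # operator norm
    a_tilde = float(np.trace(A @ rho)) / A_norm
    if a_tilde >= lam - 3 * eps / 4:
        return None  # accept
    return -A / A_norm  # separating hyperplane, operator norm 1

def oracle_diagonal(rho, eps):
    """eps-separation oracle for the renormalized diagonal constraints.

    Accepts iff sum_i |p(i) - 1/n| <= 3*eps/4; otherwise returns the
    diagonal sign matrix P = sum_i f(i)|i><i| from Eq. (9).
    """
    n = rho.shape[0]
    p = np.diag(rho)
    if np.sum(np.abs(p - 1.0 / n)) <= 3 * eps / 4:
        return None
    f = np.where(p < 1.0 / n, -1.0, 1.0)
    return np.diag(f)
```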
The key insight to later obtain quantum speedups for the MaxQP SDP is that the second oracle can be interpreted as trying to distinguish the current state from the maximally mixed state through computational basis measurements. This view is similar in spirit to [43, Lemma 4.6], although here we focus on using this approach to construct solutions and to show that this notion of approximate feasibility is good enough for the MaxQP SDP.

Classical runtime
For fixed ρ_H = exp(−H)/tr(exp(−H)), both separation oracles are easy to implement on a classical computer given access to ρ_H. Hence, matrix exponentiation is the only remaining bottleneck. This can be mitigated by truncating the Taylor series for exp(−H) after ℓ = O(‖H‖ + 1/ε) many terms. Approximating ρ_H in this fashion requires only ℓ matrix multiplications and incurs an error of at most ε in trace distance. Moreover, it is then possible to convert an approximately feasible point into a strictly feasible one with a similar value, see Section 3.3. The following result becomes an immediate consequence of Fact 2.1 and Theorem 2.1.

Corollary 2.1 (Classical runtime for the MaxQP SDP).
Suppose that A has row-sparsity s. Then, the classical cost of solving the associated (renormalized) MaxQP SDP up to additive error εn‖A‖ is O(min{n²s, n^ω} log(n) ε^{-12}).
The comparatively poor accuracy scaling with ε^{-12} stems largely from the fact that we need to convert an approximately feasible optimal solution into a strictly feasible optimal solution. This conversion is contingent on running Algorithm 1 with accuracy ε̃ = ε^4 (see Proposition 3.1 below). The total accuracy scaling ε^{-12} = ε̃^{-3} results from combining the O(log(n)/ε̃) cost for approximating the matrix exponential within a single iteration with the iteration bound T = O(log(n)/ε̃²) from Theorem 3.1.
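The Taylor-truncation step discussed above can be sketched as follows; the truncation order ℓ is passed explicitly here, whereas in the algorithm it would be chosen as O(‖H‖ + 1/ε):

```python
import numpy as np
from scipy.linalg import expm

def gibbs_taylor(H, order):
    """Approximate exp(-H)/tr(exp(-H)) with a truncated Taylor series.

    Accumulates sum_{k=0}^{order} (-H)^k / k! with `order` matrix
    multiplications, then renormalizes; sketch of the classical
    bottleneck mitigation, not an optimized implementation.
    """
    n = H.shape[0]
    term = np.eye(n)
    total = np.eye(n)
    for k in range(1, order + 1):
        term = term @ (-H) / k  # (-H)^k / k!
        total = total + term
    return total / np.trace(total)
```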

Quantum runtime
Quantum architectures can efficiently prepare (certain) Gibbs states and are therefore well suited to overcome the main classical bottleneck. In contrast, checking feasibility becomes more challenging, because information about ρ is not accessible directly. Instead, we must prepare multiple copies of ρ and perform quantum mechanical measurements to test feasibility:

• O(ε^{-2}) copies of ρ suffice to ε-approximate tr(A‖A‖^{-1}ρ) via phase estimation.
• O(nε^{-2}) copies suffice to estimate the diagonal entries of ρ (up to accuracy ε in trace norm) with a high probability of success via repeated computational basis measurements.
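Classically simulating the measurement statistics clarifies the second point; each copy of ρ measured in the computational basis yields outcome i with probability ⟨i|ρ|i⟩, and the n/ε² sample count is the scaling claimed above (constants omitted):

```python
import numpy as np

def estimate_diagonal(rho, num_samples, rng):
    """Estimate diag(rho) from repeated computational-basis measurements.

    Simulates the outcome statistics with a multinomial draw; with
    O(n/eps^2) samples the estimate is eps-accurate in l1 norm with
    high probability.
    """
    p = np.diag(rho).real
    counts = rng.multinomial(num_samples, p / p.sum())
    return counts / num_samples
```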
Combining this with the overall cost of preparing a single Gibbs state implies the following runtime for executing Algorithm 1 on a quantum computer.This result is based on the sparse oracle input model and we refer to Sec. 3.4 for details.
Corollary 2.2 (Quantum runtime for the MaxQP SDP). Suppose that A has row-sparsity s. Then, the quantum cost of solving the MaxQP SDP up to additive error εn‖A‖ is Õ(n^{1.5} s^{0.5+o(1)} poly(1/ε)).
The quantum algorithm also outputs a classical description of the Hamiltonian H⋆ corresponding to an approximately optimal, approximately feasible Gibbs state and its value. More precisely, it outputs a real number a and a diagonal matrix D such that H⋆ = aA + D and nρ_{H⋆} is O(εn)-close in trace distance to a feasible point of the MaxQP SDP. Moreover, we have the potential to produce samples from the associated approximately optimal Gibbs state ρ⋆ = exp(−H⋆)/tr(exp(−H⋆)) in sub-linear runtime Õ(√n) on a quantum computer. In the next section we show that the output of the algorithm is enough to give rise to good randomized roundings.

Randomized rounding
The renormalized MaxQP SDP (3) arises as a convex relaxation of an important quadratic optimization problem (1). However, the optimal solution X is typically not of the form |x⟩⟨x| with x ∈ {±1}^n. Goemans and Williamson [27] pioneered randomized rounding techniques that allow for converting X into a cut x that is close to optimal. However, their rounding techniques rely on the underlying matrix being entrywise positive, and a more delicate analysis is required to derive analogous results for broader classes of matrices. We will now follow the analysis of [2] to do the randomized rounding for the ∞ → 1 norm. First, let us make the connection between this norm and the MaxQP SDP clearer. Let A be a real matrix and define the symmetric 2n × 2n matrix A′ with A in its upper-right block and A^T in its lower-left block (and zeros elsewhere). It is easy to see that for two binary vectors x, y ∈ {±1}^n we have ⟨x ⊕ y|A′|x ⊕ y⟩ = 2⟨x|A|y⟩ (with a slight abuse of notation, we also use the bra-ket notation for inner products of unnormalized vectors). This immediately shows that 2‖A‖_{∞→1} = max_{z∈{±1}^{2n}} ⟨z|A′|z⟩, which is an instance of MaxQP. We will now show that the rounding procedure is stable, i.e. randomized rounding of an approximately feasible, approximately optimal point, such as the ones output by the quantum algorithm, still results in a good binary vector for approximating this norm. We strengthen the stability of the rounding even further by showing that rounding with a truncated Taylor expansion of the solution is still good enough, saving runtime. The rounding procedure is described in Algorithm 2.

The final step of Algorithm 2 outputs x_i = sign(z_i).

Proposition 2.1. Let A be a real matrix and H⋆ be such that ρ⋆ = exp(−H⋆)/tr(exp(−H⋆)) is an ε-approximate solution to the renormalized MaxQP SDP (3) for A′. Then the rounded vectors achieve the guarantee of Theorem II with γ = 2/π for A p.s.d. and γ = 4/π − 1 otherwise.

This rounding procedure is fully classical and can be executed in runtime Õ(ns). We refer to Sec. 3.5 for details. What is more, it applies to both quantum and classical solutions of the MaxQP SDP. Even the quantum algorithm provides H⋆ in classical form, while the associated ρ⋆ is only available as a quantum state. Rounding directly with ρ⋆ would necessitate a fully quantum rounding technique that, while difficult to implement and analyze, seems to offer no advantages over the classical Algorithm 2. Thus, it is possible to perform the rounding even with the output of the quantum algorithm. We prove this proposition in two steps. First, we follow the proof technique of [2] to show that our relaxed notion of approximate feasibility is still good enough to ensure a good rounding in expectation. This shows that our notion of feasibility is strong enough for the problem at hand. The stability of the rounding with respect to truncation of the Taylor series then follows by showing appropriate anti-concentration inequalities for the random vector.
Note that in [2] the authors prove that the constant 2/π in Proposition 2.1 is optimal.
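The core Gaussian rounding step can be sketched as follows; the two-block split used for the ∞ → 1 norm and the Taylor-truncated sampling of z are omitted, so this is an illustration of the sign-rounding idea only:

```python
import numpy as np

def sign_rounding(rho, rng):
    """Gaussian sign rounding (the core of Algorithm 2).

    Draw z ~ N(0, rho) via a matrix square root and round entrywise:
    x_i = sign(z_i). For the infinity->1 norm one applies this to the
    2n x 2n solution and splits the output into x and y.
    """
    # eigendecomposition is robust even if rho is only approximately psd
    w, V = np.linalg.eigh(rho)
    L = V @ np.diag(np.sqrt(np.clip(w, 0.0, None)))
    z = L @ rng.standard_normal(rho.shape[0])
    return np.where(z >= 0.0, 1.0, -1.0)
```

Note that perfectly correlated coordinates of ρ always round to equal signs, which is what makes the expectation guarantee of Proposition 2.1 plausible.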

Comparison to existing work
The MaxQP SDP has already received a lot of attention in the literature. Table 1 contains a runtime comparison between the contributions of this work and the best existing classical results [7,6]. It highlights regimes where we obtain both classical and quantum speedups. In a nutshell, Hamiltonian Updates outperforms state-of-the-art algorithms whenever the target matrix A has both positive and negative off-diagonal entries and the optimal value of the SDP scales as n‖A‖. It is worthwhile to explore the following examples.

Quadratic quantum speedups and classical speedups for generic instances and spin glasses
Recall that Hamiltonian Updates can only offer speedups for MaxQP SDP instances where the optimal value scales like n‖A‖, as opposed to the ‖A‖_ℓ1 scaling relevant for MMW.
Intuitively speaking, such a scaling should arise whenever A has both positive and negative entries, causing cancellations. In order to formalize this intuition, we show that Hamiltonian Updates offers speedups for generic matrices that have both positive and negative entries, see Appendix A for details. Our main result is as follows. Suppose that A is a random Hermitian matrix with entries given by Eq. (10), where the g_ij are independent centered random variables with a bounded fourth moment, τ_ij is a Bernoulli random variable with parameter p and λ > 0 is some fixed parameter. This random generative model covers many relevant MaxQP instances. Note that if we set p = s/n, the matrix A is O(s)-sparse in expectation. Let us first discuss the (centered) λ = 0 case in more detail. There, the expected values of n‖A‖ and ‖A‖_ℓ1 can be computed explicitly, and concrete realizations of A concentrate sharply around these expected values. These concentration arguments are derived in Appendix A and imply that, indeed, n‖A‖, and not ‖A‖_ℓ1, provides the right scaling for such generic instances. The scaling for MMW [7] is to achieve an error of ε‖A‖_ℓ1. Thus, to obtain a comparable multiplicative error for such instances using MMW, the accuracy parameter ε must be rescaled accordingly, yielding an expected scaling of Õ(min{(n/ε)^{2.5} s^{4.5}, n³s^{2.25}ε^{-3.5}}). This implies that the runtime of Hamiltonian Updates improves upon MMW [7], both classically and quantumly.
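A sampler for the centered (λ = 0) diluted ensemble illustrates the model; Eq. (10) itself is not reproduced in this excerpt, so the rank-one λ-term is omitted here and standard Gaussians are one admissible choice of g_ij with bounded fourth moment:

```python
import numpy as np

def diluted_sk_matrix(n, p, rng):
    """Sample a centered diluted instance with entries g_ij * tau_ij.

    g_ij: i.i.d. standard Gaussians; tau_ij: Bernoulli(p) dilution.
    With p = s/n the matrix is O(s)-sparse in expectation. The rank-one
    spike of Eq. (10) is omitted (lambda = 0 case).
    """
    G = rng.standard_normal((n, n))
    T = (rng.random((n, n)) < p).astype(float)
    A = np.triu(G * T, 1)  # keep strict upper triangle
    return A + A.T  # symmetric, zero diagonal
```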
To the best of our knowledge, the quantum implementation of Hamiltonian Updates establishes the first quantum speedup for problems of this type.Corollary 2.2 establishes a nearly quadratic speedup for generic MaxQP SDP instances compared to the current state of the art SDP solvers.
It is worth noting that the random matrix defined in Eq. (10) corresponds to a widely studied model in spin glasses: the (diluted) Sherrington–Kirkpatrick (SK) model [53,59]. This problem has received considerable attention in the statistical physics literature. In particular, recent work [47] shows that, under some conjectures, it is possible to approximately solve the quadratic optimization in (1) with high probability in time Õ(n²) for the standard, undiluted SK model (τ_ij = 1). This is the same time complexity as our quantum solver, as the target matrix of these instances is dense (s = Ω(n)). To the best of our knowledge, a variation of [47] for the diluted model (p < 1) has not yet been discussed. Furthermore, there is an integrality gap for the SDP relaxation of this problem in the Gaussian setting [40,48] whenever λ = 0. As we discuss in more detail in Appendix B, this implies that the value of the problem in the case τ_ij = 1 converges to the largest eigenvalue of A in the limit n → ∞. On top of that, [48] gives a construction of an approximately optimal feasible point that can be computed in O(n^ω) time, where ω > 2 is the exponent of matrix multiplication. Correspondingly, we do not obtain a classical speedup for such instances. Once again we refer to Appendix B for more details; we are not aware of similar results for the diluted model.
Let us now discuss the undiluted case with λ > 1, as the behaviour of the model for λ < 1 is not qualitatively different from the case λ = 0 [48]. To the best of our knowledge, the exact value of the MaxQP SDP is not known in this setting. However, numerical evidence suggests that there is no integrality gap [48,31], and no constructions of approximately optimal points are known. Thus, we expect that it is in this regime that we obtain both quantum and classical speedups.

Speedups for MaxCut and the hidden partition problem:
Additional structure can substantially reduce the runtime of existing MMW solvers [6]. For weighted MaxCut, in particular, A is related to the adjacency matrix of a graph and has exclusively non-negative entries. This additional structure facilitates the use of powerful dimensionality reduction and sparsification techniques that outperform our algorithm for general graphs. Recently, it was shown that quantum algorithms can speed up spectral graph sparsification techniques [4]. As the sparsification step dominates the complexity of these algorithms, this leads to faster solvers for MaxCut, albeit solving the sparsified SDP on a classical computer. However, these sparsification techniques do not readily apply to general problem instances, where the entries of A can have both positive and negative signs (sign problem). We refer to Appendix C for a more detailed discussion.
More direct speedups do, however, apply for approximating MaxCut on Erdős–Rényi graphs. An Erdős–Rényi graph G(n, p) with n vertices is a random graph in which each edge is present independently at random with probability p. In [20] the authors show that whenever pn → ∞, the MaxCut of such graphs satisfies the asymptotic formula of Eq. (11), where P⋆ is the so-called Parisi constant. One can show that for such graphs, a random balanced partition of the vertices already achieves an expected cut of order n²p. Thus, obtaining a cut of this order up to an approximation error of order εn²p for ε of constant order is trivial for random graphs: just sample a random one. Hence, obtaining approximations to the MaxCut of such graphs is only interesting whenever we can achieve an error scaling with the subleading correction in Eq. (11), and the usual Goemans-Williamson relaxation is not suitable. In order to address this issue, Montanari et al. [48] showed that it is advisable to instead solve the MaxQP SDP relaxation for the centered matrix B of Eq. (12), where 1 is the all-ones vector and A the (random) adjacency matrix of the graph; the value of this relaxation then captures the relevant correction term. Note that the matrix in Eq. (12) has both negative and positive entries with expected value 0 and bounded variance. We conclude that we are in the same setting as for the spin glasses for this dense instance and, thus, we also obtain speedups compared to MMW. However, once again the recent work [47] shows that, under some conjectures, it is possible to approximately solve the underlying MaxQP for B directly in time O(n²) with high probability.
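As Eq. (12) is not reproduced in this excerpt, the sketch below uses the centered adjacency matrix A − p·11ᵀ as a hypothetical stand-in with the stated properties (entries of expected value roughly 0 and bounded variance):

```python
import numpy as np

def centered_adjacency(n, p, rng):
    """Sample G(n, p) and center its adjacency matrix.

    Hypothetical stand-in for Eq. (12): B = A - p * 1 1^T has
    approximately mean-zero entries with both signs, matching the
    regime in which Hamiltonian Updates offers speedups.
    """
    U = rng.random((n, n))
    A = np.triu((U < p).astype(float), 1)  # independent upper-triangle edges
    A = A + A.T  # symmetric adjacency matrix
    return A - p * np.ones((n, n))
```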
Another relevant random graph model is that of the planted partition, whose distribution we will denote by G(n, a/n, b/n) for parameters a, b > 0. This distribution over graphs with n vertices is defined as follows. First, we partition the n vertices into two subsets S_1, S_2 with |S_1| = n/2 uniformly at random. Conditional on this partition, we pick the edges independently at random, with probability a/n for pairs of vertices inside the same subset and b/n for pairs in different subsets. Solving the MaxQP SDP for the target matrix described in Eq. (12) with p = (a + b)/(2n) is relevant to solving the planted partition problem [48] and closely related to the model in Eq. (10). We refer to [48] for the details, but roughly speaking the problem is to decide whether a graph was sampled from the Erdős–Rényi distribution with parameter p or from the planted partition with p = (a + b)/(2n). Note that also for the planted partition the matrix in Eq. (12) satisfies the conditions under which we obtain speedups.
In [48, Theorem 3] the authors show that for certain parameter ranges of a, b, solving the MaxQP SDP in Eq. (12) and using its value to decide from which distribution we sampled gives rise to a good test for this problem. As the MaxQP SDP in Eq. (12) can be solved faster with our methods for both the planted partition and the Erdős–Rényi model, we obtain a speedup for this problem.

Previous quantum SDP solvers:
Previous quantum SDP solvers based on primal-dual methods [13,64,63,14] with inverse polynomial dependence on the error do not provide speedups for solving the MaxQP SDP in the worst case, as their complexity depends on a problem-specific parameter, the width of the SDP. We refer to the aforementioned references for a precise definition of this parameter and for the complexity of the solvers under different input models. As shown in [64, Theorem 24], the width parameter scales at least linearly in the dimension n for what they call combinable SDPs [64, Definition 23]. In a nutshell, these are SDP classes for which direct sums of two instances and their constraints yield another valid SDP in the class; we refer to [64] for a precise definition. For our purposes, it suffices to note that the MaxQP class of SDPs is combinable, as shown in [64]. Although the authors only observe that their conditions apply to MaxCut, it is easy to see that their results do not require any assumptions on the sign of the entries of A. Thus, their results show that the MaxQP SDP also admits instances with linearly growing width. To the best of our knowledge, the solvers mentioned above have a dependence that is at least quadratic in the width and at least of order n^{1/2} in the dimension. Thus, the combination of the terms stemming from the width and the dimension already gives a higher complexity than our solver. One reason why we bypass these restrictions is that we do not use the primal-dual approach of the aforementioned references to solve the SDP.
However, these are worst-case bounds for MaxQP, and do not necessarily imply that our algorithm outperforms the aforementioned ones on average on the random instances discussed before. In Prop. B.1 of Appendix B we show that for the random model in (10) with λ > 1, the width indeed scales linearly with the dimension with high probability, albeit under the assumption that the problem does not have an integrality gap. The absence of an integrality gap is supported by the numerics of [48,31]. These results indicate that previous quantum SDP solvers are unlikely to provide a speedup on average for such instances with λ > 1. On the other hand, we also show that for λ < 1, the width does not necessarily scale with system size. These results certainly motivate further studies on the width of such randomized instances, also for the random graph models.
Another, and arguably conceptually more interesting, reason why our algorithm outperforms other solvers is how we enforce the diagonal constraint.This is what also sets us apart from other approaches based only on the primal of the SDP, like that of [63].
Enforcing that each diagonal constraint of the renormalized MaxQP SDP in Eq. (3) is satisfied up to an additive error, i.e.
|⟨i|ρ|i⟩ − 1/n| ≤ ε′, would require an error ε′ of order εn⁻¹ to ensure a solution of quality comparable to ours. This would translate to an additional quadratic factor in n in the runtime of other approaches, as they have a dependency on the error that is at least inverse quadratic, and would negate any speedup. This issue with efficiently implementing the diagonal constraints also arises for other approaches to obtain quantum speedups for SDPs beyond those based on quantum Gibbs states. Let us illustrate this by example. In [33], the authors give a quantum SDP solver whose complexity is O(n^{2.5} ξ⁻² µκ³ log(1/ε)). Here κ and µ are again problem-specific parameters. On the other hand, ξ is the precision to which each diagonal constraint is satisfied. As noted before, a straightforward implementation of the MaxQP SDP requires ξ to be at most of order n⁻¹, which establishes a runtime of order at least n^{4.5} using those methods.
Alternatively, one could also enforce the diagonal constraints globally through the variational formulation of the trace distance. That would translate to requiring that |tr(P(ρ − I/n))| ≤ ε holds for all diagonal projectors P. As there are 2ⁿ such projectors and previous primal-only quantum solvers have a √m dependency on the number of constraints m = 2ⁿ, this approach also fails to provide a speedup. Thus, enforcing the diagonal constraints severely limits the scope of existing quantum SDP solvers: they do not readily apply and have worse runtimes than available classical algorithms. We conclude that all current quantum SDP solvers fail to offer speedups over state-of-the-art classical algorithms; see Table 1 for more details. This discussion showcases that our technique of relaxing the diagonal constraints gives rise to a novel way of enforcing constraints that allows for better error control in quantum SDP solvers and could be used for other relevant SDPs. Moreover, the fact that the approximate solution can still be used to obtain good roundings highlights that our notion of approximate feasibility does not render the problem artificially easy.
Finally, we want to point out that these subtleties regarding error scaling do not arise for MaxCut. If A is the adjacency matrix of a d-regular graph on n vertices, then n‖A‖ = nd = ‖A‖_{ℓ1} and the different errors in Table 1 all coincide.
3 Technical details and proofs

Table 1. Comparison of runtime, error scaling, and speedup: this work (classical and quantum), interior-point methods [44], and matrix multiplicative weights.

Initialization with H₀ = 0 and ρ₀ = I/n is crucial, as it implies that the quantum relative entropy between ρ* and ρ₀ is bounded: S(ρ*‖ρ₀) = log(n) − S(ρ*) ≤ log(n). We will now show that the relative entropy between successive (infeasible) iterates ρ_{t+1}, ρ_t and the feasible state ρ* necessarily decreases by a finite amount. Let P_t be the hyperplane, provided by the oracle, that separates ρ_t from the feasible set.
The logarithmic ratio can be bounded using the Peierls–Bogoliubov inequality. Combining Eq. (13) with Eq. (14), we arrive at a per-step decrease of the relative entropy of order ε². Next, note that the updates are mild in the sense that ρ_{t+1} and ρ_t are close in trace distance [13, Lem. 16] for all iterations t = 0, …, T. We conclude that the relative entropy S(ρ*‖ρ_T) becomes negative as soon as the total number of steps T surpasses 64 log(n)/ε². A contradiction, because the quantum relative entropy is always non-negative.
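The update rule and iteration count above can be condensed into a few lines. The following is a minimal dense NumPy sketch of Hamiltonian Updates (our own illustration, not the paper's optimized implementation): a feasibility version with a target value alpha for the normalized objective, alternating a diagonal-constraint oracle and an objective oracle, and penalizing each violated direction in the exponent with step size ε/16 for at most 64 log(n)/ε² steps.

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian_updates(A, alpha, eps):
    """Feasibility sketch of Hamiltonian Updates: look for a Gibbs state
    rho = exp(-H)/tr(exp(-H)) with ||diag(rho) - I/n||_1 <= eps and
    tr(A rho)/||A|| >= alpha - eps.  Violated directions are penalized in
    the exponent with step size eps/16 for at most 64 log(n)/eps^2 steps."""
    n = A.shape[0]
    Ahat = A / np.linalg.norm(A, 2)
    H = np.zeros((n, n))
    for _ in range(int(np.ceil(64 * np.log(n) / eps**2))):
        rho = expm(-H)
        rho /= np.trace(rho)
        d = np.diag(rho) - 1.0 / n
        if np.abs(d).sum() > eps:                 # diagonal oracle rejects
            H += (eps / 16) * np.diag(np.sign(d))
        elif np.trace(Ahat @ rho) < alpha - eps:  # objective oracle rejects
            H += (eps / 16) * (-Ahat)
        else:
            return rho                            # approximately feasible
    return None                                   # infeasible for this alpha
```

On a toy all-ones objective, the objective oracle repeatedly boosts the all-ones direction of the Gibbs state until the target value is met, while the diagonal stays uniform by symmetry.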

Stability of the relaxed MaxQP SDP
Note that even if Algorithm 1 accepts a candidate point, this point need not be exactly feasible. Theorem 2.1 only asserts that the point is ε-close to all sets of interest with respect to a set of observables. For the MaxQP SDP (3), this means that the output of the algorithm only satisfies the diagonal constraints approximately and, in principle, the value of this further relaxed problem could differ significantly from the original value. In the next proposition we show that this is not the case:

Proposition 3.1. Let α_{ε⁴} = tr(Aρ) be the value attained by an approximately feasible solution ρ (accuracy ε⁴) to the relaxed MaxQP SDP (3) with input matrix A. Then there is a quantum state ρ♯ at trace distance O(ε) of ρ such that nρ♯ is a feasible point of the MaxQP SDP (2). In particular, |tr(Aρ♯) − α_{ε⁴}| = O(ε‖A‖). Moreover, it is possible to construct ρ♯ in time O(n²) given the entries of ρ.
Proof. Let ρ be a solution to the relaxed MaxQP SDP (3) with relaxation parameter ε⁴. We will now construct a state ρ♯ such that nρ♯ is an exactly feasible point of the MaxQP SDP (2). The modifications are mild enough to ensure that the associated SDP value only changes by O(εn‖A‖). We proceed in two steps. (i) ρ → ρ̄: identify the diagonal entries that substantially deviate from 1/n, in the sense that |⟨i|ρ|i⟩ − 1/n| > ε²/n. Subsequently, set all entries in the i-th row and i-th column to zero and replace ρ_ii by 1/n. This ensures that ρ̄ remains positive semidefinite. (ii) ρ̄ → R: replace all remaining diagonal entries by 1/n. This may thwart positive semidefiniteness, but the convex combination ρ♯ = (R + (ε²/n)I)/(1 + ε²) restores this feature. By construction, this matrix is both psd and obeys ⟨i|ρ♯|i⟩ = 1/n for all i ∈ [n]. In words: it is a feasible point of the renormalized MaxQP SDP (3).
We now show that these modifications are mild. To this end, note that ρ̄ is obtained from ρ by zeroing out rows and columns whose diagonal entries deviate substantially from 1/n; approximate feasibility ensures that there are at most O(ε²n) such indices. Moreover, as shown in [34], we have ‖ρ − ρ̄‖_tr = O(ε). Next, note that we obtain R from ρ̄ by replacing all remaining diagonal entries of ρ̄ by 1/n. As, by construction, all the diagonal elements of ρ̄ are in the range 1/n ± ε²/n, we can write R = ρ̄ + D, where D is a diagonal matrix whose entries are in the range [−ε²/n, ε²/n]. Thus, D + (ε²/n)I is psd. Normalizing the trace, we see that ρ♯ = (R + (ε²/n)I)/(1 + ε²) is psd with diagonal entries 1/n and, thus, nρ♯ is a feasible point of the MaxQP SDP (2). We also have ‖R − ρ̄‖_tr ≤ ε² by a triangle inequality. Thus, combining Eq. (18) and Eq. (17), we conclude from another triangle inequality that ‖ρ − ρ♯‖_tr = O(ε). The claim then follows from a (matrix) Hölder inequality: |tr(Aρ) − tr(Aρ♯)| ≤ ‖A‖ ‖ρ − ρ♯‖_tr = O(ε‖A‖). Note that the proof technique above is constructive and allows us to construct a feasible point from an approximately feasible one in O(n²) time by manipulating the entries.
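The repair map in the proof above is simple enough to state directly in code. The sketch below is our own illustration (the helper name `repair_feasibility` is hypothetical): it zeroes out offending rows and columns, resets the diagonal to 1/n, and restores positive semidefiniteness by the convex combination with (ε²/n)I.

```python
import numpy as np

def repair_feasibility(rho, eps):
    """Two-step repair from Proposition 3.1: zero out rows/columns whose
    diagonal entry deviates from 1/n by more than eps^2/n, reset the
    diagonal to 1/n, then mix with (eps^2/n) I to restore psd-ness.
    Returns a state with constant diagonal exactly 1/n."""
    n = rho.shape[0]
    bad = np.abs(np.diag(rho) - 1.0 / n) > eps**2 / n
    rho_bar = rho.copy()
    rho_bar[bad, :] = 0.0                  # step (i): remove offending rows
    rho_bar[:, bad] = 0.0                  # ... and columns
    rho_bar[np.diag_indices(n)] = 1.0 / n  # step (ii): reset the diagonal
    # restore psd-ness: (R + eps^2/n I) / (1 + eps^2)
    return (rho_bar + (eps**2 / n) * np.eye(n)) / (1 + eps**2)
```

The returned matrix has diagonal exactly 1/n, trace one, and is psd by the argument in the proof.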

Approximately solving the MaxQP SDP on a classical computer
We will now show how to use Hamiltonian Updates (Algorithm 1) to solve the MaxQP SDP (3) on a classical computer.It turns out that the main classical bottleneck is the cost of computing matrix exponentials ρ = exp(−H)/ tr (exp(−H)).The following result, also observed in [43], asserts that coarse truncations of the matrix exponential already yield accurate approximations.
Lemma 3.2. Fix a Hermitian n × n matrix H and an accuracy ε, and let l be the smallest even number obeying (l + 1)(log(l + 1) − 1) ≥ 2‖H‖ + log(n) + log(1/ε). Then the truncated matrix exponential T_l = Σ_{k=0}^{l} (−H)^k/k! obeys ‖T_l/tr(T_l) − exp(−H)/tr(exp(−H))‖_tr ≤ ε.

Proof. First note that truncation at an even integer l ensures that T_l is positive semidefinite. This is an immediate consequence of the fact that even-degree Taylor expansions of the (scalar) exponential are non-negative polynomials. In particular, ‖T_l‖_tr = tr(T_l). Combine this with tr(X) ≤ ‖X‖_tr ≤ n‖X‖ for all Hermitian n × n matrices X, as well as tr(exp(−H)) ≥ ‖exp(−H)‖ ≥ exp(−‖H‖), to reduce the claim to controlling ‖T_l − exp(−H)‖. By construction, exp(−H) and T_l commute and are diagonal in the same eigenbasis. Let λ₁, …, λ_n be the eigenvalues of H. Then Taylor's remainder theorem bounds |exp(−λ_i) − T_l(λ_i)| in terms of ‖H‖^{l+1}/(l + 1)!, and the value of l is chosen precisely such that the resulting error bound is at most ε.

Although the dependency of our algorithm on ε is high, we expect that a more refined analysis of the error could improve it significantly. This is because the conversion from approximately feasible to feasible points behind Proposition 3.1 requires ε⁴ accuracy.

Proof. As each run of Algorithm 1 takes at most Õ(ε⁻²) iterations, we only need to implement the oracles in time Õ(n²sε⁻¹) to establish the advertised runtime for an approximate solution. First, note that the operator norm ‖H_t‖ only grows modestly with the number of iterations t = 0, …, T. This readily follows from H₀ = 0 and ‖H_{t+1} − H_t‖ ≤ (ε/16)‖P_t‖ ≤ ε/16. What is more, the maximal number of steps is T = 64 log(n)/ε², implying ‖H_t‖ ≤ 4 log(n)/ε for all t.
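Lemma 3.2 is easy to check numerically. The sketch below (our own) picks the truncation order by the condition in the lemma, builds the even-order Taylor truncation, and compares the normalized result against the exact Gibbs state for a small diagonal H:

```python
import numpy as np
from scipy.linalg import expm

def gibbs_taylor(H, eps):
    """Even-order Taylor truncation of exp(-H), normalized to a state.
    The order l is the smallest even number with
    (l+1)(log(l+1) - 1) >= 2||H|| + log(n) + log(1/eps); an even order
    keeps the truncation positive semidefinite."""
    n = H.shape[0]
    Hnorm = np.linalg.norm(H, 2)
    l = 2
    while (l + 1) * (np.log(l + 1) - 1) < 2 * Hnorm + np.log(n) + np.log(1 / eps):
        l += 2
    T, term = np.eye(n), np.eye(n)
    for k in range(1, l + 1):
        term = term @ (-H) / k          # accumulates (-H)^k / k!
        T = T + term
    return T / np.trace(T)

# trace-distance error against the exact Gibbs state
H = np.diag(np.linspace(-1.0, 1.0, 8))
rho_exact = expm(-H)
rho_exact /= np.trace(rho_exact)
diff = gibbs_taylor(H, 1e-6) - rho_exact
err = 0.5 * np.abs(np.linalg.eigvalsh(diff)).sum()
```

For this ‖H‖ = 1 example, the condition selects a modest truncation order and the trace-distance error lands far below the requested 10⁻⁶.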
In turn, Lemma 3.2 implies that computing the Taylor series of exp(−H_t) up to order O(log(n)/ε) suffices to obtain a matrix ρ̃_t that is (ε/4)-close to the true iterate ρ_t = exp(−H_t)/tr(exp(−H_t)) in trace distance. Now note that the complexity of multiplying any matrix with H_t is O(min{n²s, n^ω}), as H_t is a linear combination of a diagonal matrix and A. Thus, we conclude that computing ρ̃_t takes time O(n²s log(n)/ε). Checking the diagonal constraints then takes time O(n), and computing tr(Aρ̃_t)/‖A‖ takes time O(ns). This suffices to implement both ε-separation oracles and highlights that the runtime is dominated by computing approximations of the matrix exponential.
Finally, we show in Proposition 3.1 that, in order to ensure an additive error of order O(εn‖A‖) for the MaxQP SDP, it suffices to solve the relaxed problem up to an error ε⁴, from which the claim follows; we can then convert the approximately feasible solution into a feasible one in time O(n²).

Approximately solving the MaxQP SDP on a quantum computer
We will now show how to implement ε-separation oracles on a quantum computer. As discussed before, implementing the oracles requires us to estimate the diagonal of the Gibbs state ρ = exp(−H)/tr(exp(−H)) and the value of tr(ρA)/‖A‖. These two tasks can be performed easily on a quantum computer given the ability to prepare approximate copies of the quantum state ρ.

Lemma 3.3. We can implement ε-separation oracles for the MaxQP SDP (3) on a quantum computer given access to O(nε⁻²) copies of a state that is (ε/8)-close to ρ in trace distance, together with the ability to measure the observable A/‖A‖. Moreover, the classical postprocessing time needed to implement the oracles is O(nε⁻²).
Proof. Let ρ̃ be the approximation to ρ. We implement the first oracle by measuring O(nε⁻²) approximate copies ρ̃ of the input ρ in the computational basis. This is enough to ensure that, with high probability, the resulting empirical distribution of the measurement outcomes, p = Σ_i p(i)|i⟩⟨i|, satisfies ‖Σ_i ⟨i|ρ̃|i⟩ |i⟩⟨i| − p‖_tr ≤ ε/8.
If ‖I/n − p‖_tr ≤ 3ε/4, then the oracle for the diagonal constraints accepts the current state. If not, we output the diagonal hyperplane P whose entries are ±1 according to the sign of p(i) − 1/n. To see that this indeed satisfies the definition of the oracle, note that the empirical distribution p is at most ε/4 away in total variation distance from the distribution on the diagonal of ρ: we obtain an ε/8 contribution from the approximation ρ̃ and another ε/8 from statistical noise. Thus, if the oracle accepts, the diagonal of ρ is ε-close to uniform; if it rejects, then by the definition of P the quantity tr(P(ρ − I/n)) certifies a violation and P separates ρ from the feasible set. This step requires a classical postprocessing time of order O(nε⁻²). For implementing the second oracle, we simply measure the observable A/‖A‖ directly. A total of O(ε⁻²) copies of ρ suffices to determine tr(Aρ)/‖A‖ up to precision ε/4 via phase estimation [49].
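The classical postprocessing of the diagonal oracle can be simulated directly. The sketch below (our own illustration; sample counts and thresholds follow the proof above) draws O(n/ε²) computational-basis outcomes from diag(ρ) and either accepts or returns a separating diagonal ±1 hyperplane:

```python
import numpy as np

def diagonal_oracle(rho, eps, rng):
    """Simulate the eps-separation oracle for the diagonal constraints:
    measure O(n/eps^2) copies in the computational basis, accept if the
    empirical diagonal is (3 eps/4)-close to uniform in l1 distance,
    otherwise return a separating diagonal +-1 hyperplane."""
    n = rho.shape[0]
    N = int(np.ceil(8 * n / eps**2))        # number of measured copies
    d = np.real(np.diag(rho))
    counts = rng.multinomial(N, d / d.sum())
    p = counts / N                          # empirical diagonal distribution
    if np.abs(p - 1.0 / n).sum() <= 0.75 * eps:
        return True, None
    return False, np.diag(np.sign(p - 1.0 / n))
```

A maximally mixed input is accepted with high probability, while a state concentrated on one basis vector is rejected with a hyperplane that penalizes that direction.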
Lemma 3.3 reduces the task of implementing the separation oracles to that of preparing independent copies of a fixed Gibbs state. There are many different proposals for preparing Gibbs states on quantum computers [19,23,32,54,60,65,64]. Here, we will follow the algorithm proposed in [54]. This approach reduces the problem of preparing ρ_H = exp(−H)/tr(exp(−H)) to the task of simulating the Hamiltonian H. More precisely, [54, Appendix] shows that Õ(√n ε⁻³) invocations of a controlled unitary U approximating e^{iH} suffice to produce a state that is ε/8-close in trace distance to ρ_H. The probability of failure is constant. We expect that a more refined analysis can lead to a better dependence on the error ε; the methods presented in [64] seem like a good starting point for such future improvements. Here, however, we prioritize the scaling in the problem dimension n.
By construction, the Hamiltonians we wish to simulate are all of the form H = aA/‖A‖ + bD, where a, b = O(log(n)ε⁻¹) and D is a diagonal matrix with bounded operator norm ‖D‖ ≤ 1. It follows from [18, Theorem 1] that Õ(t(a + b) exp(1.6√(log(log(n)tε⁻³)))) separate simulation steps of aA/‖A‖ and bD suffice to simulate H for time t up to an error ε/3. Thus, we further reduce the problem of simulating H to simulating A/‖A‖ and D separately.
At this point, it is important to specify the input model for the matrix A, the problem description of the MaxQP SDP. We will work in the sparse oracle input model. That is, we assume access to an oracle O_sparse that gives the positions of the nonzero entries. Given an index i for a column of A and a number 1 ≤ j ≤ s, where A is s-sparse, the oracle acts as O_sparse|i, j⟩ = |i, f(i, j)⟩. Here, f(i, j) is the index of the j-th nonzero element of the i-th column of A. Moreover, we assume that the magnitudes of the individual entries are accessible by means of another oracle, O_A|i, j, z⟩ = |i, j, z ⊕ (A/‖A‖)_{ij}⟩. Here, the entry (A/‖A‖)_{ij} is represented by a bit string long enough to ensure the desired precision. The results of [45] then show that it is possible to simulate exp(itA/‖A‖) for time t in time Õ(t s^{1/2+o(1)}).
Let us now turn to the task of simulating diagonal Hamiltonians D. Let O_D be the matrix entry oracle for D. We suppose that it acts on Cⁿ ⊗ (C²)^{⊗m}, where m is large enough to represent the diagonal entries to the desired precision in binary, as O_D|i, z⟩ = |i, z ⊕ D_{ii}⟩. It is then possible to simulate H = D for times t = Õ(ε⁻¹) with Õ(1) queries to the oracle O_D and elementary operations [9]. Thus, efficient simulation of e^{−iDt} follows from an efficient implementation of the oracle O_D. The latter can be achieved with a quantum RAM [25].
We consider the quantum RAM model from [55], in which insertions take time Õ(1). Thus, given a classical description of a diagonal matrix D, we may update the quantum RAM in time Õ(n). After the quantum RAM has been updated, we may implement the oracle O_D in time Õ(1). Combining all these subroutines establishes the second main result of this work.

Proof. As we saw before, the ability to solve the relaxed MaxQP SDP (3) up to precision ε̃ = ε⁴ is sufficient to ensure an output with the properties above.
It follows from Lemma 3.3 that producing Õ(nε̃⁻²) copies of the Gibbs state suffices to implement the oracles. The results of [54] then imply that each copy can be obtained with Õ(√n ε̃⁻³) Hamiltonian simulation steps, each of which, as discussed above, can be performed in time Õ(s^{1/2+o(1)} poly(ε̃⁻¹)). Thus, the cost per iteration of the algorithm is Õ(n^{1.5} s^{1/2+o(1)} poly(ε̃⁻¹)). As the algorithm requires Õ(ε̃⁻²) iterations, replacing ε̃ = ε⁴ yields the claim.

Randomized rounding
As pioneered by the seminal work of Goemans and Williamson [27], it is possible to use randomized rounding techniques to obtain an approximate solution to the original quadratic optimization problem (1) for certain instances. These solutions are, in expectation, within a multiplicative factor of the value of the SDP relaxation (3), where the exact constant depends on the structure of the matrix A. We will use Rietz's method, as in [2], to show that it is possible to perform the rounding on a classical computer with our approximately feasible solutions to the MaxQP SDP and still obtain good approximations of ‖A‖_{∞→1}.
First, recall that rounding algorithms usually work by multiplying a random Gaussian vector by the square root of the solution; the approximate solution is then given by the entrywise signs of the resulting vector. Note that both our classical and quantum algorithms output a classical description of the Hamiltonian H associated with an approximately optimal, approximately feasible Gibbs state ρ for (3). Pseudocode for the rounding algorithm is provided in Algorithm 2. The first important proof ingredient is an adaptation of [2, Eq. (4.1)].

Lemma 3.4. Fix non-zero v, w ∈ Rⁿ and let g ∈ Rⁿ be a random vector with standard normal entries. Then E[sign(⟨v|g⟩) sign(⟨w|g⟩)] = (2/π) arcsin(⟨v|w⟩/(‖v‖‖w‖)).

Proof. In [2, Eq. (4.1)] the authors use rotation invariance to establish this identity for two unit vectors. The claim then follows from observing that the distribution of sign(⟨v|g⟩) sign(⟨w|g⟩) is invariant under rescaling both v and w by positive numbers. In particular, v → v/‖v‖ and w → w/‖w‖ does not affect the distribution.
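The identity in Lemma 3.4 is straightforward to verify numerically. The following Monte-Carlo sketch (our own) checks the arcsin formula, including its invariance under rescaling of v:

```python
import numpy as np

# Monte-Carlo check of the sign identity behind Lemma 3.4:
# E[sign<v,g> sign<w,g>] = (2/pi) arcsin(<v,w> / (||v|| ||w||)).
rng = np.random.default_rng(1)
v = np.array([3.0, 0.0, 0.0])   # deliberately not a unit vector
w = np.array([1.0, 1.0, 0.0])
g = rng.standard_normal((200_000, 3))
empirical = np.mean(np.sign(g @ v) * np.sign(g @ w))
exact = (2 / np.pi) * np.arcsin(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))
# here exact = (2/pi) * arcsin(1/sqrt(2)) = 1/2
```

With 2·10⁵ samples, the empirical average matches the exact value to within a few thousandths.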
The next step involves a technical continuity argument.
Lemma 3.5. Fix ε > 0, let ρ be a quantum state, let B = {i : |⟨i|ρ|i⟩ − 1/n| > ε²/n}, and let ρ_{B̄} be the submatrix of ρ with indices in the complement B̄ of B. Then the matrix σ with entries σ_{ij} = (ρ_{B̄})_{ij}/(n√(ρ_{ii}ρ_{jj})) is psd with constant diagonal 1/n.

Submultiplicativity of the operator norm then implies ‖id − D‖_{∞→∞} ≤ 3ε.
We are now ready to prove the main stability result required for randomized rounding.
Theorem 3.1. Let ρ be an approximately feasible, optimal point of (3) with accuracy ε⁴ > 0 and block input matrix A′ = ½ [[0, A], [Aᵀ, 0]], where A is a real n × n matrix. Let v₁, …, v_{2n} be the columns of √ρ, sample g ∈ R^{2n} with i.i.d. standard Gaussian entries and set x_i = sign(⟨v_i|g⟩), y = (x₁, …, x_n), z = (x_{n+1}, …, x_{2n}). Then, in expectation, ⟨y|A|z⟩ recovers n tr(A′ρ) up to a universal multiplicative constant and an additive error of O(εn‖A‖).

Proof. The upper bound follows immediately from the fact that the MaxQP SDP (2) relaxations (renormalized or not) provide upper bounds on the original problem (1). The factor n‖A‖ is an artifact of the renormalization (3).
For the lower bound, we once more define τ_{ij} = ⟨v_i|v_j⟩. Plugging v_i and v_j into (23), multiplying both sides by A_{ij} and summing over i, j yields an expression for the expected rounded value. Following the same proof strategy as in [2, Sec. 4.1], we note that the matrix T defined by [T]_{ij} = τ_{ij} is a Gram matrix and, thus, psd. Moreover, in [2, Sec. 4.1] the author shows how to lower-bound the resulting expression in terms of tr(TA′). Moreover, because of the structure of the matrix A′, the relevant value may be taken non-negative. To see this, consider the block unitary U = I ⊕ (−I). Then, for any psd matrix X, we have tr(A′ U X U†) = −tr(A′X), and so tr(A′ U ρ U†) provides a lower bound on the value over the approximately feasible set. We now have to relate tr(ρA′) to tr(σA′). To do so, we can argue as in Proposition 3.1.

Proposition 3.1 highlights that performing the rounding with approximate solutions to the MaxQP SDP (3) still ensures a good approximate solution, in expectation, for the ‖A‖_{∞→1} norm. In the case of psd matrices A it is possible to improve the constant in the rounding, and we do not need to resort to lifting the problem to a matrix of double the dimension:

Corollary 3.3. Let ρ be an approximately feasible, optimal point of (3) with accuracy ε⁴ > 0 and psd input matrix A. Let v₁, …, v_n be the columns of √ρ, sample g ∈ Rⁿ with i.i.d. standard Gaussian entries and set x_i = sign(⟨v_i|g⟩). Then the expected value E⟨x|A|x⟩ is within a factor 2/π of n tr(Aρ), up to an additive error of O(εn‖A‖).

Proof. The proof follows the same steps as above, noting that we may use the estimate tr(TA) ≥ 0 instead of (24), as both A and T are psd. Optimality of the constant was shown in [2].
As Alon and Naor [2] also show that for psd matrices A the value ‖A‖_{∞→1} is attained with the same sign vector on the left and right, it follows that Corollary 3.3 gives almost optimal rounding guarantees. These two statements certify that, as long as ‖A‖_{∞→1} = Θ(n‖A‖), performing the rounding with our approximately feasible solutions gives rise to approximations of the ∞ → 1 norm that are almost as good as those obtained from strictly feasible solutions.
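For intuition, the psd rounding of Corollary 3.3 can be sketched on a toy planted instance (our own illustration; `round_psd` and `psd_sqrt` are hypothetical helper names), where the rounding recovers the planted optimum exactly:

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric psd square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def round_psd(rho, g):
    """Gaussian sign rounding in the psd setting of Corollary 3.3:
    x_i = sign(<v_i, g>) where the v_i are the columns of sqrt(rho)."""
    V = psd_sqrt(rho)
    return np.sign(V.T @ g)

# toy planted instance: A = x x^T (psd), SDP solution rho = x x^T / n
rng = np.random.default_rng(2)
n = 16
x = rng.choice([-1.0, 1.0], size=n)
A = np.outer(x, x)                  # optimum <x|A|x> = n^2 = 256
rho = np.outer(x, x) / n            # feasible: psd with diag(rho) = 1/n
x_hat = round_psd(rho, rng.standard_normal(n))
value = x_hat @ A @ x_hat           # rounding recovers the planted optimum
```

Since rho is rank one here, every ⟨v_i|g⟩ shares the common factor ⟨x|g⟩ and the rounded vector equals ±x, so the value 256 is recovered regardless of the global sign.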
But computing √ρ g = exp(−H/2)g/√(tr(exp(−H))) directly still remains expensive because of the matrix exponential. We circumvent this bottleneck by truncating the Taylor series of the matrix exponential in a fashion similar to Lemma 3.2. The following standard anti-concentration result for Gaussian random variables will be essential for this argument. Note that the design of Algorithm 1 ensures that optimal Hamiltonians always obey ‖H‖ = O(log(n)/ε).

Proof. Define h = exp(−H/2)g and note that this is a Gaussian random vector with covariance matrix exp(−H). Let B = {i : |⟨i|ρ|i⟩ − 1/n| > ε²/n} denote the set of indices for which ρ_ii deviates substantially from 1/n. Every entry of h not contained in this index set has variance ⟨i|exp(−H)|i⟩ comparable to tr(exp(−H))/n, and the assumption ‖H‖ = O(log(n)/ε) ensures that these variances are at least inverse-polynomial in n. We can combine this with Fact 3.1 (Gaussian anti-concentration) to conclude that each such entry is unlikely to be very small in magnitude, and a union bound extends this to all entries outside B simultaneously. Moreover, it follows from standard concentration arguments that the entries are also bounded from above with high probability. Thus, with probability at least 1 − O(n⁻¹), every entry |h_i| with i ∉ B is polynomially bounded away from zero. Following the same proof strategy as in Lemma 3.2, it is easy to see that picking l = O(ε⁻¹ log(n)) suffices to ensure that the truncated series approximates h entrywise to a higher precision. Conditioning on the events emphasized above, this in turn ensures that the signs of the truncated vector agree with those of h on all entries outside B.
Combining the statements we just proved, we conclude:

Proposition 3.2 (Restatement of Proposition 2.1). Let ε > 0 and a real, psd matrix A be given. Moreover, let H be the solution Hamiltonian of the relaxed MaxQP SDP (3) with error parameter ε⁴ and α* its value. Then, with probability at least 1 − n⁻¹, the output x of Algorithm 2 satisfies (25) up to an additive error of O(εn‖A‖).

Proof. It follows from Lemma 3.6 that, with probability at least 1 − n⁻¹, the output of Algorithm 2 differs from the vector obtained by performing the rounding with the approximate solution only on a set of size O(ε²n). This is because, as argued before, by picking accuracy ε⁴ we have at most O(ε²n) diagonal entries that do not satisfy |ρ_ii − 1/n| ≤ ε²/n. We will now argue that sign vectors that differ in O(ε²n) positions can differ in value by at most O(εn‖A‖). Let x be the vector obtained by the ideal rounding and x̃ the one obtained with the truncated Taylor series. Then, by our assumption, there exists a vector e with at most O(ε²n) nonzero entries, each bounded by 2 in magnitude, such that x̃ = x + e. By Cauchy–Schwarz, |⟨x|A|e⟩| ≤ ‖A‖‖x‖‖e‖. Now, as x is a sign vector, ‖x‖ = √n and, as e has at most O(ε²n) nonzero entries, ‖e‖ = O(ε√n), and we conclude |⟨x̃|A|x̃⟩ − ⟨x|A|x⟩| = O(εn‖A‖). As Theorem 3.1 asserts that performing the rounding with the approximate solution is enough to produce a sign vector that satisfies (25) in expectation, this yields the claim.
The analogous claim, i.e. that truncating still gives rise to good solutions, also holds in the setting of Proposition 3.1.
Thus, we conclude that the rounding can be performed in time Õ(ns) on a classical computer, as multiplying a vector with H takes time Õ(ns) and we only need to perform these operations for a number of steps that is logarithmic in the problem dimension n (but polynomial in the inverse accuracy 1/ε). As ns ≤ n^{1.5}√s for s ≤ n, we conclude that the cost of solving the relaxed MaxQP SDP (3) dominates the cost of rounding.
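The matrix-exponential-free rounding step can be sketched with plain matrix-vector products (our own illustration; the truncation order follows the O(ε⁻¹ log n) choice discussed above):

```python
import numpy as np

def taylor_sqrt_gibbs_apply(H, g, eps):
    """Apply the truncated Taylor series of exp(-H/2) to g using only
    matrix-vector products; the signs of the result feed the rounding."""
    n = H.shape[0]
    l = int(np.ceil(np.log(n) / eps)) + 4   # truncation order O(log(n)/eps)
    h, term = g.copy(), g.copy()
    for k in range(1, l + 1):
        term = -0.5 * (H @ term) / k        # accumulates (-H/2)^k g / k!
        h = h + term
    return h

# sanity check: for diagonal H, exp(-H/2) rescales each entry by a
# positive factor, so the rounded signs must coincide with sign(g)
rng = np.random.default_rng(5)
H = np.diag(np.linspace(-2.0, 2.0, 10))
g = rng.standard_normal(10)
x = np.sign(taylor_sqrt_gibbs_apply(H, g, 0.5))
```

For sparse H, each of the O(ε⁻¹ log n) matvecs costs O(ns), matching the Õ(ns) total rounding cost stated above.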

Conclusion and Outlook
By adapting ideas from [62,30,43,14], we have provided a general meta-algorithm for approximately solving convex feasibility problems with psd constraints. Hamiltonian Updates is an iterative procedure based on a simple change of variables: represent a trace-normalized, positive semidefinite matrix as X = exp(−H)/tr(exp(−H)). At each step, infeasible directions are penalized in the matrix exponent until an approximately feasible point is reached. This procedure can be equipped with rigorous convergence guarantees and lends itself to quantum improvements: X = exp(−H)/tr(exp(−H)) is a Gibbs state and H is the associated Hamiltonian. Quantum architectures can produce certain Gibbs states very efficiently.
We have demonstrated the viability of this approach by considering semidefinite programming relaxations of quadratic problems with binary constraints (the MaxQP SDP (2)). The motivation for considering this practically important problem class was two-fold: (i) MaxQP SDPs have received considerable attention in the (classical) computer science community; powerful meta-algorithms, like matrix multiplicative weights [6], have been designed to solve these SDPs very quickly. (ii) So far, quantum SDP solvers [13,64,63,14,33] have failed to provide speedups for MaxQP SDPs. The quantum runtime associated with these solvers depends on problem-specific parameters that scale particularly poorly for MaxQP SDPs. Moreover, the notions of approximate feasibility championed in these other works are too loose for this class of problems.
The framework developed in this paper has allowed us to address these points. Firstly, we have shown that a classical implementation of Hamiltonian Updates already improves upon the best existing results: a runtime of Õ(n²s) suffices to find an approximately optimal solution. Secondly, we have shown that quantum computers offer additional speedups: a quantum runtime of Õ(n^{1.5} s^{0.5+o(1)}) is sufficient. We emphasize that this is the first quantum speedup for MaxQP SDP relaxations. Subsequently, we have devised a classical randomized rounding procedure that converts both quantum and classical solutions into close-to-optimal solutions of the original quadratic problem.
We note in passing that our algorithm is very robust, in the sense that it only requires the preparation of Gibbs states up to a precision that can be taken to be constant in the number of qubits.This requirement is combined with other simple tasks like computational basis measurements and the ability to estimate the expectation value of the target matrix on states.Although the subroutines used in this work to perform these tasks certainly require nontrivial quantum circuits, it would be interesting to identify classes of target matrices A for which preparing the corresponding Gibbs state and estimating the expectation values is feasible on near-term devices.
We believe that the framework presented here lends itself to applications.
One concrete application of Hamiltonian Updates, in particular the idea of treating constraints as the statistics of measurements, is quantum state tomography; see e.g. [8] and references therein. Sample-optimal tomography results have revealed that classical postprocessing is the main bottleneck for reconstructing density matrices [21,51,39,29,28]. A classical implementation of Hamiltonian Updates allows for optimizing postprocessing costs at the expense of a worse dependence on accuracy [22]. Further improvements are possible by executing the algorithm on a quantum computer, giving a quantum speedup for quantum state tomography.
Another promising and practically relevant application is binary matrix factorization. A recent line of work [38,37] reduces this problem to a sequence of SDPs. Importantly, each SDP corresponds to a MaxQP SDP (2) with a random rank-one objective A = |a⟩⟨a| and an additional affine constraint tr(PX) = n. Here, P is a fixed low-rank orthoprojector. This application, however, is likely to be more demanding in terms of approximation accuracy. Hence, improving the runtime scaling in the inverse accuracy will constitute an important first step that is of independent interest.
All inequalities are tight up to constants. The above inequalities highlight that it is a priori not clear what the correct scaling for errors approximating the cut norm should be. The goal of this section is to show that, for random matrices A with independent, standardized entries of bounded fourth moment, n‖A‖ reproduces the correct error behavior, while ‖A‖_{ℓ1} does not.
Proposition A.1 (Cut norm of random matrices). Let A be an n × n random matrix whose entries are sampled independently from a real-valued distribution α that obeys E[α] = 0, E[α²] = 1 and E[α⁴] = O(1). Then E‖A‖_{ℓ1} = Θ(n²), E‖A‖_{∞→1} = Θ(n^{3/2}) and E[n‖A‖] = Θ(n^{3/2}).

Proof. We refer to Latała's work [42] for the third claim. A key ingredient for establishing the second claim is [26, Corollary 3.10], which controls E‖A‖_{∞→1} in terms of the expected sum of the Euclidean norms of the columns of A. Now, note that the entries of A are i.i.d. copies of the random variable α. In turn, the expected sum of column norms of A is just n times the expected Euclidean norm of the random vector a = (a₁, …, a_n)ᵀ, where each a_i is an independent copy of α. Jensen's inequality asserts E‖a‖ ≤ (E‖a‖²)^{1/2} = √n, while a matching lower bound follows from considering y = ‖a‖²/n and noting that this random variable obeys E[y] = 1 and E[(y − 1)²] = O(1/n) by assumption. This ensures the matching lower bound E‖a‖ = Ω(√n) and establishes the second claim.
The first claim follows from the fact that the fourth-moment bound E[α⁴] = O(1), together with E[α²] = 1, demands E[|α|] = Θ(1). Combine this with the i.i.d. entries of the random matrix A to conclude E‖A‖_{ℓ1} = n² E[|α|] = Θ(n²). In the case of random matrices with Gaussian entries, such as in the SK model, we also have exponential concentration around these expectation values, as shown in [53, Theorem 1.2].
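These scalings are easy to probe numerically. The sketch below (our own) checks them for standard normal entries, for which E|α| = √(2/π) and the operator norm concentrates near 2√n:

```python
import numpy as np

# Numerical check of the scalings in Proposition A.1 for standard normal
# entries: the entrywise l1 norm grows like n^2 * E|alpha| = n^2 sqrt(2/pi),
# while the operator norm grows like 2 sqrt(n), so n ||A|| ~ 2 n^{3/2}.
rng = np.random.default_rng(3)
n = 400
A = rng.standard_normal((n, n))
l1 = np.abs(A).sum()              # entrywise l1 norm, ~ n^2 sqrt(2/pi)
op = np.linalg.norm(A, 2)         # spectral norm, ~ 2 sqrt(n)
```

At n = 400, both quantities land within a few percent of their asymptotic predictions, illustrating the gap of order √n between ‖A‖_{ℓ1} and n‖A‖.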
Another family of random matrices for which we expect n‖·‖ to provide the correct error scaling for cut norms are matrices of the form B = A*A, where A again has i.i.d. entries of mean 0 and unit variance. Indeed, in [56] the authors show that n‖B‖ and ‖B‖_{∞→1} agree up to constant factors with high probability. One can combine these recent results with more standard relations between the cut norm and the ∞ → 1 norm.

B Random instances of the MaxQP SDP
The following random instances of the MaxQP SDP have received significant attention in the recent literature [48,40,31]. We define the Gaussian orthogonal ensemble (GOE) to be the random matrix distribution over symmetric n × n matrices A with independent normal entries A_ii ∼ N(0, 2/n) on the diagonal and A_ij ∼ N(0, 1/n) for i < j. For a parameter λ ≥ 0, we also define the deformed GOE as B(λ) = (λ/n)11ᵀ + A, where 1 = (1, …, 1)ᵀ ∈ Rⁿ is the all-ones vector and A is sampled from the GOE. This random matrix model is intimately connected to the Sherrington–Kirkpatrick model and to determining the MaxCut of random graphs. We will adopt the notation from [48] and write MaxQP(B(λ)) for the optimal value of the MaxQP SDP with input B(λ). In [48], Montanari and Sen showed the following interesting result (Theorem B.1): for 0 ≤ λ ≤ 1, the value MaxQP(B(λ))/n concentrates around 2 with high probability.
This seemingly abstract theorem has profound implications for our work. To appreciate them, it is worth noting that for 0 ≤ λ ≤ 1 the maximal eigenvalue of B(λ) is also contained in the interval [2 − ε, 2 + ε] with high probability [35]. Thus, the optimal value of MaxQP is comparable in size to the rescaled largest eigenvalue n‖B(λ)‖. Moreover, it also follows that for these instances ‖B(λ)‖_{ℓ1} = Ω(MaxQP(B(λ)) √n) in expectation.
On the other hand, if λ > 1, the largest eigenvalue of B(λ) is given by λ + λ⁻¹ [35]. We see that both the largest eigenvalue and the value of MaxQP(B(λ)) go through a phase transition at λ = 1.
Let us now focus on the case λ < 1. Note that the dual of the MaxQP SDP with target matrix A is given by optimizing over y ∈ R^{n+1} as follows:

minimize n y_0 + Σ_{i=1}^{n} y_i subject to y_0 I + diag(y) ≥ A, y_i ≥ 0, (26)

where diag(y) = diag(y_1, . . ., y_n) denotes the diagonal matrix with entries y_i for 1 ≤ i ≤ n. The additional dual variable y_0 arises from also incorporating the redundant constraint tr(X) ≤ n into the associated primal SDP (3). This choice is motivated by the observation that previous quantum SDP solvers [14,64,63] actually output approximately optimal solutions for the dual SDP with this redundant constraint. Note once again that [63] also proposes a primal-only algorithm that bears strong similarities to ours. However, their version of the algorithm does not readily admit a quantum speedup because of the way they implement the diagonal constraints. We refer to Sec. 2.5 for more details.
Theorem B.1 implies that we can always find a trivial feasible point that is approximately optimal and sparse for Eq. (26). Indeed, for any fixed ε > 0, γ = (2 + ε)e_0 will be feasible with probability 1 in the limit n → ∞. We conclude that for fixed ε > 0 and in the regime λ < 1, solving the dual problem is trivial. So, existing solvers that output a feasible, approximately optimal dual solution [14,64,63] for ε of constant order are of little practical interest for the problem at hand. In stark contrast, feasible and approximately optimal solutions of the primal problem are still relevant, because they can be used to perform the rounding.
It is also important to note that the proof of [48, Theorem 5] is constructive. Indeed, let P_δ denote the projector onto the range of the best rank-δn approximation of B(λ), that is, the subspace spanned by the eigenvectors corresponding to the largest δn eigenvalues of B(λ). Moreover, let D with (D)_ii = (P_δ)_ii be the restriction of this projector to the main diagonal. By construction, (D)_ii = (P_δ)_ii > 0 almost surely for all 1 ≤ i ≤ n. In turn, X = D^{−1/2} P_δ D^{−1/2} must be a feasible point of the primal MaxQP SDP (2). Montanari and Sen then show that tr(B(λ)X) ≥ 2 − ε for some ε = Ω(δ). This establishes the first part of Thm. B.1. In summary, diagonalizing B(λ) and computing X is sufficient to obtain an approximately optimal primal solution. Suppressing the error dependence on ε, diagonalizing B(λ) takes O(n^ω) time, while our classical algorithm to solve the MaxQP SDP takes the same up to polylogarithmic factors. The quantum runtime, however, is only of order n^{2+o(1)}. Thus, we see that for ε = Θ(1) we obtain a quantum speedup as soon as the exponent of matrix multiplication obeys ω > 2 (which is widely believed).
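The constructive proof sketched above is easy to mirror numerically. The following is a minimal sketch (the function name is ours, not the paper's) of the feasible point X = D^{−1/2} P_δ D^{−1/2}; by construction diag(X) = 1 and X is positive semidefinite:

```python
import numpy as np

def montanari_sen_primal(B, delta):
    """Feasible point X = D^{-1/2} P_delta D^{-1/2} for the primal MaxQP SDP,
    where P_delta projects onto the top ceil(delta * n) eigenvectors of B and
    D is the diagonal restriction of that projector."""
    n = B.shape[0]
    k = max(1, int(np.ceil(delta * n)))
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    V = eigvecs[:, -k:]                    # top-k eigenvectors
    P = V @ V.T                            # rank-k projector P_delta
    d = np.diag(P)                         # (almost surely) positive diagonal
    dinv_sqrt = 1.0 / np.sqrt(d)
    return dinv_sqrt[:, None] * P * dinv_sqrt[None, :]
```

The diagonal rescaling is exactly what turns the projector into a feasible SDP point: X_ii = (P_δ)_ii / (D)_ii = 1 by construction, while conjugation by a positive diagonal matrix preserves positive semidefiniteness.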
Let us now discuss the regime where λ > 1. To the best of our knowledge, the limit value of MaxQP(B(λ))/n has not been identified yet. Ref. [48], however, shows that it must be strictly larger than 2 by constructing a sequence of feasible points that saturates such a lower bound. However, in contrast to before (λ < 1), it is not clear that this sequence of feasible points is approximately optimal. In fact, it is not even known whether MaxQP(B(λ))/n < λ + λ⁻¹, but numerical evidence in favor of this behavior is provided in [31]. That is, the optimal value of the SDP appears to be strictly smaller than the trivial eigenvalue upper bound. Thus, if it is indeed the case that MaxQP(B(λ))/n + µ < λ + λ⁻¹ for some µ > 0 as n → ∞, then the dual SDP must be nontrivial to solve. Still, a direct solution of the primal problem is arguably more relevant, because it can be used to perform randomized rounding. Nevertheless, we will now argue that previous quantum solvers [14,64,63] do not give rise to a speedup for the dual problem, assuming that MaxQP(B(λ))/n + µ < λ + λ⁻¹.
Before we move on, we once more emphasize that our algorithm considers the primal problem only. This is in stark contrast to existing quantum SDP solvers that address both primal and dual problems. This fully primal approach has the advantage that the runtime of the algorithm does not depend on problem-specific parameters like the ℓ1-norm of approximately optimal dual solutions, as mentioned in Sec. 2.5. We will now show that this becomes a real advantage for solving the MaxQP SDP for B(λ) in the regime λ > 1.
To see this, note that the value of the dual SDP is monotonically increasing in the entries of y, which are all nonnegative. We will now show that in order for the matrix inequality

y_0 I + diag(y) ≥ B(λ) (28)

to hold, the vector y ∈ Rⁿ must have large ℓ1-norm. To do so, we resort to an approximate leading eigenvector construction from [48]. This construction has the desirable property that it is not too "spiky". In turn, this approximate leading eigenvector has small overlap with each individual entry of the diagonal matrix diag(y).
We will make extensive use of the results of [48], so we also follow their notation and normalizations for this proof. Define u_1 to be the eigenvector corresponding to the largest eigenvalue of B(λ)/n. Moreover, define the "capping" function R(x) as

For some ε > 0, Montanari and Sen then define the vector ϕ componentwise as ϕ_i = R(√n u_{1,i}). In [48, Lemma G.2], they then establish

On top of that, in Eq. (163) they show that

We can now use the vector ϕ to probe positive semidefiniteness in Eq. (28). Rearranging the terms produces

Thus, we can pick ε small and n large enough to ensure that the right-hand side of the inequality above is of constant order (recall that ∥B(λ)∥ does not depend on n). In contrast, the average n⁻¹ Σ_{i=1}^n y_i is of constant order if and only if ∥y∥_1 = Ω(n).
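The spirit of this probing argument can be illustrated in code. The sketch below substitutes a hypothetical clipping function for the capping function R (whose exact definition is given in [48]; the clip threshold and the name `psd_probe` are ours). It evaluates the quadratic form ⟨ϕ|(y_0 I + diag(y) − B)|ϕ⟩, which must be nonnegative for any feasible dual pair (y_0, y):

```python
import numpy as np

def psd_probe(B, y0, y, cap=2.0):
    """Probe the dual constraint y0*I + diag(y) >= B with a capped leading
    eigenvector phi_i = R(sqrt(n) * u_{1,i}). Here R is a stand-in,
    R(x) = clip(x, -cap, cap). A negative return value certifies that
    (y0, y) is dual-infeasible."""
    n = B.shape[0]
    u1 = np.linalg.eigh(B)[1][:, -1]          # leading eigenvector
    phi = np.clip(np.sqrt(n) * u1, -cap, cap)  # "not too spiky" probe vector
    return y0 * (phi @ phi) + np.sum(y * phi**2) - phi @ B @ phi
```

Since ϕ has bounded entries, the diagonal term Σ_i y_i ϕ_i² can only compensate the eigenvalue term if the average of the y_i is of constant order, which is the mechanism behind the ∥y∥_1 = Ω(n) lower bound above.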
As the primal-dual methods of [14,64,63] have a superquadratic dependency on ∥y∥_1 for approximately optimal dual solutions, we conclude that, with high probability, their performance is worse than that of our algorithm for instances of the MaxQP SDP with B(λ) for λ > 1. Of course, this only holds provided that the typical value of the SDP is a constant fraction away from the spectrum of B(λ), as indicated by numerical evidence.

C Comparison to previous work and techniques for further improvement
This section is devoted to giving a brief overview of some promising proposals for speeding up SDP solvers for problems with a similar structure. The main message is that, unfortunately, these do not immediately apply to the general MaxQP SDP setting, especially for random signed matrices.
The main classical bottleneck behind Algorithm 1 is computing matrix exponentials. Dimension reduction techniques, like Johnson-Lindenstrauss, can sometimes considerably speed up this process, see e.g. [6]. There, Arora and Kale apply this idea to solve the MaxCut SDP up to an additive error of O(εnd) in time Õ(nd) for a d-regular graph on n vertices. Moreover, sparsification techniques [4] can be used to bring this complexity down to Õ(n) in the adjacency list model and Õ(min(nd, n^{1.5}d^{−1})) in the adjacency matrix input model. Note that the MaxCut SDP is just an instance of the MaxQP SDP, as both have the same constraints. The only difference is that for the MaxCut SDP the target matrix is the weighted adjacency matrix of a graph and, thus, has non-negative entries. This extra assumption of non-negative entries is a key ingredient behind the fastest approximate MaxCut SDP solvers, which would otherwise outperform the main results of this work. It is therefore worthwhile to discuss why these ideas do not readily extend to more general problem instances.
First, note that the non-negativity of the entries of the target matrix is crucial for the soundness of the oracle presented in [6, Theorem 5.2]. This already rules out the possibility of directly applying their methods to MaxQP if the matrix A has negative entries. The second crucial observation of [6] is that it is possible to rewrite the MaxCut SDP as:

In this reformulation, the vectors v_i correspond to the columns of a Cholesky decomposition associated with feasible points: [X]_{ij} = ⟨v_i, v_j⟩. Next, recall the following variation of the polarization identity:

2⟨v_i, v_j⟩ = ∥v_i∥² + ∥v_j∥² − ∥v_i − v_j∥².

This allows us to rewrite the original objective function as

tr(AX) = (1/2) Σ_{i,j} A_{ij} (∥v_i∥² + ∥v_j∥² − ∥v_i − v_j∥²).

Feasibility of X then demands 1 = ⟨i|X|i⟩ = ⟨v_i, v_i⟩ = ∥v_i∥², and we thus only need to optimize over the distances ∥v_i − v_j∥². Subsequently, Arora and Kale apply dimensionality reduction techniques to compute approximate vectors v_i, v_j that satisfy

in time O(ns). A priori, similar techniques can be applied to the more general MaxQP SDP (3). However, sign problems can substantially affect the approximation error. Pointwise estimates like the one in (33) only suffice to estimate tr(XA) up to an error of order O(ε∥A∥_{ℓ1}). This is fine for matrices with non-negative entries, where this error scaling is comparable to the size of the optimal SDP solution. Matrix entries with different signs, however, may lead to cancellations that result in a much smaller size of the optimal SDP solution. In summary: adapting the ideas of Arora and Kale [6] is advisable in situations where the problem matrix obeys ∥A∥_{ℓ1} = Θ(n∥A∥). This ensures a correct error behavior, and dimension reduction then allows for reducing the classical runtime to Õ(ns).
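A minimal sketch of the dimension-reduction step (our naming; a plain Gaussian Johnson-Lindenstrauss map, not the specific construction of [6]) illustrates the pointwise guarantee: inner products of the Cholesky vectors are preserved up to an additive ε that shrinks with the sketch dimension s:

```python
import numpy as np

def jl_sketch_gram(V, s, rng=None):
    """Johnson-Lindenstrauss sketch of a Gram matrix. V is n x r with rows
    v_i; project to s dimensions with a random Gaussian map. Pairwise inner
    products <v_i, v_j> are preserved up to an additive error that decays
    like O(sqrt(log(n)/s)) pointwise."""
    rng = np.random.default_rng(rng)
    G = rng.normal(scale=1.0 / np.sqrt(s), size=(V.shape[1], s))
    W = V @ G                    # sketched vectors, one per row
    return W @ W.T               # approximate Gram matrix
```

As discussed above, a uniform pointwise error ε on the Gram entries translates into an error of order ε∥A∥_{ℓ1} on tr(XA), which is only benign when ∥A∥_{ℓ1} is comparable to n∥A∥.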
Another important technique for complexity reduction in SDPs is sparsification. Once again, a seminal example is MaxCut, where spectral sparsification methods can be used to reduce the complexity [58,41]. Here, the idea is to find a (usually random) sparser matrix B that has approximately the same cut value as A and then run the algorithm on B instead. Unfortunately, signed matrix entries once again render this approach problematic. To the best of our knowledge, the best sparsification results currently available for the ∞ → 1 norm are those of [26, Chapter 3]. There, the author shows in Corollary 3.9 that if we let B be a random matrix with independent random entries such that E[B_ij] = A_ij, then

Figure 1: Caricature of Hamiltonian Update iterations in Algorithm 1: schematic illustration of the intersection of three convex sets: (i) a halfspace (blue), (ii) a diamond-shaped convex set (red), and (iii) the set of all quantum states (clipped circle). Algorithm 1 (Hamiltonian Updates) approaches a point in the convex intersection (magenta) of all three sets by iteratively checking feasibility (left column), identifying a separating hyperplane (central column), and updating the matrix exponent to penalize infeasible directions (right column).

≥ 2 whenever

by a triangle inequality, as desired. A similar argument shows that we also have tr(P(ρ − I/n)) ≥ (3/4)∥I/n − p∥_tr. Indeed, we have:

tr(P(ρ − I/n)) = tr(P(p − I/n)) + tr(P(ρ̄ − p)) + tr(P(ρ − ρ̄)),

where D is the linear map given by D(X) = D_B X D_B and D_B is a |B̄| × |B̄| diagonal matrix with entries (√(nρ_ii))⁻¹ for i ∈ B̄. This implies

∥ρ_B̄ − σ_B̄∥_tr = ∥(id − D)(ρ_B̄)∥_tr ≤ ∥id − D∥_{tr→tr} ∥ρ_B̄∥_tr ≤ ∥id − D∥_{tr→tr},

because ∥ρ_B̄∥_tr ≤ ∥ρ∥_tr = tr(ρ) = 1. Duality of norms and the fact that both id and D are self-adjoint with respect to the Frobenius inner product tr(XᵀY) implies ∥id − D∥_{∞→∞} = ∥id − D∥_{tr→tr}. This allows us to bound ∥id − D∥_{∞→∞} instead. By construction, all entries of D_B lie in 1 ± ε. Write D_B = I + D′, where D′ is a diagonal matrix with entries that are bounded by ε in absolute value. Then, (id − D)(X) = −(D′X + XD′ + D′XD′) for any matrix X.
Lemma 3.1. Suppose Algorithm 1 does not terminate after T = 64 log(n)/ε² + 1 steps. Then, the feasibility problem (5) is infeasible.

Proof. By contradiction. Suppose there exists a feasible point ρ* in the intersection of all m+1 sets and that we ran the algorithm for T steps. Instantiate the short-hand notation ρ_t = ρ_{H_t} = exp(−H_t)/tr(exp(−H_t)) for the t-th state and Hamiltonian in Algorithm 1.
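To make the update rule concrete, here is a stripped-down illustrative sketch, not the full Algorithm 1: it enforces only the diagonal ("diamond-shaped") constraint diag(ρ) ≈ 1/n and omits the halfspace constraint on tr(Aρ). The iteration cap follows the T of Lemma 3.1; the step size ε/8 and all function names are our illustrative choices:

```python
import numpy as np

def gibbs_state(H):
    """rho = exp(-H) / tr(exp(-H)) via an eigendecomposition of symmetric H."""
    w, U = np.linalg.eigh(H)
    e = np.exp(-(w - w.min()))          # shift eigenvalues for stability
    return (U * (e / e.sum())) @ U.T

def hamiltonian_updates_diag(H0, eps):
    """Hamiltonian-Updates-style loop for the diagonal constraint alone:
    check feasibility, pick the separating hyperplane P = diag(sign(dev)),
    and penalize infeasible directions in the matrix exponent."""
    n = H0.shape[0]
    T = int(64 * np.log(n) / eps**2) + 1
    H = H0.copy()
    for _ in range(T):
        rho = gibbs_state(H)
        dev = np.diag(rho) - 1.0 / n
        if np.abs(dev).sum() <= eps:    # feasible for the diagonal set
            return rho
        H = H + (eps / 8.0) * np.diag(np.sign(dev))
    return None                         # no feasible point found within T steps
```

The loop mirrors the caricature of Figure 1: each iteration either certifies feasibility of the current Gibbs state or moves the matrix exponent a small step along a separating hyperplane, and Lemma 3.1 bounds how long this can go on before infeasibility may be declared.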