Warm-starting quantum optimization

There is an increasing interest in quantum algorithms for problems of integer programming and combinatorial optimization. Classical solvers for such problems employ relaxations, which replace binary variables with continuous ones, for instance in the form of higher-dimensional matrix-valued problems (semidefinite programming). Under the Unique Games Conjecture, these relaxations often provide the best performance ratios available classically in polynomial time. Here, we discuss how to warm-start quantum optimization with an initial state corresponding to the solution of a relaxation of a combinatorial optimization problem and how to analyze properties of the associated quantum algorithms. In particular, this allows the quantum algorithm to inherit the performance guarantees of the classical algorithm. We illustrate this in the context of portfolio optimization, where our results indicate that warm-starting the Quantum Approximate Optimization Algorithm (QAOA) is particularly beneficial at low depth. Likewise, Recursive QAOA for MAXCUT problems shows a systematic increase in the size of the obtained cut for fully connected graphs with random weights, when Goemans-Williamson randomized rounding is utilized in a warm start. It is straightforward to apply the same ideas to other randomized-rounding schemes and optimization problems.

Recent work has improved the original QAOA, for instance, by aggregating only the best sampled candidate solutions [15] and carefully choosing the mixer operator to improve convergence [26][27][28][29], empirically. Reinforcement learning [30,31], multi-start methods [32], and local optimization [33] help navigate the QAOA optimization landscape. Algorithms such as the Hamiltonian Variational Ansatz produce optimization landscapes that are easier to navigate [34]. Furthermore, optimal β and γ values concentrate on all typical instances generated by some reasonable distributions which may allow optimization strategies with fewer calls to the quantum computer [35]. Certain local classical algorithms match the performance of QAOA for Ising-like cost functions with multi-spin interactions [36] which has motivated the development of Recursive-QAOA (RQAOA) [24,37]. RQAOA iteratively reduces the problem size and outperforms QAOA on certain forms of Ising Hamiltonians [24]. Implementing QAOA on noisy quantum hardware is challenging as the number of gates can be high for current gate fidelities [38,39]. The circuits become especially deep when large p is required or when the native hardware connectivity does not match the problem structure, thence requiring SWAP gates [40]. Therefore, in the near term, quantum computers will most likely only run low-depth QAOA. Low-depth QAOA results are improved by robust control [41] and by mapping β and γ to parameters of the control pulses [42,43], a method available to cloud-based quantum computers [44] with pulse-level control [45,46]. binatorial optimization problems. The best-known continuous relaxations of MAXCUT and many other problems take the form of semidefinite programs [48]. These can be solved efficiently both in theoretical models of computation [49], where a real-number arithmetic operation can be performed in unit time, and in practice 1 [51]. Subsequently, the solution of a continuous relaxation of a combinatorial optimization problem is transformed into a good solution of the discrete-valued problem by randomized rounding [52]. For instance, the celebrated Goemans-Williamson (GW) random hyperplane rounding [53,54] for MAX-CUT finds cuts whose expected value is an α fraction of the global optimum, for 0.87856 < α < 0.87857, with the expectation over the randomization in the rounding procedure. The Unique Games Conjecture [55][56][57], introduced in Appendix A, suggests that GW randomized rounding has the best possible polynomial-time performance on MAXCUT.
Our work is motivated by the desire to provide at least as good guarantees for QAOA as there are for classical approximations. For example, we show that for MAXCUT a warm-start can preserve the GW approximation ratio at any depth p [58]. Further, if the Unique Games Conjecture were to be false even stronger guarantees may be available, improving upon those for randomized rounding. In simulations, our variant of QAOA consistently performs as well as the GW algorithm or better.
We discuss how to warm-start quantum optimization in Sec. 2.2. We explore warm-starting QAOA (WS-QAOA) numerically in Sec. 3 by relaxing Quadratic Unconstrained Binary Optimization problems to continuous ones which provide QAOA with a good initial solution. In Sec. 4 we use the GW algorithm [53] to warm-start RQAOA. We discuss our results and conclude in Sec. 5.
2 Warm-start Quantum Optimization

Preliminaries
Quadratic Unconstrained Binary Optimization (QUBO) has been studied in Combinatorial Optimization since the 1960s [59]. A common formulation is where x is a vector of n binary decision variables, Σ ∈ R n×n a symmetric matrix, and µ ∈ R n a vector. Since for binary variables x 2 i = x i , µ can be added to the diagonal of Σ, and in the following, we only add µ when it simplifies the notation in the given context. Considering that any mixed-integer linear program can be encoded in a QUBO [60], QUBO is NP-Hard. Indeed, even checking local optimality is NP-Hard [61], and hence only very special cases [62,63] can be solved in polynomial time.
If Σ is positive semidefinite, the trivial continuous relaxation of QUBO is a convex quadratic program and the optimal solution c * of the continuous relaxation is easily obtainable with classical optimizers [64].
If Σ is not positive semidefinite, one can apply the well-known recipe [65] to obtain another continuousvalued relaxation, known as semidefinite programming (SDP): where S n×n denotes the set of n × n symmetric matrices, e is an n-vector of ones, and Y 0 denotes that Y must be positive semidefinite. Given the optimal solution Y * to (SDP), there exist several approaches to generating solutions of the corresponding (QUBO), often with approximation guarantees, as discussed later in this section and Appendix B. A classical laptop can solve instances of (SDP) relaxations of QUBO, where Σ has 10 13 entries [51]. Furthermore, quantum computers offer the prospect of some speed-ups in solving SDPs [66,67], although recent quantum-inspired algorithms for SDPs may reduce the potential speedup [68].

Continuous warm-start QAOA
The solutions of either continuous-valued relaxation (QP or SDP) can be used to initialize quantumclassical hybrid algorithms, which is known as warmstarting them [69]. In particular, we focus on warmstarting QAOA.
In QAOA, each decision variable x i of the discrete optimization problem corresponds to a qubit by the relation x i = (1 − z i )/2. Each z i is replaced by a spin operatorẐ i to transform the cost function to a cost HamiltonianĤ C [70,71]. Note that the final measurement in QAOA can be considered as a randomized rounding. In the simplest variant of WS-QAOA, we replace the initial equal superposition state |+ ⊗n with a state which corresponds to the solution c * of the relaxed Problem (QP). Here,R Y (θ i ) is a rotation around the Y-axis of qubit i with angle θ i = 2 arcsin c * i and c * i ∈ [0, 1] is the i-th coordinate of the optimum of the continuous-valued relaxation (QP). The probability to measure |1 in qubit i is thus c * i , see the geometric representation in Fig. 1.
We also replace the mixer and hasR Y (θ i ) |0 as ground state with eigenvalue of −1. The ground state ofĤ (ws) M is thus |φ * with eigenvalue −n [16]. Therefore, WS-QAOA applies at layer k a mixing gate which is given by the time-evolved mixing Hamiltonian exp(−iβ kĤ If a coordinate in the optimal solution of a continuous relaxation is c * i = 0 or c * i = 1, qubit i would be initialized in state |0 or |1 , respectively. In such cases, the qubit will remain in its initial state throughout the QAOA optimization whenĤ C contains onlyẐ iẐj and identity spin-operators. This creates a reachability issue when the optimal continuous and discrete solutions do not overlap, i.e., d * i = 1 and c * i = 0 or To mitigate this effect, we introduce a variant of WS-QAOA that utilizes a regularization parameter ε ∈ [0, 0.5] and changes the rotation angle creating the initial state according to The mixer Hamiltonian is adjusted accordingly. The parameter ε provides a continuous mapping between WS-QAOA and standard QAOA since at ε = 0.5 the initial state is the equal superposition state and the mixer Hamiltonian is the X operator. If all c * i ∈ (0, 1) or ε > 0, WS-QAOA converges to the optimal solution of (QUBO) as the depth p approaches infinity as does standard QAOA [16]. This directly follows from the adiabatic theorem and the fact that we start in the ground state of the mixer which overlaps with all computational basis states including the optimal solution. For large enough p, (WS-)QAOA therefore reproduces the adiabatic evolution transforming the ground state of the mixer into the ground state of H C . The speed of the adiabatic evolution is limited by the spectral gap of the intermediate Hamiltonians. If the evolution is too fast transitions to excited states occur which may not result in the optimal solution. The speed of the evolution can be related to p, where a slow evolution, i.e., longer total evolution time, implies a larger p. The idea of (WS-)QAOA is to speed-up this evolution by optimizing the parameters instead of following a fixed annealing schedule.

Rounded warm-start QAOA
Further variants of WS-QAOA randomly round the optimum of the continuous-valued relaxation before using it as the initial state. This is appealing to quantum hardware with limited qubit numbers as even for convex relaxations in dimensions that scale super-linearly with the number n of binary variables in (QUBO), such as (SDP) with dimension n(n+1)/2, the representation of the rounded solution to (QUBO) Table 1: An overview of design choices in warm-starting a quantum optimization algorithm for (QUBO). Under "What to round?", columns are ordered left to right to suggest the increasing strength of the relaxations, although this is necessarily fraught in the case of hierarchies of relaxations [72][73][74], where one column represents a potentially infinite number of relaxations. Similarly, under "How to round?", we order the options approximately by their performance.

Variant
What to round? When to round? How to round?
We now elaborate on the example of Goemans-Williamson random-hyperplane rounding of (SDP). For a given GW cut we generate an initial state using Y -rotations with a ε ∈ (0, 0.5), as discussed in Sec. 2.2. To warm-start QAOA such that we can retain the GW bound on MAXCUT, we wish create a quantum circuit that can both represent solutions of the random-hyperplane rounding as well as deviate from them. We therefore modify the mixer such that i.e., we multiply the offdiagonal elements in (2) by −1. With this modification, the value of the regularization parameter ε can be set to 0.25 to generate states that differ from the GW rounding as well as retain it by choosing β 1 = π/2 and γ 1 = 0. At these values the depth-one variational form reduces tô for each qubit, and creates the states −i |1 and −i |0 when c * i = ε and 1 − ε, respectively. Thus, the variational form can recover the solution given by the GW rounding, considering that z and 1 − z represent the same cut, see the orange path in Fig. 1 as example. Therefore, WS-QAOA is at least as good as GW rounding. This adjustment also comes with a drawback. Since the prepared initial state is no longer an eigenstate of the mixer (otherwise we would not be able to deviate from it) we cannot use the same arguments as in [16] to derive the convergence of the al-gorithm to the global optimum with increasing depth p. We will analyze this numerically in Sec. 4.
Notice that measuring an initial state provided by a randomized rounding of the semidefinite programming relaxation (SDP) yields the best approximation guarantees available classically in polynomial time under the Unique Games Conjecture [55][56][57]. Therefore, any quantum circuit that preserves or improves the performance ratio would preserve or improve the overall performance guarantees.
Rounding in the classical pre-processing readily leads to the warm-started recursive QAOA (WS-RQAOA), illustrated in Fig. 3 and demonstrated in Sec. 4. For MAXCUT of a graph G n , we leverage a GW pre-solver GW (G n , N, M ) to generate N good cuts of which we retain the M < N best unique cuts. These M cuts therefore initialize M WS-QAOA optimizers with ε ∈ (0, 0.5). Each QAOA solver produces an optimized variational state |ψ * l = 2 n −1 i=0 α il |i for l = 1, ..., M . We then aggregate these M variational states by averaging the probability of sampling each bit-string |i , i.e.p i = M −1 M l=1 |α il | 2 , and use these average probabilities to create the correlation matrix M needed by RQAOA [24,37], see Appendix E and F. At each iteration, RQAOA removes one decision variable z i from the problem by replacing it with sign( This generates a new MAXCUT problem with a new graph G n−1 , see Appendix G, for which we repeat this procedure, illustrated in Fig. 3, until the reduced graph reaches a certain size n stop . The graph G nstop is solved by diagonalizing the HamiltonianĤ C or by applying classical optimizers. End RQAOA, replacements e.g. {z 1 : (z 4 , −1), z 2 : (z 3 , 1), ...}

Yes No
Solve G n−1 Figure 3: WS-RQAOA for MAXCUT. At each iteration we run several WS-QAOAs that are initialized with different solutions from the GW randomized rounding. The resulting samples are aggregated to compute the combined correlation matrix M needed by RQAOA to eliminate a decision variable.

Further variants of warm-starting quantum optimization
In Sec. 2.2 and 2.3, we gave first examples of how to warm-start QAOA using a continuous relaxation and a randomized rounding. The key algorithm-design questions in warm-starting quantum optimization are: what to round, when to round it, and how to round it. For each of these questions, there are multiple options available, as suggested in the previous discussion and summarized in Tab. 1. First, there are many options for what to round, outside of the (QP) relaxation and the (SDP) relaxation. For example, the (QP) relaxation can be seen as a second-order cone programming (SOCP) relaxation, and could be strengthened iteratively [74], until its objective-function value coincides with the objective-function value of the non-convex (QUBO), albeit at the cost of an exponential growth of the relaxation. Similarly, one could strengthen the (SDP) relaxation either by using an entropy-penalizing term [76] or by using the Moment/SOS hierarchy [72] and its sparse variant [73].
Second, there are two options for when to round: either in the classical pre-processing -within the initial state preparation which leads to the WS-RQAOA discussed in Sec. 2.3 on the example of the (SDP) relaxation -or within the quantum circuit. In its simplest form, the latter can be a quantum measurement, as discussed in Sec. 2.2 on the example of the (QP) relaxation.
Third, there are several options for the rounding procedure. Even the simplest rounding mechanisms often perform well: on MAXCUT, for example, disregarding the relaxation and coordinate-wise assigning a value uniformly at random achieves a 0.5 approximation ratio [77] and can be derandomized [78,Chapter 6]. The random-hyperplane rounding of GW [53], as explained in Appendix B, improves the performance ratio on MAXCUT to α = 2 π min 0≤θ≤π θ 1−cos θ ≈ 0.878. The same ratio can also be obtained with an iterative procedure that rounds coordinates that are close to being integral to integers [79,80] and removes them from further processing 2 , as explained in Appendix D. Plausibly, the same ratio could also be achieved with a number of novel and very different iterative procedures, such as [81].

Discussion of warm-starting quantum optimization
On noisy quantum hardware it seems appealing to use the WS-RQAOA with the strongest available relaxations [72][73][74] in the classical pre-processing. However, higher-order relaxations within these hierarchies [72][73][74] require a run-time of the classical SDP solver which is super-polynomial in the number n of integral decision variables in (QUBO) and the order in the hierarchy [72][73][74]. Therefore, we limit ourselves to the use of WS-RQAOA with the basic (SDP) relaxation, whose value can be approximated classically to any fixed precision in polynomial time.
In contrast, one could extend the use of the continuous-valued solution c * of the (QP) relaxation to either the solution Y * of the basic (SDP) relaxation, or its strengthened variants [72,73], when preparing the initial state. However, this may require more qubits than would be practical in the nearterm. For example, a naïve approach to prepare the initial state would utilize Θ(n 2 ) and Ω(n 2 ) qubits to represent the optimum Y * of the basic (SDP) relaxation and its strengthened variants, respectively 3 . At the same time, strong performance guarantees would be readily available for such variants of warm-started quantum optimization. For example, consider representing the matrix-valued solution of a (SDP) relaxation in a O(n 2 )-qubit initial state, applying a parametrized quantum circuit that allows for the identity in the unitary representation, at least for some choice of its parameters, and then, measuring the qubit register. This can be seen as a randomizedrounding algorithm, and one can hence analyze the quality of the measured output.
A recently-proposed [79] avenue for the analysis of such randomized-rounding algorithms utilizes the Sticky Brownian Motion [82], a well-known concept in Stochastic Analysis, possibly with a slowdown due to the use of a speed-function [79], as explained in Appendix D. In the case of a (SDP) warm start, one can obtain approximation guarantees for rounded solutions that match the best guarantees available classically in polynomial time, see Appendix D.
A particularly important question is whether any of these variants would strictly improve upon the guarantees of GW [53]. Under the Unique Games Conjecture [83,84], it is strictly impossible to improve upon the guarantees of GW using either quantum or classical algorithms unless a quantum computer can solve NP-Hard problems in polynomial time, which is not believed to be the case [85], or if P = NP. However, a richer picture emerges if this conjecture were to be false and it may be possible to improve approximation ratios using both classical and quantum algorithms.

Simulations with Continuous-Valued Warm-start
As a first computational illustration of WS-QAOA, we solve combinatorial-optimization problems framed as a financial-portfolio optimization with a budget constraint [15]. An optimal portfolio minimizes risk and maximizes return by exploiting imperfect correlations in a covariance matrix Σ between n assets with expected returns µ [86]. Selecting B assets out of n with equal weights thus requires solving where q controls the risk-return trade-off. We create random instances of this problem with n = 6 assets by simulating the time-series of the asset prices and computing the covariance matrix and returns, see Appendix H. We enforce a budget constraint B = 3 with a large quadratic penalty term λ(1 T x − B) 2 where we chose λ = 3 as it is much larger than Σ and µ. Each instance is mapped to an Ising HamiltonianĤ C . To measure the performance of standard and warm-start QAOA we compute the energy of the optimized trial state ψ(β * , γ * )|Ĥ C |ψ(β * , γ * ) labeled as E * cold and E * warm , respectively. We normalize E * cold and E * warm to the minimum energy E 0 found by diagonalizingĤ C . Since the state-vector simulator in Qiskit [87] evaluates the quantum circuits the only source of randomness is the initial guess for β and γ chosen uniformly from ±2π given to the COBYLA optimizer we use to find β * and γ * . The optimal solution c * of the continuous relaxation of the problem used to warmstart QAOA is found with IBM ® ILOG ® CPLEX ® 12.10.0 (CPLEX). The probability of sampling the optimal binary solution d * is more than 5 times higher Equal superposition Warm-start -normal mixer Warm-start Figure 4: (a) Probability to sample the optimal state |d * from the optimized trial state |ψ * and (b) energy of |ψ * for warm-start and standard QAOA at different depths for n = 6 assets and q = 2. The optimal discrete and continuous solutions are d * = (0, 0, 1, 1, 1, 0) and c * (0.17, 0, 0.97, 0.73, 1.0, 0.14), respectively. QAOA is run ten times with different initial random guesses for (β, γ) chosen uniformly from ±2π. The thick lines show the median of the ten runs while the shaded areas indicate the 25% and 75% quantiles. The gray dashed line shows E0.
with WS-QAOA then standard QAOA for the simulated depths 1 ≤ p ≤ 5, see Fig. 4(a). Furthermore, the quality of the solution found by WS-QAOA is better than standard QAOA since E * warm is closer to E 0 than E * cold , see Fig. 4(b). At depth p ≥ 4 standard QAOA has enough free parameters to satisfy the budget constraint, as shown by the low energy in Fig. 4(b), but still fails to produce a trial state which contains the optimal solution with high probability.
We investigate the role of the warm-start mixer op-eratorĤ  Fig. 4(b). The probability of sampling the optimal discrete solution is between warm-start and standard QAOA but depends heavily on the initial point given to COBYLA, see Fig. 4(a). These results further justify the use of the modified mixer in WS-QAOA.
To further illustrate the advantage of a warmstart at low depth we solve 250 random portfolio instances with warm-start and standard QAOA, both at depth p = 1. Here, the standard QAOA produces variational states that poorly approximate the ground state, see the histogram of E * cold in Fig. 5(a). However, WS-QAOA produces optimized variational states that are much closer to the minimum energy of each problem Hamiltonian. Furthermore, we find that WS-QAOA tends to produce better solutions when the overlap d * T c * /B between the optimal solutions (a) Histogram of the energy of the optimized trial states |ψ(β * , γ * ) with (orange) and without (blue) warm-start normalized to E0. We found |ψ(β * , γ * ) with COBYLA seeded with random initial guesses for β and γ chosen uniformly from ±2π. The minimum energy E0 is found by direct diagonalization. (b) Energy difference of WS-QAOA with the optimal solution, i.e. ∆warm = E * warm − E0, normalized to the energy difference obtained with standard QAOA, i.e. ∆ cold = E * cold − E0, as a function of the overlap between the optimal solution of the problem with binary weights and continuous weights. ∆warm/∆ cold < 1 implies that WS-QAOA improved the energy of the trial state and ∆warm/∆ cold = 0 implies that WS-QAOA found the optimal portfolio. to the discrete and relaxed problems is closer to 1, see Fig. 5(b).
We now illustrate the advantage of a warm-start in the context of quantum annealing. We simulate annealing by Trotterizing the time-evolution start-ing form the ground state of the mixer Hamiltonian. At time step k ∈ {0, ..., N } we apply the operator exp(−iβ kĤM ) exp(−iγ kĤC ) where the annealing schedules are β k = 2δt(1 − i/N ) and γ k = 2iδ t /N with δt = T /N . We compute the energy as function of T , which controls the total annealing time, and N using the equal-superposition mixer n−1 i=0X i and the warm-start mixerĤ (ws) M . Warm start requires less annealing time T and a smaller number of steps N than the equal superposition mixer to minimize the energy, as can be seen by comparing (a) and (b) in Fig. 6. This is further confirmed by computing the energy at each time step for T = 40 and N = 60. When the warm-start mixerĤ (ws) M is used, the initial and final energies are lower than when the equal superposition mixer is used. These data emphasize, from an annealing point of view, that WS-QAOA converges faster than QAOA.

Simulations with Rounded Warm-Start
Next, we discuss warm-starting QAOA for MAX-CUT. The maximum cut of an edge-weighted graph G = (V, E) with nodes V , edges E, and weights ω ij , {i, j} ∈ E is a partitioning of the set of nodes V in two such that the sum of the edge weights ω ij where i and j are in different parts is maximized. This is cast as where the binary variable z i indicates which side of the cut node i is on. In the case of positive edge weights ω ij , for any , the problem cannot be approximated within the ratio of 16/17 − classically in polynomial time [88], unless P = NP. In the case of the real-valued edge-weights ω ij , the hardness factor is 11/13 − [75]. In both cases, under the Unique Games Conjecture [55][56][57], the best guarantees obtainable classically in polynomial time are those of the random-hyperplane rounding [53,54,75], as we detail in Appendices B and C.
Since we cannot relax MAXCUT to a (QP) with a positive semi-definite matrix, without a projection onto the cone of positive-semidefinite matrices, we now warm-start QAOA with a binary solution obtained using the GW rounding. Here, the variational form can only produce states different from the initial GW cut when the regularization parameter ε > 0. We study the effect of ε by minimizing the energy of depth-one WS-QAOAs applied to ten fully connected graphs with 30 nodes and edge weights uniformly chosen from {−10, −9, ..., 0, ..., 9, 10}. For each graph we generate ten GW cuts and study the five best cuts with WS-QAOA. To find the optimal β 1 and γ 1 we seed COBYLA with an initial point obtained as follows. We perform a grid search in γ 1 with 25 equally spaced values from −2π to 2π. For each value of γ 1 we evaluate the energy for the two points β 1 ∈ {3π/4, 3π/8} and fit the result to a sin(2β 1 ) + b sin(4β 1 ) to find the β 1 that minimizes the energy without having to perform a twodimensional grid scan. The (β 1 , γ 1 ) with the lowest energy is given as initial point to COBYLA. At ε = 0 the median energy, normalized to the energy of the maximum cut, is 0.907 and corresponds to the energy of the GW cuts used to warm-start QAOA, see Fig. 7. As ε increases, the normalized energy decreases. However, around ε = 0.15 the median energy starts to increase, and for ε = 0.25 rises beyond the energy of the GW cut to 0.929, which suggests that warm-starting quantum optimization may lead to algorithms which can outperform the GW randomized rounding. This increase in ε is not observed when the mixer from Eq. (2) is used, see Appendix I.
Next, we illustrate the WS-RQAOA algorithm outlined in Sec. 2.3 at depth one by searching for the maximum cut of arbitrary graphs with n = 20, and n = 30 nodes. Two types of graphs are solved, one where each edge appears with a p E = 1/2 probability and has a 1/2 probability of having a positive or negative unit weight. The second type of graphs are fully connected with uniformly distributed edge weights sampled from {−10, −9, ..., 0, ..., 9, 10}. We expect that finding the maximum cut for the fully connected graphs will be harder than those with p E = 1/2 [89] and that the resulting QAOA circuits will be deeper as they have more edges [90]. For each graph size and type we randomly generate 100 graphs. At each iteration a GW pre-solver generates N = 10 cuts of which we select the best M = 5 unique cuts to warm-start five QAOA solvers with a depth p = 1 and ε = 0.25, chosen based on the results from Fig. 7. We chose a low N to avoid systematically giving QAOA the maximum cut, see Appendix B. This only holds for the small graphs with which we illustrate WS-RQAOA. For larger graphs we would, however, choose a much larger N as GW cuts can be efficiently generated. The standard and warm-start depth-one RQAOA algorithms are efficiently simulated by computing the correlations Ẑ iẐj at each iteration, see Appendix F. The parameters β 1 and γ 1 are optimized with COBYLA which is initialized with a good initial point obtained from a grid search, as discussed above, to avoid localminima. When a graph reaches n stop = n/2 nodes we diagonalize the corresponding Hamiltonian to find the maximum cut of this reduced problem. Together with the replacements from RQAOA we obtain an approximation of the maximum cut of the original graph with n nodes. We compare WS-RQAOA with standard RQAOA.
Our simulations indicate that WS-RQAOA outper- forms standard RQAOA, see Fig. 8, and that the number of maximum cuts found decreases with graph size. Fully connected graphs with 30 nodes are the hardest to solve among the graphs we consider. Still, for the graphs in Fig. 8, we observe that the optimal β * 1 and γ * 1 are systematically found. This indicates that when warm-start and standard RQAOA fail, it is because the depth-one variational form is not versatile enough to capture the correlations in the maximum cut. At ε = 0.25 we often observed that β * 1 = π/2 and γ * 1 = 0, see Fig. 9. We therefore benchmark WS-RQAOA against a classical recursive optimization procedure, where the average correlation matrix of the five best GW cuts is used to eliminate decision variables in each iteration, similarly to RQAOA, see black bars in Fig. 8. This classical algorithm performs better than standard RQAOA, but slightly worse than WS-RQAOA.
We now investigate WS-QAOA for p > 1 in a non-recursive setting. Since the efficient algorithm outlined in Appendix F is not valid for p > 1 we solve a small, fully connected graph with six nodes and edge weights uniformly distributed in {−10, −9, ..., 0, ..., 10}, see Fig. 9(a). The maximum cut of this graph has size 27. By comparing the energy landscape E(β 1 , γ 1 ) of a depth-one and depththree WS-QAOA initialized with the cut 001111, of size 23, we observe that the optimal trial state of deeper variational forms is no-longer the initial GW cut, see Fig. 9(b-d). We study the probability of sampling the maximum cut as a function of p by running 30 WS-QAOAs each with a random initial β and γ chosen uniformly from [0, 2π] and ±2π, respectively, for depths p = 1, ..., 6. The probability to sample the maximum cut for this graph increases with p while the energy of the optimized trial state decreases, see  Max. cut counts   E(β, γ) and correspond to the best point in Fig. 10. The inset in (d) is a zoom of (c) around the optimal point (orange dot) found by COBYLA. ory since the circuit at depth p + 1 can reproduce all states of the depth p variational form while being more flexible. Since the energy landscape is nonconvex and contains many local minima it is challenging to find globally optimal parameters starting from random guesses of β and γ [32,[91][92][93]. Even at depth-one with random initial guesses for β 1 and γ 1 COBYLA does not always find the optimal β 1 = π/2 and γ 1 = 0, see Fig. 9 and Fig. 10(b). The complexity of the energy landscape, even for this six-node graph, may therefore explain why the energy of the optimized trial state decreases slowly with p.  Figure 10: (a) Probability of sampling the maximum cut of the six-node graph shown in Fig. 9(a) from the optimized trial state and (b) its energy as a function of QAOA depth p. The shaded areas indicate the 25% and 75% quantiles of 30 runs represented as small dots. Their medians are the large dots. The green triangle is the energy of the depth-one trial state with β1 = π/2 and γ1 = 0.

Discussion and Conclusion
We hope to have contributed towards a framework for the design of quantum optimization algorithms with a warm start, and towards reasoning about their properties. Currently, these algorithms can achieve the same guarantees as the classical relaxations upon which they are based. If the Unique Games Conjecture is true, these guarantees cannot be improved upon by classical or quantum algorithms running in polynomial time, unless we can solve NP-Hard problems in polynomial time. However, if this conjecture is false then both quantum and classical algorithms may be able to improve the existing performance guarantees.
An implementation of WS-QAOA is available in Qiskit [94], the open-source software development kit for working with quantum computers. Our simulations show that warm-starting quantum heuristics provides an advantage at low depth. This is particularly important for dense optimization problems intended to be solved on noisy quantum hardware that struggles to implement deep quantum circuits. The portfolio optimization simulations indicate that WS-QAOA finds better solutions than standard QAOA. Here, future work could investigate tying budget constraints into the quantum circuit of WS-QAOA [95].
We have also demonstrated how to continuously transform WS-QAOA to conventional QAOA using the regularization parameter ε. We also showed how to improve QAOA for MAXCUT using the GW algorithm to warm-start RQAOA, albeit by introducing an inconsistency between the mixer and the initial state. By using a grid scan at depth one, we mitigated the effect of local optima. We applied WS-RQAOA at depths p > 1 on a single graph. The results suggest that the algorithm provides better solutions as p increases. Future work may extend these simulations to more graphs with different sizes. Further work could also exploit other possible warm-starts, e.g., based on polynomially-solvable special cases [62,63], where one could for example consider low-rank approximations of Σ or the SDP [96] as well as analysis of the convergence properties when using a modified mixer that does not have the initial state as eigenstate. One may also investigate RQAOA in the context of the continuous warm-start discussed in Sec. 3. Furthermore, we may also warm-start quantum optimization algorithms from candidate solutions obtained with classical solvers such as CPLEX or GUROBI with a timelimit termination criterion.
We expect warm-start to be applicable to other problems within Combinatorial Optimization and Integer Programming, for which a good solution can be found through randomized rounding [52], possibly following an encoding into a QUBO [70,71,97], a mixedinteger linear optimization problem [60], or a polynomial unconstrained binary optimization problem [90]. Indeed, both the recipe to obtain SDP relaxations [65] and the analytical tools of Appendix D are applicable to linearly constrained problems equally well. For example, the particle-hole representation for VQE can be seen as a form of warm-start [98]. We anticipate that WS-QAOA is also applicable to other binary optimization problems for which an approximate solution can easily be found using relaxed versions of the problem, without the use of randomized rounding, albeit more research needs to be done in this direction. ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies. The current list of IBM trademarks is available at https://www.ibm.com/legal/copytrade. Jakub Mareček's research has been supported by the OP RDE project CZ.02.1.01/0.0/0.0/16_019/0000765 Research Center for Informatics. One month after releasing our work Ref. [96] appeared on the arXiv.

A The Unique Games Conjecture
We now summarize the Unique Games Conjecture (UGC) without presenting any original material. Inapproximability results suggest that finding an approximate solution of a certain problem-dependent approximation ratio is no easier than finding the optimal solution. The PCP theorem [99][100][101] shows that in a constraint satisfaction problem, the fraction of satisfiable constraints is NP-Hard to approximate within some constant factor. In particular, for a constraint satisfaction problem with at most k variables per constraint, it suggests there is a constant 0 < α < 1, such that it is NP-Hard to decide whether either all constraints are simultaneously satisfiable or whether every assignment satisfies fewer than an α fraction of the constraints. We refer to [102,Chapters 18 and 19] for an excellent overview.
By building on the PCP theorem Khot suggested the Unique Games Conjecture [103] which can be formulated with two-prover one-round games as well as unique label cover problems. Here, we state the UGC with the unique label cover problem as it relates to MAXCUT [55]. In a unique label cover problem there is a bipartite graph G = (V, W, E) with partition V , W and edges E, an alphabet M , and a bijection π e : M → M for every edge e ∈ E. We Conjecture 1 (Unique Games Conjecture [103]). For arbitrarily small constants ζ, δ > 0, there exists a constant k = k(ζ, δ) such that it is NP-Hard to determine whether a unique label cover instance with the label set size k, i.e. k = |M |, has optimum at least 1 − ζ or at most δ. Under the UGC a canonical semidefinite programming relaxation of a constraint satisfaction problem provides the best possible approximation ratio [55,104,105]. Nevertheless, the UGC itself has been neither proven nor disproven, despite much effort. The closest to proving the UGC, namely Khot et al. [106], essentially proved the so-called 2-to-2-Games Conjecture, which is a specialisation of the UGC to finite fields on two elements. Independently, [107,108] have shown that there is a subexponential time approximation for the original Unique Games problem, which utilizes a hierarchy of LP [108] or SDP [107] relaxations. This seems to point to the full UGC being true, but does not prove it yet.
Research in quantum complexity is still ongoing. The EPR paradox [109] and the Tsirelson problem [110] can be seen as two-prover one-round games in which the provers may share entanglement while the verifier and all communication are classical. In this spirit, Kempe et al. [84] showed that the "unique games with entangled provers" is false by rounding an SDP relaxation of the value of a unique game with entangled provers and with more than two possible answers, in polynomial time. The "games quantum PCP conjecture", where one wishes to distinguish between the cases when the provers of a game with a classical verifier have a strategy using entanglement that succeeds with probability 1, or when no such strategy succeeds with probability larger than 1/2, has been shown to be true under randomized reductions [111]. By contrast, the "constraint satisfaction" variant of the quantum PCP conjecture [112], which considers constant-factor approximations to the minimal energy of a local Hamiltonian normalized to 1, remains open.

B Goemans-Williamson Algorithm
The GW algorithm [53] first solves the continuous relaxation of MAXCUT with positive edge weights ω ij , where the decision variables v i are n-dimensional vectors with unit Euclidean norm instead of binary variables z i ∈ {−1, 1}.
We denote this v i ∈ S n with S in plain font, in contrast to S n for symmetric matrices. The relaxation (7) is efficiently solvable as a semidefinite programming problem [48] to get the optimal vectors v * i . Next, the GW algorithm generates a cut by selecting a vector r uniformly at random on the unit sphere and assigning z i = sign(r T v * i ) for each node, where the sign function returns 1 for non-negative inputs and -1 elsewhere. That is, the rounding depends on which side of the hyperplane (defined by r) passing through the origin the node lies.
Informally speaking, cuts generated in this way are guaranteed to be on average 87.9% of the size of the maximum cut [53], when averaging over the choice of the random hyperplane in the case of the positive edge weights. Formally, Proposition 2 (Based on Theorem 3.1 in [53]). The expected value, with respect to the random hyper-plane defined by the vector r, of the cut size W generated by rounding of the MAXCUT SDP relaxation (7) is: where W * denotes the value of the maximum cut and the hardness factor is Further, conditional on the Unique Games Conjecture [55][56][57], this is the best possible guarantee that can be obtained by any classical algorithm in polynomial time.

C Extensions towards QUBO
MAXCUT is a special case of (QUBO) 4 . The GW performance ratio is valid only for MAXCUT, as the dotted-dashed green line shows the fraction of graphs for which the maximum cut was found. It is harder to find the maximum cut on fully connected uniform random graphs than random graphs with pE = 1/2. special case of (QUBO) [75], and likewise the constants in the inapproximability results. One can, however, encode most problems in combinatorial optimization into a so-called constraint satisfaction problem [75,104,113], for which there is an well-known SDP relaxation and a subsequent rounding procedure [75,104,113]. Likewise, one can derive optimal inapproximability results conditional on the Unique Games Conjecture [55][56][57]. See, for example, Figure 2 of [56].
For example, for MAXCUT with real-valued edge weights [75], which actually generalises the QUBO we have presented, as it does not assume Σ is symmetric, we have: Proposition 3 (Based on Lemma 6 in [75]). Let w stand for the total weight of edges in a MAXCUT instance, where it is NP-Hard to decide whether the optimal cut is larger or equal than k or less than αk, where α is the hardness factor (9) for MAXCUT. Then for every > 0 it is NP-Hard to distinguish instances of QUBO with optimum greater or equal to 2k − w from instances of QUBO whose optimum is at most 2αk − w. The ratio of these two bounds on the optimum is Moreover, the optimum hardness factor is achieved by the randomized rounding of an SDP relaxation [75].
This can be used to prove the inapproximability results for MAXCUT with real weights [75], both conditional and independent of the Unique Games Conjecture.
We illustrate the performance of GW on the random graphs with 30 nodes used in Sec. 4. For each graph we generate N cuts with GW and normalize them to the maximum cut which is found with CPLEX. We next calculate the minimum, maximum, and average size of these N cuts for each graph. Finally, we average the minimum, maximum, and average of the 100 graphs, see Fig. 12. The average is stable at 85.2% and 83.7% for the random graphs with p E = 1/2 and the fully connected graphs, respectively, see Fig. 12(a) and (b). These averages are slightly below the GW approximation ratio 5 . When N > 100 the maximum cut for more than 80 of the 100 graphs in Fig. 12(a) is found which is why we chose N = 10 in Sec. 4. When the graphs are fully connected the GW algorithm does not find as many maximum cuts. For instance, 61 maximum cuts are found at N = 10 5 for fully connected graphs, see Fig. 12(b).

D A Stochastic-Analysis Viewpoint
Many randomized rounding procedures can be seen from the viewpoint of stochastic analysis: one obtains random unit vectors u 1 , . . . , u n ∈ S n and produces signs σ 1 , . . . , σ n ∈ {−1, 1}. In a natural view of [79], the sign is extracted when an associated stochastic process {u T i B(t)} t≥0 first reaches {−1, 1}, where {B(t)} t≥0 is a Brownian motion in R n adapted to the filtration {F t } t≥0 . The corresponding Sticky Brownian Motion ∀ i ∈ {1, . . . , n} is where This can be extended to "Slowed-down" Sticky Brownian Motion [79,80]. Considering first a speed function ϕ : and ∀ s ∈ (−1, 1), ϕ(s) > 0 (14) and second a stochastic process {W ϕ u (t)} (u,t)∈R n ×[0,∞) that satisfies: 5 This could be seen as an instance of a phase transition [114], beyond which the problem becomes computationally difficult for classical algorithms running in polynomial time. Notice that the MAX-2-SAT of [114] is closely related to MAXCUT [115]. and dW ϕ u (t) = ϕ W ϕ u (t) u T dB(t), (16) one obtains, under mild assumptions [79,80], For the "Slowed-down" Sticky Brownian Motion [79,80], one can show: Proposition 4 (Based on [79]). For u ∈ R n and for α > 0. Then, We note that the constant in (19) is not exactly the constant of the Goemans-Williamson [53] work. (See also Appendix B.) However, one can consider a different speed-function ξ to obtain the GW constant. In particular, by seeing the processes as Krivine diffusions, one can obtain: Proposition 5 (Based on [80]). For u ∈ R n and where Φ : R → R is the standard Gaussian cumulative distribution function, i.e., Then, Compare this to the statement of Proposition 2, noting that arccos(t)+arcsin(t) = π/2 for −1 ≤ t ≤ 1. The proof relies in seeing the process as discrete-time Krivine diffusions [80] and applying Theorem 3 of [80].

E Recursive QAOA
RQAOA [24] is a recursive algorithm to find the ground state of an Ising HamiltonianĤ n = i,j J i,jẐiẐj + k J kẐk with J i,j , J k as arbitrary real coefficients and n decision variables. At each step of the recursion a standard QAOA is run to find the state |ψ * =Û (β * , γ * ) |+ ⊗n that minimizes the energy ψ * |Ĥ n |ψ * . For each edge (i, j) ∈ E the correlator M i,j = ψ * |Ẑ iẐj |ψ * is computed. Next, the decision variable z i for which |M ij | is largest is replaced with sign(M i,j )z j to generate a new Ising Hamilto-nianĤ n−1 with n−1 decision variables. The recursion stops once the number of variables is below a threshold n stop . The remaining problem is solved with a classical solver. We refer to Appendix C of [24] for the pseudocode and detailed discussion.

F Depth-one RQAOA
Depth-one RQAOA can efficiently be simulated classically [91]. Here, we show the algorithm we used to efficiently simulate depth-one WS-RQAOA. To evaluate the correlator Ẑ iẐj = Tr{ρ i,jẐiẐj } we only need the density matrix ρ i,j of qubits i and j, see the circuit in Fig. 13. Qubits i and j are first prepared in the state . For each qubit k = i, j, the cost Hamiltonian applies the gateÛ i,k ⊗Û j,k , whereÛ i,k iŝ Here, the single-qubit gateÛ 1 (φ) is diag(1, e iφ ).
Since the controlled-phase gate C-Phase(φ) = diag(1, 1, 1, e iφ ) commutes with theÛ 1 gate we move allÛ 1 gates to the front of the circuit an apply the phasesÛ 1 (γ k =i,j ω i,k ) andÛ 1 (γ k =i,j ω j,k ) to qubits i and j, respectively. Next, we include the effect of the controlled-phase gate of each qubit k = i, j, initially in the state 1 − c * k |0 + c * k |1 , on the density matrix ρ i,j by computing Then we apply the two-qubit operation from the ω i,jẐiẐj term, i.e.Û i,j = diag(1, e iγωi,j , e iγωi,j , 1), and finally apply the mixer operator before measuring. This is summarized in Alg. 1.

Algorithm 1: Depth-one RQAOA
Initialization: qubit i and j in state |0 .

G MAXCUT reduction
Here we show that replacingẐ i by ±Ẑ j in a MAXCUT problem results in a new MAXCUT problem with one node less. Without loss of generality we label the nodes from 1 to n such that the spin operatorẐ n of node n will be replaced byẐ n = αẐ k with α = ±1 and k < n. The MAXCUT Hamiltonian of the weighted graph iŝ We now replaceẐ n = αẐ k in the last term and, since ω i,n 1 − αẐ iẐk = αω i,n 1 −Ẑ iẐk + ω i,n (1 − α), we may write We neglect the first sum since it is an energy offset that does not affect the optimization. The Hamiltonian of the reduced problem is thereforê This Hamiltonian corresponds to a new graph E in which the weights ω i,j with i, j = 1, ..., n−1 have been updated according to  Figure 14: Energy, normalized to the energy of the maximum cut, as a function of ε for a depth one WS-QAOA. The graphs are the same as those in Fig. 7 but we used the mixer from Eq. (2) which cannot reproduce the GW at ε = 0.25 and depth one.

H Portfolio data
The return vectors and covariance matrices used in Sec. 3 are obtained by simulating the price of each asset following a Geometric Brownian motion for N = 250 days. The price of asset i on the k th day is Without loss of generality we set the initial price S i,0 = 1. We randomly chose each mean µ i and standard deviation σ i uniformly form [−5%, 5%] and [−20%, 20%], respectively. The Brownian motion is given by W k = j l=0 z l / √ N where z l is drawn from the normal distribution. The return of asset i on the k th day is r i,k = S i,k /S i,k−1 − 1. The average of r i,k gives the mean return of asset i and the covariance of asset i and j is obtained from r i,k and r j,k where k = 1, ..., N .

I WS-QAOA for MAXCUT with the warm-start mixer
In the main text we modified the WS mixer to retain the GW cut. Here, we explore WS-QAOA for MAX- Max. cut counts Figure 15: Histograms of cut sizes, relative to the maximum cut found by CPLEX, for WS-RQAOA with the modified mixer (red, same data as in Fig. 8), and the WS-QAOA with the mixer from Eq. (2). The dashed line shows the hardness factor 11/13.
CUT with the mixer of Eq. (2). By repeating the analysis done in Fig. 7 we observe that the energy, normalized to the maximum, decreases as a function of ε and does not recover at ε = 0.25, see Fig. 14.
In addition, we repeat the analysis done in Fig. 8 but with the mixer of Eq. (2) and compare it to WS-RQAOA with the modified mixer. Both algorithms have a similar performance, see Fig. 15, which suggests that the amount of correlation between the variables at each iteration, e.g. as in Fig. 14, is still sufficient for WS-RQAOA to produce good results even though the GW cut cannot be sampled with certainty at each iteration. As the size and complexity of the graphs is increased the performance of WS-RQAOA with the mixer from Eq. (2) decreases compared to WS-RQAOA with the modified mixer and indicates the importance of being able to retain the GW cut.