Warm-Started QAOA with Custom Mixers Provably Converges and Computationally Beats Goemans-Williamson’s Max-Cut at Low Circuit Depths

,


Introduction
In order to realize a quantum advantage, many researchers have been considering the usage of NISQ devices [1,2] for the purposes of solving difficult problems in combinatorial optimization.Of particular interest is the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantumclassical algorithm developed by Farhi et al. that can be applied to a large variety of combinatorial optimization problems [3].We study the use of QAOA to solve one of the most famous NP-hard combinatorial optimization problems, called Max-Cut. 1 Given a weighted graph G = (V, E), with vertex set V = [n], edge set E ⊆ V 2 and weights w : E → R, the Max-Cut problem is to find a partition of V into two disjoint sets S, V \ S ⊆ V , such that the total weight of the edges across the partition, i.e. cut(S) := e∈E w e •1[e ∈ S×(V \S)], is maximized.The Max-Cut of G is denoted by Max-Cut(G) = max S⊆V cut(S).
In the standard QAOA algorithm, qubits are initialized in the |+⟩ state along the xaxis of the Bloch sphere, tensorized n times, and QAOA's mixing Hamiltonian rotates each qubit about this same axis.The promise of this approach lies in Farhi et al.'s [3] result that establishes a connection between standard QAOA and quantum adiabatic computing, showing that with increased circuit depth, standard QAOA converges to the Max-Cut.This connection relies on the initial state of standard QAOA being the highest energy eigenstate of the mixing Hamiltonian.
We propose to initialize the circuit with separable initial states (other than |+⟩ ⊗n ) generated using classical relaxations of the Max-Cut problem; following classical optimization literature, we refer to these states as warmstarts.We modify the mixing Hamiltonians in the QAOA ansatz in a way dependent on the initial state, so that the adiabatic assumptions hold; we call these mixing Hamiltonians custom mixers, and call the new QAOA variant QAOA-warmest.We show that like QAOA, QAOA-warmest converges to the Max-Cut with increased circuit depth (Proposition 1).
We further note that the convergence rate of QAOA-warmest is heavily dependent on the separable initial state used.
In this work, we focus on two approaches for warmstarting that generate classically-inspired separable initial states which (empirically) converge faster compared to existing initializations: (1) low-dimensional projections of optimal solutions to the Goemans-Williamson (GW) semidefinite program (SDP) [5], and (2) optimization problem by converting a generic QUBO instance into a Max-Cut instance [4], and then applying the same techniques.This conversion requires only one additional qubit; however, such a conversion is not necessarily approximation-preserving.
locally optimal solutions to low-dimensional Burer-Monteiro [6] relaxations of the Max-Cut problem.Using a diverse library of 1148 graphs (up to 11 nodes), we show through numerical simulations that QAOA-warmest that uses our two types of warm-starts outperforms the classical Goemans-Williamson algorithm and standard QAOA even at low-circuit depths around p = 4.Such a low-depth for this performance does not hold for any random initialization on the Bloch sphere.We further tested QAOA-warmest in the presence of noise using Qiskit's built-in modules and hardware calibration data [7], and on IBM's 16-qubit Guadalupe device and Quantinuum's 20-qubit devices.Our simulations demonstrate that QAOA-warmest is robust and still yields high (instance-specific) approximation ratios in such noisy regimes.
Additionally, we provide theoretical performance guarantees at depth p = 0 in the case that projected GW solutions are used.Specifically, for positive-weighted graphs, we show (Theorem 2) that, in expectation, the value of the cut obtained from hyperplane rounding of the projected GW solutions is at least 0.878 of the optimal cut value (just like GW).We further show that quantum measurement of the projected solutions (i.e.depth-0 QAOA-warmest) yields at least a 3  4 (0.878) ≈ 0.658-approximation and 2 3 (0.878) ≈ 0.585approximation for dimensions k = 2 and k = 3 respectively.These guarantees also serve as a lower bound for what can be achieved with higher depth since, like standard QAOA, our approach also has monotonically increasing expected cut-quality with increased circuit depth for any instance.This paper presents the first warm-start approach for QAOA (for Max-Cut) which simultaneously (1) provides a nontrivial constantfactor approximation ratio at depth p = 0, (2) satisfies provable convergence to Max-Cut as circuit depth increases, and (3) enjoys a fast convergence rate, as shown through numerical simulations and experiments on quantum hardware.Some of the previous approaches [8,9] consider warm-start initializations using perturbations of a single-cut (obtained for ex-ample, using GW algorithm).While such initializations theoretically yield higher approximation ratio of 0.878 using quantum sampling at depth p = 0, this warm-start has a much slower convergence rate in simulations (see Table 1).Therefore, we believe that our approach is promising for near-term NISQ devices.

Related Work
Since Farhi et al.'s [3] seminal paper in 2014, many have researched and analyzed the QAOA algorithm.Many empirical studies have been performed including the effects of different parameter initialization strategies [10][11][12][13][14], the performance of QAOA in the context of devices with superconducting qubits with limited connectivity [15][16][17],the use of machine learning to determine (for specific instances) whether Max-Cut QAOA would yield better cuts compared to the classical GW algorithm [18], the effectiveness of different encoding schemes for objectives involving higher-order terms with more than 2 qubits [19], the effect of various graph properties on the performance of QAOA [20], and the use of QAOA to generate highly squeezed states which are useful in the context of quantum metrology [21].
Many researchers have also analyzed QAOA from a more theoretical perspective.Shaydulin et al. [22] gives a series of results regarding the classical symmetries in the objective function and how those symmetries are reflected in the QAOA dynamics.In their seminal paper, Farhi et al. show that depth-1 QAOA achieves an approximation ratio of 0.694 for Max-Cut on 3-regular graphs [3].Wurst and Love show that at depth-2, this approximation ratio improves to 0.7559 for Max-Cut QAOA on 3-regular graphs [23].For quantum devices with limited connectivity, Farhi et al. [24] show that a variation of Max-Cut QAOA on 3-regular graphs on a device whose native graph is a square grid achieves an approximation ratio of 0.5293 without the use of swap operations.Others have also proven limitations of the standard QAOA algorithm as well: Bravyi et al. [25] show that, for all d ≥ 3, there exists a sequence of d-regular bipartite graphs such that depth-p QAOA with p < (1/3 log 2 n − 4)d −1 on such instances produces a cut (in expectation) whose value is at most 5  6 3d , meaning that, in the worst case, constant-depth QAOA for Max-Cut is inferior to the classical Goemans-Williamson algorithm as lim d→∞ Ä 5 6 + √ d−1 3d ä = 5 6 ≈ 0.833 < 0.878.Farhi et al. [26] show a similar result when QAOA is applied to the Max Independent Set problem2 and Bravyi et al. [27] give similar results for a recursive variant of QAOA applied to the Max-k-Cut problem. 3astings [28] defines a notion of a "local" classical algorithm and shows that, for trianglefree d-regular graphs with 2 ≤ d ≤ 1000, there exists4 a d-dependent local classical algorithm for Max-Cut with a provably better approximation ratio compared to depth-1 QAOA.Marwaha [29] extended this result, showing that for each 2 ≤ d ≤ 500, there exists local classical algorithms that yields a better expected cut compared to depth-2 QAOA for all d-regular graphs with girth greater than 5. Barak and Marwaha [30] have continued this line of research, showing that for every onelocal algorithm (classical or quantum), that the maximum cut achieved is at most 1  2 of the maximum cut for d-regular graphs with girth greater than 5 and that there exists a k-local algorithm with approximation ratio for d-regular graphs with girth greater than 2k + 1.However, general approximation guarantees obtained by the QAOA algorithm still remain elusive.
The above results also suggest that in order to realize some kind of quantum advantage, more information beyond the standard QAOA algorithm might be needed.Recent work has considered modifications and variations of the QAOA algorithm itself, as in this work, which we discuss next.
The closest works related to this paper are those by Tate et al. [31] and Egger et al. [9], where the authors have explored multiple approaches for warm-starting QAOA.Tate et al. [31] considered warm-starting QAOA using 2 and 3-dimensional Burer-Monteiro locally optimal solutions for Max-Cut; however, their method plateaus even at low circuit depths of p = 1 for some initializations, and it is unable to improve the cut quality for some instances.For the Max-Cut problem, Egger et al. [9] constructed the initial quantum state by (non-trivially) mapping a single specific cut (S, V \ S) (obtained via the Goemans-Williamson algorithm or possibly other means) to an initial quantum state.Egger et al. also modified the mixing Hamiltonian so that at depth p = 1 (with the right choice of QAOA variational parameters), the cut (S, V \S) is recovered; however, there is no evidence to suggest that such an approach will converge to the optimal solution for depth p → ∞.They further proposed a different continuous warm-started QAOA for Quadratic Unconstrained Binary Optimization (QUBO) problems, which enjoys convergence guarantees.For certain classes of QUBO's, the binary variables can be relaxed to be in the interval [0, 1] to obtain a convex quadratic program whose optimal solution can then be mapped to the Bloch sphere.Our mixer is a generalization of the mixer used in this particular approach by Egger et al.; however, the initialization scheme is quite different.In particular, their continuous warm-started QAOA approach cannot be directly applied to Max-Cut as the corresponding relaxed quadratic program is not convex.
Recent work by Cain et al. [8] further explored convergence properties of warm-starts, when augmented with standard mixers.They showed using a perturbative approach that standard QAOA with single-cut warm-starts (i.e., each qubit is initialized at |0⟩ or |1⟩) does not show any improvement in the approximation ratio, even when the circuit depth is increased.This result is interesting in the context of our work, since we find that (1) using custom mixers, one can guarantee convergence as long as the initialization is not at a singlecut, and (2) warm-starts that are perturbations of single-cut initializations (e.g., each qubit is initialized at R Y (θ * ) |0⟩ or R Y (π −θ * ) for some small θ * where R Y (θ) is a singlequbit rotation about the y-axis by angle θ) converge very slowly, in computations, even with custom mixers.Our work therefore complements the existing work and clarifies what kind of warm-starts are useful.
Additionally, other works have explored different kind of modifications of the QAOA ansatz.Farhi et al. [24] considered having separate variational parameters for each vertex and edge.Hadfield et al. [32] and Wang et al. [33] considered versions of QAOA that are suitable for combinatorial optimization problems with both hard constraints (that must be satisfied) and soft constraints (for which we want to minimize violations).Zhu et al. [34] modify QAOA such that the ansatz is expanded in an iterative fashion with the mixing Hamiltonian being allowed to change between iterations.Bravyi et al. [25] proposed a recursive QAOA approach that decreases the instance size at each iteration; Egger et al. [9] also consider a similar recursive version of their approaches.Bärtschi et al. [35] and Jiang et al. [36] consider modifications of QAOA inspired by Grover's (quantum) algorithm [37] for fast database search.For the scope of this work, we do not consider these alternate approaches; however, it may serve as an interesting direction for future work as QAOA-warmest can likely be used in conjunction with these other approaches.

The Quantum Approximate Optimization Algorithm
First, we review QAOA in the context of the Max-Cut problem.QAOA is performed on n qubits (with n being the number of vertices in the input graph) with the final measurement being a bitstring of length n corresponding to a cut (S, V \ S) with S = {i : ith bit is 0}.
For depth-p QAOA, the quantum circuit alternates between applying a cost Hamiltonian Here, σ x , σ y , σ z are the standard Pauli operators and σ k i is the Pauli operator applied to qubit i for k ∈ {x, y, z}.The cost and mixing Hamiltonians are applied a total of p times to generate a variational state, where |s 0 ⟩ is the initial state and γ = (γ 1 , . . ., γ p ), β = (β 1 , . . ., β p ) are variational parameters to be optimized.For standard QAOA, the initial state is given by |s 0 ⟩ = |+⟩ ⊗n which corresponds to an equal superposition of all 2 n possible cuts in the graph.Finally, sampling from |ψ p (γ, β)⟩ yields a bit-string corresponding to a cut in the graph.We let F p (γ, β) denote the expected cut value from sampling |ψ p (γ, β)⟩, i.e., If, for each p, we choose γ, β optimally, then the expected value of the cut tends to the Max-Cut as p → ∞ [3].
In practice, the optimal choice of γ, β is unknown in advance and thus a hybrid classicalquantum hybrid loop is often used to find values of γ and β that yield high expectation values.The initialization of γ and β in these optimization routines has been investigated [10][11][12][13][14]; however, for simplicity, we initialize γ and β near the origin for our experiments as discussed in Section 4.

Classical Optimization Algorithms
We next review some classical optimization algorithms for Max-Cut.It is useful to reformulate the Max-Cut problem as follows: ( Given a solution x to the maximization above, this yields the cut (S, V \ S) with S = {i : x i = 1}.For positive integer k, the above formulation can then be relaxed as follows: When k = n, the problem can be reformulated as a convex semidefinite program (SDP) which is the formulation used by the seminal Goemans-Williamson (GW) algorithm [5], which yields a 0.878-approximation to Max-Cut for graphs with non-negative edge weights.
Burer and Monteiro proposed solving the relaxation for 1 < k < n (which we denote by BM-MC k ), using parametric forms for points on the k − 1-dimensional sphere [6].This leads to a program that is no longer convex and thus one can not expect to easily find the global optima.However, highquality local optima can be found by utilizing first and second order optimization methods.In general, the Burer-Monteiro technique has been found to work well in practice, even when k = 2 [38]; however, in theory, Mei et al. [39] show that hyperplane rounding of a locally optimal BM-MC k solution achieves a 0.878(1 − 1 k−1 ) fraction of the optimal cut, yielding a 0-approximation and a 0.439-approximation for dimensions k = 2 and k = 3 respectively.
For ease of notation, given a (feasible) BM-MC k solution x, we let denote the expected cut value obtained from performing hyperplane rounding on x [5] and we let denote the BM-MC k objective at x. Lastly, we say that the solution x is κ-approximate if HP(x) ≥ κMax-Cut(G) and similarly,

Approximation Ratio
For this work, we adopt the same definition and notation for the (instance-specific) approximation ratio (AR) as discussed in Tate et al. [31].Specifically, for a graph G and algorithm A, the (instance-specific) AR is given by, where E A,G is the expected cut value (either from hyperplane rounding of a classical solution or quantum measurement of a quantum state) and Min-Cut(G) is the (possibly trivial) minimum cut in the graph G.This definition is chosen as a means of "normalizing" the approximation ratio to lie in the interval [0, 1], even in the case of graphs with both positive and negative edge weights.
For positive-weighted graphs, we have that Min-Cut(G) = 0, in which case, the definition of α A,G reduces to the "typical" definition of approximation ratio often seen in the classical optimization literature.The worstcase AR α A for an algorithm A is defined as the worst instance-specific AR across all instances, i.e., α A = min G α A,G ; alternatively, we say such an algorithm A provides an α Aapproximation.
It should be noted that in our numerical simulations, we can calculate E A,G exactly.Please refer to [31] for more details.

Custom Mixers
In this section, we provide a general framework that shows, for any separable initial state, how to construct a custom mixing Hamiltonian for QAOA so that the warmstart is the most excited state of the mixer.This property will be useful in showing convergence results for QAOA with custom mixers.We refer to such variants of QAOA as QAOA-warmest (compared to QAOA-warm, proposed by Tate et al. [31], which uses the standard mixer, H B = i∈[n] σ x i , together with a separable quantum initial state).
Consider a separable state |s 0 ⟩ on n qubits; if the jth qubit's Cartesian coordinates on the Bloch sphere are given by (x j , y j , z j ), then the corresponding custom mixing Hamiltonian H B is given by, where H B,j = x j σ x + y j σ y + z j σ z .Geometrically, the custom mixer rotates qubits about their original position on the Bloch sphere (details included in Appendix B).Note that the standard mixers in QAOA [3] are, therefore, a special case of our custom mixers since each qubit is initialized at |+⟩, i.e., the the x-axis (with Cartesian coordinates (x j , y j , z j ) = (1, 0, 0)) and the unitary operator e −iβ k H B for the mixer corresponds to rotations (by 2β k ) about the x-axis.When the initial state is composed by qubits restricted to the xz-plane with x > 0, then this custom mixer recovers the one considered by Egger et al [9].

Convergence to Max-Cut
In this section, we show that the expected cut obtained by QAOA-warmest converges to Max-Cut as the circuit depth goes to infinity.Theorem 1.Let |s 0 ⟩ be any separable initial state whose qubits do not lie at the poles of the Bloch sphere.Running QAOA with a warmstart |s 0 ⟩, its corresponding custom mixer (5), with the choice of optimal variational parameters, yields a distribution of cuts whose expected value reaches Max-Cut as the circuit depth p tends to infinity, i.e., We will first show that Theorem 1 holds when the warm-start is initialized in the xzplane of the Bloch sphere, with x > 0. We will then show the main result by showing equivalence of the distribution of cuts obtained by QAOA-warmest when a phase is added to each qubit (i.e., the initial state is not restricted to 0 phase).Proposition 1.Let |s 0 ⟩ be any separable initial state such that all qubits lie at the intersection of the Bloch sphere and the xz-plane with positive x-coordinate.Running QAOA with initial state |s 0 ⟩ and its corresponding custom mixer yields that i.e., the expected cut value of QAOA-warmest with optimal variational parameters will yield the optimal cut value as the circuit depth p tends to infinity.
Egger et al. [9] use the same custom mixers5 for warm-starts restricted to the xz-plane with x > 0, for convex quadratic programs, and state that convergence holds, without proof.While straightforward calculations show that the initial state is the highest-energy eigenstate of the mixer (a condition needed in order to apply the adiabatic theorem and guarantee convergence), a complete proof requires careful inspection of the eigenvalues of the timedependent Hamiltonian H(t) = (1−t/T )H B + (t/T )H C .We show that there is a non-zero gap between the largest and the second largest eigenvalues of the time-varying Hamiltonian, using the Perron-Frobenius theorem for irreducible stoquastic Hermitian matrices.These details can be found in Appendix C.This analysis can not be directly applied for arbitrary separable states since the corresponding mixers do not necessarily have real entries, and therefore, the Perron-Frobenius theorem cannot be applied.However, instead of calculating the eigenvalue gaps directly, we show that the phase in the warm-start is not reflected in the distribution of cuts obtained using QAOA-warmest, which would then complete the proof of Theorem The convergence given by Theorem 1 for QAOA-warmest is especially interesting considering that many previous warm-started QAOA approaches lack such guarantees.The QAOA-warm approach by Tate et al. [31] considered arbitrary separable states but with the standard mixer; they showed examples where QAOA-warm plateaued and did not converge to the Max-Cut.Cain et al. [8] considered the case where the initialization is a single bitstring/cut with the standard mixer, and showed that such an initialization also does not converge to Max-Cut.One of the approaches by Egger et al. consider a mixer that is neither the standard mixer nor the custom mixer approach presented above.Their mixer is used in conjunction with what we call a perturbed single-cut initialization (see Section 3) which recovers a particular cut (obtained via GW or other means) at depth 1.However, this initialization comes with no guarantees on convergence, and our results also do not apply to these different mixers.
Although Theorem 1 applies to any warmstart with aligned custom mixers, we will show that there is a significant difference in the rate of convergence to Max-Cut which depends on the type of warm-start chosen.Theoretically characterizing this rate of convergence still remains an open question, but we show that there is a stark observable difference in rate of convergence through numerical simulations (See Table 3 for a summary, and Section 4.2 for more details).We discuss the choice of warm-starts next.

Warm-Starts
We now discuss the notion of warm-starting the quantum circuit for QAOA by biasing the initial quantum state |s 0 ⟩ in Equation (1) to certain cuts, as opposed to taking an equal superposition of all cuts as in standard QAOA.Many initialization schemes have been used for warm-starting QAOA (see Table 1 for a summary); we focus on two that produce classical solutions which can easily be mapped to a separable quantum state in a way that roughly approximates the corresponding classical distribution of cuts.
Both approaches consider the k-dimensional relaxation (3) for Max-Cut.When k = n, this is the GW semidefinite program.Recall that the Goemans-Williamson algorithm [5] rounds an optimal solution u 1 , . . ., u n ∈ R n to this SDP to a cut v 1 , . . ., v n ∈ {−1, 1} using a random hyperplane.Since we wish to produce a separable quantum state, we instead round u 1 , . . ., u n ∈ R n to vectors v 1 , . . ., v n ∈ R k , k ∈ {2, 3} and map these to the Bloch sphere for each of the n qubits.
Alternately, we can solve the relaxation itself for k ∈ {2, 3} [6], and map the locallyoptimal solution vectors (called BM-MC solutions) directly to the corresponding Bloch spheres.While BM-MC solutions do not enjoy even the trivial 0.5-approximation guarantees [39], they are much faster to compute in practice and therefore scale better for larger graphs: e.g., Burer et al. [38] show that GW took over 1.5 days to complete on a 20,000 node instance, whereas a k = 2 BM-MC solution was found in a little over a second; repeated runs of BM-MC 2 over the course of a couple minutes on the same graph yielded cuts that were at least as good as those obtained by GW [38].
Lastly, we compare the two warm-start techniques above to perturbed single-cut initializations, another warm-start technique that has been considered in previous literature [8,9].We show that, theoretically, such initializations give better depth-0 guarantees for QAOA (compared to the two warm-start techniques previously discussed); however, we later show (Section 4) that such initializations yield a comparatively much slower rate of convergence with increased circuit depth.

SDP-based Relaxations
Recall that a solution to the Goemans-Williamson (GW) SDP relaxation consists of n unit vectors {u i ∈ R n : i ∈ V }, and their rounding algorithm uses a random hyperplane to obtain an approximation for the Max-Cut on the given graph G.We propose to create a warm-start to QAOA by instead rounding each of these vectors to R k (with k ∈ {2, 3}) and then mapping the rounded vectors to the Bloch sphere. 6pecifically, given u ∈ R n for some positive integer n, and a linear subspace A of R n , let Π A (u) denote the (Euclidean) projection of u on A. Given Π A (u) ̸ = 0, define as the unit-scale projection of u on A. This corresponds to normalizing Π A (u) so it is a unit vector.
To motivate warm-starts using solutions for the Goemans-Williamson SDP, consider the following two-step process for rounding an optimal SDP solution {u i : i ∈ [n]} to a cut: • Choose a uniformly random linear subspace A of R n of dimension k ∈ {2, 3}, and consider the unit-scale projections

• Use Goemans-Williams hyperplane rounding on vectors Λ
to a random k-dimensional subspace first, and then subsequently use the Goemans-Williamson hyperplane rounding on these kdimensional vectors.
The following theorem shows that this twostep rounding is equivalent to the Goemans-Williamson hyperplane rounding on vectors u i , i ∈ [n]; we include the proof in Appendix A.
Theorem 2. Suppose we are given unit vectors u 1 , . . ., u n ∈ R n that form an optimal solution to the SDP relaxation for Max-Cut on some graph G = (V, E) with n vertices and non-negative weights on the edges.Suppose the GW random hyperplane rounding on u 1 , . . ., u n obtains a (random) cut M of value X, and the two-step rounding described above produces a (random) cut M ′ of value Y .Then, 1.The random variables X = Y .In particular, E(X) = E(Y ) and therefore, in expectation, M ′ provides a 0.878approximation to Max-Cut on G.
2. Furthermore, the two-step rounding procedures produces a cut of value (0.878−ϵ) times the Max-Cut value with high probability if performed independently log n ϵ times for any constant ϵ ∈ (0, 1/2).Further, if A 1 , . . ., A log n ϵ are the intermediate k-dimensional subspaces in these log n ϵ runs, there is at least some A i (with high probability) such that performing the hyperplane rounding on A i produces a (random) cut of average value at least (0.878 − ϵ) times the Max-Cut.
The last part of Theorem 2 illustrates that we can obtain a high-quality k-dimensional projected GW solution in regards to hyperplane rounding; however, it is natural to ask if any kind of guarantee can be preserved when mapping the solution to a quantum state.By adapting a theorem by Tate et al. [31], we can answer the question in the affirmative as seen in Corollary 3 below.

Corollary 3.
Let G be a graph with nonnegative edge weights and let x be a corresponding κ-approximate projected GW solution in R 3 with respect to hyperplane round-ing. 7Let R U (x) denote random uniform rotation applied to x, i.e., a global rotation where a uniformly selected point on the sphere gets mapped to (0, 0, 1).Then initialization of QAOA with R U (x) has an (worst-case) approximation ratio of 2  3 κ at p = 0, i.e., only using quantum sampling with initial state creation and no algorithmic depth for QAOA.
A proof of Corollary8 3 can be found in Appendix A. This motivates us to use the projected vectors Λ A (u i ), i ∈ [n] to warm-start QAOA.We then use the same scheme as proposed by Tate et al. [31] to rotate and map the solution x * = {Λ A (u i )} i∈ [n] to the Bloch sphere.More specifically, we first perform either (1) a random vertex-at-top rotation of x * (a global rotation to x * so that a random vertex v coincides with (0,0,1)) or (2) a uniformly random rotation R U , and apply the natural mapping from the rotated solution to the Bloch sphere to obtain a separable, unentangled state |s 0 ⟩ [31].Figure 1 illustrates the two-step rounding procedure and this warmstart.

Burer-Monteiro's Relaxation
For faster computational performance, we find a warm-start by using a locally optimal kdimensional solution x * (with k ∈ {2, 3}) obtained by the Burer-Monteiro relaxation for Max-Cut on a (k − 1)-sphere.Similar to pro-jected GW solutions, we use the same rotation and quantum-mapping scheme to obtain an unentangled initial quantum state.
Theorem 2 of Tate et al. [31] shows that the same approximation ratios of 3  4 κ and 2 3 κ (for k = 2 and k = 3 solutions respectively) can be achieved for depth-0 QAOAwarmest if the solution is instead κ-close, i.e., BM-MC k (x * ) ≥ κMax-Cut(G).Using the same analysis as Goemans and Williamson [5], it is straightforward to show that a κ-close solution must also be 0.878κ-approximate (i.e.HP(x) ≥ 0.878κMax-Cut(G)).For locally optimal9 BM-MC k solutions, Mei et al. [39] shows that such solutions are κ-close (and hence 0.878κ-approximate) with κ = 1 − 1 k−1 ; thus, hyperplane rounding of such solutions yields (worst-case) approximation ratios of 0.878(0) = 0 and 0.878 1 2 = 0.439 while depth-0 QAOA-warmest can obtain (worstcase) approximation ratios of 3 4 (0) = 0 and Experiments in Appendix F.3 compare the (instance-specific) approximation ratios achieved by hyperplane rounding of both types of warm-start initializations discussed; in particular, projected GW SDP solutions do well (compared to n-dimensional GW SDP solutions) but approximate BM-MC k solutions degrade in performance as the number of nodes increases.

Perturbed Single Cut Initializations
For the purposes of comparison in our numerical simulations (Section 4), we briefly review one more type of warm-start which we refer to as perturbed single-cut initializations that other researchers [8,9] have used for QAOA; more details regarding this approach can be We prove that this two-step rounding procedure is equivalent to Goemans-Willimson hyperplane rounding in Theorem 2. The third procedure is our proposed warm-start of QAOA using the SDP solution (highlighted in blue, labelled III).This procedure involves rounding the SDP solution to R k first, then rotating this solution using uniform or vertex-at-top rotations and mapping to the Bloch sphere to get an initial state for QAOA, and finally running QAOA on this initial state.
found in (Appendix D).Given a regularization angle θ * ∈ [0, π/2] and a cut (S, V \ S), one can initialize the initial quantum state so that qubits lie along the xz-plane of the Bloch sphere with vertices in S and V \ S being initialized at an angle θ * away from the north and south poles of the Bloch sphere respectively; such a regularization angle aims to circumvent issues regarding reachability [9]. 10  Though not stated in the related works, if the warm-starts are initialized close to singlecuts, then they retain the classical approximation factors of the respective single-cut:

and suppose a cut (S, V \ S) is an α-approximation to Max-Cut(G), and is used to initialize a warm-start using regularization angle of θ
Then, a quantum measurement (i.e.depth-0 QAOA) of this state yields an approximation ratio of at least α • cos(θ * /2) 2|V | . 10It can be shown that if the initial qubit position lies on the poles of the Bloch sphere, e.g. if the initial qubit position is |0⟩ (north pole), then it is guaranteed that that qubit will be measured to be |0⟩ as the end of the QAOA algorithm if custom mixers (as described in Section 2) are used.
It is easy to see that such warm-starts approach an approximation ratio (at p = 0) of 0.878, when initialized randomly with the distribution of cuts obtained using the Goemans-Williamson algorithm, as the regularization angle θ * approaches zero.We discuss a proof in the Appendix A to show this.Although Proposition 4 suggests it may be the superior warm-start technique for small regularization angle θ * ; convergence is either lacking or slow empirically depending on the mixer used (see Table 1).

Numerical Simulations and Experiments
In this section, we demonstrate the importance of using a suitable warm-start and show that with such warm-starts, QAOA-warmest outperforms the Goemans-Williamson Max-Cut algorithm as well as standard QAOA [3] and QAOA-warm [31].In particular, we show that QAOA-warmest does at least as well as these other algorithms at depths p ≥ 4 for nearly every instance in our instance library.We consider comparisons with respect to a recent warm-starts approach of Egger et al. [9] in Appendix F.
As discussed earlier (Section 3), for positiveweighted graphs, perturbed single-cut initializations have better depth-0 guarantees compared to the projected GW warm-starts that we propose.While possibly more advantageous at extremely low depths (p = 0, 1), we show that QAOA with perturbed single-cut initializations empirically converge to an optimal cut very slowly with increased circuit depth; for small enough regularization angle θ * , the convergence towards an optimal cut is not even perceivable at the circuit depths tested.Meanwhile, with suitable warm-starts, the convergence is much quicker: for over 98% of instances tested, we found that depth-8 QAOA-warmest with BM-MC 2 initializations yields nearly optimal cuts.Lastly, we consider the effects of noise on QAOA and its variants in the context of actual quantum devices (i.e. the IBM-Q Guadalupe and Quantinuum H1-1 devices).We show that for QAOA and its variants, the noise from these devices flattens the landscape without significantly altering the location of local extrema.Additionally, we show that even with noise, QAOA-warmest (with suitable warmstart) maintains a significant fraction of its expected solution quality, which suggests it may be useful for near-term NISQ devices (potentially with some noise mitigation) [1].

Simulation Details
For our simulations, we use the CI-QuBe library 11 [41] which contains graphs up to 11 nodes using a variety of random graph models (Erdős-Rényi, Barabasi Albert, Dual of Barabasi-Albert, Watts-Strogatz, Newman-Watts-Strogatz, and random regular graphs) and edge weight distributions.These instances, which we refer to as G, have a varied distribution of various graph properties, which is important when testing heuristics and algorithms for solving this problem.
In our simulations, for each instance, we first find five locally approximate solutions to BM-MC 2 and keep the best (in terms of the BM-MC 2 objective value).We do the same for BM-MC 3 .Similarly, for each instance, we solve the GW SDP, perform 5 projections to random 2-dimensional subspaces, and keep the best (in terms of the BM-MC 2 objective); this process is repeated (using the same GW SDP solution) with projections to 3-dimensional subspaces.Next, for both the best BM-MC 2 and best BM-MC 3 solution, and for both of the best projected GW solutions (in 2 and 3 dimensions), we perform 5 different vertex-at-top rotations and 5 different uniform rotations, yielding 40 different initial warm-started quantum states per instance.We run QAOA-warm and QAOAwarmest using all 40 of these warm-started states and, for each combination of dimension and rotation scheme, record which one performed the best in terms of (instance-specific) approximation ratio (as defined in Section 1.2.3).Finally, we run standard QAOA on the instance.
For each run for each variant of QAOA, we initialize the variational parameters γ and β close to zero 12 and each run terminates when the difference in successive values of F p (γ, β) in the optimization loop is less than W × 10 −6 where W is the sum of the absolute values of the edge weights.
To simplify the results, the figures and tables in this section will only consider runs of QAOA-warmest that use BM-MC 2 initializations with vertex-at-top rotations.This choice is due to runtime considerations and to allow for easier comparisons with previous related literature [9,31]; more details on the results and the choice of this decision can be found in Appendix F.
Additionally, for conciseness, in this section we will use "approximation ratio" to mean the instance-specific approximation ratio as described in Section 1.2.3.we report the percentage of instances for which it did the best and second-best (in terms of approximation ratio).Both QAOA-warm and QAOA-warmest use BM-MC2 warm-starts.There is a tie (last column) if the top two algorithms have approximation ratios that differ by no more than 0.01.QAOA-warmest is a part of every tie.Each instance is either accounted for in "Tie" or the other columns.For the column labeled *, we report, for each circuit depth, the percentage of instances for which QAOA-warmest was within 0.01 approximation ratio of the best algorithm.

Comparing QAOA-warmest to Other Methods
In Table 2, we show the proportion of graphs where each Max-Cut algorithm (GW and variants of QAOA) performs the best for varying values of depth p = 1, 2, 4, 8.We observe that for nearly all instances, QAOAwarmest beats or performs as well as every other QAOA variant considered and eventually performs at least as well as GW as the circuit depth increases.We note that at p = 8, QAOA-warmest beats GW on all but three instances but this is easily rectified with a suitable vertex-at-top rotation.Also at p = 8, QAOA-warmest outperforms standard QAOA on all but two instances but the gap in ap-proximation ratio is less than 0.02.More information regarding these five instances can be found in Appendix F.
We next report the improvement in approximation ratios when considering standard QAOA, QAOA-warm, and QAOA-warmest with circuit depths p = 1, 8. For convenience, for any Max-Cut algorithm, we define the approximation error (AE) by AE = 1 − AR where AR is the approximation ratio.Additionally, we will refer to log 10 (AE) as the log-error.Figure 2 gives a comparison of log-errors achieved for various instances.All points below the x = y solid line indicate instances where QAOA-warmest beats either standard QAOA or QAOA-warm.Note that due to the plots being log-scaled, being below -2 on each axis corresponds to having an approximation ratio of at least 0.99.For both plots, we see that higher approximation ratios can be achieved for positive-weighted graphs (cross-marks) and that QAOA-warmest performs significantly better for most instances.When comparing QAOA-warmest and standard QAOA at various circuit depths (red v/s blue), we see that the performance for both standard QAOA and QAOA-warmest improves at p = 8; however, this phenomenon is not that apparent for QAOA-warm (which is known to plateau in performance with increased circuit depth for small instances).Next, we show the trend in approximation quality with increase in the number of nodes n and the depth of the circuit p, in Figure 3.We see that, across all node sizes, that circuit depth plays an important role in how good an approximation ratio one can expect to achieve using QAOA-warmest.It is clear that QAOA-warmest has superior (median) performance compared to the other algorithms for every combination of circuit depth and node-size.We remark that in contrast, an increased circuit depth resulted in only a marginal improvement in the approximation ratio for QAOA-warm, bolstering our claim that custom mixers are crucial to the improvement in performance of QAOA.
In Figure 4, we compare the convergence rates of standard QAOA and QAOA-warmest with various initilializations: BM-MC k , perturbed single-cut initializations (as described in Section 3.3), and uniform random initializations 13 with custom mixers for a random 10-node instance in our graph library.Consistent with what is seen in the other figures, we see that standard QAOA, QAOAwarmest, and uniform random initializations quickly achieve high approximation ratios at relatively low circuit depths, with the BM-MC 2 initialization doing the best amongst the three across all circuit depths tested.On the other hand, perturbed single-cut initializations do not converge as quickly; in particular, when θ * is small (θ * ∈ {0.1, 0.01}) hardly any improvement in the approximation ratio is observed at all.For larger regularization angles (θ * = 0.5), we do see worse performance at low depths (p = 0, 1) as well as a noticeable increase in performance with increased circuit depth; however, the amount of this increase is small compared to achieved by QAOA-warmest which begins to outperform the perturbed single-cut initialization (with θ * = 0.5) for p ≥ 2. We find similar qualitative results for most other instances in our graph ensemble G.
Table 3 provides a more aggregated view of the convergence of QAOA-warmest with dif- 13 Here, a uniform random initialization refers to a separable state that is randomly created by (independently) picking a position on the surface of the Bloch sphere uniformly at random for each qubit, and then tensorizing the qubits.Although the phase of each qubit does not effect the expected cut value obtained from QAOA-warmest (as discussed in Section 2), it should be noted that the distribution of initializations obtained from sampling from the surface of the Bloch sphere and then removing the phases is different than the distribution of initializations obtained from sampling uniformly from the portion of the greatcircle formed by the intersection of the Bloch sphere surface with the xz-plane with positive x-coordinate.Nonetheless, we have verified (via numerical simulation) that QAOA-warmest performs similarly between the two randomization schemes.
ferent choices of initializations across the entire instance library G.For each combination of initialization method and circuit depth, the table states the percentage of instances in the library which achieved an instance-specific approximation ratio of 99.0% of higher.The data for the perturbed single-cut initializations were obtained as follows: for each instance, we obtained an optimal solution to the GW SDP relaxation, we performed 100 hyperplane roundings on the optimal SDP solution to obtain 100 cuts, we discarded all cuts whose value is more than 0.98 • Max-Cut(G), and we used the best remaining cut to create a perturbed single-cut initialization with regularization angle θ * = 0.1 (as described in Section 3.3); the discarding of cuts with very high values were done in order to ensure that, in the case of a high instance-specific approximation ratio, such a ratio can be partly attributed to the quantum circuit and not just the initial cut itself.When using the BM-MC 2 initializations (with vertex-at-top rotations), there are steady improvements with increased circuit depths with 42.3% of the instances achieving an instance-specific AR of 99.0% at depth-0; this percentage increases to 98.1% at depth-8.With the standard QAOA initialization, |+⟩ ⊗ , none of the instances achieve an instance-specific AR of 99.0% or more; it is not until depth p = 8 that we see a considerable fraction of the graphs (39.4% respectively) achieving such an AR.As for the perturbed single-cut initialization with small regularization angle, we find that nearly none of the instances achieve an instance-specific AR of 99.0% with the exception of a few instances at depth p = 8.

QAOA-warm With Noise
In addition to the theoretical (noise-less) behavior of QAOA-warmest, we also demonstrate its performance with several example cases using noise models and experiments on IBM-Q hardware.For both the ideal and noisy simulation, we use IBM's Qiskit software package [7].In the case of the noisy simulation, we exercise the capability of Qiskit  warmest achieves an instance-specific AR of 99.0% for each combination of circuit depth and initialization method (standard initialization, BM-MC2 initialization, perturbed single-cut initialization with θ * = 0.1).
Figure 4: Instance specific approximation ratios achieved by QAOA-warmest with various types of initializations: standard initialization (equivalent to standard QAOA), BM-MC2 initialization, perturbed single-cut initialization, and uniform random initializations) for a randomly selected 10-node instance.For QAOA-warmest, we used a BM-MC2 initialization with a vertex-at-top rotation; here we intentionally chose the worst vertex (i.e. the one with worst AR at depth-0) to better illustrate the convergence rate.For the perturbed single-cut initialization, we chose a cut that obtains an instance-specific approximation ratio of 0.848 and created initial quantum states using regularization angles of θ * = 0.01, 0.1, and 0.5 radians.For the uniform random initializations, five such initializations were created and only the best one was kept (i.e. the one with best AR at depth-0).
to pull calibration data directly from the Guadalupe device and use it to construct a noise model for use in the simulator.In principle, this combination of actual hardware calibration and noise simulation should predict the behavior of the device.However, the noise models themselves have inherent assumptions that the noise itself is uncorrelated and only directly models effects such as single and two-qubit gate errors, finite qubit lifetime and dephasing time, and readout noise.While these serve as a good starting point to model the noise in a quantum device, as shown in the Figure 8, there is significant disagreement between the noise simulation and the actual hardware results.This disagreement is mainly attributed to the assumptions mentioned earlier, specifically the assumption of uncorrelated noise, where physical hardware experience significant crosstalk.For an example demonstration of how typical noise models are utilized, see Appendix E. We show in Figure 5 the performance of QAOA-warmest and standard QAOA on an instance generated via a construction by Karloff [42]; this unweighted graph is cho-sen due to the fact that it is a small graph that achieves a GW approximation ratio of 0.912 (see Appendix G) which is close to the lower bound of 0.878 provided by the GW algorithm.In contrast, both QAOA-warmest and standard QAOA are able to outperform this approximation ratio, under ideal, noiseless conditions.However, note that QAOAwarmest outperforms standard QAOA for all QAOA depths and outperforms GW after p > 1.We also consider a noise model utilizing Qiskit's built in modules [7] and use calibration data in order to simulate IBM's Guadalupe device.We note that QAOAwarmest outperforms standard QAOA for all noisy simulations, using the same fixed noise model.
In addition to this device-focused noise simulation, we also run QAOA-warmest on a native hardware graph matching IBM's Guadalupe device.In general, the connectivity of the graph and its matching to physical qubit hardware connectivity plays a key role in performance due to the overhead of inserting swap operations in order to compensate for limited connectivity [43].Therefore, the simplest graph is a so-called native graph, which is a graph with exactly the same connectivity as the underlying physical qubit device.This graph is shown in Figure 6.We assign randomly chosen weights to each edge chosen from a uniform distribution [−10, 10].Finding the Max-Cut solution to this graph can still be done by brute force and for a fixed choice of randomly weighted edges, we find the Max-Cut value to be approximately 33.96209.
We show the results of QAOA-warmest and standard QAOA in an ideal simulation and on hardware in Figure 7.The color scale is shared across all plots, showing that QAOA-warmest is able to find larger cut values as compared to standard QAOA, both in simulation and on actual hardware.For hardware results, we apply the efficient SPAM noise mitigation strategy based on a CTMP strategy [44,45].
In order to demonstrate the scaling of QAOA-warmest, we also show results for depths p = 0, 1, 2 in ideal simulation, noisy simulation, and on hardware, as shown in Fig- ure 8.We define p = 0 to simply mean the preparation and measurement of the ini-tial state. 14In the case of QAOA-warmest, this directly demonstrates the ability of the QPU to create and measure the classically suggested cut.In addition, Figure 8 shows results for two different choices of the state initialization for QAOA-warmest.The left plot shows the result of applying a uniform rotation in the classical preprocessing stage whereas the right shows the result of using the best vertex-attop rotation amongst the 16 possible vertices, i.e., the rotation that gives the largest approximation ratio at p = 0.These two plots clearly show the importance of initializing the initial quantum state in an optimal way.Another important point shown in these plots is that small scale QAOA problems on 16 nodes, are nearly exactly solved when a suitable vertex-at-top rotation is chosen.When the best vertex-at-top rotation is used, the use of QAOA actually shows a decrease in solution quality on hardware.This is due to the inherent noise on the device and the fact that the solution quality is nearly optimal in the initial state.The presence of noisy two-qubit gates in further layers of the algorithm (32 CNOT gates per layer), overwhelm the small benefit of the algorithm itself for these small problems.A remaining goal then is to find native graphs on hardware for large systems, while also offering sufficiently low error rates, in order to demonstrate improved solutions with an optimally chosen initial quantum state and increased algorithmic depth (p > 0).

Warmest
Finally, we show results for QAOA-warmest run on Quantinuum hardware in Figure 9.This 20-ion linear trap allows for arbitrary qubit connectivity and thus has no overhead associated with mapping a specific graph to the hardware.In this case, we again consider the 20-node Karloff instance graphs used in Figure 5, but here we use the GW warmest start initialization.Notably we utilize a uniform rotation which gives a large initial approximation ratio (at p = 0) and while hardtions whereas as the GW algorithm uses n-dimensional solutions.Moreover, the way cuts are determined are different (hyperplane rounding vs quantum measurement) so we should expect there to be a difference in approximation ratios.ware cannot improve on this initial state, the degradation is small considering that each p layer requires 90 two-qubit ZZ interactions (among many other single qubit operations).We also note the close agreement of the noisy simulator to the actual hardware results.In order to reduce the cost of these hardware runs, we only consider a single objective function evaluation (with 1000 shots), using noiseless simulations to find the optimal γ i , β i at each p depth.Even with these considerations, we see that the GW Warmest initialization outperforms the average GW performance on hardware up to p ≤ 2. These results indicate that current quantum hardware is very close to demonstrating improvement over the Goemans-Williamson algorithm for Max-Cut on known hard instance graphs when using the QAOA-warmest initialization procedure, and it already outperforms the average performance of GW on this particular graph.

Discussion
Our experimental results suggest that our QAOA-warmest method combined with initializations obtained by classical means can outperform both the standard QAOA and the Goemans-Williamson algorithm at relatively shallow circuit depths.Conversely, not all initializations on the Bloch sphere are useful; in particular random initializations underperform compared to classically obtained ini-  4: These tables reports the approximation ratios achieved for the five instances (amongst those in our instance library G) for which depth-8 QAOA-warmest did not obtain the best approximation ratio when compared to depth-8 QAOA-warm, depth-8 standard QAOA, and GW.The top and bottom tables are for instances in which standard QAOA and GW performed the best respectively.The instances in the bottom table have the property that there exists exactly one negative edge weight whose magnitude is much larger than the other edge weights.For the bottom table, in the last column, we also include the approximation ratio for QAOA-warmest in the case where a more suitable vertex-at-top rotation is used; i.e., we take one of the vertices incident to the large-magnitude negative edge and rotate it to the top.tializations.Moreover, adversarial initializations could be chosen if one wanted QAOA to perform poorly (i.e. by putting qubits near the poles of the Bloch sphere that correspond to the minimum cut).Overall, finding a suitable initialization is needed in order to see success in QAOA-warmest.In the case of classically-inspired initializations (e.g.Burer-Monteiro Max-Cut relaxations or projected GW SDP solutions) which are (classically) invariant under global rotations, this also includes picking a suitable rotation scheme before embedding the solution into a quantum state.
According to a paper by Farhi, Gamarnik, and Gutmann [26], QAOA needs to "see the whole graph" (i.e. have a high enough circuit depth) in order to achieve desireable results.Their results rely on the fact that local changes in the graph (e.g.modifying an edge weight) give uncorrelated results in regards to measured qubits that are sufficiently far away from such a local change.In other words, standard QAOA cannot distinguish between graphs whose local subgraph-structure is identical.It should be noted that the circuit used in QAOA-warmest also suffers from such a locality property; however, if we consider the entirety of the QAOA-warmest procedure, including the preprocessing stage of computing warm-starts, then this procedure can possibly distinguish between graphs with identical local subgraph structure since the initial state is sensitive to the global structure of the graph (when using BM-MC k relaxations or projected GW SDP solutions).This suggests that certain negative theoretical results  seen for standard QAOA may not necessarily hold for QAOA-warmest since the distinguishability arguments used would no longer apply.
The approximation guarantees for our warm-starts at p = 0 and convergence to Max-Cut (under adiabatic limit) combined with superior empirical performance provide strong evidence for quantum advantage of this approach at low circuit depths compared to existing classical methods, especially the Goemans-Williamson approximation.An interesting open question would be to quantify the approximation bounds obtained by QAOA-warmest for finite circuit depth greater than zero.

Methods
Numerical simulations were performed using both custom and pre-packaged codes in the Tensorflow [46] and Qiskit [7] software packages.Numerical experiments in Section 4.2 were performed on the high performance computing cluster at the School of Industrial and Systems Engineering at Georgia Institute of Technology.Jobs were sent to various servers in the cluster as they became available; a listing of the servers and their specifications can be found in Table 5.
Classical optimization was performed using standard optimizers available in python, including ADAM [47], L-BFGS-B [48], and COBYLA [49].For hardware results, we first describe the usage of IBM's Guadalupe device along with Qiskit software and the COBYLA optimizer.The Guadalupe device is a 16 qubit superconducting hardware with a heavy hexagonal connectivity.This device typically has average single qubit gate errors of 3.7204 × 10 −4 , two-qubit gate errors of 1.075 × 10 −2 and measurement error of 1.776 × 10 −2 , representing a quantum device of comparable quality to the state of the art.For the QAOA-Warmest runs shown in Fig 8, the run time on hardware can be estimated by the schedule method of Qiskit for each corresponding circuit.The run times for p = 0, 1, 2 are 7182 ns, 14968 ns, 17379 ns, respectively.Standard QAOA runs have comparable run times as they differ only by single qubit gates as compared to QAOA-Warmest.For these hardware runs, we seed the classical optimization process with ideal parameters (γ * , β * ) found in simulation and perform 20 optimization steps each with 8192 shots on hardware.Noisy simulations using hardwareinformed noise models were performed with 200 optimization steps and 3000 shots.These simulations were performed on GTRI's Icehammer cluster using a node with a Xeon-Gold6242R processor with 80 cores and 376 GB of memory.Secondly, for our results on Quantinuum hardware, we only evaluate a single point in parameter space (at γ * , β * ) at each p depth with 1000 shots.This choice was motivated in order to reduce the cost of hardware runs and do not represent any device or fundamental limitation.The Quantinuum H1-1 20 qubit device reports average single qubit gate errors of 5 × 10 −5 , two qubit gate errors of 3 × 10 −3 and SPAM errors of 3 × 10 −3 .Noisy simulation of the Quantinuum device were also performed through the cloud, provided by the Quantinuum service.
We consider the state |s 0 ⟩ ′ = n j=1 |s 0,j ⟩ ′ where |s 0,j ⟩ ′ = cos(θ j /2) |0⟩ + sin(θ j /2) |1⟩.Geometrically, going from |s 0 ⟩ to |s 0 ⟩ ′ has the effect of dropping the phase for all qubits so that they lie in the xz-plane of the Bloch sphere with positive x-coordinate (assuming that none of the qubits are at the poles).
It suffices to show that we can drop the phase for single qubit of |s 0 ⟩ (say qubit k) without changing the expected cut value; the argument can then be easily repeated for the remaining qubits to show that |s 0 ⟩ and |s 0 ⟩ ′ yield identical expected cut values.In this case, we consider the initial state |" s 0 ⟩ = n j=1 | s 0,j ⟩ where | s 0,k ⟩ = |s 0,k ⟩ ′ and | s 0,j ⟩ = |s 0,j ⟩ for j ̸ = k (i.e.only the position of qubit k is modified).Letting R x,k (θ), R y,k (θ), R z,k (θ) represent the standard rotation operator of the kth qubit (about axes x, y, z respectively) about the Bloch sphere by angle θ, we can also write i.e., |" s 0 ⟩ can be obtained from |s 0 ⟩ by rotating around the z-axis (of the Bloch sphere) by the appropriate amount.
Let H B and H B be the corresponding custom mixers for |s 0 ⟩ and |" s 0 ⟩ respectively.
We can write U B (β ℓ ) = U B,̸ =k (β ℓ )U B,k (β ℓ ) where U B,k (β ℓ ) is the portion of U B (β ℓ ) that acts on qubit k and U B,̸ =k (β ℓ ) is the portion that acts on the remaining qubits; we can similarly write ) (the part of the mixer that does not affect the kth qubit remains the same).Geometrically, the operation U B,k (β ℓ ) corresponds to rotating qubit k around its original position on the Bloch sphere by angle 2β ℓ so,

The equation above yields the following key relation between U B,k and '
For convenience, we will let and i.e., U B and " U B correspond to the QAOAwarmest circuit (excluding the initial state) for |s 0 ⟩ and |" s 0 ⟩ respectively.
First we observe that, (by Equation 6) We now finally show that QAOA-warmest initialized with |s 0 ⟩ and |" s 0 ⟩ yield the same value; in particular the extraneous R z (ϕ k ) term from the previous calculations will not effect the measurement due to commutativity with the cost Hamiltonian: where the last equality follows since H C commutes with R z (ϕ k ).This completes the proof.
Proof of Theorem 2. Given a subspace A of R n , if A = span(v) for some unit vector v ∈ R, we abuse the notation and denote Π v (u) = Π span(v) (u) and Λ v (u) = Λ span(v) (u).We need two lemmas before we prove the theorem.

That is, unit-scale projection of u on v is equivalent to first unit-scale projecting u to A and projecting this projection Λ
Note that the above lemma is deterministic statement; we have not used any randomness so far.
Let us consider what happens if we select a linear subspace A of R n of dimension k uniformly randomly from R n (one way to ensure it is chosen uniformly randomly is to select unit vectors v i ∈ R n , i ∈ [k] recursively so that v i is chosen uniformly randomly in the space orthogonal to v 1 , . . ., v i−1 ).Once we have A, let us select a vector v ∈ A uniformly randomly again.Is this equivalent to choosing a vector v ∈ R n uniformly randomly?By symmetry, it is, since the former experiment is not biased in favor of any direction.We omit the formal proof and state it as a lemma here: Lemma 6.Let E denote the experiment of choosing a unit vector v chosen uniformly randomly from R n .Let E ′ denote the experiment of choosing a linear subspace A of dimension k uniformly randomly from R n , and then choosing a unit vector v ′ uniformly randomly from A. Then E ′ = E, i.e., they correspond to the same probability space.
We are ready for the proof of Theorem 2.
Proof.Let U n = {v ∈ R n : ∥v∥ 2 = 1} be the set of unit vectors in R n .Recall that for a given probability space, a random variable is a real-valued function on the sample space, or that X, Y : U n → R. From Lemma 6, the two experiments correspond to the same probability space.Therefore, it is enough to show that One key observation is that rounding on a random hyperplane is equivalent to unit-scale projecting to a uniformly random vector v: indeed, let v be the vector normal to the uniform hyperplane, then any unit vector u is rounded to Since dim(A) = k is a constant and A is chosen uniformly randomly, Π A (u i ) ̸ = 0 for each i with probability 1.From Lemma 5, we have Λ v Λ A (u) = Λ v (u) for all unit vectors u and for all A.
Therefore, we have, Since X and Y have the same distribution, the same approximation guarantee holds for X, Y .This proves part 1 of the theorem.
We prove part 2 next.Let C denote the maximum cut value on graph G, and denote α = 0.878 for convenience.Then, part 1 shows that EY ≥ αC.We first show that Pr (Y > (1 − ϵ)EY ) ≥ αϵ using Markov inequality: Suppose that log n ϵ independent cuts are produced by applying the two-step rounding procedure log n times.Then the probability that all of these cuts have value less than where we have used the standard inequality exp(−x) ≥ 1 − x.Since α ∈ (0, 1), this probability goes to 0 as n goes to ∞.We prove the second claim of part 2. Given a k-dimensional subspace A of R n , let w A denote the average cut value after Goemans-Williamson hyperplane rounding is performed on A, i.e.
where U A is the set of all unit vectors in A.
Notice that For the first step of the two-step rounding procedure (i.e., the step selecting a random subspace of dimension k), let Z denote the random variable that takes value w A when subspace A is selected.We need to show that for log n ϵ i.i.d.random variables Z 1 , . . ., Z log n ϵ , there is at least some Z i such that Z i ≥ (1 − ϵ)EY .Since the random subspace is selected uniformly randomly, we have that A similar Markov inequality analysis on Z then gives the result.
For i, j ∈ [n], let θ ij denote the angle between x i and x j .
We can write E[1[i and j have different spins]] = f k (θ ij ), that is, this expectation is solely a function of the angle between the adjacent vertices and the dimension k considered.In particular, in Theorem 2 of Tate et al. [31], they show for k ∈ {2, 3} that Additionally, arccos(x i • x j ) = θ ij .Using these substitutions, we have For k ∈ {2, 3}, it is straightforward to verify that the minimum in the last line above is achieved at θ = π (see Figure 10) which leads to the ratio 3 κ respectively for k = 2, 3.

Proof of Observation 4
Proof.We prove something more general than what is stated in Proposition 4. Instead of considering a particular cut where hyperplane-rounding yields an approximation ratio of α, we will instead consider a random cut (obtained via hyperplane rounding where the hyperplane is selected uniformly at random).We will show that in expectation (considering both the randomness of the hyperplane rounding and the randomness of the quantum sampling), that the expected cut value is at least 0.878 cos 2|V | (θ * /2).Let X denote the random variable corresponding to the cut value obtained from the cut obtained by quantum measurement of a perturbed single-cut initialization obtained by GW hyperplane rounding as described in Section 3.3.For any S ⊆ V , let GW(S) denote the event that the cut (S, V \ S) was obtained from the GW hyperplane rounding step.Similarly, let QM(S) denote the event that quantum measurement of the initial state resulted in the cut (S, V \ S).
First, observe that if the cut (S, V \ S) is used to initialize the quantum state with regularization angle θ * , then the probability of quantum measurement getting the same cut is cos 2|V | (θ * /2); this is because each vertex independently has probability cos 2 (θ * /2) of remaining on the same side of the cut used to initialize the quantum state.Using this fact, we find that, and thus Max-Cut(G) ≥ 0.878 cos 2|V | (θ * /2) as desired.In the above formulas, we used the fact that E[X | GW(S), QM(S)] = cut(S) and that the sum S⊆V (cut(S) Pr(GW(S))) is simply the expected cut value of the GW algorithm, which we know is at least 0.878 of the optimal cut value for graphs with nonnegative weights [5].
Here, θ j and ϕ j can be interpreted as the polar and azimuthal angle respectively of the jth qubit on the Bloch sphere.The position of the jth qubit on the Bloch sphere can also be described in Cartesian coordinates nj = (x j , y j , z j ) via the following transformation from spherical to Cartesian coordinates: Recall (from Section 2) the custom mixing Hamiltonian H B is then constructed as follows: where H B,j = x j σ x + y j σ y + z j σ z .To develop a geometrical understanding of the custom mixer, consider the operator R n,j (α) that rotates the jth qubit by angle α about the naxis for some unit vector n = (x, y, z); such as operation can be written as: Recall that for the kth of the p stages of the QAOA circuit (where p is the circuit depth), one applies the unitary operator e −iβ k H B with β k being a variational parameter (to be optimized); this operator, e −iβ k H B , can be written as n j=1 R nj ,j (2β k ), i.e., in the kth stage of the QAOA circuit, one independently rotates the jth qubit about the axis determined by its original position by angle 2β k .

C Convergence of Custom Mixers in xz-plane
In order to prove Proposition 1 using the adiabatic theorem, we need to show that (1) ∥s 0 ⟩ is indeed the highest-energy eigenstate of the corresponding custom mixer H B , and (2) the difference between the largest and the secondlargest eigenvalues of H(t) = (1 − t/T )H B + (t/T )H C is strictly positive.We divide this section into two parts to prove these two statements.

C.1 Eigenstates of Custom Mixers
We show first that |s 0 ⟩ is the highest energy eigenstate of the corresponding custom mixer for a single qubit, and then generalize to the Kronecker sums of matrices (Proposition 3).Lemma 7. Let |s⟩ = cos(θ/2) |0⟩ + e iϕ sin(θ/2) |1⟩ be a single-qubit quantum state and let n = (x, y, z) be the Cartesian coordinates of that qubit on the Bloch sphere.Let U = xσ x + yσ y + zσ z .Then |s⟩ is the mostexcited eigenstate of U .
Proof.We have the following relationship between the Cartesian and spherical coordinates: x = cos ϕ sin θ, y = sin ϕ sin θ, z = cos θ.Thus, the matrix U = xσ x + yσ y + zσ z is given by One can show that the matrix can be diagonalized as U = P DP −1 where Thus, v 1 = cos(θ/2) |0⟩ + sin(θ/2)e iϕ |1⟩ is the highest-energy eigenstate of U .
We can then formulate the most-excited eigenstate of U using the following relation between eigenvalues of matrices involved in a Kronecker sum and the resultant matrix.Theorem 8. (Theorem 13.16 in [50]) Let A ∈ C n×n have eigenvalues λ 1 , . . ., λ n and let B ∈ C m×m have eigenvalues µ 1 , . . ., µ m .Then the Kronecker sum A⊕B has mn eigenvalues given by {λ i + µ j : i ∈ [n], j ∈ [m]}.Moreover, if x 1 , . . ., x p (p ≤ n) are linearly independent eigenvectors of A corresponding to λ 1 , . . ., λ n and z 1 , . . ., z q (q ≤ m) are linearly independent eigenvectors of B corresponding to µ 1 , . . ., µ q , then, for all i ∈ [p] and j ∈ [q], we have that x i ⊗ z j are linearly independent eigenvectors of A⊕B corresponding to λ i + µ j .
By applying Theorem 8 with the summands H B,j of the mixing Hamiltonian H B , we get the desired result as shown in Proposition 3. Proposition 3. Let |s 0 ⟩ be any separable initial state and let H B be its corresponding custom mixer.Then |s 0 ⟩ is the highest-energy eigenstate of H B .
Proof.Suppose for each j = 1, . . ., n we have a matrix A j with real eigenvalues and suppose the largest eigenvalue of A j is λ j with corresponding eigenvector v j .As a consequence of Theorem 8, we have that the largest eigenvalue of n j=1 A j is n j=1 λ j with one corresponding eigenvector being n j=1 v j .Letting A j = H B,j and v j = |s 0,j ⟩ and applying Lemma 7, we see that |s 0 ⟩ = n j=1 |s 0,j ⟩ is a highest-energy eigenstate for H B = n j=1 H B,j .

C.2 Eigenvalue Gap For Custom Mixers
It is known that if the Quantum Adiabatic Algorithm is run for large enough time T with time-varying Hamiltonian H(t) = (1 − t/T )H B + (t/T )H C starting with the highestenergy eigenstate of H(0) = H B , then one can arrive at the highest-energy eigenstate of H(T ) = H C , i.e. the optimal solution, provided that the gap between the largest and second-largest eigenvalue of H(t) is strictly positive for all t < T .This translates to finding an optimal solution when running QAOA as we let the circuit depth p tend to infinity.Farhi et al. showed that this eigenvalue gap was strictly positive for standard QAOA [3], thus guaranteeing convergence to the optimal solution.In particular, they applied the following Perron-Frobenius theorem to irreducible stoquastic 15 matrices.Theorem 9. [51] Let A be an irreducible matrix whose entries are all real and nonnegative.Let r be the spectral radius of A, i.e., 15 Stoquastic matrices are square matrices with real entries so that all of the off-diagonal entries are nonnegative.Let A be an n × n square matrix.Construct a directed graph GA with vertex set [n] where the edge (i, j) is included if and only if Aij > 0. If GA is strongly connected, then we say that A is irreducible.Otherwise, we say that A is reducible.r = max{|λ| : λ is eigenvalue of A}.Then r is an eigenvalue of A and furthermore, it has algebraic multiplicity of 1.
If the eigenvalues of an n × n matrix A are real (e.g. if A is Hermitian), then its eigenvalues (with multiplicity) can be ordered as λ 1 ≥ • • • ≥ λ n ; if A is also irreducible and has real, non-negative entries then Theorem 9 ensures a gap between the two largest eigenvalues (otherwise, if λ 1 = λ 2 , then the algebraic multiplicity of λ 1 would be at least 2, contradicting the statement of the theorem.)This observation still holds if we relax the nonnegativity condition to allow negative entries along the diagonal as seen in the following lemma.
Lemma 10.Let A be an irreducible stoquastic Hermitian matrix.Then the difference between the largest and second-largest eigenvalue of A is strictly positive.
Proof.Since A is stoquastic, then all of the off-diagonal elements are already nonnegative; however, the diagonal elements may be negative.Observe that for large enough k, we have that A + kI is a matrix with all nonnegative entries.Note that A + kI is Hermitian (since A and I are) and thus the eigenvalues of A + kI are real.If we apply the Perron-Frobenius theorem to A + kI, one observes that the gap between the largest and second-largest eigenvalue is strictly positive.
One can show that the eigenvalues of A+kI can be obtained by shifting all of the eigenvalues of A by k (i.e. of λ is an eigenvalue of A, then λ + k is an eigenvalue of A + kI).Moreover, the multiplicities of these shifted eigenvalues are preserved.Thus, the gap between the largest and second-largest eigenvalue of A + kI (which is strictly positive) is equal to the gap between the largest and secondlargest eigenvalue of A.
If the custom mixer H B has the form n j=1 (x j σ x j + z j σ z j ) with x j ∈ R + and z j ∈ R for j = 1, . . ., n, then one can show that H(t) is an irreducible, stoquastic matrix.Thus by Lemma 10, the eigenvalue gap of H(t) is strictly positive meaning that one can achieve the optimal solution as the circuit depth p → ∞ in QAOA-warmest.Geometrically, this special case corresponds to an initial separable state whose qubits lie in the xz-plane on the Bloch sphere with x > 0. The stoquasticity and irreducibility of this special case is formalized in the following two propositions respectively.We include their proofs at the end of this section, and go on to prove our main result first.

Proposition 4.
Let n be a positive integer.For each j = 1, . . ., n let x j be any nonnegative real number and let z j be any real number.Let H B = n j=1 (x j σ x j +z j σ z j ) and let H C be the problem Hamiltonian for QAOA.Then H(t) = (1 − t/T )H B + (t/T )H C is stoquastic for all 0 ≤ t ≤ T .Proposition 5. Let n be a positive integer.For each j = 1, . . ., n let x j be a positive real number and let z j be any real real number.Let H B = n j=1 (x j σ x j + z j σ z j ) and let H C be the problem Hamiltonian for QAOA.Then We can now prove the convergence for custom mixers of the special form ( n j=1 (x j σ x j + z j σ z j ) with x j ∈ R + and z j ∈ R for j = 1, . . ., n) and their corresponding initializations as described in Proposition 1 in Section 2.
Proof.By construction, the corresponding custom mixers will have the form n j=1 (x j σ x j + z j σ z j ) with x j ∈ R + and z j ∈ R for j = 1, . . ., n.The result then follows from Proposition 3, Lemma 10 (which is applicable due to Proposition 4 and Proposition 5), and the adiabatic theorem.
Recall that for standard QAOA, we have that . Thus, the fact that standard QAOA converges to the optimal cut as p → ∞ is a special case of Propositions 4 and 5.

Proof of Proposition 4
We first prove a technical lemma that is needed in order to prove the proposition.Observe that, Note that for i ̸ = j, the ijth block in the block matrix above is A ij I m , which contains only non-negative entries since A ij is an offdiagonal element of A and A is stoquastic.Now consider the entries in the ijth block where i = j.Note that if there is an offdiagonal entry of A ⊗ I m that is part of the iith block, then it is also an off-diagonal entry of that block but all the off-diagonal entries of the iith block (A ii I) are zero.Thus, we have shown that every off-diagonal element is nonnegative, thus A ⊗ I m is stoquastic.Next, observe that Let H B,j = x j σ x + z j σ z .Expanding σ x and σ z , we have that which is clearly stoquastic as we assumed that x j ≥ 0. As H B = n j=1 H B,j , the result now follows from Lemma 11.

Proof of Proposition 5
Proof.First, we recall the definition of irreducible matrix.Let A be an n × n square matrix.Construct a directed graph G A with vertex set [n] where the edge (i, j) is included if and only if A ij > 0. If G A is strongly connected, then we say that A is irreducible.Otherwise, we say that A is reducible.
For any square matrix M , let G M be the corresponding directed graph as described above.Observe that H C (and hence (t/T )H C ) is a diagonal matrix, thus, by the definition of irreducibility, the irreducibility of (1 − t/T )H B + (t/T )H C is the same as (1 − t/T )H B .Similarly, scaling a matrix by a positive constant does not affect its irreducibility, so it suffices to prove the irreducibility of H B .
Observe that σ x , σ z are symmetric and thus it is not very difficult to show that H B is also symmetric.This means, for the purposes of showing irreducibility, G H B is effectively an undirected graph and we just need to show that it is connected.One can write H B as H B = n j=1 (x j σ x + z j σ z ) where denotes the Kronecker sum.According to [52], this means that G H B can be written as the Cartesian graph product of the graphs H 1 , H 2 , . . ., H n where H j = G A j with A j = x j σ x + z j σ z .Observe, that each of the H j 's are connected if and only if x j ̸ = 0 which is true by assumption.Since each of the H j 's are connected, then it is also the case that G H B is connected as well (see Theorem 1 of [53]) which finishes the proof.

D Details on Perturbed Single-Cut Initialization
In a perturbed single-cut initialization scheme with cut (S, V \S) and regularization angle θ * , the quantum state is given by, where R Y (θ) is a single-qubit rotation about the y-axis by angle θ.Geometrically, qubits lie on the xz-plane of the Bloch sphere (with x > 0) so that they lie at an angle θ * away from either the north or south pole of the Bloch sphere, depending on which side of the cut (S, V \ S) the vertex is on.
In one of their approaches for Max-Cut, Egger et al. [9] use a mixer for QAOA that is different than both the standard mixer and the custom mixers described in Section 2. They show that their mixer has the property that, when a {perturbed single-cut initialization (based on a cut (S, V \ S)) with regularization parameter θ * = π/3 is used, that measurement of the depth-1 QAOA with variational parameters (γ 1 , β 1 ) = (0, π 2 ) produces exactly the cut (S, V \ S) that was used to initialize the initial quantum state.The drawback is that, with such a mixer proposed by Egger et al., no convergence guarantees are known and experiments suggest that, unlike standard QAOA, the optimal cut value is not achieved in expectation with increased circuit depth.
Cain et al. [8] consider the case where θ * = 0 and the standard mixer is used.They find that such an approach performs very poorly; in particular, no convergence towards the optimal cut is found with increased circuit depth either.
One can also consider using a perturbed single-cut initialization together with the custom mixers proposed in Section 2; this idea was very briefly explored in the appendices of Egger et al.'s work [9].From Proposition 4 and Theorem 1, it is clear that this approach (with non-negative weighted graphs) yields an approximation ratio approaching 0.878 for θ * → 0 for depth-0 QAOA and that such an approach convergences to the optimal cut with infinite circuit depth.
Recall (Section 3.1) that QAOA with a (2dimensional) projected GW initialization has a depth-0 approximation ratio of 0.658.One may be led to believe that, when custom mixers are used, that a perturbed single-cut initialization (with small regularization angle θ * ) is the better choice due to its (theoretically) better approximation ratio at depth-0.However, as seen empirically in Section 4, this is not the case: when θ * is small, the convergence rate of QAOA with single-cut initializations is (empirically) incredibly slow across all instances.For small θ * , QAOA with custom mixers geometrically performs rotations around axes that are near the poles of the Bloch sphere about the qubits' initial positions; it is possible that this geometric interpretation is responsible for the slow convergence for small θ * .

D.1 Relation to QUBO Approach
Egger et al. [9] consider a warm-start approach (which they call continuous warmstarted QAOA) for QUBO's of the form min y∈{0,1} n y T M y, where M ∈ R n×n is a real-symmetric matrix.They then consider the relaxation min y∈[0,1] n y T M y, i.e. the binary variables are relaxed to lie in the interval [0, 1].For certain matrices 16 M , this yields a convex quadratic program which can be easily solved to (global) optimality [54]; in this case, Egger et al. [9] find and use the globally optimal solution y * of the relaxation to produce a separable quantum initial state.We consider this approach in the context of Max-Cut, specifically in the case of graphs with non-negative edge weights.
One can formulate Max-Cut on a graph G = (V, E) with edge weights w : E → R as follows [4].Simply construct the QUBO matrix M by setting M ij = w ij for i ̸ = j and M ii = − n j=1 w ij for i ∈ {1, . . ., n}.If x * is an optimal solution to Max-Cut (using the formulation in Equation 2), then there is a corresponding y * that is an optimal solution of the QUBO such that x * i = 2y * i −1 for i = 1, . . ., n. Observe that M = −L where L Laplacian matrix of the graph G which is known to be positive-semidefinite (for graphs with non-negative edge weights) [55].Since L is positive-semidefinite, then the function f (x) = x T Lx is convex in x (as the Hessian of f , ∇ 2 f (x) = 2L, is positive-semidefinite).Thus, x T M x = −f (x) is generally not convex, and hence solving min x∈[0,1] n x T M x to global optimality (as is done by Egger et al. [9]) is non-trivial in the case of Max-Cut.However, we can still consider locally optimal solutions to the relaxation.Observe, i.e., the QUBO relaxation amounts to maximizing a convex function over a polytope, in which case, all strictly local maxima lie on the vertices of the polytope. 17The vertices of the polytope [0, 1] n correspond to cuts in the graph, thus, using the strictly locally optimal solution to the relaxation of the QUBO corresponding to Max-Cut degenerates to solutions corresponding to a single-cut; this means that, for Max-Cut (with non-negative edge-weights), this QUBO approach is (effectively) a single-cut initialization approach as described in Section 3.3. 17To see this, suppose by means of contradiction that y * was a strict local maximum that did not lie at a vertex of the polytope.Then there exists z ∈ R n such that y * −z and y * +z lie in the polytope such that f (y * ) > f (y * −z) and f (y * ) > f (y * +z).By convexity,

E Noise Simulations
In Figure 11, we consider 20 instances of Erdős-Rényi graphs, with edge probabilities of 50% and 8 nodes.In this case, we also choose random negative and positive edges weights from a uniform distribution defined on [−1, 1].
We show results for the ideal, noiseless case as well as the noisy case when 3% phase noise is present.For this noisy case, we consider only one simple source of noise, phase damping, which is present for every single qubit gate operation.This choice is made for simplicity and the following discussion is used as a example of generic noise channel modelling.
As an example, we consider the modelling of phase noise using Kraus operators which equivalently can be descried using noise channels.Generically, this noise is due to interactions with the environment which is composed of many subsystems.Each interaction itself is weak, but the result of many such interactions, while not likely to cause energy transitions, does introduce a loss of phase coherence.We can describe this process with a set of (nonunique) Kraus operators [56], given by, where q is the probability of a dephasing event occurring.These Kraus operators then have the effect on the state evolution as, If we associate the probability of a dephasing event occuring during a time interval ∆t, then q = Γ∆t with Γ being the characteristic dephasing rate and we can write n applications of the noisy channel, S(ρ, t) in matrix form as,
We can then see that this dephasing process preserves population as ρ 00 , ρ 11 are preserved but exponentially suppresses coherences at a rate defined by Γ, which also defines the dephasing time, T 2 = 1/Γ.In addition to modelling phase noise, we also include several other noise models in the results shown in Figures 5, 8, and 9 of the main text.The specifications for these noise models are generated from Qiskit's Noise-Model.from_backendmethod while using the "ibm_guadalupe" device as the targetted backend.In total, these noisy simulations utilize noise models that incorporate gate error probability of each gate, the gate length of all gates, the T 1 and T 2 times of all qubits, as well as the readout error probability.Each gate error consists of a deploarizing_error() followed by a thermal_relaxation_error() error channel.One can define similar Kraus operators for these noise channels as well [56].However, while these are a comprehensive treatment of quantum noise they do not accurately capture crosstalk and other correlated noise sources.
As mentioned in the main text, we apply a SPAM mitigation strategy to our hardware results from IBM. Typically these SPAM errors can vary significantly across qubits on a single device.Specifically, typical readout errors on the ibmq_guadalupe device range from [1.07%, 12.95%].Since the overall result is limited by the largest, worst case error, for simplicity, we only mitigate SPAM errors due to the fact that these errors are roughly an order of magnitude larger than gate errors on the same device as well as requiring only two additional circuit runs (independent of circuit width or depth) and do not implement any gate error mitigation strategies, which typically require significantly larger overhead [57,58].

F.1 GW vs BM-MC Warm-Starts
As described in Section 3, we consider two approaches for generating warm-starts: projected GW solutions and locally optimal BM-MC k solutions, with the former approach having better theoretical guarantees in regard to solution quality.However, numerical simulations displayed in Figure 14 show both approaches on the instance library G (graphs with at most 11 nodes) achieve similar expected cut values at depth p = 1 QAOA-warmest; in particular, the difference in (instance-specific) approximation ratio is less than 0.04 for nearly all instances.This similarity in solution quality is even more pronounced at depth p = 8.
Since both warm-start approaches yield similar results (in numerical simulations) and since the Burer-Monteiro approach scales better in regards to runtime (Section 3), the results in Section 4 assumes that locally optimal BM-MC k solutions are used to produce the warm-starts for QAOA-warm and QAOAwarmest.

F.2 Choice of Dimension and Rotation
Table 6 demonstrates the average (instancespecific) approximation ratio achieved by QAOA-warmest for various combinations of the dimension used for BM-MC k and the rotation scheme applied to the BM-MC k solution.We find that vertex-at-top rotations perform better than uniform rotations, especially in the context of k = 3 solutions.The data is inconclusive in regards to if k = 2 or k = 3 solutions are better for QAOA-warmest, both are promising.Finally, we remark that for depth-8 QAOA-warmest, any choice of dimension or rotation scheme gave at least a 0.996 average approximation ratio across the instances.
To give fair comparison against QAOAwarm [31] (and also for comparisons with Egger et al.'s approach [9]), the results in Section 4 assumes that we are using k = 2 initializations and vertex-at-top rotations unless otherwise stated, since these were the recommended setting in [31].

F.3 GW and BM-MC Scaling
In Section 3, two methods of creating a warmstart initialization are discussed: projecting GW SDP solutions and finding approximate BM-MC k solutions.Figure 12   from the MQLib [4] library 18 to compare these two methods at different dimensions (k = 2, 3) with respect to the (instance-specific) approximation ratio they achieve with hyperplane rounding; these approximation ratios are compared against hyperplane rounding of the ndimensional GW solution.It is clear from the figure that the projecting of GW solutions preserves the approximation ratio (from hyperplane rounding); this is consistent with the results of Theorem 2. On the other hand, while BM-MC k solutions preserve the approximation ratio (from hyperplane rounding) for small graphs, the gap in approximation ratios (compared to n-dimensional GW hyperplane rounding) grows as the number of nodes increases.

F.4 Interesting Instances
In Section 4.2, Table 2 shows that at circuit depth p = 8, there are five instances for which QAOA-warmest did not achieve the highest (instance-specific) approximation ratio compared to the other algorithms considered.
Of these five instances, standard QAOA was 18 MQLib [4] is a diverse library of Max-Cut instances; the results of Figure 12 may differ if different types/families of graphs are used (e.g.random Erdős-Rényi graphs).For each MQLib instance, we used the exact Max-Cut solver BiqCrunch [59] to try to find the optimal cut. Figure 12 uses all positiveweighted MQLib instances up to 663 nodes for which BiqCrunch was able to find the optimal cut within 24 hours.
the best algorithm for precisely two of these (instances #778 and #1820).For the remaining three instances (#1698, #1889, #2010), GW was the best algorithm; however, all three of these instances have the property that there is a single negative edge-weight whose magnitude is much larger than the other edge weights in the graph and additional numerical simulations show that a suitable vertex-at-top rotation (selecting the vertex that is incident to the large-magnitude negative edge weight) allows QAOA-warmest to outperform GW.
Table 4 gives detailed statistics for the approximation ratios achieved by each of the Max-Cut algorithms considered for these five instances.

F.5 Comparison with Egger et al.
Figure 13 compares the (instance-specific) approximation ratios achieved by QAOAwarmest and a variant of QAOA considered by Egger et al. [9].In the context of the Max-Cut problem, Egger et al. considered an approach which takes a good starting cut (S, V \ S) (obtained via GW or possibly other means) and uses this cut to construct an initial quantum state |s 0 ⟩.With this modified initial quantum state and an appropriate modification of the mixing Hamiltonian, Egger et al. show that their variation of QAOA is able to recover the cut at circuit depth p = 1, i.e., there is a choice of variational parameters γ 1 and β 1 such that the only cut obtained at those parameters is precisely (S, V \ S).
To give a fair comparison for Egger et al.'s approach, we consider 10 cuts generated by the GW algorithm and take the best 5. Due to the size of the instances we consider, usually at least one of the best 5 cuts would be optimal and hence Egger et al.'s approach would essentially already start with an optimal solution which is not interesting.For this reason, in Figure 13, we only consider those instances (22.7% of the instance library) for which neither QAOA-warmest or Egger et al.'s approach starts with the optimal solution.
For Egger et al.'s approach, we consider two different choices for initialization of the vari-ational parameters: (1) near the origin and (2) the choice of parameters that recovers the value of cut used to initialize the QAOA variant (i.e.β 1 = π/2 with the remaining parameters being set to zero).In both cases, Figure 13 demonstrates that QAOA-warmest typically has the superior performance.
There are a total of 163 instances for which the approximation ratio achieved by depth-8 QAOA-warmest beats GW (by at least 0.001) and the approximation ratio achieved by GW beats Egger et al.'s approach (with initialization β 1 = π/2 with the remaining parameters being set to zero) at depth-8 (by at least 0.001).For these instances, the median gap in approximation ratio between QAOA-warmest and GW was 0.0466 and the median gap in approximation ratio between GW and Egger et al.'s approach is 0.0458.

G Twenty-Node Graph
In this section, we discuss why the graph considered in Figure 6 is interesting.The graph used is a 20-node instance, where Goemans-Williamson achieves an (instance-specific) approximation ratio of 0.912.We briefly summarize the construction and properties of this graph.
Recall that the worst-case approximation ratio for the Goemans-Williamson (GW) algorithm is 0.878.Karloff [42] showed that the 0.878 bound for GW is tight by constructing a family of graphs whose (instance-specific) approximation ratios approaches 0.878 as graph size increases.The construction for this family of instances is as follows: consider nonnegative integers b ≤ t ≤ m and let J(m, t, b) denote the graph with vertex set m t , i.e., the vertices are all t-element subsets of

Figure 1 :
Figure1: A schematic for GW warm-starts.There are three procedures to obtain a cut from an SDP solution.The first is to use Goemans-Williamson hyperplane rounding (on the top labelled I).The second (labelled II) is to do a two-step rounding through an intermediate state in R k , k ∈ {2, 3}.We prove that this two-step rounding procedure is equivalent to Goemans-Willimson hyperplane rounding in Theorem 2. The third procedure is our proposed warm-start of QAOA using the SDP solution (highlighted in blue, labelled III).This procedure involves rounding the SDP solution to R k first, then rotating this solution using uniform or vertex-at-top rotations and mapping to the Bloch sphere to get an initial state for QAOA, and finally running QAOA on this initial state.

Figure 2 :
Figure 2: For both plots, we compare the log-error of QAOA-warmest to both QAOA-warm (right) and standard QAOA (left).BM-MC2 warm-starts are used for both approaches.Each marker in the plot corresponds to a combination of instance (from our graph ensemble G) and circuit depth (either p = 1 or p = 8) with the shape of the marker being used to denote if the instance has only positive edge weights or not.Points below the black line correspond to instances where QAOA-warmest performs better than the other algorithm being compared.

Figure 3 :
Figure 3: Trends in median log-error of standard QAOA(dotted), QAOA-warm (dashed), and QAOA-warmest (solid) as one varies the number of nodes and circuit depths; the median is taken across graphs in instance library G. BM-MC2 warm-starts are used for both QAOAwarm and QAOA-warmest.

Figure 5 :
Figure 5: Performance of QAOA-warmest (with BM-MC2 warm-starts) and standard QAOA as a function of QAOA depth for an ideal (dashed) and noisy simulation (dotted).For the chosen 20-node graph, GW acheives an approximation ratio of 0.912, while in the ideal case, QAOA-warmest outperforms GW for p ≥ 2 while standard QAOA requires p > 4. The noise simulation is based on calibration data from IBM-Q's Guadalupe device.

Figure 6 :
Figure 6: IBMQ Guadalupe device which shows the physical connectivity of qubits.We choose a native graph which matches this connectivity and random weights as indicated by color.

Figure 7 :Figure 8 :
Figure 7: Performance of QAOA-warmest (with BM-MC2 warm-starts) compared to standard QAOA in an ideal simulation and on IBM-Guadalupe hardware.Each subfigure is a scan of p = 1 parameters β vs γ, brighter regions indicating values which result in a larger cut.All figures share the same absolute color scale.

Figure 9 :
Figure9: QAOA-warmest performance on Quantinuum simulators and hardware.The 20-node Karloff instance considered here is directly mapped to the fully-connected Quantinuum 20-ion hardware.In contrast to Figure5, here we use a GW warmest start and find that this particular initialization outperforms GW on average for p ≤ 2.

Figure 11 :
Figure 11: Comparison between standard QAOA mixer to using BM-MC2 warm-starts with custom mixers .We show the noiseless (left) and noisy (right) case.In both cases, the custom mixer significantly outperforms the standard mixer.Shaded regions indicate the distribution of results for 20 randomly chosen 8 node graphs with positive and negative weights.

Figure 12 :
Figure 12: Difference in approximation ratio between n-dimensional GW hyperplane rounding and hyperplane rounding of various warm-start initializations (kdimensional projected GW SDP solutions (Proj k -GW) and approximate BM-MC k solutions for k = 2, 3) as the number of nodes varies.For each instance and each dimension k, we obtained 5 random projections and 5 approximate BM-MC k solutions, and then kept the best one (of 5) in regards to the BM-MC k objective.Each circle in the figure corresponds to an instance from (a subset of) the MQLib library [4]; see Appendix F.3 for details.

Figure 13 :
Figure13: For both plots, we compare the log-error of QAOA-warmest (with BM-MC2 warm-starts) to the variant of QAOA proposed by Egger et al.[9] for Max-Cut.For Egger et al.'s approach, we consider two different initializations: initializing the variational parameters to the origin (left) and initializing the parameters in a way that recovers the cut used to initialize the quantum state (right).Each marker in the plot corresponds to a combination of instance (from our graph ensemble G) and circuit depth (either p = 1 or p = 8) with the shape of the marker being used to denote if the instance has only positive edge weights or not.Points below the black line correspond to instances where QAOA-warmest performs better than the other algorithm being compared.

Lemma 11 .
If A and B are n × n and m × m stoquastic matrices respectively, then so is A ⊕ B. Proof.By definition, A⊕B = A⊗I m +I n ⊗B.Since we know the sum of stoquastic matrices is stoquastic, it suffices to show that A ⊗ I m and I n ⊗ B is stoquastic.

Table 6 :
Multiple tables comparing the average (instance-specific) approximation ratio achieved during QAOA-warmest when utilizing different combinations of dimension and rotations during the preprocessing stage.For the top row of tables, these averages were computed using all the graphs in our graph library G whereas for the bottom row, we restrict our attention to only those graphs in G with positive edge weights.

Figure 14 :
Figure 14: Histogram showing the difference in (instance-specific) approximation ratio (AR) when using QAOA-warmest (on instance library G) with various warm-start strategies: projected GW and locally optimal BM-MC warm-starts.The blue and red bars correspond to depth p = 1 and p = 8 QAOA-warmest respectively with the purple regions indicating an overlap in the histograms.For both approaches, k = 3 solutions and vertex-at-top rotations are used to produce the figure; the results are similar if one instead uses a different combination of dimension and/or rotation scheme.
[m]; two distinct vertices/subsets S and T of J(m, t, b) are adjacent if and only if they have exactly b elements in common, i.e. |S ∩ T | = b.Karloff proved the approximation ratio for GW on some of these instances: Theorem 12. [42] Let m be an even positive integer and G = J(m, m/2, b).If 0 ≤ b ≤ m/12, then the (instance-specific) approxima-

1. Proposition 2. Let
|s 0 ⟩ be any separable initial state and let H B be its corresponding custom mixer.Let |s ′ 0 ⟩ be the state |s 0 ⟩ where each qubit's phase is set to 0, and let H ′

Table 1 :
A summary of results for different variants of QAOA (for Max-Cut) based on various combinations of initializa- [9]ns (equal superposition, BM-MC k , projected GW, and perturbed single-cut initializations) and mixing Hamiltonian (standard, custom mixers (Section 2), and the mixer proposed by Egger et al.[9]for perturbed single-cut initializations).For various combinations, we state what is known regarding the convergence and worst-case approximation ratio (for depths p ≥ 0) of the corresponding QAOA variant for graphs with non-negative edge weights.

Table 3 :
The percentage of instances for which QAOA-

Table 5 :
This table details the specifications of the server groups in the high performance computing cluster at Georgia Institute of Technology that were used in our numerical experiments.