Local classical MAX-CUT algorithm outperforms $p=2$ QAOA on high-girth regular graphs

The $p$-stage Quantum Approximate Optimization Algorithm (QAOA$_p$) is a promising approach for combinatorial optimization on noisy intermediate-scale quantum (NISQ) devices, but its theoretical behavior is not well understood beyond $p=1$. We analyze QAOA$_2$ for the maximum cut problem (MAX-CUT), deriving a graph-size-independent expression for the expected cut fraction on any $D$-regular graph of girth $>5$ (i.e. without triangles, squares, or pentagons). We show that for all degrees $D \ge 2$ and every $D$-regular graph $G$ of girth $>5$, QAOA$_2$ has a larger expected cut fraction than QAOA$_1$ on $G$. However, we also show that there exists a $2$-local randomized classical algorithm $A$ such that $A$ has a larger expected cut fraction than QAOA$_2$ on every graph $G$. This supports our conjecture that for every constant $p$, there exists a local classical MAX-CUT algorithm that performs as well as QAOA$_p$ on all graphs.


I. INTRODUCTION
The $p$-stage Quantum Approximate Optimization Algorithm (QAOA$_p$) [FGG14] is a protocol to use near-term quantum computers for combinatorial optimization [Pre18]. The number $p$ is called the algorithm's depth, with small values of $p$ realizable on current devices [AAB+20] [ZWC+20], often called noisy intermediate-scale quantum (NISQ) devices. The performance of the algorithm is difficult to analyze, even for small values of $p$ and in restricted settings [FGG14] [WL20]. In this work we analyze QAOA$_2$ for the maximum cut problem (MAX-CUT) and compare it with local classical algorithms. There is a classical algorithm (the Goemans-Williamson semidefinite program [GW95]) that, under certain computational complexity assumptions, achieves the optimal worst-case approximation ratio for MAX-CUT among all polynomial-time classical and quantum algorithms [Kho02]. However, this is a worst-case ratio; there may be inputs where QAOA$_p$ outperforms the Goemans-Williamson program. Moreover, the Goemans-Williamson program is a global algorithm, for which the assignment of each vertex depends on the entire graph; by contrast, QAOA$_p$ is a $p$-local algorithm, where the assignment depends only on the vertex's radius-$p$ neighborhood. Does there exist a local classical algorithm that performs as well as QAOA$_p$ on every graph? In this work we answer this question positively for $p = 2$ on regular graphs of girth above 5.

A. Related work
QAOA is thought to be competitive with classical approximation algorithms. Initially, QAOA$_1$ was the best known approximation algorithm for MAX-3-LIN-2 [FGG15], although an improved classical algorithm was quickly found [BMO+15]. For MAX-CUT, QAOA$_1$ guarantees an expected cut fraction of $> \frac{1}{2} + \frac{0.3032}{\sqrt{D}}$ on triangle-free $D$-regular graphs [FGG14], outperforming the lower bound of the best known local classical algorithm, the threshold algorithm [HRSS14]. However, a direct calculation by Hastings [Has19] shows that this lower bound is not tight: the threshold algorithm with optimal parameter value outperforms QAOA$_1$ on triangle-free $D$-regular graphs for all but 4 choices of $2 \le D < 1000$, and likely for all larger $D$. Hastings also introduces a family of local classical optimization algorithms, identifying algorithms from this family that outperform QAOA$_1$ for each of the remaining 4 choices of $D$.
QAOA has very few proven results about its performance, and QAOA$_p$'s optimal parameters are only known for small $p$ or for severely restricted problems. [WHJR18] derives a graph-size-independent expression of the expected cut fraction for QAOA$_1$ on any graph, determining the maximum for any triangle-free $D$-regular graph. [WHJR18] and [Sze19] use separate approaches to derive the expected and maximum expected cut fraction for QAOA$_2$ on any 2-regular graph. [WL20] shows that QAOA$_2$ has a maximum expected cut fraction of approximately 0.7559 on 3-regular graphs. For larger $p$, the creators of QAOA have proposed a hybrid algorithm to find optimal parameters, alternating between the quantum circuit and a classical parameter optimizer; unfortunately, this approach involves non-convex classical optimization, making its runtime difficult to analyze [MRBAG16].

B. Our results
This work studies QAOA$_2$ and local classical algorithms for MAX-CUT on $D$-regular graphs of girth $> 5$. We derive a graph-size-independent expression for the expected cut fraction of QAOA$_2$ on these graphs. We numerically optimize this expression over QAOA's input parameters to find the maximum expected cut fraction for each $D < 500$. We then generalize the 1-local threshold algorithm with one parameter $\tau$ to the $n$-step threshold algorithm, an $n$-local algorithm with $n$ parameters $(\tau_1, \dots, \tau_n)$. We derive the performance of the 2-step threshold algorithm on these graphs as a function of $(\tau_1, \tau_2)$. When $\tau_1 = \tau_2$, a direct calculation shows that the optimal 2-step threshold algorithm outperforms QAOA$_2$ for all $41 < D < 500$, and likely for all larger $D$ based on asymptotic behavior. Another direct calculation on $D < 50$, allowing $\tau_1 \ne \tau_2$, shows that the optimal 2-step threshold algorithm outperforms QAOA$_2$ for all $5 < D < 50$. We identify the 2-step threshold algorithm with the family of 2-local classical algorithms in [Has19], specifying instances that outperform QAOA$_2$ for the remaining 4 choices, $D \in \{2, 3, 4, 5\}$. This shows that for all $2 \le D < 500$, and likely for all larger $D$, there is a 2-local classical MAX-CUT algorithm that outperforms QAOA$_2$ on all $D$-regular graphs of girth above 5.
Why the restriction on girth? This condition massively simplifies the analysis of local algorithms. Consider an $\ell$-local algorithm, where a vertex's assignment depends on its radius-$\ell$ neighborhood. When girth $> 2\ell + 1$, the neighborhood looks like a tree. Previous analyses of QAOA$_1$ and the 1-step threshold algorithm use this property when studying triangle-free graphs (i.e. girth $> 3$). Similarly, our analysis of QAOA$_2$ and the 2-step threshold algorithm considers graphs of girth above 5 (no triangles, squares, or pentagons). Our analysis could be extended beyond $\ell = 2$; we suspect that the $p$-step threshold algorithm performs as well as QAOA$_p$ on every graph of girth $> 2p + 1$.
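The girth condition can be checked directly on a given graph. The following is a minimal sketch, not taken from the paper: the function names `girth` and `is_locally_tree_like` and the adjacency-dictionary representation are assumptions made for illustration. It finds the shortest cycle by breadth-first search from every vertex and then tests the condition girth $> 2\ell + 1$.

```python
from collections import deque

def girth(adj):
    """Length of the shortest cycle in an undirected simple graph.

    adj maps each vertex to a list of its neighbors.
    Returns None if the graph has no cycle (a forest).
    """
    best = None
    for root in adj:
        dist = {root: 0}
        parent = {root: None}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    parent[w] = v
                    queue.append(w)
                elif parent[v] != w:
                    # A non-tree edge closes a walk through the root
                    # that contains a cycle of at most this length.
                    length = dist[v] + dist[w] + 1
                    if best is None or length < best:
                        best = length
    return best

def is_locally_tree_like(adj, ell):
    """True if girth > 2*ell + 1 (hypothetical helper name)."""
    g = girth(adj)
    return g is None or g > 2 * ell + 1

def cycle_graph(n):
    """Adjacency dictionary of the n-cycle, a 2-regular graph of girth n."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
```

For example, `is_locally_tree_like(cycle_graph(6), 2)` holds because the 6-cycle has girth 6, while the 5-cycle fails the same test.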

II. QAOA2 PERFORMANCE ON MAX-CUT
Given a combinatorial problem on a graph $G(V, E)$, QAOA is an algorithm on $|V|$ qubits, where each vertex in $G$ corresponds to a qubit. QAOA involves two Hamiltonians: the mixing Hamiltonian $J = \sum_{i \in V} \sigma^x_i$ and the problem Hamiltonian $C$, where each eigenstate of $C$ encodes a possible solution scored by its eigenvalue. The algorithm approximates adiabatic evolution from the maximal eigenstate of $J$ to the maximal eigenstate of $C$, the optimal solution. Precisely, QAOA$_p$ takes $2p$ parameters $(\gamma_1, \beta_1, \dots, \gamma_p, \beta_p)$ and evolves the initial state $\rho_0 = \bigotimes_{i \in V} (I/2 + \sigma^x_i/2)$ with a unitary $U = e^{-i\beta_p J} e^{-i\gamma_p C} \cdots e^{-i\beta_1 J} e^{-i\gamma_1 C}$. The state is then measured in the eigenbasis of $C$, which gives an approximate solution. More information on QAOA can be found in [FGG14] and [WHJR18].
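For MAX-CUT, the problem Hamiltonian is $C = \sum_{(u,v) \in E} \frac{1}{2}(I - \sigma^z_u \sigma^z_v)$. The maximal eigenvalue of $C$ is the maximum number of edges that can be cut on the graph $G$; an edge is "cut" if the qubits corresponding to its vertices do not agree when measured in the $z$-basis. The expected number of edges cut by QAOA is $\mathrm{Tr}[\rho_0 U^\dagger C U]$. Let $f_{p,uv}$ represent the chance that QAOA$_p$ cuts the edge $(u, v)$ of graph $G$. For $p = 2$: $f_{2,uv} = \frac{1}{2} - \frac{1}{2} \mathrm{Tr}[\rho_0 \, e^{i\gamma_1 C} e^{i\beta_1 J} e^{i\gamma_2 C} e^{i\beta_2 J} \, \sigma^z_u \sigma^z_v \, e^{-i\beta_2 J} e^{-i\gamma_2 C} e^{-i\beta_1 J} e^{-i\gamma_1 C}]$. This is hard to evaluate in general because of the dependence on $C$ and $J$. However, $f_{p,uv}$ only involves terms within the radius-$p$ subgraph from vertices $u$ and $v$. If this subgraph is identical for every edge, $f_{p,uv} = f_p$ represents the expected cut fraction of QAOA$_p$ on the graph. For example, this happens for regular graphs of girth $> 2p + 1$.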
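On a small graph the expected cut fraction can be evaluated by brute force, which is useful for sanity-checking a closed-form expression. The sketch below is an illustration under stated assumptions rather than the paper's code: it takes the standard MAX-CUT Hamiltonian $C = \sum_{(u,v)} \frac{1}{2}(I - \sigma^z_u \sigma^z_v)$, applies $e^{-i\beta J} e^{-i\gamma C}$ for each stage to the uniform superposition, and uses only the Python standard library. The function name `qaoa_cut_fraction` is hypothetical.

```python
import cmath
import math

def qaoa_cut_fraction(n, edges, gammas, betas):
    """Expected cut fraction of QAOA_p on an n-vertex graph (exponential cost)."""
    dim = 1 << n
    # Eigenvalue of C on basis state z: the number of edges cut by z.
    cut = [sum(((z >> u) ^ (z >> v)) & 1 for u, v in edges) for z in range(dim)]
    # Uniform superposition, the maximal eigenstate of J = sum_i sigma^x_i.
    state = [complex(1 / math.sqrt(dim))] * dim
    for gamma, beta in zip(gammas, betas):
        # exp(-i * gamma * C) is diagonal in the z-basis.
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, cut)]
        # exp(-i * beta * sigma^x) on each qubit: cos(beta) I - i sin(beta) sigma^x.
        cos_b, sin_b = math.cos(beta), -1j * math.sin(beta)
        for q in range(n):
            bit = 1 << q
            state = [cos_b * state[z] + sin_b * state[z ^ bit] for z in range(dim)]
    expected_cut = sum(abs(a) ** 2 * c for a, c in zip(state, cut))
    return expected_cut / len(edges)

# The 6-cycle is 2-regular with girth 6, so it satisfies girth > 2p + 1 for p <= 2.
ring_edges = [(v, (v + 1) % 6) for v in range(6)]
one_stage = qaoa_cut_fraction(6, ring_edges, [math.pi / 4], [math.pi / 8])
```

With one stage on this ring, the result can be compared against the known QAOA$_1$ expression for triangle-free $D$-regular graphs, $\frac{1}{2} + \frac{1}{2} \sin(4\beta) \sin(\gamma) \cos^{D-1}(\gamma)$, with $D = 2$. Passing lists of length 2 for `gammas` and `betas` gives a QAOA$_2$ value on the same graph.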
Theorem. Consider any $D$-regular graph of girth $> 5$. Then: where $c = \cos(2\beta_2)$, $m = \cos(\gamma_2)$, $r = \cos(2\beta_1)$, $y = \cos(\gamma_1)$, $s = \sin(2\beta_2)$, $n = \sin(\gamma_2)$, $t = \sin(2\beta_1)$, $z = \sin(\gamma_1)$, and This formula notably does not depend on graph size $|V|$ or the Hilbert space; it does not require evaluating a trace. See Appendix B for a proof. We also validate this formula by simplifying to known expressions; see Appendix C for details.
Using SciPy [VGO+20], we numerically optimize the above formula for each $2 \le D < 500$. Figure 2 shows the optimized performance of QAOA$_2$ for $D < 500$. In all cases, QAOA$_2$ outperforms both optimal QAOA$_1$ and the 1-local classical algorithms described in Section III of [Has19]. See Appendix A for the numerical values at small $D$.
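The optimization loop has a simple shape: fix $D$, maximize the cut-fraction expression over the angles, and report the scaled performance $b = (f - \frac{1}{2})\sqrt{D}$. The sketch below illustrates that loop under two loudly stated assumptions. First, it uses the known QAOA$_1$ expression for triangle-free $D$-regular graphs as a stand-in objective, since the QAOA$_2$ expression from the theorem is not reproduced here. Second, it replaces SciPy with a standard-library coarse-to-fine grid search; `maximize_2d` and `scaled_performance` are hypothetical helper names.

```python
import math

def qaoa1_cut_fraction(D, gamma, beta):
    """Stand-in objective: QAOA_1 on a triangle-free D-regular graph."""
    return 0.5 + 0.5 * math.sin(4 * beta) * math.sin(gamma) * math.cos(gamma) ** (D - 1)

def maximize_2d(f, lo, hi, grid=41, rounds=6):
    """Coarse-to-fine grid search for the maximum of f(x, y) on a box.

    Returns (best value, x, y). Assumes a single peak inside the box.
    """
    (x0, y0), (x1, y1) = lo, hi
    best = None
    for _ in range(rounds):
        xs = [x0 + (x1 - x0) * i / (grid - 1) for i in range(grid)]
        ys = [y0 + (y1 - y0) * j / (grid - 1) for j in range(grid)]
        best = max((f(x, y), x, y) for x in xs for y in ys)
        _, bx, by = best
        # Halve the box and recenter it on the current best point.
        wx, wy = (x1 - x0) / 4, (y1 - y0) / 4
        x0, x1, y0, y1 = bx - wx, bx + wx, by - wy, by + wy
    return best

def scaled_performance(D, cut_fraction):
    """The quantity b such that cut_fraction = 1/2 + b / sqrt(D)."""
    return (cut_fraction - 0.5) * math.sqrt(D)

def optimize_degree(D):
    """Best stand-in cut fraction for degree D and the angles that reach it."""
    box_lo, box_hi = (0.0, 0.0), (math.pi / 2, math.pi / 4)
    return maximize_2d(lambda g, b: qaoa1_cut_fraction(D, g, b), box_lo, box_hi)
```

For the stand-in objective the optimum is also available in closed form, $\frac{1}{2} + \frac{1}{2\sqrt{D}} \left(\frac{D-1}{D}\right)^{(D-1)/2}$, which gives a check on the search. Swapping in a different objective only requires changing the function passed to `maximize_2d`; an objective with four angles, as for $p = 2$, would need a higher-dimensional optimizer.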
The values at D = 2 and D = 3 reproduce known results by [WHJR18], [Sze19], and [WL20]. For D = 3, [WL20] finds the full set of optimal input parameters; we verify that our input parameters are in this set.

III. LOCAL CLASSICAL ALGORITHMS
Below is a simplified description of the 1-step threshold algorithm, first presented in [HRSS14].
Consider a graph and threshold $\tau$. Randomly assign each vertex a spin $+1$ or $-1$ with equal probability. Then, consider every vertex that has the same spin as at least $\tau$ of its neighbors. Flip the spin of those vertices. Cut the graph into "spin $+1$" and "spin $-1$" partitions.
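On a triangle-free $D$-regular graph, the expected cut fraction of this 1-step rule can be written down from the description alone. The sketch below is our own illustrative derivation, not the paper's Appendix D. It assumes that all flips are decided simultaneously from the initial spins and that the other $D - 1$ neighbors of the two endpoints of an edge are disjoint and independent, which is where triangle-freeness is used. The function names are hypothetical.

```python
from math import comb

def tail(n, k):
    """P[Binomial(n, 1/2) >= k]."""
    k = max(k, 0)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def threshold1_cut_fraction(D, tau):
    """Expected cut fraction of the 1-step threshold rule (triangle-free, D-regular)."""
    # Endpoints start equal: each already agrees with one neighbor (the other
    # endpoint), so it flips when at least tau - 1 of the remaining D - 1 agree.
    p_same = tail(D - 1, tau - 1)
    # Endpoints start unequal: each needs tau agreements among the other D - 1.
    p_diff = tail(D - 1, tau)
    cut_if_same = 2 * p_same * (1 - p_same)          # exactly one endpoint flips
    cut_if_diff = p_diff ** 2 + (1 - p_diff) ** 2    # both flip or neither flips
    return 0.5 * (cut_if_same + cut_if_diff)

def best_threshold(D):
    """Best (cut fraction, tau) over all integer thresholds 0 <= tau <= D + 1."""
    return max((threshold1_cut_fraction(D, tau), tau) for tau in range(D + 2))
```

For $D = 3$ the search selects $\tau = 3$ with cut fraction $0.6875$, an improvement of $0.1875$ over random assignment. The search over integer $\tau$ is the discrete optimization that the later discussion links to oscillations in the scaled performance.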
We propose an $n$-step variation of the threshold algorithm, where the "Consider" and "Flip" commands are run $n$ times before making a cut, using threshold $\tau_i$ in step $i \in \{1, \dots, n\}$.
Consider a graph and thresholds $\tau_1, \dots, \tau_n$. Randomly assign each vertex a spin $+1$ or $-1$ with equal probability. Then, for $i = 1, \dots, n$: consider every vertex that has the same spin as at least $\tau_i$ of its neighbors, and flip the spin of those vertices. Cut the graph into "spin $+1$" and "spin $-1$" partitions.
The independence condition that simplifies triangle-free analysis of 1-local algorithms gets more restrictive with many steps. A similar condition on the $n$-step threshold algorithm requires the graph to have girth $> 2n + 1$, i.e. no cycles of length less than $2n + 2$. However, unlike QAOA$_n$, the $n$-step threshold algorithm has no guarantee of achieving the maximum cut fraction as $n \to \infty$.
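The $n$-step procedure translates almost line for line into code. This is a minimal sketch under one explicit assumption: within each step, all vertices are examined against the spins from the start of that step and then flipped simultaneously. The graph representation (a list of neighbor lists) and the function names are illustrative choices. Because the initial assignment is uniformly random, the expected cut fraction on a small graph can be computed exactly by enumerating all $2^{|V|}$ assignments.

```python
from fractions import Fraction
from itertools import product

def threshold_steps(adj, spins, thresholds):
    """Run the n-step threshold rule synchronously; return the final spins."""
    spins = list(spins)
    for tau in thresholds:
        # Count, for every vertex, the neighbors that currently share its spin.
        agree = [sum(spins[w] == spins[v] for w in adj[v]) for v in range(len(adj))]
        # Flip every vertex that agrees with at least tau neighbors.
        spins = [-s if a >= tau else s for s, a in zip(spins, agree)]
    return spins

def cut_size(edges, spins):
    """Number of edges whose endpoints have different spins."""
    return sum(spins[u] != spins[v] for u, v in edges)

def expected_cut_fraction(adj, edges, thresholds):
    """Exact average cut fraction over all equally likely initial assignments."""
    n = len(adj)
    total = sum(cut_size(edges, threshold_steps(adj, s, thresholds))
                for s in product((1, -1), repeat=n))
    return Fraction(total, len(edges) * 2 ** n)

# Example graph: the 6-cycle, which is 2-regular with girth 6.
ring_adj = [[(v - 1) % 6, (v + 1) % 6] for v in range(6)]
ring_edges = [(v, (v + 1) % 6) for v in range(6)]
one_step = expected_cut_fraction(ring_adj, ring_edges, [2])
two_step = expected_cut_fraction(ring_adj, ring_edges, [2, 2])
```

Enumeration is only practical for small graphs, but on a graph of girth $> 2n + 1$ the per-edge cut probability depends only on the local tree-like neighborhood, so a small high-girth instance such as this cycle already gives the value for its degree.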
We derive an expression for the performance of the 2-step threshold algorithm on graphs of girth $> 5$ as a function of $\tau_1, \tau_2$. The derivation is in Appendix D. We directly calculate the maximum value across all thresholds for small values of $D$, first assuming $\tau_1 = \tau_2$. The optimal threshold approximately matches the value given in [HRSS14]: For intermediate values of $D$ up to 500, we limit our search of optimal $\tau_1, \tau_2$. See Figure 1 for details.
When the thresholds match, the performance of this algorithm stabilizes at $0.5 + b/\sqrt{D}$, where $b \approx 0.417$. This outperforms QAOA$_2$ for all $41 < D < 500$. Considering Figure 2 in a similar spirit to Figure 3 of [Has19], we expect the 2-step threshold algorithm to outperform QAOA$_2$ for all $D \ge 500$. Note the oscillations that decrease in value for large $D$, similar to those of the 1-step threshold algorithm. As [Has19] suggests, this likely happens from optimizing a discrete parameter $\tau$ instead of a continuous one.
We also consider thresholds where $\tau_1$ may not equal $\tau_2$ in general for all $2 \le D < 50$; when $D > 5$, there are choices of $\tau_1, \tau_2$ that outperform QAOA$_2$. See Figure 2 for a comparison.
For $D \in \{2, 3, 4, 5\}$, we draw inspiration from modifications made to the 1-step threshold algorithm in Section III of [Has19]. Using the local algorithm description from [Has19], we consider a 2-step linear algorithm with entries of $v_0$ chosen randomly from $[-1, 1]$. Because there are a finite number of initial values, we can exactly calculate the expected performance on a local subgraph. We search for values of $(c_0, c_1)$ that outperform QAOA$_2$. This was successful for $D \in \{3, 4, 5\}$, but did not work for $D = 2$. In this case, we also searched over choices of initial values to find parameters that outperform QAOA$_2$. In particular, given initial values $[-0.49, -0.45, 0.01, 0.03, 0.29, 0.85]$, a $D = 2$ local classical algorithm has expected performance 0.3343 over random assignment, whereas QAOA$_2$ has at most 1/3 over random assignment. Many choices of initial values give a $D = 2$ classical algorithm with maximum expected performance near this value; we are unsure why this is the case.
Thus, there exists a local classical MAX-CUT algorithm that, on average, finds a larger cut than QAOA$_2$ on every $D$-regular graph of girth above 5 when $D < 500$, and likely for all $D$. The expected performance values at small $D$ are reproduced in Appendix A. An interactive notebook [PG07] with all relevant code and figures is available online.

IV. DISCUSSION
QAOA$_p$'s optimal performance is guaranteed not to decrease with increasing $p$. At $p = 2$, this value was previously known only in a few cases [WHJR18] [WL20]. Our approach is to directly calculate a graph-size-independent formula and optimize over parameters, which requires careful accounting of all terms. Hypothetically, Appendix B can be extended to remove the high-girth and regularity constraints. Without the regularity constraint, a graph cannot be described with a single parameter $D$; without the high-girth constraint, there may be implicit combinatorial sums as in Appendix A of [WHJR18].
FIG. 2. Performance of approximate MAX-CUT algorithms on $D$-regular graphs of girth $> 5$. This plot reproduces and extends Figure 3 of [Has19]. The scaled performance $b$ is such that the cut fraction is $1/2 + bD^{-0.5}$. For QAOA$_2$, the plot is smooth and similar in shape to QAOA$_1$, while the 2-step threshold algorithm oscillates like the 1-step variant. At large degree, $b \approx 0.417$ for the 2-step threshold algorithm, which outperforms $b \approx 0.407$ for QAOA$_2$, itself an improvement over both $b \approx 0.33$ for the 1-step threshold algorithm and $b \approx 0.303$ for QAOA$_1$. Judging by the convergence at larger $D$, we expect the 2-step classical algorithm to outperform QAOA$_2$ for all $D \ge 500$.
We wonder if QAOA$_p$'s optimal performance on high-girth graphs is more connected to "locality" than "quantumness". The high-girth condition makes the graph look like a tree to any local algorithm. Given any constant $p$, we expect that a graph-size-independent expression can be efficiently calculated for both QAOA$_p$ and the $p$-step threshold algorithm on regular graphs of girth $> 2p + 1$; we suspect the $p$-local classical algorithm, on average, will find a larger cut than QAOA$_p$ on every such graph. Importantly, we compare QAOA$_2$ with local classical MAX-CUT algorithms. This constraint may be useful in distributed settings, where computing nodes do not have access to the entire graph and communication between computing nodes is limited. Even so, this work suggests that QAOA$_p$ is outperformed by a local classical algorithm of similar depth. We conjecture that for every constant $p$, there exists a local classical algorithm that, on average, finds a cut at least as large as QAOA$_p$ on every possible graph.
We only study $D$-regular graphs. If the graph instead has maximum degree $D$, one can still use this analysis by instituting "phantom vertices", where random data is generated for each missing vertex. We first learned of this idea from [She92] but do not think it is optimal. We also only compare graphs of girth above 5. If there are a small number of short cycles, we do not expect this result to change; however, threshold algorithms are not very effective on highly connected graphs, a topic we plan to study further.
TABLE I. Improvement over random assignment for MAX-CUT on regular graphs of girth above 5. The degree $D$ is shown in the first column. "Threshold1 ($\tau$)", "Modified Threshold1" and "QAOA1" are reproduced from Table I of [Has19]. The QAOA2 and 2-step threshold algorithms outperform all previously known values at each listed $D$, and at all $D < 500$ as shown numerically. "Threshold2" refers to the best 2-step threshold algorithm with $\tau_1$ and $\tau_2$ possibly unequal. It does not outperform QAOA2 when $D \in \{2, 3, 4, 5\}$, but a related 2-step local classical algorithm using [Has19]'s framework does.

a. Evaluating exponentials
Since the graph is identical from $u$ and $v$'s perspective, $B_1 = B_2$.
The only terms of $C$ that do not commute with $\sigma^y_u \sigma^z_v$ involve nodes $w_1, \dots, w_{D-1}, v$ adjacent to $u$.
The only terms of $J$ that do not commute with the inner expression reference the nodes $u, v, w_1, \dots, w_{D-1}$.
The terms in parentheses can be simplified.

b. Breaking apart into terms
The only terms of $C$ that may not commute involve edges connected to a node in $S = \{u, v, w_1, \dots, w_{D-1}\}$. Terms of $e^{i\gamma_1 C}$ that connect $j$ and $k$ have the form $p - iq\sigma^z_j \sigma^z_k$. Consider edges where both $j, k \in S$; we discuss the other edges in Appendix B 2 c.
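The form $p - iq\sigma^z_j \sigma^z_k$ follows from $(\sigma^z_j \sigma^z_k)^2 = I$. The short worked version below assumes the standard MAX-CUT edge term $\frac{1}{2}(I - \sigma^z_j \sigma^z_k)$ and the identification $p = \cos(\gamma_1/2)$, $q = \sin(\gamma_1/2)$. Neither is stated explicitly in this section, but both are consistent with the later use of $p^2 - q^2 = y$ and $p^2 + q^2 = 1$. The overall phase $e^{i\gamma_1/2}$ cancels against the matching factor from $e^{-i\gamma_1 C}$ inside the trace.

```latex
\begin{align*}
e^{i\gamma_1 \cdot \frac{1}{2}(I - \sigma^z_j \sigma^z_k)}
  &= e^{i\gamma_1/2}\, e^{-i(\gamma_1/2)\,\sigma^z_j \sigma^z_k} \\
  &= e^{i\gamma_1/2} \left[ \cos(\gamma_1/2)\, I - i \sin(\gamma_1/2)\, \sigma^z_j \sigma^z_k \right]
   = e^{i\gamma_1/2} \left( p - i q\, \sigma^z_j \sigma^z_k \right), \\
p^2 + q^2 &= 1, \qquad
p^2 - q^2 = \cos(\gamma_1) = y, \qquad
2pq = \sin(\gamma_1) = z .
\end{align*}
```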
We will expand the products carefully. Many terms do not contribute to the trace. Only half include an even number of $\sigma^z_v$; the rest will not contribute to the trace because of the definition of $\rho_0$. Note that each of these terms includes an odd number of $\sigma^z_u$, correctly cancelling out the rightmost $\sigma^y_u$ in the trace. Consider the terms that include $\sigma^z_v$: Similarly, of the expressions that include $\sigma^z_{w_i}$, only half include an even number. Only these corresponding terms can contribute to the trace. Note that each of these terms includes an even number of $\sigma^z_u$, which will cancel out. Sometimes the exponential sign flips when commuting with $\sigma^z$.
The last step is to consider the expression $e^{i\beta_1 \sigma^x_u} = k + il\sigma^x_u$. Each combination of terms is matched with its contribution to the trace. If the first expression $e^{i\beta_1 \sigma^x_u}$ includes $il\sigma^x_u$, it will flip the sign of $V_1, V_4, W_3, W_4$. If the second expression $e^{i\beta_1 \sigma^x_u}$ includes $il\sigma^x_u$, it will flip the sign of $V_1, V_2, W_2, W_3$.
The $V$ and $W$ sums can then be simplified. Any expressions using $e^{ia\sigma^x}$ will become $e^{ia}$ after taking the trace. Here are the $V$ sums: Here are the $W$ sums: We sum all terms. Notice that any $V_i W_j$ will include an odd number of $\sigma^z_u$, which combined with $\sigma^y_u$ will add a factor of $-i$ to each term in the trace: In a simplified form,

c. Effect of the crossover terms
Now we consider the crossover terms. For edges that introduce a new node $k$, the associated $\sigma^z_k$ must be cancelled. Since the graph has girth above 5, any neighbor $k \notin S$ of a node $\ell \in S$ has $\ell$ as its unique neighbor in $S$. So, the above expression is modified depending on $\ell$. Consider $\ell = v$. This swaps $V_1$ and $V_3$, which only adjusts $mtz \to -mtz$ in the 2nd and 3rd term of the above expression. Thus, $mtz \to (p^2 - q^2)mtz = mtyz$. All other terms are $T \to (p^2 + q^2)T = T$. This process happens $D - 1$ times (once for each neighbor $x_j \ne u$ of $v$). Consider $\ell = w_i$ for $i \in \{1, \dots, D-1\}$. The only term that does not commute with $\sigma^z_{w_i}$ is $e^{2i\beta_1 \sigma^x_{w_i}} \to e^{-2i\beta_1 \sigma^x_{w_i}}$. This swaps $W_2$ and $W_4$, which converts $(m \pm i\,ntz)$ to $(m \mp i\,ntz)$. Thus, $(m \pm i\,ntz) \to p^2 (m \pm i\,ntz) + q^2 (m \mp i\,ntz) = (m \pm i\,ntyz)$. All other terms are $T \to (p^2 + q^2)T = T$. This process happens $D - 1$ times (once for each neighbor $x_j \ne u$ of $w_i$). There are $D - 1$ nodes of the form $w_i$.

a. Evaluating exponentials
Most terms of $C$ commute with $\sigma^y_u \sigma^y_v$. The ones that do not correspond to the other $D - 1$ neighbors of $u$ and $v$.
The terms within the parentheses can be simplified.

b. Breaking apart into terms
Terms of $e^{i\gamma_1 C}$ that connect $j$ and $k$ have the form $p - iq\sigma^z_j \sigma^z_k$. If a term of $C$ does not commute, it must involve an edge connected to a node in $R = \{u, v, w_1, \dots, w_{D-1}, \chi_1, \dots, \chi_{D-1}\}$. For now, consider only edges where $j, k \in R$. We consider the "crossover terms" in Appendix B 3 c. Because of $\rho_0$, the trace is nonzero only for terms that are proportional to $I$ or $\sigma^x_n$ for all nodes $n$.
Any nonzero trace terms including $\sigma^z$ or $\sigma^y$ must be converted to $I$ or $\sigma^x$. Of the expressions that include $\sigma^z_{w_i}$, only half include an even number. Only these corresponding terms will contribute to the trace. Note that each of these terms includes an even number of $\sigma^z_u$, which cancel. Sometimes the exponential sign flips when commuting with $\sigma^z$. This entire process also holds for $\sigma^z_{\chi_j}$; let's call those terms $X_1, X_2, X_3, X_4$.
Consider the $\sigma^y_u \sigma^y_v$ term. This can only be cancelled with $(p \pm iq\sigma^z_u \sigma^z_v)$, since other terms include additional $\sigma^z_{w_i}$ or $\sigma^z_{\chi_j}$ which cannot be cancelled out alone. These are the only allowed terms from $(p - iq\sigma^z_u \sigma^z_v) \cdots (p + iq\sigma^z_u \sigma^z_v)\sigma^y_u \sigma^y_v$: Expanding the mentioned exponentials and summing the two terms simplifies to plus the terms reversed: Each combination of terms is matched with its contribution to the trace. Note that if the first expression includes $\sigma^x_u$, it will flip the sign of $W_3, W_4$. If the second expression includes $\sigma^x_u$, it will flip the sign of $W_2, W_3$. The same is true for $\sigma^x_v$ and the $X$ terms.
In sum: $Z_{i2} Z_{j2} = \sum_{k=0}^{n} \sum_{u=0}^{n} P_n(k) P_n(u) Z_{i2,k} Z_{j2,u}$. This equation is symmetric; if $Z_{i0} = Z_{j0} = -1$, the result is the same.

Zi, Zj initially unequal
Suppose instead that $Z_{j0} = -1$, and still $u/n$ of its other neighbors agree with $j$. Then, $Z_{i1,k} = q_1(k)$, and $Z_{j1,u} = -q_1(u)$. So the above expressions for $Z_{i2,k}$, $Z_{j2,u}$, and $Z_{i2} Z_{j2}$ still hold with new values of $Z_1$. This equation is symmetric; if $Z_{i0} = -1$ and $Z_{j0} = 1$, the result is the same.